# Data Story Report

## 1. Motivation

- **What are the datasets?**  
We are using the following datasets:
  - CPH Districts (`bydel.geojson` [Open Data DK - Bydele](https://www.opendata.dk/city-of-copenhagen/bydele))

  - Car ownership (`KKBIL1: Privatbiler efter distrikt og drivmiddel` [Københavns Kommunes Statistikbank](https://kk.statistikbank.dk))

  - Parking Spaces (`Parkeringspladser` [Open Data DK - Parkeringspladser](https://www.opendata.dk/city-of-copenhagen/parkeringspladser))

  - Parking Counts (`Parkeringstællinger` [Open Data DK - Parkeringstællinger](https://www.opendata.dk/city-of-copenhagen/parkeringstaellinger))

- **Why did you choose this/these particular dataset(s)?**  
Parking is a major stress point for many motorists, especially in larger cities. It has been, and should remain, a concern for legislators and city planners, and we wanted to see whether conditions have changed for better or worse over the years. The world is changing and so are the cities. Parking opportunities may be shrinking in order to motivate people to use greener transportation. Or perhaps there is more parking than before, but it has been obscured, leaving car owners frustrated. To figure out how parking has changed, and whether there really are fewer parking opportunities in Copenhagen, we take a broad view of the city and explore if and why parking has changed over the last decades.  

- **What was your goal for the end user's experience?**  
We want any site visitor to grasp the overall goal of the website immediately upon entering it. It should be clear that the project explores parking in Copenhagen and how it has evolved. The website is not a place to *find* an actual parking spot, but it should enlighten people as to *why* they cannot find parking in the city. The tabs should give quick access to specific topics and let users find the part of the subject that interests them.

---

## 2. Basic Stats – Understanding the Dataset

### Data Cleaning and Preprocessing

We preprocessed and cleaned each dataset individually. The steps can be seen in the Data_overview notebook.
In brief, we:
- Removed NaN rows
- Renamed column headers to English terms (e.g. "vejkode" -> "street_code")
- Aligned string values that contained typos or that referred to the same entity under different names (e.g. "Kgs. Enghave" -> "Vesterbro-Kongens Enghave")
- Removed columns we do not use for our calculations
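
The cleaning steps above can be sketched in pandas roughly as follows. This is a minimal illustration, not the code from the Data_overview notebook: the helper name `clean_frame`, the toy rows, and the Danish column name `bydel` are assumptions. Only the `vejkode` rename and the `Kgs. Enghave` mapping come from the list above.

```python
import pandas as pd

# Hypothetical mappings; only "vejkode" and "Kgs. Enghave" are taken from the report.
COLUMN_NAMES = {"vejkode": "street_code", "bydel": "district"}
DISTRICT_ALIASES = {"Kgs. Enghave": "Vesterbro-Kongens Enghave"}


def clean_frame(df: pd.DataFrame, keep: list[str]) -> pd.DataFrame:
    """Apply the four cleaning steps to a raw DataFrame."""
    out = df.dropna()                        # 1. remove rows containing NaN
    out = out.rename(columns=COLUMN_NAMES)   # 2. rename headers to English
    # 3. map variant spellings onto one canonical district name
    out = out.assign(district=out["district"].replace(DISTRICT_ALIASES))
    return out[keep].reset_index(drop=True)  # 4. keep only the columns we use


# Toy input (made up for illustration)
raw = pd.DataFrame(
    {
        "vejkode": [4, 8, None],
        "bydel": ["Kgs. Enghave", "Indre By", "Amager Øst"],
        "unused": ["a", "b", "c"],
    }
)
cleaned = clean_frame(raw, keep=["street_code", "district"])
print(cleaned)
```

Note that `dropna()` with default arguments drops a row if any column is missing; a stricter or looser rule may be more appropriate for individual datasets.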

For the Car Ownership dataset, we also add the start year and end year explicitly, since they are not included in the data by default.

#### Overview of datasets

**GeoJSON CPH Districts**

Dataset: GeoJSON of CPH
- File Size: 244.71 KB

GeoJSON feature properties:
- id
- bydel_nr
- navn
- areal_m2
- ogc_fid

**Car Ownership**

Dataset: Car-ownership in CPH
- Size of csv: 261.61 KB
- No cols: 6
- No rows: 4846

Column names:
- FuelType      : <class 'str'>
- CityDistrict  : <class 'str'>
- LocalDistrict : <class 'str'>
- Neighbourhood : <class 'str'>
- Year          : <class 'numpy.uint16'>
- NoOfVehicles  : <class 'numpy.int64'>
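
An overview in the format used here could be produced with a small helper along these lines. The function name `describe_dataset` and the toy data are assumptions, and the file size is left out because it depends on the file on disk. The type shown for each column is the Python type of the first value in that column.

```python
import numpy as np
import pandas as pd


def describe_dataset(df: pd.DataFrame, name: str) -> list[str]:
    """Return overview lines: dataset name, shape, and per-column value type."""
    lines = [
        f"Dataset: {name}",
        f"- No cols: {df.shape[1]}",
        f"- No rows: {df.shape[0]}",
        "Column names:",
    ]
    width = max(len(col) for col in df.columns)
    for col in df.columns:
        # type() of the first value, e.g. <class 'str'> or <class 'numpy.uint16'>
        lines.append(f"- {col:<{width}} : {type(df[col].iloc[0])}")
    return lines


# Toy frame (made up) with a subset of the Car Ownership columns
toy = pd.DataFrame(
    {
        "FuelType": ["Benzin", "El"],
        "Year": np.array([2015, 2016], dtype=np.uint16),
        "NoOfVehicles": [120, 7],
    }
)
overview = describe_dataset(toy, "Car-ownership in CPH")
print("\n".join(overview))
```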

**Parking Spaces**

Dataset: Parking Spaces in CPH
- Size of csv: 57.80 MB
- No cols: 13
- No rows: 587286

Column names:
- FID                         : <class 'str'>
- district                    : <class 'str'>
- street_code                 : <class 'numpy.int64'>
- street_name                 : <class 'str'>
- no_of_spaces                : <class 'numpy.int64'>
- parking_type                : <class 'str'>
- changed_to_electric_parking : <class 'numpy.bool'>
- restriction                 : <class 'numpy.bool'>
- restriction_type            : <class 'str'>
- restriction_text            : <class 'str'>
- year_creation               : <class 'str'>
- year_correction             : <class 'str'>
- year_active                 : <class 'str'>

There are 124265 registered parking spaces in CPH.
This includes all restricted parking spaces, payment zones (betalingszoner), etc.
Note that these are only public street parking spaces, i.e. parking buildings are not included.

**GeoJSON Parking Spaces**

Dataset: GeoJSON with Parking Spaces
- Size of csv: 19.07 MB

**GeoJSON CSV Parking Spaces**

Dataset: DataFrame with Parking Spaces
- Size of csv: 4.54 MB
- No cols: 8
- No rows: 28717

Column names:
- FID             : <class 'str'>
- district        : <class 'str'>
- street_code     : <class 'numpy.int64'>
- street_name     : <class 'str'>
- no_of_spaces    : <class 'numpy.int64'>
- year_creation   : <class 'str'>
- year_correction : <class 'str'>
- coordinates     : <class 'list'>

**Parking Counts**

Dataset: Parking Counts in CPH
- Size of csv: 16.46 MB
- No cols: 14
- No rows: 37910

Column names:
- street_name         : <class 'str'>
- year                : <class 'numpy.int64'>
- month               : <class 'numpy.int64'>
- legal_p_at_12       : <class 'numpy.int64'>
- legal_p_at_17       : <class 'numpy.int64'>
- legal_p_at_22       : <class 'numpy.int64'>
- parked_cars_at_12   : <class 'numpy.int64'>
- parked_cars_at_17   : <class 'numpy.int64'>
- parked_cars_at_22   : <class 'numpy.int64'>
- occupancy_at_12_pct : <class 'numpy.float64'>
- occupancy_at_17_pct : <class 'numpy.float64'>
- occupancy_at_22_pct : <class 'numpy.float64'>
- wkb_geometry        : <class 'str'>
- district            : <class 'str'>
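
The occupancy columns are presumably the ratio of parked cars to legal spaces at each count time. That definition is an assumption, since it is not stated in the dataset description above. Under that assumption, a minimal sketch for recomputing the percentages, for example to sanity-check the published values, could look like this (the `_recomputed` column names and the toy rows are hypothetical):

```python
import pandas as pd

COUNT_HOURS = (12, 17, 22)  # the three count times present in the column names


def add_occupancy(df: pd.DataFrame) -> pd.DataFrame:
    """Add a recomputed occupancy percentage for each count hour."""
    out = df.copy()
    for hour in COUNT_HOURS:
        legal = out[f"legal_p_at_{hour}"]
        # Mask streets with zero legal spaces so the division yields NaN, not inf
        legal = legal.where(legal > 0)
        out[f"occupancy_at_{hour}_recomputed"] = (
            100 * out[f"parked_cars_at_{hour}"] / legal
        )
    return out


# Toy rows (made up): one ordinary street, one with no legal spaces
counts = pd.DataFrame(
    {
        "legal_p_at_12": [10, 0],
        "legal_p_at_17": [10, 0],
        "legal_p_at_22": [10, 0],
        "parked_cars_at_12": [5, 3],
        "parked_cars_at_17": [8, 3],
        "parked_cars_at_22": [12, 3],
    }
)
result = add_occupancy(counts)
print(result.filter(like="recomputed"))
```

Values above 100 are possible under this definition, since more cars can be parked than there are legal spaces.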

### Exploratory Data Analysis

- _[Insert key stats, bullet points, or findings here]_
- _[Include or describe key visualizations such as histograms, box plots, correlations, etc.]_

---

## 3. Data Analysis

- _[Describe the analytical methods used: descriptive, inferential, or predictive]_
- _[Summarize insights discovered]_

---

## 4. Genre

- **Which genre of data story did you use?**  
Slideshow. We find the slideshow genre well suited to interactivity, as it lets users jump directly to the parts of the analysis they want to dive into. By separating our webpage into tabs, we can serve the information in smaller, easier-to-digest portions. It also lets us divide our "story" into "chapters", so that we can show different angles of the problem clearly without confusing the user.

- **Visual Narrative Tools (Segel & Heer - Figure 7):**  
  - Consistent Visual Platform
  - Zooming
  - Feature Distinction

- **Narrative Structure Tools (Segel & Heer - Figure 7):**  
  - Linear
  - User Directed Path (User can skip slides)
  - Hover Highlighting, Details
  - Filtering, Selection
  - Captions / Headlines
  - Annotations
  - Accompanying Article (citations)
  - Introductory Text


---

## 5. Visualizations

- _[Describe the charts, graphs, maps...]_
- **Why are they appropriate?**  


---

## 6. Discussion – Critical Reflection

- **What went well?**  
The project went well in many respects. Our group work in particular was well organised: tasks were clearly divided and finished on time. We had weekly meetings to discuss the next steps, and that helped the process tremendously. Most of our visualisations were inspired by or based on work we had done in previous notebooks, which made it easier to quickly produce plots that look quite nice. Overall, we are happy with how the webpage and notebooks ended up looking.


- **What is still missing / could be improved?**  
We have been in contact with the people managing the datasets, and from these dialogues we have gathered that the data may not be as complete as it seems, so some of our hypotheses and conclusions might not hold fully. With more time, we would have liked to compare our findings to other datasets or to a national overview. We would also have loved to improve the overall style of the webpage and align it better with the style of the visualisations, but we chose to prioritise the visualisations of the data.

---

## 7. Contributions

| Team Member          | Primary Responsibilities                                                                                      |
|----------------------|---------------------------------------------------------------------------------------------------------------|
| Aleksandar (s194066) | Car ownership dataset, Parking spaces dataset, Folium interactive map                                         |
| Paula (s242926)      | Parking count dataset, Parking spaces dataset, Bokeh interactive plot                                         |
| Victor (s204475)     | Website creation and setup, Time series, Video filming and editing, Narrative direction, Research and sources |

> Note: Each member took the lead on different components while maintaining collaborative input across all parts of the project.
