# Data Overview from opendata.dk

Available parking spaces and their occupancy are counted in Copenhagen, and the resulting data sets are published on [Opendata](opendata.dk). Two data sets are available:
- **Parking spaces**: Legal daytime (7am-6pm) parking spaces at street level (on public and private shared roads), parking spaces in publicly owned parking facilities, and parking spaces without a parking system. The data set also covers parking options for electric cars, shared cars, taxis and disabled drivers, as well as parking spaces reserved for embassies and consulates.
- **Parking counts**: Parking counts on roads/road sections are conducted twice a year, in March and October, at 12:00, 17:00 and 22:00 in selected areas. Parking occupancy rates are calculated from these counts.

The data sets will be analyzed and preprocessed in the following cells.
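
As a rough illustration of how an occupancy rate could be derived from a count, the sketch below divides the number of parked cars by the number of legal spaces on each road section. The column names `parked_cars` and `num_spaces` and all values are made up for illustration; the published counts data set may use a different layout and a different definition.

```python
import pandas as pd

# Hypothetical counts for three road sections (values are made up).
counts = pd.DataFrame({
    "street_name": ["A", "B", "C"],
    "num_spaces": [10, 4, 0],
    "parked_cars": [8, 5, 0],
})

# Occupancy = parked cars / legal spaces. Sections without any legal
# spaces are set to NaN so that they do not produce a division by zero.
spaces = counts["num_spaces"].where(counts["num_spaces"] > 0)
counts["occupancy"] = counts["parked_cars"] / spaces

print(counts)
```

Note that a rate above 1 is possible under this definition, for example when cars are parked outside the legal spaces.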

## Preamble

In [166]:
import matplotlib.pyplot as plt
import numpy as np
import os
import pandas as pd
import re


In [None]:
# Set pandas display options to show all columns for .head command
pd.set_option("display.max_columns", None)  # Show all columns
pd.set_option("display.width", None)        # Auto-detect the display width
pd.set_option("display.max_colwidth", None) # Show full content of each column

In [6]:
# Loading the data and files
data_path = os.path.abspath(os.path.join(os.pardir, "projectData"))

# Load csv data into pandas dataframe
parking_spaces_name = "parking_spaces.csv"
parking_spaces_path = os.path.join(data_path, parking_spaces_name)
df = pd.read_csv(parking_spaces_path)

## Cleaning the data

### Data overview

The dataset contains 26 columns and 28762 rows.

| Original column name              | Renamed column name   | Description   |
| ------                            | ------                | ------        |
|**FID**                            | NaN                   | Identifier for Parking Space Object strictly used for this dataset |
|**vejkode**                        | street_code           | Numeric identifier for a given street |
|**vejnavn**                        | street_name           | Street name |
|**antal_pladser**                  | num_spaces            | Number of parking spaces registered under Parking Space Object |
|**restriktion**                    | restriction           | Restriction boolean indicator |
|**vejstatus**                      | street_status         | Street type, e.g. public road or private area |
|**vejside**                        | NaN                   | Side of the road, i.e. even or odd numbered house numbers |
|**bydel**                          | district              | District, e.g. Nørrebro, Valby, etc. |
|**p_ordning**                      | parking_order         | Parking Space Object type, e.g. Electric car parking w. charger, Handicap, Blue zone, etc. |
|**p_type**                         | parking_type          | Parking Space Object physical type, e.g. Marked parking, 45deg marked parking |
|**p_status**                       | parking_status        | Status of whether the parking spot is "created" or "temporarily out of service" |
|**rettelsedato**                   | correction_date       | Date of correction |
|**oprettelsesdato**                | creation_date         | Date of creation |
|**x**                              | NaN                   | Individual text description for Parking Space Object type, e.g. license plate for private handicap parking spot |
|**id**                             | id                    | Unique identifier for each Parking Space Object |
|**taelle_id**                      | NaN                   | Identifier for internal counting/tally procedures |
|**startdato_midlertidigt_nedlagt** | NaN                   | Start date for registered Out of Service (only NaN in dataset) |
|**slutdato_midlertidigt_nedlagt**  | NaN                   | End date for registered Out of Service (only NaN and a single entry in dataset) |
|**restriktionstype**               | NaN                   | Restriction type name |
|**restriktionstekst**              | NaN                   | Restriction description, e.g. "8-18" if parking not allowed in given hours |
|**taelle_note**                    | NaN                   | Text notes for internal counting/tally procedures |
|**delebilsklub**                   | NaN                   | Car sharing company |
|**aendring_p_ordning**             | NaN                   | Indicator if parking space has been converted to Electric parking w.o. charger |
|**uuid**                           | NaN                   | Unique identifier for each Parking Space Object |
|**ogc_fid**                        | NaN                   | Numeric identifier for Parking Space Object strictly used for this dataset |
|**wkb_geometry**                   | wkb_geometry          | Location of Parking Space Object, in the form of a MultiLineString with longitude/latitude coordinates |
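
Remarks such as "only NaN in dataset" can be checked with a per-column summary of missing and distinct values. The following is a minimal sketch on a made-up frame; `summarize_columns` is a hypothetical helper, not part of the notebook, and on the real data it would be called on `df` instead of `toy`.

```python
import numpy as np
import pandas as pd


def summarize_columns(frame: pd.DataFrame) -> pd.DataFrame:
    """Return dtype, share of missing values and number of distinct values per column."""
    return pd.DataFrame({
        "dtype": frame.dtypes.astype(str),
        "missing_share": frame.isna().mean(),
        "n_unique": frame.nunique(dropna=True),
    })


# Made-up rows that only borrow column names from the table above.
toy = pd.DataFrame({
    "vejnavn": ["Adelgade", "Adelgade", "Agerbo"],
    "antal_pladser": [10.0, np.nan, 3.0],
    "startdato_midlertidigt_nedlagt": [np.nan, np.nan, np.nan],
})

summary = summarize_columns(toy)
print(summary)
```

A column with `missing_share` equal to 1.0 carries no information and is a natural candidate for dropping.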

In [34]:
for col in df.columns:
    l = list(df[col].unique())
    l_len = len(l)
    if l_len > 10:
        l = l[:10]
    print(col, "\n", l, "\nLen:\n", l_len, "\n")

FID 
 ['p_pladser.1', 'p_pladser.2', 'p_pladser.3', 'p_pladser.4', 'p_pladser.5', 'p_pladser.6', 'p_pladser.7', 'p_pladser.8', 'p_pladser.9', 'p_pladser.10'] 
Len:
 28762 

vejkode 
 [np.int64(4), np.int64(8), np.int64(72), np.int64(12), np.int64(16), np.int64(20), np.int64(24), np.int64(28), np.int64(32), np.int64(36)] 
Len:
 2008 

vejnavn 
 ['Abel Cathrines Gade', 'Abildgaardsgade', 'Ahrenkildes Allé', 'Abildhøj', 'Abrikosvej', 'Absalonsgade', 'Adelgade', 'Admiralgade', 'Adriansvej', 'Agerbo'] 
Len:
 2008 

antal_pladser 
 [np.float64(10.0), np.float64(4.0), np.float64(1.0), np.float64(2.0), np.float64(3.0), np.float64(11.0), np.float64(13.0), np.float64(8.0), np.float64(5.0), np.float64(7.0)] 
Len:
 71 

restriktion 
 ['nej', 'ja', nan] 
Len:
 3 

vejstatus 
 ['Offentlig vej', 'Privat fællesvej', 'Privat vej', nan, 'Privat fællesvej §10 stk3', 'Privat fællessti'] 
Len:
 6 

vejside 
 ['Lige husnr.', 'Ulige husnr.', nan, 'Midt i gaden', 'P-område/areal'] 
Len:
 5 

bydel 
 ['Vesterb

In [None]:
# Drop redundant columns
df_dropped = df.drop(
    columns=[
        "FID", # Identifier for Parking Space Object strictly used for this dataset
        #"vejkode", # Numeric Identifier for a given street 
        #"vejnavn", # Street name
        #"antal_pladser", # Number of parking spaces registered under Parking Space Object
        #"restriktion", # Restriction boolean indicator
        "vejstatus", # Street type, e.g. public road or private area
        "vejside", # Side of the road, i.e. even or odd numbered house numbers
        #"bydel", # District, e.g. Nørrebro, Valby, etc. 
        #"p_ordning", # Parking Space Object type, e.g. Electric car parking w. charger, Handicap, Blue zone, etc.
        "p_type", # Parking Space Object physical type, e.g. Marked parking, 45deg marked parking, 
        #"p_status", # Status of whether the parking spot is "created" or "temporarily out of service" 
        #"rettelsedato", # Date of correction
        #"oprettelsesdato", # Date of creation
        "x", # Individual Text description for Parking Space Object type, e.g. license plate for private handicap parking spot 
        "id", # Unique identifier for each Parking Space Object
        "taelle_id", # Identifier for internal counting/tally procedures
        "startdato_midlertidigt_nedlagt", # Start date for registered Out of Service (only NaN in dataset)
        "slutdato_midlertidigt_nedlagt", # End date for registered Out of Service (only NaN and a single entry in dataset)
        #"restriktionstype", # Restriction type name
        #"restriktionstekst", # Restriction description, e.g. "8-18" if parking not allowed in given hours
        #"taelle_note", # Text notes for internal counting/tally procedures
        "delebilsklub", # Car sharing company
        #"aendring_p_ordning", # Indicator if parking space has been converted to Electric parking w.o. charger
        "uuid", # Unique identifier for each Parking Space Object
        "ogc_fid", # Numeric Identifier for Parking Space Object strictly used for this dataset
        #"wkb_geometry", # Location of Parking Space Object, in the form of a MultiLineString with longitude/latitude coordinates 
    ], errors='ignore')

In [154]:
# Drop any rows that lack quantifiable information (number of spaces, district or location)
df_dropped = df_dropped.dropna(subset=[
    "antal_pladser",
    "bydel",
    "wkb_geometry"
])

In [155]:
# Drop duplicates based on location coordinates, keeping the first row for each geometry
df_dropped = df_dropped.drop_duplicates(subset=["wkb_geometry"], keep="first")
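
To make the effect of deduplicating on geometry concrete, here is a small sketch on made-up rows. `duplicated(subset=...)` flags the second and later occurrences of a geometry, so filtering them out keeps the first row per location. The geometries and counts below are invented for illustration.

```python
import pandas as pd

# Two rows share the same geometry, the third one is unique.
toy = pd.DataFrame({
    "vejnavn": ["Adelgade", "Adelgade", "Agerbo"],
    "antal_pladser": [10.0, 4.0, 3.0],
    "wkb_geometry": [
        "MULTILINESTRING ((1 1, 2 2))",
        "MULTILINESTRING ((1 1, 2 2))",
        "MULTILINESTRING ((3 3, 4 4))",
    ],
})

# True for every row whose geometry has already been seen further up.
is_repeat = toy.duplicated(subset=["wkb_geometry"])

# Keep only the first occurrence of each geometry.
deduplicated = toy[~is_repeat]

print(is_repeat.tolist())
print(deduplicated)
```

Which row survives depends on row order, so it is worth checking whether the duplicated rows differ in `antal_pladser` before discarding them.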

In [164]:
df_dropped["wkb_geometry"][0]

'MULTILINESTRING ((12.558951218164538 55.67151987967718, 12.559712354787063 55.67130454392531))'
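
The geometry is stored as well-known text, with each point written as `x y`, which here means longitude followed by latitude. A minimal sketch of turning such a string into numeric coordinates with the standard `re` module is shown below. `parse_multilinestring` and `midpoint` are hypothetical helpers, and the pattern assumes plain 2D decimal coordinates; a dedicated geometry library would be the more robust choice for anything beyond a quick look.

```python
import re
from typing import List, Tuple

# Matches one "x y" pair of plain decimal numbers inside a WKT string.
COORD_PATTERN = re.compile(r"(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)")


def parse_multilinestring(wkt: str) -> List[Tuple[float, float]]:
    """Return all (longitude, latitude) pairs of a 2D WKT string as one flat list."""
    return [(float(x), float(y)) for x, y in COORD_PATTERN.findall(wkt)]


def midpoint(points: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Return the arithmetic mean of the points as a rough location."""
    lons = [lon for lon, _ in points]
    lats = [lat for _, lat in points]
    return sum(lons) / len(lons), sum(lats) / len(lats)


wkt = (
    "MULTILINESTRING ((12.558951218164538 55.67151987967718, "
    "12.559712354787063 55.67130454392531))"
)
coords = parse_multilinestring(wkt)
center = midpoint(coords)
print(coords)
print(center)
```

The flat list discards which line segment a point belongs to, which is acceptable for a rough location but not for length or shape calculations.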

In [3]:
# Loading the raw data set (data_path is defined here again so the cell also runs in a fresh kernel)
data_path = os.path.abspath(os.path.join(os.pardir, "projectData"))
raw_data_path = os.path.join(data_path, "parking_spaces.csv")
df_spaces = pd.read_csv(raw_data_path)
df_spaces.columns = df_spaces.columns.str.strip()

# Drop redundant columns and assign the result back to df_spaces,
# so that the renaming, saving and plotting below work on the reduced frame
df_spaces = df_spaces.drop(columns=["FID", "vejside", "bemaerkning", "taelle_id", "startdato_midlertidigt_nedlagt",
                      "slutdato_midlertidigt_nedlagt", "restriktionstype", "restriktionstekst", "taelle_note",
                      "delebilsklub", "aendring_p_ordning", "x", "uuid", "ogc_fid"],  errors='ignore')

# TODO: Replace NaN with empty string and -1 (not implemented yet)

# Rename/translate the columns to English to be more readable
df_spaces.rename(columns={
    "vejkode": "street_code", 
    "vejnavn": "street_name", 
    "antal_pladser": "num_spaces", 
    "restriktion": "restriction", 
    "vejstatus": "street_status", 
    "bydel": "district", 
    "p_ordning": "parking_order", 
    "p_type": "parking_type", 
    "p_status": "parking_status", 
    "rettelsedato": "correction_date", 
    "oprettelsesdato": "creation_date", 
    "id": "id", 
    "wkb_geometry": "wkb_geometry"
}, inplace=True)

# Print the unique values of each remaining column as a sanity check
for col in df_spaces.columns:
    unique_vals = df_spaces[col].unique()
    print(f"Column '{col}' has {len(unique_vals)} unique entries:")
    print(unique_vals)
    print("-" * 40)

In [None]:
# Save the cleaned data to a new CSV file
df_spaces.to_csv(os.path.join(data_path, "cleaned_parking_spaces.csv"), index=False)

df_spaces.head()

In [None]:
# Ensure the 'num_spaces' column is numeric; convert if needed:
df_spaces['num_spaces'] = pd.to_numeric(df_spaces['num_spaces'], errors='coerce')

# ---------------------------
# Graph 1: Count of Records by District
# ---------------------------
district_counts = df_spaces['district'].value_counts()

plt.figure(figsize=(10, 6))
district_counts.plot(kind='bar', color='skyblue', edgecolor='black')
plt.title("Count of Records by District")
plt.xlabel("District")
plt.ylabel("Number of Records")
plt.xticks(rotation=45, ha="right")
plt.tight_layout()
plt.show()

# ---------------------------
# Graph 2: Total Number of Parking Spaces by District
# ---------------------------
spaces_by_district = df_spaces.groupby('district')['num_spaces'].sum()
# Sorting the values (optional) to see the lower-to-higher sum distribution.
spaces_by_district = spaces_by_district.sort_values()

plt.figure(figsize=(10, 6))
spaces_by_district.plot(kind='bar', color='lightgreen', edgecolor='black')
plt.title("Total Number of Parking Spaces by District")
plt.xlabel("District")
plt.ylabel("Total Parking Spaces")
plt.xticks(rotation=45, ha="right")
plt.tight_layout()
plt.show()

# ---------------------------
# Graph 3: Count of Records by Parking Type
# ---------------------------
# (Assuming the column is named 'parking_type'; adjust if it is different.)
parking_type_counts = df_spaces['parking_type'].value_counts()

plt.figure(figsize=(10, 6))
parking_type_counts.plot(kind='bar', color='salmon', edgecolor='black')
plt.title("Count of Records by Parking Type")
plt.xlabel("Parking Type")
plt.ylabel("Number of Records")
plt.xticks(rotation=45, ha="right")
plt.tight_layout()
plt.show()


In [None]:
# Convert 'correction_date' to datetime and extract the year.
df_spaces['correction_date'] = pd.to_datetime(df_spaces['correction_date'], errors='coerce')
df_spaces['correction_year'] = df_spaces['correction_date'].dt.year

# Group by the correction year and sum the total number of parking spaces.
spaces_by_year = df_spaces.groupby('correction_year')['num_spaces'].sum().sort_index()

# Plot a bar chart to visualize the total parking spaces per year.
plt.figure(figsize=(10, 6))
spaces_by_year.plot(kind='bar', color='lightcoral', edgecolor='black')
plt.title("Total Number of Parking Spaces per Year")
plt.xlabel("Year")
plt.ylabel("Total Parking Spaces")
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()
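
One thing to keep in mind with `errors='coerce'`: dates that cannot be parsed become `NaT`, their year becomes `NaN`, and `groupby` drops rows with a missing key by default, so those rows silently vanish from the yearly totals. The sketch below shows this on made-up values; the column names follow the renamed frame, everything else is invented.

```python
import pandas as pd

# Made-up records, one of them with an unparseable date.
toy = pd.DataFrame({
    "correction_date": ["2019-03-01", "2019-10-15", "not a date", "2021-05-20"],
    "num_spaces": [10, 4, 7, 3],
})

toy["correction_date"] = pd.to_datetime(toy["correction_date"], errors="coerce")
toy["correction_year"] = toy["correction_date"].dt.year

# Rows with a missing year are excluded from the groups by default.
by_year = toy.groupby("correction_year")["num_spaces"].sum()

# Count how many rows were lost to parsing and how many spaces they hold.
unparsed_rows = int(toy["correction_date"].isna().sum())
unparsed_spaces = int(toy.loc[toy["correction_date"].isna(), "num_spaces"].sum())

print(by_year)
print(unparsed_rows, unparsed_spaces)
```

Comparing `by_year.sum()` with the overall sum of `num_spaces` is a quick way to see whether any spaces are missing from the chart.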