# Data overview for Final Project - Parking Spaces in CPH

Parking spaces in Copenhagen and their occupancy are counted, and the resulting data sets are published on [Opendata](opendata.dk). There are two data sets available: 
- **Parking spaces**: The data set shows legal parking spaces during the day (7am-6pm) at street level (on public and private shared roads), parking spaces in publicly owned parking facilities, and parking spaces without a parking system. Parking options for electric cars, shared cars, taxis and disabled drivers are included, as are parking spaces reserved for embassies and consulates.

The data sets will be analyzed and preprocessed in the following cells.

## Preamble

In [1]:
import matplotlib.pyplot as plt
import numpy as np
import os
import pandas as pd
import re


In [2]:
# Set pandas display options to show all columns for .head command
pd.set_option("display.max_columns", None)  # Show all columns
pd.set_option("display.width", None)        # Auto-detect the display width
pd.set_option("display.max_colwidth", None) # Show full content of each column

In [3]:
# Loading the data and files
data_path = os.path.abspath(os.path.join(os.pardir, "projectData"))

# Load csv data into pandas dataframe
parking_spaces_name = "parking_spaces.csv"
parking_spaces_path = os.path.join(data_path, parking_spaces_name)
df = pd.read_csv(parking_spaces_path)

## Cleaning the data

### Data overview

The original dataset from Opendata contains 26 columns and 28762 rows.

| Original Column name              | Renamed Column name   | Description   |
| ------                            | ------                | ------        |
|**FID**                            | NaN                   | Identifier for Parking Space Object strictly used for this dataset|
|**vejkode**                        | street_code           | Numeric Identifier for a given street |
|**vejnavn**                        | street_name           | Street name|
|**antal_pladser**                  | no_of_spaces          | Number of parking spaces registered under Parking Space Object |
|**restriktion**                    | restriction           | Restriction boolean indicator |
|**vejstatus**                      | NaN                   | Street type, e.g. public road or private area |
|**vejside**                        | NaN                   | Side of the road, i.e. even or odd numbered house numbers |
|**bydel**                          | district              | District, e.g. Nørrebro, Valby, etc. |
|**p_ordning**                      | parking_type          | Parking Space Object type, e.g. Electric car parking w. charger, Handicap, Blue zone, etc. |
|**p_type**                         | NaN                   | Parking Space Object physical type, e.g. marked parking, 45° marked parking |
|**p_status**                       | NaN                   | Status of whether the parking spot is "created" or "temporarily out of service" |
|**rettelsedato**                   | date_correction       | Date of correction |
|**oprettelsesdato**                | date_creation         | Date of creation |
|**x**                              | NaN                   | Individual Text description for Parking Space Object type, e.g. license plate for private handicap parking spot |
|**id**                             | NaN                   | Unique Identifier for each Parking Space Object |
|**taelle_id**                      | NaN                   | Identifier for internal counting/tally procedures |
|**startdato_midlertidigt_nedlagt** | NaN                   | Start date for registered Out of Service (only NaN in dataset) |
|**slutdato_midlertidigt_nedlagt**  | NaN                   | End date for registered Out of Service (only NaN and a single entry in dataset) |
|**restriktionstype**               | restriction_type      | Restriction type name |
|**restriktionstekst**              | restriction_text      | Restriction description, e.g. "8-18" if parking not allowed in given hours |
|**taelle_note**                    | NaN                   | Text notes for internal counting/tally procedures |
|**delebilsklub**                   | NaN                   | Car sharing company |
|**aendring_p_ordning**             | changed_parking_type  | Indicator if parking space has been converted to Electric parking w.o. charger |
|**uuid**                           | NaN                   | Unique Identifier for each Parking Space Object |
|**ogc_fid**                        | NaN                   | Numeric Identifier for Parking Space Object strictly used for this dataset|
|**wkb_geometry**                   | NaN                   | Location of Parking Space Object, in the form of a MultiLineString with longitude/latitude coordinates |

In [4]:
# Quick overview of some value examples
for col in df.columns:
    l = list(df[col].unique())
    l_len = len(l)
    if l_len > 10:
        l = l[:10]
    print(col, "\n", l, "\nLen:\n", l_len, "\n")

FID 
 ['p_pladser.1', 'p_pladser.2', 'p_pladser.3', 'p_pladser.4', 'p_pladser.5', 'p_pladser.6', 'p_pladser.7', 'p_pladser.8', 'p_pladser.9', 'p_pladser.10'] 
Len:
 28762 

vejkode 
 [np.int64(4), np.int64(8), np.int64(72), np.int64(12), np.int64(16), np.int64(20), np.int64(24), np.int64(28), np.int64(32), np.int64(36)] 
Len:
 2008 

vejnavn 
 ['Abel Cathrines Gade', 'Abildgaardsgade', 'Ahrenkildes Allé', 'Abildhøj', 'Abrikosvej', 'Absalonsgade', 'Adelgade', 'Admiralgade', 'Adriansvej', 'Agerbo'] 
Len:
 2008 

antal_pladser 
 [np.float64(10.0), np.float64(4.0), np.float64(1.0), np.float64(2.0), np.float64(3.0), np.float64(11.0), np.float64(13.0), np.float64(8.0), np.float64(5.0), np.float64(7.0)] 
Len:
 71 

restriktion 
 ['nej', 'ja', nan] 
Len:
 3 

vejstatus 
 ['Offentlig vej', 'Privat fællesvej', 'Privat vej', nan, 'Privat fællesvej §10 stk3', 'Privat fællessti'] 
Len:
 6 

vejside 
 ['Lige husnr.', 'Ulige husnr.', nan, 'Midt i gaden', 'P-område/areal'] 
Len:
 5 

bydel 
 ['Vesterb

In [5]:
# Drop redundant columns
df_dropped = df.drop(
    columns=[
        "FID",
        #"vejkode",
        #"vejnavn",
        #"antal_pladser",
        #"restriktion",
        "vejstatus",
        "vejside",
        #"bydel",
        #"p_ordning",
        "p_type",
        "p_status",
        #"rettelsedato",
        #"oprettelsesdato",
        "x",
        "id",
        "taelle_id",
        "startdato_midlertidigt_nedlagt",
        "slutdato_midlertidigt_nedlagt",
        #"restriktionstype",
        #"restriktionstekst",
        "taelle_note",
        "delebilsklub",
        #"aendring_p_ordning",
        "uuid",
        "ogc_fid",
        #"wkb_geometry",
    ], errors='ignore')

In [6]:
# Drop any rows that lack the essential quantifiable information
df_dropped = df_dropped.dropna(subset=[
    "antal_pladser",
    "bydel",
    "wkb_geometry"
])

### Renaming columns and changing the types

In [7]:
# Rename columns to English and more specific naming
cols_rename = {
    "vejkode": "street_code",
    "vejnavn": "street_name", 
    "antal_pladser": "no_of_spaces",
    "restriktion": "restriction",
    "bydel": "district",
    "p_ordning": "parking_type",
    "rettelsedato": "date_correction", 
    "oprettelsesdato": "date_creation",
    "restriktionstype": "restriction_type",
    "restriktionstekst": "restriction_text",
    "aendring_p_ordning": "changed_parking_type",
}

df_renamed = df_dropped.copy()
df_renamed.rename(columns=cols_rename, inplace=True)

#--------------------------------------------------------------------------
# Convert to bool values; missing entries (NaN) are treated as False.
# Comparing against "ja" avoids NaN being cast to True by astype(bool) below.
df_renamed["restriction"] = df_renamed["restriction"].eq("ja")

#--------------------------------------------------------------------------
# Reorder the columns for better reading
df_renamed = df_renamed[[
    "street_code", 
    "street_name",
    "district", 
    "no_of_spaces", 
    "parking_type",
    "changed_parking_type", 
    "restriction", 
    "restriction_type", 
    "restriction_text",
    "date_creation",
    "date_correction",
    "wkb_geometry"
]]

#--------------------------------------------------------------------------
# Define the conversion dictionary
convert_dict = {
    "street_code": int, 
    "street_name": str,
    "district": str, 
    "no_of_spaces": int, 
    "parking_type": str,
    "changed_parking_type": str, 
    "restriction": bool, 
    "restriction_type": str, 
    "restriction_text": str,
    "wkb_geometry": str   
}

# Convert columns using the dictionary
df_renamed = df_renamed.astype(convert_dict)
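
One pitfall when converting a yes/no column with missing values is that casting NaN to `bool` yields `True`. The following is a minimal sketch on a made-up toy series (the values are illustrative and not taken from the data set), showing one way to get a clean boolean column where missing entries count as `False`:

```python
import numpy as np
import pandas as pd

# Toy stand-in for the raw "restriktion" column (values are made up)
raw = pd.Series(["ja", "nej", np.nan, "ja"])

# Comparing against "ja" gives a plain boolean column:
# "ja" -> True, "nej" -> False, missing -> False
restriction = raw.eq("ja")

# For contrast: casting NaN directly to bool produces True
nan_as_bool = pd.Series([np.nan]).astype(bool)

print(restriction.tolist())
print(nan_as_bool.tolist())
```

Whether a missing value should really count as "no restriction" is a modelling choice; the sketch only makes that choice explicit.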

### Extract year of creation and correction

In [8]:
# Manual extraction of years from creation and correction date columns
df_renamed["year_creation"] = df_renamed["date_creation"].str.slice(0, 4).astype(int)
df_renamed["year_correction"] = df_renamed["date_correction"].str.slice(0, 4).astype(int)

df_renamed = df_renamed.drop(columns=["date_creation", "date_correction"], errors='ignore')
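
The string slicing above assumes every date string starts with a four-digit year and that no value is missing. As a hedged alternative, assuming the dates are ISO-like timestamp strings (the sample values below are made up), `pd.to_datetime` can parse the column and a nullable integer type can hold years even when some dates are absent:

```python
import pandas as pd

# Made-up sample values; the real column format is an assumption here
dates = pd.Series(["2015-03-02T10:15:00", "2021-11-30T08:00:00", None])

# Unparseable or missing entries become NaT instead of raising an error
parsed = pd.to_datetime(dates, errors="coerce")

# "Int64" (capital I) is pandas' nullable integer dtype, so missing years stay <NA>
years = parsed.dt.year.astype("Int64")

print(years.tolist())
```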

### Convert geometry data for easier manipulation

In [9]:
# Drop duplicates based on location coordinates, keeping the first occurrence
df_renamed = df_renamed.drop_duplicates(subset=["wkb_geometry"], keep="first")

In [10]:
# Helper function to convert the Multilinestring format to a list of tuples
def multilinestring_to_tuplelist(s):
    matches = re.findall(r'[-\d\.]+ [-\d\.]+', s)
    coord_list = []
    for match in matches:
        lon, lat = map(float, match.split())
        coord_list.append((lat, lon))
    return coord_list
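
As a quick sanity check, the helper can be run on a hand-written string. The sample below is made up and only assumes that the geometry is stored as text with space-separated `longitude latitude` pairs; the function body is repeated so the snippet runs on its own:

```python
import re

def multilinestring_to_tuplelist(s):
    # Find every "number number" pair in the geometry string
    matches = re.findall(r'[-\d\.]+ [-\d\.]+', s)
    coord_list = []
    for match in matches:
        # The text stores longitude first; the output tuple is (lat, lon)
        lon, lat = map(float, match.split())
        coord_list.append((lat, lon))
    return coord_list

# Hypothetical sample geometry, not taken from the data set
sample = "MULTILINESTRING ((12.5501 55.6712, 12.5503 55.6714))"
coords = multilinestring_to_tuplelist(sample)
print(coords)  # [(55.6712, 12.5501), (55.6714, 12.5503)]
```

Note that the order is swapped on purpose: the output tuples are `(lat, lon)`, which is the order many mapping libraries expect.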

In [11]:
# Create a new column with parking space coordinates as a list of (lat, lon) tuples
df_renamed["coordinates"] = df_renamed["wkb_geometry"].apply(multilinestring_to_tuplelist)

# Drop the original geometry column
df_renamed = df_renamed.drop(columns=["wkb_geometry"], errors='ignore')

### Extract geographical data for separate dataset

In [12]:
df_geo = df_renamed[[
    "street_code",
    "street_name",
    "district",
    "no_of_spaces",
    "year_creation",
    "year_correction",
    "coordinates"
]].copy()

df_geo.reset_index(drop=True, inplace=True)

In [13]:
# Store geographical dataframe to new csv file
cleaned_name_geo = "parking_space_locations.csv"
cleaned_path_geo = os.path.join(data_path, cleaned_name_geo)
df_geo.to_csv(cleaned_path_geo, index=False)
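
Writing a column of Python lists to CSV stores each list as its text representation, so a plain `pd.read_csv` returns strings rather than lists of tuples. A minimal sketch of one way to restore the original structure when loading the file again, using an in-memory buffer and a toy frame in place of the real file:

```python
import ast
import io

import pandas as pd

# Toy frame standing in for df_geo (values are made up)
toy = pd.DataFrame({
    "street_name": ["A"],
    "coordinates": [[(55.67, 12.55), (55.68, 12.56)]],
})

# Write to an in-memory buffer instead of a file on disk
buffer = io.StringIO()
toy.to_csv(buffer, index=False)

# Plain read: the coordinates come back as a string
buffer.seek(0)
as_text = pd.read_csv(buffer)

# Read with a converter: ast.literal_eval safely parses the list literal
buffer.seek(0)
restored = pd.read_csv(buffer, converters={"coordinates": ast.literal_eval})

print(type(as_text["coordinates"][0]).__name__)
print(restored["coordinates"][0])
```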

In [14]:
# Drop coordinates in initial dataframe
df_renamed = df_renamed.drop(columns=["coordinates"], errors='ignore')

### Generate rows for each year a parking space has existed

In [15]:
# Traverse dataframe and duplicate entries for each year since they were registered as created
data = []
for index, row in df_renamed.iterrows():
    start_y = row["year_creation"]
    end_y = 2025
    for y in range(start_y, end_y+1):
        data.append([*row.tolist(), y])

# Create dataframe from generated data
df_yearly = pd.DataFrame(columns=[*list(df_renamed.columns), "year_active"], data=data)
df_yearly.reset_index(drop=True, inplace=True)
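
Row-by-row iteration can become slow on large frames. As a sketch of an alternative under the same assumption (each space is active from its creation year through a fixed end year), the expansion can be expressed with `DataFrame.explode`. The toy frame and the `end_year` name are illustrative only:

```python
import pandas as pd

# Toy frame standing in for df_renamed (values are made up)
toy = pd.DataFrame({
    "street_name": ["A", "B"],
    "year_creation": [2023, 2025],
})
end_year = 2025  # same fixed end year as in the loop above

# Build one list of active years per row, then expand each list into rows
expanded = toy.assign(
    year_active=[list(range(y, end_year + 1)) for y in toy["year_creation"]]
).explode("year_active", ignore_index=True)

# explode leaves an object column, so cast back to integers
expanded["year_active"] = expanded["year_active"].astype(int)

print(expanded)
```

Like the loop, this treats every space as still existing in the end year, since the data set carries no usable removal date.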

In [16]:
# Store yearly dataframe to new csv file
cleaned_name_yearly = "parking_space_yearly_entries.csv"
cleaned_path_yearly = os.path.join(data_path, cleaned_name_yearly)
df_yearly.to_csv(cleaned_path_yearly, index=False)