# <div style="padding:2rem;font-size:100%;text-align:left;display:fill;border-radius:0.25rem;overflow:hidden;background-image: url(https://images.pexels.com/photos/2860804/pexels-photo-2860804.jpeg?auto=compress&cs=tinysrgb&w=1260&h=750&dpr=1)"><b><span style='color:grey'> PARKING ANALYSIS PREDICTOR</span></b> </div>

This is a collaborative group project done at the end of Phase 4 of Moringa School's Data Science program. The team members of this group include:
1. [Mwiti Mwongo](https://github.com/M13Mwongo)
2. [Grace Mutuku](https://github.com/GraceKoki)
3. [Joy Ogutu](https://github.com/Ogutu01)
4. [Ezra Kipchirchir](https://github.com/dev-ezzy)
5. [Mary Gaceri](https://github.com/MaryGaceri)

## <div style="padding:2rem;font-size:80%;text-align:left;display:fill;border-radius:0.25rem;overflow:hidden;background-image: url(https://images.pexels.com/photos/2860804/pexels-photo-2860804.jpeg?auto=compress&cs=tinysrgb&w=1260&h=750&dpr=1)"><b><span style='color:white'> Problem statement</span></b> </div>

Traffic is a nightmare, am I right? You can’t drive anywhere without being stuck in traffic for a while, especially in Nairobi. What makes it worse is that during high-traffic periods, such as mornings and evenings, there is a high chance of missing out on a parking spot near your office, especially in county-run parking.
This application aims to predict parking patterns and the likelihood of finding an available parking spot in a given area at a given time of day.


## <div style="padding:2rem;font-size:80%;text-align:left;display:fill;border-radius:0.25rem;overflow:hidden;background-image: url(https://images.pexels.com/photos/2860804/pexels-photo-2860804.jpeg?auto=compress&cs=tinysrgb&w=1260&h=750&dpr=1)"><b><span style='color:white'> Objectives</span></b> </div>

### <div style='display:flex;align-items:center;flex-direction:row'><hr style='background-color:#d4ff00;height:35px;width:4px;margin:0 1rem 0 0;border-radius:2rem'/><span>Main Objective:</span></div>

- Predict the availability of parking in a given area in Nairobi

## <div style="padding:2rem;font-size:80%;text-align:left;display:fill;border-radius:0.25rem;overflow:hidden;background-image: url(https://images.pexels.com/photos/2860804/pexels-photo-2860804.jpeg?auto=compress&cs=tinysrgb&w=1260&h=750&dpr=1)"><b><span style='color:white'>Proposed Solution</span></b> </div>

Here we created a model to predict the availability of parking spots across Nairobi. Our procedure is as follows, starting with data sourcing and culminating in model creation and deployment.

### <div style='display:flex;align-items:center;flex-direction:row'><hr style='background-color:#d4ff00;height:35px;width:4px;margin:0 1rem 0 0;border-radius:2rem'/><span>**0. Preliminaries**</span></div>

#### <div style='display:flex;align-items:center;flex-direction:row'><hr style='background-color:#9DF7E5;height:35px;width:3px;margin:0 1rem 0 0;border-radius:2rem'/><span>**a) Imports & OOP**</span></div>

The necessary libraries were first imported.

In [1]:
# Importing necessary libraries
# Basics
import os
import time
import itertools
import json
from io import StringIO
from datetime import datetime, timedelta

import numpy as np
import pandas as pd
import requests

# Visualization libraries
import matplotlib.pyplot as plt
%matplotlib inline
import matplotlib.patches as mpatches
from matplotlib.pylab import rcParams
import plotly.express as px
import seaborn as sns

# Modeling libraries
import statsmodels.api as sm
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.seasonal import seasonal_decompose
from statsmodels.tsa.statespace.sarimax import SARIMAX
from statsmodels.tsa.stattools import acf, pacf, adfuller
from sklearn.linear_model import LassoLarsCV
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import TimeSeriesSplit
from pmdarima import auto_arima
from prophet import Prophet

# Model deployment libraries
import joblib

# Warnings: silence statsmodels convergence warnings and other library warnings
import warnings
from statsmodels.tools.sm_exceptions import ConvergenceWarning
warnings.simplefilter('ignore', ConvergenceWarning)
warnings.filterwarnings('ignore')

# Custom options for displaying rows and columns
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', 100)

In object-oriented programming (OOP), classes serve as blueprints that bundle data and the actions performed on it into modular components. They support encapsulation, abstraction and inheritance, which makes code more readable and easier to manage, and they provide an organized way of developing and implementing code.

Consequently, the following classes were defined below, each inheriting from the one before it:
1. Data Sourcing
2. Data Understanding
3. Data Preprocessing
4. Data Analysis
5. Modelling

In [2]:
class DataSourcing:
  def __init__(self,df_carparks,df_carpark_structure,df_carpark_history,df_holidays,df_coords):
    
    self.carparks_all = df_carparks.sort_values(by='facility_id')
    self.carpark_structure = df_carpark_structure.sort_values(by='facility_id')
    self.carpark_history = df_carpark_history.sort_values(by='facility_id')
    self.holidays = df_holidays
    self.facility_coordinates = df_coords

In [17]:
class DataUnderstanding(DataSourcing):
  def __init__(self, data_sourcing_object):
    if (isinstance(data_sourcing_object, DataSourcing)):
      self.carparks_all = data_sourcing_object.carparks_all
      self.carpark_structure = data_sourcing_object.carpark_structure
      self.carpark_history = data_sourcing_object.carpark_history
      self.holidays = data_sourcing_object.holidays
      self.facility_coordinates = data_sourcing_object.facility_coordinates
    else:
      raise TypeError('data_sourcing_object must be an instance of DataSourcing')

  def carpark_names(self):
    return self.carparks_all
  
  def carpark_details(self):
    message = f"""
    There are {self.carparks_all.shape[0]} carparks in the dataset.
    
    The highest number of parking spots available is {self.carpark_structure['spots'].max()}, found at {self.carpark_structure.loc[self.carpark_structure['spots'].idxmax(), 'facility_name']}.
    
    The lowest number of parking spots available is {self.carpark_structure['spots'].min()}, found at {self.carpark_structure.loc[self.carpark_structure['spots'].idxmin(), 'facility_name']}.
    
    There are {len(self.carpark_structure.columns)} columns in the dataset: namely {self.carpark_structure.columns.to_list()}
    """
    print(message)
    return None

  def examine_carpark_history(self):
    print(" ################### Details about the data ################### \n ")
    print(f"The dataset is a DataFrame with {self.carpark_history.shape[0]} rows and {self.carpark_history.shape[1]} columns\n")
    print("Columns of the dataset:", self.carpark_history.columns.to_list())
    print("\nFirst 5 records of the dataset ")
    display(self.carpark_history.head())
    
    # Display information about the dataset
    print("\nData information")
    # info() prints its summary itself and returns None, so it is not wrapped in display()
    self.carpark_history.info()
    print("\nNull Values ")
    display(self.carpark_history.isnull().sum())
    # print("\nDuplicate Values ")
    # print(self.carpark_history.duplicated(), 'duplicate values')
    display(self.carpark_history.describe())

    print('\nData Details')
    print('Number of unique Parking Facilities:', self.carpark_history.facility_name.nunique())
    # print('Number of unique days:', self.carpark_history.date.nunique())
    
    return None


In [4]:
class DataPreprocessing(DataUnderstanding):
  def __init__(self, data_understanding_object):
    if (isinstance(data_understanding_object, DataUnderstanding)):
      self.carparks_all = data_understanding_object.carparks_all
      self.carpark_structure = data_understanding_object.carpark_structure
      self.carpark_history = data_understanding_object.carpark_history
      self.holidays = data_understanding_object.holidays
      self.facility_coordinates = data_understanding_object.facility_coordinates
    else:
      raise TypeError('data_understanding_object must be an instance of DataUnderstanding')

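  def drop_duplicate_carpark_history(self):
    # 'zones' and 'occupancy' hold nested records (unhashable), so duplicates
    # are identified from the remaining columns only
    subset = [col for col in self.carpark_history.columns if col not in ('zones', 'occupancy')]
    self.carpark_history = self.carpark_history.drop_duplicates(subset=subset)
    return self.carpark_history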

  def separate_zones_from_carpark_history(self):
    # Flatten the nested 'zones' column into its own dataframe.
    # sep='_' makes json_normalize name the nested fields occupancy_total, occupancy_loop, etc.
    zones_list = []

    for index, row in self.carpark_history.iterrows():
        # One row per zone; keep the originating record's index so the zones
        # can be merged back onto carpark_history by index
        df_zone = pd.json_normalize(row['zones'], sep='_')
        df_zone.index = [index] * len(df_zone)
        zones_list.append(df_zone)

    # Concatenating the zones list (original indexes preserved)
    df_zones = pd.concat(zones_list)

    # Keeping necessary columns
    self.carpark_history_zones_only = df_zones[['zone_id', 'occupancy_total']]

    return self.carpark_history_zones_only
  
  def merge_zones_and_carpark_history(self):
    # TODO - TEST THIS FUNCTION TO MAKE SURE IT WORKS
    # Creating merged dataframe
    df = pd.merge(self.carpark_history, self.carpark_history_zones_only, how='outer',left_index=True, right_index=True)
    
    # Dropping the zones column now that the data is merged
    df.drop(columns=['zones'],inplace=True)
    
    # Renaming the spots column to total_parking_spots
    df.rename(columns={'spots':'total_parking_spots'},inplace=True)
    
    # Ensuring the occupancy_total and total_parking_spots are integers
    df['occupancy_total'] = df['occupancy_total'].astype(int)
    df['total_parking_spots'] = df['total_parking_spots'].astype(int)
    
    # Assigning the df to self.carpark_history
    self.carpark_history = df

    return self.carpark_history

  def merge_coords_and_carpark_history(self):
    # Left merge: every history record keeps its row and gains its facility's
    # longitude/latitude (NaN where a facility has no coordinates)
    merged_df = self.carpark_history.merge(self.facility_coordinates, on='facility_id', how='left')

    self.carpark_history = merged_df

    return self.carpark_history
  
  def save_dataframe_to_parquet(self,dataframe,path):
    dataframe.to_parquet(path)
    print("File saved!")
    return None

  def drop_facility_ids(self):
    # Drop records where facility_id is between 486 and 490
    self.carpark_structure = self.carpark_structure[~ self.carpark_structure['facility_id'].between(486, 490)]

    # Drop records where facility_id is between 1 and 5
    self.carpark_structure = self.carpark_structure[~ self.carpark_structure['facility_id'].between(1, 5)]
    
    return self.carpark_structure
  
  def extract_date_time_dayOfWeek(self):
    # Parse MessageDate (e.g. '2023-07-01T00:01:02') once and derive each field from it
    message_dt = pd.to_datetime(self.carpark_history['MessageDate'])
    # Extracting the date
    self.carpark_history['date'] = message_dt.dt.date
    # Extracting the time of day as datetime.time objects, so .hour works later
    # (this overwrites the original integer 'time' column)
    self.carpark_history['time'] = message_dt.dt.time
    # Establishing the day of the week
    self.carpark_history['day_of_Week'] = message_dt.dt.day_name()

    return self.carpark_history


  def categorize_carpark_time(self):
    self.carpark_history['time_category'] = self.carpark_history['time'].apply(lambda x: self.categorize_time(x.hour))
    
    return self.carpark_history

  def drop_columns_in_zones(self):
    # Drop columns in zones
    self.carpark_history_with_zones = self.carpark_history_with_zones.drop(columns=['zone_id','occupancy_total'])
  
  # Helper functions that do not directly modify content in the object instance
  @staticmethod
  def categorize_time(hour):
    if 6 <= hour < 12:
        return 'Morning'
    elif 12 <= hour < 18:
        return 'Afternoon'
    elif 18 <= hour < 24:
        return 'Evening'
    else:
        return 'Night'
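Applying `categorize_time` row by row with `.apply` can be slow on roughly 2.9 million records. A vectorised alternative, sketched below under the assumption that the hour is available as an integer column, uses `pd.cut` with the same boundaries. `categorize_hours` is a hypothetical helper, not part of the classes above.

```python
import pandas as pd


def categorize_hours(hours: pd.Series) -> pd.Series:
    """Vectorised equivalent of categorize_time (hypothetical helper).

    Left-closed bins reproduce the original boundaries:
    [0, 6) Night, [6, 12) Morning, [12, 18) Afternoon, [18, 24) Evening.
    """
    return pd.cut(
        hours,
        bins=[0, 6, 12, 18, 24],
        right=False,
        labels=["Night", "Morning", "Afternoon", "Evening"],
    )


# Usage, assuming MessageDate holds ISO timestamps such as "2023-07-01T00:01:02"
example = pd.Series(["2023-07-01T00:01:02", "2023-10-30T21:13:44"])
print(categorize_hours(pd.to_datetime(example).dt.hour).tolist())  # ['Night', 'Evening']
```

Because `pd.cut` works on the whole column at once, it avoids a Python function call per record.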

In [5]:
# WARN - Potential issue with how the data analysis constructor has been made. Test it out and find out
class DataAnalysis(DataPreprocessing):
  def __init__(self, data_understanding_object):
    super().__init__(data_understanding_object)


In [6]:
class Modelling(DataAnalysis):
  def __init__(self, *args, **kwargs):
    super().__init__(*args, **kwargs)

#### <div style='display:flex;align-items:center;flex-direction:row'><hr style='background-color:#9DF7E5;height:35px;width:3px;margin:0 1rem 0 0;border-radius:2rem'/><span>**b) The Process of Fetching Data**</span></div>

Our data was sourced from the Transport for New South Wales (TfNSW) website, more specifically from their [Car Park API](https://opendata.transport.nsw.gov.au/dataset/car-park-api).


The API, whose base URL was `https://api.transport.nsw.gov.au/v1/carpark`, had two endpoints:
1. `{baseURL}?facility={facility_id}` - Contains one optional variable, ***facility_id***. Returns the occupancy details of a car park based on a facility ID. If no facility ID is specified, a list of facility names with their IDs is returned.
2. `{baseURL}/history?facility={facility_id}&eventdate={date_in_question}` - Contains two mandatory variables, ***facility_id*** and ***date_in_question***, formatted as *YYYY-MM-DD*. Returns the historical occupancy details of a car park based on a facility ID and event date.
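A small helper that builds both endpoint URLs from the base URL keeps the two formats consistent across the fetching functions. The sketch below uses only the standard library; `build_carpark_url` is a hypothetical name, and the query parameters are the ones listed above.

```python
from datetime import date
from typing import Optional
from urllib.parse import urlencode

BASE_URL = "https://api.transport.nsw.gov.au/v1/carpark"


def build_carpark_url(facility_id: Optional[int] = None,
                      event_date: Optional[date] = None) -> str:
    """Return the occupancy endpoint URL, or the history endpoint URL when event_date is given."""
    if event_date is not None:
        # The history endpoint needs both a facility ID and a YYYY-MM-DD date
        if facility_id is None:
            raise ValueError("The history endpoint requires a facility ID")
        query = urlencode({"facility": facility_id, "eventdate": event_date.isoformat()})
        return f"{BASE_URL}/history?{query}"
    if facility_id is None:
        # No facility ID: the endpoint lists facility names and IDs
        return BASE_URL
    return f"{BASE_URL}?{urlencode({'facility': facility_id})}"


print(build_carpark_url(6, date(2023, 7, 1)))
# https://api.transport.nsw.gov.au/v1/carpark/history?facility=6&eventdate=2023-07-01
```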

Our intention was to use this API to fetch six months' worth of historical parking data. A longer time period gives a better understanding of parking habits across a wide range of conditions, including social events, public holidays, school holidays and even employees' leave days.

The team wrote code to make requests to the API automatically and save the results in a dataframe. However, after further study of the API's structure and the data being received, the team decided it was best to make these requests once and store the resulting data in JSON files, which pandas can read directly.
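The fetch-once-and-cache approach described above can be expressed as a small helper. This is a sketch under the assumption that the fetched payload is JSON-serialisable; `load_or_fetch` and its `fetch` argument are hypothetical, with `fetch` standing in for any function that calls the API.

```python
import json
import os
from typing import Any, Callable


def load_or_fetch(path: str, fetch: Callable[[], Any]) -> Any:
    """Return cached JSON from `path` if present; otherwise call `fetch`, cache and return its result."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    data = fetch()
    # Create the parent folder if needed so the first run does not fail
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    with open(path, "w") as f:
        json.dump(data, f)
    return data
```

On later runs the API is not called at all, which avoids repeating thousands of history requests every time the notebook is executed.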

The function below retrieves the list of car parks from the TfNSW API, loads it into a dataframe, names the columns (the API returns them unlabelled) and saves the result to a JSON file.

```python
import os

import dotenv
import pandas as pd
import requests


def get_carparks_list():
  # Load the API key from the .env file into the environment
  dotenv.load_dotenv('.env')
  # Path to the JSON file created/saved
  carparks_file_path = './data/carparks_original.json'

  # Creating header for request
  headers = {
      "Authorization": f"apikey {os.environ.get('apikey')}"
  }
  # Specifying url to get carparks (no facility ID returns every facility)
  url_carparks = 'https://api.transport.nsw.gov.au/v1/carpark'

  list_of_carparks = requests.get(url_carparks, headers=headers).json()

  df_carparks = pd.DataFrame.from_dict(list_of_carparks, orient='index')
  # Resetting the index to label the columns afterwards
  df_carparks = df_carparks.reset_index()
  df_carparks.columns = ['facility_id', 'CarParkName']

  # Delete any existing file at the carparks path (and make sure the folder exists)
  if os.path.exists(carparks_file_path):
    os.remove(carparks_file_path)
  os.makedirs(os.path.dirname(carparks_file_path), exist_ok=True)

  # Creating new file with updated column titles
  df_carparks.to_json(carparks_file_path)

  print('File created and updated successfully.')
```
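The reshaping step in `get_carparks_list` can be checked offline. The sketch below assumes, as the code above implies, that the endpoint without a facility ID returns a JSON object mapping facility IDs (as strings) to names; the two entries are taken from the car park list shown later in this notebook.

```python
import pandas as pd

# Assumed response shape: {"<facility_id>": "<car park name>", ...}
sample_response = {
    "1": "Tallawong Station Car Park (historical only)",
    "6": "Gordon Henry St North Car Park",
}

df_carparks = pd.DataFrame.from_dict(sample_response, orient="index").reset_index()
df_carparks.columns = ["facility_id", "CarParkName"]
# JSON object keys are strings, so cast the IDs to integers for sorting and merging
df_carparks["facility_id"] = df_carparks["facility_id"].astype(int)
print(df_carparks)
```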

With the names of the various facilities in hand, the structure of each car park was investigated. Each facility can have a different configuration: a facility may contain one or more car parks, and each car park may have one or more zones, as depicted below.

<div style="text-align:center">
<img src='./images/carpark_structure.png' alt='Carpark structure'>
</div>
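To make this facility, car park and zone nesting concrete, the sketch below flattens one facility's `zones` list into one row per zone with `pd.json_normalize`. The record is hypothetical: its keys mirror those visible in the `carpark_structure` output (`spots`, `zone_id`, `zone_name`, `parent_zone_id` and an `occupancy` object containing `total`), but the values are invented. Treating available spots as `spots - occupancy_total` is also an assumption.

```python
import pandas as pd

# Hypothetical facility record with two zones; all values are illustrative only
facility = {
    "facility_id": "999",
    "facility_name": "Example Car Park",
    "spots": "300",
    "zones": [
        {"zone_id": "A", "zone_name": "Level 1", "parent_zone_id": "0", "spots": "200",
         "occupancy": {"loop": None, "total": "150", "monthlies": None,
                       "open_gate": None, "transients": None}},
        {"zone_id": "B", "zone_name": "Level 2", "parent_zone_id": "0", "spots": "100",
         "occupancy": {"loop": None, "total": "100", "monthlies": None,
                       "open_gate": None, "transients": None}},
    ],
}

# One row per zone; sep="_" turns the nested occupancy.total into occupancy_total
zones = pd.json_normalize(facility, record_path="zones", meta=["facility_id"], sep="_")

# The counts arrive as strings, so convert them before doing arithmetic
zones["spots"] = zones["spots"].astype(int)
zones["occupancy_total"] = zones["occupancy_total"].astype(int)
zones["available_spots"] = (zones["spots"] - zones["occupancy_total"]).clip(lower=0)
print(zones[["facility_id", "zone_id", "spots", "occupancy_total", "available_spots"]])
```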

Knowing this, the function below was created to fetch the individual details of the carparks - using the JSON file just created - to properly scrutinise their structure. This would then be saved in its own JSON file named `carpark_structure.json` for future reference.

```python
import json
import os

import pandas as pd
import requests


def get_carpark_structure(path_to_carpark_json_file):
  structure_file_path = './data/carpark_structure.json'
  # Delete any file found at the same path
  if os.path.exists(structure_file_path):
    os.remove(structure_file_path)

  # Load the list of carparks into a dataframe
  df_carparks = pd.read_json(path_to_carpark_json_file)
  # Initialise the list that will hold each facility's details
  carpark_details_array = []

  # Creating header for the requests (the same for every facility)
  headers = {
      "Authorization": f"apikey {os.environ.get('apikey')}"
  }

  # Loop through carparks to get information
  for facility in df_carparks['facility_id']:
    url = f'https://api.transport.nsw.gov.au/v1/carpark?facility={facility}'
    # Make request and add the response to the list
    response = requests.get(url, headers=headers).json()
    carpark_details_array.append(response)

  # Store information in JSON file
  with open(structure_file_path, 'w') as f:
    json.dump(carpark_details_array, f)
  # Create dataframe and return it
  return pd.DataFrame(carpark_details_array)
```

Having done that, a new function, `date_getter`, was created to list all the days in a given time period. It takes a time delta as its argument and returns, in *YYYY-MM-DD* format, every date from 31 December 2023 minus that delta up to and including 31 December 2023.

This is needed because the history endpoint accepts a single event date per request, so each car park's history has to be requested one day at a time.

```python
from datetime import datetime, timedelta


def date_getter(td):
    # Array that stores the dates to be searched for
    date_period_list = []

    # The last date to be searched for
    cutoff_date = datetime(2023, 12, 31)
    target_date = cutoff_date - td

    # Ensure that records of each day are obtained
    delta = timedelta(days=1)

    while target_date <= cutoff_date:
        date_period_list.append(target_date.strftime("%Y-%m-%d"))
        target_date += delta

    return date_period_list
```
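As a quick check, `date_getter(timedelta(days=183))` starts on 1 July 2023, since 31 December 2023 minus 183 days is 1 July 2023, and because both ends are included it yields 184 dates. The same list can be built with `pandas.date_range`; this is an alternative sketch, not the code the team ran.

```python
import pandas as pd

# Inclusive daily range ending on the notebook's cutoff date (31 December 2023)
dates = pd.date_range(end="2023-12-31", periods=184, freq="D").strftime("%Y-%m-%d").tolist()

print(len(dates), dates[0], dates[-1])  # 184 2023-07-01 2023-12-31
```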

Having a date function, a new function (`get_carpark_history`) was made to fetch the carpark history of a particular facility across a range of dates.

This function retrieves the car park history for a single facility across a list of dates. It takes the facility ID and the list of dates as arguments, saves the combined records to a JSON file and returns them as a DataFrame.

```python
import json
import os

import pandas as pd
import requests


def get_carpark_history(facility, dates_array):

    # Initialize data list
    data_array = []

    # Define the path for the JSON file
    json_file_path = f"./data/carpark history/facility_{facility}.json"

    # Set the request header
    headers = {
        "Authorization": f"apikey {os.environ.get('apikey')}"
    }

    # Delete the file if it exists (and make sure its folder exists)
    if os.path.exists(json_file_path):
        os.remove(json_file_path)
    os.makedirs(os.path.dirname(json_file_path), exist_ok=True)

    # Make a request for each date and aggregate the data;
    # each response is assumed to be a list of occupancy records
    for date in dates_array:
        url = f'https://api.transport.nsw.gov.au/v1/carpark/history?facility={facility}&eventdate={date}'
        response = requests.get(url, headers=headers).json()
        data_array.extend(response)

    # Save the data to a JSON file
    with open(json_file_path, 'w') as f:
        json.dump(data_array, f)

    # Convert the aggregated data into a pandas DataFrame
    return pd.DataFrame(data_array)
```
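`get_carpark_history` issues one request per date, so six months of history means about 184 requests per facility, and a single transient failure (for example a rate-limit or server error) would interrupt the loop. One option, sketched below as an assumption rather than what the team used, is a `requests.Session` that retries with exponential backoff and carries the API key header. `make_tfnsw_session` is a hypothetical helper; the header format mirrors the functions above.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def make_tfnsw_session(api_key: str) -> requests.Session:
    """Session that sends the TfNSW API key and retries transient HTTP errors."""
    retry = Retry(
        total=5,                                     # give up after five retries
        backoff_factor=1,                            # exponential backoff between attempts
        status_forcelist=[429, 500, 502, 503, 504],  # rate-limit and server errors
    )
    session = requests.Session()
    session.mount("https://", HTTPAdapter(max_retries=retry))
    session.headers.update({"Authorization": f"apikey {api_key}"})
    return session


# Usage (requires a valid key and network access):
# session = make_tfnsw_session(os.environ["apikey"])
# response = session.get("https://api.transport.nsw.gov.au/v1/carpark/history",
#                        params={"facility": 6, "eventdate": "2023-07-01"}, timeout=30)
# response.raise_for_status()
```

Reusing one session also keeps the underlying connection open between requests instead of reconnecting for every date.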

### <div style='display:flex;align-items:center;flex-direction:row'><hr style='background-color:#d4ff00;height:35px;width:4px;margin:0 1rem 0 0;border-radius:2rem'/><span>**1. Data Sourcing:**</span></div>

Data was sourced over a six-month period, from 1 July 2023 to 31 December 2023. The `get_carpark_history` function was run in a loop over the facilities using this date range, and each facility's six-month parking history was saved to its own file (found in *./data/carpark_history_6_months/facility_<<facility_id>>*). To ensure that the analysis starts from a single dataframe, the code below reads the data from the individual parquet files and combines it into one file, from which that dataframe is created.

```python
import os

import pandas as pd

history_dir = 'data/carpark_history_6_months'

# Read every facility's parquet file (sorted for a deterministic order)
frames = [
    pd.read_parquet(os.path.join(history_dir, file))
    for file in sorted(os.listdir(history_dir))
    if file.endswith('.parquet')
]

# Concatenate once instead of growing the dataframe inside the loop
df = pd.concat(frames, ignore_index=True)

# Save to parquet
df.to_parquet('data/carpark_history_6_months.parquet')
```

Parquet was chosen because its compressed, columnar storage format makes reading and writing large datasets efficient.

In [7]:
df_carpark_history = pd.read_parquet('./data/carpark_history_6_months.parquet')

Although there are 38 facilities in total, data was read from only 28 of them. Facilities 1-5 had no historical data for the period in question, while the data providers flagged facilities 486-490 as having inaccurate data, so both groups were omitted.
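The per-facility loop described above, together with the exclusion of facilities 1-5 and 486-490, could look roughly like this. It is a sketch: `collect_histories` is a hypothetical name, and `fetch_history` stands in for a function with the same arguments as `get_carpark_history`, so the loop can be exercised without calling the API.

```python
from typing import Callable, Iterable, List

import pandas as pd

# Facilities with no history for the period (1-5) or flagged as inaccurate (486-490)
EXCLUDED_FACILITIES = set(range(1, 6)) | set(range(486, 491))


def collect_histories(
    facility_ids: Iterable[int],
    dates: List[str],
    fetch_history: Callable[[int, List[str]], pd.DataFrame],
) -> pd.DataFrame:
    """Fetch each remaining facility's history and stack the results into one DataFrame."""
    frames = [
        fetch_history(facility_id, dates)
        for facility_id in facility_ids
        if facility_id not in EXCLUDED_FACILITIES
    ]
    return pd.concat(frames, ignore_index=True)
```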

Moving on, the data containing the car park structure and the car park names can now be loaded into dataframes.

In [8]:
df_carparks = pd.read_json('./data/carparks_original.json')
df_carpark_structure = pd.read_json('./data/carpark_structure.json')

Furthermore, the files containing the geolocation coordinates of each parking facility and the 2023 public holidays for New South Wales were loaded into their own dataframes. This information will come in handy later on.

In [9]:
# Creating the dataframe for the holidays
df_holidays = pd.read_csv('./data/NSW_holidays_2023.csv')

# Dataframe for the geocoordinates
df_coords = pd.read_json('./data/coords.json')

Having done this, all the dataframes can now be passed to the `DataSourcing` class.

In [10]:
data_sourcing = DataSourcing(df_carparks,df_carpark_structure,df_carpark_history,df_holidays,df_coords)

### <div style='display:flex;align-items:center;flex-direction:row'><hr style='background-color:#d4ff00;height:35px;width:4px;margin:0 1rem 0 0;border-radius:2rem'/><span>**2. Data Understanding:**</span></div>

In this section, the data is identified, gathered and given a cursory analysis by:

- Gathering preliminary data, which has been put into a JSON file.
- Describing the data that we have at our disposal.
- Looking for patterns and correlations in the data.
- Confirming the accuracy of the data.
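For the last step, confirming accuracy, a few simple checks can be run once zone occupancy has been merged in. The sketch below assumes the `occupancy_total` and `total_parking_spots` columns produced by `merge_zones_and_carpark_history`; `occupancy_sanity_report` is a hypothetical helper and the sample values are invented.

```python
import pandas as pd


def occupancy_sanity_report(df: pd.DataFrame) -> dict:
    """Count records whose occupancy is missing, negative or above capacity."""
    occupied = df["occupancy_total"]
    capacity = df["total_parking_spots"]
    return {
        "rows": len(df),
        "missing_occupancy": int(occupied.isna().sum()),
        "negative_occupancy": int((occupied < 0).sum()),
        "over_capacity": int((occupied > capacity).sum()),
    }


sample = pd.DataFrame({
    "occupancy_total": [56, 0, 250, -1],
    "total_parking_spots": [213, 42, 213, 213],
})
print(occupancy_sanity_report(sample))
# {'rows': 4, 'missing_occupancy': 0, 'negative_occupancy': 1, 'over_capacity': 1}
```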

Instantiating the `DataUnderstanding` class:

In [18]:
data_understanding = DataUnderstanding(data_sourcing)

Having done this, a general summary of all the car parks' parking history is outlined:

In [19]:
data_understanding.examine_carpark_history()

 ################### Details about the data ################### 
 
The dataset is a DataFrame with 2924545 rows and 10 columns

Columns of the dataset: ['tsn', 'time', 'spots', 'zones', 'ParkID', 'occupancy', 'MessageDate', 'facility_id', 'facility_name', 'tfnsw_facility_id']

First 5 records of the dataset 


|  | tsn | time | spots | zones | ParkID | occupancy | MessageDate | facility_id | facility_name | tfnsw_facility_id |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 207210 | 741448862 | 213 | [{'occupancy': {'loop': None, 'monthlies': Non... | 1 | {'loop': None, 'monthlies': None, 'open_gate':... | 2023-07-01T00:01:02 | 6 | Gordon Henry St North Car Park | 207210TPR001 |
| 45565 | 207210 | 751976024 | 213 | [{'occupancy': {'loop': None, 'monthlies': Non... | 1 | {'loop': None, 'monthlies': None, 'open_gate':... | 2023-10-30T21:13:44 | 6 | Gordon Henry St North Car Park | 207210TPR001 |
| 45566 | 207210 | 751976564 | 213 | [{'occupancy': {'loop': None, 'monthlies': Non... | 1 | {'loop': None, 'monthlies': None, 'open_gate':... | 2023-10-30T21:22:44 | 6 | Gordon Henry St North Car Park | 207210TPR001 |
| 45567 | 207210 | 751976893 | 213 | [{'occupancy': {'loop': None, 'monthlies': Non... | 1 | {'loop': None, 'monthlies': None, 'open_gate':... | 2023-10-30T21:28:13 | 6 | Gordon Henry St North Car Park | 207210TPR001 |
| 45568 | 207210 | 751978322 | 213 | [{'occupancy': {'loop': None, 'monthlies': Non... | 1 | {'loop': None, 'monthlies': None, 'open_gate':... | 2023-10-30T21:52:02 | 6 | Gordon Henry St North Car Park | 207210TPR001 |



Data information
<class 'pandas.core.frame.DataFrame'>
Index: 2924545 entries, 0 to 2924544
Data columns (total 10 columns):
 #   Column             Dtype 
---  ------             ----- 
 0   tsn                int32 
 1   time               int32 
 2   spots              int32 
 3   zones              object
 4   ParkID             int32 
 5   occupancy          object
 6   MessageDate        object
 7   facility_id        int32 
 8   facility_name      object
 9   tfnsw_facility_id  object
dtypes: int32(5), object(5)
memory usage: 189.7+ MB




Null Values 


tsn                  0
time                 0
spots                0
zones                0
ParkID               0
occupancy            0
MessageDate          0
facility_id          0
facility_name        0
tfnsw_facility_id    0
dtype: int64

|  | tsn | time | spots | ParkID | facility_id |
|---|---|---|---|---|---|
| count | 2924545.0 | 2924545.0 | 2924545.0 | 2924545.0 | 2924545.0 |
| mean | 873850.8 | 750118100.0 | 766.3994 | 1.0 | 20.13059 |
| std | 899447.5 | 4247952.0 | 505.2506 | 0.0 | 7.783199 |
| min | 207210.0 | 741448800.0 | 42.0 | 1.0 | 6.0 |
| 25% | 217933.0 | 746832000.0 | 373.0 | 1.0 | 15.0 |
| 50% | 275010.0 | 750456200.0 | 700.0 | 1.0 | 20.0 |
| 75% | 2126158.0 | 753681200.0 | 1057.0 | 1.0 | 27.0 |
| max | 2155384.0 | 757346200.0 | 1884.0 | 1.0 | 33.0 |



Data Details
Number of unique Parking Facilities: 28


We can further look at the names and facility IDs of the various car parks.

In [20]:
data_understanding.carpark_names()

|  | facility_id | CarParkName |
|---|---|---|
| 0 | 1 | Tallawong Station Car Park (historical only) |
| 11 | 2 | Kellyville Station Car Park (historical only) |
| 22 | 3 | Bella Vista Station Car Park (historical only) |
| 27 | 4 | Hills Showground Station Car Park (historical ... |
| 33 | 5 | Cherrybrook Station Car Park (historical only) |
| 34 | 6 | Gordon Henry St North Car Park |
| 35 | 7 | Kiama Car Park |
| 36 | 8 | Gosford Car Park |
| 37 | 9 | Revesby Car Park |
| 1 | 10 | Warriewood Car Park |


A closer look at the detailed structure of the carparks was also done.

In [21]:
data_understanding.carpark_structure

|  | tsn | time | spots | zones | ParkID | occupancy | MessageDate | facility_id | facility_name | tfnsw_facility_id |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2155384 | 742877319 | 1004 | [{'spots': '152', 'zone_id': 'CPS-CUD1', 'occu... | 1 | {'loop': None, 'total': '981', 'monthlies': No... | 2023-07-17T12:48:39 | 1 | Tallawong Station Car Park | 2155384CCP001 |
| 11 | 2155382 | 742877319 | 1374 | [{'spots': '368', 'zone_id': 'CPS-KVE1', 'occu... | 1 | {'loop': None, 'total': '1363', 'monthlies': N... | 2023-07-17T12:48:39 | 2 | Kellyville Station Car Park | 2155382CCP001 |
| 22 | 2153478 | 742877319 | 800 | [{'spots': '800', 'zone_id': 'CPS-BLV', 'occup... | 1 | {'loop': None, 'total': '314', 'monthlies': No... | 2023-07-17T12:48:39 | 3 | Bella Vista Station Car Park | 2153478CCP001 |
| 27 | 2154392 | 742877319 | 600 | [{'spots': '600', 'zone_id': 'CPS-SHW', 'occup... | 1 | {'loop': None, 'total': '532', 'monthlies': No... | 2023-07-17T12:48:39 | 4 | Hills Showground Station Car Park | 2154392CCP001 |
| 33 | 2126158 | 742877319 | 400 | [{'spots': '400', 'zone_id': 'CPS-CHE', 'occup... | 1 | {'loop': None, 'total': '400', 'monthlies': No... | 2023-07-17T12:48:39 | 5 | Cherrybrook Station Car Park | 2126158CCP001 |
| 34 | 207210 | 760434253 | 213 | [{'spots': '213', 'zone_id': '1', 'occupancy':... | 1 | {'loop': None, 'total': '56', 'monthlies': Non... | 2024-02-05T18:44:13 | 6 | Gordon Henry St North Car Park | 207210TPR001 |
| 35 | 253330 | 760434246 | 42 | [{'spots': '42', 'zone_id': '1', 'occupancy': ... | 1 | {'loop': None, 'total': '0', 'monthlies': None... | 2024-02-05T18:44:06 | 7 | Kiama Car Park | 253330TPR001 |
| 36 | 225040 | 760434254 | 1057 | [{'spots': '1057', 'zone_id': '1', 'occupancy'... | 1 | {'loop': None, 'total': '368', 'monthlies': No... | 2024-02-05T18:44:14 | 8 | Gosford Car Park | 225040TPR001 |
| 37 | 221210 | 760434255 | 934 | [{'spots': '934', 'zone_id': '1', 'occupancy':... | 1 | {'loop': '321462', 'total': '168', 'monthlies'... | 2024-02-05T18:44:15 | 9 | Revesby Car Park | 221210TPR001 |
| 1 | 2101131 | 760433800 | 244 | [{'spots': '244', 'zone_id': '1', 'occupancy':... | 1 | {'loop': None, 'total': '75', 'monthlies': Non... | 2024-02-05T18:36:40 | 10 | Warriewood Car Park | 2101131TPR001 |


A summary of the car parks was also produced:

In [22]:
data_understanding.carpark_details()


    There are 38 carparks in the dataset.
    
    The highest number of parking spots available is 1884, found at Leppington Car Park.
    
    The lowest number of parking spots available is 42, found at Kiama Car Park.
    
    There are 10 columns in the dataset: namely ['tsn', 'time', 'spots', 'zones', 'ParkID', 'occupancy', 'MessageDate', 'facility_id', 'facility_name', 'tfnsw_facility_id']
    


### <div style='display:flex;align-items:center;flex-direction:row'><hr style='background-color:#d4ff00;height:35px;width:4px;margin:0 1rem 0 0;border-radius:2rem'/><span>**3. Data Preprocessing:**</span></div>

### <div style='display:flex;align-items:center;flex-direction:row'><hr style='background-color:#d4ff00;height:35px;width:4px;margin:0 1rem 0 0;border-radius:2rem'/><span>**4. Explorative Data Analysis & Visualisation**</span></div>

### <div style='display:flex;align-items:center;flex-direction:row'><hr style='background-color:#d4ff00;height:35px;width:4px;margin:0 1rem 0 0;border-radius:2rem'/><span>**5. Modelling**</span></div>

### <div style='display:flex;align-items:center;flex-direction:row'><hr style='background-color:#d4ff00;height:35px;width:4px;margin:0 1rem 0 0;border-radius:2rem'/><span>**6. Deployment**</span></div>

### <div style='display:flex;align-items:center;flex-direction:row'><hr style='background-color:#d4ff00;height:35px;width:4px;margin:0 1rem 0 0;border-radius:2rem'/><span>**7. Conclusion**</span></div>

## <div style="padding:2rem;font-size:80%;text-align:left;display:fill;border-radius:0.25rem;overflow:hidden;background-image: url(https://images.pexels.com/photos/2860804/pexels-photo-2860804.jpeg?auto=compress&cs=tinysrgb&w=1260&h=750&dpr=1)"><b><span style='color:white'>Conclusion</span></b> </div>