# <div style="padding:2rem;font-size:100%;text-align:left;display:fill;border-radius:0.25rem;overflow:hidden;background-image: url(https://images.pexels.com/photos/2860804/pexels-photo-2860804.jpeg?auto=compress&cs=tinysrgb&w=1260&h=750&dpr=1)"><b><span style='color:grey'> PARKING ANALYSIS PREDICTOR</span></b> </div>

This is a collaborative group project done at the end of Phase 4 of Moringa School's Data Science program. The team members of this group include:
1. [Mwiti Mwongo](https://github.com/M13Mwongo)
2. [Grace Mutuku](https://github.com/GraceKoki)
3. [Joy Ogutu](https://github.com/Ogutu01)
4. [Ezra Kipchirchir](https://github.com/dev-ezzy)
5. [Mary Gaceri](https://github.com/MaryGaceri)

## <div style="padding:2rem;font-size:80%;text-align:left;display:fill;border-radius:0.25rem;overflow:hidden;background-image: url(https://images.pexels.com/photos/2860804/pexels-photo-2860804.jpeg?auto=compress&cs=tinysrgb&w=1260&h=750&dpr=1)"><b><span style='color:white'> Problem statement</span></b> </div>

Traffic is a nightmare, am I right? You can’t drive anywhere without being stuck in traffic for a while, especially in Nairobi. What makes it worse is that a lot of times during high-traffic periods, such as the mornings and evenings, there is a high likelihood of missing out on your desired parking spot that is near your office, especially when looking at county-run parking.
This application hopes to predict the parking patterns and likelihood of having available parking spots in certain areas at a given time of the day.


## <div style="padding:2rem;font-size:80%;text-align:left;display:fill;border-radius:0.25rem;overflow:hidden;background-image: url(https://images.pexels.com/photos/2860804/pexels-photo-2860804.jpeg?auto=compress&cs=tinysrgb&w=1260&h=750&dpr=1)"><b><span style='color:white'> Objectives</span></b> </div>

### <div style='display:flex;align-items:center;flex-direction:row'><hr style='background-color:#d4ff00;height:35px;width:4px;margin:0 1rem 0 0;border-radius:2rem'/><span>Main Objective:</span></div>

- Predict the availability of parking in a given area in Nairobi

## <div style="padding:2rem;font-size:80%;text-align:left;display:fill;border-radius:0.25rem;overflow:hidden;background-image: url(https://images.pexels.com/photos/2860804/pexels-photo-2860804.jpeg?auto=compress&cs=tinysrgb&w=1260&h=750&dpr=1)"><b><span style='color:white'>Proposed Solution</span></b> </div>

Here we created a model that would predict the availability of parking spots across Nairobi. Our procedure for doing this is as follows, starting with the data sourcing and culminating in the model creation and deployment.

### <div style='display:flex;align-items:center;flex-direction:row'><hr style='background-color:#d4ff00;height:35px;width:4px;margin:0 1rem 0 0;border-radius:2rem'/><span>**0. Preliminaries**</span></div>

#### <div style='display:flex;align-items:center;flex-direction:row'><hr style='background-color:#9DF7E5;height:35px;width:3px;margin:0 1rem 0 0;border-radius:2rem'/><span>**a) Imports & OOP**</span></div>

The necessary libraries were first imported.

In [1]:
# Importing necessary libraries
# Basics
import os
import pandas as pd
import numpy as np
import itertools
import requests
import json
from io import StringIO
from datetime import datetime, timedelta
from requests import api

# Visualization libraries
import matplotlib.pyplot as plt
%matplotlib inline 
import plotly.express as px
import seaborn as sns
import matplotlib.patches as mpatches
from matplotlib.pylab import rcParams
import time

# Modeling libraries
import statsmodels.api as sm
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from statsmodels.tsa.statespace.sarimax import SARIMAX
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.seasonal import seasonal_decompose
from statsmodels.tsa.stattools import acf, pacf, adfuller
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.linear_model import LassoLarsCV
from sklearn.model_selection import TimeSeriesSplit
from pmdarima import auto_arima

from prophet import Prophet 

#Model deployment libraries
import joblib    


# Warnings
import warnings
from statsmodels.tools.sm_exceptions import ConvergenceWarning
warnings.simplefilter('ignore', ConvergenceWarning)
warnings.filterwarnings('ignore')

# Custom Options for displaying rows.
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns',100)

In object-oriented programming (OOP), classes serve as blueprints that divide code into modular components containing data and actions. They encourage encapsulation, abstraction, and inheritance, making code more modular, readable, and manageable. Classes offer an organized way of developing and implementing code, promoting clarity and efficiency in software development.

Consequently, the following classes were implemented and defined below:
1. Data Sourcing
2. Data Preprocessing
3. Data Analysis
4. Modelling

In [2]:
class DataSourcing:
  def __init__(self,df_carparks,df_carpark_structure,df_carpark_history):
    
    self.carparks_all = df_carparks.sort_values(by='facility_id')
    self.carpark_structure = df_carpark_structure.sort_values(by='facility_id')
    self.carpark_history = df_carpark_history.sort_values(by='facility_id')
  
  def get_carparks_df(self):
    """
    Get the car parks from the instance variable `carparks_all`.
    
    Returns:
        pandas.DataFrame: All car parks, sorted by facility ID.
    """
    return self.carparks_all

  def get_carpark_details_df(self):
    """
    Get the car park details.

    Returns:
        pandas.DataFrame: The structure of each car park, sorted by facility ID.
    """
    return self.carpark_structure

  def get_carpark_history_df(self):
    """
    Get the car park history.

    Returns:
        pandas.DataFrame: The occupancy history, sorted by facility ID.
    """
    return self.carpark_history
  
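class DataPreprocessing:
  def __init__(self)->None:
    pass
  
  # Static method: it takes no `self`, so it works on the class or an instance
  @staticmethod
  def extract_date_time(message_date):
    """Split a timestamp string on 'T' into its date and time parts."""
    date, time = message_date.split('T', 1)
    return date,time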

class DataAnalysis:
  pass

class Modelling:
  pass
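
To illustrate what the date and time extraction in `DataPreprocessing` is meant to do, here is a minimal standalone sketch. The sample timestamp is an assumption made for illustration (an ISO 8601 style string with a `T` separator); the exact format returned by the API may differ.

```python
def extract_date_time(message_date):
    """Split an ISO 8601 style timestamp into its date and time parts."""
    # partition() splits on the first 'T' only
    date, _, time = message_date.partition('T')
    return date, time


# Hypothetical sample value, assumed for illustration only
sample = "2023-10-01T08:15:00"
date_part, time_part = extract_date_time(sample)
print(date_part, time_part)
```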

#### <div style='display:flex;align-items:center;flex-direction:row'><hr style='background-color:#9DF7E5;height:35px;width:3px;margin:0 1rem 0 0;border-radius:2rem'/><span>**b) The Process of Fetching Data**</span></div>

Our data was sourced from the Transport for New South Wales (TfNSW) website, more specifically, from their [Car Park API](https://opendata.transport.nsw.gov.au/dataset/car-park-api).


The API - whose base URL was `https://api.transport.nsw.gov.au/v1/carpark` - had two endpoints:
1. `{baseURL}?facility={facility_id}` - Contains one optional variable, ***facility_id***. Returns occupancy details of a car park based on a facility ID. If the facility ID is not specified, a list of facility names with their IDs is returned.
2. `{baseURL}/history?facility={facility_id}&eventdate={date_in_question}` - Contains two mandatory variables, ***facility_id*** and ***date_in_question***, the latter formatted as *YYYY-MM-DD*. Returns historical occupancy details of a car park based on a facility ID and event date.
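
As a rough sketch of how the two endpoints are addressed, the helpers below build the request URLs without calling the API. The function names `occupancy_url` and `history_url` are hypothetical and introduced only for this example; the base URL and query parameters are the ones listed above.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.transport.nsw.gov.au/v1/carpark"


def occupancy_url(facility_id=None):
    """Build the occupancy URL; the facility ID is optional."""
    if facility_id is None:
        return BASE_URL
    return f"{BASE_URL}?{urlencode({'facility': facility_id})}"


def history_url(facility_id, event_date):
    """Build the history URL; event_date is a 'YYYY-MM-DD' string."""
    query = urlencode({"facility": facility_id, "eventdate": event_date})
    return f"{BASE_URL}/history?{query}"


print(occupancy_url())
print(occupancy_url(14))
print(history_url(14, "2023-12-31"))
```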

Our intention was to use this API to fetch six months' worth of historical parking data. An extensive time period would lead to a proper understanding of parking habits across a wide array of conditions while factoring in social events, public holidays, school holidays and even leave days of employees.

The team came up with code to automatically make requests to the API, and save this information in a dataframe. However, after further study of the API's structure and the data being received, the team saw it best to have these requests made once and the resulting data stored in json files, which can be read by pandas.
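
The "request once, then read from disk" idea can be sketched as follows. This is an illustration under assumptions, not the project's code: `load_or_fetch` and `fake_fetch` are hypothetical names, and the stand-in fetcher replaces a real API call so that the sketch runs offline. Unlike the project's helper functions, which delete and re-create their files, this variant reuses an existing file.

```python
import json
import os
import tempfile


def load_or_fetch(path, fetch):
    """Return JSON data from `path`, calling `fetch()` only if the file is missing."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    data = fetch()
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    with open(path, "w") as f:
        json.dump(data, f)
    return data


# Stand-in for a real API call; it records how often it is invoked
calls = []


def fake_fetch():
    calls.append(1)
    return {"6": "Gordon Henry St North Car Park"}


cache_path = os.path.join(tempfile.mkdtemp(), "carparks.json")
first = load_or_fetch(cache_path, fake_fetch)   # triggers the fetch and saves the file
second = load_or_fetch(cache_path, fake_fetch)  # served from the saved file
print(first == second, len(calls))
```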

The function below retrieves the list of car parks from the TfNSW API, reads it into a dataframe, names the columns (the API returns them unnamed) and saves the result to a JSON file.

```py
import os

import dotenv
import pandas as pd
import requests


def get_carparks_list():
  # Load the API key from the local .env file
  dotenv.load_dotenv('.env')
  # Path to the JSON file that will be created
  carparks_file_path = './data/carparks_original.json'

  # Creating header for request
  headers = {
      "Authorization": f"apikey {os.environ.get('apikey')}"
  }
  # Specifying url to get carparks
  url_carparks = 'https://api.transport.nsw.gov.au/v1/carpark'

  response = requests.get(url_carparks, headers=headers)
  # Stop early on a failed request instead of saving an error payload
  response.raise_for_status()
  list_of_carparks = response.json()

  df_carparks = pd.DataFrame.from_dict(list_of_carparks, orient='index')
  # Resetting the index to label the columns afterwards
  df_carparks = df_carparks.reset_index()
  df_carparks.columns = ['facility_id', 'CarParkName']

  # Delete any existing file, then save the dataframe with the new column titles
  if os.path.exists(carparks_file_path):
    os.remove(carparks_file_path)
  df_carparks.to_json(carparks_file_path)

  print('File created and updated successfully.')
```
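
To show what the column renaming step does, here is a small offline sketch. The response shape (facility IDs mapped to names) is an assumption inferred from the way the function above parses the response, and the two entries are sample values only.

```python
import pandas as pd

# Assumed response shape: facility IDs mapped to car park names
sample_response = {
    "6": "Gordon Henry St North Car Park",
    "7": "Kiama Car Park",
}

# orient='index' turns each key into a row label and each value into an unnamed column
df = pd.DataFrame.from_dict(sample_response, orient="index")

# Move the row labels into a regular column, then give both columns readable names
df = df.reset_index()
df.columns = ["facility_id", "CarParkName"]
print(df)
```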

However, despite having a complete list of all the carparks, not all the carparks will be used. As per the API, there are certain carparks that will not have accurate information for a certain column which will be needed. As such, there is no need to keep records for these carparks and they will be removed from the `carparks_original.json` file. 

These carparks have `facility_id` values ranging from `486` to `490` (inclusive). The snippet below was used to remove these records.

```python
# Dropping rows 28,29,30,31,32
df_carpark_details.drop(index=[28,29,30,31,32],inplace=True)

# Resetting the index
df_carpark_details.reset_index(drop=True,inplace=True)

# Sorting by facility_id
df_carpark_details.sort_values(by='facility_id',inplace=True)
df_carpark_details
```
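
Dropping rows by position works only while the rows stay in the same order. As a hedged alternative, the sketch below filters on the `facility_id` values themselves. The toy dataframe and its placeholder names are assumptions for illustration, and the `astype(int)` call assumes the IDs can be read as integers.

```python
import pandas as pd

# Toy stand-in for the car park list; real IDs and names come from the API
df_carpark_details = pd.DataFrame({
    "facility_id": [6, 7, 486, 488, 490],
    "CarParkName": ["A", "B", "C", "D", "E"],
})

# between() is inclusive on both ends by default
excluded = df_carpark_details["facility_id"].astype(int).between(486, 490)

df_filtered = (
    df_carpark_details[~excluded]
    .sort_values(by="facility_id")
    .reset_index(drop=True)
)
print(df_filtered)
```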

With the names of the various facilities in hand, the structure of each carpark was investigated. Each facility can have a different configuration: a facility may have one or more car parks, and each car park may have one or more zones, as depicted below.

<div style="text-align:center">
<img src='./images/carpark_structure.png' alt='Carpark structure'>
</div>
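
The facility, car park and zone hierarchy can be pictured as nested records. The sketch below flattens one such record into one row per zone. Every key name (`carparks`, `zones`, `spots` and so on) and every value is hypothetical and chosen for illustration; the actual field names in the API response may differ.

```python
# Hypothetical nested record; the key names are illustrative, not the API's
facility = {
    "facility_id": "6",
    "carparks": [
        {"name": "Level 1", "zones": [{"zone_id": "1", "spots": 50},
                                      {"zone_id": "2", "spots": 30}]},
        {"name": "Level 2", "zones": [{"zone_id": "3", "spots": 40}]},
    ],
}


def flatten_zones(record):
    """Return one flat dict per zone, tagged with its facility and car park."""
    rows = []
    for carpark in record["carparks"]:
        for zone in carpark["zones"]:
            rows.append({
                "facility_id": record["facility_id"],
                "carpark": carpark["name"],
                **zone,
            })
    return rows


rows = flatten_zones(facility)
total_spots = sum(row["spots"] for row in rows)
print(len(rows), total_spots)
```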

Knowing this, the function below was created to fetch the individual details of the carparks - using the JSON file just created - to properly scrutinise their structure. This would then be saved in its own JSON file named `carpark_structure.json` for future reference.

```py
def get_carpark_structure(path_to_carpark_json_file):
  # Delete file found at same path
  os.remove('./data/carpark_structure.json') if os.path.exists('./data/carpark_structure.json') else None
  # Add file to dataframe
  df_carparks = pd.read_json(path_to_carpark_json_file)
  # Initialise array that will hold information
  carpark_details_array = []

  # Loop through carparks to get information
  for index, row in df_carparks.iterrows():
    facility = row['facility_id']
    url = f'https://api.transport.nsw.gov.au/v1/carpark?facility={facility}'

    # Creating header for request
    headers = {
        "Authorization": f"apikey {os.environ.get('apikey')}"
    }
    # Make request
    response = requests.get(url, headers=headers).json()

    # Add to array
    carpark_details_array.append(response)

  # Store information in JSON file
  with open('./data/carpark_structure.json', 'w') as f:
    json.dump(carpark_details_array, f)
  # Create dataframe and return it
  return pd.DataFrame(carpark_details_array)
```

Having done that, a new function - named `date_getter` - was created to give a list of all the days in a given time period. This would be useful as carpark history for each of the carparks within a given time delta would be needed.

```py
def date_getter(td):
    """
    Generate a list of dates based on the input time delta.

    Args:
    td (timedelta): The time delta to subtract from the cutoff date.

    Returns:
    list: A list of dates in the format "YYYY-MM-DD".
    """
    # Array that stores the dates to be searched for
    date_period_list = []

    # The last date to be searched for
    cutoff_date = datetime(2023, 12, 31)
    target_date = cutoff_date - td

    # Ensure that records of each day are obtained
    delta = timedelta(days=1)

    while target_date <= cutoff_date:
        date_period_list.append(target_date.strftime("%Y-%m-%d"))
        target_date += delta

    return date_period_list
```
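
As a usage sketch, the block below re-implements the same loop in a standalone helper (the name `dates_up_to_cutoff` is hypothetical) and calls it with a 91-day delta, which covers 1 October 2023 to 31 December 2023. Both endpoints are included, so the list has one more entry than the number of days in the delta.

```python
from datetime import datetime, timedelta


def dates_up_to_cutoff(td, cutoff_date=datetime(2023, 12, 31)):
    """List every day from (cutoff_date - td) to cutoff_date as 'YYYY-MM-DD'."""
    dates = []
    current = cutoff_date - td
    while current <= cutoff_date:
        dates.append(current.strftime("%Y-%m-%d"))
        current += timedelta(days=1)
    return dates


dates = dates_up_to_cutoff(timedelta(days=91))
print(dates[0], dates[-1], len(dates))
# prints: 2023-10-01 2023-12-31 92
```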

Having a date function, a new function (`get_carpark_history`) was made to fetch the carpark history of a particular facility across a range of dates.

```python
def get_carpark_history(facility, dates_array):
    """
    Get carpark history data for a specific facility and dates.

    Args:
    facility (str): The ID of the carpark facility.
    dates_array (list): List of dates for which to retrieve carpark history data.

    Returns:
    pandas.DataFrame: DataFrame containing the carpark history data.
    """
    # Initialize data array
    data_array = []

    # Define the path for the JSON file
    json_file_path = f"./data/carpark history/facility_{facility}.json"

    # Set the request header
    headers = {
        "Authorization": f"apikey {os.environ.get('apikey')}"
    }

    # Delete the file if it exists
    if os.path.exists(json_file_path):
        os.remove(json_file_path)

    # Make a request for each date and aggregate the data
    for date in dates_array:
        url = f'https://api.transport.nsw.gov.au/v1/carpark/history?facility={facility}&eventdate={date}'
        response = requests.get(url, headers=headers).json()

        # Each response is a list of records; append them to the running list
        data_array.extend(response)

    # Save the data to a JSON file
    with open(json_file_path, 'w') as f:
        json.dump(data_array, f)

    # Read the JSON file
    with open(json_file_path) as f:
        data = json.load(f)

    # Convert the read data into a pandas DataFrame
    return pd.DataFrame(data)
```
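
One way to run a history fetch over several facilities is sketched below. It is an illustration under assumptions: `collect_history` and `fake_fetch_history` are hypothetical names, and the stub stands in for `get_carpark_history` so that the example runs without network access or an API key.

```python
def collect_history(facility_ids, dates, fetch_history):
    """Call `fetch_history(facility, dates)` once per facility and keep results by ID."""
    results = {}
    for facility in facility_ids:
        try:
            results[facility] = fetch_history(facility, dates)
        except Exception as exc:
            # Keep going if one facility fails, but report it
            print(f"Facility {facility} failed: {exc}")
    return results


# Stub standing in for get_carpark_history so the sketch runs offline
def fake_fetch_history(facility, dates):
    return [{"facility_id": facility, "date": d} for d in dates]


history = collect_history(["6", "7"], ["2023-12-30", "2023-12-31"], fake_fetch_history)
print({facility: len(records) for facility, records in history.items()})
```

Passing the fetch function as an argument keeps the loop testable; in a real run the stub would be replaced by the function that calls the API.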

### <div style='display:flex;align-items:center;flex-direction:row'><hr style='background-color:#d4ff00;height:35px;width:4px;margin:0 1rem 0 0;border-radius:2rem'/><span>**1. Data Sourcing:**</span></div>

Data was sourced over a 3-month period, from the beginning of October 2023 to 31st December 2023. A loop ran over each facility with this date range, calling the `get_carpark_history` function each time. Each saved file contained the parking history of one facility for the 3-month period (found in *./data/carpark_history_3_months/facility_<<facility_id>>*). To simplify the starting point and work from a single dataframe, the code below reads the data from the individual JSON files and combines it into one file, from which that dataframe is created.

```python
combined = []

for file in os.listdir('data/carpark_history_3_months'):
  specific_file = os.path.join('data/carpark_history_3_months', file)
  
  with open(specific_file) as f:
    # Each file holds a list of records; append them to the running list
    combined.extend(json.load(f))
  print(f"Gone through data of {file}")

# Save to parquet
combined_df = pd.DataFrame(combined)
combined_df.to_parquet('data/carpark_history_3_months_combined.parquet', index=False)

print("<---------------DONE---------------->")
```

The parquet format was chosen because it produces small files for large datasets and is fast to read. The data from the parquet file was then read into a dataframe.

In [3]:
df_carpark_history = pd.read_parquet('./data/carpark_history_3_months_combined.parquet')

Despite there being 38 facilities in total, data was read from 28 of them. Facilities 1-5 did not have any historical data for the period in question, while facilities 486-490 were noted as having inaccurate data by the data providers. Thus, both these categories of facilities were omitted. 

Moving on, the data containing the parking lot structure as well as the parking lot names can now be converted to a dataframe. 

In [4]:
df_carparks = pd.read_json('./data/carparks_original.json')
df_carpark_structure = pd.read_json('./data/carpark_structure.json')

Having done this, the dataframes can now be passed to the `DataSourcing` class.

In [5]:
data_sourcing = DataSourcing(df_carparks,df_carpark_structure,df_carpark_history)

In [6]:
data_sourcing.carparks_all

| | facility_id | CarParkName |
|---|---|---|
| 0 | 1 | Tallawong Station Car Park (historical only) |
| 11 | 2 | Kellyville Station Car Park (historical only) |
| 22 | 3 | Bella Vista Station Car Park (historical only) |
| 27 | 4 | Hills Showground Station Car Park (historical ... |
| 33 | 5 | Cherrybrook Station Car Park (historical only) |
| 34 | 6 | Gordon Henry St North Car Park |
| 35 | 7 | Kiama Car Park |
| 36 | 8 | Gosford Car Park |
| 37 | 9 | Revesby Car Park |
| 1 | 10 | Warriewood Car Park |


### <div style='display:flex;align-items:center;flex-direction:row'><hr style='background-color:#d4ff00;height:35px;width:4px;margin:0 1rem 0 0;border-radius:2rem'/><span>**2. Data Understanding:**</span></div>

The identification, gathering, and cursory analysis of the data in this part will be carried out by:

- Gathering preliminary data, which has been put into a JSON file.
- Describing the data that we have at our disposal.
- Looking for patterns and correlations in the data.
- Confirming the accuracy of the data.

First, to access the data, we retrieve the dataframes from `data_sourcing`.

In [7]:
# data_sourcing was created in In [5]; its constructor takes dataframes, not file paths
df_carparks = data_sourcing.get_carparks_df()
df_carpark_structure = data_sourcing.get_carpark_details_df()
df_carpark_history = data_sourcing.get_carpark_history_df()
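
Once the dataframes are available, a cursory quality check could look like the sketch below. The toy dataframe, the column name `message_date` and all values are assumptions made for illustration; the same calls would apply to the real history dataframe.

```python
import pandas as pd

# Toy stand-in for the history data; column names and values are hypothetical
df = pd.DataFrame({
    "facility_id": ["6", "6", "7", "7"],
    "message_date": ["2023-10-01T08:00:00", "2023-10-01T08:00:00",
                     "2023-10-01T08:00:00", None],
})

summary = {
    "rows": len(df),
    # Number of missing values per column
    "missing": df.isna().sum().to_dict(),
    # Number of rows that exactly repeat an earlier row
    "duplicates": int(df.duplicated().sum()),
}
print(summary)
```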

### <div style='display:flex;align-items:center;flex-direction:row'><hr style='background-color:#d4ff00;height:35px;width:4px;margin:0 1rem 0 0;border-radius:2rem'/><span>**3. Data Preprocessing:**</span></div>

### <div style='display:flex;align-items:center;flex-direction:row'><hr style='background-color:#d4ff00;height:35px;width:4px;margin:0 1rem 0 0;border-radius:2rem'/><span>**4. Data Handling**</span></div>

### <div style='display:flex;align-items:center;flex-direction:row'><hr style='background-color:#d4ff00;height:35px;width:4px;margin:0 1rem 0 0;border-radius:2rem'/><span>**5. Exploratory Data Analysis & Visualisation**</span></div>

### <div style='display:flex;align-items:center;flex-direction:row'><hr style='background-color:#d4ff00;height:35px;width:4px;margin:0 1rem 0 0;border-radius:2rem'/><span>**6. Modelling**</span></div>

### <div style='display:flex;align-items:center;flex-direction:row'><hr style='background-color:#d4ff00;height:35px;width:4px;margin:0 1rem 0 0;border-radius:2rem'/><span>**7. Deployment**</span></div>

### <div style='display:flex;align-items:center;flex-direction:row'><hr style='background-color:#d4ff00;height:35px;width:4px;margin:0 1rem 0 0;border-radius:2rem'/><span>**8. Conclusion**</span></div>

## <div style="padding:2rem;font-size:80%;text-align:left;display:fill;border-radius:0.25rem;overflow:hidden;background-image: url(https://images.pexels.com/photos/2860804/pexels-photo-2860804.jpeg?auto=compress&cs=tinysrgb&w=1260&h=750&dpr=1)"><b><span style='color:white'>Conclusion</span></b> </div>