<a href="https://colab.research.google.com/github/R-Ramana/EE4211-Project/blob/main/EE4211_Group_9_Question_3.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Project Proposal 

**We propose a multi-pronged approach that tackles the given problem from the perspectives of two different stakeholders: vehicle drivers and the Urban Redevelopment Authority (URA)/Land Transport Authority (LTA). This lets us analyse the data over both the short term (trip planning for drivers) and the long term (infrastructure planning for URA/LTA).**

## Short-term analytics

Our recommendation system targets users who plan their trips ahead of time. Users provide their destination and their estimated time of arrival (as a number of hours from now), and the system recommends the best available carpark based on its distance from the destination and its predicted availability.

### ML Forecasting for carpark availability
 - A dynamic ML model that uses the following data, selected according to the target month, to predict hourly availability[1]:

     - The same month in the past 2 years, excluding 2020 due to COVID-19 (in our case, October 2019 and October 2021)
     - The 2 preceding months (August and September 2022)
     - October 2022 data, held out for testing.
     
**[1] For the purpose of this project we will be predicting October 2022's data to allow us to test the usability/performance of the system.**
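The month-selection rule above can be expressed as a small helper. This is a minimal sketch, assuming months are represented as `(year, month)` tuples; the function names `shift_month` and `training_months` are hypothetical, and the excluded year (2020) is taken from the proposal.

```python
def shift_month(year: int, month: int, delta: int) -> tuple[int, int]:
    """Return (year, month) shifted by `delta` months (negative = earlier)."""
    index = year * 12 + (month - 1) + delta
    return index // 12, index % 12 + 1


def training_months(year: int, month: int, n_years: int = 2, n_recent: int = 2,
                    excluded_years: frozenset = frozenset({2020})) -> list[tuple[int, int]]:
    """Months of history used to forecast the target (year, month).

    - the same calendar month in the previous `n_years` usable years,
      skipping any year in `excluded_years` (2020, due to COVID-19)
    - the `n_recent` months immediately before the target month
    """
    seasonal = []
    y = year - 1
    while len(seasonal) < n_years:
        if y not in excluded_years:
            seasonal.append((y, month))
        y -= 1
    recent = [shift_month(year, month, -k) for k in range(n_recent, 0, -1)]
    return sorted(seasonal + recent)


test_months = [(2022, 10)]  # the target month is held out for testing
print(training_months(2022, 10))  # [(2019, 10), (2021, 10), (2022, 8), (2022, 9)]
```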

### Algorithm
        
 1. Filter the list of carparks down to those open at the user's arrival hour.
 2. Calculate the distance between the destination's geo-coordinates and the coordinates of each open carpark.
 3. Use a partial selection sort on distance (nearest first) to keep the 5 nearest open carparks.
 4. Predict availability for these 5 carparks at the given arrival hour using the ML model described above.
 5. Recommend the carpark with the highest predicted availability among these 5 nearest carparks.
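Steps 3 to 5 can be sketched as follows, assuming steps 1 and 2 have already produced `(car_park_no, distance_km)` pairs for the open carparks. `k_nearest` implements the partial selection sort; `predict_lots` is a hypothetical stand-in for the ML model (any callable mapping a carpark number and arrival hour to predicted free lots), and the forecast values in the example are made up for illustration.

```python
from typing import Callable, Sequence


def k_nearest(carparks: Sequence[tuple[str, float]], k: int = 5) -> list[tuple[str, float]]:
    """Partial selection sort: order only the first k positions by distance.

    `carparks` holds (car_park_no, distance_km) pairs for the open carparks.
    Each pass finds the nearest remaining carpark, so the cost is O(k * n).
    """
    items = list(carparks)  # work on a copy so the caller's list is untouched
    k = min(k, len(items))
    for i in range(k):
        # index of the nearest carpark among positions i..n-1
        nearest = min(range(i, len(items)), key=lambda j: items[j][1])
        items[i], items[nearest] = items[nearest], items[i]
    return items[:k]


def recommend(carparks: Sequence[tuple[str, float]], arrival_hour: int,
              predict_lots: Callable[[str, int], float], k: int = 5) -> str:
    """Among the k nearest open carparks, return the one with the most predicted free lots."""
    candidates = k_nearest(carparks, k)
    if not candidates:
        raise ValueError("no open carparks at this arrival hour")
    return max(candidates, key=lambda c: predict_lots(c[0], arrival_hour))[0]


open_carparks = [("A", 0.9), ("B", 0.2), ("C", 1.5), ("D", 0.4), ("E", 0.7), ("F", 0.3)]
made_up_forecast = {"A": 12, "B": 3, "C": 50, "D": 8, "E": 20, "F": 1}  # illustrative only

print(k_nearest(open_carparks))
# [('B', 0.2), ('F', 0.3), ('D', 0.4), ('E', 0.7), ('A', 0.9)]
print(recommend(open_carparks, 9, lambda cp, hour: made_up_forecast[cp]))
# E  (C has the most free lots but is not among the 5 nearest)
```

The partial selection sort makes k passes over the list, so it costs O(kn) instead of the O(n log n) of a full sort; `heapq.nsmallest(5, carparks, key=lambda c: c[1])` would pick the same five carparks.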
     



<table>
  <tr>
    <th style="text-align:center;">Data Attribute</th>
    <th style="text-align:center;">Source</th>
    <th style="text-align:center;">Remarks</th>
  </tr>
  <tr>
    <td style="text-align:left;">Geo-coordinates of current/final destination</td>
    <td style="text-align:left;">User Input</td>
    <td style="text-align:left;">To recommend the best carparks near the destination based on the estimated time of arrival</td>
  </tr>
  <tr>
    <td style="text-align:left;">Geo-coordinates of carparks</td>
    <td style="text-align:left;" rowspan="3"><a href="https://data.gov.sg/dataset/hdb-carpark-information">Carpark Info Dataset</a></td>
    <td style="text-align:left;">To recommend carparks based on distance from destination</td>
  </tr>
  <tr>
      <td style="text-align:left;">Cost of parking</td>
      <td style="text-align:left;">To recommend carparks based on cost</td>
      
  </tr>
  <tr>
      <td style="text-align:left;">Carpark opening hours</td>
      <td style="text-align:left;">To recommend carparks that are open during user's time of arrival</td>
      
  </tr>
  <tr>
      <td style="text-align:left;">Carpark availability</td>
      <td style="text-align:left;"><a href="https://data.gov.sg/dataset/carpark-availability">Carpark Availability Dataset</a></td>
      <td style="text-align:left;">To recommend carparks based on availability</td>
  </tr>
  <tr>
      <td colspan="3" style="text-align:center; font-size:12px; text-transform:uppercase;"><b>If time permits...</b></td>
      
      
  </tr>
  <tr>
      <td style="text-align:left;">Weather forecast</td>
      <td style="text-align:left;"><a href="https://www.programmableweb.com/api/nea-datasets">NEA Weather API</a></td>
      <td style="text-align:left;">To recommend carparks based on current weather conditions</td>
      
  </tr>
  <tr>
      <td style="text-align:left;">Traffic Conditions</td>
      <td style="text-align:left;"><a href="https://datamall.lta.gov.sg/content/datamall/en/dynamic-data.html">Estimated Travel Times API</a></td>
      <td style="text-align:left;">To improve our time-of-arrival estimation</td>
      
  </tr>

</table>


**If time permits, we intend to supplement our current data sources with real-time traffic data and weather forecasts to improve our recommendations.**

## Long-term analytics

### From the point of view of urban developers (URA) and the Land Transport Authority (LTA)

We believe long-term analysis of this data will benefit urban developers, specifically the LTA and URA. By analysing historical carpark usage trends, we can forecast carpark availability over the long term. Developers can then identify hotspots where carparks are frequently near full capacity; these are strong candidates for adding lots or building new carparks to meet demand. Conversely, the same analysis can identify carparks whose forecast usage is consistently low, where space could be saved by reducing the number of lots or carparks.
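One way to turn historical availability into the hotspot/underused classification described above is sketched below. It assumes the availability history has been flattened into one row per snapshot with columns `carpark_number`, `update_datetime`, `total_lots` and `lots_available` (an assumed storage layout, not a guaranteed schema). The helper names are hypothetical, the 0.9/0.3 thresholds are illustrative placeholders rather than values from our analysis, and the toy data is invented.

```python
import pandas as pd


def utilisation_by_carpark(history: pd.DataFrame) -> pd.DataFrame:
    """Mean occupancy rate (0 = empty, 1 = full) per carpark and hour of day."""
    df = history[history["total_lots"] > 0].copy()  # avoid division by zero
    df["occupancy"] = 1 - df["lots_available"] / df["total_lots"]
    df["hour"] = pd.to_datetime(df["update_datetime"]).dt.hour
    return df.groupby(["carpark_number", "hour"], as_index=False)["occupancy"].mean()


def classify_carparks(hourly: pd.DataFrame, high: float = 0.9, low: float = 0.3) -> pd.Series:
    """Label each carpark by its peak hourly occupancy (thresholds are illustrative)."""
    peak = hourly.groupby("carpark_number")["occupancy"].max()
    labels = pd.Series("normal", index=peak.index)
    labels[peak >= high] = "hotspot"    # near full even on average at its busiest hour
    labels[peak <= low] = "underused"   # mostly empty even at its busiest hour
    return labels


history = pd.DataFrame({
    "carpark_number": ["A", "A", "B", "B", "C", "C"],
    "update_datetime": ["2022-10-01 08:00", "2022-10-02 08:00", "2022-10-01 08:00",
                        "2022-10-01 20:00", "2022-10-01 08:00", "2022-10-01 20:00"],
    "total_lots": [100, 100, 200, 200, 50, 50],
    "lots_available": [2, 8, 180, 150, 25, 20],
})
print(classify_carparks(utilisation_by_carpark(history)))
```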

**If time permits, we will combine additional data on upcoming HDB developments with the geo-coordinates and availability of existing carparks to predict whether more carparks need to be built around these developments.**


## Proposal Feedback


*   Very extensive range of datasets and data sources

### Recommendations
*   Long-term analytics can clearly use a machine learning model, whereas for the short-term analytics, the “features” are clear, but the “labels” are not so clear. If the “labels” are made clear in the final report, this would be an excellent problem and solution.
*   There is a higher potential for the short-term analytics proposal, but it is tougher as well. We recommend that this group approaches us during the live consultation sessions if you want to verify with us and provide us with more details so that we can help. Else, if the group already has clearly defined “labels” in mind, then please execute it.
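One possible way to make the short-term “labels” explicit is sketched below; it is an illustrative choice, not a decision recorded in the proposal. It assumes the availability history is stored as one row per snapshot with columns `carpark_number`, `update_datetime` and `lots_available` (an assumed layout). The label is the mean number of free lots for a carpark within one clock hour, and the features are the carpark, the hour of day and the day of week. `make_supervised` is a hypothetical helper name, and the toy data is invented.

```python
import pandas as pd


def make_supervised(history: pd.DataFrame) -> tuple[pd.DataFrame, pd.Series]:
    """Turn raw availability snapshots into (features X, label y) for hourly forecasting."""
    df = history.copy()
    ts = pd.to_datetime(df["update_datetime"])
    df["date"] = ts.dt.date
    df["hour"] = ts.dt.hour
    df["dayofweek"] = ts.dt.dayofweek  # Monday = 0
    # One row per carpark per clock hour; the label is the hour's mean free lots.
    hourly = (df.groupby(["carpark_number", "date", "hour", "dayofweek"], as_index=False)
                ["lots_available"].mean())
    X = hourly[["carpark_number", "hour", "dayofweek"]]
    y = hourly["lots_available"].rename("label")
    return X, y


history = pd.DataFrame({
    "carpark_number": ["ACB", "ACB", "ACB"],
    "update_datetime": ["2022-10-14 08:05", "2022-10-14 08:35", "2022-10-14 09:10"],
    "lots_available": [10, 20, 7],
})
X, y = make_supervised(history)
print(X)
print(y.tolist())  # [15.0, 7.0]
```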

*Q3.1 At this point, you understand the data quite well. Carry out the analysis you proposed
in your group project proposal. You should use the dataset given but you may also use
additional datasets to supplement your analysis, look at unaggregated data, etc. Please
be sure to justify why the analysis is useful and interesting in the context of a data science
project. Note that you are not limited to the initial proposal and are free to expand on it.*

In [5]:
import pandas as pd
import numpy as np

carpark_info = pd.read_csv("hdb-carpark-information.csv")
carpark_info

Unnamed: 0,car_park_no,address,x_coord,y_coord,car_park_type,type_of_parking_system,short_term_parking,free_parking,night_parking,car_park_decks,gantry_height,car_park_basement
0,ACB,BLK 270/271 ALBERT CENTRE BASEMENT CAR PARK,30314.7936,31490.4942,BASEMENT CAR PARK,ELECTRONIC PARKING,WHOLE DAY,NO,YES,1,1.80,Y
1,ACM,BLK 98A ALJUNIED CRESCENT,33758.4143,33695.5198,MULTI-STOREY CAR PARK,ELECTRONIC PARKING,WHOLE DAY,SUN & PH FR 7AM-10.30PM,YES,5,2.10,N
2,AH1,BLK 101 JALAN DUSUN,29257.7203,34500.3599,SURFACE CAR PARK,ELECTRONIC PARKING,WHOLE DAY,SUN & PH FR 7AM-10.30PM,YES,0,0.00,N
3,AK19,BLOCK 253 ANG MO KIO STREET 21,28185.4359,39012.6664,SURFACE CAR PARK,COUPON PARKING,7AM-7PM,NO,NO,0,0.00,N
4,AK31,BLK 302/348 ANG MO KIO ST 31,29482.0290,38684.1754,SURFACE CAR PARK,COUPON PARKING,NO,NO,NO,0,0.00,N
...,...,...,...,...,...,...,...,...,...,...,...,...
2177,Y77M,BLK 461 YISHUN AVENUE 6,29850.1522,45576.0125,MULTI-STOREY CAR PARK,ELECTRONIC PARKING,WHOLE DAY,SUN & PH FR 7AM-10.30PM,YES,10,2.15,N
2178,Y78M,BLK 468 YISHUN ST 43,30057.2209,45166.4820,MULTI-STOREY CAR PARK,ELECTRONIC PARKING,WHOLE DAY,SUN & PH FR 7AM-10.30PM,YES,4,2.15,N
2179,Y8,"BLK 731/746 YISHUN STREET 71,72/AVENUE 5",27772.9219,45686.2734,SURFACE CAR PARK,ELECTRONIC PARKING,WHOLE DAY,SUN & PH FR 7AM-10.30PM,YES,0,4.50,N
2180,Y82M,BLK 478 YISHUN ST 42,29935.5818,45679.7181,MULTI-STOREY CAR PARK,ELECTRONIC PARKING,WHOLE DAY,SUN & PH FR 7AM-10.30PM,YES,11,2.15,N


In [35]:
## Reference from: https://github.com/zkkmin/coordconvert/blob/master/coordConverter.py
# Convert each carpark's SVY21 coordinates (EPSG:3414, x_coord/y_coord in metres)
# to WGS84 latitude/longitude (EPSG:4326) via the ArcGIS GeometryServer "project" operation.

import json

import requests

url = "http://tasks.arcgisonline.com/ArcGIS/rest/services/Geometry/GeometryServer/project"
inSR = "3414"   # input spatial reference: SVY21 / Singapore TM
outSR = "4326"  # output spatial reference: WGS84

lat = []
lon = []

for X, Y in zip(carpark_info["x_coord"], carpark_info["y_coord"]):
    # Same payload the original hand-encoded into the query string;
    # requests takes care of the URL encoding.
    geometries = {
        "geometryType": "esriGeometryPoint",
        "geometries": [{"x": float(X), "y": float(Y)}],
    }
    params = {"inSR": inSR, "outSR": outSR,
              "geometries": json.dumps(geometries), "f": "pjson"}

    r = requests.get(url, params=params)
    r.raise_for_status()  # stop on HTTP errors instead of failing on a missing key
    contents = r.json()

    point = contents["geometries"][0]
    lat.append(point["y"])  # in EPSG:4326, y is latitude
    lon.append(point["x"])  # and x is longitude

In [34]:
carpark_info['lat'] = lat
carpark_info['lon'] = lon

carpark_info

Unnamed: 0,car_park_no,address,x_coord,y_coord,car_park_type,type_of_parking_system,short_term_parking,free_parking,night_parking,car_park_decks,gantry_height,car_park_basement,lat,lon
0,ACB,BLK 270/271 ALBERT CENTRE BASEMENT CAR PARK,30314.7936,31490.4942,BASEMENT CAR PARK,ELECTRONIC PARKING,WHOLE DAY,NO,YES,1,1.80,Y,1.301063,103.854118
1,ACM,BLK 98A ALJUNIED CRESCENT,33758.4143,33695.5198,MULTI-STOREY CAR PARK,ELECTRONIC PARKING,WHOLE DAY,SUN & PH FR 7AM-10.30PM,YES,5,2.10,N,1.321004,103.885061
2,AH1,BLK 101 JALAN DUSUN,29257.7203,34500.3599,SURFACE CAR PARK,ELECTRONIC PARKING,WHOLE DAY,SUN & PH FR 7AM-10.30PM,YES,0,0.00,N,1.328283,103.844620
3,AK19,BLOCK 253 ANG MO KIO STREET 21,28185.4359,39012.6664,SURFACE CAR PARK,COUPON PARKING,7AM-7PM,NO,NO,0,0.00,N,1.369091,103.834985
4,AK31,BLK 302/348 ANG MO KIO ST 31,29482.0290,38684.1754,SURFACE CAR PARK,COUPON PARKING,NO,NO,NO,0,0.00,N,1.366120,103.846636
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2177,Y77M,BLK 461 YISHUN AVENUE 6,29850.1522,45576.0125,MULTI-STOREY CAR PARK,ELECTRONIC PARKING,WHOLE DAY,SUN & PH FR 7AM-10.30PM,YES,10,2.15,N,1.428448,103.849944
2178,Y78M,BLK 468 YISHUN ST 43,30057.2209,45166.4820,MULTI-STOREY CAR PARK,ELECTRONIC PARKING,WHOLE DAY,SUN & PH FR 7AM-10.30PM,YES,4,2.15,N,1.424744,103.851805
2179,Y8,"BLK 731/746 YISHUN STREET 71,72/AVENUE 5",27772.9219,45686.2734,SURFACE CAR PARK,ELECTRONIC PARKING,WHOLE DAY,SUN & PH FR 7AM-10.30PM,YES,0,4.50,N,1.429445,103.831278
2180,Y82M,BLK 478 YISHUN ST 42,29935.5818,45679.7181,MULTI-STOREY CAR PARK,ELECTRONIC PARKING,WHOLE DAY,SUN & PH FR 7AM-10.30PM,YES,11,2.15,N,1.429386,103.850712


In [36]:
carpark_info.to_csv('carpark_info_latlon.csv', index=False)

In [37]:
carpark_info_latlon = pd.read_csv("carpark_info_latlon.csv")
carpark_info_latlon



In [38]:
carpark_info_latlon.groupby(['short_term_parking']).groups.keys()

dict_keys(['7AM-10.30PM', '7AM-7PM', 'NO', 'WHOLE DAY'])
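The four `short_term_parking` categories above can drive step 1 of the algorithm (keeping only carparks open at the arrival hour). A minimal sketch, assuming each label describes the window in which short-term parking is allowed (e.g. `7AM-10.30PM` as 07:00 to 22:30) and treating `NO` as never available; `SHORT_TERM_WINDOWS`, `arrival_time`, `is_open` and `open_carparks` are hypothetical names. `arrival_time` also converts the user's "hours from now" input into an hour of day.

```python
from datetime import datetime, timedelta

import pandas as pd

# Assumed short-term parking windows as (start, end) in decimal hours,
# read off the four category labels found in `short_term_parking`.
SHORT_TERM_WINDOWS = {
    "WHOLE DAY": (0.0, 24.0),
    "7AM-7PM": (7.0, 19.0),
    "7AM-10.30PM": (7.0, 22.5),
    "NO": None,  # no short-term parking at all
}


def arrival_time(now: datetime, hours_from_now: float) -> float:
    """Convert the user's 'hours from now' input into a decimal hour of day (0-24)."""
    t = now + timedelta(hours=hours_from_now)
    return t.hour + t.minute / 60


def is_open(short_term_parking: str, hour: float) -> bool:
    window = SHORT_TERM_WINDOWS.get(short_term_parking)
    if window is None:  # "NO" or an unrecognised label: treat as closed
        return False
    start, end = window
    return start <= hour < end


def open_carparks(info: pd.DataFrame, hour: float) -> pd.DataFrame:
    """Rows of the carpark info table whose short-term parking is available at `hour`."""
    mask = info["short_term_parking"].map(lambda s: is_open(s, hour))
    return info[mask]


hour = arrival_time(datetime(2022, 10, 14, 21, 0), 1.25)  # 21:00 + 1h15m -> 22.25
toy = pd.DataFrame({"car_park_no": ["ACB", "AK19", "AK31"],
                    "short_term_parking": ["WHOLE DAY", "7AM-7PM", "NO"]})
print(open_carparks(toy, hour)["car_park_no"].tolist())  # ['ACB']
```

On the real table, `open_carparks(carpark_info_latlon, hour)` would apply the same filter to all carparks.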

In [None]:
## Calculating distance based on geo-coordinates
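A minimal sketch of step 2, assuming the `lat`/`lon` columns produced above and destination coordinates supplied by the user. It uses the haversine great-circle formula with a mean Earth radius of 6371 km; `haversine_km` and `add_distance` are hypothetical helper names, and the two-row table reuses coordinates of ACB and ACM from the output above.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km; accepts scalars or NumPy/pandas arrays (degrees)."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


def add_distance(info: pd.DataFrame, dest_lat: float, dest_lon: float) -> pd.DataFrame:
    """Copy of `info` with a `distance_km` column from the destination to each carpark."""
    out = info.copy()
    out["distance_km"] = haversine_km(dest_lat, dest_lon, out["lat"], out["lon"])
    return out


carparks = pd.DataFrame({"car_park_no": ["ACB", "ACM"],
                         "lat": [1.301063, 1.321004],
                         "lon": [103.854118, 103.885061]})
# Destination placed at ACB: ACB is 0 km away, ACM roughly 4.1 km away.
print(add_distance(carparks, 1.301063, 103.854118))
```

Because `x_coord`/`y_coord` are already planar SVY21 coordinates in metres, plain Euclidean distance on those columns is a reasonable alternative at Singapore's scale; for ACB and ACM it also gives roughly 4.1 km.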


*Q3.2 Based on the insights derived from the analysis, suggest a practical action that can be
taken (i.e., an action that can be taken to benefit society. Do not suggest actions such as
hyperparameter tuning here).*