# Capstone Project: Grab Challenge (Traffic Management)

# 1. Introduction

I'd like to introduce you to Joe. Joe is a typical daily commuter. He gets up at 6:00 am and leaves at 6:30 for a job that starts at 8:00. An hour and a half seems like ample time for his commute. However, this is what he encounters:

<img src="https://s3.amazonaws.com/carmudi-blogs/carmudi-ph/wp-content/uploads/2018/12/31192909/117.jpg">

# 2. Problem statement and hypothesis


### Big Picture
The Philippines loses about Php 3.5 billion a day to traffic congestion in Metro Manila. If nothing is done, this is projected to rise to Php 5.4 billion a day by 2035.

Of course, traffic is a multi-faceted problem that cannot be solved overnight. However, one of the first steps in alleviating congestion is to understand travel patterns within the city.

With that in mind: can we use Grab's historical travel demand data to observe travel patterns in the city?

### Specific Problem
- Questions
  - Why do we want to predict demand?
  - What is Grab's business model?

- Situation
  - Grab is a ride-hailing company
  - Its business model is to take 20% of each fare

- Complication
  - Grab wants to maximize revenue by knowing which locations have high demand, so that it can match the number of drivers there
  - From personal experience, high-demand areas sometimes do not have enough cars

- Question
  - Which locations have high demand at a specific time?

- Resolution
  - Using Grab's data on travel demand, we may be able to observe travel patterns within the city to know which locations have high demand at a specific time

- Call to Action
  - Knowing which locations have high demand, Grab could deploy more vehicles to those locations to maximize revenue (moving vehicles from low-demand areas to high-demand ones)

- Benefits: This would be beneficial to the following:
  - Government
    - They can observe which areas need infrastructure development to ease traffic congestion
    - They can create or mandate rules and laws, such as vehicle coding
    - They can deploy more MMDA personnel to make traffic more manageable
  - Community
    - Knowing travel patterns would allow people to foresee travel time so they can avoid being late


# 3. Description of your data set and how it was obtained

I used the data set provided for the Grab Challenge on Traffic Management.

<img src="https://vectorlogo4u.com/wp-content/uploads/2018/09/grqab-vector-logo-720x340.png">
<img src="files/ye.png">




<h3>Goal of the Grab Challenge</h3>

"In this challenge, participants are to build a model trained on a historical demand dataset, that can forecast demand on a Hold-out test dataset. The model should be able to accurately forecast ahead by T+1 to T+5 time intervals (where each interval is 15-min) given all data up to time T."

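To make the forecasting horizon concrete, here is a minimal sketch of which timestamps the model has to predict for a given T. The helper name `forecast_times` and the example value of `T` are assumptions for illustration and are not part of the challenge materials.

```python
import pandas as pd

INTERVAL = pd.Timedelta(minutes=15)  # interval length stated in the challenge text


def forecast_times(t, steps=5):
    """Return the timestamps T+1 ... T+steps, each one interval apart (hypothetical helper)."""
    return [t + k * INTERVAL for k in range(1, steps + 1)]


T = pd.Timestamp("2019-01-18 20:00")  # arbitrary example value for T
targets = forecast_times(T)
print([ts.strftime("%H:%M") for ts in targets])
```

With 15-minute intervals, T+5 lies 75 minutes after the last observed interval.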
# 4. Description of any pre-processing steps you took (Data Preparation)

In [1]:
import pandas as pd
from datetime import datetime,timedelta

import geohash as gh
from math import sin, cos, radians, atan2,sqrt
import statsmodels.api as sm
import numpy as np

#Visualization
import matplotlib.pyplot as plt
from matplotlib import cm
from matplotlib.colors import ListedColormap, LinearSegmentedColormap
import seaborn as sns
from mpl_toolkits.basemap import Basemap

import progressbar
from time import sleep

In [2]:
#Read file
df = pd.read_csv('training.csv')

In [3]:
df.dtypes

geohash6      object
day            int64
timestamp     object
demand       float64
dtype: object

In [4]:
df.shape

(4206321, 4)

In [5]:
df.describe()

| | day | demand |
|---|---|---|
| count | 4206321.0 | 4206321.0 |
| mean | 31.45299 | 0.1050907 |
| std | 17.68278 | 0.1592655 |
| min | 1.0 | 3.092217e-09 |
| 25% | 16.0 | 0.01867379 |
| 50% | 32.0 | 0.05043463 |
| 75% | 47.0 | 0.1208644 |
| max | 61.0 | 1.0 |


In [6]:
df.head()

| | geohash6 | day | timestamp | demand |
|---|---|---|---|---|
| 0 | qp03wc | 18 | 20:0 | 0.020072 |
| 1 | qp03pn | 10 | 14:30 | 0.024721 |
| 2 | qp09sw | 9 | 6:15 | 0.102821 |
| 3 | qp0991 | 32 | 5:0 | 0.088755 |
| 4 | qp090q | 15 | 4:0 | 0.074468 |


## 4.1. Convert geohash6 to latitude and longitude

In [7]:
#Decode geohash to Latitude and Longitude
df['lat_long'] = df.geohash6.apply(lambda x: gh.decode(x))

In [8]:
df['latitude'] = df.lat_long.apply(lambda x: x[0])
df['longitude'] = df.lat_long.apply(lambda x: x[1])

In [9]:
df = df.drop(columns='lat_long')

In [10]:
df.dtypes

geohash6      object
day            int64
timestamp     object
demand       float64
latitude     float64
longitude    float64
dtype: object
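
To make the decoding step less of a black box, the following is a minimal pure-Python sketch of how a geohash is turned into coordinates. It is not the `geohash` library's implementation; the function name `decode_geohash` is an assumption for illustration. Each character carries 5 bits, the bits alternate between longitude and latitude, and each bit halves the remaining range.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)


def decode_geohash(code):
    """Return (latitude, longitude) of the centre of the geohash cell (illustrative sketch)."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    is_lon = True  # bits alternate, starting with longitude
    for char in code:
        value = BASE32.index(char)
        for shift in range(4, -1, -1):  # most significant bit first
            bit = (value >> shift) & 1
            if is_lon:
                mid = (lon_lo + lon_hi) / 2
                if bit:
                    lon_lo = mid  # keep the upper half
                else:
                    lon_hi = mid  # keep the lower half
            else:
                mid = (lat_lo + lat_hi) / 2
                if bit:
                    lat_lo = mid
                else:
                    lat_hi = mid
            is_lon = not is_lon
    return (lat_lo + lat_hi) / 2, (lon_lo + lon_hi) / 2


lat, lon = decode_geohash("qp03wc")
print(round(lat, 6), round(lon, 6))
```

A 6-character geohash carries 30 bits (15 per axis), so each cell spans roughly 0.0055 degrees of latitude by 0.011 degrees of longitude, and the decoded point is the cell centre rather than an exact pickup location.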

## 4.2. Set arbitrary day of the week

In [11]:
#Since the data for day is in sequential order, we assign an arbitrary day-of-week label (A to G) based on day % 7
def day_week(day):
    if day%7 == 1: return "A"
    elif day%7 == 2: return "B"
    elif day%7 == 3: return "C"
    elif day%7 == 4: return "D"
    elif day%7 == 5: return "E"
    elif day%7 == 6: return "F"
    else: return "G"

df['day_of_week'] = df.day.apply(day_week)

In [12]:
df.head()

| | geohash6 | day | timestamp | demand | latitude | longitude | day_of_week |
|---|---|---|---|---|---|---|---|
| 0 | qp03wc | 18 | 20:0 | 0.020072 | -5.353088 | 90.653687 | D |
| 1 | qp03pn | 10 | 14:30 | 0.024721 | -5.413513 | 90.664673 | C |
| 2 | qp09sw | 9 | 6:15 | 0.102821 | -5.325623 | 90.906372 | B |
| 3 | qp0991 | 32 | 5:0 | 0.088755 | -5.353088 | 90.752563 | D |
| 4 | qp090q | 15 | 4:0 | 0.074468 | -5.413513 | 90.719604 | A |
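
The same labelling can be written more compactly. The sketch below is an alternative, not the code used above: it indexes a 7-character string by `day % 7`. The names `LABELS` and `label_days` are assumptions for illustration.

```python
import pandas as pd

# Position = day % 7, so remainder 1 -> "A", ..., 6 -> "F", and 0 -> "G",
# which mirrors the if/elif chain in day_week.
LABELS = "GABCDEF"


def label_days(days):
    """Map a Series of sequential day numbers to arbitrary weekday labels A-G."""
    return days.mod(7).map(lambda remainder: LABELS[remainder])


sample = pd.Series([18, 10, 9, 32, 15, 7])
print(label_days(sample).tolist())
```

Because the labels are arbitrary, "A" does not necessarily correspond to Monday. The mapping only guarantees that days seven apart share a label.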


## 4.3. Combine 'day' and 'timestamp' into a single Timestamp

The day numbers are also assigned arbitrary calendar dates, with day 1 mapped to 2019-01-01.

In [13]:
numdays = 61
base = datetime(2019,1,1,0,0)
date_list = [base + timedelta(days=x) for x in range(0, numdays)]

df.day = df.day.apply(lambda x: date_list[x-1].strftime("%Y-%m-%d"))

In [14]:
# Combine the date string and the time string of each row into a pandas Timestamp
TS = []
for i in range(len(df)):
    x = pd.Timestamp(df.day[i] + ' ' + df.timestamp[i])
    TS.append(x)

In [16]:
df['Timestamp'] = TS

In [19]:
df = df.drop(columns = ['day','timestamp'])

In [20]:
df.head()

| | geohash6 | demand | latitude | longitude | day_of_week | Timestamp |
|---|---|---|---|---|---|---|
| 0 | qp03wc | 0.020072 | -5.353088 | 90.653687 | D | 2019-01-18 20:00:00 |
| 1 | qp03pn | 0.024721 | -5.413513 | 90.664673 | C | 2019-01-10 14:30:00 |
| 2 | qp09sw | 0.102821 | -5.325623 | 90.906372 | B | 2019-01-09 06:15:00 |
| 3 | qp0991 | 0.088755 | -5.353088 | 90.752563 | D | 2019-02-01 05:00:00 |
| 4 | qp090q | 0.074468 | -5.413513 | 90.719604 | A | 2019-01-15 04:00:00 |
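
Looping over more than four million rows in Python is slow, so a vectorized version may be worth considering. The sketch below is an alternative under two assumptions: `day` is still the integer column from the raw file (before it is converted to a date string), and `timestamp` is an `H:M` string. The names `BASE_DATE` and `build_timestamps` are hypothetical.

```python
import pandas as pd

BASE_DATE = pd.Timestamp("2019-01-01")  # same arbitrary start date as above


def build_timestamps(day, hhmm):
    """Combine an integer day number and an 'H:M' string column into Timestamps."""
    parts = hhmm.str.split(":", expand=True).astype(int)  # column 0 = hour, 1 = minute
    return (
        BASE_DATE
        + pd.to_timedelta(day - 1, unit="D")  # day 1 maps to the base date
        + pd.to_timedelta(parts[0], unit="h")
        + pd.to_timedelta(parts[1], unit="min")
    )


sample = pd.DataFrame({"day": [18, 32], "timestamp": ["20:0", "5:0"]})
sample["Timestamp"] = build_timestamps(sample["day"], sample["timestamp"])
print(sample)
```

Splitting the string and adding timedeltas avoids relying on a date parser to accept unpadded values such as `20:0`.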


In [21]:
df.dtypes

geohash6               object
demand                float64
latitude              float64
longitude             float64
day_of_week            object
Timestamp      datetime64[ns]
dtype: object

In [22]:
df.isnull().sum()

geohash6       0
demand         0
latitude       0
longitude      0
day_of_week    0
Timestamp      0
dtype: int64

## 4.4. Save to csv file for easier access later on

In [23]:
#Save to CSV file
df.to_csv(r'C:\Users\Acer\Documents\01 Eskwelabs\Grab AI for SEA\Traffic Management\Processed_Grab.csv')
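
When the file is read back later, two details are easy to miss: `to_csv` writes the row index as an extra unnamed column by default, and `read_csv` loads the `Timestamp` column as plain strings unless told otherwise. The following sketch shows one way to handle both on a two-row sample. The temporary path and file name are placeholders, not the path used above.

```python
import os
import tempfile

import pandas as pd

# Two sample rows taken from the processed frame shown earlier.
frame = pd.DataFrame(
    {
        "geohash6": ["qp03wc", "qp03pn"],
        "demand": [0.020072, 0.024721],
        "Timestamp": pd.to_datetime(["2019-01-18 20:00:00", "2019-01-10 14:30:00"]),
    }
)

# Placeholder location for illustration only.
path = os.path.join(tempfile.mkdtemp(), "Processed_Grab_sample.csv")

# index=False keeps the row index out of the file, so no "Unnamed: 0" column
# appears when the file is read back.
frame.to_csv(path, index=False)

# parse_dates restores the datetime dtype, which CSV itself does not store.
reloaded = pd.read_csv(path, parse_dates=["Timestamp"])
print(reloaded.dtypes)
```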