# Updating rental demand 
This notebook uses the previously created datasets to count bike rentals for each station in every time unit (six *4-hour* units per day). The dataset covers January 2024, plus 31 December 2023, which appears in the data as a single extra day.

## Loading the datasets

In [1]:
# Importing the datasets
import pandas as pd
rentals = pd.read_csv("C:/Users/singh/Desktop/TUD (All Semesters)/Courses - Semester 5 (TU Dresden)/Research Task - Spatial Modelling/Code/rentals_without_demand.csv")
tripdata = pd.read_csv("C:/Users/singh/Desktop/TUD (All Semesters)/Courses - Semester 5 (TU Dresden)/Research Task - Spatial Modelling/Code/Tripdata_with_manual_ids.csv")

In [2]:
# Overview of the data
rentals.head()

Unnamed: 0,name,lat,lng,station_id,datetime
0,1 Ave & E 110 St,40.792327,-73.9383,0,2023-12-31 00:00:00.000
1,1 Ave & E 110 St,40.792327,-73.9383,0,2023-12-31 04:00:00.000
2,1 Ave & E 110 St,40.792327,-73.9383,0,2023-12-31 08:00:00.000
3,1 Ave & E 110 St,40.792327,-73.9383,0,2023-12-31 12:00:00.000
4,1 Ave & E 110 St,40.792327,-73.9383,0,2023-12-31 16:00:00.000


In [3]:
# Overview of the data
tripdata.head()

Unnamed: 0,ride_id,started_at,start_station_name,station_id,start_lat,start_lng
0,FD1815E65D61FAF1,2024-01-07 08:42:55.647,1 Ave & E 110 St,0,40.792327,-73.9383
1,C81EA1D1AF08E764,2024-01-05 16:47:51.873,1 Ave & E 110 St,0,40.792327,-73.9383
2,B68072A759AEBC11,2024-01-28 21:16:58.578,1 Ave & E 110 St,0,40.792327,-73.9383
3,40253EB730335B64,2024-01-14 06:21:56.489,1 Ave & E 110 St,0,40.792327,-73.9383
4,F3788085AE49CFD5,2024-01-10 17:09:08.398,1 Ave & E 110 St,0,40.792327,-73.9383


In [4]:
# adding a rentals column 
rentals['#_rentals'] = 0
rentals.head()

Unnamed: 0,name,lat,lng,station_id,datetime,#_rentals
0,1 Ave & E 110 St,40.792327,-73.9383,0,2023-12-31 00:00:00.000,0
1,1 Ave & E 110 St,40.792327,-73.9383,0,2023-12-31 04:00:00.000,0
2,1 Ave & E 110 St,40.792327,-73.9383,0,2023-12-31 08:00:00.000,0
3,1 Ave & E 110 St,40.792327,-73.9383,0,2023-12-31 12:00:00.000,0
4,1 Ave & E 110 St,40.792327,-73.9383,0,2023-12-31 16:00:00.000,0


In [5]:
# demand calculation for 31 Dec'23 (00-04 hrs); start of the unit inclusive, end exclusive
tripdata[(tripdata["started_at"] >= '2023-12-31 00:00:00.000') & (tripdata["started_at"] < '2023-12-31 04:00:00.000')]

Unnamed: 0,ride_id,started_at,start_station_name,station_id,start_lat,start_lng
160638,5594D8DDA68F5BB9,2023-12-31 02:36:55.648,Broadway & W 36 St,656,40.750977,-73.987654


There is only 1 rental in the first time unit of 31 Dec 2023, made at the <u>Broadway & W 36 St</u> station.
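
The filter above compares `started_at` as text, which works because the timestamps share one zero-padded format. A minimal sketch of the same selection on parsed timestamps follows, using a half-open interval so that a trip starting exactly at 00:00:00 is counted in this unit and one starting exactly at 04:00:00 falls into the next. The `demo` frame and its three rows are made up for illustration.

```python
import pandas as pd

# Hypothetical rows; only the column name is taken from this notebook
demo = pd.DataFrame({
    "started_at": [
        "2023-12-31 00:00:00.000",
        "2023-12-31 02:36:55.648",
        "2023-12-31 04:00:00.000",
    ]
})

# Parse the text column into timestamps once
ts = pd.to_datetime(demo["started_at"])

# Half-open interval [start, end) for the first 4-hour unit
start = pd.Timestamp("2023-12-31 00:00:00")
end = start + pd.Timedelta(hours=4)

in_unit = demo[(ts >= start) & (ts < end)]
print(in_unit)
```

With half-open intervals every trip belongs to exactly one time unit, so the per-unit counts add up to the total number of trips.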

## Adding datetime attributes
To update demand (i.e. rental information), I will need to add date and time attributes to both the *trip* dataset and the *rentals* dataset.

In [66]:
# checking how sub-second precision is exposed: 999 ms is reported as 999000 microseconds
pd.to_datetime('2023-12-31 23:59:59.999').time().microsecond

999000

In [67]:
# Updating the rentals dataset (parse the column once, then read each part via .dt)
rentals_dt = pd.to_datetime(rentals["datetime"])
rentals['year'] = rentals_dt.dt.year
rentals['month'] = rentals_dt.dt.month
rentals['day'] = rentals_dt.dt.day

rentals['hour'] = rentals_dt.dt.hour
rentals['minute'] = rentals_dt.dt.minute
rentals['second'] = rentals_dt.dt.second
rentals['microsecond'] = rentals_dt.dt.microsecond

In [69]:
# Updating the trips dataset (parse the column once, then read each part via .dt)
started_dt = pd.to_datetime(tripdata["started_at"])
tripdata['started_at_year'] = started_dt.dt.year
tripdata['started_at_month'] = started_dt.dt.month
tripdata['started_at_day'] = started_dt.dt.day

tripdata['started_at_hour'] = started_dt.dt.hour
tripdata['started_at_minute'] = started_dt.dt.minute
tripdata['started_at_second'] = started_dt.dt.second
tripdata['started_at_microsecond'] = started_dt.dt.microsecond

In [68]:
# rentals updated
rentals.head()

Unnamed: 0,name,lat,lng,station_id,datetime,#_rentals,year,month,day,hour,minute,second,microsecond
0,1 Ave & E 110 St,40.792327,-73.9383,0,2023-12-31 00:00:00.000,0,2023,12,31,0,0,0,0
1,1 Ave & E 110 St,40.792327,-73.9383,0,2023-12-31 04:00:00.000,0,2023,12,31,4,0,0,0
2,1 Ave & E 110 St,40.792327,-73.9383,0,2023-12-31 08:00:00.000,0,2023,12,31,8,0,0,0
3,1 Ave & E 110 St,40.792327,-73.9383,0,2023-12-31 12:00:00.000,0,2023,12,31,12,0,0,0
4,1 Ave & E 110 St,40.792327,-73.9383,0,2023-12-31 16:00:00.000,0,2023,12,31,16,0,0,0


In [70]:
# tripdata updated
tripdata.head()

Unnamed: 0,ride_id,started_at,start_station_name,station_id,start_lat,start_lng,started_at_year,started_at_month,started_at_day,started_at_hour,started_at_minute,started_at_second,started_at_microsecond
0,FD1815E65D61FAF1,2024-01-07 08:42:55.647,1 Ave & E 110 St,0,40.792327,-73.9383,2024,1,7,8,42,55,647000
1,C81EA1D1AF08E764,2024-01-05 16:47:51.873,1 Ave & E 110 St,0,40.792327,-73.9383,2024,1,5,16,47,51,873000
2,B68072A759AEBC11,2024-01-28 21:16:58.578,1 Ave & E 110 St,0,40.792327,-73.9383,2024,1,28,21,16,58,578000
3,40253EB730335B64,2024-01-14 06:21:56.489,1 Ave & E 110 St,0,40.792327,-73.9383,2024,1,14,6,21,56,489000
4,F3788085AE49CFD5,2024-01-10 17:09:08.398,1 Ave & E 110 St,0,40.792327,-73.9383,2024,1,10,17,9,8,398000


## Calculating rentals
With the date and time attributes in place, rentals can be counted for each station and time unit.

In [79]:
# Example: .loc with a boolean mask and a column label updates individual cells in place
rentals.loc[(rentals["station_id"] == 656) & (rentals["year"] == 2023) & (rentals["hour"] == 0),"#_rentals"] = 0

In [82]:
# determining demand for 2023; 00-04hrs
for i in rentals["station_id"].unique():
    idx = (rentals["station_id"] == i) & (rentals["year"] == 2023) & (rentals["hour"] == 0)
    rentals.loc[idx, "#_rentals"] = len(tripdata[(tripdata["started_at_year"] == 2023) & (tripdata["started_at_hour"] < 4) & (tripdata["station_id"] == i)])

In [83]:
# determining demand for 2023; 04-08hrs
for i in rentals["station_id"].unique():
    idx = (rentals["station_id"] == i) & (rentals["year"] == 2023) & (rentals["hour"] == 4)
    rentals.loc[idx, "#_rentals"] = len(tripdata[(tripdata["started_at_year"] == 2023) & (tripdata["started_at_hour"] < 8) & (tripdata["started_at_hour"] >= 4) & (tripdata["station_id"] == i)])

In [84]:
# determining demand for 2023; 08-12hrs
for i in rentals["station_id"].unique():
    idx = (rentals["station_id"] == i) & (rentals["year"] == 2023) & (rentals["hour"] == 8)
    rentals.loc[idx, "#_rentals"] = len(tripdata[(tripdata["started_at_year"] == 2023) & (tripdata["started_at_hour"] < 12) & (tripdata["started_at_hour"] >= 8) & (tripdata["station_id"] == i)])

In [85]:
# determining demand for 2023; 12-16hrs
for i in rentals["station_id"].unique():
    idx = (rentals["station_id"] == i) & (rentals["year"] == 2023) & (rentals["hour"] == 12)
    rentals.loc[idx, "#_rentals"] = len(tripdata[(tripdata["started_at_year"] == 2023) & (tripdata["started_at_hour"] < 16) & (tripdata["started_at_hour"] >= 12) & (tripdata["station_id"] == i)])

In [86]:
# determining demand for 2023; 16-20hrs
for i in rentals["station_id"].unique():
    idx = (rentals["station_id"] == i) & (rentals["year"] == 2023) & (rentals["hour"] == 16)
    rentals.loc[idx, "#_rentals"] = len(tripdata[(tripdata["started_at_year"] == 2023) & (tripdata["started_at_hour"] < 20) & (tripdata["started_at_hour"] >= 16) & (tripdata["station_id"] == i)])

In [87]:
# determining demand for 2023; 20-24hrs
for i in rentals["station_id"].unique():
    idx = (rentals["station_id"] == i) & (rentals["year"] == 2023) & (rentals["hour"] == 20)
    rentals.loc[idx, "#_rentals"] = len(tripdata[(tripdata["started_at_year"] == 2023) & (tripdata["started_at_hour"] >= 20) & (tripdata["station_id"] == i)])

In [94]:
# verifying rental information for 31 Dec 2023
sum(rentals.loc[(rentals["year"] == 2023) & (rentals["#_rentals"] > 0),"#_rentals"])

128

In [89]:
# total trips in 2023; this matches the sum of rentals above, so the counts are consistent
len(tripdata[(tripdata["started_at_year"] == 2023)])

128

> The rentals have been successfully determined for 2023. This year contains only one day (*31 Dec*), so filtering on the year and hour alone is sufficient here.
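
The six loops above could likely be collapsed into one grouped count. The sketch below is an alternative, not the method used in this notebook: it floors each start time to the beginning of its 4-hour unit with `dt.floor`, counts trips per station and unit, and joins the counts onto the station/time grid. The `trips` and `grid` frames are hypothetical miniatures of `tripdata` and `rentals`.

```python
import pandas as pd

# Hypothetical miniature of the trip table
trips = pd.DataFrame({
    "station_id": [0, 0, 656, 656, 656],
    "started_at": pd.to_datetime([
        "2024-01-07 08:42:55.647",
        "2024-01-07 11:59:59.000",
        "2023-12-31 02:36:55.648",
        "2024-01-07 20:00:00.000",
        "2024-01-07 23:10:00.000",
    ]),
})

# Hypothetical miniature of the station/time grid
grid = pd.DataFrame({
    "station_id": [0, 0, 656, 656],
    "datetime": pd.to_datetime([
        "2024-01-07 08:00:00.000",
        "2024-01-07 12:00:00.000",
        "2023-12-31 00:00:00.000",
        "2024-01-07 20:00:00.000",
    ]),
})

# Floor each start time to the beginning of its 4-hour unit
trips["datetime"] = trips["started_at"].dt.floor("4h")

# Count trips per (station, time unit)
counts = (
    trips.groupby(["station_id", "datetime"])
    .size()
    .rename("#_rentals")
    .reset_index()
)

# Left-join onto the grid; units without any trip get a count of 0
grid = grid.merge(counts, on=["station_id", "datetime"], how="left")
grid["#_rentals"] = grid["#_rentals"].fillna(0).astype(int)
print(grid)
```

Because the date is part of the floored timestamp, this form needs no separate handling for 2023 and 2024.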

In [96]:
# For 2024, data is only present for January
max(rentals.loc[rentals["year"] == 2024,"month"])

1

In [100]:
# Updating for 2024; only January is present (see the check above), so the day of month identifies the date
for k in range(1, 32):
    
    # determining demand for 2024; 00-04hrs
    for i in rentals["station_id"].unique():
        idx = (rentals["station_id"] == i) & (rentals["year"] == 2024) & (rentals["hour"] == 0) & (rentals["day"] == k)
        rentals.loc[idx, "#_rentals"] = len(tripdata[(tripdata["started_at_year"] == 2024) & (tripdata["started_at_day"] == k) & (tripdata["started_at_hour"] < 4) & (tripdata["station_id"] == i)])
        
    # determining demand for 2024; 04-08hrs
    for i in rentals["station_id"].unique():
        idx = (rentals["station_id"] == i) & (rentals["year"] == 2024) & (rentals["hour"] == 4) & (rentals["day"] == k)
        rentals.loc[idx, "#_rentals"] = len(tripdata[(tripdata["started_at_year"] == 2024) & (tripdata["started_at_day"] == k) & (tripdata["started_at_hour"] >= 4) & (tripdata["started_at_hour"] < 8) & (tripdata["station_id"] == i)])
        
    # determining demand for 2024; 08-12hrs
    for i in rentals["station_id"].unique():
        idx = (rentals["station_id"] == i) & (rentals["year"] == 2024) & (rentals["hour"] == 8) & (rentals["day"] == k)
        rentals.loc[idx, "#_rentals"] = len(tripdata[(tripdata["started_at_year"] == 2024) & (tripdata["started_at_day"] == k) & (tripdata["started_at_hour"] >= 8) & (tripdata["started_at_hour"] < 12) & (tripdata["station_id"] == i)])
        
    # determining demand for 2024; 12-16hrs
    for i in rentals["station_id"].unique():
        idx = (rentals["station_id"] == i) & (rentals["year"] == 2024) & (rentals["hour"] == 12) & (rentals["day"] == k)
        rentals.loc[idx, "#_rentals"] = len(tripdata[(tripdata["started_at_year"] == 2024) & (tripdata["started_at_day"] == k) & (tripdata["started_at_hour"] >= 12) & (tripdata["started_at_hour"] < 16) & (tripdata["station_id"] == i)])
        
    # determining demand for 2024; 16-20hrs
    for i in rentals["station_id"].unique():
        idx = (rentals["station_id"] == i) & (rentals["year"] == 2024) & (rentals["hour"] == 16) & (rentals["day"] == k)
        rentals.loc[idx, "#_rentals"] = len(tripdata[(tripdata["started_at_year"] == 2024) & (tripdata["started_at_day"] == k) & (tripdata["started_at_hour"] >= 16) & (tripdata["started_at_hour"] < 20) & (tripdata["station_id"] == i)])
        
    # determining demand for 2024; 20-24hrs
    for i in rentals["station_id"].unique():
        idx = (rentals["station_id"] == i) & (rentals["year"] == 2024) & (rentals["hour"] == 20) & (rentals["day"] == k)
        rentals.loc[idx, "#_rentals"] = len(tripdata[(tripdata["started_at_year"] == 2024) & (tripdata["started_at_day"] == k) & (tripdata["started_at_hour"] >= 20) & (tripdata["station_id"] == i)])

In [103]:
# total trips on Jan 1 2024
len(tripdata[(tripdata["started_at_year"] == 2024)&(tripdata["started_at_day"] == 1)])

16083

In [107]:
# total rentals for Jan 1 2024; this matches the trip count above
sum(rentals.loc[(rentals["year"] == 2024) & (rentals["day"] == 1),"#_rentals"])

16083
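
The check above covers 1 January only. A sketch of how the same comparison could be run for every day at once follows; the two frames are hypothetical stand-ins that reuse this notebook's column names, and the numbers in them are made up.

```python
import pandas as pd

# Hypothetical rentals rows (day 3 has no rentals at all)
rentals_demo = pd.DataFrame({
    "year": [2024] * 5,
    "month": [1] * 5,
    "day": [1, 1, 2, 2, 3],
    "#_rentals": [3, 2, 0, 4, 0],
})

# Hypothetical trips: five on day 1, four on day 2, none on day 3
trips_demo = pd.DataFrame({
    "started_at_year": [2024] * 9,
    "started_at_month": [1] * 9,
    "started_at_day": [1, 1, 1, 1, 1, 2, 2, 2, 2],
})

# Total rentals recorded per calendar day
per_day_rentals = rentals_demo.groupby(["year", "month", "day"])["#_rentals"].sum()

# Number of trips started per calendar day
per_day_trips = trips_demo.groupby(
    ["started_at_year", "started_at_month", "started_at_day"]
).size()

# Align the trip counts with the rentals index; days without trips become 0
per_day_trips.index.names = ["year", "month", "day"]
per_day_trips = per_day_trips.reindex(per_day_rentals.index, fill_value=0)

all_match = bool((per_day_rentals == per_day_trips).all())
print(all_match)
```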

## Exporting the dataset
The rentals dataset with complete demand information is fully prepared and can be used for further analysis.

In [None]:
# exporting
rentals.to_csv("rentals_with_demand.csv", index=False)

## Demand patterns by time
This section explores which parts of the day have higher rental demand, so that time units with consistently low demand throughout the month can be cut out.

In [1]:
# importing
import pandas as pd
rentals = pd.read_csv("C:/Users/singh/Desktop/TUD (All Semesters)/Courses - Semester 5 (TU Dresden)/Research Task - Spatial Modelling/Code/rentals_with_demand.csv")
rentals.head()

Unnamed: 0,name,lat,lng,station_id,datetime,#_rentals,year,month,day,hour,minute,second,microsecond
0,1 Ave & E 110 St,40.792327,-73.9383,0,2023-12-31 00:00:00.000,0,2023,12,31,0,0,0,0
1,1 Ave & E 110 St,40.792327,-73.9383,0,2023-12-31 04:00:00.000,0,2023,12,31,4,0,0,0
2,1 Ave & E 110 St,40.792327,-73.9383,0,2023-12-31 08:00:00.000,0,2023,12,31,8,0,0,0
3,1 Ave & E 110 St,40.792327,-73.9383,0,2023-12-31 12:00:00.000,0,2023,12,31,12,0,0,0
4,1 Ave & E 110 St,40.792327,-73.9383,0,2023-12-31 16:00:00.000,0,2023,12,31,16,0,0,0


Since `minute`, `second` and `microsecond` are not required, they are removed.

In [2]:
# dropping columns
rentals.drop(columns = ['minute','second','microsecond'], inplace=True)
rentals.columns

Index(['name', 'lat', 'lng', 'station_id', 'datetime', '#_rentals', 'year',
       'month', 'day', 'hour'],
      dtype='object')

In [5]:
# available time units
rentals["hour"].unique()

array([ 0,  4,  8, 12, 16, 20], dtype=int64)

In [9]:
# Average number of rentals
import statistics as st
means = []
for i in rentals["hour"].unique():
    means.append(st.mean(rentals.loc[rentals["hour"] == i, "#_rentals"]))

means

[0.17681850282485875,
 0.7471162900188324,
 2.487582391713748,
 2.5294991760828625,
 2.981844397363465,
 0.9870821563088512]

Each average is taken over all days and all stations for the given time unit.
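
The same averages could also be obtained with a grouped mean, which keeps each value labelled by its time unit. A small sketch on made-up numbers (the `rentals_demo` frame is hypothetical; only the column names come from this notebook):

```python
import pandas as pd

# Hypothetical rows: two observations for each of three time units
rentals_demo = pd.DataFrame({
    "hour": [0, 0, 8, 8, 16, 16],
    "#_rentals": [0, 1, 2, 4, 3, 3],
})

# Mean number of rentals per time unit, indexed by the unit's starting hour
means_by_hour = rentals_demo.groupby("hour")["#_rentals"].mean()
print(means_by_hour)
```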

In [12]:
# The data spread
rentals[rentals["hour"] == 8]

Unnamed: 0,name,lat,lng,station_id,datetime,#_rentals,year,month,day,hour
2,1 Ave & E 110 St,40.792327,-73.938300,0,2023-12-31 08:00:00.000,0,2023,12,31,8
8,1 Ave & E 110 St,40.792327,-73.938300,0,2024-01-01 08:00:00.000,0,2024,1,1,8
14,1 Ave & E 110 St,40.792327,-73.938300,0,2024-01-02 08:00:00.000,3,2024,1,2,8
20,1 Ave & E 110 St,40.792327,-73.938300,0,2024-01-03 08:00:00.000,4,2024,1,3,8
26,1 Ave & E 110 St,40.792327,-73.938300,0,2024-01-04 08:00:00.000,5,2024,1,4,8
...,...,...,...,...,...,...,...,...,...,...
407780,Yankee Ferry Terminal,40.687066,-74.016756,2123,2024-01-27 08:00:00.000,0,2024,1,27,8
407786,Yankee Ferry Terminal,40.687066,-74.016756,2123,2024-01-28 08:00:00.000,0,2024,1,28,8
407792,Yankee Ferry Terminal,40.687066,-74.016756,2123,2024-01-29 08:00:00.000,0,2024,1,29,8
407798,Yankee Ferry Terminal,40.687066,-74.016756,2123,2024-01-30 08:00:00.000,0,2024,1,30,8


The means show that most of the traffic comes from the following 3 time units:
- 8 to 12
- 12 to 16
- 16 to 20

Since the rental volume from *20 to 24* is also somewhat considerable, bike traffic from *20 to 22* should be included as well. This window is assumed to contain most of the rentals in the <u>20 to 24</u> time unit, although the 4-hour counts alone cannot confirm this.
<br><br>
Hence, the daily time frame that should be taken into account to accommodate the majority of the rentals ranges from **8 AM to 10 PM**.
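
Whether *20 to 22* really holds most of the evening rentals could be checked on the trip-level data, since the 4-hour counts cannot separate the two halves of the unit. A minimal sketch, assuming a `started_at_hour` column like the one added to `tripdata` earlier; the values in `trips_demo` are made up.

```python
import pandas as pd

# Hypothetical starting hours of seven trips
trips_demo = pd.DataFrame({"started_at_hour": [20, 20, 21, 22, 23, 9, 17]})

# Trips that fall into the 20-24 time unit
evening = trips_demo[trips_demo["started_at_hour"] >= 20]

# Fraction of those trips that start before 22:00
share_20_22 = (evening["started_at_hour"] < 22).mean()
print(share_20_22)
```

A share well above one half on the real data would support keeping only the *20 to 22* window.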

## Proposed New Time Units
A new dataset of rental counts can be prepared with the following 6 time units, which together cover 8 AM to 10 PM. All are *two hour* units except *16 - 20*, which spans four hours:
- 8 - 10
- 10 - 12
- 12 - 14
- 14 - 16
- 16 - 20
- 20 - 22
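
One possible way to assign trips to these units is `pd.cut` with the unit boundaries as bin edges. The sketch below copies the edges from the list above; the `hours` series is made up, and intervals are treated as closed on the left and open on the right, which is an assumption about how the boundaries should be handled.

```python
import pandas as pd

# Bin edges and labels taken from the proposed time units
edges = [8, 10, 12, 14, 16, 20, 22]
labels = ["8-10", "10-12", "12-14", "14-16", "16-20", "20-22"]

# Hypothetical starting hours of a few trips
hours = pd.Series([7, 8, 11, 16, 19, 21, 22])

# right=False makes each bin [start, end); hours outside 8-22 become NaN
units = pd.cut(hours, bins=edges, right=False, labels=labels)
print(units)
```

Trips outside the 8 AM to 10 PM window receive no label and can be dropped before counting.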