# Rentals: New time units
In the previous attempt, the rentals were calculated for a *4 hour* window. However, a time range from 08:00 to 22:00 split into *2 hour* windows is considered more suitable, as shown in the notebook *Calculating_rental_demand.ipynb*. The seven proposed time units are:
- 8 - 10
- 10 - 12
- 12 - 14
- 14 - 16
- 16 - 18
- 18 - 20
- 20 - 22

To create a template for *rentals with the new time resolution*, the **adjusted trip information** (<u>Tripdata_with_manual_ids.csv</u>) can be used, since it contains the pre-processed trip information.

## Creating rentals template
A rentals template is prepared in this part.

In [1]:
# Reading the data
import pandas as pd
complete_data = pd.read_csv("C:/Users/singh/Desktop/TUD (All Semesters)/Courses - Semester 5 (TU Dresden)/Research Task - Spatial Modelling/Datasets/Citibike Trip Data/202401-citibike-tripdata.csv")

# filtering the dataset for cbikes
cbikes = complete_data[complete_data["rideable_type"] == "classic_bike"]

# there are no rows with missing info about start
print(sum(cbikes["start_station_name"].isna()))

# reducing columns
cbikes = cbikes[["ride_id", "started_at", "ended_at", "start_station_name", "start_station_id", "end_station_name", "end_station_id", "start_lat", "start_lng", "end_lat", "end_lng"]]
cbikes.columns


0


Index(['ride_id', 'started_at', 'ended_at', 'start_station_name',
       'start_station_id', 'end_station_name', 'end_station_id', 'start_lat',
       'start_lng', 'end_lat', 'end_lng'],
      dtype='object')

### Creating New Dataset
The new dataset should contain stations with their location and time information included. Each row represents <u> one unit of chosen time resolution</u>.
<br><br>
**NOTE:** Station ID needs to be retained because there are some stations with the same name but slightly different locations!
<br><br>
In the dataset, some IDs are stored as text and others as numbers. Converting the data type to *string* does not work because **\n** and some other text are added automatically.
<br><br>
**Solution:**<br>
The stations can be given unique IDs manually for the purpose of identification.
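
As a side note, one possible way to avoid mixed ID types in the first place is to ask pandas to read the ID column as text when loading the file. The sketch below uses a tiny made-up CSV (the rows and the name `sample_csv` are assumptions for illustration); it was not tested against the original Citibike file, so the manual IDs used in this notebook remain the reference approach.

```python
import io
import pandas as pd

# Hypothetical sample: one numeric-looking ID and one alphanumeric ID (made-up values)
sample_csv = io.StringIO(
    "start_station_name,start_station_id\n"
    "Station A,5779.08\n"
    "Station B,HB101\n"
)

# dtype=str keeps every ID exactly as written in the file
sample = pd.read_csv(sample_csv, dtype={"start_station_id": str})
station_ids = sample["start_station_id"].tolist()
print(station_ids)
```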

In [2]:
# Creating the new dataset
rentals = pd.DataFrame()
rentals[["name", "lat", "lng"]] = cbikes[["start_station_name", "start_lat", "start_lng"]].drop_duplicates()
rentals = rentals.sort_values(by = ["name"], ignore_index=True)

# Adding IDs manually
rentals["station_id"] = range(len(rentals))

# Verifying the number of stations
print(len(cbikes[["start_station_name", "start_lat", "start_lng"]].drop_duplicates()), len(rentals["station_id"]))

2124 2124
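
The note about stations sharing a name but not a location can be checked with a small grouping step. The following is a minimal sketch on made-up data (the frame `stations` and its values are hypothetical), counting how many distinct coordinate pairs each name has.

```python
import pandas as pd

# Toy stand-in for the station list (values are made up)
stations = pd.DataFrame({
    "name": ["A St", "A St", "B St"],
    "lat": [40.70, 40.71, 40.80],
    "lng": [-73.90, -73.90, -73.95],
})

# Count distinct (name, lat, lng) combinations per station name
coords_per_name = stations.drop_duplicates().groupby("name").size()

# Names that appear with more than one location
ambiguous_names = coords_per_name[coords_per_name > 1].index.tolist()
print(ambiguous_names)
```

Any name listed in `ambiguous_names` would need the station ID (or a manual ID) to be told apart.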


In [3]:
# What the data looks like
rentals.head()

Unnamed: 0,name,lat,lng,station_id
0,1 Ave & E 110 St,40.792327,-73.9383,0
1,1 Ave & E 16 St,40.732219,-73.981656,1
2,1 Ave & E 18 St,40.733812,-73.980544,2
3,1 Ave & E 30 St,40.741444,-73.975361,3
4,1 Ave & E 39 St,40.74714,-73.97113,4


## Defining time resolution
Here time units are defined that will later be merged with the *rentals* template.

In [4]:
# importing the data
import pandas as pd
cbikes_altered = pd.read_csv("C:/Users/singh/Desktop/TUD (All Semesters)/Courses - Semester 5 (TU Dresden)/Research Task - Spatial Modelling/Code/Tripdata_with_manual_ids.csv")

# defining the 7 time units for a given date
time_res = ['08:00:00.000', '10:00:00.000', '12:00:00.000', '14:00:00.000', '16:00:00.000', '18:00:00.000', '20:00:00.000']

# repeating time units 32 times - Jan'24 + 31 Dec'23
time_res = time_res*32

# creating dates
dates_included = []
while len(dates_included) < 32:
    dates_included.append('2024-01-01')

# days list
nums = ['00', '01', '02','03','04','05','06','07','08','09','10','11','12','13','14','15','16','17','18','19','20','21','22','23','24','25','26','27','28','29','30','31']

# correcting days
for i in range(len(nums)):
    dates_included[i] = dates_included[i][:-2] + nums[i]

dates_included[0] = '2023-12-31'
dates_included[:5]

['2023-12-31', '2024-01-01', '2024-01-02', '2024-01-03', '2024-01-04']
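
The same list of 32 dates could also be produced with `pd.date_range`, which avoids editing the strings by hand. This is only an alternative sketch; the name `dates_sketch` is introduced here for illustration and is not used elsewhere in the notebook.

```python
import pandas as pd

# One entry per day from 31 Dec 2023 to 31 Jan 2024 (both ends included)
dates_sketch = (
    pd.date_range("2023-12-31", "2024-01-31", freq="D")
    .strftime("%Y-%m-%d")
    .tolist()
)
print(len(dates_sketch), dates_sketch[:3])
```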

Next, the date and time information are combined. Each day has 7 time units, so each day must be repeated 7 times to accommodate the time resolution.

In [5]:
# repeating each element 7 times
dates_included = [element for element in dates_included for _ in range(7)]
dates_included[:8]

['2023-12-31',
 '2023-12-31',
 '2023-12-31',
 '2023-12-31',
 '2023-12-31',
 '2023-12-31',
 '2023-12-31',
 '2024-01-01']

In [6]:
# Adding time dimension
date_time = dates_included.copy()

for t in range(0,len(dates_included)):
    date_time[t] = dates_included[t] + ' ' + time_res[t]

date_time[:7]

['2023-12-31 08:00:00.000',
 '2023-12-31 10:00:00.000',
 '2023-12-31 12:00:00.000',
 '2023-12-31 14:00:00.000',
 '2023-12-31 16:00:00.000',
 '2023-12-31 18:00:00.000',
 '2023-12-31 20:00:00.000']

**Note**: 2023-12-31 08:00:00.000 means the time window from <u>8AM to 10AM</u>.
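
To illustrate the labelling convention, the sketch below maps a few hypothetical trip start times to the start hour of their 2-hour window. It assumes, as in this notebook, that windows start on even hours and that only 08:00 to 22:00 is kept; the sample timestamps are made up.

```python
import pandas as pd

# Hypothetical trip start times
starts = pd.Series(pd.to_datetime([
    "2024-01-01 08:00:00",
    "2024-01-01 09:59:59",
    "2024-01-01 10:00:00",
    "2024-01-01 21:30:00",
    "2024-01-01 22:05:00",  # outside the 08-22 range
]))

hours = starts.dt.hour
in_range = (hours >= 8) & (hours < 22)

# Windows start on even hours, so rounding down to the nearest even hour gives the label
window_starts = (hours - hours % 2)[in_range].tolist()
print(window_starts)
```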

## Adding time dimensionality to the *rentals* template
A time resolution needs to be added so that the number of rentals can be calculated per time unit and per station.

In [7]:
# rows needed for each station
len(date_time)

224

In [8]:
# duplicating each station 224 times
rentals = rentals.loc[rentals.index.repeat(224)].reset_index(drop=True)

# Repeating date_time: 2124 unique stations
date_time = date_time*2124
rentals["datetime"] = date_time
rentals.head()

Unnamed: 0,name,lat,lng,station_id,datetime
0,1 Ave & E 110 St,40.792327,-73.9383,0,2023-12-31 08:00:00.000
1,1 Ave & E 110 St,40.792327,-73.9383,0,2023-12-31 10:00:00.000
2,1 Ave & E 110 St,40.792327,-73.9383,0,2023-12-31 12:00:00.000
3,1 Ave & E 110 St,40.792327,-73.9383,0,2023-12-31 14:00:00.000
4,1 Ave & E 110 St,40.792327,-73.9383,0,2023-12-31 16:00:00.000


In [9]:
len(rentals)

475776
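
The row count follows from 32 days × 7 time units = 224 slots per station, and 2124 stations × 224 slots = 475776 rows. The same station-by-slot template can also be sketched as a cross join, which avoids hard-coding the two counts. The three stations below are made up, and `slots` and `template` are names introduced only for this example.

```python
import pandas as pd

# Toy station list (made-up values)
stations = pd.DataFrame({"station_id": [0, 1, 2], "name": ["A", "B", "C"]})

# 32 days x 7 window start times, in the same string format as date_time
dates = pd.date_range("2023-12-31", "2024-01-31", freq="D").strftime("%Y-%m-%d")
times = [f"{h:02d}:00:00.000" for h in range(8, 22, 2)]
slots = pd.DataFrame({"datetime": [f"{d} {t}" for d in dates for t in times]})

# Cross join: every station paired with every time slot
template = stations.merge(slots, how="cross")
print(len(slots), len(template))
```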

## Calculating Demand
The rentals are calculated for every time unit, per day and per station. The dataset is prepared only up to 31 January 2024.

In [10]:
# importing
tripdata = pd.read_csv("C:/Users/singh/Desktop/TUD (All Semesters)/Courses - Semester 5 (TU Dresden)/Research Task - Spatial Modelling/Code/Tripdata_with_manual_ids.csv")

# adding a rentals column 
rentals['#_rentals'] = 0

rentals.head()

Unnamed: 0,name,lat,lng,station_id,datetime,#_rentals
0,1 Ave & E 110 St,40.792327,-73.9383,0,2023-12-31 08:00:00.000,0
1,1 Ave & E 110 St,40.792327,-73.9383,0,2023-12-31 10:00:00.000,0
2,1 Ave & E 110 St,40.792327,-73.9383,0,2023-12-31 12:00:00.000,0
3,1 Ave & E 110 St,40.792327,-73.9383,0,2023-12-31 14:00:00.000,0
4,1 Ave & E 110 St,40.792327,-73.9383,0,2023-12-31 16:00:00.000,0


In [11]:
# Updating the rentals dataset (timestamps are parsed once, then split via the .dt accessor)
rentals_dt = pd.to_datetime(rentals["datetime"])

rentals['year'] = rentals_dt.dt.year
rentals['month'] = rentals_dt.dt.month
rentals['day'] = rentals_dt.dt.day

rentals['hour'] = rentals_dt.dt.hour
rentals['minute'] = rentals_dt.dt.minute
rentals['second'] = rentals_dt.dt.second
rentals['microsecond'] = rentals_dt.dt.microsecond

In [12]:
# Updating the trips dataset (timestamps are parsed once, then split via the .dt accessor)
started_dt = pd.to_datetime(tripdata["started_at"])

tripdata['started_at_year'] = started_dt.dt.year
tripdata['started_at_month'] = started_dt.dt.month
tripdata['started_at_day'] = started_dt.dt.day

tripdata['started_at_hour'] = started_dt.dt.hour
tripdata['started_at_minute'] = started_dt.dt.minute
tripdata['started_at_second'] = started_dt.dt.second
tripdata['started_at_microsecond'] = started_dt.dt.microsecond

In [13]:
# determining demand for 2023 (31 Dec only), one 2-hour window at a time
for h in range(8, 22, 2):  # window start hours: 8, 10, ..., 20
    for i in rentals["station_id"].unique():
        # rows of the template for this station and window
        idx = (rentals["station_id"] == i) & (rentals["year"] == 2023) & (rentals["hour"] == h)
        # trips that started at this station within [h, h + 2)
        rentals.loc[idx, "#_rentals"] = len(tripdata[(tripdata["started_at_year"] == 2023) & (tripdata["started_at_hour"] >= h) & (tripdata["started_at_hour"] < h + 2) & (tripdata["station_id"] == i)])

In [15]:
# verifying rental information for 31 Dec 2023
print(sum(rentals.loc[(rentals["year"] == 2023) & (rentals["#_rentals"] > 0),"#_rentals"]))

# the rental demand is correct!
print(len(tripdata[(tripdata["started_at_year"] == 2023) & (tripdata["started_at_hour"] >= 8) & (tripdata["started_at_hour"] < 22)]))

36
36


> The rentals have been successfully determined for 2023. Only one day of that year is included, i.e. *31 Dec*, so filtering by year alone was enough and no per-day loop was needed.

In [16]:
# Updating for 2024
for k in range(1, 32):  # days of January 2024
    for h in range(8, 22, 2):  # window start hours: 8, 10, ..., 20
        # determining demand for 2024; window [h, h + 2) on day k
        for i in rentals["station_id"].unique():
            idx = (rentals["station_id"] == i) & (rentals["year"] == 2024) & (rentals["hour"] == h) & (rentals["day"] == k)
            rentals.loc[idx, "#_rentals"] = len(tripdata[(tripdata["started_at_year"] == 2024) & (tripdata["started_at_day"] == k) & (tripdata["started_at_hour"] >= h) & (tripdata["started_at_hour"] < h + 2) & (tripdata["station_id"] == i)])

In [17]:
# total trips on jan 1 2024
print(len(tripdata[(tripdata["started_at_year"] == 2024)&(tripdata["started_at_day"] == 1)&(tripdata["started_at_hour"] >= 8)&(tripdata["started_at_hour"] < 22)]))

# rentals recorded for jan 1 2024 
sum(rentals.loc[(rentals["year"] == 2024) & (rentals["day"] == 1),"#_rentals"])

13348


13348

The two totals match for 1 January 2024, which indicates that the rentals were assigned correctly for that day.
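
The station-by-station loops above can take a long time on the full data. A possible alternative, sketched here on toy data only, is to count trips with a single `groupby` and then join the counts onto the template. The frames `trips` and `template` and their values are made up for the example; the column names mirror the ones used in this notebook.

```python
import pandas as pd

# Toy trips (made-up values); the last one starts outside the 08-22 range
trips = pd.DataFrame({
    "station_id": [0, 0, 0, 1, 1],
    "started_at": pd.to_datetime([
        "2024-01-01 08:15:00", "2024-01-01 09:40:00", "2024-01-01 10:05:00",
        "2024-01-01 08:30:00", "2024-01-01 23:10:00",
    ]),
})

# Toy template: two stations, two windows on 1 Jan 2024
template = pd.DataFrame({
    "station_id": [0, 0, 1, 1],
    "year": 2024, "month": 1, "day": 1,
    "hour": [8, 10, 8, 10],
})

keys = ["station_id", "year", "month", "day", "hour"]
started = trips["started_at"]

# Label each trip with the start hour of its 2-hour window (even hours)
trips = trips.assign(
    year=started.dt.year,
    month=started.dt.month,
    day=started.dt.day,
    hour=started.dt.hour - started.dt.hour % 2,
)
trips = trips[(trips["hour"] >= 8) & (trips["hour"] < 22)]

# Count trips per station and window, then attach the counts to the template
counts = trips.groupby(keys).size().reset_index(name="#_rentals")
result = template.merge(counts, on=keys, how="left")

# Windows without any trip get a count of 0
result["#_rentals"] = result["#_rentals"].fillna(0).astype(int)
rental_counts = result["#_rentals"].tolist()
print(rental_counts)
```

Because the grouping keys include year, month and day, this variant would not need separate handling for 31 Dec 2023 and January 2024.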

In [19]:
# Created dataset
rentals_final = rentals.copy()

# removing unneeded columns
rentals_final.drop(columns=['minute','second','microsecond'], inplace=True)

# final look
rentals_final.head()

Unnamed: 0,name,lat,lng,station_id,datetime,#_rentals,year,month,day,hour
0,1 Ave & E 110 St,40.792327,-73.9383,0,2023-12-31 08:00:00.000,0,2023,12,31,8
1,1 Ave & E 110 St,40.792327,-73.9383,0,2023-12-31 10:00:00.000,0,2023,12,31,10
2,1 Ave & E 110 St,40.792327,-73.9383,0,2023-12-31 12:00:00.000,0,2023,12,31,12
3,1 Ave & E 110 St,40.792327,-73.9383,0,2023-12-31 14:00:00.000,0,2023,12,31,14
4,1 Ave & E 110 St,40.792327,-73.9383,0,2023-12-31 16:00:00.000,0,2023,12,31,16


In [20]:
# exporting
rentals_final.to_csv("rentals_with_demand_new_time_units.csv", index=False)