# Seoul Hourly Bike Sharing Prediction 

Whether you want a short ride around Seoul or an end-to-end route across the country, South Korea offers both. Over the past few decades the country has built extensive cycling infrastructure, including a network of cross-country bike paths. In response to the high demand for bikes as a mode of transportation, Seoul started a bike share program in 2015.

The program records more than a million trips per month. According to 2021 data collected by Rutgers University, the city has more than 1,500 bike stations with about 20,000 bikes in operation.

This project attempts to predict the hourly demand for shared bikes in Seoul from features such as weather data and holiday information. The data was obtained from the UCI Machine Learning Repository and has been cited in a paper by Sathishkumar et al.

## Importing Dataset

In [1]:
import pandas as pd
import numpy as np

In [2]:
bike_df = pd.read_csv("seoulbikedata.csv", encoding="unicode_escape")
bike_df.head()

Unnamed: 0,Date,Rented Bike Count,Hour,Temperature(°C),Humidity(%),Wind speed (m/s),Visibility (10m),Dew point temperature(°C),Solar Radiation (MJ/m2),Rainfall(mm),Snowfall (cm),Seasons,Holiday,Functioning Day
0,01/12/2017,254,0,-5.2,37,2.2,2000,-17.6,0.0,0.0,0.0,Winter,No Holiday,Yes
1,01/12/2017,204,1,-5.5,38,0.8,2000,-17.6,0.0,0.0,0.0,Winter,No Holiday,Yes
2,01/12/2017,173,2,-6.0,39,1.0,2000,-17.7,0.0,0.0,0.0,Winter,No Holiday,Yes
3,01/12/2017,107,3,-6.2,40,0.9,2000,-17.6,0.0,0.0,0.0,Winter,No Holiday,Yes
4,01/12/2017,78,4,-6.0,36,2.3,2000,-18.6,0.0,0.0,0.0,Winter,No Holiday,Yes


## Data Preprocessing

In [3]:
# remove the parenthesised units from the column names
bike_df.columns = bike_df.columns.str.replace(r"\([^)]+\)", "", regex=True)
bike_df.head()

Unnamed: 0,Date,Rented Bike Count,Hour,Temperature,Humidity,Wind speed,Visibility,Dew point temperature,Solar Radiation,Rainfall,Snowfall,Seasons,Holiday,Functioning Day
0,01/12/2017,254,0,-5.2,37,2.2,2000,-17.6,0.0,0.0,0.0,Winter,No Holiday,Yes
1,01/12/2017,204,1,-5.5,38,0.8,2000,-17.6,0.0,0.0,0.0,Winter,No Holiday,Yes
2,01/12/2017,173,2,-6.0,39,1.0,2000,-17.7,0.0,0.0,0.0,Winter,No Holiday,Yes
3,01/12/2017,107,3,-6.2,40,0.9,2000,-17.6,0.0,0.0,0.0,Winter,No Holiday,Yes
4,01/12/2017,78,4,-6.0,36,2.3,2000,-18.6,0.0,0.0,0.0,Winter,No Holiday,Yes
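To see what the pattern in the cell above matches, the following minimal sketch applies it to a few of the original column names. The sample names are copied from the raw header shown earlier; the variable names `raw` and `cleaned` are only illustrative.

```python
import pandas as pd

# A few column names copied from the raw header (illustrative subset).
raw = pd.Index(["Temperature(°C)", "Wind speed (m/s)", "Solar Radiation (MJ/m2)", "Hour"])

# \( and \) match literal parentheses; [^)]+ matches everything between them.
cleaned = raw.str.replace(r"\([^)]+\)", "", regex=True)

# Names that had a space before the unit keep a trailing space,
# e.g. "Wind speed (m/s)" becomes "Wind speed ".
print(list(cleaned))
```

The leftover trailing spaces are why the next cell strips whitespace before replacing spaces with underscores.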


In [4]:
# convert the column names to lower case, strip trailing spaces and replace spaces with _
bike_df.columns = bike_df.columns.str.lower()
bike_df.columns = bike_df.columns.str.rstrip(" ")
bike_df.columns = bike_df.columns.str.replace(" ", "_")
bike_df.head()

Unnamed: 0,date,rented_bike_count,hour,temperature,humidity,wind_speed,visibility,dew_point_temperature,solar_radiation,rainfall,snowfall,seasons,holiday,functioning_day
0,01/12/2017,254,0,-5.2,37,2.2,2000,-17.6,0.0,0.0,0.0,Winter,No Holiday,Yes
1,01/12/2017,204,1,-5.5,38,0.8,2000,-17.6,0.0,0.0,0.0,Winter,No Holiday,Yes
2,01/12/2017,173,2,-6.0,39,1.0,2000,-17.7,0.0,0.0,0.0,Winter,No Holiday,Yes
3,01/12/2017,107,3,-6.2,40,0.9,2000,-17.6,0.0,0.0,0.0,Winter,No Holiday,Yes
4,01/12/2017,78,4,-6.0,36,2.3,2000,-18.6,0.0,0.0,0.0,Winter,No Holiday,Yes
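The column clean-up so far could also be collected into one reusable function. This is only a sketch: `normalize_columns` and the `demo` frame are hypothetical names that do not appear in the notebook, and the function returns a copy rather than modifying the frame in place.

```python
import pandas as pd

def normalize_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with unit-free, snake_case column names."""
    out = df.copy()
    out.columns = (
        out.columns
        .str.replace(r"\([^)]+\)", "", regex=True)  # drop "(unit)" parts
        .str.strip()                                 # drop leftover edge spaces
        .str.lower()
        .str.replace(" ", "_", regex=False)          # spaces -> underscores
    )
    return out

# Tiny stand-in frame; the real data comes from seoulbikedata.csv.
demo = pd.DataFrame({"Rented Bike Count": [254], "Wind speed (m/s)": [2.2]})
demo = normalize_columns(demo)
print(list(demo.columns))
```

One practical benefit of snake_case names is that columns can be reached with attribute access, such as `bike_df.date` in the cells that follow.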


In [5]:
# convert date to datetime object; the dates are day-first (DD/MM/YYYY)
bike_df.date = pd.to_datetime(bike_df.date, format="%d/%m/%Y")
#bike_df.info()


In [6]:
# convert all the string values to lower case and replace spaces with _
strings = list(bike_df.dtypes[bike_df.dtypes == "object"].index)
for col in strings:
    bike_df[col] = bike_df[col].str.lower().str.replace(" ", "_")
bike_df.head()

Unnamed: 0,date,rented_bike_count,hour,temperature,humidity,wind_speed,visibility,dew_point_temperature,solar_radiation,rainfall,snowfall,seasons,holiday,functioning_day
0,2017-12-01,254,0,-5.2,37,2.2,2000,-17.6,0.0,0.0,0.0,winter,no_holiday,yes
1,2017-12-01,204,1,-5.5,38,0.8,2000,-17.6,0.0,0.0,0.0,winter,no_holiday,yes
2,2017-12-01,173,2,-6.0,39,1.0,2000,-17.7,0.0,0.0,0.0,winter,no_holiday,yes
3,2017-12-01,107,3,-6.2,40,0.9,2000,-17.6,0.0,0.0,0.0,winter,no_holiday,yes
4,2017-12-01,78,4,-6.0,36,2.3,2000,-18.6,0.0,0.0,0.0,winter,no_holiday,yes


In [8]:
# extract year, month, weekday and day of year from the date column
bike_df['year'] = bike_df.date.dt.year
bike_df['month'] = bike_df.date.dt.month
bike_df['weekday'] = bike_df.date.dt.weekday
bike_df['daysofyear'] = bike_df.date.dt.dayofyear
bike_df.head()

Unnamed: 0,date,rented_bike_count,hour,temperature,humidity,wind_speed,visibility,dew_point_temperature,solar_radiation,rainfall,snowfall,seasons,holiday,functioning_day,year,month,weekday,daysofyear
0,2017-12-01,254,0,-5.2,37,2.2,2000,-17.6,0.0,0.0,0.0,winter,no_holiday,yes,2017,12,4,335
1,2017-12-01,204,1,-5.5,38,0.8,2000,-17.6,0.0,0.0,0.0,winter,no_holiday,yes,2017,12,4,335
2,2017-12-01,173,2,-6.0,39,1.0,2000,-17.7,0.0,0.0,0.0,winter,no_holiday,yes,2017,12,4,335
3,2017-12-01,107,3,-6.2,40,0.9,2000,-17.6,0.0,0.0,0.0,winter,no_holiday,yes,2017,12,4,335
4,2017-12-01,78,4,-6.0,36,2.3,2000,-18.6,0.0,0.0,0.0,winter,no_holiday,yes,2017,12,4,335


In [9]:
# map the weekday numbers (0 = Monday) to their names
weekdays_dict = {
    0: "monday",
    1: "tuesday",
    2: "wednesday",
    3: "thursday",
    4: "friday",
    5: "saturday",
    6: "sunday"
}
bike_df.weekday = bike_df.weekday.map(weekdays_dict)
bike_df.head()

Unnamed: 0,date,rented_bike_count,hour,temperature,humidity,wind_speed,visibility,dew_point_temperature,solar_radiation,rainfall,snowfall,seasons,holiday,functioning_day,year,month,weekday,daysofyear
0,2017-12-01,254,0,-5.2,37,2.2,2000,-17.6,0.0,0.0,0.0,winter,no_holiday,yes,2017,12,friday,335
1,2017-12-01,204,1,-5.5,38,0.8,2000,-17.6,0.0,0.0,0.0,winter,no_holiday,yes,2017,12,friday,335
2,2017-12-01,173,2,-6.0,39,1.0,2000,-17.7,0.0,0.0,0.0,winter,no_holiday,yes,2017,12,friday,335
3,2017-12-01,107,3,-6.2,40,0.9,2000,-17.6,0.0,0.0,0.0,winter,no_holiday,yes,2017,12,friday,335
4,2017-12-01,78,4,-6.0,36,2.3,2000,-18.6,0.0,0.0,0.0,winter,no_holiday,yes,2017,12,friday,335


In [10]:
# remove the date column now that its parts have been extracted
bike_df.drop("date", axis=1, inplace=True)
bike_df.head()

Unnamed: 0,rented_bike_count,hour,temperature,humidity,wind_speed,visibility,dew_point_temperature,solar_radiation,rainfall,snowfall,seasons,holiday,functioning_day,year,month,weekday,daysofyear
0,254,0,-5.2,37,2.2,2000,-17.6,0.0,0.0,0.0,winter,no_holiday,yes,2017,12,friday,335
1,204,1,-5.5,38,0.8,2000,-17.6,0.0,0.0,0.0,winter,no_holiday,yes,2017,12,friday,335
2,173,2,-6.0,39,1.0,2000,-17.7,0.0,0.0,0.0,winter,no_holiday,yes,2017,12,friday,335
3,107,3,-6.2,40,0.9,2000,-17.6,0.0,0.0,0.0,winter,no_holiday,yes,2017,12,friday,335
4,78,4,-6.0,36,2.3,2000,-18.6,0.0,0.0,0.0,winter,no_holiday,yes,2017,12,friday,335
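Because strings such as `01/12/2017` are ambiguous, it may help to check the date handling on a couple of values in isolation. The sketch below assumes the data set writes dates day-first (DD/MM/YYYY), so `01/12/2017` is 1 December 2017; the `dates`, `parsed` and `features` names are illustrative only. It also uses `dt.day_name()` as a possible alternative to mapping weekday numbers through a dictionary.

```python
import pandas as pd

# Two sample strings in the same style as the Date column.
# Assumption: the data set writes dates day-first (DD/MM/YYYY).
dates = pd.Series(["01/12/2017", "13/12/2017"])

# An explicit format removes any guessing about day/month order.
parsed = pd.to_datetime(dates, format="%d/%m/%Y")

features = pd.DataFrame({
    "year": parsed.dt.year,
    "month": parsed.dt.month,
    # day_name() returns e.g. "Friday"; lower-cased to match the other strings
    "weekday": parsed.dt.day_name().str.lower(),
    "daysofyear": parsed.dt.dayofyear,
})
print(features)
```

With an explicit format, every row is parsed the same way, whereas leaving pandas to infer the order can treat `01/12/2017` as 12 January.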


In [11]:
# check for nulls and missing values
bike_df.isnull().sum()

rented_bike_count        0
hour                     0
temperature              0
humidity                 0
wind_speed               0
visibility               0
dew_point_temperature    0
solar_radiation          0
rainfall                 0
snowfall                 0
seasons                  0
holiday                  0
functioning_day          0
year                     0
month                    0
weekday                  0
daysofyear               0
dtype: int64

The data set contains no null values, so it is ready for exploratory data analysis.
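If the data is reloaded or extended later, the same check can be wrapped in a small helper that lists only the columns with gaps. This is a sketch rather than part of the original notebook: `missing_report` and the `demo` frame are hypothetical, and the missing value is inserted deliberately for illustration.

```python
import numpy as np
import pandas as pd

def missing_report(df: pd.DataFrame) -> pd.Series:
    """Count missing values per column, keeping only columns that have some."""
    counts = df.isnull().sum()
    return counts[counts > 0]

# Small stand-in frame with one deliberately missing value.
demo = pd.DataFrame({"hour": [0, 1, 2], "rainfall": [0.0, np.nan, 0.0]})
report = missing_report(demo)
print(report)
```

An empty result from `missing_report` corresponds to the all-zero output shown above.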

## Exploratory Data Analysis (EDA)

## References
1. Seoul Bike Sharing Demand data set, UCI Machine Learning Repository: https://archive.ics.uci.edu/ml/datasets/Seoul+Bike+Sharing+Demand

2. Sathishkumar V E, Jangwoo Park, and Yongyun Cho. 'Using data mining techniques for bike sharing demand prediction in metropolitan city.' Computer Communications, Vol. 153, pp. 353-366, March 2020.