<a href="https://colab.research.google.com/github/VaibhavBhusawale/-Bike-Sharing-Demand-Prediction/blob/main/Bike_Sharing_Demand_Prediction_Capstone_Project_(Shaloy%2C_Smriti%2C_Vaibhav).ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# <b><u> Project Title : Seoul Bike Sharing Demand Prediction </u></b>

## <b> Problem Description </b>

### Rental bikes are currently being introduced in many urban cities to improve mobility comfort. It is important to make rental bikes available and accessible to the public at the right time, as this reduces waiting time. Providing the city with a stable supply of rental bikes therefore becomes a major concern, and the crucial part is predicting the number of bikes required at each hour.


## <b> Data Description </b>

### <b> The dataset contains weather information (Temperature, Humidity, Windspeed, Visibility, Dewpoint, Solar radiation, Snowfall, Rainfall), the number of bikes rented per hour and date information.</b>


### <b>Attribute Information: </b>

* ### Date : year-month-day
* ### Rented Bike Count - Count of bikes rented at each hour
* ### Hour - Hour of the day
* ### Temperature - Temperature in Celsius
* ### Humidity - %
* ### Windspeed - m/s
* ### Visibility - in units of 10 m
* ### Dew point temperature - Celsius
* ### Solar radiation - MJ/m2
* ### Rainfall - mm
* ### Snowfall - cm
* ### Seasons - Winter, Spring, Summer, Autumn
* ### Holiday - Holiday/No holiday
* ### Functional Day - NoFunc(Non Functional Hours), Fun(Functional hours)

# **Hypotheses for the problem:**

1. The number of bikes rented decreases during rainfall/snowfall.
2. The number of bikes rented increases during peak hours.
3. Fewer bikes are rented on weekends than on weekdays.
4. Fewer bikes are rented on holidays than on working days.
5. The number of bikes rented decreases when humidity is high.
6. Fewer bikes are rented on days with high solar radiation.
7. The number of bikes rented on average in summer is higher compared to other seasons.
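Hypothesis 3 compares weekends with weekdays, but the dataset has no day-of-week column, so one has to be derived from the date. The following is a minimal sketch, assuming it runs after the renaming step (so the column is called `date`) and assuming the date strings use a day/month/year layout such as `01/12/2017`. Check that format string against `df.head()` before relying on it. `add_weekday_features` is a hypothetical helper name.

```python
import pandas as pd

def add_weekday_features(df: pd.DataFrame, date_col: str = 'date') -> pd.DataFrame:
    """Return a copy of df with 'weekday' (0=Monday ... 6=Sunday) and 'is_weekend' columns."""
    out = df.copy()
    # Assumption: dates are day/month/year strings, e.g. '01/12/2017'
    dates = pd.to_datetime(out[date_col], format='%d/%m/%Y')
    out['weekday'] = dates.dt.dayofweek
    out['is_weekend'] = out['weekday'] >= 5  # Saturday (5) or Sunday (6)
    return out

# Hypothetical toy rows: 2 Dec 2017 was a Saturday, 4 Dec 2017 a Monday
sample = pd.DataFrame({'date': ['02/12/2017', '04/12/2017'],
                       'rented_bike_count': [120, 340]})
sample = add_weekday_features(sample)

# Mean hourly rentals for weekend vs weekday rows
print(sample.groupby('is_weekend')['rented_bike_count'].mean())
```

On the real data, `add_weekday_features(df).groupby('is_weekend')['rented_bike_count'].mean()` gives a first check of hypothesis 3.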

# **Data Loading:**

In [None]:
# Importing the required libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

In [None]:
from google.colab import drive
drive.mount('/content/drive')

In [None]:
# Loading the data; the CSV is read as Latin-1 (ISO-8859-1) because its headers contain the '°' character

path = '/content/drive/MyDrive/Bike sharing demand prediction - Shaloy Lewis/Copy of SeoulBikeData.csv'

df = pd.read_csv(path, encoding='iso-8859-1')
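The `encoding='iso-8859-1'` argument matters because headers such as `Temperature(°C)` contain the degree sign. In a Latin-1 file that sign is the single byte `0xB0`, which is not valid UTF-8 on its own, and UTF-8 is what pandas assumes by default. The following self-contained sketch demonstrates this on a hypothetical, in-memory one-column CSV.

```python
import io
import pandas as pd

# A tiny CSV whose header contains the degree sign, encoded as Latin-1 (ISO-8859-1)
raw = 'Temperature(°C)\n-5.2\n'.encode('iso-8859-1')

# Decoding with the matching encoding recovers the header intact
ok = pd.read_csv(io.BytesIO(raw), encoding='iso-8859-1')
print(ok.columns.tolist())  # ['Temperature(°C)']

# The default UTF-8 decoder rejects the lone 0xB0 byte
try:
    pd.read_csv(io.BytesIO(raw))
    utf8_failed = False
except UnicodeDecodeError:
    utf8_failed = True
print(utf8_failed)
```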

In [None]:
# Basic inspection

df.head()

In [None]:
df.columns

In [None]:
# updating the attribute names

df = df.rename(columns= {'Date':'date','Rented Bike Count': 'rented_bike_count', 'Hour':'hour',
                    'Temperature(°C)':'temp', 'Humidity(%)':'humidity',
                    'Wind speed (m/s)': 'wind_speed', 'Visibility (10m)': 'visibility',
                    'Dew point temperature(°C)':'dew_point_temp',
                    'Solar Radiation (MJ/m2)': 'solar_radiation', 'Rainfall(mm)': 'rainfall',
                    'Snowfall (cm)':'snowfall', 'Seasons':'seasons',
                    'Holiday':'holiday', 'Functioning Day':'func_day'})
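By default, `DataFrame.rename` silently skips mapping keys that match no column (its default is `errors='ignore'`). A typo in one of the long original headers would therefore leave that column unrenamed without raising any error. Passing `errors='raise'` turns such a mismatch into a `KeyError`. The following is a small sketch on a hypothetical two-column frame.

```python
import pandas as pd

# Hypothetical empty frame with two of the original headers
raw = pd.DataFrame(columns=['Rented Bike Count', 'Hour'])

mapping = {'Rented Bike Count': 'rented_bike_count', 'Hour': 'hour'}
renamed = raw.rename(columns=mapping, errors='raise')
print(renamed.columns.tolist())  # ['rented_bike_count', 'hour']

# A key matching no column (here a mis-capitalized header) raises KeyError with errors='raise';
# with the default errors='ignore' it would simply be skipped
try:
    raw.rename(columns={'Rented Bike count': 'rented_bike_count'}, errors='raise')
    caught = False
except KeyError:
    caught = True
print(caught)
```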

In [None]:
df.columns

In [None]:
df.shape

In [None]:
df.info()

In [None]:
df.describe()

In [None]:
df.describe().shape

In [None]:
df.isnull().sum()

In [None]:
df['func_day'].value_counts()

In [None]:
df['seasons'].value_counts()

In [None]:
df['holiday'].value_counts()

## Visual Inspection:

In [None]:
sns.scatterplot(data= df, x='visibility', y='rented_bike_count')

In [None]:
sns.scatterplot(data= df, x='humidity', y='rented_bike_count')

In [None]:
sns.scatterplot(data= df, x='temp', y='rented_bike_count')

In [None]:
sns.scatterplot(data= df, x='wind_speed', y='rented_bike_count')

# **EDA:**

In [None]:
sns.barplot(data=df, x='seasons', y='rented_bike_count')

In [None]:
sns.barplot(data=df, x='holiday', y='rented_bike_count')

In [None]:
sns.barplot(data=df, x='func_day', y='rented_bike_count')
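For each category, `sns.barplot` draws the mean of `rented_bike_count` (its default estimator) together with an error bar. A `groupby` produces the same numbers, along with how many hourly rows support each bar. The row count is worth checking because a bar backed by only a few rows is less reliable. The following sketch uses a hypothetical helper, `category_summary`, and made-up toy values.

```python
import pandas as pd

def category_summary(df, col, target='rented_bike_count'):
    """Row count and mean target per category; the mean is the bar height sns.barplot plots."""
    return (df.groupby(col)[target]
              .agg(['count', 'mean'])
              .sort_values('mean', ascending=False))

# Hypothetical toy data, not taken from the dataset
toy = pd.DataFrame({'seasons': ['Winter', 'Winter', 'Summer', 'Summer', 'Summer'],
                    'rented_bike_count': [100, 300, 900, 1100, 1000]})
summary = category_summary(toy, 'seasons')
print(summary)
```

On the real data, the equivalent calls are `category_summary(df, 'seasons')`, `category_summary(df, 'holiday')` and `category_summary(df, 'func_day')`.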

In [None]:
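plt.figure(figsize=(7,7))
# Histogram with a KDE curve of the raw target
sns.histplot(x=df['rented_bike_count'], kde=True, color="c")

In [None]:
plt.figure(figsize=(10,8))
# Square-root transform to reduce right skew
sns.histplot(x=np.sqrt(df['rented_bike_count']), kde=True, color='c')

In [None]:
# Log transform; the +10 offset avoids log10(0) for zero counts
sns.histplot(x=np.log10(df['rented_bike_count']+10), kde=True, color="y")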
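The three distribution plots compare the raw target with square-root and log transforms. These transforms are commonly used to compress a long right tail before fitting a linear model. Sample skewness gives a numeric companion to the plots. The following sketch uses a synthetic exponential sample as a hypothetical stand-in for `df['rented_bike_count']`.

```python
import numpy as np
import pandas as pd

# Hypothetical right-skewed stand-in for hourly rental counts
rng = np.random.default_rng(0)
counts = pd.Series(rng.exponential(scale=700, size=10_000))

# Sample skewness of the raw values and of the two transforms used in the plots
skews = {
    'raw': counts.skew(),
    'sqrt': np.sqrt(counts).skew(),
    'log10(x + 10)': np.log10(counts + 10).skew(),
}
for name, value in skews.items():
    print(f'{name:>14}: {value:.2f}')
```

On the real data, replace `counts` with `df['rented_bike_count']`. A skewness near 0 indicates a roughly symmetric distribution, and a strongly negative value means the transform has overshot into a left tail.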

In [None]:
numeric_columns = ['rented_bike_count', 'hour', 'temp', 'humidity', 'wind_speed',
       'visibility', 'solar_radiation', 'rainfall', 'dew_point_temp',
       'snowfall']

In [None]:
categorical_columns = ['holiday', 'func_day', 'seasons']

In [None]:
# Scatter plot of each numeric feature against the target, with a fitted regression line in red
for column in numeric_columns:
  if column == 'rented_bike_count':
    continue
  sns.regplot(x=df[column], y=df['rented_bike_count'], line_kws={"color": "r"})
  plt.show()
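The red line drawn by `sns.regplot` is a first-order least-squares fit of the target on each feature. To put numbers on those lines, the slope and intercept can be computed with `np.polyfit`, and the Pearson correlation with `Series.corr`. `linear_fit_summary` is a hypothetical helper. The sketch applies it to toy data in which the target is an exact linear function of `temp`.

```python
import numpy as np
import pandas as pd

def linear_fit_summary(df, features, target='rented_bike_count'):
    """Slope, intercept and Pearson r of a first-order least-squares fit of target on each feature."""
    rows = []
    for col in features:
        if col == target:
            continue
        # np.polyfit returns coefficients highest degree first: [slope, intercept]
        slope, intercept = np.polyfit(df[col], df[target], deg=1)
        rows.append({'feature': col, 'slope': slope, 'intercept': intercept,
                     'pearson_r': df[col].corr(df[target])})
    return pd.DataFrame(rows).set_index('feature')

# Hypothetical toy data where the target is exactly 3 * temp + 10
toy = pd.DataFrame({'temp': [0.0, 10.0, 20.0, 30.0],
                    'humidity': [80.0, 60.0, 70.0, 50.0]})
toy['rented_bike_count'] = 3 * toy['temp'] + 10

fits = linear_fit_summary(toy, ['rented_bike_count', 'temp', 'humidity'])
print(fits)
```

On the real data, the equivalent call is `linear_fit_summary(df, numeric_columns)`.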

In [None]:
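plt.figure(figsize=(15,8))
# Absolute Pearson correlations between all numeric columns (non-numeric columns are skipped)
correlation = df.corr(numeric_only=True)
sns.heatmap(abs(correlation), annot=True, cmap='coolwarm')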
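The heatmap shows every pairwise correlation. The row for `rented_bike_count` can also be extracted and sorted to rank features by the strength of their linear relationship with the target. In the following sketch, the helper name `target_correlations` and the toy frame are hypothetical. Note that `numeric_only=True` requires pandas 1.5 or later.

```python
import pandas as pd

def target_correlations(df, target='rented_bike_count'):
    """Absolute Pearson correlation of every other numeric column with the target, strongest first."""
    corr = df.corr(numeric_only=True)[target].drop(target)
    return corr.abs().sort_values(ascending=False)

# Hypothetical toy data, not taken from the dataset
toy = pd.DataFrame({
    'rented_bike_count': [10, 20, 30, 40],
    'temp': [1, 2, 3, 4],              # perfectly correlated with the target
    'humidity': [4, 1, 3, 2],          # weakly (negatively) related
    'seasons': ['W', 'W', 'S', 'S'],   # non-numeric, skipped by numeric_only=True
})
ranking = target_correlations(toy)
print(ranking)
```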

In [None]:
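# Independent (predictor) variables: all numeric columns except the target
independent_var = df[[col for col in numeric_columns if col != 'rented_bike_count']]

plt.figure(figsize=(15,8))
# Correlations among the predictors only, to spot strongly related (multicollinear) pairs
correlation = independent_var.corr()
sns.heatmap(abs(correlation), annot=True, cmap='coolwarm')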
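Predictor pairs with very high absolute correlation carry largely redundant information, which can make the coefficient estimates of a linear model unstable. The following sketch lists such pairs directly from the correlation matrix, rather than reading them off the heatmap. The 0.7 threshold and the helper name `high_correlation_pairs` are assumptions, not values from the original analysis, and the toy numbers are made up.

```python
from itertools import combinations

import pandas as pd

def high_correlation_pairs(df, threshold=0.7):
    """Predictor pairs whose absolute Pearson correlation is at least `threshold`, strongest first."""
    corr = df.corr(numeric_only=True).abs()
    # combinations() visits each unordered pair once and never pairs a column with itself
    pairs = {(a, b): corr.loc[a, b]
             for a, b in combinations(corr.columns, 2)
             if corr.loc[a, b] >= threshold}
    return pd.Series(pairs, dtype=float).sort_values(ascending=False)

# Hypothetical toy predictors: dew_point_temp is almost exactly 2 * temp
toy = pd.DataFrame({
    'temp': [1, 2, 3, 4, 5],
    'dew_point_temp': [2.1, 3.9, 6.1, 7.9, 10.0],
    'wind_speed': [2, 5, 1, 4, 3],
})
pairs = high_correlation_pairs(toy)
print(pairs)
```

On the real data, the equivalent call is `high_correlation_pairs(independent_var)`.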