<a href="https://colab.research.google.com/github/akkka10/bike-rental-analysis/blob/main/bike_rental_analysis.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
"""
Bike Sharing Dataset Analytics
Data: Bike Sharing Dataset from the UCI Machine Learning Repository
Author: Akshit
"""


In [None]:
import warnings
warnings.simplefilter("ignore")

In [None]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error,mean_squared_error


In [None]:
"""
Importing essential libraries (tensorflow)
"""
import tensorflow as tf
from tensorflow.keras import layers

In [None]:
!pip install skillsnetwork
import skillsnetwork
await skillsnetwork.prepare("https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-ML311-Coursera/labs/Module1/L2/data/Bike-Sharing-Dataset.zip",overwrite=True)

### Bike sharing dataset

We will be using the bike-sharing dataset from the [UCI Machine Learning Repository](https://archive.ics.uci.edu/ml/datasets/Bike+Sharing+Dataset?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMDeveloperSkillsNetworkML311Coursera747-2022-01-01). It contains the following features:

> - instant: record index
> - dteday: date
> - season: season (1: winter, 2: spring, 3: summer, 4: fall)
> - yr: year (0: 2011, 1: 2012)
> - mnth: month (1 to 12)
> - hr: hour (0 to 23)
> - holiday: whether the day is a holiday or not (extracted from [Web Link](https://dchr.dc.gov/page/holiday-schedules?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMDeveloperSkillsNetworkML311Coursera747-2022-01-01))
> - weekday: day of the week
> - workingday: 1 if the day is neither a weekend nor a holiday, otherwise 0
> - weathersit: (1) Clear, Few clouds, Partly cloudy; (2) Mist + Cloudy, Mist + Broken clouds, Mist + Few clouds, Mist; (3) Light Snow, Light Rain + Thunderstorm + Scattered clouds, Light Rain + Scattered clouds; (4) Heavy Rain + Ice Pellets + Thunderstorm + Mist, Snow + Fog
> - temp: Normalized temperature in Celsius. The values are derived via (t-t_min)/(t_max-t_min), t_min=-8, t_max=+39
> - atemp: Normalized feeling temperature in Celsius. The values are derived via (t-t_min)/(t_max-t_min), t_min=-16, t_max=+50 (only in hourly scale)
> - hum: Normalized humidity. The values are divided by 100 (max)
> - windspeed: Normalized wind speed. The values are divided by 67 (max)
> - casual: count of casual users
> - registered: count of registered users
> - cnt: count of total rental bikes, including both casual and registered
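
As a small worked example of the scaling described for `temp` and `atemp`, the sketch below inverts the min-max formula to recover degrees Celsius. It assumes the formula and the ranges quoted in the feature list apply as stated; `denormalize` is a hypothetical helper name, not part of the dataset or of any library.

```python
def denormalize(value, t_min, t_max):
    """Invert min-max scaling: value = (t - t_min) / (t_max - t_min)."""
    return value * (t_max - t_min) + t_min

# Ranges quoted in the feature list (assumed to apply as stated).
temp_celsius = denormalize(0.5, t_min=-8, t_max=39)    # 0.5 * 47 - 8 = 15.5
atemp_celsius = denormalize(0.5, t_min=-16, t_max=50)  # 0.5 * 66 - 16 = 17.0
print(temp_celsius, atemp_celsius)
```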


In [None]:
raw_data = pd.read_csv('day.csv')

In [None]:
"""
Let's inspect the data before cleaning it
"""
raw_data.sample(5)

In [None]:
raw_data.info()

In [None]:
"""
Drop unnecessary columns
dteday -> not required
instant -> record id, not required
registered & casual -> dropped to prevent data leakage
"""
raw_data=raw_data.drop(columns=['dteday','instant','registered','casual'])
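
To illustrate why `registered` and `casual` would leak the target, here is a minimal sketch on a made-up three-row frame that reuses the dataset's column names. The numbers are invented for illustration; the point is only that `cnt` is the sum of the two columns, so a model given both could reproduce the target without learning anything about weather or season.

```python
import pandas as pd

# Tiny made-up frame with the same column names as the dataset.
toy = pd.DataFrame({
    "casual": [10, 25, 40],
    "registered": [90, 175, 260],
    "cnt": [100, 200, 300],
})

# The target is fully determined by the two columns: that is the leak.
reconstructed = toy["casual"] + toy["registered"]
leak_is_exact = bool((reconstructed == toy["cnt"]).all())
print(leak_is_exact)
```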

In [None]:
"""
Renaming columns
cnt -> total_rentals
hum -> humidity
"""
# mnth keeps its original name because later cells refer to it as 'mnth'
raw_data.rename(columns={'cnt':'total_rentals','hum':'humidity'},inplace=True)
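
One thing worth knowing about `DataFrame.rename`: by default it skips any key that matches no column, so a misspelled column name fails silently. The sketch below shows this on a made-up two-column frame, with `mnt` as an example misspelling of `mnth`, and how `errors="raise"` turns the typo into a `KeyError`.

```python
import pandas as pd

toy = pd.DataFrame({"mnth": [1, 2], "cnt": [985, 801]})

# A key that matches no column is skipped without any warning.
silently_unchanged = toy.rename(columns={"mnt": "month"})
print(silently_unchanged.columns.tolist())

# With errors="raise", the same typo surfaces as a KeyError.
try:
    toy.rename(columns={"mnt": "month"}, errors="raise")
    typo_detected = False
except KeyError:
    typo_detected = True
print(typo_detected)
```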

In [None]:
raw_data.info()

In [None]:
# Work on a copy so that raw_data stays untouched
df=raw_data.copy()

In [None]:
"""
Data exploration

Univariate analysis
"""
sns.boxplot(y='total_rentals',data=df)

In [None]:
"""
Bivariate analysis
"""
sns.pairplot(data=df)

In [None]:
df.head()

In [None]:
"""
Plotting regression plots
"""
col=['temp','atemp','windspeed','humidity']
plt.figure(figsize=(20,12))
plt.style.use('ggplot')

# One panel per numeric feature, each with a fitted regression line
for i,name in enumerate(col,start=1):
  plt.subplot(2,2,i)
  sns.regplot(data=df,x=name,y='total_rentals',line_kws={"color":"red"},
              scatter_kws={"color":"green"})


In [None]:
"""
temp and atemp are positively correlated with total_rentals,
while humidity and windspeed are negatively correlated with it.
"""

"""
Let's analyze the categorical data
"""
col=["season","yr","mnth","holiday","weekday","workingday","weathersit"]
plt.figure(figsize=(20,12))
# One box plot per categorical feature
for i,name in enumerate(col,start=1):
  plt.subplot(2,4,i)
  sns.boxplot(data=df,x=name,y='total_rentals')





In [None]:
"""
Observations:
- Rental demand has its highest median in fall and its lowest in spring.
- Strong growth from the first year to the second.
- Rentals increase from January, peak around September, and fall after October.
- Non-holidays have a higher median than holidays.
"""


In [None]:
"""
Use a heatmap to show the correlations
"""
plt.figure(figsize=(15,9))
# sns.heatmap returns a matplotlib Axes, so the result is named ax
ax=sns.heatmap(
    data=df.corr(),
    cmap='plasma',
    annot=True,
    annot_kws={'fontsize':12},
    fmt='0.2f',
    linecolor='black',
    linewidths=0.6
)

In [None]:
"""
temp and atemp are highly correlated, so we keep temp and drop atemp
to avoid feeding the model two nearly identical features.
"""
df=df.drop(columns=['atemp'])
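
Reading the pair off the heatmap works here, but the same check can be done programmatically. The following is a minimal sketch on synthetic data, not on the real dataset: `atemp` is generated as a noisy copy of `temp`, and the 0.9 cut-off is a hypothetical threshold chosen for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
temp = rng.uniform(0, 1, 200)
toy = pd.DataFrame({
    "temp": temp,
    "atemp": temp + rng.normal(0, 0.02, 200),   # nearly a copy of temp
    "windspeed": rng.uniform(0, 1, 200),        # unrelated to temp
})

corr = toy.corr().abs()
# Keep only the upper triangle so each pair is listed once.
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
threshold = 0.9  # hypothetical cut-off
high_pairs = [
    (row, col)
    for row in upper.index
    for col in upper.columns
    if pd.notna(upper.loc[row, col]) and upper.loc[row, col] > threshold
]
print(high_pairs)
```

For each pair reported this way, one of the two columns is a candidate for dropping.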

In [None]:
"""
Splitting the data into training and testing sets
"""
x=df.drop(columns=['total_rentals'])
y=df['total_rentals']
X_train,X_test,y_train,y_test=train_test_split(
    x,
    y,
    test_size=0.2,
    random_state=42
)
X_train

In [None]:
"""
Let's create a linear model.
First, normalize the input features.
"""
normalizer=tf.keras.layers.Normalization(axis=-1)
normalizer.adapt(np.array(X_train))

In [None]:
# print mean and variance
print(normalizer.mean.numpy())
print(normalizer.variance.numpy())
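
The mean and variance printed above are what the layer uses to standardize each feature column. As a rough NumPy equivalent, assuming the layer computes `(x - mean) / sqrt(variance)` per column (apart from a small numerical safeguard against division by zero), the sketch below applies the same transformation to a made-up feature matrix.

```python
import numpy as np

# Made-up feature matrix: 3 samples, 2 features on very different scales.
features = np.array([[0.2, 10.0],
                     [0.4, 20.0],
                     [0.9, 60.0]])

mean = features.mean(axis=0)
variance = features.var(axis=0)          # population variance (ddof=0)
normalized = (features - mean) / np.sqrt(variance)

print(normalized.mean(axis=0))  # close to 0 for every column
print(normalized.std(axis=0))   # close to 1 for every column
```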

In [None]:
model=tf.keras.Sequential([normalizer,tf.keras.layers.Dense(units=1)])


In [None]:
df.columns.tolist()

In [None]:
"""
Compile the model
"""
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.004),
              loss='mse',
              metrics=['mae'])
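
A single `Dense(units=1)` layer trained with an MSE loss is a linear regression fitted by gradient descent. The sketch below shows that update rule in plain NumPy on synthetic data. It is an illustration only: it uses full-batch updates (Keras uses mini-batches by default), a toy learning rate rather than the notebook's 0.004, and hypothetical names such as `true_w` and `true_b`.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))                  # stand-in for normalized features
true_w, true_b = np.array([2.0, -1.0, 0.5]), 4.0
y = X @ true_w + true_b                        # noise-free toy target

w, b = np.zeros(3), 0.0
learning_rate = 0.1                            # toy value, not the notebook's 0.004
for _ in range(500):
    error = X @ w + b - y                      # prediction minus target
    grad_w = 2 * X.T @ error / len(y)          # gradient of the MSE w.r.t. w
    grad_b = 2 * error.mean()                  # gradient of the MSE w.r.t. b
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(np.round(w, 3), round(float(b), 3))
```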


In [None]:
"""
Fitting the model
"""
run=model.fit(X_train,y_train,epochs=200,validation_split=0.2)
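
With `validation_split=0.2`, Keras holds out the last 20% of the rows passed to `fit`, taken before any shuffling, and trains on the rest. The sketch below mimics that index arithmetic in NumPy; the training-set size of 584 is an assumption (80% of a 731-row `day.csv`), so substitute `len(X_train)` for the real value.

```python
import numpy as np

n_samples = 584               # assumed training-set size; use len(X_train) in practice
validation_split = 0.2

# The first part is used for training, the tail for validation.
split_at = int(n_samples * (1.0 - validation_split))
indices = np.arange(n_samples)
train_idx, val_idx = indices[:split_at], indices[split_at:]
print(len(train_idx), len(val_idx))
```

Because `train_test_split` shuffles the rows by default, the tail of `X_train` is already a random subset here.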

In [None]:
"""
Evaluate the model on the test set
"""
model.evaluate(X_test,y_test)

In [None]:
"""
Let's plot the training and validation loss
"""
fig,ax=plt.subplots()
ax.plot(run.history["loss"],'k',marker='.',label="training loss")
ax.plot(run.history["val_loss"],'r',marker='.',label="validation loss")
ax.legend()
plt.show()

In [None]:
"""
Let's predict on the test set
"""
y_predict=model.predict(X_test)
# model.predict returns shape (n, 1); flatten to a 1-D array
y_predict=y_predict.flatten()

In [None]:
plt.figure(figsize=(8,6))
sns.regplot(x=y_test,y= y_predict,scatter_kws={'alpha':0.7},line_kws={'color':'red'})
plt.xlabel("Actual Rentals")
plt.ylabel("Predicted Rentals")
plt.title("Actual vs Predicted Rentals")
plt.show()
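
The notebook imports `mean_absolute_error` and `mean_squared_error` but does not call them. As a possible follow-up to the plot, the sketch below computes MAE and RMSE with those functions. The two arrays are made-up stand-ins for `y_test` and `y_predict`, so the printed numbers say nothing about the actual model.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Made-up values standing in for y_test and y_predict.
actual = np.array([4000.0, 5200.0, 3100.0, 6100.0])
predicted = np.array([4300.0, 5000.0, 3500.0, 6000.0])

mae = mean_absolute_error(actual, predicted)
rmse = np.sqrt(mean_squared_error(actual, predicted))   # back in rental units
print(f"MAE:  {mae:.1f}")
print(f"RMSE: {rmse:.1f}")
```

In the notebook itself, the same two calls would take `y_test` and `y_predict` as arguments.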