# Assignment - Regression Benchmark 

### Problem Statement

Bike-sharing systems are a new generation of traditional bike rentals in which the whole process, from membership to rental and return, has become automatic. Through these systems, a user can easily rent a bike at one location and return it at another.

Unlike other transport services such as buses or subways, these systems explicitly record the duration of travel and the departure and arrival positions. This feature turns a bike-sharing system into a virtual sensor network that can be used to sense mobility in the city.

To make the process seamless and ensure that enough bikes are available, we need to predict the number of bikes required in the coming month based on past data.

The dataset contains the following variables:

- date:       Date in "yyyy-mm-dd" format
- season:     Four categories: 1 = spring, 2 = summer, 3 = fall, 4 = winter
- month:      Extracted from the date variable
- hour:       Hour of the day
- holiday:    whether the day is a holiday or not (1/0)
- workingday: whether the day is neither a weekend nor holiday (1/0)
- weather:    Four Categories of weather
            * 1-> Clear, Few clouds, Partly cloudy
            * 2-> Mist + Cloudy, Mist + Broken clouds, Mist + Few clouds, Mist
            * 3-> Light Snow and Rain + Thunderstorm + Scattered clouds, Light Rain + Scattered clouds
            * 4-> Heavy Rain + Ice Pellets + Thunderstorm + Mist, Snow + Fog
- temp:       hourly temperature in Celsius
- atemp:      "feels like" temperature in Celsius
- humidity:   relative humidity
- windspeed:  wind speed


- registered: number of registered users
- casual:     number of non-registered users
- count:      number of total rentals (registered + casual)

### Importing Libraries

In [None]:
#importing libraries 

import pandas as pd 
import numpy as np
import matplotlib.pyplot as plt

import warnings
warnings.filterwarnings("ignore")

### Importing Dataset

In [None]:
df=pd.read_csv('Bike-Sharing-Dataset/hour.csv')
df.shape

In [None]:
df.head()

### Shuffling and Creating Train and Test Set

Task 1:

- Shuffle the dataset
- Create train and test sets

In [None]:
## Shuffling the Dataset
from sklearn.utils import shuffle
data = shuffle(??)

In [None]:
#creating 4 divisions
div = int(data.shape[0]/4)

# 3 parts to train set and 1 part to test set
# .copy() makes each split an independent frame, so new columns can be added later
train = data.iloc[:3*div+1].copy()
test = data.iloc[3*div+1:].copy()

In [None]:
train.shape, test.shape, data.shape

In [None]:
train.head()

In [None]:
test.head()
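As a rough illustration of the shuffle-then-split idea, here is a minimal sketch on a made-up frame. The names `toy`, `toy_train`, and `toy_test` and the `random_state` value are assumptions for illustration, not part of the assignment data.

```python
import pandas as pd
from sklearn.utils import shuffle

# Hypothetical toy frame standing in for the real dataset
toy = pd.DataFrame({"hour": range(8), "count": [5, 3, 8, 13, 21, 34, 55, 89]})

# random_state makes the shuffle reproducible; 42 is an arbitrary choice
toy_shuffled = shuffle(toy, random_state=42)

# First three quarters of the shuffled rows for training, the rest for testing
cut = int(len(toy_shuffled) * 0.75)
toy_train = toy_shuffled.iloc[:cut].copy()
toy_test = toy_shuffled.iloc[cut:].copy()

print(toy_train.shape, toy_test.shape)
```

Fixing the seed is optional, but it keeps the error values comparable between runs.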

## Simple Mean (mean of count)

Task 2-

- Calculate the mean of the target variable (count) in the train set

In [None]:
# calculate mean of column
test['simple_mean'] = ??

Task 3-

- import mean absolute error from sklearn
- calculate mean absolute error

In [None]:
#calculating mean absolute error

from sklearn.metrics import __??___ as MAE
simple_mean_error = MAE(??)
simple_mean_error
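For reference, mean absolute error is the average of the absolute differences between actual and predicted values. The sketch below shows a constant-mean baseline scored with MAE on made-up numbers. The frames `toy_train` and `toy_test` and the column `baseline_pred` are hypothetical names, not the assignment's.

```python
import pandas as pd
from sklearn.metrics import mean_absolute_error

# Hypothetical toy data, not taken from the bike-sharing file
toy_train = pd.DataFrame({"count": [10, 20, 30, 40]})
toy_test = pd.DataFrame({"count": [15, 35]})

# The baseline predicts the training mean for every test row
baseline = toy_train["count"].mean()
toy_test["baseline_pred"] = baseline

# MAE = mean(|actual - predicted|) = (|15 - 25| + |35 - 25|) / 2
toy_mae = mean_absolute_error(toy_test["count"], toy_test["baseline_pred"])
print(toy_mae)
```

Computing the mean on the training rows only keeps the test rows unseen, so the error is an honest estimate.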

## Mean count with respect to weekday

Task 4 -
- Check average count for different weekdays
- Make predictions using the average count for each weekday

In [None]:
# calculating mean count based on day of week 
# Hint: use  pivot table

weekday_mean = pd.pivot_table(??, values=??, index = [??], aggfunc=??)
weekday_mean
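To show how `pd.pivot_table` behaves with a single index column, here is a small sketch on made-up data. The columns `group` and `value` are hypothetical. With one index and a mean aggregation, the result should match a plain `groupby`.

```python
import pandas as pd

# Hypothetical toy data with one grouping column and one numeric column
toy = pd.DataFrame({
    "group": ["a", "a", "b", "b", "b"],
    "value": [1, 3, 10, 20, 30],
})

# One row per unique entry of "group", holding the mean of "value"
by_pivot = pd.pivot_table(toy, values="value", index=["group"], aggfunc="mean")

# The same aggregation written with groupby, returned as a Series
by_groupby = toy.groupby("group")["value"].mean()

print(by_pivot)
```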

In [None]:
# initializing new column to zero (float, since it will hold means)
test['weekday_mean'] = 0.0

# For every unique entry in weekday
for i in train[??].unique():

  # Assign the mean value corresponding to unique entry
  # (.loc avoids chained indexing, which may fail to update test)
  test.loc[test[??] == i, 'weekday_mean'] = train[??][train[??] == i].mean()

In [None]:
#calculating mean absolute error
weekday_mean_error = MAE(test['count'] , test['weekday_mean'] )
weekday_mean_error
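The loop over unique entries can also be written without a loop. The sketch below is one possible alternative, not the assignment's required method. It computes group means on a made-up training frame and looks them up for each test row with `Series.map`. The names `key`, `target`, and `pred` are hypothetical.

```python
import pandas as pd

# Hypothetical toy data; "key" plays the role of a categorical feature
toy_train = pd.DataFrame({"key": [0, 0, 1, 1], "target": [10, 20, 100, 200]})
toy_test = pd.DataFrame({"key": [1, 0, 2], "target": [150, 12, 50]})

# Mean of the target per key, computed on training rows only
group_means = toy_train.groupby("key")["target"].mean()

# Look up each test row's key; keys unseen in training (here 2) become NaN
toy_test["pred"] = toy_test["key"].map(group_means)

# Assumed fallback for unseen keys: the overall training mean
toy_test["pred"] = toy_test["pred"].fillna(toy_train["target"].mean())

print(toy_test)
```

The fallback is a design choice. The loop version would instead leave unseen keys at the initial value of zero.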

## Mean Count with respect to Month

Task 5-

- Print month-wise average count using pivot table
- Use month-wise average count as predictions
- Calculate the Error

In [None]:
# calculating mean count based on month
# use pivot table
month_wise_count = ??
month_wise_count

In [None]:
# initializing new column to zero (float, since it will hold means)
test['month_wise_mean'] = 0.0

# For every unique entry in month variable
for i in ??:

  # Assign the mean value corresponding to unique entry
  # (.loc avoids chained indexing, which may fail to update test)
  test.loc[test['month'] == i, 'month_wise_mean'] = ??

In [None]:
#calculating mean absolute error
month_wise_mean_error = MAE(?? , ??)
month_wise_mean_error

## Mean Count with respect to both Month and Workingday

In [None]:
combo = pd.pivot_table(train, values = 'count', index = ['month','workingday'], aggfunc = 'mean')
combo

Task 6-
- Predict average count based on month and workingday variables

In [None]:
# Initializing new column to zero (float, since it will hold means)
test['combo_mean'] = 0.0


# For every unique value in month
for i in ??:
  # For every unique value in workingday
  for j in ??:

    # Calculate and assign mean to new column, corresponding to both unique values simultaneously
    # (.loc avoids chained indexing, which may fail to update test)
    test.loc[(test['month'] == i) & (test['workingday']==j), 'combo_mean'] = ??

In [None]:
#calculating mean absolute error
combo_mean_error = MAE(test['count'] , test['combo_mean'] )
combo_mean_error
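When the prediction depends on two variables at once, the nested loop can be replaced by a grouped mean joined back onto the test rows. The following is a minimal sketch of that alternative on made-up data. The columns `k1`, `k2`, `target`, and `pair_mean` are hypothetical and stand in for any pair of categorical features.

```python
import pandas as pd

# Hypothetical toy data with two grouping columns
toy_train = pd.DataFrame({
    "k1": [1, 1, 1, 2, 2],
    "k2": [0, 0, 1, 0, 1],
    "target": [10, 30, 50, 70, 90],
})
toy_test = pd.DataFrame({"k1": [1, 2, 1], "k2": [0, 1, 1], "target": [25, 80, 40]})

# Mean of the target for every (k1, k2) pair seen in training
pair_means = (
    toy_train.groupby(["k1", "k2"], as_index=False)["target"]
    .mean()
    .rename(columns={"target": "pair_mean"})
)

# A left merge keeps every test row and attaches the matching pair mean
toy_scored = toy_test.merge(pair_means, on=["k1", "k2"], how="left")

print(toy_scored)
```

Pairs that never occur in training would come back as NaN after the merge, so a real run may need a fallback value for them.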