<img src='https://weclouddata.com/wp-content/uploads/2016/11/logo.png' width='30%'>
-------------

<h3 align='center'> Applied Machine Learning Course - Assignment Week 1 </h3>
<h1 align='center'> Bike Rental Prediction </h1>

<br>
<center align="left"> Developed by:</center>
<center align="left"> WeCloudData Academy </center>


<h2>Background</h2>

Bike sharing systems are a means of renting bicycles where the process of obtaining membership, rental, and bike return is automated via a network of kiosk locations throughout a city. Using these systems, people are able to rent a bike from one location and return it to a different place on an as-needed basis. Currently, there are over 500 bike-sharing programs around the world.

The data generated by these systems makes them attractive for researchers because the duration of travel, departure location, arrival location, and time elapsed is explicitly recorded. Bike sharing systems therefore function as a sensor network, which can be used for studying mobility in a city. In this competition, participants are asked to combine historical usage patterns with weather data in order to forecast bike rental demand in the Capital Bikeshare program in Washington, D.C.

You are provided hourly rental data spanning two years. For this competition, the training set comprises the first 19 days of each month, while the test set covers the 20th to the end of the month. You must predict the total count of bikes rented during each hour covered by the test set, using only information available prior to the rental period.

> We will be using the `train.csv` for this assignment.

<h2>Data Description</h2>

<b>Features:</b>

- datetime - hourly date + timestamp
- season -  1 = spring, 2 = summer, 3 = fall, 4 = winter
- holiday - whether the day is considered a holiday
- workingday - whether the day is neither a weekend nor holiday
- weather  
  - 1: Clear, Few clouds, Partly cloudy
  - 2: Mist + Cloudy, Mist + Broken clouds, Mist + Few clouds, Mist 
  - 3: Light Snow, Light Rain + Thunderstorm + Scattered clouds, Light Rain + Scattered clouds 
  - 4: Heavy Rain + Ice Pellets + Thunderstorm + Mist, Snow + Fog 

- temp - temperature in Celsius
- atemp - "feels like" temperature in Celsius
- humidity - relative humidity
- windspeed - wind speed

<b>Features that should not be used:</b>

- casual - number of non-registered user rentals initiated (Not Provided in Test Data)
- registered - number of registered user rentals initiated (Not Provided in Test Data)

<b>Target Value:</b>

- count - number of total rentals

## $\Omega$ 1: Explore the Training Data

- Step 1: Import three libraries: 
  - 'pandas'
  - 'numpy'
  - 'matplotlib' (used for data visualization)

- Step 2: Load the training data `train.csv` into a DataFrame named 'data'.

- Step 3: Explore the dataframe
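
As a rough illustration of what "exploring" can look like, the sketch below runs a few standard pandas inspection calls. It assumes a tiny in-memory sample (the name `sample` and its two rows are made up for illustration) with the same column layout as `train.csv`, so it runs without the real file.

```python
import io
import pandas as pd

# Hypothetical two-row sample laid out like train.csv (illustration only)
sample_csv = io.StringIO(
    "datetime,season,holiday,workingday,weather,temp,atemp,humidity,"
    "windspeed,casual,registered,count\n"
    "2011-01-01 00:00:00,1,0,0,1,9.84,14.395,81,0.0,3,13,16\n"
    "2011-01-01 01:00:00,1,0,0,1,9.02,13.635,80,0.0,8,32,40\n"
)
sample = pd.read_csv(sample_csv)

print(sample.shape)           # (rows, columns)
print(sample.dtypes)          # data type of each column
print(sample.head())          # first few rows
print(sample.describe())      # summary statistics for numeric columns
print(sample.isnull().sum())  # missing values per column
```

The same calls apply unchanged to the full `data` dataframe once it is loaded.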

In [0]:
#Step 1
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt 

In [0]:
#Step 2
data = pd.read_csv('')

In [0]:
#Step 3


## $\Omega$ 2: Get Dummy Variables

- <b>Step 1: Drop the `casual` and `registered` columns.</b>

In [0]:
# Step 1


- <b>Step 2: Convert the 'datetime' feature to Year, Month, and Hour</b>
  - A raw date string cannot be used as a numerical feature in a regression analysis, so we convert the 'datetime' column into three columns: Year, Month, and Hour.
  - example: 2011-01-01 02:00:00 
    - Year: 2011, 
    - Month: 1, 
    - Hour: 2

 - After adding those three columns, drop the 'datetime' column.
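
One possible way to do this conversion is with `pd.to_datetime` and the `.dt` accessor. The sketch below is a minimal example on a made-up two-row dataframe (`toy` and the lowercase column names `year`, `month`, `hour` are assumptions for illustration).

```python
import pandas as pd

# Toy dataframe with a datetime string column (illustration only)
toy = pd.DataFrame({
    "datetime": ["2011-01-01 02:00:00", "2012-07-15 17:00:00"],
    "count": [5, 300],
})

# Parse the strings once, then pull out the components we need
parsed = pd.to_datetime(toy["datetime"])
toy["year"] = parsed.dt.year
toy["month"] = parsed.dt.month
toy["hour"] = parsed.dt.hour

# The original string column is no longer needed
toy = toy.drop(columns=["datetime"])
print(toy)
```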

In [0]:
# Step 2


- <b>Step 3: Get a list of all categorical variables</b>

  - Use `data.columns` to check which columns `data` contains now.

  - Our data now has both numerical and categorical features, so we need to convert the categorical features to numerical ones.

  - Some features are already binary (0/1), such as `holiday`, so they do not need to be converted to dummy variables.

  - In step 3, identify the categorical features that need to be converted and save their column names to a list.
    - Example: `categorical_features = ['season', ...]`

In [0]:
# Step 3
data.columns


In [0]:
categorical_features = []

- <b>Step 4: Converting categorical features to numerical </b>
  - Convert these categorical_features to dummies
  - Hint:
    - `data = pd.get_dummies(data,columns=categorical_features,drop_first=True)`
    - Guess what 'drop_first=True' means here and why we need it.
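
To see the effect of `drop_first`, it can help to compare the two outputs on a small example. The sketch below uses a made-up dataframe (`toy`) with a single categorical column; it is an illustration, not the assignment solution.

```python
import pandas as pd

# Toy data: one categorical column with four levels, one numeric column
toy = pd.DataFrame({
    "season": [1, 2, 3, 4, 1],
    "temp": [9.8, 15.2, 28.1, 12.0, 8.5],
})

full = pd.get_dummies(toy, columns=["season"])
reduced = pd.get_dummies(toy, columns=["season"], drop_first=True)

print(full.columns.tolist())     # one indicator column per level
print(reduced.columns.tolist())  # first level dropped

# With all levels kept, the indicator columns always sum to 1 in every row,
# so they are perfectly collinear with the intercept of a linear model.
print(full[["season_1", "season_2", "season_3", "season_4"]].sum(axis=1).tolist())
```

With `drop_first=True`, a row where every remaining indicator is 0 represents the dropped level, so no information is lost.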

In [0]:
# Step 4


- <b>Step 5: Count how many columns are in the dataset</b>

In [0]:
# Step 5


## $\Omega$ 3: Prepare Training and Testing Data

- <b>Step 1</b>

  - According to the data description, our target value is data['count'].
  - The input features should include all other features **except**: data['count'], data['casual'], data['registered'] (the last two were already dropped in $\Omega$ 2, Step 1).
  - So set y = data['count'] and let X contain the rest of the dataframe, excluding those three columns.

In [0]:
# Step 1



- <b>Step 2</b>

  - Use the 'train_test_split' function in scikit-learn to split X and y into 80% training data and 20% testing data.
  
  - Hint: http://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html
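
A minimal sketch of an 80/20 split is shown below. It assumes synthetic data in place of the real dataframe (the `toy` frame and its values are made up), and fixes `random_state` only so the split is reproducible.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the prepared dataframe (illustration only)
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "temp": rng.normal(20, 5, 100),
    "humidity": rng.uniform(20, 90, 100),
})
toy["count"] = (10 * toy["temp"] + rng.normal(0, 5, 100)).round()

# Separate the target from the input features
X = toy.drop(columns=["count"])
y = toy["count"]

# test_size=0.2 keeps 80% of the rows for training
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(X_train.shape, X_test.shape)
```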

In [0]:
# Step 2



- <b>Step 3</b>

  - Perform feature standardization on `X_train` by using sklearn's `StandardScaler`, and use the same standardizer to standardize `X_test`.
  - Hint: http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html
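
The key point is that the scaler learns its mean and standard deviation from the training data only, and those same statistics are then applied to the test data. A small sketch on synthetic arrays (the names `X_train_toy` and `X_test_toy` are made up for illustration):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Synthetic feature matrices with different scales per column
rng = np.random.default_rng(0)
X_train_toy = rng.normal(loc=[20.0, 60.0], scale=[5.0, 15.0], size=(80, 2))
X_test_toy = rng.normal(loc=[20.0, 60.0], scale=[5.0, 15.0], size=(20, 2))

scaler = StandardScaler()
# fit_transform: learn mean/std from the training data, then scale it
X_train_scaled = scaler.fit_transform(X_train_toy)
# transform only: reuse the training mean/std on the test data
X_test_scaled = scaler.transform(X_test_toy)

print(X_train_scaled.mean(axis=0), X_train_scaled.std(axis=0))
print(X_test_scaled.mean(axis=0), X_test_scaled.std(axis=0))
```

The scaled test set will generally not have exactly zero mean and unit variance, which is expected because its statistics were never used for fitting.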

In [0]:
# Step 3




## $\Omega$ 4: Linear Regression

- <b>Step 1</b>

  - Import linear_model from scikit-learn, and mean_squared_error from sklearn.metrics to evaluate the result of the regression model.
  - Create a Linear Regression model - 'lr' and fit it on X_train and y_train.
  - Hint: http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html


In [0]:
#Step 1


- <b>Step 2</b>
  - Use lr.predict on X_test to get the predicted values of y and call mean_squared_error(y_test, y_predict) to get the mean squared error (MSE).
  - Call numpy's sqrt function to calculate the square root of the MSE, which gives the RMSE.
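
The fit, predict, and evaluate flow described in the two steps above can be sketched end to end on synthetic data. Everything named `*_toy`, `X_fit`, and `X_holdout` below is made up for illustration; the real assignment uses `X_train`, `X_test`, `y_train`, and `y_test`.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Synthetic linear data: y = 2*x0 - 1*x1 + 0.5*x2 + 4 + small noise
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(200, 3))
true_coef = np.array([2.0, -1.0, 0.5])
y_toy = X_toy @ true_coef + 4.0 + rng.normal(scale=0.1, size=200)

# Simple split: first 160 rows to fit, last 40 rows held out
X_fit, X_holdout = X_toy[:160], X_toy[160:]
y_fit, y_holdout = y_toy[:160], y_toy[160:]

lr = LinearRegression()
lr.fit(X_fit, y_fit)

# Evaluate on held-out rows: MSE first, then its square root
y_predict = lr.predict(X_holdout)
mse = mean_squared_error(y_holdout, y_predict)
rmse = np.sqrt(mse)
print(lr.coef_, lr.intercept_, rmse)
```

RMSE is in the same units as the target, which makes it easier to interpret than MSE.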

In [0]:
# Step 2


## $\Omega$ Problem 5 (Advanced): Implement gradient descent on linear regression

- <b>Step 1</b>

  - Under the hood, scikit-learn does not use gradient descent to fit LinearRegression; rather, it solves the least-squares problem directly with a linear algebra routine (`scipy.linalg.lstsq`), which computes numerically the analytical solution we have seen using calculus.

  - Therefore, it is a challenge for you to implement gradient descent on linear regression.

  - In this step, we initialize a weight/coefficient vector $\theta$ randomly. This vector should have the same dimensionality as the features in the training data, i.e., one coefficient per feature. 
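
One possible initialization is to draw small random values from a normal distribution. The sketch below uses a hypothetical helper name (`init_theta_sketch`) and an assumed scale of 0.01; neither comes from the assignment.

```python
import numpy as np

def init_theta_sketch(dim, seed=None):
    """Return a 1D vector of `dim` small random coefficients (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    # Small values near zero; the scale 0.01 is an arbitrary choice
    return rng.normal(loc=0.0, scale=0.01, size=dim)

theta0 = init_theta_sketch(dim=5, seed=0)
print(theta0.shape)
```

Note that with one coefficient per feature and no bias term, the model has no intercept. One common workaround, offered here as an assumption rather than part of the assignment, is to append a column of ones to X, which adds one more coefficient to $\theta$.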

In [0]:
# Step 1
def initialize_theta(dim):
    # TODO: randomly initialize coefficient to be a dim-sized 1D vector
    
    
    

- <b>Step 2</b>

  - Calculate the gradient of the loss $J(\theta)$, which is built on the linear regression function $h_\theta(x)$, with respect to each $\theta_i$. Hint: we have talked about this gradient in the lecture.
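
One common way to set this up is written out here, assuming the loss is the halved mean squared error over $m$ training examples $x^{(j)}$ with targets $y^{(j)}$ (an assumption, since the loss is not spelled out in this notebook):

$$
h_\theta(x) = \theta^\top x, \qquad
J(\theta) = \frac{1}{2m} \sum_{j=1}^{m} \left( h_\theta(x^{(j)}) - y^{(j)} \right)^2
$$

$$
\frac{\partial J}{\partial \theta_i} = \frac{1}{m} \sum_{j=1}^{m} \left( h_\theta(x^{(j)}) - y^{(j)} \right) x_i^{(j)},
\qquad
\nabla_\theta J = \frac{1}{m} X^\top (X\theta - y)
$$

If the loss is defined without the factor of $\frac{1}{2}$, the gradient picks up a factor of 2.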

In [0]:
# Step 2: implement this function
def prediction(X, current_theta):
    # TODO: compute the current estimation of the output, H, given the current_theta and X
    
def loss(X, y, current_theta):
    # TODO: compute loss function J, given H, and y
    
def loss_gradient(X, y, current_theta):
    # TODO: implement the loss gradient
    # compute the current estimation of the output, H, given the current_theta and X
    # compute gradient of the loss function against current_theta
    

- <b>Step 3</b>

  - Define a learning rate $\alpha$, which is the step size used to update each $\theta_i$: $\theta_i=\theta_i - \alpha \times loss\_gradient_i$. Repeat the update for all $\theta_i$'s until the loss converges.
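
To make the update rule concrete, here is a small self-contained sketch of gradient descent on synthetic data. It assumes the halved mean squared error as the loss, and all names (`X_toy`, `mse_half`, `mse_half_gradient`, `alpha`) are hypothetical; it is not the skeleton used in this assignment.

```python
import numpy as np

# Synthetic data generated from known coefficients [3, -2]
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(200, 2))
y_toy = X_toy @ np.array([3.0, -2.0]) + rng.normal(scale=0.1, size=200)

def mse_half(X, y, theta):
    # J(theta) = (1 / 2m) * sum of squared residuals
    residual = X @ theta - y
    return residual @ residual / (2 * len(y))

def mse_half_gradient(X, y, theta):
    # Gradient of J with respect to theta: (1 / m) * X^T (X theta - y)
    return X.T @ (X @ theta - y) / len(y)

theta = np.zeros(2)
alpha = 0.1  # learning rate (assumed value)
history = [mse_half(X_toy, y_toy, theta)]

for _ in range(10_000):
    # theta_i = theta_i - alpha * gradient_i, applied to all i at once
    theta = theta - alpha * mse_half_gradient(X_toy, y_toy, theta)
    history.append(mse_half(X_toy, y_toy, theta))
    # Stop when the loss has essentially stopped changing
    if abs(history[-2] - history[-1]) < 1e-10:
        break

print(theta, len(history) - 1)
```

If the loss grows instead of shrinking, the learning rate is likely too large for the scale of the features, which is one reason standardization helps here.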

In [0]:
# Step 3:

def update_theta(gradient, current_theta, step_size):
    # TODO: implement theta update logic
    

In [0]:
# We put the skeleton code here for you

# initialization:
# convert pandas dataframe into numpy arrays
X_train = np.array(X_train)
y_train = np.array(y_train)
X_test = np.array(X_test)
y_test = np.array(y_test)

precision = 0.001
step_size = 0.1 # use your own step_size
current_theta = initialize_theta(dim=) # you need to determine the input value to `dim`
current_loss = loss(X_train, y_train, current_theta)
losses = [current_loss]

In [0]:
# main gradient descent loop
while len(losses) < 2 or abs(losses[-1] - losses[-2]) > precision: # or use some other convergence condition
    gradient = loss_gradient(X_train, y_train, current_theta)
    current_theta = update_theta(gradient, current_theta, step_size)
    
    # compute current loss
    current_loss = loss(X_train, y_train, current_theta)
    losses.append(current_loss)
    
# once converged, current_theta holds the coefficients of the linear regression model
# losses also contains the initial loss, so the iteration count is len(losses) - 1
print(f'converged after {len(losses) - 1} iterations')

In [0]:
# plot loss against number of iterations



- <b>Step 4</b>

  - Calculate the RMSE of your trained model on test data

In [0]:
# Step 4
