# Advanced Certification Program in Computational Data Science
## A program by IISc and TalentSprint
### Mini-Project: Implementation of Linear Regression using OOP

## Learning Objectives

At the end of the mini-project, you will be able to:

- understand the power and flexibility of the Object-oriented programming (OOP) paradigm
- build OOP-based classes and methods and use them to implement linear regression for answering real-world data questions


## Problem Statement

Implement linear regression using classes and methods built with OOP.

## Information

#### Object-oriented programming in a nutshell

Object-oriented programming is built around the concept of "objects". Objects have two kinds of attributes (accessed via `.` syntax): data attributes (or instance variables) and function attributes (or methods). An object's data is typically modified by its methods.
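
As a minimal illustration of these terms, the sketch below defines a hypothetical `BankAccount` class (the class and all of its names are invented for this example and are not part of the project). `balance` is a data attribute, `deposit` is a method that modifies it, and both are accessed with `.` syntax.

```python
class BankAccount:
    """Hypothetical example class; not part of the mini-project."""

    def __init__(self, balance=0.0):
        # Data attribute (instance variable): stored on each object
        self.balance = balance

    def deposit(self, amount):
        # Method (function attribute): modifies the object's data
        self.balance += amount
        return self.balance


account = BankAccount(100.0)   # create an object (instance)
account.deposit(25.0)          # call a method via . syntax
print(account.balance)         # read a data attribute: 125.0
```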

To learn more about OOP in Python, click [here](https://docs.python.org/3/tutorial/classes.html)

#### Linear Regression

In statistics, linear regression is a linear approach to modelling the relationship between a scalar response (the dependent variable) and one or more explanatory variables (the independent variables). The case of one explanatory variable is called simple linear regression; with more than one, it is called multiple linear regression.
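
In symbols, using the coefficient names $b_0$ and $b_1$ that appear in the exercises of this notebook, the two cases can be written as follows. The error term $\varepsilon$ and the number of explanatory variables $p$ are notation assumed here for illustration.

$$
\begin{aligned}
\text{simple:} \quad & y = b_0 + b_1 x + \varepsilon \\
\text{multiple:} \quad & y = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_p x_p + \varepsilon
\end{aligned}
$$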

To learn more about linear regression, click [here](http://www.mit.edu/~6.s085/notes/lecture3.pdf)


## Grading = 10 Points

#### There are 10 exercises in total, each worth 1 point.

##### Importing Necessary Packages

In [None]:
import numpy as np # Numpy Package
import pandas as pd # Pandas Package

#### Exercise 1: Generate 50 points with an approximate relationship of y = 3x + 1, with normally distributed errors.

**Hint:** `np.linspace()`, `np.random.randn()`
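
The two hinted functions can be combined as in the sketch below. To leave the exercise itself open, it uses a different line (y = 2x - 5) and 20 points; the x-range, the noise scale of 0.5, and the seed are arbitrary choices made for illustration.

```python
import numpy as np

np.random.seed(0)                     # fixed seed so the run is repeatable

x = np.linspace(0, 10, 20)            # 20 evenly spaced x values in [0, 10]
noise = 0.5 * np.random.randn(20)     # normally distributed errors, scaled by 0.5
y = 2 * x - 5 + noise                 # points scattered around y = 2x - 5

print(x.shape, y.shape)
```

Changing the slope, intercept, and number of points adapts the same pattern to the relationship asked for in the exercise.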

In [None]:
# YOUR CODE HERE

#### Exercise 2: Define a class named **LinearRegression** and add a short description of linear regression using the special method \_\_repr\_\_

**Hint:** [How to use \_\_repr\_\_ method](https://www.educative.io/edpresso/what-is-the-repr-method-in-python)
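
As a small illustration of how `__repr__` behaves, the sketch below uses a hypothetical `Temperature` class (invented for this example). The string returned by `__repr__` is what `repr(obj)` produces, and `print(obj)` falls back to it when the class defines no `__str__`.

```python
class Temperature:
    """Hypothetical example class used only to illustrate __repr__."""

    def __init__(self, celsius):
        self.celsius = celsius

    def __repr__(self):
        # Returned by repr(obj); print(obj) also uses it
        # when the class does not define __str__
        return f"Temperature(celsius={self.celsius})"


t = Temperature(21.5)
print(t)        # Temperature(celsius=21.5)
```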

In [None]:
# YOUR CODE HERE

#### Exercise 3: In the LinearRegression class defined above, add a method which takes a list of values as input and returns the mean of those values.

**Note:** Don't use a built-in function to calculate the mean

**Hint:**

1. The mean is the sum of the values divided by the number of values
2. [How to define a method in a class](https://docs.python.org/3/tutorial/classes.html#scopes-and-namespaces-example)
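
One possible shape for such a method is sketched below on a hypothetical `Stats` class (the class name is an assumption; the exercise asks for the method to live in `LinearRegression`). The total is accumulated in a loop rather than with a ready-made mean function.

```python
class Stats:
    """Hypothetical helper class; the exercise uses LinearRegression instead."""

    def mean(self, values):
        # Add the values one by one, then divide by how many there are
        total = 0.0
        for v in values:
            total += v
        return total / len(values)


s = Stats()
print(s.mean([2, 4, 6, 8]))   # (2 + 4 + 6 + 8) / 4 = 5.0
```

Note that this sketch raises `ZeroDivisionError` for an empty list; whether to handle that case is a design choice.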

In [None]:
# YOUR CODE HERE

#### Exercise 4: In the LinearRegression class defined above, add a method which takes a list of values as input and returns the variance of those values.

**Note:** Don't use a built-in function to calculate the variance

**Hint:** 

1. The variance is the average of the squared differences of each data point from the mean
2. [How to call one method from another method inside a class](https://docs.python.org/3/tutorial/classes.html#scopes-and-namespaces-example)
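
Both hints can be seen together in the sketch below, again on a hypothetical `Stats` class (an assumed name). The `variance` method reuses `mean` by calling it through `self`, and it divides by the number of values n, following the "average" wording of the hint, rather than by n - 1.

```python
class Stats:
    """Hypothetical helper class; the exercise uses LinearRegression instead."""

    def mean(self, values):
        total = 0.0
        for v in values:
            total += v
        return total / len(values)

    def variance(self, values):
        # Call another method of the same object through self
        m = self.mean(values)
        squared_diffs = [(v - m) ** 2 for v in values]
        # Average of the squared differences (divides by n, not n - 1)
        return self.mean(squared_diffs)


s = Stats()
print(s.variance([2, 4, 4, 4, 5, 5, 7, 9]))   # mean is 5.0, variance is 4.0
```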


In [None]:
# YOUR CODE HERE

#### Exercise 5: In the LinearRegression class defined above, add a method which takes two lists of values as input and returns the covariance of those values.

**Note:** Don't use a built-in function to calculate the covariance

**Hint:** [How to calculate the covariance of two variables](https://www.statisticshowto.com/probability-and-statistics/statistics-definitions/covariance/)
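
A minimal sketch of the calculation is shown below as a hypothetical standalone function (in the exercise it would be a method of the class). It divides by n so that it matches the "average" definition of variance given in Exercise 4; some references divide by n - 1 instead (the sample covariance), and that choice is an assumption of this sketch.

```python
def covariance(x, y):
    """Average product of the deviations of x and y from their means."""
    n = len(x)

    # Means computed with explicit loops
    sum_x = 0.0
    sum_y = 0.0
    for xi, yi in zip(x, y):
        sum_x += xi
        sum_y += yi
    mean_x = sum_x / n
    mean_y = sum_y / n

    # Sum of products of deviations, then divide by n
    total = 0.0
    for xi, yi in zip(x, y):
        total += (xi - mean_x) * (yi - mean_y)
    return total / n


print(covariance([1, 2, 3, 4], [2, 4, 6, 8]))   # 2.5
```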

In [None]:
# YOUR CODE HERE

#### Exercise 6: In the LinearRegression class defined above, add a method named 'fit' which takes two lists of values as input (x, y) and returns the estimated coefficients.

**Hint:**

- Equation of the line: $y = b_{0} + b_{1} x$
- The estimated coefficients, i.e. the values of $b_{0}$ and $b_{1}$, are calculated as follows:
    - $b_{1} = \mathrm{covariance}(x, y) / \mathrm{variance}(x)$ and
    - $b_{0} = \mathrm{mean}(y) - b_{1} \cdot \mathrm{mean}(x)$
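
Because the covariance and the variance are averages over the same n points, the factor 1/n cancels in their ratio, so $b_{1}$ can be computed from two plain sums. The sketch below shows this as a hypothetical standalone function `fit_line` (the name is an assumption, and it uses `sum()` for brevity; in the exercise the class's own mean, variance, and covariance methods would be used).

```python
def fit_line(x, y):
    """Return (b0, b1) for the least-squares line y = b0 + b1 * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n

    # n * covariance(x, y) and n * variance(x); the n cancels in the ratio
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)

    b1 = s_xy / s_xx
    b0 = mean_y - b1 * mean_x
    return b0, b1


# Points that lie exactly on y = 1 + 2x
b0, b1 = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(b0, b1)   # 1.0 2.0
```

If every x value is identical, `s_xx` is zero and the slope is undefined; the sketch does not guard against that case.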

In [None]:
# YOUR CODE HERE

#### Exercise 7: In the LinearRegression class defined above, add a method named 'predict' which takes two lists of values as input (x, y) and returns the predicted values.

**Hint:** Substitute the estimated coefficient values calculated above into the equation of the line, i.e. $y = b_{0} + b_{1} x$
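
The substitution step itself is sketched below as a hypothetical standalone function that receives the coefficients explicitly (this signature is an assumption). One reading of the exercise's (x, y) signature is that the method first estimates the coefficients from (x, y) and then applies them to x as shown here.

```python
def predict(x, b0, b1):
    # Apply y = b0 + b1 * x to every input value
    return [b0 + b1 * xi for xi in x]


print(predict([0, 1, 2], b0=1.0, b1=2.0))   # [1.0, 3.0, 5.0]
```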

In [None]:
# YOUR CODE HERE

#### Data

The dataset chosen for this exercise is the **Pizza Franchise** dataset. The dataset contains the following data:

X = annual franchise fee ($1000)

Y = start-up cost ($1000) for a pizza franchise

Download the dataset [here](https://cdn.iisc.talentsprint.com/CDS/Datasets/pizza.csv)
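
Once downloaded, a CSV file like this can be read with pandas and its columns converted to plain lists. The sketch below runs on placeholder CSV text, because the real file's header names are not shown here; the header `X,Y` and the numbers are invented for illustration and are not the actual data. Columns are selected by position so the code does not depend on the header names.

```python
import io
import pandas as pd

# Placeholder CSV text standing in for the downloaded file;
# the header names and numbers are invented, not the real data.
csv_text = "X,Y\n1.0,10.0\n2.0,20.0\n3.0,30.0\n"

# With the real file this would be, for example: pd.read_csv("pizza.csv")
df = pd.read_csv(io.StringIO(csv_text))

x = df.iloc[:, 0].tolist()   # first column as a plain Python list
y = df.iloc[:, 1].tolist()   # second column as a plain Python list
print(df.shape, x, y)
```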

#### Exercise 8: Using the LinearRegression class defined above, fit the model to the Pizza Franchise dataset to obtain the estimated coefficients, and predict the values.

In [None]:
# YOUR CODE HERE

#### Exercise 9: In the LinearRegression class defined above, add a method named 'RMSE' which takes two lists of values as input (x, y) and returns the root-mean-square error (RMSE).

**Hint:**

- [How to calculate RMSE value](https://towardsdatascience.com/what-does-rmse-really-mean-806b65f2e48e)
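
In short, RMSE is the square root of the average squared difference between actual and predicted values. The sketch below shows the calculation as a hypothetical standalone function taking the actual and predicted values (this argument choice is an assumption; in the exercise it is a method of the class).

```python
import math


def rmse(actual, predicted):
    """Square root of the mean of the squared prediction errors."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Every prediction is off by exactly 1, so the RMSE is 1.0
print(rmse([1, 2, 3, 4], [2, 3, 4, 5]))
```

Because the errors are squared before averaging, a few large errors raise the RMSE more than many small ones.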

In [None]:
# YOUR CODE HERE

#### Data

The dataset chosen for this exercise is the **List Price Vs. Best Price for a New GMC Pickup** dataset. The dataset contains the following data:

X = List price (in $1000) for a GMC pickup truck

Y = Best price (in $1000) for a GMC pickup truck

Download the dataset [here](https://cdn.iisc.talentsprint.com/CDS/Datasets/gmc.csv)

#### Exercise 10: Using the LinearRegression class defined above,

- fit the model to the List Price Vs. Best Price for a New GMC Pickup dataset to obtain the estimated coefficients, and predict the values.
- calculate the RMSE between the predicted and actual values of the List Price Vs. Best Price for a New GMC Pickup dataset using the method defined above.

In [None]:
# YOUR CODE HERE

### Optional

* Use the `LinearRegression` class from `sklearn.linear_model` to determine the coefficients for the above problems.
* Compare the coefficients obtained from the OOP-based implementation with those from the sklearn `LinearRegression` class.
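
A minimal sketch of such a comparison is given below on a small synthetic dataset (points lying exactly on y = 2x + 1, chosen for illustration; it is not one of the course datasets). The closed-form estimates are computed with NumPy here as a stand-in for the OOP-based implementation. The sklearn class is imported under an alias, an assumed naming choice, so it does not shadow the notebook's own `LinearRegression` class.

```python
import numpy as np
from sklearn.linear_model import LinearRegression as SklearnLinearRegression

# Synthetic data lying exactly on y = 2x + 1
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 2.0 * x + 1.0

# Closed-form estimates, standing in for the OOP-based implementation
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

# sklearn expects a 2-D feature array of shape (n_samples, n_features)
model = SklearnLinearRegression()
model.fit(x.reshape(-1, 1), y)

print("intercept:", b0, model.intercept_)
print("slope:    ", b1, model.coef_[0])
print(np.isclose(b0, model.intercept_), np.isclose(b1, model.coef_[0]))
```

Comparing with `np.isclose` rather than `==` allows for small floating-point differences between the two computations.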