<a href="https://colab.research.google.com/github/SSDivyaRavali/CDS/blob/main/Module2/Linear_Reg_using_OOPs.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Advanced Certification Program in Computational Data Science
## A program by IISc and TalentSprint
### Mini-Project: Implementation of Linear Regression using OOPs

**DISCLAIMER:** THIS NOTEBOOK IS PROVIDED ONLY AS A REFERENCE SOLUTION NOTEBOOK FOR THE MINI-PROJECT. THERE MAY BE OTHER POSSIBLE APPROACHES/METHODS TO ACHIEVE THE SAME RESULTS.

## Learning Objectives

At the end of the mini-project, you will be able to:

- understand the power and flexibility of the object-oriented programming (OOP) paradigm
- build OOP-based classes and methods and use them to implement linear regression for answering real-world, data-related questions


## Problem Statement

Implement linear regression using classes and methods built with OOP.

## Information

#### Object-oriented programming in a nutshell

Object-oriented programming is based around the concept of "objects". Objects have two kinds of attributes (accessed via `.` syntax): data attributes (or instance variables) and function attributes (or methods). Object data is typically modified by object methods.
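
As a minimal illustration, the sketch below shows a data attribute being modified by a method, both accessed with dot syntax. The `Counter` class and its names are hypothetical and are not part of the mini-project.

```python
class Counter:
    """Hypothetical example class, used only to illustrate attributes and methods."""

    def __init__(self):
        self.count = 0  # data attribute (instance variable)

    def increment(self, step=1):
        # method (function attribute) that modifies the object's data
        self.count += step
        return self.count


counter = Counter()
counter.increment()
counter.increment(step=2)
print(counter.count)  # 3
```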

To learn more about OOP in Python, click [here](https://docs.python.org/3/tutorial/classes.html)

#### Linear Regression

In statistics, linear regression is a linear approach to modelling the relationship between a scalar response (the dependent variable) and one or more explanatory variables (the independent variables). The case of one explanatory variable is called simple linear regression; for more than one, the process is called multiple linear regression.
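
For the simple (one-variable) case, the model and the least-squares criterion commonly used to estimate it can be written as follows. The symbols $b_0$, $b_1$, and $\varepsilon_i$ are notation assumed here to match the exercises in this notebook; they do not come from the definition above.

```latex
y_i = b_0 + b_1 x_i + \varepsilon_i, \qquad
(\hat{b}_0, \hat{b}_1) = \arg\min_{b_0,\, b_1} \sum_{i=1}^{n} \left( y_i - b_0 - b_1 x_i \right)^2
```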

To learn more about linear regression, click [here](http://www.mit.edu/~6.s085/notes/lecture3.pdf)


## Grading = 10 Points

There are 10 exercises in total, each worth 1 point.

##### Importing Necessary Packages

In [None]:
import numpy as np # Numpy Package
import pandas as pd # Pandas Package

#### Exercise 1: Generate 50 points with an approximate relationship of y = 3x + 1, with normally distributed errors.

**Hint:** np.linspace(), np.random.randn()

In [None]:
np.random.seed(0)
numberofPoints = 50   # Number of data points
x = np.linspace(0, 10, numberofPoints)
y = x * 3 + 1 + 1 * np.random.randn(numberofPoints) # Standard deviation 1
print(x) # Printing the x values
print(y) # Printing the y values

#### Exercise 2: Define a class named **LinearRegression** and add a short description of linear regression using the built-in method \_\_repr\_\_

**Hint:** [How to use \_\_repr\_\_ method](https://www.educative.io/edpresso/what-is-the-repr-method-in-python)

In [None]:
class LinearRegression:   
    def __repr__(self):
        return "We are working on Linear Regression"

In [None]:
type(LinearRegression) # Checking for the type

In [None]:
help(LinearRegression) # Checking for the description of the class

In [None]:
LinearRegression() # Creating an instance of a class

#### Exercise 3: In the Linear Regression class defined above, add a method which takes a list of values as input and returns the mean of those values.

**Note:** Don't use a built-in method to calculate the mean

**Hint:** 
1. The mean is the sum of the numbers divided by how many numbers there are
2. [How to define a method in a class](https://docs.python.org/3/tutorial/classes.html#scopes-and-namespaces-example)

In [None]:
class LinearRegression:   
    def __repr__(self):
        return "We are working on Linear Regression"
    # Function to calculate the mean
    def getMean(self, values):
        # True division (/): floor division (//) would drop the fractional part
        return sum(values) / len(values)

In [None]:
Lr = LinearRegression() # Instance of a LinearRegression class

In [None]:
x_mean = Lr.getMean(x) # Accessing getMean method from the class

In [None]:
x_mean

#### Exercise 4: In the Linear Regression class defined above, add a method which takes a list of values as input and returns the variance of those values.

**Note:** Don't use a built-in method to calculate the variance

**Hint:** 

1. The variance is the average of the squared differences from the mean
2. [How to call one method from another method inside a class](https://docs.python.org/3/tutorial/classes.html#scopes-and-namespaces-example)


In [None]:
class LinearRegression:   
    def __repr__(self):
        return "We are working on Linear Regression"
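    # Function to calculate the mean
    def getMean(self, values):
        # True division (/): floor division (//) would drop the fractional part
        return sum(values) / len(values)
    # Function to calculate the (population) variance
    def getVariance(self, values):
        mean = self.getMean(values)
        # Average of the squared differences from the mean
        return sum((v - mean) ** 2 for v in values) / len(values)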

In [None]:
Lr = LinearRegression() # Instance of a class LinearRegression

In [None]:
Lr.getMean(x), Lr.getMean(y) # Accessing the getMean method

In [None]:
Lr.getVariance(x), Lr.getVariance(y) # Accessing the getVariance method
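
One way to sanity-check hand-written statistics is to compare them with NumPy on a small input. The sketch below uses standalone helper functions (`mean_of` and `variance_of` are hypothetical names that mirror the hints above, not the class methods themselves) and assumes the population definition of variance, which corresponds to `np.var` with its default `ddof=0`.

```python
import numpy as np


def mean_of(values):
    # Sum of the values divided by how many there are
    return sum(values) / len(values)


def variance_of(values):
    # Average of the squared differences from the mean (population variance)
    m = mean_of(values)
    return sum((v - m) ** 2 for v in values) / len(values)


sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # made-up values
print(mean_of(sample), variance_of(sample))  # 5.0 4.0
print(np.isclose(mean_of(sample), np.mean(sample)))     # True
print(np.isclose(variance_of(sample), np.var(sample)))  # True
```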

#### Exercise 5: In the Linear Regression class defined above, add a method which takes two lists of values as input and returns the covariance of those values.

**Note:** Don't use a built-in method to calculate the covariance

**Hint:** [How to calculate the covariance of two values](https://www.statisticshowto.com/probability-and-statistics/statistics-definitions/covariance/)

In [None]:
class LinearRegression:   
    def __repr__(self):
        return "We are working on Linear Regression"
    # Function to calculate the mean
    def getMean(self, values):
        # True division (/): floor division (//) would drop the fractional part
        return sum(values) / len(values)
    # Function to calculate the (population) variance
    def getVariance(self, values):
        mean = self.getMean(values)
        # Average of the squared differences from the mean
        return sum((v - mean) ** 2 for v in values) / len(values)
    # Function to calculate the (population) covariance
    def getcovariance(self, x, y):
        xmean = self.getMean(x) # Mean of x values
        ymean = self.getMean(y) # Mean of y values
        covar = 0.0
        for xi, yi in zip(x, y):
            covar += (xi - xmean) * (yi - ymean)
        return covar / len(x) # Average of the products of deviations

In [None]:
Lr = LinearRegression() # Creating an instance of the LinearRegression class

In [None]:
Lr.getcovariance(x,y) # Accessing the getcovariance method 
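
Covariance can be normalized by `n` (population) or by `n - 1` (sample), and different references use different conventions. The sketch below is a standalone helper (`covariance_of` and its `ddof` parameter are hypothetical names, borrowed from NumPy's terminology) that supports both and compares them with `np.cov` on made-up data.

```python
import numpy as np


def covariance_of(x, y, ddof=0):
    # Sum of the products of deviations, divided by n (ddof=0) or n - 1 (ddof=1)
    x_mean = sum(x) / len(x)
    y_mean = sum(y) / len(y)
    total = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    return total / (len(x) - ddof)


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 5.0, 9.0]

print(covariance_of(xs, ys))            # 2.75 (divides by n)
print(np.cov(xs, ys, bias=True)[0, 1])  # NumPy, population normalization
print(np.cov(xs, ys)[0, 1])             # NumPy default divides by n - 1
```

Whichever convention is chosen, a ratio such as covariance over variance is unaffected as long as both quantities use the same normalization.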

#### Exercise 6: In the Linear Regression class defined above, add a method named 'fit' which takes two lists of values (x, y) as input and returns the estimated coefficients.

**Hint:**

- Equation of the line: $y = b_{0} + b_{1} x$
- The estimated coefficients, i.e. the values of $b_{0}$ and $b_{1}$, are calculated as follows:
    - $b_{1} = \mathrm{covariance}(x, y) / \mathrm{variance}(x)$ and
    - $b_{0} = \mathrm{mean}(y) - b_{1} \cdot \mathrm{mean}(x)$
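
To see the two formulas at work, the sketch below applies them by hand to four points that lie exactly on the line y = 2x + 1. The data are made up so that the result is easy to verify, and all variable names are illustrative.

```python
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]  # exactly y = 2x + 1
n = len(xs)

mean_x = sum(xs) / n  # 2.5
mean_y = sum(ys) / n  # 6.0

# Population covariance and variance (both divided by n)
cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(xs, ys)) / n  # 2.5
var_x = sum((xi - mean_x) ** 2 for xi in xs) / n                           # 1.25

b1 = cov_xy / var_x        # slope
b0 = mean_y - b1 * mean_x  # intercept

print(b0, b1)  # 1.0 2.0
```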




In [None]:
class LinearRegression:   
    def __repr__(self):
        return "We are working on Linear Regression"
    # Function to calculate the mean
    def getMean(self, values):
        # True division (/): floor division (//) would drop the fractional part
        return sum(values) / len(values)
    # Function to calculate the (population) variance
    def getVariance(self, values):
        mean = self.getMean(values)
        # Average of the squared differences from the mean
        return sum((v - mean) ** 2 for v in values) / len(values)
    # Function to calculate the (population) covariance
    def getcovariance(self, x, y):
        xmean = self.getMean(x)
        ymean = self.getMean(y)
        covar = 0.0
        for xi, yi in zip(x, y):
            covar += (xi - xmean) * (yi - ymean)
        return covar / len(x) # Average of the products of deviations
    # Function to calculate the estimated coefficients
    def fit(self, x, y):
        # Calculating mean of x and y
        mean_x, mean_y = self.getMean(x), self.getMean(y)
        # b1 is the covariance of x and y divided by the variance of x
        # (both use the same 1/n normalization, which cancels in the ratio)
        b1 = self.getcovariance(x, y) / self.getVariance(x)
        # Finding the b0 value
        b0 = mean_y - b1 * mean_x
        # Returning the estimated coefficients
        return [b0, b1]

In [None]:
Lr = LinearRegression() # Instance of a Linear Regression class

In [None]:
Lr.fit(x,y) # Accessing the fit method defined in the LinearRegression class
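
As an independent cross-check of the closed-form estimates, the sketch below regenerates the Exercise 1 data and compares the coefficients with `np.polyfit`. It deliberately uses NumPy built-ins, so it is a verification aid rather than a solution to the exercise; the variable names are illustrative.

```python
import numpy as np

np.random.seed(0)
n_points = 50
x = np.linspace(0, 10, n_points)
y = 3 * x + 1 + np.random.randn(n_points)

# Closed-form estimates: b1 = cov(x, y) / var(x), b0 = mean(y) - b1 * mean(x)
x_mean, y_mean = x.mean(), y.mean()
b1 = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
b0 = y_mean - b1 * x_mean

# np.polyfit returns coefficients from the highest degree down: [slope, intercept]
slope, intercept = np.polyfit(x, y, 1)

print(b0, b1)
print(np.allclose([b0, b1], [intercept, slope]))  # True
```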

#### Exercise 7: In the Linear Regression class defined above, add a method named predict which takes two lists of values (x, y) as input and returns the predicted values.

**Hint:** Substitute the estimated coefficient values calculated above into the equation of the line, i.e. $y = b_{0} + b_{1} x$

In [None]:
class LinearRegression:   
    def __repr__(self):
        return "We are working on Linear Regression"
    # Function to calculate the mean
    def getMean(self, values):
        # True division (/): floor division (//) would drop the fractional part
        return sum(values) / len(values)
    # Function to calculate the (population) variance
    def getVariance(self, values):
        mean = self.getMean(values)
        # Average of the squared differences from the mean
        return sum((v - mean) ** 2 for v in values) / len(values)
    # Function to calculate the (population) covariance
    def getcovariance(self, x, y):
        xmean = self.getMean(x)
        ymean = self.getMean(y)
        covar = 0.0
        for xi, yi in zip(x, y):
            covar += (xi - xmean) * (yi - ymean)
        return covar / len(x) # Average of the products of deviations
    # Function to calculate the estimated coefficients
    def fit(self, x, y):
        # Calculating mean of x and y
        mean_x, mean_y = self.getMean(x), self.getMean(y)
        # b1 is the covariance of x and y divided by the variance of x
        # (both use the same 1/n normalization, which cancels in the ratio)
        b1 = self.getcovariance(x, y) / self.getVariance(x)
        # Finding the b0 value
        b0 = mean_y - b1 * mean_x
        # Returning the estimated coefficients
        return [b0, b1]
    # Function to predict y for each value in x
    def predict(self, x, y):
        predictions = [] # Defining an empty list to store prediction values
        b0, b1 = self.fit(x, y) # Calculating the estimated coefficients
        for row in x: # Iterating over x values
            ynew = b0 + b1 * row # Calculating y values
            predictions.append(ynew)
        return predictions

In [None]:
Lr = LinearRegression() # Instance of the class

In [None]:
Lr.predict(x,y) # Predicting the values

#### Data

The dataset chosen for this experiment is the **Pizza Franchise** dataset. The dataset contains the following data:

X = annual franchise fee ($1000)

Y = start up cost ($1000) for a pizza franchise

Download the dataset [here](https://cdn.iisc.talentsprint.com/CDS/Datasets/pizza.csv)

#### Exercise 8: Using the LinearRegression class defined above, fit the model to calculate the estimated coefficients, and predict the values on the Pizza Franchise dataset.

In [None]:
!wget -qq https://cdn.iisc.talentsprint.com/CDS/Datasets/pizza.csv

In [None]:
import pandas as pd # Importing pandas package
df = pd.read_csv("pizza.csv") # Loading pizza dataset

In [None]:
df.head() # Checking for the first five rows from the dataset

In [None]:
Lr = LinearRegression()  # Instance of a class LinearRegression

In [None]:
Lr.fit(df['X'], df['Y']) # Fitting the values

In [None]:
Predicted_values = Lr.predict(df['X'], df['Y']) # Predicting the values

In [None]:
Predicted_values

#### Exercise 9: In the Linear Regression class defined above, add a method named rmse which takes the actual and predicted values as input and returns the root mean squared error (RMSE).

**Hint:**

- [How to calculate RMSE value](https://towardsdatascience.com/what-does-rmse-really-mean-806b65f2e48e)
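
For reference, the usual definition of RMSE for $n$ observations is given below, where $y_i$ denotes an actual value and $\hat{y}_i$ the corresponding prediction. This notation is assumed here and is not taken from the linked article.

```latex
\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( \hat{y}_i - y_i \right)^2}
```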

In [None]:
from math import sqrt
class LinearRegression:
    def __repr__(self):
        return "We are working on Linear Regression"
    # Function to calculate the mean
    def getMean(self, values):
        # True division (/): floor division (//) would drop the fractional part
        return sum(values) / len(values)
    # Function to calculate the (population) variance
    def getVariance(self, values):
        mean = self.getMean(values)
        # Average of the squared differences from the mean
        return sum((v - mean) ** 2 for v in values) / len(values)
    # Function to calculate the (population) covariance
    def getcovariance(self, x, y):
        xmean = self.getMean(x)
        ymean = self.getMean(y)
        covar = 0.0
        for xi, yi in zip(x, y):
            covar += (xi - xmean) * (yi - ymean)
        return covar / len(x) # Average of the products of deviations
    # Function to calculate the estimated coefficients
    def fit(self, x, y):
        # Calculating mean of x and y
        mean_x, mean_y = self.getMean(x), self.getMean(y)
        # b1 is the covariance of x and y divided by the variance of x
        # (both use the same 1/n normalization, which cancels in the ratio)
        b1 = self.getcovariance(x, y) / self.getVariance(x)
        # Finding the b0 value
        b0 = mean_y - b1 * mean_x
        # Returning the estimated coefficients
        return [b0, b1]
    # Function to predict y for each value in x
    def predict(self, x, y):
        predictions = [] # Defining an empty list to store prediction values
        b0, b1 = self.fit(x, y) # Calculating the estimated coefficients
        for row in x: # Iterating over x values
            ynew = b0 + b1 * row # Calculating y values
            predictions.append(ynew)
        return predictions
    # Function to calculate the RMSE value
    def rmse(self, actual, predicted):
        sum_err = 0.0
        for act, pred in zip(actual, predicted):
            sum_err += (pred - act) ** 2 # Squared prediction error
        mean_err = sum_err / len(actual) # Mean squared error
        return sqrt(mean_err)

#### Data

The dataset chosen for this exercise is the **List Price Vs. Best Price for a New GMC Pickup** dataset. The dataset contains the following data:

X = List price (in $1000) for a GMC pickup truck

Y = Best price (in $1000) for a GMC pickup truck

Download the dataset [here](https://cdn.iisc.talentsprint.com/CDS/Datasets/gmc.csv)

#### Exercise 10: Using the LinearRegression class defined above,

- fit the model to calculate the estimated coefficients, and predict the values on the List Price Vs. Best Price for a New GMC Pickup dataset.
- calculate the RMSE between the predicted and actual values of the List Price Vs. Best Price for a New GMC Pickup dataset using the method defined above.

In [None]:
!wget -qq https://cdn.iisc.talentsprint.com/CDS/Datasets/gmc.csv

In [None]:
data = pd.read_csv("gmc.csv") # Loading the dataset

In [None]:
data.head() # Checking for the top five rows

In [None]:
Lr = LinearRegression() # Instance of a class

In [None]:
Lr.fit(data['X'], data['Y']) # Fitting the data

In [None]:
Predicted_values = Lr.predict(data['X'], data['Y']) # Predicting the values

In [None]:
Lr.rmse(data['Y'], Predicted_values) # Calculating rmse value

### Optional

* Use the `LinearRegression` class from `sklearn.linear_model` to determine the coefficients for the above problems.
* Compare the coefficients obtained using the OOP-based implementation with the coefficients from `sklearn.linear_model.LinearRegression`.
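
A minimal sketch of such a comparison is shown below. It assumes scikit-learn is installed, uses the synthetic Exercise 1 data instead of the downloaded CSV files so that it runs on its own, and imports the scikit-learn class under the alias `SklearnLinearRegression` (a name chosen here) so that it does not shadow the notebook's own `LinearRegression` class.

```python
import numpy as np
from sklearn.linear_model import LinearRegression as SklearnLinearRegression

np.random.seed(0)
x = np.linspace(0, 10, 50)
y = 3 * x + 1 + np.random.randn(50)

# Closed-form estimates, as in Exercise 6
x_mean, y_mean = x.mean(), y.mean()
b1 = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
b0 = y_mean - b1 * x_mean

# scikit-learn expects a 2-D feature array of shape (n_samples, n_features)
model = SklearnLinearRegression()
model.fit(x.reshape(-1, 1), y)

print("closed form :", b0, b1)
print("scikit-learn:", model.intercept_, model.coef_[0])
```

The two pairs of coefficients should agree up to floating-point rounding, since both solve the same least-squares problem.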