# Simple linear regression from scratch

In this practical, we build a Simple Linear Regression model from scratch. Linear regression is one of the most fundamental algorithms in machine learning and statistics, and implementing it step by step is a good way to connect the theory to practice and to understand how the model actually produces its predictions. Whether you are a student, a budding data scientist, or simply curious about machine learning, this practical should give you a solid foundation for further exploration in the field.



## About the data

In this practical, we will work with a dataset that relates the marketing budget allocated to TV advertisements to the corresponding sales outcomes. The dataset contains two primary columns:


**TV**: This column represents the marketing budget allocated to TV advertisements for a particular product or service. It reflects the investment made by a company in television-based marketing campaigns over a specific period.


**Sales**: This column records the number of sales achieved following the corresponding TV advertisement budget. It is a direct indicator of the effectiveness of TV marketing expenditures in driving sales for the product or service.

## What is Simple Linear Regression?

Simple Linear Regression is a statistical method for modeling and predicting the relationship between two variables. It covers the case of one independent variable (X) and one dependent variable (Y), hence the term 'simple'. The core idea is to find the straight line that best fits the data points, that is, the line that minimizes the squared differences between the observed values and the values predicted by the model. This line can then be used to predict outcomes for new values of the independent variable.

### Formulas in Simple Linear Regression

In Simple Linear Regression, the relationship between the independent variable (X) and the dependent variable (Y) is represented by the following equation:

$$
Y = \beta_0 + \beta_1 X
$$

where:

- Y is the dependent variable,
- X is the independent variable,
- β0 is the intercept of the regression line on the Y-axis,
- β1 is the slope of the regression line, representing the change in Y for a unit change in X.


The aim of Simple Linear Regression is to find the values of β0 and β1 that minimize the sum of the squared differences between the observed values and the values predicted by our model. This method is known as Ordinary Least Squares (OLS). The formulas for calculating β0 and β1 are:

$$
\beta_1 = \frac{\sum_i (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_i (X_i - \bar{X})^2}
$$

$$
\beta_0 = \bar{Y} - \beta_1 \bar{X}
$$

where X̄ and Ȳ are the mean values of the independent and dependent variables, respectively.


By applying these formulas, we can estimate the coefficients of the regression line that best fits our data. This enables us to make predictions about the dependent variable using new inputs for the independent variable.
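As a rough illustration of the two formulas above, the sketch below applies them to five made-up points. The values and variable names are assumptions chosen so the arithmetic is easy to follow by hand; they are not taken from `data.csv`.

```python
import numpy as np

# Toy values chosen for illustration only; they are NOT from data.csv.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

x_mean = x.mean()
y_mean = y.mean()

# Slope: sum of cross-deviations divided by the sum of squared deviations of x.
beta1 = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)

# Intercept: forces the line through the point (x_mean, y_mean).
beta0 = y_mean - beta1 * x_mean

print(beta0, beta1)
```

Here the means are 3 and 4, the numerator is 6 and the denominator is 10, so the slope is 0.6 and the intercept is approximately 2.2 (up to floating-point rounding).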

## Let's start

Load the dataset from the file `data.csv`

In [None]:
# FIXME

Display the first 3 rows

In [None]:
# FIXME

Display the last 3 rows

In [None]:
# FIXME

Display some information about the dataset using the `info` and `describe` methods.

In [None]:
# FIXME
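One possible way to approach the loading and inspection steps with pandas is sketched below. To keep the sketch self-contained it reads a few made-up rows from an in-memory string instead of `data.csv`; the comma-separated layout with a `TV,Sales` header row is an assumption about the file, and the numbers are invented.

```python
import io

import pandas as pd

# Stand-in for data.csv: a few made-up rows so the sketch runs on its own.
# In the notebook you would call pd.read_csv("data.csv") instead.
csv_text = """TV,Sales
10.0,5.0
20.0,6.5
30.0,8.0
40.0,9.0
"""
df = pd.read_csv(io.StringIO(csv_text))

print(df.head(3))     # first 3 rows
print(df.tail(3))     # last 3 rows
df.info()             # column names, dtypes and non-null counts (prints directly)
print(df.describe())  # count, mean, std, min, quartiles and max per column
```

Note that `info` prints its summary itself and returns `None`, while `describe` returns a new DataFrame that you can display or index.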

Now plot the dataset. You can use plotly if you want, or directly the `plot` function from pandas.

We want to put the `TV` column on the X axis and `Sales` column on the Y axis.

Use a scatter plot

In [None]:
# FIXME
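A minimal sketch of the pandas route, assuming matplotlib is installed (pandas uses it for plotting). The DataFrame below holds made-up values in place of the real dataset, and the `Agg` backend line is only there so the sketch also runs outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook

import pandas as pd

# Made-up values standing in for the real dataset.
df = pd.DataFrame({
    "TV": [10.0, 20.0, 30.0, 40.0, 50.0],
    "Sales": [5.1, 6.4, 8.2, 8.9, 10.5],
})

# TV on the X axis, Sales on the Y axis, one marker per row.
ax = df.plot.scatter(x="TV", y="Sales", title="Sales vs TV budget")
```

`df.plot.scatter` returns a matplotlib `Axes` object, which can be reused later to draw more elements on the same figure.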

What can you conclude from the plot?

Is there a linear relationship between the two variables?

In [None]:
# FIXME
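Besides reading the scatter plot, one way to put a number on how linear the relationship looks is the Pearson correlation coefficient. The sketch below uses made-up values and hypothetical array names; in the notebook you would pass the `TV` and `Sales` columns instead.

```python
import numpy as np

# Made-up values for illustration; use the TV and Sales columns in the notebook.
tv = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
sales = np.array([5.1, 6.4, 8.2, 8.9, 10.5])

# np.corrcoef returns the 2x2 correlation matrix; the off-diagonal entry is r.
r = np.corrcoef(tv, sales)[0, 1]
print(round(r, 3))
```

A value of r close to +1 or -1 points to a strong linear association, while a value near 0 suggests little linear association. It complements the plot rather than replacing it, since a curved pattern can still produce a fairly high r.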

### Now let's build the model from scratch

Now that we have explored the dataset and looked at the relationship between the TV marketing budget and sales outcomes, we are ready to build our Simple Linear Regression model, which will predict sales from the TV advertisement budget. Building the model from scratch, rather than relying on pre-built library functions, gives deeper insight into the underlying mathematical principles.

In [None]:
import numpy as np

class MySimpleLinearRegression:
    def __init__(self):
        '''
        Initialize the parameters of the simple Linear regression: beta0 and beta1
        '''
        # FIXME
        pass

    def fit(self, x: np.ndarray, y: np.ndarray):
        '''
        Learn the beta parameters from the data

        x: the feature column, as a numpy array.
        y: the label column, as a numpy array. This is the variable we want to predict
        '''
        # FIXME
        pass

    def predict(self, x: float) -> float:
        '''
        The function that will be used to make a prediction with our learnt parameters
        '''
        # FIXME
        pass

Now train your model on the data

In [11]:
# FIXME

Now display the parameters beta0 and beta1

In [None]:
# FIXME
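A quick sanity check for the learnt parameters is to compare them with a degree-1 fit from `np.polyfit`, which solves the same least-squares problem. The sketch below uses made-up arrays and computes the closed-form estimates inline; in the notebook you would compare against the parameters stored by your own class instead.

```python
import numpy as np

# Made-up values; replace with the arrays you passed to fit().
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

# Closed-form OLS estimates, as in the formulas earlier in the notebook.
beta1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
beta0 = y.mean() - beta1 * x.mean()

# For degree 1, np.polyfit returns [slope, intercept] (highest power first).
slope, intercept = np.polyfit(x, y, deg=1)

print(np.isclose(beta1, slope), np.isclose(beta0, intercept))
```

The two results should agree up to floating-point rounding, which is why the comparison uses `np.isclose` rather than `==`.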

Call the prediction function with new x values.

Try x = 5, x = 50, x = 100, x = 500 and x = 1000.

In [None]:
# FIXME
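When trying several inputs, it can help to note whether each one lies inside the range of x values the model was trained on, since predictions outside that range are extrapolations and are usually less trustworthy. The sketch below uses hypothetical parameters and made-up training inputs, not values from the dataset.

```python
import numpy as np

# Hypothetical parameters and training inputs, for illustration only.
beta0, beta1 = 2.2, 0.6
x_train = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

for x_new in [0.5, 3.0, 10.0]:
    # Prediction from the regression line: y = beta0 + beta1 * x.
    y_hat = beta0 + beta1 * x_new
    # Inside the observed range means interpolation, outside means extrapolation.
    inside = x_train.min() <= x_new <= x_train.max()
    note = "interpolation" if inside else "extrapolation"
    print(f"x = {x_new}: prediction = {y_hat:.2f} ({note})")
```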

### Plot

Plot the regression line on your data

In [None]:
# FIXME
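One way to overlay the line on the scatter plot with matplotlib is sketched below. The data points and the parameters `beta0` and `beta1` are made-up placeholders; in the notebook you would use your own arrays and the parameters learnt by your model.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook

import matplotlib.pyplot as plt
import numpy as np

# Made-up data and hypothetical fitted parameters, for illustration only.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])
beta0, beta1 = 2.2, 0.6

fig, ax = plt.subplots()
ax.scatter(x, y, label="observations")

# A straight line only needs its two end points.
x_line = np.array([x.min(), x.max()])
ax.plot(x_line, beta0 + beta1 * x_line, color="red", label="regression line")

ax.set_xlabel("TV")
ax.set_ylabel("Sales")
ax.legend()
```

In a notebook the figure is displayed at the end of the cell; elsewhere you can call `fig.savefig(...)` with a file name of your choice.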