## 1. Install and Import libraries

In [36]:
# !pip install -U scikit-learn

In [1]:
# Manipulating matrices and DataFrames
import numpy as np
import pandas as pd

# Pre-built models
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import scale



## 2. Load and process data

In [2]:
# Load the Boston housing dataset (the variable keeps the name `hitters` used throughout this notebook)
hitters = pd.read_csv('https://raw.githubusercontent.com/qlabpucp/datasets/main/datasets/boston.csv')

# Drop the unnamed index column
hitters = hitters.drop(columns=['Unnamed: 0'])

# Drop missing values
hitters = hitters.dropna()

# Splitting into input and target variables
x = hitters.drop('crim', axis=1)
y = hitters['crim']

# Scaling data
x_scaled = scale(x)
y_scaled = scale(y)
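
To see what `scale` does to each column, here is a minimal sketch on a small made-up matrix (`toy` and its values are hypothetical, not part of the Boston data). It assumes the default behaviour of `sklearn.preprocessing.scale`, which centers each column and divides by its population standard deviation.

```python
import numpy as np
from sklearn.preprocessing import scale

# Hypothetical toy matrix: 5 observations, 2 features on very different scales
toy = np.array([[1.0, 100.0],
                [2.0, 300.0],
                [3.0, 200.0],
                [4.0, 500.0],
                [5.0, 400.0]])

toy_scaled = scale(toy)

# Manual equivalent: subtract the column mean, divide by the column std (ddof=0)
toy_manual = (toy - toy.mean(axis=0)) / toy.std(axis=0)

print(np.allclose(toy_scaled, toy_manual))  # both approaches should agree
print(toy_scaled.mean(axis=0))              # column means, numerically zero
print(toy_scaled.std(axis=0))               # column standard deviations, one
```

After scaling, every feature is on the same footing, so the size of each coefficient can be compared across features.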

## 3. Fit a Linear Regression Model

In [3]:
model = LinearRegression(fit_intercept=False)
model.fit(x_scaled, y_scaled)
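
Dropping the intercept is harmless here because both the inputs and the target were centered by `scale`. The sketch below illustrates this on synthetic data (`X_toy`, `y_toy` and the coefficients used to generate them are made up for the example): fitting with and without an intercept should give the same slopes, and the fitted intercept should be numerically zero.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import scale

# Synthetic data (hypothetical): 200 observations, 3 features, non-zero offset
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(200, 3))
y_toy = 5.0 + X_toy @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=200)

# Standardize inputs and target, as in the notebook
X_std, y_std = scale(X_toy), scale(y_toy)

with_intercept = LinearRegression(fit_intercept=True).fit(X_std, y_std)
without_intercept = LinearRegression(fit_intercept=False).fit(X_std, y_std)

print(with_intercept.intercept_)  # numerically zero on centered data
print(np.allclose(with_intercept.coef_, without_intercept.coef_))
```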

## 4. Estimate coefficients

### 4.1. Using the model's attribute

We can obtain the OLS coefficients from the fitted model's built-in attribute `.coef_`:

In [4]:
print(model.coef_)

[ 0.12393939 -0.04653842 -0.02437248 -0.1341459   0.05137256 -0.00277602
 -0.24780411  0.6199926  -0.07397933 -0.07653287  0.1152332  -0.23529275]


In [5]:
coefs_df = pd.DataFrame(data=model.coef_, columns=["Coefs - Attribute"], index=x.columns)
coefs_df

| | Coefs - Attribute |
|---|---|
| zn | 0.123939 |
| indus | -0.046538 |
| chas | -0.024372 |
| nox | -0.134146 |
| rm | 0.051373 |
| age | -0.002776 |
| dis | -0.247804 |
| rad | 0.619993 |
| tax | -0.073979 |
| ptratio | -0.076533 |


### 4.2. Manually

We can also estimate the linear regression coefficients manually. To do so, recall that linear regression is expressed as:

$$ y = X\beta + \epsilon $$

where:
- $ X $ is the feature matrix,
- $ \beta $ is the vector of coefficients, and
- $ \epsilon $ is the vector of error terms.

We can find the optimal coefficients by minimizing the Residual Sum of Squares (RSS). RSS is defined as:

$$ \text{RSS} = (y - X\beta)'(y - X\beta) $$
$$ \text{RSS} = y'y - 2y'X\beta + \beta'X'X\beta $$

To minimize the RSS, we differentiate it with respect to $ \beta $ and set the derivative to zero. In other words, we look for the critical point of the RSS. Because the RSS is a convex quadratic function of $ \beta $, this critical point gives the lowest possible error.

The derivative of RSS with respect to $ \beta $ is:

$$ \frac{\partial \text{RSS}}{\partial \beta} = -2X'y + 2X'X\beta $$

Setting the derivative to zero:

$$ -2X'y + 2X'X\beta = 0 $$

Rearranging gives the normal equations:

$$ X'X\hat{\beta} = X'y $$

If $ X'X $ is invertible, we can solve for $ \hat{\beta} $:

$$ \hat{\beta} = (X'X)^{-1}X'y $$

Solving this system gives the coefficient values that minimize the RSS, that is, the best-fitting linear model in the least-squares sense.
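
As a sanity check on the derivation, the sketch below compares the analytic derivative with a finite-difference approximation and confirms that the derivative vanishes at the solution of the normal equations. It uses small random data (`X`, `y`, `beta` and the helper `rss` are made up for the example), not the Boston data.

```python
import numpy as np

# Small random problem (hypothetical data, only for checking the formulas)
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = rng.normal(size=50)
beta = rng.normal(size=3)

def rss(b):
    """Residual sum of squares (y - Xb)'(y - Xb)."""
    r = y - X @ b
    return r @ r

# Analytic derivative from the text: -2X'y + 2X'X beta
analytic = -2 * X.T @ y + 2 * X.T @ X @ beta

# Central finite differences, one coordinate at a time
eps = 1e-6
numeric = np.array([
    (rss(beta + eps * e) - rss(beta - eps * e)) / (2 * eps)
    for e in np.eye(3)
])
print(np.allclose(analytic, numeric, atol=1e-4))

# At the solution of the normal equations the derivative should be zero
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
grad_at_hat = -2 * X.T @ y + 2 * X.T @ X @ beta_hat
print(np.allclose(grad_at_hat, 0))
```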

In [6]:
Xtx = np.dot(x_scaled.T, x_scaled)
Xty = np.dot(x_scaled.T, y_scaled)
beta = np.linalg.solve(Xtx, Xty)
print(beta)

[ 0.12393939 -0.04653842 -0.02437248 -0.1341459   0.05137256 -0.00277602
 -0.24780411  0.6199926  -0.07397933 -0.07653287  0.1152332  -0.23529275]


In [7]:
coefs_df['Coefs - Beta'] = beta
coefs_df

| | Coefs - Attribute | Coefs - Beta |
|---|---|---|
| zn | 0.123939 | 0.123939 |
| indus | -0.046538 | -0.046538 |
| chas | -0.024372 | -0.024372 |
| nox | -0.134146 | -0.134146 |
| rm | 0.051373 | 0.051373 |
| age | -0.002776 | -0.002776 |
| dis | -0.247804 | -0.247804 |
| rad | 0.619993 | 0.619993 |
| tax | -0.073979 | -0.073979 |
| ptratio | -0.076533 | -0.076533 |


This code performs matrix operations that correspond to the mathematical steps we've described:

- `np.dot(x_scaled.T, x_scaled)` calculates $ X'X $,
- `np.dot(x_scaled.T, y_scaled)` calculates $ X'y $,
- `np.linalg.solve()` solves the equation $ X'X\beta = X'y $ for $ \beta $,
- `print(beta)` then outputs the calculated coefficients.
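
The same coefficients can be reached in more than one way. The sketch below, on synthetic data (`X`, `y` and the generating coefficients are made up), compares `np.linalg.solve` with an explicit inverse and with `np.linalg.lstsq`, which works on $ X $ directly instead of forming $ X'X $. Solving the system is generally preferred over computing the inverse, and `lstsq` is a common choice when $ X'X $ may be badly conditioned.

```python
import numpy as np

# Synthetic regression problem (hypothetical data)
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 4))
y = X @ np.array([0.5, -1.0, 2.0, 0.0]) + rng.normal(scale=0.1, size=100)

# 1) Solve the normal equations X'X beta = X'y (the approach used above)
beta_solve = np.linalg.solve(X.T @ X, X.T @ y)

# 2) Literal translation of (X'X)^{-1} X'y with an explicit inverse
beta_inv = np.linalg.inv(X.T @ X) @ X.T @ y

# 3) Least-squares solver that works on X directly
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(np.allclose(beta_solve, beta_inv))
print(np.allclose(beta_solve, beta_lstsq))
```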


## 5. Gradient Descent

The gradient descent algorithm is an iterative optimization method used to find a minimum of a cost function. At each step it adjusts the model's parameters in the direction that reduces the cost. The update rule is:

$$ w_{1} = w_{0} - \alpha \cdot (\nabla f(w_{0})) $$

where $ f $ is the RSS from the previous section, the weights $ w $ play the role of $ \beta $, and the gradient is:

$$ \nabla f(w_0) = -2X'(y - Xw_0) $$


- **Initial Weights (`w0`)**: Start with a randomly initialized set of coefficients.
- **Learning Rate (`alpha`)**: A predefined step size that determines how much we adjust the weights with respect to the gradient.
- **Gradient Calculation**: Compute the gradient of the cost function, which is the sum of squared residuals in linear regression.
- **Weights Update (`w1`)**: Modify the weights in the opposite direction of the gradient to minimize the cost function.
- **Convergence Check**: Continue iterating until the change in weights between two consecutive steps is smaller than a defined threshold (`atol`), which we take as a sign that the weights are close to the minimum.
- **Final Weights Output**: Once convergence is achieved, output the optimized weights, which represent the best-fit coefficients for the linear regression model.
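
The learning rate deserves a closer look. With the gradient above, each update multiplies the current error by $ I - 2\alpha X'X $, so the iteration only converges when $ \alpha < 1/\lambda_{max}(X'X) $. The sketch below illustrates this bound on synthetic data; `X`, `y`, `run_gd` and the two step sizes are made up for the example.

```python
import numpy as np

# Synthetic regression problem (hypothetical data)
rng = np.random.default_rng(2)
X = rng.normal(size=(500, 5))
y = X @ np.array([1.0, -0.5, 0.25, 2.0, 0.0]) + rng.normal(scale=0.1, size=500)

# Largest eigenvalue of X'X sets the upper limit for the learning rate
lam_max = np.linalg.eigvalsh(X.T @ X).max()
alpha_limit = 1.0 / lam_max
print(alpha_limit)

def run_gd(alpha, n_iter=200):
    """Run a fixed number of gradient descent steps starting from zero weights."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        gradient = -2 * X.T @ (y - X @ w)
        w = w - alpha * gradient
    return w

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

w_ok = run_gd(0.5 * alpha_limit)   # below the limit: approaches beta_hat
w_bad = run_gd(1.5 * alpha_limit)  # above the limit: the weights blow up

print(np.allclose(w_ok, beta_hat))
print(np.linalg.norm(w_bad))
```

A learning rate well below the limit is safe but slow, and one above it makes every step overshoot by more than the previous one.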


In [8]:
# Initialize weights randomly
w0 = np.random.uniform(size=x_scaled.shape[1])

# Set the learning rate
alpha = 0.0002

w1 = w0.copy()

# Set a loop that will continue until the break condition is met
while True:

    # Calculate predictions
    predictions = np.dot(x_scaled, w0)
    # Calculate errors
    errors = y_scaled - predictions
    # Calculate gradient (direction to adjust weights to minimize the loss function)
    gradient = -2 * np.dot(x_scaled.T, errors)
    # Update weights
    w1 = w0 - alpha * gradient

    # Check for convergence
    if np.allclose(w1, w0, atol=1e-4):
        break

    # Prepare for the next iteration
    w0 = w1.copy()

# Print final weights
print(w1)

[ 0.12296162 -0.04924836 -0.02412141 -0.13330899  0.0515497  -0.00324933
 -0.24755159  0.61336394 -0.06616674 -0.07600982  0.11626937 -0.23406389]


In [9]:
coefs_df['Coefs - GD'] = w1
coefs_df

| | Coefs - Attribute | Coefs - Beta | Coefs - GD |
|---|---|---|---|
| zn | 0.123939 | 0.123939 | 0.122962 |
| indus | -0.046538 | -0.046538 | -0.049248 |
| chas | -0.024372 | -0.024372 | -0.024121 |
| nox | -0.134146 | -0.134146 | -0.133309 |
| rm | 0.051373 | 0.051373 | 0.05155 |
| age | -0.002776 | -0.002776 | -0.003249 |
| dis | -0.247804 | -0.247804 | -0.247552 |
| rad | 0.619993 | 0.619993 | 0.613364 |
| tax | -0.073979 | -0.073979 | -0.066167 |
| ptratio | -0.076533 | -0.076533 | -0.07601 |
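
The gradient descent coefficients are close to the OLS ones but not identical, which is likely a consequence of the loose stopping threshold (`atol=1e-4`). The sketch below wraps the loop in a function with a seeded start and an iteration cap, and compares a loose and a tight threshold on synthetic data. The function `gradient_descent`, its parameters `max_iter` and `seed`, and the data are assumptions made for this example, not part of the notebook above.

```python
import numpy as np

def gradient_descent(X, y, alpha, atol=1e-8, max_iter=100_000, seed=0):
    """Minimize the RSS by gradient descent; returns (weights, iterations used)."""
    rng = np.random.default_rng(seed)
    w = rng.uniform(size=X.shape[1])  # random start, reproducible via seed
    for n_iter in range(1, max_iter + 1):
        gradient = -2 * X.T @ (y - X @ w)
        w_new = w - alpha * gradient
        # Stop when no weight moved by more than atol in this step
        if np.allclose(w_new, w, atol=atol, rtol=0):
            return w_new, n_iter
        w = w_new
    return w, max_iter  # cap reached without meeting the threshold

# Synthetic regression problem (hypothetical data)
rng = np.random.default_rng(3)
X = rng.normal(size=(300, 4))
y = X @ np.array([0.6, -0.2, 0.1, 0.0]) + rng.normal(scale=0.5, size=300)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

w_loose, it_loose = gradient_descent(X, y, alpha=0.0002, atol=1e-4)
w_tight, it_tight = gradient_descent(X, y, alpha=0.0002, atol=1e-10)

err_loose = np.abs(w_loose - beta_hat).max()
err_tight = np.abs(w_tight - beta_hat).max()
print(it_loose, err_loose)
print(it_tight, err_tight)
```

The tighter threshold costs more iterations but should bring the weights much closer to the closed-form solution, and the iteration cap keeps the loop from running forever if the learning rate is badly chosen.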
