## Calculating Linear Regression Coefficients Using the Least Squares Method

**Calculation of Coefficients in Simple Linear Regression**

Simple linear regression is a model that expresses the linear relationship between the values of an independent variable $x$ and a dependent variable $y$ for a dataset with $n$ observations:

$$
y_i = mx_i + c + \varepsilon_i
$$

Where:

* $x_i$ and $y_i$ are the values of the independent and dependent variables for the $i$‑th observation, respectively.
* $m$ represents the slope (regression coefficient).
* $c$ represents the intercept (y-intercept).
* $\varepsilon_i$ is the error term.

The least squares method chooses $m$ and $c$ so that the sum of squared residuals, that is, the squared differences between the observed values $y_i$ and the values $mx_i + c$ predicted by the model, is as small as possible. The resulting estimates are:

$$
m = \frac{{n \sum{(x_iy_i)} - \sum{x_i} \cdot \sum{y_i}}}{{n \sum{(x_i^2)} - (\sum{x_i})^2}}
$$

$$
c = \frac{{\sum{y_i} - m \cdot \sum{x_i}}}{{n}}
$$
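The two formulas above translate almost directly into NumPy. The following is a minimal sketch, not part of the original notebook: the function name `simple_least_squares` and the toy arrays `x_demo` and `y_demo` are hypothetical, chosen so that the points lie exactly on the line $y = 2x + 1$.

```python
import numpy as np

def simple_least_squares(x, y):
    """Return (m, c) for y ≈ m*x + c using the closed-form sums."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)

    # The four sums that appear in the formulas for m and c
    sum_x, sum_y = x.sum(), y.sum()
    sum_xy = (x * y).sum()
    sum_x2 = (x ** 2).sum()

    m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    c = (sum_y - m * sum_x) / n
    return m, c

# Hypothetical toy data lying exactly on y = 2x + 1
x_demo = np.array([0.0, 1.0, 2.0, 3.0])
y_demo = np.array([1.0, 3.0, 5.0, 7.0])

m_demo, c_demo = simple_least_squares(x_demo, y_demo)
print("m =", m_demo, "c =", c_demo)
```

Because the toy points are noise-free, the function should recover a slope of 2 and an intercept of 1. Note that the denominator is zero when all $x_i$ are equal, in which case the slope is undefined.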

**Calculation of Coefficients in Multiple Linear Regression**

Multiple linear regression is a model that expresses the linear relationship between the dependent variable $Y$ and the independent variables $X_1, X_2, \cdots, X_p$ for a dataset with $n$ observations:

$$
Y = \beta_0 + \beta_1X_1 + \beta_2X_2 + \dots + \beta_pX_p + \varepsilon
$$

Where:

* $Y$ represents the dependent variable.
* $X_1, X_2, \cdots, X_p$ represent the $p$ independent variables.
* $\beta_0$ is the intercept term.
* $\beta_1, \beta_2, \dots, \beta_p$ are the regression coefficients for each independent variable.
* $\varepsilon$ is the error term.

The regression coefficients $\beta_0, \beta_1, \dots, \beta_p$ are calculated using the least squares method as follows:

$$
\mathbf{\beta} = (X^T X)^{-1} X^T Y
$$

Where:

* $\mathbf{\beta}$ is the vector of regression coefficients.
* $X$ is the $n \times (p+1)$ matrix of independent variable values, with a leading column of ones so that $\beta_0$ is estimated as the intercept.
* $Y$ is the vector of dependent variable values.
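For readers who want to see where this formula comes from, here is a brief sketch of the standard derivation. It assumes that $X^T X$ is invertible, which holds when the columns of $X$ are linearly independent. The symbol $S(\beta)$ for the sum of squared residuals is introduced here for illustration only.

$$
S(\beta) = (Y - X\beta)^T (Y - X\beta)
$$

$$
\frac{\partial S}{\partial \beta} = -2 X^T (Y - X\beta) = 0
\quad\Longrightarrow\quad
X^T X \beta = X^T Y
\quad\Longrightarrow\quad
\beta = (X^T X)^{-1} X^T Y
$$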


In [30]:
import numpy as np

In [31]:
# Setting the seed value for random number generation
np.random.seed(42)

# Creating the dataset: 100 uniform random values each for X1, X2, X3, and Y
# (Y is drawn independently of the X columns)
data = {'X1': np.random.rand(100),
        'X2': np.random.rand(100),
        'X3': np.random.rand(100),
        'Y': np.random.rand(100)}

In [32]:
# Creating the X matrix by combining X1, X2, and X3
X = np.column_stack((data['X1'],data['X2'],data['X3']))

# Y = Dependent Variable Vector
Y = data['Y']

**Multiple Linear Regression Model**

$$
Y = \beta_0 + \beta_1X_1 + \beta_2X_2 + \dots + \beta_pX_p + \varepsilon
$$

**Estimating Regression Coefficients Using the Least Squares Method**

$$
\mathbf{\beta} = (X^T X)^{-1} X^T Y,\quad \beta = [\beta_0, \beta_1, \beta_2, \beta_3]
$$

In [33]:
# IMPORTANT: To compute beta_0 (the intercept), a column of ones must be added to the X matrix
X = np.column_stack((np.ones(len(X)), X))

In [34]:
# beta = (X^T . X)^(-1) . X^T . Y

beta = np.linalg.inv(X.T @ X) @ X.T @ Y

In [35]:
print("Multiple Linear Regression Coefficients")
print("beta_0 (constant term)=", beta[0])
print("beta_1 =", beta[1])
print("beta_2 =", beta[2])
print("beta_3 =", beta[3])

Multiple Linear Regression Coefficients
beta_0 (constant term)= 0.7403458027139663
beta_1 = -0.21936986847317522
beta_2 = -0.05321605927029976
beta_3 = -0.23099000038654308


In [36]:
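def least_square_func(X, Y):
    """Return [beta_0, ..., beta_p]; X must not already contain the column of ones."""
    # Prepend a column of ones so that beta[0] is the intercept
    X = np.column_stack((np.ones(len(X)), X))
    # beta = (X^T . X)^(-1) . X^T . Y
    beta = np.linalg.inv(X.T @ X) @ X.T @ Y
    return beta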
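
Forming an explicit inverse works well for small, well-conditioned problems like this one, but it can lose accuracy when the columns of `X` are nearly collinear. As an alternative sketch (not part of the original notebook), the same coefficients can be obtained with `np.linalg.lstsq`, which solves the least squares problem without inverting `X.T @ X`. The function name `least_square_lstsq` and the demo arrays are hypothetical.

```python
import numpy as np

def least_square_lstsq(X, Y):
    """Hypothetical variant: same result as the normal equation, via np.linalg.lstsq."""
    # Prepend a column of ones so that beta[0] is the intercept
    X = np.column_stack((np.ones(len(X)), X))
    # lstsq returns (solution, residuals, rank, singular values)
    beta, residuals, rank, singular_values = np.linalg.lstsq(X, Y, rcond=None)
    return beta

# Hypothetical demo data, unrelated to the dataset used in the notebook
rng = np.random.default_rng(0)
X_demo = rng.random((50, 3))
Y_demo = rng.random(50)

# Reference result from the explicit normal equation
X_ones = np.column_stack((np.ones(len(X_demo)), X_demo))
beta_inv = np.linalg.inv(X_ones.T @ X_ones) @ X_ones.T @ Y_demo

beta_lstsq = least_square_lstsq(X_demo, Y_demo)
print("Same coefficients:", np.allclose(beta_inv, beta_lstsq))
```

On well-conditioned data the two approaches should agree up to floating-point rounding.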

In [37]:
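# Recreate the same dataset (same seed) to test the function
np.random.seed(42)

data = {'X1': np.random.rand(100),
        'X2': np.random.rand(100),
        'X3': np.random.rand(100),
        'Y': np.random.rand(100)}

# X without the column of ones; the function adds it internally
X = np.column_stack((data['X1'], data['X2'], data['X3']))

Y = data['Y']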

In [38]:
beta = least_square_func(X, Y)

In [39]:
print("Multiple Linear Regression Coefficients")
print("beta_0 (constant term)=", beta[0])
print("beta_1 =", beta[1])
print("beta_2 =", beta[2])
print("beta_3 =", beta[3])

Multiple Linear Regression Coefficients
beta_0 (constant term)= 0.7403458027139663
beta_1 = -0.21936986847317522
beta_2 = -0.05321605927029976
beta_3 = -0.23099000038654308


**Calculating Linear Regression Coefficients with the Scikit-learn Library**

In [40]:
import pandas as pd
from sklearn.linear_model import LinearRegression

In [41]:
# Set the seed for reproducibility
np.random.seed(42)

# Creating the dataset
data = {
    'X1': np.random.rand(100),
    'X2': np.random.rand(100),
    'X3': np.random.rand(100),
    'Y': np.random.rand(100)
}

# Convert the data into a Pandas DataFrame
df = pd.DataFrame(data)

# X = Matrix of Independent Variables
X = df[['X1', 'X2', 'X3']]

# Y = Vector of Dependent Variable
Y = df['Y']

In [42]:
# Building the Linear Regression Model
model = LinearRegression()

In [43]:
# Fitting the model
model.fit(X, Y)

In [44]:
# The model was fitted in the previous cell, so the estimates can be read directly
print("Intercept:", model.intercept_)
print("Coefficients:", model.coef_)

Intercept: 0.7403458027139651
Coefficients: [-0.21936987 -0.05321606 -0.23099   ]


In [45]:
# beta_0 = intercept
print('beta_0 =', model.intercept_)

# beta_1,2,3 = coefficients
print('beta_1 =', model.coef_[0])
print('beta_2 =', model.coef_[1])
print('beta_3 =', model.coef_[2])

beta_0 = 0.7403458027139651
beta_1 = -0.21936986847317438
beta_2 = -0.05321605927030065
beta_3 = -0.23099000038654072
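
The scikit-learn values differ from the normal-equation results only in the last few printed digits, which is ordinary floating-point rounding. Rather than comparing the digits by eye, the agreement can be checked programmatically. The sketch below is an addition, not part of the original notebook: it rebuilds the same seeded dataset, and the variable names `X_ones`, `beta_manual`, and `beta_sklearn` are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Rebuild the seeded dataset: X1, X2, X3 are drawn first, then Y
np.random.seed(42)
X = np.column_stack([np.random.rand(100) for _ in range(3)])
Y = np.random.rand(100)

# Normal-equation estimate, intercept first
X_ones = np.column_stack((np.ones(len(X)), X))
beta_manual = np.linalg.inv(X_ones.T @ X_ones) @ X_ones.T @ Y

# scikit-learn estimate, arranged in the same order
model = LinearRegression().fit(X, Y)
beta_sklearn = np.concatenate(([model.intercept_], model.coef_))

# Compare within floating-point tolerance instead of exact equality
print("Estimates agree:", np.allclose(beta_manual, beta_sklearn))
print("Largest absolute difference:", np.max(np.abs(beta_manual - beta_sklearn)))
```

`np.allclose` is used instead of `==` because the two methods take different numerical routes and are not expected to match bit for bit.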
