## Linear Regression

Linear regression is a statistical method for modeling the relationship between a dependent variable (also called the response variable) and one or more independent variables (also called predictor variables). The basic idea is to find the best-fit line (or, with several predictors, the best-fit plane or hyperplane) that predicts the dependent variable from the independent variables.

A simple linear regression model, with one independent variable, is given by:

Y = b0 + b1*X + e

where:

Y is the dependent variable (response variable)
X is the independent variable (predictor variable)
b0 is the intercept or constant term
b1 is the slope coefficient (the change in Y for a unit change in X)
e is the error term (the part of Y that the linear relationship does not explain; its estimate for each observation is the residual, the difference between the actual and predicted values of Y)

The goal of linear regression is to estimate the intercept and slope coefficients (b0 and b1). This is typically done using the method of least squares, which chooses the values of b0 and b1 that minimize the sum of the squared residuals:

Sum of squared residuals = Σ (Yi - Ŷi)^2

where:

Yi is the actual value of the dependent variable for the i-th observation
Ŷi is the predicted value of the dependent variable for the i-th observation

The solution to the least squares problem can be obtained using matrix algebra. Specifically, when X^T X is invertible, the solution is given by:

B = (X^T X)^-1 X^T Y

where:

B is a vector of the estimated coefficients (b0 and b1)
X is the matrix of independent variables (including a column of ones for the intercept)
Y is the vector of dependent variable values

Once the coefficients are estimated, we can use the linear regression equation to make predictions on new data.
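
As a rough illustration of the closed-form solution, the sketch below fits a simple linear regression to a small made-up dataset (the numbers are hypothetical, chosen to lie close to Y = 1 + 2*X). It solves the normal equations with `np.linalg.solve` instead of forming the inverse explicitly; that is a common numerical substitution, not something the formula itself requires.

```python
import numpy as np

# Hypothetical data, roughly following Y = 1 + 2*X
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
Y = np.array([1.1, 2.9, 5.2, 6.8, 9.0])

# Design matrix: a column of ones for the intercept, then the predictor
X = np.column_stack([np.ones_like(x), x])

# Solve (X^T X) B = X^T Y, which is equivalent to B = (X^T X)^-1 X^T Y
B = np.linalg.solve(X.T @ X, X.T @ Y)
b0, b1 = B

# Sum of squared residuals at the fitted coefficients
residuals = Y - X @ B
ssr = np.sum(residuals ** 2)

# Prediction for a new observation with X = 5
y_new = b0 + b1 * 5.0

print(b0, b1, ssr, y_new)
```

At the least-squares solution the residuals are orthogonal to every column of X, which is a quick way to sanity-check a fit: `X.T @ residuals` should be numerically zero.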

Linear regression can also be extended to multiple linear regression, where there are multiple independent variables. The equation is similar, but there is a coefficient for each independent variable:

Y = b0 + b1*X1 + b2*X2 + ... + bn*Xn + e

where:

X1, X2, ..., Xn are the independent variables
b1, b2, ..., bn are the slope coefficients for each independent variable

As in the simple case, the coefficients are chosen to minimize the sum of squared residuals. They can be computed in closed form with the matrix formula above or iteratively with gradient descent.
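
To make the multi-variable case concrete, here is a minimal sketch that generates noise-free data from known coefficients and recovers them with `np.linalg.lstsq`. The coefficient values, the random seed, and the variable names are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical ground truth: intercept b0 followed by three slopes b1..b3
true_b = np.array([0.5, 2.0, -1.0, 3.0])

# 50 observations of 3 independent variables
features = rng.normal(size=(50, 3))

# Prepend a column of ones so the first coefficient acts as the intercept
X = np.column_stack([np.ones(len(features)), features])

# Noise-free response (e = 0), so least squares should recover true_b
Y = X @ true_b

# Least-squares solution; rcond=None selects NumPy's current default cutoff
B, _, rank, _ = np.linalg.lstsq(X, Y, rcond=None)

print(B)
print(rank)
```

With real data the error term is not zero, so the estimates would only approximate the underlying coefficients.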

In [3]:
import numpy as np

class LinearRegression:
    def __init__(self, lr=0.01, n_iters=1000):
        self.lr = lr
        self.n_iters = n_iters
        self.weights = None
        self.bias = None
        
    def fit(self, X, y):
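        # Start from zero weights (one per feature) and a zero bias
        n_samples, n_features = X.shape
        self.weights = np.zeros(n_features)
        self.bias = 0

        # Batch gradient descent on the halved mean squared error
        for _ in range(self.n_iters):
            y_pred = np.dot(X, self.weights) + self.bias
            # Gradients of (1 / (2 * n_samples)) * sum((y_pred - y) ** 2)
            dw = (1 / n_samples) * np.dot(X.T, (y_pred - y))
            db = (1 / n_samples) * np.sum(y_pred - y)
            # Step against the gradient, scaled by the learning rate
            self.weights -= self.lr * dw
            self.bias -= self.lr * db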
        
    def predict(self, X):
        y_pred = np.dot(X, self.weights) + self.bias
        return y_pred


In [4]:
# Create a toy dataset in which each target is the sum of its row's features
X = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
y = np.array([6, 15, 24])

# Initialize the linear regression model
model = LinearRegression(lr=0.01, n_iters=1000)

# Train the model
model.fit(X, y)

# Make predictions on new data
X_test = np.array([[10, 11, 12], [13, 14, 15]])
y_pred = model.predict(X_test)

print(y_pred)


[32.99872173 41.99792905]
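
One caveat about this toy dataset, sketched below on the assumption that the same `X`, `y`, and `X_test` arrays are used: its three columns are perfectly collinear (each column is the previous one plus 1), so the design matrix is rank-deficient and X^T X cannot be inverted. `np.linalg.lstsq` still returns a minimum-norm least-squares solution. Because the targets are exactly the row sums, an exact fit predicts 33 and 42, which the gradient-descent output above approaches. The helper name `add_ones` is hypothetical.

```python
import numpy as np

X = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=float)
y = np.array([6, 15, 24], dtype=float)
X_test = np.array([[10, 11, 12], [13, 14, 15]], dtype=float)

def add_ones(A):
    # Prepend a column of ones for the intercept
    return np.column_stack([np.ones(len(A)), A])

X1 = add_ones(X)

# Four columns, but fewer linearly independent ones
rank = np.linalg.matrix_rank(X1)
print(rank)

# Minimum-norm least-squares coefficients (X^T X is singular here)
B, *_ = np.linalg.lstsq(X1, y, rcond=None)

# Predictions on the test rows
y_exact = add_ones(X_test) @ B
print(y_exact)
```

Many different coefficient vectors fit this data equally well, so the individual weights learned by gradient descent are not uniquely determined even though the predictions are.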


## Logistic Regression

Logistic regression is a type of classification algorithm that is used to model the relationship between a binary (or categorical) output variable and one or more input variables. It is called logistic regression because it uses a logistic (or sigmoid) function to model the probability of the output variable taking on a particular value, given the input variables.

The logistic function is defined as:

$$\sigma(z) = \frac{1}{1 + e^{-z}}$$

where $z = w_0 + w_1x_1 + w_2x_2 + ... + w_mx_m$ is a linear combination of the input variables and their associated weights.

The logistic function maps any real-valued number $z$ to a value between 0 and 1, which can be interpreted as the probability of the output variable being 1, given the input variables. For example, if $\sigma(z) = 0.8$, then we can say that the probability of the output variable being 1 is 0.8.
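
A small sketch of the logistic function in NumPy, evaluated at a few hand-picked inputs (the input values are assumptions for illustration). It shows that $z = 0$ maps to 0.5 and that $\sigma(z) = 0.8$ corresponds to $z = \ln 4 \approx 1.386$.

```python
import numpy as np

def sigmoid(z):
    # Logistic function: maps any real z into the open interval (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

# Negative, zero, ln(4), and positive inputs
z = np.array([-5.0, 0.0, np.log(4.0), 5.0])
p = sigmoid(z)

print(p)
```

The outputs increase with `z` and never reach 0 or 1 exactly for moderate inputs, which is what allows them to be read as probabilities.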

To learn the weights $w_0, w_1, w_2, ..., w_m$, we need to define a cost function that measures how well the logistic regression model is doing at predicting the output variable. A common cost function for logistic regression is the cross-entropy loss function, which is defined as:

$$J(w) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log(\sigma(z^{(i)})) + (1 - y^{(i)}) \log(1 - \sigma(z^{(i)})) \right]$$

where $m$ is the number of training examples, $y^{(i)}$ is the actual output value for the $i$th training example, and $z^{(i)}$ is the linear combination of input variables and weights for the $i$th training example. Note that $m$ here counts training examples, whereas in the definition of $z$ it indexes the last weight; the two quantities need not be equal. The goal is to minimize the value of the cost function $J(w)$ with respect to the weights $w_0, w_1, w_2, ..., w_m$.
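
To give a feel for how the cross-entropy loss behaves, the sketch below evaluates it on four made-up labels with three sets of hypothetical predicted probabilities: confidently correct, uninformative (all 0.5), and confidently wrong. The function name `cross_entropy` and all numbers are assumptions for illustration.

```python
import numpy as np

def cross_entropy(y, p):
    # Mean binary cross-entropy for labels y in {0, 1} and predicted probabilities p
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

y = np.array([1, 0, 1, 0])

confident_right = np.array([0.9, 0.1, 0.9, 0.1])
uninformed = np.full(4, 0.5)
confident_wrong = np.array([0.1, 0.9, 0.1, 0.9])

loss_right = cross_entropy(y, confident_right)
loss_uninformed = cross_entropy(y, uninformed)
loss_wrong = cross_entropy(y, confident_wrong)

print(loss_right, loss_uninformed, loss_wrong)
```

The three losses equal $-\ln 0.9$, $\ln 2$, and $-\ln 0.1$ respectively, so confident mistakes are penalized far more heavily than hedged predictions.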

To minimize the cost function, we can use an optimization algorithm such as gradient descent. The gradient of the cost function with respect to the weights is given by:

$$\frac{\partial J(w)}{\partial w_j} = \frac{1}{m} \sum_{i=1}^{m} \left( \sigma(z^{(i)}) - y^{(i)} \right) x_j^{(i)}$$

where $x_j^{(i)}$ is the $j$th input variable for the $i$th training example. We can use this gradient to update the weights using the following update rule:

$$w_j := w_j - \alpha \frac{\partial J(w)}{\partial w_j}$$

where $\alpha$ is the learning rate.

Repeating this update until the cost stops decreasing yields weights that approximately minimize the cost function, which gives the best-fitting logistic regression model for the data under this loss.
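
One way to build confidence in the gradient formula is to compare it with a finite-difference approximation of the cost. The sketch below does that on small random data and then applies a single update step. The data, the seed, the step size `alpha = 0.1`, and the helper names are all assumptions for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(w, X, y):
    # Cross-entropy J(w); X already contains a leading column of ones for w_0
    p = sigmoid(X @ w)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def gradient(w, X, y):
    # Analytic gradient: (1/m) * sum_i (sigma(z_i) - y_i) * x_i
    return X.T @ (sigmoid(X @ w) - y) / len(y)

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(20), rng.normal(size=(20, 2))])
y = (rng.random(20) < 0.5).astype(float)
w = rng.normal(size=3)

# Central finite differences approximate each partial derivative of J
eps = 1e-6
numeric = np.array([
    (cost(w + eps * e, X, y) - cost(w - eps * e, X, y)) / (2 * eps)
    for e in np.eye(3)
])
max_gap = np.max(np.abs(numeric - gradient(w, X, y)))

# One update w := w - alpha * dJ/dw with a small alpha should lower J
alpha = 0.1
w_next = w - alpha * gradient(w, X, y)

print(max_gap)
print(cost(w, X, y), cost(w_next, X, y))
```

A tiny `max_gap` indicates that the analytic and numerical gradients agree, and the second printed cost should be lower than the first.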

In [5]:
import numpy as np

class LogisticRegression:
    def __init__(self, lr=0.01, num_iter=100000, fit_intercept=True, verbose=False):
        self.lr = lr
        self.num_iter = num_iter
        self.fit_intercept = fit_intercept
        self.verbose = verbose
    
    def __add_intercept(self, X):
        intercept = np.ones((X.shape[0], 1))
        return np.concatenate((intercept, X), axis=1)
    
    def __sigmoid(self, z):
        return 1 / (1 + np.exp(-z))
    
    def __loss(self, h, y):
        return (-y * np.log(h) - (1 - y) * np.log(1 - h)).mean()
    
    def fit(self, X, y):
        if self.fit_intercept:
            X = self.__add_intercept(X)
        
        # initialize weights
        self.theta = np.zeros(X.shape[1])
        
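        for i in range(self.num_iter):
            # Forward pass: predicted probabilities under the current weights
            z = np.dot(X, self.theta)
            h = self.__sigmoid(z)
            # Gradient of the cross-entropy loss, averaged over the samples
            gradient = np.dot(X.T, (h - y)) / y.size
            self.theta -= self.lr * gradient
            
            # Optionally report the post-update loss every 10,000 iterations
            if self.verbose and i % 10000 == 0:
                z = np.dot(X, self.theta)
                h = self.__sigmoid(z)
                print(f'Loss: {self.__loss(h, y)}')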
    
    def predict_prob(self, X):
        if self.fit_intercept:
            X = self.__add_intercept(X)
        
        return self.__sigmoid(np.dot(X, self.theta))
    
    def predict(self, X, threshold=0.5):
        # Boolean labels: True where the predicted probability reaches the threshold
        return self.predict_prob(X) >= threshold


In [6]:
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# generate some random data for classification
X, y = make_classification(n_samples=1000, n_features=10, n_classes=2, random_state=42)

# split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# create a logistic regression model
model = LogisticRegression(lr=0.1, num_iter=100000)

# train the model on the training data
model.fit(X_train, y_train)

# make predictions on the testing data
y_pred = model.predict(X_test)

# calculate the accuracy of the predictions
accuracy = accuracy_score(y_test, y_pred)

print(f'Accuracy: {accuracy}')


Accuracy: 0.825
