This walkthrough goes line by line through a simple linear regression trained with gradient descent, including the `train_linear_regression` function.

### Code Walkthrough

#### Imports
```python
import numpy as np
import pandas as pd
```
- **Line 1**: Import the `numpy` library, which is used for numerical operations.
- **Line 2**: Import the `pandas` library, which is used for handling data in DataFrame structures.

#### Given Dataset
```python
data = {
    'feature': [1, 2, 3, 4, 5],
    'target': [2, 4, 5, 4, 5]
}
```
- **Lines 4-7**: Create a dictionary containing the dataset. It has two keys: `feature` and `target`. Each key maps to a list of values.

#### Task 1: Create a DataFrame
```python
df = pd.DataFrame(data)
print("DataFrame:\n", df)
```
- **Line 9**: Convert the dictionary into a `pandas` DataFrame.
- **Line 10**: Print the DataFrame to verify its contents.

#### Task 2: Split Data into Features and Target
```python
X = df[['feature']].values
y = df['target'].values
```
- **Line 12**: Select the feature column with double brackets and convert it to a `numpy` array. Selecting with a list of column names keeps `X` two-dimensional, with shape `(5, 1)`.
- **Line 13**: Select the target column and convert it to a `numpy` array. `y` is one-dimensional, with shape `(5,)`.
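
The shapes of the two arrays matter for the matrix products used later. A minimal check, reusing the dataset from above:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'feature': [1, 2, 3, 4, 5], 'target': [2, 4, 5, 4, 5]})

X = df[['feature']].values  # list of column names -> 2-D array
y = df['target'].values     # single column name   -> 1-D array

print(X.shape)  # (5, 1)
print(y.shape)  # (5,)
```

Indexing with a single name, `df['feature']`, would instead give a 1-D array, which would not line up with the 2-D feature matrix the rest of the code expects.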

#### Add a Column of Ones to Include the Intercept in the Model
```python
X = np.c_[np.ones(X.shape[0]), X]
```
- **Line 15**: Prepend a column of ones to `X` so that the first weight acts as the intercept (bias term) of the model. `X` now has shape `(5, 2)`.
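
To see why the column of ones works, the sketch below builds the same matrix and multiplies it by an example weight vector. The values in `w_example` are made up for illustration and are not the trained weights.

```python
import numpy as np

X_raw = np.array([[1], [2], [3], [4], [5]])   # feature column, shape (5, 1)
X = np.c_[np.ones(X_raw.shape[0]), X_raw]     # prepend a column of ones
print(X.shape)  # (5, 2)

# Hypothetical weights: intercept 2.0, slope 0.5
w_example = np.array([2.0, 0.5])

# Each row is [1, x], so the dot product is 1 * intercept + x * slope
via_matrix = X.dot(w_example)
via_formula = 2.0 + 0.5 * X_raw[:, 0]
print(np.allclose(via_matrix, via_formula))  # True
```

With the intercept folded into `X`, a single dot product covers both parameters and no separate bias variable is needed.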

#### Train Linear Regression Model Using Gradient Descent
```python
def train_linear_regression(X, y, learning_rate=0.01, epochs=1000):
    weights = np.zeros(X.shape[1])  # Initialize weights with zeros
    m = len(y)  # Number of training examples

    for epoch in range(epochs):
        predictions = X.dot(weights)  # Calculate predictions
        errors = predictions - y  # Calculate the error
        gradient = X.T.dot(errors) / m  # Calculate gradient
        weights -= learning_rate * gradient  # Update weights

        if epoch % 100 == 0:  # Print the loss every 100 epochs
            loss = np.mean(errors ** 2)
            print(f"Epoch {epoch}, Loss: {loss}")
    
    return weights
```
- **Line 17**: Define a function `train_linear_regression` to train the model using gradient descent. It takes `X`, `y`, `learning_rate`, and `epochs` as inputs.
- **Line 18**: Initialize the weights to zeros. The weight vector has one entry per column of `X`: the intercept plus the single feature.
- **Line 19**: Store the number of training examples in `m`.

#### Gradient Descent Loop
- **Line 21**: Start a loop that runs for the specified number of epochs (iterations).
- **Line 22**: Calculate the predicted values by taking the dot product of `X` and the weights.
- **Line 23**: Compute the errors as the difference between predictions and actual target values.
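- **Line 24**: Calculate the gradient of the cost function with respect to the weights. `X.T.dot(errors) / m` is the gradient of half the mean squared error, `(1 / (2 * m)) * sum(errors ** 2)`. The gradient of the MSE itself would carry an extra factor of 2, which is effectively absorbed into the learning rate.
- **Line 25**: Move the weights against the gradient: scale the gradient by the learning rate and subtract the result from the weights.
- **Lines 27-29**: Every 100 epochs, calculate and print the mean squared error to monitor training progress. The loss uses the errors computed before that epoch's update, so the value printed at epoch 0 is the loss of the all-zero weights.
- **Line 31**: Return the trained weights.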
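
As a worked example, the first pass through the loop can be traced by hand. The sketch below repeats the loop body once on the dataset above, starting from zero weights:

```python
import numpy as np

X = np.c_[np.ones(5), np.array([1, 2, 3, 4, 5])]  # intercept column + feature
y = np.array([2, 4, 5, 4, 5])
learning_rate = 0.01

weights = np.zeros(2)
predictions = X.dot(weights)           # all zeros on the first pass
errors = predictions - y               # therefore equal to -y
loss = np.mean(errors ** 2)            # mean of y**2 = 86 / 5 = 17.2
gradient = X.T.dot(errors) / len(y)    # [-mean(y), -mean(x * y)] = [-4.0, -13.2]
weights -= learning_rate * gradient    # [0.04, 0.132]

print(loss, gradient, weights)
```

The loss of 17.2 matches the value the training run prints for epoch 0, and both weights move in the positive direction because both gradient components are negative.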

#### Predict Using the Linear Regression Model
```python
def predict(X, weights):
    return X.dot(weights)
```
- **Line 33**: Define a function `predict` that takes `X` and the trained weights as inputs.
- **Line 34**: Calculate the predicted values by taking the dot product of `X` and the weights. Return these predictions.

#### Train the Model
```python
weights = train_linear_regression(X, y)
print("Trained weights:", weights)
```
- **Line 37**: Call the `train_linear_regression` function with the feature matrix `X` and target vector `y`. Store the returned weights.
- **Line 38**: Print the trained weights.
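
One way to judge the trained weights is to compare them with the exact least-squares solution. The sketch below uses `np.linalg.lstsq`, which is not part of the original code and is included only as a reference point:

```python
import numpy as np

X = np.c_[np.ones(5), np.array([1, 2, 3, 4, 5])]
y = np.array([2, 4, 5, 4, 5])

# Solve the least-squares problem directly instead of iterating
w_exact, *_ = np.linalg.lstsq(X, y, rcond=None)
mse_exact = np.mean((X.dot(w_exact) - y) ** 2)

print(w_exact)    # approximately [2.2, 0.6]
print(mse_exact)  # approximately 0.48
```

For this dataset the exact solution is an intercept of 2.2 and a slope of 0.6, with a mean squared error of 0.48. Weights from gradient descent that differ noticeably from these values suggest the run has not yet converged, and more epochs or a larger learning rate may be needed.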

#### Test Input
```python
X_test = np.array([[1, 6], [1, 7]])  # Including the intercept term (column of ones)
```
- **Line 41**: Create a test input array `X_test` with two samples, with feature values 6 and 7. Each row starts with a 1 for the intercept term, matching the column layout of the training matrix `X`.
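
Writing the ones by hand is easy to get wrong for larger inputs. A minimal sketch of a helper that builds the same layout from raw feature values follows; `add_intercept` is a hypothetical name and is not part of the original code.

```python
import numpy as np

def add_intercept(features):
    """Hypothetical helper: prepend a column of ones to a 1-D or 2-D feature array."""
    features = np.asarray(features, dtype=float)
    if features.ndim == 1:
        features = features.reshape(-1, 1)  # treat a 1-D input as a single feature column
    return np.c_[np.ones(features.shape[0]), features]

X_test = add_intercept([6, 7])
print(X_test)
```

Using the same helper for training and test data keeps the column order consistent, so the weights are always applied to the columns they were trained on.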

#### Predict the Target Values for the Test Input
```python
predictions = predict(X_test, weights)
print("Predictions:", predictions)
```
- **Line 44**: Call the `predict` function with the test input `X_test` and the trained weights. Store the returned predictions.
- **Line 45**: Print the predicted values for the test input.
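
The predictions depend on how far training has progressed. The sketch below repeats the same update rule without printing and compares 1000 epochs with 10000; `fit_quietly` is a hypothetical name, and `np.polyfit` is used only to obtain the least-squares line as a reference.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2, 4, 5, 4, 5], dtype=float)
X = np.c_[np.ones(len(x)), x]
X_test = np.array([[1, 6], [1, 7]])

def fit_quietly(X, y, learning_rate=0.01, epochs=1000):
    """Same update rule as train_linear_regression, without the printing."""
    weights = np.zeros(X.shape[1])
    for _ in range(epochs):
        gradient = X.T.dot(X.dot(weights) - y) / len(y)
        weights -= learning_rate * gradient
    return weights

for epochs in (1000, 10000):
    w = fit_quietly(X, y, epochs=epochs)
    print(epochs, w, X_test.dot(w))

# Reference: predictions from the least-squares line
slope, intercept = np.polyfit(x, y, 1)
print(intercept + slope * np.array([6.0, 7.0]))
```

With the default learning rate of 0.01, the weights after 1000 epochs are still moving toward the least-squares line, so the predictions for 6 and 7 sit somewhat above the least-squares values of about 5.8 and 6.4. After 10000 epochs the two agree closely.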


### Complete Code

```python
import numpy as np
import pandas as pd

# Given dataset
data = {
    'feature': [1, 2, 3, 4, 5],
    'target': [2, 4, 5, 4, 5]
}

# TASK 1: Create a DataFrame
df = pd.DataFrame(data)
print("DataFrame:\n", df)

# TASK 2: Split Data into Features and Target
X = df[['feature']].values
y = df['target'].values

# Add a column of ones to include the intercept in the model
X = np.c_[np.ones(X.shape[0]), X]

# Function to train linear regression model using gradient descent
def train_linear_regression(X, y, learning_rate=0.01, epochs=1000):
    # Initialize weights with zeros
    weights = np.zeros(X.shape[1])
    m = len(y)  # Number of training examples

    for epoch in range(epochs):
        # Calculate predictions
        predictions = X.dot(weights)

        # Calculate the error
        errors = predictions - y

        # Calculate gradient
        gradient = X.T.dot(errors) / m

        # Update weights
        weights -= learning_rate * gradient

        # Print the loss every 100 epochs
        if epoch % 100 == 0:
            loss = np.mean(errors ** 2)
            print(f"Epoch {epoch}, Loss: {loss}")

    return weights

# Function to predict using the linear regression model
def predict(X, weights):
    # TASK 4: Make predictions
    return X.dot(weights)

# Train the model
weights = train_linear_regression(X, y)
print("Trained weights:", weights)

# Test input
X_test = np.array([[1, 6], [1, 7]])  # Including the intercept term (column of ones)

# Predicting the target values for the test input
predictions = predict(X_test, weights)
print("Predictions:", predictions)
```


### Output

```
DataFrame:
    feature  target
0        1       2
1        2       4
2        3       5
3        4       4
4        5       5
Epoch 0, Loss: 17.2
Epoch 100, Loss: 0.9429888231434214
Epoch 200, Loss: 0.8100752332146092
Epoch 300, Loss: 0.7153181203937461
Epoch 400, Loss: 0.6477636254206384
Epoch 500, Loss: 0.5996024937101454
Epoch 600, Loss: 0.5652673305421165
Epoch 700, Loss: 0.5407890139431251
Epoch 800, Loss: 0.5233378668322708
Epoch 900, Loss: 0.5108965482369706
Trained weights: [1.85212787 0.696355  ]
Predictions: [6.03025788 6.72661288]
```
