# __Stochastic Gradient Descent (SGD)__
- Stochastic gradient descent (SGD) is an optimization algorithm commonly used in machine learning to train models. Because it processes a single training sample per update, it needs little memory at each step.
- Each update is computationally cheap because only one sample is processed at a time. On larger datasets, it can converge faster than full-batch gradient descent because it updates the parameters more frequently.
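
The per-sample update described above can be written compactly. The notation here is an assumption introduced for illustration and does not appear in the original text: $\theta$ stands for the model parameters, $\eta$ for the learning rate, $\ell_i$ for the loss on training sample $i$, and $N$ for the number of training samples.

$$
\theta \leftarrow \theta - \eta \, \nabla_\theta \, \ell_i(\theta)
$$

For comparison, full-batch gradient descent averages the gradient over all samples before making a single update:

$$
\theta \leftarrow \theta - \eta \, \frac{1}{N} \sum_{i=1}^{N} \nabla_\theta \, \ell_i(\theta)
$$

SGD therefore makes $N$ parameter updates per pass over the data, where full-batch gradient descent makes one.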

## Steps to be followed:
1. Import the required libraries
2. Load the dataset
3. Preprocess the data
4. Initialize parameters
5. Define the loss function
6. Implement the SGD algorithm
7. Train the model
8. Evaluate the model

### Step 1: Import the required libraries

- This cell imports the libraries and modules needed for data loading, preprocessing, and evaluation.

- Specifically, it imports NumPy (for numerical operations), Pandas (for data manipulation), and scikit-learn modules and functions (for loading the dataset, preprocessing, splitting the data, and computing accuracy).

In [None]:
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

### Step 2: Load the dataset

In [None]:
iris_data = load_iris()
X, y = iris_data.data, iris_data.target

**Observation**

- The Iris dataset is successfully loaded. It contains 150 samples with 4 features each. The target variable has 3 classes representing different species of Iris.

### Step 3: Preprocess the data

- One-hot encode the target variable.

In [None]:
# scikit-learn 1.2 renamed `sparse` to `sparse_output`; on older versions use sparse=False
encoder = OneHotEncoder(sparse_output=False)
y = encoder.fit_transform(y.reshape(-1, 1))



**Observation**

- The target variable y is one-hot encoded, transforming it from a single column of class labels to a matrix where each row is a one-hot encoded vector representing the class.
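
To make the transformation concrete, here is a minimal NumPy-only sketch of one-hot encoding. It is not the `OneHotEncoder` implementation; the `labels` array is a hypothetical example, and indexing an identity matrix is just one simple way to get the same kind of output.

```python
import numpy as np

# Hypothetical class labels for four samples (three classes: 0, 1, 2)
labels = np.array([0, 2, 1, 2])
num_classes = 3

# Row k of the identity matrix is the one-hot vector for class k,
# so indexing with the labels picks one row per sample.
one_hot = np.eye(num_classes)[labels]
print(one_hot)

# argmax along each row recovers the original labels
recovered = one_hot.argmax(axis=1)
print(recovered)
```

Each row contains a single 1 in the column of its class, and taking `argmax` per row undoes the encoding, which is what the evaluation step relies on later.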

- Split the data into training and testing sets.

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

**Observation**

- The dataset is split into training (80%, 120 samples) and testing (20%, 30 samples) sets. Holding out the test set lets us evaluate the model's performance on data it has not seen during training.

- Standardize the data.

In [None]:
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

**Observation**

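- The training features are standardized to have zero mean and unit variance, which helps SGD converge faster and puts all features on a comparable scale so that no single feature dominates the gradient updates. The scaler is fitted on the training set only and then applied to the test set, so no information from the test data leaks into preprocessing.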
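
The following NumPy sketch shows what standardization amounts to, under the assumption of synthetic data. The `train` and `test` arrays are hypothetical stand-ins for the Iris features, not the real dataset. The statistics are computed from the training array only and reused for the test array.

```python
import numpy as np

# Synthetic stand-ins for the training and test features (assumed data)
rng = np.random.default_rng(0)
train = rng.normal(loc=5.0, scale=2.0, size=(120, 4))
test = rng.normal(loc=5.0, scale=2.0, size=(30, 4))

# Per-feature statistics from the training data only.
# np.std defaults to the population standard deviation (ddof=0),
# which is also what StandardScaler uses.
mean = train.mean(axis=0)
std = train.std(axis=0)

train_scaled = (train - mean) / std
test_scaled = (test - mean) / std  # reuse the training statistics

print(train_scaled.mean(axis=0))  # close to 0 for every feature
print(train_scaled.std(axis=0))   # close to 1 for every feature
```

The scaled test set will generally not have exactly zero mean and unit variance, because it is transformed with the training statistics rather than its own.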

### Step 4: Initialize parameters

In [None]:
np.random.seed(42)
weights = np.random.randn(X_train.shape[1], y_train.shape[1])
bias = np.random.randn(y_train.shape[1])

**Observation**

- The weights and bias are initialized randomly. This randomness can affect the starting point and convergence path of the algorithm. Setting a random seed ensures reproducibility.

### Step 5: Define the loss function

In [None]:
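def softmax(z):
    # Accept a single sample (1-D) as well as a batch (2-D)
    if z.ndim == 1:
        z = z.reshape(1, -1)
    # Subtract the row-wise maximum for numerical stability
    exp_z = np.exp(z - np.max(z, axis=1, keepdims=True))
    return exp_z / exp_z.sum(axis=1, keepdims=True)

def compute_loss(X, y, weights, bias):
    predictions = softmax(np.dot(X, weights) + bias)
    # Clip to avoid log(0) if a predicted probability underflows to zero
    predictions = np.clip(predictions, 1e-12, 1.0)
    # Mean cross-entropy over all samples
    loss = -np.mean(np.sum(y * np.log(predictions), axis=1))
    return loss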

**Observation**

- The softmax function calculates probabilities for each class, ensuring that the sum of probabilities for each sample equals 1.
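- The cross-entropy loss function measures the difference between the predicted and actual class distributions. The more probability the model assigns to a wrong class, the larger the penalty, so confident incorrect predictions are penalized most heavily.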
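
A small numeric sketch can illustrate how the penalty grows. The probability vectors below are made-up examples, and `cross_entropy` is a hypothetical helper for a single sample; it is not the `compute_loss` function used in this notebook.

```python
import numpy as np

def cross_entropy(probs, true_class):
    # For a one-hot target, the loss reduces to -log of the
    # probability assigned to the true class.
    return -np.log(probs[true_class])

# Three made-up predictions for a sample whose true class is 0
confident_right = np.array([0.90, 0.05, 0.05])
unsure = np.array([0.40, 0.30, 0.30])
confident_wrong = np.array([0.05, 0.90, 0.05])

losses = [cross_entropy(p, 0) for p in (confident_right, unsure, confident_wrong)]
for name, value in zip(("confident right", "unsure", "confident wrong"), losses):
    print(f"{name}: {value:.3f}")
```

The three losses come out to roughly 0.105, 0.916, and 2.996, so the confidently wrong prediction costs far more than the uncertain one.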

### Step 6: Implement the SGD algorithm

In [None]:
def sgd(X, y, weights, bias, learning_rate, epochs):
    for epoch in range(epochs):
        for i in range(X.shape[0]):
            # Compute the prediction
            z = np.dot(X[i], weights) + bias
            prediction = softmax(z).flatten()

            # Compute the error: the gradient of the cross-entropy loss
            # with respect to the logits z
            error = prediction - y[i]

            # Update the weights and bias in place
            weights -= learning_rate * np.outer(X[i], error)
            bias -= learning_rate * error

        # Print the loss every 10 epochs to track progress
        if epoch % 10 == 0:
            loss = compute_loss(X, y, weights, bias)
            print(f'Epoch {epoch}, Loss: {loss}')

    return weights, bias

**Observation**

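- The algorithm updates the weights and bias once per sample, each time stepping in the direction that reduces the loss on that sample.
- Samples are visited in the same fixed order in every epoch; this implementation does not shuffle them.
- The loss on the full dataset passed to the function is printed every 10 epochs to track training progress.
- A decreasing loss over epochs indicates that the model is learning and improving its predictions.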
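
The update uses `np.outer(X[i], error)` as the gradient of the per-sample loss with respect to the weights. One way to gain confidence in that expression is a finite-difference check, sketched below. The sample `x`, target `y`, and parameters `W` and `b` are random placeholders assumed for illustration, and the helper functions are simplified single-sample versions, not the notebook's own.

```python
import numpy as np

def softmax(z):
    # Single-sample softmax with the usual max subtraction for stability
    exp_z = np.exp(z - np.max(z))
    return exp_z / exp_z.sum()

def loss(x, y, W, b):
    # Cross-entropy for one sample with a one-hot target y
    return -np.sum(y * np.log(softmax(x @ W + b)))

# Random placeholder sample and parameters (4 features, 3 classes)
rng = np.random.default_rng(0)
x = rng.normal(size=4)
y = np.array([0.0, 1.0, 0.0])
W = rng.normal(size=(4, 3))
b = rng.normal(size=3)

# Analytic gradient, as used in the SGD update
error = softmax(x @ W + b) - y
analytic_grad_W = np.outer(x, error)

# Numerical gradient by central differences, one weight at a time
eps = 1e-6
numeric_grad_W = np.zeros_like(W)
for i in range(W.shape[0]):
    for j in range(W.shape[1]):
        W_plus, W_minus = W.copy(), W.copy()
        W_plus[i, j] += eps
        W_minus[i, j] -= eps
        numeric_grad_W[i, j] = (loss(x, y, W_plus, b) - loss(x, y, W_minus, b)) / (2 * eps)

max_diff = np.max(np.abs(analytic_grad_W - numeric_grad_W))
print(max_diff)
```

If the analytic expression is right, `max_diff` should be tiny compared with the gradient entries themselves. The same kind of check could be applied to the bias gradient, which is `error` itself.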

### Step 7: Train the model

In [None]:
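# Hyperparameters: the original does not state these values, so both are assumptions.
# epochs = 100 is consistent with the log below, which ends at epoch 90.
learning_rate = 0.01
epochs = 100

weights, bias = sgd(X_train, y_train, weights, bias, learning_rate, epochs)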

Epoch 0, Loss: 1.0880300581662026
Epoch 10, Loss: 0.33942298320903863
Epoch 20, Loss: 0.25877241037361626
Epoch 30, Loss: 0.2152833378451731
Epoch 40, Loss: 0.18748912695944972
Epoch 50, Loss: 0.16804243546955017
Epoch 60, Loss: 0.1536121423967302
Epoch 70, Loss: 0.14244701115914368
Epoch 80, Loss: 0.13353214357163423
Epoch 90, Loss: 0.1262367431452422


**Observation**

- The model is trained using the SGD function. The training loss falls from about 1.09 at epoch 0 to about 0.13 at epoch 90, with the largest drop in the first 10 epochs, showing that the model is learning and the parameters are being optimized.

### Step 8: Evaluate the model

In [None]:
def predict(X, weights, bias):
    predictions = softmax(np.dot(X, weights) + bias)
    return np.argmax(predictions, axis=1)

y_train_pred = predict(X_train, weights, bias)
y_test_pred = predict(X_test, weights, bias)

y_train_true = np.argmax(y_train, axis=1)
y_test_true = np.argmax(y_test, axis=1)

train_accuracy = accuracy_score(y_train_true, y_train_pred)
test_accuracy = accuracy_score(y_test_true, y_test_pred)

print(f'Training Accuracy: {train_accuracy}')
print(f'Testing Accuracy: {test_accuracy}')

Training Accuracy: 0.9666666666666667
Testing Accuracy: 1.0


**Observation**

- Predictions are made on both the training and testing sets.
- The accuracy scores give the fraction of samples classified correctly in each set.
- The high training accuracy (about 0.967) indicates the model fits the training data well.
- The testing accuracy of 1.0 suggests the model generalizes well to unseen data, although the test set contains only 30 samples, so this estimate is coarse.
- Training a classification model on the Iris dataset with stochastic gradient descent (SGD) shows the algorithm working end to end: data preprocessing, parameter initialization, loss computation, iterative parameter updates, and evaluation of the model's performance.
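
For reference, accuracy is simply the fraction of predictions that match the true labels. The sketch below computes it by hand with NumPy on made-up label arrays, as an illustration of what `accuracy_score` reports for this kind of input.

```python
import numpy as np

# Made-up true and predicted class labels for six samples
y_true = np.array([0, 1, 2, 2, 1, 0])
y_pred = np.array([0, 1, 2, 1, 1, 0])

# Element-wise comparison gives booleans; their mean is the accuracy
accuracy = np.mean(y_true == y_pred)
print(accuracy)  # 5 of 6 predictions match
```

Read the same way, the reported training accuracy of 0.9667 on 120 training samples corresponds to 116 correct predictions, and the testing accuracy of 1.0 corresponds to all 30 test samples being classified correctly.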