# Practical Implementation and Hands-on Exercises - Logistic Regression From Scratch

In this lab session, we will dive deep into logistic regression, offering a hands-on experience that blends theory with practice. 
You'll not only build a logistic regression model from scratch but also compare its performance with scikit-learn's LogisticRegression implementation. 
This approach will reinforce your understanding of the underlying mechanics of logistic regression and provide insight into the effectiveness of your implementation.

#### Implementing Logistic Regression from Scratch

1. **Overview:** You'll start by implementing logistic regression using a step-by-step approach, focusing on understanding each part of the algorithm, including the logistic function, the cost function, and the gradient descent optimization process.

2. **Step-by-Step Guide:**
   - **Logistic Function:** Implement the sigmoid function that models the probability as a function of input features.
   - **Cost Function:** Code the log loss function to evaluate how well your model fits the data.
   - **Gradient Descent:** Write the gradient descent optimization algorithm to minimize the cost function, updating the model's weights.

3. **Model Training:** Split a simple dataset into training and test sets, and train your model on the training portion. Pay attention to how the learning rate and the number of iterations affect the convergence and performance of your model.
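
As a compact reference for the three pieces listed above, here is one common way to write them down. The notation is an assumption made for illustration: $m$ training examples, a design matrix $X$ with a leading bias column, a weight vector $w$, labels $y_i \in \{0, 1\}$, and a learning rate $\alpha$. Averaging the loss over $m$ is a convention, not a requirement.

$$
\sigma(z) = \frac{1}{1 + e^{-z}}, \qquad \hat{y} = \sigma(Xw)
$$

$$
J(w) = -\frac{1}{m} \sum_{i=1}^{m} \Big[ y_i \log \hat{y}_i + (1 - y_i) \log\big(1 - \hat{y}_i\big) \Big]
$$

$$
\nabla_w J(w) = \frac{1}{m} X^{\top} (\hat{y} - y), \qquad w \leftarrow w - \alpha \, \nabla_w J(w)
$$

Minimizing $J(w)$ is equivalent to maximizing the Bernoulli likelihood of the labels, which is why the log loss is also called the negative log-likelihood.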


#### Applying Logistic Regression with Scikit-learn

1. **Quick Application:** Apply logistic regression to the same dataset (Wine) using scikit-learn’s `LogisticRegression` class. This will serve as a benchmark to evaluate the performance of your implementation.
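2. **Preprocessing Steps:** Review and apply necessary preprocessing steps such as feature scaling. Features on very different scales slow down gradient descent and make a single learning rate hard to choose, so scaling matters for the from-scratch model in particular. Scikit-learn's `StandardScaler` can be used to scale the features.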
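
A minimal sketch of the scikit-learn benchmark, under stated assumptions. The Wine data has three classes, so this sketch reduces it to a binary "class 0 vs. rest" problem to match a binary from-scratch model; that choice, the 80/20 split, the `random_state`, and all `*_demo` variable names are illustrative and not part of the lab's required solution. Wrapping `StandardScaler` and `LogisticRegression` in a pipeline means the scaler is fitted on the training split only.

```python
from sklearn.datasets import load_wine
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Load features and the original three-class target
X_demo, y_demo = load_wine(return_X_y=True)

# Assumption: binarize the target as "class 0 vs. rest"
y_demo_binary = (y_demo == 0).astype(int)

# Hold out 20% of the samples for testing, keeping class proportions
X_tr_demo, X_te_demo, y_tr_demo, y_te_demo = train_test_split(
    X_demo, y_demo_binary, test_size=0.2, random_state=42, stratify=y_demo_binary
)

# Scale features, then fit logistic regression with default settings
sk_model = make_pipeline(StandardScaler(), LogisticRegression())
sk_model.fit(X_tr_demo, y_tr_demo)

# Mean accuracy on the held-out split
sk_accuracy = sk_model.score(X_te_demo, y_te_demo)
print(f"scikit-learn test accuracy: {sk_accuracy:.3f}")
```

Note that `LogisticRegression` applies L2 regularization by default, so its weights will generally not match those of an unregularized from-scratch model even when the predictions agree.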

#### Model Evaluation: Compare and Interpret Results

1. **Evaluation Metrics:** Use accuracy, precision, recall, and the F1 score to evaluate the performance of both your model and scikit-learn’s model. This will give you a comprehensive view of how well each model performs.
2. **Confusion Matrix:** Generate confusion matrices for both models to visualize true positives, true negatives, false positives, and false negatives.
3. **Performance Comparison:**
   - **Quantitative Analysis:** Compare the metrics between your logistic regression model and the scikit-learn model. Discuss any differences in performance and speculate on potential reasons for these differences.
   - **Qualitative Analysis:** Reflect on the learning experience of implementing logistic regression from scratch. Consider aspects like the complexity of the algorithm, challenges encountered, and the insights gained through manual implementation.
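
The metrics above can be computed with `sklearn.metrics`. The sketch below uses small hand-written label arrays (made up purely for illustration, not results from any model) so that each number can be checked by hand. In practice you would pass `y_test` and the predictions of each model instead.

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
)

# Toy binary labels, invented for illustration only
y_true_demo = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred_demo = [1, 0, 0, 1, 0, 1, 1, 0]

acc_demo = accuracy_score(y_true_demo, y_pred_demo)
prec_demo = precision_score(y_true_demo, y_pred_demo)
rec_demo = recall_score(y_true_demo, y_pred_demo)
f1_demo = f1_score(y_true_demo, y_pred_demo)

# Rows are true classes, columns are predicted classes:
# [[TN, FP],
#  [FN, TP]]
cm_demo = confusion_matrix(y_true_demo, y_pred_demo)

print("accuracy:", acc_demo)
print("precision:", prec_demo)
print("recall:", rec_demo)
print("F1:", f1_demo)
print(cm_demo)
```

If you keep the original three-class Wine target, `precision_score`, `recall_score`, and `f1_score` need an explicit `average` argument (for example `average="macro"`), because their default applies to binary targets only.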

Some code is already provided to guide you through the implementation process. You are encouraged to experiment with the code, modify it, and explore additional functionalities to deepen your understanding of logistic regression.

The code blocks contain `### FILL HERE ###` tags, which you need to replace with the appropriate code. I have kept the provided code as minimal as possible so that you work through the concepts yourself; it is mainly there to give structure to the implementation.

In [None]:
# Sigmoid function implementation
import numpy as np

def sigmoid(x):
    z = np.clip(x, -250, 250)  # Clipping the input to avoid overflow
    ### FILL HERE ###
    pass

In [None]:
# Cost function: log loss, i.e. the negative log-likelihood from maximum likelihood estimation (MLE)
def cost_function(X, y, weights):
    ### FILL HERE ###
    pass

In [None]:
# Gradient descent
# Should return (weights, cost_history): the training cell unpacks both values
def gradient_descent(X, y, weights, learning_rate, iterations):
    ### FILL HERE ###
    pass

In [None]:
# Prediction
def predict(X, weights, threshold = 0.5):
    ### FILL HERE ###
    pass

In [None]:
# Accuracy
def accuracy(y_true, y_pred):
    ### FILL HERE ###
    pass

In [None]:
# Plotting the learning curve
def plot_learning_curve(cost_history):
    import matplotlib.pyplot as plt
    plt.plot(range(len(cost_history)), cost_history)
    plt.xlabel('Iterations')
    plt.ylabel('Cost')
    plt.title('Gradient Descent Learning Curve')
    plt.show()

In [None]:
from sklearn.datasets import load_wine

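# Load the wine dataset
# Note: load_wine has three classes (0, 1, 2). A binary model with a 0.5 threshold
# expects 0/1 labels, so consider binarizing y (e.g. one class vs. the rest).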
wine = load_wine()
X = wine.data
y = wine.target

# Feature scaling
### FILL HERE ### Hint: Use sklearn

# Splitting dataset into training and testing set
X_train, X_test, y_train, y_test = None ### FILL HERE ### Hint: sklearn has a function for this

In [None]:
# Training the Model
X_train_b = np.c_[np.ones((X_train.shape[0], 1)), X_train]  # Add bias column
X_test_b = np.c_[np.ones((X_test.shape[0], 1)), X_test]  # Add bias column


weights = None  # Initialize the weights    ### FILL HERE ###
iterations = None # Number of iterations    ### FILL HERE ###
learning_rate = None # Learning rate        ### FILL HERE ###

weights, cost_history = gradient_descent(X_train_b, y_train, weights, learning_rate, iterations)

print("Weights after training:", weights)

In [None]:
# Making predictions
y_pred = predict(X_test_b, weights)

# Evaluating the model
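# Use a separate name so the accuracy() function defined above is not shadowed
test_accuracy = None  ### FILL HERE ###
print("Model accuracy on the test set:", test_accuracy)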

In [None]:
plot_learning_curve(cost_history)

In [None]:
# Comparing with sklearn's implementation
from sklearn.linear_model import LogisticRegression

### FILL HERE ###