**IMPORTANT NOTES** 
- Please complete the code between the two comments: `## START CODE HERE` and `## END CODE HERE`. 
- Be sure to run the cells in order.

In [None]:
import numpy as np
import pandas as pd
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
seed = 1
np.random.seed(seed)

# Outline
- [1 - Binary Logistic Regression](#1)

<a name="1"></a>
## 1 - Binary Logistic Regression

In [None]:
X, y = make_blobs(n_samples=500, centers=[(0, 5), (5, 0)], cluster_std=3, random_state=seed)
y = y.reshape(-1, 1)
fig = plt.figure(figsize=(8, 6))
plt.scatter(X[:,0], X[:,1], c=y)
plt.title("Dataset")
plt.xlabel("First feature")
plt.ylabel("Second feature")
plt.show()

Divide the data, using 80% for training and 20% for testing.
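A minimal sketch of an 80/20 split using scikit-learn's `train_test_split`, shown on small stand-in arrays. `X_demo` and `y_demo` are hypothetical data, not the blob dataset above. `test_size=0.2` reserves 20% of the rows for testing, and `random_state` makes the shuffle reproducible.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: 10 samples with 2 features, labels as a column vector
X_demo = np.arange(20).reshape(10, 2)
y_demo = np.array([0, 1] * 5).reshape(-1, 1)

# test_size=0.2 keeps 20% of the rows for testing; random_state fixes the shuffle
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2, random_state=1)

print(X_tr.shape, y_tr.shape)  # (8, 2) (8, 1)
print(X_te.shape, y_te.shape)  # (2, 2) (2, 1)
```

With the 500 blob samples, the same proportions give 400 training and 100 test examples.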

In [None]:
## START CODE HERE

## END CODE HERE
print(f'Shape X_train: {X_train.shape}')
print(f'Shape y_train: {y_train.shape}')
print(f'Shape X_test: {X_test.shape}')
print(f'Shape y_test: {y_test.shape}')

Logistic regression is a foundational method widely used for modeling the relationship between a binary or categorical dependent variable and one or more independent variables. It is particularly well-suited for classification tasks, where the objective is to predict a discrete outcome, often represented as class labels (e.g., 0 or 1).

Here we'll focus on logistic regression for binary classification. The model estimates the probability that an instance belongs to the positive class. It does this by applying the logistic function, also known as the sigmoid function, to a linear combination of the independent variables plus a bias term ($\mathbf{w} \cdot \mathbf{x} + b$). The logistic regression model can be written as:

$$ f_{\mathbf{w},b}(\mathbf{x}) = \hat{y} = \sigma(\mathbf{w}\cdot \mathbf{x} + b) $$
where $\sigma$ is the sigmoid function, defined as:

$$\sigma(z) = \frac{1}{1+e^{-z}}$$

In these equations:
- $\mathbf{x}$: the vector of independent variables (input features)
- $\mathbf{w}$: the weights (coefficients); each weight indicates how strongly the corresponding feature influences the prediction
- $b$: the intercept, or bias
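A minimal sketch of the sigmoid and of the model output $\hat{y} = \sigma(\mathbf{w}\cdot\mathbf{x} + b)$, computed for several examples at once with a matrix product. The weights, bias, and inputs below are made-up values for illustration only.

```python
import numpy as np

def sigmoid(z):
    # Element-wise logistic function: maps any real number into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical parameters and inputs, for illustration only
w = np.array([[2.0], [-1.0]])   # weights, shape (n_features, 1)
b = 0.5                         # bias, scalar
X = np.array([[1.0, 2.0],       # 3 examples with 2 features each
              [0.0, 0.0],
              [3.0, 1.0]])

z = X @ w + b                   # w·x + b for every row at once, shape (3, 1)
y_hat = sigmoid(z)              # estimated probability of class 1, shape (3, 1)

print(sigmoid(np.array([0.0])))  # [0.5]: a score of 0 means "no preference"
print(y_hat)
```

`X @ w` computes all the dot products in one call. For very large negative `z`, `np.exp(-z)` can overflow and NumPy emits a warning, although the result still rounds to 0.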

The goal of training is to find the values of $\mathbf{w}$ and $b$ that best fit the data. This is typically done by maximizing the likelihood of the observed labels, which is equivalent to minimizing the negative log-likelihood, also known as the cross-entropy loss. A commonly used optimization technique is gradient descent.
The cross-entropy cost function is:
$$J(\mathbf{w},b) = - \frac{1}{m} \sum_{i=1}^m \left[ y^{(i)} \log(\hat{y}^{(i)}) + (1 - y^{(i)}) \log(1 - \hat{y}^{(i)}) \right]$$
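The model outputs the probability that the target value is 1. During training, we adjust the parameters so that the model outputs values close to 1 for examples with a positive label ($y = 1$) and values close to 0 for examples with a negative label ($y = 0$). The cost function reflects this: for a positive example only the term $-\log(\hat{y})$ contributes, and it grows without bound as $\hat{y}$ approaches 0; for a negative example only $-\log(1 - \hat{y})$ contributes, and it grows without bound as $\hat{y}$ approaches 1.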
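A minimal sketch of the cross-entropy cost, comparing confident correct predictions with confident wrong ones. The clipping constant `eps` is an assumption added so that `log(0)` can never occur; it is not part of the formula itself.

```python
import numpy as np

def cross_entropy_cost(y, y_hat, eps=1e-12):
    """Average binary cross-entropy J(w, b) over m examples.

    y, y_hat: arrays of shape (m, 1); y holds 0/1 labels, y_hat probabilities.
    eps (assumed) keeps the probabilities away from exactly 0 and 1.
    """
    m = y.shape[0]
    y_hat = np.clip(y_hat, eps, 1 - eps)
    return -np.sum(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat)) / m

y = np.array([[1], [0]])
good = np.array([[0.9], [0.1]])  # confident and correct
bad = np.array([[0.1], [0.9]])   # confident and wrong

print(cross_entropy_cost(y, good))  # -log(0.9) ≈ 0.105
print(cross_entropy_cost(y, bad))   # -log(0.1) ≈ 2.303
```

The wrong predictions cost roughly twenty times more, which is what pushes the parameters toward outputs near 1 for positive examples and near 0 for negative ones.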

It may be surprising, but if you work through the derivation with calculus, you will find that the partial derivatives of the cost function with respect to each parameter have exactly the same form as they did for linear regression:
$$\frac{\partial J(\mathbf{w},b)}{\partial w_j}  = \frac{1}{m} \sum\limits_{i = 1}^{m} (\hat{y}^{(i)} - y^{(i)})x_{j}^{(i)}$$
$$\frac{\partial J(\mathbf{w},b)}{\partial b}  = \frac{1}{m} \sum\limits_{i = 1}^{m} (\hat{y}^{(i)} - y^{(i)})$$
- $m$ is the number of training examples in the dataset, and $x_j^{(i)}$ is the $j$-th feature of the $i$-th example.
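To see why the gradient keeps this form, here is a sketch of the derivation for a single example, writing $z = \mathbf{w}\cdot\mathbf{x} + b$ and $\hat{y} = \sigma(z)$, and using the chain rule together with the sigmoid's derivative $\sigma'(z) = \sigma(z)\,(1 - \sigma(z))$:

$$
\begin{aligned}
\ell &= -\left[ y \log \hat{y} + (1 - y) \log(1 - \hat{y}) \right] \\
\frac{\partial \ell}{\partial \hat{y}} &= -\frac{y}{\hat{y}} + \frac{1 - y}{1 - \hat{y}} = \frac{\hat{y} - y}{\hat{y}\,(1 - \hat{y})} \\
\frac{\partial \hat{y}}{\partial z} &= \hat{y}\,(1 - \hat{y}) \\
\frac{\partial \ell}{\partial z} &= \frac{\partial \ell}{\partial \hat{y}} \cdot \frac{\partial \hat{y}}{\partial z} = \hat{y} - y \\
\frac{\partial \ell}{\partial w_j} &= (\hat{y} - y)\,x_j, \qquad \frac{\partial \ell}{\partial b} = \hat{y} - y
\end{aligned}
$$

The factor $\hat{y}(1 - \hat{y})$ from the sigmoid cancels the denominator from the log terms. Averaging over the $m$ training examples gives the two formulas above.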

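Yes, but no! In logistic regression, the definition of $f_{\mathbf{w},b}(\mathbf{x}^{(i)}) = \hat{y}^{(i)}$ is different: it is $\sigma(\mathbf{w}\cdot\mathbf{x}^{(i)} + b)$ rather than $\mathbf{w}\cdot\mathbf{x}^{(i)} + b$.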
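A minimal vectorized sketch of these gradients, with $\hat{y}$ now coming from the sigmoid, followed by the standard gradient descent update $\mathbf{w} \leftarrow \mathbf{w} - \alpha\,\frac{\partial J}{\partial \mathbf{w}}$, $b \leftarrow b - \alpha\,\frac{\partial J}{\partial b}$ with learning rate $\alpha$. The helper names, the toy data, and the learning rate are illustrative assumptions. A finite-difference check compares the analytic gradient with a numerical estimate.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(X, y, w, b):
    # Binary cross-entropy J(w, b), averaged over the m rows of X
    y_hat = sigmoid(X @ w + b)
    return -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

def gradients(X, y, w, b):
    # Same form as linear regression, but y_hat = sigmoid(X w + b)
    m = X.shape[0]
    error = sigmoid(X @ w + b) - y   # shape (m, 1)
    dw = X.T @ error / m             # shape (n_features, 1)
    db = np.sum(error) / m           # scalar
    return dw, db

# Hypothetical toy data: 4 examples, 2 features
X = np.array([[1.0, 2.0], [2.0, 0.5], [-1.0, -1.5], [-2.0, 0.0]])
y = np.array([[1], [1], [0], [0]])
w = np.zeros((2, 1))
b = 0.0

# Finite-difference check of dJ/dw_0 at the starting point
dw, db = gradients(X, y, w, b)
h = 1e-6
w_plus = w.copy()
w_plus[0, 0] += h
w_minus = w.copy()
w_minus[0, 0] -= h
numeric = (cost(X, y, w_plus, b) - cost(X, y, w_minus, b)) / (2 * h)
print(dw[0, 0], numeric)  # the two values should agree closely

# Gradient descent with an assumed learning rate and iteration count
lr = 0.1
for i in range(100):
    dw, db = gradients(X, y, w, b)
    w -= lr * dw
    b -= lr * db

print(cost(X, y, w, b))  # lower than the starting cost log(2) ≈ 0.693
```

The only change from linear regression is the line computing `error`: the prediction passes through `sigmoid` before the label is subtracted.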

Using the formulas above and your own knowledge, complete the following code.
**You are encouraged to use NumPy's built-in functions, such as `np.exp`, `np.sum`, and `np.argmax`.
You may not use scikit-learn's built-in functions in the following code.**

In [None]:
np.random.seed(seed)
class LogisticRegression:
    def __init__(self):
        pass

    def sigmoid(self, z):
        '''
        Args:
            z: numpy array of shape = (n_samples, 1)
        Returns:
            g: numpy array of shape = (n_samples, 1)
        '''
        ## START CODE HERE

        ## END CODE HERE

    def train(self, X, y, n_iters, lr):
        '''        
        Args:
            X: numpy array of shape = (n_samples, n_features)
            y: numpy array of shape = (n_samples, 1)
            n_iters: number of iterations. scalar
            lr: learning rate. scalar
        Returns:
            w: numpy array of shape = (n_features, 1)
            b: bias. scalar
            costs: list of cost for each iteration
        '''
        n_samples, n_features = X.shape
        ## START CODE HERE
        # Initialize the weights and bias to zero, and check their shapes.
        # Store them as self.w and self.b so that predict() can use them.
        
        
        ## END CODE HERE
        costs = []
        
        for i in range(n_iters):
            ## START CODE HERE
            # Step 1: Compute the linear combination of the input features and weights, plus the bias

            # Step 2: Apply the sigmoid activation function
            
            # Step 3: Compute the cost and append it to costs

            # Step 4: Compute the gradients

            # Step 5: Update the parameters

            # Also print the cost every 10 iterations

            ## END CODE HERE

        return self.w, self.b, costs

    def predict(self, X):
        '''
        Args:
            X: numpy array of shape = (n_samples, n_features)
        Returns:
            numpy array of shape = (n_samples, 1) with predicted classes (assume a threshold of 0.5).
        '''
        ## START CODE HERE
    
        ## END CODE HERE

In [None]:
regressor = LogisticRegression()
w_trained, b_trained, costs = regressor.train(X_train, y_train, n_iters=, lr=) # choose the learning rate and number of iterations you think work best

plt.plot(np.arange(len(costs)), costs)
plt.title("Cost during training")
plt.xlabel("Number of iterations")
plt.ylabel("Cost")
plt.show()

In [None]:
y_pred = regressor.predict(X_test)
df = pd.DataFrame({'y_test': y_test.squeeze(), 'y_pred': y_pred.squeeze()})
df

### Evaluation Metrics
The following metrics for classification tasks provide insight into a model's ability to classify instances into the correct classes. They are built from four counts:
- True positive (TP): the model classifies the example as positive, and the actual label is also positive.
- False positive (FP): the model classifies the example as positive, but the actual label is negative.
- True negative (TN): the model classifies the example as negative, and the actual label is also negative.
- False negative (FN): the model classifies the example as negative, but the actual label is positive.

- **Accuracy**: the proportion of all predictions that are correct.
  $$\text{accuracy} = \frac{\text{true positives} + \text{true negatives}}{\text{true positives} + \text{true negatives} + \text{false positives} + \text{false negatives}}$$

- **Precision**: the fraction of positive predictions that are actually positive.
  $$\text{precision} = \frac{\text{true positives}}{\text{true positives} + \text{false positives}}$$

- **Recall**: the fraction of actual positive examples that the model correctly identifies.
  $$\text{recall} = \frac{\text{true positives}}{\text{true positives} + \text{false negatives}}$$

- **F1 Score**: the harmonic mean of precision and recall, which balances the two.
  $$F1 = \frac{2 \times \text{precision} \times \text{recall}}{\text{precision} + \text{recall}}$$
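A minimal NumPy sketch of the four counts and the metrics built from them. The function name `binary_metrics`, its dictionary return value, and the choice to return 0.0 when a denominator is zero are assumptions for illustration.

```python
import numpy as np

def binary_metrics(y_true, y_pred):
    # Flatten so (n, 1) column vectors and 1-D arrays are treated alike
    y_true = np.asarray(y_true).ravel()
    y_pred = np.asarray(y_pred).ravel()

    tp = int(np.sum((y_pred == 1) & (y_true == 1)))
    tn = int(np.sum((y_pred == 0) & (y_true == 0)))
    fp = int(np.sum((y_pred == 1) & (y_true == 0)))
    fn = int(np.sum((y_pred == 0) & (y_true == 1)))

    accuracy = (tp + tn) / (tp + tn + fp + fn)
    # Assumed convention: report 0.0 when a denominator would be zero
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) > 0 else 0.0)
    return {"tp": tp, "tn": tn, "fp": fp, "fn": fn, "accuracy": accuracy,
            "precision": precision, "recall": recall, "f1": f1}

y_true = np.array([[1], [0], [1], [1], [0], [0]])
y_pred = np.array([[1], [0], [0], [1], [1], [0]])
print(binary_metrics(y_true, y_pred))
# tp=2, tn=2, fp=1, fn=1; accuracy, precision, recall and F1 all equal 2/3
```

Flattening first avoids a subtle bug: comparing a `(n, 1)` array with a `(n,)` array would broadcast to an `(n, n)` matrix and inflate every count.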

In [None]:
def compute_binary_classification_eval_metrics(y_true, y_pred):
    ## START CODE HERE
    
    
    ## END CODE HERE

    print(f"Accuracy: {accuracy}")
    print(f"True Positives (tp): {tp}")
    print(f"True Negatives (tn): {tn}")
    print(f"False Positives (fp): {fp}")
    print(f"False Negatives (fn): {fn}")
    print(f"Precision: {precision}")
    print(f"Recall: {recall}")
    print(f"F1 Score: {f1}")
compute_binary_classification_eval_metrics(y_test, y_pred)