***Logistic Regression***



a. Load the Iris dataset as a Pandas data frame with 4 input features and the target feature. Convert the target value so that 0 means Setosa and 1 means Not Setosa.

In [None]:
from sklearn import datasets
iris = datasets.load_iris()

In [None]:
import pandas as pd
iris_df = pd.DataFrame(data=iris['data'], columns=iris['feature_names'])
iris_df['target'] = iris['target']
iris_df['target'] = (iris['target'] != 0).astype(int)
iris_df.head()

Unnamed: 0,sepal length (cm),sepal width (cm),petal length (cm),petal width (cm),target
0,5.1,3.5,1.4,0.2,0
1,4.9,3.0,1.4,0.2,0
2,4.7,3.2,1.3,0.2,0
3,4.6,3.1,1.5,0.2,0
4,5.0,3.6,1.4,0.2,0


b. Create a list the parameters called `theta`, and initialize all parameter values to zero.

In [None]:
import numpy as np
theta = np.zeros(5)
theta

array([0., 0., 0., 0., 0.])

c. Define a function `model(data, theta)` that represents the logistic regression model. Here `data` represents a data frame containing multiple records. Apply `model` to the entire dataset and the initial `theta`, and print the initial predictions.


In [None]:
import numpy as np

def sigmoid(z):
    """
    Sigmoid function to squash the output to a range between 0 and 1.
    """
    return 1 / (1 + np.exp(-z))

def model(data, theta):
    """
    Logistic regression model function.
    """
    data_with_bias = np.hstack((np.ones((data.shape[0], 1)), data))

    z = np.dot(data_with_bias, theta)


    return sigmoid(z)


In [None]:
import pandas as pd
from sklearn import datasets


iris = datasets.load_iris()
iris_df = pd.DataFrame(data=iris['data'], columns=iris['feature_names'])

iris_df['target'] = (iris['target'] != 0).astype(int)

theta = np.zeros(5)

predictions = model(iris_df.drop('target', axis=1), theta)

print(predictions)


[0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5
 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5
 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5
 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5
 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5
 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5
 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5
 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5
 0.5 0.5 0.5 0.5 0.5 0.5]


d. Define a function `cost(y_true, y_pred)` that returns the average cross entropy cost between a list of true labels and a list predicted probabilities. Display the cost of those initial predictions.

In [None]:
def cost(y_true, y_pred):
    """
    Compute the average cross entropy cost between true labels and predicted probabilities.
    """

    epsilon = 1e-15
    cross_entropy = -np.mean(y_true * np.log(y_pred + epsilon) + (1 - y_true) * np.log(1 - y_pred + epsilon))

    return cross_entropy


In [None]:
initial_cost = cost(iris_df['target'], predictions)


print("Initial Cost:", initial_cost)


Initial Cost: 0.6931471805599433


e. Define a function `gradient(data, theta)` that calculates the gradient of the cost function. The the j-th component of the gradient vector is calculated by the following formula:
- for $\theta_0$:
$$
\frac{\partial J}{\partial \theta_0}(\theta) = \frac{1}{m}\sum_{i=1}^m\big(model(x^{(i)}) - y^{(i)}\big)
$$
- for $j=1,2,3,4$:
$$
\frac{\partial J}{\partial \theta_j}(\theta) = \frac{1}{m}\sum_{i=1}^m\big(model(x^{(i)}) - y^{(i)}\big)x^{(i)}_j
$$

Display the gradient vector for the inital theta.

In [None]:
def gradient(data, theta):
    """
    Compute the gradient of the cost function.
    """
    m = len(data)
    data_with_bias = np.hstack((np.ones((m, 1)), data))


    predictions = model(data, theta)
    errors = predictions - data[:, 0]

    grad_0 = np.mean(errors)

    grad_rest = np.mean(errors[:, np.newaxis] * data_with_bias, axis=0)[1:]

    gradient_vector = np.concatenate(([grad_0], grad_rest))

    return gradient_vector


In [None]:
initial_gradient = gradient(iris_df.drop('target', axis=1).values, theta)

print("Gradient Vector for Initial Theta:")
print(initial_gradient)


Gradient Vector for Initial Theta:
[ -5.34333333 -31.904      -16.2942     -21.34606667  -6.92126667]


f. Create a function `update(data, theta, learning_rate)` that performs one gradient update. Apply the `update` function once with `learning_rate=0.001`, and print the updated theta.

In [None]:
def update(data, theta, learning_rate):
    """
    Perform one gradient update.
    """
    grad = gradient(data, theta)
    theta -= learning_rate * grad

    return theta


In [None]:
learning_rate = 0.001
updated_theta = update(iris_df.drop('target', axis=1).values, theta, learning_rate)

print("Updated Theta:")
print(updated_theta)


Updated Theta:
[0.00534333 0.031904   0.0162942  0.02134607 0.00692127]


g. Apply the update function 100 times. Print the value of the cost function after each 10 updates.

In [None]:
num_iterations = 100
updates_per_print = 10


theta = np.zeros(5)

learning_rate = 0.001


for i in range(num_iterations):
    theta = update(iris_df.drop('target', axis=1).values, theta, learning_rate)


    if (i + 1) % updates_per_print == 0:
        predictions = model(iris_df.drop('target', axis=1).values, theta)

        current_cost = cost(iris_df['target'], predictions)

        print(f"Iteration {i+1}: Cost = {current_cost}")


Iteration 10: Cost = 0.8541148150490373
Iteration 20: Cost = 1.579118060163085
Iteration 30: Cost = 2.3442182499072235
Iteration 40: Cost = 3.1124372034783003
Iteration 50: Cost = 3.880941798265103
Iteration 60: Cost = 4.64947531656604
Iteration 70: Cost = 5.4180119401989915
Iteration 80: Cost = 6.1865488413530105
Iteration 90: Cost = 6.955084924834577
Iteration 100: Cost = 7.723610133667861
