6)	Implement a logistic regression model for the built-in digits dataset

Logistic regression solves classification problems: it predicts a categorical outcome,
whereas linear regression predicts a continuous one. In the simplest case there are two
possible outcomes, which is called binomial (or binary) logistic regression; an example is
predicting whether a tumor is malignant or benign. When there are more than two outcomes to
classify, the model is called multinomial. A common example of multinomial logistic
regression is predicting which of 3 species an iris flower belongs to.
The logistic regression model takes the continuous output of a linear function of the
independent variables and passes it through a sigmoid function, which maps any real value
to a value between 0 and 1. That value is interpreted as a class probability and is then
converted into a categorical output. The sigmoid function is also known as the logistic
function.
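
As a minimal sketch of the transformation described above, the snippet below computes a linear score z = w·x + b and passes it through the logistic function. The weights, bias, and feature values are made up for illustration and do not come from any fitted model.

```python
import math

def sigmoid(z):
    """Logistic function: maps any real number to a value between 0 and 1."""
    # Two branches keep math.exp from overflowing for large negative z.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict_probability(x, weights, bias):
    """Linear score followed by the sigmoid, as in binomial logistic regression."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return sigmoid(z)

# Hypothetical weights, bias, and feature vector, chosen only for illustration
weights = [0.8, -0.5]
bias = 0.1
x = [2.0, 1.0]

p = predict_probability(x, weights, bias)
print(f"P(y = 1 | x) = {p:.3f}")
```

A score of 0 gives a probability of exactly 0.5; large positive scores approach 1 and large negative scores approach 0.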

Logistic Function – Sigmoid Function
• The sigmoid function is a mathematical function used to map the predicted values to
probabilities.
• It maps any real value to a value between 0 and 1. The output of logistic regression is
a probability, so it cannot fall outside this range; plotted against its input, the
function forms an S-shaped curve.
• This S-shaped curve is called the sigmoid function or the logistic function.
• In logistic regression, we use a threshold value to turn the probability into a class of
0 or 1: values above the threshold are assigned to class 1, and values below the
threshold are assigned to class 0.
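
The thresholding step in the last bullet can be sketched as follows. The 0.5 cut-off is a common default rather than something this text fixes, and the probabilities are invented for illustration.

```python
def classify(probabilities, threshold=0.5):
    """Assign class 1 when the probability reaches the threshold, else class 0."""
    return [1 if p >= threshold else 0 for p in probabilities]

# Hypothetical predicted probabilities from a binomial model
probabilities = [0.05, 0.42, 0.50, 0.73, 0.98]

labels_default = classify(probabilities)                # threshold of 0.5
labels_strict = classify(probabilities, threshold=0.8)  # fewer samples labeled 1
print(labels_default)
print(labels_strict)
```

Raising the threshold makes the model more reluctant to predict class 1, which is one way to trade false positives against false negatives.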

Types of Logistic Regression
On the basis of the categories, logistic regression can be classified into three types:
1. Binomial: In binomial logistic regression, the dependent variable has only two possible
values, such as 0 or 1, Pass or Fail, etc.
2. Multinomial: In multinomial logistic regression, the dependent variable has 3 or more
possible unordered values, such as “cat”, “dog”, or “sheep”.
3. Ordinal: In ordinal logistic regression, the dependent variable has 3 or more possible
ordered values, such as “low”, “medium”, or “high”.
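
One way to handle a multinomial problem with binomial models is the one-vs-rest strategy, which is what scikit-learn's OneVsRestClassifier implements: one "this class vs. all others" model per class, with the most confident one deciding the label. The sketch below assumes three already-trained binary models and uses hypothetical linear scores in their place.

```python
import math

def sigmoid(z):
    """Logistic function (simple form, fine for moderate scores)."""
    return 1.0 / (1.0 + math.exp(-z))

def one_vs_rest_predict(scores_by_class):
    """Pick the class whose binary model gives the highest probability."""
    probabilities = {label: sigmoid(z) for label, z in scores_by_class.items()}
    best = max(probabilities, key=probabilities.get)
    return best, probabilities

# Hypothetical linear scores from three independent binary models
scores = {"cat": 1.5, "dog": -0.3, "sheep": -2.0}

label, probs = one_vs_rest_predict(scores)
print(label)
for name, prob in probs.items():
    print(f"{name}: {prob:.3f}")
```

Because each binary model is fitted independently, these raw per-class probabilities need not sum to 1.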

import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Load the digits dataset: 1,797 images of 8x8 pixels, flattened to 64 features, labels 0-9
digits = load_digits()
X, y = digits.data, digits.target

# i. Data scaling
# The scaler becomes the first step of the pipeline built in step iii, so it is fitted on
# training data only (and refitted inside every cross-validation fold). Fitting it on the
# full dataset before splitting would leak test-set statistics into training.
scaler = StandardScaler()

# ii. Training and testing of the model: hold out 20% of the samples for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# iii. Create the regression model: one binary logistic regression per digit (one-vs-rest)
base_model = LogisticRegression(max_iter=1000)
model = make_pipeline(scaler, OneVsRestClassifier(base_model))
model.fit(X_train, y_train)

# Make predictions on the held-out test set
y_pred = model.predict(X_test)

# iv. Display confusion matrix and classification report
# Rows of the confusion matrix are true labels; columns are predicted labels.
cm = confusion_matrix(y_test, y_pred)
print("Confusion Matrix:")
print(cm)

print("\nClassification Report:")
print(classification_report(y_test, y_pred))

# Visualize the confusion matrix
plt.figure(figsize=(10, 8))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues')
plt.title('Confusion Matrix')
plt.xlabel('Predicted Label')
plt.ylabel('True Label')
plt.show()

# v. Display k-fold cross-validation score (k = 5)
# cross_val_score clones the pipeline, so scaling and fitting are redone within each fold.
cv_scores = cross_val_score(model, X, y, cv=5)
print("\nCross-validation scores:", cv_scores)
print("Mean CV score:", cv_scores.mean())
print("Standard deviation of CV scores:", cv_scores.std())
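
To make the confusion matrix and classification report easier to read, the sketch below derives per-class precision and recall, plus overall accuracy, directly from a confusion matrix. The 3-class matrix is hypothetical and is not output from the digits model; the helper name is also an assumption made for this illustration.

```python
def per_class_precision_recall(cm):
    """Return (precision, recall) lists from a square confusion matrix.

    Rows are true labels and columns are predicted labels, the same
    layout that sklearn.metrics.confusion_matrix returns.
    """
    n = len(cm)
    precision, recall = [], []
    for k in range(n):
        true_positives = cm[k][k]
        predicted_k = sum(cm[i][k] for i in range(n))  # column sum
        actual_k = sum(cm[k])                          # row sum
        precision.append(true_positives / predicted_k if predicted_k else 0.0)
        recall.append(true_positives / actual_k if actual_k else 0.0)
    return precision, recall

# Hypothetical 3-class confusion matrix, not results from the digits model
cm_example = [
    [8, 1, 1],
    [0, 9, 1],
    [2, 0, 8],
]

precision, recall = per_class_precision_recall(cm_example)
correct = sum(cm_example[k][k] for k in range(len(cm_example)))
accuracy = correct / sum(map(sum, cm_example))

print("Precision:", precision)
print("Recall:", recall)
print("Accuracy:", accuracy)
```

Precision for a class divides the diagonal entry by its column total, while recall divides it by its row total; accuracy is the diagonal total over all samples.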