Logistic Regression is a supervised learning algorithm that belongs to the family of classification methods.

Classification VS Clustering
Classification is the process of predicting a label for a given input data sample. For example, given a set of images of animals, a classification algorithm might predict whether a given image contains a cat or a dog.
Clustering, on the other hand, is the process of grouping data samples into clusters based on their similarity. For example, given a set of customer data, a clustering algorithm might group the customers into clusters based on their purchasing habits.
In summary, classification is used to predict labels for individual data samples, while clustering is used to group similar data samples together.
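
The difference can be sketched on a toy problem. The data below is made up for illustration, and the choice of `KMeans` as the clustering algorithm is an assumption (any clustering method would make the same point): the classifier is fitted on samples together with their labels, while the clustering algorithm only ever sees the samples.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

# Two well-separated groups of 2-D points (made-up toy data).
X_toy = np.array([[0.0, 0.0], [0.2, 0.1], [0.1, 0.3],
                  [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]])
y_toy = np.array([0, 0, 0, 1, 1, 1])  # labels, used only by the classifier

# Classification: learn from (X, y) pairs, then predict a label for new samples.
clf = LogisticRegression().fit(X_toy, y_toy)
predicted_labels = clf.predict(np.array([[0.1, 0.1], [5.1, 5.0]]))

# Clustering: only X is given; each sample is assigned to a group.
# The group ids are arbitrary and carry no meaning beyond "same group".
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X_toy)
cluster_ids = km.labels_

print("Predicted labels:", predicted_labels)
print("Cluster ids:", cluster_ids)
```

The cluster ids may come out as 0/1 or 1/0; only the grouping itself is meaningful.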

Sources:
http://www.kt.agh.edu.pl/~kulakowski/ml/03_Logistic_bias_variance.pdf
https://www.simplilearn.com/tutorials/machine-learning-tutorial/logistic-regression-in-python
https://www.datacamp.com/tutorial/understanding-logistic-regression-python
https://realpython.com/logistic-regression-python/
https://aws.amazon.com/what-is/logistic-regression/#:~:text=Logistic%20regression%20is%20a%20data,outcomes%2C%20like%20yes%20or%20no.

In [None]:
import matplotlib.pyplot as plt
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
import seaborn as sns
from sklearn import metrics

print("Libs Loaded")

https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_digits.html
Each datapoint is an 8x8 image of a digit.
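
As a quick check of that layout, the sketch below compares the flattened `data` array with the `images` array that `load_digits` also returns. The variable names (`digits_demo`, `flat`, `images`) are chosen here for illustration only.

```python
import numpy as np
from sklearn.datasets import load_digits

digits_demo = load_digits()
flat = digits_demo.data      # one row of 64 pixel values per image
images = digits_demo.images  # the same pixels arranged as 8x8 grids

print("Flat shape:", flat.shape)
print("Image shape:", images.shape)

# Reshaping each 64-value row to 8x8 should recover the image array.
same = np.array_equal(flat.reshape(-1, 8, 8), images)
print("Reshaped data equals images:", same)
```

This is why the plotting cells can call `np.reshape(image, (8,8))` on a row of `X` before passing it to `plt.imshow`.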

https://scikit-learn.org/stable/auto_examples/classification/plot_digits_classification.html

In [None]:
from sklearn.datasets import load_digits
digits = load_digits()

In [None]:
print(digits.target)

In [None]:
print("Image Data Shape" , digits.data.shape)
print("Label Data Shape", digits.target.shape)

In [None]:
X = digits.data
y = digits.target

In [None]:
X

In [None]:
y

In [None]:
plt.figure(figsize=(20,10))
# https://note.nkmk.me/en/python-for-enumerate-zip/
for i, (image, label) in enumerate(zip(X[:10], y[:10])):  # first 10 images and their labels
    plt.subplot(3, 6, i + 1)
    plt.title(f'(Id, Num) = ({i}, {label})', fontsize=17)
    plt.imshow(np.reshape(image, (8, 8)), cmap="plasma")

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

In [None]:
print(X_test)

In [None]:
# logisticReg = LogisticRegression()
# logisticReg.fit(X_train, y_train)

Regularization is a technique used to prevent overfitting and improve the generalization ability of a model. It does this by adding a penalty term to the objective function (also called the cost function) that the model is trying to minimize. The regularization term is usually a function of the model parameters (also called weights) and it is added to the objective function to penalize large weights. This helps to reduce the complexity of the model and prevent overfitting.
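
A minimal sketch of this effect, assuming scikit-learn's `LogisticRegression`, where the L2 penalty is the default and `C` is the inverse of the regularization strength (smaller `C` means a stronger penalty). The values of `C` and `max_iter` below are illustrative choices, not settings from this notebook.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X_demo, y_demo = load_digits(return_X_y=True)

weight_norms = {}
for C in (0.01, 1.0):
    # Smaller C = stronger penalty on large weights.
    model = make_pipeline(StandardScaler(), LogisticRegression(C=C, max_iter=1000))
    model.fit(X_demo, y_demo)
    coef = model.named_steps["logisticregression"].coef_
    weight_norms[C] = float(np.linalg.norm(coef))

for C, norm in weight_norms.items():
    print(f"C={C}: weight norm = {norm:.2f}")
```

The model fitted with the stronger penalty should end up with a noticeably smaller weight norm.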

Standardization is a technique used to transform the values of a feature to a standard scale. It is often used when the features have different scales and units, as this can affect the performance of some machine learning algorithms. Standardization is usually done by subtracting the mean of the feature from each value and dividing by the standard deviation. The transformed feature has a mean of 0 and a standard deviation of 1; the shape of its distribution is unchanged, so it only becomes a standard normal distribution if the feature was normally distributed to begin with.
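
The formula can be checked by hand against `StandardScaler`. The small array below is made up for illustration; the manual version uses the population standard deviation (NumPy's default), which is also what `StandardScaler` uses.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Two features on very different scales (made-up values).
X_raw = np.array([[1.0, 1000.0],
                  [2.0, 2000.0],
                  [3.0, 3000.0],
                  [4.0, 4000.0]])

# Manual standardization: subtract the column mean, divide by the column std.
manual = (X_raw - X_raw.mean(axis=0)) / X_raw.std(axis=0)

# The same transformation done by scikit-learn.
scaled = StandardScaler().fit_transform(X_raw)

print("Matches manual result:", np.allclose(manual, scaled))
print("Column means:", scaled.mean(axis=0))
print("Column stds:", scaled.std(axis=0))
```

After scaling, both columns are on the same scale even though the raw values differ by a factor of 1000.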

In [None]:
# https://www.digitalocean.com/community/tutorials/standardscaler-function-in-python
logisticReg = make_pipeline(StandardScaler(), LogisticRegression(fit_intercept=True, penalty='l2'))
logisticReg.fit(X_train, y_train)

In [None]:
predictions = logisticReg.predict(X_test) #[:10]
"OK"

Evaluating performance

In [None]:
score = logisticReg.score(X_test, y_test)
score
# 97%

In [None]:
print(
    f"Classification report for classifier {logisticReg}:\n"
    f"{metrics.classification_report(y_test, predictions)}\n"
)

A confusion matrix is a table that is used to evaluate the performance of a classification model. It is a summary of the model's prediction results on a set of test data for which the true values are known.
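
A small worked example may help with reading the table. The labels below are made up; in scikit-learn's `confusion_matrix`, rows correspond to the true class and columns to the predicted class, so correct predictions sit on the diagonal.

```python
from sklearn.metrics import confusion_matrix

# Made-up true labels and predictions for a 3-class problem.
y_true_toy = [0, 0, 1, 1, 2, 2]
y_pred_toy = [0, 1, 1, 1, 2, 0]

cm = confusion_matrix(y_true_toy, y_pred_toy)
print(cm)
# [[1 1 0]   true 0: one predicted as 0, one as 1
#  [0 2 0]   true 1: both predicted as 1
#  [1 0 1]]  true 2: one predicted as 0, one as 2

# Accuracy is the share of samples on the diagonal.
accuracy = cm.trace() / cm.sum()
print("Accuracy:", accuracy)
```

Off-diagonal cells show which classes get confused with which, something a single accuracy score does not reveal.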

In [None]:
conf_mat = metrics.confusion_matrix(y_test, predictions)
conf_mat

In [None]:
plt.figure(figsize=(10, 10))
sns.heatmap(conf_mat, annot=True, square=True, linewidths=.8, cmap='inferno', cbar=False)
plt.title(f'SCORE = {score}', size=30, loc='center')
plt.ylabel('ACTUAL',size=15)
plt.xlabel('PREDICTED', size=15)

In [None]:
# Collect the test-set positions where the prediction differs from the true label.
wrong_indexes = [
    _id
    for _id, (predict, actual) in enumerate(zip(predictions, y_test))
    if predict != actual
]

In [None]:
wrong_indexes
# [52, 71, 133, 149, 159, 222, 234, 239, 244, 339]

In [None]:
y_test

In [None]:
predictions

In [None]:
plt.figure(figsize=(20,10))
for _id, wrong in enumerate(wrong_indexes[:10]):
    plt.subplot(3, 6, _id + 1)
    plt.title(f'(pred, actual) = ({predictions[wrong]}, {y_test[wrong]})', fontsize=20)
    plt.imshow(np.reshape(X_test[wrong], (8, 8)), cmap='plasma')