In [22]:
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import classification_report, confusion_matrix

In [4]:
dataset = load_breast_cancer()

In [6]:
dataset.feature_names

array(['mean radius', 'mean texture', 'mean perimeter', 'mean area',
       'mean smoothness', 'mean compactness', 'mean concavity',
       'mean concave points', 'mean symmetry', 'mean fractal dimension',
       'radius error', 'texture error', 'perimeter error', 'area error',
       'smoothness error', 'compactness error', 'concavity error',
       'concave points error', 'symmetry error',
       'fractal dimension error', 'worst radius', 'worst texture',
       'worst perimeter', 'worst area', 'worst smoothness',
       'worst compactness', 'worst concavity', 'worst concave points',
       'worst symmetry', 'worst fractal dimension'], dtype='<U23')

In [10]:
dataset.target_names

array(['malignant', 'benign'], dtype='<U9')
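
The order of `target_names` fixes the meaning of the integer labels: position 0 is the name for label 0, position 1 the name for label 1. A minimal check, assuming the same `load_breast_cancer()` dataset as above (the names `label_names` and `class_counts` are illustrative, not part of the notebook):

```python
import numpy as np
from sklearn.datasets import load_breast_cancer

dataset = load_breast_cancer()

# Position in target_names equals the integer label value.
label_names = {int(i): str(name) for i, name in enumerate(dataset.target_names)}
print(label_names)  # {0: 'malignant', 1: 'benign'}

# Count how many samples carry each label.
class_counts = np.bincount(dataset.target)
for label, count in enumerate(class_counts):
    print(f"{label} ({label_names[label]}): {count} samples")
```

So in every report that follows, class 0 refers to malignant tumors and class 1 to benign ones.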

In [12]:
features = dataset.data
labels = dataset.target

In [14]:
x_train, x_test, y_train, y_test = train_test_split(features, labels, test_size=0.2, random_state=3)
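
The two classes are not equally frequent, so a purely random split can shift the class balance slightly between the training and test sets. One common option, which this notebook does not use, is to pass `stratify=labels` so both subsets keep the original proportions. A sketch under that assumption (the `_s` suffixed names are illustrative):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

dataset = load_breast_cancer()
features, labels = dataset.data, dataset.target

# Same split settings as the notebook, plus stratification on the labels.
x_train_s, x_test_s, y_train_s, y_test_s = train_test_split(
    features, labels, test_size=0.2, random_state=3, stratify=labels
)

print(x_train_s.shape, x_test_s.shape)
# Fraction of class 1 in each subset; with stratify these should be close.
print(y_train_s.mean(), y_test_s.mean())
```

A stratified split produces different test samples than the split above, so scores from the two are not directly comparable.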

In [26]:
model = LogisticRegression(max_iter=500)
model.fit(x_train, y_train)

STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(


In [28]:
ypred = model.predict(x_test)

In [30]:
print(classification_report(y_test, ypred))

              precision    recall  f1-score   support

           0       0.90      0.88      0.89        40
           1       0.93      0.95      0.94        74

    accuracy                           0.92       114
   macro avg       0.92      0.91      0.91       114
weighted avg       0.92      0.92      0.92       114



In [32]:
print(confusion_matrix(y_test, ypred))

[[35  5]
 [ 4 70]]


## Classification report analysis

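The model predicts class 1 (benign) especially well, with precision 0.93 and recall 0.95. Class 0 (malignant) scores slightly lower, at 0.90 precision and 0.88 recall, and that is the class that matters most when deciding whether a tumor is malignant or benign.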

Overall accuracy is 92% (105 of 114 test samples classified correctly), which is quite strong.

In this dataset, `target_names` is `['malignant', 'benign']`, so class 0 represents "malignant" (cancer) and class 1 represents "benign" (no cancer). The metrics for the malignant class are therefore those in row 0 of the report:

Precision (malignant = 0.90): Of all tumors the model predicted as malignant, 90% truly are malignant.

Real-world example: If the model flags 100 patients as having cancer, about 90 actually do, and about 10 are false alarms.

Recall (malignant = 0.88): Of all truly malignant tumors, 88% were correctly detected ("caught") by the model.

Real-world example: About 12 of every 100 patients with cancer would be missed, which is the costliest kind of error in screening.

F1-Score (malignant = 0.89): The harmonic mean of precision and recall, so a high value requires both few missed cases (false negatives) and few false alarms (false positives).

The benign class (class 1) scores higher on all three: precision 0.93, recall 0.95, F1 0.94.
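
The figures in the classification report can be reproduced by hand from the confusion matrix counts printed in the notebook output. The sketch below does that arithmetic for both classes; the helper `per_class_metrics` is a hypothetical name written for this example, not a scikit-learn function:

```python
# Counts copied from the confusion matrix output:
# rows are true labels, columns are predicted labels.
cm = [[35, 5],
      [4, 70]]

def per_class_metrics(cm, k):
    """Precision, recall and F1 for class k of a square confusion matrix."""
    correct = cm[k][k]
    predicted_as_k = sum(row[k] for row in cm)  # column sum
    actually_k = sum(cm[k])                     # row sum
    precision = correct / predicted_as_k
    recall = correct / actually_k
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

for k in (0, 1):
    p, r, f = per_class_metrics(cm, k)
    print(f"class {k}: precision={p:.3f} recall={r:.3f} f1={f:.3f}")
# class 0: precision=0.897 recall=0.875 f1=0.886
# class 1: precision=0.933 recall=0.946 f1=0.940
```

Rounded to two decimals, these match the rows of the report.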

## Confusion Matrix analysis
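Rows are true labels and columns are predicted labels, in the order class 0 (malignant), class 1 (benign). Reading class 1 as the positive class, the four cells are:

35 True Negatives (class 0 correctly predicted)

5 False Positives (class 0 predicted as 1)

4 False Negatives (class 1 predicted as 0)

70 True Positives (class 1 correctly predicted)

Because class 0 is malignant and class 1 is benign in this dataset, "positive" here means benign, so the medical reading is:

True Negative (35): 35 malignant tumors were correctly identified as malignant.

False Positive (5): 5 malignant tumors were wrongly predicted as benign. These are missed cancers (very serious).

False Negative (4): 4 benign tumors were wrongly predicted as malignant (leading to unnecessary stress and tests).

True Positive (70): 70 benign tumors were correctly identified as benign.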
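
For a binary problem, the four cells can be unpacked directly from the matrix. The sketch below rebuilds the matrix from the printed output instead of rerunning the model, and relies on scikit-learn's layout: rows are true labels, columns are predicted labels, both in sorted label order. The variable names are illustrative:

```python
import numpy as np

# Matrix copied from the notebook output (rows: true 0, 1; columns: predicted 0, 1).
cm = np.array([[35, 5],
               [4, 70]])

# ravel() flattens row by row, which gives this order
# when class 1 is treated as the positive class.
tn, fp, fn, tp = cm.ravel()

accuracy = (tn + tp) / cm.sum()
# Share of true class-0 samples that were predicted as class 1.
class0_error_rate = fp / (tn + fp)
# Share of true class-1 samples that were predicted as class 0.
class1_error_rate = fn / (fn + tp)

print(f"accuracy={accuracy:.3f}")                      # accuracy=0.921
print(f"class 0 misclassified: {class0_error_rate:.3f}")  # 0.125
print(f"class 1 misclassified: {class1_error_rate:.3f}")  # 0.054
```

The per-class error rates are worth reporting alongside accuracy, since a single accuracy number hides which class the mistakes fall on.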