
**Author**: Aaryan Samanta

**Organization**: Legend College Preparatory

**Date**: 2025

**Title**: Breast Cancer Dataset - Supervised Learning

**Version**: 1.0

**Type**: Source Code

**Adaptation details**: Based on classroom exercises

**Description**: Understand how a logistic regression model works and how to evaluate its performance on a real dataset.



---


Developed as part of the AI Internship at Legend College Preparatory.
Please note that copying or using this code without proper attribution and credit is a violation of school policy.
Omitting attribution can constitute plagiarism, even for small code snippets.
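
As background for the notebook cell below, here is a minimal sketch of what a fitted binary logistic regression model computes when it predicts: a weighted sum of the features plus an intercept, passed through the sigmoid function and thresholded at 0.5. The helper `sigmoid` and the variable names `manual_proba` and `manual_pred` are illustrative and not part of the original notebook. For brevity the sketch fits on the full dataset, whereas the notebook uses a train/test split.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression


def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


# Fit on the full dataset (an assumption made only to keep the sketch short)
X, y = load_breast_cancer(return_X_y=True)
model = LogisticRegression(max_iter=10000).fit(X, y)

# Linear score for the first five rows: w . x + b
z = X[:5] @ model.coef_[0] + model.intercept_[0]

# Probability of class 1 ("benign" in this dataset), computed by hand
manual_proba = sigmoid(z)

# The same probability as reported by scikit-learn
library_proba = model.predict_proba(X[:5])[:, 1]

# Predicted label: class 1 when the probability is at least 0.5
manual_pred = (manual_proba >= 0.5).astype(int)

print("Manual probabilities: ", np.round(manual_proba, 4))
print("Library probabilities:", np.round(library_proba, 4))
print("Manual predictions:   ", manual_pred)
```

If the two probability rows agree, `model.predict` is applying nothing more than a threshold to this sigmoid output, which is what makes the learned coefficients directly interpretable.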

In [None]:
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report

# 1. Load the Breast Cancer dataset
data = load_breast_cancer()

# 2. Convert to DataFrame and preview it
df = pd.DataFrame(data.data, columns=data.feature_names)
df['target'] = data.target
df['target_name'] = df['target'].apply(lambda x: data.target_names[x])

# 👉 Display the first 5 rows of the dataset
print("📊 Breast Cancer Dataset Preview:")
print(df.head())

# 3. Prepare features and labels
X = data.data
y = data.target

# 4. Split the dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# 5. Initialize and train the Logistic Regression model
model = LogisticRegression(max_iter=10000)
model.fit(X_train, y_train)

# 6. Make predictions on the test data
y_pred = model.predict(X_test)

# 7. Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print(f"\n✅ Model Accuracy: {accuracy * 100:.2f}%")
print("\n📄 Classification Report:")
print(classification_report(y_test, y_pred, target_names=data.target_names))

# 8. Bonus: Predict on a single sample
sample = [X[0]]  # First row of the dataset as a stand-in; it may have been seen during training
predicted_class = model.predict(sample)
print(f"\n🔍 Predicted class for the new sample: {data.target_names[predicted_class[0]]}")



📊 Breast Cancer Dataset Preview:
   mean radius  mean texture  mean perimeter  mean area  mean smoothness  \
0        17.99         10.38          122.80     1001.0          0.11840   
1        20.57         17.77          132.90     1326.0          0.08474   
2        19.69         21.25          130.00     1203.0          0.10960   
3        11.42         20.38           77.58      386.1          0.14250   
4        20.29         14.34          135.10     1297.0          0.10030   

   mean compactness  mean concavity  mean concave points  mean symmetry  \
0           0.27760          0.3001              0.14710         0.2419   
1           0.07864          0.0869              0.07017         0.1812   
2           0.15990          0.1974              0.12790         0.2069   
3           0.28390          0.2414              0.10520         0.2597   
4           0.13280          0.1980              0.10430         0.1809   

   mean fractal dimension  ...  worst perimeter  worst area