🔹 Section 2: Logistic Regression
🎯 Goal:
Learn how to classify data into categories (e.g., yes/no, true/false) using logistic regression, a key model for binary classification problems.
✅ What You’ll Learn:
- What logistic regression is and how it differs from linear regression
- How to turn categorical variables into numerical labels (label encoding)
- How to train and test a logistic regression model using scikit-learn
- How to evaluate classification performance using:
  - Accuracy
  - Confusion matrix
  - Classification report (precision, recall, F1-score)
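
The first item above can be made concrete with a small sketch. Linear regression outputs an unbounded number; logistic regression passes the same kind of weighted sum through the sigmoid function, so the result lies between 0 and 1 and can be read as a probability. The weights, bias, and feature values below are made-up numbers for illustration, not values fitted to the tips data.

```python
import math

def sigmoid(z):
    """Map any real number to the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical weights and bias, chosen only for illustration.
weights = [0.05, -0.30, 0.10]   # one weight per feature
bias = -0.5
features = [20.0, 3.0, 2]       # e.g. total_bill, tip, size

# Linear part: the same weighted sum a linear regression would output.
z = bias + sum(w * x for w, x in zip(weights, features))

# Logistic part: squash the sum into a probability, then threshold at 0.5.
probability = sigmoid(z)
predicted_class = 1 if probability >= 0.5 else 0

print(f"linear score z  = {z:.2f}")
print(f"probability     = {probability:.3f}")
print(f"predicted class = {predicted_class}")
```

A negative score gives a probability below 0.5 and therefore class 0; a positive score gives class 1.
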
📌 Real-World Scenario:
You’ll use the tips dataset to predict whether someone is a smoker based on three numeric features: total_bill, tip, and size.

✅ Step 1: Load and explore the dataset

In [2]:
import pandas as pd

# Load dataset
df = pd.read_csv('https://raw.githubusercontent.com/mwaskom/seaborn-data/master/tips.csv')
df.head()


   total_bill   tip     sex smoker  day    time  size
0       16.99  1.01  Female     No  Sun  Dinner     2
1       10.34  1.66    Male     No  Sun  Dinner     3
2       21.01  3.50    Male     No  Sun  Dinner     3
3       23.68  3.31    Male     No  Sun  Dinner     2
4       24.59  3.61  Female     No  Sun  Dinner     4


✅ Step 2: Convert categorical target to numeric


In [3]:
from sklearn.preprocessing import LabelEncoder

# Encode 'smoker' column: Yes = 1, No = 0
le = LabelEncoder()
df['smoker_encoded'] = le.fit_transform(df['smoker'])
df[['smoker', 'smoker_encoded']].head()


  smoker  smoker_encoded
0     No               0
1     No               0
2     No               0
3     No               0
4     No               0


🔎 Output: Table showing smoker values mapped to 0 (No) and 1 (Yes). The first five rows are all non-smokers, so only 0 appears here.
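
To confirm which label received which integer, the fitted encoder can be inspected directly. LabelEncoder sorts the class names, so 'No' becomes 0 and 'Yes' becomes 1. A minimal sketch, with a small hand-written list standing in for the smoker column (the values and variable names are illustrative):

```python
from sklearn.preprocessing import LabelEncoder

# Hand-written stand-in for df['smoker'] (illustrative values only).
smoker_values = ["No", "Yes", "No", "No", "Yes"]

encoder = LabelEncoder()
encoded = encoder.fit_transform(smoker_values)

# classes_ holds the sorted labels; a label's position is its integer code.
mapping = {label: int(code) for code, label in enumerate(encoder.classes_)}
print(mapping)

# inverse_transform turns integer codes back into the original labels.
decoded = encoder.inverse_transform(encoded)
print(list(decoded))
```
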

✅ Step 3: Select features and target

In [4]:
X = df[['total_bill', 'tip', 'size']]  # Features
y = df['smoker_encoded']              # Target

🎯 You’ve now separated your input features (X) from your target (y).

✅ Step 4: Split into training and test sets

In [5]:
from sklearn.model_selection import train_test_split

# Split data 80/20
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

📦 Output: Training and test sets created.
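
When the two classes are not equally common, a plain random split can leave the test set with a different class mix than the training set. One option, not used in this section, is the stratify argument of train_test_split, which keeps the class proportions the same on both sides. The sketch below uses synthetic labels; the 60/40 mix and the variable names are assumptions for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic data: 100 rows, 60 of class 0 and 40 of class 1 (illustrative).
X_demo = np.arange(100).reshape(-1, 1)
y_demo = np.array([0] * 60 + [1] * 40)

# stratify=y_demo keeps the 60/40 class ratio in both subsets.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

print("train class counts:", np.bincount(y_tr))
print("test class counts: ", np.bincount(y_te))
```
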

✅ Step 5: Train the logistic regression model


In [6]:
from sklearn.linear_model import LogisticRegression

model = LogisticRegression()
model.fit(X_train, y_train)

⚙️ Output: Model is trained.
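
After fitting, the learned weights are stored in the model's coef_ and intercept_ attributes. The sketch below fits a model on a tiny synthetic dataset (an assumption for illustration, not the tips data) and checks that the predicted probability is the sigmoid of the intercept plus the weighted feature sum.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Tiny synthetic dataset (illustrative): one feature, larger values lean to class 1.
X_demo = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]])
y_demo = np.array([0, 0, 1, 0, 1, 1])

clf = LogisticRegression()
clf.fit(X_demo, y_demo)

# One weight per feature, plus a single intercept for a binary problem.
print("coef_:", clf.coef_)
print("intercept_:", clf.intercept_)

# Recompute P(class 1) by hand: sigmoid(intercept + weights . x).
z = X_demo @ clf.coef_.ravel() + clf.intercept_[0]
manual_proba = 1.0 / (1.0 + np.exp(-z))
print(np.allclose(manual_proba, clf.predict_proba(X_demo)[:, 1]))
```
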

✅ Step 6: Make predictions

In [7]:
y_pred = model.predict(X_test)
y_pred[:10]  # Show first 10 predictions

array([0, 0, 0, 0, 0, 0, 1, 0, 0, 0])

🔍 Output: Predicted smoker (0 = No, 1 = Yes).
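
predict returns hard 0/1 labels. The underlying probabilities are available through predict_proba, and for a binary problem the hard label corresponds to cutting the class 1 probability at 0.5. A sketch on synthetic data (the make_classification dataset and variable names are assumptions, not the tips data):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic binary problem standing in for the tips features (illustrative).
X_demo, y_demo = make_classification(
    n_samples=200, n_features=3, n_informative=2, n_redundant=0, random_state=0
)

clf = LogisticRegression()
clf.fit(X_demo, y_demo)

# Column 0 is P(class 0), column 1 is P(class 1); each row sums to 1.
proba = clf.predict_proba(X_demo[:5])
print(proba.round(3))

# Thresholding P(class 1) at 0.5 reproduces what predict() returns.
manual_labels = (proba[:, 1] > 0.5).astype(int)
print(manual_labels, clf.predict(X_demo[:5]))
```

Looking at probabilities rather than labels shows how confident the model is in each prediction, which the 0/1 array alone hides.
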

✅ Step 7: Evaluate model performance

In [8]:
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

print("Accuracy:", accuracy_score(y_test, y_pred))
print("\nConfusion Matrix:\n", confusion_matrix(y_test, y_pred))
print("\nClassification Report:\n", classification_report(y_test, y_pred, target_names=le.classes_))


Accuracy: 0.6530612244897959

Confusion Matrix:
 [[29  2]
 [15  3]]

Classification Report:
               precision    recall  f1-score   support

          No       0.66      0.94      0.77        31
         Yes       0.60      0.17      0.26        18

    accuracy                           0.65        49
   macro avg       0.63      0.55      0.52        49
weighted avg       0.64      0.65      0.59        49



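📊 Output: Accuracy score, confusion matrix, and precision/recall/F1. Accuracy is about 65%, but recall for the Yes class is only 0.17: the model identifies 3 of the 18 smokers in the test set.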
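
The numbers in the classification report can be recomputed by hand from the confusion matrix, which helps show what each metric measures. In scikit-learn's layout, rows are actual classes and columns are predicted classes, so the matrix printed above reads as 29 true negatives, 2 false positives, 15 false negatives, and 3 true positives (treating Yes as the positive class). The sketch below uses only those four counts:

```python
# Counts copied from the confusion matrix printed above.
tn, fp = 29, 2    # actual No:  predicted No, predicted Yes
fn, tp = 15, 3    # actual Yes: predicted No, predicted Yes

total = tn + fp + fn + tp
accuracy = (tp + tn) / total
precision_yes = tp / (tp + fp)   # of predicted smokers, how many were right
recall_yes = tp / (tp + fn)      # of actual smokers, how many were found
f1_yes = 2 * precision_yes * recall_yes / (precision_yes + recall_yes)

# Baseline: always predict the majority class (No).
baseline_accuracy = (tn + fp) / total

print(f"accuracy  = {accuracy:.2f}")
print(f"precision = {precision_yes:.2f}, recall = {recall_yes:.2f}, f1 = {f1_yes:.2f}")
print(f"majority-class baseline accuracy = {baseline_accuracy:.2f}")
```

The baseline line is worth comparing against the model's accuracy: always predicting No would already score about 63% on this test set, so accuracy alone overstates how useful the model is.
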