✅ Concept: Logistic Regression
(Logistic regression is a foundational model for binary classification. Very important.)

🧩 What is Logistic Regression?
- Despite its name, it's used for classification, not regression.
- It models the probability that a data point belongs to a particular class (like spam or not spam).

🔍 Why it matters:
- Fast, interpretable, works well on linearly separable data.
- Foundation for neural nets (sigmoid activation!).
- Interviewers often test your understanding of this model.

✅ 2. What is Linearly Separable Data?
Data is linearly separable when a straight line (in 2D), or a plane/hyperplane (in higher dimensions), can split the classes with every point on its own class's side.

2D Example:
If you have this:

Class 0: (1, 2), (2, 3), (3, 4)
Class 1: (6, 7), (7, 8), (8, 9)
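A straight line like x + y = 10 can perfectly separate them: every Class 0 point has x + y < 10 (sums 3, 5, 7), and every Class 1 point has x + y > 10 (sums 13, 15, 17).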
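As a quick check on the example above, this sketch confirms that the hand-picked line x + y = 10 classifies all six points correctly, then lets scikit-learn's `LogisticRegression` learn its own boundary from the same points. Only the points listed above are used.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# The six points from the 2D example above
X = np.array([[1, 2], [2, 3], [3, 4],   # Class 0
              [6, 7], [7, 8], [8, 9]])  # Class 1
y = np.array([0, 0, 0, 1, 1, 1])

# Hand-picked separating line: predict Class 1 when x + y > 10
line_preds = (X.sum(axis=1) > 10).astype(int)
print("x + y = 10 separates perfectly:", np.array_equal(line_preds, y))

# Logistic regression learns its own line w1*x + w2*y + b = 0
model = LogisticRegression()
model.fit(X, y)
print("Learned weights:", model.coef_, "bias:", model.intercept_)
print("Training accuracy:", model.score(X, y))
```

The learned line need not be x + y = 10 exactly; any line with all Class 0 points on one side and all Class 1 points on the other gives the same labels.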

✅ Logistic Regression works best when such a straight line exists to separate classes.

If the data isn't linearly separable (e.g., spiral patterns), we use models such as SVMs with a non-linear kernel, or neural networks.
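To see this limitation concretely, here is a minimal sketch using scikit-learn's `make_circles` (two concentric rings, used here as a simpler stand-in for spirals). It compares the training accuracy of a linear logistic regression with an RBF-kernel SVM; the `noise`, `factor`, and `random_state` values are arbitrary assumptions, so exact numbers will vary with them.

```python
from sklearn.datasets import make_circles
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

# Two concentric rings: no straight line can separate inner from outer
X, y = make_circles(n_samples=200, noise=0.05, factor=0.5, random_state=0)

linear = LogisticRegression().fit(X, y)   # straight-line boundary
kernel_svm = SVC(kernel="rbf").fit(X, y)  # curved boundary via RBF kernel

lr_acc = linear.score(X, y)
svm_acc = kernel_svm.score(X, y)
print(f"Logistic regression (linear boundary): {lr_acc:.2f}")
print(f"SVM with RBF kernel (curved boundary): {svm_acc:.2f}")
```

The linear model stays far from perfect on this data, while the kernel SVM can draw a closed boundary around the inner ring.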

📈 The Idea:
Instead of predicting an unbounded number as linear regression does, logistic regression predicts a probability between 0 and 1 by passing a linear score through the sigmoid function:

$$\sigma(z) = \frac{1}{1 + e^{-z}}$$
where $z = w^\top x + b$ (a weighted sum of the features $x$ plus a bias $b$).
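A minimal NumPy sketch of the formula: compute the linear score z for one sample, then squash it with the sigmoid. The weights `w`, bias `b`, and sample `x` are made-up values for illustration, not learned parameters.

```python
import numpy as np

def sigmoid(z):
    """Map any real-valued score z to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical weights, bias, and sample for a 2-feature model (illustration only)
w = np.array([0.8, -0.4])
b = -0.2
x = np.array([1.0, 2.0])

z = w @ x + b   # linear score: 0.8*1 - 0.4*2 - 0.2 = -0.2
p = sigmoid(z)  # probability of class 1
print(f"z = {z:.2f}, P(y=1 | x) = {p:.3f}")

# Large negative scores approach 0, zero maps to 0.5, large positive approach 1
print(sigmoid(np.array([-5.0, 0.0, 5.0])))
```

Since σ(0) = 0.5, the default 0.5 probability threshold is the same as asking whether z is positive.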

In [1]:
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# Load data
iris = load_iris()
X = iris.data  # features
y = (iris.target == 0).astype(int)  # classify only "setosa" vs others (binary classification)

# Split into train and test
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Step 1: Model
model = LogisticRegression()

# Step 2: Train
model.fit(X_train, y_train)

# Step 3: Predict
preds = model.predict(X_test)

# Step 4: Evaluate
print(classification_report(y_test, preds))


              precision    recall  f1-score   support

           0       1.00      1.00      1.00        20
           1       1.00      1.00      1.00        10

    accuracy                           1.00        30
   macro avg       1.00      1.00      1.00        30
weighted avg       1.00      1.00      1.00        30



🧪 What’s happening behind the scenes:

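- Fits weights w and bias b by maximizing the likelihood of the training labels (equivalently, minimizing log loss); scikit-learn also adds an L2 penalty by default.
- Applies the sigmoid to each sample's linear score z to get a probability.
- Thresholds that probability (default 0.5) to produce the final class label (0 or 1).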
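The steps above can be reproduced by hand from a fitted model. This sketch refits the iris model from the earlier cell, rebuilds the probabilities from `coef_` and `intercept_` with the sigmoid, and checks them against scikit-learn's own `predict_proba` and `predict`.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Same setup as the iris cell above: setosa (1) vs. everything else (0)
iris = load_iris()
X = iris.data
y = (iris.target == 0).astype(int)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = LogisticRegression().fit(X_train, y_train)

# 1) Linear score z = w·x + b from the fitted weights and bias
z = X_test @ model.coef_.ravel() + model.intercept_[0]

# 2) Sigmoid turns each score into P(class 1)
manual_proba = 1.0 / (1.0 + np.exp(-z))

# 3) Threshold at 0.5 to get a hard label
manual_labels = (manual_proba > 0.5).astype(int)

print("Probabilities match:", np.allclose(manual_proba, model.predict_proba(X_test)[:, 1]))
print("Labels match:", np.array_equal(manual_labels, model.predict(X_test)))
```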

In [2]:
# 🧪 Mini Project: Email Spam Classifier
# 🎯 Goal: Use logistic regression to classify messages as spam or not spam
# (the dataset below contains SMS texts, but the same approach applies to emails).

# Step 1: Load Dataset
# The SMS Spam Collection: a tab-separated file with a label and a message per line.
import pandas as pd

df = pd.read_csv("https://raw.githubusercontent.com/justmarkham/pycon-2016-tutorial/master/data/sms.tsv", sep="\t", header=None, names=['label', 'message'])
print(df.head())


  label                                            message
0   ham  Go until jurong point, crazy.. Available only ...
1   ham                      Ok lar... Joking wif u oni...
2  spam  Free entry in 2 a wkly comp to win FA Cup fina...
3   ham  U dun say so early hor... U c already then say...
4   ham  Nah I don't think he goes to usf, he lives aro...


In [3]:
# Step 2: Preprocess
# Label-encode: spam → 1, ham → 0
# Convert text to numeric features with TfidfVectorizer (fit on the training text only)
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split

df['label'] = df['label'].map({'ham': 0, 'spam': 1})
X_train, X_test, y_train, y_test = train_test_split(df['message'], df['label'], test_size=0.2, random_state=42)

vectorizer = TfidfVectorizer()
X_train_vec = vectorizer.fit_transform(X_train)
X_test_vec = vectorizer.transform(X_test)


In [4]:
# Step 3: Train Logistic Regression
from sklearn.linear_model import LogisticRegression

model = LogisticRegression()
model.fit(X_train_vec, y_train)


Parameter           Value
penalty             'l2'
dual                False
tol                 0.0001
C                   1.0
fit_intercept       True
intercept_scaling   1
class_weight        None
random_state        None
solver              'lbfgs'
max_iter            100


In [5]:
# Step 4: Evaluate
from sklearn.metrics import accuracy_score, classification_report

y_pred = model.predict(X_test_vec)
print("Accuracy:", accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred))


Accuracy: 0.9748878923766816
              precision    recall  f1-score   support

           0       0.97      1.00      0.99       966
           1       1.00      0.81      0.90       149

    accuracy                           0.97      1115
   macro avg       0.99      0.91      0.94      1115
weighted avg       0.98      0.97      0.97      1115
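The deliverables ask for a confusion matrix, which the cells above don't compute; `confusion_matrix(y_test, y_pred)` from `sklearn.metrics` would produce it. This sketch uses 10 hypothetical labels to show how to read the four cells and how spam precision and recall come out of them.

```python
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Hypothetical labels for 10 messages (1 = spam), just to show how the matrix reads
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 1, 1, 1, 0]   # one spam message slips through as ham

# Rows = actual class, columns = predicted class: [[TN, FP], [FN, TP]]
cm = confusion_matrix(y_true, y_pred)
tn, fp, fn, tp = cm.ravel()
print(cm)
print("precision(spam) = tp / (tp + fp) =", tp / (tp + fp))
print("recall(spam)    = tp / (tp + fn) =", tp / (tp + fn))
print("sklearn agrees:", precision_score(y_true, y_pred), recall_score(y_true, y_pred))
```

In the report above, spam precision is 1.00 but recall is 0.81, so the model's errors are spam messages labeled as ham (false negatives), not ham flagged as spam.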



✅ Deliverables:
- Trained model
- Confusion matrix / accuracy / precision
- Comment: "If the model misclassifies, what could improve it?"
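On the closing question: since most errors here are missed spam, two common knobs are lowering the decision threshold and setting `class_weight="balanced"`. The sketch below tries both on synthetic imbalanced data from `make_classification` (an assumed stand-in for the SMS set, not the real data). Other options include tuning `C` or adding word bigrams with `TfidfVectorizer(ngram_range=(1, 2))`.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

# Synthetic imbalanced data (about 13% positives), standing in for ham/spam
X, y = make_classification(n_samples=2000, n_features=20, weights=[0.87],
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# Option 1: keep the model, lower the decision threshold below the default 0.5
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
proba = model.predict_proba(X_test)[:, 1]
recall_default = recall_score(y_test, (proba > 0.5).astype(int))
recall_lower = recall_score(y_test, (proba > 0.3).astype(int))

# Option 2: reweight classes so mistakes on the rare class cost more in training
balanced = LogisticRegression(class_weight="balanced", max_iter=1000).fit(X_train, y_train)
recall_balanced = recall_score(y_test, balanced.predict(X_test))

print(f"recall @0.5: {recall_default:.2f}, @0.3: {recall_lower:.2f}, "
      f"balanced: {recall_balanced:.2f}")
```

Both options tend to catch more of the rare class at some cost in precision, so the right setting depends on whether missing spam or flagging a real message is worse.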