🔰 Step 1: Concept Summary

🧠 Despite the name, this is classification, not regression!

Logistic Regression is a supervised learning algorithm, but it is used for classification problems, especially binary classification (for example yes/no, spam/not spam, fraud/not fraud).

🧠 Linear Regression predicts a continuous value

📊 Logistic Regression predicts a discrete class (0 or 1)

🔣 Step 2: Maths + Visualization

    🧪 Equation:

As in Linear Regression, Logistic Regression first builds a linear combination of the input:

            z = w*x + b

But instead of using z directly as the prediction, it maps z to a value between 0 and 1 using the sigmoid function:

            sigmoid(z) = 1 / (1 + e^(-z))


The sigmoid output is then used to decide the class:

If the output is close to 0, predict class 0

If the output is close to 1, predict class 1
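As a minimal sketch of the two formulas above, the snippet below computes z = w*x + b and passes it through the sigmoid. The values of `w` and `b` and the three inputs are made up for illustration; they are not learned from any data.

```python
import math


def sigmoid(z):
    """Map any real number z to a value strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical weight and bias, chosen only for illustration
w, b = 0.1, -17.0

for x in (150, 170, 190):
    z = w * x + b      # linear combination, can be any real number
    p = sigmoid(z)     # squashed into (0, 1), read as P(class 1)
    print(f"x={x}  z={z:.2f}  sigmoid(z)={p:.4f}")
```

A negative z gives a value below 0.5, z = 0 gives exactly 0.5, and a positive z gives a value above 0.5.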

🔍 Visualization:

The graph of the sigmoid is "S"-shaped.
Each input maps to a probability that it belongs to class 1.





⚪ Role of the Sigmoid Function
The linear combination z can be any real number, so it cannot be read as a probability directly. To solve this, Logistic Regression applies a sigmoid function, which compresses the output into the range 0 to 1.

📉 Sigmoid Function:

f(x) = 1 / (1 + e^(-x))

This function takes the linear output as its input (the same kind of value Linear Regression would produce) and converts it into a probability.

🧪 Example:
If the model outputs 0.93, it means:

There is a 93% chance that the input belongs to class 1.

If it outputs 0.12, there is a 12% chance of class 1, so the input is likely class 0.

Decision Boundary:
Probability > 0.5 → class 1

Probability ≤ 0.5 → class 0
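The rule above can be sketched as a small helper. The function name `to_class` and its `threshold` parameter are hypothetical, introduced here only to illustrate the rule; they are not part of scikit-learn.

```python
def to_class(probability, threshold=0.5):
    """Return 1 if the probability is strictly above the threshold, else 0."""
    return 1 if probability > threshold else 0


for p in (0.93, 0.12, 0.5):
    print(p, "->", to_class(p))
```

Lowering the threshold (for example `to_class(0.4, threshold=0.3)`) makes the rule flag more inputs as class 1, which is one way the boundary can be adjusted.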





🧠 Why is it called "Regression" if it does Classification?

Because internally:

    The model still calculates a linear combination of the inputs

    It then passes that value through the sigmoid, which turns the output into a probability that is thresholded into a class
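One way to see the regression underneath, using the same symbols as the equations earlier (here p stands for the sigmoid output): solving the sigmoid for z shows that the log-odds of class 1 is exactly the linear combination.

```latex
p = \frac{1}{1 + e^{-z}}
\;\Longrightarrow\;
\frac{p}{1 - p} = e^{z}
\;\Longrightarrow\;
\log\frac{p}{1 - p} = z = w x + b
```

So the model is a linear regression on the log-odds, and the classification only appears once the probability is thresholded.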




| Concept             | Meaning                                                      |
| ------------------- | ------------------------------------------------------------ |
| Logistic Regression | Classification algorithm (not regression)                    |
| Output              | Probability (0 to 1)                                         |
| Decision boundary   | 0.5 usually (can be adjusted)                                |
| Sigmoid Function    | Compresses the input into a probability (0 to 1)             |
| Use-cases           | Spam detection, medical diagnosis, ad click prediction, etc. |


In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report


📊 accuracy_score, confusion_matrix, classification_report


✅ 1. accuracy_score(y_true, y_pred)

It tells you what fraction of all predictions were correct.

📌 Formula:

Accuracy = Correct Predictions / Total Predictions

🔥 What if you don't compute accuracy?

You will have no quick check of whether the model is working at all.

But accuracy alone is not enough: on imbalanced data a model can score high accuracy while being biased toward the majority class.
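To make the imbalanced-data point concrete, here is a small sketch with made-up labels: 95 negatives, 5 positives, and a "model" that always predicts 0. It also checks `accuracy_score` against the formula computed by hand.

```python
from sklearn.metrics import accuracy_score

# Hypothetical labels: 95 negatives and 5 positives
y_true = [0] * 95 + [1] * 5
# A "model" that always predicts the negative class
y_pred = [0] * 100

acc = accuracy_score(y_true, y_pred)
# Same formula by hand: correct predictions / total predictions
manual = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

print("accuracy_score:", acc)
print("by hand:       ", manual)
```

The accuracy comes out at 95% even though not a single positive case was caught, which is why the next two tools matter.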

✅ 2. confusion_matrix(y_true, y_pred)

In binary classification (for example spam vs non-spam, fraud vs non-fraud), there are two classes:

Class 0 → Negative class

Class 1 → Positive class

The confusion matrix is a table that counts every prediction by its actual class and its predicted class:

True Positives (TP): class 1 correctly predicted as 1

    🟢 The model predicted 1 for a sample that really was class 1, a correct prediction

    👉 Example: It was fraud (1), and the model correctly said fraud.

False Positives (FP): class 0 wrongly predicted as 1

    🔴 The model predicted 1 for a sample that was actually class 0

    👉 Example: It was not fraud (0), but the model said fraud, a false alarm.

False Negatives (FN): class 1 wrongly predicted as 0

    🔴 The model predicted 0 for a sample that was actually class 1

    👉 Example: It was fraud (1), but the model said safe, a big risk!

True Negatives (TN): class 0 correctly predicted as 0

    🟢 The model predicted 0 for a sample that really was class 0

    👉 Example: The transaction was safe (0), and the model also said safe.
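The four counts can be read straight off scikit-learn's matrix. In this sketch the labels are made up for illustration; in `confusion_matrix` the rows are the actual classes and the columns are the predicted classes, so for labels `[0, 1]` flattening the matrix gives TN, FP, FN, TP in that order.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical ground truth and predictions (1 = fraud, 0 = safe)
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]

cm = confusion_matrix(y_true, y_pred, labels=[0, 1])
tn, fp, fn, tp = cm.ravel()  # row-major order: [[TN, FP], [FN, TP]]

print(cm)
print("TN:", tn, "FP:", fp, "FN:", fn, "TP:", tp)
```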



🔥 What if you don't look at the confusion matrix?

You won't be able to see where the model is making mistakes.

You can be fooled by a misleading accuracy.



✅ 3. classification_report(y_true, y_pred)

classification_report shows how the model performs on each class.

It gives a detailed report:

Precision: of the samples predicted as 1, how many were actually 1

Recall: of the actual 1s, how many the model caught

F1-score: the harmonic mean of precision and recall, a single number that balances the two

Support: how many true samples of each class there were

🔥 What if you skip this report?

You will only be looking at a superficial accuracy figure

You will miss the per-class detail

A model that is biased toward one class, or that overfits, can go unnoticed

🔚 Summary:

🎯 Precision and Recall, two ways to think about them:

🔹 Precision (Positive Predictive Value)

Formula: Precision = TP / (TP + FP)

    🧠 Of all the times the model said "1", how often was it right?

    👉 If precision is low, the model is raising false alarms (flagging fraud where there is none).

🔹 Recall (Sensitivity / True Positive Rate)

Formula: Recall = TP / (TP + FN)

    🧠 Of all the actual 1s, how many did the model detect?

    👉 If recall is low, the model is missing important positive cases (like frauds or cancers).

🤔 The difference in one line:

    Precision: When the model said "yes", how often was it right?

    Recall: How many of the actual "yes" cases did the model catch?
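The two formulas can be checked by hand against scikit-learn. The labels below are made up for illustration, and the F1 line uses the standard harmonic-mean definition.

```python
from sklearn.metrics import f1_score, precision_score, recall_score

# Hypothetical labels: 4 actual positives, 6 actual negatives
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]

# Count the cells of the confusion matrix by hand
tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))

precision = tp / (tp + fp)   # how often a predicted "1" was right
recall = tp / (tp + fn)      # how many actual "1"s were caught
f1 = 2 * precision * recall / (precision + recall)

print("precision:", precision, precision_score(y_true, y_pred))
print("recall:   ", recall, recall_score(y_true, y_pred))
print("f1:       ", f1, f1_score(y_true, y_pred))
```

Here the model is right two times out of the three it says "1", but it catches only two of the four actual positives, so precision is higher than recall.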



In [2]:
import numpy as np

# 🔸 Input features (X): height in cm
X = np.array([150, 160, 170, 180, 190]).reshape(-1, 1)

# 🔸 Target labels (y): 0 = Short, 1 = Tall
y = np.array([0, 0, 1, 1, 1])



🔹 X = np.array([...])

This is our input. Imagine each value is a person's height in cm. We want to predict whether a person is short (0) or tall (1).

🔹 .reshape(-1, 1)

Scikit-learn expects the input features as a 2D array with one row per sample and one column per feature.

If you skip this step and pass a 1D array, fitting the model raises an error.

For example: [150, 160, 170] is 1D, but [[150], [160], [170]] is 2D.
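A quick way to see the difference is to print the shapes before and after the reshape. The variable names `heights` and `column` are only for this illustration.

```python
import numpy as np

heights = np.array([150, 160, 170])
print(heights.shape)   # (3,)  -> 1D: three values, no column axis

# -1 lets NumPy work out the number of rows; 1 means a single column
column = heights.reshape(-1, 1)
print(column.shape)    # (3, 1) -> 2D: three samples, one feature
print(column)
```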

🔹 y = np.array([...])

This is the target variable, our ground truth. The model learns from it at which heights a person is short and at which tall.

🧪 Step 3: Model Training using LogisticRegression (Create + Train the Model)


In [7]:
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

# 🔸 Step 1: Split data (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# 🔸 Step 2: Create the model
model = LogisticRegression()

# 🔸 Step 3: Train the model
model.fit(X_train, y_train)







🔹 train_test_split()

It divides the data into two parts:

X_train, y_train: the model learns from this data.

X_test, y_test: used to test the model (the model never sees this data during training).

    ✅ test_size=0.2 → 20% of the data goes to the test set (here 1 of the 5 samples)

    ✅ random_state=42 → fixes the shuffle so the split is the same on every run (for repeatability)
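Both parameters can be checked on the toy data from this notebook. The sketch below splits twice with the same `random_state` and compares the results; the names `split_a` and `split_b` are only for this illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.array([150, 160, 170, 180, 190]).reshape(-1, 1)
y = np.array([0, 0, 1, 1, 1])

# Two independent calls with the same seed
split_a = train_test_split(X, y, test_size=0.2, random_state=42)
split_b = train_test_split(X, y, test_size=0.2, random_state=42)

X_train, X_test, y_train, y_test = split_a
print("train size:", len(X_train), "test size:", len(X_test))

# Every piece (X_train, X_test, y_train, y_test) matches between the calls
same = all(np.array_equal(a, b) for a, b in zip(split_a, split_b))
print("identical splits:", same)
```

With only five samples, 20% works out to a single test point, which matters a lot for the evaluation later on.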

🔹 LogisticRegression()

This is scikit-learn's built-in classifier that implements the logistic regression algorithm.

It applies the sigmoid function to get a probability, then decides whether the output is 0 or 1.

🔹 model.fit(X_train, y_train)

This line trains the model.

The model learns the relation between height (X) and label (y), and from it a decision boundary.
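To look at what was learned, a fitted `LogisticRegression` exposes the weight as `coef_`, the bias as `intercept_`, and the sigmoid output through `predict_proba`. This is a sketch under two assumptions that differ from the cell above: it fits on all five toy samples rather than the training split, and it sets `max_iter=1000` to give the solver more room on unscaled heights.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([150, 160, 170, 180, 190]).reshape(-1, 1)
y = np.array([0, 0, 1, 1, 1])

model = LogisticRegression(max_iter=1000)  # max_iter is an assumption, see above
model.fit(X, y)

w = model.coef_[0][0]       # learned weight for height
b = model.intercept_[0]     # learned bias
boundary = -b / w           # height where w*x + b = 0, i.e. probability 0.5

proba = model.predict_proba(X)[:, 1]   # P(class 1) for each height
print("w:", w, "b:", b)
print("decision boundary (cm):", boundary)
print("P(tall):", proba)
```

Heights above `boundary` get a probability over 0.5 and are predicted tall; heights below it are predicted short.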

🔹 Step 4: Model Evaluation (Dummy Dataset)

In [8]:
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

# Make predictions on the test set
y_pred = model.predict(X_test)

# Accuracy
print("Accuracy Score:", accuracy_score(y_test, y_pred))

# Confusion Matrix
print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred))

# Classification Report
print("Classification Report:\n", classification_report(y_test, y_pred))


Accuracy Score: 0.0
Confusion Matrix:
 [[0 1]
 [0 0]]
Classification Report:
               precision    recall  f1-score   support

           0       0.00      0.00      0.00       1.0
           1       0.00      0.00      0.00       0.0

    accuracy                           0.00       1.0
   macro avg       0.00      0.00      0.00       1.0
weighted avg       0.00      0.00      0.00       1.0





Output Explanation:



📈 Classification Report Breakdown

For Class 0:

| Metric    | Value | Meaning                                                           |
| --------- | ----- | ----------------------------------------------------------------- |
| Precision | 0.00  | Nothing was predicted as 0, so precision is undefined; shown as 0 |
| Recall    | 0.00  | The one actual 0 was not caught (it was predicted as 1)           |
| F1-score  | 0.00  | With precision and recall both 0, F1 is also 0                    |
| Support   | 1.0   | There was 1 class 0 sample in the test set                        |

For Class 1:

| Metric    | Value | Meaning                                                               |
| --------- | ----- | --------------------------------------------------------------------- |
| Precision | 0.00  | 1 was predicted once, but that sample was not actually class 1        |
| Recall    | 0.00  | No class 1 sample in the test set, so recall is undefined; shown as 0 |
| F1-score  | 0.00  | Not computable, so reported as 0                                      |
| Support   | 0.0   | There were 0 class 1 samples in the test set                          |
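The same numbers can be reproduced in isolation. This sketch uses the situation read off the confusion matrix in the output (one test sample, actual class 0, predicted class 1) and asks for the report as a dictionary. It assumes a scikit-learn version that supports the `zero_division` parameter, which sets undefined metrics to 0 explicitly.

```python
from sklearn.metrics import classification_report

# One test sample: actual class 0, predicted class 1
y_test = [0]
y_pred = [1]

report = classification_report(
    y_test,
    y_pred,
    labels=[0, 1],
    output_dict=True,   # return a dict instead of a formatted string
    zero_division=0,    # report undefined metrics (0/0) as 0
)

for label in ("0", "1"):
    print(label, report[label])
```

Class 1 has support 0, so its recall is a 0/0 case, and class 0 is never predicted, so its precision is a 0/0 case too.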

📉 Macro Avg vs Weighted Avg:

| Metric Type  | Meaning                                                              |
| ------------ | -------------------------------------------------------------------- |
| Macro Avg    | Simple average over both classes: (0 + 0) / 2                        |
| Weighted Avg | Average weighted by each class's support (sample count); also 0 here |
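Since both averages are 0 in this run, the difference is easier to see with non-zero numbers. The per-class recall values and supports below are made up for illustration and do not come from this notebook's model.

```python
# Hypothetical per-class recall and support (number of true samples per class)
recall = {0: 0.90, 1: 0.50}
support = {0: 80, 1: 20}

# Macro average: every class counts equally
macro = sum(recall.values()) / len(recall)

# Weighted average: each class counts in proportion to its support
weighted = sum(recall[c] * support[c] for c in recall) / sum(support.values())

print("macro avg:   ", macro)
print("weighted avg:", weighted)
```

The weighted average sits closer to the majority class's score, so a weak minority class shows up more clearly in the macro average.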


✅ Final Verdict:

📌 The model failed here: the test set had only 1 data point, and it was predicted wrongly.

🧠 The lesson:

When there is very little data, evaluation metrics are not reliable.

Always keep the test set big enough to get meaningful results.
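As a sketch of what a larger evaluation could look like, the snippet below generates 200 synthetic heights and labels them with a made-up rule (tall means 170 cm or more). The data, the rule, `max_iter=1000`, and the use of `stratify` are all assumptions for illustration, not part of the original experiment.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
heights = rng.uniform(140, 200, size=200)   # synthetic heights in cm
labels = (heights >= 170).astype(int)       # hypothetical labelling rule

X = heights.reshape(-1, 1)
# stratify keeps the class proportions similar in the train and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.2, random_state=42, stratify=labels
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_pred = model.predict(X_test)

acc = accuracy_score(y_test, y_pred)
print("test samples:", len(X_test))
print("accuracy:", acc)
print(confusion_matrix(y_test, y_pred, labels=[0, 1]))
```

With 40 test samples covering both classes, every row of the confusion matrix has support, so none of the metrics falls back to an undefined 0/0 case.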

