# Supervised Learning

* Supervised learning is an ML technique in which we give the model labeled data.
Labeled data means that every input comes with its correct output.

* The model's job is to:

    - First, learn the relationship between input and output during training

    - Then, make correct predictions on new data

* It is called “supervised” because during training the model gets teacher-like guidance: the correct answer is provided for every example.

* An example to make this concrete:

    - Suppose you have data on house prices:

| Area | Bedrooms | Price   |
|------|----------|---------|
| 1200 | 2        | 35 lakh |
| 1500 | 3        | 50 lakh |
| 900  | 2        | 28 lakh |

* Here, Price is the target/output.
The model learns how the price changes with the features (area and bedrooms) and then predicts the price of a new house.
This is supervised regression.
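The table above can be turned into a small runnable sketch. It fits scikit-learn's `LinearRegression` on the three rows and predicts a price for a new house. The `new_house` values are hypothetical and not part of the original data, and three rows are far too few for a trustworthy model; this only illustrates the fit-then-predict idea.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Features from the table above: [area, bedrooms]
X = np.array([[1200, 2], [1500, 3], [900, 2]])
# Target: price in lakh
y = np.array([35, 50, 28])

# Learn the relationship between features and price
model = LinearRegression()
model.fit(X, y)

# Hypothetical new house (assumption, not in the original table)
new_house = np.array([[1100, 2]])
predicted_price = model.predict(new_house)

print("Predicted price (lakh):", predicted_price[0])
```

With only three rows and two features the line passes exactly through the training points, so a good fit here says nothing about how well the model would generalize.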

## Regression vs Classification

* In supervised learning, we use features to predict a target.

* The target can be one of two types:



### Regression

* Used when the target is a continuous number.

* Examples:

    - house price
    - temperature
    - salary estimate

* The model fits a straight line or curve that predicts a number.

### Classification

* Used when the target is a category (a label).

* Examples:

    - email: spam or not spam

    - image: cat or dog

    - heart disease: yes or no

* The model learns decision boundaries so that it can tell which class an input belongs to.

* Why it matters in AI/ML:

    - These two are foundational pillars of machine learning.
    - Most supervised learning projects fall into one of these two categories.
    - Your problem type decides which algorithm to use.
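As a rough sketch of how the target decides the problem type, scikit-learn's `type_of_target` helper can be used to inspect a target array. The sample targets and the regression/classification mapping below are illustrative assumptions, not a rule taken from the notes.

```python
from sklearn.utils.multiclass import type_of_target

# Hypothetical targets, made up for illustration
house_prices = [35.5, 50.2, 28.7]            # continuous numbers
spam_labels = ["spam", "not spam", "spam"]   # categories

for name, target in [("house_prices", house_prices), ("spam_labels", spam_labels)]:
    kind = type_of_target(target)
    # Assumed mapping: continuous targets -> regression, everything else -> classification
    task = "regression" if kind.startswith("continuous") else "classification"
    print(name, "->", kind, "->", task)
```

This is only a heuristic: whole-number targets such as `[35, 50, 28]` are reported as `multiclass` even when they represent prices, so the final call on the problem type still belongs to you.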

#### Simple Regression (Price Prediction Example)

In [None]:
# Step 1: Imports
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Step 2: Input data (X) and output data (y)
# X = house size (sq feet)
# y = house price (lakhs)

X = np.array([[500], [600], [800], [1000], [1200]])  # input features
y = np.array([30, 35, 45, 55, 70])                   # target values

# Step 3: Train-test split
# test_size=0.2 holds out 20% of the data for testing
# Fixing random_state makes the result repeatable

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Step 4: Create the model
reg = LinearRegression()

# Step 5: Train the model (it learns from the training data)
reg.fit(X_train, y_train)

# Step 6: Predict the price for the held-out test areas
y_pred = reg.predict(X_test)

# Step 7: Print the output
print("Input Test Data:", X_test)      # input given to the model
print("Actual Price:", y_test)         # real price
print("Predicted Price:", y_pred)      # model output

# Step 8: Inspect the coefficients
print("Slope (Weight):", reg.coef_)    # how much the price rises per extra sq foot
print("Intercept:", reg.intercept_)    # base value: predicted price at area 0
 

Input Test Data: [[600]]
Actual Price: [35]
Predicted Price: [34.57943925]
Slope (Weight): [0.05607477]
Intercept: 0.9345794392523388


#### Simple Classification (Pass/Fail Prediction)

In [None]:
# Step 1: Imports
import numpy as np
from sklearn.linear_model import LogisticRegression
# Logistic Regression: "regression" is in the name, but it does classification.
from sklearn.model_selection import train_test_split

# Step 2: Input and output data
# X = study hours
# y = 1 means pass, 0 means fail

X = np.array([[1], [2], [3], [4], [5], [6]])    # hours
y = np.array([0, 0, 0, 1, 1, 1])                # pass/fail label

# Step 3: Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=10)

# Step 4: Define the model
lreg = LogisticRegression()

# Step 5: Train the model
lreg.fit(X_train, y_train)

# Step 6: Predict
# lreg.predict(X_test) applies a 0.5 probability threshold:
# probability > 0.5 → 1 (pass)
# probability < 0.5 → 0 (fail)

y_pred = lreg.predict(X_test)

# Step 7: Print results
print("Test Input:", X_test)
print("Actual Labels:", y_test)
print("Predicted Labels:", y_pred)

# Step 8: Accuracy check
print("Accuracy:", lreg.score(X_test, y_test))

Test Input: [[3]
 [6]]
Actual Labels: [0 1]
Predicted Labels: [1 1]
Accuracy: 0.5


### Linear Regression

* Linear regression is a mathematical model that fits the straight line that best matches the data.

* The formula is:

    - Y = mX + c

    - m = slope

    - c = intercept

* The model tries to minimize the gap between the predicted values and the real values.

* Why it matters in AI/ML:

    - It is used to estimate numeric values.

    - Training is fast and the model is easy to explain.

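* Example (from the Python cell above):

    - After training, the model reported:

    - Coefficient: 0.05607477
    
    - Intercept: 0.9345794392523388


* Example MSE: 2.61 (an illustrative value; the cell above does not compute MSE)

* Comment:

    - This loss is the average of the squared prediction errors, so it shows how far off the model's predictions are on average.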
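The slope and intercept reported by the regression cell can be reproduced by hand with the standard least-squares formulas for Y = mX + c. The sketch below assumes the four training rows were the ones left after area 600 was held out, which is what the printed test input suggests.

```python
import numpy as np

# Training rows inferred from the printed output: area 600 was the test point,
# so the remaining four rows are assumed to be the training set.
x = np.array([500, 800, 1000, 1200], dtype=float)
y = np.array([30, 45, 55, 70], dtype=float)

x_mean, y_mean = x.mean(), y.mean()

# Least-squares slope: covariance of x and y divided by variance of x
m = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
# The intercept makes the line pass through the point of means
c = y_mean - m * x_mean

print("m (slope):", m)
print("c (intercept):", c)
print("Prediction for area 600:", m * 600 + c)
```

These values should agree with `reg.coef_`, `reg.intercept_` and the predicted price printed by the regression cell.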

### Logistic Regression

* Despite the word “regression” in its name, it is used for classification.
It predicts the probability that a sample belongs to a class.

* Example:

    - probability = 0.82

    - means an 82 percent chance that class = 1

* A threshold (typically 0.5) then decides the final label.

* Why it matters in AI/ML:

    - It is one of the simplest and most widely used models for binary classification.
    - It is also easy to interpret.

* Example output:

    - Accuracy: 0.80

* Comment:

    - This means 80 percent of the test samples were predicted correctly.
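To make the probability-then-threshold step visible, here is a minimal sketch using `predict_proba` on the study-hours data. Unlike the cell above, it fits on all six rows with no train-test split, purely for illustration; the names `clf`, `proba_pass` and `labels_from_threshold` are mine.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[1], [2], [3], [4], [5], [6]])   # study hours
y = np.array([0, 0, 0, 1, 1, 1])               # 0 = fail, 1 = pass

clf = LogisticRegression()
clf.fit(X, y)

# Column 1 of predict_proba holds the probability of class 1 (pass)
proba_pass = clf.predict_proba(X)[:, 1]

# Apply the 0.5 threshold manually
labels_from_threshold = (proba_pass > 0.5).astype(int)

for hours, p, label in zip(X.ravel(), proba_pass, labels_from_threshold):
    print(f"hours={hours}  P(pass)={p:.2f}  label={label}")
```

The manually thresholded labels should match what `clf.predict(X)` returns, since `predict` applies the same 0.5 cut-off for binary problems.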

## KNN (K Nearest Neighbors) Basics

* KNN is a distance-based algorithm.
When a new point arrives, the model looks at the K points closest to it.

* If K = 3 and 2 of those 3 points belong to class 1, then the prediction is class 1.

* Why it matters in AI/ML:

    - It is simple and intuitive, and it can capture non-linear patterns too.

    - Training is nearly instant because the model only stores the data.

* Example output:

    - Accuracy: 0.85

* Comment:

    - Scaling was applied so that the distance calculation is fair. Feature scaling is important for KNN, because otherwise features with larger numeric ranges dominate the distance.
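The "look at the K closest points and take a majority vote" idea can be sketched in plain NumPy. This is a simplified illustration, not how scikit-learn implements KNN internally; the function name `knn_predict` and the `new_fruit` values are hypothetical.

```python
import numpy as np

# Fruit data: [weight (g), size (cm)], 0 = apple, 1 = orange
X_train = np.array([[150, 7], [160, 7], [170, 8],
                    [250, 10], [260, 11], [270, 12]], dtype=float)
y_train = np.array([0, 0, 0, 1, 1, 1])

def knn_predict(x_new, X_train, y_train, k=3):
    """Predict a label by majority vote among the k nearest training points."""
    # Euclidean distance from the new point to every training point
    distances = np.sqrt(np.sum((X_train - x_new) ** 2, axis=1))
    # Indices of the k smallest distances
    nearest = np.argsort(distances)[:k]
    # Count votes per class and return the most common one
    votes = np.bincount(y_train[nearest])
    return int(np.argmax(votes))

new_fruit = np.array([165, 8])  # hypothetical new sample
print("Predicted label:", knn_predict(new_fruit, X_train, y_train, k=3))
```

No scaling is applied here, so the weight column (in grams) dominates the distance over the size column (in cm).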

In [None]:
# KNN (K-Nearest Neighbors) Example
# Predict: fruit type (apple = 0, orange = 1) based on weight + size.

import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

# Input features:
# weight (g), size (cm) for each apple or orange

X = np.array([
    [150, 7],   # apple
    [160, 7],   # apple
    [170, 8],   # apple
    [250, 10],  # orange
    [260, 11],  # orange
    [270, 12]   # orange
])

# Output labels (0 = apple, 1 = orange)

y = np.array([0, 0, 0, 1, 1, 1])

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=20)

# Define the model
knn = KNeighborsClassifier(n_neighbors=3)  # 3 nearest neighbors

# Train (KNN just stores the training data)
knn.fit(X_train, y_train)

# Prediction
y_pred = knn.predict(X_test)


# Output
print("Test Input:", X_test)
print("Actual Labels:", y_test)
print("Predicted Labels:", y_pred)

# Accuracy
print("Accuracy:", knn.score(X_test, y_test))

Test Input: [[160   7]
 [260  11]]
Actual Labels: [0 1]
Predicted Labels: [0 1]
Accuracy: 1.0
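The cell above does not scale the features, although the notes stress that scaling matters for KNN. One possible way to add it is a scikit-learn pipeline with `StandardScaler`, sketched below. It fits on all six fruits without a split, and `new_fruit` is a hypothetical sample; treat it as an illustration rather than the notebook's original setup.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Fruit data: [weight (g), size (cm)], 0 = apple, 1 = orange
X = np.array([[150, 7], [160, 7], [170, 8],
              [250, 10], [260, 11], [270, 12]])
y = np.array([0, 0, 0, 1, 1, 1])

# The scaler standardizes each feature (mean 0, std 1) before distances are computed
scaled_knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=3))
scaled_knn.fit(X, y)

new_fruit = np.array([[165, 8]])  # hypothetical new sample
pred = scaled_knn.predict(new_fruit)
print("Predicted label:", pred[0])
```

Putting the scaler inside the pipeline means the same scaling learned from the training data is reused automatically at prediction time.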


## Train Model, Predict, Evaluate

#### Train Model

* The model learns the pattern in the data.

* Here it is learning the line of best fit.

In [None]:
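# Assumes lr_model = LinearRegression() and a regression train split (X_train_reg, y_train_reg) already exist
lr_model.fit(X_train_reg, y_train_reg)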


#### Predict:

* The model predicts the output for unseen data.

* Here, continuous values are predicted for the test data.

In [None]:
# Assumes lr_model is already fitted and X_test_reg is the held-out regression test split
y_pred_reg = lr_model.predict(X_test_reg)


#### Evaluate:

* Metrics such as accuracy, MSE, and F1 score are used to judge the model's performance.

* Example:

    - Regression MSE = 2.61

    - Logistic Accuracy = 0.80

    - KNN Accuracy = 0.85
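The three steps can be tied together in one self-contained sketch. It reuses the names `lr_model`, `X_train_reg` and so on from the cells above, but defines them on the house-size data with the area-600 row held out by hand (an assumption on my part). The classification labels are hypothetical, so the printed numbers need not match the example metrics listed above.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import accuracy_score, f1_score, mean_squared_error

# Regression: train, predict, evaluate
X_train_reg = np.array([[500], [800], [1000], [1200]])  # house size (sq feet)
y_train_reg = np.array([30, 45, 55, 70])                # price (lakhs)
X_test_reg = np.array([[600]])                          # held-out row
y_test_reg = np.array([35])

lr_model = LinearRegression()
lr_model.fit(X_train_reg, y_train_reg)             # train
y_pred_reg = lr_model.predict(X_test_reg)          # predict
mse = mean_squared_error(y_test_reg, y_pred_reg)   # evaluate
print("Regression MSE:", mse)

# Classification metrics on hypothetical true/predicted labels
y_true_clf = [0, 1, 1, 0, 1]
y_pred_clf = [0, 1, 0, 0, 1]
acc = accuracy_score(y_true_clf, y_pred_clf)
f1 = f1_score(y_true_clf, y_pred_clf)
print("Accuracy:", acc)
print("F1 score:", f1)
```

MSE applies to regression outputs, while accuracy and F1 apply to class labels, so each metric is computed only on the matching kind of prediction.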

## Loss Functions Intro

* A loss function measures the gap between the model's prediction and the actual value. The model's goal is to minimize the loss.

* Types (basics):

    1. Regression loss

        - MSE (Mean Squared Error):

            - The errors are squared and then averaged: MSE = average( (real_output - predict_output)^2 )

            - It penalizes large mistakes heavily.

        - Example: MSE = 2.61

        - Comment: a lower MSE means a better model.

    2. Classification loss

        - Log Loss / Binary Cross Entropy:

            - Used in logistic regression.

            - It penalizes based on the predicted probability: confident wrong predictions are penalized the most.

* Why it matters in AI/ML:

    - The loss function tells us how much the model is improving. The main objective of training is to reduce the loss.

In [16]:
# Loss Function Example (Manual Calculation)
import numpy as np

y_actual = np.array([40, 50, 60])  # real output
y_pred = np.array([38, 52, 63])    # model predictions

mse = np.mean((y_actual - y_pred) ** 2)  # mean of the squared errors

print("MSE:", mse)   # the smaller the value, the better the model

MSE: 5.666666666666667
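The cell above computes the regression loss by hand; the classification loss (log loss / binary cross entropy) can be sketched the same way. The labels and probabilities below are made-up values for illustration, and scikit-learn's `log_loss` is used only as a cross-check.

```python
import numpy as np
from sklearn.metrics import log_loss

y_true = np.array([1, 0, 1])          # actual classes (hypothetical)
p_pred = np.array([0.9, 0.2, 0.6])    # predicted probability of class 1 (hypothetical)

# Binary cross entropy: average of -[y*log(p) + (1-y)*log(1-p)]
bce = -np.mean(y_true * np.log(p_pred) + (1 - y_true) * np.log(1 - p_pred))

print("Manual log loss:", bce)
print("sklearn log_loss:", log_loss(y_true, p_pred))
```

The third sample contributes the most to the loss because its probability (0.6) is the least confident about the correct class.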
