# ML

Machine Learning (ML) is a field of artificial intelligence in which systems learn from data and improve with experience instead of following explicitly programmed rules. Common tasks include classification, regression, and clustering.


Real-life applications of Machine Learning include spam detection in emails, recommendation systems (like those used by Netflix or Amazon), fraud detection in banking, image and speech recognition, autonomous vehicles, and medical diagnosis.


In simple terms, Machine Learning is a way for computers to learn from data and make decisions or predictions without being directly programmed for each specific task.



# Artificial Intelligence vs Machine Learning vs Deep Learning

## Artificial Intelligence (AI)
- Broadest field
- Making machines simulate human intelligence
- Example: Smart assistants like Siri

## Machine Learning (ML)
- Subset of AI
- Learning from data without explicit programming
- Example: Spam filters, recommendations

## Deep Learning (DL)
- Subset of ML
- Uses neural networks with many layers
- Example: Face recognition, language translation

                AI
                ↓
                ML
                ↓
                DL


```python
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report

# -----------------------------
# 1. Create Sample Data
# -----------------------------
# User & Movie Features:
# age, likes_action, likes_comedy, likes_romance, movie_genre
data = {
    "age": [16, 25, 30, 18, 40, 22, 35, 28, 50, 15],
    "likes_action": [1, 1, 0, 1, 0, 1, 0, 1, 0, 1],
    "likes_comedy": [0, 1, 1, 0, 0, 1, 1, 0, 0, 1],
    "likes_romance": [0, 0, 1, 0, 1, 0, 1, 0, 1, 0],
    "movie_genre": ["Action", "Comedy", "Romance", "Action", "Romance",
                    "Comedy", "Romance", "Action", "Romance", "Comedy"],
    "liked_movie": [1, 1, 1, 1, 0, 1, 1, 0, 0, 1]   # Target
}

df = pd.DataFrame(data)

# Convert categorical feature 'movie_genre' to numeric (One-Hot Encoding)
df = pd.get_dummies(df, columns=["movie_genre"], drop_first=True)

# -----------------------------
# 2. Train/Test Split
# -----------------------------
X = df.drop("liked_movie", axis=1)
y = df["liked_movie"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# -----------------------------
# 3. Train Model (Random Forest)
# -----------------------------
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# -----------------------------
# 4. Evaluate
# -----------------------------
y_pred = model.predict(X_test)
print("✅ Accuracy:", accuracy_score(y_test, y_pred))
print("\nClassification Report:\n", classification_report(y_test, y_pred))

# -----------------------------
# 5. Make a New Prediction
# -----------------------------
# Example user: 21 years old, likes action, no comedy, no romance, movie is "Action"
new_user = pd.DataFrame([{
    "age": 21, "likes_action": 1, "likes_comedy": 0, "likes_romance": 0,
    "movie_genre_Comedy": 0, "movie_genre_Romance": 0
}])

prediction = model.predict(new_user)
print("\n🎬 Will the user like the movie?", "YES" if prediction[0] == 1 else "NO")
```


```
✅ Accuracy: 1.0

Classification Report:
               precision    recall  f1-score   support

           0       1.00      1.00      1.00         1
           1       1.00      1.00      1.00         2

    accuracy                           1.00         3
   macro avg       1.00      1.00      1.00         3
weighted avg       1.00      1.00      1.00         3


🎬 Will the user like the movie? YES
```
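
The test set above holds only three rows, so a single accuracy figure is a noisy estimate. As a rough sketch (not part of the original notebook), the same model can be scored with 3-fold cross-validation, so every row is used for testing once. The choice of `cv=3` is an assumption made here because the smaller class has only three examples.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Same sample data as in the cell above
data = {
    "age": [16, 25, 30, 18, 40, 22, 35, 28, 50, 15],
    "likes_action": [1, 1, 0, 1, 0, 1, 0, 1, 0, 1],
    "likes_comedy": [0, 1, 1, 0, 0, 1, 1, 0, 0, 1],
    "likes_romance": [0, 0, 1, 0, 1, 0, 1, 0, 1, 0],
    "movie_genre": ["Action", "Comedy", "Romance", "Action", "Romance",
                    "Comedy", "Romance", "Action", "Romance", "Comedy"],
    "liked_movie": [1, 1, 1, 1, 0, 1, 1, 0, 0, 1],
}
df = pd.get_dummies(pd.DataFrame(data), columns=["movie_genre"], drop_first=True)

X = df.drop("liked_movie", axis=1)
y = df["liked_movie"]

model = RandomForestClassifier(n_estimators=100, random_state=42)

# For a classifier, an integer cv uses stratified folds,
# so each fold keeps roughly the same class balance.
scores = cross_val_score(model, X, y, cv=3)

print("Fold accuracies:", scores)
print("Mean accuracy:", scores.mean())
```

With ten rows the fold scores will still vary widely; the point is only that an average over several splits is a steadier estimate than one split.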



# Percentile Calculation (Manual)

**Dataset:**  
[5, 31, 43, 48, 50, 41, 7, 11, 15, 39, 80, 82, 32, 2, 8, 6, 25, 36, 27, 61, 31]


---

## Step 1: Sort the data in ascending order
[2, 5, 6, 7, 8, 11, 15, 25, 27, 31, 31, 32, 36, 39, 41, 43, 48, 50, 61, 80, 82]


Number of data points, \(N = 21\)

---

## Step 2: Percentile Formula

\[
\text{Rank} = \frac{P}{100} \times (N + 1)
\]

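Where:  
- \(P\) = the percentile you want (e.g., 25, 50, 75, 90)  
- \(N\) = total number of values  
- Rank = 1-based position in the sorted data; when it is not a whole number, interpolate linearly between the two neighboring values  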

---

### Step 2a: Q1 (25th percentile)

\[
\text{Rank} = \frac{25}{100} \times (21 + 1) = 0.25 \times 22 = 5.5
\]

- Rank 5.5 → between 5th and 6th values  
- 5th value = 8, 6th value = 11  
- Interpolate:  
\[
8 + 0.5 \times (11 - 8) = 8 + 1.5 = 9.5
\]

✅ **Q1 = 9.5**

---

### Step 2b: Median (50th percentile)

\[
\text{Rank} = \frac{50}{100} \times 22 = 11
\]

- Rank = 11 → 11th value = 31  

✅ **Median = 31**

---

### Step 2c: Q3 (75th percentile)

\[
\text{Rank} = \frac{75}{100} \times 22 = 16.5
\]

- Between 16th (43) and 17th (48) values  
- Interpolate:  
\[
43 + 0.5 \times (48 - 43) = 43 + 2.5 = 45.5
\]

✅ **Q3 = 45.5**

---

### Step 2d: 90th percentile

\[
\text{Rank} = \frac{90}{100} \times 22 = 19.8
\]

- Between 19th (61) and 20th (80)  
- Interpolate:  
\[
61 + 0.8 \times (80 - 61) = 61 + 15.2 = 76.2
\]

✅ **P90 = 76.2**

---

## Step 3: Summary Table

| Percentile   | Rank | Value |
|--------------|------|-------|
| Q1 (25%)     | 5.5  | 9.5   |
| Median (50%) | 11   | 31    |
| Q3 (75%)     | 16.5 | 45.5  |
| P90 (90%)    | 19.8 | 76.2  |
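
The manual steps can be checked with a short script. This is a minimal sketch, assuming the same \((N + 1)\) rank formula with linear interpolation. The function name `percentile` and the clamping of ranks that fall outside the data are illustrative choices, not part of the steps above.

```python
import statistics

data = [5, 31, 43, 48, 50, 41, 7, 11, 15, 39, 80, 82, 32, 2, 8, 6, 25, 36, 27, 61, 31]


def percentile(values, p):
    """Return the p-th percentile using Rank = P/100 * (N + 1)."""
    ordered = sorted(values)
    n = len(ordered)
    rank = p * (n + 1) / 100  # 1-based position, possibly fractional

    # Assumption: ranks outside 1..N are clamped to the smallest/largest value.
    if rank <= 1:
        return float(ordered[0])
    if rank >= n:
        return float(ordered[-1])

    lower = int(rank)        # whole part of the rank
    fraction = rank - lower  # fractional part used for interpolation
    below, above = ordered[lower - 1], ordered[lower]
    return below + fraction * (above - below)


for p in (25, 50, 75, 90):
    print(f"P{p}: {percentile(data, p):.1f}")

# Cross-check the quartiles with the standard library, whose
# "exclusive" method uses the same (N + 1) convention.
print(statistics.quantiles(data, n=4, method="exclusive"))
```

Other tools may use a different rank convention, for example \(P/100 \times (N - 1) + 1\), and will then return slightly different quartiles for the same data.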


# Percentile Rank of 15 in the Dataset

**Dataset (sorted):**  


[2, 5, 6, 7, 8, 11, 15, 25, 27, 31, 31, 32, 36, 39, 41, 43, 48, 50, 61, 80, 82]


---

## Step 1: Count values below 15
Values below 15:  


2, 5, 6, 7, 8, 11

- Count = 6

---

## Step 2: Count values equal to 15
- Count = 1

---

## Step 3: Use percentile rank formula

\[
\text{Percentile Rank} = \frac{\text{Number of values below X} + 0.5 \times \text{Number of values equal to X}}{N} \times 100
\]

Where:  
- \(X = 15\)  
- \(N = 21\)

\[
\text{Percentile Rank} = \frac{6 + 0.5 \times 1}{21} \times 100
= \frac{6.5}{21} \times 100
\approx 30.95
\]

---

✅ **Result:**  
The value **15** is approximately at the **31st percentile** in this dataset.
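
The same percentile rank formula can be written as a small function. This is a sketch under the definition used above (values below X, plus half of the values equal to X, divided by N); the name `percentile_rank` is chosen here for illustration.

```python
data = [5, 31, 43, 48, 50, 41, 7, 11, 15, 39, 80, 82, 32, 2, 8, 6, 25, 36, 27, 61, 31]


def percentile_rank(values, x):
    """Percentile rank of x: (count below + 0.5 * count equal) / N * 100."""
    below = sum(1 for v in values if v < x)
    equal = sum(1 for v in values if v == x)
    return (below + 0.5 * equal) / len(values) * 100


rank_15 = percentile_rank(data, 15)
print(round(rank_15, 2))  # 30.95
```

Counting ties at half weight places a value in the middle of its own group, which matters for repeated values such as 31 in this dataset.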