## 📈 Linear Regression

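Linear regression models the relationship between a dependent variable `y` and an independent variable `x` by fitting the straight line that minimizes the sum of squared differences between the observed and predicted values of `y` (ordinary least squares).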

### ✅ Steps:

**Step 1:** Collect historical data as pairs of inputs (`x`) and known outputs (`y`)  
**Step 2:** Fit a straight line through the data  
**Step 3:** Use the line to predict `y` for new values of `x`

### 🧮 Line Equation:

\[
y = mx + b
\]

- `y` = predicted value  
- `m` = slope of the line  
- `x` = input (independent variable)  
- `b` = y-intercept (value of `y` when `x = 0`)
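
The slope and intercept can be worked out by hand with the ordinary least-squares formulas. The sketch below is a minimal illustration in plain Python, assuming a small hours-versus-marks dataset; the helper name `fit_line` is hypothetical and not part of any library.

```python
def fit_line(xs, ys):
    """Return (m, b) of the least-squares line y = m*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: co-variation of x and y divided by the variation of x
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    # Intercept: the fitted line passes through the point of means
    b = mean_y - m * mean_x
    return m, b


# Illustrative data: hours studied and marks scored
hours = [1, 2, 3, 4, 5, 6]
marks = [40, 50, 65, 75, 90, 100]

m, b = fit_line(hours, marks)
print(f"slope m = {m:.2f}, intercept b = {b:.2f}")
print(f"prediction for x = 4.5: {m * 4.5 + b:.2f}")
```

For a single input variable, a library fit should arrive at the same `m` and `b` up to floating-point rounding.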


In [2]:
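from sklearn.linear_model import LinearRegression

# Feature: hours studied (one column per sample); target: marks scored
X = [[1], [2], [3], [4], [5], [6]]
y = [40, 50, 65, 75, 90, 100]

# Fit the least-squares line y = mx + b
model = LinearRegression()
model.fit(X, y)

# Predict marks for a user-supplied number of hours
hours = float(input("How many hours have you studied? "))
predict_marks = model.predict([[hours]])

print(f"Predicted marks: {predict_marks[0]:.2f}")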


Predicted marks: 874.71


## 🔐 Logistic Regression

Logistic Regression is a classification algorithm used to predict **binary outcomes** (e.g., Yes/No, 0/1, Pass/Fail).

---

### ✅ Steps:

**Step 1:** Collect labeled data (with class labels like 0 and 1)  
**Step 2:** Learn the slope `m` and intercept `b` of a linear score `z = mx + b` that separates the two classes  
**Step 3:** Apply the logistic (sigmoid) function to map values between 0 and 1  
**Step 4:** Predict the class based on a threshold (usually 0.5)

---

### 📈 Logistic (Sigmoid) Function:

\[
\sigma(z) = \frac{1}{1 + e^{-z}}
\]

Where:

- `z = mx + b` (the same linear form used in linear regression)
- `σ(z)` = estimated probability that the input belongs to class 1
- The output always lies between **0 and 1**, so it can be read as a probability
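
To make the shape of the function concrete, here is a minimal sketch of the sigmoid in plain Python. The name `sigmoid` and the sample values of `z` are chosen for illustration only; this naive form can overflow for very large negative `z`, so it is not meant as a production implementation.

```python
import math


def sigmoid(z):
    """Map a real number z to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


# Large negative z gives values near 0, large positive z gives values near 1
for z in (-4, -1, 0, 1, 4):
    print(f"sigmoid({z:>2}) = {sigmoid(z):.3f}")
```

Note that `sigmoid(0)` is exactly 0.5, which is why `z = 0` marks the boundary between the two classes.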

---

### 🧮 Prediction Rule:

If  
\[
\sigma(z) \geq 0.5 \Rightarrow \text{Class 1 (Positive)}
\]  
Else  
\[
\sigma(z) < 0.5 \Rightarrow \text{Class 0 (Negative)}
\]
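
The rule above can be sketched as a small function. The values `m = 1.5` and `b = -6.0` are made-up parameters chosen only for illustration (they are not fitted to any data), and `predict_class` is a hypothetical helper name.

```python
import math


def predict_class(x, m, b, threshold=0.5):
    """Return (class, probability) for input x under z = m*x + b."""
    z = m * x + b
    probability = 1.0 / (1.0 + math.exp(-z))  # sigmoid(z)
    label = 1 if probability >= threshold else 0
    return label, probability


# Hypothetical parameters, not fitted to any data
m, b = 1.5, -6.0

for x in (2, 4, 6):
    label, p = predict_class(x, m, b)
    print(f"x = {x}: sigma(z) = {p:.3f} -> class {label}")
```

Because `σ(z) ≥ 0.5` exactly when `z ≥ 0`, a threshold of 0.5 places the class boundary at the value of `x` where `mx + b = 0`.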

---

### 📌 Use Case Examples:

- Spam detection (Spam or Not Spam)  
- Disease diagnosis (Positive or Negative)  
- Exam result (Pass or Fail)


In [3]:
from sklearn.linear_model import LogisticRegression

# Feature: hours studied; label: 0 = fail, 1 = pass
X = [[1], [2], [3], [4], [5], [6], [7], [8]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

# Fit the logistic regression classifier
model = LogisticRegression()
model.fit(X, y)

# Predict the class (0 or 1) for a user-supplied number of hours
hours = float(input("How many hours have you studied? "))
predict_result = model.predict([[hours]])

if predict_result[0] == 1:
    print("Predicted result: Pass")
else:
    print("Predicted result: Fail")


Predicted result: Fail


# 🧠 K-Nearest Neighbors (KNN) – Choosing `k = 2`

In K-Nearest Neighbors (KNN), the `k` value represents the **number of nearest neighbors** used to classify a new data point.

## 🔢 Why Choose `k = 2`?

- With `k = 2`, the algorithm looks at the **2 closest neighbors** of a new input.
- The predicted class is the **majority class** among these 2 neighbors; if they disagree, the vote is tied and a tie-breaking rule decides.
- Compared to `k = 1`, a single outlier among the neighbors can at most force a tie rather than decide the vote outright, which **reduces the impact of outliers** somewhat.
- Suitable when you want the model to be **slightly less sensitive** to noise than with `k = 1`, but still responsive to local patterns.

## ⚠️ Considerations

- `k = 2` can lead to **ties** (1 vote for each class), so many implementations break ties by:
  - Picking the class with the **lower label** (e.g., `0` over `1`)
  - Or **randomly selecting** between the tied classes
- Small `k` values like 1 or 2 make the model **more flexible** but also more sensitive to noise.
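
The tie case is easy to see in a from-scratch sketch. The code below is a minimal illustration on made-up 1-D data, not any library's implementation; `knn_predict` is a hypothetical helper, and its lowest-label tie-break is an assumption of this sketch.

```python
from collections import Counter


def knn_predict(points, labels, query, k=2):
    """Classify a 1-D query by majority vote among its k nearest points."""
    # Sort training indices by distance to the query
    order = sorted(range(len(points)), key=lambda i: abs(points[i] - query))
    votes = Counter(labels[i] for i in order[:k])
    top = max(votes.values())
    # Tie-break (an assumption of this sketch): pick the lowest tied label
    return min(label for label, count in votes.items() if count == top)


# Made-up 1-D training data
points = [1.0, 2.0, 4.0, 5.0]
labels = [0, 0, 1, 1]

print(knn_predict(points, labels, 1.4))  # both neighbors are class 0
print(knn_predict(points, labels, 3.0))  # one neighbor per class: a tie
print(knn_predict(points, labels, 4.6))  # both neighbors are class 1
```

For the query `3.0`, the two nearest points carry different labels, so the result depends entirely on the tie-breaking rule.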

## ✅ When is `k = 2` a Good Choice?

- When you have **balanced data**.
- When the dataset is **small**, and you want a **quick, simple classifier**.
- For **experimentation** or to observe how classification boundaries shift with different `k`.


In [5]:
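from sklearn.neighbors import KNeighborsClassifier

# Each row is one training sample with two numeric features
X = [
    [180, 7],
    [200, 7.5],
    [250, 8],
    [300, 8.5],
    [330, 9],
    [360, 9.5]
]

# Class label (0 or 1) for each row of X
y = [0, 0, 0, 1, 1, 1]

# Classify by vote among the 2 nearest training samples
model = KNeighborsClassifier(n_neighbors=2)
model.fit(X, y)

# Predict the class of a new sample
sample = [[320, 9]]
prediction = model.predict(sample)

print(f"Predicted class: {prediction[0]}")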


Predicted class: 1


# 🌳 Decision Tree Classifier – Understanding the Basics

A **Decision Tree Classifier** is a supervised learning algorithm that splits the data into branches based on feature values, forming a tree-like structure to make decisions.

## 🧠 Why Use a Decision Tree?

- It is **easy to understand and interpret**.
- It can handle both **numerical and categorical** data (in scikit-learn, categorical features must first be encoded as numbers).
- It needs **no feature scaling**.
- It works for both **classification** and **regression** problems.

## 🔍 How It Works

- The model splits data by asking a series of **yes/no questions** (based on feature thresholds).
- Each internal node represents a **decision rule**.
- Each leaf node represents a **predicted class**.
- The algorithm chooses splits that **maximize information gain** (or minimize impurity).
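
As a rough illustration of how a split is scored, the sketch below computes entropy-based information gain in plain Python. The helper names and the labels are made up for this example. Entropy is only one common impurity measure: scikit-learn's `DecisionTreeClassifier` uses Gini impurity by default.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a non-empty list of class labels."""
    total = len(labels)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(labels).values()
    )


def information_gain(parent, left, right):
    """Entropy of the parent minus the size-weighted entropy of the children."""
    n = len(parent)
    children = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - children


# Made-up labels: 0 = tea, 1 = coffee
parent = [0, 0, 1, 1]

# A split that separates the classes perfectly
print(information_gain(parent, [0, 0], [1, 1]))
# A split that leaves both children as mixed as the parent
print(information_gain(parent, [0, 1], [0, 1]))
```

The first split removes all uncertainty about the class and so has the highest possible gain for this parent node. The second tells the tree nothing and has a gain of zero.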

## ⚠️ Considerations

- Decision Trees can **overfit** if not properly controlled.
- They are sensitive to **small changes in data**.
- To reduce overfitting:
  - Set limits such as `max_depth` and `min_samples_split`
  - Use **pruning** or **ensemble methods** (like Random Forests)
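
A minimal sketch of setting such limits with scikit-learn follows. The dataset is made up for illustration, and the particular values of `max_depth` and `min_samples_split` are arbitrary choices, not recommendations.

```python
from sklearn.tree import DecisionTreeClassifier

# Made-up data: [age, hours_of_sleep] -> 0 = tea, 1 = coffee
X = [[25, 8], [30, 7], [45, 5], [50, 4], [35, 5], [40, 8]]
y = [0, 0, 1, 1, 1, 0]

# Cap the tree depth and require at least 4 samples before a node may split
model = DecisionTreeClassifier(max_depth=2, min_samples_split=4, random_state=0)
model.fit(X, y)

print(f"Tree depth: {model.get_depth()}")
print(f"Number of leaves: {model.get_n_leaves()}")
```

Setting `random_state` makes the fitted tree reproducible when several candidate splits are equally good.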

## ✅ When is a Decision Tree a Good Choice?

- When you need a **model that is easy to explain** to non-technical stakeholders.
- When **interpretability** is more important than raw accuracy.
- When working with **tabular, structured data**.
- When the dataset has **clear logical rules** or thresholds.

## 🧾 Example Use Cases

- Medical diagnosis (e.g., disease vs. no disease)
- Customer segmentation (e.g., high value vs. low value)
- Loan approval decisions (approve vs. reject)


In [4]:
from sklearn.tree import DecisionTreeClassifier

# Features: [age, hours_of_sleep]
X = [
    [25, 8],   # young, sleeps well → tea
    [30, 7],   # adult, normal sleep → tea
    [45, 5],   # older, less sleep → coffee
    [50, 4],   # older, little sleep → coffee
]

# Target labels: 0 = tea, 1 = coffee
y = [0, 0, 1, 1]

# Create and train the model
model = DecisionTreeClassifier()
model.fit(X, y)

# Predict for a new sample
sample = [[50, 6]]  # older, moderate sleep
prediction = model.predict(sample)

print(f"Predicted class: {prediction[0]}")  # Output will be 0 (tea) or 1 (coffee)


Predicted class: 1
