### 1. The Algorithms (The "Brains")
These are the mathematical models we use to find patterns in data.

| Algorithm | Type | Description | Use When... |
| :--- | :--- | :--- | :--- |
| **Linear Regression** | Regression | Draws a **straight line** through data. | You want to predict a specific **Number** (Salary, Price). |
| **Logistic Regression** | Classification | Uses probability (S-curve) to separate data. | You want to predict a **Category** (Yes/No, Spam/Ham). |
| **Decision Tree** | Classification (also Regression) | Creates a **Flowchart** of If/Else rules. | You want a model that explains *why* it made a decision. |
| **Random Forest** | Ensemble | Creates **100+ Trees** and makes them **Vote**. | You want high accuracy and to know **Feature Importance**. |
| **K-Means** | Clustering | Finds **groups** based on distance. (No labels!) | You have data but **no answers** (Unsupervised). |
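
To make the "Use When..." column concrete, here is a minimal sketch that points three of the models at the same tiny input. The numbers (years of experience, salaries, junior/senior labels) are invented for illustration, and the class names assume scikit-learn, which is where the identifiers in this cheat sheet come from.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.cluster import KMeans

# Made-up toy input: years of experience (one column, so X is still 2D).
X = np.array([[1], [2], [3], [4], [5], [6]])

# Regression: the target is a NUMBER (salary in thousands, invented values).
salary = np.array([30, 35, 40, 45, 50, 55])
reg = LinearRegression().fit(X, salary)
predicted_salary = reg.predict([[7]])[0]  # data is exactly linear: 25 + 5 * 7 = 60

# Classification: the target is a CATEGORY (0 = junior, 1 = senior).
is_senior = np.array([0, 0, 0, 1, 1, 1])
clf = LogisticRegression().fit(X, is_senior)
predicted_class = clf.predict([[6]])[0]

# Clustering: there is NO target; K-Means only ever sees X.
km = KMeans(n_clusters=2, n_init=10, random_state=42).fit(X)
groups = km.labels_  # one group id per row of X

print(predicted_salary, predicted_class, groups)
```

Notice that only the K-Means call has no `y` argument: that is what "no labels" means in practice.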

---

### 2. The Concepts (The "Workflow")
These are the steps you take in every project.

*   **`X` (Features):** The input data (the questions). Usually a table (2D: rows are examples, columns are features).
*   **`y` (Target):** The output data (the answers). Usually a list (1D: one answer per row of `X`).
*   **`train_test_split`:**
    *   *Concept:* Splitting data into **Training** (Textbook) and **Testing** (Exam).
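    *   *Why?* To check whether the model really learned the patterns or just memorized the answers ("Overfitting"). The split does not stop memorizing by itself; it exposes it, because a memorizing model scores well on the Textbook and badly on the Exam.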
*   **`.fit(X, y)`:** The "Study" phase. The model learns the patterns.
*   **`.predict(X)`:** The "Test" phase. The model guesses the answers for new data.
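
The whole workflow above fits in a few lines. This is a minimal sketch, assuming scikit-learn and its built-in Iris dataset (the dataset and the choice of a decision tree are picked here just for illustration).

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# X is a 2D table (150 rows, 4 columns); y is a 1D list of 150 answers.
X, y = load_iris(return_X_y=True)

# Textbook (80%) vs. Exam (20%). The model never sees the exam rows while studying.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = DecisionTreeClassifier(max_depth=3, random_state=42)
model.fit(X_train, y_train)          # the "Study" phase
predictions = model.predict(X_test)  # the "Test" phase: guesses for unseen rows

print(X.shape, y.shape, predictions[:5])
```

Comparing `predictions` with `y_test` is what the evaluation metrics in the next section do.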

---

### 3. Evaluation Metrics (The "Report Card")
How do we know if the model is good?

*   **For Regression (Numbers):**
    *   **MAE (Mean Absolute Error):** "On average, the prediction is wrong by this amount." (Lower is better).
*   **For Classification (Categories):**
    *   **Accuracy Score:** The percentage of correct guesses (e.g., 0.80 = 80%).
    *   **Confusion Matrix:** A grid showing exactly *where* the model made mistakes (e.g., How many times it confused a Cat for a Dog).
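
Here is a small worked example of the three metrics, assuming the functions in `sklearn.metrics`. The prices and cat/dog labels are invented so the arithmetic can be checked by hand.

```python
from sklearn.metrics import mean_absolute_error, accuracy_score, confusion_matrix

# Regression: compare predicted numbers with the real ones (made-up prices).
true_prices = [100, 200, 300]
pred_prices = [110, 190, 330]
mae = mean_absolute_error(true_prices, pred_prices)  # (10 + 10 + 30) / 3 = 16.67

# Classification: compare predicted categories with the real ones (made-up labels).
y_true = ["cat", "cat", "dog", "dog", "dog"]
y_pred = ["cat", "dog", "dog", "dog", "cat"]
acc = accuracy_score(y_true, y_pred)  # 3 correct out of 5 = 0.6

# Rows are the ACTUAL class, columns are the PREDICTED class, in the order of `labels`.
cm = confusion_matrix(y_true, y_pred, labels=["cat", "dog"])
# [[1 1]   actual cat: 1 predicted cat, 1 predicted dog
#  [1 2]]  actual dog: 1 predicted cat, 2 predicted dog

print(round(mae, 2), acc)
print(cm)
```

The diagonal of the matrix holds the correct guesses; everything off the diagonal is a mistake, and its position tells you which class was confused for which.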

---

### 4. Hyperparameters (The "Settings")
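These are the knobs **YOU** turn before training to change how the model learns. (Strictly speaking, `test_size`, `random_state`, and `cv` are settings of the workflow around the model rather than of the model itself, but you choose them by hand in the same way.)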

*   **`test_size`** (used in Split):
    *   *Definition:* What % of data to save for the exam.
    *   *Standard:* `0.2` (20%).
*   **`random_state`** (used in the split and in any model that involves randomness):
    *   *Definition:* A "seed" number to make sure randomness is the **same** every time you run the code.
    *   *Standard:* `42` (Just a tradition).
*   **`max_depth`** (Decision Tree):
    *   *Definition:* How many "questions" deep the tree can go.
    *   *Why?* A small number (e.g., 2 or 3) keeps the model simple. A huge number makes it memorize (Overfit).
*   **`n_estimators`** (Random Forest):
    *   *Definition:* How many trees to build in the forest.
    *   *Standard:* `100` (More is usually better, but slower).
*   **`n_clusters`** (K-Means):
    *   *Definition:* How many groups you want the computer to find.
*   **`cv`** (Grid Search):
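    *   *Definition:* "Cross-Validation". Splits the training data into equal chunks called folds (e.g., 5), trains on all but one, scores on the one left out, and repeats until every chunk has had a turn as the mini-exam. Averaging the scores gives a more reliable estimate than a single split.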
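
The sketch below shows where each of these settings goes in code, assuming scikit-learn and the Iris dataset (chosen for illustration). The notes above only mention `cv` in connection with Grid Search; `cross_val_score` is used here as a simpler way to see what `cv=5` does by itself.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.cluster import KMeans

X, y = load_iris(return_X_y=True)

# test_size and random_state belong to the split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# max_depth, n_estimators and n_clusters are passed when the model is created.
tree = DecisionTreeClassifier(max_depth=3, random_state=42).fit(X_train, y_train)
forest = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_train, y_train)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X)  # no y: unsupervised

# cv=5: five folds, so five separate scores for the same model settings.
scores = cross_val_score(tree, X_train, y_train, cv=5)

print(tree.get_depth(), len(forest.estimators_), scores.mean())
```

Because every random step gets the same `random_state`, running this twice should give the same split and the same scores.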

---

### 5. Advanced Tools (The "Pro" Level)

*   **`joblib`**:
    *   *What:* Saves the trained model to a file (`.pkl`).
    *   *Why:* So you can use the brain in a Website without training again.
*   **`GridSearchCV`**:
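    *   *What:* Automatically tries every combination of the hyperparameter values you list (e.g., tries 10 trees, then 50, then 100) and scores each one with cross-validation.
    *   *Why:* Finds the **best** of the combinations you listed, without you writing the loops yourself.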
*   **`Pipeline`**:
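    *   *What:* A factory line that connects **Data Cleaning** -> **Model Training** into one object with a single `.fit()` and `.predict()`.
    *   *Why:* It helps prevent data leakage, because the cleaning steps are learned from the training data only and then reused as-is on new data. It also simplifies code for production (Web Apps), since there is one object to save and load.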
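
The three tools are often used together. The following is a minimal sketch under a few assumptions: scikit-learn and `joblib` are installed, `StandardScaler` stands in for the "Data Cleaning" step, and the step names `"scaler"` and `"model"`, the grid values, and the file name `model.pkl` are all made up for this example.

```python
import os
import tempfile

import joblib
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Pipeline: cleaning step first, model last. Step names are arbitrary labels.
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("model", RandomForestClassifier(random_state=42)),
])

# GridSearchCV: "<step name>__<parameter>" selects a knob inside the pipeline.
# 3 values x 2 values = 6 combinations, each scored with 5-fold cross-validation.
param_grid = {
    "model__n_estimators": [10, 50, 100],
    "model__max_depth": [2, 3],
}
search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X_train, y_train)

# joblib: save the winning pipeline to a file, then load it back without retraining.
model_path = os.path.join(tempfile.mkdtemp(), "model.pkl")
joblib.dump(search.best_estimator_, model_path)
loaded = joblib.load(model_path)

same_predictions = bool((loaded.predict(X_test) == search.predict(X_test)).all())
print(search.best_params_, same_predictions)
```

Putting the scaler inside the pipeline means each cross-validation fold rescales using only its own training part, which is the leakage protection mentioned above. A web app would only need the `joblib.load` line and a call to `.predict()`.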