Here is the **Master Summary** of everything we learned today. You can copy and paste this directly into your Jupyter Notebook (Markdown Cell) to keep as your "Cheat Sheet".

-----

# 📘 Data Science Interview Cheat Sheet

**Strategy:** "From Daily Life to Finance"

## 1. Data Cleaning (The Foundation)

Before AI can learn, the data must be clean.

### **A. Handling Missing Data (NaN)**

  * **Concept:** Do we delete the row or guess the value?
  * **Life Example:** A song missing the "Artist Name" in a playlist.
  * **Finance Example:** A transaction missing a "Merchant Name" (You can't delete money!).
  * **Code:**
    ```python
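    # Option 1: Drop (delete rows with any missing value)
    df_dropped = df.dropna()

    # Option 2: Fill (impute numeric gaps with each column's mean)
    df_filled = df.fillna(df.mean(numeric_only=True))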
    ```
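
To see the two options side by side, here is a minimal sketch on a made-up table. The `Merchant` and `Amount` columns and their values are assumptions for illustration only, and the "Unknown" placeholder is just one reasonable choice for missing text.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one row is missing its merchant name,
# another is missing its amount.
df = pd.DataFrame({
    "Merchant": ["Cafe", None, "Grocer", "Cafe"],
    "Amount": [4.0, 10.0, np.nan, 6.0],
})

# Option 1: drop every row that has at least one missing value.
dropped = df.dropna()

# Option 2: fill instead of deleting.
# Numeric gap -> column mean; text gap -> an explicit placeholder.
filled = df.copy()
filled["Amount"] = filled["Amount"].fillna(filled["Amount"].mean())
filled["Merchant"] = filled["Merchant"].fillna("Unknown")

print(len(df), len(dropped))  # rows before and after dropping
print(filled)
```

Dropping loses half of this tiny table, while filling keeps every transaction. That is the trade-off to explain in the interview.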

### **B. String to Number (Cleaning "Dirty" Text)**

  * **Concept:** Computers can't do math on symbols (`$`, `,`, `+`).
  * **Life Example:** Buying an apple for "$5.00". You type `5.00` in the calculator, not `$5.00`.
  * **Code:**
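    ```python
    import pandas as pd

    # 1. Remove symbols (regex=False treats '$' as a literal character, not a regex anchor)
    df['Price'] = df['Price'].str.replace('$', '', regex=False)
    df['Price'] = df['Price'].str.replace(',', '', regex=False)

    # 2. Convert to number
    df['Price'] = pd.to_numeric(df['Price'])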
    ```
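
Here is a small runnable sketch of the same cleaning on made-up prices. The sample values are assumptions, and `errors="coerce"` is an optional extra (not part of the steps above) that turns leftovers like "free" into `NaN` instead of raising an error.

```python
import pandas as pd

# Hypothetical messy price column, stored as text.
df = pd.DataFrame({"Price": ["$5.00", "$1,200.50", "+3", "free"]})

# Strip each symbol literally; regex=False stops '$' and '+'
# from being read as regular-expression metacharacters.
cleaned = df["Price"]
for symbol in ["$", ",", "+"]:
    cleaned = cleaned.str.replace(symbol, "", regex=False)

# errors="coerce" turns anything still unparseable ("free") into NaN
# instead of raising, so the bad rows can be inspected afterwards.
df["Price"] = pd.to_numeric(cleaned, errors="coerce")

print(df["Price"].tolist())
print(df["Price"].dtype)
```

After this, `df["Price"].isna()` shows which rows could not be converted, which links straight back to the missing-data decision in part A.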

### **C. Duplicates (The Trap!)**

  * **Concept:** `duplicated()` flags rows that repeat an earlier row (add `.sum()` to count them). `nunique()` counts the distinct values in a column.
  * **Life Example:** 10,840 signatures on a class sheet, but only 34 unique students.
  * **Code:**
    ```python
    # Count the bad copies (rows that exactly repeat an earlier row)
    print(df.duplicated().sum())

    # Count the unique items (The truth)
    print(df['App'].nunique())

    # Remove duplicates
    df = df.drop_duplicates(subset=['App'], keep='first')
    ```
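
The trap is easier to see on a tiny made-up table. In this sketch the `App` and `Rating` values are assumptions; the point is that "exact copy of a row" and "same app name" give different counts.

```python
import pandas as pd

# Hypothetical app listing with repeated entries.
df = pd.DataFrame({
    "App": ["Maps", "Maps", "Chat", "Maps", "Chat", "Notes"],
    "Rating": [4.5, 4.5, 4.0, 4.4, 4.0, 3.9],
})

# Rows that exactly repeat an earlier row (all columns match).
full_row_repeats = df.duplicated().sum()

# Rows whose App name was already seen, even if other columns differ.
app_repeats = df.duplicated(subset=["App"]).sum()

# Number of distinct app names.
unique_apps = df["App"].nunique()

# Keep only the first row for each app name.
deduped = df.drop_duplicates(subset=["App"], keep="first")

print(full_row_repeats, app_repeats, unique_apps, len(deduped))  # 2 3 3 3
```

Six rows minus three repeated app names leaves three rows, which matches `nunique()`. If those two numbers disagree after cleaning, something is still duplicated.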

-----

## 2. Encoders (Translating for AI)

AI only speaks **Math**. It doesn't understand "Cat" or "Dog". We must translate.

| Encoder Type | Rule | Life Example | Finance Example | Python Tool |
| :--- | :--- | :--- | :--- | :--- |
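| **Label (Ordinal) Encoding** | **Order Matters** ($0 < 1 < 2$) | T-Shirt Sizes (S, M, L) | Credit Ratings (AAA, AA, B) | `OrdinalEncoder` with explicit `categories` (`LabelEncoder` sorts alphabetically and is meant for target labels) |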
| **One-Hot Encoding** | **No Order** (Just switches) | Fruits (Apple, Banana) | Branch City (London, NYC) | `pd.get_dummies` |
| **Target Encoding** | **Use the Target's Average per Category** | Cities (Too many names) | Zip Codes (Risk Score) | `df.groupby('col')['target'].mean()` |
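
The three rows of the table can be sketched on one made-up loan table. All column names and values here are assumptions. The ordered case uses scikit-learn's `OrdinalEncoder` with the order spelled out, because an encoder that sorts alphabetically would not know that S < M < L.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical loan data; 'Default' is the 0/1 target.
df = pd.DataFrame({
    "Size": ["S", "L", "M", "S"],
    "City": ["London", "NYC", "London", "NYC"],
    "Zip": ["10001", "10001", "E1", "E1"],
    "Default": [1, 0, 0, 0],
})

# Ordered categories: spell the order out so S < M < L maps to 0 < 1 < 2.
ordinal = OrdinalEncoder(categories=[["S", "M", "L"]])
df["Size_code"] = ordinal.fit_transform(df[["Size"]]).ravel()

# Unordered categories: one 0/1 column per city.
city_dummies = pd.get_dummies(df["City"], prefix="City", dtype=int)

# Target encoding: replace each zip code with the mean target for that zip code.
zip_means = df.groupby("Zip")["Default"].mean()
df["Zip_encoded"] = df["Zip"].map(zip_means)

print(df[["Size", "Size_code", "Zip", "Zip_encoded"]])
print(city_dummies)
```

In a real project the zip-code averages should be computed on the training rows only and then mapped onto the test rows. Otherwise the target leaks into the features.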

-----

## 3. Machine Learning (The Brain)

We choose the model based on the **Question**.

### **A. Regression (The Line)**

  * **Question:** "How much?" (Quantity)
  * **Goal:** Predict a **Number**.
  * **Life Example:** Taxi Meter (Distance $\rightarrow$ Price).
  * **Model:** `LinearRegression()`
  * **Code:**
    ```python
    from sklearn.linear_model import LinearRegression
    model = LinearRegression()
    model.fit(X_train, y_train)  # Learns to predict a number: Price, Temp, Sales
    ```
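
Here is the Taxi Meter as a complete, runnable sketch. The fare rule (3.00 base fee plus 2.00 per km, no noise) is invented for illustration, so the model should recover those two numbers almost exactly.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Hypothetical taxi fares: 3.00 base fee plus 2.00 per km, no noise.
distance_km = np.arange(1, 21, dtype=float).reshape(-1, 1)  # features must be 2-D
fare = 3.0 + 2.0 * distance_km.ravel()

# Hold out a quarter of the trips to check the model on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    distance_km, fare, test_size=0.25, random_state=0
)

model = LinearRegression()
model.fit(X_train, y_train)          # learn slope and intercept

predicted = model.predict(X_test)    # fares for the held-out distances
print(model.coef_[0], model.intercept_)
print(model.score(X_test, y_test))   # R^2 on the held-out trips
```

The slope is the price per km and the intercept is the base fee. Real fare data would be noisy, so the fit would be close rather than exact.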

### **B. Classification (The Switch)**

  * **Question:** "Yes or No?" (Category)
  * **Goal:** Predict a **Group**.
  * **Life Example:** Is this email Spam? (Yes/No).
  * **Model:** `LogisticRegression()`
  * **Code:**
    ```python
    from sklearn.linear_model import LogisticRegression
    # Target holds class labels (e.g. 0 = Safe, 1 = Fraud)
    model = LogisticRegression()
    model.fit(X_train, y_train)  # Learns to predict Pass/Fail, Fraud/Safe
    ```
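
A minimal spam sketch, assuming a single made-up feature (the number of links in an email). The data is invented so that the two groups are easy to separate.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical spam data: one feature, the number of links in an email.
links = np.array(
    [0, 1, 1, 2, 2, 3, 6, 7, 8, 9, 10, 12], dtype=float
).reshape(-1, 1)
is_spam = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1])

model = LogisticRegression()
model.fit(links, is_spam)

new_emails = np.array([[1.0], [11.0]])
labels = model.predict(new_emails)                        # hard class labels
spam_probability = model.predict_proba(new_emails)[:, 1]  # P(class == 1)

print(labels)
print(spam_probability)
```

`predict` gives the switch (0 or 1), while `predict_proba` gives the confidence behind it. That matters when a false alarm and a miss have different costs.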

### **C. The Truth (Confusion Matrix)**

  * **Trap:** Accuracy can lie. If only 1% of transactions are fraud, a model that always says "No Fraud" is 99% accurate but catches nothing.
  * **Solution:** Use a **Confusion Matrix** to see False Positives (false alarms) and False Negatives (missed cases).
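
The trap can be reproduced in a few lines. This sketch assumes 1,000 made-up transactions with 10 frauds, and a "model" that always answers "No Fraud".

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical labels: 1,000 transactions, 10 of them fraud (1).
y_true = np.array([1] * 10 + [0] * 990)

# A "model" that always answers "No Fraud" (0).
y_pred = np.zeros_like(y_true)

accuracy = accuracy_score(y_true, y_pred)

# Rows are actual classes, columns are predicted classes.
# With labels=[0, 1] the layout is [[TN, FP], [FN, TP]].
tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()

print(accuracy)        # 0.99
print(tn, fp, fn, tp)  # 990 0 10 0
```

The accuracy looks great, but the matrix shows 10 False Negatives and 0 True Positives: every single fraud was missed.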

-----

### **Final Advice for the Interview**

1.  **Don't memorize code.** Memorize the **Logic**.
2.  **Explain simply.** Use the "Taxi" or "Spotify" examples.
3.  **Check your types.** Always run `df.info()` and check `.unique()` on suspicious columns before you start.
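
For the third point, a quick sketch of what the check looks like, using a made-up two-column table. The column names and values are assumptions.

```python
import pandas as pd

# Hypothetical table: 'Price' looks numeric but is stored as text.
df = pd.DataFrame({"Price": ["$5.00", "$7.50"], "Installs": [100, 250]})

df.info()                    # column names, non-null counts, dtypes
print(df["Price"].unique())  # reveals the stray '$' symbols

price_is_numeric = pd.api.types.is_numeric_dtype(df["Price"])
installs_is_numeric = pd.api.types.is_numeric_dtype(df["Installs"])
print(price_is_numeric, installs_is_numeric)  # False True
```

A text dtype on a column that should be a number is the signal to go back to the String to Number step.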

**Good luck, Future Data Scientist! 🚀**