# Week one lecture and topic notes

Here are practical examples of how both linear and quadratic classifiers might be used in real-world applications:

### 1. **Linear Classifier Example: Spam Detection**
   - **Scenario:** In email spam detection, a linear classifier can be used to separate spam from non-spam emails.
   - **Features:** Words or phrases in the email, email length, presence of certain keywords (e.g., "free," "win").
   - **Decision Boundary:** The classifier learns a hyperplane in feature space; an email is labeled spam or not spam depending on which side of the hyperplane its feature vector falls.
   - **Why Linear?** Spam and non-spam can often be separated by a weighted sum of features, such as the frequencies of specific spam-indicating words.
   - **Classifier Used:** Logistic regression or linear Support Vector Machine (SVM).
   - **Outcome:** The model might learn positive weights for words such as "free" and "money," so that an email is classified as spam when the weighted sum of its features exceeds a threshold.
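
To make the hyperplane idea concrete, here is a minimal sketch of a linear decision rule over word counts. The vocabulary, weights, and bias are hypothetical values chosen for illustration, not learned from data; a real logistic regression or linear SVM would fit them to labeled emails.

```python
import math

# Hypothetical weights: positive values push an email toward "spam".
WEIGHTS = {"free": 1.2, "win": 0.9, "money": 1.1, "meeting": -1.5}
BIAS = -2.0  # hypothetical offset that shifts the hyperplane away from the origin


def word_counts(text):
    """Count each vocabulary word in a lowercased, whitespace-split email."""
    tokens = text.lower().split()
    return {word: tokens.count(word) for word in WEIGHTS}


def spam_score(text):
    """Linear score w . x + b; the decision boundary is where this equals 0."""
    counts = word_counts(text)
    return sum(WEIGHTS[w] * counts[w] for w in WEIGHTS) + BIAS


def spam_probability(text):
    """Logistic regression maps the linear score into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-spam_score(text)))


def is_spam(text):
    """Classify by which side of the hyperplane the feature vector falls on."""
    return spam_score(text) > 0
```

Tokenization here is a plain whitespace split, so punctuation attached to a word would prevent a match; a practical pipeline would need a proper tokenizer.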

### 2. **Quadratic Classifier Example: Handwriting Recognition**
   - **Scenario:** In handwriting recognition (e.g., recognizing digits in the MNIST dataset), a quadratic classifier might be used to distinguish between the handwritten digits.
   - **Features:** Pixel intensity values for different regions of the digit images.
   - **Decision Boundary:** A quadratic decision boundary can curve, which may help separate classes such as the digit "0" and the digit "8" that overlap in certain linear projections.
   - **Why Quadratic?** Digit classes are often not linearly separable in raw pixel space, so a more flexible boundary that can curve around clusters of data can help.
   - **Classifier Used:** Quadratic Discriminant Analysis (QDA).
   - **Outcome:** The quadratic classifier can model pairwise relationships between pixel intensities within each class, which can improve accuracy over a linear model when there is enough data to estimate each class's covariance reliably.
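
As a small illustration of what a curved boundary buys, the sketch below evaluates a quadratic score of the form x^T A x + b^T x + c on 2-D points. The parameters are hypothetical and hand-picked so that the boundary is a circle of radius 2; QDA would instead derive a boundary of this same quadratic form from per-class means and covariances estimated from data.

```python
def quadratic_score(x, A, b, c):
    """Return x^T A x + b^T x + c for a 2-D point x."""
    quad = sum(x[i] * A[i][j] * x[j] for i in range(2) for j in range(2))
    lin = sum(b[i] * x[i] for i in range(2))
    return quad + lin + c


# Hypothetical parameters: the boundary is the circle x1^2 + x2^2 = 4.
A = [[1.0, 0.0], [0.0, 1.0]]
b = [0.0, 0.0]
c = -4.0


def classify(x):
    """Label a point by the sign of the quadratic score."""
    return "outer" if quadratic_score(x, A, b, c) > 0 else "inner"
```

A point near the origin is labeled "inner" while points at distance 3 in any direction are labeled "outer". No single straight line can put the inner point on one side and outer points on all sides of it on the other, which is the kind of case where a linear classifier falls short.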

### Summary of Differences:
- **Linear Classifier (Spam Detection):** Works well when the features (such as word frequencies) are roughly linearly related to the output (spam or not).
- **Quadratic Classifier (Handwriting Recognition):** Useful when the data (like pixel intensity values) have non-linear patterns that a straight boundary cannot separate, so a curved boundary is needed to classify the inputs accurately.

These examples highlight how the complexity of the data and relationships between features can guide the choice between linear and quadratic classifiers.

Generative and discriminative models are two fundamental approaches in machine learning, each with distinct ways of understanding and modeling data.

### 1. **Generative Models**
   - **Definition:** Generative models learn the joint probability distribution \( P(X, Y) \) of the input features \( X \) and the labels \( Y \). Once this distribution is learned, the model can be used to generate new data points or classify by estimating \( P(Y|X) \) using Bayes' Theorem.
   - **How They Work:**
     - They model how the data was generated.
     - First, they learn the distribution of the input features \( P(X|Y) \) for each class \( Y \), and the prior probability \( P(Y) \).
     - Then, for a new input \( X \), they compute the posterior probability \( P(Y|X) \) using Bayes' Theorem and choose the label with the highest probability.
   - **Common Applications:**
     - **Data generation:** Creating new examples of images, text, or other data.
     - **Density estimation:** Learning the distribution of the data.
     - **Anomaly detection:** Identifying instances that don't fit the learned distribution.
   - **Examples of Generative Models:**
     - Naive Bayes Classifier
     - Hidden Markov Models (HMMs)
     - Gaussian Mixture Models (GMMs)
     - Variational Autoencoders (VAEs)
     - Generative Adversarial Networks (GANs)

   **Pros:**
   - Can generate new data points.
   - Can handle missing data in some cases by modeling the joint distribution.
   
   **Cons:**
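   - Often require stronger modeling assumptions or more computation, because modeling how the features are distributed is a harder problem than modeling the class boundary alone (simple models such as Naive Bayes are an exception).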
   - May not always lead to better classification performance compared to discriminative models.
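
The generative recipe described above (learn \( P(X|Y) \) and \( P(Y) \), then apply Bayes' Theorem, \( P(Y|X) = P(X|Y) P(Y) / \sum_{y'} P(X|y') P(y') \)) can be sketched with a one-feature Gaussian model. The toy data, the single numeric feature, and the Gaussian assumption are all illustrative choices, not something prescribed by these notes.

```python
import math
import random
from statistics import mean, pvariance


def fit(samples_by_class):
    """Estimate the prior P(Y) and a Gaussian P(X|Y) for each class."""
    total = sum(len(xs) for xs in samples_by_class.values())
    return {
        label: {"prior": len(xs) / total, "mean": mean(xs), "var": pvariance(xs)}
        for label, xs in samples_by_class.items()
    }


def gaussian_pdf(x, mu, var):
    """Density of a normal distribution with mean mu and variance var."""
    return math.exp(-((x - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def posterior(model, x):
    """Bayes' Theorem: normalize P(X|Y) P(Y) over all classes."""
    joint = {y: p["prior"] * gaussian_pdf(x, p["mean"], p["var"])
             for y, p in model.items()}
    evidence = sum(joint.values())
    return {y: v / evidence for y, v in joint.items()}


def predict(model, x):
    """Choose the label with the highest posterior probability."""
    post = posterior(model, x)
    return max(post, key=post.get)


def sample(model, rng):
    """Generate a new (x, label) pair: draw Y from P(Y), then X from P(X|Y)."""
    labels = list(model)
    label = rng.choices(labels, weights=[model[y]["prior"] for y in labels])[0]
    p = model[label]
    return rng.gauss(p["mean"], math.sqrt(p["var"])), label


# Made-up feature values (e.g., a count of suspicious words per email).
data = {"ham": [1.0, 2.0, 3.0], "spam": [7.0, 8.0, 9.0, 10.0]}
model = fit(data)
```

Because the model stores the full joint distribution, the same fitted parameters serve both `predict` (classification) and `sample` (data generation), which is the property that distinguishes generative models.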

### 2. **Discriminative Models**
   - **Definition:** Discriminative models directly model the conditional probability \( P(Y|X) \), i.e., the probability of the labels \( Y \) given the features \( X \), without attempting to model how the data was generated. Some, such as SVMs, skip probabilities altogether and learn only a decision function. In both cases the focus is on the boundary between classes.
   - **How They Work:**
     - They focus on distinguishing between classes.
     - They do not model the input data distribution \( P(X) \); instead, they learn to map inputs \( X \) to labels \( Y \) as directly as possible.
   - **Common Applications:**
     - **Classification tasks:** Where the goal is to assign labels to given data points.
     - **Regression tasks:** Where the model predicts continuous values.
   - **Examples of Discriminative Models:**
     - Logistic Regression
     - Support Vector Machines (SVMs)
     - Decision Trees
     - Random Forests
     - Neural Networks (when used for classification)

   **Pros:**
   - Often simpler and more efficient for classification tasks.
   - Directly optimized for classification, often leading to better performance.
   
   **Cons:**
   - Cannot generate new data.
   - Usually offer no natural way to handle missing features, so missing values typically need imputation or model-specific handling.
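
For contrast with the generative approach, here is a minimal sketch of a discriminative model: one-feature logistic regression fitted by gradient descent on the log loss. The toy data, learning rate, and epoch count are assumptions chosen for illustration. Note that nothing in the code estimates how the feature values themselves are distributed.

```python
import math


def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def train(xs, ys, lr=0.1, epochs=2000):
    """Fit weight w and bias b by gradient descent on the mean log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Prediction errors P(Y=1|x) - y drive both gradients.
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Made-up, zero-centered feature values with binary labels.
xs = [-3.0, -2.0, -1.0, 1.0, 2.0, 3.0]
ys = [0, 0, 0, 1, 1, 1]
w, b = train(xs, ys)


def predict_proba(x):
    """Estimated P(Y=1 | X=x) under the fitted model."""
    return sigmoid(w * x + b)
```

The fitted model can answer "how likely is label 1 for this input?" but, unlike a generative model, it has no way to produce new feature values.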

### Key Differences:
| Aspect                 | **Generative Models**                              | **Discriminative Models**                         |
|------------------------|---------------------------------------------------|---------------------------------------------------|
| **What They Learn**     | Joint distribution \( P(X, Y) \)                  | Conditional distribution \( P(Y \mid X) \)        |
| **Purpose**             | Can generate new data, model full distribution    | Classify data, focused on decision boundaries     |
| **Examples**            | Naive Bayes, GANs, VAEs, HMMs                     | Logistic Regression, SVMs, Decision Trees         |
| **Advantages**          | Can model data generation process, handle missing data | Typically better at classification tasks          |
| **Disadvantages**       | More complex, not always best for classification  | Cannot generate new data                          |

### Practical Example:
- **Generative Model Example:** A **GAN** (Generative Adversarial Network) can be used to generate realistic images of faces, even creating new, previously unseen faces, by learning to sample from the distribution of pixel values in its training images. A standard GAN learns this distribution implicitly and does not need class labels.
  
- **Discriminative Model Example:** A **Support Vector Machine** (SVM) is used for classifying emails as spam or not spam. It focuses purely on distinguishing spam emails from non-spam, based on features extracted from the email, without modeling how those emails were generated.

In summary, generative models are useful when you need to understand or generate data, while discriminative models are often more efficient and effective for classification tasks.