# Machine Learning

### [Background]

#### Problem: Detecting Spam Emails
(Deciding whether a given email is spam or not_spam)

With **traditional programming**, we would have to

- write explicit rules for recognizing every possible spam message

- E.g. if the message contains *"win money"*, *"free offer"*, etc., label it as spam; otherwise label it as not_spam.

**Issues**
- not feasible when the rules are too many or too complex; easy to bypass (e.g. writing "freee" instead of "free"); no learning; poor generalization
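
To make the bypass problem concrete, here is a minimal sketch of a rule-based filter. The phrase list and the function name `is_spam_by_rules` are invented for illustration; a real rule set would be much longer.

```python
# Hypothetical keyword list; a real rule set would be far longer.
SPAM_PHRASES = ("win money", "free offer")

def is_spam_by_rules(message: str) -> bool:
    """Label a message as spam if it contains any hard-coded phrase."""
    text = message.lower()
    return any(phrase in text for phrase in SPAM_PHRASES)

print(is_spam_by_rules("Claim your FREE offer now"))   # True
print(is_spam_by_rules("Claim your freee offer now"))  # False: the misspelling slips through
```

Every new spelling trick needs another hand-written rule, which is the maintenance burden the issues above describe.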

---

# Machine Learning  
(A better solution)

Machine learning is a subfield of **artificial intelligence** that focuses on **learning from data** by **recognizing patterns** in it and making decisions based on the learned patterns **without being explicitly programmed**.

---

## What is needed (to make a machine learn)?
1. **Dataset** (a collection of examples or instances of the given problem)
2. **Algorithm** (a step-by-step procedure or a set of rules)

### What does the machine do?
It uses the **algorithm** to build a **model** from the given dataset, then uses the model to make predictions on new inputs.
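
As a rough illustration of the dataset, algorithm, and model roles, the sketch below uses a deliberately simple learning algorithm: try each observed value as a threshold and keep the one with the fewest training errors. The data, the single feature (number of suspicious words), and the name `train_threshold_model` are assumptions made for this example.

```python
def train_threshold_model(dataset):
    """Algorithm: try every observed value as a threshold, keep the one with fewest errors."""
    best_threshold, best_errors = None, None
    for candidate in sorted({x for x, _ in dataset}):
        # Predict spam when x >= candidate and count disagreements with the labels.
        errors = sum((x >= candidate) != is_spam for x, is_spam in dataset)
        if best_errors is None or errors < best_errors:
            best_threshold, best_errors = candidate, errors
    # The "model" is the learned threshold wrapped in a prediction function.
    return lambda x: x >= best_threshold

# Dataset: (number of suspicious words, is_spam) pairs, invented for illustration.
dataset = [(0, False), (1, False), (2, False), (4, True), (5, True), (7, True)]
model = train_threshold_model(dataset)

print(model(6), model(1))  # True False
```

The rule "4 or more suspicious words means spam" was never written by hand; it came out of the data.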

---

# Types of Machine Learning

(based on the availability and nature of the data)


---

## 1. Supervised Learning

Here, the **dataset** is a collection of labeled examples {(x<sub>i</sub>, y<sub>i</sub>)}<sup>N</sup><sub>i=1</sub>

where **x<sub>i</sub>** is a **feature vector**. For example,
$$
x_i=[x^{(1)}, x^{(2)}, x^{(3)}, \ldots, x^{(D)}]
$$
where $x^{(1)}, x^{(2)}, x^{(3)}, \ldots$ can be features such as the frequency of particular words, the length of the email, or the presence of special characters, for the spam or not_spam problem.

**y<sub>i</sub>** is the **label**. It can be a class (spam or not_spam) or a real number (such as the probability that an email is spam).


**Goal:** Produce a model that takes a feature vector as input and outputs the label for it.
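
A small sketch of how one labeled example $(x_i, y_i)$ might be built from raw email text. The three features and the helper `extract_features` are illustrative choices, not a recommended feature set.

```python
def extract_features(email: str) -> list:
    """Turn raw email text into a small, hand-picked feature vector."""
    words = email.lower().split()
    special = sum(not c.isalnum() and not c.isspace() for c in email)
    return [
        float(words.count("free")),  # x^(1): frequency of the word "free"
        float(len(email)),           # x^(2): length of the email in characters
        float(special),              # x^(3): number of special characters
    ]

# One labeled example (x_i, y_i); the label is assigned by hand here.
x_i = extract_features("Free offer!!! Win money, free")
y_i = "spam"
print(x_i, y_i)  # [2.0, 29.0, 4.0] spam
```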

---

#### Support Vector Machine (SVM) 

- a supervised learning algorithm

- The labels are classes.
- Originally a binary classification algorithm, but it can be extended to multiclass problems.
- requires positive examples (the class of interest), labeled +1, and negative examples (the other class), labeled -1

- creates a decision boundary (a hyperplane) to separate the two categories of data

- prefers the hyperplane that separates positive examples from negative ones with the **largest margin** (the margin is the distance between the decision boundary and the nearest points of either class).
- Why the largest margin? For better generalization and to make the model more robust to noise.

---

<img src="Images/SVM.png" width= 500> <img src="Images/svm_b_2.png" width= 500>
<img src="Images/svm_a_1.png" width= 500> <img src="Images/svm_d_1.png" width= 500>

---

- **Decision boundary** (the hyperplane) is given by
    $$
    \mathbf{wx} - b = 0
    $$
    where $\mathbf{w}$ is the weight vector, $\mathbf{x}$ is the feature vector, and $b$ is the bias.

- The label is given by
    $$
    y = \text{sign}(\mathbf{wx}-b)
    $$

- The **model** is defined as
    $$
    f(\mathbf{x}) = \text{sign}(\mathbf{w^*x}-b^*)
    $$
    where $\mathbf{w^*}$ is the optimised value of $\mathbf{w}$ and $b^*$ is the optimised value of $b$.

- Constraints: $y_i (\mathbf{wx}_i - b) \ge 1$ for every example $i$ (Why? Explained later...)
- Minimize $\|\mathbf{w}\|$ (so that the margin is large)
- E.g. checking whether a message is spam or not_spam.
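
The formulas above can be sketched in plain Python. This is not an SVM trainer: it only evaluates two hand-picked candidate hyperplanes on made-up 2-D data to show that both satisfy the constraints and that the one with the smaller $\|\mathbf{w}\|$ has the larger margin. Treating $\text{sign}(0)$ as +1 is an assumption of this sketch, and the margin formula $1/\|\mathbf{w}\|$ assumes the nearest points meet the constraint with equality.

```python
import math

def dot(w, x):
    return sum(wj * xj for wj, xj in zip(w, x))

def predict(w, b, x):
    """f(x) = sign(w.x - b), returning +1 or -1 (0 is treated as +1 here)."""
    return 1 if dot(w, x) - b >= 0 else -1

def satisfies_constraints(w, b, data):
    """Check y_i (w.x_i - b) >= 1 for every labeled example."""
    return all(y * (dot(w, x) - b) >= 1 for x, y in data)

def margin(w):
    """Distance from the boundary to the nearest points lying on w.x - b = +1 or -1."""
    return 1 / math.sqrt(dot(w, w))

# Toy data: (feature vector, label) with labels +1 / -1, invented for illustration.
data = [((3.0, 3.0), +1), ((4.0, 2.0), +1), ((1.0, 1.0), -1), ((0.0, 2.0), -1)]

for w, b in [((1.0, 1.0), 4.0), ((0.5, 0.5), 2.0)]:
    print(w, b, satisfies_constraints(w, b, data), round(margin(w), 3))
```

Both candidates describe the same boundary line, but only the second touches the constraints with equality, so its margin (about 1.414) is the true distance to the nearest points. An actual SVM solver searches over all $\mathbf{w}$ and $b$ rather than comparing two candidates.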

---

## 2. Unsupervised Learning

Here, the **dataset** is a collection of unlabeled examples {x<sub>i</sub>}<sup>N</sup><sub>i=1</sub>

where **x<sub>i</sub>** is a feature vector.

**Goal:** Create a model that takes a feature vector as input and returns something that gives information about the structure or pattern of the data.
**Eg -** grouping emails into different categories without spam and not_spam labels.
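
One common way to find such groups is k-means clustering, which these notes do not name; the sketch below is a bare-bones 1-D version on made-up email lengths. The starting centers are chosen by hand to keep the run deterministic.

```python
def kmeans_1d(values, centers, iterations=10):
    """Plain k-means on 1-D data: assign each value to its nearest center, then re-average."""
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda k: abs(v - centers[k]))
            clusters[nearest].append(v)
        # Keep the old center if a cluster ends up empty.
        centers = [sum(c) / len(c) if c else centers[k] for k, c in enumerate(clusters)]
    return centers, clusters

# Unlabeled feature: email length in words (made-up numbers, no spam labels).
lengths = [12, 15, 14, 210, 190, 205]
centers, clusters = kmeans_1d(lengths, centers=[0.0, 100.0])
print(clusters)  # [[12, 15, 14], [210, 190, 205]]
```

The algorithm separates short and long emails without being told what the groups mean; interpreting them is left to the person reading the result.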

---
## 3. Semi-Supervised Learning

Here, the **dataset** is a collection of both labeled and unlabeled examples.

**Goal:** Same as supervised learning. The unlabeled data is used to improve the model's understanding of the structure of the data distribution.
**Eg -** a mix of emails with spam / not_spam labels and emails without labels.

This type of learning is important because most available data is unlabeled.
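
A minimal sketch of one semi-supervised approach, self-training with pseudo-labels, which is a choice made for this example and not something the notes prescribe. A nearest-centroid classifier is fit on two labeled points, used to label the unlabeled points, and then refit on everything. The single numeric feature and all values are invented.

```python
def centroids(labeled):
    """Average feature value per class."""
    sums, counts = {}, {}
    for x, y in labeled:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}

def nearest_label(x, cents):
    """Predict the class whose centroid is closest to x."""
    return min(cents, key=lambda y: abs(x - cents[y]))

labeled = [(1, "not_spam"), (9, "spam")]   # few labeled examples
unlabeled = [2, 3, 4, 7, 8]                # many unlabeled examples

initial = centroids(labeled)
pseudo = [(x, nearest_label(x, initial)) for x in unlabeled]  # pseudo-labels
refined = centroids(labeled + pseudo)

print(initial)  # {'not_spam': 1.0, 'spam': 9.0}
print(refined)  # {'not_spam': 2.5, 'spam': 8.0}
```

The refined centroids sit closer to where the data actually clusters, which is the sense in which unlabeled data informs the model about the data distribution. Pseudo-labels can also be wrong, so this is only a sketch of the idea.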

---

## 4. Reinforcement Learning

Here, there is no fixed **dataset**; the **agent** (the machine) generates its own data by interacting with an environment.

A few terms:
- **Environment $(E)$**: The external system the agent interacts with.
- **State $(S)$**: A specific situation or configuration of the environment.
- **Action $(A)$**: A move or decision the agent can make in a given state.
- **Reward $(R)$**: Feedback signal indicating the value of an action taken.
- **Optimal actions**: The best actions that maximize long-term rewards.

In this setting, the machine lives in an **environment $(E)$**, perceives the **state $(S)$** of the environment, and takes an **action $(A)$** in every state. Different actions lead to different **rewards $(R)$**. Over time, it learns which actions are **optimal** in a particular state.

$$
E=(S,A,P,R,\gamma)
$$
where **E -** environment, **S -** set of states, **A -** set of actions, **P -** transition function,
**R -** reward function, **$\boldsymbol{\gamma}$** - discount factor

---

**Eg.** Solving a maze problem, 
<img src="Images/Maze_1.png" width = 500>  

**Goal:** To learn a **policy** (a function that takes the feature vector of a state as input and outputs the optimal action to take in that state).
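
As a rough sketch of learning a policy, the code below runs tabular Q-learning (one standard algorithm, not named in these notes) on a five-cell corridor that stands in for a maze. The corridor, the reward of 1 at the exit, and the values of `GAMMA` and `ALPHA` are all assumptions for illustration; states are plain integers instead of feature vectors.

```python
import random

N_STATES, GOAL = 5, 4          # states 0..4 in a corridor; state 4 is the exit
ACTIONS = (-1, +1)             # move left or move right
GAMMA, ALPHA = 0.9, 0.5        # discount factor and learning rate (arbitrary choices)

def step(state, action):
    """Deterministic transition: move, stay inside the corridor, reward 1 at the goal."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward

def q_learning(episodes=500, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state = 0
        while state != GOAL:
            action = rng.choice(ACTIONS)  # explore with uniformly random actions
            next_state, reward = step(state, action)
            best_next = max(q[(next_state, a)] for a in ACTIONS)
            # Move the estimate toward reward + discounted value of the best next action.
            q[(state, action)] += ALPHA * (reward + GAMMA * best_next - q[(state, action)])
            state = next_state
    return q

q = q_learning()
# Policy: in each non-goal state, pick the action with the highest learned value.
policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(GOAL)}
print(policy)
```

The agent is never told that "right" is correct; the learned policy should settle on `+1` in every state because that action leads to the reward soonest and is discounted least.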

---

# Why Does the Model Work on New Data?

Because of **statistical generalization**:

- If the data is sampled at random and is representative of the population, then statements or predictions derived from it also hold, approximately, for the whole population or for future data.
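
A small simulation can illustrate why "random and representative" matters. The population below (10,000 emails, 30% spam, spam emails listed first) is invented; the sketch compares an estimate from a random sample with one from a non-random sample.

```python
import random

# Hypothetical population: 10,000 emails, exactly 30% spam (1 = spam, 0 = not_spam).
population = [1] * 3000 + [0] * 7000
true_rate = sum(population) / len(population)

rng = random.Random(0)                        # fixed seed so the run is repeatable
random_sample = rng.sample(population, 2000)  # drawn at random from the whole population
biased_sample = population[:2000]             # not random: only the first 2,000 emails

random_estimate = sum(random_sample) / len(random_sample)
biased_estimate = sum(biased_sample) / len(biased_sample)

print(f"population spam rate: {true_rate:.3f}")
print(f"random sample:        {random_estimate:.3f}")
print(f"biased sample:        {biased_estimate:.3f}")
```

The random sample's estimate lands close to the population rate, while the biased sample reports that every email is spam. A model trained on unrepresentative data generalizes just as poorly.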

**PAC learning theory** explores this in more detail, but because of the restrictive assumptions it makes, it is rarely applied directly in practice nowadays.

