# Problem 1

Consider a training set of $n$ samples and labels

$$
S_ n=\{ (x^{(i)},y^{(i)}):i=1,\ldots ,n\}
$$

where $x^{(i)}\in \mathbb{R}^2$ and $y^{(i)}\in \{1,-1\}$.

Suppose that we are able to find a linear classifier with parameters $\theta^\prime$ and $\theta_0^\prime$ such that

$$
y^{(i)}\left(\theta^\prime \cdot x^{(i)}+\theta_0^\prime \right)>0
$$
for $i=1,\ldots,n$.

Let $\hat{\theta}$ and $\hat{\theta}_0$ be the parameters of the maximum margin linear classifier, if it exists, obtained by minimizing
$$
\frac{1}{2}\left\| \theta \right\|^2 \qquad \text{subject to } \, y^{(i)}\left(\theta \cdot x^{(i)}+\theta_0\right)\geq 1 \, \text{ for all } \, i=1,\ldots,n.
$$

True or False. (As usual, "True" means always true; "False" means not always true.)

Yes, the statement is **True**.

The minimization problem defined by:

$$
\min_{\theta, \theta_0} \frac{1}{2} \|\theta\|^2 \quad \text{subject to} \quad y^{(i)}(\theta \cdot x^{(i)} + \theta_0) \geq 1 \quad \text{for all} \ i=1,\dots,n,
$$

has a solution **if and only if** the training examples $S_n = \{(x^{(i)}, y^{(i)}): i = 1, \dots, n \}$ are linearly separable. 

### Explanation:
- **Linear separability** means that there exists a hyperplane (defined by $\theta$ and $\theta_0$) that can perfectly separate the data points into two distinct classes, where all points with $y^{(i)} = 1$ are on one side of the hyperplane and all points with $y^{(i)} = -1$ are on the other side.
  
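- If the data is linearly separable, some $(\theta', \theta_0')$ satisfies $y^{(i)}(\theta' \cdot x^{(i)} + \theta_0') > 0$ for all $i$. Because there are finitely many samples, dividing both parameters by the smallest of these values yields a classifier that satisfies every constraint with $\geq 1$, so the feasible set is nonempty and the problem has a solution. Minimizing $\|\theta\|^2$ corresponds to maximizing the margin (the distance from the hyperplane to the nearest data points).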

- However, if the data is **not** linearly separable, there is no hyperplane that can satisfy the constraint $y^{(i)}(\theta \cdot x^{(i)} + \theta_0) \geq 1$ for all $i$, and hence the minimization problem does not have a solution in that case.

Thus, the minimization problem has a solution **if and only if** the training data is linearly separable.

This also means that the training examples $S_n$ are linearly separable under our assumptions, since the classifier $(\theta', \theta_0')$ already separates them.
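The rescaling step behind this argument can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy data set is hypothetical, and the perceptron algorithm is swapped in only as a convenient way to obtain some separating $(\theta', \theta_0')$. The result is feasible for the constraints $\geq 1$, but it is not claimed to be the maximum margin solution.

```python
# Hypothetical toy data in R^2, chosen so that it is linearly separable.
points = [(2.0, 1.0), (3.0, 2.5), (-1.0, -1.5), (-2.0, 0.0)]
labels = [1, 1, -1, -1]


def signed_scores(theta, theta_0, points, labels):
    """Return y * (theta . x + theta_0) for every sample."""
    return [
        y * (theta[0] * x1 + theta[1] * x2 + theta_0)
        for (x1, x2), y in zip(points, labels)
    ]


def perceptron(points, labels, max_epochs=1000):
    """Search for (theta, theta_0) with all signed scores > 0.

    Returns None if no such classifier is found within max_epochs.
    """
    theta, theta_0 = [0.0, 0.0], 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for (x1, x2), y in zip(points, labels):
            if y * (theta[0] * x1 + theta[1] * x2 + theta_0) <= 0:
                # Standard perceptron update on a misclassified sample.
                theta[0] += y * x1
                theta[1] += y * x2
                theta_0 += y
                mistakes += 1
        if mistakes == 0:
            return theta, theta_0
    return None


def rescale_to_feasible(theta, theta_0, points, labels):
    """Divide by the smallest signed score so every constraint holds with >= 1."""
    smallest = min(signed_scores(theta, theta_0, points, labels))
    return [t / smallest for t in theta], theta_0 / smallest


separating = perceptron(points, labels)
feasible_theta, feasible_theta_0 = rescale_to_feasible(*separating, points, labels)
margins = signed_scores(feasible_theta, feasible_theta_0, points, labels)
print("rescaled parameters:", feasible_theta, feasible_theta_0)
print("smallest signed score after rescaling:", min(margins))
```

After rescaling, the smallest signed score equals 1 up to floating-point rounding, so the constraint set of the minimization problem is nonempty for this data.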

# Problem 2

In an infinite-horizon discounted MDP, there are three states $x$, $y_1$, $y_2$ and only one action $a$. The MDP dynamics are independent of the action $a$, as shown below:

![image.png](attachment:image.png)

The discount factor is denoted by $\gamma$ ($0 < \gamma < 1$).
The instant reward is set to 1 for starting in state $y_1$ and 0 elsewhere.

Define $V^*(y_1)$ as the optimal value function of the state $y_1$. Compute $V^*(y_1)$ via Bellman's equation in terms of $\gamma$ and $p$.

### Bellman's Equation for $V^*(y_1)$:
Bellman's equation expresses the value of a state as the immediate reward plus the discounted future value of the next state under the optimal policy.

For state $y_1$, the transitions are as follows:
1. With probability $p$, the system stays in state $y_1$.
2. With probability $1 - p$, the system transitions to state $y_2$.

The **instant reward** for starting in state $y_1$ is **1**, and a transition to $y_2$ contributes the discounted value $V^*(y_2)$.

Thus, the Bellman equation for $V^*(y_1)$ is:
$$
V^*(y_1) = 1 + \gamma \left( p V^*(y_1) + (1 - p) V^*(y_2) \right),
$$
where:
- $1$ is the immediate reward for being in $y_1$,
- $\gamma$ is the discount factor,
- $p$ is the probability of staying in $y_1$,
- $1 - p$ is the probability of transitioning to $y_2$,
- $V^*(y_2)$ is the value function for state $y_2$.

### Bellman's Equation for $V^*(y_2)$:
From the diagram, state $y_2$ transitions only to itself (with probability $1$), and the reward for $y_2$ is 0. Hence, the value function for $y_2$ is simply:
$$
V^*(y_2) = 0.
$$
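
As a short check of this step, the Bellman equation for $y_2$ can be written out explicitly, assuming (as read from the diagram) that $y_2$ is absorbing with reward 0:

$$
V^*(y_2) = 0 + \gamma V^*(y_2) \quad \Longrightarrow \quad (1 - \gamma)\, V^*(y_2) = 0 \quad \Longrightarrow \quad V^*(y_2) = 0,
$$

where the last implication uses $0 < \gamma < 1$, so that $1 - \gamma \neq 0$.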

### Substituting $V^*(y_2)$ into the Equation for $V^*(y_1)$:
Now substitute $V^*(y_2) = 0$ into the equation for $V^*(y_1)$:
$$
V^*(y_1) = 1 + \gamma \left( p V^*(y_1) + (1 - p) \times 0 \right),
$$
$$
V^*(y_1) = 1 + \gamma p V^*(y_1).
$$

### Solving for $V^*(y_1)$:
To isolate $V^*(y_1)$, subtract $\gamma p V^*(y_1)$ from both sides and factor out $V^*(y_1)$:
$$
V^*(y_1) - \gamma p V^*(y_1) = 1,
$$
$$
V^*(y_1) (1 - \gamma p) = 1.
$$

### Final Answer:
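Dividing both sides by $1 - \gamma p$ (which is positive because $0 < \gamma < 1$ and $0 \leq p \leq 1$), the optimal value function for state $y_1$ is: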

$$
V^*(y_1) = \frac{1}{1 - \gamma p}
$$
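
The closed form can be sanity-checked numerically. The sketch below iterates the two Bellman equations used above until they converge and compares the result with $\frac{1}{1 - \gamma p}$. The values of `gamma` and `p` are hypothetical, and state $x$ is left out because its transitions are only given in the diagram and are not needed for $V^*(y_1)$. With a single action, value iteration reduces to repeatedly applying these equations.

```python
def value_iteration(gamma, p, tol=1e-12, max_iters=100_000):
    """Iterate the Bellman equations for y1 and y2 until the updates are below tol."""
    v_y1, v_y2 = 0.0, 0.0
    for _ in range(max_iters):
        # y1: reward 1, stay with probability p, move to y2 with probability 1 - p.
        new_v_y1 = 1.0 + gamma * (p * v_y1 + (1.0 - p) * v_y2)
        # y2: reward 0, transitions only to itself.
        new_v_y2 = 0.0 + gamma * v_y2
        delta = max(abs(new_v_y1 - v_y1), abs(new_v_y2 - v_y2))
        v_y1, v_y2 = new_v_y1, new_v_y2
        if delta < tol:
            break
    return v_y1, v_y2


gamma, p = 0.9, 0.5  # hypothetical values, chosen only for illustration
v_y1, v_y2 = value_iteration(gamma, p)
closed_form = 1.0 / (1.0 - gamma * p)
print("value iteration:", v_y1)
print("closed form:    ", closed_form)
```

The two printed numbers should agree to within the tolerance, which is consistent with reading $V^*(y_1)$ as the geometric series $\sum_{t \geq 0} (\gamma p)^t$: reward 1 is collected at step $t$ only if the chain has stayed in $y_1$ for $t$ steps.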

