# 1 Optimization and probability

1. Let $x_1, \dots, x_n$ be real numbers representing positions on a number line. Let $w_1, \dots, w_n$ be positive real numbers representing the importance of each of these positions. Consider the quadratic function: $f(\theta) = \frac{1}{2} \sum_{i=1}^n w_i (\theta - x_i)^2$. What value of $\theta$ minimizes $f(\theta)$? What problematic issues could arise if some of the $w_i$'s are negative?
Note: You can think about this problem as trying to find the point $\theta$ that's not too far away from the $x_i$'s. Over time, hopefully you'll appreciate how nice quadratic functions are to minimize.

*Solution*


A minimum of a differentiable function can only occur where its derivative equals zero. Computing the first derivative:

$$
\begin{align}
f^{\prime}(\theta) &= \frac{1}{2}\sum_{i=1}^n 2w_i(\theta-x_i) \\
                   &= \sum_{i=1}^n w_i(\theta-x_i) \\
                   &= \theta\sum_{i=1}^n w_i -\sum_{i=1}^n w_ix_i
\end{align}
$$

Setting the derivative equal to zero and solving for $\theta$:

$$
\begin{align}
\theta\sum_{i=1}^n w_i -\sum_{i=1}^n w_ix_i &= 0 \\
\Rightarrow \theta &= \frac{\sum_{i=1}^n w_ix_i}{\sum_{i=1}^n w_i}
\end{align}
$$
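As a quick numerical sanity check of this closed form, the sketch below compares the weighted mean against a brute-force grid search. The function names `f` and `weighted_mean`, the sample data, and the grid range are assumptions made for illustration; they are not part of the problem.

```python
def f(theta, xs, ws):
    """Weighted quadratic objective: 0.5 * sum_i w_i * (theta - x_i)^2."""
    return 0.5 * sum(w * (theta - x) ** 2 for x, w in zip(xs, ws))


def weighted_mean(xs, ws):
    """Closed-form stationary point: sum_i w_i x_i / sum_i w_i."""
    return sum(w * x for x, w in zip(xs, ws)) / sum(ws)


# Illustrative data (assumed for this example only).
xs = [1.0, 2.0, 6.0]
ws = [1.0, 3.0, 1.0]

theta_star = weighted_mean(xs, ws)

# Brute-force check: evaluate f on a grid of candidates in [0, 7].
grid = [i / 1000 for i in range(7001)]
theta_grid = min(grid, key=lambda t: f(t, xs, ws))

print(theta_star, theta_grid)
```

With positive weights, the grid minimizer should agree with the closed form up to the grid resolution.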

For the stationary point $\theta = \frac{\sum_{i=1}^n w_ix_i}{\sum_{i=1}^n w_i}$ to be a local minimum, the second derivative evaluated there must be positive. Calculating the second derivative:

$$
\begin{align}
f^{\prime\prime}(\theta) &= \sum_{i=1}^n w_i
\end{align}
$$

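Since every $w_i$ is positive, this sum is positive, so $f$ is convex and the stationary point is the global minimum. If some of the $w_i$'s are negative, the sum $\sum_{i=1}^n w_i$ may (depending on their magnitudes) become negative. In that case $f$ is concave, the stationary point is a maximum, and $f$ is unbounded below, so no minimum exists. If the sum is exactly zero, the formula for $\theta$ is undefined and $f$ is linear (or constant) in $\theta$.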
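To illustrate the negative-weight case, here is a minimal sketch with two points whose weights sum to a negative number. The data and variable names are assumptions chosen for the example.

```python
def f(theta, xs, ws):
    """Weighted quadratic objective: 0.5 * sum_i w_i * (theta - x_i)^2."""
    return 0.5 * sum(w * (theta - x) ** 2 for x, w in zip(xs, ws))


# Illustrative data (assumed): the weights sum to -2, so f'' < 0 everywhere.
xs = [0.0, 1.0]
ws = [1.0, -3.0]

# The derivative still vanishes at the weighted mean...
stationary = sum(w * x for x, w in zip(xs, ws)) / sum(ws)

# ...but f keeps decreasing as theta moves away from it.
for theta in (stationary, 10.0, 100.0, 1000.0):
    print(theta, f(theta, xs, ws))
```

Here the stationary point is a maximum rather than a minimum, and the printed values decrease without bound as $\theta$ grows.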

Intuitively, $f(\theta)$ measures the weighted squared distance from $\theta$ to all the $x_i$'s, and the $\theta$ that minimizes it is the weighted average of the $x_i$'s.

----------------------------------------------

2. In this class, there will be a lot of sums and maxes. Let's see what happens if we switch the order. Let $f(\mathbf x) = \sum_{i=1}^d \max_{s \in \{1,-1\}} s x_i$ and $g(\mathbf x) = \max_{s \in \{1,-1\}} \sum_{i=1}^d s x_i$, where $\mathbf x = (x_1, \dots, x_d) \in \mathbb{R}^d$ is a real vector. Does $f(\mathbf x) \le g(\mathbf x)$, $f(\mathbf x) = g(\mathbf x)$, or $f(\mathbf x) \ge g(\mathbf x)$ hold for all $\mathbf x$? Prove it.
Hint: You may find it helpful to refactor the expressions so that they are maximizing the same quantity over different sized sets.

*Solution*

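Let $S$ be the value of $s\in\{1,-1\}$ that attains the maximum in $g(\mathbf x)$, so that $g(\mathbf x) = \sum_{i=1}^d Sx_i$. Then, for every $i$:

$$Sx_i\leq \max_{s\in\{1,-1\}}sx_i$$

Summing over $i$ gives $g(\mathbf x) = \sum_{i=1}^d Sx_i \leq \sum_{i=1}^d \max_{s\in\{1,-1\}}sx_i = f(\mathbf x)$, so $f(\mathbf x) \geq g(\mathbf x)$ holds for all $\mathbf x$. Equality does not hold in general: for $\mathbf x = (1, -1)$ we get $f(\mathbf x) = 2$ and $g(\mathbf x) = 0$.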
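The inequality can also be spot-checked numerically. The sketch below evaluates both definitions directly on random vectors; the function names `f` and `g`, the seed, the sampling range, and the small tolerance for floating-point rounding are assumptions for illustration, and a random check is of course not a proof.

```python
import random


def f(x):
    """Sum of per-coordinate maxima: each term picks its own sign s."""
    return sum(max(s * xi for s in (1, -1)) for xi in x)


def g(x):
    """Maximum over one shared sign s of the signed sum."""
    return max(sum(s * xi for xi in x) for s in (1, -1))


rng = random.Random(0)  # fixed seed so the run is reproducible
violations = 0
for _ in range(10_000):
    d = rng.randint(1, 5)
    x = [rng.uniform(-10, 10) for _ in range(d)]
    if f(x) < g(x) - 1e-9:  # tolerance for floating-point rounding
        violations += 1

print(violations)
print(f([1.0, -1.0]), g([1.0, -1.0]))
```

In effect, $f$ maximizes over all $2^d$ sign vectors while $g$ maximizes over only the two constant ones, which is why $f$ can never be smaller.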

-------------------------------------------

3. Suppose you repeatedly roll a fair six-sided die until you roll a $1$ (and then you stop). Every time you roll a $2$, you lose $a$ points, and every time you roll a 6, you win $b$ points. You do not win or lose any points if you roll a 3, 4, or a 5. What is the expected number of points (as a function of $a$ and $b$) you will have when you stop?
Hint: it is recommended to think of defining a recurrence.

*Solution*

Let $X$ be the random variable that describes the number of points obtained in a single roll. That is:

$$X = \left\{\begin{array}{ccc}
        -a & \mbox{if} & \mbox{die rolls 2} \\
        b & \mbox{if} & \mbox{die rolls 6} \\
        0 & \mbox{if} & \mbox{die rolls 1, 3, 4, or 5}
        \end{array}
        \right.
$$
Its expected value is:
$$
\begin{align}
E[X] &= -a\left(\frac{1}{6}\right)+b\left(\frac{1}{6}\right) \\
     &= \frac{b-a}{6}
\end{align}$$

Let $p_i$ be the expected number of points accumulated after $i$ rolls. Clearly $p_1 = E[X] = \frac{b-a}{6}$. Then:

$$
p_i = \left\{\begin{array}{ccc}
        \frac{b-a}{6} & \mbox{if} & i=1 \\
        p_{i-1}+\frac{b-a}{6} & \mbox{if} & i>1
        \end{array}
        \right.
$$

On the other hand, the number of rolls needed to finish the game is a geometric random variable with success probability $p = \frac{1}{6}$, so the expected number of rolls is $\frac{1}{p} = 6$. Because the decision to stop after a given roll depends only on the rolls seen so far, the expected total equals the expected number of rolls times the expected points per roll (Wald's identity). So the expected final score is $p_6 = 6\left(\frac{b-a}{6}\right) = b-a$.
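Following the hint, the same answer can be reached with a recurrence. The symbol $V$, denoting the expected final score measured from the start of any roll, is notation introduced here and not part of the problem statement; the derivation also assumes $V$ is finite. Because the game has no memory, conditioning on the outcome of the next roll gives:

$$
\begin{align}
V &= \frac{1}{6}(0) + \frac{1}{6}(V - a) + \frac{1}{6}(V + b) + \frac{3}{6}V \\
  &= \frac{5}{6}V + \frac{b-a}{6} \\
\Rightarrow \frac{1}{6}V &= \frac{b-a}{6} \\
\Rightarrow V &= b - a
\end{align}
$$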

Here's a simulation:

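```python
import random

a = 3  # points lost on a 2 (example value)
b = 3  # points won on a 6 (example value)

scores = []
for rep in range(1000000):
    score = 0
    roll = random.choice([1, 2, 3, 4, 5, 6])
    while roll != 1:  # the game stops at the first 1
        if roll == 2:
            score -= a
        elif roll == 6:
            score += b
        roll = random.choice([1, 2, 3, 4, 5, 6])
    scores.append(score)

# Average score over all simulated games; with a = b = 3 the
# expected value is b - a = 0.
sum(scores) / len(scores)
```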

-0.000378
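Since $a = b$ makes the expected score zero, a run with distinct values is a more telling check of the $b - a$ formula. The following is a minimal sketch under assumed values; the function name `simulate`, its parameters, the seed, and the choice $a = 2$, $b = 5$ are all hypothetical and only for illustration.

```python
import random


def simulate(a, b, reps=100_000, seed=0):
    """Estimate the expected final score by playing `reps` games."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    total = 0
    for _ in range(reps):
        roll = rng.randint(1, 6)
        while roll != 1:  # the game stops at the first 1
            if roll == 2:
                total -= a
            elif roll == 6:
                total += b
            roll = rng.randint(1, 6)
    return total / reps


estimate = simulate(a=2, b=5)
print(estimate)  # should be close to b - a = 3
```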

------------------------------------

4. Suppose the probability of a coin turning up heads is $0 \lt p \lt 1$, and that we flip it 7 times and get $\{ \text{H}, \text{H}, \text{T}, \text{H}, \text{T} , \text{T}, \text{H} \}$. We know the probability (likelihood) of obtaining this sequence is $L(p) = p p (1-p) p (1-p) (1-p) p = p^4(1-p)^3$. What value of $p$ maximizes $L(p)$? What is an intuitive interpretation of this value of $p$?
Hint: Consider taking the derivative of $\log L(p)$. You can also directly take the derivative of $L(p)$, but it is cleaner and more natural to differentiate $\log L(p)$. You can verify for yourself that the value of $p$ which maximizes $\log L(p)$ must also maximize $L(p)$ (you are not required to prove this in your solution).

*Solution*

Let $f:\mathbb R \rightarrow [0,1]$