# Mathematical Underpinnings - Lab 1

## Task 1
### a) Generative approach 

We know $\pi = P(Y=1)$ and the distributions $f(x|Y=1)$ and $f(x|Y=-1)$.

#### First bullet point

In [8]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

In [8]:
# sampling from multivariate normal distribution and from Bernoulli

x = np.random.multivariate_normal(np.array([0,0]), np.eye(2), 5)
y = np.random.binomial(1, 0.5, 5)
x, y

(array([[-1.8357655 ,  0.44902948],
        [ 0.34352241, -0.88662649],
        [ 0.71461253, -0.52791778],
        [ 0.84917764,  1.53720319],
        [-0.24184231, -0.64272448]]),
 array([0, 0, 0, 0, 0]))

In [12]:
m1 = np.array([1,1])
m2 = np.array([0,0])

sigma = np.array([[1, -0.5],[-0.5, 1]])

In [40]:
# sampling
Y_generative = np.concatenate((np.ones(500),np.ones(500)*-1))
X_generative = np.concatenate((
    np.random.multivariate_normal(mean=m1, cov=sigma, size=500),
    np.random.multivariate_normal(mean=m2, cov=sigma, size=500)
),axis=0)

In [43]:
X_generative.shape

(1000, 2)

In [53]:
lr = LogisticRegression(fit_intercept=True)
lr.fit(X_generative,Y_generative)
print(lr.intercept_)
print(lr.coef_)

[-2.01569547]
[[1.89526077 2.08510365]]


In [71]:
b0 =  (m2.T @ np.linalg.inv(sigma) @ m2 - m1.T @ np.linalg.inv(sigma) @ m1)/2
print(b0)
b = np.linalg.inv(sigma) @ (m1-m2)
print(b)

-2.0
[2. 2.]


In [None]:
# roughly the same: the fitted intercept and coefficients are close to the theoretical b0 and b

Does the distribution of $P(Y=1|X=x)=p(y=1|x)$ correspond to a logistic model?

A hint: Use Bayes theorem to compute $p(y=1|x)$. Is it possible to represent $p(y=1|x)$ as $\frac{e^{\beta_0 + \beta x}}{1 + e^{\beta_0 + \beta x}}$?
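
One way to see this, sketched under the assumption (matching the sampling code above) that both classes are Gaussian with a shared covariance $\Sigma$, mean $m_1$ for $Y=1$ and mean $m_2$ for $Y=-1$. Dividing the numerator and denominator of Bayes' formula by $\pi f(x|Y=1)$ gives
$$
p(y=1|x) = \frac{\pi f(x|Y=1)}{\pi f(x|Y=1) + (1-\pi) f(x|Y=-1)} = \frac{1}{1 + \exp\left(-\ln\frac{\pi f(x|Y=1)}{(1-\pi) f(x|Y=-1)}\right)}
$$
and, because the quadratic terms $x^T\Sigma^{-1}x$ cancel, the log-ratio is linear in $x$:
$$
\ln\frac{\pi f(x|Y=1)}{(1-\pi) f(x|Y=-1)} = \underbrace{\ln\frac{\pi}{1-\pi} + \frac{1}{2}\left(m_2^T\Sigma^{-1}m_2 - m_1^T\Sigma^{-1}m_1\right)}_{\beta_0} + \underbrace{\left(\Sigma^{-1}(m_1 - m_2)\right)^T}_{\beta^T} x
$$
For $\pi = \frac{1}{2}$ the term $\ln\frac{\pi}{1-\pi}$ vanishes, which agrees with the values of `b0` and `b` computed in the cell above.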


#### Second bullet point

Find the formulas for the parameters of the logistic model (the coefficients and the intercept).

A hint: Use the representation of $p(y=1|x)$ from the first bullet point and solve for $\beta_0$ and $\beta$.

In [4]:
# computing beta_0 and beta using the formulas
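
A minimal sketch of this computation, assuming the shared-covariance Gaussian setting of the sampling code, where $\beta = \Sigma^{-1}(m_1 - m_2)$ and $\beta_0 = \frac{1}{2}(m_2^T\Sigma^{-1}m_2 - m_1^T\Sigma^{-1}m_1) + \ln\frac{\pi}{1-\pi}$. The helper names `gaussian_logistic_params` and `gaussian_density` are illustrative and not part of the lab template. The last lines compare the posterior obtained from Bayes' theorem with the logistic form at a few arbitrary points.

```python
import numpy as np

def gaussian_logistic_params(m1, m2, sigma, pi=0.5):
    """Intercept and coefficients of p(y=1|x) for two Gaussians with a shared covariance.

    Hypothetical helper: m1 is the mean of class Y=1, m2 the mean of class Y=-1,
    and pi = P(Y=1).
    """
    sigma_inv = np.linalg.inv(sigma)
    beta = sigma_inv @ (m1 - m2)
    beta0 = 0.5 * (m2 @ sigma_inv @ m2 - m1 @ sigma_inv @ m1) + np.log(pi / (1 - pi))
    return beta0, beta

def gaussian_density(x, m, sigma):
    """Multivariate normal density evaluated at each row of x."""
    d = x - m
    sigma_inv = np.linalg.inv(sigma)
    quad = np.einsum("ij,jk,ik->i", d, sigma_inv, d)  # row-wise quadratic form
    norm = np.sqrt((2 * np.pi) ** len(m) * np.linalg.det(sigma))
    return np.exp(-0.5 * quad) / norm

m1 = np.array([1.0, 1.0])
m2 = np.array([0.0, 0.0])
sigma = np.array([[1.0, -0.5], [-0.5, 1.0]])
pi = 0.5

beta0, beta = gaussian_logistic_params(m1, m2, sigma, pi)

# Compare the Bayes posterior with the logistic form on a few test points.
x_test = np.array([[0.0, 0.0], [1.0, 1.0], [-0.5, 2.0]])
f1 = gaussian_density(x_test, m1, sigma)
f2 = gaussian_density(x_test, m2, sigma)
posterior_bayes = pi * f1 / (pi * f1 + (1 - pi) * f2)
posterior_logistic = 1 / (1 + np.exp(-(beta0 + x_test @ beta)))

print(beta0, beta)
print(np.allclose(posterior_bayes, posterior_logistic))
```

Passing a different `pi` only shifts the intercept by $\ln\frac{\pi}{1-\pi}$; the coefficients stay the same.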

In [5]:
# a logistic model
# mod_a = LogisticRegression(penalty=None)
# mod_a.fit(X_generative, Y_generative)
# (mod_a.intercept_, mod_a.coef_)

### b) Discriminative approach

We know $f(x)$ and $P(Y=1|X=x)$.

#### First bullet point

In [25]:
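# sampling
# first generate x from the mixture f(x) = 0.5*N(m1, sigma) + 0.5*N(m2, sigma)
mix = np.random.binomial(1, p=0.5, size=1000)
X1 = np.random.multivariate_normal(mean=m1, cov=sigma, size=1000)
X2 = np.random.multivariate_normal(mean=m2, cov=sigma, size=1000)
# row i is taken from N(m1, sigma) if mix[i] == 1, otherwise from N(m2, sigma)
X_disciminative = np.where(mix[:, None] == 1, X1, X2)

# then draw y given x from the logistic model with b0 and b computed above, coded as +1 / -1
p_y1 = 1 / (1 + np.exp(-(b0 + X_disciminative @ b)))
Y_discriminative = 2 * np.random.binomial(1, p_y1) - 1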

In [72]:
# what distinguishes them: in the generative approach we choose the number of observations in each class ourselves; in the discriminative approach we cannot, because the class sizes are random

In [None]:
# mod_b = LogisticRegression(penalty=None)
# mod_b.fit(X_disciminative, Y_discriminative)

#### Second bullet point

In [7]:
# plt.subplot(1, 2, 1)
# plt.scatter(..., c=Y_generative)
# plt.ylim(-4,4)
# plt.xlim(-4,4)
# plt.subplot(1, 2, 2)
# plt.scatter(... , c=Y_discriminative)
# plt.ylim(-4,4)
# plt.xlim(-4,4)
# plt.show()

What distinguishes the generative approach from the discriminative approach?
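
A small experiment that may help answer this question. It assumes the means, covariance and theoretical logistic parameters from Task 1a, restated here so the sketch is self-contained; `rng`, `sample_generative` and `sample_discriminative` are illustrative names. In the generative scheme the class sizes are fixed by construction, while in the discriminative scheme the number of observations with $Y=1$ changes from sample to sample.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen arbitrarily for reproducibility

m1, m2 = np.array([1.0, 1.0]), np.array([0.0, 0.0])
sigma = np.array([[1.0, -0.5], [-0.5, 1.0]])
b0, b = -2.0, np.array([2.0, 2.0])  # theoretical values from Task 1a

def sample_generative(n1, n2):
    """Fix the labels first, then draw x from f(x|y)."""
    y = np.concatenate((np.ones(n1), -np.ones(n2)))
    x = np.concatenate((
        rng.multivariate_normal(m1, sigma, size=n1),
        rng.multivariate_normal(m2, sigma, size=n2),
    ))
    return x, y

def sample_discriminative(n):
    """Draw x from the mixture f(x), then y from P(Y=1|X=x)."""
    mix = rng.binomial(1, 0.5, size=n)
    x = np.where(mix[:, None] == 1,
                 rng.multivariate_normal(m1, sigma, size=n),
                 rng.multivariate_normal(m2, sigma, size=n))
    p_y1 = 1 / (1 + np.exp(-(b0 + x @ b)))
    y = 2 * rng.binomial(1, p_y1) - 1  # recode 0/1 as -1/+1
    return x, y

_, y_gen = sample_generative(500, 500)
gen_counts = [int((y_gen == 1).sum())]
disc_counts = [int((sample_discriminative(1000)[1] == 1).sum()) for _ in range(5)]
print("generative, n(Y=1):", gen_counts)
print("discriminative, n(Y=1):", disc_counts)
```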

### c)

To sample from $f_{X|Y=-1}$, first, we will give an answer to Q1.

Q1. A hint: use Bayes theorem for $p(x|y=-1)$ and the law of total probability for $p(x)$.

We want to have: 
$$
p(y=1|x) = \frac{e^{\beta_0 + \beta_1x}}{1 + e^{\beta_0 + \beta_1x}} \quad (*)
$$
and we know that: $p(x|y=1) = e^{-x}$ for $x \geq 0$.

From Bayes:
$$
P(y=1|x) = \frac{\pi e^{-x}}{\pi e^{-x} + (1-\pi)p(x|y=-1)}
$$

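Equating this with (*) gives $\frac{(1-\pi)p(x|y=-1)}{\pi e^{-x}} = e^{-(\beta_0 + \beta_1 x)}$, therefore:
$$
p(x|y=-1)=\frac{\frac{\pi}{1-\pi}e^{-\beta_0}}{1+\beta_1}\left[(\beta_1+1)e^{-(\beta_1+1)x}\right] = c \cdot f_{Exp(1+\beta_1)}(x), \qquad c = \frac{\pi e^{-\beta_0}}{(1-\pi)(1+\beta_1)}
$$
This is the answer to __Q1__: the density is proportional to that of an exponential distribution with rate $1+\beta_1$ (which requires $\beta_1 > -1$).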

__Q2__: Yes, as long as we adjust $\beta_0$ (given $\beta_1$ and $\pi$) so that the pdf integrates to 1 (c = 1).

__Q3__: It's not.

__SOLUTION__:
$$
c = 1 \implies \beta_0 = -\ln\left(\frac{(\beta_1+1)(1-\pi)}{\pi}\right)
$$
Therefore for $n_1 = 1000$, $n_2 = 2000$, $\beta_1 = 1$, we have $\pi = \frac{n_1}{n_1+n_2} = \frac{1}{3}$ and  
$$
\beta_0 = -\ln(4)
$$
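
The relation between $\beta_0$, $\beta_1$ and $\pi$ can also be sanity-checked without sampling. The sketch below (the helper name `beta0_from` is illustrative) evaluates the formula for $\beta_0$ and then compares, on a grid of points, the posterior built with Bayes' theorem from the $Exp(1)$ and $Exp(1+\beta_1)$ densities with the logistic form (*).

```python
import numpy as np

def beta0_from(beta1, pi):
    """Intercept that makes p(x|y=-1) a proper Exp(1 + beta1) density (c = 1)."""
    return -np.log((beta1 + 1) * (1 - pi) / pi)

beta1 = 1.0
n1, n2 = 1000, 2000
pi = n1 / (n1 + n2)            # P(Y=1) = 1/3
beta0 = beta0_from(beta1, pi)  # equals -ln(4) for these values

x = np.linspace(0.0, 5.0, 11)
f_pos = np.exp(-x)                              # p(x|y=1), density of Exp(1)
f_neg = (1 + beta1) * np.exp(-(1 + beta1) * x)  # p(x|y=-1), density of Exp(1 + beta1)
posterior_bayes = pi * f_pos / (pi * f_pos + (1 - pi) * f_neg)
posterior_logistic = 1 / (1 + np.exp(-(beta0 + beta1 * x)))

print(beta0)
print(np.allclose(posterior_bayes, posterior_logistic))
```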

In [9]:
# computing beta_0 assuming that n_1 = 1000, n_2 = 2000 and b_1 = 1
-np.log(4)

-1.3862943611198906

In [22]:
# let's check if the formula is correct
b1 = 1

n1 = 1000
x1 = np.random.exponential(1, size=n1)

n2 = 2000
x2 = np.random.exponential(1/(1+b1), size=n2)

y = np.concatenate((np.ones(n1), np.zeros(n2)))
X = np.concatenate((x1,x2))

In [23]:
lr = LogisticRegression(penalty=None)
lr.fit(X.reshape(-1,1),y)
lr.intercept_
#close enough

array([-1.37690256])

Q2, Q3. A hint: what is the distribution of $f_{X|Y=-1}$? What is a norming constant?

...

If that is doable, given $\beta_1$ and $\pi$ compute $\beta_0$.

A hint: Of course it is, compute $\beta_0$.

## Task 2

### a)

$R(a,a^*) = \mathbb{E} \mathcal{L}(f(X), Y) = \mathbb{E}(aX - Y)^2 = ...$,

In our task we know $a^* = 1$.
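
A possible way to carry out this computation, under assumptions that this notebook does not state and that should be checked against the task description: $Y = a^*X + \varepsilon$, where $\varepsilon$ is independent of $X$, $\mathbb{E}\varepsilon = 0$ and $\textrm{Var}(\varepsilon) = \sigma_\varepsilon^2$. Then
$$
\mathbb{E}(aX - Y)^2 = \mathbb{E}\left((a - a^*)X - \varepsilon\right)^2 = (a - a^*)^2\,\mathbb{E}X^2 - 2(a - a^*)\,\mathbb{E}X\,\mathbb{E}\varepsilon + \mathbb{E}\varepsilon^2 = (a - a^*)^2\,\mathbb{E}X^2 + \sigma_\varepsilon^2
$$
If additionally $\mathbb{E}X^2 = 1$ (for example $X \sim N(0,1)$, again an assumption), this reduces to $(a - 1)^2 + \sigma_\varepsilon^2$ for $a^* = 1$.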

In [268]:
# def population_risk(a, sigma_eps):
#     return ...

### b)

In [269]:
# sampling

Empirical risk: $\frac{1}{n} \sum_{i=1}^n (ax_i - y_i)^2$

In [270]:
# def empirical_risk(a, x, y):
#     return ...
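
A minimal sketch of how the stub could be filled in, following the empirical risk formula above. The second helper, `empirical_risk_minimizer`, is an illustrative addition: setting the derivative of $\frac{1}{n}\sum_{i=1}^n (ax_i - y_i)^2$ with respect to $a$ to zero gives $\hat a = \frac{\sum_i x_i y_i}{\sum_i x_i^2}$. The three-point sample is made up purely for illustration.

```python
import numpy as np

def empirical_risk(a, x, y):
    """Mean squared error of the linear predictor a * x on the sample (x, y)."""
    return np.mean((a * x - y) ** 2)

def empirical_risk_minimizer(x, y):
    """Closed-form minimizer of the empirical risk over a (model without intercept)."""
    return np.sum(x * y) / np.sum(x ** 2)

# Tiny hand-made sample, purely illustrative
x = np.array([1.0, 2.0, 3.0])
y = np.array([1.5, 1.5, 3.5])

a_hat = empirical_risk_minimizer(x, y)
print(empirical_risk(1.0, x, y), a_hat, empirical_risk(a_hat, x, y))
```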

### c)

In [37]:
# code, plot

### d)

Excess risk: $$E(\hat a, a^*) = R(\hat a, a^*) - \inf_{a \in A_0} R(a, a^*)$$

In [38]:
# excess risk
# ...

In [39]:
# simulations with fixed sample size

In [276]:
# simulations for various sample sizes
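
One way such a simulation could look. The data model is an assumption, since this notebook does not specify it: $X \sim N(0,1)$, $\varepsilon \sim N(0, \sigma_\varepsilon^2)$ independent of $X$, $Y = a^*X + \varepsilon$ with $a^* = 1$. Under that model the excess risk of the least-squares estimate $\hat a$ is $(\hat a - a^*)^2$, and averaging it over many independent samples approximates the unconditional excess risk. The names `simulate_excess_risk`, `A_STAR` and `n_rep` are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed

A_STAR = 1.0

def simulate_excess_risk(n, sigma_eps, n_rep=2000):
    """Monte Carlo estimate of E[(a_hat - a*)^2] for sample size n.

    Assumed model (not stated in the lab text): X ~ N(0, 1),
    eps ~ N(0, sigma_eps^2), Y = a* X + eps.
    """
    x = rng.normal(size=(n_rep, n))
    y = A_STAR * x + rng.normal(scale=sigma_eps, size=(n_rep, n))
    # least-squares estimate without intercept, one per replication
    a_hat = np.sum(x * y, axis=1) / np.sum(x ** 2, axis=1)
    return np.mean((a_hat - A_STAR) ** 2)

sample_sizes = [25, 50, 100, 200, 400]
results = {n: simulate_excess_risk(n, sigma_eps=1.0) for n in sample_sizes}
for n, value in results.items():
    print(n, value)
```

Under these assumptions the estimates should shrink roughly like $\sigma_\varepsilon^2 / n$ as the sample size grows.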

In [40]:
# a nice plot visualising the results (how the unconditional excess risk changes with a sample size)