# EBA3500 Exercises 8: Omnibus tests

## Exercise 1: $R^2$ without categorical variables
We presented the distribution of $R^2$ in the context of categorical variables, but this is not necessary. The distribution
$$
R^2 \sim \textrm{Beta}\left(\frac{K-1}{2},\frac{n-K}{2}\right)
$$
holds whenever the responses are normal and every covariate has a zero coefficient, i.e.,
$$
y = \beta_0 + \epsilon.
$$
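
To get a feel for this distribution, the sketch below evaluates the Beta density and its mean using only the standard library. The helper names `beta_pdf` and `beta_mean` and the values $n = 50$, $K = 4$ are illustrative assumptions, not part of the exercise.

```python
import math

def beta_pdf(r, a, b):
    """Density of Beta(a, b) at r, for 0 < r < 1 (hypothetical helper)."""
    # Work on the log scale to avoid overflow in the gamma functions.
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(r) + (b - 1) * math.log(1 - r))

def beta_mean(a, b):
    """Mean of Beta(a, b) (hypothetical helper)."""
    return a / (a + b)

n, K = 50, 4  # illustrative values, not taken from the exercise
a, b = (K - 1) / 2, (n - K) / 2

print(beta_mean(a, b))       # equals (K - 1) / (n - 1)
print(beta_pdf(0.05, a, b))  # density of R^2 at 0.05 under the null
```

Since $\frac{K-1}{2} + \frac{n-K}{2} = \frac{n-1}{2}$, the mean simplifies to $(K-1)/(n-1)$, so under the null the expected $R^2$ grows with the number of covariates even though none of them matter.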
### (a)
Suppose $y = \beta_0 + \epsilon$ for a normal $\epsilon$, and let $X_1, X_2, \ldots, X_{10}$ be $10$ normal covariates sampled independently of $y$, with $n = 100$ observations each. For instance:

In [1]:
import numpy as np
rng = np.random.default_rng(seed=313)

k = 10  # number of covariates
n = 100  # number of observations
x = rng.normal(0, 1, (n, k))  # covariates, one column per variable
y = rng.normal(3, 2, n)  # responses, generated independently of x

Now suppose you calculate the $R^2$ from the regression $y \sim \beta_0 + \beta_1 x_1 + \cdots + \beta_{10} x_{10}$. What is the distribution of $R^2$?

### (b)
By suitably modifying the code in the lecture notes, simulate the $R^2$ a total of `n_reps = 10 ** 4` times for the setup above, where `x` is kept fixed across simulations. Make a histogram of the simulated values and overlay the theoretical density on it.
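
The lecture-notes code is not reproduced here, so as one possible starting point, the sketch below computes a single $R^2$ with plain NumPy least squares. The helper name `r_squared` is an assumption made for illustration; the simulation loop and the plot are left to you.

```python
import numpy as np

def r_squared(x, y):
    """R^2 from regressing y on the columns of x plus an intercept (hypothetical helper)."""
    # Prepend a column of ones so the model includes beta_0.
    design = np.column_stack([np.ones(len(y)), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    ss_res = np.sum(residuals ** 2)       # residual sum of squares
    ss_tot = np.sum((y - y.mean()) ** 2)  # total sum of squares
    return 1 - ss_res / ss_tot

rng = np.random.default_rng(seed=313)
x = rng.normal(0, 1, (100, 10))  # fixed covariates
y = rng.normal(3, 2, 100)        # responses unrelated to x
print(r_squared(x, y))
```

Calling `r_squared` repeatedly with fresh draws of `y` and the same `x` gives the sample that the histogram is built from.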

### (c)
Now suppose that
$$y = 1 + 2x_1 + \epsilon$$
where $\epsilon$ and $x_1, \ldots, x_{20}$ are normal with mean $0$ and standard deviation $1$. Let $n = 100$, simulate both `x` and `y` a total of `n_reps = 10 ** 4` times, and calculate the $R^2$ obtained from the regression
$y \sim \beta_0 + \beta_1 x_1 + \cdots + \beta_{20} x_{20}$ for each iteration.

Put the $R^2$ values into a histogram, and add the Beta density that holds under the null hypothesis. Does it fit well? Why, or why not?




## Exercise 2: $R^2$ and $F$
Remember that 
$$
F(R^2) = \frac{n-K}{K-1}\frac{R^{2}}{1-R^{2}},
$$
where $n$ is the number of observations and $K$ the number of categories.
Keeping $n$ and $K$ fixed, $F$ is a *bijection* between $[0,1)$ and $[0,\infty)$; at $R^2 = 1$ the denominator vanishes, so $F$ is not defined there. This means that there is a unique function $F^{-1}$ so that $F^{-1}(F(x)) = x$ for all $x\in [0,1)$.

### (a)
Show that $F^{-1}(x) = \frac{x}{\frac{n-K}{K-1}+x}$.
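
Before attempting the algebra, it can help to check the claim numerically. The sketch below is a sanity check, not a proof; the function names `f_stat` and `f_inv` and the values $n = 30$, $K = 4$ are illustrative assumptions.

```python
def f_stat(r2, n, K):
    """F as a function of R^2, following the formula above."""
    return (n - K) / (K - 1) * r2 / (1 - r2)

def f_inv(f, n, K):
    """Candidate inverse stated in part (a)."""
    return f / ((n - K) / (K - 1) + f)

n, K = 30, 4  # illustrative values, not taken from the exercise

# Round trip R^2 -> F -> R^2 on a grid of values in [0, 1).
grid = [i / 100 for i in range(100)]
max_error = max(abs(f_inv(f_stat(r2, n, K), n, K) - r2) for r2 in grid)

print(f_stat(0.5, n, K))  # (26 / 3) * 1, since R^2 / (1 - R^2) = 1 at R^2 = 0.5
print(max_error)          # should be at the level of floating-point rounding
```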

### (b)
Suppose $n = 100$ and $K=6$, and suppose $F = 230$. Can you deduce what $R^2$ is? If so, what is it?

### (c)
Assume $n>K>1$. For what values of $R^2$ is $F=0$?



## Exercise 3: Simulating with different errors


### (a) Simulation function
Make a function `rsq_sim` that takes `n, x, n_reps, error` as input and does
the following `n_reps` times:
1. Lets `y` be `n` observations drawn from `error`, a function that takes `n` as its first positional argument,
2. Estimates the regression model `y ~ x1 + x2 + x3 ... x[K]`, where $K$ is the number of columns in a dataframe `x`,
3. Calculates the $R^2$ for said model.

The function should return an array containing `n_reps` $R^2$ values.

In [None]:
def rsq_sim(n, x, n_reps, error):
    ...  # fill in: return an array of n_reps simulated R^2 values

### (b) Trying out error terms
Let $n = 20$ and $K = 6$, and generate a fixed `x` with $K$ columns. Run `rsq_sim` with each of the following error functions:
1. `lambda n: rng.normal(0, 1, n)`
2. `lambda n: rng.exponential(1, n)`
3. `lambda n: rng.geometric(0.5, n)`
4. `lambda n: rng.standard_t(3, n)`
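
To see how an `error` argument behaves before passing it to `rsq_sim`, the sketch below collects the four generators in a dictionary and draws one sample from each. The dictionary `errors` and its keys are naming assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=313)

# Each value is a function taking n as its first positional argument.
errors = {
    "normal": lambda n: rng.normal(0, 1, n),
    "exponential": lambda n: rng.exponential(1, n),
    "geometric": lambda n: rng.geometric(0.5, n),
    "t(3)": lambda n: rng.standard_t(3, n),
}

n = 20
samples = {name: error(n) for name, error in errors.items()}
for name, y in samples.items():
    # Shape and range give a quick impression of each distribution.
    print(name, y.shape, round(float(y.min()), 2), round(float(y.max()), 2))
```

The exponential and geometric errors do not have mean zero, but that does not matter here: a constant shift in `y` is absorbed by the intercept and leaves $R^2$ unchanged. The geometric errors are also discrete, taking values in $1, 2, 3, \ldots$.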


### (c) Plotting
Plot histograms of the $R^2$ values from (b) in a facet grid, with one panel per error distribution, and overlay the theoretical Beta density on each panel.