# Monday exercises I

## High-dimensional learning

## Curse of dimensionality
These exercises explore how distances behave in high dimensions and how that behavior differs from what we are used to in low-dimensional space.


**Exercise 1.** Consider the unit cube in $d$ dimensions, that is, $[0,1]^d$ in $\mathbb{R}^d$. Assume we have a dataset that is uniformly distributed in the cube.
  - Consider a smaller cube inside our cube with side length $0<s<1$. What does $s$ need to be in 50 dimensions to cover $1\%$ of the data? In general, if $d$ is the dimension, what is the side length $s$ of a smaller cube that contains a proportion $p$ of the data?
  - Create a figure that shows how the volume of a cube increases as its side length increases from $0$ to $1$ in dimensions $1, 2, 5, 10, 20, 50, 100$.
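
One way to approach the first bullet: a sub-cube of side length $s$ has volume $s^d$, and under the uniform distribution this volume is the expected proportion of data inside it. Setting $s^d = p$ gives $s = p^{1/d}$. The sketch below evaluates this relation; the names `side_length`, `s_grid` and `volumes` are our own choices and not part of the exercise.

```python
import numpy as np

def side_length(p, d):
    """Side length of a sub-cube of [0, 1]^d that has volume p."""
    return p ** (1.0 / d)

dims = [1, 2, 5, 10, 20, 50, 100]

# Side length needed to capture 1% of uniformly distributed data.
for d in dims:
    print(f"d = {d:3d}: s = {side_length(0.01, d):.3f}")

# Volume s^d as a function of the side length, one curve per dimension.
# These arrays can be passed directly to a plotting library.
s_grid = np.linspace(0.0, 1.0, 101)
volumes = {d: s_grid ** d for d in dims}
```

In 50 dimensions the sub-cube has to span roughly 91% of every axis to hold 1% of the data.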


In [None]:
import numpy as np
import plotly.express as px
import plotly.graph_objects as go

In [None]:
# Solution: Write it here
# The volume of the small cube is ?




**Exercise 2.** Another way of understanding the curse of dimensionality is to consider the multivariate standard normal distribution. Intuition from lower dimensions suggests that most points of the multivariate normal are close to the origin and therefore have a relatively small norm. 
  - For each of the dimensions $1, 2, 5, 10, 20, 50, 100$, draw a random sample of 1000 data points from the standard normal distribution and plot the distribution of their norms.
  - Describe what you see.
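
A minimal sketch of the sampling step, assuming NumPy's `default_rng` with a fixed seed (the seed and the dictionary name `norms` are our own choices). It collects the Euclidean norms per dimension and prints summary statistics; the plotting itself is left open.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility
dims = [1, 2, 5, 10, 20, 50, 100]

# norms[d] holds the Euclidean norms of 1000 standard normal points in R^d.
norms = {}
for d in dims:
    sample = rng.standard_normal(size=(1000, d))
    norms[d] = np.linalg.norm(sample, axis=1)
    print(f"d = {d:3d}: mean = {norms[d].mean():.2f}, std = {norms[d].std():.2f}")
```

Each array in `norms` can then be drawn as a histogram or box plot, one per dimension.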


In [None]:
# Solution: Write it here


# We see that the norm increases with dimension.


**Exercise 3.** Yet another way of visualizing the curse of dimensionality is to look at distances between uniformly sampled data. The normalized distance in $d$ dimensions is the distance divided by the square root of $d$. Because the maximum distance between two points in the $d$-dimensional unit cube is $\sqrt{d}$, the normalized distance will be between 0 and 1.
  - For dimensions $1, 2, 5, 10, 100, 1000, 10000, 100000$, sample $1000$ points uniformly at random in the unit cube.
  - Compute the difference between the maximum and the minimum pairwise distance, divided by the minimum distance. In other words, we want
  $$\frac{d_{\max}-d_{\min}}{d_{\min}}$$
  What do you expect this to be?
  - Compute the mean normalized distance
  $$\frac{1}{\sqrt{d}} \cdot \text{mean of all distances}$$
  What do you expect this to be?
  - How do you interpret the results? Hint: look at the minimum distance $d_{\min}$.
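
The two quantities above can be computed as in the following sketch. The helper name `distance_stats` and its parameters are hypothetical, and the loop uses fewer points and smaller dimensions than the exercise asks for: an array of 1000 points in 100000 dimensions alone holds $10^8$ floating-point numbers, so the full setting needs a fair amount of memory and time.

```python
import numpy as np
from scipy.spatial.distance import pdist

def distance_stats(d, n_points=1000, seed=0):
    """Return (relative spread, mean normalized distance) for uniform points in [0, 1]^d."""
    rng = np.random.default_rng(seed)
    points = rng.uniform(size=(n_points, d))
    dists = pdist(points)  # condensed vector of all pairwise Euclidean distances
    spread = (dists.max() - dists.min()) / dists.min()
    mean_normalized = dists.mean() / np.sqrt(d)
    return spread, mean_normalized

# Smaller settings than in the exercise, to keep the run short.
for d in [1, 2, 5, 10, 100, 1000]:
    spread, mean_normalized = distance_stats(d, n_points=200)
    print(f"d = {d:5d}: (max - min) / min = {spread:10.2f}, mean normalized = {mean_normalized:.3f}")
```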

In [None]:
# Might be helpful to import
from scipy.spatial.distance import pdist

In [None]:
# Write your solution here.

## Learning in high dimensions
Here we look at how learning behavior changes in higher dimensions when compared to learning in low dimensions. 

### Exercise Part 1

1. Create a simple dataset with $n=50$ data points, where $x$ is uniformly distributed between 0 and 10 and $y$ is given by $y=5\sin(x) + 0.2x^2 + \varepsilon $, where $\varepsilon$ is standard normally distributed. 
2. Split the data into training, validation and test data. Visualize the training data.
3. Train 4 different models on this dataset. You can, for example, use the ones we imported from `sklearn` (you do not need to set up a neural net). Visualize the results.
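
Steps 1 to 3 could be set up roughly as follows. This is a sketch under our own assumptions, not a reference solution: the 60/20/20 split, the fixed seeds, the four chosen models and the Gaussian process setting `alpha=1.0` (used here as the noise variance, matching the unit variance of $\varepsilon$) are all illustrative choices.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.metrics import mean_squared_error

# Step 1: y = 5 sin(x) + 0.2 x^2 + eps with standard normal noise.
rng = np.random.default_rng(0)
n = 50
x = rng.uniform(0, 10, size=n)
y = 5 * np.sin(x) + 0.2 * x**2 + rng.standard_normal(n)
X = x.reshape(-1, 1)  # scikit-learn expects a 2D feature matrix

# Step 2: 60% training, 20% validation, 20% test (proportions are an assumption).
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

# Step 3: fit four models and compare them on the validation set.
models = {
    "linear": LinearRegression(),
    "tree": DecisionTreeRegressor(random_state=0),
    "forest": RandomForestRegressor(random_state=0),
    "gp": GaussianProcessRegressor(alpha=1.0, random_state=0),
}

val_mse = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    val_mse[name] = mean_squared_error(y_val, model.predict(X_val))
    print(f"{name:>7}: validation MSE = {val_mse[name]:.2f}")
```

The test set is deliberately left untouched here so that it can be used once, after a model has been selected on the validation data.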



In [None]:
# These might be helpful
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.metrics import mean_squared_error

In [None]:
## Write your solution here

### Exercise Part 2

4. Randomly rotate your data $x$ in 100-dimensional space. Use for example `scipy.stats.special_ortho_group`. Repeat steps 2-3. 
5. Add noise to your original training data. Repeat steps 2-3. 
6. Randomly rotate your data $x$ in 100-dimensional space, then add noise to your training data. Repeat steps 2-3.
7. Summarize your results. What have you found?
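
For the rotation and noise in steps 4 to 6, the sketch below shows one way to produce the transformed inputs and to check what the rotation does. The noise scale `0.1` and the variable names are arbitrary choices made for illustration; the exercise does not prescribe them.

```python
import numpy as np
from scipy.stats import special_ortho_group

dim = 100
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)

# Embed the 1D inputs in R^100 (first coordinate x, all others zero) and rotate.
rot = special_ortho_group.rvs(dim, random_state=0)
embedded = np.hstack((x.reshape(-1, 1), np.zeros((len(x), dim - 1))))
X_rot = embedded @ rot

# Isotropic Gaussian noise; the scale 0.1 is an arbitrary illustrative choice.
X_noisy = X_rot + 0.1 * rng.standard_normal(X_rot.shape)

# A rotation preserves norms, so the rotated points still lie on a single line.
norms_preserved = np.allclose(np.linalg.norm(X_rot, axis=1), np.abs(x))
rank_rot = np.linalg.matrix_rank(X_rot)
rank_noisy = np.linalg.matrix_rank(X_noisy)
print(norms_preserved, rank_rot, rank_noisy)
```

Comparing the two ranks makes the difference between the settings concrete: the purely rotated data is still one-dimensional, while the noisy data is spread over as many directions as there are data points.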

In [None]:
# This might be helpful to get a random rotation in dim 100
from scipy.stats import special_ortho_group

In [None]:
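# Step 4: rotation only

dim = 100

# Random rotation matrix in R^100 (fixed seed for reproducibility)
rot = special_ortho_group.rvs(dim, random_state=0)

def add_dim(x):
    # Embed 1D inputs in R^dim: the first coordinate is x, the rest are zero
    return np.hstack((x.reshape(-1, 1), np.zeros((len(x), dim-1))))

def rotate(x):
    # Embed x in R^dim, then apply the random rotation
    return add_dim(x).dot(rot)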

In [None]:
# Write your solution

### Exercise Part 3

8. Create another dataset with $n=50$ data points in $\mathbb{R}^{100}$, where $x_1, \dots, x_{100}$ are uniformly distributed between 0 and 10 and $y$ is given by
$$y=\sum_{i=1}^{100} (\sin(a_i x_i) + b_i x_i^2) + \varepsilon$$
where $a_i, b_i$ are uniformly distributed between 0 and 1 and $\varepsilon$ is standard normally distributed. Train the same models on this dataset and summarize the results. What have you found?
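
The dataset described above can be generated with NumPy broadcasting, as in this sketch. The seed and the variable names are our own choices, and we assume that one noise term $\varepsilon$ is drawn independently for each data point.

```python
import numpy as np

rng = np.random.default_rng(0)
n, dim = 50, 100

X = rng.uniform(0, 10, size=(n, dim))  # inputs x_1, ..., x_100 for each data point
a = rng.uniform(0, 1, size=dim)        # coefficients a_i
b = rng.uniform(0, 1, size=dim)        # coefficients b_i
eps = rng.standard_normal(n)           # one noise term per data point

# Broadcasting applies a_i and b_i column-wise; the sum runs over the 100 features.
y = (np.sin(a * X) + b * X**2).sum(axis=1) + eps
```

`X` and `y` can then be split and passed to the same models as in Part 1, without the reshaping step, since `X` is already a 2D feature matrix.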

In [None]:
# Write your solution
