# Conceptual exercises from Chapter 2

## 2.4.1
> For each of parts (a) through (d), indicate whether we would generally expect the performance of a flexible statistical learning method to be better or worse than an inflexible method. Justify your answer.

### (a)
> The sample size $n$ is extremely large, and the number of predictors $p$ is small.

Flexible methods will generally perform better: when $n$ is large compared to $p$, there is enough data to estimate a complicated $f$, and we are unlikely to fit the noise.

### (b)
> The number of predictors $p$ is extremely large, and the number of observations $n$ is small.

Here we need extremely inflexible models. There are simply not enough observations to allow us any flexibility.
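
A minimal sketch of what goes wrong, under assumptions that are not part of the exercise: $n=20$, $p=50$, standard normal predictors, and a hypothetical truth in which only the first predictor matters. Least squares on all $50$ predictors is flexible enough to reproduce the training responses exactly, so the training error says nothing about how well the model predicts.

```python
import numpy as np

rng = np.random.default_rng(seed=3)
n, p, sigma = 20, 50, 1.0

# Hypothetical truth (an assumption for illustration): only the first predictor matters.
beta = np.zeros(p)
beta[0] = 1.0

X_train = rng.standard_normal((n, p))
y_train = X_train @ beta + sigma * rng.standard_normal(n)
X_test = rng.standard_normal((5000, p))
y_test = X_test @ beta + sigma * rng.standard_normal(5000)

# With p > n, lstsq returns the minimum-norm coefficients,
# which fit the training data exactly (up to rounding).
beta_hat, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)

train_mse = np.mean((y_train - X_train @ beta_hat) ** 2)
test_mse = np.mean((y_test - X_test @ beta_hat) ** 2)
print(f"training MSE: {train_mse:.2e}, test MSE: {test_mse:.2f}")
```

The training MSE is zero up to rounding, while the test MSE stays above the noise variance $\sigma^2=1$: the fit has absorbed the noise in the $20$ training responses.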

### (c)
> The relationship between the predictors and response is highly non-linear.

Most inflexible methods are linear, so it sounds prudent to use a flexible method. However, the responses in (a) and (b) still apply. If we have a small sample size and a large number of predictors, we still have to use an inflexible method.
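
The trade-off can be checked with a small simulation. This is only a sketch under assumed settings: the truth $f(x)=\sin(2x)$, the noise level, the sample sizes, and the use of polynomial degree as a stand-in for flexibility are all illustrative choices, not part of the exercise.

```python
import numpy as np
from numpy.polynomial import polynomial as P

rng = np.random.default_rng(seed=4)

def f(x):
    # Hypothetical non-linear truth, chosen only for illustration.
    return np.sin(2 * x)

def mean_test_mse(n_train, degree, reps=200, n_test=1000, sigma=0.5):
    """Average test MSE of a degree-`degree` polynomial fit over `reps` simulated data sets."""
    mses = []
    for _ in range(reps):
        x_train = rng.uniform(-2, 2, n_train)
        y_train = f(x_train) + sigma * rng.standard_normal(n_train)
        x_test = rng.uniform(-2, 2, n_test)
        y_test = f(x_test) + sigma * rng.standard_normal(n_test)
        coefs = P.polyfit(x_train, y_train, degree)
        mses.append(np.mean((y_test - P.polyval(x_test, coefs)) ** 2))
    return np.mean(mses)

# Degree 1 is the inflexible (linear) fit, degree 9 the flexible one.
results = {
    (n, degree): mean_test_mse(n, degree)
    for n in (12, 5000)
    for degree in (1, 9)
}
for (n, degree), mse in results.items():
    print(f"n = {n:>4}, degree = {degree}: mean test MSE = {mse:.3f}")
```

With $n=5000$ the flexible fit should come out ahead, because the linear fit cannot follow the sine curve. With $n=12$ the ranking flips, because the degree-9 fit chases the noise.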

### (d)
> The variance of the error terms, i.e., $\sigma^2=\text{Var}(\epsilon)$, is extremely high.

It seems plausible that inflexible models would work best. But a high error variance doesn't matter in and of itself; what matters is how large the variance is compared to the scale of $f(X)$ (recall equation (2.1), p. 15). To see why, think about a setting with a very small error standard deviation, e.g., $\sigma=0.001$. If we multiply $Y=f(X)+\epsilon$ by $1{,}000{,}000$, the error term gets a standard deviation of $1000$, which is equivalent to a variance of $1{,}000{,}000$. But most methods will perform just as well on $10^6Y=10^6f(X)+10^6\epsilon$ as they did on the original data! In essence, what matters is the ratio between the scale of $f(x)$ and $\sigma$: If $f(x)$ has a huge scale, then it's ok for $\sigma$ to have a huge scale. If $f(x)$ has a small scale relative to $\sigma$, then we will have to fit an inflexible model.

Figure 1 shows two data sets with the same error variance, $1{,}000{,}000$. On the left, $x$ and $f(x)$ vary on a small scale, and the plot looks like random noise. On the right, both vary on a scale $1000$ times larger, as should be evident from the axes. Here a pattern is clearly visible, so it is possible to fit a rather flexible model.


In [None]:
#| layout-ncol: 2
#| echo: False
#| fig-cap: 
#|   - "Figure 1(a): Large error variance, small $x$."
#|   - "Figure 1(b): Large error variance, large $x$. True functional relationship in blue."

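import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(seed=1)
x = rng.uniform(-2, 2, 400)
# Noise with standard deviation 1000, i.e., error variance 1,000,000.
eps = 1000 * rng.standard_normal(400)

# Panel (a): f(x) is on a small scale, so the noise swamps the signal.
plt.clf()
plt.plot(x, x**2 * np.sin(2 * np.pi * x) + eps, "b.")
plt.xlabel("x")
plt.ylabel("y")
plt.show()

# Panel (b): x and f(x) are scaled by 1000, while the noise is unchanged.
plt.clf()
plt.plot(1000 * x, 1000 * (x**2 * np.sin(2 * np.pi * x)) + eps, "r.")
# True functional relationship, drawn in blue to match the caption.
t = np.linspace(x.min(), x.max(), 1000)
plt.plot(1000 * t, 1000 * (t**2 * np.sin(2 * np.pi * t)), "b-")
plt.xlabel("x")
plt.ylabel("y")
plt.show()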

## 2.4.2
> Explain whether each scenario is a classification or regression problem, and indicate whether we are most interested in inference or prediction. Finally, provide $n$ and $p$.

### (a)
> We collect a set of data on the top 500 firms in the US. For each firm we record profit, number of employees, industry and the CEO salary. We are interested in understanding which factors affect CEO salary.

Salary is a continuous variable, so this is a regression problem. Since we are interested in understanding what factors affect CEO salary, this is an inference problem. The number of features is $p=3$, namely profit, number of employees, and industry. Finally, $n=500$ since we are dealing with $500$ firms.

### (b)
> We are considering launching a new product and wish to know whether it will be a success or a failure. We collect data on 20 similar products that were previously launched. For each product we have recorded whether it was a success or failure, price charged for the product, marketing budget, competition price, and ten other variables.

### (c)
> We are interested in predicting the % change in the USD/Euro exchange rate in relation to the weekly changes in the world stock markets. Hence we collect weekly data for all of 2012. For each week we record the % change in the USD/Euro, the % change in the US market, the % change in the British market, and the % change in the German market.

## 2.4.3

### (a)
> Provide a sketch of typical (squared) bias, variance, training error, test error, and Bayes (or irreducible) error curves, on a single plot, as we go from less flexible statistical learning methods towards more flexible approaches. The $x$-axis should represent the amount of flexibility in the method, and the $y$-axis should represent the values for each curve. There should be five curves. Make sure to label each one.

### (b)
> Explain why each of the five curves has the shape displayed in part (a).