# Conceptual

## Exercise 1

**Review Core Concepts:**
- **Flexibility:** Refers to how complex a model the statistical learning method can fit. High flexibility allows fitting intricate patterns (e.g., non-linear relationships), while low flexibility imposes stronger constraints (e.g., linearity).
- **Inflexible Methods:** They have **low variance** (the fitted model doesn't change much with different training sets) but potentially **high bias** (the model's assumptions might be too simple to capture the true underlying relationship f). (Example: Linear Regression...)
- **Flexible Methods:** They have **low bias** (can approximate complex true relationships f) but potentially **high variance** (can fit the noise in the training data, leading to overfitting and poor performance on unseen data). (Example: KNN with small K, high-degree polynomial regression...)
- **Performance:** Typically measured by prediction accuracy on unseen **test data**. This corresponds to the test Mean Squared Error (MSE) for regression or the test error rate for classification.
- **Bias-Variance Trade-off:** For regression, expected test MSE $= \text{Bias}^2 + \text{Variance} + \sigma^2$, where $\sigma^2 = Var(\epsilon)$ is the irreducible error. The goal is to minimize this total error. Increasing flexibility generally decreases bias but increases variance. The optimal method balances this trade-off.
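
The decomposition above can be checked numerically. The following is a minimal sketch, not a definitive experiment: the true function `sin(2*pi*x)`, the noise level, the sample size, the test point `x0 = 0.25`, and the seed are all assumptions chosen for illustration. Simple linear regression stands in for an inflexible method and 1-nearest-neighbor for a flexible one.

```python
import math
import random

def true_f(x):
    # Assumed "true" relationship, chosen only for illustration.
    return math.sin(2 * math.pi * x)

def fit_linear(xs, ys):
    # Ordinary least squares with a single predictor (inflexible method).
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    return lambda x: intercept + slope * x

def fit_1nn(xs, ys):
    # 1-nearest-neighbor regression (flexible method).
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]

def bias_variance(fit, x0, n=50, sigma=0.3, reps=500, seed=0):
    # Refit the method on many independent training sets and look at
    # how its prediction at x0 behaves across those fits.
    rng = random.Random(seed)
    preds = []
    for _ in range(reps):
        xs = [rng.random() for _ in range(n)]
        ys = [true_f(x) + rng.gauss(0, sigma) for x in xs]
        preds.append(fit(xs, ys)(x0))
    mean_pred = sum(preds) / reps
    bias_sq = (mean_pred - true_f(x0)) ** 2
    variance = sum((p - mean_pred) ** 2 for p in preds) / reps
    return bias_sq, variance

lin_bias_sq, lin_var = bias_variance(fit_linear, x0=0.25)
knn_bias_sq, knn_var = bias_variance(fit_1nn, x0=0.25)

print(f"linear: bias^2={lin_bias_sq:.4f}  variance={lin_var:.4f}")
print(f"1-NN:   bias^2={knn_bias_sq:.4f}  variance={knn_var:.4f}")
```

Under these assumptions the linear fit should show the larger squared bias and the 1-NN fit the larger variance, which is the pattern described in the bullets above.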

**Solved:**

**(a) The sample size *n* is extremely large, and the number of predictors *p* is small.**
- Expected Performance: A flexible method will likely perform better.
- Justification:
    - Bias: A flexible method has lower bias and can capture potentially complex underlying relationships between the predictors and the response.
    - Variance: The main drawback of flexible methods is high variance (overfitting). However, with an *extremely large sample size (**n**)*, the risk of overfitting decreases significantly. The model can learn complex patterns from the abundant data without being overly influenced by the noise in any small subset of it.
    - Trade-off: Since large **n** mitigates the high variance associated with flexible models, we can benefit from their low bias to achieve better overall performance, especially if the true relationship **f** has some non-linearity. The small number of predictors **p** also helps by avoiding the "curse of dimensionality", which can make even large datasets seem sparse and hamper flexible methods.
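
The variance argument in (a) can be illustrated with a small simulation. This is a sketch under stated assumptions: the true function, the noise level, the test point, and the seed are made up for illustration, and the flexible method is K-nearest-neighbors with K set to roughly the square root of **n** (a common heuristic, not something the exercise prescribes).

```python
import heapq
import math
import random

def true_f(x):
    # Assumed "true" relationship, chosen only for illustration.
    return math.sin(2 * math.pi * x)

def knn_predict(xs, ys, x0, k):
    # Average the responses of the k training points closest to x0.
    nearest = heapq.nsmallest(k, zip(xs, ys), key=lambda p: abs(p[0] - x0))
    return sum(y for _, y in nearest) / k

def prediction_variance(n, x0=0.5, sigma=1.0, reps=200, seed=1):
    # Variance of the KNN prediction at x0 across independent training sets.
    rng = random.Random(seed)
    k = max(1, round(math.sqrt(n)))  # assumption: K grows like sqrt(n)
    preds = []
    for _ in range(reps):
        xs = [rng.random() for _ in range(n)]
        ys = [true_f(x) + rng.gauss(0, sigma) for x in xs]
        preds.append(knn_predict(xs, ys, x0, k))
    mean_pred = sum(preds) / reps
    return sum((p - mean_pred) ** 2 for p in preds) / reps

var_small_n = prediction_variance(n=25)
var_large_n = prediction_variance(n=2500)

print(f"variance at x0 with n=25:   {var_small_n:.4f}")
print(f"variance at x0 with n=2500: {var_large_n:.4f}")
```

With many more observations, each prediction averages over more (and closer) neighbors, so the fitted value at a given point should fluctuate far less from one training set to the next.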

**(b) The number of predictors *p* is extremely large, and the number of observations *n* is small.**
- Expected Performance: An inflexible method will likely perform better.
- Justification:
    - Bias: An inflexible method might have higher bias if the true relationship f is complex.
    - Variance: When p >> n (many predictors, few observations), flexible methods have too much freedom. They can find spurious patterns in the training data and fit the noise almost perfectly, leading to extreme overfitting and very high variance. Their performance on unseen test data will be very poor.
    - Trade-off: In the p >> n scenario, controlling variance is paramount. Inflexible methods, by imposing strong structural assumptions (like linearity), significantly restrict the model complexity and thus keep the variance low. Even if they introduce some bias, the massive reduction in variance compared to a flexible model typically leads to better test performance. Flexible methods suffer greatly from the curse of dimensionality here.
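
One way to see the curse of dimensionality mentioned in (b) is to hold **n** fixed and watch how far away the nearest training point is as **p** grows. The sketch below assumes predictors drawn uniformly from the unit hypercube; the values of `n`, `p`, `reps`, and the seed are illustrative choices, not part of the exercise.

```python
import math
import random

def mean_nearest_neighbor_distance(n, p, reps=50, seed=2):
    # Average Euclidean distance from a random query point to its
    # nearest neighbor among n training points in [0, 1]^p.
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        points = [[rng.random() for _ in range(p)] for _ in range(n)]
        query = [rng.random() for _ in range(p)]
        total += min(math.dist(query, pt) for pt in points)
    return total / reps

# Keep n small and fixed while the number of predictors grows.
distances = {p: mean_nearest_neighbor_distance(n=20, p=p) for p in (1, 13, 100)}

for p, d in distances.items():
    print(f"p={p:>3}: mean nearest-neighbor distance = {d:.3f}")
```

As **p** increases, even the closest training observation ends up far from the query point, so a local method such as KNN has no genuinely "near" neighbors to learn from.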

**(c) The relationship between the predictors and response is highly non-linear**
- Expected Performance: A flexible method will likely perform better.
- Justification:
    - Bias: Inflexible methods (like linear regression) inherently assume a simple, often linear, structure. They will be unable to capture the true non-linear patterns, resulting in high bias. Flexible methods are specifically designed to handle such complex, non-linear structures, and thus will have much lower bias.
    - Variance: Flexible methods will still generally have higher variance than inflexible ones.
    - Trade-off: Because the true relationship is highly non-linear, the high bias of an inflexible method will be a major source of error. The reduction in bias from a flexible method should outweigh its extra variance, provided there is enough data to keep that variance in check.

**(d) The variance of the error terms, i.e. $\sigma^2=Var(\epsilon)$, is extremely high**
- Expected Performance: An inflexible method will likely perform better.
- Justification:
    - Bias: The bias characteristics remain as usual (flexible = low bias, inflexible = potentially high bias if f is complex).
    - Variance: Because the variance of the error terms is extremely high, there is a lot of noise in the data. Flexible methods, due to their ability to fit complex patterns, are highly susceptible to fitting this noise in the training data. This leads to a model that wiggles a lot to accommodate noisy points, resulting in high variance (overfitting). Inflexible methods are less sensitive to individual noisy data points.
    - Trade-off: When $\sigma^2$ is very high, the "signal" (f) is harder to discern from the "noise" ($\epsilon$). Trying to capture intricate patterns with a flexible model is likely just fitting the noise, dramatically increasing the model's variance. An inflexible method, while potentially missing some nuances of f (higher bias), will avoid chasing the noise and maintain lower variance.

## Exercise 2

**Review Core Concepts:**
- **Regression vs. Classification:**
    - Regression: The response variable (Y) is quantitative (numerical). We want to predict a continuous value (e.g., salary, price, percentage change).
    - Classification: The response variable (Y) is qualitative (categorical). We want to predict a class label (e.g., success/failure, spam/ham, industry type).
- **Inference vs. Prediction:**
    - Inference: The primary goal is to understand the relationship between the predictors (X) and the response (Y). We want to know *which* predictors affect the response and *how*. The exact form and interpretation of the model **f** are important.
    - Prediction: The primary goal is to accurately predict the value of the response (Y) for new observations, given their predictor values (X). The exact form of the model **f** might be treated as a "black box" as long as it yields accurate predictions.
- **n:** The number of observations in the dataset
- **p:** The number of predictor variables (features) available for each observation

**Solved:**

**Question (a)**
- Problem Type: Regression. The response variable is CEO salary, which is a numerical value.
- Goal: Inference. The stated goal is "understanding which factors affect CEO salary"
- n = 500 (the top 500 firms)
- p = 3 (profit, number of employees, industry)

**Question (b)**
- Problem Type: Classification. The response variable is whether a product launch is a *success* or a *failure*.
- Goal: Prediction. We want to know whether a new product will be a success or a failure, which means using a model built from past data to predict the outcome for a future, unseen product.
- n = 20 (Data was collected on "20 similar products")
- p = 13 (price charged, marketing budget, competition price, and ten other variables)

**Question (c)**
- Problem Type: Regression. The response variable is the % change in the USD/Euro exchange rate. A percentage change is a numerical value.
- Goal: Prediction. The goal is to predict the % change in the USD/Euro exchange rate
- n $\approx$ 52 (one observation per week for all of 2012)
- p = 3 (the % change in the US market, the % change in the British market, the % change in the German market)

## Exercise 3

## Exercise 4

## Exercise 5

## Exercise 6

## Exercise 7