## Conceptual

## 1


For each of parts (a) through (d), indicate whether we would generally expect the performance of a flexible statistical learning method to be better or worse than an inflexible method. Justify your answer.

(a) The sample size n is extremely large, and the number of predictors p is small.

(b) The number of predictors p is extremely large, and the number of observations n is small.

(c) The relationship between the predictors and response is highly non-linear.

(d) The variance of the error terms, i.e. $\sigma^2 = Var(\epsilon)$, is extremely high.

Answers:

(a) The flexible method is better. With a large number of observations and only a few predictors (and thus few dimensions), a flexible method has enough data to capture complex relationships without overfitting.

(b) The flexible method is worse. With many predictors and only a few observations, there is too little data to pin down a flexible fit, so the chance of overfitting increases.

(c) The flexible method is better, as it can capture the non-linear relationship between the predictors and the response that an inflexible method would miss.

(d) $\sigma^2 = Var(\epsilon)$ is the irreducible error. If it is high, a flexible model tends to fit the noise in the training data rather than the underlying relationship, which results in overfitting. Therefore a flexible model performs worse than an inflexible model, which is more robust to the large irreducible error.

## 2

Explain whether each scenario is a classification or regression problem, and indicate whether we are most interested in inference or prediction. Finally, provide n and p.

(a) We collect a set of data on the top 500 firms in the US. For each firm we record profit, number of employees, industry and the CEO salary. We are interested in understanding which factors affect CEO salary.

(b) We are considering launching a new product and wish to know whether it will be a success or a failure. We collect data on 20 similar products that were previously launched. For each product we have recorded whether it was a success or failure, price charged for the product, marketing budget, competition price, and ten other variables.

(c) We are interested in predicting the %-change in the USD/Euro exchange rate in relation to the weekly changes in the world stock markets. Hence we collect weekly data for all of 2012. For each week we record the %-change in the USD/Euro, the %-change in the US market, the %-change in the British market, and the %-change in the German market.


In [2]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
#from mpl_toolkits.mplot3d import axes3d
import seaborn as sns

from sklearn.preprocessing import scale
#import sklearn.linear_model as skl_lm
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
import statsmodels.api as sm
import statsmodels.formula.api as smf

%matplotlib inline
sns.set_style('darkgrid')

In [11]:
advert = pd.read_csv('Advertising.csv')
advert.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 200 entries, 0 to 199
Data columns (total 5 columns):
Unnamed: 0    200 non-null int64
TV            200 non-null float64
radio         200 non-null float64
newspaper     200 non-null float64
sales         200 non-null float64
dtypes: float64(4), int64(1)
memory usage: 7.9 KB


In [13]:
advert.drop('Unnamed: 0', axis=1, inplace=True)
advert.head()

| | TV | radio | newspaper | sales |
|---|---|---|---|---|
| 0 | 230.1 | 37.8 | 69.2 | 22.1 |
| 1 | 44.5 | 39.3 | 45.1 | 10.4 |
| 2 | 17.2 | 45.9 | 69.3 | 9.3 |
| 3 | 151.5 | 41.3 | 58.5 | 18.5 |
| 4 | 180.8 | 10.8 | 58.4 | 12.9 |


Table 3.4: For the Advertising data, least squares coefficient estimates of the multiple linear regression of number of units sold on radio, TV, and newspaper advertising budgets.

In [34]:
lm_advert = smf.ols('sales ~ TV + radio + newspaper', advert).fit()
lm_advert.summary().tables[1]

| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| Intercept | 2.9389 | 0.312 | 9.422 | 0.000 | 2.324 | 3.554 |
| TV | 0.0458 | 0.001 | 32.809 | 0.000 | 0.043 | 0.049 |
| radio | 0.1885 | 0.009 | 21.893 | 0.000 | 0.172 | 0.206 |
| newspaper | -0.0010 | 0.006 | -0.177 | 0.860 | -0.013 | 0.011 |


## (a) Intercept

The corresponding test (with null and alternative hypotheses):

$$
\begin{aligned}
H_{0}&: \beta_{0} = 0 \\
H_{a}&: \beta_{0} \neq 0
\end{aligned}
$$


$\beta_{0}$ is the expected value of $y$ when $x_{1} = x_{2} = x_{3} = 0$. That is, it is the `sales` we expect when there is no TV, radio or newspaper advertising.
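Written out with the rounded estimates from the coefficient table, and assuming that $x_{1}$, $x_{2}$, $x_{3}$ denote the `TV`, `radio` and `newspaper` budgets in the order of the model formula (the text does not state this mapping explicitly), the fitted model is roughly:

$$
\widehat{\text{sales}} = 2.9389 + 0.0458\,\text{TV} + 0.1885\,\text{radio} - 0.0010\,\text{newspaper}
$$

Setting all three budgets to zero leaves only the intercept estimate, 2.9389.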

The p-value is reported as 0.000 (that is, below 0.0005) with t = 9.422, so we have strong evidence to reject $H_{0}$ in favor of the alternative hypothesis. This means we have strong evidence that expected `sales` would *not* be zero in the absence of `TV`, `radio` and `newspaper` advertising.
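As a rough check on this conclusion, the t-statistic can be recomputed by hand from the Intercept row of the table. The sketch below uses only the rounded table values, and it approximates the two-sided p-value with a standard normal distribution. That is an assumption made here to keep the example dependency-free. The exact test uses a t distribution with n - p - 1 = 196 degrees of freedom, which is very close to normal at this sample size.

```python
import math

# Rounded values copied from the Intercept row of the coefficient table.
coef = 2.9389
std_err = 0.312

# t-statistic for H0: beta_0 = 0 is the estimate divided by its standard error.
t_stat = coef / std_err

# Two-sided p-value under a standard normal approximation (assumption:
# the exact test would use a t distribution with 196 degrees of freedom).
p_approx = math.erfc(abs(t_stat) / math.sqrt(2))

print(f"t = {t_stat:.2f}")
print(f"approximate two-sided p-value = {p_approx:.1e}")
```

The recomputed t-statistic is about 9.42. It differs slightly from the table's 9.422 only because the standard error is rounded to three decimals. The approximate p-value is far below any conventional significance level. In the notebook itself, the exact values should be available from the fitted results object, for example via `lm_advert.pvalues`.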

