# Question 11

```python
import numpy as np
import pandas as pd
from matplotlib.pyplot import subplots
import matplotlib.pyplot as plt
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor as VIF
from statsmodels.stats.anova import anova_lm
from statsmodels.formula.api import ols
from ISLP import load_data
from ISLP.models import (ModelSpec as MS, summarize, poly)
```

In this problem we will investigate the *t*-statistic for the null hypothesis $H_0: \beta=0$ in simple linear regression without an intercept. To begin, we generate a predictor $x$ and a response $y$ as follows.

```python
# Simulate n = 100 observations from y = 2x + noise (the true model has no intercept)
rng = np.random.default_rng(1)
x = rng.normal(size=100)
y = 2 * x + rng.normal(size=100)
```

## Part A

Perform a simple linear regression of $y$ onto $x$, without an intercept. Report the coefficient estimate $\hat{\beta}$, the standard error of this coefficient estimate, and the *t*-statistic and *p*-value associated with the null hypothesis $H_0: \beta = 0$. Comment on these results. (You can perform regression without an intercept using the keyword argument `intercept=False` to `ModelSpec()`.)

```python
# Put the simulated data into a pandas DataFrame
data = pd.DataFrame({'x': x, 'y': y})

# Design matrix with only the column x (intercept=False drops the constant column)
X = MS('x', intercept=False).fit_transform(data)
Y = data['y']
model = sm.OLS(Y, X)
results = model.fit()

summarize(results)
```

|   | coef | std err | t | P>\|t\| |
|---|---|---|---|---|
| x | 1.9762 | 0.117 | 16.898 | 0.0 |

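The estimate $\hat{\beta} = 1.9762$ is close to the true slope of 2 used to generate the data, $y = 2x + \epsilon$. The standard error of $0.117$ is not the noise itself: it measures how much $\hat{\beta}$ would vary across repeated samples, and it depends on both the noise variance and the spread of $x$ (for a fit without an intercept, $\mathrm{SE}(\hat{\beta}) = \hat{\sigma}/\sqrt{\sum_i x_i^2}$). The *t*-statistic of $16.898$ says the estimate lies about 17 standard errors away from zero, and the *p*-value, the probability under $H_0$ of observing a $|t|$ at least this large, is reported as 0.0 because it is far below the displayed precision. We therefore reject $H_0: \beta = 0$.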
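The numbers in the Part A summary table can be recomputed by hand from the closed-form expressions for least squares through the origin. The sketch below is a minimal NumPy version that regenerates the same simulated data and uses SciPy's *t* distribution for the two-sided *p*-value; the variable names are illustrative, and the printed values should agree with the `summarize` output up to rounding.

```python
import numpy as np
from scipy import stats

# Same simulated data as in the problem setup (seed 1, n = 100)
rng = np.random.default_rng(1)
x = rng.normal(size=100)
y = 2 * x + rng.normal(size=100)
n = len(x)

# Least-squares slope for a line through the origin: sum(x*y) / sum(x^2)
beta_hat = np.sum(x * y) / np.sum(x ** 2)

# Residual variance with n - 1 degrees of freedom (one fitted parameter, no intercept)
rss = np.sum((y - beta_hat * x) ** 2)
sigma2_hat = rss / (n - 1)

# Standard error of beta_hat, t-statistic for H0: beta = 0, and two-sided p-value
se_beta = np.sqrt(sigma2_hat / np.sum(x ** 2))
t_stat = beta_hat / se_beta
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 1)

print(f"beta_hat={beta_hat:.4f}  se={se_beta:.3f}  t={t_stat:.3f}  p={p_value:.2e}")
```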

## Part B

Now perform a simple linear regression of $x$ onto $y$ without an intercept, and report the coefficient estimate, its standard error, and the corresponding *t*-statistic and *p*-values associated with the null hypothesis $H_0: \beta = 0$. Comment on these results.

```python
# Swap the roles: regress x onto y, again without an intercept
X2 = MS('y', intercept=False).fit_transform(data)
Y2 = data['x']
model2 = sm.OLS(Y2, X2)
results2 = model2.fit()

summarize(results2)
```

|   | coef | std err | t | P>\|t\| |
|---|---|---|---|---|
| y | 0.3757 | 0.022 | 16.898 | 0.0 |

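These results also make sense, although the slope is not simply the reciprocal of 2. Naively inverting $y = 2x$ suggests $0.5$, but the no-intercept least-squares estimate here is $\hat{\beta} = \sum_i x_i y_i / \sum_i y_i^2$. Because $y$ contains noise, $\sum_i y_i^2$ is inflated, which pulls the estimate toward zero: with $x$ and the noise term $\epsilon$ both standard normal, the estimate converges to $E[xy]/E[y^2] = 2/5 = 0.4$, and the observed $0.3757$ is in line with that. The *t*-statistic is again $16.898$, and the *p*-value is reported as 0.0 (it is rounded, not exactly zero), so we reject the null hypothesis.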
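A quick way to check the attenuation argument is to repeat the simulation with a much larger sample, where sampling noise is negligible. The sketch below assumes the same data-generating process as above; the sample size of one million and the seed 0 are arbitrary choices made only for this check.

```python
import numpy as np

# Large-sample check (n and the seed are arbitrary choices for illustration)
rng = np.random.default_rng(0)
n = 1_000_000
x = rng.normal(size=n)
y = 2 * x + rng.normal(size=n)

# No-intercept least-squares slopes in both directions
slope_y_on_x = np.sum(x * y) / np.sum(x ** 2)  # expected to be near 2
slope_x_on_y = np.sum(x * y) / np.sum(y ** 2)  # expected to be near 2/5 = 0.4, not 0.5

print(round(slope_y_on_x, 3), round(slope_x_on_y, 3))
```

The second slope settles near 0.4 rather than 0.5, which matches the small-sample estimate of 0.3757 far better than the naive reciprocal.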

## Part C

What is the relationship between the results obtained in (a) and (b)?

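Both regressions produce highly significant slopes, but the coefficients are not reciprocals of each other: $1/1.9762 \approx 0.506$, whereas the estimate from regressing $x$ onto $y$ is $0.3757$. The *t*-statistics, however, are identical ($16.898$ in both cases), so the *p*-values are identical as well (both round to 0), and we reject the null hypothesis in both cases. This happens because, for regression through the origin, the *t*-statistic depends on the data only through $\sum_i x_i y_i$, $\sum_i x_i^2$ and $\sum_i y_i^2$, in a way that is symmetric in $x$ and $y$.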
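For a fit without an intercept, the *t*-statistic can be written as $t = \sqrt{n-1}\,\sum_i x_i y_i \big/ \sqrt{\sum_i x_i^2 \sum_i y_i^2 - \left(\sum_i x_i y_i\right)^2}$, which is unchanged when $x$ and $y$ are swapped. The sketch below evaluates this closed form in both directions on the same simulated data; `t_through_origin` is a hypothetical helper name, not part of ISLP or statsmodels.

```python
import numpy as np

def t_through_origin(x, y):
    """t-statistic for H0: beta = 0 when regressing y onto x without an intercept."""
    n = len(x)
    sxy = np.sum(x * y)
    sxx = np.sum(x ** 2)
    syy = np.sum(y ** 2)
    # Closed form: sqrt(n - 1) * sum(xy) / sqrt(sum(x^2) * sum(y^2) - sum(xy)^2)
    return np.sqrt(n - 1) * sxy / np.sqrt(sxx * syy - sxy ** 2)

# Same simulated data as in the problem setup
rng = np.random.default_rng(1)
x = rng.normal(size=100)
y = 2 * x + rng.normal(size=100)

t_yx = t_through_origin(x, y)  # regress y onto x
t_xy = t_through_origin(y, x)  # regress x onto y
print(t_yx, t_xy)
```

Because the formula only uses symmetric sums, swapping the arguments cannot change the result, which is why Parts A and B report the same *t*-value.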

## Parts D-F

*skipped*