### Draw Conclusions

We will now compute the p-value to determine whether our result is significant, i.e. can we trust that the explained variance means what we think it means? If we have a high $R^2$, does it really mean that there is correlation? If we have only two data points, then no! A straight line always passes exactly through two points, so we are guaranteed an $R^2$ of 1 whether or not the two variables are related. The p-value is our guard against this: with very little data it should be high, telling us that the apparent fit could easily have arisen by chance.
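The two-point case can be checked directly. This is a minimal sketch in plain Python, not part of the original notebook: the helper `r_squared` and the random y-values are assumptions made only for illustration. It fits a least-squares line to pairs of unrelated points and shows that $R^2$ comes out as 1 every time.

```python
import random

def r_squared(xs, ys):
    """R^2 of a simple least-squares line fitted to (xs, ys)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

random.seed(0)
xs = [1.0, 2.0]  # two fixed x-values (hypothetical)
r2_values = []
for _ in range(5):
    # y-values are drawn at random, so x and y are unrelated by construction
    ys = [random.uniform(0, 100), random.uniform(0, 100)]
    r2_values.append(r_squared(xs, ys))

print([round(r2, 6) for r2 in r2_values])
```

Every fit is perfect even though the data are pure noise, which is why $R^2$ on its own cannot tell us whether a relationship is real.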

In [27]:
# Assumes ols_model is the fitted OLS results object from the earlier cells.
f_pval = ols_model.f_pvalue

print("p-value for model significance = ", round(f_pval,4))

p-value for model significance =  0.0046


The tl;dr:

The p-value (0.0046) is below 0.05, so we can conclude that the relationship between the model's predictions and the dependent variable is statistically significant.
Our model is valuable!

The full explanation: 

For the model with no independent variables, the intercept-only model, all of the model’s predictions equal the mean of the dependent variable. Consequently, if the overall F-test is statistically significant, your model’s predictions are an improvement over using the mean.

- If the F-test p-value is less than 0.05, you're OK => conclude that your regression model fits the data better than the model with no independent variables, meaning the independent variables in your model improve the fit.
- If the F-test p-value is greater than 0.05, you cannot conclude that the model beats the intercept-only model, so it's probably better to stop using this set of features.

If none of your independent variables are statistically significant, you can expect the overall F-test to also not be statistically significant.   

Occasionally, however, the tests can produce conflicting results. This disagreement can occur because the F-test of overall significance assesses all of the coefficients jointly whereas the t-test for each coefficient examines them individually. For example, the overall F-test can find that the coefficients are significant jointly while the t-tests can fail to find significance individually.   

How can this happen? The F-test sums the predictive power of all independent variables and determines that it is unlikely that all of the coefficients equal zero. However, it’s possible that each variable isn’t predictive enough on its own to be statistically significant. In other words, your sample provides sufficient evidence to conclude that your model is significant, but not enough to conclude that any individual variable is significant.

## Evaluate Part 3: Feature Significance

How important is each feature (independent variable) in predicting the target variable?
We use the p-value resulting from the t-test to evaluate that. 

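In the case of univariate regression, we only have 1 variable, so significance from the F-test equates to significance from the independent variable's t-test. The two tests give the same p-value because the F-statistic is the square of that variable's t-statistic.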

For multivariate cases, F-tests can evaluate multiple model terms simultaneously, which allows them to compare the fits of different linear models. In contrast, t-tests can evaluate just one term at a time.  Therefore, we use the results of t-tests to decide which variables are most important or have the strongest relationships with our target. 

### T-test

**The t-test for feature significance**

- **The null hypothesis** states that the coefficient for this variable is zero, i.e. the model without this variable fits the data as well as your model. We fail to reject it when the variable's p-value is > 0.05.

- **The alternative hypothesis** says that the coefficient is not zero, i.e. your model fits the data better with that independent variable than without it. We accept it when the variable's p-value is <= 0.05.

Any independent variable with a p-value of <= 0.05 makes a statistically significant contribution to the model.

Let's look at all the metrics the model gives us...

In [28]:
ols_model.summary()

| | | | |
|---|---|---|---|
| Dep. Variable: | y | R-squared: | 0.826 |
| Model: | OLS | Adj. R-squared: | 0.791 |
| Method: | Least Squares | F-statistic: | 23.67 |
| Date: | Wed, 01 Apr 2020 | Prob (F-statistic): | 0.00461 |
| Time: | 14:58:26 | Log-Likelihood: | -19.128 |
| No. Observations: | 7 | AIC: | 42.26 |
| Df Residuals: | 5 | BIC: | 42.15 |
| Df Model: | 1 | | |
| Covariance Type: | nonrobust | | |

| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| Intercept | 12.5111 | 14.641 | 0.855 | 0.432 | -25.124 | 50.146 |
| x | 0.8512 | 0.175 | 4.866 | 0.005 | 0.402 | 1.301 |

| | | | |
|---|---|---|---|
| Omnibus: | | Durbin-Watson: | 0.983 |
| Prob(Omnibus): | | Jarque-Bera (JB): | 0.776 |
| Skew: | 0.124 | Prob(JB): | 0.678 |
| Kurtosis: | 1.388 | Cond. No. | 737.0 |


Looking at $P>|t|$ for $x$, we can see a p-value of 0.005. As we would expect in a univariate model, this is the same value as Prob (F-statistic), the p-value for the F-statistic: 0.00461 rounds to 0.005.
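The link between the two tests can be checked with the numbers in the summary. With a single independent variable, the F-statistic equals the square of that variable's t-statistic. The values below are copied from the table above; the variable names are only illustrative.

```python
# Values copied from the OLS summary output above.
t_stat = 4.866   # t-statistic for x
f_stat = 23.67   # F-statistic for the whole model

t_squared = t_stat ** 2
print("t^2 =", round(t_squared, 2), "vs F =", f_stat)
```

The small gap between the two numbers comes from the summary rounding $t$ to three decimals.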

### Confidence Intervals

**Parameters and confidence intervals**

We can also see our coefficients (the intercept and the coefficient for x), which are our model parameters in regression.

With each parameter, we have an associated confidence interval. How do we make sense of these confidence intervals?

As a review of the coefficients in a univariate problem:

1. Coefficient 1 (y-intercept, $b_{0}$): It is where the regression line crosses the y-axis, or the value of $y$ when $X = 0$. E.g. when trying to predict final grades, if the exam score ($X$) has a value of 0, then we would predict $y$ to be $b_{0}$.

2. Coefficient 2 (slope, $m$, $b_{1}$): It is the amount we expect $y$ to increase by when $X$ increases by 1 (or decrease if $b_{1} < 0$). E.g. if $X$ is exam 1, then for every 1-point increase in exam 1, we would expect an increase of $b_{1}$ in the final grade.
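To make the two coefficients concrete, here is a small sketch that plugs the fitted values from the summary into $y = b_{0} + b_{1}X$. The function name `predict` and the example inputs are hypothetical, chosen only for illustration.

```python
# Coefficients copied from the OLS summary output.
B0 = 12.5111  # intercept
B1 = 0.8512   # slope for x

def predict(x):
    """Predicted y for a given x under the fitted line y = b0 + b1 * x."""
    return B0 + B1 * x

at_zero = predict(0)                    # equals the intercept
per_point = predict(81) - predict(80)   # change in y for a 1-unit change in x

print("prediction at x = 0:", at_zero)
print("change per 1-point increase in x:", round(per_point, 4))
```

The prediction at $X = 0$ is the intercept, and moving $X$ up by one point moves the prediction up by the slope.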


A 95% confidence interval is the default output. The column headers [0.025 and 0.975] mark its lower and upper bounds, which cut off 2.5% in each tail, so the range between them covers 95% of the curve (.975 - .025 = .95).

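The confidence interval of [0.402, 1.301] is a range of plausible values for the true slope $b_{1}$. It comes from a procedure that, if we repeated the sampling and refit the model many times, would produce an interval containing the true value of $b_{1}$ about 95% of the time.

The confidence interval of [-25.124, 50.146] is read the same way for the true intercept $b_{0}$. Note that it contains 0, which agrees with the intercept's non-significant p-value of 0.432.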
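The intervals in the summary can be reproduced by hand as coefficient ± critical value × standard error. The sketch below assumes the standard two-sided 95% critical value of Student's t with 5 degrees of freedom (Df Residuals), which is about 2.5706 from a t-table; the helper name `confidence_interval` is hypothetical.

```python
# Two-sided 95% critical value of Student's t with 5 degrees of freedom
# (taken from a t-table; 5 = Df Residuals in the summary).
T_CRIT_95_DF5 = 2.5706

def confidence_interval(coef, std_err, t_crit=T_CRIT_95_DF5):
    """Return (lower, upper) bounds: coef +/- t_crit * std_err."""
    margin = t_crit * std_err
    return coef - margin, coef + margin

# coef and std err copied from the OLS summary output.
slope_ci = confidence_interval(0.8512, 0.175)
intercept_ci = confidence_interval(12.5111, 14.641)

print("slope CI:    ", [round(v, 3) for v in slope_ci])
print("intercept CI:", [round(v, 3) for v in intercept_ci])
```

The results agree with the table to within rounding of the reported standard errors.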

- 95% CI => if we repeated the sampling many times, about 95% of the intervals computed this way would contain the true parameter.
- A narrow interval means a more precise estimate of the parameter.
- A wide interval indicates a less precise estimate of the parameter.
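The "95%" can be checked by simulation. This is a rough sketch under stated assumptions, not the notebook's data: the "true" intercept, slope, noise level and x-values are made up, and the critical value 2.5706 is the t-table value for 5 degrees of freedom. It repeatedly draws 7 observations, builds a 95% interval for the slope, and counts how often that interval contains the true slope.

```python
import random

T_CRIT_95_DF5 = 2.5706  # t-table value, two-sided 95%, 5 degrees of freedom

# Hypothetical "true" model and x-values (n = 7), chosen for illustration only.
TRUE_INTERCEPT, TRUE_SLOPE, NOISE_SD = 12.5, 0.85, 5.0
XS = [60, 70, 75, 80, 85, 90, 100]

def slope_interval(xs, ys):
    """95% confidence interval for the slope of a simple least-squares fit."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    intercept = y_bar - slope * x_bar
    ss_res = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    std_err = (ss_res / (n - 2) / sxx) ** 0.5
    margin = T_CRIT_95_DF5 * std_err
    return slope - margin, slope + margin

random.seed(42)
n_trials = 2000
hits = 0
for _ in range(n_trials):
    # Draw a fresh sample from the same true line plus Gaussian noise.
    ys = [TRUE_INTERCEPT + TRUE_SLOPE * x + random.gauss(0, NOISE_SD) for x in XS]
    low, high = slope_interval(XS, ys)
    if low <= TRUE_SLOPE <= high:
        hits += 1

coverage = hits / n_trials
print("share of intervals containing the true slope:", coverage)
```

The share should land close to 0.95. The true slope never moves; it is the interval that changes from sample to sample.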