# 3.3 - How reliable are the model's parameters?

In 3.2 we measured how much the model's predictions deviate from the real data. Now we focus on the parameters themselves and ask how much we can trust them. For example, when applying linear regression to a dataset of sales against advertising, we may get a slope of 0.1. How can we be sure that there is indeed a positive relation between advertising and sales? How do we know that the true slope isn't -0.05, and that we got 0.1 just because of noise?

There are a few ways to approach this.

## Bootstrapping
Imagine we repeat a measurement for the same feature values a number of times. The measured response will vary from trial to trial, because of measurement noise at the very least. A dataset with a single measurement per feature value gives us access to only one of these trials. As a consequence, the fitted coefficients will not be the true coefficients, and each realization will lead to a model with different coefficients. If we have access to several realizations, we can see how much the coefficients vary between them. If we also assume the noise is unbiased (zero mean), the center of that spread gives an estimate of the true value.

![](images/dispersion-different-realizations.png)
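The thought experiment above is easy to simulate when we control how the data are generated. The sketch below assumes a hypothetical true model $y = 2 + 0.1x + \varepsilon$ with zero-mean Gaussian noise ($\sigma = 1$), keeps the same 50 feature values in every realization, and refits a line to each one. The function names and all numbers are illustrative choices, not part of the original text.

```python
import numpy as np

def fit_line(x, y):
    """Least-squares fit of y = b0 + b1 * x; returns the array [b0, b1]."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def simulate_realizations(n_realizations=1000, n_points=50,
                          b0=2.0, b1=0.1, noise_sd=1.0, seed=0):
    """Re-measure the same x values many times with fresh noise and fit each realization."""
    rng = np.random.default_rng(seed)
    x = np.linspace(0, 10, n_points)  # identical feature values in every realization
    coefs = np.empty((n_realizations, 2))
    for i in range(n_realizations):
        y = b0 + b1 * x + rng.normal(0.0, noise_sd, n_points)  # zero-mean (unbiased) noise
        coefs[i] = fit_line(x, y)
    return coefs

coefs = simulate_realizations()
print("mean slope:  ", coefs[:, 1].mean())
print("std of slope:", coefs[:, 1].std())
```

With these settings the fitted slopes scatter around the true value 0.1 with a standard deviation of roughly 0.05, so an individual realization can occasionally produce a slope close to zero. This is exactly the worry raised at the start of the section.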

This is super useful, but of course it requires several different realizations. If we have only a single realization (i.e., a single dataset), the idea of _bootstrapping_ lets us create synthetic realizations from it, thereby *allowing us to estimate errors and distributions of the coefficients from a single realization*.

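If the dataset has $N$ observations, we build each synthetic realization by drawing observations from it at random *with replacement*, so some observations appear more than once and others not at all. In the standard bootstrap each resample has the same size $N$ as the original dataset; drawing smaller resamples of size $N_c < N$ is a less common variant, and it makes the coefficients spread more than they would for $N$ points. We repeat this many times, fit the model on each resample, and treat each resample as a different realization.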
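A minimal NumPy sketch of this resampling loop, assuming an ordinary least-squares fit with an intercept and resamples of the same size $N$ as the original data. `fit_linear` and `bootstrap_coefficients` are hypothetical helper names, and the two-feature dataset is synthetic.

```python
import numpy as np

def fit_linear(X, y):
    """Ordinary least squares with an intercept; returns [intercept, coef_1, ..., coef_p]."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

def bootstrap_coefficients(X, y, n_boot=1000, seed=0):
    """Refit the model on n_boot resamples of the rows of (X, y), drawn with replacement."""
    rng = np.random.default_rng(seed)
    n = len(y)
    coefs = np.empty((n_boot, fit_linear(X, y).size))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # N row indices; duplicates are allowed
        coefs[b] = fit_linear(X[idx], y[idx])
    return coefs  # one row per synthetic realization

# A single hypothetical dataset with two features.
rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=(100, 2))
y = 1.0 + 0.5 * X[:, 0] - 0.2 * X[:, 1] + rng.normal(0, 1, size=100)
boot = bootstrap_coefficients(X, y)
print("bootstrap mean:", boot.mean(axis=0))
print("bootstrap std: ", boot.std(axis=0))
```

Each row of `boot` holds the coefficients of one synthetic realization; the column-wise standard deviation is the bootstrap estimate of each coefficient's standard error.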

## Feature importance graph
From the bootstrapped fits, we can compute the mean and standard deviation of each coefficient. An easy way to visualize them is a feature importance graph: list the features along the y-axis and, for each one, plot its coefficient's mean as a point on the x-axis, with the standard deviation as an error bar.

![](images/feature-importance-graph.png)
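One way to draw such a graph with matplotlib, as a sketch: it assumes the bootstrapped coefficients are already in an array with one row per bootstrap fit and one column per feature (intercept excluded), like the coefficients produced by the bootstrap procedure described earlier. `plot_feature_importance`, the feature names and the coefficient values are hypothetical.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display; remove in a notebook
import matplotlib.pyplot as plt

def plot_feature_importance(boot_coefs, feature_names, ax=None):
    """Draw each coefficient's bootstrap mean with a +/- 1 standard deviation error bar.

    boot_coefs: array of shape (n_boot, n_features), one row per bootstrap fit,
    intercept column excluded.
    """
    means = boot_coefs.mean(axis=0)
    stds = boot_coefs.std(axis=0)
    order = np.argsort(np.abs(means))  # largest |mean| ends up at the top
    y_pos = np.arange(len(feature_names))
    if ax is None:
        _, ax = plt.subplots()
    ax.errorbar(means[order], y_pos, xerr=stds[order], fmt="o", capsize=4)
    ax.set_yticks(y_pos)
    ax.set_yticklabels([feature_names[i] for i in order])
    ax.axvline(0.0, color="grey", linestyle="--")  # reference line at zero effect
    ax.set_xlabel("coefficient (bootstrap mean ± std)")
    return ax, means, stds

# Hypothetical bootstrapped coefficients for three made-up features.
rng = np.random.default_rng(0)
fake_boot = rng.normal(loc=[0.5, -0.2, 0.01], scale=0.05, size=(500, 3))
ax, means, stds = plot_feature_importance(fake_boot, ["feature_a", "feature_b", "feature_c"])
```

Save the result with `ax.figure.savefig(...)`. A feature whose error bar crosses the dashed zero line has a coefficient whose sign is uncertain, which is the situation the section opened with.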

## Confidence intervals

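A confidence interval for an unknown quantity (such as a coefficient, or the true mean response) is an interval computed from the data that contains the true value with a fixed probability, e.g., 95%. More precisely, if we repeated the experiment many times and built the interval the same way each time, 95% of those intervals would contain the true value.

The interval is estimated from the fitted parameters and their uncertainty. For a coefficient $\beta_j$, one option is the percentile bootstrap: take the 2.5th and 97.5th percentiles of the bootstrapped values of $\hat\beta_j$. Under the standard linear regression assumptions, a 95% interval is also approximately $\hat\beta_j \pm 2\,\mathrm{SE}(\hat\beta_j)$.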
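As a sketch, here are two common ways to compute a 95% confidence interval for the slope of a simple linear regression $y = \beta_0 + \beta_1 x + \varepsilon$: the normal approximation $\hat\beta_1 \pm 1.96\,\mathrm{SE}(\hat\beta_1)$, with $\mathrm{SE}(\hat\beta_1) = \sqrt{\hat\sigma^2 / \sum_i (x_i - \bar x)^2}$, and the percentile bootstrap. The data-generating model, the function names and the number of resamples are illustrative assumptions. The last part repeats the experiment many times to check how often the interval actually contains the true slope.

```python
import numpy as np

def slope_ci_standard_error(x, y, z=1.96):
    """Approximate 95% CI for the slope: b1 +/- z * SE(b1), from the least-squares formulas."""
    n = len(x)
    x_c = x - x.mean()
    b1 = (x_c * (y - y.mean())).sum() / (x_c ** 2).sum()
    b0 = y.mean() - b1 * x.mean()
    residuals = y - (b0 + b1 * x)
    sigma2 = (residuals ** 2).sum() / (n - 2)  # estimate of the noise variance
    se_b1 = np.sqrt(sigma2 / (x_c ** 2).sum())
    return float(b1 - z * se_b1), float(b1 + z * se_b1)

def slope_ci_bootstrap(x, y, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap CI: the central `level` fraction of the resampled slopes."""
    rng = np.random.default_rng(seed)
    n = len(x)
    slopes = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample rows with replacement
        slopes[i] = np.polyfit(x[idx], y[idx], deg=1)[0]  # polyfit returns [slope, intercept]
    tail = 100 * (1 - level) / 2
    lo, hi = np.percentile(slopes, [tail, 100 - tail])
    return float(lo), float(hi)

# One hypothetical dataset: true slope 0.1, zero-mean Gaussian noise.
rng = np.random.default_rng(42)
x = rng.uniform(0, 10, 100)
y = 2.0 + 0.1 * x + rng.normal(0, 1, 100)
print("SE-based CI: ", slope_ci_standard_error(x, y))
print("bootstrap CI:", slope_ci_bootstrap(x, y))

# Meaning of "95%": repeat the experiment and count how often the interval covers 0.1.
trials, hits = 500, 0
for _ in range(trials):
    xs = rng.uniform(0, 10, 100)
    ys = 2.0 + 0.1 * xs + rng.normal(0, 1, 100)
    lo, hi = slope_ci_standard_error(xs, ys)
    hits += bool(lo <= 0.1 <= hi)
print("coverage:", hits / trials)
```

If the interval lies entirely above zero, the data support a positive slope; an interval that includes zero or negative values means the data are also compatible with no relation or a negative one. The printed coverage should come out close to 0.95, which is what the confidence level promises.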
