<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Why-do-we-build-linear-regression-models?" data-toc-modified-id="Why-do-we-build-linear-regression-models?-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Why do we build linear regression models?</a></span><ul class="toc-item"><li><span><a href="#Learning-more-about-the-relationships" data-toc-modified-id="Learning-more-about-the-relationships-1.1"><span class="toc-item-num">1.1&nbsp;&nbsp;</span>Learning more about the relationships</a></span></li><li><span><a href="#Use-the-model-in-the-backwards-direction" data-toc-modified-id="Use-the-model-in-the-backwards-direction-1.2"><span class="toc-item-num">1.2&nbsp;&nbsp;</span>Use the model in the backwards direction</a></span></li><li><span><a href="#Using-the-model's-predictions" data-toc-modified-id="Using-the-model's-predictions-1.3"><span class="toc-item-num">1.3&nbsp;&nbsp;</span>Using the model's predictions</a></span></li></ul></li><li><span><a href="#What-is-$𝑅^2$-measuring?" data-toc-modified-id="What-is-$𝑅^2$-measuring?-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>What is $𝑅^2$ measuring?</a></span></li><li><span><a href="#Why-$R^2$-is-not-sufficient-to-judge-a-model's-regression-performance" data-toc-modified-id="Why-$R^2$-is-not-sufficient-to-judge-a-model's-regression-performance-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Why $R^2$ is not sufficient to judge a model's regression performance</a></span></li><li><span><a href="#Which-other-metrics-should-be-used-instead?" data-toc-modified-id="Which-other-metrics-should-be-used-instead?-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>Which other metrics should be used instead?</a></span></li></ul></div>

# The misguided reliance on $R^2$ to judge a model's performance #

This notebook shows, by example, why relying on $R^2$ to judge a model's performance is misguided and misleading.

**Summary**

* A very high $R^2$ value for a linear regression model could still be insufficient.
* A model with a low $R^2$ value can in many cases still be valuable.
* The intention of your regression model is the important determinant for choosing an appropriate metric, and a suitable metric is probably ***not*** $R^2$.

## Why do we build linear regression models?

Historical data is collected and a relationship between the input(s) and the output is calculated. This relationship is often a linear regression model. 

The purpose is most often to use that calculated relationship to predict some future output, given a new input.

That is the most common reason for building a linear regression model, but it is not the only one. Linear regression is used in these ways:

1. to learn more about what this relationship between inputs and output is
2. to later turn the model around, and find the inputs, in order to get a desired output (i.e. using the model in the backwards direction)
3. and then the most common reason: to get good predictions of the output, based on the inputs (i.e. in the forwards direction).

Let's look at each of these in turn.

### Learning more about the relationships

The coefficient $b_1$ of the linear regression $y = b_0 + b_1 x_1$ shows the average effect on the output, $y$, of a one-unit increase in the input, $x_1$. This is learning about our system.

If you built a regression model between $x_1 =$ temperature of your system, measured in Celsius (input), and $y =$ pH (output), you might get a regression model of $$ y=4.5 + 0.12 x_1$$
from which you learn two things:
* that every 1 degree increase in temperature leads, on average, to an increase in pH of 0.12 units
* that the expected pH at a temperature of $x_1 = 0$ degrees Celsius is 4.5 units.

But consider two cases: what if I told you the $R^2$ of this model was 0.2, or that it was 0.94? How does this change what you have learned? We will come to this in a later section, where we learn what $R^2$ is measuring.
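To make the two cases concrete, here is a minimal sketch on synthetic data (not data from a real system). It assumes the same underlying line, $y = 4.5 + 0.12 x_1$, and adds two hypothetical noise levels; the helper name `fit_line`, the seed, and the noise values are illustrative choices only.

```python
import random
from statistics import mean

def fit_line(x, y):
    """Least-squares intercept b0, slope b1, and R^2 for y = b0 + b1*x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    ss_res = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return b0, b1, 1 - ss_res / ss_tot

rng = random.Random(0)  # fixed seed so the sketch is repeatable
temperature = [rng.uniform(0, 30) for _ in range(200)]  # hypothetical inputs, Celsius

# Same underlying line, two hypothetical noise levels (standard deviation, pH units).
results = {}
for label, noise_sd in [("low noise", 0.3), ("high noise", 2.0)]:
    pH = [4.5 + 0.12 * t + rng.gauss(0, noise_sd) for t in temperature]
    results[label] = fit_line(temperature, pH)
    b0, b1, r2 = results[label]
    print(f"{label}: b0 = {b0:.2f}, b1 = {b1:.2f}, R^2 = {r2:.2f}")
```

In this sketch both fits should recover a slope close to 0.12, while the $R^2$ values differ widely. The interpretation of $b_1$ is the same in both cases; what changes is how much scatter there is around the line.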

### Use the model in the backwards direction

To continue the above example, at what temperature do we operate the system to reach a pH of $y=5.7$? Provided we keep things constant at the same conditions as when we acquired the historical data to build the model (that is a far-reaching requirement), we can turn the model around, and calculate that $$x_1 = \dfrac{y - 4.5}{0.12} = \dfrac{5.7 - 4.5}{0.12} = 10$$ degrees Celsius.
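The inversion can be sketched in a few lines. The function name `temperature_for_pH` is hypothetical, and the default coefficients are simply the ones from the example model.

```python
def temperature_for_pH(target_pH, b0=4.5, b1=0.12):
    """Invert y = b0 + b1*x to find the input x that gives the target output y."""
    if b1 == 0:
        raise ValueError("a zero slope cannot be inverted")
    return (target_pH - b0) / b1

required_temperature = temperature_for_pH(5.7)
print(f"Operate at about {required_temperature:.1f} degrees Celsius")
```

Note that this calculation uses only the coefficients; it says nothing about how far the actual pH might land from the target.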

Again, consider two cases of a low and high $R^2$: how reliable is this usage of the regression model under those 2 scenarios?

### Using the model's predictions 

This scenario is the one most people are familiar with. Continuing the above, it is asking what the expected (or predicted) pH would be for a given new input value of temperature, $x_1$. For example, at a new temperature of 13°C, at which we have never operated before, we expect an output pH of $4.5 + 0.12 \times 13 = 6.06$ pH units.

And again, what value do we have from a model with an $R^2$ which is around 0.2, or a model with $R^2$ of 0.94?

## What is $𝑅^2$ measuring?

The $R^2$ value is nothing more than a measure of how strongly two variables are correlated. For a model with a single input, it is the square of the correlation coefficient between $x$ and $y$.
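For a model with a single input, the $R^2$ from a fitted least-squares line and the squared correlation coefficient can be compared numerically. The sketch below uses a small set of made-up temperature and pH readings, purely for illustration.

```python
from math import sqrt
from statistics import mean

# Hypothetical temperature (Celsius) and pH readings, for illustration only.
x = [2.0, 5.0, 8.0, 11.0, 14.0, 17.0, 20.0, 23.0]
y = [4.9, 4.8, 5.6, 5.7, 6.4, 6.3, 7.1, 7.0]

x_bar, y_bar = mean(x), mean(y)
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Correlation coefficient: uses only the raw data, no model.
r = sxy / sqrt(sxx * syy)

# R^2 from a fitted least-squares line: 1 - (residual SS) / (total SS).
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
ss_res = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
r_squared = 1 - ss_res / syy

print(f"r^2 = {r ** 2:.4f}, R^2 from the fitted model = {r_squared:.4f}")
```

The two printed numbers should agree to within floating-point rounding, even though only the second one required a fitted slope and intercept.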

## Why $R^2$ is not sufficient to judge a model's regression performance

Here are two simple reasons why $R^2$ is not the correct metric to judge how well you can predict a new output, $y$, from a new input $x$:

1. If you switch the historical data around, and make $y$ the $x$ and let $x$ become $y$, then you get ***exactly the same*** $R^2$ value. That does not make sense. A metric of a model's prediction ability **must** depend on which variable is the input and which is the output.
2. What if I told you that I can tell you what the $R^2$ value will be before even calculating the model's slope and intercept?\* That does not make sense either. How can a good metric of prediction performance be calculated before even fitting the prediction model?

The above would be the equivalent of calculating the prediction ability of a neural network before even fitting it, or of flipping the inputs and outputs around and getting the same performance metric.


\*Look at the equation for $R^2$: it is the squared correlation between $x$ and $y$, and does not depend on any model you have fitted.
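The first reason can be checked with a short sketch: fit the line in both directions and compare. The readings are made up for illustration, and `slope_and_r2` is a hypothetical helper name.

```python
from statistics import mean

def slope_and_r2(inputs, outputs):
    """Least-squares slope and R^2 when predicting outputs from inputs."""
    in_bar, out_bar = mean(inputs), mean(outputs)
    s_in = sum((a - in_bar) ** 2 for a in inputs)
    s_cross = sum((a - in_bar) * (b - out_bar) for a, b in zip(inputs, outputs))
    slope = s_cross / s_in
    intercept = out_bar - slope * in_bar
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(inputs, outputs))
    ss_tot = sum((b - out_bar) ** 2 for b in outputs)
    return slope, 1 - ss_res / ss_tot

# Hypothetical temperature (Celsius) and pH readings, for illustration only.
temperature = [2.0, 5.0, 8.0, 11.0, 14.0, 17.0, 20.0, 23.0]
pH = [4.9, 4.8, 5.6, 5.7, 6.4, 6.3, 7.1, 7.0]

slope_fwd, r2_fwd = slope_and_r2(temperature, pH)  # predict pH from temperature
slope_bwd, r2_bwd = slope_and_r2(pH, temperature)  # predict temperature from pH

print(f"forward:  slope = {slope_fwd:.3f}, R^2 = {r2_fwd:.4f}")
print(f"backward: slope = {slope_bwd:.3f}, R^2 = {r2_bwd:.4f}")
```

The two fitted models are different (their slopes are in different units and of different size), yet the $R^2$ values should match to within rounding.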

## Which other metrics should be used instead?

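* learning from the model: use the confidence intervals (CIs) of the coefficients
* using the model backwards: the same, use the confidence intervals of the coefficients
* prediction: use prediction intervals (PIs) and the standard error (SE) to judge the model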
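As a rough sketch of these metrics for a single-input model, the code below computes the standard error, a 95% confidence interval for the slope, and a 95% prediction interval at a new input. The data are synthetic, and one simplification is assumed: the normal quantile (about 1.96) stands in for the $t$-distribution quantile with $n - 2$ degrees of freedom, which is only reasonable when $n$ is fairly large.

```python
import random
from math import sqrt
from statistics import NormalDist, mean

rng = random.Random(1)  # fixed seed so the sketch is repeatable
x = [rng.uniform(0, 30) for _ in range(100)]            # hypothetical temperatures
y = [4.5 + 0.12 * xi + rng.gauss(0, 0.5) for xi in x]   # hypothetical pH readings

n = len(x)
x_bar, y_bar = mean(x), mean(y)
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# Standard error of the residuals, in the units of y (here: pH units).
ss_res = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
std_error = sqrt(ss_res / (n - 2))

# Assumption: normal quantile used in place of the t quantile (large n).
z = NormalDist().inv_cdf(0.975)

# 95% confidence interval for the slope: what we learn about the system.
se_b1 = std_error / sqrt(sxx)
ci_b1 = (b1 - z * se_b1, b1 + z * se_b1)

# 95% prediction interval for one new observation at x_new.
x_new = 13.0
y_hat = b0 + b1 * x_new
se_pred = std_error * sqrt(1 + 1 / n + (x_new - x_bar) ** 2 / sxx)
pi_y = (y_hat - z * se_pred, y_hat + z * se_pred)

print(f"standard error = {std_error:.3f} pH units")
print(f"slope = {b1:.3f}, 95% CI = ({ci_b1[0]:.3f}, {ci_b1[1]:.3f})")
print(f"prediction at {x_new} = {y_hat:.2f}, 95% PI = ({pi_y[0]:.2f}, {pi_y[1]:.2f})")
```

Unlike $R^2$, all three quantities are in the units of the problem and depend on which variable is treated as the input and which as the output.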
