# Introduction to Quantitative Finance

Copyright (c) 2019 Python Charmers Pty Ltd, Australia, <https://pythoncharmers.com>. All rights reserved.

<img src="img/python_charmers_logo.png" width="300" alt="Python Charmers Logo">

Published under the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license. See `LICENSE.md` for details.

Sponsored by Tibra Global Services, <https://tibra.com>

<img src="img/tibra_logo.png" width="300" alt="Tibra Logo">


## Module 2.3: Testing and Benchmarking

### 2.3.2 Confidence Intervals

A statistical analysis without discussion of confidence or range is not complete.

Statistics is about dealing with uncertainty. When an analysis reports a confident "the mean is 3.2", information is missing: how confident we are in that result, and where we can reasonably expect the *actual* value to lie. This is also why political polls appear to jump around so much in the news. The underlying opinion usually hasn't moved much; the sample mean naturally varies from poll to poll, and because newspapers rarely report confidence intervals, that varying point estimate is the only value readers see.

Confidence intervals are a key measure to use here, and one of the easiest to explain, especially to non-statistical stakeholders. A confidence interval for a given estimate, at a given confidence level of X%, is a range of values constructed so that, across repeated samples, we expect X% of such ranges to contain the true value. Let's look at an example:

In [26]:
%run setup.ipy

In [2]:
# Module from 1.3.2 - Multivariate OLS

In [3]:
import quandl

interest_rates = quandl.get("RBA/F13_FOOIRATCR")
interest_rates = interest_rates[interest_rates.columns[0]]  # Extract the first column, whatever it is called
interest_rates.name = "InterestRate"  # Rename, as the original had a long name. Hint: don't use spaces or special chars

In [4]:
inflation = quandl.get("RBA/G01_GCPIAGSAQP")
inflation.columns = ['Inflation']

In [5]:
au_dollar = quandl.get("BUNDESBANK/BBEX3_M_AUD_USD_CM_AC_A01")['Value']
au_dollar.name = "AUDUSD"

In [6]:
data = pd.concat([interest_rates, inflation, au_dollar], axis=1)  # Combines multiple series into a DataFrame

In [7]:

import statsmodels.formula.api as smf
est = smf.ols(formula='Inflation ~ InterestRate + AUDUSD', data=data).fit()  # Does the constant for us

In [8]:
est.summary()

| | | | |
|---|---|---|---|
| Dep. Variable: | Inflation | R-squared: | 0.111 |
| Model: | OLS | Adj. R-squared: | 0.095 |
| Method: | Least Squares | F-statistic: | 7.109 |
| Date: | Mon, 13 May 2019 | Prob (F-statistic): | 0.00123 |
| Time: | 12:58:32 | Log-Likelihood: | -86.6 |
| No. Observations: | 117 | AIC: | 179.2 |
| Df Residuals: | 114 | BIC: | 187.5 |
| Df Model: | 2 | | |
| Covariance Type: | nonrobust | | |

| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| Intercept | 0.2334 | 0.303 | 0.770 | 0.443 | -0.367 | 0.834 |
| InterestRate | 0.0675 | 0.018 | 3.770 | 0.000 | 0.032 | 0.103 |
| AUDUSD | 0.0617 | 0.367 | 0.168 | 0.867 | -0.665 | 0.789 |

| | | | |
|---|---|---|---|
| Omnibus: | 70.538 | Durbin-Watson: | 1.746 |
| Prob(Omnibus): | 0.0 | Jarque-Bera (JB): | 624.332 |
| Skew: | 1.788 | Prob(JB): | 2.68e-136 |
| Kurtosis: | 13.737 | Cond. No. | 57.9 |


If we let our `InterestRate` variable be $X_1$, then we can see that our model gives a 95% confidence interval for its coefficient between a low of 0.032 and a high of 0.103 (your results may differ if the data has been updated; see the highlighted line):

<img src="img/confidence_interval_highlight.png">
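The bounds in the `[0.025` and `0.975]` columns can be roughly reproduced by hand. The sketch below assumes the usual OLS construction (coefficient plus or minus a critical t value times the standard error) and uses the rounded `coef`, `std err` and `Df Residuals` values from the summary, so it only matches the reported bounds approximately.

```python
from scipy import stats

# Rounded values read off the summary table above
coef = 0.0675       # InterestRate coefficient
std_err = 0.018     # its standard error
df_resid = 114      # Df Residuals

# Two-sided 95% interval: 2.5% in each tail of the t distribution
t_crit = stats.t.ppf(0.975, df_resid)
low = coef - t_crit * std_err
high = coef + t_crit * std_err

print(f"t critical value: {t_crit:.3f}")
print(f"95% interval: ({low:.3f}, {high:.3f})")
```

Because the inputs are already rounded to three or four decimal places, small differences in the last digit are to be expected.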

A key part of the need for a confidence interval is that we almost always have error in our estimate, at the very least due to the sample we take. This is true even for things that seem ground truth. For instance, if we are predicting stock prices based on the close price, we have a sampling error - if the market closed a second earlier, we may get a different close price.

Confidence intervals can be calculated in many different ways, but the general idea is the same whether you use classical statistics, Bayesian methods or simulation.

* For classical statistics, many distribution types have a method to compute the confidence interval, derived from the equations of those distributions. See `scipy.stats.norm.interval` for how to do this for a normal distribution. One example that most have heard of is "for a normal distribution, about 95% of values fall within two standard deviations of the mean".
* For Bayesian statistics and simulations, the interval comes from the spread of the predictions themselves. A common simulation approach is the bootstrap method: create many bootstrap samples (each the same size as the original sample, but drawn with replacement) and compute the statistic for each one. After sorting the results, take the value 2.5% of the way through and the value 97.5% of the way through. This range is the 95% confidence interval.
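The two approaches can be compared side by side. This is a minimal sketch on synthetic data (the sample, its size and the number of resamples are all assumptions made for illustration), computing a 95% interval for the mean once with `scipy.stats.norm.interval` and once with the bootstrap.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
# Synthetic data standing in for a real sample (an assumption for illustration)
sample = rng.normal(loc=3.2, scale=1.0, size=200)

# Classical: normal approximation for the sampling distribution of the mean
mean = sample.mean()
std_err = sample.std(ddof=1) / np.sqrt(len(sample))
classical_low, classical_high = stats.norm.interval(0.95, loc=mean, scale=std_err)

# Bootstrap: resample with replacement and recompute the mean each time
boot_means = np.array([
    rng.choice(sample, size=len(sample), replace=True).mean()
    for _ in range(5000)
])
# Middle 95% of the bootstrap means
boot_low, boot_high = np.quantile(boot_means, [0.025, 0.975])

print(f"classical: ({classical_low:.3f}, {classical_high:.3f})")
print(f"bootstrap: ({boot_low:.3f}, {boot_high:.3f})")
```

For a well-behaved statistic such as the mean of roughly normal data, the two intervals should be close to each other.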

As noted in the previous notebook, the value of 95 in "95% confidence interval" has literally no special meaning - it is just a value many people choose. Don't use this value without some thought about it, especially if you are making decisions related to this value.

A common usage of confidence intervals is to see whether a given value sits within the interval, providing a pseudo-likelihood that the value is "possible". For instance, if the 95% confidence interval for the slope of a line contains 0, some would say there is a possibility of "no correlation" between the two variables. As noted in the last notebook, this confuses the terms and is not a reliable methodology to use.

Like p-values, confidence intervals are often misinterpreted. If you get a 95% confidence interval from $a$ to $b$, this does **not** mean that 95% of individual measurements will fall between $a$ and $b$. The interval describes the calculated statistic (such as the mean), not individual values: if we repeatedly took samples of the same size and computed an interval from each, about 95% of those intervals would contain the true value of the statistic.
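A small simulation makes the distinction concrete. The sketch below assumes a hypothetical normal population with a known mean, repeatedly draws samples, and counts how often a normal-approximation 95% interval for the mean contains the true mean. It then checks what share of individual measurements fall inside one such interval.

```python
import numpy as np

rng = np.random.default_rng(0)
true_mean, true_sd, n = 10.0, 2.0, 50  # hypothetical population and sample size
n_repeats = 2000
z = 1.96  # approximate two-sided 95% quantile of the normal distribution

covered = 0
for _ in range(n_repeats):
    sample = rng.normal(true_mean, true_sd, size=n)
    std_err = sample.std(ddof=1) / np.sqrt(n)
    low = sample.mean() - z * std_err
    high = sample.mean() + z * std_err
    covered += int(low <= true_mean <= high)

coverage = covered / n_repeats

# Share of individual measurements from the last sample inside its own interval
inside_fraction = np.mean((sample >= low) & (sample <= high))

print(f"intervals containing the true mean: {coverage:.1%}")
print(f"individual values inside one interval: {inside_fraction:.1%}")
```

The first figure should land near 95%, while the second is far lower, since the interval is about the mean rather than about individual values.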

As a note on reporting, you generally shouldn't give confidence intervals to too many decimal places. Saying "the confidence interval for average height is between 162 and 182cm" is better than saying "the confidence interval is between 162.243cm and 182.976cm", because the latter makes it seem like the process is much more formal than it really is. Remember you likely just used a 95% confidence interval because that's what most people use.

Using the bootstrap method, we are not limited to computing the confidence interval on the mean, as we largely are with classical statistics (it is possible to do more there, but it is very hard). The resampling used in the bootstrap allows us to calculate confidence intervals for arbitrary statistics of the dataset.
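As an illustration, here is a sketch of a percentile bootstrap that accepts any statistic as a function. The helper name `bootstrap_interval` and the synthetic heavy-tailed "returns" are assumptions for this example, not part of the course code.

```python
import numpy as np

def bootstrap_interval(data, statistic, ci=0.95, n_resamples=5000, seed=0):
    """Percentile bootstrap interval for an arbitrary statistic (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data)
    # Recompute the statistic on many resamples drawn with replacement
    estimates = np.array([
        statistic(rng.choice(data, size=len(data), replace=True))
        for _ in range(n_resamples)
    ])
    tail = (1 - ci) / 2
    low, high = np.quantile(estimates, [tail, 1 - tail])
    return low, high

rng = np.random.default_rng(1)
# Synthetic heavy-tailed "returns" (an assumption for illustration)
returns = rng.standard_t(df=3, size=500) * 0.01

median_ci = bootstrap_interval(returns, np.median)
p90_ci = bootstrap_interval(returns, lambda x: np.percentile(x, 90))

print("median:", median_ci)
print("90th percentile:", p90_ci)
```

The same function works for a median, a percentile, or any other function of the sample, which is the flexibility the paragraph above describes.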

Confidence intervals are affected by two main factors:

* Sample size. Larger sample sizes lead to narrower confidence intervals, because the sample is "more like" the population by virtue of containing more of it. All samples are therefore "more like" each other, and our interval will be smaller.
* Variation within the population. Confidence intervals are wider when the variation in the population itself is larger. This makes samples "less like" each other, so the calculated statistic differs more from sample to sample.
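Both effects are visible in the normal-approximation interval for a mean, whose width is $2 z \sigma / \sqrt{n}$. The sketch below assumes a known standard deviation `sd` and uses a hypothetical helper `interval_width` purely to show the scaling.

```python
import numpy as np

def interval_width(n, sd, z=1.96):
    """Width of a normal-approximation 95% interval for the mean (hypothetical helper)."""
    return 2 * z * sd / np.sqrt(n)

# Quadrupling the sample size halves the width
for n in [25, 100, 400]:
    print(f"n={n:4d}, sd=1.0 -> width {interval_width(n, 1.0):.3f}")

# Doubling the population standard deviation doubles the width
for sd in [1.0, 2.0]:
    print(f"n= 100, sd={sd} -> width {interval_width(100, sd):.3f}")
```

Note the square root: halving the width of an interval requires four times as much data.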

#### Exercise

1. Plot the interest rates data above using `altair`, as a line plot.
2. Add error bars to your plot. See the Altair gallery for examples on how to do this.
3. Fit a normal distribution to the means-of-samples of the interest rate data (i.e. sample the data, compute the mean, repeat many times). Compute the confidence interval using the `scipy.stats.norm.interval` method.
4. Use a bootstrap method, where you sort all the sample means and take the 95% confidence interval as the "middle 95%" noted above.
5. Compare the results from (3) and (4)

*For solutions, see `solutions/confidence_intervals.py`*

### Worked Example

In this worked example, we will compute the 90% confidence interval for the proportion of times the following is true:

    If the price of IBM increases on a given day, Microsoft will increase the following day.

To do this, we first get our data, and then take a sample. We'll use daily closing prices to determine "increase". 

Note also that we aren't testing a correlation. We don't care so much about "if IBM drops, will Microsoft drop?", just whether Microsoft increases after IBM increases.

In [9]:
ibm = quandl.get("WIKI/IBM")['Close']

In [10]:
msft = quandl.get("WIKI/MSFT")['Close']

In [11]:
msft.head()

Date
1986-03-13    28.00
1986-03-14    29.00
1986-03-17    29.50
1986-03-18    28.75
1986-03-19    28.25
Name: Close, dtype: float64

In [12]:
# Combine to make analysis easier
stocks = pd.DataFrame({"ibm": ibm, "msft": msft})

Next, we compute the two intermediate pieces of information:

1. Did IBM increase on the day?
2. Did MSFT increase on the day?

We can then offset (2) to be "Did MSFT increase the day after?":

In [13]:
stocks['ibm_up'] = stocks['ibm'].diff() > 0
stocks['msft_up'] = stocks['msft'].diff() > 0

In [14]:
stocks.dropna(inplace=True, how='any')  # Removes rows missing some data. Effectively starts from MSFT IPO

In [15]:
stocks.head()

| Date | ibm | msft | ibm_up | msft_up |
|---|---|---|---|---|
| 1986-03-13 | 150.5 | 28.0 | True | False |
| 1986-03-14 | 150.38 | 29.0 | False | True |
| 1986-03-17 | 150.88 | 29.5 | True | True |
| 1986-03-18 | 152.38 | 28.75 | True | False |
| 1986-03-19 | 151.63 | 28.25 | False | False |


In [16]:
# Offset msft_up to be "will it increase tomorrow?"
stocks['msft_up_tomorrow'] = stocks['msft_up'].shift(-1)  # -1 "shifts upwards"

In [17]:
stocks.head()

| Date | ibm | msft | ibm_up | msft_up | msft_up_tomorrow |
|---|---|---|---|---|---|
| 1986-03-13 | 150.5 | 28.0 | True | False | True |
| 1986-03-14 | 150.38 | 29.0 | False | True | True |
| 1986-03-17 | 150.88 | 29.5 | True | True | False |
| 1986-03-18 | 152.38 | 28.75 | True | False | False |
| 1986-03-19 | 151.63 | 28.25 | False | False | False |


In [18]:
# Extract just the values for which our premise is true:
premise_true = stocks[stocks['ibm_up']]

In [19]:
# Get an overall estimate for our conclusion being true as well:
premise_true['msft_up_tomorrow'].mean()

0.49712858926342074

Here, our estimate is about 49.7% that if IBM increased today, MSFT will increase tomorrow. Effectively random.

Let's compute the confidence interval for this using the bootstrap method:

In [20]:
def compute_msft_follows_ibm_statistic(original_data):
    # Draws one bootstrap sample (same size, with replacement) and returns the
    # proportion of rows in it where MSFT increased the following day
    sample = original_data.sample(replace=True, n=len(original_data))
    return sample['msft_up_tomorrow'].mean()


In [21]:
compute_msft_follows_ibm_statistic(premise_true)

0.48664169787765293

In [22]:
number_experiments = 10000

values = np.array([compute_msft_follows_ibm_statistic(premise_true) for i in range(number_experiments)])


In [23]:
def compute_confidence_interval(values, ci=0.95):
    """Compute the percentile confidence interval for the given values"""
    assert 0 < ci < 1  # ci == 1 would index past the end of the sorted array
    n = len(values)
    lower = int(n * (1-ci)/2)
    upper = int(n * (1-((1-ci)/2)))
    assert upper > lower  # Can be lower == upper if not enough samples
    sorted_values = np.sort(values)
    return sorted_values[lower], sorted_values[upper]


In [24]:
compute_confidence_interval(values, ci=0.9)

(0.48414481897627965, 0.4961298377028714)
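The sort-and-index approach can be cross-checked against `np.quantile`, which interpolates between neighbouring sorted values. The sketch below uses synthetic numbers as a stand-in for the bootstrap estimates in `values` (an assumption, so the printed bounds will not match the output above).

```python
import numpy as np

rng = np.random.default_rng(7)
# Synthetic stand-in for the bootstrap estimates in `values` (an assumption)
values = rng.normal(loc=0.49, scale=0.004, size=10_000)

ci = 0.9
tail = (1 - ci) / 2  # probability mass left in each tail
low, high = np.quantile(values, [tail, 1 - tail])

print(f"{ci:.0%} interval: ({low:.4f}, {high:.4f})")
```

With thousands of bootstrap estimates, the two methods should differ only far beyond the decimal places worth reporting.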

While the value is *near* 0.50, which would indicate that there is no value in our assumption, we can see that the entire confidence interval actually sits below 0.50 at the 0.9 confidence level.

We could misinterpret this and suddenly start trading on the pattern "if IBM increases, short MSFT". Our evidence does support this idea, but proper backtesting would be needed. Note though, the importance of the confidence interval in this decision. By the mean alone, the value was so close to 0.50 that most would just write it off as "roughly a coin flip, so no further research to be done". A small amount of coding gets us confidence intervals and "there may be a slight edge here that is exploitable".

That said, always check the confidence level being used. Remember that a confidence level of 0.90 roughly equates to "if we run 10 experiments at a CI of 0.9, we expect about one of them to be wrong". Here is the same calculation for different CI levels:

In [25]:
print("CI\tFalse positives in...")
for ci in [0.5, 0.75, 0.8, 0.85, 0.9, 0.95, 0.975, 0.99, 0.999, 0.9999, 0.99999]:
    # round() rather than int(): floating-point error makes 1/(1-0.95) come out
    # slightly below 20, which int() would truncate to 19
    number_wrong = round(1/(1-ci))
    print("{ci:.5f}\t1 in {number_wrong}".format(**locals()))

CI	False positives in...
0.50000	1 in 2
0.75000	1 in 4
0.80000	1 in 5
0.85000	1 in 7
0.90000	1 in 10
0.95000	1 in 20
0.97500	1 in 40
0.99000	1 in 100
0.99900	1 in 1000
0.99990	1 in 10000
0.99999	1 in 100000


Those last few values are colloquially referred to as "three 9s", "four 9s" and "five 9s" and so on, especially in studies of reliability.

#### Exercise

Modify the worked example to test this hypothesis:

    If IBM drops by more than 5% on a given day, MSFT will increase the following day.
    
Provide a single estimate for the probability of this happening, as well as a confidence interval.
    
#### Extended Exercise

Modify further to test this hypothesis:

    If IBM drops by more than 5% over a given week, MSFT will increase the following week.