In [1]:
from scipy import stats
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

## Bootstrapping

Bootstrapping = Sampling with replacement

1. Randomly choose an observation from the data
2. Write it down
3. Put it back in the data (replacement)
4. Repeat

- Bootstrapped sample = Sample generated from bootstrapping
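
The four steps above can be sketched with NumPy. This is only an illustration: the `data` array is made up, and `rng.choice` with `replace=True` performs the draw, record, and put-back steps in one call.

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# Hypothetical observations, used only for illustration
data = np.array([3, 7, 8, 12, 15])

# Draw as many observations as the original data holds, with replacement,
# so the same value can appear more than once
bootstrapped_sample = rng.choice(data, size=len(data), replace=True)
print(bootstrapped_sample)
```

Every value in `bootstrapped_sample` comes from `data`, but some observations may be repeated and others left out.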


## Non-parametric confidence interval

- Non-parametric analogue of stats.norm.interval
- Sample with replacement
- Compute test statistic
- Record it
- Repeat
- Creates an empirical distribution
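
As a rough sketch of this loop, the snippet below builds a percentile bootstrap interval for the mean by hand. The skewed `data` sample and the choice of 5000 resamples are assumptions for illustration. Note that `stats.bootstrap` defaults to the BCa method, so its interval will generally differ somewhat from a plain percentile interval like this one.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Hypothetical skewed sample, used only for illustration
data = rng.exponential(scale=10.0, size=200)

n_resamples = 5000
boot_means = np.empty(n_resamples)
for i in range(n_resamples):
    # Sample with replacement, compute the statistic, record it
    resample = rng.choice(data, size=data.size, replace=True)
    boot_means[i] = resample.mean()

# The recorded means form an empirical distribution;
# its 2.5th and 97.5th percentiles give a 95% interval
low, high = np.percentile(boot_means, [2.5, 97.5])
print(low, high)
```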

### Normal confidence intervals vs Bootstrap confidence intervals

**Normal confidence intervals**

- Requires data to be normally distributed
- Computed based only on mean and standard error
- Inference valid only for normal data
- Very fast to compute
  
**Bootstrap confidence intervals**
- Allows for any distribution
- Computed directly from data by resampling
- Inference valid for any data
- Much slower to compute

**Use cases for bootstrapping**

- When working with non-normal data
- Ranked data
- Skewed data
- When normal confidence intervals return questionable values
- Work with any statistic we like

In [6]:
investments_df = pd.read_csv('https://assets.datacamp.com/production/repositories/6125/datasets/3605df54bcd4f5dec36590c3724f594fa5a3890d/investments_VC.csv')

In [8]:
# Select just the companies in the Analytics market
analytics_df = investments_df[investments_df['market'] == 'Analytics']

# Confidence interval using the stats.norm function
norm_ci = stats.norm.interval(confidence=0.95,
                             loc=analytics_df['private_equity'].mean(),
                             scale=analytics_df['private_equity'].std() / np.sqrt(analytics_df.shape[0]))

# Construct a bootstrapped confidence interval
bootstrap_ci = stats.bootstrap(data=(analytics_df['private_equity'], ),
                              statistic=np.mean)

print('Normal CI:', norm_ci)
print('Bootstrap CI:', bootstrap_ci.confidence_interval)

Normal CI: (-695062.1822300982, 4049988.441010135)
Bootstrap CI: ConfidenceInterval(low=340562.0619718085, high=7641944.643204782)


These two intervals differ considerably, even though they estimate the same quantity! The main reason is that the data are highly spread out relative to the mean, so the standard error is large compared with the mean itself. Because the normal interval is symmetric around the mean, its lower bound falls below zero.

Q: Why are negative values in the confidence interval problematic for inference?

A: Since the average private equity funding cannot be negative, conclusions from this confidence interval are questionable.

Q: Why, on the other hand, does the bootstrap confidence interval not contain negative values?

A: A bootstrap confidence interval is created by sampling from the original data, which does not contain negative values.
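
The same effect can be reproduced on a tiny made-up dataset. In this sketch the `data` values are hypothetical: nine zeros and one large value, so the mean is small relative to the spread. The normal interval dips below zero, while every bootstrapped mean is an average of non-negative values and so cannot.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=1)

# Hypothetical non-negative, heavily skewed data
data = np.array([0, 0, 0, 0, 0, 0, 0, 0, 0, 100], dtype=float)

# Normal interval: mean +/- 1.96 standard errors, symmetric around the mean
std_err = data.std(ddof=1) / np.sqrt(data.size)
norm_low, norm_high = stats.norm.interval(0.95, loc=data.mean(), scale=std_err)

# Bootstrapped means: each one averages values drawn from the data itself
resamples = rng.choice(data, size=(2000, data.size), replace=True)
boot_means = resamples.mean(axis=1)

print('Normal CI lower bound:', norm_low)
print('Smallest bootstrapped mean:', boot_means.min())
```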


## Combining evidence from p-values

Researchers may come to different conclusions about the same experiment. Why?

- Different samples - Different conclusions
- Culprit: Effect size

Solution: Combine the p-values from tests of the same null hypothesis
- Fisher's method
    - Different samples/studies
    - Same null hypothesis
    - Different p-values
    - At least one should reject the null
- Combines evidence from multiple studies

In [9]:
p_values = [0.052, 0.12, 0.09, 0.051]
fishers_stat, p_value = stats.combine_pvalues(p_values)
print(p_value < 0.05)

True


The four experiments above test the same null hypothesis. Each p-value is above the 0.05 significance level, although all of them are close to it. The combined p-value from `stats.combine_pvalues` is below 0.05. Thus, while no test individually showed statistical significance, the combined evidence from all of the tests suggests there is indeed a significant effect present.
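
To see where the combined p-value comes from, Fisher's statistic can be computed by hand: it is -2 times the sum of the log p-values, compared against a chi-squared distribution with 2k degrees of freedom for k studies. The sketch below checks that calculation against `stats.combine_pvalues`. The variable names are illustrative.

```python
import numpy as np
from scipy import stats

p_values = [0.052, 0.12, 0.09, 0.051]

# Fisher's statistic: -2 * sum(ln p_i)
fisher_stat = -2 * np.sum(np.log(p_values))

# Under the null it follows a chi-squared distribution with 2k degrees of freedom
dof = 2 * len(p_values)
combined_p = stats.chi2.sf(fisher_stat, dof)

# Compare with SciPy's implementation (Fisher's method is the default)
scipy_stat, scipy_p = stats.combine_pvalues(p_values, method='fisher')

print(fisher_stat, combined_p)
print(scipy_stat, scipy_p)
```

Small p-values have large negative logarithms, so several near-misses add up to a large statistic even though none is significant alone.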

In [10]:
p_values = [0.01, 0.51, 0.81, 0.49]
fishers_stat, p_value = stats.combine_pvalues(p_values)
print(p_value < 0.05)

False


Contrast that with the case where one study had a low p-value of 0.01 while all the others had much larger p-values. Here Fisher's method does not find enough combined evidence to reject the null hypothesis, which suggests that the one significant study may be merely a fluke.

## Permutation tests
- Shuffles samples
- Observes outcome
- Observed difference looks like a random outcome?

**Permutation tests for mean difference**

In [13]:
new_satisfaction = [94, 85, 79, 91, 82]
old_satisfaction = [90, 87, 77, 85, 82]

# Group together our data
data = (new_satisfaction, old_satisfaction)

# Define our test statistic
def statistic(x, y):
    return np.mean(x) - np.mean(y)

# Compute a permutation test for the difference in means
res = stats.permutation_test(data, statistic, n_resamples=1000, vectorized=False, alternative='greater')
print(res.pvalue)

0.2976190476190476
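
This p-value can be reproduced by hand. With five observations per group there are only 252 ways to split the ten pooled values into two groups of five, which is fewer than `n_resamples=1000`, so SciPy evaluates every split rather than sampling. The sketch below enumerates the splits with `itertools.combinations`. The small tolerance in the comparison is an assumption added to guard against floating-point ties.

```python
from itertools import combinations

import numpy as np

new_satisfaction = [94, 85, 79, 91, 82]
old_satisfaction = [90, 87, 77, 85, 82]

pooled = np.array(new_satisfaction + old_satisfaction)
n_new = len(new_satisfaction)
observed = np.mean(new_satisfaction) - np.mean(old_satisfaction)

count = 0
total = 0
for new_idx in combinations(range(len(pooled)), n_new):
    # Assign these positions to the "new" group, the rest to the "old" group
    mask = np.zeros(len(pooled), dtype=bool)
    mask[list(new_idx)] = True
    diff = pooled[mask].mean() - pooled[~mask].mean()
    # alternative='greater': count splits at least as extreme as the observed one
    if diff >= observed - 1e-12:
        count += 1
    total += 1

p_value = count / total
print(count, total, p_value)  # 75 of the 252 splits
```

Roughly 30% of the relabelings give a difference at least as large as the observed one, so the observed difference looks like a plausible random outcome.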


## Permutation tests vs Bootstrapping

**Permutation tests**
- Build a null distribution by randomly shuffling data
- Tests for significance of an outcome

**Bootstrapping**
- Build a probability distribution by randomly sampling data
- Creates a confidence interval showing most likely outcomes

In [14]:
btc_sp_df = pd.read_csv('https://assets.datacamp.com/production/repositories/6125/datasets/62594304a29e7820dd01ee8970456ef9decf8aa2/btc_sp.csv')
# Percent daily change: (close - open) / open
btc_sp_df['Pct_Daily_Change_BTC'] = (btc_sp_df['Close_BTC'] - btc_sp_df['Open_BTC']) / btc_sp_df['Open_BTC']
btc_sp_df['Pct_Daily_Change_SP500'] = (btc_sp_df['Close_SP500'] - btc_sp_df['Open_SP500']) / btc_sp_df['Open_SP500']

In [16]:
# Define a function which returns the Pearson R value
def statistic(x, y):
    return stats.pearsonr(x, y)[0]

# Define the data as the percent daily change from each asset
data = (btc_sp_df['Pct_Daily_Change_BTC'],btc_sp_df['Pct_Daily_Change_SP500'])

# Compute a permutation test for the percent daily change of each asset
res = stats.permutation_test(data, statistic, 
           n_resamples=1000,
           vectorized=False, 
           alternative='greater')

# Print if the p-value is significant at 5%
print(res.pvalue < 0.05)

True


Notice how the ability to conduct this test hinged only on your ability to write a statistic function and collect data! Hopefully this shows you the power of a permutation test, and how it can be used in a broad range of situations. You made no distributional assumptions about your data, and yet you were still able to conclude that the positive correlation between the daily percent changes of Bitcoin and the S&P 500 is statistically significant.
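
For tests of association such as correlation, SciPy's documentation describes `permutation_type='pairings'` as the appropriate variant: it keeps each sample intact and only shuffles which observations are paired together. The sketch below shows that form on synthetic data. The arrays `x` and `y` are made up for illustration and are not the Bitcoin or S&P 500 series.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=0)

# Synthetic, positively correlated data (illustration only)
x = rng.normal(size=200)
y = 0.5 * x + rng.normal(size=200)

# Only x is permuted; y stays fixed, so each resample is a new pairing
def statistic(x):
    return stats.pearsonr(x, y)[0]

res = stats.permutation_test((x,), statistic,
                             permutation_type='pairings',
                             n_resamples=1000,
                             vectorized=False,
                             alternative='greater')

print(res.statistic, res.pvalue)
```

Passing only `x` in `data` is enough here because Pearson's r depends on how observations are paired, not on the order in which the pairs appear.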