*This Jupyter notebook is part of Arizona State University's course CAS 523 (Methods for Complex Systems Science: Statistics and Dimensionality Reduction) and was written by Bryan Daniels.  It was last updated August 17, 2022.*

*This assignment takes some inspiration from the Heavy-Tailed Distributions exercise in Quantitative Economics with Python, by Thomas J. Sargent & John Stachurski: https://python.quantecon.org/heavy_tails.html.*

# Using the language of uncertainty

Probability distributions are used to precisely specify our expectations and uncertainty about complex systems.  In statistical learning, we often encounter a situation in which we wish to find a distribution that best represents the noisy variation in given data.  Here, we will practice fitting a distribution to data and using the language of statistics to communicate about our uncertainty.

## Load and plot the data

We will use typical Python packages for numerical work and plotting:

In [None]:
import numpy as np
from scipy import stats
import pandas as pd
import matplotlib.pyplot as plt
plt.rcParams.update({'font.size': 18}) # increases font size on plots
from pathlib import Path # to handle file paths across all operating systems

Financial markets are a good example of complex systems for which uncertainty is a key concept.  Let's look at some financial data: specifically, how the price of a large corporation's stock changes over time.

In [None]:
symbol,start_date,end_date = 'AMZN','2015-1-1', '2022-8-1'
data_filename = 'percent_changes_{}_{}_to_{}.pkl'.format(symbol,start_date,end_date)
data_path = Path('data/yfinance/'+data_filename)

# # use this code to download ticker data directly if you have the yfinance python package installed
# import yfinance as yf
# prices = yf.download(symbol,start_date,end_date)['Adj Close']
# percent_changes = prices.pct_change()[1:] * 100
# percent_changes.to_pickle(data_path)

# otherwise, you can use the following code to load data that I saved for you
percent_changes = pd.read_pickle(data_path)

Here we have loaded the day-to-day percent changes in the price of a share of Amazon's stock.
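
To make "percent change" concrete, here is a minimal sketch of the same `pct_change` calculation used in the commented-out download code, applied to a tiny made-up price series. The name `toy_prices` and its values are hypothetical and are not taken from the Amazon data.

```python
import pandas as pd

# Hypothetical closing prices for four consecutive trading days (made-up numbers)
toy_prices = pd.Series([100.0, 110.0, 99.0, 99.0])

# pct_change() gives the fractional change relative to the previous entry.
# The first entry has no previous day, so it is NaN and is dropped with [1:].
toy_percent_changes = toy_prices.pct_change()[1:] * 100

# Up 10%, then down 10% (from 110 to 99), then unchanged
print(toy_percent_changes.round(6).tolist())
```

Note that a 10% gain followed by a 10% loss does not return the price to where it started, because each change is measured relative to the previous day's price.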

Check out what the data looks like by running the following:

In [None]:
percent_changes

In [None]:
percent_changes.plot()
plt.ylabel('Percent change');

In [None]:
percent_changes.plot.hist(bins=50)
plt.xlabel('Percent change');

❓ **Find the worst day in terms of percentage loss for Amazon during these years.  What percent of the stock's value was lost?** *Hint: `percent_changes.min()` will find the smallest value, and `percent_changes[percent_changes==val]` will find entries that have a given value. You can check your answer [here](https://www.washingtonpost.com/business/2022/04/29/markets-wall-street-april/).*

✳️ **Answer:** 

❓ **Given the number of days in the dataset, roughly how likely do you think it is that Amazon's stock will lose 14% of its value tomorrow?** *Hint: I'm not looking for anything fancy here—just a very rough estimate.*

✳️ **Answer:** 

## Fit a normal distribution

How can we more precisely characterize our uncertainty about how we expect Amazon's stock price to change tomorrow?  What if I asked you to estimate the probability that there was a 20% loss or 20% gain?  What about 50%?  Even though there were never jumps that large in the dataset, it's certainly not impossible.  But roughly how likely?

For this, we will assume a mathematical form for the probability distribution and fit its parameters to the data.  The simplest distribution we might fit is the normal (aka Gaussian) distribution.  A normal distribution corresponds to the assumption that stock price deviations are the result of summing many small, independent contributions, each with finite variance (this is the content of the central limit theorem).

(Note that our approach of simply fitting a single probability distribution is, while useful, rather naive.  An expert in stock trading would certainly have a lot to say about how stock prices could be more predictable—for instance, large changes in price are much more likely after the release of earnings reports.  This is one of many examples we will see in this class of a tradeoff between simpler models that are easier to analyze and more complicated ones that *might* be able to make better predictions.)

To fit a normal distribution to the data, we use `scipy`'s `stats.norm.fit`:

In [None]:
paramsNormal = stats.norm.fit(percent_changes)
print(paramsNormal)

The output `paramsNormal` is a tuple of best-fit parameters for the normal distribution: the mean (which `scipy` calls `loc`) followed by the standard deviation (`scale`).  Let's be a little more specific in our print statement:

In [None]:
print("Best-fit parameters: mean = {:1.5}, std. dev. = {:1.5}.".format(*paramsNormal))
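
For the normal distribution, the maximum-likelihood fit has a simple closed form: the best-fit mean is the sample mean, and the best-fit standard deviation is the sample standard deviation computed with a denominator of $N$ (not $N-1$). The following sketch checks this on synthetic data; `toy_data` and the seed are illustrative assumptions, not the stock data.

```python
import numpy as np
from scipy import stats

# Synthetic data drawn from a known normal distribution (not the stock data)
rng = np.random.default_rng(0)
toy_data = rng.normal(loc=0.1, scale=2.0, size=1000)

# stats.norm.fit returns (loc, scale) = (mean, standard deviation)
toy_mean, toy_std = stats.norm.fit(toy_data)

# Compare against the sample statistics; np.std uses ddof=0 by default
print(np.isclose(toy_mean, np.mean(toy_data)))
print(np.isclose(toy_std, np.std(toy_data)))
```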

The following code plots the resulting probability density and compares it to the (normalized) histogram of the data:

In [None]:
percent_changes.plot.hist(bins=50,density=True)
xs = np.linspace(-15,15,100)
plt.plot(xs,stats.norm.pdf(xs,*paramsNormal),lw=5) # pdf accepts arrays, so no loop is needed
plt.xlabel('Percent change')
plt.ylabel('Probability density')
plt.title('Normal distribution');

On this scale, it's hard to see exactly how likely those rare events are out in the "tails" of the distribution.  To better see the tails, we can put the densities on a logarithmic scale using `plt.yscale`:

In [None]:
percent_changes.plot.hist(bins=50,density=True)
xs = np.linspace(-15,15,100)
plt.plot(xs,stats.norm.pdf(xs,*paramsNormal),lw=5) # pdf accepts arrays, so no loop is needed
plt.xlabel('Percent change')
plt.ylabel('Probability density')
plt.title('Normal distribution')
plt.yscale('log')
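
Beyond reading values off a plot, a fitted distribution can also be asked directly for tail probabilities using its cumulative distribution function (`cdf`) and survival function (`sf`, equal to one minus the `cdf`). The sketch below illustrates the idea on synthetic data; `toy_changes`, the seed, and the 10% `threshold` are hypothetical choices for illustration only.

```python
import numpy as np
from scipy import stats

# Synthetic "percent changes" drawn from a normal distribution (not the stock data)
rng = np.random.default_rng(1)
toy_changes = rng.normal(loc=0.0, scale=2.0, size=2000)
toy_params = stats.norm.fit(toy_changes)

threshold = 10.0  # hypothetical size of a "large" move, in percent

# Probability of a change at or below -threshold (a large loss)
p_loss = stats.norm.cdf(-threshold, *toy_params)
# Probability of a change at or above +threshold (a large gain)
p_gain = stats.norm.sf(threshold, *toy_params)

print("P(loss of at least {}%) = {:.3g}".format(threshold, p_loss))
print("P(gain of at least {}%) = {:.3g}".format(threshold, p_gain))
```

Because the threshold here sits about five standard deviations from the mean, both probabilities come out extremely small, which is the kind of tail behavior the log-scale plot makes visible.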

❓ **Characterize by eye how well the normal distribution fits the data.  How well does it do in the middle, closer to zero change?  What about larger changes?**

✳️ **Answer:** 

We can be more precise about how well the distribution fits by calculating the likelihood function.  This tells us how likely it is (as a probability density) that our observed data (blue histogram above) would have been output if we were to randomly and independently sample from the fit distribution (orange curve above).  `scipy` gives this to us in the form of a "negative log-likelihood" using the distribution's `nnlf` method:

In [None]:
nlfNormal = stats.norm.nnlf(paramsNormal,percent_changes)
print('The negative log-likelihood for the normal distribution is {:1.5}.'.format(nlfNormal))

This doesn't tell us much for now, but will be useful when we compare to other hypotheses for the distribution.  (When the likelihood is larger—corresponding to the negative log-likelihood being smaller—the fit is better.)
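
To see what `nnlf` is computing, the following sketch reproduces it by hand as the negative sum of log probability densities over the data points, and checks that moving a parameter away from its best-fit value makes the negative log-likelihood larger. It uses synthetic data; `toy_data` and the shift of 1.0 are illustrative assumptions.

```python
import numpy as np
from scipy import stats

# Synthetic data (not the stock data)
rng = np.random.default_rng(2)
toy_data = rng.normal(loc=0.0, scale=2.0, size=500)
toy_params = stats.norm.fit(toy_data)

# Negative log-likelihood as reported by scipy
nll_from_nnlf = stats.norm.nnlf(toy_params, toy_data)

# The same quantity by hand: minus the sum of log densities of each data point
nll_by_hand = -np.sum(stats.norm.logpdf(toy_data, *toy_params))
print(np.isclose(nll_from_nnlf, nll_by_hand))

# Shifting the mean away from its best-fit value gives a worse (larger) value
shifted_params = (toy_params[0] + 1.0, toy_params[1])
nll_shifted = stats.norm.nnlf(shifted_params, toy_data)
print(nll_shifted > nll_from_nnlf)
```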

## Fit heavy-tailed distributions

In an attempt to do better at fitting the tails of the distribution, we will try a couple of forms that have so-called "heavy tails": that is, they allow for extreme values to be much more likely than is possible with the rapidly decaying tails of the normal distribution, which fall off like $e^{-x^2/2\sigma^2}$ (faster than any exponential).

One possible heavy-tailed distribution is the log-normal, which is implemented in `scipy` as `stats.lognorm`.

❓ **Fit a log-normal distribution to the data, make a plot comparing its probability density function to a histogram of the data, and compute the negative log-likelihood.**  *Hint: `scipy.stats` is set up so that `stats.lognorm.fit` is called in the same way as `stats.norm.fit` above.  It returns three parameters (shape, `loc`, and `scale`) rather than two.*

In [None]:
# ✳️ Answer:

Now we can also compute a likelihood ratio comparing likelihoods of the normal and log-normal distributions.  If we call the negative log-likelihood for the log-normal distribution `nlfLognormal`, the following code will compute $K$, the ratio of the log-normal likelihood to the normal likelihood (so $K > 1$ favors the log-normal):

In [None]:
Klognormal = np.exp(-nlfLognormal - (-nlfNormal))
print('The (naive) likelihood ratio for selecting log-normal over normal is {:1.5}.'.format(Klognormal))

(Note: I call this a "naive" likelihood ratio because it does not account for the fact that the log-normal distribution includes one extra parameter.  With the freedom of that extra parameter, we expect the log-normal to be able to fit *any* data slightly better, even if it were not truly a better description of the data.  We will return to this issue of "overfitting" later in the course.)
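
Because likelihoods of large datasets are products of many small numbers, the ratio $K$ can become astronomically large or small, and `np.exp` overflows once the difference in log-likelihoods exceeds roughly 700. A common workaround is to report the logarithm of $K$ instead. The sketch below shows this on synthetic data, comparing a normal fit to a Laplace fit (`stats.laplace`); the Laplace distribution is swapped in purely for illustration because it has the same number of parameters as the normal, and `toy_data` and the seed are assumptions.

```python
import numpy as np
from scipy import stats

# Synthetic data drawn from a Laplace distribution (not the stock data)
rng = np.random.default_rng(3)
toy_data = rng.laplace(loc=0.0, scale=1.5, size=1000)

# Negative log-likelihoods of the two best-fit distributions
nll_norm = stats.norm.nnlf(stats.norm.fit(toy_data), toy_data)
nll_laplace = stats.laplace.nnlf(stats.laplace.fit(toy_data), toy_data)

# Natural log of the likelihood ratio, Laplace over normal
log_K = -nll_laplace - (-nll_norm)
# The same ratio expressed in powers of ten, which is easier to read
log10_K = log_K / np.log(10)

print("log K = {:.4g}, so K is about 10^{:.3g}".format(log_K, log10_K))
```

A positive `log_K` means the data are more probable under the distribution in the numerator; here that is the Laplace distribution, which is the one the synthetic data were drawn from.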

❓ **Briefly interpret this likelihood ratio.  How does $K$ translate into confidence in selecting one model over another?**

✳️ **Answer:** 

Another heavy-tailed distribution that has been used to describe financial data is the Cauchy distribution, implemented in `scipy` as `stats.cauchy`.

❓ **Fit a Cauchy distribution to the data, make a plot comparing its probability density function to a histogram of the data, and compute the negative log-likelihood.**

In [None]:
# ✳️ Answer:

❓ **Compute the likelihood ratio $K$ comparing the Cauchy distribution to the normal distribution.**

In [None]:
# ✳️ Answer:

❓ **Interpret your results in terms of the certainty with which the data can be said to favor one of these distributions.  Given that a simple "random walk" model for price changes would produce a normal distribution, what does this say about the random walk hypothesis?**

✳️ **Answer:**

❓ **Interpret your results in terms of the likelihood of large events.  In particular, roughly how much more likely is a 10% daily loss using the Cauchy distribution as compared to the normal distribution?  How might this impact a stock trader?** *Hint: No need for exact numbers here—feel free to take estimates by eye from your plots.*

✳️ **Answer:** 

⚛️ **Bonus questions (for nothing but bragging rights):**
* Do a similar analysis for the price of a different stock or cryptocurrency.  *Hints: I downloaded the Amazon stock price data using the `yfinance` package (see https://pypi.org/project/yfinance/). To install `yfinance` at the command line, use `pip install yfinance --upgrade --no-cache-dir`.*
* Cauchy and normal distributions are part of a larger family of "stable" distributions: https://en.wikipedia.org/wiki/Stable_distribution.  This is implemented in `scipy` as `stats.levy_stable`.  Use this distribution to test Benoit Mandelbrot's hypothesis that commodity (or stock) prices follow an alpha-stable distribution with $\alpha \approx 1.7$ (see https://en.wikipedia.org/wiki/Stable_distribution#Applications).  *Hint: This distribution takes much more computer time to fit.  I used a subsample of the data to run the fit in a reasonable amount of time.*

✴️ **Answer:**