# Time Series Analysis

* From stock prices to climate data, time series data are found in a wide variety of domains, and being able to effectively work with such data is an increasingly important skill for data scientists.
* Time series analysis deals with data that is ordered in time.
* Some useful pandas methods:
    * `df['col'].pct_change()` : percent change
    * `df['col'].diff()` : difference
    * `df['ABC'].corr(df['XYZ'])`: correlation (for pd Series)
* __Google Trends__ allows users to see how often a term is searched for.

* A first step when analyzing a time series is to visualize the data with a plot. 
* Stock and bond markets in the U.S. are closed on different days. For example, although the bond market is closed on Columbus Day (around Oct 12) and Veterans Day (around Nov 11), the stock market is open on those days. One way to see the dates that the stock market is open and the bond market is closed is to convert both indexes of dates into sets and take the difference in sets.
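
The set-difference approach described above can be sketched as follows. The two DataFrames, their column names, and their values are made up for illustration; only the technique (convert both date indexes to sets, then subtract) comes from the notes.

```python
import pandas as pd

# Hypothetical toy data: the stock market is open on all four dates,
# while the bond market is closed on 2017-10-09 (Columbus Day).
stocks = pd.DataFrame(
    {"SP500": [2550.0, 2545.0, 2551.0, 2555.0]},
    index=pd.to_datetime(["2017-10-06", "2017-10-09", "2017-10-10", "2017-10-11"]),
)
bonds = pd.DataFrame(
    {"US10Y": [2.37, 2.35, 2.35]},
    index=pd.to_datetime(["2017-10-06", "2017-10-10", "2017-10-11"]),
)

# Convert each index of dates into a set
set_stock_dates = set(stocks.index)
set_bond_dates = set(bonds.index)

# Dates on which the stock market is open but the bond market is closed
stock_only_dates = sorted(set_stock_dates - set_bond_dates)
print(stock_only_dates)

# An inner join keeps only the dates present in both DataFrames
stocks_and_bonds = stocks.join(bonds, how="inner")
print(stocks_and_bonds)
```

Joining with `how="inner"` is one way to line the two series up before computing returns or correlations, since it drops dates that appear in only one of them.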

### Two Time Series
* Often, two time series vary together
* __correlation coefficient:__ a measure of how much two series vary together. A correlation of 1 means the two series have a perfect positive linear relationship with no deviations, -1 means a perfect negative linear relationship, and 0 means no linear relationship.
#### Common Mistake: Correlation of two trending series 
* Two time series that trend in the same direction(s) will often show a high correlation in their levels, but that does not mean they are meaningfully related.
* Example: Dow Jones and UFO sightings
* If you're looking at the correlation of two stocks, you should look at the correlation of their returns, not their levels
* Compute percent changes in each, then compute correlation
* Scatter plots are also useful for visualizing the correlation between the two variables.

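```
import matplotlib.pyplot as plt

# Compute percent changes using pct_change()
# (stocks_and_bonds is assumed to be a DataFrame with 'SP500' and 'US10Y' columns)
returns = stocks_and_bonds.pct_change()
# Compute the correlation of the two return series using corr()
correlation = returns['SP500'].corr(returns['US10Y'])
print("Correlation of stocks and interest rates: ", correlation)
# Make a scatter plot of the two return series
plt.scatter(returns['SP500'], returns['US10Y'])
plt.show()
```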

* Two trending series may show a strong correlation even if they are completely unrelated. This is referred to as "spurious correlation". That's why when you look at the correlation of say, two stocks, you should look at the correlation of their returns and not their levels.
```
#Compute correlation of levels
correlation1 = levels['DJI'].corr(levels['UFO'])
print("Correlation of levels: ", correlation1)
#Compute correlation of percent changes
changes = levels.pct_change()
correlation2 = changes['DJI'].corr(changes['UFO'])
print("Correlation of changes: ", correlation2)
```
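
To see the effect without the Dow Jones and UFO data, here is a minimal sketch on simulated data. The two series, their drift and noise parameters, and the seed are assumptions chosen for illustration: both series are generated independently, yet both trend upward.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 1000

# Two independent series: each is an upward-drifting random walk in log space,
# so their period-to-period returns are unrelated by construction.
log_a = np.cumsum(0.002 + 0.005 * rng.standard_normal(n))
log_b = np.cumsum(0.002 + 0.005 * rng.standard_normal(n))
levels = pd.DataFrame({"A": np.exp(log_a), "B": np.exp(log_b)})

# Correlation of levels is dominated by the shared upward trend
level_corr = levels["A"].corr(levels["B"])

# Correlation of percent changes removes the trend
changes = levels.pct_change()
change_corr = changes["A"].corr(changes["B"])

print("Correlation of levels:  %4.2f" % level_corr)
print("Correlation of changes: %4.2f" % change_corr)
```

The level correlation should come out high while the correlation of changes stays near zero, which is the pattern the notes warn about.
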
### Simple Linear Regression

* Linear regression is also known as ordinary least squares (OLS)
* Python packages to perform regressions:
    * In statsmodels:
        * `sm.OLS(y, x).fit()`
    * In numpy:
        * `np.polyfit(x, y, deg=1)`
    * In pandas:
        * `pd.ols(y, x)` (deprecated and since removed from pandas; use statsmodels instead)
    * In scipy:
        * `stats.linregress(x, y)`
* __Beware:__ The order of x and y is not consistent across all packages
* R-squared measures how closely the data fit the regression line.
    * the R-squared in a simple regression is related to the correlation between the two variables. 
    * the magnitude of the correlation is the square root of the R-squared and the sign of the correlation is the sign of the regression coefficient.

```
# Import the statsmodels module
import statsmodels.api as sm
# Compute correlation of x and y (both assumed to be pandas Series)
correlation = x.corr(y)
print("The correlation between x and y is %4.2f" %(correlation))
# Convert the Series x to a DataFrame and name the column x
# (to_frame works whether or not the Series already has a name)
dfx = x.to_frame(name='x')
# Add a constant to the DataFrame dfx
dfx1 = sm.add_constant(dfx)
# Regress y on dfx1
result = sm.OLS(y, dfx1).fit()
# Print out the results and look at the relationship between R-squared and the correlation above
print(result.summary())
```
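
The link between R-squared and correlation can also be checked numerically. The sketch below uses only numpy and simulated data (the coefficients and the seed are assumptions for illustration), and computes R-squared by hand rather than reading it from a regression summary.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(200)
# y depends negatively on x by construction
y = -0.5 * x + 0.3 * rng.standard_normal(200)

# np.polyfit takes (x, y) and returns coefficients from highest degree down
slope, intercept = np.polyfit(x, y, deg=1)

# R-squared = 1 - (residual sum of squares) / (total sum of squares)
fitted = intercept + slope * x
ss_res = np.sum((y - fitted) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot

# Correlation computed directly, and correlation implied by the regression
correlation = np.corrcoef(x, y)[0, 1]
implied_correlation = np.sign(slope) * np.sqrt(r_squared)

print("R-squared:           %6.4f" % r_squared)
print("Correlation:         %6.4f" % correlation)
print("sign(slope)*sqrt(R2): %6.4f" % implied_correlation)
```

The last two printed values should agree up to floating-point error, and the negative slope carries through to a negative correlation.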

## Autocorrelation
* Autocorrelation is the correlation of a single Time Series with a lagged copy of itself. 
* Aka 'Serial Correlation'
* Often, when we refer to Autocorrelation, we mean __'Lag-one autocorrelation'__
#### What does it mean when a Series has a positive or a negative autocorrelation?
* __Mean Reversion:__ Negative autocorrelation
* __Momentum__ or __Trend-Following:__ Positive autocorrelation
* Example: Traders on Wall Street use autocorrelation to make money
* Individual stocks have historically shown negative autocorrelation over short horizons (about a week)

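* `df.resample(rule='M').last()`
    * 'M' = monthly (recent pandas versions prefer the alias 'ME' for month-end)
    * the method chained after `resample()` determines how the resampling is done; you can use the first date (`.first()`), the last date (`.last()`), or even the average (`.mean()`)
    * older pandas versions accepted a `how='last'` argument instead; it has since been removed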
* __Mean reversion__ in stock prices: prices tend to bounce back, or revert, towards previous levels after large moves. This effect has been observed over time horizons of about a week.
* A more mathematical way to describe mean reversion is to say that stock returns are negatively autocorrelated.
```
#Convert the daily data to weekly data
MSFT = MSFT.resample(rule='W').last()
#Compute the percentage change of prices
returns = MSFT.pct_change()
#Compute and print the autocorrelation of returns
autocorrelation = returns['Adj Close'].autocorr()
print("The autocorrelation of weekly returns is %4.2f" %(autocorrelation))
```
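
To make the resampling step concrete, here is a small sketch on made-up data: 14 consecutive daily values resampled to weekly frequency in three different ways. The series and its dates are hypothetical; with the 'W' rule, pandas labels each weekly bin by the Sunday that ends it.

```python
import pandas as pd

# Hypothetical toy data: 14 consecutive days starting Monday 2024-01-01,
# with values 0, 1, ..., 13
daily = pd.Series(
    range(14),
    index=pd.date_range("2024-01-01", periods=14, freq="D"),
    dtype=float,
)

# Same weekly bins, three different ways of summarizing each bin
weekly_last = daily.resample("W").last()    # last observation of each week
weekly_first = daily.resample("W").first()  # first observation of each week
weekly_mean = daily.resample("W").mean()    # average over each week

print(pd.DataFrame({"last": weekly_last, "first": weekly_first, "mean": weekly_mean}))
```

For price data, `.last()` is the usual choice because it gives the closing price of each period, which is what the weekly returns above are computed from.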

* When you look at daily changes in interest rates, the autocorrelation is close to zero. However, if you resample the data and look at annual changes, the autocorrelation is negative. This implies that while short term changes in interest rates may be uncorrelated, long term changes in interest rates are negatively autocorrelated. 
```
#Compute the daily change in interest rates 
daily_diff = daily_rates.diff()
# Compute and print the autocorrelation of daily changes
autocorrelation_daily = daily_diff['US10Y'].autocorr()
print("The autocorrelation of daily interest rate changes is %4.2f" %(autocorrelation_daily))
#Convert the daily data to annual data
yearly_rates = daily_rates.resample(rule='A').last()
#Repeat above for annual data
yearly_diff = yearly_rates.diff()
autocorrelation_yearly = yearly_diff['US10Y'].autocorr()
print("The autocorrelation of annual interest rate changes is %4.2f" %(autocorrelation_yearly))
```
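
The lag-one autocorrelation used throughout this section is just the correlation of a series with a copy of itself shifted by one period. The sketch below checks that on simulated data. The helper `simulate_series` is hypothetical (not from the notes): it builds a series in which each value is a multiple `phi` of the previous value plus noise, so a positive `phi` mimics momentum and a negative `phi` mimics mean reversion.

```python
import numpy as np
import pandas as pd

def simulate_series(phi, n=1000, seed=0):
    """Hypothetical helper: x[t] = phi * x[t-1] + noise[t]."""
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal(n)
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + noise[t]
    return pd.Series(x)

momentum = simulate_series(phi=0.7)         # positive autocorrelation
mean_reverting = simulate_series(phi=-0.7)  # negative autocorrelation

# Lag-one autocorrelation by hand: correlate the series with its lagged copy
manual = mean_reverting.corr(mean_reverting.shift(1))
# The built-in method (lag defaults to 1)
builtin = mean_reverting.autocorr()

print("Momentum series autocorrelation:       %4.2f" % momentum.autocorr())
print("Mean-reverting series autocorrelation: %4.2f" % builtin)
print("Manual lag-one correlation:            %4.2f" % manual)
```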

## Autocorrelation Function
* __ACF:__ AutoCorrelation Function; shows not only the lag 1 autocorrelation, but the entire autocorrelation function for different lags
* Any significantly non-zero autocorrelation implies that the series can be forecast from its past
* The ACF is useful for model selection
* __Plot ACF in Python:__
    * `from statsmodels.graphics.tsaplots import plot_acf`
    * input x is series or array 
    * `plot_acf(x, lags=20, alpha=0.05)`
    * the argument `lags` indicates how many lags of the autocorrelation function will be plotted.
    * the `alpha` argument sets the width of the confidence interval.
* __Confidence Interval of ACF:__
    * In `plot_acf` the argument `alpha` determines the width of the confidence interval.
    * `alpha` = the chance that, if the true autocorrelation is zero, the estimate will fall outside the blue band
    * Example: `alpha` = 0.05 = 5% chance
    * Confidence bands are wider if:
        * alpha is lower
        * fewer observations
    * Under some simplifying assumptions, 95% confidence bands are +/- 2/