## Importing Modules for Analysis

### Loading Time Series Data

The most common way to import time series data in Python is with the pandas library. The pandas read_csv() function reads the contents of a file into a DataFrame.

Once your data is loaded, you can display the first rows of the DataFrame by calling the .head(n=5) method, where n=5 means that the first five rows are returned. Five is also the default, so .head() gives the same result.
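As a minimal sketch of the loading step, the example below reads a small CSV from an in-memory string instead of a file on disk. The CSV contents and the name `df` are made up for illustration. The `parse_dates` and `index_col` arguments are optional extras, shown here under the assumption that the file has a column named `Date`.

```python
import io

import pandas as pd

# Hypothetical CSV contents standing in for a real file on disk
csv_text = """Date,Value
2020-01-01,10.0
2020-01-02,10.5
2020-01-03,10.2
2020-01-06,10.8
2020-01-07,11.1
2020-01-08,11.0
"""

# parse_dates converts the Date column to datetimes;
# index_col makes that column the index of the DataFrame
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"], index_col="Date")

print(df.head())       # first five rows (n defaults to 5)
print(df.head(n=2))    # first two rows
print(type(df.index))  # a DatetimeIndex, which time series methods such as resample() rely on
```

Parsing the dates at load time is one way to avoid a separate conversion step later; converting afterwards with pd.to_datetime() works as well.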

In [16]:
import pandas as pd
import numpy as np
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime

In [3]:
UFO = pd.read_csv("UFO.csv")
DJI = pd.read_csv("DJI.csv")

In [4]:
print(UFO.head())
print(DJI.head())

   Date  Value
0  1941      1
1  1942      2
2  1943      9
3  1944      9
4  1945      9
   Date   Value
0  1941  110.96
1  1942  119.40
2  1943  135.89
3  1944  152.32
4  1945  192.91


### Correlation of Datasets

Two trending series may show a strong correlation even if they are completely unrelated. This is referred to as "spurious correlation". That is why, when you look at the correlation of, say, two stocks, you should look at the correlation of their returns and not of their levels.

To illustrate this point, calculate the correlation between the levels of the stock market and the annual sightings of UFOs. Both of those time series have trended up over the last several decades, and the correlation of their levels is very high. Then calculate the correlation of their percent changes. This will be close to zero, since there is no relationship between those two series.
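The effect can be sketched with synthetic data. The two series below are independent random walks with upward drift, generated with a fixed seed. They are stand-ins for illustration, not the UFO and DJI data, and the names `series_a` and `series_b` are hypothetical.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable
years = np.arange(1941, 2016)

# Two independent series that both drift upward (synthetic, not real data)
series_a = pd.Series(100 + np.cumsum(rng.normal(2.0, 1.0, len(years))), index=years)
series_b = pd.Series(100 + np.cumsum(rng.normal(2.0, 1.0, len(years))), index=years)

# Correlation of levels: dominated by the shared upward trend
levels_corr = series_a.corr(series_b)

# Correlation of percent changes: the trend is removed, leaving unrelated noise
changes_corr = series_a.pct_change().corr(series_b.pct_change())

print("Correlation of levels: %4.2f" % levels_corr)
print("Correlation of percent changes: %4.2f" % changes_corr)
```

With the DataFrames loaded earlier, the same two calls would presumably be applied to the Value columns of UFO and DJI, provided their rows line up by year.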

### A Popular Strategy Using Autocorrelation

One puzzling anomaly with stocks is that investors tend to overreact to news. Following large jumps, either up or down, stock prices tend to reverse. This is described as mean reversion in stock prices: prices tend to bounce back, or revert, towards previous levels after large moves. The effect is observed over time horizons of about a week. A more mathematical way to describe mean reversion is to say that stock returns are negatively autocorrelated.
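To make "negatively autocorrelated" concrete, here is a small simulation, offered as a sketch and not as a model of real stock returns. Each simulated return partly reverses the previous one. The coefficient `phi` and the noise scale are arbitrary illustrative choices.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)  # fixed seed so the sketch is repeatable
n = 1000
phi = -0.5  # hypothetical reversal strength: half of each move is undone next period

shocks = rng.normal(0.0, 0.01, n)
simulated = np.zeros(n)
for t in range(1, n):
    # Today's return = partial reversal of yesterday's return + fresh noise
    simulated[t] = phi * simulated[t - 1] + shocks[t]

simulated_returns = pd.Series(simulated)

# Lag-1 autocorrelation: correlation of the series with itself shifted by one period
lag1 = simulated_returns.autocorr(lag=1)
print("Lag-1 autocorrelation of simulated returns: %4.2f" % lag1)
```

A negative value means that an above-average return tends to be followed by a below-average one, which is the pattern that mean reversion describes.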

This simple idea is actually the basis for a popular hedge fund strategy. If you're curious to learn more about this hedge fund strategy (although it's not necessary reading for anything else later in the course), see here.

You'll look at the autocorrelation of weekly returns of MSFT stock from 2012 to 2017. You'll start with a DataFrame MSFT of daily prices. You should use the .resample() method to get weekly prices and then compute returns from prices. Use the pandas method .autocorr() to get the autocorrelation and show that the autocorrelation is negative. Note that the .autocorr() method only works on Series, not DataFrames (even DataFrames with one column), so you will have to select the column in the DataFrame.
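The steps in this paragraph can be sketched end to end on synthetic prices. The price path below is randomly generated, so the sign of the resulting autocorrelation carries no meaning. The names `prices`, `weekly`, and `weekly_returns` are hypothetical, and the example assumes the dates are already the index.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)  # fixed seed so the sketch is repeatable

# Synthetic daily prices on business days (illustrative only, not MSFT data)
dates = pd.bdate_range("2012-08-06", periods=250)
daily_growth = 1 + rng.normal(0.0, 0.01, len(dates))
prices = pd.DataFrame({"Adj Close": 26 * np.cumprod(daily_growth)}, index=dates)

# resample() needs a DatetimeIndex; 'W' groups by week, last() keeps the final price
weekly = prices.resample("W").last()

# Percent change from one weekly price to the next
weekly_returns = weekly.pct_change()

# autocorr() exists on Series only, so select the column first
autocorrelation = weekly_returns["Adj Close"].autocorr()
print("The autocorrelation of weekly returns is %4.2f" % autocorrelation)
```

Selecting the column with `weekly_returns["Adj Close"]` is what turns the one-column DataFrame into a Series.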

In [21]:
MSFT = pd.read_csv("MSFT.csv")

In [22]:
MSFT.head()

Unnamed: 0       Date  Adj Close
         0   8/6/2012  26.107651
         1   8/7/2012  26.377876
         2   8/8/2012  26.438896
         3   8/9/2012  26.587088
         4  8/10/2012  26.517351

In [26]:
MSFT['Date'] = pd.to_datetime(MSFT.Date)

In [28]:
MSFT.head()

Unnamed: 0        Date  Adj Close
         0  2012-08-06  26.107651
         1  2012-08-07  26.377876
         2  2012-08-08  26.438896
         3  2012-08-09  26.587088
         4  2012-08-10  26.517351


In [30]:
# resample() only works with a DatetimeIndex (the default RangeIndex raises
# a TypeError), so make the Date column the index first
MSFT = MSFT.set_index('Date')

# Convert the daily data to weekly data, keeping the last price of each week
# (the old how='last' argument has been replaced by chaining .last())
MSFT = MSFT.resample('W').last()

# Compute the percentage change of prices
returns = MSFT.pct_change()

# Compute and print the autocorrelation of returns
autocorrelation = returns['Adj Close'].autocorr()
print("The autocorrelation of weekly returns is %4.2f" % autocorrelation)