# Time Series Analysis 1

In the first lecture, we are mainly concerned with how to manipulate and smooth time series data.

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt

In [None]:
import os
import time

In [None]:
import numpy as np
import pandas as pd

In [None]:
! python3 -m pip install --quiet gmaps

In [None]:
import gmaps
import gmaps.datasets

## Dates and times

### Timestamps

In [None]:
now = pd.to_datetime('now')

In [None]:
now

In [None]:
now.year, now.month, now.week, now.day, now.hour, now.minute, now.second, now.microsecond

In [None]:
now.month_name(), now.day_name()

### Formatting timestamps

See the [format codes](https://docs.python.org/3/library/datetime.html#strftime-and-strptime-behavior) in the Python documentation.

In [None]:
now.strftime('%I:%M%p %d-%b-%Y')  # %M is minutes; %m would be the month number

### Parsing time strings

#### `pandas` can handle standard formats

In [None]:
ts = pd.to_datetime('6-Dec-2018 4:45 PM')

In [None]:
ts

#### For unusual formats, use `strptime`

In [None]:
from datetime import datetime 

In [None]:
ts = datetime.strptime('10:11PM 02-Nov-2018', '%I:%M%p %d-%b-%Y')  # %M parses the minutes

In [None]:
ts
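
A fixed timestamp makes it easy to check the difference between `%M` (minutes) and `%m` (month number). The sketch below round-trips the example string through `strptime` and `strftime`; it assumes an English locale for the `%b` and `%p` codes.

```python
from datetime import datetime

fmt = '%I:%M%p %d-%b-%Y'   # %M is minutes; %m would be the month number
parsed = datetime.strptime('10:11PM 02-Nov-2018', fmt)

# 10:11 PM is hour 22 on a 24-hour clock
print(parsed.hour, parsed.minute, parsed.month)

# Formatting with the same codes reproduces the original string
round_trip = parsed.strftime(fmt)
print(round_trip)
```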

### Intervals

In [None]:
then = pd.to_datetime('now')
time.sleep(5)
now = pd.to_datetime('now')

In [None]:
now - then

### Date ranges

A date range is just a collection of timestamps.

In [None]:
dates = pd.date_range(then, now, freq='s')

In [None]:
dates

In [None]:
(then + pd.to_timedelta('1.5s')) in dates  # falls between two timestamps, so not in the range

### Periods

Periods are intervals, not a collection of timestamps.

In [None]:
span = dates.to_period()

In [None]:
span

In [None]:
(then + pd.to_timedelta('1.5s')) in span
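
The contrast between timestamps and periods is easier to see with fixed values. The following is a minimal sketch: the start time is arbitrary, and the explicit `start_time`/`end_time` comparison is just one way to spell out what "falls inside a period" means.

```python
import pandas as pd

start = pd.Timestamp('2018-01-01 00:00:00')           # arbitrary start time
stamps = pd.date_range(start, periods=5, freq='s')    # 5 timestamps, 1 second apart
periods = stamps.to_period()                          # 5 one-second intervals

probe = start + pd.to_timedelta('1.5s')               # lies between two timestamps

# A DatetimeIndex only contains the exact timestamps
in_stamps = probe in stamps

# The second period covers the whole second starting at 00:00:01
second = periods[1]
inside = second.start_time <= probe <= second.end_time

print(in_stamps, inside)
```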

## Lag and lead with `shift`

We will use a periodic time series as an example. Periodicity is important because many biological phenomena are linked to natural periods (seasonal, diurnal, or menstrual cycles) or are intrinsically periodic (e.g. EEG and EKG measurements).

In [None]:
index = pd.date_range('2018-01-01', '2018-01-31', freq='12h')  # ISO dates avoid day/month ambiguity

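You can shift by periods or by frequency. Shifting by periods moves the values against a fixed index, so values pushed past the boundary are dropped and replaced by `NaN`. Shifting by frequency moves the index instead, so all the data are kept.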

In [None]:
wave = pd.Series(np.sin(np.arange(len(index))), index=index)

In [None]:
wave.shift(periods=1).head(3)

In [None]:
wave.shift(periods=1).tail(3)

In [None]:
wave.shift(freq=pd.Timedelta(1, unit='D')).head(3)  # Timedelta takes `unit`, not `freq`

In [None]:
wave.shift(freq=pd.Timedelta(1, unit='D')).tail(3)
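
A four-point series makes the two kinds of shift easy to compare side by side. This is a small sketch with made-up values, not part of the `wave` example.

```python
import numpy as np
import pandas as pd

idx = pd.date_range('2018-01-01', periods=4, freq='12h')
s = pd.Series(np.arange(4.0), index=idx)   # made-up values 0, 1, 2, 3

by_periods = s.shift(periods=1)   # values move down one slot; index unchanged
by_freq = s.shift(freq='12h')     # index moves forward 12 hours; values unchanged

print(by_periods)
print(by_freq)
```

Shifting by periods leaves a `NaN` in the first slot and loses the last value, while shifting by frequency keeps all four values under later timestamps.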

#### Visualizing shifts

In [None]:
wave.plot()
pass

In [None]:
wave.plot(c='blue')
wave.shift(-1).plot(c='red')
pass

In [None]:
wave.plot(c='blue')
wave.shift(1).plot(c='red')
pass

In [None]:
(wave - wave.shift(-6)).plot(c='blue')
(wave - wave.shift(-3)).plot(c='red')
pass

Embedding the time series with its lagged version reveals its periodic nature.

In [None]:
plt.scatter(wave, wave.shift(-1))
pass

### Find percent change from previous period

In [None]:
wave.pct_change().head()

`pct_change` is a convenience wrapper around `shift`: it divides the change since the previous period by the previous value.

In [None]:
((wave - wave.shift(1)) / wave.shift(1)).head()
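
The equivalence can be checked directly on a short series. The sketch below uses made-up values and compares the `shift`-based formula with `pct_change`.

```python
import numpy as np
import pandas as pd

prices = pd.Series([100.0, 110.0, 99.0, 99.0])   # made-up values

# Change since the previous period, relative to the previous value
manual = (prices - prices.shift(1)) / prices.shift(1)
builtin = prices.pct_change()

print(pd.DataFrame({'manual': manual, 'builtin': builtin}))
```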

## Resampling and window functions


The `resample` and window methods have a syntax similar to `groupby`, in that you can apply an aggregate function to the new intervals.

### Resampling

Sometimes there is a need to generate new time intervals, for example, to regularize irregularly timed observations.

#### Down-sampling

In [None]:
index = pd.date_range(pd.to_datetime('1-1-2018'), periods=365, freq='D')

In [None]:
series = pd.Series(np.arange(len(index)), index=index)

In [None]:
series.head()

In [None]:
series_weekly_average = series.resample('W').mean()
series_weekly_average.head()

In [None]:
series_monthly_sum = series.resample('M').sum()  # month-end bins; spelled 'ME' in pandas 2.2 and later
series_monthly_sum.head()

In [None]:
series_10day_median = series.resample('10D').median()
series_10day_median.head()
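
To make the similarity with `groupby` concrete, the sketch below aggregates a made-up 30-day series into 10-day bins in two ways: with `resample`, and with `groupby` on a `pd.Grouper`. The series and bin width are chosen only for illustration.

```python
import numpy as np
import pandas as pd

idx = pd.date_range('2018-01-01', periods=30, freq='D')
daily = pd.Series(np.arange(30), index=idx)   # made-up daily values 0..29

# Same bins, same aggregate, two spellings
resampled = daily.resample('10D').sum()
grouped = daily.groupby(pd.Grouper(freq='10D')).sum()

print(resampled)
print(grouped)
```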

#### Up-sampling

For up-sampling, we need to figure out what we want to do with the missing values. The usual choices are forward fill, backward fill, or interpolation using one of many built-in methods.

In [None]:
upsampled = series.resample('12h')

In [None]:
upsampled.asfreq()[:5]

In [None]:
upsampled.ffill().head()

In [None]:
upsampled.bfill().head()

In [None]:
upsampled.interpolate('linear').head()
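
The fill strategies are easiest to compare on a tiny series where the answers can be read off by eye. This sketch uses three made-up daily values up-sampled to 12-hour intervals.

```python
import numpy as np
import pandas as pd

idx = pd.date_range('2018-01-01', periods=3, freq='D')
s = pd.Series([0.0, 10.0, 20.0], index=idx)   # made-up values

up = s.resample('12h')

raw = up.asfreq()                  # new timestamps are left as NaN
ff = up.ffill()                    # carry the last observation forward
bf = up.bfill()                    # pull the next observation backward
lin = up.interpolate('linear')     # straight line between neighbours

print(pd.DataFrame({'asfreq': raw, 'ffill': ff, 'bfill': bf, 'linear': lin}))
```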

### Window functions

Window functions are typically used to smooth time series data. There are three variants: rolling, expanding, and exponentially weighted. We use the Nile flooding data for these examples.

In [None]:
df = pd.read_csv('data/nile.csv', index_col=0)

In [None]:
df.head()

In [None]:
df.plot()
pass

#### Rolling windows generate windows of a specified width

In [None]:
ts = pd.DataFrame(dict(ts=np.arange(5)))
ts['rolling'] = ts['ts'].rolling(window=3).sum()  # the first two rows are NaN: the window is not yet full
ts
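
For insight, a rolling sum can be re-implemented with an explicit loop over trailing windows. This is a sketch for illustration only; the variable names are not from the lecture.

```python
import numpy as np
import pandas as pd

x = pd.Series(np.arange(5, dtype=float))
window = 3

# Sum of the last `window` values; NaN until the window is full
manual = pd.Series(
    [x.iloc[i - window + 1 : i + 1].sum() if i >= window - 1 else np.nan
     for i in range(len(x))]
)
builtin = x.rolling(window=window).sum()

print(pd.DataFrame({'manual': manual, 'builtin': builtin}))
```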

In [None]:
rolling10 = df.rolling(window=10)
rolling100 = df.rolling(window=100)

In [None]:
df.plot()
plt.plot(rolling10.mean(), c='orange')
plt.plot(rolling100.mean(), c='red')
pass

#### Expanding windows grow as the time series progresses

In [None]:
ts['expanding'] =  ts.ts.expanding().sum()
ts
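
An expanding window always starts at the first observation, so an expanding sum is a cumulative sum and an expanding mean is a running average. The sketch below checks both on made-up values.

```python
import numpy as np
import pandas as pd

x = pd.Series([2.0, 4.0, 6.0, 8.0])   # made-up values

exp_sum = x.expanding().sum()
exp_mean = x.expanding().mean()

# Equivalent cumulative formulations
cum_sum = x.cumsum()
running_mean = x.cumsum() / np.arange(1, len(x) + 1)

print(pd.DataFrame({'expanding sum': exp_sum, 'expanding mean': exp_mean}))
```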

In [None]:
df.plot()
# `expanding` does not accept `center` in current pandas (removed in 2.0)
plt.plot(df.expanding().mean(), c='red')
pass

#### Exponentially weighted windows place more weight on recent observations

In [None]:
n = 10
xs = np.arange(n, dtype='float')[::-1]
xs

Exponentially weighted windows without adjustment.

In [None]:
pd.Series(xs).ewm(alpha=0.8, adjust=False).mean()

Re-implementation for insight.

In [None]:
α = 0.8
ys = np.zeros_like(xs)
ys[0] = xs[0]
for i in range(1, len(xs)):
    ys[i] = (1-α)*ys[i-1] + α*xs[i]
ys

Exponentially weighted windows with adjustment (default)

In [None]:
pd.Series(xs).ewm(alpha=0.8, adjust=True).mean()

Re-implementation for insight.

In [None]:
α = 0.8
ys = np.zeros_like(xs)
ys[0] = xs[0]
for i in range(1, len(xs)):
    ws = np.array([(1-α)**(i-t) for t in range(i+1)])
    ys[i] = (ws * xs[:len(ws)]).sum()/ws.sum()
ys

In [None]:
df.plot()
plt.plot(df.ewm(alpha=0.8).mean(), c='orange')
plt.plot(df.ewm(alpha=0.2).mean(), c='red')
pass

Alternatives to $\alpha$

Using `span`
$$
\alpha = \frac{2}{\text{span} + 1}
$$

Using `halflife`
$$
\alpha = 1 - e^{-\log 2 / t_{1/2}}
$$

Using `com`
$$
\alpha = \frac{1}{1 + \text{com}}
$$
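
The three conversions can be checked numerically against `ewm`. In this sketch the helper functions are hypothetical names introduced here, and the parameter values (10, 5, 4) are arbitrary.

```python
import numpy as np
import pandas as pd

# Hypothetical helpers implementing the three formulas above
def alpha_from_span(span):
    return 2 / (span + 1)

def alpha_from_halflife(halflife):
    return 1 - np.exp(-np.log(2) / halflife)

def alpha_from_com(com):
    return 1 / (1 + com)

x = pd.Series(np.arange(10, dtype=float)[::-1])

# Each pair should give the same smoothed series
checks = {
    'span': (x.ewm(span=10).mean(), x.ewm(alpha=alpha_from_span(10)).mean()),
    'halflife': (x.ewm(halflife=5).mean(), x.ewm(alpha=alpha_from_halflife(5)).mean()),
    'com': (x.ewm(com=4).mean(), x.ewm(alpha=alpha_from_com(4)).mean()),
}
for name, (a, b) in checks.items():
    print(name, np.allclose(a, b))
```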


In [None]:
df.plot()
plt.plot(df.ewm(span=10).mean(), c='orange')
plt.plot(1 + df.ewm(alpha=2/11).mean(), c='red')  # offset for visibility
pass

## Correlation between time series

Suppose we had a reference time series. It is often of interest to know how any particular time series is correlated with the reference. Often the reference might be a population average, and we want to see where a particular time series deviates in behavior.

In [None]:
! python3 -m pip install --quiet pandas_datareader

In [None]:
import pandas_datareader.data as web

We will look at the correlation of some stocks.

```
QQQ tracks Nasdaq
MSFT is Microsoft
GOOG is Google
BP is British Petroleum
```

We expect that the technology stocks should be correlated with Nasdaq, but maybe not BP.

In [None]:
df = web.DataReader(['QQQ', 'MSFT', 'GOOG', 'BP'], 'stooq')

In [None]:
df = df[['Close']].reset_index()

In [None]:
df

In [None]:
df = df.set_index(('Date', ''))

In [None]:
df.head()

In [None]:
df.columns

In [None]:
df.rolling(100).corr(df[('Close', 'QQQ')]).plot()
pass
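
Because the stock data require a network download, the same rolling correlation can be sketched on synthetic data. Everything here is made up for illustration: the reference is a random walk, `follower` is the reference plus noise, and `independent` is an unrelated random walk.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 500

ref = pd.Series(rng.normal(size=n)).cumsum()            # synthetic reference series
follower = ref + rng.normal(scale=0.5, size=n)          # tracks the reference
independent = pd.Series(rng.normal(size=n)).cumsum()    # unrelated random walk

frame = pd.DataFrame({'ref': ref, 'follower': follower, 'independent': independent})

# Correlation of every column with the reference over a 100-point window
rolling_corr = frame.rolling(100).corr(frame['ref'])

print(rolling_corr.dropna().describe())
```

The `ref` column correlates perfectly with itself, which serves as a sanity check; the other columns show how the correlation with the reference changes from window to window.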

## Visualizing space and time data

Visualizing events in space and time is often informative, and with Python it usually takes only a few lines of code.

For example, let's generate a heatmap of crimes in Sacramento in January 2006, and highlight the crimes committed in the last minute before midnight.

See the [gmaps](https://github.com/pbugnion/gmaps) package for more information.

In [None]:
sacramento_crime = pd.read_csv('data/SacramentocrimeJanuary2006.csv', index_col=0)

In [None]:
sacramento_crime.index = pd.to_datetime(sacramento_crime.index)

In [None]:
sacramento_crime.head()

In [None]:
gmaps.configure(api_key=os.environ["GOOGLE_API_KEY"])

In [None]:
locations = sacramento_crime[['latitude', 'longitude']]

In [None]:
late_locations = sacramento_crime.between_time('23:59', '23:59:59')[['latitude', 'longitude']]
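
`between_time` selects rows by time of day, ignoring the date, and by default includes both end points. The sketch below demonstrates this on a made-up table of four events; the coordinates are invented and are not taken from the crime data.

```python
import pandas as pd

idx = pd.to_datetime([
    '2006-01-01 23:58:30',
    '2006-01-01 23:59:10',
    '2006-01-02 00:00:00',
    '2006-01-02 23:59:59',
])
events = pd.DataFrame(
    {'latitude': [38.55, 38.56, 38.57, 38.58],          # made-up coordinates
     'longitude': [-121.40, -121.41, -121.42, -121.43]},
    index=idx,
)

# Rows whose time of day falls in the last minute before midnight, on any date
late = events.between_time('23:59', '23:59:59')
print(late)
```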

In [None]:
fig = gmaps.figure()
fig.add_layer(gmaps.heatmap_layer(locations))
markers = gmaps.marker_layer(late_locations)
fig.add_layer(markers)
fig