# Time Series

This section walks through simple time series analysis and forecasting.

In [1]:
%matplotlib inline
from fbprophet import Prophet
import numpy as np
import pandas as pd


In [2]:
%%time
# Resampling data from minute interval to day
bit_df = pd.read_csv('../data/coinbaseUSD_1-min_data_2014-12-01_to_2018-01-08.csv')
# Convert unix time to datetime
bit_df['date'] = pd.to_datetime(bit_df.Timestamp, unit='s')
# Use the datetime column as the index
bit_df = bit_df.set_index('date')
# Rename columns to shorter names that are easier to type
bit_df = bit_df.rename(columns={'Open':'open', 'High': 'hi', 'Low': 'lo', 
                       'Close': 'close', 'Volume_(BTC)': 'vol_btc',
                       'Volume_(Currency)': 'vol_cur', 
                       'Weighted_Price': 'wp', 'Timestamp': 'ts'})
# Resample and only use recent samples that aren't missing
bit_df = bit_df.resample('D').agg({'open': 'first', 'hi': 'max', 
    'lo': 'min', 'close': 'last', 'vol_btc': 'sum',
    'vol_cur': 'sum', 'wp': 'mean', 'ts': 'min'}).iloc[-1000:]
# drop last row as it is not complete
bit_df = bit_df.iloc[:-1]

CPU times: user 2.65 s, sys: 538 ms, total: 3.19 s
Wall time: 4 s
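
To see what the resampling step does without loading the full file, here is a minimal sketch on made-up minute-level data. The `toy` frame and its values are assumptions for illustration only; the column names mirror the renamed columns above.

```python
import numpy as np
import pandas as pd

# Synthetic minute-level data covering two full days (not the real data set)
idx = pd.date_range('2018-01-01', periods=2 * 24 * 60, freq='min')
toy = pd.DataFrame({'close': np.arange(len(idx), dtype=float),
                    'vol_btc': 1.0}, index=idx)

# One row per calendar day: last close of the day and total volume
daily = toy.resample('D').agg({'close': 'last', 'vol_btc': 'sum'})
print(daily)
```

Each aggregation is chosen per column: a closing price is the last value of the day, while volume is summed over the day.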


In [None]:
# Prophet expects a dataframe with ds (date) and y (value) columns
ts = (bit_df
    .reset_index()
    .rename(columns={'date': 'ds', 'close': 'y'})
    [['ds', 'y']]
)

In [None]:
ts.dtypes


In [None]:
ts.set_index('ds').plot(figsize=(14,10))

In [None]:
m = Prophet(daily_seasonality=True)
m.fit(ts)

In [None]:
# Make a future object and predict into it
future = m.make_future_dataframe(periods=24)
forecast = m.predict(future)
forecast
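
As a rough illustration of what the future dataframe contains, the sketch below builds an equivalent frame by hand with pandas: the historical dates followed by 24 additional daily dates. The `history` dates are made up, and this is only an approximation of the idea, not Prophet's own implementation.

```python
import pandas as pd

# Made-up history of five daily observations
history = pd.date_range('2018-01-01', periods=5, freq='D')

# 24 daily dates starting the day after the last observation
periods = 24
future_dates = pd.date_range(history.max(), periods=periods + 1, freq='D')[1:]

# History plus future dates in a single ds column
future_sketch = pd.DataFrame({'ds': history.append(future_dates)})
print(future_sketch.tail())
```

Predicting into a frame like this yields fitted values for the historical dates and forecasts for the new ones.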

In [None]:
forecast.T

In [None]:
# Plot the prediction, including the uncertainty interval (plot returns a matplotlib figure)
fig = m.plot(forecast, uncertainty=True)

In [None]:
# Look at the trend, yearly, weekly and daily components
fig = m.plot_components(forecast)

## Exercise - Snow Data

* Use Prophet to predict Snow Depth (SNWD) 100 days into the future.
* Which month has the most snow?

Data from https://www.ncdc.noaa.gov/cdo-web/search

Data at ``../data/snow-alta-1990-2017.csv``

Documentation - https://www1.ncdc.noaa.gov/pub/data/cdo/documentation/GHCND_documentation.pdf


* STATION_NAME (max 50 characters) is the name of the station (usually city/airport name). Optional
output field.
* STATION (17 characters) is the station identification code. Please see
http://www1.ncdc.noaa.gov/pub/data/ghcn/daily/ghcnd-stations.txt
* NAME - name of the station
* LATITUDE
* LONGITUDE
* ELEVATION - meters
* DATE - YYYY-MM-DD
* DAPR - Number of days included in the multiday precipitation total (MDPR)
* DAPR_ATTRIBUTES
* DASF - Number of days included in the multiday snowfall total (MDSF)
* DASF_ATTRIBUTES 
* MDPR - Multiday precipitation total (mm or inches as per user preference; use with DAPR and DWPR, if available)
* MDPR_ATTRIBUTES
* MDSF - Multiday snowfall total (mm or inches as per user preference)
* MDSF_ATTRIBUTES
* PRCP - Precipitation (mm or inches as per user preference, inches to hundredths on Daily Form pdf file)
* PRCP_ATTRIBUTES 
* SNOW -  Snowfall (mm or inches as per user preference, inches to tenths on Daily Form pdf file)
* SNOW_ATTRIBUTES
* SNWD -  Snow depth (mm or inches as per user preference, inches on Daily Form pdf file)
* SNWD_ATTRIBUTES
* TMAX - Maximum temperature (Fahrenheit or Celsius as per user preference, Fahrenheit to tenths on Daily Form pdf file)
* TMAX_ATTRIBUTES
* TMIN - Minimum temperature (Fahrenheit or Celsius as per user preference, Fahrenheit to tenths on Daily Form pdf file)
* TMIN_ATTRIBUTES
* TOBS - Temperature at the time of observation (Fahrenheit or Celsius as per user preference)
* TOBS_ATTRIBUTES
* WT01 - Fog, ice fog, or freezing fog (may include heavy fog)
* WT01_ATTRIBUTES
* WT03 - Thunder
* WT03_ATTRIBUTES
* WT04 - Ice pellets, sleet, snow pellets, or small hail
* WT04_ATTRIBUTES
* WT05 -  Hail (may include small hail)
* WT05_ATTRIBUTES
* WT06 - Glaze or rime
* WT06_ATTRIBUTES
* WT11 -  High or damaging winds
* WT11_ATTRIBUTES
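
One possible way to approach the exercise is sketched below on a tiny made-up frame. Only the `DATE` and `SNWD` column names come from the field list above; the values are invented, and the real file would be loaded with `pd.read_csv` instead.

```python
import pandas as pd

# Hypothetical stand-in for the snow file: real column names, made-up depths
snow = pd.DataFrame({
    'DATE': ['2017-01-15', '2017-01-20', '2017-03-10', '2017-03-25', '2017-07-04'],
    'SNWD': [40.0, 50.0, 90.0, 80.0, 0.0],
})
snow['DATE'] = pd.to_datetime(snow['DATE'])

# Average depth per calendar month (1 = January ... 12 = December)
by_month = snow.groupby(snow['DATE'].dt.month)['SNWD'].mean()
deepest_month = by_month.idxmax()
print(by_month)
print(deepest_month)

# Shape the data the way Prophet expects: ds and y columns
prophet_input = snow.rename(columns={'DATE': 'ds', 'SNWD': 'y'})[['ds', 'y']]
```

With the real data, `prophet_input` would be passed to `fit` in the same way as `ts` earlier in this section.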

## Try using log of data

Prediction may work better if we transform the data first. In this case, let's try taking the log of the Bitcoin price.

In [None]:
ts2 = ts.assign(y=lambda x: np.log(x.y))
ts2.set_index('ds').plot()

In [None]:
m2 = Prophet()  # don't need daily_seasonality=True
m2.fit(ts2)
future2 = m2.make_future_dataframe(periods=24)
forecast2 = m2.predict(future2)


In [None]:
# Plot the prediction, including the uncertainty interval (values are on the log scale)
fig = m2.plot(forecast2, uncertainty=True)

In [None]:
fig = m2.plot_components(forecast2)
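
Because the model was fit on logged prices, its predictions are on the log scale too. A minimal sketch of converting them back to dollars with `np.exp` follows. The `log_forecast` frame is a made-up stand-in for a few rows of the forecast; only the `yhat`, `yhat_lower` and `yhat_upper` column names match Prophet's output.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for a few forecast rows on the log scale
log_forecast = pd.DataFrame({
    'ds': pd.date_range('2018-01-08', periods=3, freq='D'),
    'yhat': np.log([15000.0, 15500.0, 16000.0]),
    'yhat_lower': np.log([14000.0, 14200.0, 14400.0]),
    'yhat_upper': np.log([16000.0, 16800.0, 17600.0]),
})

# Undo the log transform on the prediction and its uncertainty bounds
cols = ['yhat', 'yhat_lower', 'yhat_upper']
price_forecast = log_forecast.copy()
price_forecast[cols] = np.exp(log_forecast[cols])
print(price_forecast)
```

The same column selection should work on the real forecast frame, assuming it was produced from the logged series.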

## Exercise - Log of Time Series

* Run the snow forecast using the log of the snow depth. Does it track better? (Hint: snow depth can be 0 and the log of 0 is undefined, so you might need to add 1 before taking the log.)
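
The hint can be sketched with NumPy's `log1p` (log of 1 + x) and its inverse `expm1`. The `depth` values below are made up for illustration.

```python
import numpy as np
import pandas as pd

# Made-up snow depths, including a zero that a plain log cannot handle
depth = pd.Series([0.0, 10.0, 250.0])

# log1p computes log(1 + x), so a depth of 0 maps to 0 instead of -inf
logged = np.log1p(depth)

# expm1 computes exp(x) - 1 and undoes the transform
restored = np.expm1(logged)
print(logged)
print(restored)
```

Forecasts made on the `log1p` scale would need `np.expm1` applied to get back to the original depth units.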