# Exploring Time Series

## Time Series Manipulation using Pandas

In [2]:
# Creating a date range with hourly frequency

import pandas as pd
from datetime import datetime
import numpy as np

date_rng = pd.date_range(start='1/1/2018', end='1/08/2018', freq='H')

In [3]:
date_rng

DatetimeIndex(['2018-01-01 00:00:00', '2018-01-01 01:00:00',
               '2018-01-01 02:00:00', '2018-01-01 03:00:00',
               '2018-01-01 04:00:00', '2018-01-01 05:00:00',
               '2018-01-01 06:00:00', '2018-01-01 07:00:00',
               '2018-01-01 08:00:00', '2018-01-01 09:00:00',
               ...
               '2018-01-07 15:00:00', '2018-01-07 16:00:00',
               '2018-01-07 17:00:00', '2018-01-07 18:00:00',
               '2018-01-07 19:00:00', '2018-01-07 20:00:00',
               '2018-01-07 21:00:00', '2018-01-07 22:00:00',
               '2018-01-07 23:00:00', '2018-01-08 00:00:00'],
              dtype='datetime64[ns]', length=169, freq='H')
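The length of 169 comes from both endpoints being included: 7 days × 24 hours + 1. The sketch below builds the range in two ways to confirm this. It assumes the same start and end dates, and it uses `pd.offsets.Hour()` in place of the `'H'` alias, which newer pandas versions spell `'h'`.

```python
import pandas as pd

hourly = pd.offsets.Hour()  # explicit offset object, valid across pandas versions

# Range defined by start and end (both endpoints included)
rng_by_end = pd.date_range(start="2018-01-01", end="2018-01-08", freq=hourly)

# Same range defined by start and number of periods: 7 days * 24 hours + 1
rng_by_periods = pd.date_range(start="2018-01-01", periods=7 * 24 + 1, freq=hourly)

print(len(rng_by_end))                    # 169
print(rng_by_end.equals(rng_by_periods))  # True
print(rng_by_periods[-1])                 # 2018-01-08 00:00:00
```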

In [4]:
type(date_rng[0])

pandas._libs.tslibs.timestamps.Timestamp
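Each element is a pandas `Timestamp`, so you can read its components and do date arithmetic on it directly. A short sketch using the same hourly range (the index `5` and the 20-hour offset are arbitrary):

```python
import pandas as pd

rng = pd.date_range(start="2018-01-01", end="2018-01-08", freq=pd.offsets.Hour())
ts = rng[5]  # a single pandas Timestamp

print(ts)                           # 2018-01-01 05:00:00
print(ts.hour, ts.day_name())       # 5 Monday
print(ts + pd.Timedelta(hours=20))  # 2018-01-02 01:00:00
```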

Now let's create an example DataFrame with the timestamp data we just created.

In [5]:
df = pd.DataFrame(date_rng, columns=['date'])
df['data'] = np.random.randint(0,100,size=(len(date_rng)))

df.head()

|   | date | data |
|---|------|------|
| 0 | 2018-01-01 00:00:00 | 23 |
| 1 | 2018-01-01 01:00:00 | 73 |
| 2 | 2018-01-01 02:00:00 | 3 |
| 3 | 2018-01-01 03:00:00 | 10 |
| 4 | 2018-01-01 04:00:00 | 44 |
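The `data` column comes from `np.random.randint`, so your values will differ from the ones shown. If you want repeatable numbers, one option is to seed NumPy's `Generator`. The sketch below does this; the seed `42` and the helper `make_frame` are arbitrary, hypothetical choices.

```python
import numpy as np
import pandas as pd

date_rng = pd.date_range(start="2018-01-01", end="2018-01-08", freq=pd.offsets.Hour())

def make_frame(seed: int) -> pd.DataFrame:
    # A seeded Generator makes the "random" column repeatable between runs
    rng = np.random.default_rng(seed)
    return pd.DataFrame({"date": date_rng,
                         "data": rng.integers(0, 100, size=len(date_rng))})

a = make_frame(42)
b = make_frame(42)
print(a.equals(b))                     # True: same seed, same values
print(a["data"].between(0, 99).all())  # True: integers() excludes the upper bound
```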


To do time series manipulation, we need a DatetimeIndex so that the DataFrame is indexed on the timestamp.

In [6]:
# Make the timestamps the DataFrame's index (a DatetimeIndex) and drop the old column

df['datetime'] = pd.to_datetime(df['date'])
df = df.set_index('datetime')
df.drop(['date'], axis=1, inplace=True)
df.head()


| datetime | data |
|----------|------|
| 2018-01-01 00:00:00 | 23 |
| 2018-01-01 01:00:00 | 73 |
| 2018-01-01 02:00:00 | 3 |
| 2018-01-01 03:00:00 | 10 |
| 2018-01-01 04:00:00 | 44 |
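`date_range` already produces `datetime64` values, so the `pd.to_datetime` call above changes nothing here. It only matters when dates arrive as strings. Here is a shorter sketch of the same step, with a placeholder `data` column in place of the random numbers:

```python
import numpy as np
import pandas as pd

date_rng = pd.date_range(start="2018-01-01", end="2018-01-08", freq=pd.offsets.Hour())
df = pd.DataFrame({"date": date_rng, "data": np.arange(len(date_rng))})

# 'date' is already datetime64, so it can become the index directly;
# rename_axis reproduces the index name used above
indexed = df.set_index("date").rename_axis("datetime")

print(type(indexed.index).__name__)  # DatetimeIndex
print(indexed.index.name)            # datetime
print(list(indexed.columns))         # ['data']
```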


In [7]:
# Keep only the rows whose day of the month is 2 (here, 2 January)

df[df.index.day == 2]

| datetime | data |
|----------|------|
| 2018-01-02 00:00:00 | 27 |
| 2018-01-02 01:00:00 | 80 |
| 2018-01-02 02:00:00 | 9 |
| 2018-01-02 03:00:00 | 67 |
| 2018-01-02 04:00:00 | 26 |
| 2018-01-02 05:00:00 | 13 |
| 2018-01-02 06:00:00 | 22 |
| 2018-01-02 07:00:00 | 68 |
| 2018-01-02 08:00:00 | 48 |
| 2018-01-02 09:00:00 | 75 |
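The same pattern works with any other component of a DatetimeIndex. The sketch below shows a few variants on the same hourly range. A deterministic `data` column stands in for the random values so that the row counts are easy to check.

```python
import numpy as np
import pandas as pd

idx = pd.date_range(start="2018-01-01", end="2018-01-08", freq=pd.offsets.Hour())
df = pd.DataFrame({"data": np.arange(len(idx))}, index=idx)

day_2 = df[df.index.day == 2]               # every hour of 2 January
night = df[df.index.hour < 6]               # 00:00-05:00 on each day
office = df.between_time("09:00", "17:00")  # both bounds included by default
weekend = df[df.index.dayofweek >= 5]       # Saturday=5, Sunday=6

print(len(day_2), len(night), len(office), len(weekend))  # 24 43 63 48
```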


In [8]:
# Filtering data between two dates (both end dates are included)

df.loc['2018-01-04':'2018-01-06']

| datetime | data |
|----------|------|
| 2018-01-04 00:00:00 | 61 |
| 2018-01-04 01:00:00 | 62 |
| 2018-01-04 02:00:00 | 24 |
| 2018-01-04 03:00:00 | 76 |
| 2018-01-04 04:00:00 | 41 |
| ... | ... |
| 2018-01-06 19:00:00 | 44 |
| 2018-01-06 20:00:00 | 14 |
| 2018-01-06 21:00:00 | 45 |
| 2018-01-06 22:00:00 | 45 |
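With a DatetimeIndex, a date-only string is treated as the whole day ("partial string indexing"). That is why the slice runs through the last hour of 6 January. A small check on a deterministic copy of the data:

```python
import numpy as np
import pandas as pd

idx = pd.date_range(start="2018-01-01", end="2018-01-08", freq=pd.offsets.Hour())
df = pd.DataFrame({"data": np.arange(len(idx))}, index=idx)

# A date-only string expands to the whole day, so the end date is fully included
window = df.loc["2018-01-04":"2018-01-06"]
print(len(window))                        # 72 = 3 days * 24 hours
print(window.index[0], window.index[-1])  # 2018-01-04 00:00:00 2018-01-06 23:00:00

# A single date string selects one day
print(len(df.loc["2018-01-02"]))          # 24
```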


Resampling lets us aggregate the data at a daily frequency instead of an hourly one, taking the min, max, mean, sum, etc. of each day. The example below computes the daily average:

In [9]:
df.resample('D').mean()

| datetime | data |
|----------|------|
| 2018-01-01 | 45.083333 |
| 2018-01-02 | 46.833333 |
| 2018-01-03 | 48.333333 |
| 2018-01-04 | 52.625 |
| 2018-01-05 | 47.791667 |
| 2018-01-06 | 37.333333 |
| 2018-01-07 | 45.666667 |
| 2018-01-08 | 68.0 |
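You can compute several aggregations in one pass. Adding a `count` column also shows that 2018-01-08 contains a single observation (midnight), so its "daily mean" is just one value. A sketch on deterministic data:

```python
import numpy as np
import pandas as pd

idx = pd.date_range(start="2018-01-01", end="2018-01-08", freq=pd.offsets.Hour())
df = pd.DataFrame({"data": np.arange(len(idx))}, index=idx)

# One row per calendar day, with several summary statistics
daily = df["data"].resample("D").agg(["min", "max", "mean", "count"])
print(daily)
```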


In [10]:
# Sum of the current value and the two previous ones (a 3-period window)
df['rolling_sum'] = df['data'].rolling(3).sum()
df.head(10)

| datetime | data | rolling_sum |
|----------|------|-------------|
| 2018-01-01 00:00:00 | 23 | NaN |
| 2018-01-01 01:00:00 | 73 | NaN |
| 2018-01-01 02:00:00 | 3 | 99.0 |
| 2018-01-01 03:00:00 | 10 | 86.0 |
| 2018-01-01 04:00:00 | 44 | 57.0 |
| 2018-01-01 05:00:00 | 9 | 63.0 |
| 2018-01-01 06:00:00 | 73 | 126.0 |
| 2018-01-01 07:00:00 | 83 | 165.0 |
| 2018-01-01 08:00:00 | 4 | 160.0 |
| 2018-01-01 09:00:00 | 95 | 182.0 |


The rolling sum is NaN for the first two rows. It only produces a value once there are three periods to look back over.

This is a good opportunity to show how to forward-fill or backfill missing values.
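The `min_periods` argument of `rolling` controls this behaviour. With `min_periods=1`, the first rows are summed over however many values exist so far, instead of being left as NaN. A toy series makes the difference easy to see:

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0])

full_window = s.rolling(3).sum()             # NaN until 3 values are available
partial = s.rolling(3, min_periods=1).sum()  # sums whatever is available so far

print(full_window.tolist())  # [nan, nan, 6.0, 9.0]
print(partial.tolist())      # [1.0, 3.0, 6.0, 9.0]
```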

In [11]:
# Fill each NaN with the next valid value (fillna(method=...) is deprecated in recent pandas)
df['rolling_sum_backfilled'] = df['rolling_sum'].bfill()
df.head()

| datetime | data | rolling_sum | rolling_sum_backfilled |
|----------|------|-------------|------------------------|
| 2018-01-01 00:00:00 | 23 | NaN | 99.0 |
| 2018-01-01 01:00:00 | 73 | NaN | 99.0 |
| 2018-01-01 02:00:00 | 3 | 99.0 | 99.0 |
| 2018-01-01 03:00:00 | 10 | 86.0 | 86.0 |
| 2018-01-01 04:00:00 | 44 | 57.0 | 57.0 |
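The common fill strategies differ in which neighbour they borrow from. Here they are side by side on a toy series with a two-value gap:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, np.nan, 4.0])

print(s.ffill().tolist())        # [1.0, 1.0, 1.0, 4.0]  carry the last value forward
print(s.bfill().tolist())        # [1.0, 4.0, 4.0, 4.0]  pull the next value backward
print(s.interpolate().tolist())  # [1.0, 2.0, 3.0, 4.0]  linear between neighbours
```

Only the forward fill relies exclusively on earlier values. Linear interpolation, like backfilling, also uses the next known value.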


It's often useful to fill missing data with realistic values, such as the average over a time period. However, if you are working on a time series problem and want your data to stay realistic, you should not backfill it. Backfilling copies later values into earlier timestamps, which uses information that would not have been available at that point in time.
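One way to fill with "the average of a time period" without looking ahead is an expanding mean, which averages only the values observed up to each point. This is a minimal sketch of that idea on a toy series, not the only leak-free option:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0, np.nan])

# expanding().mean() at position t averages only values observed up to t
# (NaNs are skipped), so no future information leaks into the fill
past_mean = s.expanding().mean()
filled = s.fillna(past_mean)

print(filled.tolist())  # [1.0, 1.0, 3.0, 2.0]
```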

When working with time series data, you may come across time values in Unix time. Unix time, also called epoch time, is the number of seconds that have elapsed since 00:00:00 Coordinated Universal Time (UTC), Thursday, 1 January 1970, not counting leap seconds.

**How do we convert epoch time to a human-readable timestamp?**


In [12]:
epoch_t = 1529272655
real_t = pd.to_datetime(epoch_t, unit='s')
real_t

Timestamp('2018-06-17 21:57:35')
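The conversion also works in reverse, and the `unit` argument must match how the epoch values were stored. Many systems record milliseconds rather than seconds. Here is a sketch that reuses the epoch value above:

```python
import pandas as pd

real_t = pd.to_datetime(1529272655, unit="s")

# Back to epoch seconds: subtract the epoch and count whole seconds
epoch_back = (real_t - pd.Timestamp("1970-01-01")) // pd.Timedelta(seconds=1)
print(epoch_back)  # 1529272655

# The same instant stored in milliseconds needs unit="ms"
ms = pd.to_datetime(1529272655000, unit="ms")
print(ms == real_t)  # True

# Whole columns convert in one vectorised call
s = pd.to_datetime(pd.Series([0, 86400]), unit="s")
print(s.tolist())  # [Timestamp('1970-01-01 00:00:00'), Timestamp('1970-01-02 00:00:00')]
```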

In [13]:
# The epoch timestamp is in UTC: localize it as UTC, then convert it to Pacific time

real_t.tz_localize('UTC').tz_convert('US/Pacific')

Timestamp('2018-06-17 14:57:35-0700', tz='US/Pacific')
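The UTC offset depends on the date because of daylight saving time: -07:00 in June, but -08:00 in January. The sketch below converts a whole column at once. It uses `America/Los_Angeles`, the canonical IANA name for the zone that `US/Pacific` aliases, and the January epoch value is an illustrative choice.

```python
import pandas as pd

# 2018-01-01 12:00:00 UTC and the June timestamp used above
utc_times = pd.to_datetime(pd.Series([1514808000, 1529272655]), unit="s")

# Epoch values are UTC, so localize to UTC first, then convert
pacific = utc_times.dt.tz_localize("UTC").dt.tz_convert("America/Los_Angeles")

print(pacific.dt.strftime("%Y-%m-%d %H:%M:%S %z").tolist())
# ['2018-01-01 04:00:00 -0800', '2018-06-17 14:57:35 -0700']
```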

## Use case:

In the following example we will only use a univariate time series. That means we are only considering the relationship between the y-axis values and the x-axis time points. We are not considering outside factors that may be affecting the time series.

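A common beginner mistake is to go straight to ARIMA forecasting models for data that is driven by many outside factors. ARIMA without exogenous regressors only uses the series' own past values, so it cannot capture the effect of those factors.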

In [16]:
import pandas as pd
data = pd.read_csv('../assets/electric_production.csv', index_col=0)
data.head()

| DATE | IPG2211A2N |
|------|------------|
| 1939-01-01 | 3.3335 |
| 1939-02-01 | 3.359 |
| 1939-03-01 | 3.4353 |
| 1939-04-01 | 3.4607 |
| 1939-05-01 | 3.4607 |


In [21]:
data.index = pd.to_datetime(data.index)

In [22]:
data.columns = ['Energy Production']
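The two steps above can also be folded into `read_csv`. Declaring the monthly frequency explicitly helps downstream tools that infer the seasonal period. The sketch below inlines only the five rows shown above so that it runs standalone. `parse_dates=True` and `asfreq("MS")` (month start) are choices made here, not taken from the original notebook.

```python
import io

import pandas as pd

# The first five rows of the file, inlined so the sketch runs standalone
csv_text = """DATE,IPG2211A2N
1939-01-01,3.3335
1939-02-01,3.359
1939-03-01,3.4353
1939-04-01,3.4607
1939-05-01,3.4607
"""
data = pd.read_csv(io.StringIO(csv_text), index_col=0, parse_dates=True)
data.columns = ["Energy Production"]

# Declare the monthly (month-start) frequency explicitly
data = data.asfreq("MS")
print(data.index.freqstr)  # MS
```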

In [31]:
import cufflinks as cf

# Offline mode renders interactive plots in the notebook without a Chart Studio account
cf.go_offline()
data.iplot(title="Monthly Energy Production")


In [None]:
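import matplotlib.pyplot as plt
from statsmodels.tsa.seasonal import seasonal_decompose

# Multiplicative model: observed = trend * seasonal * residual, with a 12-month season
result = seasonal_decompose(data['Energy Production'], model='multiplicative', period=12)
fig = result.plot()
plt.show()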
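The `multiplicative` model treats each observation as trend × seasonal × residual. To make that concrete, here is a minimal sketch of the classical moving-average approach that `seasonal_decompose` is based on, run on a synthetic series. It assumes an even period such as 12. statsmodels' own implementation differs in details, for example edge handling via `extrapolate_trend`, and the helper name `multiplicative_decompose` is hypothetical.

```python
import numpy as np
import pandas as pd


def multiplicative_decompose(y: pd.Series, period: int = 12) -> pd.DataFrame:
    """Classical multiplicative decomposition: y = trend * seasonal * resid."""
    # Trend: centered 2 x period moving average (weights 1/2, 1, ..., 1, 1/2)
    weights = np.r_[0.5, np.ones(period - 1), 0.5] / period
    trend = y.rolling(period + 1, center=True).apply(
        lambda w: np.dot(w, weights), raw=True
    )
    # Seasonal: average detrended ratio at each position in the cycle,
    # normalised so the seasonal indices average to 1
    position = np.arange(len(y)) % period
    indices = (y / trend).groupby(position).mean()
    indices = indices / indices.mean()
    seasonal = pd.Series(indices.to_numpy()[position], index=y.index)
    # Residual: whatever trend and seasonality do not explain
    resid = y / (trend * seasonal)
    return pd.DataFrame(
        {"observed": y, "trend": trend, "seasonal": seasonal, "resid": resid}
    )


# Synthetic monthly series: linear trend times a 12-month seasonal pattern
t = np.arange(120)
true_seasonal = 1 + 0.1 * np.sin(2 * np.pi * t / 12)
y = pd.Series((100 + t) * true_seasonal,
              index=pd.date_range("2000-01-01", periods=120, freq="MS"))

parts = multiplicative_decompose(y)
print(parts.head(8))
```

The first and last six trend values are NaN because the centered window does not fit there.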

In [None]:
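# pyramid-arima was renamed to pmdarima (pip install pmdarima)
from pmdarima import auto_arima

# Stepwise search over seasonal ARIMA orders with a 12-month season,
# one ordinary difference (d=1) and one seasonal difference (D=1)
stepwise_model = auto_arima(data, start_p=1, start_q=1,
                            max_p=3, max_q=3, m=12,
                            start_P=0, seasonal=True,
                            d=1, D=1, trace=True,
                            error_action='ignore',
                            suppress_warnings=True,
                            stepwise=True)
print(stepwise_model.aic())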
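In a seasonal ARIMA, `d=1` and `D=1` with `m=12` mean that the model is fitted to the series after one ordinary difference and one 12-month seasonal difference. The sketch below applies those two differences by hand to synthetic data. The pattern values are arbitrary, and the example shows what each difference removes: a linear trend and a repeating 12-month pattern.

```python
import numpy as np
import pandas as pd

# Synthetic monthly series: linear trend plus a fixed 12-month pattern
pattern = np.array([3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8], dtype=float)
t = np.arange(60)
y = pd.Series(10 + 0.5 * t + pattern[t % 12],
              index=pd.date_range("2000-01-01", periods=60, freq="MS"))

# d=1: ordinary first difference turns the linear trend into a constant
# D=1, m=12: seasonal difference cancels that constant and the repeating pattern
stationary = y.diff(1).diff(12)

print(stationary.isna().sum())              # 13 values lost to differencing
print(np.allclose(stationary.dropna(), 0))  # True: nothing left to model here
```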

In [None]:
train = data.loc['1985-01-01':'2016-12-01']
test = data.loc['2017-01-01':]
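For time series, the split is done by date rather than at random, so that the model never trains on observations from after the test period begins. In the sketch below, the end date (January 2018) is a hypothetical assumption. Under that assumption the test set holds 13 months, so a forecast meant to cover it needs `n_periods` equal to `len(test)`.

```python
import pandas as pd

# Hypothetical monthly index covering Jan 1985 to Jan 2018
idx = pd.date_range("1985-01-01", "2018-01-01", freq="MS")
data = pd.DataFrame({"Energy Production": range(len(idx))}, index=idx)

train = data.loc["1985-01-01":"2016-12-01"]
test = data.loc["2017-01-01":]

# A time-ordered split: every training date precedes every test date
assert train.index.max() < test.index.min()
print(len(train), len(test))  # 384 13
```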

In [None]:
stepwise_model.fit(train)

In [None]:
# Forecast one value for each month in the test set
future_forecast = stepwise_model.predict(n_periods=len(test))

In [None]:
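import numpy as np
# Put the forecast on the test dates so it lines up with the actual values
future_forecast = pd.DataFrame(np.asarray(future_forecast), index=test.index, columns=['Prediction'])
pd.concat([test, future_forecast], axis=1).iplot()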
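Alongside the plot, a single number makes forecasts easier to compare. The sketch below aligns a forecast with the test dates and computes the mean absolute error (MAE) and root mean squared error (RMSE). The three-month `test` frame and the forecast values are hypothetical stand-ins, not results from the model above.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins: a 3-month test set and a forecast returned as a bare array
test = pd.DataFrame({"Energy Production": [100.0, 110.0, 105.0]},
                    index=pd.date_range("2017-01-01", periods=3, freq="MS"))
future_forecast = np.array([98.0, 112.0, 104.0])

# Attach the test dates so the forecast lines up with the actuals
forecast_df = pd.DataFrame(future_forecast, index=test.index, columns=["Prediction"])
comparison = pd.concat([test, forecast_df], axis=1)

# Numeric companions to the plot
errors = comparison["Energy Production"] - comparison["Prediction"]
mae = errors.abs().mean()
rmse = np.sqrt((errors ** 2).mean())
print(comparison)
print(f"MAE={mae:.3f}  RMSE={rmse:.3f}")  # MAE=1.667  RMSE=1.732
```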

In [None]:
pd.concat([data,future_forecast],axis=1).iplot()

Sources:

- https://towardsdatascience.com/how-to-forecast-time-series-with-multiple-seasonalities-23c77152347e
- https://medium.com/@josemarcialportilla/using-python-and-auto-arima-to-forecast-seasonal-time-series-90877adff03c
- https://towardsdatascience.com/time-series-analysis-in-python-an-introduction-70d5a5b1d52a
- https://github.com/WillKoehrsen/Data-Analysis/tree/master/additive_models
- https://towardsdatascience.com/an-end-to-end-project-on-time-series-analysis-and-forecasting-with-python-4835e6bf050b
- https://towardsdatascience.com/basic-time-series-manipulation-with-pandas-4432afee64ea