Using saas.csv or log data from API usage:

- Split data (train/test) and resample by any period, except daily, and aggregate using the sum.
- Forecast, plot, and evaluate using each of the four parametric methods we discussed:
  - Simple Average
  - Moving Average
  - Holt's Linear Trend Model
  - Based on previous year/month/etc., this is up to you.

In [182]:
import numpy as np
import pandas as pd

from statsmodels.tsa.api import Holt

In [155]:
df = pd.read_csv("/Users/cris/codeup-data-science/ds-methodologies-exercises/time_series/saas.csv")

In [156]:
df["Month_Invoiced"] = pd.to_datetime(df["Month_Invoiced"])
# set the invoice month as the index (passing the column name also drops it from the columns)
df = df.set_index("Month_Invoiced")

In [157]:
df = df.resample("MS").sum()

In [158]:
# number of rows for a 70/30 chronological train/test split
train_ratio = int(df.shape[0] * 0.7)
test_ratio = int(df.shape[0] * 0.3)

In [159]:
train = df[:train_ratio+1]
test = df[train_ratio+1:]

In [160]:
test.head()

Month_Invoiced,Customer_Id,Invoice_Id,Subscription_Type,Amount
2016-11-01,15269114685,84209779594,10675.0,53375.0
2016-12-01,15158997779,83812629131,10664.0,53320.0
2017-01-01,15061820113,83483733340,10679.0,53395.0
2017-02-01,14963781540,83144826839,10688.0,53440.0
2017-03-01,14888950475,82931777530,10696.0,53480.0


### Simple Average

In [167]:
forecast = pd.DataFrame({"actual": test["Amount"]})
# Caution: this is the mean of the test values themselves, so it leaks the answer;
# a true simple-average forecast would use train["Amount"].mean() instead.
forecast["simple_average"] = forecast.actual.mean()
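
A simple-average forecast normally uses the mean of the training data as a constant prediction for every test period. The following is a minimal sketch of that idea on hypothetical data; the series values and the names `amount` and `simple_avg` are illustrative assumptions, not taken from saas.csv.

```python
import pandas as pd

# Hypothetical monthly totals (not the saas.csv values).
idx = pd.date_range("2014-01-01", periods=10, freq="MS")
amount = pd.Series([100.0, 110, 120, 130, 140, 150, 160, 170, 180, 190], index=idx)

# Chronological split: first 7 months for training, last 3 for testing.
train, test = amount[:7], amount[7:]

# The forecast is the training mean, repeated for every test period.
simple_avg = pd.Series(train.mean(), index=test.index)

forecast = pd.DataFrame({"actual": test, "simple_average": simple_avg})
```

Because the mean comes only from `train`, the test values play no part in the prediction.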

### Moving Average

In [179]:
# mean of the last 7 training months, used as a flat forecast for every test month
moving_ave = train["Amount"].rolling(7).mean().iloc[-1]
forecast["moving_average"] = moving_ave
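
To see what the last rolling value represents, here is a small sketch on a hypothetical series. It suggests that `rolling(window).mean().iloc[-1]` matches the plain mean of the final `window` observations; the data and variable names are assumptions for illustration.

```python
import pandas as pd

# Hypothetical training series.
train = pd.Series([1.0, 2, 3, 4, 5, 6, 7, 8, 9, 10])
window = 7

# Last value of the rolling mean ...
rolling_last = train.rolling(window).mean().iloc[-1]

# ... compared with the mean of the last `window` observations.
tail_mean = train.tail(window).mean()
```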

In [180]:
forecast.head()

Month_Invoiced,actual,simple_average,moving_average
2016-11-01,53375.0,53574.642857,52678.571429
2016-12-01,53320.0,53574.642857,52678.571429
2017-01-01,53395.0,53574.642857,52678.571429
2017-02-01,53440.0,53574.642857,52678.571429
2017-03-01,53480.0,53574.642857,52678.571429


### Holt's Linear Trend Model

In [184]:
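# smoothing_level is the weight on the level (alpha); smoothing_slope is the weight on the trend (beta).
# statsmodels 0.12 and later name the second argument smoothing_trend.
model = Holt(train["Amount"]).fit(smoothing_level=.3, smoothing_slope=.1)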

In [186]:
forecast["holt_linear"] = model.forecast(test.shape[0])
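
To make the roles of the two smoothing parameters concrete, the sketch below writes out the additive Holt recursions in plain Python. It is an illustration only, not the statsmodels implementation: the function name `holt_linear_forecast` is hypothetical, and the initialisation (first value as the level, first difference as the trend) is an assumption that may differ from what the library does.

```python
def holt_linear_forecast(y, alpha, beta, horizon):
    """Forecast `horizon` steps ahead with Holt's additive linear trend method."""
    if len(y) < 2:
        raise ValueError("need at least two observations")
    # Assumed initialisation: first value as the level, first difference as the trend.
    level, trend = y[0], y[1] - y[0]
    for value in y[1:]:
        prev_level = level
        # Blend the new observation with the previous one-step-ahead estimate.
        level = alpha * value + (1 - alpha) * (prev_level + trend)
        # Blend the latest change in level with the previous trend.
        trend = beta * (level - prev_level) + (1 - beta) * trend
    # Extend the final level along the final trend.
    return [level + h * trend for h in range(1, horizon + 1)]


# Hypothetical, perfectly linear history: the forecast should continue the line.
history = [1.0, 3.0, 5.0, 7.0, 9.0]
predictions = holt_linear_forecast(history, alpha=0.3, beta=0.1, horizon=3)
```

Every forecast step adds the same final trend, so the result is a straight line. That matches the constant step between consecutive `holt_linear` values in the forecast table.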

### Previous Cycle (year over year, month by month)

In [214]:
# .iloc is positional and raises a TypeError on date strings; label-based slicing needs .loc
df.loc['2016-11':'2017-12']

In [208]:
predicted

Int64Index([11, 12, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12], dtype='int64', name='Month_Invoiced')

In [201]:
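# the data is monthly, so one year is 12 rows: diff(12) gives the year-over-year change
# (diff(365) would be all NaN on a series this short)
pred = train.loc["2014"] + train.diff(12).mean()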
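
One way the previous-cycle forecast could be put together is sketched below on hypothetical monthly data. It takes the last 12 training months, adds the average year-over-year change, and relabels the result with the test dates so the two indexes align. The series and the names `yoy_change` and `prev_year` are assumptions for illustration.

```python
import pandas as pd

# Hypothetical monthly series: 36 months rising by 10 per month.
idx = pd.date_range("2014-01-01", periods=36, freq="MS")
amount = pd.Series([1000.0 + 10 * i for i in range(36)], index=idx)

# Two years for training, one year for testing.
train, test = amount[:24], amount[24:]

# Average year-over-year change in the training data (12 rows = 12 months).
yoy_change = train.diff(12).mean()

# Last 12 training months plus the average change, relabelled with the
# test dates so the forecast lines up with the actual values.
prev_year = train[-12:] + yoy_change
prev_year.index = test.index

forecast = pd.DataFrame({"actual": test, "prev_year": prev_year})
```

Without the relabelling step the forecast keeps last year's dates, so assigning it into a frame indexed by the test dates yields only NaN. This sketch covers a 12-month test window; a longer horizon would need the cycle repeated or extended.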

In [213]:
forecast.tail()

Month_Invoiced,actual,simple_average,moving_average,holt_linear,prev_year
2017-08-01,53820.0,53574.642857,52678.571429,56288.876775,
2017-09-01,53925.0,53574.642857,52678.571429,56557.493618,
2017-10-01,53850.0,53574.642857,52678.571429,56826.110462,
2017-11-01,53860.0,53574.642857,52678.571429,57094.727305,
2017-12-01,53805.0,53574.642857,52678.571429,57363.344149,
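
The exercise also asks for an evaluation of each method. A minimal sketch of one common choice, root mean squared error (RMSE) per forecast column, is shown here on a small hypothetical table; the helper name `rmse` and the numbers are assumptions, not results from saas.csv.

```python
import numpy as np
import pandas as pd


def rmse(actual, predicted):
    """Root mean squared error between two aligned series."""
    return float(np.sqrt(((actual - predicted) ** 2).mean()))


# Hypothetical stand-in for the forecast table built above.
forecast = pd.DataFrame({
    "actual": [10.0, 12.0, 14.0],
    "simple_average": [12.0, 12.0, 12.0],
    "holt_linear": [10.0, 12.0, 14.0],
})

# One score per forecast column; lower is better.
scores = {
    col: rmse(forecast["actual"], forecast[col])
    for col in forecast.columns
    if col != "actual"
}
```

Applied to the real `forecast` frame, the same dictionary comprehension would give one comparable number per method. A column that is entirely NaN would score NaN.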
