# Time-Series: Prepare Exercises
### Kwame V. Taylor
The end result of this exercise should be a file named ```prepare.py```.

Using your store items data:

1. Convert date column to datetime format.
2. Plot the distribution of sale_amount and item_price.
3. Set the index to be the datetime variable.
4. Add a 'month' and 'day of week' column to your dataframe.
5. Add a column to your dataframe, sales_total, which is derived from sale_amount (total items) and item_price.
6. Make sure all the work that you have done above is reproducible. That is, you should put the code above into separate functions and be able to re-run the functions and get the same results.

Using the OPS data acquired in the Acquire exercises ```opsd_germany_daily.csv```, complete the following:

1. Convert date column to datetime format.
2. Plot the distribution of each of your variables.
3. Set the index to be the datetime variable.
4. Add a month and a year column to your dataframe.
5. Fill any missing values.
6. Make sure all the work that you have done above is reproducible. That is, you should put the code above into separate functions and be able to re-run the functions and get the same results.

In [1]:
# imports
import warnings
warnings.filterwarnings("ignore")

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from time import strftime

# default viz settings
plt.rc('figure', figsize=(10, 8))
plt.rc('font', size=14)
plt.rc('lines', linewidth=2, c='m')

from acquire import get_opsd_germany, get_df, merge_items_stores_sales

## Stores items data

In [None]:
# acquire the data
items = get_df('items')
stores = get_df('stores')
sales = get_df('sales')
items.head(2)

In [None]:
df = merge_items_stores_sales(sales, stores, items)
df.head(5)

#### Convert date column to datetime format.

In [None]:
# preview the format string (same one passed to pd.to_datetime below) on the current time
strftime('%a, %d %b %Y %H:%M:%S %Z')

In [None]:
df.sale_date = pd.to_datetime(df.sale_date, format='%a, %d %b %Y %H:%M:%S %Z')
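The format string has to mirror the raw strings exactly, punctuation included. Here is a minimal sketch of that, using a made-up sample value (the string below is my assumption about what a sale_date entry looks like, not a value copied from the data):

```python
import pandas as pd

# Hypothetical raw value, assumed to resemble the sale_date strings
sample = pd.Series(['Tue, 01 Jan 2013 00:00:00 GMT'])

# Same format string as in the cell above
fmt = '%a, %d %b %Y %H:%M:%S %Z'
parsed = pd.to_datetime(sample, format=fmt)

# A format with a stray comma after %d does not match the string
bad_fmt = '%a, %d, %b %Y %H:%M:%S %Z'
try:
    pd.to_datetime(sample, format=bad_fmt)
    bad_format_failed = False
except ValueError:
    bad_format_failed = True
```

Passing an explicit format also spares pandas from having to infer one from the strings.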
df.dtypes

#### Plot the distribution of sale_amount and item_price.

In [None]:
df[['sale_amount']].plot.hist()

In [None]:
df[['item_price']].plot.hist(bins=15)

#### Set the index to be the datetime variable.

In [None]:
df = df.set_index('sale_date').sort_index()
df.head(2)

#### Add a 'month' and 'day of week' column to your dataframe.

In [None]:
df['month'] = df.index.month
df['weekday'] = df.index.day_name()

df.head(3)

#### Add a column to your dataframe, sales_total, which is derived from sale_amount (total items) and item_price.

In [None]:
df['sales_total'] = df.sale_amount * df.item_price
df.head(3)

#### Make sure all the work that you have done above is reproducible. That is, you should put the code above into separate functions and be able to re-run the functions and get the same results.

I'll now put this code into a function in my prepare.py file for time-series exercises.
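A minimal sketch of what that function might look like, assuming the merged dataframe has the sale_date, sale_amount, and item_price columns used above. The name `prep_store_data` and the two demo rows are hypothetical, and the plotting cells are left out so the function only transforms data:

```python
import pandas as pd

def prep_store_data(df):
    """Hypothetical helper that repeats the preparation steps from this notebook."""
    df = df.copy()  # leave the caller's dataframe untouched
    # convert the date strings to datetimes
    df['sale_date'] = pd.to_datetime(df['sale_date'],
                                     format='%a, %d %b %Y %H:%M:%S %Z')
    # use the datetime as a sorted index
    df = df.set_index('sale_date').sort_index()
    # derived columns
    df['month'] = df.index.month
    df['weekday'] = df.index.day_name()
    df['sales_total'] = df['sale_amount'] * df['item_price']
    return df

# Made-up rows for illustration only (not taken from the real data)
demo = pd.DataFrame({
    'sale_date': ['Wed, 02 Jan 2013 00:00:00 GMT',
                  'Tue, 01 Jan 2013 00:00:00 GMT'],
    'sale_amount': [10, 13],
    'item_price': [0.5, 0.84],
})
prepped = prep_store_data(demo)
```

Because the function works on a copy, calling it again on the same raw dataframe gives the same result.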

## OPS Germany data

In [None]:
# acquire the data
df = get_opsd_germany()
df.head(5)

#### Convert date column to datetime format.

In [None]:
strftime('%a, %d, %b %Y %H:%M:%S %Z')

In [None]:
strftime('%Y-%m-%d')

In [None]:
df.Date = pd.to_datetime(df.Date, format='%Y-%m-%d')
df.dtypes

#### Plot the distribution of each of your variables.

In [None]:
# one histogram per numeric variable, each on its own axes
df.select_dtypes('number').hist()

#### Set the index to be the datetime variable.

In [None]:
df = df.set_index('Date').sort_index()
df.head(2)

#### Add a month and a year column to your dataframe.

In [None]:
df['month'] = df.index.month
df['year'] = df.index.year

df.head(3)

#### Fill any missing values.

In [None]:
df.isna().sum()

In [None]:
df = df.fillna(0)  # fillna returns a new dataframe, so assign it back
df.isna().sum()
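One thing to watch for here: `fillna` returns a new dataframe rather than changing the original, so the result has to be kept. A small sketch with a made-up column (the name `value` and its numbers are placeholders, not OPS data):

```python
import numpy as np
import pandas as pd

# Made-up frame with one missing value
demo = pd.DataFrame({'value': [1.0, np.nan, 3.0]})

demo.fillna(0)  # result is discarded, demo itself is unchanged
still_missing = int(demo['value'].isna().sum())

filled = demo.fillna(0)  # keep the returned dataframe
now_missing = int(filled['value'].isna().sum())
```

Filling with 0 is only one option. For a daily series, a forward fill (`df.ffill()`) may be worth considering, depending on what a missing reading means.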

#### Make sure all the work that you have done above is reproducible. That is, you should put the code above into separate functions and be able to re-run the functions and get the same results.

I'll put this code into a function in my prepare.py file as well.
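As a rough sketch, the function could look like the following. The name `prep_opsd_data` is hypothetical, and the demo frame uses made-up numbers with assumed column names (`Consumption`, `Wind`). Only the `Date` column name comes from the cells above:

```python
import numpy as np
import pandas as pd

def prep_opsd_data(df):
    """Hypothetical helper that repeats the OPS preparation steps from this notebook."""
    df = df.copy()  # leave the caller's dataframe untouched
    # convert the date strings to datetimes
    df['Date'] = pd.to_datetime(df['Date'], format='%Y-%m-%d')
    # use the datetime as a sorted index
    df = df.set_index('Date').sort_index()
    # derived columns
    df['month'] = df.index.month
    df['year'] = df.index.year
    # fill missing values with 0, as in the cell above
    return df.fillna(0)

# Made-up rows for illustration only
demo = pd.DataFrame({
    'Date': ['2006-01-02', '2006-01-01'],
    'Consumption': [100.0, 200.0],
    'Wind': [np.nan, np.nan],
})
prepped = prep_opsd_data(demo)
```

Plotting is left out of the function on purpose, so re-running it only depends on the input data.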