# Prerequisite: Set up the S3 bucket

In this notebook, we use the provided e-commerce dataset to demonstrate the functionality of Amazon Lookout for Metrics.

Data set-up workflow:
1. Create bucket
2. Uncompress dataset
3. Save data to bucket

## Import libraries

In [None]:
import os
import shutil
import zipfile
import pathlib

import pandas as pd

import boto3
import utility

### Create bucket

The dataset needs to live in an S3 bucket before it can be used. Run the next cell to create a bucket for you, named after your AWS account ID.

In [None]:
account_id = boto3.client('sts').get_caller_identity().get('Account')
s3_bucket = account_id + "-lookoutmetricscf"

region = "us-west-2"
utility.create_bucket(s3_bucket, region=region)

s3_bucket
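
The `utility.create_bucket` helper ships with this notebook and its implementation is not shown here. As a rough idea of what such a helper might do, here is a minimal sketch built on boto3's `create_bucket` call. The function names `build_create_bucket_kwargs` and `create_bucket_sketch` are hypothetical and are not part of the `utility` module.

```python
def build_create_bucket_kwargs(bucket_name, region):
    """Build the keyword arguments for the S3 CreateBucket call."""
    kwargs = {"Bucket": bucket_name}
    # us-east-1 is the default location and must not be passed as a constraint;
    # every other region needs an explicit LocationConstraint.
    if region != "us-east-1":
        kwargs["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return kwargs


def create_bucket_sketch(bucket_name, region):
    """Hypothetical stand-in for utility.create_bucket (an assumption, not the real helper)."""
    import boto3  # imported lazily so the kwargs helper works without boto3 installed

    s3 = boto3.client("s3", region_name=region)
    s3.create_bucket(**build_create_bucket_kwargs(bucket_name, region))
```

Splitting out the argument builder keeps the region handling easy to check without touching AWS. The sketch does not handle the case where the bucket already exists, which the real helper may well do.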

### Uncompress dataset

Let's uncompress the provided e-commerce data.

In [None]:
data_dirname = "./data"

# Start from an empty directory so files left over from earlier runs are not kept
if os.path.exists(data_dirname):
    shutil.rmtree(data_dirname)
os.makedirs(data_dirname)

zip_filename = "./ecommerce.zip"

# Extract the whole archive into the data directory
with zipfile.ZipFile(zip_filename, "r") as zip_fd:
    zip_fd.extractall(data_dirname)
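
If you would like to see what an archive contains before extracting it, something along these lines should work. This is only a sketch; `list_archive` is a hypothetical helper name and is not part of the notebook.

```python
import zipfile


def list_archive(zip_path):
    """Return the file names stored in a zip archive, skipping directory entries."""
    with zipfile.ZipFile(zip_path, "r") as zip_fd:
        # Directory entries end with "/" and carry no data of their own
        return [name for name in zip_fd.namelist() if not name.endswith("/")]
```

For example, `list_archive("./ecommerce.zip")[:5]` would show the first few entries of the dataset archive.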

Before we proceed to the next step, let's take a quick look at the folder structure. Notice that the `backtest` folder contains a single `input.csv`, whereas the data in the `live` folder is broken down by day (e.g. `20201001` for October 1, 2020) and by hour (e.g. `0200` for 2:00 AM).

Also, notice that our data extends far into the future. This is unrealistic for a real-world scenario, but it works for this demonstration.

In [None]:
paths = utility.DisplayablePath.make_tree(pathlib.Path('data'))
for path in paths:
    print(path.displayable())
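
To make the day and hour folder convention concrete, here is a small sketch that turns a `live` file path into a timestamp. It assumes the layout `live/<YYYYMMDD>/<HHMM>/<file>.csv` and that the second folder is hours followed by minutes; `live_path_to_datetime` is a hypothetical name used only for illustration.

```python
from datetime import datetime
from pathlib import PurePosixPath


def live_path_to_datetime(path):
    """Parse the day and hour folders of a live data path into a datetime."""
    parts = PurePosixPath(path).parts
    # The two folders directly above the file, e.g. "20201001" and "0200"
    day, hour = parts[-3], parts[-2]
    return datetime.strptime(day + hour, "%Y%m%d%H%M")
```

Under those assumptions, `live_path_to_datetime("data/ecommerce/live/20201001/0200/file.csv")` would give 2:00 AM on October 1, 2020.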

If you take a quick look at the data, you will notice that the schema of the `backtest` and `live` data is identical.

In [None]:
backtest_df = pd.read_csv('data/ecommerce/backtest/input.csv')
backtest_df.head()

In [None]:
live_sample_df = pd.read_csv('data/ecommerce/live/20201208/0000/20201208_000000.csv')
live_sample_df.head()
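
Rather than comparing the two previews by eye, the schema match can also be checked programmatically. The sketch below compares only the header rows, using the standard `csv` module so that no file is loaded in full. `read_header` and `same_schema` are hypothetical helper names.

```python
import csv


def read_header(csv_path):
    """Return the column names from the first row of a CSV file."""
    with open(csv_path, newline="") as f:
        return next(csv.reader(f))


def same_schema(path_a, path_b):
    """True when both files have the same column names in the same order."""
    return read_header(path_a) == read_header(path_b)
```

Calling `same_schema` with the `backtest` file and any `live` file should return `True` if the schemas really are identical. Note that this compares column names only, not data types.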

### Save data to bucket

Finally, let's save the data to our S3 bucket.

In [None]:
!aws s3 sync {data_dirname}/ecommerce/ s3://{s3_bucket}/ecommerce/
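
If the AWS CLI is not available, a similar upload could be approximated with boto3. The following is a minimal sketch, not an equivalent of `aws s3 sync`: it uploads every file unconditionally instead of skipping unchanged ones. The names `iter_upload_pairs` and `upload_dir` are hypothetical.

```python
import os


def iter_upload_pairs(local_dir, key_prefix):
    """Yield (local_path, s3_key) for every file under local_dir."""
    for root, _dirs, files in os.walk(local_dir):
        for name in sorted(files):
            local_path = os.path.join(root, name)
            rel = os.path.relpath(local_path, local_dir)
            # S3 keys always use forward slashes, whatever the local OS uses
            yield local_path, key_prefix + rel.replace(os.sep, "/")


def upload_dir(local_dir, bucket, key_prefix):
    """Upload every file under local_dir to s3://bucket/key_prefix..."""
    import boto3  # imported lazily so the path mapping works without boto3

    s3 = boto3.client("s3")
    for local_path, key in iter_upload_pairs(local_dir, key_prefix):
        s3.upload_file(local_path, bucket, key)
```

With the variables from this notebook, the call would look like `upload_dir(data_dirname + "/ecommerce", s3_bucket, "ecommerce/")`.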

To make things easier later, we use IPython's `%store` magic to save the bucket name. Another notebook can then retrieve it with `%store -r s3_bucket`.

In [None]:
%store s3_bucket