# Exploratory Data Analysis

Before we start training models and running inference, let's take a brief look at the contents of the dataset, which contains telemetry from moontracers deployed all around the world.

Data files have been aggregated and anonymized for your security and convenience.

## Loading and exploring data with Pandas

The pandas library offers tools to efficiently load, manipulate, and analyse data.

In [None]:
!pip install --upgrade pip

In [None]:
!pip install --upgrade pandas

In [None]:
import pandas as pd

A pandas DataFrame is a table-like data structure used to store and manipulate data in memory.
Let's create one from a CSV file on disk.

In [None]:
filename = "moontracer-dataset.csv.gz"
data = pd.read_csv(filename,
                   parse_dates=["timestamp"],
                   dtype={"device_id": "string"},
                   index_col="timestamp")
data.info()
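
To see what the `read_csv` arguments above do, here is a minimal sketch on a tiny gzip-compressed CSV built in memory. The `battery` column and the sample values are assumptions made for illustration, not taken from the real dataset. Because the sketch reads from a buffer rather than a file name, the compression has to be stated explicitly; with a path ending in `.gz`, pandas infers it.

```python
import gzip
import io

import pandas as pd

# Hypothetical sample rows; "battery" and the values are assumptions.
raw = (
    "timestamp,device_id,battery\n"
    "2020-01-01 00:00:00,007,0.9\n"
    "2020-01-01 01:00:00,007,0.8\n"
)
buf = io.BytesIO(gzip.compress(raw.encode("utf-8")))

toy = pd.read_csv(buf,
                  compression="gzip",             # a buffer has no file extension to infer from
                  parse_dates=["timestamp"],      # parse the column as datetimes
                  dtype={"device_id": "string"},  # keep IDs as text, preserving leading zeros
                  index_col="timestamp")          # use the timestamps as the row index
toy.info()
```

Reading `device_id` as a string keeps an ID such as `007` intact, where numeric inference would turn it into `7`, and the parsed timestamp index is what makes time-based slicing and resampling possible later.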

Let's look at the first records to see if the data was loaded correctly:

In [None]:
data.head()

And basic statistics for each column:

In [None]:
data.describe()

Let's take a look at the record count per device:

In [None]:
data.groupby("device_id").count()
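
Note that `count()` reports the number of non-null values in each column, which can differ from the number of rows when values are missing. The sketch below illustrates the difference on a toy frame; the `battery` column and its values are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical toy frame standing in for the real dataset.
toy = pd.DataFrame(
    {
        "device_id": ["a", "a", "b", "b", "b"],
        "battery": [0.9, None, 0.8, 0.7, 0.6],  # one missing reading for device "a"
    },
    index=pd.to_datetime(
        ["2020-01-01", "2020-01-02", "2020-01-01", "2020-01-02", "2020-01-03"]
    ),
)
toy.index.name = "timestamp"

# count(): non-null values per column, per device.
non_null_per_device = toy.groupby("device_id").count()

# size(): total rows per device, regardless of missing values.
rows_per_device = toy.groupby("device_id").size()

print(non_null_per_device)
print(rows_per_device)
```

If the two results differ for a device, that device has missing values in at least one column.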

In [None]:
%store data

# Amazon Web Services

In this workshop we'll use AWS services to store, process and visualize data. Let's ensure we can use the AWS command line interface by creating an S3 bucket and uploading our dataset to it.

In [None]:
import random
import string 

rand_id = ''.join(random.choices(string.ascii_lowercase + string.digits, k=8))
%store rand_id
bucket = "mt-ml-workshop-{}".format(rand_id)
%store bucket
bucket
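
S3 bucket names must be globally unique, which is why a random suffix is appended. They must also be 3 to 63 characters long, use only lowercase letters, digits, dots and hyphens, and begin and end with a letter or digit. The following sketch checks a candidate name against those basic rules before any bucket is created. The helper names `make_bucket_name` and `is_valid_bucket_name` are hypothetical, and the check is deliberately simplified: it does not cover every S3 rule, such as the ban on IP-address-like names.

```python
import random
import re
import string

# Simplified pattern: 3-63 chars, lowercase letters, digits, dots and hyphens,
# starting and ending with a letter or digit. Not the full S3 rule set.
BUCKET_NAME_RE = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")


def make_bucket_name(prefix="mt-ml-workshop", k=8):
    """Build a bucket name from a prefix and a random lowercase/digit suffix."""
    suffix = "".join(random.choices(string.ascii_lowercase + string.digits, k=k))
    return "{}-{}".format(prefix, suffix)


def is_valid_bucket_name(name):
    """Return True if the name passes the simplified naming check."""
    return BUCKET_NAME_RE.fullmatch(name) is not None


candidate = make_bucket_name()
print(candidate, is_valid_bucket_name(candidate))
```

Names produced this way always pass the check, since the prefix and the suffix alphabet contain only permitted characters.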

In [None]:
!aws s3 mb "s3://{bucket}"

In [None]:
!aws s3 cp "moontracer-dataset.csv.gz" "s3://{bucket}/"

In [None]:
!aws s3 ls "s3://{bucket}"

# Battery Forecasting

Now that we understand the dataset, let's use Amazon SageMaker and the DeepAR algorithm to [prevent battery outages](mt-battery-deepar.ipynb).