# Exercise 1: Data Exploration

Use the code below to explore the raw time series data. Come up with five observations that could inform an algorithm you build on this data.

## Imports

In [None]:
import os

from matplotlib import pyplot as plt
import numpy as np
import pandas as pd

## Load Data

In [None]:
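data_dir = 'data'
# Keep only CSV files so stray entries in the directory are not loaded.
filenames = [os.path.splitext(f)[0] for f in sorted(os.listdir(data_dir))
             if f.endswith('.csv')]
fs = 256  # Sampling rate, used below to build the time axis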

In [None]:
data = []
for f in filenames:
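    # File names are expected to start with <subject>_<activity>.
    subject = f.split('_')[0]
    activity = f.split('_')[1]
    path = os.path.join(data_dir, f + '.csv')
    df = pd.read_csv(path)
    # Trim any trailing rows that contain no valid values.
    df = df.loc[: df.last_valid_index()]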
    data.append((subject, activity, df))
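
Before plotting, it can help to see how much data each recording holds. The sketch below tabulates sample counts and durations. `summarize_recordings` is a hypothetical helper (not part of the exercise), the small `demo` list is synthetic stand-in data shaped like `data`, and `fs` is assumed to be in samples per second.

```python
import numpy as np
import pandas as pd

def summarize_recordings(recordings, fs):
    """Return one row per (subject, activity, df) tuple with its length.

    Hypothetical helper: assumes `fs` is the sampling rate in samples per second.
    """
    rows = [
        {'subject': subject, 'activity': activity,
         'n_samples': len(df), 'duration_s': len(df) / fs}
        for subject, activity, df in recordings
    ]
    return pd.DataFrame(rows, columns=['subject', 'activity', 'n_samples', 'duration_s'])

# Synthetic stand-in for `data`; replace `demo` with `data` in the notebook.
demo = [
    ('s1', 'walk', pd.DataFrame(np.zeros((512, 3)), columns=['accx', 'accy', 'accz'])),
    ('s2', 'run', pd.DataFrame(np.zeros((256, 3)), columns=['accx', 'accy', 'accz'])),
]
summary = summarize_recordings(demo, fs=256)
print(summary)

# Total recorded time per activity class.
per_activity = summary.groupby('activity')['duration_s'].sum()
print(per_activity)
```

Large differences in recording length between activities or subjects are themselves a candidate observation, since they affect how balanced a training set would be.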

## Offline

Working in an offline notebook on your local machine is probably easier and faster for this exercise. See the instructions in the introductory lesson for this course to get started.

Use the plotting code below to visualize the data.

### Pick a backend

You can only pick one matplotlib backend, so delete the two backend lines you don't need from the cell below before running it.

In [None]:
# Use this backend if you are on macOS.
%matplotlib osx
# Use this backend if you are not on macOS.
%matplotlib qt
# A third backend to try if the ones above don't work.
%matplotlib tk

### Sequentially Plot the Data

You can interact with the plots with your mouse. Press any key on the keyboard to go to the next plot.

In [None]:
for subject, activity, df in sorted(data, key=lambda x: x[1]):
    ts = np.arange(len(df)) / fs
    plt.clf()
    plt.plot(ts, df.accx, label='x')
    plt.plot(ts, df.accy, label='y')
    plt.plot(ts, df.accz, label='z')
    plt.title('{}_{}'.format(subject, activity))
    plt.legend()
    plt.ylim((-25, 25))
    plt.draw()
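    # waitforbuttonpress() returns True for a key press and False for a mouse
    # click, so mouse interaction with the plot does not advance the loop.
    while not plt.waitforbuttonpress():
        pass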

## Inside the Udacity workspace

Inside a VM you won't be able to open a new window, so you have to plot the data inline. This is a lot of data to plot interactively, so you may have to be patient. After examining the data from one activity class, clear that cell's output to free up memory in the notebook: click on the cell with the plots, then choose `Cell` > `Current Outputs` > `Clear` from the top menu.
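
One way to lighten the inline plots is to draw fewer points. The sketch below is not part of the original exercise: `thin_for_plot` is a hypothetical helper that keeps every `step`-th sample, shown here on synthetic data. Plain striding can alias fast oscillations, so treat it as a viewing aid only and keep the full-rate data for analysis.

```python
import numpy as np
import pandas as pd

def thin_for_plot(df, fs, step=4):
    """Keep every `step`-th row and return (timestamps, thinned frame).

    Hypothetical helper for lighter plots only; no anti-alias filtering is applied.
    """
    thinned = df.iloc[::step]
    # Timestamps come from the original sample positions, so the time axis is unchanged.
    ts = np.arange(len(df))[::step] / fs
    return ts, thinned

# Synthetic stand-in for one recording (4 seconds at 256 samples per second).
fs = 256
demo_df = pd.DataFrame(np.random.randn(4 * fs, 3), columns=['accx', 'accy', 'accz'])
ts, small = thin_for_plot(demo_df, fs, step=4)
print(len(demo_df), '->', len(small))
```

In the plotting cells, a call such as `plt.plot(ts, small.accx, label='x')` would then stand in for the full-rate one.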

In [None]:
%matplotlib inline

`mpld3` lets you interact with the inline plots, but enabling it with the cell below could crash the workspace while the graphs are generated. It is **highly** recommended that you run the plotting cells **without** `mpld3`. If you are interested in a particular graph, enable `mpld3` at that point and generate that graph individually.

In [None]:
import mpld3
mpld3.enable_notebook()

#### Plot biking data

In [None]:
for subject, activity, df in data:
    if activity != 'bike':
        continue
    ts = np.arange(len(df)) / fs
    plt.figure(figsize=(12, 8))
    plt.plot(ts, df.accx, label='x')
    plt.plot(ts, df.accy, label='y')
    plt.plot(ts, df.accz, label='z')
    plt.title('{}_{}'.format(subject, activity))
    plt.legend()
    plt.ylim((-25, 25))
    plt.draw()

#### Plot running data

In [None]:
for subject, activity, df in sorted(data, key=lambda x: x[1]):
    if activity != 'run':
        continue
    ts = np.arange(len(df)) / fs
    plt.figure(figsize=(12, 8))
    plt.plot(ts, df.accx, label='x')
    plt.plot(ts, df.accy, label='y')
    plt.plot(ts, df.accz, label='z')
    plt.title('{}_{}'.format(subject, activity))
    plt.legend()
    plt.ylim((-25, 25))
    plt.draw()

#### Plot walking data

In [None]:
for subject, activity, df in sorted(data, key=lambda x: x[1]):
    if activity != 'walk':
        continue
    ts = np.arange(len(df)) / fs
    plt.figure(figsize=(12, 8))
    plt.plot(ts, df.accx, label='x')
    plt.plot(ts, df.accy, label='y')
    plt.plot(ts, df.accz, label='z')
    plt.title('{}_{}'.format(subject, activity))
    plt.legend()
    plt.ylim((-25, 25))
    plt.draw()

## Observations

What do you notice about the data that might be helpful when we start building a classifier?
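
To back visual impressions with numbers, per-axis statistics and a dominant frequency can be computed for each recording. This is a minimal sketch, not the course's reference solution: `dominant_frequency` is a hypothetical helper, the signal below is synthetic, and `fs` is assumed to be in samples per second.

```python
import numpy as np
import pandas as pd

def dominant_frequency(signal, fs):
    """Return the frequency (cycles per second) of the largest FFT peak.

    Hypothetical helper: the mean is removed first so a constant offset does not dominate.
    """
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()
    spectrum = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=1 / fs)
    return freqs[np.argmax(spectrum)]

# Synthetic recording: a 2-cycle-per-second oscillation on x, constant offset on z.
fs = 256
t = np.arange(10 * fs) / fs
demo_df = pd.DataFrame({
    'accx': 3 * np.sin(2 * np.pi * 2 * t),
    'accy': np.zeros_like(t),
    'accz': np.full_like(t, 9.8),
})

# Per-axis mean (constant offset) and standard deviation (movement intensity).
stats = demo_df[['accx', 'accy', 'accz']].agg(['mean', 'std'])
print(stats)
print('dominant frequency of accx:', dominant_frequency(demo_df.accx, fs))
```

Running the same calls on each `df` in `data` and comparing the results across `bike`, `run`, and `walk` is one way to check whether a pattern seen in the plots holds across subjects.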