# Data in a day


## Import libraries

In [None]:
# requests is for making internet requests (http://docs.python-requests.org/en/master/)
import requests

# numpy is for maths (http://www.numpy.org/)
import numpy as np

# 🐼 is to work with tables of data (http://pandas.pydata.org/)
import pandas as pd

# sklearn is for machine learning (http://scikit-learn.org)
from sklearn import linear_model

# MIGHT REMOVE THIS AS PANDAS PROVIDES A MORE CONVENIENT INTERFACE FOR PLOTTING
# # matplotlib is to make plots
# import matplotlib.pyplot as plt

# matplotlib makes the plots; pandas uses it under the hood
# Display plots in this page rather than opening another page
%matplotlib inline

## Source the data

Use the Firebase API to fetch the ball-drop data, then store the raw response text.

In [None]:
response = requests.get('https://newton-decoded.firebaseio.com/falls.json')
json_data = response.text

## Explore and transform the data

Let's have a look at what comes back from the API - it doesn't look very friendly to work with.

In [None]:
json_data

Luckily for us, pandas knows how to read JSON and turn it into something more table-like. This is called a **dataframe**.

In [None]:
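# https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_json.html
# Wrap the text in StringIO: recent pandas versions deprecate passing a raw JSON string
from io import StringIO
df = pd.read_json(StringIO(json_data), orient='index')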
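
To see what `orient='index'` does, here is a minimal sketch on a made-up payload. The IDs and values are hypothetical. The real response is assumed to have the same shape: a JSON object keyed by drop ID, with an `x` and a `y` field for each drop.

```python
from io import StringIO

import pandas as pd

# Hypothetical sample shaped like the API response (IDs and values are made up)
sample_json = '{"drop-a": {"x": 1.0, "y": 0.2}, "drop-b": {"x": 2.0, "y": 0.4}}'

# orient='index' treats each top-level key as a row label
# and each inner key as a column
sample_df = pd.read_json(StringIO(sample_json), orient='index')
print(sample_df)
```

Each drop ID becomes a row label, and `x` and `y` become the columns.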

Let's have a look at the first few rows of our transformed data

In [None]:
df.head()

This is better, but ideally we'd like to:
- rename the column labels from x and y to something more meaningful, i.e. height and time^2
- replace the ball drop IDs (e.g. -L7ZzkuY0BKkDcQO2BWN) with a simple running index

In [None]:
# https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.rename.html
df = df.rename(columns={'x': 'height', 'y': 'time^2'})
df.head()

In [None]:
# https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.reset_index.html
df = df.reset_index(drop=True)
df.head()

Perfect! Now let's have a look at some summary stats for our data

In [None]:
df.describe()

Now it's time to visualise the data

In [None]:
# https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.plot.html
df.plot.scatter(x='height', y='time^2', xlim=[0, 3], ylim=[0, 1])

## Building a model

In [None]:
# http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html#sklearn.linear_model.LinearRegression
model = linear_model.LinearRegression()
model.fit(df[['height']],df[['time^2']])
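
Why a straight line? For a ball dropped from rest, ignoring air resistance, h = ½·g·t², so t² = (2/g)·h: time squared should be proportional to height, with slope 2/g. The sketch below checks this on synthetic data, not the real drops. It assumes height is in metres and time in seconds, which the data source does not confirm. `g_true`, the noise level and the random seed are illustrative choices.

```python
import numpy as np
from sklearn import linear_model

rng = np.random.default_rng(0)
g_true = 9.81  # assumed gravitational acceleration in m/s^2

# Synthetic drops: heights in metres, time^2 in s^2 with a little measurement noise
heights = rng.uniform(0.5, 3.0, size=200)
time_sq = 2 * heights / g_true + rng.normal(0, 0.01, size=200)

demo_model = linear_model.LinearRegression()
demo_model.fit(heights.reshape(-1, 1), time_sq)

# The fitted slope approximates 2/g, so g is roughly 2/slope
slope = demo_model.coef_[0]
g_estimate = 2 / slope
print(f"slope = {slope:.4f} s^2/m, estimated g = {g_estimate:.2f} m/s^2")
```

If the real data uses the same units, `2 / model.coef_` from the fitted model above would give a comparable estimate of g.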

## Evaluate your model

Now that we've built a model, we can make predictions. We'll add the results to a new column of our dataframe for convenience.

In [None]:
df['prediction'] = model.predict(df[['height']])
df.head()
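
The same call works for heights the model has never seen. A minimal sketch, using made-up training data and hypothetical names (`demo_df`, `demo_model`, `new_drops`) so it runs on its own:

```python
import pandas as pd
from sklearn import linear_model

# Made-up training data for illustration only
demo_df = pd.DataFrame({
    'height': [0.5, 1.0, 1.5, 2.0],
    'time^2': [0.10, 0.20, 0.31, 0.41],
})
demo_model = linear_model.LinearRegression()
demo_model.fit(demo_df[['height']], demo_df['time^2'])

# Predict time^2 for heights that were not in the training data
new_drops = pd.DataFrame({'height': [1.25, 2.5]})
new_drops['prediction'] = demo_model.predict(new_drops[['height']])
print(new_drops)
```

Passing a dataframe with the same column name used for fitting keeps scikit-learn's feature-name check happy.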

Looking at tables of numbers is not great for humans. Eyeballing the model is a great way to get a feel for what we have built.

In [None]:
axis1 = df.plot.scatter(x='height',y='time^2', xlim=[0,3], ylim=[0,1])
df.plot.line(x='height',y='prediction', xlim=[0,3], ylim=[0,1], ax=axis1)

We can also evaluate the model quantitatively by measuring the error between its predictions and the observed data. One common measure is the R-squared value: 1 is a perfect fit, while 0 means the model does no better than always predicting the mean.

In [None]:
# http://scikit-learn.org/stable/modules/model_evaluation.html#r2-score-the-coefficient-of-determination
model.score(df[['height']],df[['time^2']])
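
For reference, R-squared is defined as 1 - SS_res / SS_tot, where SS_res is the sum of squared differences between the observed values and the predictions, and SS_tot is the sum of squared differences between the observed values and their mean. The sketch below computes it by hand on made-up numbers (not the real ball drops) and compares the result with `score`.

```python
import numpy as np
from sklearn import linear_model

# Made-up data for illustration only
X = np.array([[0.5], [1.0], [1.5], [2.0], [2.5]])
y = np.array([0.11, 0.20, 0.32, 0.40, 0.52])

demo_model = linear_model.LinearRegression().fit(X, y)
predicted = demo_model.predict(X)

# Residual sum of squares: how far the predictions are from the observations
ss_res = np.sum((y - predicted) ** 2)
# Total sum of squares: how far the observations are from their mean
ss_tot = np.sum((y - y.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot

# The two values should agree
print(r_squared, demo_model.score(X, y))
```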