#  Barclays x GA: Python Day 5 - Projects

---

<a id="learning-objectives"></a>
## Learning Objectives
*After completing this notebook, you will be able to:*

- Work with time series
- Use APIs to get data from the web

<a id="time-series"></a>

## <font color='blue'> Time series data

A **time series** is a series of data points that's indexed (or listed, or graphed) in time order. Most commonly, the points are taken at successive, equally spaced moments in time. In pandas, a time series is usually stored as a set of observations whose timestamps form the index.

Time series are commonly found in sales analysis, stock market trends, economic phenomena, and social science problems.

These data sets are often investigated to evaluate the long-term trends, forecast the future, or perform some other form of analysis.
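
To make the definition concrete, here is a minimal sketch of a tiny time series built by hand. The prices are made-up values and the variable names are only for illustration; the point is that every observation is attached to a timestamp, and the timestamps are equally spaced.

```python
import pandas as pd

# Five observations taken at equally spaced one-minute intervals.
# The prices are made-up values for illustration only.
timestamps = pd.date_range(start="2024-01-02 09:30", periods=5, freq="min")
prices = pd.Series([100.0, 100.5, 100.2, 100.8, 101.1], index=timestamps, name="price")

print(prices)

# Because the index is made of timestamps, the spacing between points is explicit.
spacing = prices.index[1] - prices.index[0]
print(spacing)
```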


Let's take a look at some IBM stock data to get a feel for what time series data look like.

In [None]:
import pandas as pd
from datetime import timedelta
import matplotlib.pyplot as plt

ibm = pd.read_csv("data/intraday_1min_IBM.csv")

Take a high-level look at the data. What are we looking at?

In [None]:
ibm.head()

Use `describe` to get summary statistics for each column.

In [None]:
ibm.describe()

Let's inspect the column types in our dataset.

In [None]:
ibm.dtypes

We can see the `timestamp` column is being treated like a string. It would be better if pandas treated that column as a `datetime` column, so we could perform operations like:

* Sorting share prices by date
* Filtering the data by date
* Working out rolling averages over time


We can convert columns to the `datetime` type using the `to_datetime()` function.

In [None]:
ibm['timestamp'] = pd.to_datetime(ibm['timestamp'])

Now when we check our column types, we can see the `timestamp` column has been converted into a `datetime` column.

In [None]:
ibm.dtypes
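
If you'd like to see the same conversion on a dataset small enough to read in full, the sketch below uses three made-up rows (not the IBM file). It checks the column type before and after `pd.to_datetime()`, then sorts chronologically and pulls out the minute of each timestamp, which only works once the column is a real datetime.

```python
import pandas as pd

# Made-up rows standing in for the first few lines of a price file.
df = pd.DataFrame({
    "timestamp": ["2024-01-02 09:32:00", "2024-01-02 09:30:00", "2024-01-02 09:31:00"],
    "open": [100.2, 100.0, 100.5],
})

before = df["timestamp"].dtype                      # a text dtype
df["timestamp"] = pd.to_datetime(df["timestamp"])
after = df["timestamp"].dtype                       # a datetime dtype

# Once converted, sorting is chronological and date parts are available via .dt
df = df.sort_values("timestamp")
print(before, "->", after)
print(df["timestamp"].dt.minute.tolist())
```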

We can set the index of the `DataFrame` to be our `timestamp` column.

In [None]:
ibm.head()

In [None]:
ibm.index = ibm['timestamp']

Now, instead of accessing the rows of our data by the index number, we can access them using the timestamps!

In [None]:
ibm.head()
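
Here is a rough sketch of what a timestamp index makes possible, using made-up prices rather than the IBM data. It looks up one value by its exact timestamp, slices a range of times, and computes a rolling average over three consecutive rows. The window size of 3 is an arbitrary choice for illustration.

```python
import pandas as pd

# Made-up one-minute prices, indexed by timestamp.
index = pd.date_range(start="2024-01-02 09:30", periods=6, freq="min")
prices = pd.DataFrame({"open": [100.0, 100.5, 100.2, 100.8, 101.1, 100.9]}, index=index)

# Look up a single value by its exact timestamp.
value = prices.loc[pd.Timestamp("2024-01-02 09:32"), "open"]

# Slice a range of times. Both ends of the slice are included.
window = prices.loc["2024-01-02 09:31":"2024-01-02 09:33"]

# Rolling average over 3 consecutive rows; the first two rows have no full window yet.
rolling_mean = prices["open"].rolling(window=3).mean()

print(value)
print(window)
print(rolling_mean)
```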

Once our data is converted, we can create line plots.

In [None]:
ibm[['open']].plot(rot=90,figsize=(16,8));

<a id="numbers"></a>

## <font color='blue'> APIs

Let's start by importing the **requests** library, which we'll be using to make API requests.

In [None]:
import requests

Let's make a request to the astronauts API and view the resulting JSON. The first thing we do is make a GET request. This is really simple!

In [None]:
astro_request = requests.get('http://api.open-notify.org/astros.json')

The thing we get back from a GET request is a `Response` object.

In [None]:
type(astro_request)

This is an object that has a few different bits of information bundled up inside it, all of which have been sent back to us by the servers at `open-notify.org`, including...

The status code, which tells us whether the request was successful or not. A status code of `200` means the request succeeded, whereas codes in the `400`s and `500`s mean there was an error (on the client side and the server side, respectively). You might remember seeing `404 Not Found` messages in your browser when you try to load a webpage that doesn't exist. That's also an example of a status code!

We can check the status code like this:

In [None]:
astro_request.status_code
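
Status codes are grouped into families by their first digit. The sketch below uses Python's built-in `http` module to turn a code into a readable description. `describe_status` is a hypothetical helper written for this notebook, not part of `requests`.

```python
from http import HTTPStatus


def describe_status(code):
    """Return a short, human-readable description of an HTTP status code."""
    phrase = HTTPStatus(code).phrase  # standard phrase, e.g. "OK" or "Not Found"
    if 200 <= code < 300:
        category = "success"
    elif 400 <= code < 500:
        category = "client error"
    elif 500 <= code < 600:
        category = "server error"
    else:
        category = "other"
    return f"{code} {phrase} ({category})"


for code in (200, 404, 500):
    print(describe_status(code))
```

In practice you could pass `astro_request.status_code` to this helper. The object returned by `requests.get` also has a `raise_for_status()` method, which raises an exception for `4xx` and `5xx` codes instead of letting the error pass silently.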

We can also access the JSON that's returned by the API; this is also bundled up inside our `Response` object.

In [None]:
astro_request.json()

Let's create a variable that contains the JSON only.

In [None]:
astro_json = astro_request.json()
astro_json

Let's check its type: it's a dictionary!

In [None]:
type(astro_json)

Now we can use our dictionary and list-indexing skills to access information inside the JSON.

In [None]:
astro_json['people']

In [None]:
astro_json['people'][0]

In [None]:
astro_json['people'][0]['name']
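
Indexing one item at a time is fine for exploring, but usually we want every value of a certain kind. The sketch below works on a made-up sample rather than a live request, so it runs without an internet connection. The sample assumes the response has a `people` list of dictionaries with a `name` key, as in the cells above; the other keys and all of the values are placeholders.

```python
import json

# A made-up response in the same general shape as the astronauts JSON.
# The names and craft here are placeholders, not real API output.
sample_text = """
{
  "message": "success",
  "number": 2,
  "people": [
    {"name": "Astronaut A", "craft": "ISS"},
    {"name": "Astronaut B", "craft": "ISS"}
  ]
}
"""

sample_json = json.loads(sample_text)  # JSON text -> Python dictionary

# "people" is a list of dictionaries, so loop over it to collect every name.
names = [person["name"] for person in sample_json["people"]]
print(names)

# .get() returns a default instead of raising KeyError when a key is missing.
print(sample_json.get("number", 0))
print(sample_json.get("not_a_key", "missing"))
```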

---
## <font color='red'> Exercise: Microsoft share prices

Now let's make an API request to AlphaVantage. Fill in the gap in the code below to make a GET request to AlphaVantage.

In [None]:
alphavantage_request = requests.get(#FILL THIS IN)
alphavantage_request


Now, check the status code of the request:

In [None]:
# FILL THIS IN 

Next, create a variable that contains the JSON returned by AlphaVantage.

In [None]:
alphavantage_json = # FILL THIS IN 

We can see that the JSON consists of two top-level keys: `Meta Data` which provides summary information about the data, and `Time Series (5min)` which provides the actual share price information we're after.

Use the correct key to retrieve `Time Series (5min)` from our `alphavantage_json` dictionary.

In [None]:
alphavantage_json = alphavantage_json[#FILL THIS IN]
alphavantage_json


Working with JSON and dictionaries is great because it's very standardised, but it's not a very pretty data format to work with. 

Ideally we want a way of working with data in Python that's as visually clear as a table in Excel, but more flexible and powerful.

This is where the ```pandas``` library comes in. This is the most widely used library for cleaning (sometimes called ***parsing*** or ***wrangling***) data in Python. 

```pandas``` introduces some new types, the most important of which is the ```DataFrame```.

You can think of a ```DataFrame``` as being like a Python version of an Excel table. It's a way of storing data that lets us easily manipulate, clean and perform calculations with our data. 

Let's take a look at how easy it is to convert JSON/dictionaries into a ```DataFrame```.

In [None]:
import pandas as pd

In [None]:
share_df = pd.DataFrame(alphavantage_json)
share_df.head()

We'll be learning a lot more about pandas, but for now it's enough to know what a ```DataFrame``` is. The ```head()``` method shows us a preview of the first five rows of a ```DataFrame```.

One last operation: the ```transpose()``` method flips the rows and columns of the ```DataFrame``` around. It's more intuitive to have time along the vertical axis and features along the horizontal axis.

In [None]:
share_df.transpose()
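
The sketch below walks through the same steps on a small made-up dictionary, so you can see exactly what each step does. The inner key names and the numbers stored as text are assumptions about the shape of the data, chosen for illustration. Note that `transpose()` returns a new `DataFrame` rather than changing the original, so the result needs to be assigned to a variable if we want to keep it.

```python
import pandas as pd

# Made-up values: one inner dictionary per timestamp.
# The inner key names and the text-typed numbers are assumptions for illustration.
sample_series = {
    "2024-01-02 09:35:00": {"1. open": "100.50", "4. close": "100.70"},
    "2024-01-02 09:30:00": {"1. open": "100.00", "4. close": "100.50"},
}

df = pd.DataFrame(sample_series)  # timestamps become columns
df = df.transpose()               # flip: timestamps become rows

df.index = pd.to_datetime(df.index)   # text labels -> timestamps
df = df.astype(float).sort_index()    # text numbers -> floats, oldest row first

print(df)
```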

---

## <font color='red'> Exercise: Google share prices
    
Now you're going to sign up for an API key with AlphaVantage to give you more flexibility to make different API requests.

First, **sign up for an API key** with AlphaVantage. Remember, you should treat an API key like a password, i.e. **never share or give it to anyone**.

Now, read the AlphaVantage documentation for the **Weekly** share price endpoint here: https://www.alphavantage.co/documentation/#weekly. 

Read about the required and optional API parameters, and try out the demo API request in your browser by clicking on this link: https://www.alphavantage.co/query?function=TIME_SERIES_WEEKLY&symbol=IBM&apikey=demo
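
One way to keep an API key out of a notebook is to store it in an environment variable and build the URL from a dictionary of parameters. The sketch below rebuilds the demo URL that way. `ALPHAVANTAGE_API_KEY` is a hypothetical variable name (use whatever name you set yourself), and the sketch falls back to the `demo` key if the variable isn't set.

```python
import os
from urllib.parse import urlencode

# ALPHAVANTAGE_API_KEY is a hypothetical variable name chosen for this sketch.
# Falling back to "demo" keeps the sketch runnable without a real key.
api_key = os.environ.get("ALPHAVANTAGE_API_KEY", "demo")

# The same parameters that appear after the "?" in the demo URL.
params = {
    "function": "TIME_SERIES_WEEKLY",
    "symbol": "IBM",
    "apikey": api_key,
}
url = "https://www.alphavantage.co/query?" + urlencode(params)

# Avoid printing the key itself; show only whether a personal key was found.
print("Using a personal key:", api_key != "demo")
```

`requests.get` also accepts a `params` dictionary as a keyword argument and builds the query string for you, so joining the URL by hand is optional.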

* Which company's share prices is this API request getting? 


* How would you change this API request to get data about Google's weekly share prices?


* Plug the symbol for Google **and your API key** into the demo URL above, to make a URL that will get you data for Google's weekly share prices. Click on the URL and view it in your browser to make sure it's working as expected.


* Then, fill in the code below to make the same API request in Python

In [None]:
google_share_price_url = # FILL THIS IN


In [None]:
google_request = requests.get(google_share_price_url)

Check the status code.

Create a variable that contains the JSON only.

Use your dictionary skills to extract the time series data from the JSON, i.e. get rid of the metadata.

Convert the results to a `pandas DataFrame` and switch the rows and columns around.

Preview the first 5 rows of the `DataFrame`.