<font size="+3"><strong>1. Getting data from APIs</strong></font>

In [None]:
import pandas as pd
import requests

# Accessing APIs Through a URL

In this notebook, we'll extract stock market information from the [AlphaVantage](https://alphavantage.co/) API.

In [None]:
%%capture
# Get our API key from our env file
!pip install python-dotenv
import os
from dotenv import load_dotenv

# Load variables from env file
load_dotenv('env')

# Use variables
ALPHA_API_KEY = os.getenv('ALPHA_API_KEY')

Now we need to import the API key into our code base. This is commonly done by creating a `config` module and then importing the `settings` variable from it.
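
The `config` module itself is not shown in this notebook. As a rough sketch only, assuming the key lives in an environment variable named `ALPHA_API_KEY`, a standard-library stand-in could look like the following. The `Settings` class is hypothetical and mirrors just the `alpha_api_key` attribute that later cells rely on; the real module may well be built differently.

```python
import os
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Settings:
    """Hypothetical stand-in for the object exposed by `config.settings`."""

    # Read the key from the environment at the moment the object is created.
    # An empty string means the variable was not set.
    alpha_api_key: str = field(
        default_factory=lambda: os.environ.get("ALPHA_API_KEY", "")
    )


# Module-level instance, so other code can write `from config import settings`
settings = Settings()
```

Keeping the key in one object means the rest of the code never reads the environment directly, and the key never needs to be typed into a notebook cell.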

In [None]:
# Import settings
from config import settings

# Use `dir` to list attributes
dir(settings)

['Config',
 '__abstractmethods__',
 '__annotations__',
 '__class__',
 '__class_vars__',
 '__config__',
 '__custom_root_type__',
 '__delattr__',
 '__dict__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__exclude_fields__',
 '__fields__',
 '__fields_set__',
 '__format__',
 '__ge__',
 '__get_validators__',
 '__getattribute__',
 '__getstate__',
 '__gt__',
 '__hash__',
 '__include_fields__',
 '__init__',
 '__init_subclass__',
 '__iter__',
 '__json_encoder__',
 '__le__',
 '__lt__',
 '__module__',
 '__ne__',
 '__new__',
 '__post_root_validators__',
 '__pre_root_validators__',
 '__pretty__',
 '__private_attributes__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__repr_args__',
 '__repr_name__',
 '__repr_str__',
 '__rich_repr__',
 '__schema_cache__',
 '__setattr__',
 '__setstate__',
 '__signature__',
 '__sizeof__',
 '__slots__',
 '__str__',
 '__subclasshook__',
 '__try_update_forward_refs__',
 '__validators__',
 '_abc_impl',
 '_build_values',
 '_calculate_keys',
 '_copy_and_set_values',
 '_decompo

Beautiful! We have an API key. We need to create a new URL for `"TSCO.LON"` using the base URL `"https://www.alphavantage.co/query?"` and incorporate our API key.

In [None]:
url = (
    "https://www.alphavantage.co/query?"
    "function=TIME_SERIES_DAILY&"
    "symbol=TSCO.LON&"
    "outputsize=full&"
    f"apikey={settings.alpha_api_key}"
)
print("url type:", type(url))
url

It turns out there are more parameters in the documentation for the [AlphaVantage Time Series Daily API](https://www.alphavantage.co/documentation/#daily). Let's expand our URL to incorporate all of them. To make the URL more dynamic, we'll also create a variable for each parameter value that can change.

In [None]:
ticker = "TSCO.LON"
output_size = "compact"
data_type = "json"

url = (
    "https://www.alphavantage.co/query?"
    "function=TIME_SERIES_DAILY&"
    f"symbol={ticker}&"
    f"outputsize={output_size}&"
    f"datatype={data_type}&"
    f"apikey={settings.alpha_api_key}"
)
print("url type:", type(url))
url
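
As an alternative to joining f-strings by hand, the query string can be assembled from a dictionary with the standard library's `urllib.parse.urlencode`, which also escapes any special characters in the values. This is a minimal sketch: the `build_url` helper is a hypothetical name, and `"demo"` is only a placeholder standing in for `settings.alpha_api_key`.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.alphavantage.co/query?"


def build_url(ticker, output_size="compact", data_type="json", api_key="demo"):
    """Assemble a TIME_SERIES_DAILY URL from its parameters (illustrative helper)."""
    params = {
        "function": "TIME_SERIES_DAILY",
        "symbol": ticker,
        "outputsize": output_size,
        "datatype": data_type,
        "apikey": api_key,  # placeholder default; pass the real key in practice
    }
    # `urlencode` joins the pairs with "&" and percent-encodes the values
    return BASE_URL + urlencode(params)


example_url = build_url("TSCO.LON")
print(example_url)
```

With this approach, adding or removing a parameter is a one-line change to the dictionary.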

# Accessing APIs Through a Request

We've seen how to access the AlphaVantage API by clicking on a URL, but this won't work for the application we're building in this project because only humans click URLs. Computer programs access APIs by making **requests**. Let's build our first request using the URL we created in the previous code.

In [None]:
response = requests.get(url=url)

print("response type:", type(response))

response type: <class 'requests.models.Response'>


That tells us what kind of object we've gotten back, but it doesn't tell us anything about what it contains. If we want to find out what kinds of data are actually *in* the response, we can use the built-in `dir` function.

In [None]:
# Use `dir` on our `response`
dir(response)

['__attrs__',
 '__bool__',
 '__class__',
 '__delattr__',
 '__dict__',
 '__dir__',
 '__doc__',
 '__enter__',
 '__eq__',
 '__exit__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getstate__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__iter__',
 '__le__',
 '__lt__',
 '__module__',
 '__ne__',
 '__new__',
 '__nonzero__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__setstate__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 '__weakref__',
 '_content',
 '_content_consumed',
 '_next',
 'apparent_encoding',
 'close',
 'connection',
 'content',
 'cookies',
 'elapsed',
 'encoding',
 'headers',
 'history',
 'is_permanent_redirect',
 'is_redirect',
 'iter_content',
 'iter_lines',
 'json',
 'links',
 'next',
 'ok',
 'raise_for_status',
 'raw',
 'reason',
 'request',
 'status_code',
 'text',
 'url']

`dir` returns a list, and, as you can see, there are lots of possibilities here! For now, let's focus on two attributes: `status_code` and `text`.

We'll start with `status_code`. Every time we make a call to a URL, the response includes an [HTTP status code](https://en.wikipedia.org/wiki/List_of_HTTP_status_codes), which can be accessed with the `status_code` attribute. Let's see what ours is.

In [None]:
# Assign the status code for our response
response_code = response.status_code

print("code type:", type(response_code))
response_code

code type: <class 'int'>


200

Translated to English, `200` means "OK". It's the standard response for a successful HTTP request. In other words, it worked! We successfully received data back from the AlphaVantage API.
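
To translate other status codes into words, Python's standard library includes `http.HTTPStatus`. The sketch below uses a hypothetical helper, `describe_status`, to look up the phrase for a code and to flag whether it falls in the 2xx "success" range.

```python
from http import HTTPStatus


def describe_status(code):
    """Return a short human-readable description of an HTTP status code."""
    status = HTTPStatus(code)  # raises ValueError for unknown codes
    outcome = "success" if 200 <= code < 300 else "not a success"
    return f"{code} {status.phrase} ({outcome})"


print(describe_status(200))
print(describe_status(404))
```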

Now let's take a look at the `text`.

In [None]:
# Assign the text for our response
response_text = response.text

print("response_text type:", type(response_text))
print(response_text[:200])

response_text type: <class 'str'>
{
    "Meta Data": {
        "1. Information": "Daily Prices (open, high, low, close) and Volumes",
        "2. Symbol": "TSCO.LON",
        "3. Last Refreshed": "2023-09-25",
        "4. Output Size"


This string looks like the data we previously saw in our browser when we clicked on the URL. But JSON is hard to work with while it's still a string. Instead, we need to parse it into a dictionary.

In [None]:
# Use the `json` method to access a dictionary version of the data
response_data = response.json()

print("response_data type:", type(response_data))

response_data type: <class 'dict'>
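
For intuition, calling `response.json()` is roughly the same as passing the response text to `json.loads` from the standard library. The sketch below does that with a small hand-written sample shaped like the API output, so it runs without a network call; the sample is abbreviated and is not a real response.

```python
import json

# Abbreviated, hand-written sample in the same shape as the API response
sample_text = """
{
    "Meta Data": {"2. Symbol": "TSCO.LON"},
    "Time Series (Daily)": {
        "2023-08-14": {"4. close": "250.4000", "5. volume": "10734130"}
    }
}
"""

# Parse the JSON string into nested Python dictionaries
sample_data = json.loads(sample_text)
print("sample_data type:", type(sample_data))

# Values arrive as strings, so they must be converted before doing arithmetic
close_price = float(sample_data["Time Series (Daily)"]["2023-08-14"]["4. close"])
print("close_price:", close_price)
```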


In [None]:
# Print `response_data` keys
response_data.keys()

dict_keys(['Meta Data', 'Time Series (Daily)'])

Now let's look at the data that's assigned to the `"Time Series (Daily)"` key.

In [None]:
# Extract `"Time Series (Daily)"` value from `response_data`
stock_data = response_data["Time Series (Daily)"]

print("stock_data type:", type(stock_data))

# Extract data for one of the days in `stock_data`
stock_data['2023-08-14']

stock_data type: <class 'dict'>


{'1. open': '250.9000',
 '2. high': '251.2000',
 '3. low': '248.8000',
 '4. close': '250.4000',
 '5. volume': '10734130'}

Now that we know how the data is organized when we extract it from the API, let's transform it into a DataFrame to make it more manageable.

In [None]:
# Read the data from stock_data into a DataFrame named df_tesco
df_tesco = pd.DataFrame.from_dict(stock_data, orient="index", dtype=float)
print("df_tesco shape:", df_tesco.shape)
print(df_tesco.info())
df_tesco.head(10)

df_tesco shape: (100, 5)
<class 'pandas.core.frame.DataFrame'>
Index: 100 entries, 2023-09-25 to 2023-05-04
Data columns (total 5 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   1. open    100 non-null    float64
 1   2. high    100 non-null    float64
 2   3. low     100 non-null    float64
 3   4. close   100 non-null    float64
 4   5. volume  100 non-null    float64
dtypes: float64(5)
memory usage: 4.7+ KB
None


,1. open,2. high,3. low,4. close,5. volume
2023-09-25,270.3,272.3,267.4,268.8,8475620.0
2023-09-22,268.0,271.7944,267.3,270.9,14749700.0
2023-09-21,269.4,270.8,267.951,269.5,51629281.0
2023-09-20,273.1,274.8,268.9,270.5,16208980.0
2023-09-19,271.3,273.5,271.0,273.1,17100000.0
2023-09-18,270.5,271.94,269.6,271.5,9517546.0
2023-09-15,269.1,271.7,268.594,270.7,49860930.0
2023-09-14,264.4,267.6,263.4,266.9,8880733.0
2023-09-13,262.3,264.4997,261.8,263.7,8481579.0
2023-09-12,260.0,264.0,260.0,262.5,8849292.0


All in all, this looks pretty good, but there are a couple of problems: the data type of the dates, and the format of the headers. Let's fix the dates first. Right now, the dates are strings; in order to make the rest of our code work, we'll need to create a proper `DatetimeIndex`.

In [None]:
# Convert `df_tesco` index to `DatetimeIndex`
df_tesco.index = pd.to_datetime(df_tesco.index)

# Name index "date"
df_tesco.index.name = "date"

print(df_tesco.info())
df_tesco.head()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 100 entries, 2023-09-25 to 2023-05-04
Data columns (total 5 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   1. open    100 non-null    float64
 1   2. high    100 non-null    float64
 2   3. low     100 non-null    float64
 3   4. close   100 non-null    float64
 4   5. volume  100 non-null    float64
dtypes: float64(5)
memory usage: 4.7 KB
None


date,1. open,2. high,3. low,4. close,5. volume
2023-09-25,270.3,272.3,267.4,268.8,8475620.0
2023-09-22,268.0,271.7944,267.3,270.9,14749700.0
2023-09-21,269.4,270.8,267.951,269.5,51629281.0
2023-09-20,273.1,274.8,268.9,270.5,16208980.0
2023-09-19,271.3,273.5,271.0,273.1,17100000.0


<div class="alert alert-info" role="alert">
    <p>Note that the rows in <code>df_tesco</code> are sorted <b>descending</b>, with the most recent date at the top. This will work to our advantage when we store and retrieve the data from our application database, but we'll need to sort it <b>ascending</b> before we can use it to train a model.</p>
</div>
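
Sorting by date is a one-liner once the index is a `DatetimeIndex`. The following is a minimal sketch on a three-row sample (values taken from the table above, trimmed to two columns); `df_sample` and `df_sorted` are illustrative names, not variables used elsewhere in this notebook.

```python
import pandas as pd

# Three rows from the table above, newest first, as the API returns them
sample = {
    "2023-09-25": {"open": 270.3, "close": 268.8},
    "2023-09-22": {"open": 268.0, "close": 270.9},
    "2023-09-21": {"open": 269.4, "close": 269.5},
}

df_sample = pd.DataFrame.from_dict(sample, orient="index")
df_sample.index = pd.to_datetime(df_sample.index)
df_sample.index.name = "date"

# Sort oldest-to-newest; `sort_index` returns a new DataFrame
df_sorted = df_sample.sort_index(ascending=True)
print(df_sorted)
```

Because `sort_index` returns a copy, the original descending order remains available for storage.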

Now that the dates are fixed, let's deal with the headers.

In [None]:
# Remove numbering from `df_tesco` column names
df_tesco.columns = [c.split(". ")[1] for c in df_tesco.columns]

print(df_tesco.info())
df_tesco.head()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 100 entries, 2023-09-25 to 2023-05-04
Data columns (total 5 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   open    100 non-null    float64
 1   high    100 non-null    float64
 2   low     100 non-null    float64
 3   close   100 non-null    float64
 4   volume  100 non-null    float64
dtypes: float64(5)
memory usage: 4.7 KB
None


date,open,high,low,close,volume
2023-09-25,270.3,272.3,267.4,268.8,8475620.0
2023-09-22,268.0,271.7944,267.3,270.9,14749700.0
2023-09-21,269.4,270.8,267.951,269.5,51629281.0
2023-09-20,273.1,274.8,268.9,270.5,16208980.0
2023-09-19,271.3,273.5,271.0,273.1,17100000.0
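
To see what the renaming step does in isolation, here is the same list comprehension applied to a plain list of the original column names. The `strip_prefix` helper is a hypothetical, slightly more forgiving variant: it leaves a name unchanged when there is no numeric prefix, whereas indexing with `[1]` would raise an `IndexError` in that case.

```python
raw_columns = ["1. open", "2. high", "3. low", "4. close", "5. volume"]

# Same expression as in the cell above: keep the part after ". "
clean_columns = [c.split(". ")[1] for c in raw_columns]
print(clean_columns)


def strip_prefix(name):
    """Drop a leading 'N. ' prefix if present; otherwise return the name as-is."""
    # Split at most once and take the last piece, which always exists
    return name.split(". ", maxsplit=1)[-1]


print([strip_prefix(c) for c in raw_columns + ["volume"]])
```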


# Defensive Programming

Defensive programming is the practice of writing code which will continue to function, even if something goes wrong. We'll never be able to foresee all the problems people might run into with our code, but we can take steps to make sure things don't fall apart whenever one of those problems happens.

So far, we've made API requests where everything works. But coding errors and problems with servers are common, and they can cause big issues in a data science project. Let's see how our `response` changes when we introduce common bugs in our code.
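
One way to guard against server and network problems is to set a timeout and check the status code before parsing the body. The sketch below is an illustration under stated assumptions rather than part of this project's code: `fetch_json` is a hypothetical helper, and its `get` parameter exists only so the function can be tried out without a real network call.

```python
import requests


def fetch_json(url, timeout=10, get=requests.get):
    """Return the decoded JSON body, raising on network or HTTP errors.

    Hypothetical helper: `get` defaults to `requests.get` and can be swapped
    for a stand-in function when experimenting offline.
    """
    try:
        # Without a timeout, a stalled server could block the program forever
        response = get(url, timeout=timeout)
        # Turn 4xx and 5xx status codes into `requests.HTTPError`
        response.raise_for_status()
    except requests.RequestException as err:
        raise RuntimeError(f"Request to API failed: {err}") from err
    return response.json()
```

Be aware that the error text produced by `requests` includes the full URL, API key included, so it should not be written verbatim to shared logs.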

Let's build a `get_daily` function that gets data from the AlphaVantage API and returns a clean DataFrame.

In [None]:
def get_daily(ticker, output_size="full"):
    """Get daily time series of an equity from AlphaVantage API.

    Parameters
    ----------
    ticker : str
        The ticker symbol of the equity.
    output_size : str, optional
        Number of observations to retrieve. "compact" returns the
        latest 100 observations. "full" returns all observations for
        equity. By default "full".

    Returns
    -------
    pd.DataFrame
        Columns are 'open', 'high', 'low', 'close', and 'volume'.
        All are numeric.
    """
    # Create URL
    url = (
        "https://www.alphavantage.co/query?"
        "function=TIME_SERIES_DAILY&"
        f"symbol={ticker}&"
        f"outputsize={output_size}&"
        "datatype=json&"
        f"apikey={settings.alpha_api_key}"
    )

    # Send request to API
    response = requests.get(url=url)

    # Extract JSON data from response
    response_data = response.json()

    if "Time Series (Daily)" not in response_data:
        raise Exception(
            f"Invalid API call. Check that ticker symbol '{ticker}' is correct."
        )

    # Read data into DataFrame
    stock_data = response_data["Time Series (Daily)"]
    df = pd.DataFrame.from_dict(stock_data, orient="index", dtype=float)

    # Convert index to `DatetimeIndex` named "date"
    df.index = pd.to_datetime(df.index)
    df.index.name = "date"

    # Remove numbering from columns
    df.columns = [c.split(". ")[1] for c in df.columns]

    # Return DataFrame
    return df

In [None]:
# Test our function
df_tesco = get_daily(ticker="TSCO.LON")

print(df_tesco.info())
df_tesco.head()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 4731 entries, 2023-09-25 to 2005-01-04
Data columns (total 5 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   open    4731 non-null   float64
 1   high    4731 non-null   float64
 2   low     4731 non-null   float64
 3   close   4731 non-null   float64
 4   volume  4731 non-null   float64
dtypes: float64(5)
memory usage: 221.8 KB
None


date,open,high,low,close,volume
2023-09-25,270.3,272.3,267.4,268.8,8475620.0
2023-09-22,268.0,271.7944,267.3,270.9,14749700.0
2023-09-21,269.4,270.8,267.951,269.5,51629281.0
2023-09-20,273.1,274.8,268.9,270.5,16208980.0
2023-09-19,271.3,273.5,271.0,273.1,17100000.0


How does this function deal with the two bugs we've explored in this section? Our first error, a bad URL, is something we don't need to worry about. No matter what the user inputs into this function, the URL will always be correct. But see what happens when the user inputs a bad ticker symbol. What's the error message? Would it help the user locate their mistake?

In [None]:
# Test your Exception
df_test = get_daily(ticker="TSCO.LN")

Exception: ignored
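
An error message is most useful when it repeats what the API itself reported. The sketch below shows one possible way to do that on the parsed dictionary, so it runs without a network call. The function name `extract_daily` is hypothetical, and the keys `"Error Message"`, `"Note"`, and `"Information"` are assumptions about what the API returns when a call fails; check a real failed response before relying on them.

```python
def extract_daily(response_data, ticker):
    """Return the daily time series, or raise an error that explains why not."""
    if "Time Series (Daily)" in response_data:
        return response_data["Time Series (Daily)"]

    # Assumed keys under which the API may explain a failed call
    for key in ("Error Message", "Note", "Information"):
        if key in response_data:
            detail = response_data[key]
            break
    else:
        detail = "no detail returned"

    raise ValueError(
        f"No daily data for ticker '{ticker}'. The API said: {detail}"
    )


# Hand-written example of a failed call, not a real API response
bad_response = {"Error Message": "Invalid API call."}
try:
    extract_daily(bad_response, ticker="TSCO.LN")
except ValueError as err:
    print(err)
```

Raising `ValueError` instead of a bare `Exception` also lets calling code catch this specific failure without swallowing unrelated errors.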