<a href="https://colab.research.google.com/github/RDeconomist/RDeconomist.github.io/blob/main/s3_workbookLoopsAPIs.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**Richard Davies** Data Science Masterclass - 2024

In this notebook we will learn how to loop over a list of data series, sending one request to an API for each series, so that we can download the data in a batch.

<br>
<br>

### Lists

Lists are a simple data type. They are written as comma-separated values (items) between square brackets. Just like numbers or strings, a list can be assigned to a variable using =.

In the code below we have a list of places. We define a variable "locations" and assign our list to this variable.

In [None]:
locations = ["Swansea", "Cardiff", "Newport"]   # Creating a list of locations

# We have a list of locations, let's print these out
print(locations)

<br>
<br>

### Printing items from a list

If we want to retrieve individual items in the list, we use "indexing".

**Note:** One rule to remember is that indexing starts at 0. So the list above has positions 0, 1 and 2. Asking for position 3, which might seem to be Newport, will raise an `IndexError`.

In [None]:
print(locations[0])
print(locations[2])
print(locations[1])
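
To see the indexing rule in action, here is a minimal sketch that asks for a position past the end of the list. The names `last_position` and `message` are illustrative and are not used elsewhere in this notebook.

```python
locations = ["Swansea", "Cardiff", "Newport"]

# len() gives the number of items; the last valid position is one less than that.
last_position = len(locations) - 1
print(last_position)
print(locations[last_position])

# Asking for position 3 raises an IndexError.
# We catch it here so that the cell keeps running and we can read the message.
try:
    print(locations[3])
except IndexError as error:
    message = str(error)
    print("IndexError:", message)
```

If we are ever unsure how far the positions go, checking `len(locations)` first is a quick way to avoid this error.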

<br>
<br>

### Loops

Any time we have repetitive code like that above, we should consider a loop. This is not just to show off. Manually copying code like the above leads to errors, and it is time consuming. Loops make you more accurate, and more efficient.

With the "for" loop we can execute a set of statements, once for each item in a list.

In [None]:
## Here is our first loop:

locations = ["Darlington", "London", "Newport"]

for i in locations:
  print(i)
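
Sometimes it helps to know the position of each item as well as the item itself. One way to do this, sketched below, is Python's built-in `enumerate()`. The names `position`, `place` and `pairs` are our own choices for illustration.

```python
locations = ["Darlington", "London", "Newport"]

# enumerate() hands back the position and the item together on each pass of the loop.
pairs = []
for position, place in enumerate(locations):
    print(position, place)
    pairs.append((position, place))   # Keep a record of what we saw
```

Note that the positions start at 0, matching the indexing rule from earlier.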

<br>
<br>

### The string format method {}

To get the most out of loops, we will want to change strings in each iteration. To do this we can use the string format method. You can read more about this [here](https://www.w3schools.com/python/ref_string_format.asp).

In [None]:
# Take any string variable, and put a placeholder {} where we want to insert something:
x = "The best rugby team in the world is {}"

# Now we can use .format() to insert something into this place:
x.format('Wales')
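
Two details of the format method are worth seeing in code: placeholders can be given names, and formatting returns a new string rather than changing the original. The sketch below assumes two made-up placeholder names, `sport` and `team`.

```python
# Named placeholders make it clear what goes where:
template = "The best {sport} team in the world is {team}"

# .format() returns a NEW string, so we store the result in a variable:
sentence = template.format(sport="rugby", team="Wales")
print(sentence)

# The original template is unchanged, so it can be reused in a loop:
print(template)
```

Because the template is left untouched, the same string can be formatted again and again with different values.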

<br>
<br>

### A loop with the format method:
We next combine the format method with a loop, in this case to print out a list of claims about football teams.

In [None]:
# First, define a sentence with the {} placeholder.
text = "The best team is {}"

# Next, define a list of team names.
teams = ['Man Utd', 'AC Milan', 'Barcelona', 'PSG', 'Bayern', 'River Plate']

# Finally, create a loop where we deal with the teams one by one.
for i in teams:
    top_team = text.format(i)    # Format `text` with team name
    print(top_team)              # Print our formatted string


---

<br>
<br>

### Looping with an API: FRED

With these building blocks in place, let's build something that may actually be useful. Imagine you cover the US economy as an analyst and you make a weekly dashboard. It must take in 10 important series, each of them plotted with a line chart. The data will need to be re-downloaded each week, meaning that you are manually downloading 520 series per year in order to keep your dashboard up to date. How can we batch process this, so that all downloads are done with one click? <br>

<br>

First, we need a list of FRED series we want to download. We'll create a list of the series codes we want data for.

For example, FRED series for GDP and unemployment are: <br>
https://fred.stlouisfed.org/series/GDP <br>
https://fred.stlouisfed.org/series/UNRATE

 *Note: Since these codes are made up of letters and numbers, they must be string type (i.e. surrounded by " " or ' ' quotes)*


In [None]:
# Write out my list of series:
fred_series = ['GDP', 'PCEPI', 'CPIAUCSL', 'PAYEMS', 'DGS10', 'INDPRO', 'UNRATE']

# // Set the base url:
url_base = "https://api.stlouisfed.org/fred/series/observations?series_id={}&api_key=22ee7a76e736e32f54f5df0a7171538d&file_type=json"

for i in fred_series:
  # Print the series code we're about to download.
  print(i)

  # Build the URL for this iteration of the loop, and check what we are getting:
  URL = url_base.format(i)
  print(URL)

  # Add some white space to our output. (This is purely so we can see what is happening below clearly)
  print("\n")
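
An alternative to a `{}` placeholder is to build the query string from a dictionary of parameters. The sketch below uses `urlencode` from Python's standard library. The names `endpoint`, `params` and `urls` are our own, and `YOUR_API_KEY` is a placeholder for a real key, not a working value.

```python
from urllib.parse import urlencode

# The part of the URL before the "?" stays the same for every series:
endpoint = "https://api.stlouisfed.org/fred/series/observations"

fred_series = ['GDP', 'UNRATE']

urls = []
for series_id in fred_series:
    # The parameters that follow the "?" in the URL, written as a dictionary:
    params = {
        "series_id": series_id,
        "api_key": "YOUR_API_KEY",   # Placeholder (assumption): use your own key
        "file_type": "json",
    }

    # urlencode() joins the parameters with "&" and escapes any special characters:
    url = endpoint + "?" + urlencode(params)
    urls.append(url)
    print(url)
```

This produces URLs of the same shape as the format method, but each parameter is visible on its own line, which can make longer requests easier to read and edit.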

<br>
<br>

### Batch downloader: FRED.
We are now ready to build our batch downloader. The code below:

1. Accesses some Python packages that we will need.
2. Sets up (defines) the elements we will use over and over again in our loop.
3. Runs the loop itself.

In [None]:
# // FIRST BATCH DOWNLOADER


# 1. PREPARATORY STEPS - ACCESS PACKAGES WE NEED

## // The "requests" package, for opening web sites and retrieving information:
import requests

## // The "json" package, for helping us make JSON easier to read:
import json

## /// Files.  This is part of Colab - allows you to upload and download files (only available when running in Colab)
from google.colab import files

## ------

# 2. SETTING UP THE ELEMENTS WE NEED IN OUR LOOP:

# // Pick the series that I want:
fred_series = ['GDP', 'PCEPI', 'CPIAUCSL', 'PAYEMS', 'DGS10', 'INDPRO', 'UNRATE']

# // Set the base url:
url_base = "https://api.stlouisfed.org/fred/series/observations?series_id={}&api_key=22ee7a76e736e32f54f5df0a7171538d&file_type=json"

# // Set a base fileName:
file_base = "data_FRED-{}.json"

## ------

# 3. USING THE ABOVE TO RUN A LOOP:

# // Begin a loop, dealing with each series, one by one:
for i in fred_series:

    # // Print some text to make clear when iteration starts and ends:
    # // This is not necessary but can be helpful, esp with long loops:
    print("------Iteration Starts--------")
    print(i)

    # // Build the URL for this iteration of the loop, print it to check what we are getting:
    URL = url_base.format(i)
    print(URL)

    # // Request the data from the URL, and parse the response as JSON:
    data = requests.get(URL).json()

    # // Build the filename. Print it to check what we are getting:
    fileName = file_base.format(i)
    print('Series we are downloading is', i)
    print('Data saved to', fileName)

    # /// Save the file:
    with open(fileName, 'w', encoding='utf-8') as f:
        json.dump(data, f, ensure_ascii=False, indent=4)

    # /// Download the file to local machine:
    files.download(fileName)

    # // Mark the end of the iteration and add a blank line. (This is purely so we can see what is happening clearly)
    print("------Iteration Ends--------\n")
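
In a long loop, one failed request (a mistyped series code, or a dropped connection) would stop the whole batch. The sketch below shows one possible way to guard against that. It is not part of the original downloader: the functions `fetch_from_fred` and `save_series` are hypothetical helpers, `YOUR_API_KEY` is a placeholder, and the Colab `files.download` step is left out so that the code also runs outside Colab.

```python
import json
from pathlib import Path


def fetch_from_fred(series_id, api_key="YOUR_API_KEY"):
    """Request one series from FRED and return the parsed JSON (hypothetical helper)."""
    import requests  # Imported here so the rest of the sketch runs without it

    response = requests.get(
        "https://api.stlouisfed.org/fred/series/observations",
        params={"series_id": series_id, "api_key": api_key, "file_type": "json"},
        timeout=30,  # Give up rather than wait forever on a slow connection
    )
    response.raise_for_status()  # Raise an exception if the server reports an HTTP error
    return response.json()


def save_series(series_ids, fetch, out_dir="."):
    """Fetch each series using `fetch` and save it as JSON, skipping any that fail."""
    saved = {}
    for series_id in series_ids:
        try:
            data = fetch(series_id)
        except Exception as error:
            # Report the problem and move on to the next series:
            print("Skipping", series_id, "because:", error)
            continue

        # Same file naming pattern as the downloader above:
        path = Path(out_dir) / "data_FRED-{}.json".format(series_id)
        with open(path, "w", encoding="utf-8") as f:
            json.dump(data, f, ensure_ascii=False, indent=4)
        saved[series_id] = path
    return saved
```

With a real API key in place, calling `save_series(fred_series, fetch_from_fred)` would attempt every series in the list and return a dictionary of the files that were saved. Passing the fetching function in as an argument is a design choice: it lets us try out the loop with a stand-in function before making any real requests.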

---