<a href="https://colab.research.google.com/github/RDeconomist/RDeconomist.github.io/blob/main/data/DSEP_2_0_loopsExamples.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**Richard Davies** Data Science for Economics and Policy - 2023

**Tutorial**: First loops

Loop tutorials typically start with an example like the one below. Let's take a look.

In [None]:
locations = ["Swansea", "Cardiff", "Newport"]

## OK, so we have a list of locations.

## One rule to remember is that indexing starts at 0. So the list above has positions 0, 1 and 2. Asking for position 3 (which would seem to be Newport) will throw an error.

## If we want to retrieve the locations we can therefore write:

print(locations[0])
print(locations[1])
print(locations[2])

## Try this!


Swansea
Cardiff
Newport
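
The indexing rule in the cell above can be checked directly. The following is a minimal sketch, using the same list, of what happens when we ask for position 3. The `try`/`except` is only there so that the cell keeps running after the error; the variable names `last_position` and `message` are illustrative.

```python
locations = ["Swansea", "Cardiff", "Newport"]

# Valid positions run from 0 to len(locations) - 1, i.e. 0, 1 and 2.
last_position = len(locations) - 1
print(locations[last_position])

# Position 3 does not exist, so Python raises an IndexError.
try:
    print(locations[3])
except IndexError as error:
    message = str(error)
    print("IndexError:", message)
```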


Any time we have repetitive code like the above, we should consider a loop. This is not just to show off. Manually copying code like the above leads to errors, and it is time-consuming. Loops make you more accurate and more efficient.

In [None]:
## Here is our first loop:

locations = ["Darlington", "London", "Newport"]

for location in locations:
  print(location)

Darlington
London
Newport
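
If you want the position as well as the value, one option is Python's built-in `enumerate()`, which hands the loop both at once. This is a small sketch rather than something the rest of the tutorial relies on; the name `position` is illustrative.

```python
locations = ["Darlington", "London", "Newport"]

# enumerate() pairs each item with its position, starting from 0.
for position, location in enumerate(locations):
    print(position, location)

# The same pairs, collected into a list so we can inspect them.
pairs = list(enumerate(locations))
```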


A note on coding norms. I find the usage above really confusing. It seems to suggest that the singular/plural matters in some way. For example, you will see lots of tutorials where you have "for fruit in fruits" or something of this nature.

The names themselves can be ignored; it is what they represent that matters. In the first line we define a list of locations. The name that comes after the for is just an identifier that takes each item of the list in turn.

In [None]:
placeList = ["Darlington", "Newport", "London"]
for i in placeList:
  print(i)

Darlington
Newport
London


Let's pause to discuss this. For those of you who understand loops already, how can you mend the following code: (a) so that it works, and (b) so that it makes sense to a human?

In [None]:
words = ["lovely", "city", "is", "a", "Darlington"]
numbers = [1,2,3,4,5]
for i in numbers:
  print(words)
  print(words[i])


In [None]:
## Here is some mended code:

words = ["lovely", "city", "is", "a", "Darlington"]
numbers = [0,1,2,3,4]
richard = [4,2,3,0,1]

for i in numbers:
  print(words[i])

print("\n") # This is just a new line. Not needed just makes a break between the output.

for x in richard:
  print(words[x])

lovely
city
is
a
Darlington


Darlington
is
a
lovely
city
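
For part (b), one possible way to make the mended code easier for a human to read is to use names that describe what each list is for, and to join the words into a single sentence. This is only a sketch; the names `sentence_order` and `ordered_words` are made up for illustration.

```python
words = ["lovely", "city", "is", "a", "Darlington"]

# The positions of the words, in the order we want to read them.
sentence_order = [4, 2, 3, 0, 1]

# Pick out each word in turn and collect the results.
ordered_words = []
for position in sentence_order:
    ordered_words.append(words[position])

# Join the words with spaces so that they print on one line.
sentence = " ".join(ordered_words)
print(sentence)
```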


**Tutorial:** Using loops to batch download data from an API

**Motivation:** You are asked by your Minister to build a dashboard for the US economy. This must take in 10 important series, each of them plotted with a line chart. The data will need to be re-downloaded each month, meaning that you are manually downloading 120 series per year in order to keep your dashboard up to date. How can we batch process this, so that all downloads are done with one click?

In [1]:
# Preliminaries 1 - the format() method:
# See: https://www.w3schools.com/python/ref_string_format.asp

# Take a sentence, and put a placeholder {} where we want to insert something:
sentence = "The best rugby team in the world is {}"
# Now we can use .format() to insert something into this place:
sentence.format('Wales')

'The best rugby team in the world is Wales'

In [2]:
# Note: the 'format()' method is built into Python, so you must call it by that name;
# But 'sentence' is just a variable name, it can be anything:
x = "The best football team in the world is {}"
x.format('Manchester United')

'The best football team in the world is Manchester United'

In [3]:
# Next note that we can put a variable within the format():
sentence = "The best team this year is {}"
team = 'Manchester City'
sentence.format(team)
# This allows us to change the sentence

'The best team this year is Manchester City'

In [4]:
# // PRELIMINARIES 2 - Using the format method in a loop:

sentence = "The best team is {}"
teams = [
    'Manchester United', 'AC Milan', 'Barcelona', 'PSG', 'Bayern Munich',
    'River Plate'
]

# // Begin a loop, dealing with the teams one by one:
for i in teams:

    # // Everything that belongs to the for loop is indented. (Here, four spaces; any consistent amount works.)
    # // Build the sentence for this iteration of the loop, and check what we are getting:
    topTeam = sentence.format(i)
    print(topTeam)


The best team is Manchester United
The best team is AC Milan
The best team is Barcelona
The best team is PSG
The best team is Bayern Munich
The best team is River Plate


In [10]:
# // PUTTING THINGS TOGETHER 1 - A loop of all our variables:

# // Set a base URL.
# // This includes everything that does not change in our loop.
# // And a placeholder "{}" for the part that does change.
url_base = "https://api.stlouisfed.org/fred/series/observations?series_id={}&api_key=22ee7a76e736e32f54f5df0a7171538d&file_type=json"

# NOW PICK ALL THE SERIES THAT WE ARE INTERESTED IN:
fredSeries = [
    'PCEPI', 'CPIAUCSL', 'PAYEMS', 'DGS10', 'INDPRO', 'UNRATE',
    'LES1252881600Q'
]

# // Begin a loop, dealing with series one by one:
for i in fredSeries:
    # // Build the URL for this iteration of the loop, and check what we are getting:
    URL = url_base.format(i)
    print(URL)


https://api.stlouisfed.org/fred/series/observations?series_id=PCEPI&api_key=22ee7a76e736e32f54f5df0a7171538d&file_type=json
https://api.stlouisfed.org/fred/series/observations?series_id=CPIAUCSL&api_key=22ee7a76e736e32f54f5df0a7171538d&file_type=json
https://api.stlouisfed.org/fred/series/observations?series_id=PAYEMS&api_key=22ee7a76e736e32f54f5df0a7171538d&file_type=json
https://api.stlouisfed.org/fred/series/observations?series_id=DGS10&api_key=22ee7a76e736e32f54f5df0a7171538d&file_type=json
https://api.stlouisfed.org/fred/series/observations?series_id=INDPRO&api_key=22ee7a76e736e32f54f5df0a7171538d&file_type=json
https://api.stlouisfed.org/fred/series/observations?series_id=UNRATE&api_key=22ee7a76e736e32f54f5df0a7171538d&file_type=json
https://api.stlouisfed.org/fred/series/observations?series_id=LES1252881600Q&api_key=22ee7a76e736e32f54f5df0a7171538d&file_type=json
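
The base URL above has the API key typed directly into it. As a hedged sketch of one alternative, the key can be held in its own variable and joined into the base URL, leaving a single `{}` for the series. The environment variable name `FRED_API_KEY` and the fallback text `YOUR_API_KEY` are assumptions for illustration, not something the FRED API requires.

```python
import os

# Assumption: the key is stored in an environment variable called FRED_API_KEY.
# If it is not set, a visible dummy value is used so the code still runs.
api_key = os.environ.get("FRED_API_KEY", "YOUR_API_KEY")

# Everything that does not change goes in the base, with {} for the series:
url_base = (
    "https://api.stlouisfed.org/fred/series/observations?series_id={}"
    + "&api_key=" + api_key
    + "&file_type=json"
)

fredSeries = ['PCEPI', 'CPIAUCSL', 'PAYEMS']

# Build one URL per series and keep them in a list:
urls = []
for i in fredSeries:
    urls.append(url_base.format(i))

for url in urls:
    print(url)
```

Keeping the key in one place means it only needs changing once, and the notebook can be shared without the key in it.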


In [15]:
# // PUTTING THINGS TOGETHER 2 - Importing some tools that we will need:

# // Opening web sites and web scraping:
import requests

# // JSON. This helps us make JSON look prettier and easier to read
import json

# /// Files. This is part of Colab - allows you to upload and download files
from google.colab import files

# // OS. Sometimes need this for finding working directory:
import os

In [12]:
## // An aside: checking which versions of things are running
print(requests.__version__)
print(json.__version__)

2.28.0
2.0.9


In [13]:
## // Getting data from a single API call:

url = "https://api.stlouisfed.org/fred/series/observations?series_id=DGS10&api_key=22ee7a76e736e32f54f5df0a7171538d&file_type=json"

# We use 'requests', which we imported above:
data = requests.get(url).json()

# Print what we got
data

{'realtime_start': '2024-02-14',
 'realtime_end': '2024-02-14',
 'observation_start': '1600-01-01',
 'observation_end': '9999-12-31',
 'units': 'lin',
 'output_type': 1,
 'file_type': 'json',
 'order_by': 'observation_date',
 'sort_order': 'asc',
 'count': 16205,
 'offset': 0,
 'limit': 100000,
 'observations': [{'realtime_start': '2024-02-14',
   'realtime_end': '2024-02-14',
   'date': '1962-01-02',
   'value': '4.06'},
  {'realtime_start': '2024-02-14',
   'realtime_end': '2024-02-14',
   'date': '1962-01-03',
   'value': '4.03'},
  {'realtime_start': '2024-02-14',
   'realtime_end': '2024-02-14',
   'date': '1962-01-04',
   'value': '3.99'},
  {'realtime_start': '2024-02-14',
   'realtime_end': '2024-02-14',
   'date': '1962-01-05',
   'value': '4.02'},
  {'realtime_start': '2024-02-14',
   'realtime_end': '2024-02-14',
   'date': '1962-01-08',
   'value': '4.03'},
  {'realtime_start': '2024-02-14',
   'realtime_end': '2024-02-14',
   'date': '1962-01-09',
   'value': '4.05'},
  {'
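
The response printed above is a dictionary, and the numbers we want sit in a list under the key 'observations', with each value stored as text. A loop can pull out the dates and convert the values to numbers. The sketch below uses a small hand-copied sample in the same shape, so it runs without an API call. Skipping entries that cannot be converted is an assumption about how missing data should be handled, and the names `yields` and `observation` are illustrative.

```python
# A hand-copied sample with the same shape as the response above:
data = {
    'observations': [
        {'date': '1962-01-02', 'value': '4.06'},
        {'date': '1962-01-03', 'value': '4.03'},
        {'date': '1962-01-04', 'value': '3.99'},
    ],
}

yields = []
for observation in data['observations']:
    try:
        # The values arrive as strings, so convert them to numbers.
        value = float(observation['value'])
    except ValueError:
        # Assumption: anything non-numeric marks missing data, so skip it.
        continue
    yields.append((observation['date'], value))

print(yields[0])
print(len(yields))
```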

In [14]:
# // Downloading the data from a single API call:

# // Based on the steps above, we have a variable "data" which has data on the US Government 10 year yield.

# // Set the filename, and check what we are getting:
fileName = "data_FRED-DGS10.json"
print(fileName)
# // Note: again the file name can be anything.

# /// Save the file:
with open(fileName, 'w', encoding='utf-8') as f:
    json.dump(data, f, ensure_ascii=False, indent=4)

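# /// Download the file to local machine.
# /// This needs 'from google.colab import files' to have been run first; otherwise Python raises the NameError shown below.
files.download(fileName)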

data_FRED-DGS10.json


NameError: name 'files' is not defined
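
The NameError above appears when `files` has not been imported before the cell runs. The following is a minimal sketch, under the assumption that you may be working outside Colab, of saving the file, reading it back to check it, and only calling `files.download` when the Colab module is actually available. The tiny `data` dictionary and the temporary folder are stand-ins for illustration.

```python
import json
import os
import tempfile

# Stand-in for the dictionary returned by the API call:
data = {'observations': [{'date': '1962-01-02', 'value': '4.06'}]}

# Save into a temporary folder so nothing in the working directory is overwritten:
folder = tempfile.mkdtemp()
fileName = os.path.join(folder, "data_FRED-DGS10.json")

with open(fileName, 'w', encoding='utf-8') as f:
    json.dump(data, f, ensure_ascii=False, indent=4)

# Read the file back in and check that it matches what we saved:
with open(fileName, 'r', encoding='utf-8') as f:
    reloaded = json.load(f)
print(reloaded == data)

# The Colab helper only exists inside Colab, so import it defensively:
try:
    from google.colab import files
except ImportError:
    files = None

if files is not None:
    files.download(fileName)
```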

In [16]:
# // PUTTING IT ALL TOGETHER:

# // Set the base url:
url_base = "https://api.stlouisfed.org/fred/series/observations?series_id={}&api_key=22ee7a76e736e32f54f5df0a7171538d&file_type=json"

# // Set the base fileName:
file_base = "data_FRED-{}.json"

# // Pick the series that I want:
fredSeries = [
    'PCEPI', 'CPIAUCSL', 'PAYEMS', 'DGS10', 'INDPRO', 'UNRATE',
    'LES1252881600Q'
]

# // Begin a loop, dealing with each series, one by one:
for i in fredSeries:

    # // In what follows below I print the iteration of the loop we are on:
    # // This is not necessary but can be helpful, esp with long loops:
    print("------Iteration Starts--------")
    print(i)

    # // Build the URL for this iteration of the loop, and check what we are getting:
    URL = url_base.format(i)
    print(URL)

    # // Request the JSON data from the URL:
    data = requests.get(URL).json()
    print(data)

    # // Set the filename, and check what we are getting:
    fileName = file_base.format(i)
    print(fileName)

    # // Print a marker so that we can clearly see where the output for this iteration ends:
    print("------Iteration Ends--------")

    # /// Save the file:
    with open(fileName, 'w', encoding='utf-8') as f:
        json.dump(data, f, ensure_ascii=False, indent=4)

    # /// Download the file to local machine:
    files.download(fileName)


------Iteration Starts--------
PCEPI
https://api.stlouisfed.org/fred/series/observations?series_id=PCEPI&api_key=22ee7a76e736e32f54f5df0a7171538d&file_type=json
{'realtime_start': '2024-02-14', 'realtime_end': '2024-02-14', 'observation_start': '1600-01-01', 'observation_end': '9999-12-31', 'units': 'lin', 'output_type': 1, 'file_type': 'json', 'order_by': 'observation_date', 'sort_order': 'asc', 'count': 780, 'offset': 0, 'limit': 100000, 'observations': [{'realtime_start': '2024-02-14', 'realtime_end': '2024-02-14', 'date': '1959-01-01', 'value': '15.164'}, {'realtime_start': '2024-02-14', 'realtime_end': '2024-02-14', 'date': '1959-02-01', 'value': '15.179'}, {'realtime_start': '2024-02-14', 'realtime_end': '2024-02-14', 'date': '1959-03-01', 'value': '15.189'}, {'realtime_start': '2024-02-14', 'realtime_end': '2024-02-14', 'date': '1959-04-01', 'value': '15.219'}, {'realtime_start': '2024-02-14', 'realtime_end': '2024-02-14', 'date': '1959-05-01', 'value': '15.227'}, {'realtime_sta

NameError: name 'files' is not defined
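
In the loop above, one error stops every remaining download. A possible refinement, sketched below, is to catch the error for each series, record it, and carry on. To keep the sketch runnable without a network connection or an API key, the download step is passed in as a function and a fake one is used in the demonstration. The names `fetch_json`, `download_all` and `fake_fetch` are hypothetical, and `fetch_json` uses the standard library's `urllib` as a stand-in for `requests.get(URL).json()`.

```python
import json
import urllib.request


def fetch_json(url):
    # Standard-library stand-in for requests.get(url).json()
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.loads(response.read().decode('utf-8'))


def download_all(series_ids, url_base, fetch=fetch_json):
    # Returns the data we managed to get, plus a list of what failed.
    results = {}
    failures = []
    for series in series_ids:
        url = url_base.format(series)
        try:
            results[series] = fetch(url)
        except Exception as error:
            # Record the problem and move on to the next series.
            failures.append((series, str(error)))
    return results, failures


# Demonstration with a fake fetcher, so no real download takes place:
def fake_fetch(url):
    if "BROKEN" in url:
        raise ValueError("simulated download failure")
    return {'url': url, 'observations': []}


results, failures = download_all(
    ['PCEPI', 'BROKEN', 'UNRATE'],
    "https://example.com/observations?series_id={}",
    fetch=fake_fetch,
)
print(sorted(results))
print(failures)
```

With a real base URL, calling `download_all(fredSeries, url_base)` would use `fetch_json` by default.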

**Motivation**: The ONS API is more complex than FRED since it has two parts that change: the series, and the dataset. This is common to lots of APIs, including the Economics Observatory API. This tutorial discusses how to use the string format method with numbered or named placeholders in order to form usable API URLs.

In [17]:
# // The basic ONS API looks like this:
"https://api.ons.gov.uk/timeseries/L55O/dataset/MM23/data"

# // Click on the link above and you will see data on inflation.

# // There are TWO things that could change here, the series and the dataset.

# // Using our placeholders we can write this as:
"https://api.ons.gov.uk/timeseries/{}/dataset/{}/data"

# // We will need a way to differentiate between the two placeholders.
# // This can be done using numbers or words
"https://api.ons.gov.uk/timeseries/{1}/dataset/{2}/data"

# // Our challenge now is to fill spaces {1} and {2} with the things that we want.

# // Let's try this to build some logic:

url_base = "https://api.ons.gov.uk/timeseries/{1}/dataset/{2}/data"

URL = url_base.format("mySeries", "myDataset")
print(URL)

## Why does this not work?

IndexError: Replacement index 2 out of range for positional args tuple

In [18]:
## Remember that positions start at 0:

url_base = "https://api.ons.gov.uk/timeseries/{1}/dataset/{2}/data"

URL2 = url_base.format("mySeries", "myDataset", "apple")
print(URL2)

## Q.  What will this code produce?


https://api.ons.gov.uk/timeseries/myDataset/dataset/apple/data


In [19]:
## Remember that positions start at 0:

url_base = "https://api.ons.gov.uk/timeseries/{0}/dataset/{1}/data"

URL3 = url_base.format("L55O", "MM23")
print(URL3)

## Q.  Run this code and test that the API works...
## ! Note that 0 and O are different!


https://api.ons.gov.uk/timeseries/L55O/dataset/MM23/data
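
The cells above use numbers inside the placeholders. The format method also accepts words, which avoids having to remember which position is which. This is a short sketch; the placeholder names `series` and `dataset` are my own choice and can be anything.

```python
url_base = "https://api.ons.gov.uk/timeseries/{series}/dataset/{dataset}/data"

# Each placeholder is filled by the argument with the matching name:
URL = url_base.format(series="L55O", dataset="MM23")
print(URL)

# Because the arguments are named, the order we write them in does not matter:
URL_swapped = url_base.format(dataset="MM23", series="L55O")
print(URL == URL_swapped)
```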


In [20]:
## Using this practically:

url_base = "https://api.ons.gov.uk/timeseries/{0}/dataset/{1}/data"
codes = [['A', 'B'], ['C', 'D']]

URL1 = url_base.format(codes[0][0], codes[1][1])
print(URL1)


https://api.ons.gov.uk/timeseries/A/dataset/D/data


In [21]:
## Using this practically:

url_base = "https://api.ons.gov.uk/timeseries/{0}/dataset/{1}/data"
codes = [['L55O', 'MM23'], ['LZVB', 'PRDY']]

URL1 = url_base.format(codes[0][0], codes[0][1])
URL2 = url_base.format(codes[1][0], codes[1][1])
print(URL1)
print(URL2)

## Discussion: how to practically make this work for policy purposes, using GitHub to store a list of codes.

https://api.ons.gov.uk/timeseries/L55O/dataset/MM23/data
https://api.ons.gov.uk/timeseries/LZVB/dataset/PRDY/data


In [22]:
## Using this practically:

url_base = "https://api.ons.gov.uk/timeseries/{0}/dataset/{1}/data"
longList = [['L55O', 'MM23'], ['LZVB', 'PRDY']]

for i in range(2):
    print(i)
    codes = longList[i]
    print(codes)
    URL = url_base.format(codes[0], codes[1])
    print(URL)

# Or alternatively:
print("\n")

for i in range(2):
    URL2 = url_base.format(longList[i][0], longList[i][1])
    print(URL2)

# Discussion: How can we use the example above practically to maintain data sets we use regularly using a simple CSV on GitHub?


0
['L55O', 'MM23']
https://api.ons.gov.uk/timeseries/L55O/dataset/MM23/data
1
['LZVB', 'PRDY']
https://api.ons.gov.uk/timeseries/LZVB/dataset/PRDY/data


https://api.ons.gov.uk/timeseries/L55O/dataset/MM23/data
https://api.ons.gov.uk/timeseries/LZVB/dataset/PRDY/data
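
On the discussion point above, one way to approach it is to keep the codes in a simple CSV with one row per series, and let the loop read from that. The sketch below uses a small piece of CSV text written into the code, with made-up column names `series` and `dataset`, so that it runs without any download. A file stored on GitHub could be handled in the same way once its text has been fetched.

```python
import csv
import io

# Hypothetical CSV contents: a header row, then one row per series.
csv_text = """series,dataset
L55O,MM23
LZVB,PRDY
"""

url_base = "https://api.ons.gov.uk/timeseries/{0}/dataset/{1}/data"

# Read the rows into a list of [series, dataset] pairs, skipping the header:
reader = csv.reader(io.StringIO(csv_text))
next(reader)
longList = [row for row in reader]

# Each row has two items, so the loop can give each of them its own name:
urls = []
for series, dataset in longList:
    urls.append(url_base.format(series, dataset))

for url in urls:
    print(url)
```

Adding a new series to the dashboard then means adding one row to the CSV, with no change to the code.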
