<a href="https://colab.research.google.com/github/EconomicsObservatory/courses/blob/main/3/s3_workbook.ipynb" target="_blank"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**Richard Davies** Data Science Masterclass - 2024

## 1. Python basics

In this notebook we will learn some Python basics and useful techniques for iteration (loops), including:
- basic variable types, such as strings and lists
- looping over lists
- using loops to batch download data from an API.

### 1.1 Assigning and viewing variables

In Python, we can declare a variable and assign it a value using the assignment operator =.

In [None]:
x = 10        # Assign variable x the integer value 10
y = x + 5     # Assign variable y the value of x + 5
z = "dog"     # Assign variable z the string value "dog"

We can print these variables to check their values:

In [None]:
print(x, y, z)

10 15 dog
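
As a small illustrative sketch (not part of the original workbook), the built-in `type()` function shows what kind of value each variable holds, and reassigning a variable replaces its old value. Note that `y` is not recalculated when `x` changes.

```python
# Same variables as in the cell above.
x = 10
y = x + 5
z = "dog"

# type() reports the kind of value each variable currently holds.
print(type(x))   # <class 'int'>
print(type(z))   # <class 'str'>

# Variables can be reassigned; the old value is simply replaced.
x = 20
print(x, y)      # y keeps the value it was given earlier: 20 15
```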


### 1.2 Lists

Lists are one of the most versatile data types in Python. A list is written as comma-separated values (items) between square brackets. Just like with numbers or strings, we can assign a list to a variable using =.

In [None]:
locations = ["Swansea", "Cardiff", "Newport"]   # Creating a list of locations

# We have a list of locations, let's print these out
print(locations)

['Swansea', 'Cardiff', 'Newport']


If we want to retrieve individual items in the list, we use indexing.

**Note:** One rule to remember is that indexing starts at 0, so the list above has positions 0, 1 and 2. Asking for position 3 (which might seem to be Newport) will throw an error.

In [None]:
print(locations[0])
print(locations[1])
print(locations[2])

Swansea
Cardiff
Newport
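
To make the note about position 3 concrete, here is a minimal sketch of what happens when we ask for an index that does not exist. The `try`/`except` wrapper and the `error_message` variable are additions for illustration only; they let the cell keep running after the error.

```python
locations = ["Swansea", "Cardiff", "Newport"]

print(len(locations))    # 3 items, at positions 0, 1 and 2
print(locations[-1])     # negative indexes count from the end: Newport

# Position 3 does not exist, so Python raises an IndexError.
error_message = None
try:
    print(locations[3])
except IndexError as err:
    error_message = str(err)
    print("IndexError:", error_message)
```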


### 1.3 Loops

Any time we have repetitive code like the print statements above, we should consider a loop. This is not just to show off. Manually copying code like the above leads to errors, and it is time-consuming. Loops make you more accurate and more efficient.

With the for loop we can execute a set of statements, once for each item in a list.

In [None]:
## Here is our first loop:

locations = ["Darlington", "London", "Newport"]

for location in locations:
  print(location)

Darlington
London
Newport


**Note:** Often in tutorials, you will see singular/plural names used. e.g. ```for fruit in fruits```, where `fruits` is a list of fruit names.

The names themselves can be ignored; it is what they represent that matters. In the first line we define a list of locations. The name that comes after the `for` is just an identifier that takes the value of each list item in turn.

In [None]:
placeList = ["Darlington", "Newport", "London"]
for i in placeList:
  print(i)

Darlington
Newport
London


The names are different but the output is exactly the same.
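
If we also want to know which position we are at in the list, one option is the built-in `enumerate()` function. This is a small sketch; the names `place_list`, `position`, `place` and `labelled` are chosen here for illustration.

```python
place_list = ["Darlington", "Newport", "London"]

# enumerate() yields (position, item) pairs, starting from position 0.
labelled = []
for position, place in enumerate(place_list):
    labelled.append((position, place))
    print(position, place)
```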

### 1.4 String formatting

To get the most out of loops, we need to be able to dynamically change strings in each iteration. This will be useful later when we want to make different API requests and save data using different names in each loop iteration.

In [None]:
# Preliminaries 1 - the format() method:
# See: https://www.w3schools.com/python/ref_string_format.asp

# Take a sentence, and put a placeholder {} where we want to insert something:
sentence = "The best rugby team in the world is {}"
# Now we can use .format() to insert something into this place:
sentence.format('Wales')

'The best rugby team in the world is Wales'

**Note:** the `format()` method is a pre-defined piece of code that you must use. But `sentence` is just a variable name, it can be anything.

We can also use the `format()` method with a variable.

In [None]:
# Store the team name in a variable, then pass that variable to format().
# As before, 'sentence' and 'team' are just variable names; they can be anything:
sentence = "The best football team in the world is {}"
team = 'Manchester United'
sentence.format(team)

'The best football team in the world is Manchester United'
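
Later in this notebook some URLs are built with an f-string rather than `format()`. The sketch below shows that the two approaches produce the same result; the variable names `with_format` and `with_fstring` are made up for this comparison.

```python
team = 'Manchester United'

# 1. The format() method, as used above.
with_format = "The best football team in the world is {}".format(team)

# 2. An f-string: prefix the string with f and put the variable name inside {}.
with_fstring = f"The best football team in the world is {team}"

print(with_format)
print(with_fstring)
print(with_format == with_fstring)   # True: both build the same string
```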

Now, we're ready to use the format method in a loop. We'll iterate through a list, and inject each list item into a pre-defined sentence.

In [None]:
text = "The best team is {}"  # Define a sentence with the {} placeholder.
teams = [
    'Manchester United', 'AC Milan', 'Barcelona', 'PSG', 'Bayern Munich', 'River Plate'
]   # Define a list of team names

# Begin a loop, dealing with each list item one by one.
for i in teams:
    top_team = text.format(i)        # Format `text` with team name
    print(top_team)                  # Print our formatted string


The best team is Manchester United
The best team is AC Milan
The best team is Barcelona
The best team is PSG
The best team is Bayern Munich
The best team is River Plate


---

## 2. Looping through an API: FRED

FRED, or Federal Reserve Economic Data, is an online database with hundreds of thousands of data series, covering both US and international time series.

We'll use the loops and string formatting techniques to batch download data from the FRED API.

**Motivation:** You are asked by your Minister to build a dashboard for the US economy. This must take in 10 important series, each of them plotted with a line chart. The data will need to be re-downloaded each month, meaning that you are manually downloading 120 series per year in order to keep your dashboard up to date. How can we batch process this, so that all downloads are done with one click?

Before writing the loop, we'll import the tools (packages) that we will need. Packages provide additional functionality to Python programming.

In [None]:
# // Opening web sites and retrieving information:
import requests

# // JSON. This helps us make JSON look prettier and easier to read
import json

First, define the FRED series we want to download. We'll create a list of the series codes we want data for. To find more data series, go to [FRED](https://fred.stlouisfed.org/).

**Note:** Since these codes are made up of letters and numbers, they must be string type (i.e. surrounded by " " or ' ' quotes).

In [None]:
fred_series = [
    'PCEPI', 'CPIAUCSL', 'PAYEMS', 'DGS10', 'INDPRO', 'UNRATE',  'LES1252881600Q'
]

for i in fred_series:
  print(i) # Print the series code we're about to download.

  # 1. Create unique api-url for that series
  url = f'https://api.stlouisfed.org/fred/series/observations?series_id={i}&api_key=22ee7a76e736e32f54f5df0a7171538d&file_type=json'
  print(url)

  # # 2. Download data from the URL. Requesting json format.
  # data = requests.get(url).json()

PCEPI
https://api.stlouisfed.org/fred/series/observations?series_id=PCEPI&api_key=22ee7a76e736e32f54f5df0a7171538d&file_type=json
CPIAUCSL
https://api.stlouisfed.org/fred/series/observations?series_id=CPIAUCSL&api_key=22ee7a76e736e32f54f5df0a7171538d&file_type=json
PAYEMS
https://api.stlouisfed.org/fred/series/observations?series_id=PAYEMS&api_key=22ee7a76e736e32f54f5df0a7171538d&file_type=json
DGS10
https://api.stlouisfed.org/fred/series/observations?series_id=DGS10&api_key=22ee7a76e736e32f54f5df0a7171538d&file_type=json
INDPRO
https://api.stlouisfed.org/fred/series/observations?series_id=INDPRO&api_key=22ee7a76e736e32f54f5df0a7171538d&file_type=json
UNRATE
https://api.stlouisfed.org/fred/series/observations?series_id=UNRATE&api_key=22ee7a76e736e32f54f5df0a7171538d&file_type=json
LES1252881600Q
https://api.stlouisfed.org/fred/series/observations?series_id=LES1252881600Q&api_key=22ee7a76e736e32f54f5df0a7171538d&file_type=json
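
The loop above writes the API key directly into the URL string. As a hedged alternative sketch, the query string can be assembled with the standard library's `urlencode`, with the key read from an environment variable. The function name `build_fred_url`, the environment variable name `FRED_API_KEY` and the `YOUR_KEY` placeholder are all assumptions made for this example, not part of the FRED API.

```python
import os
from urllib.parse import urlencode

FRED_ENDPOINT = "https://api.stlouisfed.org/fred/series/observations"

def build_fred_url(series_id, api_key):
    """Return the observations URL for one FRED series."""
    params = {"series_id": series_id, "api_key": api_key, "file_type": "json"}
    # urlencode joins the parameters as key=value pairs separated by &
    return FRED_ENDPOINT + "?" + urlencode(params)

# FRED_API_KEY is an assumed environment variable name;
# "YOUR_KEY" is only a placeholder used when it is not set.
api_key = os.environ.get("FRED_API_KEY", "YOUR_KEY")

for series_id in ["PCEPI", "UNRATE"]:
    print(build_fred_url(series_id, api_key))
```

Keeping the key outside the code means the notebook can be shared without sharing the key itself.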


Second, we'll loop through each item in this list. In each loop iteration:
1. Define the API URL for that specific data series.
2. Use the `requests` package to download the JSON data from the API. Save this data into a variable.
3. Create a unique filename for that data series.
4. Save the JSON data to a separate file for each series.



In [None]:
# // PUTTING IT ALL TOGETHER:

# // Set the base url:
url_base = "https://api.stlouisfed.org/fred/series/observations?series_id={}&api_key=22ee7a76e736e32f54f5df0a7171538d&file_type=json"


# // Set the base fileName:
file_base = "data_FRED-{}.json"

# // Pick the series that I want:
fredSeries = [
    'PCEPI', 'CPIAUCSL', 'PAYEMS', 'DGS10', 'INDPRO', 'UNRATE',
    'LES1252881600Q', 'MANMM101GBM189S'
]

# // Begin a loop, dealing with each series, one by one:
for i in fredSeries:

    # // In what follows below I print the iteration of the loop we are on:
    # // This is not necessary but can be helpful, esp with long loops:
    print("------Iteration Starts--------")
    print(i)

    # // Build the URL for this iteration of the loop, and check what we are getting:
    URL = url_base.format(i)
    print(URL)

    # // Request the data from the URL, and parse the JSON response:
    data = requests.get(URL).json()

    # // Set the filename for this series:
    fileName = file_base.format(i)

    # // Save the file, then confirm where it was saved:
    with open(fileName, 'w', encoding='utf-8') as f:
        json.dump(data, f, ensure_ascii=False, indent=4)
    print('Data saved to', fileName)

    # // Add some white space to our output. (This is purely so we can see what is happening below clearly)
    print("------Iteration Ends--------\n")

------Iteration Starts--------
PCEPI
https://api.allorigins.win/raw?url=https%3A%2F%2Fapi.stlouisfed.org%2Ffred%2Fseries%2Fobservations%3Fseries_id%3DPCEPI%26api_key%3D838a40e4f5a37b6b4d8c9cfc4b1abaff%26file_type%3Djson
Data saved to data_FRED-PCEPI.json
------Iteration Ends--------

------Iteration Starts--------
CPIAUCSL
https://api.allorigins.win/raw?url=https%3A%2F%2Fapi.stlouisfed.org%2Ffred%2Fseries%2Fobservations%3Fseries_id%3DCPIAUCSL%26api_key%3D838a40e4f5a37b6b4d8c9cfc4b1abaff%26file_type%3Djson
Data saved to data_FRED-CPIAUCSL.json
------Iteration Ends--------

------Iteration Starts--------
PAYEMS
https://api.allorigins.win/raw?url=https%3A%2F%2Fapi.stlouisfed.org%2Ffred%2Fseries%2Fobservations%3Fseries_id%3DPAYEMS%26api_key%3D838a40e4f5a37b6b4d8c9cfc4b1abaff%26file_type%3Djson
Data saved to data_FRED-PAYEMS.json
------Iteration Ends--------

------Iteration Starts--------
DGS10
https://api.allorigins.win/raw?url=https%3A%2F%2Fapi.stlouisfed.org%2Ffred%2Fseries%2Fobservatio

In [None]:
fred_series = ['LNS14024887', 'LNS14000089', 'LNS14000091', 'LNS14000093', 'LNU04000095']

for i in fred_series:
  print(i) # Print the series code we're about to download.

  # 1. Create unique api-url for that series
  url = f'https://api.stlouisfed.org/fred/series/observations?series_id={i}&api_key=22ee7a76e736e32f54f5df0a7171538d&file_type=json'
  print(url)

  # 2. Download data from the URL. Requesting json format.
  data = requests.get(url).json()

LNS14024887
https://api.stlouisfed.org/fred/series/observations?series_id=LNS14024887&api_key=22ee7a76e736e32f54f5df0a7171538d&file_type=json
LNS14000089
https://api.stlouisfed.org/fred/series/observations?series_id=LNS14000089&api_key=22ee7a76e736e32f54f5df0a7171538d&file_type=json
LNS14000091
https://api.stlouisfed.org/fred/series/observations?series_id=LNS14000091&api_key=22ee7a76e736e32f54f5df0a7171538d&file_type=json
LNS14000093
https://api.stlouisfed.org/fred/series/observations?series_id=LNS14000093&api_key=22ee7a76e736e32f54f5df0a7171538d&file_type=json
LNU04000095
https://api.stlouisfed.org/fred/series/observations?series_id=LNU04000095&api_key=22ee7a76e736e32f54f5df0a7171538d&file_type=json


Next steps to turn this data into a chart:
1. Go to the Colab Files tab and download one of the FRED series JSON files.
2. Go to your GitHub repository, and upload this file.
3. Get the example chart template, then replace the data url with your raw JSON file link.
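
Before uploading a file, it can help to open it and check the contents. The sketch below writes a small made-up sample to a temporary file and reads it back. It assumes the saved JSON has an `observations` list whose entries carry `date` and `value` strings, with `"."` marking a missing value; the sample numbers and the file name `data_FRED-EXAMPLE.json` are invented for illustration.

```python
import json
import os
import tempfile

# Made-up sample in the assumed shape of a saved FRED file (not real data).
sample = {
    "observations": [
        {"date": "2024-01-01", "value": "3.7"},
        {"date": "2024-02-01", "value": "."},
        {"date": "2024-03-01", "value": "3.8"},
    ]
}

# Write the sample to a temporary file, the same way the loop saves its data.
file_name = os.path.join(tempfile.gettempdir(), "data_FRED-EXAMPLE.json")
with open(file_name, "w", encoding="utf-8") as f:
    json.dump(sample, f, ensure_ascii=False, indent=4)

# Read the file back in and keep only the observations that have a number.
with open(file_name, "r", encoding="utf-8") as f:
    data = json.load(f)

values = []
for obs in data["observations"]:
    if obs["value"] != ".":
        values.append((obs["date"], float(obs["value"])))

print(values)
```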

---

## 3. Looping through an API: ECO

Batch downloading data is a very useful tool. However, as we've shown, we can use APIs to pull data directly into charts, skipping the downloading and uploading steps.

Now, we'll create a loop to build different API URLs, and use these URLs directly within the Vega-Lite editor to create a chart.

**Note:** go to the Economics Observatory [data-hub](https://www.economicsobservatory.com/data-hub) to find out which countries and data series are available on the ECO API.

ECO API structure: https://api.economicsobservatory.com/{3_letter_country_code}/{series_code}


For example: to get inflation data for the UK, we would use this url:
- https://api.economicsobservatory.com/gbr/infl

For France, we get inflation data at this url:
- https://api.economicsobservatory.com/fra/infl


For this task, we'll produce the URL endpoints for unemployment for a selection of countries. The unemployment rate series code is `unem`.

In [None]:
# Define our base url with the {} placeholder for the country code.
base_url = 'https://api.economicsobservatory.com/{}/unem'

# Create a list of countries we want to get data for
countries = ['gbr', 'usa', 'can', 'aus']

for i in countries:
  print(base_url.format(i))   # One {} placeholder, so one argument

https://api.economicsobservatory.com/gbr/unem
https://api.economicsobservatory.com/usa/unem
https://api.economicsobservatory.com/can/unem
https://api.economicsobservatory.com/aus/unem
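
Printing the URLs is useful for copying them into a chart. If we want to reuse them later in the notebook, one option is to store them as we go. The sketch below keeps each URL in a dictionary under its country code; the names `urls` and `country` are chosen for this example.

```python
base_url = 'https://api.economicsobservatory.com/{}/unem'
countries = ['gbr', 'usa', 'can', 'aus']

# Store each URL under its country code so it can be looked up later.
urls = {}
for country in countries:
    urls[country] = base_url.format(country)

print(urls['can'])
```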


**Advanced example:** Loop within a loop.

We'll use a nested for loop to iterate through different countries and different series codes.


In [None]:
base_url = 'https://api.economicsobservatory.com/{}/{}'

countries = ['gbr', 'usa', 'fra']

eco_series = ['popu', 'infl', 'grow']

# First, set up country loop
for i in countries:
  # Second, set up series for loop
  for s in eco_series:
    print(base_url.format(i, s))    # Print out each url

https://api.economicsobservatory.com/gbr/popu
https://api.economicsobservatory.com/gbr/infl
https://api.economicsobservatory.com/gbr/grow
https://api.economicsobservatory.com/usa/popu
https://api.economicsobservatory.com/usa/infl
https://api.economicsobservatory.com/usa/grow
https://api.economicsobservatory.com/fra/popu
https://api.economicsobservatory.com/fra/infl
https://api.economicsobservatory.com/fra/grow
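
For reference, the same country and series combinations can be produced with a single loop using `product` from the standard library's `itertools` module. This is an alternative sketch, not a replacement for the nested loop above; the `urls` list is added here so the results can be counted.

```python
from itertools import product

base_url = 'https://api.economicsobservatory.com/{}/{}'
countries = ['gbr', 'usa', 'fra']
eco_series = ['popu', 'infl', 'grow']

# product() pairs every country with every series, in the same order
# as the nested loop: the series changes fastest.
urls = []
for country, series in product(countries, eco_series):
    urls.append(base_url.format(country, series))

print(len(urls))   # 3 countries x 3 series = 9
print(urls[0])
```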


# END

---
---

### Loop Examples

We can iterate through a number sequence by using the ```range``` function. Note that ```range``` starts at 0 by default and stops before the end value. With three arguments, the third is the step size.

In [None]:
for i in range(6):
  print(i)

0
1
2
3
4
5


In [None]:
for i in range(1, 6):
  print(i)

1
2
3
4
5


In [None]:
for i in range(0, 10, 4):
  print(i)

0
4
8
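
As a short sketch tying these examples together, wrapping `range` in `list()` shows the whole sequence at once, and `range(len(...))` gives the valid index positions of a list. The `locations` list is reused from section 1.2.

```python
# list() turns each range into a list so we can see all its values.
print(list(range(6)))          # [0, 1, 2, 3, 4, 5]
print(list(range(1, 6)))       # [1, 2, 3, 4, 5]
print(list(range(0, 10, 4)))   # [0, 4, 8]

# range(len(...)) produces the index positions 0, 1 and 2.
locations = ["Swansea", "Cardiff", "Newport"]
for i in range(len(locations)):
    print(i, locations[i])
```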
