# Pulling data from (open) REST APIs

[Big source of public APIs](https://rapidapi.com/collection/list-of-free-apis)

In lecture 1 we saw how to use `requests` to fetch a webpage:

In [1]:
import requests
r = requests.get('https://www.cnn.com')
r.text[0:300]

'  <!DOCTYPE html>\n<html lang="en" data-uri="cms.cnn.com/_pages/clg34ol9u000047nodabud1o2@published" data-layout-uri="cms.cnn.com/_layouts/layout-homepage/instances/homepage-domestic@published">\n  <head><style>body,h1,h2,h3,h4,h5{font-family:cnn_sans_display,helveticaneue,Helvetica,Arial,Utkal,sans-s'

If the URL points to a page that returns HTML, we say that we are fetching a webpage. If instead the URL returns data in some structured form, we say that we are accessing a *REST* API.

**REST** stands for REpresentational State Transfer. Whenever you come across the term REST, simply think of it as a "web service for accessing data, not HTML."

We will be extracting data from web servers that intentionally offer convenient data access points through URLs. The necessary information to obtain this data typically includes:

* Base URL, including machine name, port number, and "file" path
* The names and values of parameters
* What data comes back and in what format (XML, JSON, CSV, ...)
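
To make the first two items above concrete, here is a minimal sketch that splits a URL into those pieces using only Python's standard `urllib.parse` module. The URL is the datausa.io one used later in these notes, and the variable names are only for illustration.

```python
from urllib.parse import urlsplit, parse_qs

# URL taken from the datausa.io example later in these notes
url = "https://datausa.io/api/data?drilldowns=Nation&measures=Population"

parts = urlsplit(url)
print(parts.hostname)  # machine name
print(parts.port)      # None when the URL relies on the default port for its scheme
print(parts.path)      # "file" path

# Parameter names mapped to lists of values
params = parse_qs(parts.query)
print(params)
```

The third item, the format of the data that comes back, cannot be read off the URL; it normally comes from the API's documentation or from inspecting a response.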

## Looking up word definitions

The [dictionaryapi.dev](https://dictionaryapi.dev/) API lets us look up words in various languages and get the definitions. The format of the URL to access the API is just:

```
https://api.dictionaryapi.dev/api/v2/entries/<language_code>/<word>
```
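
As a small sketch of filling in that template, the hypothetical helper `entry_url` below (not part of the API) builds the URL from a language code and a word. It percent-encodes both pieces so that characters such as spaces do not break the path. Whether the service actually has an entry for a given word or phrase is a separate question.

```python
from urllib.parse import quote

BASE = "https://api.dictionaryapi.dev/api/v2/entries"

def entry_url(language_code, word):
    """Fill in the URL template (hypothetical helper for illustration)."""
    # quote() percent-encodes characters that are not safe inside a URL path
    return f"{BASE}/{quote(language_code, safe='')}/{quote(word, safe='')}"

print(entry_url("en_US", "science"))
print(entry_url("en_US", "ice cream"))
```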

So we can get the English definition for *science* like this (and parse the JSON result):

In [8]:
import requests
import json

r = requests.get('https://api.dictionaryapi.dev/api/v2/entries/en_US/science')
data = json.loads(r.text)
r.text[:100]

'[{"word":"science","phonetic":"/ˈsaɪəns/","phonetics":[{"text":"/ˈsaɪəns/","audio":"https://api.dict'

In [13]:
len(data)
#data[0]
#data[1]

2

In [3]:
data[0]['meanings'][0]['definitions']

[{'definition': 'A particular discipline or branch of learning, especially one dealing with measurable or systematic principles rather than intuition or natural ability.',
  'synonyms': [],
  'antonyms': [],
  'example': 'Of course in my opinion Social Studies is more of a science than an art.'},
 {'definition': 'Specifically the natural sciences.',
  'synonyms': [],
  'antonyms': [],
  'example': 'My favorite subjects at school are science, mathematics, and history.'},
 {'definition': 'Knowledge gained through study or practice; mastery of a particular discipline or area.',
  'synonyms': [],
  'antonyms': []},
 {'definition': 'The fact of knowing something; knowledge or understanding of a truth.',
  'synonyms': [],
  'antonyms': []},
 {'definition': 'The collective discipline of study or learning acquired through the scientific method; the sum of knowledge gained from such methods and discipline.',
  'synonyms': [],
  'antonyms': []},
 {'definition': 'Knowledge derived from scientific

In [4]:
data = json.loads(r.text)
print(data[0]['word'], data[0]['phonetic'])
data

science /ˈsaɪəns/


[{'word': 'science',
  'phonetic': '/ˈsaɪəns/',
  'phonetics': [{'text': '/ˈsaɪəns/',
    'audio': 'https://api.dictionaryapi.dev/media/pronunciations/en/science-1-ca.mp3',
    'sourceUrl': 'https://commons.wikimedia.org/w/index.php?curid=189581',
    'license': {'name': 'BY-SA 3.0',
     'url': 'https://creativecommons.org/licenses/by-sa/3.0'}},
   {'text': '/ˈsaɪəns/',
    'audio': 'https://api.dictionaryapi.dev/media/pronunciations/en/science-1-us.mp3',
    'sourceUrl': 'https://commons.wikimedia.org/w/index.php?curid=189574',
    'license': {'name': 'BY-SA 3.0',
     'url': 'https://creativecommons.org/licenses/by-sa/3.0'}}],
  'meanings': [{'partOfSpeech': 'noun',
    'definitions': [{'definition': 'A particular discipline or branch of learning, especially one dealing with measurable or systematic principles rather than intuition or natural ability.',
      'synonyms': [],
      'antonyms': [],
      'example': 'Of course in my opinion Social Studies is more of a science than an a
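
The parsed result is a list of entries, each holding a list of meanings, each holding a list of definitions. The sketch below walks that nesting on a small hand-copied sample with the same shape as the output above, so it runs without a network connection. `list_definitions` is a hypothetical helper name, and the sample is abbreviated.

```python
# Abbreviated, hand-copied sample with the same shape as the parsed response above
sample = [
    {
        "word": "science",
        "phonetic": "/ˈsaɪəns/",
        "meanings": [
            {
                "partOfSpeech": "noun",
                "definitions": [
                    {"definition": "Specifically the natural sciences.",
                     "synonyms": [], "antonyms": [],
                     "example": "My favorite subjects at school are science, mathematics, and history."},
                    {"definition": "The fact of knowing something; knowledge or understanding of a truth.",
                     "synonyms": [], "antonyms": []},
                ],
            }
        ],
    }
]

def list_definitions(entries):
    """Return (part of speech, definition, example or None) tuples."""
    rows = []
    for entry in entries:
        for meaning in entry.get("meanings", []):
            for d in meaning.get("definitions", []):
                # 'example' is missing from some definitions, so use .get()
                rows.append((meaning.get("partOfSpeech"), d["definition"], d.get("example")))
    return rows

for pos, text, example in list_definitions(sample):
    print(f"{pos}: {text}")
```

Using `.get()` for `example` matters because, as the output above shows, not every definition carries one.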

**Exercise**: Use the API to fetch and print the definition of *frippery* in English.

## Public Government Data from datausa.io

"The definitive place to explore US public data" [https://datausa.io/]
 
A REST data API is available through this URL template:

```
URL = f"https://datausa.io/api/data?drilldowns={level}&measures={measure}&year={year}" # for some level, measure, and year.
```
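
Instead of pasting values into the string by hand, the query part of the URL can be assembled from a dictionary. This is a minimal sketch using the standard library's `urlencode`; the parameter values are the ones used in these notes, and the names `base_url` and `params` are only for illustration.

```python
from urllib.parse import urlencode

base_url = "https://datausa.io/api/data"
params = {"drilldowns": "Nation", "measures": "Population", "year": "latest"}

# urlencode joins name=value pairs with '&' and escapes special characters
url = f"{base_url}?{urlencode(params)}"
print(url)
```

`requests.get` also accepts such a dictionary through its `params` argument, as in `requests.get(base_url, params=params)`, and performs the encoding itself.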

In [19]:
level = "Nation"
measure = "Population"
year = "latest"
URL = f"https://datausa.io/api/data?drilldowns={level}&measures={measure}&year={year}"
    
r = requests.get(URL)
data = json.loads(r.text)

print(json.dumps(data)[0:1000])

{"data": [{"ID Nation": "01000US", "Nation": "United States", "ID Year": 2022, "Year": "2022", "Population": 331097593, "Slug Nation": "united-states"}], "source": [{"measures": ["Population"], "annotations": {"source_name": "Census Bureau", "source_description": "The American Community Survey (ACS) is conducted by the US Census and sent to a portion of the population every year.", "dataset_name": "ACS 5-year Estimate", "dataset_link": "http://www.census.gov/programs-surveys/acs/", "table_id": "B01003", "topic": "Diversity", "subtopic": "Demographics"}, "name": "acs_yg_total_population_5", "substitutions": []}]}


In [20]:
data['data']

[{'ID Nation': '01000US',
  'Nation': 'United States',
  'ID Year': 2022,
  'Year': '2022',
  'Population': 331097593,
  'Slug Nation': 'united-states'}]

Dropping the `year` parameter from the URL returns the data for every available year:

In [21]:
level = "Nation"
measure = "Population"
URL = f"https://datausa.io/api/data?drilldowns={level}&measures={measure}"
    
r = requests.get(URL)
data = json.loads(r.text)

print(json.dumps(data)[0:1000])

{"data": [{"ID Nation": "01000US", "Nation": "United States", "ID Year": 2022, "Year": "2022", "Population": 331097593, "Slug Nation": "united-states"}, {"ID Nation": "01000US", "Nation": "United States", "ID Year": 2021, "Year": "2021", "Population": 329725481, "Slug Nation": "united-states"}, {"ID Nation": "01000US", "Nation": "United States", "ID Year": 2020, "Year": "2020", "Population": 326569308, "Slug Nation": "united-states"}, {"ID Nation": "01000US", "Nation": "United States", "ID Year": 2019, "Year": "2019", "Population": 324697795, "Slug Nation": "united-states"}, {"ID Nation": "01000US", "Nation": "United States", "ID Year": 2018, "Year": "2018", "Population": 322903030, "Slug Nation": "united-states"}, {"ID Nation": "01000US", "Nation": "United States", "ID Year": 2017, "Year": "2017", "Population": 321004407, "Slug Nation": "united-states"}, {"ID Nation": "01000US", "Nation": "United States", "ID Year": 2016, "Year": "2016", "Population": 318558162, "Slug Nation": "united

This website returns JSON, which is easy to load and dump using Python's standard `json` package, as the code above shows. As before, you can grab one of the elements using dictionary-like indexing:

In [22]:
results = data['data']
results[0:2]

[{'ID Nation': '01000US',
  'Nation': 'United States',
  'ID Year': 2022,
  'Year': '2022',
  'Population': 331097593,
  'Slug Nation': 'united-states'},
 {'ID Nation': '01000US',
  'Nation': 'United States',
  'ID Year': 2021,
  'Year': '2021',
  'Population': 329725481,
  'Slug Nation': 'united-states'}]
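
The load and dump steps can also be tried offline. The sketch below parses one record copied from the response above and then writes it back out as indented JSON text; the variable names are only for illustration.

```python
import json

# One record copied from the response above, as JSON text
text = ('{"data": [{"ID Nation": "01000US", "Nation": "United States", '
        '"ID Year": 2022, "Year": "2022", "Population": 331097593, '
        '"Slug Nation": "united-states"}]}')

parsed = json.loads(text)              # JSON text -> Python dicts and lists
record = parsed["data"][0]
print(record["Population"])            # an int, because the JSON value is a number
print(type(record["Year"]).__name__)   # a str, because the JSON value is quoted

pretty = json.dumps(parsed, indent=2)  # Python objects -> indented JSON text
print(pretty)
```

Note that `Year` arrives as a string while `ID Year` is a number, which matters if you later sort or do arithmetic on years.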

It is convenient to look at the records in a data frame:

In [23]:
import pandas as pd
# results is a list of records (dicts), which the DataFrame constructor accepts directly
pd.DataFrame(results)

| | ID Nation | Nation | ID Year | Year | Population | Slug Nation |
|---|---|---|---|---|---|---|
| 0 | 01000US | United States | 2022 | 2022 | 331097593 | united-states |
| 1 | 01000US | United States | 2021 | 2021 | 329725481 | united-states |
| 2 | 01000US | United States | 2020 | 2020 | 326569308 | united-states |
| 3 | 01000US | United States | 2019 | 2019 | 324697795 | united-states |
| 4 | 01000US | United States | 2018 | 2018 | 322903030 | united-states |
| 5 | 01000US | United States | 2017 | 2017 | 321004407 | united-states |
| 6 | 01000US | United States | 2016 | 2016 | 318558162 | united-states |
| 7 | 01000US | United States | 2015 | 2015 | 316515021 | united-states |
| 8 | 01000US | United States | 2014 | 2014 | 314107084 | united-states |
| 9 | 01000US | United States | 2013 | 2013 | 311536594 | united-states |
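
The records come back newest first, with the year stored as a string. As a rough sketch of tidying them before analysis, the code below takes three records copied from the results above (other keys omitted), sorts them oldest first, and computes the year-over-year change. It uses only plain Python; the names `by_year` and `changes` are only for illustration.

```python
# Three records copied from the results above (other keys omitted for brevity)
results = [
    {"Year": "2022", "Population": 331097593},
    {"Year": "2021", "Population": 329725481},
    {"Year": "2020", "Population": 326569308},
]

# Sort oldest first and convert the year strings to ints
ordered = sorted(results, key=lambda r: int(r["Year"]))
by_year = {int(r["Year"]): r["Population"] for r in ordered}

# Year-over-year change in population
years = list(by_year)
changes = {curr: by_year[curr] - by_year[prev] for prev, curr in zip(years, years[1:])}
for year, delta in changes.items():
    print(year, delta)
```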
