In [18]:
import json
# !pip install -Uq requests
import requests
!mkdir -p 
# data

# Using an API (Application Programming Interface)

Learning outcomes

- the difference between an API & web scraping
- what JSON is (and why it's like a Python `dict`)
- how to properly handle files in Python
- what a REST API is
- how to use the `requests` library

## API versus web-scraping

**Both are ways to sample data from the internet**

API
- structured
- provided as a service (you are talking to a server via a REST API)
- limited data / rate limits / paid / require auth (sometimes)
- most will give back JSON (maybe XML or CSV)

Web scraping
- less structure
- parsing HTML meant for your browser

Neither is better than the other

- API developer can limit what data is accessible through the API
- API developer may stop maintaining the API
- website page can change HTML structure
- website page can have dynamic (JavaScript) content that requires execution (usually done by the browser) before the correct HTML is available

Much of the work in using an API is figuring out how to properly construct URLs for `GET` requests
- requires looking at their documentation (& ideally a Python example!)
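
To make the parts of a request URL concrete, here is a small sketch that takes one apart using only the standard library. The URL is the Chronicling America search URL used later in this notebook; the variable names are just illustrative.

```python
from urllib.parse import urlsplit, parse_qs

url = "https://chroniclingamerica.loc.gov/search/pages/results/?proxtext=germany&format=json"

parts = urlsplit(url)
print(parts.scheme)   # protocol
print(parts.netloc)   # server name
print(parts.path)     # resource on the server
print(parts.query)    # everything after the `?`

# parse_qs turns the query string into a dict of lists (a parameter can repeat)
params = parse_qs(parts.query)
print(params)
```

Reading an API's documentation is mostly about learning which paths and which query parameters the server understands.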

## Where to find APIs

- [ProgrammableWeb](https://www.programmableweb.com/apis/directory) - a collection of available APIs
- Look for the *Developer* or *For Developers* documentation on your favourite website
- [public-apis/public-apis](https://github.com/public-apis/public-apis)

## Using APIs

Most APIs require authentication

- so the API developer knows who you are
- can charge you
- can limit access
- commonly via key or OAuth (both of which may be free)

All the APIs we use here are unauthenticated - this saves everyone the time of signing up

If an API requires authentication, it's usually done by passing your credentials with the request (for example as a header)

```python
response = requests.get(url, auth=auth)
```
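
As a rough illustration of what "credentials as a header" means: when `auth` is a `(username, password)` tuple, `requests` uses HTTP Basic authentication, which sends an `Authorization` header containing the base64-encoded `username:password`. The sketch below builds that header by hand with the standard library. The helper name `basic_auth_header` and the credentials are made up for the example, and many APIs expect an API key in a custom header or query parameter instead, so check the documentation of the API you are using.

```python
import base64


def basic_auth_header(username, password):
    """Build the header used by HTTP Basic authentication (hypothetical helper)."""
    raw = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return {"Authorization": f"Basic {token}"}


# made-up credentials, for illustration only
headers = basic_auth_header("user", "secret")
print(headers)

# base64 is an encoding, not encryption - anyone can reverse it
print(base64.b64decode(headers["Authorization"].split()[1]))
```

Because the encoding is trivially reversible, credentials like this should only be sent over HTTPS.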

## JSON strings

JSON (JavaScript Object Notation) is a:
- lightweight data-interchange format (text)
- easy for humans to read and write 
- easy for machines to parse and generate
- based on key, value pairs

You can think of a Python `dict` as JSON-like:

In [6]:
data = {'name': 'alan-turing'}
data

{'name': 'alan-turing'}

But true JSON is just a string (only text).  We can use `json.dumps` from the standard library to turn the `dict` into a JSON string:

In [7]:
data = json.dumps(data)
data

'{"name": "alan-turing"}'

In [8]:
type(data)

str

We can then use `json.loads` to turn this string back into a `dict`:

In [9]:
json.loads(data)

{'name': 'alan-turing'}
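
JSON and `dict` are close but not identical. The sketch below (using a made-up `record`) shows a few differences that appear on a round trip: Python's `False` and `None` are written as `false` and `null`, and a tuple comes back as a list.

```python
import json

record = {
    "name": "alan-turing",
    "born": 1912,
    "fields": ("mathematics", "computing"),  # a tuple
    "alive": False,
    "nickname": None,
}

text = json.dumps(record)
print(text)  # note `false`, `null` and the square brackets

restored = json.loads(text)
print(type(restored["fields"]))  # the tuple has become a list
print(restored == record)        # so the round trip is not exact
```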

## Opening, reading & writing to files

### Reading from a file

We open files using the Python `open` builtin function, followed by a `read`:

In [11]:
import os

In [13]:
cd ..\

E:\Project\dsr32


In [14]:
open('./README.md', 'r').read()[:100]

'# dsr32\nThis repo is used to keep track of all classes at Data Science Retreat: https://datasciencer'

(Note the `./path` - the `.` refers to the current working directory. This could be where the notebook is, or where the notebook server is running.)

If we wanted to read the file as separate lines, we could use `readlines()` (note we would still need to manually strip off the `\n` characters later)

(Note `\n` is used as a newline indicator in text files - you never see it because your editor interprets it as a line break :)

In [16]:
open('./README.md', 'r').readlines()[:100]

['# dsr32\n',
 'This repo is used to keep track of all classes at Data Science Retreat: https://datascienceretreat.com/\n',
 '\n',
 '\n',
 '## Introduction to Python\n',
 '\n',
 '### Topics:\n',
 '1. Introduction: \n',
 '  - Data Types \n',
 '  - Conditionals \n',
 '  - For Loops\n',
 '  - PEP8 Coding Standards\n',
 '2. Importing Files & Packages:\n',
 '  - Navigating the File System\n',
 '  - Reading and Saving Files\n',
 '3. Functions:\n',
 '  - Writing Functions \n',
 '  - Parameters and Arguments \n',
 '4. Accessing Data via APIs\n',
 '  - About JSONs \n',
 '  - API intro\n',
 '  - Saving and working with data from an API\n',
 '5. Scripting:\n',
 '  - Moving from Interactive Mode to Script Mode\n',
 '  - Working with Arg Parse\n',
 '  \n',
 '## Second ']
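
A minimal sketch of two ways to get rid of the trailing `\n` characters. It writes a small temporary file so it can run anywhere (the file name and contents are made up), and it uses the `with` statement, which is covered further down.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.md")
    with open(path, "w") as fi:
        fi.write("# title\nfirst line\n\nlast line\n")

    # readlines() keeps the trailing newline on every line
    with open(path, "r") as fi:
        raw_lines = fi.readlines()

    # option 1: strip the newline from each line ourselves
    stripped = [line.rstrip("\n") for line in raw_lines]

    # option 2: read everything and let splitlines() drop the newlines
    with open(path, "r") as fi:
        split = fi.read().splitlines()

print(raw_lines)
print(stripped)
print(split == stripped)
```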

## Using `open`

`open(path, mode)`

Common values for the mode:
- `r` read
- `rb` read binary
- `w` write (creates the file if it doesn't exist, overwrites it if it does)
- `w+` write and read (same creation & overwrite behaviour as `w`)
- `a` append

Note there are options for both reading & writing - we actually use `open` for both reading & writing.

We open a file using the Python builtin `open`, which is then followed either by a read or write stage

- open the file
- read the file OR write to the file

Notice that the file is read in as a single string, with the newline character `\n` separating lines

- this is how all text files are structured
- your editor does the line splitting for you
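
The difference between the modes is easiest to see by trying them. The sketch below works in a temporary folder (so it does not touch your `data` directory) and uses the `with` statement, which is explained further down; the file name is made up.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "modes.txt")

    # `w` creates the file (or empties it if it already exists)
    with open(path, "w") as fi:
        fi.write("first\n")

    # `a` keeps the existing content and adds to the end
    with open(path, "a") as fi:
        fi.write("second\n")

    with open(path, "r") as fi:
        after_append = fi.read()

    # opening with `w` again throws away what was there
    with open(path, "w") as fi:
        fi.write("third\n")

    with open(path, "r") as fi:
        after_rewrite = fi.read()

print(repr(after_append))
print(repr(after_rewrite))
```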

### Writing to a file (without context management)

We can write to a new file using the same `open` builtin

- open the file
- write to the file

In [20]:
cd Intro_to_python

E:\Project\dsr32\Intro_to_python


In [21]:
open('./data/creat_output.data', 'w+').write('We make this file to show how not to do it\n')

43

In [3]:
open('./data/output.data', 'w').write('We make this file to show how not to do it\n')

43

Note that we can do the same file write by explicitly assigning the file object to a variable (note the `a` to append)

In [4]:
fi = open('./data/output.data', 'a')
fi.write('We make this file slightly differently to show how not to do it\n')

64

The issue with the code above is that we aren't closing the file - we can fix this by intentionally closing the file.  

One way to do this is to use `.close()` when we are done:

In [5]:
fi = open('./data/output.data', 'a')
fi.write('This time we close the file manually\n')
fi.close()

This requires us to remember to close (also an additional line).

### Reading files with context management

The Pythonic way of handling opening & closing of files is context management:

In [6]:
with open('./readme.md', 'r') as fi:
    data = fi.read()

The difference with using context management (which essentially just means using the `with` statement) is that it closes the file automatically when finished writing/reading.
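
We can check this claim using the `closed` attribute of the file object. The sketch below uses a temporary file with a made-up name, and also shows that the file is closed when an error interrupts the `with` block.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.txt")

    with open(path, "w") as fi:
        fi.write("hello\n")
        closed_inside = fi.closed   # still open inside the block
    closed_after_with = fi.closed   # closed as soon as the block ends

    try:
        with open(path, "r") as fi2:
            raise ValueError("something went wrong mid-read")
    except ValueError:
        closed_after_error = fi2.closed  # closed even though we never finished

print(closed_inside, closed_after_with, closed_after_error)
```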

### Writing to a file with context management

Using the context management syntax, we can save our `data` dict as JSON, using `json.dump` to write the dict to a file:

In [22]:
data = {'name': 'alan turing'}
with open('./data/output.json', 'w') as fi:
    json.dump(data, fi)

Let's check it worked by loading the file, here using `json.load` to load from the file object:

In [23]:
with open('./data/output.json', 'r') as fi:
    data = json.load(fi)
data

{'name': 'alan turing'}

## REST APIs

[REST - Wiki](https://en.wikipedia.org/wiki/Representational_state_transfer)

REST is a set of constraints that allow **stateless communication of text data on the internet**

- REST = REpresentational State Transfer
- API = Application Programming Interface

REST
- communication of resources (located at URLs / URIs)
- requests for a resource are responded to with a text payload (HTML, JSON etc)
- these requests are made using HTTP (determines how messages are formatted, what actions (methods) can be taken)
- common HTTP methods are `GET` and `POST`

HTTP methods
- GET - retrieve information about the REST API resource
- POST - create a REST API resource
- PUT - update a REST API resource
- DELETE - delete a REST API resource or related component

RESTful APIs enable you to develop any kind of web application having all possible CRUD (create, retrieve, update, delete) operations

- can do anything we would want to do with a database

*Further reading*
- [Web Architecture 101](https://medium.com/storyblocks-engineering/web-architecture-101-a3224e126947) for more detail on how the web works

## Example - sunrise API

Docs - https://sunrise-sunset.org/api

First we need to form the url
- use `?` to separate the API server name from the parameters for our request
- use `&` to separate the parameters from each other
- use `+` instead of space in the parameter
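
The rules above do not have to be applied by hand. As a sketch, `urllib.parse.urlencode` from the standard library joins parameters with `&` and replaces spaces with `+`; the `proxtext` value with a space is a made-up example.

```python
from urllib.parse import urlencode

base = "https://api.sunrise-sunset.org/json"
params = {"lat": "52.5200", "lng": "13.4050"}

# `?` separates the server address from the parameters
url = f"{base}?{urlencode(params)}"
print(url)

# spaces in a value are encoded as `+`
print(urlencode({"proxtext": "new york"}))
```

`requests.get` also accepts a `params` argument (for example `requests.get(base, params=params)`) that performs the same encoding for you.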

In [24]:
# getting sunrise & sunset for Berlin today
res = requests.get("https://api.sunrise-sunset.org/json?lat=52.5200&lng=13.4050")
data = res.json()
data

{'results': {'sunrise': '4:59:08 AM',
  'sunset': '4:55:36 PM',
  'solar_noon': '10:57:22 AM',
  'day_length': '11:56:28',
  'civil_twilight_begin': '4:26:53 AM',
  'civil_twilight_end': '5:27:51 PM',
  'nautical_twilight_begin': '3:46:55 AM',
  'nautical_twilight_end': '6:07:50 PM',
  'astronomical_twilight_begin': '3:05:32 AM',
  'astronomical_twilight_end': '6:49:12 PM'},
 'status': 'OK'}

This response is JSON - `res.json()` (the `json` method of the response object) turns it into a `dict`:

In [25]:
res = requests.get("https://api.sunrise-sunset.org/json?lat=18.5204&lng=73.8567")
data = res.json()
data

{'results': {'sunrise': '12:53:12 AM',
  'sunset': '12:58:02 PM',
  'solar_noon': '6:55:37 AM',
  'day_length': '12:04:50',
  'civil_twilight_begin': '12:32:32 AM',
  'civil_twilight_end': '1:18:43 PM',
  'nautical_twilight_begin': '12:07:13 AM',
  'nautical_twilight_end': '1:44:02 PM',
  'astronomical_twilight_begin': '11:41:51 PM',
  'astronomical_twilight_end': '2:09:23 PM'},
 'status': 'OK'}

In [26]:
type(data)

dict

It's common to have a top-level hierarchy to dig through to get the data:

In [27]:
data.keys()

dict_keys(['results', 'status'])

Here the interesting stuff is in `results`:

In [28]:
data['results']

{'sunrise': '12:53:12 AM',
 'sunset': '12:58:02 PM',
 'solar_noon': '6:55:37 AM',
 'day_length': '12:04:50',
 'civil_twilight_begin': '12:32:32 AM',
 'civil_twilight_end': '1:18:43 PM',
 'nautical_twilight_begin': '12:07:13 AM',
 'nautical_twilight_end': '1:44:02 PM',
 'astronomical_twilight_begin': '11:41:51 PM',
 'astronomical_twilight_end': '2:09:23 PM'}

In [29]:
data['status']

'OK'

### Exercise: 
Save the parameter `sunrise` as a string value by itself in a variable called `sunrise_time`

## Example - Chronicling America API

Docs - https://chroniclingamerica.loc.gov/about/api/ 

In [31]:
term = "germany"
fmt = "json"
url = f"https://chroniclingamerica.loc.gov/search/pages/results/?proxtext={term}&format={fmt}"
url

'https://chroniclingamerica.loc.gov/search/pages/results/?proxtext=germany&format=json'

We use the `requests` HTTP library to perform a `GET` request:

In [33]:
response = requests.get(url)

In [34]:
response

<Response [200]>

## HTTP response

What we received above is an *HTTP response*

[HTTP Response - Wiki](https://en.wikipedia.org/wiki/Hypertext_Transfer_Protocol#Response_message)

The response message consists of the following:

- a status line which includes the status code and reason message (e.g., HTTP/1.1 200 OK, which indicates that the client's request succeeded)
- response header fields (e.g., Content-Type: text/html)
- an optional message body
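
To see those three parts, the sketch below takes apart a small hand-written response message. The `raw` text is made up for illustration (it is not what either API in this notebook sends), and `requests` normally does all of this parsing for us.

```python
import json

# a hand-written HTTP response: status line, headers, blank line, body
raw = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: application/json\r\n"
    "Content-Length: 16\r\n"
    "\r\n"
    '{"status": "OK"}'
)

# the first blank line separates the head from the body
head, _, body = raw.partition("\r\n\r\n")
status_line, *header_lines = head.split("\r\n")

version, code, reason = status_line.split(" ", 2)
headers = dict(line.split(": ", 1) for line in header_lines)

print(version, code, reason)
print(headers)
print(json.loads(body))
```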

## What can we do with this HTTP response in `requests`?

The Python builtin `dir` gives us all the attributes & methods of a Python object.

This also includes all the `__` dunder (literally double-under) methods - which we filter out using a list comprehension.

In [35]:
[item for item in dir(response) if '__' not in item]

['_content',
 '_content_consumed',
 '_next',
 'apparent_encoding',
 'close',
 'connection',
 'content',
 'cookies',
 'elapsed',
 'encoding',
 'headers',
 'history',
 'is_permanent_redirect',
 'is_redirect',
 'iter_content',
 'iter_lines',
 'json',
 'links',
 'next',
 'ok',
 'raise_for_status',
 'raw',
 'reason',
 'request',
 'status_code',
 'text',
 'url']

We can get the HTTP status code (used to communicate things like everything OK (200), stop making requests etc - see [List of HTTP status codes - Wiki](https://en.wikipedia.org/wiki/List_of_HTTP_status_codes)):

In [36]:
response.status_code

200
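
Status codes are grouped by their first digit. The sketch below uses `http.HTTPStatus` from the standard library to look up the reason phrase for a code; the helper `describe_status` is a made-up name for this example.

```python
from http import HTTPStatus


def describe_status(code):
    """Return the code, its standard reason phrase and its broad category."""
    phrase = HTTPStatus(code).phrase
    if code < 200:
        kind = "informational"
    elif code < 300:
        kind = "success"
    elif code < 400:
        kind = "redirect"
    elif code < 500:
        kind = "client error"
    else:
        kind = "server error"
    return f"{code} {phrase} ({kind})"


for code in (200, 404, 429, 503):
    print(describe_status(code))
```

On a `requests` response, `response.ok` and `response.raise_for_status()` (both visible in the `dir` output above) offer a quick way to react to 4xx and 5xx codes without checking the number yourself.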

The HTTP response headers:

In [37]:
response.headers

{'Date': 'Tue, 27 Sep 2022 14:38:33 GMT', 'Content-Type': 'application/json', 'Transfer-Encoding': 'chunked', 'Connection': 'keep-alive', 'Last-Modified': 'Tue, 27 Sep 2022 05:38:49 GMT', 'X-Robots-Tag': 'noindex, nofollow', 'Expires': 'Wed, 28 Sep 2022 05:38:49 GMT', 'Cache-Control': 's-maxage=86400, public, max-age=86400', 'Access-Control-Allow-Origin': '*', 'Access-Control-Allow-Headers': 'X-requested-with', 'X-Frame-Options': 'SAMEORIGIN', 'Content-Encoding': 'gzip', 'Vary': 'Accept-Encoding', 'X-Varnish': '317006243', 'Via': '1.1 varnish (Varnish/5.2)', 'CF-Cache-Status': 'HIT', 'Age': '32385', 'Server': 'cloudflare', 'CF-RAY': '7514fbd2786a58de-TXL'}

And the response body:

In [38]:
response.text[:1000]

'{"totalItems": 2457731, "endIndex": 20, "startIndex": 1, "itemsPerPage": 20, "items": [{"sequence": 11, "county": ["Wayne"], "edition": "Home ed.", "frequency": "Daily", "id": "/lccn/sn88063294/1944-09-12/ed-1/seq-11/", "subject": ["Detroit (Mich.)--Newspapers.", "Michigan--Detroit.--fast--(OCoLC)fst01205010", "Michigan--Wayne County.--fast--(OCoLC)fst01206628", "Wayne County (Mich.)--Newspapers."], "city": ["Detroit"], "date": "19440912", "title": "Detroit evening times.", "end_year": 1958, "note": ["Archived issues are available in digital format from the Library of Congress Chronicling America online collection.", "Description based on: 22nd yr., no. 27 (Nov. 1, 1921).", "Issue called: Michigan centennial ed., May 2, 1937.", "Most Sunday issues for Aug. 6, 1922-Jan. 12, 1958 called: Detroit Sunday times.", "Some issues include supplement called: Goodfellows Special/Old Newsboys ed., published in Dec. of each year <Dce. 16, 1926-Dec. 1948>.", "The word \\"evening\\" appears within o

In [39]:
import json
data = json.loads(response.text)

In [40]:
data

{'totalItems': 2457731,
 'endIndex': 20,
 'startIndex': 1,
 'itemsPerPage': 20,
 'items': [{'sequence': 11,
   'county': ['Wayne'],
   'edition': 'Home ed.',
   'frequency': 'Daily',
   'id': '/lccn/sn88063294/1944-09-12/ed-1/seq-11/',
   'subject': ['Detroit (Mich.)--Newspapers.',
    'Michigan--Detroit.--fast--(OCoLC)fst01205010',
    'Michigan--Wayne County.--fast--(OCoLC)fst01206628',
    'Wayne County (Mich.)--Newspapers.'],
   'city': ['Detroit'],
   'date': '19440912',
   'title': 'Detroit evening times.',
   'end_year': 1958,
   'note': ['Archived issues are available in digital format from the Library of Congress Chronicling America online collection.',
    'Description based on: 22nd yr., no. 27 (Nov. 1, 1921).',
    'Issue called: Michigan centennial ed., May 2, 1937.',
    'Most Sunday issues for Aug. 6, 1922-Jan. 12, 1958 called: Detroit Sunday times.',
    'Some issues include supplement called: Goodfellows Special/Old Newsboys ed., published in Dec. of each year <Dce. 16

We can access the keys of the dictionary:

In [41]:
data.keys()

dict_keys(['totalItems', 'endIndex', 'startIndex', 'itemsPerPage', 'items'])

We can access the values using the square bracket indexing with a key:

In [42]:
data['totalItems']

2457731

While JSON is a simple text format, it can become complex due to

- nesting (JSON inside JSON)
- lists of JSON

An example is our `items`, which has been parsed as a Python `list`:

In [43]:
type(data['items'])

list

`items` is a list of dicts:

In [44]:
item = data['items'][0]
item.keys()

dict_keys(['sequence', 'county', 'edition', 'frequency', 'id', 'subject', 'city', 'date', 'title', 'end_year', 'note', 'state', 'section_label', 'type', 'place_of_publication', 'start_year', 'edition_label', 'publisher', 'language', 'alt_title', 'lccn', 'country', 'ocr_eng', 'batch', 'title_normal', 'url', 'place', 'page'])

We can iterate over both the keys and values as a pair using `items()`.

Below we do a quick check that the value isn't too long before printing:

In [45]:
from collections.abc import Iterable

for k, v in item.items():
    if isinstance(v, Iterable) and len(v) < 100:
        print(f'{k}: {v}')

county: ['Wayne']
edition: Home ed.
frequency: Daily
id: /lccn/sn88063294/1944-09-12/ed-1/seq-11/
subject: ['Detroit (Mich.)--Newspapers.', 'Michigan--Detroit.--fast--(OCoLC)fst01205010', 'Michigan--Wayne County.--fast--(OCoLC)fst01206628', 'Wayne County (Mich.)--Newspapers.']
city: ['Detroit']
date: 19440912
title: Detroit evening times.
note: ['Archived issues are available in digital format from the Library of Congress Chronicling America online collection.', 'Description based on: 22nd yr., no. 27 (Nov. 1, 1921).', 'Issue called: Michigan centennial ed., May 2, 1937.', 'Most Sunday issues for Aug. 6, 1922-Jan. 12, 1958 called: Detroit Sunday times.', 'Some issues include supplement called: Goodfellows Special/Old Newsboys ed., published in Dec. of each year <Dce. 16, 1926-Dec. 1948>.', 'The word "evening" appears within ornament.']
state: ['Michigan']
section_label: 
type: page
place_of_publication: Detroit, Mich
edition_label: REDLINE
publisher: [s.n.]
language: ['English']
alt_ti
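
One side effect of the `isinstance(v, Iterable)` check is that values without a length, such as the integers `sequence` and `end_year`, are skipped entirely. As an alternative sketch, a small helper can shorten any value for printing. The name `preview` and the `sample` record are made up; the record only mimics the kinds of values found in `item`.

```python
def preview(value, limit=60):
    """Return a short, printable version of any value."""
    text = repr(value)
    if len(text) > limit:
        text = text[:limit] + "..."
    return text


# hypothetical record with the same kinds of values as `item`
sample = {
    "sequence": 11,
    "city": ["Detroit"],
    "title": "Detroit evening times.",
    "ocr_eng": "Where the Battle of Germany Will Be Fought " * 10,
}

for key, value in sample.items():
    print(f"{key}: {preview(value)}")
```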

Let's finish this exercise by only taking articles that appear between two years, and save those to disk.

Normally you would apply this kind of filtering in the API request - we are going to filter in memory.

First we need a bit of data cleaning of our date, which looks like an integer but is actually a `str`:

In [46]:
item['date']

'19440912'

Here we use `strptime` to convert the string into a proper datetime:
- ([Python's strftime directives](http://strftime.org/) are a very useful reference!)

In [47]:
from datetime import datetime as dt
dt.strptime(item['date'], "%Y%m%d")

datetime.datetime(1944, 9, 12, 0, 0)
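
A short sketch of what the parsed value gives us: direct access to the year, month and day, and `strftime` to go back to a string in any format. It also shows that `strptime` raises a `ValueError` when the string does not match the format.

```python
from datetime import datetime

parsed = datetime.strptime("19440912", "%Y%m%d")
print(parsed.year, parsed.month, parsed.day)

# strftime goes the other way: datetime -> str
print(parsed.strftime("%Y-%m-%d"))

# a string that doesn't match the format raises ValueError
try:
    datetime.strptime("1944-09-12", "%Y%m%d")
except ValueError as err:
    print(f"could not parse: {err}")
```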

Now let's put this data cleaning & filtering into a pipeline:

In [48]:
term = "germany"
fmt = "json"
url = f"https://chroniclingamerica.loc.gov/search/pages/results/?proxtext={term}&format={fmt}"
res = requests.get(url)
data = res.json()
items = data['items']

start = 1900
extract = []
for item in items:
    item['date'] = dt.strptime(item['date'], "%Y%m%d")
    
    if item['date'].year > start:
        extract.append(item)
        
len(extract)

20
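
The cell above only applies a lower bound. A sketch of filtering between two years is below. The function name `filter_by_year` and the `sample` list are made up so the example runs without a request; only the first two titles and dates come from the response shown earlier. The sketch copies each item instead of overwriting `item['date']` in place.

```python
import json
from datetime import datetime


def filter_by_year(items, start, end):
    """Keep items whose year is between start and end (inclusive)."""
    extract = []
    for item in items:
        date = datetime.strptime(item["date"], "%Y%m%d")
        if start <= date.year <= end:
            # copy the item rather than overwriting item["date"] in place
            extract.append({**item, "date": date})
    return extract


# hand-written sample, the last entry is hypothetical
sample = [
    {"title": "Detroit evening times.", "date": "19440912"},
    {"title": "The San Francisco call. [volume]", "date": "19081108"},
    {"title": "hypothetical older paper", "date": "18990101"},
]

extract = filter_by_year(sample, start=1900, end=1940)
print([item["title"] for item in extract])

# datetime objects are not JSON serialisable - default=str converts them on the way out
print(json.dumps(extract, default=str))
```

The `default=str` argument matters when saving to disk: without it, `json.dump` raises a `TypeError` on the `datetime` values.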

We have a list of dictionaries, which plays very nicely with `pandas`:

In [49]:
# !pip install -q pandas
import pandas as pd
df = pd.DataFrame(extract)
df.head(2)

Unnamed: 0,sequence,county,edition,frequency,id,subject,city,date,title,end_year,...,language,alt_title,lccn,country,ocr_eng,batch,title_normal,url,place,page
0,11,[Wayne],Home ed.,Daily,/lccn/sn88063294/1944-09-12/ed-1/seq-11/,"[Detroit (Mich.)--Newspapers., Michigan--Detro...",[Detroit],1944-09-12,Detroit evening times.,1958,...,[English],"[Detroit Sunday times, Detroit times]",sn88063294,Michigan,Where the Battle of Germany Will Be Fought\n|D...,mimtptc_holly_ver03,detroit evening times.,https://chroniclingamerica.loc.gov/lccn/sn8806...,[Michigan--Wayne--Detroit],
1,16,[San Francisco],,Daily,/lccn/sn85066387/1908-11-08/ed-1/seq-16/,[California--San Francisco Bay Area.--fast--(O...,[San Francisco],1908-11-08,The San Francisco call. [volume],1913,...,[English],"[Call, Call-chronicle-examiner, Sunday call]",sn85066387,California,"JACK— he w.45 11111^^ Germany,\nI^ll/ riFSIIOV...",curiv_llano_ver01,san francisco call.,https://chroniclingamerica.loc.gov/lccn/sn8506...,[California--San Francisco--San Francisco],16.0


## Example - downloading images

We can also use `requests` to download things other than text - such as images.

Below we make a request and see that we get back binary data (`res.text` tries to decode the bytes as text, which is why the output looks garbled):

In [50]:
url = 'https://www.google.com/images/branding/googlelogo/2x/googlelogo_color_272x92dp.png'
res = requests.get(url)
res.text[:100]

'�PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x02 \x00\x00\x00�\x08\x06\x00\x00\x00�#W\x1b\x00\x004�IDATx\x01��\x03�%?\x1e��m۶}����(�Y:��n��z��m۶���w�$=�ח�\u05fb�^����N�g���|��_'

We can use context management to write the raw bytes (`res.content`) to a file opened in binary mode (`wb`):

In [51]:
with open('./data/google-logo.png', 'wb') as fi:
    fi.write(res.content)

We now have the Google logo locally (you may need to re-run this cell)

![](./data/google-logo.png)
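
A quick way to sanity-check a download is to look at its first bytes: every PNG file starts with the same 8-byte signature, which is the `PNG\r\n\x1a\n` visible in the output above. The sketch below uses a made-up stand-in for `res.content` so it runs without a request; `looks_like_png` is a hypothetical helper name.

```python
import os
import tempfile

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # the first 8 bytes of every PNG file


def looks_like_png(content):
    """Check whether some bytes start with the PNG signature."""
    return content[:8] == PNG_SIGNATURE


# stand-in for `res.content` - a real response body would be much longer
fake_content = PNG_SIGNATURE + b"\x00\x00\x00\rIHDR"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "logo.png")
    # `wb` and `rb` write and read raw bytes, with no text decoding
    with open(path, "wb") as fi:
        fi.write(fake_content)
    with open(path, "rb") as fi:
        on_disk = fi.read()

print(looks_like_png(on_disk))
print(looks_like_png(b"<html>not an image</html>"))
```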

## Exercise - earthquake API 

As a group, let's write a program to get data from the USGS Earthquake Catalog - [documentation](https://earthquake.usgs.gov/fdsnws/event/1/#methods)

Steps:
- Make a folder (from your current working directory) to hold the earthquake data 
- Investigate the data from the response 
- Save each earthquake as its own JSON in the folder you created (hint - you will need to decide on a name for each earthquake JSON) 

In [52]:
start = "2014-01-01"
end = "2014-01-02"
url = f"https://earthquake.usgs.gov/fdsnws/event/1/query?format=geojson&starttime={start}&endtime={end}"

import os
os.makedirs('./data/earthquakes', exist_ok=True)
os.makedirs(f'./data/earthquakes/{start}', exist_ok=True)

In [53]:
response = requests.get(url)

In [54]:
response

<Response [200]>

In [55]:
response.status_code

200

In [56]:
response.headers

{'Content-Type': 'application/json; charset=utf-8', 'Transfer-Encoding': 'chunked', 'Connection': 'keep-alive', 'Server': 'nginx', 'Date': 'Tue, 27 Sep 2022 14:45:54 GMT', 'Cache-Control': 'public, max-age=60', 'Expires': 'Tue, 27 Sep 2022 14:46:54 GMT', 'Last-Modified': 'Tue, 27 Sep 2022 14:45:54 GMT', 'Access-Control-Allow-Headers': '*', 'Access-Control-Allow-Methods': 'GET, OPTIONS', 'Access-Control-Allow-Origin': '*', 'Access-Control-Max-Age': '86400', 'Strict-Transport-Security': 'max-age=31536000; includeSubDomains; preload', 'X-Frame-Options': 'SAMEORIGIN', 'X-Content-Type-Options': 'nosniff', 'X-XSS-Protection': '1; mode=block', 'Vary': 'Accept-Encoding', 'X-Cache': 'Miss from cloudfront', 'Via': '1.1 d158c0069ebae5dc0d0401d105ee9c06.cloudfront.net (CloudFront)', 'X-Amz-Cf-Pop': 'TXL52-C1', 'X-Amz-Cf-Id': 'dJuo8isi9x4CJ0e_a_YXU_SOnFzuWDgXU86C-P1DJfqFSAhFi0IZlQ=='}

In [57]:
response.text[:1000]

'{"type":"FeatureCollection","metadata":{"generated":1664289954000,"url":"https://earthquake.usgs.gov/fdsnws/event/1/query?format=geojson&starttime=2014-01-01&endtime=2014-01-02","title":"USGS Earthquakes","status":200,"api":"1.13.6","count":326},"features":[{"type":"Feature","properties":{"mag":1.29,"place":"10km SSW of Idyllwild, CA","time":1388620296020,"updated":1457728844428,"tz":null,"url":"https://earthquake.usgs.gov/earthquakes/eventpage/ci11408890","detail":"https://earthquake.usgs.gov/fdsnws/event/1/query?eventid=ci11408890&format=geojson","felt":null,"cdi":null,"mmi":null,"alert":null,"status":"reviewed","tsunami":0,"sig":26,"net":"ci","code":"11408890","ids":",ci11408890,","sources":",ci,","types":",cap,focal-mechanism,nearby-cities,origin,phase-data,scitech-link,","nst":39,"dmin":0.06729,"rms":0.09,"gap":51,"magType":"ml","type":"earthquake","title":"M 1.3 - 10km SSW of Idyllwild, CA"},"geometry":{"type":"Point","coordinates":[-116.7776667,33.6633333,11.008]},"id":"ci11408

In [58]:
data = json.loads(response.text)
data

{'type': 'FeatureCollection',
 'metadata': {'generated': 1664289954000,
  'url': 'https://earthquake.usgs.gov/fdsnws/event/1/query?format=geojson&starttime=2014-01-01&endtime=2014-01-02',
  'title': 'USGS Earthquakes',
  'status': 200,
  'api': '1.13.6',
  'count': 326},
 'features': [{'type': 'Feature',
   'properties': {'mag': 1.29,
    'place': '10km SSW of Idyllwild, CA',
    'time': 1388620296020,
    'updated': 1457728844428,
    'tz': None,
    'url': 'https://earthquake.usgs.gov/earthquakes/eventpage/ci11408890',
    'detail': 'https://earthquake.usgs.gov/fdsnws/event/1/query?eventid=ci11408890&format=geojson',
    'felt': None,
    'cdi': None,
    'mmi': None,
    'alert': None,
    'status': 'reviewed',
    'tsunami': 0,
    'sig': 26,
    'net': 'ci',
    'code': '11408890',
    'ids': ',ci11408890,',
    'sources': ',ci,',
    'types': ',cap,focal-mechanism,nearby-cities,origin,phase-data,scitech-link,',
    'nst': 39,
    'dmin': 0.06729,
    'rms': 0.09,
    'gap': 51,
 

In [59]:
data.keys()

dict_keys(['type', 'metadata', 'features', 'bbox'])

In [65]:
type(data['features'][0])

dict

In [66]:
feature = data['features'][0]
feature.keys()

dict_keys(['type', 'properties', 'geometry', 'id'])

In [67]:
from collections.abc import Iterable

for k, v in feature.items():
    if isinstance(v, Iterable) and len(v) < 100:
        print(f'{k}: {v}')

type: Feature
properties: {'mag': 1.29, 'place': '10km SSW of Idyllwild, CA', 'time': 1388620296020, 'updated': 1457728844428, 'tz': None, 'url': 'https://earthquake.usgs.gov/earthquakes/eventpage/ci11408890', 'detail': 'https://earthquake.usgs.gov/fdsnws/event/1/query?eventid=ci11408890&format=geojson', 'felt': None, 'cdi': None, 'mmi': None, 'alert': None, 'status': 'reviewed', 'tsunami': 0, 'sig': 26, 'net': 'ci', 'code': '11408890', 'ids': ',ci11408890,', 'sources': ',ci,', 'types': ',cap,focal-mechanism,nearby-cities,origin,phase-data,scitech-link,', 'nst': 39, 'dmin': 0.06729, 'rms': 0.09, 'gap': 51, 'magType': 'ml', 'type': 'earthquake', 'title': 'M 1.3 - 10km SSW of Idyllwild, CA'}
geometry: {'type': 'Point', 'coordinates': [-116.7776667, 33.6633333, 11.008]}
id: ci11408890


In [72]:
feature['properties']['time']

1388620296020
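
This number is too large to be a timestamp in seconds. Assuming it is milliseconds since the Unix epoch in UTC (read that way, it falls inside the date range we requested), a sketch of converting it to a `datetime` is below; `from_epoch_ms` is a made-up helper name.

```python
from datetime import datetime, timedelta, timezone


def from_epoch_ms(ms):
    """Convert milliseconds since 1970-01-01 UTC into a timezone-aware datetime."""
    seconds, millis = divmod(ms, 1000)
    return datetime.fromtimestamp(seconds, tz=timezone.utc) + timedelta(milliseconds=millis)


when = from_epoch_ms(1388620296020)
print(when.isoformat())
```

Splitting off the milliseconds with `divmod` keeps the arithmetic in integers, which avoids floating point rounding in the fractional seconds.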

In [79]:
url = 'https://earthquake.usgs.gov/earthquakes/eventpage/ci11408890'
url.split('/')[-1]

'ci11408890'

In [85]:
features = data['features']
for feature in features:
    properties = feature['properties']
    
    url = properties['url'].split('/')[-1]
    path = f'./data/earthquakes/{start}/{url}.json'
    with open(path, 'w') as fi:
        json.dump(properties,fi)
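
The loop above saves only `properties`, so the `geometry` and `id` of each earthquake are not written to disk. An alternative sketch is below: it saves the whole feature and uses the feature's own `id` as the file name. The function name `save_features` is made up, and `sample_features` is a shortened, partly hypothetical stand-in for `data['features']` so the example runs without a request.

```python
import json
import os
import tempfile


def save_features(features, folder):
    """Write each feature to its own JSON file, named after the feature id."""
    os.makedirs(folder, exist_ok=True)
    paths = []
    for feature in features:
        path = os.path.join(folder, f"{feature['id']}.json")
        with open(path, "w") as fi:
            json.dump(feature, fi)
        paths.append(path)
    return paths


# shortened stand-in for data['features'], the second entry is hypothetical
sample_features = [
    {"type": "Feature", "id": "ci11408890",
     "properties": {"mag": 1.29, "place": "10km SSW of Idyllwild, CA"},
     "geometry": {"type": "Point", "coordinates": [-116.7776667, 33.6633333, 11.008]}},
    {"type": "Feature", "id": "hypothetical01",
     "properties": {"mag": 1.1, "place": "Central Alaska"},
     "geometry": None},
]

with tempfile.TemporaryDirectory() as tmp:
    folder = os.path.join(tmp, "earthquakes", "2014-01-01")
    paths = save_features(sample_features, folder)
    saved_names = sorted(os.listdir(folder))
    with open(paths[0]) as fi:
        reloaded = json.load(fi)

print(saved_names)
```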

In [86]:
# string_date = feature['properties']['time']
# dt.strptime('string_date', "%a %b %d %H:%M:%S %Y")

In [73]:
start = "2014-01-01"
end = "2014-01-02"
url = f"https://earthquake.usgs.gov/fdsnws/event/1/query?format=geojson&starttime={start}&endtime={end}"
res = requests.get(url)
data = res.json()
features = data['features']

In [74]:
features


[{'type': 'Feature',
  'properties': {'mag': 1.29,
   'place': '10km SSW of Idyllwild, CA',
   'time': 1388620296020,
   'updated': 1457728844428,
   'tz': None,
   'url': 'https://earthquake.usgs.gov/earthquakes/eventpage/ci11408890',
   'detail': 'https://earthquake.usgs.gov/fdsnws/event/1/query?eventid=ci11408890&format=geojson',
   'felt': None,
   'cdi': None,
   'mmi': None,
   'alert': None,
   'status': 'reviewed',
   'tsunami': 0,
   'sig': 26,
   'net': 'ci',
   'code': '11408890',
   'ids': ',ci11408890,',
   'sources': ',ci,',
   'types': ',cap,focal-mechanism,nearby-cities,origin,phase-data,scitech-link,',
   'nst': 39,
   'dmin': 0.06729,
   'rms': 0.09,
   'gap': 51,
   'magType': 'ml',
   'type': 'earthquake',
   'title': 'M 1.3 - 10km SSW of Idyllwild, CA'},
  'geometry': {'type': 'Point',
   'coordinates': [-116.7776667, 33.6633333, 11.008]},
  'id': 'ci11408890'},
 {'type': 'Feature',
  'properties': {'mag': 1.1,
   'place': 'Central Alaska',
   'time': 1388620046501

## Exercise - Wikipedia API

Now for an open-ended exercise for you! Your task is to:
- create a database of countries
- in a folder called `countries` (you will need to make the folder)
- each country in its own folder
- start with Germany

V1 of your program should:
- save the url you use to request the data
- save the title
- save the `line` parameter of each section (`data['parse']['sections']`)
- save all in a single JSON

V2 of your program should also:
- save all '.png' & '.jpg' images as images, with the url as the image name
- save all external links as CSV

Much of the work will be understanding how the Wikipedia API works - useful resources are below:
- [Main API page](https://www.mediawiki.org/wiki/API:Main_page)
- [What the actions are](https://www.mediawiki.org/w/api.php)
- [Python examples](https://github.com/wikimedia/mediawiki-api-demos/tree/master/python)

Please also feel free to work on another API - happy to assist you with this as well :)

In [45]:
url = f"https://en.wikipedia.org/w/api.php?action=parse&page={term}&format=json"
res = requests.get(url)
data = res.json()

data['parse']['sections']

[{'toclevel': 1,
  'level': '2',
  'line': 'Etymology',
  'number': '1',
  'index': '1',
  'fromtitle': 'Germany',
  'byteoffset': 12477,
  'anchor': 'Etymology'},
 {'toclevel': 1,
  'level': '2',
  'line': 'History',
  'number': '2',
  'index': '2',
  'fromtitle': 'Germany',
  'byteoffset': 15058,
  'anchor': 'History'},
 {'toclevel': 2,
  'level': '3',
  'line': 'Germanic tribes and Frankish Empire',
  'number': '2.1',
  'index': '3',
  'fromtitle': 'Germany',
  'byteoffset': 18483,
  'anchor': 'Germanic_tribes_and_Frankish_Empire'},
 {'toclevel': 2,
  'level': '3',
  'line': 'East Francia and Holy Roman Empire',
  'number': '2.2',
  'index': '4',
  'fromtitle': 'Germany',
  'byteoffset': 22352,
  'anchor': 'East_Francia_and_Holy_Roman_Empire'},
 {'toclevel': 2,
  'level': '3',
  'line': 'German Confederation and Empire',
  'number': '2.3',
  'index': '5',
  'fromtitle': 'Germany',
  'byteoffset': 30396,
  'anchor': 'German_Confederation_and_Empire'},
 {'toclevel': 2,
  'level': '3',