In [1]:
%matplotlib inline
import matplotlib.pyplot as plt

import requests
import json

import os
import pandas as pd
import numpy as np

from bs4 import BeautifulSoup

from IPython.display import HTML, display

#### Lecture 4a - JSONs and APIs
by Vítek Macháček
October 25, 2022

### Contents

* Standardized data representation
* JSON (+ XMLs)
* Introduction to Requests (GET vs. POST) and APIs


### Goals:
    
* work with online/real-time data
* acquisition, processing -> results
* today an introduction, next week a practical example

## Microservice architecture

* Foundation of modern software architecture
* Do one thing and do it well.

![Microservice architecture schema](./img/microservices.png "Microservice Architecture")


## Data exchange formats - JSON, XML

`Language of the internet`

* You can send/receive a message with (almost) any service
* We need a system-agnostic data format
* Editable in basic text editors
* More complex than simple tables
* Highly structured - if you don't follow the rules, you are out
* Both sides need to understand the structure
* Only data. It does not do anything - no code to be run
* Distributed as text/strings (to be precise, as `bytes`)
* Parsed to objects - easy to work with straight away
* Can be persisted as files or streamed from APIs
* Human readable
* Hierarchical
* Can be fetched using standard web APIs

### Purpose

1. Communication 
    * All imaginable communication channels
    * Applications

2. Storing
    * self-descriptive
    * human readable
    * also in DBs - SQL, MongoDB etc.

3. Standardization
    * predictability
    * cooperation
    * spillovers from standardization

### Dimensionality problem

* rich information comes at the cost of data complexity
* to interrelate information, you need high dimensionality (or A LOT of columns)
* Strongly object-oriented


### 1D:
* logs

### 2D
* tabular data (like pandas DFs)
* SQL

### 3+D:
#### XML (and HTML)
* eXtensible Markup Language is a software- and hardware-independent tool for storing and transporting data.
* Officially defined in 1998, but its roots are even older.
* XML was designed to carry data - with focus on what the data is
* HTML was designed to display data - with focus on how the data should look when displayed
* XML tags are not predefined like HTML tags are
* more verbose than JSON
* can contain comments (actually a really cool and useful feature!)
* used historically as a transaction format in many areas:
    * Scientific measurements
    * News information
    * Weather measurements
    * Financial transactions
* Requires an XML parser - for example `BeautifulSoup` or the standard library's `xml.etree.ElementTree`
* the `x` in doc`x`, xls`x`, etc. stands for XML
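To make "necessary to use an XML parser" concrete, here is a minimal sketch using the standard library's `xml.etree.ElementTree`. The XML snippet is made up, modelled loosely on the course data used later in this lecture; the tag and attribute names (`course`, `code`, `ects`, `teacher`) are illustrative assumptions, not a real schema.

```python
import xml.etree.ElementTree as ET

# A small, made-up XML document (hypothetical tags and attributes)
xml_text = """
<courses>
    <!-- comments are allowed in XML -->
    <course code="JEM207" ects="5">
        <name>Data Processing in Python</name>
        <teacher id="3421"/>
    </course>
    <course code="JEM116" ects="6">
        <name>Applied Econometrics</name>
        <teacher id="1234"/>
    </course>
</courses>
"""

# Parse the string into an element tree (the default parser drops comments)
root = ET.fromstring(xml_text.strip())

# Attributes are read with .get(), child elements with .find()/.findall()
parsed_courses = {
    course.get('code'): {
        'name': course.find('name').text,
        'ECTS': int(course.get('ects')),
        'teachers': [int(t.get('id')) for t in course.findall('teacher')],
    }
    for course in root.findall('course')
}
print(parsed_courses)
```

Note that everything in XML is text: the numbers have to be converted explicitly with `int()`, whereas JSON carries numbers as numbers.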


### JSON
* JavaScript Object Notation
* REST APIs (typically) return JSON
* often stored as *.json* files
* but also used on the web etc.
* supports standard data types - strings, numbers (integers and floats), booleans (`true`/`false`), `null`, arrays (lists) and objects
* No comments
* More compact, less verbose
* No closing tags
* Used EVERYWHERE, BUT [NOT LICENSED FOR EVIL](https://www.json.org/license.html). If you want to do evil stuff, use XML instead.
* Native in JavaScript and close to native in Python (dictionary)
* Jupyter Notebooks (`.ipynb` files are JSON documents)

# JSON

* JSON is similar to a combination of `dictionaries` (`objects` in JSON terms) and `lists` (`arrays`) in Python
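A short sketch of how the standard `json` module maps JSON onto Python types and back. The values are illustrative; the last lines show a common gotcha with dictionary keys.

```python
import json

# JSON literals map onto Python values: true/false -> True/False, null -> None,
# objects -> dict, arrays -> list
raw = '{"name": "JEM207", "ECTS": 5, "online": false, "room": null, "teachers": [3421]}'
parsed = json.loads(raw)
print(parsed)
print(type(parsed).__name__, type(parsed['teachers']).__name__)

# The reverse direction: True/None become true/null, tuples become arrays
dumped = json.dumps({'active': True, 'note': None, 'scores': (1, 2)})
print(dumped)

# Gotcha: JSON object keys are always strings, so integer keys do not survive a round trip
roundtrip = json.loads(json.dumps({1234: 'Jozef Baruník'}))
print(roundtrip)
```

This is why the teacher IDs in the example below are stored as values in lists rather than as dictionary keys: as keys they would come back as strings.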

In [2]:
teachers = [
    {'name':'Jozef Baruník','titles':['doc.','PhDr.','Ph.D.','Bc.','Mgr.'],'ID':1234,'courses':['JEM005','JEM116','JEM059','JEM061']},
    {'name':'Martin Hronec','titles':['Bc.','Mgr.'],'ID':3421,'courses':['JEM005','JEM207']},
]

courses = {
    "JEM005":{'name':'Advanced Econometrics','ECTS':6,'teachers':[3421,1234]},
    'JEM207':{'name':'Data Processing in Python','ECTS':5,'teachers':[3421]},
    'JEM116':{'name':'Applied Econometrics','ECTS':6,'teachers':[1234]},
    'JEM059':{'name':'Quantitative Finance I.','ECTS':6,'teachers':[1234,5678]},
    'JEM061':{'name':'Quantitative Finance II.','ECTS':6,'teachers':[1234,5678]}
}
jsondata = {'teachers':teachers,'courses':courses}
jsondata

{'teachers': [{'name': 'Jozef Baruník',
   'titles': ['doc.', 'PhDr.', 'Ph.D.', 'Bc.', 'Mgr.'],
   'ID': 1234,
   'courses': ['JEM005', 'JEM116', 'JEM059', 'JEM061']},
  {'name': 'Martin Hronec',
   'titles': ['Bc.', 'Mgr.'],
   'ID': 3421,
   'courses': ['JEM005', 'JEM207']}],
 'courses': {'JEM005': {'name': 'Advanced Econometrics',
   'ECTS': 6,
   'teachers': [3421, 1234]},
  'JEM207': {'name': 'Data Processing in Python',
   'ECTS': 5,
   'teachers': [3421]},
  'JEM116': {'name': 'Applied Econometrics', 'ECTS': 6, 'teachers': [1234]},
  'JEM059': {'name': 'Quantitative Finance I.',
   'ECTS': 6,
   'teachers': [1234, 5678]},
  'JEM061': {'name': 'Quantitative Finance II.',
   'ECTS': 6,
   'teachers': [1234, 5678]}}}

Is this valid JSON? Check it with a validator:

https://jsonformatter.curiousconcept.com/

In [3]:
js = json.dumps(
    jsondata, indent=4, ensure_ascii = False
) #json formatted string!

print(js)

{
    "teachers": [
        {
            "name": "Jozef Baruník",
            "titles": [
                "doc.",
                "PhDr.",
                "Ph.D.",
                "Bc.",
                "Mgr."
            ],
            "ID": 1234,
            "courses": [
                "JEM005",
                "JEM116",
                "JEM059",
                "JEM061"
            ]
        },
        {
            "name": "Martin Hronec",
            "titles": [
                "Bc.",
                "Mgr."
            ],
            "ID": 3421,
            "courses": [
                "JEM005",
                "JEM207"
            ]
        }
    ],
    "courses": {
        "JEM005": {
            "name": "Advanced Econometrics",
            "ECTS": 6,
            "teachers": [
                3421,
                1234
            ]
        },
        "JEM207": {
            "name": "Data Processing in Python",
            "ECTS": 5,
            "teachers": [
                3
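A minimal round-trip sketch with `json.dumps`/`json.loads` (strings) and `json.dump`/`json.load` (files). It also shows why the printed Python object above is not valid JSON as-is: Python's `str()` uses single quotes, which JSON does not allow. The file name `courses.json` is just an example, written into a temporary directory.

```python
import json
import os
import tempfile

data = {'courses': {'JEM207': {'name': 'Data Processing in Python', 'ECTS': 5, 'teachers': [3421]}}}

# dumps/loads convert between Python objects and JSON strings
text = json.dumps(data, ensure_ascii=False)
assert json.loads(text) == data

# str(data) is Python's repr: single-quoted strings are not valid JSON
decode_error = None
try:
    json.loads(str(data))
except json.JSONDecodeError as err:
    decode_error = err
    print('Not valid JSON:', err)

# dump/load do the same for files; a temporary directory keeps the example tidy
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'courses.json')
    with open(path, 'w', encoding='utf-8') as f:
        json.dump(data, f, ensure_ascii=False, indent=4)
    with open(path, encoding='utf-8') as f:
        loaded = json.load(f)

print(loaded == data)
```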

# Reading data using `requests` library

* API = Application Programming Interface
* more specifically: HTTP-based APIs

### When to use?
* whenever several applications need to communicate:
    * a DB speaks to an app
    * an accounting system communicates with an inventory system
    * Google Maps needs information about local public transport
    * an ML-based Bitcoin price prediction feeds an automated trading system
* user-friendly interface for complicated tasks - DEEP AI, Google Maps
* data - Golemio, OpenStreetMap

## HTTP request

* The most standard channel for communicating with a web server
* The `Client` asks questions - **requests**
* The `Server` serves answers - **responses**

### HTTP request structure:
* URL
    * domain
    * route
    * parameters
* Request Type - GET, POST, PUT, DELETE
* Request Header
    * authentication
    * cookies
    * other metadata
* Outgoing data - the request body (see below)
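The URL part of the structure above (domain, route, parameters) can be taken apart and rebuilt with the standard library's `urllib.parse`. A small sketch, using the World Bank URL that appears later in this lecture:

```python
from urllib.parse import urlsplit, parse_qs, urlencode

wb_url = 'http://api.worldbank.org/v2/country/all/indicator/SP.POP.TOTL?format=json&per_page=100'
parts = urlsplit(wb_url)

print(parts.netloc)            # domain
print(parts.path)              # route
print(parse_qs(parts.query))   # parameters, each mapped to a list of values

# Going the other way: urlencode builds the query string from a dict,
# escaping special characters for us
query = urlencode({'format': 'json', 'page': 2})
rebuilt = f'{parts.scheme}://{parts.netloc}{parts.path}?{query}'
print(rebuilt)
```

`parse_qs` returns lists because a parameter may legally appear more than once in a query string.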
   
### HTTP response structure
* Header 
    * cookies
    * other metadata - responding server, dates, etc.
* Status Code:
    * 200 - success
    * 404 - resource does not exist
    * 500 - the server failed while processing your request
* Content
    * text - JSON, HTML etc.
    * file
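Status codes are grouped by their first digit (2xx success, 4xx client error, 5xx server error). A minimal sketch using the standard library's `http.HTTPStatus` to look up the official reason phrase; the helper `describe_status` is hypothetical, written just for this illustration.

```python
from http import HTTPStatus

def describe_status(code: int) -> str:
    """Return a human-readable description of an HTTP status code (hypothetical helper)."""
    status = HTTPStatus(code)          # raises ValueError for unknown codes
    if 200 <= code < 300:
        family = 'success'
    elif 400 <= code < 500:
        family = 'client error'
    elif 500 <= code < 600:
        family = 'server error'
    else:
        family = 'informational/redirect'
    return f'{code} {status.phrase} ({family})'

for code in (200, 404, 500):
    print(describe_status(code))
```

In `requests`, the shortcut `r.ok` is `True` for any status code below 400.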

### API types
1) REST API - uses HTTP requests and (usually) returns JSON
2) SOAP API - uses HTTP requests and returns XML
3) Website - uses HTTP requests and returns a set of HTML, JavaScript, CSS and other files


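### GET request
* retrieves data; should not change anything on the server
* parameters are sent via the request address (URL query string)
* therefore "public": visible in browser history, server logs and bookmarks
* responses can be cached, which makes repeated requests fast

### POST request
* sends data to the server (forms, uploads, creating resources)
* data travel in the request body, not in the URL
* more "private" than GET, but not encrypted unless you use HTTPS
* typically not cached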
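To see where the data actually end up, `requests` lets you build a request without sending it: `requests.Request(...).prepare()` returns a prepared request whose `url`, `body` and `headers` can be inspected. The `https://example.com/search` endpoint is a placeholder; nothing is sent over the network.

```python
import requests

# Build (but do not send) the same query once as GET and once as POST
get_req = requests.Request('GET', 'https://example.com/search',
                           params={'q': 'json', 'page': 2}).prepare()
post_req = requests.Request('POST', 'https://example.com/search',
                            data={'q': 'json', 'page': 2}).prepare()

# GET: parameters are encoded into the URL, there is no body
print(get_req.method, get_req.url, get_req.body)

# POST: the URL stays clean, the form data travel in the body
print(post_req.method, post_req.url, post_req.body)
print(post_req.headers['Content-Type'])
```

In everyday code you would simply call `requests.get(url, params=...)` or `requests.post(url, data=...)`; a prepared request can also be sent explicitly with `requests.Session().send(prepared)`.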

### The simplest request

In [4]:
import requests

In [74]:
r = requests.get('https://www.google.com/')
print(r.text)
type(r.json)  # r.json is a method; calling r.json() here would fail, because Google returns HTML, not JSON

<!doctype html><html itemscope="" itemtype="http://schema.org/WebPage" lang="cs"><head><meta content="text/html; charset=UTF-8" http-equiv="Content-Type"><meta content="/images/branding/googleg/1x/googleg_standard_color_128dp.png" itemprop="image"><title>Google</title><script nonce="81KrC11B8c38EHWcQrt3_Q">(function(){window.google={kEI:'33oZZOnJJYGJ8gK9sIXADQ',kEXPI:'0,1303430,55979,6058,207,4804,2316,383,246,5,1129120,1197695,706,171,379918,16112,28687,22431,1361,12318,2817,14764,4998,13228,3847,38444,885,1987,2891,561,7788,3405,606,54144,4142,2404,2614,3784,9358,3,576,6459,14124,4,1528,2304,42126,13659,4437,9358,7428,5797,2560,4094,7596,1,42154,2,14022,25739,5679,1020,31122,4568,6259,23418,1252,5835,14968,4332,7484,25076,2006,5895,2260,7381,15970,873,13462,14,6157,7,1922,5784,3995,21390,389,14375,6305,2007,18191,6050,14183,20206,1622,1778,4977,8048,330,6513,4097,4384,991,3030,426,5685,1410,890,2110,5295,1804,6250,1303,676,1150,1089,1653,108,96,1032,449,436,12592,3651,1985,4,623,537,

method

r = requests.get('https://www.google.com/')
display(HTML(r.text))

## "Real-world" APIs

### Sreality

* surprisingly, no authentication is needed
https://www.sreality.cz/hledani/prodej/byty/praha

In [6]:
r = requests.get('https://www.sreality.cz/api/cs/v2/estates?category_main_cb=1&category_type_cb=1&locality_region_id=10&per_page=20&tms=1678732084920')
r.text

'{"meta_description": "5866 realit v nab\\u00eddce prodej byt\\u016f Praha. Vyberte si novou nemovitost na sreality.cz s hled\\u00e1n\\u00edm na map\\u011b a velk\\u00fdmi n\\u00e1hledy fotografi\\u00ed nab\\u00edzen\\u00fdch byt\\u016f.", "result_size": 5866, "_embedded": {"estates": [{"labelsReleased": [["partly_furnished"], []], "has_panorama": 0, "labels": ["\\u010c\\u00e1ste\\u010dn\\u011b vybaven\\u00fd"], "is_auction": false, "labelsAll": [["personal", "brick", "cellar", "partly_furnished"], ["playground", "natural_attraction", "small_shop", "candy_shop", "vet", "tavern", "theater", "movies", "sightseeing", "post_office", "restaurant", "bus_public_transport", "shop", "kindergarten", "school", "drugstore", "metro", "atm", "train", "sports", "medic", "tram"]], "seo": {"category_main_cb": 1, "category_sub_cb": 6, "category_type_cb": 1, "locality": "praha-vinohrady-borivojova"}, "exclusively_at_rk": 0, "category": 1, "has_floor_plan": 0, "_embedded": {"favourite": {"is_favourite": f

In [7]:
type(r.json())

dict

In [13]:
r = requests.get('https://www.sreality.cz/api/cs/v2/estates?category_main_cb=1&category_type_cb=1&locality_region_id=10')
r.text

'{"meta_description": "5865 realit v nab\\u00eddce prodej byt\\u016f Praha. Vyberte si novou nemovitost na sreality.cz s hled\\u00e1n\\u00edm na map\\u011b a velk\\u00fdmi n\\u00e1hledy fotografi\\u00ed nab\\u00edzen\\u00fdch byt\\u016f.", "result_size": 5865, "_embedded": {"estates": [{"labelsReleased": [["after_reconstruction", "terrace"], ["post_office"]], "has_panorama": 0, "labels": ["Po rekonstrukci", "Terasa", "Po\\u0161ta 3 min. p\\u011b\\u0161ky"], "is_auction": false, "labelsAll": [["personal", "after_reconstruction", "balcony", "terrace", "cellar", "garage"], ["vet", "candy_shop", "small_shop", "playground", "tavern", "theater", "movies", "sightseeing", "atm", "shop", "restaurant", "tram", "drugstore", "metro", "medic", "sports", "kindergarten", "school", "post_office", "bus_public_transport", "train"]], "seo": {"category_main_cb": 1, "category_sub_cb": 9, "category_type_cb": 1, "locality": "praha-mala-strana-"}, "exclusively_at_rk": 0, "category": 1, "has_floor_plan": 1, "_

In [14]:
d = r.json()

In [15]:
d.keys()

dict_keys(['meta_description', 'result_size', '_embedded', 'filterLabels', 'title', 'filter', '_links', 'locality', 'locality_dativ', 'logged_in', 'per_page', 'category_instrumental', 'page', 'filterLabels2'])

In [22]:
pd.json_normalize(d['_embedded']['estates'])

Unnamed: 0,labelsReleased,has_panorama,labels,is_auction,labelsAll,exclusively_at_rk,category,has_floor_plan,paid_logo,locality,...,price_czk.unit,price_czk.name,_links.dynamicDown,_links.dynamicUp,_links.iterator.href,_links.self.href,_links.images,_links.image_middle2,gps.lat,gps.lon
0,"[[after_reconstruction, terrace], [post_office]]",0,"[Po rekonstrukci, Terasa, Pošta 3 min. pěšky]",False,"[[personal, after_reconstruction, balcony, ter...",0,1,1,1,Praha 1 - Malá Strana,...,,Celková cena,[{'href': 'https://d18-a.sdn.cz/d_18/c_img_QR_...,[{'href': 'https://d18-a.sdn.cz/d_18/c_img_QR_...,/cs/v2/estate-iterator/0?category_main_cb=1&su...,/cs/v2/estates/3988821580?region_tip=2453447,[{'href': 'https://d18-a.sdn.cz/d_18/c_img_QR_...,[{'href': 'https://d18-a.sdn.cz/d_18/c_img_QR_...,50.079033,14.415553
1,"[[partly_furnished], []]",0,[Částečně vybavený],False,"[[personal, balcony, cellar, elevator, parking...",1,1,1,1,Praha 5 - Smíchov,...,,Celková cena,[{'href': 'https://d18-a.sdn.cz/d_18/c_img_QI_...,[{'href': 'https://d18-a.sdn.cz/d_18/c_img_QI_...,/cs/v2/estate-iterator/1?category_main_cb=1&su...,/cs/v2/estates/414307404,[{'href': 'https://d18-a.sdn.cz/d_18/c_img_QI_...,[{'href': 'https://d18-a.sdn.cz/d_18/c_img_QI_...,50.060196,14.394708
2,"[[partly_furnished], []]",0,[Částečně vybavený],False,"[[new_building, personal, cellar, elevator, pa...",1,1,1,1,Praha 5 - Smíchov,...,,Celková cena,[{'href': 'https://d18-a.sdn.cz/d_18/c_img_gY_...,[{'href': 'https://d18-a.sdn.cz/d_18/c_img_gY_...,/cs/v2/estate-iterator/2?category_main_cb=1&su...,/cs/v2/estates/2146882636,[{'href': 'https://d18-a.sdn.cz/d_18/c_img_gY_...,[{'href': 'https://d18-a.sdn.cz/d_18/c_img_gY_...,50.060196,14.394708
3,"[[], []]",0,[],False,"[[new_building, personal, balcony, brick], [pl...",0,1,1,1,Praha 5 - Smíchov,...,,Celková cena,[{'href': 'https://d18-a.sdn.cz/d_18/c_img_QK_...,[{'href': 'https://d18-a.sdn.cz/d_18/c_img_QK_...,/cs/v2/estate-iterator/3?category_main_cb=1&su...,/cs/v2/estates/500053068,[{'href': 'https://d18-a.sdn.cz/d_18/c_img_QK_...,[{'href': 'https://d18-a.sdn.cz/d_18/c_img_QK_...,50.05951,14.3938
4,"[[], []]",0,[],False,"[[new_building, personal, balcony, brick, park...",0,1,1,1,Praha 5 - Smíchov,...,,Celková cena,[{'href': 'https://d18-a.sdn.cz/d_18/c_img_QJ_...,[{'href': 'https://d18-a.sdn.cz/d_18/c_img_QJ_...,/cs/v2/estate-iterator/4?category_main_cb=1&su...,/cs/v2/estates/4158534732,[{'href': 'https://d18-a.sdn.cz/d_18/c_img_QJ_...,[{'href': 'https://d18-a.sdn.cz/d_18/c_img_QJ_...,50.05951,14.3938
5,"[[], []]",0,[],False,"[[new_building, personal, brick, cellar, parki...",0,1,1,1,Praha 5 - Smíchov,...,,Celková cena,[{'href': 'https://d18-a.sdn.cz/d_18/c_img_QI_...,[{'href': 'https://d18-a.sdn.cz/d_18/c_img_QI_...,/cs/v2/estate-iterator/5?category_main_cb=1&su...,/cs/v2/estates/9188428,[{'href': 'https://d18-a.sdn.cz/d_18/c_img_QI_...,[{'href': 'https://d18-a.sdn.cz/d_18/c_img_QI_...,50.05951,14.3938
6,"[[], []]",0,[],False,"[[new_building, personal, loggia, brick, cella...",0,1,1,1,Praha 5 - Smíchov,...,,Celková cena,[{'href': 'https://d18-a.sdn.cz/d_18/c_img_QJ_...,[{'href': 'https://d18-a.sdn.cz/d_18/c_img_QJ_...,/cs/v2/estate-iterator/6?category_main_cb=1&su...,/cs/v2/estates/311178316,[{'href': 'https://d18-a.sdn.cz/d_18/c_img_QJ_...,[{'href': 'https://d18-a.sdn.cz/d_18/c_img_QJ_...,50.05951,14.3938
7,"[[], []]",0,[],False,"[[new_building, personal, balcony, brick, cell...",0,1,1,1,Praha 5 - Smíchov,...,,Celková cena,[{'href': 'https://d18-a.sdn.cz/d_18/c_img_QL_...,[{'href': 'https://d18-a.sdn.cz/d_18/c_img_QL_...,/cs/v2/estate-iterator/7?category_main_cb=1&su...,/cs/v2/estates/4220269644,[{'href': 'https://d18-a.sdn.cz/d_18/c_img_QL_...,[{'href': 'https://d18-a.sdn.cz/d_18/c_img_QL_...,50.05951,14.3938
8,"[[terrace], []]",0,[Terasa],False,"[[new_building, personal, terrace, brick, gara...",0,1,0,0,Praha 5 - Košíře,...,,Celková cena,[{'href': 'https://d18-a.sdn.cz/d_18/c_img_QR_...,[{'href': 'https://d18-a.sdn.cz/d_18/c_img_QR_...,/cs/v2/estate-iterator/8?category_main_cb=1&su...,/cs/v2/estates/3170002764,[{'href': 'https://d18-a.sdn.cz/d_18/c_img_QR_...,[{'href': 'https://d18-a.sdn.cz/d_18/c_img_QR_...,50.054776,14.377124
9,"[[], [metro]]",0,[Metro 7 min. pěšky],False,"[[new_building, personal, loggia, elevator], [...",1,1,1,1,Praha 5,...,,Celková cena,[{'href': 'https://d18-a.sdn.cz/d_18/c_img_QR_...,[{'href': 'https://d18-a.sdn.cz/d_18/c_img_QR_...,/cs/v2/estate-iterator/9?category_main_cb=1&su...,/cs/v2/estates/1900144460,[{'href': 'https://d18-a.sdn.cz/d_18/c_img_QR_...,[{'href': 'https://d18-a.sdn.cz/d_18/c_img_QR_...,50.047051,14.417275
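What `pd.json_normalize` does to nested records, shown on two made-up records loosely shaped like the Sreality `estates` entries (the names and placeholder prices are not real data):

```python
import pandas as pd

# Two made-up records; the second one has an extra key the first lacks
sample_estates = [
    {'name': 'Flat A', 'price': 1, 'gps': {'lat': 50.0, 'lon': 14.4}},
    {'name': 'Flat B', 'price': 2, 'gps': {'lat': 50.1, 'lon': 14.5}, 'labels': ['Terasa']},
]

# Nested dicts become dotted column names; missing keys become NaN;
# lists are kept as-is inside a single cell
flat = pd.json_normalize(sample_estates)
print(sorted(flat.columns))
print(flat)
```

This explains columns such as `gps.lat` and `_links.self.href` in the table above. The separator can be changed with `sep='_'`.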



### World Bank

#### Exploratory request

In [24]:
d = requests.get('http://api.worldbank.org/v2/country/all/indicator/SP.POP.TOTL?format=json&per_page=100').json()

In [38]:
d[0]

{'page': 1,
 'pages': 165,
 'per_page': 100,
 'total': 16492,
 'sourceid': '2',
 'sourcename': 'World Development Indicators',
 'lastupdated': '2023-03-01'}
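The metadata in `d[0]` tells us how many requests a complete download needs. A tiny sketch with the numbers copied from the output above:

```python
import math

meta = {'page': 1, 'pages': 165, 'per_page': 100, 'total': 16492}  # values from d[0] above

# Number of pages = total records / page size, rounded up
pages = math.ceil(meta['total'] / meta['per_page'])

# The last page holds only the remainder
last_page_size = meta['total'] - (pages - 1) * meta['per_page']
print(pages, last_page_size)
```

So with `per_page=100` we would need 165 requests; a larger `per_page` (the paging example below uses 500) reduces that number.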

#### Look at the data in the dataframe

In [39]:
pd.json_normalize(d[1])

Unnamed: 0,countryiso3code,date,value,unit,obs_status,decimal,indicator.id,indicator.value,country.id,country.value
0,AFE,2021,702976832,,,0,SP.POP.TOTL,"Population, total",ZH,Africa Eastern and Southern
1,AFE,2020,685112705,,,0,SP.POP.TOTL,"Population, total",ZH,Africa Eastern and Southern
2,AFE,2019,667242712,,,0,SP.POP.TOTL,"Population, total",ZH,Africa Eastern and Southern
3,AFE,2018,649756874,,,0,SP.POP.TOTL,"Population, total",ZH,Africa Eastern and Southern
4,AFE,2017,632746296,,,0,SP.POP.TOTL,"Population, total",ZH,Africa Eastern and Southern
...,...,...,...,...,...,...,...,...,...,...
95,AFW,1988,195969722,,,0,SP.POP.TOTL,"Population, total",ZI,Africa Western and Central
96,AFW,1987,190759952,,,0,SP.POP.TOTL,"Population, total",ZI,Africa Western and Central
97,AFW,1986,185720244,,,0,SP.POP.TOTL,"Population, total",ZI,Africa Western and Central
98,AFW,1985,180817312,,,0,SP.POP.TOTL,"Population, total",ZI,Africa Western and Central


### More advanced example: Paging

#### Return to Python basics 1: Errors and exceptions in Python

In [42]:
for i in range(-5,5):
    try: 
        print(5/i)
    except ZeroDivisionError:
        print(0)

-1.0
-1.25
-1.6666666666666667
-2.5
-5.0
0
5.0
2.5
1.6666666666666667
1.25


In [44]:
for i in range(-5,5):
    try:
        print(1/i)
    except:
        print(f'dividing by {i} raised an error. Are you sure your input was correct?')

-0.2
-0.25
-0.3333333333333333
-0.5
-1.0
dividing by 0 raised an error. Are you sure your input was correct?
1.0
0.5
0.3333333333333333
0.25
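A bare `except:` catches every exception, including the ones caused by bugs in your own code. A small sketch with a deliberate typo (`y` instead of `x`) shows the difference between a bare `except:` and catching only the exception you anticipate:

```python
def inverse_bare(x):
    try:
        return 1 / y            # deliberate typo: `y` instead of `x` raises NameError
    except:                     # bare except: swallows the typo as well
        return None

def inverse_specific(x):
    try:
        return 1 / y            # the same deliberate typo
    except ZeroDivisionError:   # only the anticipated error is handled
        return None

print(inverse_bare(2))          # prints None, the bug stays invisible

surfaced_error = None
try:
    inverse_specific(2)
except NameError as err:        # the typo surfaces instead of being hidden
    surfaced_error = err
    print('Bug surfaced:', err)
```

With API requests the same reasoning suggests catching `requests.exceptions.RequestException` rather than everything.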


#### Return to Python basics 2: Formatting strings

In [45]:
my_name = 'Vítek'

f'Hello {my_name}!'

'Hello Vítek!'

In [47]:
string_template = 'Today {teachers_name} is teaching and he is in a {teachers_mood} mood'

string_template.format(
    teachers_name='Vítek',
    teachers_mood='good'
)

'Today Vítek is teaching and he is in a good mood'

Sending API requests is always risky - you do not control the other side of the transaction.

Try downloading the first nine pages of results (`range(1, 10)` stops before 10).

Always check whether the request succeeded by inspecting its status code.

In [52]:
l = []
url_template = 'http://api.worldbank.org/v2/country/all/indicator/SP.POP.TOTL?format=json&page={page_num}'

for i in range(1,10):
    r = requests.get(url_template.format(page_num=i))    
    if r.ok: #r.status_code == 200 would also work!
        l.append(r.json())




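OK, but this still places a lot of trust in the other side: if the connection fails, `requests.get` raises an exception before you ever see a status code. Wrapping the call in `try`/`except` is safer.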

In [53]:
l = []
url_template = 'http://api.worldbank.org/v2/country/all/indicator/SP.POP.TOTL?format=json&page={page_num}'

for i in range(1,10):
    url = url_template.format(page_num=i)
    try:
        r = requests.get(url)    
        if r.ok: #r.status_code == 200 would also work!
            l.append(r.json())
    except:
        print(f'At least I want to know that something went wrong and when. Url: {url}')

OMG, this looks a bit messy. I would consider writing a function to increase clarity.

In [70]:
def request_worldbank(url):
    try:
        r = requests.get(url)
        if r.ok:  # r.status_code == 200 would also work!
            return r.json()
        else:
            # requests exposes the status text as r.reason (there is no r.message)
            print(f'The request to {url} was not successful. Status code: {r.status_code}. Reason: {r.reason}')
    except:
        print(f'At least I want to know that something went wrong and when. Url: {url}')

l = []
url_template = 'http://api.worldbank.org/v2/country/all/indicator/SP.POP.TOTL?format=json&page={page_num}'
for i in range(1,10):
    l.append(request_worldbank(url_template.format(page_num=i)))

In [73]:
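def request(url):
    try:
        r = requests.get(url)  # the module is `requests`; `request` is this function itself
        if r.ok:
            return r.json()
        else:
            print(f'The request to {url} was not successful. Status code: {r.status_code}. Reason: {r.reason}')
    except requests.exceptions.RequestException as e:
        print(f'Something went wrong with {url}: {e}')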



In [56]:
l = [request_worldbank(url_template.format(page_num=i)) for i in range (1,7)]

pd.concat([pd.json_normalize(output[1]) for output in l])

Unnamed: 0,countryiso3code,date,value,unit,obs_status,decimal,indicator.id,indicator.value,country.id,country.value
0,AFE,2021,702976832,,,0,SP.POP.TOTL,"Population, total",ZH,Africa Eastern and Southern
1,AFE,2020,685112705,,,0,SP.POP.TOTL,"Population, total",ZH,Africa Eastern and Southern
2,AFE,2019,667242712,,,0,SP.POP.TOTL,"Population, total",ZH,Africa Eastern and Southern
3,AFE,2018,649756874,,,0,SP.POP.TOTL,"Population, total",ZH,Africa Eastern and Southern
4,AFE,2017,632746296,,,0,SP.POP.TOTL,"Population, total",ZH,Africa Eastern and Southern
...,...,...,...,...,...,...,...,...,...,...
45,CEB,1974,101939916,,,0,SP.POP.TOTL,"Population, total",B8,Central Europe and the Baltics
46,CEB,1973,101112680,,,0,SP.POP.TOTL,"Population, total",B8,Central Europe and the Baltics
47,CEB,1972,100357161,,,0,SP.POP.TOTL,"Population, total",B8,Central Europe and the Baltics
48,CEB,1971,99635258,,,0,SP.POP.TOTL,"Population, total",B8,Central Europe and the Baltics


In [55]:
l = [request_worldbank(url_template.format(page_num=i)) for i in range(1,10)]

pd.concat([pd.json_normalize(output_json[1]) for output_json in l])

Unnamed: 0,countryiso3code,date,value,unit,obs_status,decimal,indicator.id,indicator.value,country.id,country.value
0,AFE,2021,702976832,,,0,SP.POP.TOTL,"Population, total",ZH,Africa Eastern and Southern
1,AFE,2020,685112705,,,0,SP.POP.TOTL,"Population, total",ZH,Africa Eastern and Southern
2,AFE,2019,667242712,,,0,SP.POP.TOTL,"Population, total",ZH,Africa Eastern and Southern
3,AFE,2018,649756874,,,0,SP.POP.TOTL,"Population, total",ZH,Africa Eastern and Southern
4,AFE,2017,632746296,,,0,SP.POP.TOTL,"Population, total",ZH,Africa Eastern and Southern
...,...,...,...,...,...,...,...,...,...,...
45,EAP,2010,1969241762,,,0,SP.POP.TOTL,"Population, total",4E,East Asia & Pacific (excluding high income)
46,EAP,2009,1955097471,,,0,SP.POP.TOTL,"Population, total",4E,East Asia & Pacific (excluding high income)
47,EAP,2008,1940824032,,,0,SP.POP.TOTL,"Population, total",4E,East Asia & Pacific (excluding high income)
48,EAP,2007,1926362850,,,0,SP.POP.TOTL,"Population, total",4E,East Asia & Pacific (excluding high income)


In [57]:
def download_worldbank(indicator):
    url_template = 'http://api.worldbank.org/v2/country/all/indicator/{indicator}?format=json&page={page}&per_page=500'
    # The first request also tells us how many pages there are in total
    first_request = requests.get(url_template.format(indicator=indicator, page=1)).json()

    pages = first_request[0]['pages']

    def single_worldbank_request(url):
        try:
            r = requests.get(url)
            if r.ok:
                return pd.json_normalize(r.json()[1])
        except Exception as e:
            # generic exceptions have no .msg attribute; str(e) holds the message
            print(f'Could not parse the URL {url}. Read the message: {e}')

    first_data = pd.json_normalize(first_request[1])

    # pd.concat silently drops the None returned by failed requests
    l = [single_worldbank_request(url_template.format(indicator=indicator, page=page)) for page in range(2, pages + 1)]

    return pd.concat([first_data] + l).set_index(['countryiso3code', 'date']).value

population = download_worldbank('SP.POP.TOTL')
population

countryiso3code  date
AFE              2021    702976832.0
                 2020    685112705.0
                 2019    667242712.0
                 2018    649756874.0
                 2017    632746296.0
                            ...     
ZWE              1964      4310332.0
                 1963      4177931.0
                 1962      4049778.0
                 1961      3925952.0
                 1960      3806310.0
Name: value, Length: 16492, dtype: float64
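The paging logic can be separated from the HTTP call, which makes it testable without the network. A minimal sketch, assuming a World Bank-style response shape `[meta, records]` with `meta['pages']`; the function `fetch_all_pages` and the fake API are hypothetical illustrations, not part of any library.

```python
import math

def fetch_all_pages(fetch_page):
    """Collect records from a paged API whose responses look like [meta, records].

    `fetch_page(page)` is any callable returning the parsed JSON for one page, e.g.
    lambda p: requests.get(url_template.format(indicator='SP.POP.TOTL', page=p)).json()
    """
    meta, records = fetch_page(1)
    all_records = list(records or [])          # guard against a page without records
    for page in range(2, meta['pages'] + 1):
        _, records = fetch_page(page)
        all_records.extend(records or [])
    return all_records

# A fake API serving 7 made-up records, 3 per page, standing in for the network
FAKE_DATA = [{'date': str(2015 + i), 'value': i} for i in range(7)]

def fake_fetch(page, per_page=3):
    start = (page - 1) * per_page
    meta = {'page': page, 'pages': math.ceil(len(FAKE_DATA) / per_page), 'per_page': per_page}
    return [meta, FAKE_DATA[start:start + per_page]]

records = fetch_all_pages(fake_fetch)
print(len(records))
```

The collected list can then go through `pd.json_normalize(records)` in one step instead of concatenating one DataFrame per page.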

### Eurostat

In [58]:
from io import StringIO
r_gdp = requests.get('https://ec.europa.eu/eurostat/api/dissemination/sdmx/2.1/data/NAMA_10_GDP?format=SDMX-CSV')
gdp = pd.read_csv(StringIO(r_gdp.text))
gdp

Unnamed: 0,DATAFLOW,LAST UPDATE,freq,unit,na_item,geo,TIME_PERIOD,OBS_VALUE,OBS_FLAG
0,ESTAT:NAMA_10_GDP(1.0),17/03/23 23:00:00,A,CLV05_MEUR,B1G,AT,1995,177617.0,
1,ESTAT:NAMA_10_GDP(1.0),17/03/23 23:00:00,A,CLV05_MEUR,B1G,AT,1996,180999.4,
2,ESTAT:NAMA_10_GDP(1.0),17/03/23 23:00:00,A,CLV05_MEUR,B1G,AT,1997,184799.1,
3,ESTAT:NAMA_10_GDP(1.0),17/03/23 23:00:00,A,CLV05_MEUR,B1G,AT,1998,192025.4,
4,ESTAT:NAMA_10_GDP(1.0),17/03/23 23:00:00,A,CLV05_MEUR,B1G,AT,1999,198247.2,
...,...,...,...,...,...,...,...,...,...
842372,ESTAT:NAMA_10_GDP(1.0),17/03/23 23:00:00,A,PYP_MNAC,YA1,XK,2017,0.0,
842373,ESTAT:NAMA_10_GDP(1.0),17/03/23 23:00:00,A,PYP_MNAC,YA1,XK,2018,0.0,
842374,ESTAT:NAMA_10_GDP(1.0),17/03/23 23:00:00,A,PYP_MNAC,YA1,XK,2019,0.0,
842375,ESTAT:NAMA_10_GDP(1.0),17/03/23 23:00:00,A,PYP_MNAC,YA1,XK,2020,0.0,


In [59]:
gdp.unit.unique()

array(['CLV05_MEUR', 'CLV05_MNAC', 'CLV10_MEUR', 'CLV10_MNAC',
       'CLV15_MEUR', 'CLV15_MNAC', 'CLV_I05', 'CLV_I10', 'CLV_I15',
       'CLV_PCH_PRE', 'CON_PPCH_PRE', 'CP_MEUR', 'CP_MNAC',
       'CP_MPPS_EU27_2020', 'PC_EU27_2020_MEUR_CP',
       'PC_EU27_2020_MPPS_CP', 'PC_GDP', 'PD05_EUR', 'PD05_NAC',
       'PD10_EUR', 'PD10_NAC', 'PD15_EUR', 'PD15_NAC', 'PD_PCH_PRE_EUR',
       'PD_PCH_PRE_NAC', 'PYP_MEUR', 'PYP_MNAC'], dtype=object)

In [60]:
gdp.na_item.unique()

array(['B1G', 'B1GQ', 'D21', 'D21X31', 'D31', 'P3', 'P31_S13', 'P31_S14',
       'P31_S14_S15', 'P31_S15', 'P32_S13', 'P3_P5', 'P3_P6', 'P3_S13',
       'P41', 'P51G', 'P5G', 'P6', 'P61', 'P62', 'P7', 'P71', 'P72',
       'P52_P53', 'B11', 'B111', 'B112', 'B2A3G', 'D1', 'D11', 'D12',
       'D2', 'D2X3', 'D3', 'P52', 'P53', 'YA0', 'YA1', 'YA2'],
      dtype=object)

In [61]:
gdp.geo.unique()

In [62]:
gdp['freq'].unique()

array(['A'], dtype=object)
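The long Eurostat table mixes many units and national-accounts items, so a series is usually selected by filtering on the code columns. A sketch on a few placeholder rows in the same shape (the values are NOT real figures); in the Eurostat codes, `B1GQ` is GDP at market prices and `CP_MEUR` means current prices in million euro.

```python
import pandas as pd

# Placeholder rows in the shape of the Eurostat table above (values are NOT real figures)
sample = pd.DataFrame({
    'unit':        ['CP_MEUR', 'CP_MEUR', 'CLV10_MEUR', 'CP_MEUR'],
    'na_item':     ['B1GQ', 'B1GQ', 'B1GQ', 'P3'],
    'geo':         ['CZ', 'CZ', 'CZ', 'CZ'],
    'TIME_PERIOD': [2020, 2021, 2021, 2021],
    'OBS_VALUE':   [1.0, 2.0, 3.0, 4.0],
})

# Keep one unit, one national-accounts item and one country, then index by year
mask = (sample['unit'] == 'CP_MEUR') & (sample['na_item'] == 'B1GQ') & (sample['geo'] == 'CZ')
cz_gdp = sample.loc[mask].set_index('TIME_PERIOD')['OBS_VALUE']
print(cz_gdp)
```

The same mask expression should work on the full `gdp` DataFrame, since it has the same column names.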

### Twitter

### Scopus

In [None]:
from secret import SCOPUS_API_KEY
r = requests.get(
    'https://api.elsevier.com/content/search/scopus',
    params={'query': 'AUTH(baruník, j.)'},  # requests URL-encodes the query for us
    headers={'Accept': 'application/json', 'X-ELS-APIKey': SCOPUS_API_KEY},
)

pd.json_normalize(r.json()['search-results']['entry'])

### XML or even HTML data

In [63]:
response = requests.get('https://en.wikipedia.org/wiki/Charles_University')
soup = BeautifulSoup(response.text, 'html.parser')  # name the parser explicitly to avoid a warning
div = soup.find('div', {'id': 'mw-content-text'})  # main article body on Wikipedia pages
article = ' '.join([p.text for p in div.find_all('p')])
print(article)

Charles University (Czech: Univerzita Karlova, UK; Latin: Universitas Carolina; German: Karls-Universität), also known as Charles University in Prague or historically as the University of Prague (Latin: Universitas Pragensis), is the oldest and largest university in the Czech Republic.[2] It is one of the oldest universities in Europe in continuous operation.[3] Today, the university consists of 17 faculties located in Prague, Hradec Králové, and Plzeň. Charles University belongs among the top three universities in Central and Eastern Europe.[4][5] It is ranked around 200–300 in the world.[6][7]
 The establishment of a medieval university in Prague was inspired by Holy Roman Emperor Charles IV.[8] He asked his friend and ally, Pope Clement VI, to do so. On 26 January 1347 the pope issued the bull establishing a university in Prague, modeled on the University of Paris, with the full (4) number of faculties, that is including a theological faculty. On 7 April 1348 Charles, the king of Bo

# Bonus example:

<img src="http://ies.fsv.cuni.cz/default/file/get/id/31996" height="500" width="300">

Will not work without authentication.

* You will need an IAM account for Amazon Web Services (AWS)
* With it you can create an `AWS_ACCESS_KEY` and an `AWS_SECRET_KEY`. See here: https://aws.amazon.com/premiumsupport/knowledge-center/create-access-key/
* Create a `secret.py` file and put `AWS_ACCESS_KEY` and `AWS_SECRET_KEY` in it. Follow the template of `secret-example.py`
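A minimal sketch of what `secret.py` might contain, assuming plain module-level string constants named as in the imports used in this notebook (the actual `secret-example.py` template may differ). Keep this file out of version control, e.g. by listing it in `.gitignore`.

```python
# secret.py -- hypothetical layout; replace the placeholders with your own keys
# and never commit this file (add `secret.py` to .gitignore)
AWS_ACCESS_KEY = 'your-aws-access-key-id'
AWS_SECRET_KEY = 'your-aws-secret-access-key'
SCOPUS_API_KEY = 'your-scopus-api-key'  # used by the Scopus example
```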

In [64]:
!pip install boto3



In [65]:
import boto3

In [66]:
from secret import AWS_ACCESS_KEY, AWS_SECRET_KEY

client=boto3.client('rekognition', 
                    region_name='us-west-2',
                    aws_access_key_id=AWS_ACCESS_KEY,
                    aws_secret_access_key=AWS_SECRET_KEY
)

with open('./img/iespic.jpeg','rb') as f:
    response = client.recognize_celebrities(Image={'Bytes': f.read()})
pd.DataFrame(response['UnrecognizedFaces'][0]['Emotions']).set_index('Type').Confidence.plot.bar()

ModuleNotFoundError: No module named 'secret'

In [None]:
response