# Agenda

- Assignment feedback
- Generators
- Requests
- Multiprocessing

## Requests

* URLs
* HTTP protocol
  * HTTPS?
* Requests
  * CLI
  * Python
* API keys

In case the `requests` module is not installed on your system, install it with:

```bash
pip install requests 
```

However, it should be part of the Anaconda distribution.

This material is based on http://docs.python-requests.org/en/master/user/quickstart/#quickstart and on chapter 17 in *Python Crash Course*, Eric Matthes.


We will have a look at the following *application programming interfaces (API)*:

  * http://openweathermap.org/api
  * https://developer.github.com/v3/

## URLs?

Usually, our URLs for working with remote *Representational state transfer (REST)* APIs consist of a host, a path and a query. In this tutorial we will work with the *Hypertext Transfer Protocol (HTTP)* only. 


```
                    hierarchical part
        ┌───────────────────┴─────────────────────┐
                    authority               path
        ┌───────────────┴───────────────┐┌───┴────┐
  abc://username:password@example.com:123/path/data?key=value&key2=value2#fragid1
  └┬┘   └───────┬───────┘ └────┬────┘ └┬┘           └─────────┬─────────┘ └──┬──┘
scheme  user information     host     port                  query         fragment
```


The example above is from https://en.wikipedia.org/wiki/Uniform_Resource_Identifier#Examples.

## HTTP protocol

Requests 
```HTTP
GET /index.html HTTP/1.1
Host: www.example.com
```

Response 
```HTTP
HTTP/1.1 200 OK
Date: Mon, 23 May 2005 22:38:34 GMT
Content-Type: text/html; charset=UTF-8
Content-Length: 138
Last-Modified: Wed, 08 Jan 2003 23:11:55 GMT
Server: Apache/1.3.3.7 (Unix) (Red-Hat/Linux)
ETag: "3f80f-1b6-3e1cb03b"
Accept-Ranges: bytes
Connection: close

<html>
<head>
  <title>An Example Page</title>
</head>
<body>
  Hello World, this is a very simple HTML document.
</body>
</html>
```

# Working with APIs on the CLI

On the CLI in a Unix environment, you have usually access either to `curl` or to `wget`. Both are similar and allow -amongst others- to interact with HTTP-based REST APIs.

In [1]:
%%bash
curl https://api.github.com/search/repositories?q=language:python&sort=stars

{
  "total_count": 3575153,
  "incomplete_results": false,
  "items": [
    {
      "id": 21289110,
      "node_id": "MDEwOlJlcG9zaXRvcnkyMTI4OTExMA==",
      "name": "awesome-python",
      "full_name": "vinta/awesome-python",
      "private": false,
      "owner": {
        "login": "vinta",
        "id": 652070,
        "node_id": "MDQ6VXNlcjY1MjA3MA==",
        "avatar_url": "https://avatars2.githubusercontent.com/u/652070?v=4",
        "gravatar_id": "",
        "url": "https://api.github.com/users/vinta",
        "html_url": "https://github.com/vinta",
        "followers_url": "https://api.github.com/users/vinta/followers",
        "following_url": "https://api.github.com/users/vinta/following{/other_user}",
        "gists_url": "https://api.github.com/users/vinta/gists{/gist_id}",
        "starred_url": "https://api.github.com/users/vinta/starred{/owner}{/repo}",
        "subscriptions_url": "https://api.github.com/users/vinta/subscriptions",
        "organizations_url": "https:

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0  0     0    0     0    0     0      0      0 --:--:--  0:00:01 --:--:--     0100  174k  100  174k    0     0  80241      0  0:00:02  0:00:02 --:--:-- 80250


In [2]:
%%bash

wget -O - https://api.github.com/search/repositories?q=language:python&sort=stars

{
  "total_count": 3575153,
  "incomplete_results": false,
  "items": [
    {
      "id": 21289110,
      "node_id": "MDEwOlJlcG9zaXRvcnkyMTI4OTExMA==",
      "name": "awesome-python",
      "full_name": "vinta/awesome-python",
      "private": false,
      "owner": {
        "login": "vinta",
        "id": 652070,
        "node_id": "MDQ6VXNlcjY1MjA3MA==",
        "avatar_url": "https://avatars2.githubusercontent.com/u/652070?v=4",
        "gravatar_id": "",
        "url": "https://api.github.com/users/vinta",
        "html_url": "https://github.com/vinta",
        "followers_url": "https://api.github.com/users/vinta/followers",
        "following_url": "https://api.github.com/users/vinta/following{/other_user}",
        "gists_url": "https://api.github.com/users/vinta/gists{/gist_id}",
        "starred_url": "https://api.github.com/users/vinta/starred{/owner}{/repo}",
        "subscriptions_url": "https://api.github.com/users/vinta/subscriptions",
        "organizations_url": "https:

--2019-03-12 15:01:50--  https://api.github.com/search/repositories?q=language:python
Resolving api.github.com (api.github.com)... 192.30.253.116, 192.30.253.117
Connecting to api.github.com (api.github.com)|192.30.253.116|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 178236 (174K) [application/json]
Saving to: ‘STDOUT’

     0K .......... .......... .......... .......... .......... 28%  230K 1s
    50K .......... .......... .......... .......... .......... 57%  461K 0s
   100K .......... .......... .......... .......... .......... 86% 89.0M 0s
   150K .......... .......... ....                            100% 38.9M=0.3s

2019-03-12 15:01:52 (533 KB/s) - written to stdout [178236/178236]



# Working with APIs from Python
 

## Make a Request

For this tutorial we are mostly collecting information with HTTPs `GET` request. Similarly, the `requests` module supports HTTP `POST`, `PUT`, `DELETE`, `HEAD`, and `OPTIONS` via corresponding functions in `requests`. See http://docs.python-requests.org/en/master/user/quickstart/#make-a-request for more details.

You can access the status code of an HTTP request via the `status_code` attribute. 

In [3]:
import requests


url = 'https://api.github.com/search/repositories?q=language:python&sort=stars'

r = requests.get(url)
print(type(r))
print(r.url)
print(r.status_code)
print(r.json())

results = r.json()['items']

<class 'requests.models.Response'>
https://api.github.com/search/repositories?q=language:python&sort=stars
200
{'items': [{'created_at': '2014-06-27T21:00:06Z', 'forks_url': 'https://api.github.com/repos/vinta/awesome-python/forks', 'keys_url': 'https://api.github.com/repos/vinta/awesome-python/keys{/key_id}', 'updated_at': '2019-03-12T13:29:05Z', 'ssh_url': 'git@github.com:vinta/awesome-python.git', 'issue_events_url': 'https://api.github.com/repos/vinta/awesome-python/issues/events{/number}', 'labels_url': 'https://api.github.com/repos/vinta/awesome-python/labels{/name}', 'trees_url': 'https://api.github.com/repos/vinta/awesome-python/git/trees{/sha}', 'stargazers_count': 63848, 'collaborators_url': 'https://api.github.com/repos/vinta/awesome-python/collaborators{/collaborator}', 'owner': {'subscriptions_url': 'https://api.github.com/users/vinta/subscriptions', 'url': 'https://api.github.com/users/vinta', 'gravatar_id': '', 'login': 'vinta', 'node_id': 'MDQ6VXNlcjY1MjA3MA==', 'site_a

In [4]:
len(results)

30

In [5]:
summary = [(el['full_name'], el['stargazers_count'], el['html_url'], 
            el['description']) for el in results[:10]]

for name, stars, url, desc in summary:
    print(name)
    print(stars)
    print(url)
    print(desc)
    print('---------------')

vinta/awesome-python
63848
https://github.com/vinta/awesome-python
A curated list of awesome Python frameworks, libraries, software and resources
---------------
donnemartin/system-design-primer
59268
https://github.com/donnemartin/system-design-primer
Learn how to design large-scale systems. Prep for the system design interview.  Includes Anki flashcards.
---------------
toddmotto/public-apis
53341
https://github.com/toddmotto/public-apis
A collective list of free APIs for use in software and web development.
---------------
tensorflow/models
49519
https://github.com/tensorflow/models
Models and examples built with TensorFlow
---------------
ytdl-org/youtube-dl
48086
https://github.com/ytdl-org/youtube-dl
Command-line program to download videos from YouTube.com and other video sites
---------------
pallets/flask
42526
https://github.com/pallets/flask
The Python micro framework for building web applications.
---------------
nvbn/thefuck
41535
https://github.com/nvbn/thefuck
Magnificent

## Passing Parameters In URLs

For getting the weather forecast, we need to specify, for example for which place we forecast, in which format we want to receive the response, etc. All those paramters are passed as a dictionary into the `params` keyword argument.

In [6]:
import json
import api_keys
import requests
import importlib
importlib.reload(api_keys)

url = "http://api.openweathermap.org/data/2.5/forecast"
query = {
         'q': 'Copenhagen,dk', 
         'mode': 'json',                       
         'units': 'metric',
         'appid': api_keys.OPENWEATHERMAP_KEY
        }
r = requests.get(url, params=query)

r.json()

{'city': {'coord': {'lat': 55.6867, 'lon': 12.5701},
  'country': 'DK',
  'id': 2618425,
  'name': 'Copenhagen',
  'population': 15000},
 'cnt': 40,
 'cod': '200',
 'list': [{'clouds': {'all': 92},
   'dt': 1552402800,
   'dt_txt': '2019-03-12 15:00:00',
   'main': {'grnd_level': 1005.03,
    'humidity': 90,
    'pressure': 1007.2,
    'sea_level': 1007.2,
    'temp': 4.59,
    'temp_kf': -0.07,
    'temp_max': 4.66,
    'temp_min': 4.59},
   'rain': {'3h': 0.2875},
   'snow': {'3h': 0.0005},
   'sys': {'pod': 'd'},
   'weather': [{'description': 'light rain',
     'icon': '10d',
     'id': 500,
     'main': 'Rain'}],
   'wind': {'deg': 188.5, 'speed': 9.11}},
  {'clouds': {'all': 92},
   'dt': 1552413600,
   'dt_txt': '2019-03-12 18:00:00',
   'main': {'grnd_level': 1000.98,
    'humidity': 95,
    'pressure': 1003.16,
    'sea_level': 1003.16,
    'temp': 4.12,
    'temp_kf': -0.05,
    'temp_max': 4.18,
    'temp_min': 4.12},
   'rain': {'3h': 1.0475},
   'snow': {},
   'sys': {'pod

## Response Content

### As Text

`requests` will automatically decode content from the server. Most unicode charsets are seamlessly decoded.

When you make a request, `requests` makes educated guesses about the encoding of the response based on the HTTP headers. The text encoding guessed by `requests` is used when you access `r.text`.

In [7]:
import requests

# A call to the Github timeline
r = requests.get('https://api.github.com/events')
# response encoding
print(r.encoding)
# response content
print(r.text)

utf-8
[{"id":"9226853303","type":"DeleteEvent","actor":{"id":873258,"login":"wagnerand","display_login":"wagnerand","gravatar_id":"","url":"https://api.github.com/users/wagnerand","avatar_url":"https://avatars.githubusercontent.com/u/873258?"},"repo":{"id":42436885,"name":"mozilla/addons-linter","url":"https://api.github.com/repos/mozilla/addons-linter"},"payload":{"ref":"renovate/source-map-support-0.x","ref_type":"branch","pusher_type":"user"},"public":true,"created_at":"2019-03-12T13:56:54Z","org":{"id":131524,"login":"mozilla","gravatar_id":"","url":"https://api.github.com/orgs/mozilla","avatar_url":"https://avatars.githubusercontent.com/u/131524?"}},{"id":"9226853293","type":"PushEvent","actor":{"id":3335667,"login":"jfwwlong","display_login":"jfwwlong","gravatar_id":"","url":"https://api.github.com/users/jfwwlong","avatar_url":"https://avatars.githubusercontent.com/u/3335667?"},"repo":{"id":172724066,"name":"jfwwlong/jfwwlong.github.io","url":"https://api.github.com/repos/jfwwlon

### JSON Response Content

`requests` has a builtin JSON decoder, which returns the JSON response decoded into a dictionary.

In [8]:
r.json()

[{'actor': {'avatar_url': 'https://avatars.githubusercontent.com/u/873258?',
   'display_login': 'wagnerand',
   'gravatar_id': '',
   'id': 873258,
   'login': 'wagnerand',
   'url': 'https://api.github.com/users/wagnerand'},
  'created_at': '2019-03-12T13:56:54Z',
  'id': '9226853303',
  'org': {'avatar_url': 'https://avatars.githubusercontent.com/u/131524?',
   'gravatar_id': '',
   'id': 131524,
   'login': 'mozilla',
   'url': 'https://api.github.com/orgs/mozilla'},
  'payload': {'pusher_type': 'user',
   'ref': 'renovate/source-map-support-0.x',
   'ref_type': 'branch'},
  'public': True,
  'repo': {'id': 42436885,
   'name': 'mozilla/addons-linter',
   'url': 'https://api.github.com/repos/mozilla/addons-linter'},
  'type': 'DeleteEvent'},
 {'actor': {'avatar_url': 'https://avatars.githubusercontent.com/u/3335667?',
   'display_login': 'jfwwlong',
   'gravatar_id': '',
   'id': 3335667,
   'login': 'jfwwlong',
   'url': 'https://api.github.com/users/jfwwlong'},
  'created_at': '2

### Binary Response Content

You can also access the response body as bytes, for example when you request a file or an image.

In [9]:
print(r,'\n----------------------')
r.content

<Response [200]> 
----------------------


b'[{"id":"9226853303","type":"DeleteEvent","actor":{"id":873258,"login":"wagnerand","display_login":"wagnerand","gravatar_id":"","url":"https://api.github.com/users/wagnerand","avatar_url":"https://avatars.githubusercontent.com/u/873258?"},"repo":{"id":42436885,"name":"mozilla/addons-linter","url":"https://api.github.com/repos/mozilla/addons-linter"},"payload":{"ref":"renovate/source-map-support-0.x","ref_type":"branch","pusher_type":"user"},"public":true,"created_at":"2019-03-12T13:56:54Z","org":{"id":131524,"login":"mozilla","gravatar_id":"","url":"https://api.github.com/orgs/mozilla","avatar_url":"https://avatars.githubusercontent.com/u/131524?"}},{"id":"9226853293","type":"PushEvent","actor":{"id":3335667,"login":"jfwwlong","display_login":"jfwwlong","gravatar_id":"","url":"https://api.github.com/users/jfwwlong","avatar_url":"https://avatars.githubusercontent.com/u/3335667?"},"repo":{"id":172724066,"name":"jfwwlong/jfwwlong.github.io","url":"https://api.github.com/repos/jfwwlong/jf

## Writing Response to a file

In [10]:
import requests


user_url = 'https://api.github.com/users/Thomas-Hartmann'
r = requests.get(user_url)
img_url = r.json()['avatar_url']
print(img_url)
r = requests.get(img_url)

filename = './avatar.jpg'

with open(filename, 'wb') as fd:
    fd.write(r.content)


https://avatars0.githubusercontent.com/u/9944487?v=4


![](avatar.jpg)

### Download Large Files or Responses

In case you have a large file that you want to save, then it is a good idea to save the stream of data coming in, by chopping it into smaller blocks of data and saving them sequentially.

In [11]:
with open(filename, 'wb') as fd:
    for chunk in r.iter_content(chunk_size=1024):
        fd.write(chunk)

## Custom Headers, Authentication, Response Headers

If you want to send your request with a customized header, then you can just pass your header as a dictionary to the `headers` keyword argument of your request funtion call.

For example, one way to authenticate to the Github API, is by sending an API token in the header. Thereby, you increase the amount of possible requests to 5000 per hour. Not to make the following code run you have to first generate a Github API token (https://github.com/blog/1509-personal-api-tokens) and add it to our token module. There are many other possible ways for authorization, see http://docs.python-requests.org/en/master/user/authentication/#authentication.

The header of a response is accessible as a dictionary via the `headers` attribute on the response object. In the following example, we have to inspect the response header to get the links to more results, as the Github API returns results split accross many pages.

In [12]:
%%bash
echo "GITHUB_API_KEY = 'YOUR_API_KEY'" >> ./api_keys.py

**!!!! ADD api_keys.py to .gitignore !!!!**

In [17]:
import api_keys
import requests
from datetime import datetime
from urllib.parse import urlparse


url = 'https://api.github.com/repos/pallets/flask/contributors'
headers = {'Authorization': 'token {}'.format(api_keys.GITHUB_API_KEY)}

r = requests.get(url, headers=headers)
    
print(r.headers['X-RateLimit-Remaining'])
print(r.headers['X-RateLimit-Reset'])
print(datetime.fromtimestamp(int(r.headers['X-RateLimit-Reset'])))
print(r.json())
contributors = [(contrib['login'], contrib['contributions'], contrib['html_url']) for contrib in r.json()]
 
print(r.headers)

4998
1552402919
2019-03-12 16:01:59
[{'subscriptions_url': 'https://api.github.com/users/mitsuhiko/subscriptions', 'url': 'https://api.github.com/users/mitsuhiko', 'gravatar_id': '', 'login': 'mitsuhiko', 'node_id': 'MDQ6VXNlcjczOTY=', 'site_admin': False, 'type': 'User', 'events_url': 'https://api.github.com/users/mitsuhiko/events{/privacy}', 'starred_url': 'https://api.github.com/users/mitsuhiko/starred{/owner}{/repo}', 'html_url': 'https://github.com/mitsuhiko', 'following_url': 'https://api.github.com/users/mitsuhiko/following{/other_user}', 'avatar_url': 'https://avatars1.githubusercontent.com/u/7396?v=4', 'id': 7396, 'repos_url': 'https://api.github.com/users/mitsuhiko/repos', 'organizations_url': 'https://api.github.com/users/mitsuhiko/orgs', 'received_events_url': 'https://api.github.com/users/mitsuhiko/received_events', 'gists_url': 'https://api.github.com/users/mitsuhiko/gists{/gist_id}', 'contributions': 1181, 'followers_url': 'https://api.github.com/users/mitsuhiko/follower

In [15]:
def gen_next_links(headers_link_str):
    next_page_str, last_page_str = headers_link_str.split(',')
    next_page_link = next_page_str.split(';')[0][1:-1]
    link_base = next_page_link[:-1]
    start_idx = int(urlparse(next_page_link).query.split('=')[1])
    last_page_link = last_page_str.split(';')[0][2:-1]
    end_idx = int(urlparse(last_page_link).query.split('=')[1])
    return [link_base + str(idx) for idx in range(start_idx, end_idx + 1)]

In [16]:
import pygal


print('There are {} contributors to Flask.'.format(len(contributors)))

chart = pygal.Bar(x_label_rotation=80, show_legend=False, spacing=170, 
                  height=1000, width=4000)
chart.title = 'Contributions to Flask on GitHub'

names, no_contrib, _ = zip(*contributors)

values = []
for label, value, link in contributors:
    s_dict = {
    'value': value,
    'label': label,
    'xlink': {'href': link}}
    values.append(s_dict)


chart.x_labels = names
chart.add('', values) 
chart.render_in_browser('contrib_flask.svg')

There are 30 contributors to Flask.


TypeError: render_in_browser() takes 1 positional argument but 2 were given

![](contrib_flask.svg)

import tqdm # A small library to show progress

In [None]:
next_urls = gen_next_links(r.headers['Link'])
for next_url in tqdm(next_urls):
    r = requests.get(next_url, headers=headers)
    contributors += [(contrib['login'], contrib['contributions'], contrib['html_url'])
                     for contrib in r.json()]

In [None]:
import os
import sys
import time
import logging
import requests
import api_keys
from multiprocessing import Pool, cpu_count


HEADER = {'Authorization': f'token {api_keys.GITHUB_API_KEY}'}
contributor_urls = [f'https://api.github.com/repositories/596892/contributors?page={idx}' for idx in range(1, 15)]


def hard_work(a_url):
    print(f'{__name__}/{os.getppid()}/{os.getpid()} gets data from {a_url}')
    r = requests.get(a_url, headers=HEADER)
    time.sleep(6)
    print('Done')
    return [(contrib['login'], contrib['contributions'],
             contrib['html_url']) for contrib in r.json()]
    # return []


def run_sequential_download():
    contributors = []
    logging.info('Running the sequential program.')
    start = time.time()
    for contributor_url in contributor_urls:
        contributors += hard_work(contributor_url)
    print(f'It took {time.time() - start}s in total.')

    return contributors


def run_parallel_processes():
    workers = cpu_count()
    pool = Pool(processes=workers)

    print('Running the concurrent program.')
    start = time.time()
    result = pool.map(hard_work, contributor_urls)

    print(f'It took {time.time() - start}s in total.')
    return result


if __name__ == '__main__':
    if sys.argv[1] == '-s':
        run_sequential_download()
    elif sys.argv[1] == '-p':
        run_parallel_processes()

print('There are {} contributors to Flask.'.format(len(contributors)))

## Exercise

* Create a GitHub API key
* Add it to the `api_keys.py` file
  * DO NOT PUT THIS ON GITHUB!!!
* Make a request to using these parameters:

```Python
url = 'https://api.github.com/repos/datsoftlyngby/dat4sem2019spring-python-materials'
headers = {'Authorization': 'token {}'.format(api_keys.GITHUB_API_KEY)}
```

* Parse the quest to json and look at the data

#  An intro to generators

In [None]:
[i for i in range(1000, 500, -5)]

In [None]:
(i for i in range(1000, 500, -5))

In [None]:
## Pygal plotting 

http://www.pygal.org/en/stable/index.html

In [None]:
chart = pygal.Bar(x_label_rotation=80, show_legend=False, spacing=170, 
                  height=1000, width=4000)
chart.title = 'Contributions to Flask on GitHub'

names, no_contrib, _ = zip(*contributors)

In [None]:
values = []
for label, value, link in contributors:
    s_dict = {
    'value': value,
    'label': label,
    'xlink': {'href': link}}
    values.append(s_dict)

In [None]:
chart.x_labels = names
chart.add('', values) 
chart.render_to_file('contrib_flask.svg')

![](contrib_flask.svg)
