# Requests
How to work with data from the internet

In case the `requests` module is not installed, install it with:

```bash
pip install requests 
```


This material is based on http://docs.python-requests.org/en/master/user/quickstart/#quickstart and on chapter 17 in *Python Crash Course*, Eric Matthes.


We will have a look at the following *application programming interfaces (APIs)*:

  * http://openweathermap.org/api
  * https://developer.github.com/v3/
  
  

## URLs?

Usually, our URLs for working with remote *Representational state transfer (REST)* APIs consist of a host, a path and a query. In this tutorial we will work with the *Hypertext Transfer Protocol (HTTP)* only. 


```
                    hierarchical part
        ┌───────────────────┴─────────────────────┐
                    authority               path
        ┌───────────────┴───────────────┐┌───┴────┐
  abc://username:password@example.com:123/path/data?key=value&key2=value2#fragid1
  └┬┘   └───────┬───────┘ └────┬────┘ └┬┘           └─────────┬─────────┘ └──┬──┘
scheme  user information     host     port                  query         fragment
```


The example above is from https://en.wikipedia.org/wiki/Uniform_Resource_Identifier#Examples.

In [1]:
from urllib.parse import urlparse

# Split the URL into its six components
url_components = urlparse('http://example.com/foo;key1=value1?key2=value2#key3=value3')
for comp in url_components:
    print(comp)
url_components

http
example.com
/foo
key1=value1
key2=value2
key3=value3


ParseResult(scheme='http', netloc='example.com', path='/foo', params='key1=value1', query='key2=value2', fragment='key3=value3')
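
To connect the diagram above to code, here is a minimal sketch that splits the Wikipedia example URI with `urllib.parse.urlsplit` and decodes its query string with `parse_qs`. The variable names are only illustrative.

```python
from urllib.parse import urlsplit, parse_qs

# The example URI from the diagram above
url = 'abc://username:password@example.com:123/path/data?key=value&key2=value2#fragid1'
parts = urlsplit(url)

print(parts.scheme)    # abc
print(parts.username)  # username
print(parts.hostname)  # example.com
print(parts.port)      # 123
print(parts.path)      # /path/data
print(parts.fragment)  # fragid1

# parse_qs maps each key to a list, because a key may occur several times
query = parse_qs(parts.query)
print(query)           # {'key': ['value'], 'key2': ['value2']}
```

Unlike `urlparse`, `urlsplit` does not separate the `params` part from the path, which is rarely needed for REST APIs.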

# Working with APIs on the CLI

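On the command line (CLI) in a Unix environment, you usually have access to either `curl` or `wget`. Both are similar and allow you, among other things, to interact with HTTP-based REST APIs. Quote URLs that contain `&`, because the shell otherwise treats `&` as a command separator.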

In [3]:
%%bash

curl 'https://api.github.com/search/repositories?q=language:python&sort=stars'

{
  "total_count": 4919525,
  "incomplete_results": false,
  "items": [
    {
      "id": 83222441,
      "node_id": "MDEwOlJlcG9zaXRvcnk4MzIyMjQ0MQ==",
      "name": "system-design-primer",
      "full_name": "donnemartin/system-design-primer",
      "private": false,
      "owner": {
        "login": "donnemartin",
        "id": 5458997,
        "node_id": "MDQ6VXNlcjU0NTg5OTc=",
        "avatar_url": "https://avatars2.githubusercontent.com/u/5458997?v=4",
        "gravatar_id": "",
        "url": "https://api.github.com/users/donnemartin",
        "html_url": "https://github.com/donnemartin",
        "followers_url": "https://api.github.com/users/donnemartin/followers",
        "following_url": "https://api.github.com/users/donnemartin/following{/other_user}",
        "gists_url": "https://api.github.com/users/donnemartin/gists{/gist_id}",
        "starred_url": "https://api.github.com/users/donnemartin/starred{/owner}{/repo}",
        "subscriptions_url": "https://api.github.com/us



In [4]:
%%bash

# wget performs the request
# requires %%bash
wget -O - 'https://api.github.com/search/repositories?q=language:python&sort=stars'


bash: line 4: wget: command not found


In [6]:
%%bash
ls

__pycache__
01 Uge 1. python øvelser.ipynb
01-0 Getting Started with installations etc.ipynb
01-0 Intro.slides.html
01-1 Getting Started with installations etc.slides.html
01-1 Intro.ipynb
01-2 Python language features.ipynb
01-3 Python data structures (lists and tuples).ipynb
01-4 Data structures (Dictionaries).ipynb
01-Exercise.ipynb
02-0a Loops.ipynb
02-0b Functions.ipynb
02-0c Modules.ipynb
02-1 CLI Programs.ipynb
02-2 Working with Files.ipynb
02-3 Object-oriented Programming.ipynb
02-4 Exceptions.ipynb
02-assignment
03 Plotting.ipynb
03_assignment
04 Numpy.ipynb
05-1 DateTime.ipynb
05-2 Subprocess.ipynb
05-3 pandas.ipynb
05-assignment panda.ipynb
06-1 Generators.ipynb
06-2 Python modules.ipynb
06-3 Requests.ipynb
06-4 Multiprocessing.ipynb
07-1 webscraping.ipynb
07-2 Regular Expressions.ipynb
07-3 selenium.ipynb
08 Data Persistence.ipynb
08 Web services.ipynb
09-1 cli arguments.ipynb
09-2 graphs and pagerank.ipynb
1 uge 1 python øvesler.ipynb
10-0 Linear Algebra.ipynb
10-1 Featu

# Working with APIs from Python
 

## Make a Request

In this tutorial we mostly collect information with HTTP `GET` requests. The `requests` module also supports HTTP `POST`, `PUT`, `DELETE`, `HEAD`, and `OPTIONS` via correspondingly named functions. See http://docs.python-requests.org/en/master/user/quickstart/#make-a-request for more details.

You can access the status code of an HTTP request via the `status_code` attribute. 
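
HTTP status codes are grouped into classes by their first digit. The following sketch uses only the standard library to turn a numeric code into a readable description; `describe_status` is a hypothetical helper written for this example, not part of `requests`.

```python
from http import HTTPStatus

def describe_status(code):
    """Return a short description of an HTTP status code (hypothetical helper)."""
    phrase = HTTPStatus(code).phrase   # raises ValueError for unknown codes
    if 200 <= code < 300:
        kind = 'success'
    elif 300 <= code < 400:
        kind = 'redirection'
    elif 400 <= code < 500:
        kind = 'client error'
    elif 500 <= code < 600:
        kind = 'server error'
    else:
        kind = 'informational'
    return '{} {} ({})'.format(code, phrase, kind)

for code in (200, 404, 503):
    print(describe_status(code))
# 200 OK (success)
# 404 Not Found (client error)
# 503 Service Unavailable (server error)
```

With `requests`, calling `r.raise_for_status()` on a response raises an exception for 4xx and 5xx codes, which is often more convenient than comparing `r.status_code` by hand.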

In [8]:
import requests


url = 'https://api.github.com/search/repositories?q=language:python&sort=stars'

r = requests.get(url)

type(r.status_code)

print('URL:\n',r.url) # get the URL
print('Status code:\n',r.status_code) # get the status code
print('JSON:\n',r.json()) # decode the body as JSON

URL:
 https://api.github.com/search/repositories?q=language:python&sort=stars
Status code:
 200
JSON:
 {'total_count': 4904531, 'incomplete_results': True, 'items': [{'id': 83222441, 'node_id': 'MDEwOlJlcG9zaXRvcnk4MzIyMjQ0MQ==', 'name': 'system-design-primer', 'full_name': 'donnemartin/system-design-primer', 'private': False, 'owner': {'login': 'donnemartin', 'id': 5458997, 'node_id': 'MDQ6VXNlcjU0NTg5OTc=', 'avatar_url': 'https://avatars2.githubusercontent.com/u/5458997?v=4', 'gravatar_id': '', 'url': 'https://api.github.com/users/donnemartin', 'html_url': 'https://github.com/donnemartin', 'followers_url': 'https://api.github.com/users/donnemartin/followers', 'following_url': 'https://api.github.com/users/donnemartin/following{/other_user}', 'gists_url': 'https://api.github.com/users/donnemartin/gists{/gist_id}', 'starred_url': 'https://api.github.com/users/donnemartin/starred{/owner}{/repo}', 'subscriptions_url': 'https://api.github.com/users/donnemartin/subscriptions', 'organizatio

In [8]:
results = r.json()['items']
print('First result:\n',results[0])

First result:
 {'id': 83222441, 'node_id': 'MDEwOlJlcG9zaXRvcnk4MzIyMjQ0MQ==', 'name': 'system-design-primer', 'full_name': 'donnemartin/system-design-primer', 'private': False, 'owner': {'login': 'donnemartin', 'id': 5458997, 'node_id': 'MDQ6VXNlcjU0NTg5OTc=', 'avatar_url': 'https://avatars2.githubusercontent.com/u/5458997?v=4', 'gravatar_id': '', 'url': 'https://api.github.com/users/donnemartin', 'html_url': 'https://github.com/donnemartin', 'followers_url': 'https://api.github.com/users/donnemartin/followers', 'following_url': 'https://api.github.com/users/donnemartin/following{/other_user}', 'gists_url': 'https://api.github.com/users/donnemartin/gists{/gist_id}', 'starred_url': 'https://api.github.com/users/donnemartin/starred{/owner}{/repo}', 'subscriptions_url': 'https://api.github.com/users/donnemartin/subscriptions', 'organizations_url': 'https://api.github.com/users/donnemartin/orgs', 'repos_url': 'https://api.github.com/users/donnemartin/repos', 'events_url': 'https://api.git

In [9]:
summary = [(el['full_name'], el['stargazers_count'], el['html_url'], 
            el['description']) for el in results[:10]]

for name, stars, url, desc in summary:
    print(name)
    print(stars)
    print(url)
    print(desc)
    print('---------------')

donnemartin/system-design-primer
84179
https://github.com/donnemartin/system-design-primer
Learn how to design large-scale systems. Prep for the system design interview.  Includes Anki flashcards.
---------------
vinta/awesome-python
79774
https://github.com/vinta/awesome-python
A curated list of awesome Python frameworks, libraries, software and resources
---------------
public-apis/public-apis
71328
https://github.com/public-apis/public-apis
A collective list of free APIs for use in software and web development.
---------------
TheAlgorithms/Python
67599
https://github.com/TheAlgorithms/Python
All Algorithms implemented in Python
---------------
ytdl-org/youtube-dl
62320
https://github.com/ytdl-org/youtube-dl
Command-line program to download videos from YouTube.com and other video sites
---------------
tensorflow/models
61983
https://github.com/tensorflow/models
Models and examples built with TensorFlow
---------------
nvbn/thefuck
52328
https://github.com/nvbn/thefuck
Magnificent ap

## Passing Parameters In URLs

To get the weather forecast, we need to specify, for example, the place we want a forecast for and the format in which we want to receive the response. All those parameters are passed as a dictionary to the `params` keyword argument.
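
To see what such a dictionary looks like once it is part of a URL, the sketch below builds the query string by hand with `urllib.parse.urlencode`. This approximates what `requests` does with `params`; the `appid` value is a placeholder, not a real key.

```python
from urllib.parse import urlencode

base_url = 'http://api.openweathermap.org/data/2.5/forecast'
query = {'q': 'Copenhagen,dk',
         'mode': 'json',
         'units': 'metric',
         'appid': 'YOUR_API_KEY'}  # placeholder, not a real key

# urlencode percent-encodes special characters such as the comma
full_url = base_url + '?' + urlencode(query)
print(full_url)
# http://api.openweathermap.org/data/2.5/forecast?q=Copenhagen%2Cdk&mode=json&units=metric&appid=YOUR_API_KEY
```

Passing the dictionary via `params` is preferable to concatenating strings yourself, because the encoding is handled for you.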

In [10]:
import json
import api_keys
import requests


url = "http://api.openweathermap.org/data/2.5/forecast"
query = {'q': 'Copenhagen,dk', 
         'mode': 'json',                       
         'units': 'metric',
         'appid': api_keys.OWM_API_KEY}
r = requests.get(url, params=query)

r.json()

ModuleNotFoundError: No module named 'api_keys'

## Response Content

### As Text

`requests` automatically decodes content from the server. Most Unicode charsets are decoded seamlessly.

When you make a request, `requests` makes educated guesses about the encoding of the response based on the HTTP headers. The text encoding guessed by `requests` is used when you access `r.text`.

In [None]:
import requests

# A call to the GitHub timeline
r = requests.get('https://api.github.com/events')
# response encoding
print(r.encoding, '\n')
# response content
print(r.text)
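
If the guessed encoding is wrong, the decoded text comes out garbled. The following sketch shows the effect with plain Python bytes, without any network access; the sample word is only an illustration.

```python
# Bytes as a server might send them for UTF-8 encoded text
raw = 'øvelser'.encode('utf-8')

print(raw)                    # b'\xc3\xb8velser'
print(raw.decode('utf-8'))    # øvelser
print(raw.decode('latin-1'))  # Ã¸velser  (wrong encoding: garbled text)
```

With `requests`, you can set `r.encoding` to the correct value before accessing `r.text` if the guess turns out to be wrong.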

### JSON Response body

`requests` has a built-in JSON decoder, which returns the JSON response decoded into the corresponding Python object, typically a dictionary or a list.

In [None]:
# decode the JSON response body into Python objects
r.json()
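
Whether you get a dictionary or a list depends on the top-level JSON value. The sketch below decodes two made-up response bodies with the standard `json` module to show the difference; the payloads are hypothetical and much smaller than real API responses.

```python
import json

# Hypothetical response bodies: a JSON object and a JSON array
body_object = '{"total_count": 2, "items": [{"name": "a"}, {"name": "b"}]}'
body_array = '[{"type": "PushEvent"}, {"type": "WatchEvent"}]'

search = json.loads(body_object)
events = json.loads(body_array)

print(type(search).__name__, search['total_count'])  # dict 2
print(type(events).__name__, events[0]['type'])      # list PushEvent
```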

### Binary Response Content

You can also access the response body as bytes, for example when you request a file or an image.

In [None]:
# get response as bytes:
r.content

## Writing Response to a file

In [5]:
import requests



user_url = 'https://api.github.com/users/Thomas-Hartmann' # the user
r = requests.get(user_url)
img_url = r.json()['avatar_url'] # decode the JSON and keep only the image URL
print(img_url)
r = requests.get(img_url)  # download the image from its URL

filename = './avatar.jpg' 

with open(filename, 'wb') as fd: # write the downloaded image to the file named by filename
    fd.write(r.content)

https://avatars0.githubusercontent.com/u/9944487?v=4


avatar.jpg:
![](avatar.jpg)

### Download Large Files or Response body

If you want to save a large file, it is a good idea to stream the incoming data: split it into smaller blocks and write them to disk one after another. Pass `stream=True` to `requests.get` so that the body is not loaded into memory all at once.

In [None]:
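# stream=True defers downloading the body until it is iterated over
r = requests.get(img_url, stream=True)
with open(filename, 'wb') as fd:
    for chunk in r.iter_content(chunk_size=1024):
        fd.write(chunk)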
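
The chunk-by-chunk pattern can be tried without a network connection. In this sketch an in-memory `io.BytesIO` stream stands in for a large response body, and `save_in_chunks` is a hypothetical helper, not a `requests` function.

```python
import io
import os
import tempfile

def save_in_chunks(stream, path, chunk_size=1024):
    """Copy a binary stream to `path` chunk by chunk; return the bytes written."""
    written = 0
    with open(path, 'wb') as fd:
        while True:
            chunk = stream.read(chunk_size)
            if not chunk:          # empty bytes object: end of stream
                break
            written += fd.write(chunk)
    return written

fake_body = io.BytesIO(b'x' * 5000)   # stand-in for a large response body
target = os.path.join(tempfile.mkdtemp(), 'download.bin')
n_bytes = save_in_chunks(fake_body, target)
print(n_bytes)                        # 5000
```

Only one chunk is held in memory at a time, so memory use stays roughly constant regardless of the file size.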

## Custom Headers, Authentication, Response Headers

If you want to send your request with a customized header, you can pass the header as a dictionary to the `headers` keyword argument of your request function call.

For example, one way to authenticate to the GitHub API is to send an API token in the header. This increases the number of possible requests to 5000 per hour. 

**Note:** to make the following code run, you first have to generate a GitHub API token (https://github.com/blog/1509-personal-api-tokens) and add it to the `api_keys` module. There are many other ways to authorize requests, see http://docs.python-requests.org/en/master/user/authentication/#authentication.

The header of a response is accessible as a dictionary via the `headers` attribute of the response object. In the following example, we have to inspect the response header to get the links to more results, because the GitHub API returns results split across many pages.

In [None]:
%%bash

echo "GITHUB_API_KEY = 'YOUR_API_KEY'" >> ./api_keys.py

In [None]:
import api_keys
import requests
from datetime import datetime
from urllib.parse import urlparse


url = 'https://api.github.com/repos/pallets/flask/contributors'
headers = {'Authorization': 'token {}'.format(api_keys.GITHUB_API_KEY)}

r = requests.get(url, headers=headers)
    
print(r.headers['X-RateLimit-Remaining'])
print(r.headers['X-RateLimit-Reset'])
print(datetime.fromtimestamp(int(r.headers['X-RateLimit-Reset'])))

contributors = [(contrib['login'], contrib['contributions'], contrib['html_url'])
                for contrib in r.json()]
 
#print(r.headers)
print(r.headers['Link'])
print('contributors:\n',contributors)
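
The rate-limit headers arrive as strings, so they need converting before use. The sketch below does this for a hand-written dictionary that mimics the two headers printed above; the values are hypothetical.

```python
from datetime import datetime, timezone

# Hypothetical header values, shaped like GitHub's rate-limit headers
headers = {'X-RateLimit-Remaining': '4987',
           'X-RateLimit-Reset': '1600000000'}

remaining = int(headers['X-RateLimit-Remaining'])
# The reset time is given as seconds since the Unix epoch
reset_at = datetime.fromtimestamp(int(headers['X-RateLimit-Reset']), tz=timezone.utc)

print(remaining)             # 4987
print(reset_at.isoformat())  # 2020-09-13T12:26:40+00:00
```

Passing `tz=timezone.utc` makes the result independent of the local time zone. Note that `r.headers` in `requests` is case-insensitive, whereas the plain dictionary used here is not.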

## urllib.parse.urlparse
`urlparse` parses a URL into six components and returns them as a 6-item named tuple.   
The components correspond to the general structure of a URL, `scheme://netloc/path;params?query#fragment`, for example:
- scheme='http', 
- netloc='www.cwi.nl:80', 
- path='/%7Eguido/Python.html',
- params='q=23', 
- query='', 
- fragment=''

In [None]:
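def gen_next_links(headers_link_str):
    """Build the URLs of all remaining result pages from a GitHub `Link` header."""
    # On the first page the header holds two entries: '<url>; rel="next", <url>; rel="last"'
    next_page_str, last_page_str = headers_link_str.split(',')
    # Strip the enclosing '<' and '>' from the "next" URL
    next_page_link = next_page_str.split(';')[0][1:-1]
    # Drop the trailing page number (assumes it is a single digit)
    link_base = next_page_link[:-1]
    start_idx = int(urlparse(next_page_link).query.split('=')[1])
    # The "last" entry starts with ' <', hence the offset of 2
    last_page_link = last_page_str.split(';')[0][2:-1]
    end_idx = int(urlparse(last_page_link).query.split('=')[1])
    return [link_base + str(idx) for idx in range(start_idx, end_idx + 1)]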


next_urls = gen_next_links(r.headers['Link'])
for nurl in next_urls:
    print(nurl)
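
Splitting the `Link` header by position is fragile if the order or number of entries changes. As an alternative sketch, the header can be parsed by its `rel` labels with a regular expression. `parse_link_header` is a hypothetical helper, and the header value below is a made-up example in the format GitHub uses for pagination.

```python
import re
from urllib.parse import urlsplit, parse_qs

def parse_link_header(value):
    """Map each rel value (e.g. 'next', 'last') to its URL."""
    links = {}
    for url, rel in re.findall(r'<([^>]+)>\s*;\s*rel="([^"]+)"', value):
        links[rel] = url
    return links

# Hypothetical header value in the format GitHub uses for pagination
link = ('<https://api.github.com/repositories/596892/contributors?page=2>; rel="next", '
        '<https://api.github.com/repositories/596892/contributors?page=25>; rel="last"')

links = parse_link_header(link)
# Read the page number from the query string instead of slicing characters
last_page = int(parse_qs(urlsplit(links['last']).query)['page'][0])

print(links['next'])  # https://api.github.com/repositories/596892/contributors?page=2
print(last_page)      # 25
```

`requests` also exposes already parsed link headers via the `links` attribute of a response, for example `r.links['next']['url']`.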


## Side note on the tqdm module:
From the [documentation](https://pypi.org/project/tqdm/):

"tqdm means “progress” in Arabic (taqadum, تقدّم) and is an abbreviation for “I love you so much” in Spanish (te quiero demasiado).

Instantly make your loops show a smart progress meter - just wrap any iterable with tqdm(iterable), and you’re done!"


In [None]:
from tqdm import tqdm
#contributors = []
for next_url in tqdm(next_urls):
    r = requests.get(next_url, headers=headers)
    contributors += [(contrib['login'], contrib['contributions'], contrib['html_url'])
                     for contrib in r.json()]

In [None]:
print(contributors[:3])

In [None]:
import matplotlib.pyplot as plt


print('There are {} contributors to Flask.'.format(len(contributors)))

#chart = plt.Bar(x_label_rotation=80, show_legend=False, spacing=170, height=1000, width=4000)
#chart.title = 'Contributions to Flask on GitHub'

names, no_contrib, _ = zip(*contributors)
#print(names)

plt.bar(names[:10], no_contrib[:10])
plt.xticks(names[:10], names[:10], rotation='vertical')
plt.show()

# A Small Detour... on Counting

In [None]:
gender = ['m','f','m','f','m']

f_count = 0
m_count = 0

for g in gender:
    if g == 'f':
        f_count += 1
    else:
        m_count += 1
print('male:',m_count,'female:',f_count)

In [None]:
female = sum([1 for g in gender if g == 'f'])
male   = sum([1 for g in gender if g == 'm'])
print('male:',male,'female:',female)

In [None]:
from collections import Counter
gender_dict = dict(Counter(gender))
print(gender_dict)
print('male:',gender_dict['m'],'female:',gender_dict['f'])

In [None]:
import numpy as np
gender = [['m','f','m','f','m'],
          ['m','m','m','f','m'],
         ['m','f','m','f','m']]
gender_counts = np.unique(gender, return_counts=True)
print(gender_counts)
ar1, ar2 = zip(*gender_counts)
print('\n\n',ar1,ar2)
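
For comparison, the nested list can also be counted with the standard library only. This sketch flattens the rows with `itertools.chain.from_iterable` and feeds them to `Counter`.

```python
from collections import Counter
from itertools import chain

gender = [['m', 'f', 'm', 'f', 'm'],
          ['m', 'm', 'm', 'f', 'm'],
          ['m', 'f', 'm', 'f', 'm']]

# chain.from_iterable yields the items of all rows one after another
counts = Counter(chain.from_iterable(gender))
print(counts['m'], counts['f'])  # 10 5
print(counts.most_common(1))     # [('m', 10)]
```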

## Exercise with the requests module
1. If you have not already done so, register on OpenWeatherMap and get an API key: https://home.openweathermap.org/api_keys
2. Get the 5-day forecast data as in the lecture notebook.
3. Create a function to get the 'main' data from each list item in the forecast.
4. Find the lowest and highest temperature in Celsius in the 5-day forecast.
5. Download the list of all cities here: http://bulk.openweathermap.org/sample/city.list.json.gz and unzip it.
6. Open the file and call `json.load()` on the file object to load the data.
7. Create a function that returns a dict with key = country name and value = list of all available cities.
8. How many cities are in DK?
9. Create a function that returns a dictionary with the average temperatures of all cities provided in a list.
10. What is the temperature difference between Los Angeles and Copenhagen?