# Getting data from the web

Getting data from the Internet is easy, and there are several techniques for it. We'll use both Linux command-line tools and Python libraries to get this done. Once you know how to do this, you can work with live data, which makes your programs much more attractive!

### Contents
0. Install packages
1. Requests
2. wget
3. cURL
4. Urllib

## 0. Install packages

In [None]:
%pip install requests
%pip install wget
%pip install pillow

## 1. Requests

source: https://realpython.com/python-json/
We will use a website called JSONPlaceholder (jsonplaceholder.typicode.com) to fetch dummy data.

We will use both the requests and json packages (see 1.3 working with files)

In [None]:
import requests
url = "https://jsonplaceholder.typicode.com/todos"  # it's best to define the URL separately
response = requests.get(url)
print(response)

In [None]:
print(dir(response))

In [None]:
#print the http status code (200 = OK)
response.status_code
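The status code is just a number. As a small sketch (describe_status is a hypothetical helper, not part of requests), Python's built-in http.HTTPStatus enum can translate it into its standard reason phrase and a rough category. With requests itself you can also check response.ok or call response.raise_for_status(), which raises an error for 4xx/5xx codes.

```python
from http import HTTPStatus

def describe_status(code):
    """Return e.g. '200 OK (success)' for an HTTP status code (hypothetical helper)."""
    phrase = HTTPStatus(code).phrase  # raises ValueError for codes the enum does not know
    if code < 200:
        kind = "informational"
    elif code < 300:
        kind = "success"
    elif code < 400:
        kind = "redirect"
    elif code < 500:
        kind = "client error"
    else:
        kind = "server error"
    return f"{code} {phrase} ({kind})"

print(describe_status(200))  # 200 OK (success)
print(describe_status(404))  # 404 Not Found (client error)
```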

In [None]:
response.text

In [None]:
import json
todos = json.loads(response.text)
print(type(todos))
todos
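Once the JSON is parsed, todos is an ordinary Python list of dictionaries (requests can also do the parsing for you with response.json()). The sketch below works offline on a small, hand-written sample that assumes the same keys as the /todos items (userId, id, title, completed); the values are made up, and the helper names are hypothetical.

```python
import json
from collections import Counter

# Hand-written sample in the same shape as the /todos items.
# The values are made up; the real endpoint returns many more items.
sample_json = """[
  {"userId": 1, "id": 1, "title": "buy milk", "completed": false},
  {"userId": 1, "id": 2, "title": "walk the dog", "completed": true},
  {"userId": 2, "id": 3, "title": "read a book", "completed": true},
  {"userId": 2, "id": 4, "title": "write code", "completed": true}
]"""

sample_todos = json.loads(sample_json)  # list of dicts; JSON false/true become False/True

def completed_per_user(items):
    """Count how many todos each userId has completed."""
    return Counter(item["userId"] for item in items if item["completed"])

def open_titles(items):
    """Titles of the todos that are not completed yet."""
    return [item["title"] for item in items if not item["completed"]]

print(completed_per_user(sample_todos))  # Counter({2: 2, 1: 1})
print(open_titles(sample_todos))         # ['buy milk']
```

The same functions should work on the list fetched above, e.g. completed_per_user(todos).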

### Weather with an api_key
Register at weerlive (weerlive.nl) to get your own API key.

In [None]:
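import requests

def weersverwachting(api_key=None, plaats=None):
    """Fetch the current weather ('liveweer') for a place from weerlive.nl."""
    assert api_key is not None, 'Provide your API key'
    plaats = plaats or 'Amsterdam'  # default location

    BASE_URL = 'https://weerlive.nl/api/json-data-10min.php'
    # let requests build and encode the query string: ?key=...&locatie=...
    params = {'key': api_key, 'locatie': plaats}
    verwachting = requests.get(BASE_URL, params=params, timeout=10).json().get('liveweer')

    return verwachting  # a list; use [0] for the first entry
weersverwachting('demo')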
#def samenvatting(api_key=None, plaats=None):
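A place name with a space or an apostrophe (for example Den Haag or 's-Hertogenbosch) has to be URL-encoded before it goes into a query string; pasting it into an f-string does not do that. Passing params= to requests.get encodes it for you. The sketch below shows the same encoding with the standard library's urllib.parse.urlencode; build_weather_url is a hypothetical helper that assumes the key and locatie parameter names used above.

```python
from urllib.parse import urlencode

BASE_URL = "https://weerlive.nl/api/json-data-10min.php"

def build_weather_url(api_key, plaats="Amsterdam"):
    """Build the request URL with properly encoded query parameters (hypothetical helper)."""
    # urlencode turns spaces into '+' and %-escapes other special characters
    query = urlencode({"key": api_key, "locatie": plaats})
    return f"{BASE_URL}?{query}"

print(build_weather_url("demo", "Den Haag"))
# https://weerlive.nl/api/json-data-10min.php?key=demo&locatie=Den+Haag
```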

### Learning to work with headers using Reqbin
source: https://reqbin.com/req/python/5k564bhv/get-request-bearer-token-authorization-header-example

In [None]:
import requests
from requests.structures import CaseInsensitiveDict

url = "https://reqbin.com/echo/get/json"

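token = "your-token-here"  # placeholder: replace with a real bearer token

headers = CaseInsensitiveDict()
headers["Accept"] = "application/json"
headers["Authorization"] = f"Bearer {token}"  # f-string, so the token value is filled in
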
resp = requests.get(url, headers=headers)

print(resp.status_code)
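For comparison, here is a minimal sketch of attaching the same Accept and Bearer Authorization headers with the standard library's urllib.request.Request (covered in section 4). The helper name make_authorized_request and the token value are illustrative assumptions. Building a Request object sends nothing, so you can inspect the headers first.

```python
import urllib.request

def make_authorized_request(url, token):
    """Build (but do not send) a GET request carrying a bearer token (hypothetical helper)."""
    return urllib.request.Request(url, headers={
        "Accept": "application/json",
        "Authorization": f"Bearer {token}",
    })

req = make_authorized_request("https://reqbin.com/echo/get/json", "my-secret-token")
print(req.get_method())                 # GET (no request body attached)
print(req.get_header("Authorization"))  # Bearer my-secret-token
# To actually send it: urllib.request.urlopen(req)
```

Unlike requests' CaseInsensitiveDict, urllib.request.Request stores header names in str.capitalize() form, so req.get_header("authorization") (lowercase) returns None.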

In [None]:
print(dir(resp))

In [None]:
print(resp.headers)
print('-------------------------')
print(resp.content)

## 2. wget

In [None]:
import wget
url = 'http://www.futurecrew.com/skaven/song_files/mp3/razorback.mp3'
filename = wget.download(url)
filename

In [None]:
import wget

print('Beginning file download with wget module')

url = 'http://www.futurecrew.com/skaven/song_files/mp3/razorback.mp3'
wget.download(url)
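If you would rather not depend on the third-party wget package, the standard library can save a URL to disk as well. This is a sketch, not a drop-in replacement for wget.download (no progress bar); download and filename_from_url are hypothetical helper names, and the default file name is assumed to be the last part of the URL path.

```python
import shutil
import urllib.request
from pathlib import Path
from urllib.parse import urlparse

def filename_from_url(url):
    """Use the last part of the URL path as the file name (fallback: 'download')."""
    return Path(urlparse(url).path).name or "download"

def download(url, filename=None):
    """Save the resource at url to disk and return the file name."""
    filename = filename or filename_from_url(url)
    with urllib.request.urlopen(url) as response, open(filename, "wb") as out:
        shutil.copyfileobj(response, out)  # copies in chunks instead of reading everything into memory
    return filename

print(filename_from_url("http://www.futurecrew.com/skaven/song_files/mp3/razorback.mp3"))  # razorback.mp3
# download("http://www.futurecrew.com/skaven/song_files/mp3/razorback.mp3")  # needs network access
```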

In [None]:
import wget
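# raw.githubusercontent.com serves the file itself; a github.com/.../blob/... URL returns GitHub's HTML page
url = 'https://raw.githubusercontent.com/MichielBbal/ArduinoTensorFlowLiteTutorials/master/examples/audio/ipynb/uk_ireland_accent_recognition.ipynb'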
filename = wget.download(url)
filename
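A github.com/&lt;user&gt;/&lt;repo&gt;/blob/&lt;branch&gt;/&lt;path&gt; URL points to GitHub's web page for a file, so downloading it gives you HTML rather than the file. The hypothetical helper below rewrites such URLs to the raw.githubusercontent.com form. It is a sketch that assumes GitHub's current URL layout and a branch name without a "/".

```python
from urllib.parse import urlparse

def github_blob_to_raw(url):
    """Convert a github.com/<user>/<repo>/blob/<branch>/<path> URL to raw.githubusercontent.com.

    Hypothetical helper; assumes the branch name contains no '/'.
    Other URLs are returned unchanged.
    """
    parts = urlparse(url)
    segments = parts.path.strip("/").split("/")
    if parts.netloc != "github.com" or len(segments) < 5 or segments[2] != "blob":
        return url
    user, repo, _blob, branch, *path = segments
    return "https://raw.githubusercontent.com/" + "/".join([user, repo, branch, *path])

blob = ("https://github.com/MichielBbal/ArduinoTensorFlowLiteTutorials/blob/master/"
        "examples/audio/ipynb/uk_ireland_accent_recognition.ipynb")
print(github_blob_to_raw(blob))
# https://raw.githubusercontent.com/MichielBbal/ArduinoTensorFlowLiteTutorials/master/examples/audio/ipynb/uk_ireland_accent_recognition.ipynb
```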

In [6]:
import io

from PIL import Image  # https://pillow.readthedocs.io/en/4.3.x/
import requests  # http://docs.python-requests.org/en/master/

# raw.githubusercontent.com serves the image itself; a github.com/.../blob/... URL returns an HTML page
url = 'https://raw.githubusercontent.com/oneoffcoder/books/master/sphinx/datascience/source/pose-estimation/images/tennis-00.jpg'
# example image url: https://m.media-amazon.com/images/S/aplus-media/vc/6a9569ab-cb8e-46d9-8aea-a7022e58c74a.jpg
def download_image(url, image_file_path):
    """Download an image from url and save it to image_file_path."""
    r = requests.get(url, timeout=4.0)
    r.raise_for_status()  # raises requests.HTTPError for 4xx/5xx status codes

    # decode the downloaded bytes with Pillow, then save (format taken from the file extension)
    with Image.open(io.BytesIO(r.content)) as im:
        im.save(image_file_path)

    print('Image downloaded from url: {} and saved to: {}.'.format(url, image_file_path))

download_image(url, 'tennis-00.jpg')

In [None]:
pwd

In [None]:
# extract a zip file (assumes gm_ve_v1.zip is already in the current working directory)
import zipfile
with zipfile.ZipFile('gm_ve_v1.zip', 'r') as zip_ref:
    zip_ref.extractall()
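The zip file above is not downloaded anywhere in this notebook, so here is a self-contained sketch that first builds a tiny archive in a temporary folder and then extracts it. extract_zip is a hypothetical helper name. It also returns namelist(), which is useful because the Python documentation advises inspecting archives from untrusted sources before extracting them.

```python
import tempfile
import zipfile
from pathlib import Path

def extract_zip(archive_path, target_dir):
    """Extract every member of archive_path into target_dir and return the member names."""
    with zipfile.ZipFile(archive_path, "r") as zip_ref:
        names = zip_ref.namelist()
        zip_ref.extractall(target_dir)  # creates target_dir and sub-folders as needed
    return names

# Self-contained demo: build a tiny zip in a temporary folder, then extract it
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    archive = tmp / "example.zip"
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr("data/hello.txt", "hello from the zip")
    names = extract_zip(archive, tmp / "out")
    content = (tmp / "out" / "data" / "hello.txt").read_text()

print(names)    # ['data/hello.txt']
print(content)  # hello from the zip
```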

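## 3. cURL (not Python)
curl is a command-line tool for transferring data to and from URLs. In a Jupyter notebook you run a shell command by prefixing it with an exclamation mark (!). The basic syntax is:
!curl https://www.example.com

The curl repository can be found here: https://github.com/curl/curl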

In [None]:
!curl www.example.com

## 4. Urllib

urllib is Python's built-in package for working with URLs.

The package contains the following modules:
- urllib.request for opening and reading URLs
- urllib.error containing the exceptions raised by urllib.request
- urllib.parse for parsing URLs
- urllib.robotparser for parsing robots.txt files

source: https://docs.python.org/3/library/urllib.html
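The examples below use urllib.request and urllib.error. As a short offline sketch of the other two modules listed above: urllib.parse splits a URL into its parts, and urllib.robotparser checks robots.txt rules. The robots.txt rules here are a made-up example passed in directly, so no network is needed; for a real site you would call rp.set_url(".../robots.txt") and rp.read() instead.

```python
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser

# urllib.parse: split a URL into its components
parts = urlparse("https://jsonplaceholder.typicode.com/todos?userId=1")
print(parts.scheme, parts.netloc, parts.path, parts.query)
# https jsonplaceholder.typicode.com /todos userId=1

# urllib.robotparser: check made-up robots.txt rules before crawling
rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /private/",
])
print(rp.can_fetch("*", "https://example.com/private/page"))  # False
print(rp.can_fetch("*", "https://example.com/public/page"))   # True
```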


In [None]:
import urllib.request
with urllib.request.urlopen('http://python.org/') as response:
    html = response.read()  # the page body as bytes
html
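response.read() returns bytes, which is why html above is shown as b'...'. A minimal sketch for turning the body into text: use the charset declared in the Content-Type header and fall back to UTF-8 when none is given (the fallback is an assumption, not something every server guarantees). fetch_text is a hypothetical helper name. The demo uses a data: URL, which urllib handles itself, so it runs without network access.

```python
import urllib.request

def fetch_text(url):
    """Open url with urllib and return the body decoded to str (hypothetical helper)."""
    with urllib.request.urlopen(url) as response:
        raw = response.read()  # bytes
        # charset from the Content-Type header, or UTF-8 if the header has none
        charset = response.headers.get_content_charset() or "utf-8"
    return raw.decode(charset)

# A data: URL is handled by urllib itself, so this demo needs no network access
print(fetch_text("data:text/plain;charset=utf-8,hello%20world"))  # hello world
```

With a real web page you would call it the same way, e.g. fetch_text('http://python.org/').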

In [None]:
# A non-existent host: urlopen cannot connect and raises URLError
import urllib.error
import urllib.request

req = urllib.request.Request('http://www.pretend_server.org')
try:
    urllib.request.urlopen(req)
except urllib.error.URLError as e:
    print(e.reason)
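urllib raises two related exceptions: HTTPError when the server answers with an error status such as 404, and URLError when there is no usable response at all (unknown host, refused connection, unsupported scheme). HTTPError is a subclass of URLError, so it has to be caught first. The sketch below wraps this in a hypothetical try_open helper; the test input uses a made-up foo:// scheme that urllib does not support, so it fails without any network access.

```python
import urllib.error
import urllib.request

def try_open(url, timeout=5):
    """Return ('ok', status), ('http error', code) or ('error', reason) (hypothetical helper)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return ("ok", response.status)
    except urllib.error.HTTPError as e:  # server answered with 4xx/5xx; catch before URLError
        return ("http error", e.code)
    except urllib.error.URLError as e:   # no usable response at all
        return ("error", str(e.reason))

print(try_open("foo://example"))  # ('error', 'unknown url type: foo')
# try_open("http://www.pretend_server.org")  # needs network; expect ('error', ...)
```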

In [None]:
import json
import urllib.request

url = "https://jsonplaceholder.typicode.com/todos"

with urllib.request.urlopen(url) as response:
    data = response.read()  # the body as bytes
todos = json.loads(data)  # json.loads accepts bytes as well as str
print(todos[:3])

In [None]:
from urllib import parse

params = {"v": "EuC-yVZHhMI", "t": "5m56s"}
querystring = parse.urlencode(params)
print(querystring)
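Gluing strings together works for a simple case, but urllib.parse has helpers for the whole round trip. A short sketch: urlunsplit assembles a URL from its parts, doseq=True turns a list value into repeated keys, and parse_qs reverses the encoding (its values always come back as lists). The variable name watch_url is only for illustration.

```python
from urllib.parse import parse_qs, urlencode, urlunsplit

params = {"v": "EuC-yVZHhMI", "t": "5m56s"}
query = urlencode(params)

# Assemble scheme, host, path, query and fragment into one URL
watch_url = urlunsplit(("https", "www.youtube.com", "/watch", query, ""))
print(watch_url)  # https://www.youtube.com/watch?v=EuC-yVZHhMI&t=5m56s

# Lists need doseq=True to become repeated keys; spaces become '+'
print(urlencode({"q": "data from the web", "tag": ["python", "http"]}, doseq=True))
# q=data+from+the+web&tag=python&tag=http

# parse_qs reverses the process
print(parse_qs(query))  # {'v': ['EuC-yVZHhMI'], 't': ['5m56s']}
```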

In [None]:
import urllib.request

url = "https://www.youtube.com/watch?" + querystring
print(url)
resp = urllib.request.urlopen(url)
resp.status  # the HTTP status code, e.g. 200 (resp.code also works)