## Step One
### Import the <u>requests</u> library

In [1]:
import requests

## Step Two
### Send an HTTP request, get the response, and save in a variable

In [5]:
response = requests.get("https://www.simplyrecipes.com/?s=asparagus+garlic")

In [6]:
type(response)

requests.models.Response

## Step Three
### Check the response status code to see if everything went as planned
- Status code 200: the request-response cycle was successful
- Any other status code: the request did not go as planned

In [7]:
print(response.status_code)

200
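Status codes come in families identified by their first digit, and 200 belongs to the 2xx success family. As a minimal sketch (the helper name `describe_status` and the `FAMILIES` table are made up for this illustration and are not part of requests), the standard library's `http.HTTPStatus` can turn a numeric code into a readable label:

```python
from http import HTTPStatus

# Families are identified by the first digit of the status code.
FAMILIES = {1: "informational", 2: "success", 3: "redirect",
            4: "client error", 5: "server error"}

def describe_status(code):
    """Return a readable label such as '404 Not Found (client error)'."""
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:
        # The code is not one of the registered HTTP status codes.
        phrase = "Unknown"
    family = FAMILIES.get(code // 100, "unknown family")
    return f"{code} {phrase} ({family})"

print(describe_status(200))
print(describe_status(404))
```

With a real response you would pass `response.status_code` to the helper.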


## Step Four
### Get the content of the response
and convert to utf-8 if necessary

In [6]:
response.content#.decode('utf-8')

b'<!DOCTYPE html>\r\n<html lang="en-US">\r\n<head>\r\n\t<title>asparagus cheese  |  Search Results  | SimplyRecipes.com</title>\r\n\r\n\t<meta charset="UTF-8">\r\n\t<meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=yes">\r\n\r\n\t\t\t<script type="text/javascript">\nwindow.NREUM||(NREUM={}),__nr_require=function(t,n,e){function r(e){if(!n[e]){var o=n[e]={exports:{}};t[e][0].call(o.exports,function(n){var o=t[e][1][n];return r(o||n)},o,o.exports)}return n[e].exports}if("function"==typeof __nr_require)return __nr_require;for(var o=0;o<e.length;o++)r(e[o]);return r}({1:[function(t,n,e){function r(t){try{s.console&&console.log(t)}catch(n){}}var o,i=t("ee"),a=t(15),s={};try{o=localStorage.getItem("__nr_flags").split(","),console&&"function"==typeof console.log&&(s.console=!0,o.indexOf("dev")!==-1&&(s.dev=!0),o.indexOf("nr_dev")!==-1&&(s.nrDev=!0))}catch(c){}s.nrDev&&i.on("internal-error",function(t){r(t.stack)}),s.dev&&i.on("fn-err",function(t,n,e){r(e.stac

The output starts with the letter __b__, which marks a __byte string__ (Python type __bytes__). To work with it as text, we have to decode it into a Unicode string:

In [7]:
response.content.decode('utf-8')

'<!DOCTYPE html>\r\n<html lang="en-US">\r\n<head>\r\n\t<title>asparagus cheese  |  Search Results  | SimplyRecipes.com</title>\r\n\r\n\t<meta charset="UTF-8">\r\n\t<meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=yes">\r\n\r\n\t\t\t<script type="text/javascript">\nwindow.NREUM||(NREUM={}),__nr_require=function(t,n,e){function r(e){if(!n[e]){var o=n[e]={exports:{}};t[e][0].call(o.exports,function(n){var o=t[e][1][n];return r(o||n)},o,o.exports)}return n[e].exports}if("function"==typeof __nr_require)return __nr_require;for(var o=0;o<e.length;o++)r(e[o]);return r}({1:[function(t,n,e){function r(t){try{s.console&&console.log(t)}catch(n){}}var o,i=t("ee"),a=t(15),s={};try{o=localStorage.getItem("__nr_flags").split(","),console&&"function"==typeof console.log&&(s.console=!0,o.indexOf("dev")!==-1&&(s.dev=!0),o.indexOf("nr_dev")!==-1&&(s.nrDev=!0))}catch(c){}s.nrDev&&i.on("internal-error",function(t){r(t.stack)}),s.dev&&i.on("fn-err",function(t,n,e){r(e.stack

Here we called the __decode__ method and passed it the name of the __encoding__ (the character coding scheme).

Encodings vary, and there are many of them, but __UTF-8__ and __UTF-16__ are among the most common. If you request an English-language web page, you can generally expect the result to come back in UTF-8. This page says so itself in its meta charset="UTF-8" tag, visible in the output above.
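To make the difference between text and bytes concrete, here is a small sketch that needs no network access. The sample string is invented for the illustration; it mixes ASCII and Cyrillic characters so that the character count and the byte count differ.

```python
text = "Kyiv is Київ"               # 12 characters, 4 of them non-ASCII

utf8_bytes = text.encode("utf-8")    # str -> bytes
utf16_bytes = text.encode("utf-16")  # same text, different byte layout

print(type(utf8_bytes), len(text), len(utf8_bytes))

# Decoding with the matching encoding restores the original text.
round_trip = utf8_bytes.decode("utf-8")
print(round_trip == text)

# Decoding with the wrong encoding fails here (in other cases it may yield garbage).
try:
    utf16_bytes.decode("utf-8")
    wrong_codec_failed = False
except UnicodeDecodeError:
    wrong_codec_failed = True
print("wrong codec raised an error:", wrong_codec_failed)
```

This is why the encoding passed to __decode__ has to match the one the server used.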

Let's find the subtitle "Welcome to Simply Recipes" on the page and get the index of the character where it starts.

In [8]:
response.content.decode('utf-8').find('Welcome to Simply Recipes')

83705

## JSON
The Python library __json__ deals with converting text to and from JSON

In [9]:
import json

data_string = '[{"q":[2,3],"r":3.0,"s":"SS"}]'
python_data = json.loads(data_string)
print(type(python_data))
python_data

<class 'list'>


[{'q': [2, 3], 'r': 3.0, 's': 'SS'}]

<h4><u>json.loads recursively decodes a string in JSON format into equivalent Python objects</u></h4>
<li>data_string's outermost element is converted into a Python list
<li>the first element of that list is converted into a dictionary
<li>each key of that dictionary is converted into a string
<li>each value is converted in turn: "q" becomes a list of two integers, "r" a float, and "s" a string

In [11]:
print(type(data_string),type(python_data))
print(type(python_data[0]),python_data[0])
print(type(python_data[0]['s']),python_data[0]['s'])

<class 'str'> <class 'list'>
<class 'dict'> {'q': [2, 3], 'r': 3.0, 's': 'SS'}
<class 'str'> SS
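The recursive conversion can be made visible with a small helper that walks the decoded structure and reports the type found at each position. This is only a sketch; `type_tree` is a name invented here, not a function of the __json__ library.

```python
import json

def type_tree(value):
    """Mirror a decoded JSON structure, replacing each leaf by its type name."""
    if isinstance(value, dict):
        return {key: type_tree(item) for key, item in value.items()}
    if isinstance(value, list):
        return [type_tree(item) for item in value]
    return type(value).__name__

python_data = json.loads('[{"q":[2,3],"r":3.0,"s":"SS"}]')
print(type_tree(python_data))
```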


__json.loads__ raises an exception if the string is not valid JSON

In [12]:
#Correct
json.loads('"Hello"')

'Hello'

The next call fails. The Python string "Hello" holds only the bare text Hello, which is not valid JSON, so __json.loads__ raises an exception. __To hold a JSON string, the Python string must itself contain the double quotes, as in '"Hello"' above.__

In [13]:
#Wrong
json.loads("Hello")

JSONDecodeError: Expecting value: line 1 column 1 (char 0)
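A common way to deal with this is to catch __json.JSONDecodeError__ and fall back to a default. The sketch below assumes that returning None is an acceptable fallback; the helper name `parse_json_or_none` is invented for this example.

```python
import json

def parse_json_or_none(text):
    """Return the decoded value, or None if text is not valid JSON."""
    try:
        return json.loads(text)
    except json.JSONDecodeError as err:
        # err.pos is the character index where decoding failed.
        print(f"Invalid JSON at char {err.pos}: {err.msg}")
        return None

print(parse_json_or_none('"Hello"'))   # valid: a JSON string
print(parse_json_or_none('Hello'))     # invalid: bare text

# json.dumps goes the other way and adds the quotes JSON requires.
print(json.dumps("Hello"))
```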

### requests library and JSON
The response object of the __requests__ library has a __json()__ method that parses a JSON response body into Python objects.

In [14]:
address = 'KPI, Kyiv, Ukraine'
url = 'https://maps.googleapis.com/maps/api/geocode/json?address=%s'%(address)
response = requests.get(url).json()
print(type(response))

<class 'dict'>
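The address above contains spaces and commas, which are safer to percent-encode before they go into a URL. A minimal sketch with the standard library's `urllib.parse.urlencode` follows; the variable names are chosen for this illustration.

```python
from urllib.parse import urlencode

base_url = "https://maps.googleapis.com/maps/api/geocode/json"

# urlencode escapes characters that are not safe inside a query string.
query = urlencode({"address": "KPI, Kyiv, Ukraine"})
url = base_url + "?" + query
print(url)
```

With requests, passing the dictionary through the `params` argument, as in `requests.get(base_url, params={"address": address})`, builds a comparably encoded query string for you.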


### Exception checking!
Always be prepared for your code to fail. The request itself may fail, the server may answer with an error status, or you may expect JSON back and receive a malformed body instead. So always check for exceptions. The cell below wraps each of these steps in a check.

In [11]:
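address = "KPI,Kyiv,Ukraine"
url = "https://maps.googleapis.com/maps/api/geocode/json?address=%s" % (address)
response_data = None  # stays None if any step below fails
try:
    response = requests.get(url)
    if response.status_code != 200:
        print("HTTP error", response.status_code)
    else:
        try:
            response_data = response.json()
        except ValueError:
            # json() raises a ValueError subclass when the body is not valid JSON
            print("Response not in valid JSON format")
except requests.exceptions.RequestException:
    # Base class for connection errors, timeouts and other request failures
    print("Something went wrong with requests.get()")
print(type(response_data))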

<class 'dict'>
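The status and JSON checks can also be exercised without a network connection by passing in a status code and body text directly. This is a sketch under that assumption; `parse_body` is a hypothetical helper, not part of requests.

```python
import json

def parse_body(status_code, body):
    """Return the decoded JSON body, or None if the status or body is unusable."""
    if status_code != 200:
        print("HTTP error", status_code)
        return None
    try:
        return json.loads(body)
    except json.JSONDecodeError:
        print("Response not in valid JSON format")
        return None

print(parse_body(200, '{"results": []}'))    # well-formed JSON
print(parse_body(200, '<html>oops</html>'))  # status fine, body is not JSON
print(parse_body(503, ''))                   # server reported an error
```

With a real response you could call it as `parse_body(response.status_code, response.text)`.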


In [12]:
# Let's see what URL looks like
url

'https://maps.googleapis.com/maps/api/geocode/json?address=KPI,Kyiv,Ukraine'

The request succeeded, and the response data is a dictionary. Let's take a look at its contents.

In [13]:
response_data

{'results': [{'address_components': [{'long_name': '37',
     'short_name': '37',
     'types': ['street_number']},
    {'long_name': 'просп. Перемоги',
     'short_name': 'просп. Перемоги',
     'types': ['route']},
    {'long_name': "Solom'yans'kyi district",
     'short_name': "Solom'yans'kyi district",
     'types': ['political', 'sublocality', 'sublocality_level_1']},
    {'long_name': 'Kyiv',
     'short_name': 'Kyiv',
     'types': ['locality', 'political']},
    {'long_name': 'Kyiv City',
     'short_name': 'Kyiv City',
     'types': ['administrative_area_level_2', 'political']},
    {'long_name': 'Ukraine',
     'short_name': 'UA',
     'types': ['country', 'political']},
    {'long_name': '03056', 'short_name': '03056', 'types': ['postal_code']}],
   'formatted_address': 'просп. Перемоги, 37, Kyiv, Ukraine, 03056',
   'geometry': {'location': {'lat': 50.4488824, 'lng': 30.4572542},
    'location_type': 'ROOFTOP',
    'viewport': {'northeast': {'lat': 50.4502313802915,
      '

## Problem One
### Write a function that takes an address as an argument and returns a (latitude,longitude) tuple

In [14]:
response_data['results']

[{'address_components': [{'long_name': '37',
    'short_name': '37',
    'types': ['street_number']},
   {'long_name': 'просп. Перемоги',
    'short_name': 'просп. Перемоги',
    'types': ['route']},
   {'long_name': "Solom'yans'kyi district",
    'short_name': "Solom'yans'kyi district",
    'types': ['political', 'sublocality', 'sublocality_level_1']},
   {'long_name': 'Kyiv',
    'short_name': 'Kyiv',
    'types': ['locality', 'political']},
   {'long_name': 'Kyiv City',
    'short_name': 'Kyiv City',
    'types': ['administrative_area_level_2', 'political']},
   {'long_name': 'Ukraine',
    'short_name': 'UA',
    'types': ['country', 'political']},
   {'long_name': '03056', 'short_name': '03056', 'types': ['postal_code']}],
  'formatted_address': 'просп. Перемоги, 37, Kyiv, Ukraine, 03056',
  'geometry': {'location': {'lat': 50.4488824, 'lng': 30.4572542},
   'location_type': 'ROOFTOP',
   'viewport': {'northeast': {'lat': 50.4502313802915,
     'lng': 30.4586031802915},
    'south

In [15]:
for thing in response_data['results'][0]:
    print(thing)

address_components
formatted_address
geometry
partial_match
place_id
types


In [16]:
response_data['results'][0]['geometry']

{'location': {'lat': 50.4488824, 'lng': 30.4572542},
 'location_type': 'ROOFTOP',
 'viewport': {'northeast': {'lat': 50.4502313802915, 'lng': 30.4586031802915},
  'southwest': {'lat': 50.4475334197085, 'lng': 30.4559052197085}}}

In [17]:
response_data['results'][0]['geometry']['location']

{'lat': 50.4488824, 'lng': 30.4572542}

In [18]:
def get_lat_lng(address):
    import requests, time

    url = "https://maps.googleapis.com/maps/api/geocode/json?address=%s" % (address)

    response_data = None  # stays None if the request or the JSON decoding fails
    try:
        response = requests.get(url)
        if response.status_code != 200:
            print('HTTP error', response.status_code)
        else:
            try:
                response_data = response.json()
            except ValueError:
                print('Response not valid JSON format')
    except requests.exceptions.RequestException:
        print('Something went wrong with requests.get')
    time.sleep(1)  # brief pause so that repeated calls do not flood the API
    try:
        location = response_data['results'][0]['geometry']['location']
        return (location['lat'], location['lng'])
    except (TypeError, KeyError, IndexError):
        # No usable result: the request failed or the address had no match
        print('Try another one')
        return None

In [19]:
get_lat_lng('Sarny,Ukraine')

(51.3456549, 26.601983)
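The lookup part of the function can be separated from the network call, which makes it easy to try on a hand-written dictionary. The sketch below assumes the response shape shown in the outputs above; `extract_lat_lng` and `sample` are names invented for this example, and the coordinates are copied from the earlier output.

```python
def extract_lat_lng(response_data):
    """Return (lat, lng) of the first result, or None if there is none."""
    try:
        location = response_data["results"][0]["geometry"]["location"]
        return (location["lat"], location["lng"])
    except (KeyError, IndexError, TypeError):
        # Missing key, empty result list, or response_data is not a dict
        return None

# Trimmed copy of the geocoding response shown earlier.
sample = {"results": [{"geometry": {"location": {"lat": 50.4488824,
                                                 "lng": 30.4572542}}}]}

print(extract_lat_lng(sample))
print(extract_lat_lng({"results": []}))
```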

## Problem Two
### Extend the function so that it takes a possibly incomplete address as an argument and returns a list of tuples of the form (complete address, latitude, longitude)

In [24]:
def get_lat_lng_incompl(address):
    import requests, time

    url = "https://maps.googleapis.com/maps/api/geocode/json?address=%s" % (address)
    response_data = {}  # stays empty if the request or the JSON decoding fails
    try:
        response = requests.get(url)
        if response.status_code != 200:
            print("HTTP error", response.status_code)
        else:
            try:
                response_data = response.json()
            except ValueError:
                print("Response not in valid JSON format")
    except requests.exceptions.RequestException:
        print("Something went wrong with requests.get")
    time.sleep(1)  # brief pause so that repeated calls do not flood the API
    propos_adr = []
    try:
        for result in response_data['results']:
            # NOTE: this is only the first address component (often a street number);
            # result['formatted_address'] holds the complete address.
            adr = result['address_components'][0]['long_name']
            lat = result['geometry']['location']['lat']
            lng = result['geometry']['location']['lng']
            propos_adr.append((adr, lat, lng))
    except (TypeError, KeyError, IndexError):
        print("Try another one.")
    return propos_adr

In [44]:
get_lat_lng_incompl('Lon')

[('Lon', 34.1500792, -105.123883),
 ('Lon', 37.1836603, -93.0593459),
 ('331', 38.9114656, -94.6522892),
 ('3', 35.83313, -96.3917589),
 ('# 203', 37.5542932, -97.2719228),
 ('107', 38.6530322, -94.34875559999999),
 ('6306', 39.013442, -94.58645399999999),
 ('2242', 39.025491, -95.765889),
 ('5532', 33.518225, -112.004898),
 ('Lon Norris Township', 35.2949073, -94.38221089999999)]
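Several entries in the output above are street numbers rather than full addresses, because the first address component is not always the most descriptive one. As a sketch of an alternative, the helper below reads the __formatted_address__ key that appears in the response shown earlier. The names `extract_candidates` and `sample` are invented for this example, and the sample data is a trimmed copy of that earlier output.

```python
def extract_candidates(response_data):
    """Return a list of (formatted_address, lat, lng) tuples, one per result."""
    candidates = []
    for result in response_data.get("results", []):
        try:
            location = result["geometry"]["location"]
            candidates.append((result["formatted_address"],
                               location["lat"], location["lng"]))
        except (KeyError, TypeError):
            # Skip results that lack the expected keys.
            continue
    return candidates

sample = {"results": [
    {"formatted_address": "просп. Перемоги, 37, Kyiv, Ukraine, 03056",
     "geometry": {"location": {"lat": 50.4488824, "lng": 30.4572542}}},
    {"geometry": {}},  # malformed entry, skipped
]}

print(extract_candidates(sample))
```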

## XML
The Python library __lxml__ converts an XML string into Python objects and back

In [45]:
data_string = """
<Bookstore>
   <Book ISBN="ISBN-13:978-1599620787" Price="15.23" Weight="1.5">
      <Title>New York Deco</Title>
      <Authors>
         <Author Residence="New York City">
            <First_Name>Richard</First_Name>
            <Last_Name>Berenholtz</Last_Name>
         </Author>
      </Authors>
   </Book>
   <Book ISBN="ISBN-13:978-1579128562" Price="15.80">
      <Remark>
      Five Hundred Buildings of New York and over one million other books are available for Amazon Kindle.
      </Remark>
      <Title>Five Hundred Buildings of New York</Title>
      <Authors>
         <Author Residence="Beijing">
            <First_Name>Bill</First_Name>
            <Last_Name>Harris</Last_Name>
         </Author>
         <Author Residence="New York City">
            <First_Name>Jorg</First_Name>
            <Last_Name>Brockmann</Last_Name>
         </Author>
      </Authors>
   </Book>
</Bookstore>
"""

In [46]:
from lxml import etree
root = etree.XML(data_string)
print(root.tag,type(root.tag))

Bookstore <class 'str'>


In [47]:
print(etree.tostring(root,pretty_print=True).decode('utf-8'))

<Bookstore>
   <Book ISBN="ISBN-13:978-1599620787" Price="15.23" Weight="1.5">
      <Title>New York Deco</Title>
      <Authors>
         <Author Residence="New York City">
            <First_Name>Richard</First_Name>
            <Last_Name>Berenholtz</Last_Name>
         </Author>
      </Authors>
   </Book>
   <Book ISBN="ISBN-13:978-1579128562" Price="15.80">
      <Remark>
      Five Hundred Buildings of New York and over one million other books are available for Amazon Kindle.
      </Remark>
      <Title>Five Hundred Buildings of New York</Title>
      <Authors>
         <Author Residence="Beijing">
            <First_Name>Bill</First_Name>
            <Last_Name>Harris</Last_Name>
         </Author>
         <Author Residence="New York City">
            <First_Name>Jorg</First_Name>
            <Last_Name>Brockmann</Last_Name>
         </Author>
      </Authors>
   </Book>
</Bookstore>



### Iterating over an XML tree
- Use an iterator
- The iterator will generate every tree element for a given subtree

In [48]:
for element in root.iter():
    print(element)

<Element Bookstore at 0x1ef844bc508>
<Element Book at 0x1ef844c9708>
<Element Title at 0x1ef84561608>
<Element Authors at 0x1ef84561848>
<Element Author at 0x1ef844c9708>
<Element First_Name at 0x1ef84561608>
<Element Last_Name at 0x1ef84561848>
<Element Book at 0x1ef84561888>
<Element Remark at 0x1ef84561608>
<Element Title at 0x1ef844c9708>
<Element Authors at 0x1ef84561888>
<Element Author at 0x1ef84561608>
<Element First_Name at 0x1ef844c9708>
<Element Last_Name at 0x1ef84561888>
<Element Author at 0x1ef84561848>
<Element First_Name at 0x1ef84561608>
<Element Last_Name at 0x1ef844c9708>


Or iterate over the element itself to get only its direct children:

In [49]:
for child in root:
    print(child)

<Element Book at 0x1ef834d4508>
<Element Book at 0x1ef84561b88>


#### Accessing the tag

In [51]:
for child in root:
    print(child.tag)

Book
Book


#### Using the iterator to get specific tags:
- in the example below, only the __Author__ tags are visited
- for each __Author__ tag, the __.find()__ method accesses the First_Name and Last_Name tags
- given a plain tag name, __.find()__ only looks at direct children, not at deeper descendants
- the __.text__ attribute holds the text of a __leaf node__

In [52]:
for element in root.iter('Author'):
    print(element.find('First_Name').text,element.find('Last_Name').text)

Richard Berenholtz
Bill Harris
Jorg Brockmann
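The difference between children and descendants is easy to check on a small tree. The sketch below uses the standard library's `xml.etree.ElementTree` instead of lxml so that it runs without extra packages; its find method behaves the same way for these simple paths. The XML is a shortened copy of the bookstore data.

```python
import xml.etree.ElementTree as ET

xml_text = """
<Bookstore>
   <Book ISBN="ISBN-13:978-1599620787" Price="15.23" Weight="1.5">
      <Title>New York Deco</Title>
      <Authors>
         <Author Residence="New York City">
            <First_Name>Richard</First_Name>
            <Last_Name>Berenholtz</Last_Name>
         </Author>
      </Authors>
   </Book>
</Bookstore>
"""

root = ET.fromstring(xml_text)
book = root.find("Book")

# First_Name is a descendant of Book, but not a direct child.
direct = book.find("First_Name")
print(direct)

# A full path from the current element reaches it.
by_path = book.find("Authors/Author/First_Name").text
print(by_path)

# The './/' prefix searches all descendants.
anywhere = book.find(".//First_Name").text
print(anywhere)
```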


## Problem
### Find the last names of all authors in the tree 'root' using xpath

In [53]:
for element in root.findall('Book/Title'):
    print(element.text)

New York Deco
Five Hundred Buildings of New York


In [55]:
for element in root.findall('Book/Authors/Author'):
    print(element.find('First_Name').text)

Richard
Bill
Jorg
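The cells above print titles and first names. For the last names the problem asks about, the same path approach applies. The sketch below again swaps in the standard library's `xml.etree.ElementTree`, which supports a subset of XPath, so it runs without lxml; the XML is a compact copy of the bookstore data.

```python
import xml.etree.ElementTree as ET

xml_text = """
<Bookstore>
   <Book Price="15.23" Weight="1.5">
      <Title>New York Deco</Title>
      <Authors>
         <Author Residence="New York City">
            <First_Name>Richard</First_Name><Last_Name>Berenholtz</Last_Name>
         </Author>
      </Authors>
   </Book>
   <Book Price="15.80">
      <Title>Five Hundred Buildings of New York</Title>
      <Authors>
         <Author Residence="Beijing">
            <First_Name>Bill</First_Name><Last_Name>Harris</Last_Name>
         </Author>
         <Author Residence="New York City">
            <First_Name>Jorg</First_Name><Last_Name>Brockmann</Last_Name>
         </Author>
      </Authors>
   </Book>
</Bookstore>
"""

root = ET.fromstring(xml_text)

# './/Last_Name' matches Last_Name elements at any depth, in document order.
last_names = [element.text for element in root.findall(".//Last_Name")]
print(last_names)

# An explicit path from the root gives the same elements.
same_names = [e.text for e in root.findall("Book/Authors/Author/Last_Name")]
print(same_names == last_names)
```

With lxml, elements also offer an __xpath()__ method for full XPath expressions; that variant is not run here.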


### Using values of attributes as filters
- Example: Find the first name of the author of a book that weighs 1.5 oz

In [56]:
root.find('Book[@Weight="1.5"]/Authors/Author/First_Name').text

'Richard'

In [57]:
root.find('Book[@Price="15.80"]/Authors/Author/Last_Name').text

'Harris'
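Attribute predicates like the ones above test for equality on the attribute's string value. For a numeric comparison, one option is to read the attribute with __.get()__ and compare in Python. The sketch below uses the standard library's `xml.etree.ElementTree` so that it runs without lxml; the price threshold of 15.5 is an arbitrary value picked for the illustration.

```python
import xml.etree.ElementTree as ET

xml_text = """
<Bookstore>
   <Book Price="15.23" Weight="1.5">
      <Title>New York Deco</Title>
   </Book>
   <Book Price="15.80">
      <Title>Five Hundred Buildings of New York</Title>
   </Book>
</Bookstore>
"""

root = ET.fromstring(xml_text)

# Attribute values are strings, so convert before comparing numerically.
expensive_titles = [book.find("Title").text
                    for book in root.findall("Book")
                    if float(book.get("Price")) > 15.5]
print(expensive_titles)

# .get() returns None for a missing attribute (the second book has no Weight).
weights = [book.get("Weight") for book in root.findall("Book")]
print(weights)
```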

## Problem
### Print first and last names of all authors who live in New York City

In [60]:
books = root.findall('Book')
print(books,type(books),sep='\n')

[<Element Book at 0x1ef845fe108>, <Element Book at 0x1ef84561b88>]
<class 'list'>


In [65]:
# Iterate over the matching authors directly. Indexing by the number of books
# only worked because both counts happened to be 2.
for author in root.findall('Book/Authors/Author[@Residence="New York City"]'):
    print(author.find('First_Name').text, author.find('Last_Name').text)

Richard Berenholtz
Jorg Brockmann
