<h3>Step 1: Import the requests library</h3>

In [1]:
import requests

<h3>Step 2: Send an HTTP request, get the response, and save in a variable</h3>

In [2]:
response = requests.get("http://www.epicurious.com/search/Tofu+Chili")

<h3>Step 3: Check the response status code to see if everything went as planned</h3>
<li>status code 200: the request-response cycle was successful
<li>other status codes usually signal a problem (e.g., 404 = page not found, 500 = server error)

In [3]:
print(response.status_code)

200
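Status codes fall into families by their first digit. The sketch below is one way to turn a code into a readable description, using only the standard library's http.HTTPStatus. The helper name describe_status is hypothetical and not part of requests, and HTTPStatus raises ValueError for codes it does not know.

```python
from http import HTTPStatus

def describe_status(code):
    """Return a short description of an HTTP status code (hypothetical helper)."""
    phrase = HTTPStatus(code).phrase  # standard reason phrase, e.g. "Not Found"
    if 200 <= code < 300:
        category = "success"
    elif 300 <= code < 400:
        category = "redirect"
    elif 400 <= code < 500:
        category = "client error"
    elif 500 <= code < 600:
        category = "server error"
    else:
        category = "informational"
    return "%d %s (%s)" % (code, phrase, category)

print(describe_status(200))
print(describe_status(404))
```

With a real response object, the same helper could be called as describe_status(response.status_code).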


<h3>Step 4: Get the content of the response</h3>
<li>response.content is a bytes object; decode it as UTF-8 if you need a string

In [4]:
response.content.decode('utf-8')
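Decoding matters because the same bytes give different text under different encodings. The sketch below uses a hand-made bytes value as a stand-in for response.content, so it runs without a network connection. The variable names are illustrative only.

```python
# Stand-in for response.content: the UTF-8 bytes of the word "café"
raw_bytes = "caf\u00e9".encode("utf-8")

# Decoding with the right encoding recovers the original text
text = raw_bytes.decode("utf-8")

# Decoding with the wrong encoding silently produces garbled text
wrong = raw_bytes.decode("latin-1")

# Invalid UTF-8 raises UnicodeDecodeError by default;
# errors="replace" substitutes U+FFFD instead of failing
lenient = b"caf\xe9".decode("utf-8", errors="replace")

print(text)
print(wrong)
print(lenient)
```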



<h4>Problem: Get the contents of Wikipedia's main page and look for the string "Did you know" in it</h4>

In [6]:
url = "https://en.wikipedia.org/wiki/main_page"
#The rest of your code should go below this line
wiki_response = requests.get(url)
print(wiki_response.status_code)
wiki_text = wiki_response.content.decode('utf-8')
print(wiki_text.find('Did you know'))

200
13772


<h2>JSON</h2>
<li>The Python standard library module json converts text to and from JSON


In [7]:
import json
data_string = '[{"b": [2, 4], "c": 3.0, "a": "A"}]'
python_data = json.loads(data_string)
print(python_data)

[{'b': [2, 4], 'a': 'A', 'c': 3.0}]


<h3>json.loads recursively decodes a string in JSON format into equivalent Python objects</h3>
<li>data_string's outermost element is converted into a Python list
<li>the first element of that list is converted into a dictionary
<li>each key of that dictionary is converted into a string
<li>each value is converted into the matching Python type: "b" becomes a list of two integers, "c" a float, and "a" a string

In [8]:
print(type(data_string),type(python_data))
print(type(python_data[0]),python_data[0])
print(type(python_data[0]['b']),python_data[0]['b'])

<class 'str'> <class 'list'>
<class 'dict'> {'b': [2, 4], 'a': 'A', 'c': 3.0}
<class 'list'> [2, 4]
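The same recursive decoding applies to every JSON type. As a small illustration, the sample string below (made up for this example) contains one value of each kind, and the loop prints the Python type each one becomes.

```python
import json

# Made-up sample with one value of each JSON type
sample = '{"name": "A", "count": 3, "ratio": 3.0, "tags": [2, 4], "ok": true, "missing": null}'
decoded = json.loads(sample)

# JSON object -> dict, array -> list, string -> str,
# number -> int or float, true/false -> bool, null -> None
for key, value in decoded.items():
    print(key, type(value).__name__, value)
```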


<h3>json.loads will raise an exception if the string is not valid JSON</h3>

In [11]:
# Wrong: JSON strings must be enclosed in double quotes, so this raises an error
#json.loads("'Hello'")
# Correct
json.loads('"Hello"')

'Hello'
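Rather than commenting out the failing call, the error can be caught. The sketch below wraps json.loads in a hypothetical helper, parse_json, that reports the problem and returns None. Note that a valid JSON null also decodes to None, so this helper suits cases where null is not expected.

```python
import json

def parse_json(text):
    """Return the decoded value, or None if text is not valid JSON (hypothetical helper)."""
    try:
        return json.loads(text)
    except json.JSONDecodeError as err:
        # err.msg describes the problem, err.pos is the character offset
        print("Invalid JSON:", err.msg, "at position", err.pos)
        return None

print(parse_json('"Hello"'))   # valid: double-quoted JSON string
print(parse_json("'Hello'"))   # invalid: single quotes are not JSON
```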

In [12]:
import json
# json.dumps is the reverse of json.loads: it serializes Python objects into a JSON string
data_string = json.dumps(python_data)
print(type(data_string))
print(data_string)


<class 'str'>
[{"b": [2, 4], "a": "A", "c": 3.0}]
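json.dumps also accepts formatting options. As a brief sketch, sort_keys gives a stable key order and indent produces a human-readable layout; decoding the result gives back an equal Python object.

```python
import json

python_data = [{"b": [2, 4], "c": 3.0, "a": "A"}]

# sort_keys makes the output independent of dictionary ordering
compact = json.dumps(python_data, sort_keys=True)
# indent adds line breaks and indentation for readability
pretty = json.dumps(python_data, sort_keys=True, indent=2)

print(compact)
print(pretty)

# Round trip: decoding the string gives back an equal object
round_trip = json.loads(compact)
print(round_trip == python_data)
```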


<h2>requests library and JSON</h2>

In [13]:
address="Columbia University, New York, NY"
url="https://maps.googleapis.com/maps/api/geocode/json?address=%s" % (address)
response = requests.get(url).json()
print(type(response))

<class 'dict'>
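The address above contains spaces and commas, which are not safe to place in a URL as-is. One way to build the query string explicitly, sketched here with the standard library's urllib.parse.urlencode, is shown below. The variable names base_url and query are illustrative.

```python
from urllib.parse import urlencode

base_url = "https://maps.googleapis.com/maps/api/geocode/json"
address = "Columbia University, New York, NY"

# urlencode escapes spaces as "+" and commas as "%2C"
query = urlencode({"address": address})
url = base_url + "?" + query
print(url)
```

requests can do the same encoding itself when the values are passed through the params argument, as in requests.get(base_url, params={"address": address}).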


<h3>Exception checking!</h3>

In [14]:
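address="Columbia University, New York, NY"
url="https://maps.googleapis.com/maps/api/geocode/json?address=%s" % (address)
response_data = None  # stays None if any step below fails
try:
    response = requests.get(url)
    if not response.status_code == 200:
        print("HTTP error",response.status_code)
    else:
        try:
            response_data = response.json()
        except ValueError:
            # response.json() raises a ValueError subclass on malformed JSON
            print("Response not in valid JSON format")
except requests.exceptions.RequestException:
    # covers connection errors, timeouts, and invalid URLs
    print("Something went wrong with requests.get")
print(type(response_data))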

<class 'dict'>


In [21]:
response_data['results'][0]['geometry']['location']

{'lat': 40.8075355, 'lng': -73.9625727}

<h2>Problem 1: Write a function that takes an address as an argument and returns a (latitude, longitude) tuple</h2>

In [22]:
def get_lat_lng(address_string):
    import requests
    # Use the function argument, not a global variable, to build the URL
    url="https://maps.googleapis.com/maps/api/geocode/json?address=%s" % (address_string)
    response = requests.get(url)
    if not response.status_code == 200:
        print("HTTP error: response code is %s" % response.status_code)
        return None
    try:
        response_data = response.json()
    except ValueError:
        print("Response not in valid JSON format")
        return None
    # Take the first match returned by the geocoder
    location = response_data['results'][0]['geometry']['location']
    return location['lat'], location['lng']
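Indexing results[0] fails with an IndexError when the service finds no match. The sketch below separates the extraction step into a hypothetical helper, extract_lat_lng, that returns None in that case. The two sample dictionaries are hand-written stand-ins shaped like the response shown earlier, not live API output.

```python
def extract_lat_lng(response_data):
    """Return (lat, lng) of the first result, or None if there are no results."""
    results = response_data.get("results", [])
    if not results:
        return None
    location = results[0]["geometry"]["location"]
    return location["lat"], location["lng"]

# Hand-written stand-ins for decoded geocoding responses
sample = {"results": [{"geometry": {"location": {"lat": 40.8075355, "lng": -73.9625727}}}]}
empty = {"results": []}

print(extract_lat_lng(sample))
print(extract_lat_lng(empty))
```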

<h2>Problem 2: Extend the function so that it takes a possibly incomplete address as an argument and returns a list of tuples of the form (complete address, latitude, longitude)</h2>

In [30]:
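address="Columbia"
url="https://maps.googleapis.com/maps/api/geocode/json?address=%s" % (address)
response_data = None  # stays None if any step below fails
try:
    response = requests.get(url)
    if not response.status_code == 200:
        print("HTTP error",response.status_code)
    else:
        try:
            response_data = response.json()
        except ValueError:
            # response.json() raises a ValueError subclass on malformed JSON
            print("Response not in valid JSON format")
except requests.exceptions.RequestException:
    # covers connection errors, timeouts, and invalid URLs
    print("Something went wrong with requests.get")
print(type(response_data))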

<class 'dict'>


In [31]:
import json
json.dumps(response_data)

'{"status": "OK", "results": [{"address_components": [{"types": ["locality", "political"], "long_name": "Columbia", "short_name": "Columbia"}, {"types": ["administrative_area_level_2", "political"], "long_name": "Boone County", "short_name": "Boone County"}, {"types": ["administrative_area_level_1", "political"], "long_name": "Missouri", "short_name": "MO"}, {"types": ["country", "political"], "long_name": "United States", "short_name": "US"}], "place_id": "ChIJyYKBu_Or3IcRIG-9ui1pEaA", "geometry": {"viewport": {"southwest": {"lng": -92.4334839, "lat": 38.863548}, "northeast": {"lng": -92.22819400000002, "lat": 39.02944}}, "bounds": {"southwest": {"lng": -92.4334839, "lat": 38.863548}, "northeast": {"lng": -92.22819400000002, "lat": 39.02944}}, "location_type": "APPROXIMATE", "location": {"lng": -92.3340724, "lat": 38.9517053}}, "types": ["locality", "political"], "formatted_address": "Columbia, MO, USA"}, {"address_components": [{"types": ["locality", "political"], "long_name": "Colum

In [32]:
def get_lat_lng(address_string):
    import requests
    # Use the function argument, not a global variable, to build the URL
    url="https://maps.googleapis.com/maps/api/geocode/json?address=%s" % (address_string)
    geos = []
    try:
        response = requests.get(url)
        if not response.status_code == 200:
            print('request error code : %s ' % response.status_code)
        else:
            try:
                response_data = response.json()
            except ValueError:
                print('response data is not in valid JSON format')
            else:
                # One [lat, lng] pair per match; geo['formatted_address'] holds the complete address
                for geo in response_data['results']:
                    location = geo['geometry']['location']
                    geos.append([location['lat'], location['lng']])
    except requests.exceptions.RequestException:
        print('request went wrong')
    return geos
        

In [33]:
get_lat_lng('Columbia')

[[38.9517053, -92.3340724],
 [34.0007104, -81.0348144],
 [4.570868, -74.297333],
 [39.2037144, -76.86104619999999],
 [35.6150716, -87.0352831],
 [38.8338816, -104.8213634]]
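The problem statement asks for tuples of the form (complete address, latitude, longitude). A minimal sketch of that extraction step follows, assuming each result carries a formatted_address field as in the JSON dump shown earlier. The helper name extract_matches is hypothetical, and the sample dictionary is a hand-trimmed copy of the first result rather than a live response.

```python
def extract_matches(response_data):
    """Return a list of (formatted_address, lat, lng) tuples, one per result."""
    matches = []
    for result in response_data.get("results", []):
        location = result["geometry"]["location"]
        matches.append((result["formatted_address"], location["lat"], location["lng"]))
    return matches

# Hand-trimmed copy of the first result from the response above
sample = {
    "status": "OK",
    "results": [
        {
            "formatted_address": "Columbia, MO, USA",
            "geometry": {"location": {"lng": -92.3340724, "lat": 38.9517053}},
        }
    ],
}

print(extract_matches(sample))
```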

<h1>XML</h1>
<li>The Python library lxml converts an XML string to Python objects and vice versa</li>

In [57]:
data_string = """
<Bookstore>
   <Book ISBN="ISBN-13:978-1599620787" Price="15.23" Weight="1.5">
      <Title>New York Deco</Title>
      <Authors>
         <Author Residence="New York City">
            <First_Name>Richard</First_Name>
            <Last_Name>Berenholtz</Last_Name>
         </Author>
      </Authors>
   </Book>
   <Book ISBN="ISBN-13:978-1579128562" Price="15.80">
      <Remark>
      Five Hundred Buildings of New York and over one million other books are available for Amazon Kindle.
      </Remark>
      <Title>Five Hundred Buildings of New York</Title>
      <Authors>
         <Author Residence="Beijing">
            <First_Name>Bill</First_Name>
            <Last_Name>Harris</Last_Name>
         </Author>
         <Author Residence="New York City">
            <First_Name>Jorg</First_Name>
            <Last_Name>Brockmann</Last_Name>
         </Author>
      </Authors>
   </Book>
</Bookstore>
"""

In [58]:
from lxml import etree
root = etree.XML(data_string)
print(root.tag,type(root.tag))

Bookstore <class 'str'>


In [59]:
print(etree.tostring(root, pretty_print=True).decode("utf-8"))

<Bookstore>
   <Book ISBN="ISBN-13:978-1599620787" Price="15.23" Weight="1.5">
      <Title>New York Deco</Title>
      <Authors>
         <Author Residence="New York City">
            <First_Name>Richard</First_Name>
            <Last_Name>Berenholtz</Last_Name>
         </Author>
      </Authors>
   </Book>
   <Book ISBN="ISBN-13:978-1579128562" Price="15.80">
      <Remark>
      Five Hundred Buildings of New York and over one million other books are available for Amazon Kindle.
      </Remark>
      <Title>Five Hundred Buildings of New York</Title>
      <Authors>
         <Author Residence="Beijing">
            <First_Name>Bill</First_Name>
            <Last_Name>Harris</Last_Name>
         </Author>
         <Author Residence="New York City">
            <First_Name>Jorg</First_Name>
            <Last_Name>Brockmann</Last_Name>
         </Author>
      </Authors>
   </Book>
</Bookstore>



<h3>Iterating over an XML tree</h3>
<li>Use an iterator. 
<li>The iterator will generate every tree element for a given subtree

In [89]:
for element in root.iter():
    print(element.text)


   

      
New York Deco

         

            
Richard
Berenholtz

      

      Five Hundred Buildings of New York and over one million other books are available for Amazon Kindle.
      
Five Hundred Buildings of New York

         

            
Bill
Harris

            
Jorg
Brockmann
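The blank lines in the output above come from container elements such as Book and Authors, whose .text is only the whitespace between tags. One way to print just the meaningful text is sketched below. It uses the standard library's xml.etree.ElementTree so that it runs without lxml installed; the iter() call is the same one lxml.etree provides. The XML is a shortened copy of the bookstore data.

```python
import xml.etree.ElementTree as etree

xml_text = """
<Bookstore>
   <Book ISBN="ISBN-13:978-1599620787" Price="15.23" Weight="1.5">
      <Title>New York Deco</Title>
      <Authors>
         <Author Residence="New York City">
            <First_Name>Richard</First_Name>
            <Last_Name>Berenholtz</Last_Name>
         </Author>
      </Authors>
   </Book>
</Bookstore>
"""
root = etree.fromstring(xml_text)

leaf_texts = []
for element in root.iter():
    # element.text may be None or whitespace-only for container elements
    text = (element.text or "").strip()
    if text:
        leaf_texts.append((element.tag, text))
        print(element.tag, ":", text)
```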


<h4>Or just use the "for child in subtree" construct</h4>

In [61]:
for child in root:
    print(child)

<Element Book at 0x10a41cd48>
<Element Book at 0x10a41c208>


<h4>Accessing the tag</h4>


In [62]:
for child in root:
    print(child.tag)

Book
Book


<h4>Using the iterator to get specific tags</h4>
<li>In the example below, only the Author tags are accessed
<li>For each Author tag, the .find function accesses the First_Name and Last_Name tags
<li>Given a plain tag name, the .find function only looks at direct children, not deeper descendants, so be careful!
<li>The .text attribute holds the text of a node

In [63]:
for element in root.iter("Author"):
    print(element.find('First_Name').text,element.find('Last_Name').text)

Richard Berenholtz
Bill Harris
Jorg Brockmann
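The children-only behaviour of .find is easy to trip over. The sketch below shows it on a one-author tree: a plain tag name finds nothing from the root, while a ".//" prefix or a full path reaches the nested element. It uses the standard library's xml.etree.ElementTree so it runs without lxml; these path forms behave the same way in lxml.etree.

```python
import xml.etree.ElementTree as etree

root = etree.fromstring(
    "<Bookstore><Book><Authors><Author>"
    "<First_Name>Richard</First_Name><Last_Name>Berenholtz</Last_Name>"
    "</Author></Authors></Book></Bookstore>"
)

# First_Name is not a direct child of Bookstore, so find returns None
direct = root.find("First_Name")

# ".//" searches all descendants, at any depth
nested = root.find(".//First_Name")

# A full path from the root also works
full_path = root.find("Book/Authors/Author/First_Name")

print(direct)
print(nested.text)
print(full_path.text)
```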


<h4>Problem: Find the last names of all authors in the tree “root” using xpath</h4>

In [66]:
for element in root.iter("Author"):
    print(element.find('Last_Name').text)

Berenholtz
Harris
Brockmann


In [90]:
# Path-based solution: findall takes an XPath-style path relative to root
for element in root.findall("Book/Authors/Author"):
    print(element.find('Last_Name').text)

Berenholtz
Harris
Brockmann


<h4>Using values of attributes as filters</h4>
<li>Example: Find the first name of the author of a book that weighs 1.5 oz

In [64]:
root.find('Book[@Weight="1.5"]/Authors/Author/First_Name').text

'Richard'

In [65]:
root.find('Book/Remark').text

'\n      Five Hundred Buildings of New York and over one million other books are available for Amazon Kindle.\n      '
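Attribute values and element text both come back as raw strings, so they usually need some cleanup. The sketch below reads attributes with .get, converts the price to a float, and strips the surrounding whitespace from the Remark text. It uses the standard library's xml.etree.ElementTree so it runs without lxml, and the XML is a shortened copy of the second book.

```python
import xml.etree.ElementTree as etree

root = etree.fromstring("""
<Bookstore>
   <Book ISBN="ISBN-13:978-1579128562" Price="15.80">
      <Remark>
      Five Hundred Buildings of New York and over one million other books are available for Amazon Kindle.
      </Remark>
      <Title>Five Hundred Buildings of New York</Title>
   </Book>
</Bookstore>
""")

book = root.find("Book")

# .get returns the attribute as a string; convert it before doing arithmetic
price = float(book.get("Price"))

# .get returns None when the attribute is absent (this book has no Weight)
weight = book.get("Weight")

# .strip() removes the newlines and indentation around the text
remark = root.find("Book/Remark").text.strip()

print(price, weight)
print(remark)
```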

<h4>Problem: Print first and last names of all authors who live in New York City</h4>

In [88]:
for element in root.findall('Book/Authors/Author[@Residence="New York City"]'):
    print(element.find('First_Name').text, element.find("Last_Name").text)

Richard Berenholtz
Jorg Brockmann
