## Alex Lee

This project scrapes the 15 cities with the worst rush-hour traffic in the world from a CN Traveler article. Its intellectual significance comes from the growing global problem of traffic as cities become more developed and populated.

One challenge came from how the Nominatim API request treats spaces in city names. I worked around this by hard-coding `%20` in place of the space for the cities I knew had multi-word names. This is too specific to be a solution I would normally use, but it worked for this problem. Another challenge was displaying the folium map inline, which I wasn't able to overcome.

I collaborated only with the professor and used his resources.

I estimate that I spent about six hours on this project.

In [2]:
import json
import urllib2

import cssselect  # backs lxml's .cssselect() method
import folium
import geopandas
import lxml.html
import pandas
import shapely.wkt

In [3]:
# Download and parse the CN Traveler article that lists the cities
url = "http://www.cntraveler.com/story/15-cities-with-the-worst-rush-hour-traffic-in-the-world"
con = urllib2.urlopen(url)
doc_text = con.read()
doc = lxml.html.fromstring(doc_text)

In [4]:
# Store results in a list
cities = []

# Collect the text of every list item in the article body
for row in doc.cssselect("body div.article-body"):
    for li in row.cssselect("ol li"):
        cities.append(li.text_content())

In [5]:
city_country = {}

for city in cities:
    # Hard-code %20 for the space in the multi-word city names
    if city == "Mexico City, Mexico":
        city = "Mexico%20city, Mexico"
    if city == "St. Petersburg, Russia":
        city = "St.%20Petersburg, Russia"
    if city == "Los Angeles, California, U.S.":
        # Drop the state so the second field is the country
        city = "Los%20Angeles, U.S."
    # Split on the comma: the first field is the city, the second the country
    dict_values = city.split(",")
    city_country[dict_values[0]] = dict_values[1].strip()
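
The hard-coded replacements above could be generalized. The following is a minimal sketch, not the method used in this notebook: `split_city_country` is a hypothetical helper that takes the first comma-separated field as the city and the last as the country, which assumes every entry follows a "City, ..., Country" pattern. The sample strings are illustrative, and spaces are left untouched on the assumption that encoding happens later, when the URL is built.

```python
def split_city_country(entry):
    """Split 'City, [Region,] Country' into a (city, country) pair."""
    fields = [field.strip() for field in entry.split(",")]
    if len(fields) < 2:
        raise ValueError("expected 'City, Country', got %r" % entry)
    # First field is the city, last field is the country;
    # anything in between (such as a state) is dropped.
    return fields[0], fields[-1]


# Illustrative inputs, not the scraped list itself
entries = ["Mexico City, Mexico", "Los Angeles, California, U.S.", "Beijing, China"]
city_country_sketch = dict(split_city_country(entry) for entry in entries)
```

Taking the last field rather than the second means an entry with a state in the middle needs no special case.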

In [6]:
city_coord = {}

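for city, country in city_country.iteritems():
    # Get coordinates from the API and store in a dict.
    # The country is appended as a parameter name ('&<country>=USA') rather
    # than as a 'country=' value, so the lookup relies on the city name alone.
    json_text = urllib2.urlopen('http://nominatim.openstreetmap.org/search?city=' + city + '&' + country + '=USA&format=json').read()
    json_text = json.loads(json_text)
    # Take the first match and store it as a "lat lon" string
    lat = json_text[0]['lat']
    lon = json_text[0]['lon']
    coordinate = lat + " " + lon
    city_coord[city] = str(coordinate)
print city_coord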

{'Bucharest': '44.4361414 26.1027202', 'Beijing': '39.9059631 116.391248', 'Santiago': '-33.4377967 -70.650445', 'Istanbul': '41.0096334 28.9651646', 'Jakarta': '-6.1753941 106.827183', 'Moscow': '55.7506828 37.6174976', 'Guangzhou': '23.1300037 113.259001', 'Zhuhai': '22.2657516 113.568045', 'Shijiazhuang': '38.0359808 114.4627725', 'Bangkok': '-7.8464229 112.1005469', 'Los%20Angeles': '34.054935 -118.2444759', 'St.%20Petersburg': '27.7703796 -82.6695084', 'Mexico%20city': '19.4326009 -99.1333415', 'Chongqing': '29.5585712 106.5492822', 'Shenzhen': '22.5442673 114.0545327'}
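
The query string can also be assembled with the standard library so that spaces are escaped automatically and the country is sent as its own parameter. This is a hedged sketch rather than the code that produced the output above: `build_search_url` is a hypothetical helper, and it assumes the search endpoint accepts `city`, `country`, and `format` as query parameters. The import fallback is only there so the snippet runs on both Python 2 and Python 3.

```python
try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode  # Python 2

BASE_URL = "http://nominatim.openstreetmap.org/search"


def build_search_url(city, country):
    """Return a search URL with every parameter URL-encoded."""
    # A list of pairs keeps the parameter order stable across Python versions.
    params = [("city", city), ("country", country), ("format", "json")]
    return BASE_URL + "?" + urlencode(params)


example_url = build_search_url("Mexico City", "Mexico")
# Spaces are encoded as '+', so no city-specific replacements are needed.
```

Because this sketch only builds the URL, it makes no network request. Results fetched with it may differ from the recorded output, since the country would then narrow the search.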


In [7]:
df = pandas.DataFrame(city_coord.values(),index=city_coord.keys(), columns = ['geometry'])
df

| | geometry |
|---|---|
| Bucharest | 44.4361414 26.1027202 |
| Beijing | 39.9059631 116.391248 |
| Santiago | -33.4377967 -70.650445 |
| Istanbul | 41.0096334 28.9651646 |
| Jakarta | -6.1753941 106.827183 |
| Moscow | 55.7506828 37.6174976 |
| Guangzhou | 23.1300037 113.259001 |
| Zhuhai | 22.2657516 113.568045 |
| Shijiazhuang | 38.0359808 114.4627725 |
| Bangkok | -7.8464229 112.1005469 |


In [8]:
list_of_points = []

for coordinate in city_coord.values():
    # Parse each "lat lon" string as a WKT point (WKT reads the pair as "x y")
    list_of_points.append(shapely.wkt.loads('POINT(' + coordinate + ')'))
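
WKT writes a point as `POINT (x y)`, which GIS tools conventionally read as longitude first and latitude second, while the strings built earlier are in "lat lon" order. A minimal sketch of swapping the order before building the WKT text follows. `to_wkt_point` is a hypothetical helper, not part of shapely, and it assumes the input is a single "lat lon" string as stored in `city_coord`.

```python
def to_wkt_point(lat_lon):
    """Turn a 'lat lon' string into WKT text in 'lon lat' (x y) order."""
    lat, lon = lat_lon.split()
    return "POINT (%s %s)" % (lon, lat)


bucharest_wkt = to_wkt_point("44.4361414 26.1027202")
# bucharest_wkt is now 'POINT (26.1027202 44.4361414)'
```

The resulting text could then be passed to `shapely.wkt.loads` in place of the concatenated string.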

In [13]:
# Build a GeoSeries from the points and save it as a shapefile
newGeometryGeoSeries = geopandas.GeoSeries(
    list_of_points
)


newGeometryGeoSeries.to_file('MyGeometries.shp', driver='ESRI Shapefile')
newGeometryGeoSeries

0              POINT (44.4361414 26.1027202)
1              POINT (39.9059631 116.391248)
2     POINT (-33.4377967 -70.65044500000001)
3              POINT (41.0096334 28.9651646)
4              POINT (-6.1753941 106.827183)
5              POINT (55.7506828 37.6174976)
6              POINT (23.1300037 113.259001)
7              POINT (22.2657516 113.568045)
8             POINT (38.0359808 114.4627725)
9             POINT (-7.8464229 112.1005469)
10            POINT (34.054935 -118.2444759)
11            POINT (27.7703796 -82.6695084)
12            POINT (19.4326009 -99.1333415)
13            POINT (29.5585712 106.5492822)
14            POINT (22.5442673 114.0545327)
dtype: object

In [10]:
# Create a folium map and add the points to it as a GeoJSON layer
global_traffic_map = folium.Map(location=[47.655914, -122.309646],
                            zoom_start=2,  # low zoom level so the whole world is visible
                            tiles='cartodbpositron')


gjson = newGeometryGeoSeries.to_json()
points = folium.features.GeoJson(gjson)
global_traffic_map.add_children(points)

global_traffic_map
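
For reference, the GeoJSON handed to folium can also be assembled by hand with the `json` module. The sketch below is an illustration, not what `to_json()` emits byte for byte. `coords_to_geojson` is a hypothetical helper, it assumes a dict of "lat lon" strings like `city_coord`, and it writes each position as `[longitude, latitude]`, the order the GeoJSON specification uses.

```python
import json


def coords_to_geojson(coords):
    """Build a GeoJSON FeatureCollection string from {name: 'lat lon'}."""
    features = []
    for name, lat_lon in sorted(coords.items()):
        lat, lon = (float(value) for value in lat_lon.split())
        features.append({
            "type": "Feature",
            "properties": {"name": name},
            # GeoJSON positions are [longitude, latitude]
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
        })
    return json.dumps({"type": "FeatureCollection", "features": features})


# Two entries copied from the coordinate output earlier in the notebook
sample = {"Beijing": "39.9059631 116.391248", "Moscow": "55.7506828 37.6174976"}
geojson_text = coords_to_geojson(sample)
```

Keeping the city name in `properties` means each point stays labeled, which the bare GeoSeries does not do.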