## A cautionary tale about data

Nice clean data that is perfectly accurate and ready to use doesn't exist. Get over it. Most of the stuff you can get your hands on has to be scrubbed clean, aligned properly, and scaled. Good data is hard to come by. Here are some tips on how you can get data when you need it.

## Structured data testing tool

Many of the major websites that are good citizens of the net provide [microformats](http://microformats.org/) within their HTML, so that other machines can read the content of those pages and extract contextual information from them. You can find out if a particular page uses microformats with Google's tool.

Google has this wonderful tool called the [Structured Data Testing Tool](https://developers.google.com/structured-data/testing-tool/) that allows you to paste any markup or fetch any URL, and it will tell you if it can find **any** structured data. It's hard to overstate how awesome this is.
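For a quick offline check, a rough sketch like the one below can scan markup for a few well-known microformat root class names. The `MicroformatSniffer` class, the `sniff_microformats` helper, and the short `KNOWN_ROOTS` list are illustrative assumptions; this is not a full microformats parser and it is no substitute for Google's tool. It relies only on Python's standard `html.parser` module.

```python
from html.parser import HTMLParser

# A few classic microformat root classes (assumed list, far from exhaustive).
KNOWN_ROOTS = {"vcard", "vevent", "geo", "adr", "hreview"}


class MicroformatSniffer(HTMLParser):
    """Collect class names that look like microformat roots."""

    def __init__(self):
        super().__init__()
        self.found = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "class" and value:
                for cls in value.split():
                    # Classic names come from KNOWN_ROOTS; microformats2
                    # roots use the "h-" prefix (e.g. h-card).
                    if cls in KNOWN_ROOTS or cls.startswith("h-"):
                        self.found.add(cls)


def sniff_microformats(markup):
    """Return a sorted list of microformat-like class names in `markup`."""
    sniffer = MicroformatSniffer()
    sniffer.feed(markup)
    return sorted(sniffer.found)


sample = '<div class="h-card"><span class="geo">41.000; 39.733</span></div>'
print(sniff_microformats(sample))  # ['geo', 'h-card']
```

An empty list only means none of the class names in this small list showed up; the page may still carry other kinds of structured data.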

## Geo coordinates

There are geolocation servers and full gazetteers online, so geographic info is not difficult to come by. But if you are in a rush and you don't need much, say just one city, you can get its geolocation from Wikipedia.

In [17]:
import requests
from lxml import html

URL = 'https://en.wikipedia.org/wiki/Trabzon'

# In the case of extracting content from Wikipedia, be sure to
# review its "Bot Policy," which is defined at
# http://meta.wikimedia.org/wiki/Bot_policy#Unacceptable_usage

page = requests.get(URL)

# stop here if the request failed, so `tree` is never used undefined
page.raise_for_status()

markup = page.text  # get contents of page
tree = html.fromstring(markup)  # load it in our scraping tool of choice

# the first <span class="geo"> holds the coordinates as "lat; lon"
lat, lon = tree.xpath('//span[@class="geo"]')[0].text_content().split('; ')
print(lat, lon)

41.000 39.733
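The values scraped above are still strings, which is a small instance of the scrubbing mentioned at the start. A minimal sketch of turning them into validated numbers could look like this; `parse_geo` is a hypothetical helper name, and the `"lat; lon"` input format is assumed from the output shown above.

```python
def parse_geo(text):
    """Turn a 'lat; lon' string into a (lat, lon) tuple of floats."""
    parts = [p.strip() for p in text.split(";")]
    if len(parts) != 2:
        raise ValueError("expected 'lat; lon', got %r" % text)

    lat, lon = float(parts[0]), float(parts[1])

    # Reject values that cannot be valid decimal-degree coordinates.
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        raise ValueError("coordinates out of range: %r" % text)
    return lat, lon


print(parse_geo("41.000; 39.733"))  # (41.0, 39.733)
```

Failing loudly on a malformed or out-of-range value is a deliberate choice here: a silently wrong coordinate is harder to notice than an exception.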


## Creating a map to display geolocation
To display the geolocation we just fetched, we can inline a Google map centered on those coordinates.

In [23]:
from IPython.display import IFrame, display

# lat and lon come from the previous cell
print(lat, lon)

# Google Maps URL template for an iframe
google_maps_url = "https://maps.google.com/maps?q={0},{1}&ie=UTF8&t=h&z=12&output=embed".format(lat, lon)

display(IFrame(google_maps_url, '425px', '350px'))

41.000 39.733
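Hand-formatting a query string is easy to get subtly wrong, so one alternative is to let the standard library build it. The sketch below assumes the same `q`, `t`, `z`, and `output` parameters as the template above; `embed_map_url` is a hypothetical helper, not part of any Google or IPython API.

```python
from urllib.parse import urlencode


def embed_map_url(lat, lon, zoom=12):
    """Build a Google Maps embed URL for the given coordinates."""
    params = {
        "q": "{0},{1}".format(lat, lon),  # query: "lat,lon"
        "t": "h",                         # map type, as in the template above
        "z": zoom,                        # zoom level
        "output": "embed",                # ask for an embeddable page
    }
    # urlencode escapes each value, so the comma becomes %2C.
    return "https://maps.google.com/maps?" + urlencode(params)


print(embed_map_url("41.000", "39.733"))
```

The resulting string can be passed to `IFrame` in place of the hand-formatted URL.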
