### Process WikiGazetteer

WikiGazetteer is a gazetteer based on Wikipedia and enriched with GeoNames data.

To build a WikiGazetteer (in a MySQL database) for a specific Wikipedia version, follow [these instructions](https://github.com/Living-with-machines/lwm_GIR19_resolving_places/tree/master/gazetteer_construction).

This notebook extracts the relevant fields (alternate names, Wikipedia titles and coordinates) from the WikiGazetteer MySQL database and stores them in a more manageable pickle file.
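To inspect the extraction logic without a MySQL server, here is a minimal sketch that reproduces the query run by the extraction function below against an in-memory SQLite database. The two tables contain only the columns that query touches (the real WikiGazetteer schema may have more), and the rows and every `source` value other than `wikigt` are hypothetical.

```python
import sqlite3

import pandas as pd

# Hypothetical, minimal stand-in for the two WikiGazetteer tables used by the
# query: only the columns the query reads are modelled here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE location (id INTEGER PRIMARY KEY, wiki_title TEXT, lat REAL, lon REAL);
    CREATE TABLE altname (main_id INTEGER, altname TEXT, source TEXT);
    INSERT INTO location VALUES (1, 'Barcelona', 41.3825, 2.17694);
    -- 'wikimain' and 'wikiredirect' are made-up source labels
    INSERT INTO altname VALUES
        (1, 'Barcelona', 'wikimain'),
        (1, 'Barcino', 'wikiredirect'),
        (1, 'Ciutat Comtal de Barcelona', 'wikiredirect'),
        (1, 'Some title', 'wikigt');
""")

# Same join and source filter as in wikigazExtract, with columns renamed
# to the names used in the pickled DataFrame
query = """
    SELECT altname.altname AS name, location.wiki_title AS wikititle,
           location.lat AS latitude, location.lon AS longitude
    FROM altname
    JOIN location ON location.id = altname.main_id
    WHERE source != 'wikigt'
"""
df = pd.read_sql_query(query, conn)
conn.close()
print(df)
```

The `wikigt` row is dropped by the `WHERE` clause; the four-word name is still present at this stage because the word-count filter is applied afterwards in Python.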


In [1]:
#!/usr/bin/python
# -*- coding: UTF-8 -*-

import mysql.connector
from mysql.connector import Error
import pandas as pd

In [2]:
import os

def wikigazExtract(language, dbname):
    """Extract alternate names, Wikipedia titles and coordinates from a
    WikiGazetteer MySQL database and pickle them to
    <language>/wikigaz_<language>.pkl."""
    # Access wikigazetteer database; stop if the connection fails
    try:
        gazDB = mysql.connector.connect(
                host='localhost',
                database=dbname,
                user='testGazetteer',
                password='1234')
    except Error as e:
        print("Error while connecting to MySQL", e)
        return

    try:
        cursorGaz = gazDB.cursor(dictionary=True)
        # Query database: every alternate name with its location,
        # excluding names that come from the "wikigt" source
        cursorGaz.execute("""
                select altname.altname, location.wiki_title, location.lat, location.lon from altname
                join location on location.id=altname.main_id
                where source != 'wikigt'
            """)
        results = cursorGaz.fetchall()
        cursorGaz.close()
    finally:
        # Close connection to gazDB, even if the query failed
        if gazDB.is_connected():
            gazDB.close()

    # Keep only names of at most two words and store the relevant metadata
    rows = [
        {"name": x["altname"], "wikititle": x["wiki_title"],
         "latitude": x["lat"], "longitude": x["lon"]}
        for x in results
        if len(x["altname"].split(" ")) <= 2
    ]
    wg = pd.DataFrame(rows, columns=["name", "wikititle", "latitude", "longitude"])

    # Make sure the output folder (e.g. "en/") exists before pickling
    os.makedirs(language, exist_ok=True)
    wg.to_pickle(language + "/wikigaz_" + language + ".pkl")
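The "at most two words" rule can also be applied to an existing DataFrame with pandas string methods instead of a Python loop. The sketch below is an alternative formulation, not the notebook's own code; the helper name `filter_short_names` and the toy rows are hypothetical.

```python
import pandas as pd

def filter_short_names(df: pd.DataFrame, max_words: int = 2) -> pd.DataFrame:
    """Keep rows whose "name" has at most max_words space-separated tokens.

    Splits on a single space, like the loop in wikigazExtract, so a repeated
    space produces an empty token that is counted as a word too.
    """
    n_words = df["name"].str.split(" ").str.len()
    return df[n_words <= max_words].reset_index(drop=True)

# Hypothetical toy rows, not taken from WikiGazetteer
toy = pd.DataFrame({
    "name": ["Barcelona", "Barcino", "Ciutat Comtal", "Ciutat Comtal de Barcelona"],
    "wikititle": ["Barcelona"] * 4,
})
short = filter_short_names(toy)
print(short["name"].tolist())  # ['Barcelona', 'Barcino', 'Ciutat Comtal']
```

Because of the single-space split, a name such as `"Sant  Andreu"` (two spaces) counts as three tokens and is dropped; using `str.split()` without an argument would count it as two.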

In [3]:
wikigazExtract("en", "wikiGazetteer")
wikigazExtract("es", "wikiGazES")

In [13]:
wges = pd.read_pickle("es/wikigaz_es.pkl")

In [14]:
wges[wges["wikititle"] == "Barcelona"]

| | name | wikititle | latitude | longitude |
|---|---|---|---|---|
| 288752 | Barcelona | Barcelona | 41.3825 | 2.17694 |
| 288753 | Barcino | Barcelona | 41.3825 | 2.17694 |
| 288754 | Bartzelona | Barcelona | 41.3825 | 2.17694 |
| 288755 | Barzelona | Barcelona | 41.3825 | 2.17694 |
| 288756 | Barcelono | Barcelona | 41.3825 | 2.17694 |
| 288757 | Barcelone | Barcelona | 41.3825 | 2.17694 |
| 288758 | Barselóna | Barcelona | 41.3825 | 2.17694 |
| 288759 | Barcellona | Barcelona | 41.3825 | 2.17694 |
| 288760 | Barselona | Barcelona | 41.3825 | 2.17694 |
| 288761 | Barcillona | Barcelona | 41.3825 | 2.17694 |
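The table lists every alternate name stored for one Wikipedia title. For toponym lookup, the opposite direction is often useful: from a name to all the entries it may refer to. A minimal sketch follows, assuming the DataFrame layout produced above. The helper `build_name_index` is hypothetical, and the third toy row (title and placeholder coordinates) is invented for illustration.

```python
from collections import defaultdict

import pandas as pd

def build_name_index(wg: pd.DataFrame) -> dict:
    """Map each name to a sorted list of (wikititle, latitude, longitude) candidates."""
    index = defaultdict(set)
    cols = ["name", "wikititle", "latitude", "longitude"]
    for name, title, lat, lon in wg[cols].itertuples(index=False, name=None):
        index[name].add((title, lat, lon))
    return {name: sorted(cands) for name, cands in index.items()}

# Hypothetical toy rows; the second "Barcelona" entry uses placeholder coordinates
toy = pd.DataFrame({
    "name": ["Barcelona", "Barcino", "Barcelona"],
    "wikititle": ["Barcelona", "Barcelona", "Barcelona (Venezuela)"],
    "latitude": [41.3825, 41.3825, 10.0],
    "longitude": [2.17694, 2.17694, -64.0],
})
idx = build_name_index(toy)
print(idx["Barcelona"])
```

On the real data, `build_name_index(wges)` would return every title that shares a given name, leaving the choice between candidates to a later resolution step.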


In [15]:
wges.shape

(363780, 4)
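`wges.shape` counts rows, i.e. (name, title) pairs, so a name or a title can appear in many rows. The sketch below shows one way to also count distinct names and titles and the names that point to more than one title. It uses hypothetical toy rows rather than the real pickle, and `summarize` is an illustrative helper, not part of the notebook.

```python
import pandas as pd

def summarize(wg: pd.DataFrame) -> dict:
    """Basic size statistics for a WikiGazetteer-style DataFrame."""
    titles_per_name = wg.groupby("name")["wikititle"].nunique()
    return {
        "rows": len(wg),
        "distinct_names": wg["name"].nunique(),
        "distinct_titles": wg["wikititle"].nunique(),
        # names shared by two or more different Wikipedia titles
        "ambiguous_names": int((titles_per_name > 1).sum()),
    }

# Hypothetical toy rows, not taken from WikiGazetteer
toy = pd.DataFrame({
    "name": ["Barcelona", "Barcino", "Barcelona"],
    "wikititle": ["Barcelona", "Barcelona", "Barcelona (Venezuela)"],
})
print(summarize(toy))
```

Calling `summarize(wges)` would give the same statistics for the Spanish gazetteer.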