<a href="https://colab.research.google.com/github/MonikaBarget/GeoHumTutorials/blob/master/Colab_Geocoding/Geocode_Plot_Geonames_interactiveMAP_withLabels.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

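This is a script for geocoding and plotting spatial information from an EXCEL spreadsheet. The example spreadsheet has an "Address" column, but the code below geocodes the "City" column; you can select any column that contains place names. The API used is **GeoNames**.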

GeoNames mainly exposes its data through REST APIs and offers 40 different web services.

**Geocoder** for Python supports the following ones:

*   (geocoding) retrieve GeoNames’s geocoded data from a query string and various filters
*   (details) retrieve all GeoNames data for a given geonames_id
*   (children) retrieve all children for a given geonames_id
*   (hierarchy) retrieve the hierarchy of a given geonames_id

Full documentation: https://geocoder.readthedocs.io/providers/GeoNames.html
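To make the "query string and various filters" more concrete, here is a minimal sketch of what a GeoNames search request looks like as a URL. It only builds the URL and does not send anything. The helper name `build_search_url` and the username `your_username` are placeholders invented for this example; `featureClass="P"` (populated places) and `country="NG"` (Nigeria) are just sample filter values.

```python
from urllib.parse import urlencode

# Base URL of the GeoNames search web service (JSON variant).
GEONAMES_SEARCH_URL = "http://api.geonames.org/searchJSON"

def build_search_url(query, username, feature_class=None, country=None, max_rows=1):
    """Return the URL of a GeoNames search request without sending it."""
    params = {"q": query, "maxRows": max_rows, "username": username}
    # optional filters are only added when they are set
    if feature_class:
        params["featureClass"] = feature_class
    if country:
        params["country"] = country
    return GEONAMES_SEARCH_URL + "?" + urlencode(params)

# "your_username" is a placeholder: use your own GeoNames account name
url = build_search_url("Lagos", username="your_username", feature_class="P", country="NG")
print(url)
```

The Geocoder package assembles a comparable request for you, so this sketch is only meant to show which pieces of information go into a query.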

The first step is to get COLAB working:


In [2]:
## mount drive
from google.colab import drive
drive.mount("/content/drive")

Mounted at /content/drive


A file path needs to be defined for storing input or output files linked with this script:

In [3]:
directory="/content/drive/My Drive/Colab_FASoS/" ## add your own folder name
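The cells further down build file names by concatenating `directory` and a file name, which only works if the folder path ends with a slash. A slightly more forgiving alternative, sketched here with the hypothetical helper `output_path`, is to let `os.path.join` insert the separator where needed:

```python
import os

def output_path(directory, filename):
    """Join a folder and a file name, with or without a trailing slash on the folder."""
    return os.path.join(directory, filename)

example_directory = "/content/drive/My Drive/Colab_FASoS"  # no trailing slash on purpose
print(output_path(example_directory, "Africa.geojson"))

# os.path.isdir tells you whether the folder exists (it is True only after mounting the drive)
print(os.path.isdir(example_directory))
```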

Now we can install packages that are not part of Python's standard distribution but are necessary for geocoding and plotting maps. There will most likely be a dependency error for NumPy, but the script should still work.

In [4]:
## install packages that are not part of Python's standard distribution

!pip install geocoder
!pip install basemap
!pip install ipyleaflet
!pip install geojson

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting geocoder
  Downloading geocoder-1.38.1-py2.py3-none-any.whl (98 kB)
Collecting ratelim
  Downloading ratelim-0.1.6-py2.py3-none-any.whl (4.0 kB)
Installing collected packages: ratelim, geocoder
Successfully installed geocoder-1.38.1 ratelim-0.1.6
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting basemap
  Downloading basemap-1.3.6-cp310-cp310-manylinux1_x86_64.whl (859 kB)
Collecting basemap-data<1.4,>=1.3.2
  Downloading basemap_data-1.3.2-py2.py3-none-any.whl (30.5 MB)

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting ipyleaflet
  Downloading ipyleaflet-0.17.2-py3-none-any.whl (3.7 MB)
Collecting traittypes<3,>=0.2.1
  Downloading traittypes-0.2.1-py2.py3-none-any.whl (8.6 kB)
Collecting xyzservices>=2021.8.1
  Downloading xyzservices-2023.2.0-py3-none-any.whl (55 kB)
Collecting jedi>=0.16
  Downloading jedi-0.18.2-py2.py3-none-any.whl (1.6 MB)
Installing collected packages: xyzservices, traittypes, jedi, ipyleaflet
Successfully installed ipyleaflet-0.17.2 jedi-0.18.2 traittypes-0.2.1 xyzservices-2023.2.0
Looking in indexes: https://p
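Because the installation log is long and may contain dependency warnings, it can help to confirm afterwards that the packages are really importable. The following is a minimal sketch using only the standard library; the helper names `is_importable` and `missing_packages` are made up for this example, and the list uses the import names that appear in the import cell of this notebook.

```python
import importlib.util

def is_importable(name):
    """Return True if the import system can locate the module `name`."""
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        # raised for dotted names when the parent package is missing
        return False

def missing_packages(names):
    """Return the names that cannot be imported in the current environment."""
    return [name for name in names if not is_importable(name)]

# import names used in this notebook (not necessarily identical to the pip names)
required = ["geocoder", "mpl_toolkits.basemap", "ipyleaflet", "geojson"]
print("Missing:", missing_packages(required))
```

An empty list means that all four packages can be imported.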

Now that all packages are installed, we can read the input data (in this case from GitHub or Google Drive) and display the content in a table.

In [5]:
## import relevant packages for geocoding as well as reading and writing data
import pandas as pd
import geocoder
# command needed for correct plotting in Jupyter Notebooks:
%matplotlib inline 
from mpl_toolkits.basemap import Basemap
import matplotlib.pyplot as plt
import os
import json
from geojson import Feature, FeatureCollection, Point

## geocode data from spreadsheet

## input addresses in EXCEL format and read
## OPTION 1: from Github as raw file
infile="https://github.com/MonikaBarget/GeoHumTutorials/blob/master/Colab_Geocoding/Addresses_AP3.xlsx?raw=true"

## OPTION 2: from your Google Drive as EXCEL FILE
#infile=directory+"African_Cultural_Institutions.xlsx"

## OPTION 3: from your Google Drive as CSV FILE
#infile=directory+"ATLASSES & COSMOGRAPHIES_utf-8.csv"

## read if EXCEL
addresses_df = pd.read_excel(infile)
display(addresses_df)

## read if CSV
#addresses_df = pd.read_csv(infile, encoding="utf-8", delimiter=";") # delimiter=None # encoding=None # encoding_errors='strict'
#display(addresses_df)

Unnamed: 0,Place Name,Country,City,Address
0,Yinka Shonibare Foundation,Nigeria,Lagos,"19a Hakeem Dickson Dr, Maroko 106104, Lagos, N..."
1,GAS Foundation,Nigeria,Lagos,"9 Hakeem Dickson Dr, Maroko 106104, Lagos, Nig..."
2,Thapong visual art centre,Botswana,Gaborone,"Plot 10144, Gaborone, Botsuana"
3,32 East Ugandan Arts Trust,Uganda,Kampala,"Plot 212 Sonko Close, Kampala, Uganda"
4,Nairobi Contemporary Arts Institute,Kenya,Nairobi,"Rosslyn Riviera Mall, Kenia"
5,Bag Factory Art,South Africa,Johannesburg,"10 Mahlathini St, Newtown, Johannesburg, 2001,..."
6,Operndorf Afrika,Brukina Faso,,"GPV9+PQH, Ziniare, Burkina Faso"
7,Mansion Blatt,Libanon,Beirut,"VFVW+C9R, Beirut, Libanon"
8,Partage,Mauritius,Flic en Flac,"P989+75F, Flic en Flac, Mauritius"
9,Raw Material Company,Senegal,Dakar,"Villa 2a ZONE B, Senegal"


In the previous step, the Pandas package has read the spreadsheet into a so-called DataFrame. A DataFrame is a data structure that organizes data into a 2-dimensional table of rows and columns, much like a spreadsheet. This 2-dimensional structure is often used to manipulate data with programming languages. Our "manipulation" is the act of geocoding: we copy the values of one column into a list, send each value to GeoNames, and write the results back to the DataFrame as new columns.

In [6]:
# read the values of the column to geocode into a list
addresses = addresses_df["City"].values.tolist() # add name of column to geocode

latitudes = []
longitudes = []
ids = []
g_addresses = []

# geocode each address in file
for address in addresses:
    # placeholder values, kept if the place cannot be geocoded
    geonames_address = "0"
    longitude = "0"
    latitude = "0"
    geonames_id = "0"

    try:
        # key = your GeoNames username; featureClass 'A' restricts results to administrative areas
        # feature classes: http://www.geonames.org/source-code/javadoc/org/geonames/FeatureClass.html
        g = geocoder.geonames(address, key="Mob2023", featureClass='A')

        if g and len(g):
            geonames_address = g.address
            longitude = g.lng
            latitude = g.lat
            geonames_id = g.geonames_id
            #print(geonames_address, longitude, latitude, geonames_id) # OPTIONAL: print individual output
    except ValueError:
        # report the problem but keep going, so that every row still gets an entry
        print("Could not geocode:", address)

    # add information to lists (one entry per row, so the lists match the dataframe length)
    g_addresses.append(geonames_address)
    latitudes.append(latitude)
    longitudes.append(longitude)
    ids.append(geonames_id)

# write information to new columns in dataframe
addresses_df["latitudes"] = latitudes
addresses_df["longitudes"] = longitudes
addresses_df["ids"] = ids
addresses_df["geonames address"] = g_addresses

print("All addresses geocoded!")


All addresses geocoded!
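The loop above sends one request per row, even when the same place name occurs several times ("Lagos" appears twice in the example data). One possible refinement, shown here only as a sketch, is to look up each distinct name once and reuse the result. The function `geocode_unique` and the stand-in `fake_lookup` are hypothetical names for this example; in the notebook, `lookup` could be a small function that wraps the `geocoder.geonames` call.

```python
def geocode_unique(names, lookup):
    """Call `lookup` once per distinct name and return the results in input order."""
    cache = {}
    results = []
    for name in names:
        if name not in cache:
            cache[name] = lookup(name)  # only the first occurrence triggers a lookup
        results.append(cache[name])
    return results

# stand-in for a real geocoding call, so that the sketch runs without network access
calls = []
def fake_lookup(name):
    calls.append(name)
    return {"name": name}

results = geocode_unique(["Lagos", "Lagos", "Gaborone"], fake_lookup)
print(len(results), "results from", len(calls), "lookups")
```

For long spreadsheets with many repeated place names, this reduces the number of requests sent to the web service.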


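Once the loop has finished, the next step is to check the geocoding and write the results to a new EXCEL file and a CSV file. Checking matters because the message above only says that all rows have been processed, not that every match is correct: in the output below, "Lagos" has been matched to Los Lagos Region (Chile) rather than Lagos in Nigeria, the row without a city has been matched to Benin, and "Flic en Flac" has not been found at all and carries the placeholder 0.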


In [7]:
# view geocoded data
display(addresses_df)

# write geocoded places to new file
addresses_df.to_excel(directory+"Addresses_Africa_Geocoded_withID.xlsx")

addresses_df.to_csv(directory+"Addresses_Africa_Geocoded_withID.csv")

Unnamed: 0,Place Name,Country,City,Address,latitudes,longitudes,ids,geonames address
0,Yinka Shonibare Foundation,Nigeria,Lagos,"19a Hakeem Dickson Dr, Maroko 106104, Lagos, N...",-42.59889,-73.96048,3881974,Los Lagos Region
1,GAS Foundation,Nigeria,Lagos,"9 Hakeem Dickson Dr, Maroko 106104, Lagos, Nig...",-42.59889,-73.96048,3881974,Los Lagos Region
2,Thapong visual art centre,Botswana,Gaborone,"Plot 10144, Gaborone, Botsuana",-24.64639,25.91194,11778169,Gaborone
3,32 East Ugandan Arts Trust,Uganda,Kampala,"Plot 212 Sonko Close, Kampala, Uganda",0.33508,32.58313,443339,Kampala District
4,Nairobi Contemporary Arts Institute,Kenya,Nairobi,"Rosslyn Riviera Mall, Kenia",-1.28333,36.83333,184742,Nairobi
5,Bag Factory Art,South Africa,Johannesburg,"10 Mahlathini St, Newtown, Johannesburg, 2001,...",-26.17673,27.96353,8347354,City of Johannesburg Metropolitan Municipality
6,Operndorf Afrika,Brukina Faso,,"GPV9+PQH, Ziniare, Burkina Faso",9.5,2.25,2395170,Benin
7,Mansion Blatt,Libanon,Beirut,"VFVW+C9R, Beirut, Libanon",33.88333,35.5,276780,Beyrouth
8,Partage,Mauritius,Flic en Flac,"P989+75F, Flic en Flac, Mauritius",0.0,0.0,0,0
9,Raw Material Company,Senegal,Dakar,"Villa 2a ZONE B, Senegal",14.76667,-17.28333,2253350,Dakar
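Rows that could not be geocoded are easy to overlook in a long table, because the placeholder 0 is a valid coordinate value. The following sketch shows one way to list them; it works on plain dictionaries so that it runs on its own, and the helper name `is_placeholder` is invented for this example. The two sample rows are copied from the table above.

```python
def is_placeholder(row):
    """True if the row carries the "0" placeholder written when geocoding failed."""
    return str(row["ids"]) == "0"

rows = [
    {"City": "Dakar", "latitudes": 14.76667, "longitudes": -17.28333, "ids": 2253350},
    {"City": "Flic en Flac", "latitudes": 0.0, "longitudes": 0.0, "ids": 0},
]

failed = [row["City"] for row in rows if is_placeholder(row)]
print(failed)  # ['Flic en Flac']
```

Wrong matches such as "Los Lagos Region" cannot be detected this way and still need a manual check of the "geonames address" column.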


Our geocoded data have been written to a new EXCEL file, which is handy for further (manual) data cleaning and data enrichment. But EXCEL is not a format that GIS applications typically read directly as spatial data. This is why we also export our geocoded data to GeoJSON.

The conversion of a DataFrame to GeoJSON follows the instructions in the following tutorial by Geoff Boeing:

https://notebook.community/captainsafia/nteract/applications/desktop/example-notebooks/pandas-to-geojson

In [8]:
# convert coordinates to floats

addresses_df['latitudes'] = addresses_df['latitudes'].astype(float)
addresses_df['longitudes'] = addresses_df['longitudes'].astype(float)

# drop rows with missing (NaN) coordinates
# note: places that could not be geocoded carry the placeholder 0 and are therefore kept

df_geo = addresses_df.dropna(subset=['latitudes', 'longitudes'], axis=0, inplace=False)

# combine information in GeoJSON format

def df_to_geojson(df, properties, lat='latitudes', lon='longitudes'):
    # create a new python dict to contain our geojson data, using geojson format
    geojson = {'type':'FeatureCollection', 'features':[]}

    # loop through each row in the dataframe and convert each row to geojson format
    for _, row in df.iterrows():
        # create a feature template to fill in
        feature = {'type':'Feature',
                   'properties':{},
                   'geometry':{'type':'Point',
                               'coordinates':[]}}

        # fill in the coordinates
        feature['geometry']['coordinates'] = [row[lon],row[lat]]

        # for each column, get the value and add it as a new feature property
        for prop in properties:
            feature['properties'][prop] = row[prop]
        
        # add this feature (aka, converted dataframe row) to the list of features inside our dict
        geojson['features'].append(feature)
    
    return geojson

cols = ['City', 'ids', 'geonames address'] # make sure that your column with place names is selected
geojson = df_to_geojson(df_geo, cols)

with open(directory+'Africa.geojson', 'w', encoding='utf-8') as f:
    json.dump(geojson, f, ensure_ascii=False)

Your Google Drive should now contain a file with the ".geojson" file extension. We can check if this file has been created and if it is well-formed.

In [9]:
## double-check if GeoJSON file has been created and is well-formed

# load GeoJSON data

with open(directory+'Africa.geojson', 'r', encoding='utf-8') as f2:
    data = json.load(f2)
    print(data)

{'type': 'FeatureCollection', 'features': [{'type': 'Feature', 'properties': {'City': 'Lagos', 'ids': 3881974, 'geonames address': 'Los Lagos Region'}, 'geometry': {'type': 'Point', 'coordinates': [-73.96048, -42.59889]}}, {'type': 'Feature', 'properties': {'City': 'Lagos', 'ids': 3881974, 'geonames address': 'Los Lagos Region'}, 'geometry': {'type': 'Point', 'coordinates': [-73.96048, -42.59889]}}, {'type': 'Feature', 'properties': {'City': 'Gaborone', 'ids': 11778169, 'geonames address': 'Gaborone'}, 'geometry': {'type': 'Point', 'coordinates': [25.91194, -24.64639]}}, {'type': 'Feature', 'properties': {'City': 'Kampala', 'ids': 443339, 'geonames address': 'Kampala District'}, 'geometry': {'type': 'Point', 'coordinates': [32.58313, 0.33508]}}, {'type': 'Feature', 'properties': {'City': 'Nairobi', 'ids': 184742, 'geonames address': 'Nairobi'}, 'geometry': {'type': 'Point', 'coordinates': [36.83333, -1.28333]}}, {'type': 'Feature', 'properties': {'City': 'Johannesburg', 'ids': 8347354,
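Printing the file shows that it can be parsed, but it does not tell us whether the content is plausible. As a rough sketch, the hypothetical function `check_feature_collection` below tests a few basic expectations: the top-level type, point geometries with two coordinates, and coordinate ranges (GeoJSON stores longitude first, then latitude). The first sample feature is taken from the output above; the second one is a made-up invalid point.

```python
def check_feature_collection(data):
    """Return a list of problems found in a GeoJSON FeatureCollection of 2D points."""
    problems = []
    if data.get("type") != "FeatureCollection":
        problems.append("top-level type is not 'FeatureCollection'")
    for index, feature in enumerate(data.get("features", [])):
        geometry = feature.get("geometry") or {}
        coords = geometry.get("coordinates", [])
        if geometry.get("type") != "Point" or len(coords) != 2:
            problems.append(f"feature {index}: not a 2D point")
            continue
        lon, lat = coords  # GeoJSON order: longitude, latitude
        if not (-180 <= lon <= 180 and -90 <= lat <= 90):
            problems.append(f"feature {index}: coordinates out of range")
    return problems

sample = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"City": "Gaborone"},
         "geometry": {"type": "Point", "coordinates": [25.91194, -24.64639]}},
        # made-up invalid point: a latitude of 95 lies outside -90..90
        {"type": "Feature", "properties": {"City": "Invalid example"},
         "geometry": {"type": "Point", "coordinates": [10.0, 95.0]}},
    ],
}
print(check_feature_collection(sample))
```

In the notebook, the same function could be called on the `data` variable loaded from the file. It is not a full GeoJSON validator; it only covers the point features produced by this script.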

Now we can plot the geocoded data on an interactive map. The code below is partly based on an ipyleaflet tutorial provided by the *Carpentries Incubator*:

https://carpentries-incubator.github.io/jupyter_maps/01-introduction/index.html

In [13]:
# plot map from geocoded data and add labels for all places on map

from ipyleaflet import Map, Marker
from ipywidgets import HTML

# Create a map centered on a specific location (latitude, longitude)
# (the name "m" avoids overwriting Python's built-in map function)
m = Map(center=(55, 7), zoom=3)

# print(data) # optional to check if GeoJSON file is read correctly

# Loop over the features and create a marker for each one
for feature in data['features']:
    # Get the coordinates and place name from the feature properties
    coords = feature['geometry']['coordinates']
    place_name = feature['properties']['City'] # double-check name of "place name" column

    # Create a marker
    # GeoJSON stores (longitude, latitude); ipyleaflet expects (latitude, longitude)
    marker = Marker(location=(coords[1], coords[0]))

    # Create the pop-up label for each location
    label = str(place_name)[:20] # truncate place names to first "n" characters
    print(label)
    message = HTML()
    message.value = label
    marker.popup = message

    # Add the marker to the map
    m.add_layer(marker)

# Display the map
m



Lagos
Lagos
Gaborone
Kampala
Nairobi
Johannesburg
nan
Beirut
Flic en Flac
Dakar
Lome


Map(center=[55, 7], controls=(ZoomControl(options=['position', 'zoom_in_text', 'zoom_in_title', 'zoom_out_text…
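The map above is centered on a fixed location. If you would rather center it on your own data, one simple option is to use the mean latitude and longitude of all points. The helper `center_of` below is a hypothetical sketch of that idea; the two sample features use the coordinates of Kampala and Nairobi from the GeoJSON output above.

```python
def center_of(features):
    """Return the mean (latitude, longitude) of a list of GeoJSON point features."""
    lons = [feature["geometry"]["coordinates"][0] for feature in features]
    lats = [feature["geometry"]["coordinates"][1] for feature in features]
    # ipyleaflet expects (latitude, longitude), the reverse of the GeoJSON order
    return (sum(lats) / len(lats), sum(lons) / len(lons))

sample_features = [
    {"geometry": {"type": "Point", "coordinates": [32.58313, 0.33508]}},   # Kampala
    {"geometry": {"type": "Point", "coordinates": [36.83333, -1.28333]}},  # Nairobi
]
print(center_of(sample_features))
```

In the notebook, the result could be passed to the map as `center=center_of(data['features'])`. A plain mean is only a rough guide: it is pulled towards outliers such as the wrongly geocoded "Lagos", and it does not handle points on both sides of the 180° meridian.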

Congratulations, you have just plotted a new map! Each marker has a pop-up label with the place name (truncated to 20 characters), which opens when you click on the marker.

Notebook created by: Monika Barget

Latest update: 26 January 2023