This is a script for geocoding and plotting spatial information from an Excel spreadsheet with an address column (named `place_name` in the code below). The API used is **GeoNames**.

GeoNames is accessed mainly through REST APIs and offers 40 different web services.

The **Geocoder** package for Python supports the following GeoNames services:

*   (geocoding) retrieve GeoNames’s geocoded data from a query string and various filters
*   (details) retrieve all geonames data for a given geonames_id
*   (children) retrieve all children for a given geonames_id
*   (hierarchy) retrieve the hierarchy of a given geonames_id

Full documentation: https://geocoder.readthedocs.io/providers/GeoNames.html
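
Since GeoNames is a REST service, each geocoding query comes down to an HTTP request with the search parameters in the URL. The following minimal sketch only builds such a URL for the GeoNames search service and does not send it. The helper name `build_search_url` and the example place are illustrative assumptions; the Geocoder package takes care of this step for you, and the requests it sends may differ in detail.

```python
from urllib.parse import urlencode

# Endpoint of the GeoNames search web service (JSON output)
GEONAMES_SEARCH_URL = "http://api.geonames.org/searchJSON"


def build_search_url(query, username, feature_class=None, max_rows=1):
    """Build (but do not send) a GeoNames search request URL.

    Hypothetical helper for illustration only.
    """
    params = {"q": query, "maxRows": max_rows, "username": username}
    if feature_class is not None:
        # e.g. 'P' = populated place, 'S' = spot/building, 'A' = administrative area
        params["featureClass"] = feature_class
    return GEONAMES_SEARCH_URL + "?" + urlencode(params)


# "Maastricht" is just an example place; replace YOURKEY with your GeoNames username
url = build_search_url("Maastricht", username="YOURKEY", feature_class="P")
print(url)
```

The `featureClass` filter in the URL is the same filter that is passed to Geocoder in the geocoding step of this script.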

Now we can install packages that are not part of Python's standard distribution but are necessary for geocoding and plotting maps. There will most likely be a dependency error for NumPy, but the script should still work.

In [None]:
## install packages that are not part of Python's standard distribution

!pip install --upgrade pip

!pip install geocoder
!pip install basemap

Now that all packages are installed, we can read the input data (in this case from GitHub or Google Drive) and display the content in a table.

In [None]:
## import relevant packages
import pandas as pd
import geocoder
# command needed for correct plotting in Jupyter Notebooks:
%matplotlib inline
from mpl_toolkits.basemap import Basemap
import matplotlib.pyplot as plt
import os

## geocode data from spreadsheet

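# read the input addresses from an Excel file
# OPTIONAL: to read the sample file from GitHub instead, uncomment the next line and pass infile to pd.read_excel
#infile="https://github.com/MonikaBarget/GeoHumTutorials/blob/master/Colab_Geocoding/Addresses_AP3.xlsx?raw=true"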

addresses_df = pd.read_excel('Ortsontologie-Geocoded-newplaces.xlsx', dtype=str)
display(addresses_df)

The spreadsheet has now been loaded into a so-called DataFrame. A DataFrame is a data structure that organizes data into a 2-dimensional table of rows and columns, much like a spreadsheet. This 2-dimensional structure is often used to manipulate data with programming languages. Our "manipulation" is the act of geocoding: we use the Pandas package to read the content of the address column into a list, geocode each entry, and write the results to new columns of the DataFrame.
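
To make the idea of a DataFrame concrete, here is a minimal sketch with a made-up two-row table. The name `sample_df` and the sample places are illustrative assumptions and not part of the input file.

```python
import pandas as pd

# Hypothetical sample data standing in for the spreadsheet
sample_df = pd.DataFrame({
    "place_name": ["Mainz", "Schloss Johannisburg, Aschaffenburg"],
    "note": ["city", "castle"],
})

# Select one column and convert it to a plain Python list
place_names = sample_df["place_name"].values.tolist()
print(place_names)

# A list with one value per row can be written back as a new column
sample_df["name_length"] = [len(name) for name in place_names]
print(sample_df.shape)  # (number of rows, number of columns)
```

The geocoding step follows the same pattern: one column is read into a list, and the results are written back as new columns.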

In [None]:
import geocoder
import pandas as pd

# Read the place names from the address column into a list
addresses = addresses_df['place_name'].values.tolist()

latitudes = []
longitudes = []
geonames_ids = []
g_addresses = []

# Geocode each address
for a in addresses:
    # default values, used if no query returns a result
    geonames_address = "N/A"
    longitude = "N/A"
    latitude = "N/A"
    geonames_id = "N/A"

    try:
        address = a + ', Europe'    # combine address and continent for geocoding
        region = a.split(", ")[-1]  # last part of the address, e.g. the region

        # queries in order of preference; the first one with a result is used
        queries = [
            (address, 'P'),  # check if full address exists as populated place
            (region, 'P'),   # check if region exists
            (address, 'S'),  # check if full address is castle (or other building)
            (address, 'A'),  # check if full address is area
        ]

        for position, (query, feature_class) in enumerate(queries):
            g = geocoder.geonames(query, key="YOURKEY", featureClass=feature_class)
            if g and len(g):
                if position > 0:
                    print(region)  # report places that needed a fallback query
                geonames_address = g.address
                longitude = g.lng
                latitude = g.lat
                geonames_id = g.geonames_id
                # print(geonames_address, longitude, latitude, geonames_id) # OPTIONAL: print individual output
                break

    except ValueError as error:
        print("Could not geocode", a, "-", error)

    # Add information to lists (one entry per address, so the lists match the DataFrame)
    g_addresses.append(geonames_address)
    latitudes.append(latitude)
    longitudes.append(longitude)
    geonames_ids.append(geonames_id)

# Write information to new columns in dataframe
addresses_df["latitudes"] = latitudes
addresses_df["longitudes"] = longitudes
addresses_df["ids"] = geonames_ids
addresses_df["geonames address"] = g_addresses

print("All addresses geocoded!")



Once the addresses have been geocoded, the next step is to check the results and write them to a new Excel file. Places that could not be found are marked "N/A".


In [None]:
# view geocoded data
display(addresses_df)

# write geocoded places to new file
addresses_df.to_excel("Addresses_AP3ALLCLASSES_Geocoded_withID.xlsx")
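
One simple way to check the geocoding is to list the places that received the "N/A" marker. The sketch below uses a hypothetical helper `failed_places` and made-up sample values. In the notebook, the `addresses` and `latitudes` lists from the geocoding step could be passed in instead.

```python
def failed_places(place_names, latitudes, marker="N/A"):
    """Return the place names whose latitude is the 'not found' marker."""
    return [name for name, lat in zip(place_names, latitudes) if lat == marker]


# Made-up sample values for illustration only
sample_names = ["Mainz", "Unknown hamlet"]
sample_lats = [50.0, "N/A"]

missing = failed_places(sample_names, sample_lats)
print(len(missing), "of", len(sample_names), "places could not be geocoded:", missing)
```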

Now we can plot the geocoded data on a 2-dimensional, static map. As this is a world map, the individual places will only be visible as small coloured dots.

In [None]:
## plot geocoded data

# read file with geocoded data (the file written in the previous step)
places = 'Addresses_AP3ALLCLASSES_Geocoded_withID.xlsx'
data = pd.read_excel(places)

# set the size of the map
fig = plt.figure(figsize=(15,10))

# create the map - set latitude and longitude
m = Basemap(projection = 'mill', llcrnrlat = -90, urcrnrlat = 90, llcrnrlon = -180, urcrnrlon = 180, resolution = 'c')

# draw the coastlines and borders
m.drawcoastlines()
m.drawcountries(color='gray') # OPTIONAL: define color for modern country borders
m.drawstates(color='gray') # OPTIONAL: define color for borders of modern US federal states

# write variables for latitude and longitude to list
lat = data['latitudes'].tolist()
lon = data['longitudes'].tolist()

# plot the places as points on the map
m.scatter(lon, lat, latlon = True, s = 10, c = 'orange', marker = 'o', alpha = 1)

plt.show()

Congratulations, you have just plotted a new map! You can save the image by right-clicking it and downloading it to your local drive.
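
As an optional refinement: because the places are only small dots on a world map, it can help to zoom in on the region that actually contains them. The following sketch computes a bounding box around all valid coordinates. The helper `bounding_box` and its `margin` parameter are illustrative assumptions, and the sample coordinates are made up.

```python
import math


def bounding_box(lats, lons, margin=2.0):
    """Return map corner coordinates enclosing all valid points.

    Entries that are not numbers (e.g. "N/A") or are NaN are skipped.
    The margin is given in degrees and the result is clipped to the globe.
    """
    points = [
        (lat, lon)
        for lat, lon in zip(lats, lons)
        if isinstance(lat, (int, float)) and isinstance(lon, (int, float))
        and not math.isnan(lat) and not math.isnan(lon)
    ]
    if not points:
        raise ValueError("no valid coordinates to plot")

    valid_lats = [lat for lat, _ in points]
    valid_lons = [lon for _, lon in points]
    return {
        "llcrnrlat": max(min(valid_lats) - margin, -90),
        "urcrnrlat": min(max(valid_lats) + margin, 90),
        "llcrnrlon": max(min(valid_lons) - margin, -180),
        "urcrnrlon": min(max(valid_lons) + margin, 180),
    }


# Made-up sample coordinates, including one place that was not geocoded
bounds = bounding_box([50.5, 48.0, "N/A"], [8.25, 11.5, "N/A"])
print(bounds)

# In the notebook, the result could replace the fixed world-map corners, e.g.:
# m = Basemap(projection='mill', resolution='c', **bounding_box(lat, lon))
```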
