<a href="https://colab.research.google.com/github/MonikaBarget/GeoHumTutorials/blob/master/Colab_Geocoding/GEOJSON_from_EXCEL.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

This is a script for geocoding and plotting spatial information from an EXCEL spreadsheet with an "Addresses" column. The API used is **GeoNames.**

GeoNames exposes its data mainly through REST APIs and offers 40 different web services.

**Geocoder** for Python supports the following ones:

*   (geocoding) retrieve GeoNames’s geocoded data from a query string and various filters
*   (details) retrieve all geonames data for a given geonames_id
*   (children) retrieve all children for a given geonames_id
*   (hierarchy) retrieve the hierarchy of a given geonames_id

Full documentation: https://geocoder.readthedocs.io/providers/GeoNames.html
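
As a rough illustration of the first service, the sketch below shows how a GeoNames lookup through Geocoder might look once the package is installed (see the installation cell further down). It assumes a free GeoNames account. The names `lookup_place` and `summarise_result` and the `username` argument are hypothetical and chosen only for this example. A stand-in result replaces the live query so that the snippet runs without network access.

```python
from types import SimpleNamespace


def lookup_place(name, username):
    """Query GeoNames for a place name (needs network access and a GeoNames account)."""
    import geocoder  # imported here so the rest of the sketch runs without the package
    return geocoder.geonames(name, key=username)


def summarise_result(result):
    """Collect the fields that matter for mapping from a geocoder result."""
    return {
        "address": result.address,
        "geonames_id": result.geonames_id,
        # coordinates may arrive as text, so convert them to floats
        "latitude": float(result.lat),
        "longitude": float(result.lng),
    }


# Stand-in result using values from the sample output shown later in this notebook
fake_result = SimpleNamespace(
    address="Städteregion Aachen", geonames_id=3247448, lat="50.75389", lng="6.24194"
)
summary = summarise_result(fake_result)
print(summary)
```

With a real account, `summarise_result(lookup_place("Aachen", "<your_geonames_username>"))` would take the place of the stand-in.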

The first step is to get COLAB working:


In [None]:
## mount drive
from google.colab import drive
drive.mount("/content/drive")

Mounted at /content/drive


A file path needs to be defined for storing input or output files linked with this script:

In [None]:
directory="/content/drive/My Drive/Colab_DigiKAR/"
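
Plain string concatenation with `directory` works as long as the path ends with a slash. As an optional alternative, the sketch below builds the file paths used later in this notebook with `pathlib`. The variable names `base_dir`, `input_file`, and `output_file` are illustrative and do not appear elsewhere in the script.

```python
from pathlib import PurePosixPath

# PurePosixPath keeps the example independent of the operating system;
# inside Colab, pathlib.Path behaves the same way for these paths.
base_dir = PurePosixPath("/content/drive/My Drive/Colab_DigiKAR/")

# The "/" operator inserts separators, so a missing trailing slash is harmless
input_file = base_dir / "Geographers" / "early_modern_geographers_GEO.xlsx"
output_file = base_dir / "early_modern_geographers.geojson"

print(input_file)
print(output_file)
```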

Now we can install packages that are not part of Python's standard distribution but are necessary for geocoding and plotting maps. There will most likely be a dependency error for NumPy, but the script should still work.

In [None]:
## install packages that are not part of Python's standard distribution

!pip install geocoder
!pip install basemap
!pip install ipyleaflet
!pip install geojson

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting geocoder
  Downloading geocoder-1.38.1-py2.py3-none-any.whl (98 kB)
Collecting ratelim
  Downloading ratelim-0.1.6-py2.py3-none-any.whl (4.0 kB)
Installing collected packages: ratelim, geocoder
Successfully installed geocoder-1.38.1 ratelim-0.1.6
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting basemap
  Downloading basemap-1.3.6-cp38-cp38-manylinux1_x86_64.whl (863 kB)
Collecting numpy<1.24,>=1.22
  Downloading numpy-1.23.5-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (17.1 MB)

Now that all packages are installed, we can read the input data (in this case from GitHub or Google Drive).

We use the Pandas package to read the spreadsheet into a so-called DataFrame. A DataFrame is a data structure that organizes data into a 2-dimensional table of rows and columns, much like a spreadsheet. This structure is often used to manipulate data programmatically.

The cell below reads an EXCEL file that already contains geocoded addresses, converts the coordinates to floats, skips places that have not been geocoded, and writes the remaining rows to a GeoJSON file.
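
To make the DataFrame idea more concrete, here is a minimal sketch with made-up sample rows (the real data comes from the spreadsheet). The names `sample_df` and `geocoded` are hypothetical. It shows how rows without coordinates can be dropped before any further processing.

```python
import pandas as pd

# Hypothetical sample rows; None marks a place that could not be geocoded
sample_df = pd.DataFrame({
    "Addresses": ["Aachen", "Abderode", "Algesheim"],
    "latitude": [50.75389, None, 49.94992],
    "longitude": [6.24194, None, 8.00946],
})

# Missing values become NaN, which dropna() removes row by row
geocoded = sample_df.dropna(subset=["latitude", "longitude"])

print(geocoded)
```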


In [None]:
import json  # needed for json.dump and json.load below
import pandas as pd

# read file with geocoded data
places = directory+'Geographers/early_modern_geographers_GEO.xlsx' # alternative input from Google Drive
addresses_df = pd.read_excel(places)

# convert coordinates to floats

addresses_df['latitude'] = addresses_df['latitude'].astype(float)
addresses_df['longitude'] = addresses_df['longitude'].astype(float)

# ignore places that have not been geocoded

df_geo = addresses_df.dropna(subset=['latitude', 'longitude'], axis=0, inplace=False)

# combine information in GeoJSON format

def df_to_geojson(df, properties, lat='latitude', lon='longitude'):
    # create a new python dict to contain our geojson data, using geojson format
    geojson = {'type':'FeatureCollection', 'features':[]}

    # loop through each row in the dataframe and convert each row to geojson format
    for _, row in df.iterrows():
        # create a feature template to fill in
        feature = {'type':'Feature',
                   'properties':{},
                   'geometry':{'type':'Point',
                               'coordinates':[]}}

        # fill in the coordinates
        feature['geometry']['coordinates'] = [row[lon],row[lat]]

        # for each column, get the value and add it as a new feature property
        for prop in properties:
            feature['properties'][prop] = row[prop]
        
        # add this feature (aka, converted dataframe row) to the list of features inside our dict
        geojson['features'].append(feature)
    
    return geojson

cols = ['Full Address', 'place1Label']
geojson = df_to_geojson(df_geo, cols)

with open(directory+'early_modern_geographers.geojson', 'w', encoding='utf-8') as f:
    json.dump(geojson, f, ensure_ascii=False)

Your Google Drive should now contain a file with the ".geojson" file extension. We can check whether this file has been created and whether it is well-formed.

In [None]:
## double-check if GeoJSON file has been created and is well-formed

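# load GeoJSON data
# NOTE: this cell reads 'AP3.geojson', while the cell above writes
# 'early_modern_geographers.geojson'; adjust the file name to the file you want to check

with open(directory+'AP3.geojson', 'r', encoding='utf-8') as f2:
    data = json.load(f2)  # raises json.JSONDecodeError if the file is not valid JSON
    print(data)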

{'type': 'FeatureCollection', 'features': [{'type': 'Feature', 'properties': {'Addresses': 'Aachen', 'ids': 3247448, 'geonames address': 'Städteregion Aachen'}, 'geometry': {'type': 'Point', 'coordinates': [6.24194, 50.75389]}}, {'type': 'Feature', 'properties': {'Addresses': 'Abderode', 'ids': '0', 'geonames address': '0'}, 'geometry': {'type': 'Point', 'coordinates': [0.0, 0.0]}}, {'type': 'Feature', 'properties': {'Addresses': 'Alach', 'ids': '0', 'geonames address': '0'}, 'geometry': {'type': 'Point', 'coordinates': [0.0, 0.0]}}, {'type': 'Feature', 'properties': {'Addresses': 'Algesheim', 'ids': 6557928, 'geonames address': 'Gau-Algesheim'}, 'geometry': {'type': 'Point', 'coordinates': [8.00946, 49.94992]}}, {'type': 'Feature', 'properties': {'Addresses': 'Altdorf', 'ids': 7285057, 'geonames address': 'Altdorf (UR)'}, 'geometry': {'type': 'Point', 'coordinates': [8.64091, 46.88834]}}, {'type': 'Feature', 'properties': {'Addresses': 'Alzey', 'ids': 2956708, 'geonames address': 'Lan
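
Printing the file confirms that it parses as JSON, but it does not check the GeoJSON structure itself. The printed output also contains entries with the coordinates [0.0, 0.0], which look like placeholders for places that were not geocoded. The sketch below is one possible stricter check, assuming the file only contains two-dimensional Point features as in this notebook. The function name `summarise_geojson` and the two sample features (copied from the output above) are for illustration only.

```python
def summarise_geojson(collection):
    """Check the basic structure of a Point FeatureCollection and count its features."""
    if collection.get("type") != "FeatureCollection":
        raise ValueError("top-level object is not a FeatureCollection")

    total = 0
    placeholder_points = 0
    for feature in collection.get("features", []):
        geometry = feature.get("geometry") or {}
        coords = geometry.get("coordinates", [])
        if feature.get("type") != "Feature" or geometry.get("type") != "Point":
            raise ValueError(f"unexpected feature: {feature!r}")
        if len(coords) != 2:
            raise ValueError(f"expected [longitude, latitude], got {coords!r}")
        lon, lat = coords  # GeoJSON stores longitude first
        if not (-180 <= lon <= 180 and -90 <= lat <= 90):
            raise ValueError(f"coordinates out of range: {coords!r}")
        total += 1
        if lon == 0 and lat == 0:
            placeholder_points += 1  # likely a place that was not geocoded
    return {"features": total, "placeholder_points": placeholder_points}


sample = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature",
         "properties": {"Addresses": "Aachen", "ids": 3247448},
         "geometry": {"type": "Point", "coordinates": [6.24194, 50.75389]}},
        {"type": "Feature",
         "properties": {"Addresses": "Abderode", "ids": "0"},
         "geometry": {"type": "Point", "coordinates": [0.0, 0.0]}},
    ],
}

report = summarise_geojson(sample)
print(report)
```

In the notebook, `summarise_geojson(data)` could be called on the loaded file. A non-zero `placeholder_points` count would suggest filtering those entries before plotting.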

Now we can plot the geocoded data to an interactive map. The code below is partly based on an Ipyleaflet Tutorial provided by the *Carpentries Incubator*:

https://carpentries-incubator.github.io/jupyter_maps/01-introduction/index.html

In [None]:
## plot geocoded data on interactive map

# initialise interactive map

from ipyleaflet import Map, basemaps, GeoJSON, LayersControl
import random

# customise map

# the map object is called "m" so that Python's built-in map() stays available
# Stamen tiles may be unavailable in newer ipyleaflet releases; if so,
# basemaps.OpenStreetMap.Mapnik is one possible substitute

m = Map(center = (55, 7), zoom = 5, min_zoom = 1, max_zoom = 20,
    basemap=basemaps.Stamen.Terrain)

# add functionality to add or remove layers to map itself

m.add_control(LayersControl())

def random_color(feature):
    return {
        'color': 'black',
        'fillColor': random.choice(['red', 'yellow', 'green', 'orange']),
    }

geo_json = GeoJSON(
    data=data,
    style={
        'opacity': 1, 'dashArray': '7', 'fillOpacity': 0.1, 'weight': 2
    },
    hover_style={
        'color': 'red', 'dashArray': '0', 'fillOpacity': 0.5
    },
    style_callback=random_color
)

# add geocoded data to map

m.add_layer(geo_json)

m


Map(center=[55, 7], controls=(ZoomControl(options=['position', 'zoom_in_text', 'zoom_in_title', 'zoom_out_text…

Congratulations, you have just plotted a new map! At the moment, the map only has markers for the point geometries but no pop-up labels. To embed those, other Python packages will need to be imported first. I will add pop-ups in the next development step. 

Notebook created by: Monika Barget

Latest update: 26 January 2023