### Geocoding in geopandas

Geopandas supports geocoding via the ***geopy*** library (https://anaconda.org/conda-forge/geopy/), which must be installed before the geopandas.tools.geocode() function can be used. geocode() expects a list or pandas.Series of addresses (strings) and returns a GeoDataFrame with the resolved addresses and point geometries.

Let’s try this out.

We will geocode addresses stored in a semicolon-separated text file called addresses.txt. The addresses belong to higher education institutions in Sweden.

In [2]:
import pathlib
NOTEBOOK_PATH = pathlib.Path().resolve()
DATA_DIRECTORY = NOTEBOOK_PATH / "data"

In [3]:
import pandas
addresses = pandas.read_csv(
    DATA_DIRECTORY /  "addresses.txt",
    sep=";"
)

addresses

Unnamed: 0,ID,addr,name
0,1,"Valhallavägen 1, 371 41 Karlskrona",Blekinge Institute of Technology
1,2,"Chalmersplatsen 4, 412 96 Göteborg",Chalmers University of Technology
2,3,"Högskolegatan 2, 791 88 Falun",Dalarna University
3,4,"Lidingövägen 1, 114 33 Stockholm",Swedish School of Sport and Health Sciences
4,5,"Kristian IV:s väg 3, 301 18 Halmstad",Halmstad University
5,6,"Gjuterigatan 5, 553 18 Jönköping",Jönköping University
6,7,"Valhallavägen 105, 115 51 Stockholm",Royal College of Music
7,8,"Brinellvägen 8, 114 28 Stockholm",KTH Royal Institute of Technology
8,9,"Universitetsgatan 2, 651 88 Karlstad",Karlstad University
9,10,"Solnavägen 1, 171 77 Solna",Karolinska Insitute


Each row has an identifier in the ID column and an address in the addr column.



### Geocode addresses using Nominatim

In our example, we will use Nominatim as the geocoding provider. Nominatim is a geocoding software and service that uses OpenStreetMap data; its public instance is run by the OpenStreetMap Foundation. Geopandas’ geocode() function supports it natively. The user_agent parameter identifies our application to the service.

In [4]:
import geopandas

# Trim leading/trailing spaces in column names
addresses.columns = addresses.columns.str.strip()

# Geocode addresses
geocoded_addresses = geopandas.tools.geocode(
    addresses["addr"],
    provider="nominatim",
    user_agent="geopython2024",
    timeout=10
)


In [5]:
geocoded_addresses

Unnamed: 0,geometry,address
0,POINT (15.59074 56.18065),"B Huset, 1, Valhallavägen, Gräsvik, Galgamarke..."
1,POINT (11.97408 57.68966),"4, Chalmersplatsen, Johanneberg, Centrum, Göte..."
2,POINT EMPTY,
3,POINT (18.07882 59.34521),"Stockholms Olympiastadion, 1, Lidingövägen, La..."
4,POINT (12.87929 56.66340),"Högskolan i Halmstad, 3, Kristian IV:s väg, Ny..."
5,POINT (14.16339 57.77842),"Tekniska Högskolan, 5, Gjuterigatan, Tändstick..."
6,POINT EMPTY,
7,POINT (18.07078 59.34982),"8, Brinellvägen, Ruddammen, Norra Djurgården, ..."
8,POINT EMPTY,
9,POINT EMPTY,


As a result, we received a GeoDataFrame that contains a parsed version of our original addresses and a geometry column of shapely.geometry.Point objects that we can use, for instance, to export the data to a geospatial data format.

However, the ID column was discarded in the process. To combine the input data set with our result set, we can use pandas’ join operations. It is also important to notice that not all geocoding operations were successful: rows 2, 6, 8, and 9 contain an empty point (POINT EMPTY) and no parsed address.
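Failed lookups like these can be separated from the successful ones by testing for empty geometries. The following is a minimal sketch, not part of the original notebook: it builds a small stand-in GeoDataFrame (the variable names sample, failed, and succeeded are made up for illustration) so that it runs without contacting the geocoding service.

```python
import geopandas
from shapely.geometry import Point

# Stand-in for a geocoding result: one failed lookup (empty point, no address)
# between two successful ones. Coordinates are taken from the output above.
sample = geopandas.GeoDataFrame(
    {"address": ["4, Chalmersplatsen, ...", None, "8, Brinellvägen, ..."]},
    geometry=[
        Point(11.97408, 57.68966),
        Point(),  # an empty point, shown as POINT EMPTY
        Point(18.07078, 59.34982),
    ],
)

# is_empty is True for rows whose geometry is an empty point
failed = sample[sample.geometry.is_empty]
succeeded = sample[~sample.geometry.is_empty]

print(f"{len(failed)} of {len(sample)} addresses could not be geocoded")
```

The same boolean mask could be applied to geocoded_addresses to list the addresses that need to be checked or retried.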

### Join data frames

Joining data from two or more data frames or tables is a common task in many (spatial) data analysis workflows. Combining data from different tables based on a common key attribute can be done easily in pandas/geopandas using the merge() function.
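As a brief illustration of a key-based merge, here is a minimal sketch using two toy tables. The data frames universities and cities are hypothetical and only reuse a few names from the address list above; they are not part of the notebook's data.

```python
import pandas

# Two hypothetical tables that share the key column "ID",
# with rows in a different order
universities = pandas.DataFrame({
    "ID": [1, 2, 3],
    "name": [
        "Blekinge Institute of Technology",
        "Chalmers University of Technology",
        "Dalarna University",
    ],
})
cities = pandas.DataFrame({
    "ID": [3, 1, 2],
    "city": ["Falun", "Karlskrona", "Göteborg"],
})

# merge() matches rows by the values in "ID", not by row position
merged = universities.merge(cities, on="ID")
print(merged)
```

Because rows are matched on the key values, the differing row order of the two tables does not matter.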

However, sometimes it is useful to join two data frames together based on their index. For a row-by-row join to be meaningful, the data frames should have the same number of records and share the same index (simply put, they should have the same order of rows).

We can use this approach here to join information from the original data frame addresses to the geocoded addresses geocoded_addresses, row by row. The join() function, by default, joins two data frames based on their index. This works correctly for our example, as the order of the two data frames is identical.

In [6]:
geocoded_addresses_with_id = geocoded_addresses.join(addresses)
geocoded_addresses_with_id

Unnamed: 0,geometry,address,ID,addr,name
0,POINT (15.59074 56.18065),"B Huset, 1, Valhallavägen, Gräsvik, Galgamarke...",1,"Valhallavägen 1, 371 41 Karlskrona",Blekinge Institute of Technology
1,POINT (11.97408 57.68966),"4, Chalmersplatsen, Johanneberg, Centrum, Göte...",2,"Chalmersplatsen 4, 412 96 Göteborg",Chalmers University of Technology
2,POINT EMPTY,,3,"Högskolegatan 2, 791 88 Falun",Dalarna University
3,POINT (18.07882 59.34521),"Stockholms Olympiastadion, 1, Lidingövägen, La...",4,"Lidingövägen 1, 114 33 Stockholm",Swedish School of Sport and Health Sciences
4,POINT (12.87929 56.66340),"Högskolan i Halmstad, 3, Kristian IV:s väg, Ny...",5,"Kristian IV:s väg 3, 301 18 Halmstad",Halmstad University
5,POINT (14.16339 57.77842),"Tekniska Högskolan, 5, Gjuterigatan, Tändstick...",6,"Gjuterigatan 5, 553 18 Jönköping",Jönköping University
6,POINT EMPTY,,7,"Valhallavägen 105, 115 51 Stockholm",Royal College of Music
7,POINT (18.07078 59.34982),"8, Brinellvägen, Ruddammen, Norra Djurgården, ...",8,"Brinellvägen 8, 114 28 Stockholm",KTH Royal Institute of Technology
8,POINT EMPTY,,9,"Universitetsgatan 2, 651 88 Karlstad",Karlstad University
9,POINT EMPTY,,10,"Solnavägen 1, 171 77 Solna",Karolinska Insitute


The output of join() is a new geopandas.GeoDataFrame:

In [7]:
type(geocoded_addresses_with_id)

geopandas.geodataframe.GeoDataFrame

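The new data frame has all original columns plus new columns for the geometry and for a parsed address that can be used to spot-check the results. Finally, we can save it to a GeoPackage file:

In [8]:
# Save the joined result (including the ID and name columns)
geocoded_addresses_with_id.to_file(DATA_DIRECTORY / "addresses.gpkg")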

For an overview of the different types of joins that exist in pandas, take a look at: https://www.geeksforgeeks.org/different-types-of-joins-in-pandas/
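To give a rough idea of how join types differ, the sketch below merges two hypothetical toy tables (left and right, with made-up values) using the how parameter of merge(). It is only an illustration, not part of the geocoding workflow.

```python
import pandas

# Hypothetical tables: ID 1 exists only on the left, ID 4 only on the right
left = pandas.DataFrame({"ID": [1, 2, 3], "name": ["A", "B", "C"]})
right = pandas.DataFrame({"ID": [2, 3, 4], "city": ["X", "Y", "Z"]})

# inner: keep only the keys present in both tables
inner = left.merge(right, on="ID", how="inner")

# left: keep all rows of the left table; missing matches become NaN
left_join = left.merge(right, on="ID", how="left")

# outer: keep the keys from both tables
outer = left.merge(right, on="ID", how="outer")

print(inner, left_join, outer, sep="\n\n")
```

The join() function used earlier accepts a how parameter as well and defaults to a left join on the index.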