### Geocoding in geopandas

Geopandas supports geocoding through the ***geopy*** library (https://geopy.readthedocs.io/en/stable/), which must be installed before the geopandas.tools.geocode() function can be used. geocode() expects a list or pandas.Series of addresses (strings) and returns a GeoDataFrame with the resolved addresses and point geometries.

Let’s try this out.

We will geocode addresses stored in a semicolon-separated text file called addresses.txt. The addresses are those of universities in Sweden.

# NOTE: If you are running this notebook in Google Colab, export to a GeoPackage (.gpkg) rather than a shapefile (.shp) to avoid export errors.

In [None]:
import pathlib
NOTEBOOK_PATH = pathlib.Path().resolve()
DATA_DIRECTORY = NOTEBOOK_PATH / "data"

In [None]:
import pandas
addresses = pandas.read_csv(
    DATA_DIRECTORY /  "addresses.txt",
    sep=";"
)

addresses

We have an id for each row and an address in the addr column.
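To illustrate how pandas.read_csv() handles a semicolon-separated file, here is a minimal sketch that parses an in-memory stand-in. The two rows and the stray space in the header are made up for illustration; the real addresses.txt may look different.

```python
import io

import pandas

# Hypothetical stand-in for addresses.txt (not the real file contents)
sample_text = (
    "id; addr\n"
    "1000;Example Street 1, 111 11 Stockholm, Sweden\n"
    "1001;Example Road 2, 222 22 Lund, Sweden\n"
)

sample = pandas.read_csv(io.StringIO(sample_text), sep=";")

# Header cells keep their surrounding whitespace, so " addr" is not "addr"
print(list(sample.columns))

# Stripping the column names makes them safe to use as keys
sample.columns = sample.columns.str.strip()
print(list(sample.columns))
```

Because the separator is a semicolon, the commas inside each address stay part of the addr value.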



### Geocode addresses using Nominatim

In our example, we will use Nominatim as the geocoding provider. Nominatim (https://nominatim.org/) is a geocoding library and service that uses OpenStreetMap data and is run by the OpenStreetMap Foundation. The geocode() function in geopandas supports it natively.

In [None]:
import geopandas

# Trim leading/trailing spaces in column names
addresses.columns = addresses.columns.str.strip()

# Geocode addresses
geocoded_addresses = geopandas.tools.geocode(
    addresses["addr"],
    provider="nominatim",
    user_agent="geopython2024",
    timeout=10
)




In [None]:
geocoded_addresses

As a result, we receive a GeoDataFrame that contains a parsed version of our original addresses and a geometry column of shapely.geometry.Point objects that we can use, for instance, to export the data to a geospatial data format.

However, the id column was discarded in the process. To combine the input data set with our result set, we can use pandas’ join operations. Also note that not all geocoding operations were successful.
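One way to spot failed lookups is to look for rows whose address is missing. The sketch below assumes that a failed lookup comes back with a missing value in the address column (and an empty point geometry); this appears to be how geopandas behaves, but verify it against your installed version. The tables are hypothetical stand-ins built by hand, so the example runs without network access.

```python
import pandas

# Hypothetical stand-ins: the input table and a hand-built imitation of the
# "address" column returned by geocode() (the geometry column is left out)
sample_input = pandas.DataFrame(
    {"id": [1000, 1001, 1002], "addr": ["Address A", "Address B", "Address C"]}
)
sample_result = pandas.DataFrame(
    {"address": ["Resolved address A", None, "Resolved address C"]}
)

# Assumption: a failed lookup leaves the address missing
failed = sample_result["address"].isna()
print(f"{failed.sum()} of {len(failed)} lookups failed")

# Both tables share the same index, so the mask selects the matching inputs
print(sample_input.loc[failed, "addr"].tolist())
```

The same mask can be applied to the real result to list the input addresses that need manual correction.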

### Join data frames

Joining data from two or more data frames or tables is a common task in many (spatial) data analysis workflows. Combining data from different tables based on a common key attribute can be done easily in pandas/geopandas using the merge() function.
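As a minimal sketch of a key-based merge(), consider two small tables that share an id column. The table contents and column names are hypothetical and only serve to show the mechanics.

```python
import pandas

# Hypothetical tables that share the key column "id"
names = pandas.DataFrame({"id": [1, 2, 3], "name": ["A", "B", "C"]})
students = pandas.DataFrame({"id": [3, 1, 2], "students": [300, 100, 200]})

# Rows are matched on the values in "id", regardless of row order
merged = names.merge(students, on="id")
print(merged)
```

Because matching happens on the key values, the two tables do not need to be sorted the same way.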

However, sometimes it is useful to join two data frames together based on their index. The data frames have to have the same number of records and share the same index (simply put, they should have the same order of rows).
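This precondition can be checked before joining. The following is a minimal sketch with made-up data frames; join() matches rows by index label, so the check compares the two indexes rather than only the row counts.

```python
import pandas

# Hypothetical stand-ins for an input table and a result table
left = pandas.DataFrame({"addr": ["Address A", "Address B"]})
right = pandas.DataFrame({"address": ["Resolved A", "Resolved B"]})

# Check the precondition: both frames must share the same index
if not left.index.equals(right.index):
    raise ValueError("Indexes differ; merge() on a key column is safer")

# join() aligns the rows on their index labels
joined = right.join(left)
print(joined)
```

If the check fails, for example because one table was filtered or re-indexed, a key-based merge() is the safer choice.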

We can use this approach here to join information from the original data frame addresses to the geocoded result geocoded_addresses, row by row. By default, the join() function joins two data frames based on their index. This works correctly for our example, as the order of the two data frames is identical.

In [None]:
geocoded_addresses_with_id = geocoded_addresses.join(addresses)
geocoded_addresses_with_id

The output of join() is a new geopandas.GeoDataFrame:

In [None]:
type(geocoded_addresses_with_id)

The new data frame has all original columns plus new columns for the geometry and for a parsed address that can be used to spot-check the results.

In [None]:
# Save the joined result, which includes the original id column
geocoded_addresses_with_id.to_file(DATA_DIRECTORY / "addresses.gpkg")

Take a look at the different types of joins that exist: https://www.geeksforgeeks.org/different-types-of-joins-in-pandas/ 
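To give a feel for how the join types differ, the sketch below runs merge() with each value of the how parameter on two hypothetical tables whose keys only partly overlap, and prints which ids survive.

```python
import pandas

# Hypothetical tables: ids 2 and 3 appear in both, 1 and 4 in only one
left = pandas.DataFrame({"id": [1, 2, 3], "addr": ["A", "B", "C"]})
right = pandas.DataFrame({"id": [2, 3, 4], "address": ["B*", "C*", "D*"]})

ids_by_how = {}
for how in ("inner", "left", "right", "outer"):
    result = left.merge(right, on="id", how=how)
    ids_by_how[how] = result["id"].tolist()
    print(how, ids_by_how[how])
```

An inner join keeps only the keys present in both tables, left and right joins keep all keys of one side, and an outer join keeps every key, filling the gaps with missing values.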