# Geopandas: an introduction

In this section, we will cover the basics of *geopandas*, a Python library to
interact with geospatial vector data.

[Geopandas](https://geopandas.org/) provides an easy-to-use interface to vector
data sets. It combines the capabilities of *pandas*, the data analysis package
we got to know in the [Geo-Python
course](https://geo-python-site.readthedocs.io/en/latest/lessons/L5/pandas-overview.html),
with the geometry handling functionality of
[shapely](../lesson-1/geometry-objects), the [geo-spatial file format support
of fiona](vector-data-io) and the [map projection libraries of
pyproj](map-projections).

The main data structures in geopandas are `GeoDataFrame`s and `GeoSeries`. They
extend the functionality of `pandas.DataFrame`s and `pandas.Series`. This means
that **we can also use all of our *pandas* skills when we work with
*geopandas*!**

:::{tip}

If you feel like you need to refresh your memory about pandas, head back to
[lesson
5](https://geo-python-site.readthedocs.io/en/latest/lessons/L5/pandas-overview.html)
and [lesson
6](https://geo-python-site.readthedocs.io/en/latest/notebooks/L6/advanced-data-processing-with-pandas.html)
of Geo-Python.
:::

There is one key difference between pandas’s data frames and geopandas’
[`GeoDataFrame`s](https://geopandas.org/en/stable/docs/user_guide/data_structures.html#geodataframe):
a `GeoDataFrame` contains an additional column for geometries. By default, the
name of this column is `geometry`, and it is a
[`GeoSeries`](https://geopandas.org/en/stable/docs/user_guide/data_structures.html#geoseries)
that contains the geometries (points, lines, polygons, ...) as
`shapely.geometry` objects.
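
As a minimal sketch of this structure, the following builds a small
`GeoDataFrame` by hand. The site names, values and coordinates are made up for
illustration and are not part of the course data.

```python
import geopandas
import shapely.geometry

# Made-up illustration data: three sites with lon/lat coordinates
sites = geopandas.GeoDataFrame(
    {
        "name": ["site A", "site B", "site C"],
        "value": [10, 25, 40],
    },
    geometry=[
        shapely.geometry.Point(24.94, 60.17),
        shapely.geometry.Point(23.76, 61.50),
        shapely.geometry.Point(25.47, 65.01),
    ],
    crs="EPSG:4326",
)

# The geometry column is a GeoSeries holding shapely objects
print(type(sites.geometry))
print(sites.geometry.name)
print(type(sites.geometry.iloc[0]))

# Ordinary pandas operations, such as boolean filtering, keep working
high_value_sites = sites[sites["value"] > 20]
print(high_value_sites["name"].tolist())
```

The filtered result is still a `GeoDataFrame`, so the geometries travel along
with the attribute data.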

# Reading and writing data in geopandas

Geographic information comes in many different file formats and from many different data sources. This tutorial shows some typical examples of how to read (and write) data from different sources.

# Map projections

A **coordinate reference system (CRS)** is a crucial piece of metadata for any
geospatial data set. Without a CRS, the geometries would simply be a collection
of coordinates in an arbitrary space. Only the CRS allows GIS software,
including the Python packages we use in this course, to relate these
coordinates to a place on Earth (or other approximately spherical objects or
planets).

Often conflated with coordinate reference systems, and definitely closely
related, are **map projections**. Map projections, also called *projected
coordinate systems*, are mathematical models that allow us to transfer
coordinates on the surface of our **three-dimensional Earth** into coordinates
on a planar surface, such as a **flat, two-dimensional map**. In contrast to
projected coordinate systems, *geographic coordinate systems* directly use
longitude and latitude, i.e. angles in degrees measured east-west and
north-south on a sphere approximating the Earth, as the x and y coordinates
in a planar map. Finally, there are both projected and geographic coordinate
systems that make use of more complex ellipsoids than a simple sphere to better
approximate the ‘potato-shaped’ reality of our planet. The full CRS information
needed to accurately relate geospatial information to a place on Earth includes
both (projected/geographic) coordinate system and ellipsoid.

The CRS of different spatial data sets often differ, as different
coordinate systems are optimised for certain regions and purposes. No
coordinate system can be perfectly accurate around the globe, and the
transformation from three- to two-dimensional coordinates cannot preserve
angles, distances, and areas simultaneously.

Consequently, it is a common GIS task to **transform** (or reproject) a data
set from one reference system into another, for instance, to make two layers
interoperable. Comparing two data sets that have different CRS would
inevitably produce wrong results; for example, finding points contained within
a polygon cannot work if the points have geographic coordinates (in
degrees) and the polygon is in the national Finnish reference system (in
meters).
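
The point-in-polygon problem described above can be sketched with a toy
example. It assumes that the Finnish reference system meant here is ETRS89 /
TM35FIN (EPSG:3067); the point and the rectangle are made up for illustration.

```python
import geopandas
import shapely.geometry

# A point in geographic coordinates (degrees, EPSG:4326)
points = geopandas.GeoSeries(
    [shapely.geometry.Point(24.94, 60.17)],
    crs="EPSG:4326",
)

# A rectangle around that point, defined in degrees and then
# transformed into EPSG:3067, so that its coordinates are in meters
area = geopandas.GeoSeries(
    [shapely.geometry.box(24.5, 60.0, 25.5, 60.5)],
    crs="EPSG:4326",
).to_crs("EPSG:3067")
polygon = area.iloc[0]

# Comparing degrees against meters: the point appears to be outside
mismatched = points.within(polygon)

# After reprojecting the points into the polygon's CRS, the test works
matched = points.to_crs(area.crs).within(polygon)

print(mismatched.iloc[0], matched.iloc[0])
```

Passing a plain shapely geometry to `within()` is a deliberate choice in this
sketch: shapely objects carry no CRS, so nothing stops the mismatched
comparison from running and silently returning the wrong answer.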

Choosing an appropriate projection for your map is not always straightforward.
It depends on what you actually want to represent in your map, and what your
data’s spatial scale, resolution and extent are. There is no single
‘perfect projection’; each has strengths and weaknesses, and you should choose
the projection that best fits each map. In fact, the projection you choose
might even tell something about you:


:::{figure} https://imgs.xkcd.com/comics/map_projections.png
:alt: What’s that?  You think I don’t like the Peters map because I’m uncomfortable with having my cultural assumptions challenged? Are you sure you’re not ... *puts on sunglasses* ... projecting?

The XKCD web comic had it figured out long ago: ‘What your favourite map
projection tells about you’. *Source: [xkcd.com](https://xkcd.com/977)*

:::
    

:::{note}

For those of you who prefer a more analytical approach to choosing map
projections: you can get a good overview from
[georeference.org](http://www.georeference.org/doc/guide_to_selecting_map_projections.htm),
and this blog post discussing [the strengths and weaknesses of a few commonly
used projections](http://usersguidetotheuniverse.com/index.php/2011/03/03/whats-the-best-map-projection/).
The web page *Radical Cartography* has an excellent [overview of which
projections fit which extent of the world for which
topic](https://radicalcartography.net/projectionref.html).

:::


---


## Handling coordinate reference systems in Geopandas

Once you have figured out which map projection to use, handling coordinate
reference systems is, fortunately, fairly easy in Geopandas. The library
[pyproj](https://pyproj4.github.io/pyproj/) provides additional information
about a CRS and can assist with trickier tasks, such as guessing the
unknown CRS of a data set.
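
As a rough sketch of the kind of information pyproj exposes, the following
inspects two CRS objects created from EPSG codes. The choice of EPSG:4326 and
EPSG:3067 is only an example; the same attributes are available on the `crs`
attribute of a `GeoDataFrame`, which is a `pyproj.CRS` object in recent
geopandas versions.

```python
import pyproj

wgs84 = pyproj.CRS.from_epsg(4326)
tm35fin = pyproj.CRS.from_epsg(3067)

for crs in (wgs84, tm35fin):
    print(crs.name)
    print("  EPSG code: ", crs.to_epsg())
    print("  geographic:", crs.is_geographic)
    print("  projected: ", crs.is_projected)
    # The axis units reveal whether coordinates are in degrees or meters
    print("  axis units:", [axis.unit_name for axis in crs.axis_info])
```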

In this section we will learn **how to retrieve the coordinate reference system
information of a data set, and how to re-project the data into another CRS**. 


:::{admonition} Careful with Shapefiles
:class: caution

You might have noticed that geospatial data sets in *ESRI Shapefile* format
consist of multiple files with different file extensions. The `.prj` file
contains information about the coordinate reference system. Be careful not to
lose it!
:::


### Displaying the CRS of a data set

We will start by loading a data set of EU countries that has been downloaded
from the [*Geographic Information System of the Commission*
(GISCO)](https://ec.europa.eu/eurostat/web/gisco/geodata/reference-data/administrative-units-statistical-units/countries),
a unit within Eurostat that manages geospatial data for the European
Commission.