# Aggregating data

Data aggregation refers to a process where we combine data into groups. When doing spatial data aggregation, we merge the geometries together into coarser units (based on some attribute), and can also calculate summary statistics for these combined geometries from the original, more detailed values. For example, suppose that we are interested in studying continents, but we only have country-level data like the country dataset. If we aggregate the data by continent, we would convert the country-level data into a continent-level dataset.
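
The continent example can be sketched with toy data. The snippet below is a minimal illustration and is not part of the tutorial data: the `countries` layer, its column names, and the population values are all made up. It merges the geometries per continent with `dissolve()` and sums a numeric column at the same time.

```python
import geopandas
from shapely.geometry import box

# Hypothetical toy data: four square "countries" on two "continents"
countries = geopandas.GeoDataFrame(
    {
        "country": ["A", "B", "C", "D"],
        "continent": ["North", "North", "South", "South"],
        "population": [10, 20, 30, 40],
    },
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1), box(0, 2, 1, 3), box(1, 2, 2, 3)],
)

# Keep only the grouping column, the numeric column and the geometry,
# then merge geometries per continent and sum the population
continents = countries[["continent", "population", "geometry"]].dissolve(
    by="continent", aggfunc="sum"
)
print(continents)
```

Each continent ends up as a single row whose geometry is the union of its countries and whose `population` is the sum of the country values.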

In this tutorial, we will aggregate our data by land use class, i.e., the polygons in the flooded areas that share the same land use class will be merged.

Let's start by loading `intersection.shp`, the output file of the previous section:


In [None]:
import pathlib 
NOTEBOOK_PATH = pathlib.Path().resolve()
DATA_DIRECTORY = NOTEBOOK_PATH / "Data_L4"

In [None]:
import geopandas

intersection = geopandas.read_file(
    DATA_DIRECTORY
    / "intersection.shp"
)



In [None]:
intersection.plot(column='KATEGORI')

In [None]:
intersection

In [None]:


# Dissolve polygons by 'KATEGORI', aggregating their geometries
dissolved = intersection.dissolve(by='KATEGORI')

# Calculate the area of the dissolved geometries
dissolved['dissolved_area'] = dissolved.geometry.area

print(dissolved['dissolved_area'])
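
Note that `.area` returns values in the units of the layer's coordinate reference system, so the numbers are only meaningful in a projected CRS. The sketch below uses a made-up polygon, and EPSG:3006 (SWEREF99 TM) is an assumption chosen for illustration. It shows one way to check the CRS and reproject before computing areas.

```python
import geopandas
from shapely.geometry import box

# Hypothetical polygon given in geographic coordinates (degrees)
layer = geopandas.GeoDataFrame(
    {"KATEGORI": ["Tätort"]},
    geometry=[box(18.0, 59.3, 18.1, 59.4)],
    crs="EPSG:4326",
)

# Areas are only meaningful in a projected CRS
if not layer.crs.is_projected:
    # EPSG:3006 is an assumption; use the projected CRS suited to your data
    layer = layer.to_crs("EPSG:3006")

# Convert square metres to square kilometres
layer["area_km2"] = layer.geometry.area / 1_000_000
print(layer[["KATEGORI", "area_km2"]])
```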


In [None]:
print(f"Rows in original intersection GeoDataFrame: {len(intersection)}")
print(f"Rows in dissolved layer: {len(dissolved)}")

In [None]:
dissolved

In [None]:
dissolved.plot()

Let’s see what columns we have now in our GeoDataFrame:

In [None]:
dissolved.columns

As we can see, the column that we used for the aggregation (`KATEGORI`) is no longer in the list of columns. What happened to it?

In [None]:
dissolved.index

It is now used as the index of our dissolved GeoDataFrame!
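
This behaviour can be reproduced on a small example. The sketch below uses a made-up `toy` layer (the class names and geometries are illustrative only) and also shows the `as_index=False` option of `dissolve()`, which keeps the grouping column as a regular column instead.

```python
import geopandas
from shapely.geometry import box

# Hypothetical toy layer with two land use classes
toy = geopandas.GeoDataFrame(
    {"KATEGORI": ["Skog", "Skog", "Tätort"]},
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1), box(5, 5, 6, 6)],
)

# Default: the grouping column becomes the index
by_index = toy.dissolve(by="KATEGORI")
print(by_index.index.name)              # KATEGORI
print("KATEGORI" in by_index.columns)   # False

# as_index=False keeps the grouping column as a regular column
as_column = toy.dissolve(by="KATEGORI", as_index=False)
print("KATEGORI" in as_column.columns)  # True
```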

In [None]:
dissolved

In [None]:
# Select only the row for the class 'Tätort'

Tätort_class = dissolved.loc['Tätort']
Tätort_class

In [None]:
# See the data type
type(Tätort_class)

In [None]:
# See the data
Tätort_class.head()

Let’s also visualize only the areas that belong to the urban class (Tätort).

First, we need to convert the selected row back to a GeoDataFrame:

In [None]:
# Create a GeoDataFrame
selection = geopandas.GeoDataFrame([Tätort_class], crs=dissolved.crs)
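
As an alternative sketch, selecting with a list of labels keeps the result as a GeoDataFrame, so no conversion is needed. The `toy` layer, its values, and the CRS below are made up for illustration.

```python
import geopandas
from shapely.geometry import box

# Hypothetical dissolved layer indexed by land use class
toy = geopandas.GeoDataFrame(
    {"KATEGORI": ["Skog", "Tätort"], "n_parts": [1, 1]},
    geometry=[box(0, 0, 1, 1), box(5, 5, 6, 6)],
    crs="EPSG:3006",  # assumption, chosen only for this example
).set_index("KATEGORI")

row = toy.loc["Tätort"]        # single label: one row, no longer a GeoDataFrame
subset = toy.loc[["Tätort"]]   # list of labels: still a GeoDataFrame

print(type(row).__name__)
print(type(subset).__name__)
```

Because `subset` is still a GeoDataFrame, it keeps the CRS and can be plotted directly.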

In [None]:
# Plot all flooded land use classes in gray and highlight the urban (Tätort) areas in red
ax = dissolved.plot(facecolor="gray")
selection.plot(ax=ax, facecolor="red")

Another way to visualize the class urban in the entire GeoDataFrame is to plot using one specific column. In order to use our KATEGORI column, which is now the index of the GeoDataFrame, we need to reset the index:

In [None]:
dissolved = dissolved.reset_index()
dissolved.head()

In [None]:
dissolved.plot(column="KATEGORI")