# Simplify polygon/multipolygon

## Overview
This notebook contains a Python script for simplifying vector datasets with the Topojson library. Topojson was chosen because its simplification is topology-aware: borders shared by adjacent (multi)polygons are simplified only once, so neighboring features neither overlap nor leave gaps after simplification.
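The TopoJSON data model shows why topology awareness matters. Each border is stored once as an *arc*, and polygons reference arcs by index, where a negative index `~i` means arc `i` traversed in reverse. The sketch below is a hand-written, dependency-free illustration of that model, not code from the `topojson` package. The two squares, the helper names `arc_coords` and `decode_ring`, and the hand-simplified arc that stands in for `toposimplify` are all hypothetical. Because the shared arc is simplified once, both neighbors end up with exactly the same border. Simplifying each polygon on its own, as GeoPandas/Shapely `simplify` does, instead processes two separate copies of that border.

```python
def arc_coords(arcs, index):
    """Coordinates of one arc reference; a negative index ~i means arc i reversed."""
    if index < 0:
        return list(reversed(arcs[~index]))
    return list(arcs[index])


def decode_ring(arcs, arc_indexes):
    """Stitch a closed ring from a sequence of arc references."""
    ring = []
    for n, index in enumerate(arc_indexes):
        coords = arc_coords(arcs, index)
        # consecutive arcs share an endpoint, so skip the repeated first point
        ring.extend(coords if n == 0 else coords[1:])
    return ring


# Hypothetical layer: square A (x 0..1) and square B (x 1..2) share the border x = 1
arcs = [
    [(1, 1), (0, 1), (0, 0), (1, 0)],  # arc 0: edges that belong to A only
    [(1, 0), (1, 0.5), (1, 1)],        # arc 1: shared border, stored once
    [(1, 1), (2, 1), (2, 0), (1, 0)],  # arc 2: edges that belong to B only
]
polygon_a = [0, 1]    # arc 0, then arc 1 forwards
polygon_b = [~1, ~2]  # arc 1 backwards, then arc 2 backwards

# Stand-in for toposimplify: drop the collinear midpoint of the shared arc once
simplified_arcs = [arcs[0], [(1, 0), (1, 1)], arcs[2]]
ring_a = decode_ring(simplified_arcs, polygon_a)
ring_b = decode_ring(simplified_arcs, polygon_b)
print(ring_a)  # [(1, 1), (0, 1), (0, 0), (1, 0), (1, 1)]
print(ring_b)  # [(1, 1), (1, 0), (2, 0), (2, 1), (1, 1)]
```

Dropping the midpoint `(1, 0.5)` from the single shared arc removes it from both rings at once, so no gap or overlap can open between the two squares.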

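### Parameters
- **`input_path` (str)**: Path of the input vector file.
- **`output_path` (str)**: Folder where the simplified shapefiles are written.
- **`units` (list of float)**: Simplification tolerances passed to `toposimplify`, in the units of the layer's CRS (degrees for EPSG:4326).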

## Documentation:
- geopandas simplify: https://geopandas.org/en/stable/docs/reference/api/geopandas.GeoSeries.simplify.html
- shapely simplify: https://shapely.readthedocs.io/en/latest/manual.html#object.simplify

## References:
- Check memory usage: https://stackoverflow.com/questions/39100971/how-do-i-release-memory-used-by-a-pandas-dataframe
- Check geometries: https://stackoverflow.com/questions/60780959/how-to-filter-a-geodataframe-by-geometry-type
- Get polygon areas: https://gis.stackexchange.com/questions/218450/getting-polygon-areas-using-geopandas
- Count the number of vertices: https://gis.stackexchange.com/questions/328884/counting-number-of-vertices-in-geopandas

## Author
- **Rubén Crespo Ceballos**

In [1]:
import geopandas as gpd
import pandas as pd
import os
import topojson as tp
import psutil

In [None]:
def usage():
    """
    Prints the current memory usage of the process in megabytes (MB).
    
    Returns:
    - None: The function prints the memory usage directly to the console.
    """
    process = psutil.Process(os.getpid())
    # rss = resident set size in bytes; divide by 2**20 to get megabytes
    print("Memory status: ", process.memory_info().rss / 2**20)

def create_folder_if_not_exists(folder_path):
    """
    Create a folder if it doesn't exist.

    Parameters:
    folder_path (str): The path of the folder to be created.
    """
    if not os.path.exists(folder_path):
        os.makedirs(folder_path)
        print(f"Folder created at: {folder_path}")
    else:
        print(f"Folder already exists at: {folder_path}")


def get_geometries_info(gdf):
    """
    Count the features of each geometry type in a GeoDataFrame.

    Parameters:
    - gdf: GeoDataFrame.

    Returns:
    - Dictionary mapping each geometry type (e.g. 'Polygon') to its number of features.
    """
    geometry_dic = {}
    for geom_type in gdf.geom_type.unique().tolist():  # one entry per distinct geometry type
        geometry_dic[geom_type] = int((gdf.geom_type == geom_type).sum())  # number of features of this type

    usage()
    print(geometry_dic)
    return geometry_dic


def get_area(gdf):
    """
    Get the total area covered by all geometries of a GeoDataFrame.

    Parameters:
    - gdf: GeoDataFrame to calculate areas.

    Returns:
    - Float value with the total area in square kilometers (overlapping areas are counted once).
    """
    # Areas must be computed in a projected CRS whose units are meters
    if gdf.crs.is_geographic:
        # Reproject from degrees to meters. For accurate areas prefer an equal-area or
        # local CRS (search at https://epsg.io/). EPSG:3857 (Web Mercator) is used here
        # for simplicity, but it is not equal-area and inflates areas away from the equator.
        gdf = gdf.to_crs("EPSG:3857")

    # Dissolve all geometries into one (GeoPandas >= 1.0 offers gdf.union_all() instead)
    dissolved_geom = gdf.unary_union
    calculated_area = dissolved_geom.area / 10**6  # square meters -> square kilometers
    return calculated_area

def get_number_of_vertex(gdf):
    """
    Calculates the total number of vertices of a GeoDataFrame.

    Parameters:
    - gdf: GeoDataFrame containing geometry data (e.g., Polygons or MultiPolygons).

    Returns:
    - n: Total number of vertices (int) summed over all geometries of the GeoDataFrame.
         - For MultiPolygons, sums the vertices from all constituent Polygons.
         - Only exterior boundaries are counted (holes are ignored); each ring's closing
           point, which repeats the first one, is included in the count.
         - Empty geometries and other geometry types are skipped.
    """
    n = 0
    for geom in gdf.geometry:
        if geom is None or geom.is_empty:
            continue
        if geom.geom_type == "MultiPolygon":
            # iterate over all parts of the multi-geometry
            for polygon in geom.geoms:
                n += len(polygon.exterior.coords)
        elif geom.geom_type == "Polygon":
            n += len(geom.exterior.coords)
    return n

def simplify_vector_file(gdf, input_path, output_path, simplification_units):
    """
    Simplifies a GeoDataFrame at multiple tolerance levels, computes area and vertex statistics, and exports the simplified geometries to shapefiles.

    Parameters:
    - gdf: GeoDataFrame to process.
    - input_path: Path to the input shapefile (used to name the output files).
    - output_path: Folder where the simplified shapefiles are written.
    - simplification_units: List of tolerances for toposimplify, in the units of the layer's CRS.

    Returns:
    - df_output: DataFrame summarizing the simplification results.
    """
    columns = ['precision', 'size_km2', 'area_percentage', 'number_of_vertex', 'vertex_percentage', 'area_vertex_ratio']
    rows = []  # collect rows in a list; growing a DataFrame with pd.concat in a loop is slow

    # Step 1: Statistics of the original layer (precision 0 = no simplification)
    initial_calc_area = get_area(gdf)  # km²
    initial_vertex_number = get_number_of_vertex(gdf)
    rows.append([0, initial_calc_area, 100, initial_vertex_number, 100, initial_calc_area / initial_vertex_number])

    # Step 2: Build the topology once; the layer keeps its own CRS, so each tolerance is in CRS units
    topo = tp.Topology(gdf, prequantize=False)
    base_name = os.path.splitext(os.path.basename(input_path))[0]

    for unit in simplification_units:
        simplification = topo.toposimplify(unit).to_gdf()  # simplify shared arcs, rebuild polygons
        if simplification.crs is None:
            simplification = simplification.set_crs(gdf.crs)  # get_area needs a CRS

        calc_area = get_area(simplification)
        vertex_number = get_number_of_vertex(simplification)
        rows.append([
            unit,
            calc_area,                                    # area in km²
            calc_area * 100 / initial_calc_area,          # area percentage
            vertex_number,                                # number of vertices
            vertex_number * 100 / initial_vertex_number,  # vertex percentage
            calc_area / vertex_number,                    # area-to-vertex ratio
        ])

        # Export simplified geometry
        output_file = os.path.join(output_path, f"{base_name}_simplified_{unit}.shp")
        simplification.to_file(output_file, index=False)
        print(f"Finished simplification at precision: {unit}")

    return pd.DataFrame(rows, columns=columns)
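The statistics recorded by `simplify_vector_file` can be illustrated without GeoPandas. The dependency-free sketch below assumes rings are lists of `(x, y)` tuples already in a projected CRS in meters, closed like Shapely rings (first point repeated at the end); all function names are hypothetical. It uses the shoelace formula for planar area (what `.area` returns in a projected CRS), counts exterior coordinates the way `get_number_of_vertex` does, and derives the percentage and area-to-vertex columns. Unlike `get_area`, it sums polygon areas instead of dissolving them, so it assumes the polygons do not overlap.

```python
def ring_area_m2(ring):
    """Planar area of one ring via the shoelace formula (coordinates in meters)."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def polygon_area_km2(polygon):
    """polygon = [exterior, *holes]; hole areas are subtracted, m² -> km²."""
    exterior, *holes = polygon
    return (ring_area_m2(exterior) - sum(ring_area_m2(h) for h in holes)) / 10**6


def exterior_vertex_count(polygon):
    """Exterior coordinates only, closing point included, like len(exterior.coords)."""
    return len(polygon[0])


def summary_row(precision, polygons, initial_area_km2, initial_vertices):
    """One summary row relative to the unsimplified layer (assumes no overlaps)."""
    area = sum(polygon_area_km2(p) for p in polygons)
    vertices = sum(exterior_vertex_count(p) for p in polygons)
    return {
        "precision": precision,
        "size_km2": area,
        "area_percentage": area * 100 / initial_area_km2,
        "number_of_vertex": vertices,
        "vertex_percentage": vertices * 100 / initial_vertices,
        "area_vertex_ratio": area / vertices,  # km² represented per vertex
    }


# Hypothetical 2 km x 1 km rectangle with one redundant (collinear) vertex
original = [[(0, 0), (1000, 0), (2000, 0), (2000, 1000), (0, 1000), (0, 0)]]
simplified = [[(0, 0), (2000, 0), (2000, 1000), (0, 1000), (0, 0)]]

row = summary_row(0.0002, [simplified],
                  polygon_area_km2(original), exterior_vertex_count(original))
print(row["area_percentage"], round(row["vertex_percentage"], 1))  # 100.0 83.3
```

A good tolerance keeps `area_percentage` close to 100 while lowering `vertex_percentage`. Because every level is measured in the same CRS, the percentage columns are largely insensitive to the projection, but the absolute `size_km2` values are not: Web Mercator (EPSG:3857) inflates areas by roughly 1/cos²(latitude).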


In [None]:
"""Inputs"""
# Single file
input_path = r"Z:\z_resources\ruben\gadm_col2\gadm41_col_2.shp"
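# Write outputs next to the input file (input_path is a file, so take its parent folder)
output_path = os.path.join(os.path.dirname(input_path), "output_files")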
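`input_path` names the `.shp` file itself, so the output folder has to be built from its parent directory. Appending `\output_files` to the file path would point *inside* the shapefile, which cannot be a directory. The standard-library `ntpath` module applies Windows path rules on any operating system, so the derivation can be checked without access to the `Z:` drive. The variable names below are illustrative.

```python
import ntpath  # Windows path rules, usable on any OS

shapefile = r"Z:\z_resources\ruben\gadm_col2\gadm41_col_2.shp"

# Appending to the file path nests the folder under the .shp file itself
naive_folder = shapefile + "\\output_files"

# Taking the parent folder first puts the outputs next to the shapefile
output_folder = ntpath.join(ntpath.dirname(shapefile), "output_files")
print(output_folder)  # Z:\z_resources\ruben\gadm_col2\output_files
```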



In [None]:
"""Take a look at the data geometry parameters"""
create_folder_if_not_exists(output_path)
gdf = gpd.read_file(input_path)
get_geometries_info(gdf)

usage()
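`get_geometries_info` is a per-type tally of `gdf.geom_type`. In pandas, the same dictionary comes from `gdf.geom_type.value_counts().to_dict()`, ordered by frequency. The standard-library sketch below shows the equivalent count on a hypothetical list of type names standing in for `gdf.geom_type`. It is a quick way to spot mixed `Polygon`/`MultiPolygon` layers before simplifying.

```python
from collections import Counter

# Hypothetical stand-in for gdf.geom_type: one geometry-type name per feature
geom_types = ["Polygon", "MultiPolygon", "Polygon", "Polygon", "MultiPolygon"]

geometry_counts = dict(Counter(geom_types))  # one pass over the features
print(geometry_counts)  # {'Polygon': 3, 'MultiPolygon': 2}
```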

## Simplification by Topojson
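The `unit` passed to `toposimplify` is a distance tolerance. As far as we know from the topojson documentation, `toposimplify` defaults to the Douglas-Peucker algorithm (`simplify_algorithm="dp"`) applied to each arc. The tolerance is therefore measured in the units of the layer's CRS, which are degrees for EPSG:4326. The pure-Python sketch below is a textbook Douglas-Peucker, not topojson's own implementation. `approx_degrees_to_meters` is a hypothetical helper that uses the rough figure of 111.32 km per degree at the equator, so it is only approximate away from the equator.

```python
import math


def perpendicular_distance(p, a, b):
    """Distance from point p to the infinite line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:  # degenerate chord: plain point-to-point distance
        return math.hypot(px - ax, py - ay)
    return abs(dx * (py - ay) - dy * (px - ax)) / math.hypot(dx, dy)


def douglas_peucker(points, epsilon):
    """Drop vertices lying within epsilon of the simplified line (textbook version)."""
    if len(points) < 3:
        return list(points)
    # find the interior vertex farthest from the chord joining the two endpoints
    index, dmax = 0, -1.0
    for i in range(1, len(points) - 1):
        d = perpendicular_distance(points[i], points[0], points[-1])
        if d > dmax:
            index, dmax = i, d
    if dmax <= epsilon:
        return [points[0], points[-1]]  # every interior vertex is within tolerance
    # keep the farthest vertex and simplify both halves independently
    left = douglas_peucker(points[: index + 1], epsilon)
    right = douglas_peucker(points[index:], epsilon)
    return left[:-1] + right  # avoid duplicating the split vertex


def approx_degrees_to_meters(epsilon_degrees):
    """Hypothetical helper: rough ~111.32 km per degree, valid near the equator only."""
    return epsilon_degrees * 111_320


line = [(0, 0), (1, 0.1), (2, -0.1), (3, 5), (4, 6), (5, 7)]
print(douglas_peucker(line, 1.0))  # [(0, 0), (2, -0.1), (3, 5), (5, 7)]
```

With this rough conversion, the tolerances compared below (0.000005 to 0.0002 degrees) correspond to roughly 0.6 m to 22 m on the ground.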

Basic concept:
```python
unit = 0.005  # tolerance in CRS units (degrees for EPSG:4326)
topo = tp.Topology(gdf.to_crs(epsg=4326), prequantize=False)
simplification = topo.toposimplify(unit).to_gdf()
simplification.plot()
```

Here we compare several simplification tolerances (precision units) and the differences they produce.

In [None]:
"""make an output for different levels"""
# units = [0.00005, 0.00015, 0.00020]
# units = [0.000005, 0.000015, 0.000020]
units = [0.000005, 0.000015, 0.00002, 0.00005, 0.00015, 0.0002]  # ascending order, in CRS units (degrees for EPSG:4326)

df_output = simplify_vector_file(gdf, input_path, output_path, units)

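## Simplification by Geopandas
- It won't work for adjacent polygons or multipolygons that share borders, because the algorithm is not topology-aware: each geometry is simplified independently, so shared borders can end up with gaps or overlaps (`preserve_topology=True` only keeps each individual geometry valid).
- Issue source: https://gis.stackexchange.com/questions/325766/geopandas-simplify-results-in-gaps-between-polygons

```python
# The tolerance is in CRS units: 100 assumes a projected CRS in meters
gdf_simplified = gdf.simplify(100, preserve_topology=True)
```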