# Configurable Performance Analysis Results

This notebook runs the same tests used to generate the results in the eScience 2025 GEOtiled paper submission, and allows various input parameters to be configured to test other cases of interest.

With large input data, some tests can take a long time to run (up to a few days), so running this notebook in the background is recommended in that case.
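
One possible way to run a notebook in the background is to execute it headlessly with `jupyter nbconvert`. The sketch below only builds and prints such a command. The notebook file name and the helper names `build_execute_command` and `run_in_background` are assumptions for illustration and are not part of GEOtiled or `tools`.

```python
import subprocess

def build_execute_command(notebook_path):
    """Build an nbconvert command that executes a notebook and saves outputs in place."""
    return ["jupyter", "nbconvert", "--to", "notebook",
            "--execute", "--inplace", notebook_path]

def run_in_background(notebook_path, log_path):
    """Launch the notebook in its own session, writing all output to a log file."""
    with open(log_path, "w") as log:
        # start_new_session (POSIX) keeps the child from being tied to the launching terminal
        return subprocess.Popen(build_execute_command(notebook_path),
                                stdout=log, stderr=subprocess.STDOUT,
                                start_new_session=True)

# Hypothetical notebook name; substitute the actual file
command = build_execute_command("configurable_tests.ipynb")
print(" ".join(command))
```

Calling `run_in_background("configurable_tests.ipynb", "run.log")` would start the run and return immediately; progress can then be followed in the log file.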

In this version of the notebook, the user is responsible for curating their own input data.

Make sure the VM has enough cores and RAM for the desired input data and the results to be collected.
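
As a quick sanity check before starting, the core count and total RAM of the machine can be inspected from Python. This is a minimal sketch using only the standard library. The helper name `describe_machine` is an assumption, and the RAM query relies on `os.sysconf` names that are available on Linux but may be missing elsewhere.

```python
import os

def describe_machine():
    """Return the logical core count and total RAM in GiB (None if unknown)."""
    cores = os.cpu_count() or 1
    try:
        # These sysconf names exist on Linux; other platforms may lack them
        total_bytes = os.sysconf("SC_PHYS_PAGES") * os.sysconf("SC_PAGE_SIZE")
        ram_gib = total_bytes / 1024**3
    except (AttributeError, ValueError, OSError):
        ram_gib = None
    return {"cores": cores, "ram_gib": ram_gib}

machine = describe_machine()
requested_processes = 64  # process count used in the test cells of this notebook
if machine["cores"] < requested_processes:
    print(f"Only {machine['cores']} cores for {requested_processes} requested processes")
print(machine)
```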

## Initialization

The cells below import the required libraries and perform other initialization steps. Run them before the rest of the notebook.

In [None]:
from pathlib import Path
import geotiled
import tools

Set the working directory to a location with sufficient available space.

In [None]:
# Set working directory
working_directory = '/media/volume/gabriel-geotiled/configurable_test'
geotiled.set_working_directory(working_directory)

# Create folder to store memory logs
Path('mem_logs').mkdir(parents=True, exist_ok=True)

# Create folder to store all graph images
Path('imgs').mkdir(parents=True, exist_ok=True)
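
To confirm that the chosen location has enough room, the free space on its volume can be queried with `shutil.disk_usage`. The sketch below checks the current directory. The helper name `free_space_gib` is an assumption, and the path should be replaced with the actual working directory.

```python
import shutil

def free_space_gib(path="."):
    """Return the free space, in GiB, on the volume that contains path."""
    usage = shutil.disk_usage(path)
    return usage.free / 1024**3

print(f"{free_space_gib('.'):.1f} GiB free")
```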

## Optimization Test

The following test generates results mimicking those in Table II of the eScience paper. It always computes with both the unoptimized and optimized versions of GEOtiled.

In [None]:
# Specify path to input data and the corresponding tile size to use
input_data = {"ts2500.tif": 2500,
              "ts15000.tif": 15000}

# Specify desired libraries to use
libraries = ['GDAL','SAGA']

# Specify desired terrain parameters to compute 
terrain_parameters = ['slope','aspect','hillshade']

# Specify the desired number of concurrent processes
concurrent_processes = 64

# Specify the number of runs to do
runs = 10
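
Since the input files are supplied by the user, it can help to confirm that they exist before launching a long test. The sketch below is one way to do that. The helper `find_missing_inputs` and its `base_dir` argument are assumptions, and where the test scripts actually resolve these paths is not stated here.

```python
from pathlib import Path

def find_missing_inputs(input_data, base_dir="."):
    """Return the input file names that are not present under base_dir."""
    base = Path(base_dir)
    return [name for name in input_data if not (base / name).is_file()]

input_data = {"ts2500.tif": 2500,
              "ts15000.tif": 15000}

# base_dir is assumed to be wherever the input rasters are stored
missing = find_missing_inputs(input_data, base_dir=".")
if missing:
    print("Missing input files:", missing)
```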

In [None]:
# Run test
for method in ['unoptimized','optimized']:
    for data in input_data:
        for lib in libraries:
            for tp in terrain_parameters:
                for run in range(runs):
                    !{tools.get_file_directory()}/optimization_test_files/start_test.sh {method} {data} {input_data[data]} {lib} {tp} {concurrent_processes} {run}
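
The nested loops above launch one script invocation per combination of settings, so the total grows multiplicatively. As a rough planning aid, the same combinations can be enumerated with `itertools.product` without running anything. The values below are copied from the configuration cell.

```python
from itertools import product

methods = ['unoptimized', 'optimized']
input_data = {"ts2500.tif": 2500,
              "ts15000.tif": 15000}
libraries = ['GDAL', 'SAGA']
terrain_parameters = ['slope', 'aspect', 'hillshade']
runs = 10

# product() iterates in the same order as the nested for-loops
combinations = list(product(methods, input_data, libraries,
                            terrain_parameters, range(runs)))
print(len(combinations))  # 2 * 2 * 2 * 3 * 10 = 240
```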

In [None]:
# Get peak memory usage of all tests
tools.update_peak_memory_usages('optimization_test_results.csv', test='optimizations')

In [None]:
# Average together results of multiple runs
tools.average_together_results('optimization_test_results.csv', test='optimizations')
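
Conceptually, averaging repeated runs means grouping rows that share the same settings and taking the mean of each measurement. The sketch below illustrates the idea on made-up numbers. The column names and the helper `average_runs` are assumptions and do not describe the actual CSV layout or the implementation in `tools`.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Hypothetical layout and made-up values, for illustration only
sample_csv = """method,tile_size,parameter,run,execution_time
optimized,2500,slope,0,10.0
optimized,2500,slope,1,12.0
unoptimized,2500,slope,0,20.0
unoptimized,2500,slope,1,24.0
"""

def average_runs(csv_text, key_columns, value_column):
    """Average value_column over all rows that share the same key_columns."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = tuple(row[column] for column in key_columns)
        groups[key].append(float(row[value_column]))
    return {key: mean(values) for key, values in groups.items()}

averages = average_runs(sample_csv,
                        ["method", "tile_size", "parameter"],
                        "execution_time")
for key, value in averages.items():
    print(key, value)
```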

In [None]:
# Print results of test
tools.print_optimization_results('averaged_optimization_test_results.csv')

## Changing Tile Size Test

The following test generates results mimicking those in Figure 2 of the eScience paper. It always computes results using GDAL, SAGA, GEOtiled-G, and GEOtiled-SG.

In [None]:
# Specify path to input data and the corresponding tile size to use
input_data = {"ts2500.tif": 2500,
              "ts5000.tif": 5000,
              "ts7500.tif": 7500,
              "ts10000.tif": 10000,
              "ts12500.tif": 12500,
              "ts15000.tif": 15000}

# Specify which terrain parameters to compute
terrain_parameters = ['slope','aspect','hillshade']

# Specify the desired number of concurrent processes
concurrent_processes = 64

# Specify the number of runs to do
runs = 10
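
When the input files follow a regular naming pattern, as the `ts<size>.tif` names above do, the mapping can also be generated rather than typed out. This is a small sketch that assumes that naming pattern holds for the user's own files.

```python
# Tile sizes from 2500 to 15000 in steps of 2500
tile_sizes = range(2500, 15001, 2500)

# Map each file name to its tile size, assuming files are named ts<size>.tif
input_data = {f"ts{size}.tif": size for size in tile_sizes}
print(input_data)
```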

In [None]:
# Run test
for method in ['GDAL','SAGA','GEOtiled-G','GEOtiled-SG']:
    for data in input_data:
        for tp in terrain_parameters:
            for run in range(runs):
                !{tools.get_file_directory()}/tile_size_test_files/start_test.sh {method} {data} {input_data[data]} {tp} {concurrent_processes} {run}

In [None]:
# Get peak memory usage of all tests
tools.update_peak_memory_usages('tile_size_test_results.csv', test='tile_sizes')

In [None]:
# Average together results of multiple runs
tools.average_together_results('tile_size_test_results.csv', test='tile_sizes')

In [None]:
# Plot results
for tp in terrain_parameters:
    tools.plot_tile_size_results('averaged_tile_size_test_results.csv', tp, ylims1=[0,300], ylims2=[0,50], zoom_ylims=[0,200], use_legend=True)

## Changing Process Count Test

The following test generates results mimicking those in Figure 3 of the eScience paper. It always computes results using GEOtiled-G and GEOtiled-SG.

In [None]:
# Specify path to input data and the corresponding tile size to use
input_data = {"ts2500.tif": 2500,
              "ts15000.tif": 15000}

# Specify which terrain parameters to compute
terrain_parameters = ['slope','aspect','hillshade']

# Specify the desired number of concurrent processes to use
concurrent_processes = [1, 2, 4, 8, 16, 32, 64]

# Specify the number of runs to do
runs = 10
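
The process counts above are powers of two up to 64. On a machine with a different number of cores, a similar list could be generated up to whatever limit applies. The helper name `process_counts` is an assumption for illustration.

```python
import os

def process_counts(max_processes):
    """Return powers of two from 1 up to and including max_processes."""
    counts = []
    n = 1
    while n <= max_processes:
        counts.append(n)
        n *= 2
    return counts

print(process_counts(64))  # [1, 2, 4, 8, 16, 32, 64]

# Cap the list at the number of cores on the current machine
print(process_counts(os.cpu_count() or 1))
```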

In [None]:
# Run test
for method in ['GEOtiled-G','GEOtiled-SG']:
    for data in input_data:
        for tp in terrain_parameters:
            for cp in concurrent_processes:
                for run in range(runs):
                    !{tools.get_file_directory()}/process_count_test_files/start_test.sh {method} {data} {input_data[data]} {tp} {cp} {run}

In [None]:
# Average together results of multiple runs
tools.average_together_results('process_count_test_results.csv', test='process_counts')

In [None]:
# Plot results
for tp in terrain_parameters:
    for data in input_data:
        tools.plot_process_count_results('averaged_process_count_test_results.csv', tp, input_data[data], ylims=[0,40], use_legend=True)

## Changing Topographic Region Test

The following test generates results mimicking those in Figures 4 and 5 of the eScience paper. It always computes results using GEOtiled-G and GEOtiled-SG. The region characteristics should be specified in the input file name (for example, `flat.tif` or `mountain.tif`).

In [None]:
# Specify path to input data and the corresponding tile size to use
input_data = {"flat.tif": 2500,
              "mountain.tif": 2500}

# Specify which terrain parameters to compute
terrain_parameters = ["hillshade","slope","aspect","plan_curvature","profile_curvature","convergence_index","filled_depressions","watershed_basins",
                      "total_catchment_area","flow_width","specific_catchment_area","channel_network","drainage_basins","flow_direction","flow_connectivity"]

# Specify the desired number of concurrent processes
concurrent_processes = 64

# Specify the number of runs to do
runs = 10

In [None]:
# Run test
for method in ['GEOtiled-G','GEOtiled-SG']:
    for data in input_data:
        for tp in terrain_parameters:
            for run in range(runs):
                !{tools.get_file_directory()}/region_change_test_files/start_test.sh {method} {data.replace('.tif','')} {input_data[data]} {tp} {concurrent_processes} {run}
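
The loop above derives the region name by removing `.tif` from the file name. An alternative worth considering is `Path(...).stem`, which strips only the final extension. The sketch below shows it on the file names from the configuration cell.

```python
from pathlib import Path

input_data = {"flat.tif": 2500,
              "mountain.tif": 2500}

# .stem drops only the last suffix of the file name
region_names = [Path(name).stem for name in input_data]
print(region_names)  # ['flat', 'mountain']
```

For these names both approaches give the same result. They differ only when `.tif` also appears earlier in the name, since `str.replace` removes every occurrence.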

In [None]:
# Get peak memory usage of all tests
tools.update_peak_memory_usages('region_change_test_results.csv', test='region_changes')

In [None]:
# Average together results of multiple runs
tools.average_together_results('region_change_test_results.csv', test='region_changes')

In [None]:
# Plot results
for data in input_data:
    tools.plot_region_change_results('averaged_region_change_test_results.csv', data.replace('.tif',''), input_data[data], ylims1=[0,35], ylims2=[0,25], use_legend=True)

In [None]:
# Plot memory usage over time for different terrain parameters
tools.print_memory_over_time_results('mem_logs/mountain_GEOtiled-SG_slope_2500_0.csv', xlims=[0,35], ylims=[25,28])
tools.print_memory_over_time_results('mem_logs/mountain_GEOtiled-SG_total_catchment_area_2500_0.csv', xlims=[0,35], ylims=[25,28])
tools.print_memory_over_time_results('mem_logs/mountain_GEOtiled-SG_specific_catchment_area_2500_0.csv', xlims=[0,35], ylims=[25,28])
tools.print_memory_over_time_results('mem_logs/mountain_GEOtiled-SG_channel_network_2500_0.csv', xlims=[0,35], ylims=[25,28])
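
The log file names above appear to follow the pattern `<region>_<method>_<parameter>_<tile size>_<run>.csv`. Assuming that pattern, and assuming the region and method contain no underscores, a name can be split back into its parts as sketched below. The helper `parse_log_name` is hypothetical and is not part of `tools`.

```python
from pathlib import Path

def parse_log_name(log_path):
    """Split a memory log name into region, method, parameter, tile size, and run."""
    parts = Path(log_path).stem.split("_")
    return {
        "region": parts[0],
        "method": parts[1],
        # The parameter itself may contain underscores, so take everything in between
        "parameter": "_".join(parts[2:-2]),
        "tile_size": int(parts[-2]),
        "run": int(parts[-1]),
    }

info = parse_log_name('mem_logs/mountain_GEOtiled-SG_total_catchment_area_2500_0.csv')
print(info)
```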
