# eScience Performance Analysis Results

This notebook runs the same tests used to generate the results in the eScience 2025 GEOtiled paper submission.

Note that results may not match the paper exactly, since they depend on the machine you run on and on any background tasks sharing it. In addition, this version runs interactively through Jupyter, whereas the original tests were run as bash scripts in the background. As a result, axis limit values may need to be updated to see the full results.

Some tests use large input data and can take a long time to run (up to a few days), so it is recommended to run this notebook in the background.
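
One possible way to run the notebook in the background is to execute it headlessly with `jupyter nbconvert`. The sketch below is only an illustration: the file names in the usage comment are hypothetical placeholders, and it assumes the `jupyter` command with nbconvert is available on the `PATH` of a POSIX system.

```python
import subprocess


def build_nbconvert_command(notebook, output):
    """Build a command that executes `notebook` and saves the result as `output`."""
    return [
        "jupyter", "nbconvert",
        "--to", "notebook",
        "--execute",
        # Disable the per-cell timeout, since some tests can run for days.
        "--ExecutePreprocessor.timeout=-1",
        "--output", output,
        notebook,
    ]


def launch_in_background(notebook, output, log_path):
    """Start the execution in its own session and send all output to a log file."""
    with open(log_path, "w") as log:
        return subprocess.Popen(
            build_nbconvert_command(notebook, output),
            stdout=log,
            stderr=subprocess.STDOUT,
            start_new_session=True,  # detach from the launching terminal (POSIX only)
        )


# Example usage (file names are placeholders, not files shipped with this notebook):
# proc = launch_in_background("analysis.ipynb", "analysis_executed.ipynb", "run.log")
```

The returned `Popen` object can be polled with `proc.poll()` to see whether the run has finished.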

All results gathered here were obtained on a VM with 64 CPU cores and 500 GB, so ensure your machine meets at least these specifications to reproduce the results.
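
A quick check of the current machine against these figures might look like the following sketch. It assumes that the 500 GB figure refers to system memory (the text does not say so explicitly), treats 1 GB as 10^9 bytes, and reads the memory size through `os.sysconf`, which is only available on Unix-like systems.

```python
import os


def meets_requirements(cores, mem_bytes, min_cores=64, min_mem_gb=500):
    """Return True if the given core count and memory size reach the minimums."""
    mem_gb = mem_bytes / 1e9  # assumption: decimal gigabytes
    return cores >= min_cores and mem_gb >= min_mem_gb


def detect_resources():
    """Return (logical CPU count, physical memory in bytes) on a Unix-like system."""
    cores = os.cpu_count() or 0
    mem_bytes = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    return cores, mem_bytes


cores, mem_bytes = detect_resources()
print(f"{cores} cores, {mem_bytes / 1e9:.0f} GB memory")
print("Meets stated minimums:", meets_requirements(cores, mem_bytes))
```

A machine may report slightly less memory than its nominal size, so a failed check near the threshold is worth a second look before ruling the machine out.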

## Initialization

The cells below import the required libraries and perform other initialization. Run them before the rest of the notebook.

In [None]:
from pathlib import Path
import geotiled
import tools

Set the working directory to a location with at least 1 TB of available space.

In [None]:
# Set working directory (change this path to a location on your machine)
working_directory = '/media/volume/gabriel-geotiled/full_test'
geotiled.set_working_directory(working_directory)

# Create folder to store memory logs
Path('mem_logs').mkdir(parents=True, exist_ok=True)

# Create folder to store all graph images
Path('imgs').mkdir(parents=True, exist_ok=True)
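
Before starting the long-running tests, it may be worth confirming that the chosen location really has the free space mentioned above. The following is a minimal sketch using the standard library. The helper name `has_free_space` is introduced here for illustration, and 1 TB is assumed to mean 10^12 bytes.

```python
import shutil


def has_free_space(path, required_bytes=1e12):
    """Return True if the filesystem containing `path` has enough free space."""
    return shutil.disk_usage(path).free >= required_bytes


# "." is checked here so the snippet runs anywhere; in this notebook the
# `working_directory` variable set above could be passed instead.
print("At least 1 TB free:", has_free_space("."))
```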

## Data Curation

This section downloads and preprocesses all data used for testing.

In [None]:
!{tools.get_file_directory()}/data_preprocessing_files/start.sh

## Optimization Test

The following test is used to generate results similar to those in Table II of the eScience paper. 

In [None]:
!{tools.get_file_directory()}/optimization_test_files/start_test.sh

In [None]:
# Get peak memory usage of all tests
tools.update_peak_memory_usages('optimization_test_results.csv', test='optimizations')
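
To illustrate the idea of a peak memory value, the sketch below takes the maximum of a memory column in a log file. It is not the implementation of `tools.update_peak_memory_usages`. The log layout and the column names `elapsed` and `memory` are hypothetical, chosen only to make the example runnable.

```python
import csv
import io


def peak_memory(log_file, column="memory"):
    """Return the largest value found in `column` of a CSV memory log."""
    reader = csv.DictReader(log_file)
    return max(float(row[column]) for row in reader)


# Hypothetical log contents; real files under mem_logs/ may be laid out differently.
sample_log = io.StringIO("elapsed,memory\n0,25.1\n1,27.4\n2,26.0\n")
print(peak_memory(sample_log))  # prints 27.4
```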

In [None]:
# Average together results of multiple runs
tools.average_together_results('optimization_test_results.csv', test='optimizations')
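
Averaging repeated runs amounts to grouping rows that share the same test configuration and taking the mean of each measured value. The sketch below shows that idea with the standard library only. It is not the implementation of `tools.average_together_results`, and the field names `method`, `parameter`, `run`, and `time` are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def average_runs(rows, key_fields, value_field):
    """Group rows by `key_fields` and average `value_field` within each group."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[field] for field in key_fields)
        groups[key].append(float(row[value_field]))
    return {key: mean(values) for key, values in groups.items()}


# Hypothetical rows: two runs of one configuration and one run of another.
rows = [
    {"method": "A", "parameter": "slope", "run": 0, "time": 10.0},
    {"method": "A", "parameter": "slope", "run": 1, "time": 14.0},
    {"method": "B", "parameter": "slope", "run": 0, "time": 6.0},
]
averages = average_runs(rows, ["method", "parameter"], "time")
print(averages[("A", "slope")])  # prints 12.0
```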

In [None]:
# Print results of test
tools.print_optimization_results('averaged_optimization_test_results.csv')

## Changing Tile Size Test

The following test is used to generate results similar to those in Figure 2 of the eScience paper.

In [None]:
!{tools.get_file_directory()}/tile_size_test_files/start_test.sh

In [None]:
# Get peak memory usage of all tests
tools.update_peak_memory_usages('tile_size_test_results.csv', test='tile_sizes')

In [None]:
# Average together results of multiple runs
tools.average_together_results('tile_size_test_results.csv', test='tile_sizes')

In [None]:
# Plot results
tools.plot_tile_size_results('averaged_tile_size_test_results.csv', 'slope', ylims1=[0,7000], ylims2=[0,300], zoom_ylims=[0,200], use_legend=True)
tools.plot_tile_size_results('averaged_tile_size_test_results.csv', 'aspect', ylims1=[0,7000], ylims2=[0,300], zoom_ylims=[0,200])
tools.plot_tile_size_results('averaged_tile_size_test_results.csv', 'hillshade', ylims1=[0,7000], ylims2=[0,300], zoom_ylims=[0,200])
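
Because measured values differ between machines, the hard-coded axis limits above may clip the plots. One option is to derive limits from the measured values instead. The helper below is a hypothetical illustration and not part of `tools`. It returns a `[low, high]` list in the same form as the `ylims` arguments used above.

```python
def padded_limits(values, pad=0.1):
    """Return [0, max(values) scaled up by `pad`] for use as axis limits."""
    if not values:
        raise ValueError("values must not be empty")
    return [0, max(values) * (1 + pad)]


# Example with made-up measurements and 50% headroom above the largest value.
print(padded_limits([100, 200], pad=0.5))  # prints [0, 300.0]
```

The resulting list could then be passed in place of a fixed value such as `ylims1=[0,7000]`.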

## Changing Process Count Test

The following test is used to generate results similar to those in Figure 3 of the eScience paper.

In [None]:
!{tools.get_file_directory()}/process_count_test_files/start_test.sh

In [None]:
# Average together results of multiple runs
tools.average_together_results('process_count_test_results.csv', test='process_counts')

In [None]:
# Plot results
tools.plot_process_count_results('averaged_process_count_test_results.csv', 'slope', ylims=[0,350], use_legend=True)
tools.plot_process_count_results('averaged_process_count_test_results.csv', 'aspect', ylims=[0,350])
tools.plot_process_count_results('averaged_process_count_test_results.csv', 'hillshade', ylims=[0,350])

## Changing Topographic Region Test

The following test is used to generate results similar to those in Figures 4 and 5 of the eScience paper.

In [None]:
!{tools.get_file_directory()}/region_change_test_files/start_test.sh

In [None]:
# Get peak memory usage of all tests
tools.update_peak_memory_usages('region_change_test_results.csv', test='region_changes')

In [None]:
# Average together results of multiple runs
tools.average_together_results('region_change_test_results.csv', test='region_changes')

In [None]:
# Plot results
tools.plot_region_change_results('averaged_region_change_test_results.csv', 'flat', ylims1=[0,35], ylims2=[0,25], use_legend=True)
tools.plot_region_change_results('averaged_region_change_test_results.csv', 'mountain', ylims1=[0,35], ylims2=[0,25])

In [None]:
# Plot memory usage over time for different terrain parameters
tools.print_memory_over_time_results('mem_logs/mountain_GEOtiled-SG_slope_0.csv', xlims=[0,35], ylims=[25,28])
tools.print_memory_over_time_results('mem_logs/mountain_GEOtiled-SG_total_catchment_area_0.csv', xlims=[0,35], ylims=[25,28])
tools.print_memory_over_time_results('mem_logs/mountain_GEOtiled-SG_specific_catchment_area_0.csv', xlims=[0,35], ylims=[25,28])
tools.print_memory_over_time_results('mem_logs/mountain_GEOtiled-SG_channel_network_0.csv', xlims=[0,35], ylims=[25,28])
