# Test duration analysis


## Problem definition


We want to compare test execution time when the tests are run locally and when they are run with the `--simulate-cloud` flag.

**Problem**: With the flag, some tests run much longer than without it. For example, `test_map_metadata` takes about 2 hours and 30 minutes with the flag, but less than 5 minutes without it.

**Task**: Measure the duration of each test and find the slowest ones.


## Getting timing statistics for `test_map_metadata`

Import the `ParseOutputPytest` module.

In [1]:
from ParseOutputPytest import *

pop = ParseOutputPytest()

Set the directory from which the test command will be run.

In [2]:
path_dir = 'C:\\prog\\modin'

Get the JSON statistics for the test run without the flag.

In [3]:
# Path to the test
path_test_map_metadata = 'modin\\pandas\\test\\dataframe\\test_map_metadata.py'
environment_variables_without_cloud = 'set MODIN_ENGINE=Python'

# Let's start collecting statistics for the test without flag
# file_name_test_map_metadata_without_cloud = pop.start_test(path_dir, path_test_map_metadata, environment_variables=environment_variables_without_cloud)

# Path to statistics in json format
file_name_test_map_metadata_without_cloud = 'json_analytics_from_test\\23-7-2021_12-29-52_modin_pandas_test_dataframe_test_map_metadata.json'
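
Windows paths written as string literals need every backslash doubled, which is easy to get wrong. As a minimal sketch of an alternative, the path can be assembled from its components with `pathlib.PureWindowsPath` from the standard library. The name `example_test_path` is illustrative only, and it is an assumption that `pop.start_test` accepts the resulting plain string in the same way as a hand-written one.

```python
from pathlib import PureWindowsPath

# Build the relative test path from its components; PureWindowsPath joins
# them with backslashes on any operating system.
example_test_path = str(
    PureWindowsPath('modin', 'pandas', 'test', 'dataframe', 'test_map_metadata.py')
)

print(example_test_path)
```

This avoids manual escaping entirely, at the cost of one extra import.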

Get the JSON statistics for the test run with the flag.

In [4]:
# Path to the test
path_test_map_metadata = 'modin\\pandas\\test\\dataframe\\test_map_metadata.py'
environment_variables_with_cloud = 'set MODIN_EXPERIMENTAL=1 && set MODIN_ENGINE=Python'
pytest_args_with_cloud = '--simulate-cloud=normal'

# Let's start collecting statistics for the test with flag
# file_name_test_map_metadata_with_cloud = pop.start_test(path_dir, path_test_map_metadata, environment_variables=environment_variables_with_cloud, args_pytest=pytest_args_with_cloud)

# Path to statistics in json format
file_name_test_map_metadata_with_cloud = 'json_analytics_from_test\\modin_pandas_test_dataframe_test_map_metadata.json'

Build a dict for each run, where the key is the test name and the value is the test duration.

In [5]:
dict_test_metadata_duration_without_cloud = pop.get_dict_test_duration(file_name_test_map_metadata_without_cloud)
dict_test_metadata_duration_with_cloud = pop.get_dict_test_duration(file_name_test_map_metadata_with_cloud)
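
The internals of `get_dict_test_duration` are not shown here. As a rough sketch of what such a step could look like, the example below assumes a report layout similar to the one produced by the `pytest-json-report` plugin: a top-level `tests` list whose entries carry a `nodeid` and a `call` object with a `duration` in seconds. The function name `durations_from_report` and the sample data are hypothetical, and the actual JSON files used in this notebook may be structured differently.

```python
import json


def durations_from_report(report):
    """Map each test's nodeid to its call-phase duration in seconds.

    Assumes a pytest-json-report-like layout; this is an assumption,
    not the confirmed format of the files used in this notebook.
    """
    durations = {}
    for test in report.get("tests", []):
        call = test.get("call")
        if call is None:
            # The test never reached the call phase (for example, it was skipped).
            continue
        durations[test["nodeid"]] = call["duration"]
    return durations


# Made-up report contents, for illustration only.
sample_report_text = """
{
  "tests": [
    {"nodeid": "test_example.py::test_fast", "outcome": "passed",
     "call": {"duration": 0.25}},
    {"nodeid": "test_example.py::test_slow", "outcome": "passed",
     "call": {"duration": 58.5}},
    {"nodeid": "test_example.py::test_skipped", "outcome": "skipped"}
  ]
}
"""

sample_durations = durations_from_report(json.loads(sample_report_text))
print(sample_durations)
```

Tests without a `call` phase are left out of the result, which is one way the two runs can end up with different sets of keys.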

Create a dict where the key is the test name and the value is the absolute difference between the test's durations in the two runs.

In [6]:
# Iterate over the `with_cloud` dict only, since fewer tests passed in that run.

dict_name_abs_duration = {}

for name, duration_with_cloud in dict_test_metadata_duration_with_cloud.items():
    dict_name_abs_duration[name] = abs(duration_with_cloud - dict_test_metadata_duration_without_cloud[name])

# Let's get a list of pairs (test name, time difference) sorted by the second field
list_sort_test = sorted(dict_name_abs_duration.items(), key=lambda x: x[1], reverse=True)
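
The loop above raises a `KeyError` if a test appears only in the `with_cloud` results, and `abs` hides whether a test became slower or faster. A hedged variant of the same comparison, using made-up durations and a hypothetical helper name `duration_differences`, restricts the comparison to tests present in both runs and keeps the sign of the difference:

```python
def duration_differences(with_cloud, without_cloud):
    """Return (test name, with - without) pairs, largest slowdown first.

    Only tests present in both dicts are compared.
    """
    shared_names = with_cloud.keys() & without_cloud.keys()
    differences = {
        name: with_cloud[name] - without_cloud[name] for name in shared_names
    }
    return sorted(differences.items(), key=lambda item: item[1], reverse=True)


# Made-up durations in seconds, for illustration only.
example_with_cloud = {"test_a": 12.0, "test_b": 3.5, "test_c": 0.4}
example_without_cloud = {"test_a": 1.0, "test_b": 4.0, "test_d": 0.2}

example_ranking = duration_differences(example_with_cloud, example_without_cloud)
print(example_ranking)
```

A positive value means the test was slower with the flag; a negative value means it was faster. Whether the signed or the absolute difference is more useful depends on the question being asked.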

Print the tests in the `test_map_metadata` file with the largest time differences.

In [7]:
count_test = 10

# Take the first `count_test` entries of the sorted list
for test_name, time_difference in list_sort_test[:count_test]:
    print(f'Test: {test_name}')
    print(f'  Time difference: {time_difference}')
    print('')

Test: modin/pandas/test/dataframe/test_map_metadata.py::test_applymap[square-float_nan_data]
  Time difference: 58.733764299999976

Test: modin/pandas/test/dataframe/test_map_metadata.py::test_applymap_numeric[plus one-float_nan_data]
  Time difference: 56.32550849999994

Test: modin/pandas/test/dataframe/test_map_metadata.py::test_applymap[plus one-float_nan_data]
  Time difference: 55.36009379999998

Test: modin/pandas/test/dataframe/test_map_metadata.py::test_applymap_numeric[square-float_nan_data]
  Time difference: 54.68688940000002

Test: modin/pandas/test/dataframe/test_map_metadata.py::test_at[float_nan_data]
  Time difference: 53.90222800000005

Test: modin/pandas/test/dataframe/test_map_metadata.py::test_get[does not exist-float_nan_data]
  Time difference: 49.598026999999846

Test: modin/pandas/test/dataframe/test_map_metadata.py::test_set_index[append_False-drop_None-float_nan_data]
  Time difference: 38.22146690000339

Test: modin/pandas/test/dataframe/test_map_metadata.py