<a target="_blank" href="https://colab.research.google.com/github/GoogleCloudPlatform/vertex-ai-samples/blob/main/notebooks/official/model_monitoring/model_monitoring.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>



# Execution time analysis over the canonical sorting methods used by Python's most popular Data Structures
---

## Our objective

We are trying to calculate the average execution time that takes to sort a **built-in python list**, a **Numpy ndarray** and a **Pandas Series** all using the <i>default sorting utilities of each library</i>

### Experiment workflow

1. Generated numbers are **truly completely randomized numbers**. We generate <i>uint16 numbers </i> in real time by using live measures of quantum fluctuations in the vacuum of the lab hosted by [**Australian National University - Department of Quantum Science**](https://anuquantumoptics.org/). The API used can be found **[here](https://qrng.anu.edu.au/)**
2. A thousand random numbers and generated this way and stored within each data structure.
3. Each data structure is sorted and the **performance is measured in nanoseconds** using <i>perf_counter_ns</i> at the exact time on which the sorting takes place.
4. This reading is stored on a separated list to do some calculations later.
5. The steps **2** to **4** are repeated **<i>n</i>** times. By default, n is equal to 100.
6. The average of all the individual readings is taken for each data structure and reported at the end

### Results

The result for the test will be reported as a plain text file within the same directory on which you run this code

In [1]:
## import libraries to test

# built-in list
import numpy as np
import pandas as pd

import quantumrandom
from time import perf_counter_ns
from time import perf_counter

In [2]:
## function definitions to be timed

def performance_test_list_sort(quantum_random_list: "list") -> int:
    
    
    time_start = perf_counter_ns()
    quantum_random_list.sort()
    time_stop = perf_counter_ns()
    
    return time_stop - time_start

def performance_test_ndarray_sort(quantum_random_list: "list") -> int:
    
    quantum_random_list = np.array(quantum_random_list)
    
    time_start = perf_counter_ns()
    quantum_random_list.sort()
    time_stop = perf_counter_ns()
    
    return time_stop - time_start

def performance_test_Series_sort(quantum_random_list: "list") -> int:
    
    quantum_random_list = pd.Series(data=quantum_random_list, dtype='Int64')
    
    time_start = perf_counter_ns()
    quantum_random_list.sort_values()
    time_stop = perf_counter_ns()
    
    return time_stop - time_start

In [3]:
## RUN THE EXPERIMENT

## n is the number of arrays that will be created and sorted

n = 100

time_results_l = []
time_results_ndarray = []
time_results_Series = []

print("Starting experiment, this might take a while...")
time_start = perf_counter()
for i in range(n):
    print("...")
    quantum_random_list = quantumrandom.get_data(data_type='uint16', array_length=1000)
    
    time_results_l.append(performance_test_list_sort(quantum_random_list))
    time_results_ndarray.append(performance_test_ndarray_sort(quantum_random_list))
    time_results_Series.append(performance_test_Series_sort(quantum_random_list))
    
time_stop = perf_counter()

print("Experiment ended!, it took", (time_stop-time_start)/60, "minute(s) to finish!" )

Starting experiment, this might take a while...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
Experiment ended!, it took 12.209204320016665 minute(s) to finish!


In [5]:
results_list = [
    sum(time_results_l)/len(time_results_l),
    sum(time_results_ndarray)/len(time_results_ndarray),
    sum(time_results_Series)/len(time_results_Series)
]

print("Mean time for built-in list: ", sum(time_results_l)/len(time_results_l), "nanosecond(s)")
print("Mean time for Numpy ndarray: ", sum(time_results_ndarray)/len(time_results_ndarray), "nanosecond(s)")
print("Mean time for Pandas Series: ", sum(time_results_Series)/len(time_results_Series), "nanosecond(s)")



Mean time for built-in list:  57027.49 nanosecond(s)
Mean time for Numpy ndarray:  12157.88 nanosecond(s)
Mean time for Pandas Series:  160935.72 nanosecond(s)


In [11]:
with open("test_datastructures_results.txt", "w") as writer:
    writer.write("Mean time for built-in list: " + str(results_list[0]) + "nanosecond(s)\n")
    writer.write("Mean time for Numpy ndarray: " + str(results_list[1]) + "nanosecond(s)\n")
    writer.write("Mean time for Pandas Series: " + str(results_list[2]) + "nanosecond(s)\n")