## Sorting algorithms

This notebook illustrates the implementation of different sorting strategies and compares them in terms of efficiency (computation time).

The sorting functions are defined within a class, *my_sort_class*, which is contained in the *sort_code.py* module.

For the examples below, the data to be sorted are generated by methods of the class *data_class*, contained in the *data_class.py* module.

The following cell imports the required libraries (numpy and pandas) and classes: 

In [25]:
import numpy as np            
import pandas as pd
from data_class import data_class
from sort_code import my_sort_class

Instances of *data_class* (*dt*) and *my_sort_class* (*srt*) are created:

In [26]:
srt=my_sort_class()  
dt=data_class()

Different data-sets are now created with the *set_data* method of the class *data_class*; each data-set contains 20 values.

- ***dt.data***, produced by *set_data*, is the **unsorted** data-set;
- ***dt.data_s*** is the **completely sorted** data-set;
- ***dt.data_sw***, ***dt.data_sw2*** and ***dt.data_sw3*** are **partially ordered** data-sets, produced by *scrambling* the data of the ordered *dt.data_s* data-set.

*Scrambled data-sets* are produced by setting the second and third input parameters of the *set_data* method:

In [36]:
dt.set_data(20, 1, 3)
print("Original data-set:\n",dt.data)
print("\nScrambled data (dt.data_sw2):\n", dt.data_sw2)

Original data-set:
 [0.69083794 0.79078916 0.73594185 0.37025641 0.98936219 0.11711468
 0.78500868 0.45618545 0.00607189 0.56703793 0.3614055  0.70813488
 0.63254091 0.55112117 0.04353563 0.84489233 0.39188753 0.83680139
 0.39254586 0.75923647]

Scrambled data (dt.data_sw2):
 [0.00607189 0.04353563 0.11711468 0.69083794 0.78500868 0.39188753
 0.39254586 0.45618545 0.75923647 0.56703793 0.63254091 0.3614055
 0.70813488 0.73594185 0.55112117 0.37025641 0.79078916 0.83680139
 0.84489233 0.98936219]
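
The idea of a partially ordered data-set can be sketched independently of *data_class*. The snippet below is only an illustration under stated assumptions: the helper `scramble` and its parameters `num_swaps` and `seed` are hypothetical names, and exchanging randomly chosen pairs of a sorted array is one plausible way to scramble it, not necessarily what *set_data* does internally.

```python
import numpy as np

def scramble(sorted_data, num_swaps, seed=None):
    """Return a copy of sorted_data with num_swaps random pairs exchanged.

    Hypothetical helper for illustration; not the data_class implementation.
    """
    rng = np.random.default_rng(seed)
    data = np.array(sorted_data, copy=True)
    for _ in range(num_swaps):
        # Pick two distinct positions and exchange their values
        i, j = rng.choice(len(data), size=2, replace=False)
        data[i], data[j] = data[j], data[i]
    return data

# A sorted data-set of 20 random values, then a partially ordered copy
ordered = np.sort(np.random.default_rng(0).random(20))
partially_ordered = scramble(ordered, num_swaps=3, seed=1)
print(partially_ordered)
```

With few exchanges most values stay in their sorted position, which is the situation the partially ordered data-sets are meant to probe.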


Let's sort the *dt.data* data-set by calling the *sort* method of *srt* with *method='ms'*:

In [37]:
dt_sorted=srt.sort(dt.data, method='ms')
print("Sorted data")
print(dt_sorted)

Sorted data
[0.00607189 0.04353563 0.11711468 0.3614055  0.37025641 0.39188753
 0.39254586 0.45618545 0.55112117 0.56703793 0.63254091 0.69083794
 0.70813488 0.73594185 0.75923647 0.78500868 0.79078916 0.83680139
 0.84489233 0.98936219]
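
A sorted result such as the one above can be checked automatically rather than by eye. The following is a minimal sketch: `is_sorted` and `check_sort` are hypothetical helper names, and `np.sort` is used as the reference result.

```python
import numpy as np

def is_sorted(values):
    """Return True if values are in non-decreasing order."""
    values = np.asarray(values)
    return bool(np.all(values[:-1] <= values[1:]))

def check_sort(sort_function, data):
    """Compare the output of sort_function with np.sort on the same data."""
    data = np.asarray(data)
    # Sort a copy, so that an in-place algorithm cannot alter the reference data
    result = np.asarray(sort_function(data.copy()))
    return is_sorted(result) and np.array_equal(result, np.sort(data))

sample = np.random.default_rng(0).random(20)
print(check_sort(np.sort, sample))            # a correct sort passes the check
print(check_sort(lambda a: a[::-1], sample))  # merely reversing the data does not
```

Assuming *srt.sort* returns the sorted array, as in the cell above, the same check could be applied as `check_sort(lambda a: srt.sort(a, method='ms'), dt.data)`.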


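Let's compare the efficiency of the different algorithms on data-sets of a given size. To this end, the function *statistics* is defined here; it times every sorting function on every data-set and prints the average run times, in milliseconds, as a table: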

In [48]:
def statistics(num_data=300, num_scramble=20, num_scramble2=80, limit=0.8, fast=True):
    # Generate fresh data-sets with the requested size and scrambling parameters
    dt.set_data(num_data, num_scramble, num_scramble2, limit)

    # Data-sets to sort (table rows) and sorting functions to time (table columns)
    dataset = [dt.data, dt.data_sw2, dt.data_sw3, dt.data_sw, dt.data_s]
    fun = [srt.my_sort, srt.my_sort_b, srt.my_sort_c, srt.my_sort_2, srt.my_sort_2b,
           srt.my_sort_3, srt.my_sort_4, srt.my_sort_smart, np.sort]

    ds_size = len(dataset)
    data_size = len(dataset[0])
    df_size = len(fun)

    # Time every function on every data-set (%timeit is an IPython magic command)
    tlist = []
    for i in range(df_size):
        for j in range(ds_size):
            if fast:
                # Quick estimate: one repetition of two loops, no printed output
                t = %timeit -r 1 -n 2 -q -o fun[i](dataset[j])
            else:
                # Default %timeit settings: slower, but more reliable
                t = %timeit -o fun[i](dataset[j])
            tlist.append(t.average)

    # Convert seconds to milliseconds; rows become data-sets, columns become functions
    tlist = (np.array(tlist) * 1000).reshape(df_size, ds_size).transpose().round(3)

    pdt = pd.DataFrame(tlist,
                       columns=["ss", "ms", "msr", "dbc", "dbcb", "vms",
                                "max", "smart", "np"],
                       index=["data", "data_sw2", "data_sw3", "data_sw", "data_s"])
    print("\nStatistics: performances given in milli-seconds")
    print("Dataset size: %4i" % data_size)
    print("Number of exchanges in the dataset: %4i" % dt.w_size)
    print("Scramble_1:    %3i  (data_sw)" % dt.num_scramble)
    print("Scramble_2:    %3i  (data_sw2)" % dt.num_scramble2)
    print("Scramble_next: %3i  (data_sw3)\n" % dt.num_scramble2)

    # Show all columns of the table
    pd.set_option('display.max_columns', None)
    print(pdt)

In [49]:
statistics()


Statistics: performances given in milli-seconds
Dataset size:  300
Number of exchanges in the dataset:  152
Scramble_1:     20  (data_sw)
Scramble_2:     80  (data_sw2)
Scramble_next:  80  (data_sw3)

                ss       ms      msr     dbc    dbcb     vms     max   smart     np
data      3316.968  130.902  133.610  57.677  39.846  99.370  14.948  22.863  0.022
data_sw2  2116.034  114.283  118.737  49.778  33.667  72.085  14.382  16.925  0.018
data_sw3    13.727    1.196    1.181  33.041  26.750   0.917  14.959   1.268  0.012
data_sw    698.041   70.542   72.397  61.702  19.569  47.792  14.466  15.727  0.011
data_s       0.369    0.361    0.313  35.819  17.243   0.236  14.213   0.785  0.021
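
The `%timeit` magic used in *statistics* is only available inside IPython or Jupyter. As a rough sketch of how a similar table could be produced in a plain Python script, the standard `timeit` module can be used instead. The function `time_sorts` and its arguments are hypothetical names introduced for this illustration, and only `np.sort` and the built-in `sorted` are timed, since the *my_sort_class* methods are not reproduced here.

```python
import timeit
import numpy as np
import pandas as pd

def time_sorts(functions, datasets, repeat=1, number=2):
    """Average run time in milliseconds of each function on each data-set.

    functions maps a column label to a callable; datasets maps a row label
    to an array. Hypothetical helper, not part of the notebook's modules.
    """
    table = []
    for data in datasets.values():
        row = []
        for fun in functions.values():
            # Each call sorts a fresh copy, so an in-place sort cannot affect
            # later runs; the copy adds a small overhead to every measurement
            timer = timeit.Timer(lambda: fun(data.copy()))
            runs = timer.repeat(repeat=repeat, number=number)
            # Each entry of runs is the total time of `number` calls, in seconds
            row.append(round(1000 * np.mean(runs) / number, 3))
        table.append(row)
    return pd.DataFrame(table, index=list(datasets), columns=list(functions))

rng = np.random.default_rng(0)
unsorted = rng.random(300)
timings = time_sorts({"np": np.sort, "builtin": sorted},
                     {"data": unsorted, "data_s": np.sort(unsorted)})
print(timings)
```

The defaults `repeat=1` and `number=2` mirror the `-r 1 -n 2` options of the fast mode above; larger values give more stable timings at the cost of a longer run.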
