# 04: Sorting

We've seen some examples of sorting which took $\mathcal{O}(N^2)$ time. It is possible to do much better than this: $\mathcal{O}(N \log N)$ is achievable with a variety of algorithms.
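To get a feel for the size of the difference, here is a rough count of basic steps for $N = 10^6$ items (an arbitrary size picked for illustration, ignoring constant factors):

$$
N^2 = 10^{12}, \qquad N \log_2 N \approx 10^6 \times 19.9 \approx 2 \times 10^7
$$

So at this size the $\mathcal{O}(N \log N)$ method does roughly $5 \times 10^4$ times less work.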

In a practical programming application, most of the time I would just use the standard C sorting function, qsort(), rather than writing my own. The example code this week will therefore start by defining some structured objects which can be sorted in different ways, and then apply qsort() to them.
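As a warm-up, here is a minimal sketch of qsort() applied to a plain array of ints. The names compare_ints and sort_ints are made up for this illustration; only qsort() itself comes from the standard library.

```c
#include <stdlib.h>

/* Compare function for qsort(): it receives pointers to two array elements.
   Here the elements are plain ints, so we cast to (const int *). */
int compare_ints(const void *a, const void *b){
    int x = *(const int *)a;
    int y = *(const int *)b;
    if( x < y ) return( -1 );
    if( x > y ) return(  1 );
    return( 0 );
}

/* Hypothetical helper: sort an array of n ints into ascending order. */
void sort_ints( int *values, size_t n ){
    qsort( values, n, sizeof(int), compare_ints );
}
```

Comparing with < and > rather than returning x - y avoids integer overflow when the two values are far apart.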

I've already shown some code to define a structured datatype. Today we will create something a bit more complicated than just an int and a pointer: a type that can hold some more complex information about an atom.


In [1]:
#include <stdio.h>
#include <stdlib.h>

//Define a (listable) structured type ATOM_T to hold info about a (classical) atom
typedef struct atom_t {
    struct atom_t   *listNext;      //can add to a list of similar objects (C needs the "struct" keyword here)
    struct atom_t  **bonded_atoms;  //an array of pointers to other atoms
    int              n_bonds;       //size of above array.
    double           position[3];
    double           momentum[3];
    double           mass;
    char             element[3];    //up to 2 characters plus a null terminator
    int              display_row;   //save this info to make a nice printout of the molecule
    int              display_col;   //
} ATOM_T;



When we define a complicated structured datatype, typically we write functions to initialise and to free items of that type.

In [2]:
ATOM_T *newAtom( double mass, const char *eleString, int n_bonds, int display_row, int display_col ){
    /*We can just declare structures straight-up
      but it is nicer to have an initialisation function*/
    ATOM_T *a;
    int     i;
    
    //allocate the space
    a = (ATOM_T *)malloc( sizeof(ATOM_T) );
    
    //copy in the string, checking that we don't run over the end of the input string,
    //and also that we don't overfill the 2 char limit of the periodic table element 
    //name
    for( i = 0; i < 2; i++ ){
        a->element[i] = eleString[i];
        if( eleString[i] == '\0' ) break;
    }
    a->element[2] = '\0'; //finish the string with a null terminator whether one was passed or not.
    a->mass = mass;

    a->listNext     = NULL;
    a->n_bonds      = n_bonds;
    a->bonded_atoms = (ATOM_T **)malloc(n_bonds * sizeof(ATOM_T *));
    
    a->display_row = display_row;
    a->display_col = display_col;
    
    //don't bother setting up position and momentum for now
    return( a );
}


In [3]:
void freeAtom( ATOM_T *a ){
    /* Two mallocs in the constructor
    means two frees in the destructor */
    free( a->bonded_atoms );
    free( a );
}
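The constructor above trusts malloc() to succeed. A more defensive version might look like the sketch below. To keep it self-contained it uses a cut-down stand-in type, MINI_ATOM_T, and the names newMiniAtom and freeMiniAtom; all three are made up for this illustration and are not part of the code above.

```c
#include <stdlib.h>

/* Cut-down stand-in for ATOM_T (hypothetical, for illustration only) */
typedef struct mini_atom_t {
    struct mini_atom_t **bonded_atoms;  //an array of pointers to other atoms
    int                  n_bonds;       //size of above array.
    double               mass;
} MINI_ATOM_T;

/* Returns NULL if the request is invalid or an allocation fails,
   never a half-built atom. */
MINI_ATOM_T *newMiniAtom( double mass, int n_bonds ){
    MINI_ATOM_T *a;
    int          i;

    if( n_bonds < 0 ) return( NULL );

    a = (MINI_ATOM_T *)malloc( sizeof(MINI_ATOM_T) );
    if( a == NULL ) return( NULL );

    a->bonded_atoms = NULL;
    if( n_bonds > 0 ){
        a->bonded_atoms = (MINI_ATOM_T **)malloc( n_bonds * sizeof(MINI_ATOM_T *) );
        if( a->bonded_atoms == NULL ){
            free( a );           //undo the first malloc before giving up
            return( NULL );
        }
        //start with every bond unset, so unassigned slots are detectable
        for( i = 0; i < n_bonds; i++ ) a->bonded_atoms[i] = NULL;
    }
    a->n_bonds = n_bonds;
    a->mass    = mass;
    return( a );
}

void freeMiniAtom( MINI_ATOM_T *a ){
    if( a == NULL ) return;      //tolerate NULL, like free() itself
    free( a->bonded_atoms );     //free(NULL) is a no-op
    free( a );
}
```

Setting the bond slots to NULL costs one short loop and means a forgotten bond shows up as a NULL pointer rather than as garbage.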

In [4]:

ATOM_T **makeEthanol(){
    /* Example function to structure atoms
    and make a molecule (CH3CH2OH, ethanol)*/
    ATOM_T **ethAtoms;
    int      i;
    
    ethAtoms = (ATOM_T **)malloc( 9 * sizeof(ATOM_T *) );
    
    //create a list of atoms to make ethanol 
    ethAtoms[0] = newAtom( 12.0, "C", 4, 1, 1 );
    ethAtoms[1] = newAtom( 1.0,  "H", 1, 0, 1 );
    ethAtoms[2] = newAtom( 2.0,  "H", 1, 1, 0 );
    ethAtoms[3] = newAtom( 1.0,  "H", 1, 2, 1 );
    ethAtoms[4] = newAtom( 12.0, "C", 4, 1, 2 );
    ethAtoms[5] = newAtom( 1.0,  "H", 1, 0, 2 );
    ethAtoms[6] = newAtom( 1.0,  "H", 1, 2, 2 );
    ethAtoms[7] = newAtom( 16.0, "O", 2, 1, 3 );
    ethAtoms[8] = newAtom( 1.0,  "H", 1, 1, 4 );
    
    //define the molecular structure
    //CH3
    ethAtoms[0]->bonded_atoms[0] = ethAtoms[1];
    ethAtoms[0]->bonded_atoms[1] = ethAtoms[2];
    ethAtoms[0]->bonded_atoms[2] = ethAtoms[3];
    for( i = 1; i < 4; i++) ethAtoms[i]->bonded_atoms[0] = ethAtoms[0];
    
    //CC
    ethAtoms[0]->bonded_atoms[3] = ethAtoms[4];
    ethAtoms[4]->bonded_atoms[0] = ethAtoms[0];
    
    //CH2O
    ethAtoms[4]->bonded_atoms[1] = ethAtoms[5];
    ethAtoms[4]->bonded_atoms[2] = ethAtoms[6];
    ethAtoms[4]->bonded_atoms[3] = ethAtoms[7];
    for( i = 5; i < 8; i++) ethAtoms[i]->bonded_atoms[0] = ethAtoms[4];
    
    //OH
    ethAtoms[7]->bonded_atoms[1] = ethAtoms[8];
    ethAtoms[8]->bonded_atoms[0] = ethAtoms[7];
    
    return( ethAtoms );
}


Well, that is cool, but how do you test a data structure? There isn't much to do except print it out, so below I'll write a simple function to print out a molecule.
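Besides printing, one cheap automated check for this kind of structure is that every bond is recorded from both ends. The sketch below assumes a cut-down stand-in type, BOND_ATOM_T, holding only the bond fields; that type and the function names hasBondTo and bondsAreSymmetric are made up for this illustration.

```c
#include <stddef.h>

/* Cut-down stand-in for ATOM_T (hypothetical): only the bond fields */
typedef struct bond_atom_t {
    struct bond_atom_t **bonded_atoms;
    int                  n_bonds;
} BOND_ATOM_T;

/* Return 1 if atom `from` lists atom `to` among its bonds, else 0. */
int hasBondTo( const BOND_ATOM_T *from, const BOND_ATOM_T *to ){
    int j;
    for( j = 0; j < from->n_bonds; j++ ){
        if( from->bonded_atoms[j] == to ) return( 1 );
    }
    return( 0 );
}

/* Return 1 if every bond in the molecule is set and recorded from both ends. */
int bondsAreSymmetric( BOND_ATOM_T **mol, int n_atoms ){
    int i, j;
    for( i = 0; i < n_atoms; i++ ){
        for( j = 0; j < mol[i]->n_bonds; j++ ){
            BOND_ATOM_T *other = mol[i]->bonded_atoms[j];
            if( other == NULL )               return( 0 ); //unset bond slot
            if( !hasBondTo( other, mol[i] ) ) return( 0 ); //one-way bond
        }
    }
    return( 1 );
}
```

A check like this catches an index typo in a hand-built molecule, which a printout of element names alone would not show.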


In [5]:
void printMolecule( ATOM_T **mol, int n_atoms, int max_row, int max_col ){

    int i, j, ia;
    char *outchars;
    
    //create a large multiline string to hold the printout
    outchars = (char *)malloc(1+max_row*(max_col*3+1)*sizeof(char));
    for( j=0; j < max_row; j++ ){
       for( i=0; i < max_col*3; i++ ){
           outchars[j*(max_col*3+1) + i] = ' ';
       }
       outchars[j*(max_col*3+1) + max_col*3] = '\n';
    }
    outchars[max_row*(max_col*3+1)] = '\0'; //string terminator
    
    //fill the string with element names
    for( j=0; j < max_row; j++ ){
       for( i=0; i < max_col; i++ ){
           for( ia = 0; ia < n_atoms; ia++ ){
               if( mol[ia]->display_row == j && mol[ia]->display_col == i  ){
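                   int k;
                   //copy at most 2 element characters directly into the cell; unlike sprintf,
                   //this never writes a terminator over the next cell or the newline
                   for( k = 0; k < 2 && mol[ia]->element[k] != '\0'; k++ ){
                       outchars[j*(max_col*3+1) + i*3 + k] = mol[ia]->element[k];
                   }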
               }
           }
       }
    }
    printf("%s", outchars);
    free( outchars ); //one malloc above means one free here
}

In [6]:
int main(){
    ATOM_T **ethAtoms;
    int      i;
    
    ethAtoms = makeEthanol();
    printMolecule( ethAtoms, 9, 5, 5);
    
    for( i = 0; i < 9; i++ ) freeAtom(ethAtoms[i]);
    free(ethAtoms);
    
    return( EXIT_SUCCESS );
}

main();

   H  H        
H  C  C  O  H  
   H  H        
               
               


## Sorting

OK well I wrote what turned out to be quite a lot of code to build a structured datatype containing info about some atoms.

Below I'm going to give a quick example of using the C qsort() function to sort the atoms by mass.

In [7]:
/* to use qsort we need to define a function for comparing objects of the type that we are sorting

Because the function has to fit a standard definition, we can't use our ATOM_T* datatype in the
prototype: it has to take (const void *) and then cast them to (ATOM_T **) inside the function.

This looks absolutely horrible and is not nice to read.

Basically the qsort() needs a compare function as an argument, and all it knows or wants to know
is that the two arguments of the compare function have to be pointers of some kind.

Because our ATOM_T* objects are already pointers, qsort feeds the compare function
pointers-to-pointers.  To use these values a and b, we have to cast them to ATOM_T**
and then dereference, which needs a lot of brackets and stars and is not especially nice.

*/
int compare_atoms_mass(const void *a, const void *b){
    /* compare functions for qsort must return -ve if a sorts before b, +ve if a sorts after b,
       and 0 if the two are equivalent.*/
    if( (*(ATOM_T **)a)->mass < (*(ATOM_T **)b)->mass ) return( -1 );
    if( (*(ATOM_T **)a)->mass > (*(ATOM_T **)b)->mass ) return(  1 );
    return( 0 );
}

In [8]:


int main(){
    ATOM_T **ethAtoms;
    int      i, j;
    

    ethAtoms = makeEthanol();
    printMolecule( ethAtoms, 9, 5, 5);
    printf("Before sorting:\n");
    for( i = 0; i < 9; i++ ){
        printf("ethAtoms %i %s has mass: %.2lf ",
                    i, ethAtoms[i]->element, ethAtoms[i]->mass);
        printf("bonds:");
        for( j = 0; j < ethAtoms[i]->n_bonds; j++){
            printf(" %s",ethAtoms[i]->bonded_atoms[j]->element);
        }
        printf("\n");
    }
    
    /*qsort(base_of_array, 
            number_objects, 
            size_objects, 
            use_this_compare_function);*/
    qsort(ethAtoms, 9, sizeof(ATOM_T *), compare_atoms_mass); //pass the function name in as an argument to the sort
    
    //is it sorted?
    printf("\nAfter sort:\n");
    for( i = 0; i < 9; i++ ){
        printf("ethAtoms %i %s has mass: %.2lf ",
                    i, ethAtoms[i]->element, ethAtoms[i]->mass);
        printf("bonds:");
        for( j = 0; j < ethAtoms[i]->n_bonds; j++){
            printf(" %s",ethAtoms[i]->bonded_atoms[j]->element);
        }
        printf("\n");
    }
    
    //tidy up, as in the first example
    for( i = 0; i < 9; i++ ) freeAtom(ethAtoms[i]);
    free(ethAtoms);

    return( EXIT_SUCCESS );
}

main();

   H  H        
H  C  C  O  H  
   H  H        
               
               
Before sorting:
ethAtoms 0 C has mass: 12.00 bonds: H H H C
ethAtoms 1 H has mass: 1.00 bonds: C
ethAtoms 2 H has mass: 2.00 bonds: C
ethAtoms 3 H has mass: 1.00 bonds: C
ethAtoms 4 C has mass: 12.00 bonds: C H H O
ethAtoms 5 H has mass: 1.00 bonds: C
ethAtoms 6 H has mass: 1.00 bonds: C
ethAtoms 7 O has mass: 16.00 bonds: C H
ethAtoms 8 H has mass: 1.00 bonds: O

After sort:
ethAtoms 0 H has mass: 1.00 bonds: C
ethAtoms 1 H has mass: 1.00 bonds: C
ethAtoms 2 H has mass: 1.00 bonds: C
ethAtoms 3 H has mass: 1.00 bonds: C
ethAtoms 4 H has mass: 1.00 bonds: O
ethAtoms 5 H has mass: 2.00 bonds: C
ethAtoms 6 C has mass: 12.00 bonds: H H H C
ethAtoms 7 C has mass: 12.00 bonds: C H H O
ethAtoms 8 O has mass: 16.00 bonds: C H


That works out nicely: after the qsort(), all the bond pointers are still correct.  The array of pointers to atoms was sorted, but the atom data itself is still in the same place in memory that it always was, so we only had to do pointer shuffling.
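The same array can be sorted in a different way just by swapping the compare function. As a sketch, the comparator below orders atoms alphabetically by element symbol and breaks ties by ascending mass. It assumes a cut-down stand-in type, SORT_ATOM_T, with only the two fields it compares; that type and the name compare_atoms_element are made up for this illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Cut-down stand-in for ATOM_T (hypothetical): only the fields compared here */
typedef struct sort_atom_t {
    double mass;
    char   element[3];
} SORT_ATOM_T;

/* Order by element symbol (alphabetical), breaking ties by ascending mass.
   As before, qsort hands us pointers to the array slots, i.e. pointers-to-pointers. */
int compare_atoms_element(const void *a, const void *b){
    const SORT_ATOM_T *atom_a = *(SORT_ATOM_T * const *)a;
    const SORT_ATOM_T *atom_b = *(SORT_ATOM_T * const *)b;
    int by_name = strcmp( atom_a->element, atom_b->element );

    if( by_name != 0 ) return( by_name );
    if( atom_a->mass < atom_b->mass ) return( -1 );
    if( atom_a->mass > atom_b->mass ) return(  1 );
    return( 0 );
}
```

Pulling the two atoms out into named local pointers first keeps the brackets and stars in one place. A version of this written for ATOM_T and applied to the ethanol array would put the two carbons first, then the hydrogens with the mass 2.00 one last among them, then the oxygen. qsort() makes no promise about the relative order of atoms that compare equal.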

## Assignment, week 04 : Structures and Sorting

Write programmes (create new cells below) which do the following. Each one should produce a printout to show that it is working:

1) Create a caffeine molecule in the same way as I created an ethanol molecule, and with the same rough printout method for the structural formula (so you have to specify a row and a column in the text output for each atom that you add to the molecule).

2) Create a caffeine molecule and use qsort to sort (by mass of the bonded atom, descending) the per-atom arrays of bonds that you defined for the caffeine. Show a before and after.

3) Create a markdown cell in the notebook, and write in pseudocode an algorithm which can sort a list or an array faster than $\mathcal{O}(N^2)$.  You can easily find such algorithms on Wikipedia, for example merge sort or heapsort.  The work here is for you to document and present an algorithm in a neat, readable way, such that it would be easy to code in a C-like language without actually being C, or Python, or whatever.  Cormen et al. is full of good examples of what readable pseudocode looks like, or you can find examples of varying quality on Wikipedia; there is no single absolutely rigid standard.
