# 06: Search on a Graph

Last week we applied a simple hashing algorithm (sometimes called a "Verlet List" or a "Cell List") to detect connections between pairs of objects based on distance.  

This week we look at what we can do with this kind of connection information, which in principle defines a network, or what is called a **graph**.  Computing mathematical problems in relation to graphs is very important in many branches of physics, consider Feynman diagrams in QED, or the meshes of irregular polygons used to represent real-world objects for meso-scale physics calculations (heat flow, fluid dynamics, mechanical stress etc), and in particular consider interaction networks in between atoms in materials.

I'm going to provide some code to make an **extensible array** datatype so that we can keep lists of neighbours that a given atom has, and build up a stable memory-efficient representation of the neighbourhood graph that we developed in last week's exercise.  In theory we could just make an array of size $N\times N$ and set entry $i,j$ not equal to zero if two atoms were in contact, but obviously the cost of that is $N^2$ in memory use, so such an approach isn't practical for large systems.


In [1]:
#include <stdio.h>
#include <stdlib.h>

//define an int array with information that
//we can use to change its size if it gets too big.
typedef struct xints_t {
    int *data;            //pointer to the stored data
    int  n_data_items;    //number of items in the array
    int  size_in_mem;     //size in memory (can be bigger but not smaller than n_data_items)
} XINTS_T;

//Define a (listable) structured type ATOM_T to hold info about a (classical) atom
//I've simplified the datatype to only contain info that is needed for today's problem
typedef struct atom_t {
    atom_t   *listNext;      //can add to a list of similar objects
    XINTS_T  *nebIds;        //keep an extendable array of int indices of neighbours
    XINTS_T  *nebStates;     //we'll need a workspace here as well.
    int       myId;          //what am I?
    double    position[3];   
} ATOM_T; 


In [2]:

XINTS_T *create_xints(){
    XINTS_T *xints;
    
    xints = (XINTS_T *)malloc(sizeof(xints));
    xints->n_data_items = 0;
    xints->size_in_mem  = 8;
    xints->data = (int *)malloc(8*sizeof(int));
    
    return( xints );
}


In [3]:
void append_xint( XINTS_T *xints, int i ){
    if( xints->n_data_items >= xints->size_in_mem ){
        xints->size_in_mem *= 2; //realloc is expensive, so take more than we need.
        xints->data = (int *)realloc(xints->data, xints->size_in_mem*sizeof(int) );
    }
    xints->data[xints->n_data_items] = i; //save the new array member.
    xints->n_data_items             += 1;
}

In [4]:
ATOM_T *newAtom( double x, double y, double z, int id ){
    
    /* create an atom in a given position */
    
    ATOM_T *a;
    
    a = (ATOM_T *)malloc( sizeof(ATOM_T) );
    
    //info about the atom itself
    a->listNext     = NULL;
    a->myId         = id;
    a->position[0]  = x;
    a->position[1]  = y;
    a->position[2]  = z;
    
    //allocate some empty (but extendable) arrays to 
    //store connectivity information.
    a->nebIds       = create_xints();
    a->nebStates    = create_xints();
    
    return( a );
}


In [5]:
void freeAtom( ATOM_T *a ){
    free( a->nebIds->data );
    free( a->nebIds);
    free( a->nebStates->data );
    free( a->nebStates );
    free( a );
}

In [6]:
ATOM_T **boxOfAtoms(int N_atoms, double box_L, int seed ){
    
    /*
     create some atoms and place them randomly in a box centred at the origin.
    */
    ATOM_T **atoms;
    int          i;
    double  half_L;
    
    half_L = box_L * 0.5;
    
    //init the random number generator, so that code is repeatable
    srand( seed );
    
    atoms = (ATOM_T **)malloc(N_atoms*sizeof(ATOM_T *));
    for( i = 0; i < N_atoms; i++ ){
        atoms[i] = newAtom( (rand()*(box_L/RAND_MAX)) - half_L,
                            (rand()*(box_L/RAND_MAX)) - half_L,
                            (rand()*(box_L/RAND_MAX)) - half_L, 
                             i );
        
    }
    return( atoms );
}

In [7]:
int image_int( int i, int N ){
    if( i >= N ) return ( i - N );
    if( i < 0  ) return ( i + N );
    return( i );
}

In [8]:
double check_contact( ATOM_T* a, ATOM_T* b, double box_L, double r_cut ){
    /* 
    check if two atoms are in "contact" (closer than distance r_cut)
    
    subject to periodic boundary conditions
    */
    double dx[3], r2, half_L;
    int    d;
    
    half_L = box_L * 0.5;
    for( d = 0; d < 3; d++ ){
      //displacement vector a to b.
      dx[d] = b->position[d] - a->position[d];
        
      //periodic imaging, now see nearest image.
      if     ( dx[d] >  half_L ) dx[d] -= box_L;
      else if( dx[d] < -half_L ) dx[d] += box_L;
    }
    
    r2 = dx[0]*dx[0] + dx[1]*dx[1] + dx[2]*dx[2]; 
    
    //return r if we are in contact, otherwise -1 (impossible r).
    if( r2 > r_cut*r_cut ) return( -1.0 );
    return( sqrt(r2) );
    
}

In [9]:
void assignCellIndex( double *posn, int *ix, int *iy, int *iz, double box_L, double r_cell ){
    /* assign cell indices for points on [-0.5*box_L.. 0.5*box_L) 
    
       This is done by integer rounding-down, assumes that the 
       box is centred at the origin and that all particles are
       in the box.   [-L/2 <= x,y,z < L/2)
    */
    double half_L;
    
    half_L = box_L * 0.5;
    
    //write to the output variables provided by the pointers ix, iy, iz
   *ix = (int)((posn[0]+half_L)/r_cell);
   *iy = (int)((posn[1]+half_L)/r_cell);
   *iz = (int)((posn[2]+half_L)/r_cell);
}

In [10]:
ATOM_T **buildCellList( ATOM_T **atoms, int N, double L, double rcut ){
    
    ATOM_T **cells;
    
    double r_cell, rcontact;
    int    Ncells_x, Ncells_x2, Ncells, i;
    int    ii, jj, kk;
    
    //how big should the cells be, and how many do we need?
    Ncells_x  = (int)(L / rcut);      //count the number of cells in each direction
    if( Ncells_x < 3 )Ncells_x = 3;   //minimum is three
    r_cell    = L / Ncells_x;         //length per cell
    Ncells_x2 = Ncells_x * Ncells_x;
    Ncells    = Ncells_x * Ncells_x * Ncells_x;
    
    //allocate space for the cell array
    cells = (ATOM_T**)malloc(Ncells*sizeof(ATOM_T*));
    for(i = 0; i < Ncells; i++) cells[i] = NULL; //cells start off empty
    
    //assign atoms to cells
    for(i = 0; i < N; i++){
        //get cell indices ii, jj, kk
        assignCellIndex( atoms[i]->position, &ii, &jj, &kk, L, r_cell );
        
        //add the atom to the cell it belongs to (ii + jj*Ncells_x + kk*Ncells_x2)
        atoms[i]->listNext                 = cells[ii+jj*Ncells_x+kk*Ncells_x2];
        cells[ii+jj*Ncells_x+kk*Ncells_x2] = atoms[i];
        
    }
    return ( cells );
}

In [11]:
void buildNeighbourLists(  ATOM_T **atoms, int N, double L, double rcut, ATOM_T **cells ) {
    
    ATOM_T *a, *b;
    
    //duplicate some convenience variables that depend on rcut and L
    double r_cell, rcontact;
    int    Ncells_x, Ncells_x2, Ncells, i;
    int    ii, jj, kk;
    int    iii, jjj, kkk,  i0, j0, k0;
    
    Ncells_x  = (int)(L / rcut);      //count the number of cells in each direction
    if( Ncells_x < 3 )Ncells_x = 3;   //minimum is three
    r_cell    = L / Ncells_x;         //length per cell
    Ncells_x2 = Ncells_x * Ncells_x;
    Ncells    = Ncells_x * Ncells_x * Ncells_x;
    
    
    /*
    *  check contacts and build neighbour xarrays
    */
    //loop over all cells.
    for(i = 0; i < Ncells; i++){
        
        //i0,j0,k0 indices of this cell in 3D
        i0 =       i % Ncells_x;
        j0 =      (i % Ncells_x2) / Ncells_x;
        k0 = (int) i / Ncells_x2;
        
        //a is the first atom in the cell, or NULL if no atoms
        a = cells[i];
        while( a != NULL ){
            
            //check all 27 neighbouring cells
            for( ii = -1; ii <= 1; ii++ ){
                iii = image_int( i0+ii, Ncells_x );
            for( jj = -1; jj <= 1; jj++ ){
                jjj = image_int( j0+jj, Ncells_x );
            for( kk = -1; kk <= 1; kk++ ){
                kkk = image_int( k0+kk, Ncells_x );
                b = cells[ iii + jjj*Ncells_x + kkk*Ncells_x2 ];
                if( a == b ) continue;
                while( b ){
                   if( b != a ){
                       if( check_contact(a, b, L, rcut) >= 0 ){
                           
                           //new code for this week: add the atom b
                           //to the neighbour xarray of a.
                           append_xint( a->nebIds, b->myId );
                               
                           //we are going to need one int of scratch 
                           //space for each neighbour
                           append_xint( a->nebStates, 0 );
                       }
                   }
                   b = b->listNext;  //keep looking in this neighbour cell
                }
            }
            }
            }
            a = a->listNext; //keep looking in the main cell.
        }
    }
}

In [18]:
int test_neblists(int N_atoms, double box_L, int seed, double r_cut){
    
    /*
      Test code to:
      build a box and a set of neighbour lists, see how each atom is connected to each other.
    */
    ATOM_T **atoms, **cells;
    int      i, j, n;

    //place a bunch of atoms randomly in a box of size box_L
    atoms = boxOfAtoms( N_atoms, box_L, seed );
    
    //build a cell list so we can easily find neighbours of atoms
    cells = buildCellList( atoms, N_atoms, box_L, r_cut);
    
    //give each atom a list of its neighbours, stored inside the ATOM_T
    //as an extensible array.
    buildNeighbourLists( atoms, N_atoms, box_L, r_cut, cells );
    
    //for each atom
    for( i = 0; i < N_atoms; i++ ){
        
        //for each neighbour of that atom
        for( j = 0; j < atoms[i]->nebIds->n_data_items; j++){
            
            //id of jth neighbour
            n = atoms[i]->nebIds->data[j];
            printf(" %i - %i  distance: %.2f:\n", i, n, check_contact( atoms[i], atoms[n], box_L, r_cut ));
        }
    }
    
    //clean everything up
    for( i = 0; i < N_atoms; i++ ){ 
        //I had to write a function to free atoms, because they now contain other things
        //that also need to be freed.
        freeAtom( atoms[i] );
    }
    free( atoms ); //this releases the array of double-pointers to atoms.
    free( cells ); //a cell is just a pointer to the first atom in a list, we can free directly.
    
    
    return( EXIT_SUCCESS );
}

In [21]:
test_neblists( 100, 5.0, 1337, 0.9);

 0 - 9  distance: 0.50:
 1 - 7  distance: 0.78:
 1 - 33  distance: 0.87:
 1 - 61  distance: 0.40:
 1 - 38  distance: 0.81:
 2 - 26  distance: 0.76:
 2 - 8  distance: 0.36:
 2 - 42  distance: 0.82:
 2 - 80  distance: 0.67:
 2 - 60  distance: 0.71:
 3 - 68  distance: 0.61:
 4 - 40  distance: 0.86:
 4 - 5  distance: 0.56:
 5 - 48  distance: 0.73:
 6 - 21  distance: 0.43:
 7 - 1  distance: 0.78:
 8 - 58  distance: 0.63:
 8 - 26  distance: 0.60:
 8 - 42  distance: 0.70:
 8 - 2  distance: 0.36:
 8 - 80  distance: 0.88:
 9 - 25  distance: 0.88:
 9 - 34  distance: 0.68:
 9 - 76  distance: 0.72:
 10 - 49  distance: 0.74:
 10 - 82  distance: 0.87:
 10 - 84  distance: 0.63:
 11 - 75  distance: 0.80:
 11 - 52  distance: 0.54:
 11 - 65  distance: 0.56:
 11 - 62  distance: 0.75:
 12 - 93  distance: 0.85:
 12 - 55  distance: 0.85:
 12 - 88  distance: 0.66:
 12 - 24  distance: 0.79:
 12 - 95  distance: 0.46:
 13 - 57  distance: 0.85:
 14 - 98  distance: 0.66:
 14 - 58  distance: 0.65:
 15 - 32  distan

OK great I have given you some code to 

1) Create a box of atoms

2) See if two atoms are in contact

3) Make an extensible array of atom ids showing the neighbours of each atom.

Before I do the whole exercise I am going to stop coding and ask you to take over.


## Assignment, week 06: Is There A Way?

A natural question if we have a network, is to ask if a connected path exists between two points which may not themselves be directly linked.  One very standard algorithm for this question is a "breadth first search" (BFS).

You can read about this in the recommended literature, or there is a short description on wikipedia.  https://en.wikipedia.org/wiki/Breadth-first_search#Pseudocode .

Your assignment is:

1) Write a program which tests **efficiently** if there is a path from a given atom to a given second atom.


2) Write a program that samples many random atom pairs to estimate the probability that a path exists between two atoms, at a given density and box size.

Please use the code I have given you, as far as it goes.  Nothing that I have given you is not useful.

If you decide to work from the BFS algortihm as presented on wikipedia, please don't just give up when you see a new word for something.

When the algorithm on wikipedia says to "label" an "edge" as "explored", an "edge" is just another name for a link between a particle and one of its neighbours, to label it you can save something into the scratch space that I already implemented in the datatype ATOM_T (read the code I've given you).

When the wikipedia page says "enqueue" or "dequeue" this means to add an atom to the head of linked list, or remove an atom from the other end of one.  A "queue" is just a list where we add at one end and take from the other.  For 100% you should try to do this efficiently, instead of just crawling the list from the start in order to find the end. 




