# A Basic Example for Optimising a 38 Atom Lennard-Jones Cluster with Organisms

[Back To Table of Contents](../Organisms_Jupyter_Example.ipynb)

In this example, we will perform a genetic algorithm optimisation of the 38 atom Lennard-Jones (LJ<sub>38</sub>) cluster, using a simple energy predation operator, the energy fitness operator, and a population-based epoch method. We will go step by step through all the components of the *Run.py* file required to run Organisms on LJ<sub>38</sub>. See [organisms.readthedocs.io/en/latest/Using_Run](https://organisms.readthedocs.io/en/latest/Using_Run.html) for more information on the *Run.py* file.

To run this notebook step by step, press the XX button sequentially on the python code you want to run. Make sure you have run X on every cell before running the final cell. Equivalently, press the XX button to completely run this *Run.py* file beginning to end. 

Note: This program creates files that are stored on the Binder server. If you want to rerun this example, you will need to remove these files first. The following code block will do this. It is only needed for this notebook and is not a part of the *Run.py* script.

In [16]:
import os
from shutil import rmtree
to_be_removed_before_restarting_Jupyter_example = ['__pycache__','Population','epoch_data','epoch_data.backup','GA_Run_Details.txt','ga_running.lock']
for example_file_or_folder in to_be_removed_before_restarting_Jupyter_example:
    if os.path.exists(example_file_or_folder):
        if os.path.isdir(example_file_or_folder):
            rmtree(example_file_or_folder)
        else:
            os.remove(example_file_or_folder)

The *Run.py* code begins below:

## Importing Organisms into script

To begin, we need to import the Organisms program into this *Run.py* script. Specifically, we want to import the ``GA_Program`` into this *Run.py* script. We do this below:

In [2]:
from Organisms import GA_Program

## The elemental makeup of the cluster

The first part of Run.py specifies the type of cluster you will be testing. Here, the makeup of the cluster is described using a dictionary in the format, {element: number of that element in the cluster, …}. An example of this is shown below:

In [3]:
# The elements, and the number of atoms of each element, in the cluster that the user would like to investigate
cluster_makeup = {'Ne': 38}
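
To make the dictionary format concrete, the sketch below expands a composition dictionary into one element symbol per atom and counts the atoms. The helper `expand_cluster_makeup` is hypothetical and not part of Organisms, and the bimetallic dictionary is included purely to illustrate the format.

```python
def expand_cluster_makeup(cluster_makeup):
    """Return one element symbol per atom, e.g. {'Ne': 2} -> ['Ne', 'Ne']."""
    symbols = []
    for element, count in cluster_makeup.items():
        symbols.extend([element] * count)
    return symbols


# The composition used in this notebook.
lj38_symbols = expand_cluster_makeup({'Ne': 38})
# An illustrative two-element composition (not used in this example).
mixed_symbols = expand_cluster_makeup({'Cu': 3, 'Pd': 2})

print(len(lj38_symbols))
print(mixed_symbols)
```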

## Details if the cluster lies on a surface

This feature allows the user to include a surface to place a cluster upon. This feature is still being developed and does not currently work.

In [4]:
# Surface details
surface_details = None

## The main details of the genetic algorithm

Here, the components of the genetic algorithm are described below:
* **pop_size** (*int*): The number of clusters in the population.
* **generations** (*int*): The number of generations that will be carried out by the genetic algorithm.
* **no_offspring_per_generation** (*int*): The number of offspring generated per generation.

For a particular test case, it is recommended to try a few variations of pop_size and no_offspring_per_generation. In the literature, a pop_size of 30 or 40 with no_offspring_per_generation set to ``0.8*pop_size`` is common.

An example of these parameters in Run.py is given below:

In [5]:
# These are the main variables of the genetic algorithm. Changing them could affect the results of the genetic algorithm.
pop_size = 20
generations = 4000
no_offspring_per_generation = 16
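
As a quick check of the ``0.8*pop_size`` rule of thumb, the sketch below computes the offspring count for a few trial population sizes. The helper name `offspring_for` is made up for this illustration, and rounding to the nearest integer is our own choice, since the parameter must be an *int*.

```python
def offspring_for(trial_pop_size, fraction=0.8):
    """Number of offspring per generation for a given population size."""
    return round(fraction * trial_pop_size)


# Trial population sizes, including the value of 20 used in this notebook.
trial_sizes = (20, 30, 40)
offspring_counts = {size: offspring_for(size) for size in trial_sizes}

for size, count in offspring_counts.items():
    print(f"pop_size = {size}: no_offspring_per_generation = {count}")
```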

## Details concerning the Mating and Mutation Procedure

The following parameters control the Mating and Mutation Procedures of the genetic algorithm. These are the processes that determine how new offspring are created during the genetic algorithm. There are four parameters involving the Mating and Mutation Procedures, the first of which affects both procedures:
* **creating_offspring_mode** (*str.*): This indicates how you want these procedures to work when making an offspring. There are two options: ``"Either_Mating_and_Mutation"`` or ``"Both_Mating_and_Mutation"``
* **crossover_type** (*str.*): The mating method will use the spatial information of two parent clusters to create a new cluster from the two of them. This variable determines which mating procedure the genetic algorithm will perform. The options for this parameter are: ``"CAS_weighted"``, ``"CAS_half"``, ``"CAS_random"`` and ``"CAS_custom_XX"``
* **mutation_types** (*[[str.,float],…]*): The mutation method will change the structure of a cluster to give a new cluster. This parameter sets the type of mutation method the user would like to use, and can be one of the following: ``"random"``, ``"random_XX"``, ``"move"``, ``"move_XX"``, or ``"homotop"``. It is possible for more than one mutation method to be used. 
* **chance_of_mutation** (*float*): The chance that a mutation will occur. How the genetic algorithm uses this variable depends on the input for creating_offspring_mode. 

See [Details concerning the mating and mutation procedure](https://organisms.readthedocs.io/en/latest/Using_Run.html#details-concerning-the-mating-and-mutation-proceedure) for more information about these options. An example of these parameters used in the Run.py file is given below:

In [6]:
# These settings indicate how offspring should be made using the Mating and Mutation Procedures
creating_offspring_mode = "Either_Mating_and_Mutation"
crossover_type = "CAS_random"
mutation_types = [['move', 1]]
chance_of_mutation = 0.1
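
The sketch below shows one plausible reading of how these settings could interact. It assumes that in ``"Either_Mating_and_Mutation"`` mode an offspring is made by mutation with probability chance_of_mutation and by mating otherwise, that in ``"Both_Mating_and_Mutation"`` mode mating always happens and a mutation follows with probability chance_of_mutation, and that the float in each mutation_types entry acts as a relative weight. These semantics and the function names are assumptions made for illustration; the linked documentation is the authority on what Organisms actually does.

```python
import random


def choose_procedures(creating_offspring_mode, chance_of_mutation, rng):
    """Return the procedures applied to make one offspring (assumed semantics)."""
    mutate = rng.random() < chance_of_mutation
    if creating_offspring_mode == "Either_Mating_and_Mutation":
        return ["mutation"] if mutate else ["mating"]
    if creating_offspring_mode == "Both_Mating_and_Mutation":
        return ["mating", "mutation"] if mutate else ["mating"]
    raise ValueError(f"Unknown creating_offspring_mode: {creating_offspring_mode!r}")


def choose_mutation_type(mutation_types, rng):
    """Pick one mutation method, treating the second entry as a relative weight."""
    names = [name for name, _ in mutation_types]
    weights = [weight for _, weight in mutation_types]
    return rng.choices(names, weights=weights, k=1)[0]


rng = random.Random(0)  # seeded only so that the sketch is repeatable
counts = {"mating": 0, "mutation": 0}
for _ in range(1000):
    for procedure in choose_procedures("Either_Mating_and_Mutation", 0.1, rng):
        counts[procedure] += 1

print(counts)
print(choose_mutation_type([['move', 1]], rng))
```

Under these assumptions, chance_of_mutation = 0.1 means roughly one offspring in ten is produced by the ``'move'`` mutation and the rest by ``"CAS_random"`` mating.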

## Epoch Settings

It is possible to include an epoch in this version of the genetic algorithm. An epoch is a feature that allows the population to be reset with new, randomly generated clusters. See [Using Epoch Methods](https://organisms.readthedocs.io/en/latest/Using_Epoch_Methods.html#using-epoch-methods) for more information on epoch methods, including the various types of epochs and their settings.

An example of the epoch parameters used in the Run.py file is given below:

In [7]:
# This parameter will tell the GGA if an epoch is desired, and how the user would like to proceed.
epoch_settings = {'epoch mode': 'same population', 'max repeat': 5}
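
To illustrate the idea of a population-based epoch, the toy tracker below signals a reset once the population has gone unchanged for a set number of generations. It assumes that ``'max repeat'`` counts consecutive unchanged generations and that "unchanged" means the same set of cluster energies; both are assumptions made for this sketch, and the class `SamePopulationEpoch` is hypothetical rather than taken from Organisms.

```python
class SamePopulationEpoch:
    """Toy tracker: signal an epoch after max_repeat generations without change."""

    def __init__(self, max_repeat):
        self.max_repeat = max_repeat
        self.previous = None
        self.repeats = 0

    def should_reset(self, population_energies):
        """Record this generation and return True if the population should be reset."""
        current = sorted(population_energies)
        if current == self.previous:
            self.repeats += 1
        else:
            self.repeats = 0
        self.previous = current
        if self.repeats >= self.max_repeat:
            # Start counting afresh after the reset.
            self.repeats = 0
            self.previous = None
            return True
        return False


tracker = SamePopulationEpoch(max_repeat=5)
stagnant_population = [-170.2, -169.8, -168.5]  # made-up energies
signals = [tracker.should_reset(stagnant_population) for _ in range(6)]
print(signals)
```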

## Other Details

There are three other variables which are important to include in your Run.py file. These are:
* **r_ij** (*float*): This is the maximum bond distance that we would expect in this cluster. This parameter is used when clusters are created using the mating and/or mutation schemes, and to determine whether a cluster has stayed in one piece after the local minimisation, as it is possible for the cluster to break into multiple pieces. This should be a reasonable distance, but not excessively large. For example, Au has an FCC lattice constant of 4.078 Å, so its first nearest neighbour distance is 2.884 Å and its second nearest neighbour distance is 4.078 Å. r_ij should therefore be set to some value between 2.884 Å and 4.078 Å. For example, r_ij = 3.5 Å or r_ij = 4.0 Å would probably be appropriate; however, I have been able to get away with r_ij = 3.0 Å.
* **cell_length** (*float*): If you want to create randomly generated clusters, either at the start of the genetic algorithm or using the ‘random’ mutation method, you will need to specify the length of the box that atoms are placed into. cell_length is the length of this box. Don’t make this too big, or else it is likely that atoms will be too far apart and the cluster will be broken into multiple pieces.
* **vacuum_to_add_length** (*float*): The length of vacuum added around the cluster. Written in Å.
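
The Au numbers quoted for r_ij follow from FCC geometry, where the nearest neighbour distance is the lattice constant divided by √2. The sketch below reproduces that arithmetic, then repeats it for a Lennard-Jones cluster under the assumption that reduced units (σ = 1) and close-packed ordering apply. That assumption is ours, made to show why a value such as r_ij = 1.5 sits between the first and second neighbour distances.

```python
import math

# Au: FCC lattice constant in Å, as quoted in the text.
a_au = 4.078
au_first_nn = a_au / math.sqrt(2)  # nearest neighbour distance in FCC
au_second_nn = a_au                # second nearest neighbour distance in FCC
print(f"Au: {au_first_nn:.3f} Å < r_ij < {au_second_nn:.3f} Å")

# Lennard-Jones in reduced units (assumption: sigma = 1).
lj_first_nn = 2 ** (1 / 6)                  # position of the pair-potential minimum
lj_second_nn = math.sqrt(2) * lj_first_nn   # second neighbour for close packing
print(f"LJ: {lj_first_nn:.3f} < r_ij < {lj_second_nn:.3f}")
```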

An example of these parameters used in the Run.py file is given below:

In [8]:
# These are variables used by the algorithm to make and place clusters in.
r_ij = 1.5
cell_length = 4.1
vacuum_to_add_length = 10.0
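
One way to picture the "stayed in one piece" test that r_ij supports is as a connectivity search: two atoms are linked if they are no further apart than r_ij, and the cluster is intact if every atom can be reached from the first. The function below is a minimal sketch of that idea with made-up coordinates; it is not the check that Organisms itself performs.

```python
import math


def is_one_piece(positions, r_ij):
    """Return True if every atom is reachable from atom 0 via links no longer than r_ij."""
    if not positions:
        return True
    visited = {0}
    stack = [0]
    while stack:
        i = stack.pop()
        for j in range(len(positions)):
            if j not in visited and math.dist(positions[i], positions[j]) <= r_ij:
                visited.add(j)
                stack.append(j)
    return len(visited) == len(positions)


# Three atoms in a row, then the same with the last atom pulled far away.
chain = [(0.0, 0.0, 0.0), (1.1, 0.0, 0.0), (2.2, 0.0, 0.0)]
broken = [(0.0, 0.0, 0.0), (1.1, 0.0, 0.0), (5.0, 0.0, 0.0)]

print(is_one_piece(chain, r_ij=1.5))
print(is_one_piece(broken, r_ij=1.5))
```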

## Minimisation Scheme

This component of *Run.py* focuses on the function that the genetic algorithm uses for performing local minimisations. It is given to the genetic algorithm as a function (i.e. something defined with ``def``). This means that, rather than a variable being passed into the algorithm, a function is passed into the algorithm.

One can write this function into the *Run.py* file, however it is usually easier and nicer to view this function in a different python file. I typically call this something like *RunMinimisation.py*, and the function in this file is called ``Minimisation_Function``. ``Minimisation_Function`` will contain the algorithm for performing a local optimisation.

Because of this flexibility, it is possible to use any type of calculator from ASE, ASAP, GPAW, LAMMPS, etc. It is even possible for the user to design this function to work with local optimisers that do not have a Python interface, such as VASP or Quantum Espresso!

To see an example of how to write ``Minimisation_Function``, see [Writing a Local Minimisation Function for the Genetic Algorithm](https://organisms.readthedocs.io/en/latest/Local_Minimisation_Function.html#local-minimisation-function).

The function is imported into Run.py as follows. This script imports ``Minimisation_Function`` from *RunMinimisation_LJ.py*, which is in the same folder as this Jupyter notebook. You can find other examples of *RunMinimisation.py*, including this version of *RunMinimisation_LJ.py*, in [github.com/GardenGroupUO/Organisms/tree/main/Examples/Set_of_RunMinimisation_Files](https://github.com/GardenGroupUO/Organisms/tree/main/Examples/Set_of_RunMinimisation_Files)

In [9]:
# The RunMinimisation.py file is written by the user. It contains the function Minimisation_Function
# that is used for local optimisations. This can be written in whatever way the user wants to perform
# the local optimisations. This is meant to be as free as possible.
from RunMinimisation_LJ import Minimisation_Function
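
To show what "passing a function into the algorithm" looks like, here is a self-contained toy: a fixed-step steepest descent on the Lennard-Jones potential in reduced units, handed to a caller as an argument. Everything in it (the names `toy_minimisation_function` and `relax_all`, the argument list, and the returned tuple) is hypothetical and chosen for illustration. The interface that Organisms expects from ``Minimisation_Function`` is described in the documentation linked above, and a real run would normally rely on an established calculator and optimiser rather than this sketch.

```python
import math


def lj_energy_and_forces(positions):
    """Lennard-Jones energy and forces in reduced units (epsilon = sigma = 1)."""
    n = len(positions)
    energy = 0.0
    forces = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            delta = [positions[i][k] - positions[j][k] for k in range(3)]
            r2 = sum(d * d for d in delta)
            inv6 = 1.0 / r2 ** 3
            energy += 4.0 * (inv6 * inv6 - inv6)
            # (-dE/dr) / r, so that the force on atom i is scale * delta
            scale = 24.0 * (2.0 * inv6 * inv6 - inv6) / r2
            for k in range(3):
                forces[i][k] += scale * delta[k]
                forces[j][k] -= scale * delta[k]
    return energy, forces


def toy_minimisation_function(positions, step=0.005, fmax=1e-6, max_steps=10000):
    """Fixed-step steepest descent; returns (positions, energy, converged)."""
    positions = [list(p) for p in positions]
    for _ in range(max_steps):
        energy, forces = lj_energy_and_forces(positions)
        largest = max(math.sqrt(sum(f * f for f in force)) for force in forces)
        if largest < fmax:
            return positions, energy, True
        for p, force in zip(positions, forces):
            for k in range(3):
                p[k] += step * force[k]
    energy, _ = lj_energy_and_forces(positions)
    return positions, energy, False


def relax_all(clusters, minimiser):
    """The caller receives the function itself and decides when to call it."""
    return [minimiser(cluster) for cluster in clusters]


dimer = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
results = relax_all([dimer], toy_minimisation_function)
relaxed, energy, converged = results[0]
bond = math.dist(relaxed[0], relaxed[1])
print(converged, round(bond, 4), round(energy, 4))
```

The small fixed step is only adequate for this two-atom case started near the minimum; it is not a robust optimiser for larger clusters.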

## The Memory Operator

This operator is designed to prevent clusters from entering the population if they resemble, in some way, any cluster stored in the memory operator. This operator uses the SCM to determine how structurally similar clusters are. See [Using the Memory Operator](https://organisms.readthedocs.io/en/latest/Using_the_Memory_Operator.html#using-the-memory-operator) for more information on how to use the memory operator. An example of how the memory operator is written in the Run.py file is shown below.

In [10]:
# This dictionary includes the information required to prevent clusters being placed in the population if they are too similar to clusters in this memory_operator
memory_operator_information = {'Method': 'Off'}

## Predation Operators

This component of Run.py specifies all the information concerning the predation operator. There are a variety of predation operators currently built into the genetic algorithm. You can find out more about how they work, and how to use them in your Run.py file, at [Using Predation Operators with the Genetic Algorithm](https://organisms.readthedocs.io/en/latest/Using_Predation_Operators_with_the_Genetic_Algorithm.html#using-predation-operators). Here, we will use an energy predation operator: 

In [11]:
# This dictionary includes the information required by the predation scheme.
predation_information = {'Predation Operator': 'Energy', 'mode': 'comprehensive', 'type_of_comprehensive_scheme': 'energy', 'minimum_energy_diff': 0.01}
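
As a rough picture of energy-based predation, the sketch below treats two clusters as duplicates when their energies differ by less than a threshold, and keeps only the lower-energy one of each such group. This is an assumed reading of what ``'minimum_energy_diff'`` controls, written for illustration with made-up energies; the function `filter_by_energy` is hypothetical and the linked documentation describes the actual operator.

```python
def filter_by_energy(energies, minimum_energy_diff):
    """Keep the lowest-energy member of each group of near-identical energies (toy logic)."""
    kept = []
    for energy in sorted(energies):
        # Compare against the most recently kept (next lowest) energy.
        if not kept or energy - kept[-1] >= minimum_energy_diff:
            kept.append(energy)
    return kept


energies = [-173.93, -173.925, -173.25, -173.13, -173.128]  # made-up values
survivors = filter_by_energy(energies, minimum_energy_diff=0.01)
print(survivors)
```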

## Fitness Operators

This component of Run.py specifies all the information required by the fitness operator. There are a variety of fitness schemes available in this implementation of the genetic algorithm. You can find out how the fitness operators work, and details of all the available fitness schemes, in [Using Fitness Operators with the Genetic Algorithm](https://organisms.readthedocs.io/en/latest/Using_Fitness_Operators_with_the_Genetic_Algorithm.html#using-fitness-operators). Here, we will use the energy fitness operator: 

In [12]:
# This dictionary includes the information required by the fitness scheme
fitness_information = {'Fitness Operator': 'Energy', 'fitness_function': {'function': 'exponential', 'alpha': 3.0}}
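
A common form of exponential energy fitness in cluster genetic algorithms scales each energy to the range 0 to 1 across the population and then applies a decaying exponential with rate ``alpha``. The sketch below assumes that form, f = exp(-alpha * rho) with rho = (E - E_min) / (E_max - E_min); whether Organisms uses exactly this expression should be checked against the linked documentation. The function name and energies are made up for illustration.

```python
import math


def exponential_fitness(energies, alpha):
    """Fitness exp(-alpha * rho), with rho the energy scaled to [0, 1] over the population."""
    e_min, e_max = min(energies), max(energies)
    spread = e_max - e_min
    fitnesses = []
    for energy in energies:
        # If all energies are equal, every cluster gets rho = 0.
        rho = (energy - e_min) / spread if spread > 0 else 0.0
        fitnesses.append(math.exp(-alpha * rho))
    return fitnesses


energies = [-173.93, -173.25, -172.00]  # made-up population energies
fitnesses = exponential_fitness(energies, alpha=3.0)
for energy, fitness in zip(energies, fitnesses):
    print(f"E = {energy:8.2f}  fitness = {fitness:.3f}")
```

With this form, the lowest-energy cluster always has a fitness of 1, and a larger ``alpha`` penalises higher-energy clusters more strongly.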

## Recording Clusters from the Genetic Algorithm

This input in the Run.py file indicates how the user would like to record clusters that are created during the genetic algorithm. The information is contained in the dictionary called ``ga_recording_information``. More information on how to record clusters made during the genetic algorithm can be found at [Recording Clusters From The Genetic Algorithm](https://organisms.readthedocs.io/en/latest/Recording_Clusters_From_The_Genetic_Algorithm.html#recording-clusters-from-the-genetic-algorithm). We will not be recording clusters in this example: 

In [13]:
# Variables required by the Recording_Cluster.py class for recording the history of the genetic algorithm, if required.
ga_recording_information = {}

## Other details of the Genetic algorithm

This last set of parameters is important, but there is no other appropriate place to put them in the Run.py file. These last parameters are:
* **force_replace_pop_clusters_with_offspring** (*bool*): In the genetic algorithm, the predation operator may find that an offspring is “identical” to a cluster in the population, but that the offspring is more fit than the cluster in the population. In this case, the genetic algorithm can replace the less fit cluster in the population with the “identical” more fit offspring. Set this variable to ``True`` if you want this to happen. Set this variable to ``False`` if you don’t want this to happen. Default: ``True``.
* **user_initialised_population_folder** (*str.*): This is the name of, or the path to, the folder holding the initialised population that you would like to use instead of the program creating a set of randomly generated clusters. If you do not have, or do not want to use, an initialised population, set this to ``None`` or ``''``.
* **rounding_criteria** (*int*): This is the number of decimal places that the cluster energy will be rounded to. Default: ``2``
* **print_details** (*bool*): If ``True``, the details of the genetic algorithm will be printed as it runs, like a verbose mode.
* **no_of_cpus** (*int*): This is the number of cpus that you would like the algorithm to run on. These extra cores will be used to create the offspring as well as used by the predation and fitness operators if beneficial to use extra cores for the chosen operators.
* **finish_algorithm_if_found_cluster_energy** (*dict.*): This parameter will stop the algorithm if the desired global minimum is found. This parameter is to be used if the user would like to test the performance of the algorithm and knows beforehand what the energy of the global minimum is. This parameter is set as a dictionary with two entries. ‘cluster energy’ is a float that states the energy of the global minimum. ‘round’ is an integer that you want to set to the same rounding that you gave for the ‘cluster energy’ input. The algorithm will round the energy of each cluster made and compare it to your ‘cluster energy’ input. An example of this for Au38 using Cleri Gupta parameters is ``finish_algorithm_if_found_cluster_energy = {'cluster energy': -130.54, 'round': 2}``. If you are not testing the performance of the algorithm, or don’t know the global minimum of the cluster you are testing, set ``finish_algorithm_if_found_cluster_energy = None``. Default: ``None``
* **total_length_of_running_time** (*int*): This is the maximum amount of time (in hours) that the algorithm is allowed to run for. This variable is useful if you are running on a remote computer system that uses a scheduler like slurm, which cancels a job once a certain time limit is reached. To prevent the algorithm from being cancelled partway through a generation, set this value to a time limit less than your maximum time limit on slurm. I have been setting this to the slurm job time minus 2 hours. For example, if the genetic algorithm is submitted to slurm for 72 hours, set ``total_length_of_running_time=70.0``. While this algorithm is designed to be able to be restarted even if the program is cancelled during a generation, it is best to use this variable to stop the algorithm safely so that there are no issues when restarting the genetic algorithm. If ``None`` is given, no time limit will be set. Default: ``None``

An example of how they are written in the Run.py file is shown below:

In [14]:
# These are the last technical settings that the algorithm is designed to take into account
force_replace_pop_clusters_with_offspring = True
user_initialised_population_folder = None
rounding_criteria = 10
print_details = False
no_of_cpus = 1
finish_algorithm_if_found_cluster_energy = {'cluster energy': -173.93, 'round': 2}
total_length_of_running_time = None
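
The stopping test described for finish_algorithm_if_found_cluster_energy amounts to rounding a cluster's energy to ‘round’ decimal places and comparing it with ‘cluster energy’. The helper below is a minimal sketch of that comparison with illustrative energies; the function `has_found_target` is hypothetical and is not how Organisms is necessarily implemented.

```python
def has_found_target(cluster_energy, settings):
    """Return True if the rounded cluster energy matches the target energy."""
    if settings is None:
        # No target energy given, so never stop early.
        return False
    digits = settings['round']
    return round(cluster_energy, digits) == round(settings['cluster energy'], digits)


settings = {'cluster energy': -173.93, 'round': 2}

matches_target = has_found_target(-173.928427, settings)  # rounds to -173.93
misses_target = has_found_target(-173.252378, settings)   # rounds to -173.25
no_target_set = has_found_target(-173.928427, None)

print(matches_target, misses_target, no_target_set)
```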


## The Genetic Algorithm!

You have got to the end of all the parameter setting stuff! Now on to the fun stuff! The next part of the Run.py file tells the genetic algorithm to run. This is written as follows in the Run.py:

In [15]:
# This will execute the genetic algorithm program
GA_Program(cluster_makeup=cluster_makeup,
    pop_size=pop_size,
    generations=generations,
    no_offspring_per_generation=no_offspring_per_generation,
    creating_offspring_mode=creating_offspring_mode,
    crossover_type=crossover_type,
    mutation_types=mutation_types,
    chance_of_mutation=chance_of_mutation,
    r_ij=r_ij,
    vacuum_to_add_length=vacuum_to_add_length,
    Minimisation_Function=Minimisation_Function,
    surface_details=surface_details,
    epoch_settings=epoch_settings,
    cell_length=cell_length,
    memory_operator_information=memory_operator_information,
    predation_information=predation_information,
    fitness_information=fitness_information,
    ga_recording_information=ga_recording_information,
    force_replace_pop_clusters_with_offspring=force_replace_pop_clusters_with_offspring,
    user_initialised_population_folder=user_initialised_population_folder,
    rounding_criteria=rounding_criteria,
    print_details=print_details,
    no_of_cpus=no_of_cpus,
    finish_algorithm_if_found_cluster_energy=finish_algorithm_if_found_cluster_energy,
    total_length_of_running_time=total_length_of_running_time)


############################################################
############################################################
############################################################
The Garden Group Genetic Algorithm for Clusters

.--------------------------------------------------------------------------.
|                             ,                                            |
|              ,_     ,     .'<_                                           |
|             _> `'-,'(__.-' __<                                           |
|             >_.--(.. )  =;`                                   _          |
|                  `V-'`'\/``                                  ('>         |
|                                                              /))@@@@@.   |
|         .----------------------------------.                /@"@@@@@()@  |
|         | Welcome to the Organisms program |               .@@()@@()@@@@ |
|         '----------------------------------'               @@@O@@@@()@@@ 

KeyboardInterrupt: 
