# Fix for misnumbered generations

There was a bug in the code, since fixed, that incremented the generation counter *after* the offspring were created rather than before. As a result, the initial random population and the first set of offspring were both counted as generation 0.

Original broken code:

1. Create and evaluate initial random population
2. Set generation counter to zero
3. While not done:
   1. Create offspring (on the first pass these were assigned generation 0 when they should have been assigned 1)
   2. Increment generation counter
   
The simple fix, which has been pushed:


1. Create and evaluate initial random population
2. Set generation counter to zero
3. While not done:
   1. Increment generation counter
   2. Create offspring
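
The difference between the two orderings can be sketched in a few lines. This is only an illustration, not the actual evolutionary algorithm code: `label_generations` and its `increment_first` flag are hypothetical names, and the function records nothing but the generation label each population would receive.

```python
def label_generations(num_offspring_rounds, increment_first):
    """Return the generation label given to each population, in creation order."""
    generation = 0
    labels = [generation]  # the initial random population is always generation 0
    for _ in range(num_offspring_rounds):
        if increment_first:
            generation += 1  # fixed ordering: bump the counter, then create offspring
        labels.append(generation)  # offspring are tagged with the current counter
        if not increment_first:
            generation += 1  # broken ordering: counter bumped only after creation
    return labels


broken = label_generations(3, increment_first=False)
fixed = label_generations(3, increment_first=True)
print(broken)  # [0, 0, 1, 2]
print(fixed)   # [0, 1, 2, 3]
```

With the broken ordering, generation 0 appears twice, which is exactly the mislabeling that needs to be repaired in the saved data.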
   
However, I'm not going to do five more runs just to get correct generation numbers. Instead, I can repair the existing data once here.


In [1]:
from pathlib import Path

import numpy as np
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd

%matplotlib inline

In [2]:
cd '/Users/may/Data/Molten Salts'

/Users/may/Data/Molten Salts


In [39]:
dfs = []

In [40]:
# Each run lives in a directory named after its job ID and holds <job ID>_pop.csv
for data_dir in ['2826129', '2848684', '2848685', '2858495', '2848687']:
    print(f'Reading {data_dir}')
    df = pd.read_csv(Path(data_dir) / f'{data_dir}_pop.csv')
    dfs.append(df)

Reading 2826129
Reading 2848684
Reading 2848685
Reading 2858495
Reading 2848687


In [41]:
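for df in dfs:
    # Shift every generation up by one ...
    df['generation'] = df['generation'] + 1
    # ... then reset the initial random population (the first 100 rows) to generation 0.
    # .loc slicing is label-based and inclusive, so :99 covers rows 0 through 99.
    df.loc[:99, 'generation'] = 0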
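
The same repair can be checked on a toy frame before trusting it on the real data. This is a sketch under two assumptions: the rows are stored in birth order, and the first `pop_size` rows are the initial random population. `renumber_generations` and the toy data are made up for illustration.

```python
import pandas as pd


def renumber_generations(df, pop_size):
    """Shift generations up by one, then reset the first pop_size rows to 0."""
    fixed = df.copy()  # leave the caller's frame untouched
    fixed['generation'] = fixed['generation'] + 1
    # Positional indexing, so this does not depend on the index labels being 0..n-1
    fixed.iloc[:pop_size, fixed.columns.get_loc('generation')] = 0
    return fixed


# Toy data with a population of 2: the first four rows were all logged as generation 0
toy = pd.DataFrame({'generation': [0, 0, 0, 0, 1, 1], 'birth_id': range(6)})
repaired = renumber_generations(toy, pop_size=2)
print(repaired['generation'].tolist())  # [0, 0, 1, 1, 2, 2]
```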

In [42]:
pop_df = pd.concat(dfs, ignore_index=True)

In [43]:
pop_df.groupby(['job','generation']).count()

job,generation,uuid,birth_id,start_lr,stop_lr,rcut_smth,rcut,training_batch_size,validation_batch_size,scale_by_worker,desc_activ_func,fitting_activ_func,start_eval_time,stop_eval_time,energy_fitness,force_fitness
2826129,0,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100
2826129,1,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100
2826129,2,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100
2826129,3,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100
2826129,4,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100
2848684,0,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100
2848684,1,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100
2848684,2,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100
2848684,3,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100
2848684,4,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100


In [44]:
pop_df.to_csv('2023_gecco_pop.csv', header=True, index=False)

In [45]:
# From https://sirinnes.wordpress.com/2013/04/25/pareto-frontier-graphic-via-python/
def plot_pareto_frontier(Xs, Ys, maxX=True, maxY=True):
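    '''Scatter-plot the points and draw their Pareto frontier in red.'''
    # Sort by X (descending when maximizing X), then sweep through the points,
    # keeping each one whose Y is at least as good as the last kept point.
    # The sort direction must follow maxX; the original snippet used maxY here.
    sorted_list = sorted([[x, y] for x, y in zip(Xs, Ys)], reverse=maxX)
    pareto_front = [sorted_list[0]]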
    for pair in sorted_list[1:]:
        if maxY:
            if pair[1] >= pareto_front[-1][1]:
                pareto_front.append(pair)
        else:
            if pair[1] <= pareto_front[-1][1]:
                pareto_front.append(pair)
    
    # Plot all points, then overlay the frontier
    plt.scatter(Xs,Ys)
    pf_X = [pair[0] for pair in pareto_front]
    pf_Y = [pair[1] for pair in pareto_front]
    plt.plot(pf_X, pf_Y, color='red')
    plt.xlabel("Objective 1")
    plt.ylabel("Objective 2")
    plt.show()
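
To check the selection logic without producing a plot, the same sort-and-sweep idea can be pulled into a function that only returns the frontier. This is a minimal sketch: `pareto_front` and the sample points are hypothetical, and points that tie on X are not handled specially.

```python
def pareto_front(xs, ys, max_x=True, max_y=True):
    """Return the frontier as a list of (x, y) tuples using a sort-and-sweep."""
    # Best X first: descending when maximizing X, ascending when minimizing
    pairs = sorted(zip(xs, ys), reverse=max_x)
    front = [pairs[0]]
    for x, y in pairs[1:]:
        last_y = front[-1][1]
        # Keep the point only if its Y is at least as good as the last kept Y
        keep = y >= last_y if max_y else y <= last_y
        if keep:
            front.append((x, y))
    return front


# Minimizing both objectives: (3, 5) is dominated by (2, 3) and is dropped
points_x = [1, 2, 3, 4]
points_y = [4, 3, 5, 1]
front = pareto_front(points_x, points_y, max_x=False, max_y=False)
print(front)  # [(1, 4), (2, 3), (4, 1)]
```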