# Saving and loading populations

Brush can also save and load entire populations.
Populations are stored as JSON, so the file is human-readable. In the same way, an estimator can be given a previously saved population file to use as the starting point for the evolution.

In this notebook, we will walk through how to use the `save_population` and `load_population` parameters. 

We start by getting a sample dataset and splitting it into `X` and `y`:

In [1]:
import pandas as pd
from pybrush import BrushRegressor

# load data
df = pd.read_csv('../examples/datasets/d_enc.csv')
X = df.drop(columns='label')
y = df['label']

To save the population when the evolution finishes, set the `save_population` parameter to a non-empty string. The final population is then written to the file at that path.

In this example, we create a temporary file.

In [2]:
import os
import tempfile

pop_file = os.path.join(tempfile.mkdtemp(), 'population.json')

# verbosity=2 prints the full per-generation report
est = BrushRegressor(
    functions=['SplitBest','Add','Mul','Sin','Cos','Exp','Logabs'],
    max_gens=10,
    objectives=["error", "complexity"],
    save_population=pop_file,
    verbosity=2
)

est.fit(X,y)
y_pred = est.predict(X)
print('score:', est.score(X,y))

Generation 1/10 [//////                                            ]
Train Loss (Med): 16.41696 (74.37033)
Val Loss (Med): 16.41696 (74.37033)
Median Size (Max): 3 (12)
Median complexity (Max): 9 (156)
Time (s): 0.07226

Generation 2/10 [///////////                                       ]
Train Loss (Med): 12.66635 (49.96683)
Val Loss (Med): 12.66635 (49.96683)
Median Size (Max): 3 (12)
Median complexity (Max): 9 (165)
Time (s): 0.12100

Generation 3/10 [////////////////                                  ]
Train Loss (Med): 12.66635 (16.41696)
Val Loss (Med): 12.66635 (16.41696)
Median Size (Max): 5 (14)
Median complexity (Max): 33 (408)
Time (s): 0.16357

Generation 4/10 [/////////////////////                             ]
Train Loss (Med): 10.97588 (17.85729)
Val Loss (Med): 10.97588 (17.85729)
Median Size (Max): 5 (14)
Median complexity (Max): 20 (360)
Time (s): 0.21556

Generation 5/10 [//////////////////////////                        ]
Train Loss (Med): 10.97588 (16.95482)
Val Los

To load a previous population, pass `load_population` the path to a JSON file generated by Brush. In our case, we reuse the file written by the previous code block.

After loading the population, we run the evolution for 10 more generations. The output reports that the population was loaded from the file, which confirms that it was successfully saved and loaded.
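
Before starting a long run from a saved file, it can help to confirm that the file exists and parses as JSON. The helper below is a minimal sketch: `check_population_file` is a hypothetical name and not part of pybrush, and it checks only JSON syntax, not whether Brush will accept the contents.

```python
import json
import os


def check_population_file(path):
    """Return `path` if it points to an existing, syntactically valid JSON file.

    Hypothetical helper (not part of pybrush). It does not validate the
    layout Brush expects; it only catches missing or corrupted files early.
    """
    if not os.path.isfile(path):
        raise FileNotFoundError(f"population file not found: {path}")
    with open(path, "r", encoding="utf-8") as fh:
        try:
            json.load(fh)
        except json.JSONDecodeError as exc:
            raise ValueError(f"{path} is not valid JSON: {exc}") from exc
    return path
```

The return value can be passed straight through, for example `load_population=check_population_file(pop_file)`.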

In [3]:
est = BrushRegressor(
    functions=['SplitBest','Add','Mul','Sin','Cos','Exp','Logabs'],
    load_population=pop_file,
    max_gens=10,
    verbosity=1
)

est.fit(X,y)
y_pred = est.predict(X)
print('score:', est.score(X,y))

Loaded population from /tmp/tmpfphckt_3/population.json of size = 200
saving final population as archive...
score: 0.888055116477749


You can open the serialized file and change individuals' programs manually.
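
Before editing the file by hand, it is useful to see how it is organized. The sketch below uses only Python's standard `json` module and makes no assumption about the keys Brush writes: it reports whatever nesting it finds. The names `describe`, `outline`, and `outline_file` are illustrative, not part of pybrush.

```python
import json


def describe(value):
    """Return a short label for a parsed JSON value."""
    if isinstance(value, dict):
        return f"dict ({len(value)} keys)"
    if isinstance(value, list):
        return f"list ({len(value)} items)"
    return type(value).__name__


def outline(node, max_depth=2, depth=0):
    """Return indented lines listing dict keys, down to `max_depth` levels."""
    lines = []
    if depth >= max_depth:
        return lines
    if isinstance(node, list) and node:
        # Look inside the first element only; the remaining items are
        # assumed (not verified) to share the same layout.
        return outline(node[0], max_depth, depth)
    if isinstance(node, dict):
        for key, value in node.items():
            lines.append(f"{'  ' * depth}{key}: {describe(value)}")
            lines.extend(outline(value, max_depth, depth + 1))
    return lines


def outline_file(path, max_depth=2):
    """Parse the JSON file at `path` and return its outline."""
    with open(path, "r", encoding="utf-8") as fh:
        return outline(json.load(fh), max_depth)
```

Calling `print("\n".join(outline_file(pop_file)))` lists the top-level keys and one level below them, which is usually enough to locate the entries to edit.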

This also allows us to checkpoint a run and resume it later.
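
One way to use population files as checkpoints is to split a long run into stages, where each stage loads the file written by the previous one. The sketch below is hedged: `fit_in_stages` and `make_estimator` are hypothetical names, and it assumes an estimator accepts `save_population` and `load_population` together and treats an empty `load_population` string as "start from scratch".

```python
import os


def fit_in_stages(make_estimator, X, y, checkpoint_dir, n_stages=3):
    """Fit `n_stages` estimators in sequence, each resuming from the
    population file written by the previous stage.

    `make_estimator(save_population=..., load_population=...)` is a
    caller-supplied factory returning an unfitted estimator.
    Assumption: both parameters can be set on the same estimator, and an
    empty `load_population` means no file is loaded.
    """
    previous = ""  # the first stage starts from a fresh population
    est = None
    for stage in range(n_stages):
        current = os.path.join(checkpoint_dir, f"stage_{stage}.json")
        est = make_estimator(save_population=current, load_population=previous)
        est.fit(X, y)
        previous = current  # the next stage resumes from this checkpoint
    return est, previous
```

With Brush, the factory could be something like `lambda **kw: BrushRegressor(max_gens=10, **kw)`. Writing each stage to its own file means an interrupted run can be restarted from the last completed stage.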

## Using population files with classification

To give another example, we do a two-step fit in the cells below.

First, we run the evolution and save the population to a file; then, we load it and keep evolving the individuals.

What differs is the configuration: the first run uses the `log` scorer with the `error` and `complexity` objectives, while the second uses the `average_precision_score` scorer with the `error` and `linear_complexity` objectives.

In [4]:
from pybrush import BrushClassifier

# load data
df = pd.read_csv('../examples/datasets/d_analcatdata_aids.csv')
X = df.drop(columns='target')
y = df['target']

pop_file = os.path.join(tempfile.mkdtemp(), 'population.json')

est = BrushClassifier(
    functions=['SplitBest','Add','Mul','Sin','Cos','Exp','Logabs'],
    max_gens=10,
    objectives=["error", "complexity"],
    scorer="log",
    save_population=pop_file,
    verbosity=2
)

est.fit(X,y)
print(est.best_estimator_.get_model())

y_pred = est.predict(X)
print('score:', est.score(X,y))

Generation 1/10 [//////                                            ]
Train Loss (Med): 0.54851 (0.69315)
Val Loss (Med): 0.54851 (0.69315)
Median Size (Max): 5 (12)
Median complexity (Max): 6 (270)
Time (s): 0.03284

Generation 2/10 [///////////                                       ]
Train Loss (Med): 0.54850 (0.69315)
Val Loss (Med): 0.54850 (0.69315)
Median Size (Max): 5 (10)
Median complexity (Max): 6 (165)
Time (s): 0.05459

Generation 3/10 [////////////////                                  ]
Train Loss (Med): 0.54851 (0.69315)
Val Loss (Med): 0.54851 (0.69315)
Median Size (Max): 3 (10)
Median complexity (Max): 3 (165)
Time (s): 0.07147

Generation 4/10 [/////////////////////                             ]
Train Loss (Med): 0.54851 (0.69315)
Val Loss (Med): 0.54851 (0.69315)
Median Size (Max): 1 (10)
Median complexity (Max): 2 (54)
Time (s): 0.08754

Generation 5/10 [//////////////////////////                        ]
Train Loss (Med): 0.54846 (0.69315)
Val Loss (Med): 0.54846 (0.6

In [14]:
est = BrushClassifier(
    functions=['SplitBest','Add','Mul','Sin','Cos','Exp','Logabs'],
    # load_population=pop_file,  # uncomment to resume from the population saved above
    objectives=["error", "linear_complexity"],
    scorer="average_precision_score",
    max_gens=10,
    verbosity=2
)

est.fit(X,y)
print(est.best_estimator_.get_model())

y_pred = est.predict(X)
print('score:', est.score(X,y))

Generation 1/10 [//////                                            ]
Train Loss (Med): 0.46115 (0.31675)
Val Loss (Med): 0.46115 (0.31675)
Median Size (Max): 5 (9)
Median complexity (Max): 6 (180)
Time (s): 0.03686

Generation 2/10 [///////////                                       ]
Train Loss (Med): 0.75212 (0.31675)
Val Loss (Med): 0.75212 (0.31675)
Median Size (Max): 5 (9)
Median complexity (Max): 6 (120)
Time (s): 0.06046

Generation 3/10 [////////////////                                  ]
Train Loss (Med): 0.75212 (0.31675)
Val Loss (Med): 0.75212 (0.31675)
Median Size (Max): 4 (9)
Median complexity (Max): 2 (90)
Time (s): 0.07728

Generation 4/10 [/////////////////////                             ]
Train Loss (Med): 0.75212 (0.00000)
Val Loss (Med): 0.75212 (0.00000)
Median Size (Max): 1 (9)
Median complexity (Max): 1 (90)
Time (s): 0.09751

Generation 5/10 [//////////////////////////                        ]
Train Loss (Med): 0.75212 (0.00000)
Val Loss (Med): 0.75212 (0.00000)