# ML-MOGA User Manual

Languages: Python (3.7.9), C++ <br />
Tools: PyTorch (1.4.0+), Jupyter Notebook (6.2.0) <br />
CAM (cam.lbl.gov): data processing and training <br />
NERSC: running MOGA (both Tr-MOGA and ML-MOGA)

If you have any questions, contact yupinglu89@gmail.com.

## Tr-MOGA 

#### 1. Load modules on NERSC

In [None]:
module purge
module load PrgEnv-gnu
module load openmpi
module load gsl
module load pytorch/v1.6.0

#### 2. Set up MOGA code

Example code: nl-run/run_ori/1_moga. <br />
Compile the code in the ML_package folder by typing "make" to produce the executable.<br />
Submit scanjob.s to schedule the job on NERSC. The time limit for the regular queue on NERSC is 48 hours. For quick tests, you can instead submit to the debug queue, which has a 30-minute limit.

In [None]:
#!/bin/bash
#SBATCH --qos=regular
#SBATCH --time=48:00:00
#SBATCH --nodes=64
#SBATCH --tasks-per-node=32
#SBATCH --constraint=haswell
#SBATCH --mail-user=yupinglu89@gmail.com
#SBATCH --mail-type=ALL
#SBATCH --job-name=MOGA

cd $SLURM_SUBMIT_DIR
echo Working directory is : $SLURM_SUBMIT_DIR

echo $SLURM_JOB_NODELIST
echo $SLURM_JOBID
echo $SLURM_NPROCS

# NCPUS=`wc -l $SLURM_NODELIST| awk '{print $1}'`
# JobID=`echo ${SLURM_JOBID} | cut -f1 -d.`

NCPUS=$SLURM_NPROCS
JobID=$SLURM_JOBID

mkdir Dir_$JobID
cd Dir_$JobID
cp ../problem.cpp ./
cp ../tracking.in ./
cp ../scanjob.s ./
cp ../ALSU.cpp ./
cp ../ALSU.h ./
cp ../input_gen.dat ./

echo "Start parallel job with CPUS"
echo $NCPUS
echo " -----------------------------------------------"

#module purge
#module load PrgEnv-gnu
#module load openmpi
#module load gsl
#module load pytorch
####### Problem mode (0-DASearch, 1-FreqMap, 2-DiffMomen, 3-AreaMomen) ###########
#####  1 for read pop, 2 for read gen
EXEC="../nsga2r 0.5 3  2"

mpirun -v -np $NCPUS $EXEC <../tracking.in >$SLURM_SUBMIT_DIR/stdout_$JobID.out 2>$SLURM_SUBMIT_DIR/stderr_$JobID.out

mv ../*$JobID.out ../slurm-*.out ../Dir_$JobID/

### END of job
echo "Job complete"

#### 3. Retrieve results (gen_*_db.dat files)

These files are stored in the Dir_* folder.<br />
Delete the first line of each gen_*_db.dat file. It is the header line shown below.

     20 outputs,   11 variables,   conViol  rank  crowDist
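
The header removal can be scripted. Below is a minimal sketch, assuming the files are plain text and that the header always contains the words "outputs" and "variables" as in the line above. The function name `strip_header` is illustrative and not part of the MOGA code.

```python
from pathlib import Path


def strip_header(path):
    """Remove the first line of `path` if it looks like the header line.

    Returns True if a line was removed, False if the file was left as-is.
    Assumption: the header is recognized by the words "outputs" and
    "variables", so running this twice on the same file is harmless.
    """
    path = Path(path)
    lines = path.read_text().splitlines(keepends=True)
    if lines and "outputs" in lines[0] and "variables" in lines[0]:
        path.write_text("".join(lines[1:]))
        return True
    return False
```

For example, `for f in Path(".").glob("Dir_*/gen_*_db.dat"): strip_header(f)` would process every results folder under the current directory. Checking for the header text instead of deleting the first line unconditionally avoids dropping a data row if the script is run more than once.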

#### 4. Change random seeds

We tested the MOGA code by changing two random seeds.<br />
To change the MOGA random seed, edit the first argument of nsga2r in scanjob.s, e.g. EXEC="../nsga2r 0.1 3  2".<br />
To change the lattice error random seed, edit the srand call in problem.cpp, e.g. srand(2021);

## Machine Learning Approach

+ We first preprocess training data acquired from prior simulations and use this data to obtain two well-trained models using the neural network (NN) depicted below. 
+ We then use these two NN models to replace DA/MA particle tracking in MOGA while the rest of the MOGA setup remains the same as in the original tracking-based MOGA (Tr-MOGA). 
+ We evaluate the results.

<img src="1.png" alt="drawing" width="800"/>

8-layer fully-connected (FC) NN architecture for DA and MA prediction.
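
As a rough illustration of such a network in PyTorch, here is a minimal sketch. The input size of 11 matches the 11 variables used as inputs in the preprocessing scripts. Everything else is an assumption made for illustration: the class name `FCNet`, the hidden width of 64, the ReLU activations, the single scalar output, and reading "8-layer" as eight linear layers. The architecture actually used is the one in the figure.

```python
import torch
from torch import nn


class FCNet(nn.Module):
    """Fully connected regressor: 11 inputs -> 1 predicted value.

    Hypothetical layout: hidden width and activation are placeholders.
    """

    def __init__(self, n_inputs=11, hidden=64, n_layers=8):
        super().__init__()
        layers = []
        width_in = n_inputs
        # n_layers - 1 hidden Linear+ReLU pairs, then one linear output layer
        for _ in range(n_layers - 1):
            layers += [nn.Linear(width_in, hidden), nn.ReLU()]
            width_in = hidden
        layers.append(nn.Linear(width_in, 1))
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)


# one model per objective (DA and MA), as described above
da_model = FCNet()
ma_model = FCNet()

batch = torch.zeros(4, 11)  # 4 dummy candidates, 11 input variables each
print(da_model(batch).shape)  # torch.Size([4, 1])
```

Keeping two separate models with the same structure mirrors the description above, where one model predicts DA and the other predicts MA.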

## ML-MOGA 



#### 1. Training Data

We used the first 10 dat files as training data. These files are stored in the dat folder on CAM. Below are two Python scripts that preprocess the data, including filtering out samples that do not meet the constraints.

In [None]:
#!/usr/bin/env python
# coding: utf-8
'''
MOGA data preprocessing for dynamic aperture
pre.da.py
'''

# load libs
import numpy as np
import pandas as pd
import os
from sklearn.model_selection import train_test_split
import pickle

# load files
path = './dat/'
fs = os.listdir(path)
data = []

# read files (sorted so the concatenation order is reproducible)
for f in sorted(fs):
    tmp_df = pd.read_csv(os.path.join(path, f), header=None)
    data.append(tmp_df)
df = pd.concat(data, ignore_index=True, sort=False)

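# get X and Y
data = df.to_numpy()
# keep only rows that meet the constraints (third-from-last column equals 0)
x = data[:, -3]
data = data[x == 0]
# inputs: the 11 columns 20..30
X = data[:,20:31]
# target: mean of columns 15 and 16
Y_t = data[:,15:17]
Y = np.mean(Y_t, axis=1)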

# split data into training set and test set
x_train_o, x_test_o, y_train_o, y_test_o = train_test_split(X, Y, test_size=0.20, random_state=2020)

# standardize inputs to zero mean and unit variance using training-set statistics
x_mean = np.mean(x_train_o, axis=0)
x_std = np.std(x_train_o, axis=0)
print(x_mean)
print(x_std)

x_train = (x_train_o - x_mean) / x_std
x_test = (x_test_o - x_mean) / x_std
y_train = y_train_o
y_test = y_test_o 
print(len(y_test))

moga = {
    "x_train": x_train,
    "x_test": x_test,
    "y_train": y_train,
    "y_test": y_test,
    "x_mean": x_mean,
    "x_std": x_std,
}

# save data (the with-block closes the file once the dump completes)
with open("da.1208.pkl", "wb") as f:
    pickle.dump(moga, f)
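
The column slices in this script line up with the header shown in step 3 of Tr-MOGA (20 outputs, 11 variables, then conViol, rank, crowDist), which gives 34 columns per row. The sketch below spells out that mapping on a single made-up row using plain Python lists. The row values and variable names are illustrative, and the layout is an assumption based on that header.

```python
N_OUTPUTS, N_VARIABLES = 20, 11

# made-up row: column i simply holds the value i, with conViol set to 0
row = [float(i) for i in range(N_OUTPUTS + N_VARIABLES + 3)]
row[-3] = 0.0

outputs = row[:N_OUTPUTS]                           # columns 0..19
variables = row[N_OUTPUTS:N_OUTPUTS + N_VARIABLES]  # columns 20..30 (X in the scripts)
con_viol, rank, crow_dist = row[-3:]                # columns 31, 32, 33

feasible = con_viol == 0         # same test as data[x == 0]
da_target = sum(row[15:17]) / 2  # mean of columns 15 and 16, as in pre.da.py
ma_target = row[17]              # column 17, as in pre.ma.py

print(len(row), len(variables), feasible, da_target, ma_target)
# prints: 34 11 True 15.5 17.0
```

Under this reading, `data[:, -3]` is the constraint-violation column, so keeping rows where it equals 0 is the constraint filter mentioned above.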

In [None]:
#!/usr/bin/env python
# coding: utf-8
'''
MOGA data preprocessing for momentum aperture
pre.ma.py
'''

# load libs
import numpy as np
import pandas as pd
import os
from sklearn.model_selection import train_test_split
import pickle

# load files
path = './dat/'
fs = os.listdir(path)
data = []

# read files (sorted so the concatenation order is reproducible)
for f in sorted(fs):
    tmp_df = pd.read_csv(os.path.join(path, f), header=None)
    data.append(tmp_df)
df = pd.concat(data, ignore_index=True, sort=False)

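# get X and Y
data = df.to_numpy()
# keep only rows that meet the constraints (third-from-last column equals 0)
x = data[:, -3]
data = data[x == 0]
# inputs: the 11 columns 20..30
X = data[:,20:31]
# target: column 17
Y = data[:,17]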

# split data into training set and test set
x_train_o, x_test_o, y_train_o, y_test_o = train_test_split(X, Y, test_size=0.20, random_state=2020)

# standardize inputs to zero mean and unit variance using training-set statistics
x_mean = np.mean(x_train_o, axis=0)
x_std = np.std(x_train_o, axis=0)
print(x_mean)
print(x_std)

x_train = (x_train_o - x_mean) / x_std
x_test = (x_test_o - x_mean) / x_std
y_train = y_train_o
y_test = y_test_o 
print(len(y_test))

moga = {
    "x_train": x_train,
    "x_test": x_test,
    "y_train": y_train,
    "y_test": y_test,
    "x_mean": x_mean,
    "x_std": x_std,
}

# save data (the with-block closes the file once the dump completes)
with open("ma.1208.pkl", "wb") as f:
    pickle.dump(moga, f)
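
Both scripts standardize the inputs with the mean and standard deviation of the training split only, and they store `x_mean` and `x_std` in the pickle, presumably so the same transform can be reapplied to new inputs later. The sketch below illustrates that pattern with the standard library on made-up numbers. `fit_standardizer` and `standardize` are hypothetical helper names, not functions from the MOGA code.

```python
import statistics


def fit_standardizer(rows):
    """Return per-column (means, stds) computed from the training rows."""
    columns = list(zip(*rows))
    means = [statistics.mean(c) for c in columns]
    # pstdev is the population standard deviation, matching np.std's default
    stds = [statistics.pstdev(c) for c in columns]
    return means, stds


def standardize(rows, means, stds):
    """Apply (x - mean) / std column by column."""
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


train = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]  # made-up training inputs
test = [[2.0, 40.0]]                             # made-up held-out input

means, stds = fit_standardizer(train)
train_std = standardize(train, means, stds)
test_std = standardize(test, means, stds)  # reuses the training statistics
print(test_std)
```

Fitting the statistics on the training split alone keeps information from the test split out of the transform. A column that is constant in the training data would have a standard deviation of 0 and would need special handling before the division.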