# Introduction

The Python programs are stored in [PyCode](./PyCode/) and the training datasets in [DATA/Training_data](./DATA/Training_data/). Training results are saved in [DATA/Results_data](./DATA/Results_data/). Some training programs run for a long time, so their results are saved to file and reused by separate tasks such as comparing or plotting.

Inside [PyCode](./PyCode/), programs fall into two categories: (1) programs that perform major tasks, such as training the different models or plotting results; (2) programs that provide functionality called by the major-task programs. Major-task programs can be run from a terminal in the project root directory "./project/", for example:
```
python3 ./PyCode/particle_clustering.py
```

Here is the list of programs within each category and a brief description of their use:

- Major task category:
    1. [particle_clustering.py](./PyCode/particle_clustering.py) - draws the [propensity map](../README.md#fig1).
    2. [GaussianLH_Panelty_RidgeLasso_MAP.py](./PyCode/GaussianLH_Panelty_RidgeLasso_MAP.py) - trains the MAP linear models and plots the results.
    3. [GaussianLH_Panelty_Ridge_Bayes.py](./PyCode/GaussianLH_Panelty_Ridge_Bayes.py) - trains the Bayesian linear model and plots the results.
    4. [DeepLearning_MLP.py](./PyCode/DeepLearning_MLP.py) - trains a multi-layer perceptron (MLP) neural network with a customisable architecture and plots information about the training process.
    5. [Compare_Pearson.py](./PyCode/Compare_Pearson.py) - compares the performance of all trained models whose training results are available in [DATA/Results_data](./DATA/Results_data/), in terms of the Pearson coefficient evaluated on randomly retrieved training data.
- Supporting functionality category:
    1. [data_read.py](./PyCode/data_read.py) - reads and pre-cleans the raw data; reads and writes training results.
    2. [data_prep_crossvalid.py](./PyCode/data_prep_crossvalid.py) - prepares data for cross-validation, i.e. random trials each containing a training set and a validation set.
    3. [simpleLinearReg.py](./PyCode/simpleLinearReg.py) - core functionality for linear regression.
    4. [DeepLearning_Functionalities.py](./PyCode/DeepLearning_Functionalities.py) - core functionality for deep learning, as the name suggests.
    5. [plot_tools.py](./PyCode/plot_tools.py) - plotting tools, as you might guess.
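To make the idea of "random trials" concrete, here is a minimal sketch of how such trials could be built: each trial is an independent random shuffle of the data indices, split into a training part and a validation part. The function `make_random_trials` and its `valid_fraction` and `seed` parameters are hypothetical and are not taken from [data_prep_crossvalid.py](./PyCode/data_prep_crossvalid.py).

```python
import numpy as np

def make_random_trials(n_samples, n_trials, valid_fraction=0.2, seed=0):
    """Return a list of (train_idx, valid_idx) index pairs, one per random trial.

    Hypothetical helper for illustration only.
    """
    rng = np.random.default_rng(seed)
    n_valid = int(round(valid_fraction * n_samples))
    trials = []
    for _ in range(n_trials):
        perm = rng.permutation(n_samples)  # fresh random shuffle for each trial
        # First n_valid shuffled indices form the validation set, the rest the training set.
        trials.append((perm[n_valid:], perm[:n_valid]))
    return trials

trials = make_random_trials(n_samples=100, n_trials=5, valid_fraction=0.2)
train_idx, valid_idx = trials[0]
print(len(train_idx), len(valid_idx))  # 80 20
```

Each trial's indices can then be used to slice the feature matrix and the targets, so that every trial trains on one subset and validates on the disjoint remainder.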

> [!CAUTION]
> When running the model training scripts [GaussianLH_Panelty_RidgeLasso_MAP.py](./PyCode/GaussianLH_Panelty_RidgeLasso_MAP.py), [GaussianLH_Panelty_Ridge_Bayes.py](./PyCode/GaussianLH_Panelty_Ridge_Bayes.py), and [DeepLearning_MLP.py](./PyCode/DeepLearning_MLP.py), comment out the lines that save results to file unless you intend to overwrite the existing results. No overwrite protection is implemented. File-saving instructions usually involve the function "save_to_file_a_dictrionary".

# Dataset \& Data Cleaning

The raw datasets are placed in [./DATA/Training_data/](./DATA/Training_data/). Three datasets are provided, corresponding to three different physical conditions. All results shown here were obtained by training on 'Cnf2.xy'. The dataset used for training can be selected within each of the programs discussed below.

In each program, loading a raw dataset and cleaning it for training are implemented in [data_read.py](./PyCode/data_read.py). Cleaning the raw data means ignoring input features that vary little or not at all from one data point to another: a feature is dropped if its standard deviation across data points falls below a custom threshold.
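The cleaning criterion can be sketched with NumPy as follows, assuming the features are stored column-wise in a 2-D array (one row per data point). The function name `drop_low_variance_features` and the threshold value are hypothetical; the actual implementation and criterion live in [data_read.py](./PyCode/data_read.py).

```python
import numpy as np

def drop_low_variance_features(X, std_threshold):
    """Keep only the columns of X whose standard deviation across rows is >= std_threshold.

    Hypothetical helper for illustration; returns the cleaned array and the boolean column mask.
    """
    stds = X.std(axis=0)          # per-feature standard deviation over data points
    keep = stds >= std_threshold  # True for features that vary enough to be kept
    return X[:, keep], keep

X = np.array([[1.0, 5.0, 0.0],
              [2.0, 5.0, 0.001],
              [3.0, 5.0, 0.0]])
X_clean, kept = drop_low_variance_features(X, std_threshold=1e-2)
print(kept)  # [ True False False]
```

Here the second feature is constant and the third barely varies, so only the first one survives. Returning the mask makes it easy to apply the same selection to other arrays built from the same features.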

# Draw the propensity map

To get an intuition of what the propensities look like for a given configuration, run the code in the following box. In the source code, you can also uncomment lines to cluster particles based on their positions and propensities.

```
%run PyCode/particle_clustering.py
```

# MAP Linear Models Training \& Plotting Training Results

## Training
- To perform MAP-Ridge, MAP-Lasso, and MAP-Debias on the same dataset and save the training results to file, run in a terminal:
```
python3 PyCode/GaussianLH_Panelty_RidgeLasso_MAP.py --mode='t'
```
It will generate training results and save them in [DATA/Results_data](./DATA/Results_data/). 

- Training parameters can be tuned in the function "main_MAP_RidgeLassoDebias_SaveToFile".

## Plotting from saved files

- To plot the MAP linear regression results saved after training, run
```
%run PyCode/GaussianLH_Panelty_RidgeLasso_MAP.py
```
in an interactive Python console such as IPython. The figures are displayed but not saved to file.

- Run the following code box to generate the same figures as [Fig2](../README.md#fig2) in [README.md](../README.md).

```
%run PyCode/GaussianLH_Panelty_RidgeLasso_MAP.py
```

# Bayesian-Ridge

Run the code box at the end of this section to perform Bayesian-ridge training. It saves the training results to file and plots them directly, generating Fig.3 and Fig.5 in [README.md](../README.md).

Both MAP-ridge and Bayes-ridge invert a square matrix built from the observation matrix to train the model. MAP-ridge uses a standard library routine, "numpy.linalg.inv", which works well for the dataset and the penalty values studied.
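For reference, with a Gaussian likelihood and a zero-mean Gaussian prior on the weights, the MAP-ridge weights have the closed form $w = (X^\top X + \lambda I)^{-1} X^\top y$, so the matrix being inverted is $X^\top X + \lambda I$. The sketch below assumes centred data with no intercept term and folds the noise-to-prior variance ratio into the penalty $\lambda$. The function name `map_ridge_inv` and the synthetic data are hypothetical, not the code in [simpleLinearReg.py](./PyCode/simpleLinearReg.py).

```python
import numpy as np

def map_ridge_inv(X, y, lam):
    """MAP ridge weights w = (X^T X + lam * I)^{-1} X^T y, via an explicit matrix inverse."""
    n_features = X.shape[1]
    A = X.T @ X + lam * np.eye(n_features)  # regularised normal-equation matrix
    return np.linalg.inv(A) @ (X.T @ y)

# Synthetic example: recover known weights from slightly noisy observations.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
w_true = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = X @ w_true + 0.01 * rng.normal(size=200)
w_map = map_ridge_inv(X, y, lam=1e-3)
print(np.round(w_map, 2))
```

Calling `np.linalg.solve(A, X.T @ y)` gives the same weights without forming the inverse explicitly, which is usually the more robust choice when only $w$ is needed.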

In Bayes-ridge, standard routines such as "numpy.linalg.inv" may cause numerical instabilities because the matrix is large and some of its entries have large values. Instead, we first compute the eigenvalues and eigenvectors of the matrix with "numpy.linalg.eigh", then build the inverse from the eigenvectors and the reciprocals of the eigenvalues. The related code is in [simpleLinearReg.py](./PyCode/simpleLinearReg.py).
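The eigendecomposition route can be sketched as follows. For a symmetric positive-definite matrix $A = V \,\mathrm{diag}(\lambda_k)\, V^\top$ with orthonormal $V$, the inverse is $A^{-1} = V \,\mathrm{diag}(1/\lambda_k)\, V^\top$. The function name `inverse_via_eigh` and the test matrix are hypothetical; this is an illustration of the technique described above, not the code in [simpleLinearReg.py](./PyCode/simpleLinearReg.py).

```python
import numpy as np

def inverse_via_eigh(A):
    """Invert a symmetric positive-definite matrix as V diag(1/eigvals) V^T (hypothetical helper)."""
    eigvals, eigvecs = np.linalg.eigh(A)  # A = V diag(eigvals) V^T with orthonormal V
    if eigvals.min() <= 0.0:
        raise np.linalg.LinAlgError("matrix is not positive definite")
    # Dividing by eigvals broadcasts over columns: column k of V is scaled by 1/eigvals[k].
    return (eigvecs / eigvals) @ eigvecs.T

# Example on a ridge-type matrix X^T X + lam * I, which is symmetric positive definite for lam > 0.
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 8))
lam = 0.5
A = X.T @ X + lam * np.eye(8)
A_inv = inverse_via_eigh(A)
print(np.allclose(A @ A_inv, np.eye(8)))  # True
```

Working with the eigenvalues also makes it easy to inspect the conditioning of the matrix (the ratio of the largest to the smallest eigenvalue) before trusting the inverse.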

```
%run PyCode/GaussianLH_Panelty_Ridge_Bayes.py
```

# Compare weights from all MAP treatments

Run the following code box to generate Fig.4 in [README.md](../README.md).

```
%run PyCode/plot_tools.py
Plots = Plot_LinearRegressionResult_FromFile()
```

# Multi-Layer Perceptrons (MLP)

- To train an MLP model and plot the error monitor (Fig.6 of [README.md](../README.md)), run "python3 PyCode/DeepLearning_MLP.py" in a terminal or "%run PyCode/DeepLearning_MLP.py" in a notebook environment. The architecture is specified in the main function of the script [DeepLearning_MLP.py](./PyCode/DeepLearning_MLP.py).

- Run the code box right below to generate Fig.7 and Fig.8 of [README.md](../README.md).

```
%run PyCode/Compare_Pearson.py
```
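The comparison metric itself can be sketched as the Pearson correlation coefficient between true and predicted propensities, evaluated on a randomly drawn subset of the data. The helper names `pearson` and `pearson_on_random_subset`, the subset size, and the synthetic arrays are hypothetical; [Compare_Pearson.py](./PyCode/Compare_Pearson.py) may draw its samples differently.

```python
import numpy as np

def pearson(y_true, y_pred):
    """Pearson correlation coefficient between two 1-D arrays."""
    yt = y_true - y_true.mean()  # centre both arrays
    yp = y_pred - y_pred.mean()
    return float(yt @ yp / np.sqrt((yt @ yt) * (yp @ yp)))

def pearson_on_random_subset(y_true, y_pred, subset_size, seed=0):
    """Pearson coefficient on a random subset of indices drawn without replacement."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(y_true), size=subset_size, replace=False)
    return pearson(y_true[idx], y_pred[idx])

# A perfectly linear relation gives a coefficient of (numerically) 1.
y_true = np.linspace(0.0, 1.0, 1000)
y_pred = 2.0 * y_true + 1.0
print(pearson_on_random_subset(y_true, y_pred, subset_size=100))
```

Because the coefficient is invariant to shifting and positively rescaling the predictions, it measures how well a model ranks and tracks the propensities rather than their absolute scale.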