# This is a master README-style notebook which provides a description of the workflow  

#### I have placed all the necessary smaller files for the project into two folders, which are submitted along with the other documentation.


 ./ising_2D/

and

 ./simple_lattice_CNN/
  

#### Larger files are placed on Google Drive, which you can access with the following link:


https://drive.google.com/file/d/12jguld3wbphle0StAgbiqBbUt4YPZeHZ/view?usp=share_link

All of the Jupyter notebooks and Python code for the workflows are in

./simple_lattice_CNN/
 
and all the Fortran-related code and pre-compiled Linux executables are in the

./ising_2D/ folder.

Some files are stored as tarballs.

To train the CNN, training data must first be created. This is done with an Ising-like swap Monte Carlo simulation, written in Fortran, which creates the lattice representation; a second Fortran program then performs the Fourier transform to give the simulated X-ray image. Both programs are set up in the ./ising_2D/ folder.
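
As a rough illustration of the two steps described above, here is a minimal NumPy sketch of a swap Monte Carlo on a binary lattice followed by a Fourier transform. It is not the Fortran code from ./ising_2D/; the function names, the nearest-neighbour energy, and all parameter values are assumptions chosen for illustration only.

```python
import numpy as np


def swap_monte_carlo(n=32, conc=0.5, j_nn=1.0, temperature=1.0, cycles=20, seed=0):
    """Toy swap Monte Carlo on an n x n periodic binary lattice (illustrative only)."""
    rng = np.random.default_rng(seed)
    # Fixed composition: round(conc * n * n) sites hold +1, the rest hold -1.
    spins = -np.ones(n * n, dtype=int)
    spins[: int(round(conc * n * n))] = 1
    rng.shuffle(spins)
    spins = spins.reshape(n, n)

    def local_energy(i, j):
        # Nearest-neighbour energy of site (i, j) with periodic boundaries.
        # With j_nn > 0, unlike neighbours are energetically favoured.
        nb = (spins[(i + 1) % n, j] + spins[(i - 1) % n, j]
              + spins[i, (j + 1) % n] + spins[i, (j - 1) % n])
        return j_nn * spins[i, j] * nb

    for _ in range(cycles * n * n):
        i1, j1, i2, j2 = rng.integers(0, n, size=4)
        if spins[i1, j1] == spins[i2, j2]:
            continue  # swapping identical species changes nothing
        e_old = local_energy(i1, j1) + local_energy(i2, j2)
        spins[i1, j1], spins[i2, j2] = spins[i2, j2], spins[i1, j1]
        d_e = local_energy(i1, j1) + local_energy(i2, j2) - e_old
        # Metropolis rule: keep downhill swaps, keep uphill swaps with Boltzmann probability.
        if d_e > 0 and rng.random() >= np.exp(-d_e / temperature):
            spins[i1, j1], spins[i2, j2] = spins[i2, j2], spins[i1, j1]  # rejected: swap back
    return spins


def diffuse_intensity(spins):
    # Subtract the mean occupancy so the peak at the origin vanishes,
    # then take |FT|^2 and shift the zero frequency to the image centre.
    fluct = spins - spins.mean()
    return np.fft.fftshift(np.abs(np.fft.fft2(fluct)) ** 2)


lattice = swap_monte_carlo(n=16, cycles=5)
image = diffuse_intensity(lattice)
```

Because atoms are only swapped, never flipped, the composition stays fixed throughout the run, which is the point of using a swap move rather than a single-site flip.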

#### There are 4 main stages (outlined below), each accessible through a relevant notebook. For the first two stages there are multiple variant .ipynb files to choose from, depending on the flavor of CFS you're working on and on other settings.

## Stage 1. Constructing the set of training data with a particular CFS encoding 

- generate_random_datapoints.ipynb (CFS2)
- Generate_data_cfs3.ipynb (CFS3)
- Generate_data_cfs4.ipynb (CFS4)


## Stage 2. Build and train CNN 

- CNN_test_5000.ipynb (CFS2)
- CNN_cfs3.ipynb (CFS3)
- CNN_cfs4.ipynb (CFS4)
- CNN_cfs4_big.ipynb  (CFS4)
- CNN_cfs4_big_dataX13.ipynb   (CFS4)

## Stage 3. EDA and preparation of observed data and other tests

- ising_regen_test.ipynb            

## Stage 4. Evaluation of the CNN

- predict_cfs_with_CNN.ipynb




Herein I provide a brief overview of the workings of each stage, but I also recommend going through the relevant notebook of your choice.
Be warned in advance that the workflows and cells are not designed to be run in a “run all cells” fashion, which is the other reason this README is made available. If you have any questions or issues, please do not hesitate to contact me.





# Detailed explanation of Stage 1. 
## Please refer to generate_random_datapoints.ipynb (CFS2)



After loading the modules, we can check that everything is set up correctly and run the function calls to
calc_diffuse() and store_occ_map_as_seq().

These functions are stored in 

 ./simple_lattice_CNN/latt2D_modules.py

Please take a look at the code to understand how to customize your setup so that you can run these functions. The Fortran routines are available in the …/ising_2D/ folder. The ZMC and DZMC programs were compiled statically and will run on x86_64 GNU/Linux.
Further down in the notebook is the cell that generates the data. As each data point is generated and saved as a separate .bin file, the corresponding CFS vector is stored in a pandas DataFrame, which is saved as a .csv file once data generation is complete.
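
The bookkeeping pattern described above can be sketched as follows. This is only an assumed layout, not the notebook's actual cell: the file names, the column names, the float32 format, and the random stand-ins for the CFS vector and the simulated image are all hypothetical.

```python
import tempfile
from pathlib import Path

import numpy as np
import pandas as pd


def generate_dataset(out_dir, n_samples=5, n_cfs=2, image_shape=(16, 16), seed=0):
    """Write one .bin file per sample and one .csv table of CFS vectors (illustrative)."""
    rng = np.random.default_rng(seed)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    records = []
    for idx in range(n_samples):
        cfs = rng.uniform(-1.0, 1.0, size=n_cfs)             # stand-in for a random CFS vector
        image = rng.random(image_shape).astype(np.float32)   # stand-in for the simulated image
        fname = f"sample_{idx:05d}.bin"
        image.tofile(out_dir / fname)                        # raw float32 values, no header
        records.append({"file": fname, **{f"cfs_{k}": v for k, v in enumerate(cfs)}})
    table = pd.DataFrame(records)
    table.to_csv(out_dir / "cfs_vectors.csv", index=False)   # saved once, at the end
    return table


with tempfile.TemporaryDirectory() as tmp:
    table = generate_dataset(tmp)
    # A raw .bin file carries no shape information, so the shape must be known when reading.
    first = np.fromfile(Path(tmp) / table.loc[0, "file"], dtype=np.float32).reshape(16, 16)
    n_bin_files = len(list(Path(tmp).glob("*.bin")))
```

Keeping the file name next to each CFS vector in the table means the images and their labels can always be re-paired, even if the .bin files are later moved or packed into a single archive.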

Following this, we save the entire collection of data as one big .h5 file.

The other generate_data notebooks follow the same pattern.


# Detailed explanation of Stage 2. 

## Please refer to CNN_cfs4_big.ipynb (CFS4)

After importing the modules, the data is loaded into NumPy arrays or DataFrames. The initial EDA of the CFS variable distributions and of the training data is also performed here.

There are several functions for constructing the CNN models, made available in the CFS_CNN_models library.


The line of code
model_sm1=construct_new_small_cfs_model(15)
will instantiate a small-architecture model with an output size of 15 and assign it to the variable model_sm1.

The CNN is then trained using

history_sm1 =model_sm1.fit(X_train1,y_train1,batch_size=batch_size, epochs=epochs, validation_split=0.2,callbacks=[checkpoint])
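
The call above relies on batch_size, epochs, and checkpoint being defined earlier in the notebook. The sketch below shows one plausible way to set them up with Keras; the tiny model, the random data, the checkpoint file name, and the monitored quantity are all assumptions, and the real architectures live in CFS_CNN_models.

```python
import os
import tempfile

import numpy as np
from tensorflow import keras

batch_size = 8
epochs = 2

# Hypothetical stand-in data: 32 single-channel 16 x 16 images with 15 targets each.
rng = np.random.default_rng(0)
X_train = rng.random((32, 16, 16, 1)).astype("float32")
y_train = rng.uniform(-1.0, 1.0, (32, 15)).astype("float32")

# Hypothetical tiny regression CNN with an output size of 15.
model = keras.Sequential([
    keras.layers.Input(shape=(16, 16, 1)),
    keras.layers.Conv2D(4, 3, activation="relu"),
    keras.layers.GlobalAveragePooling2D(),
    keras.layers.Dense(15),
])
model.compile(optimizer="adam", loss="mse")

# Save the model whenever the validation loss improves.
ckpt_path = os.path.join(tempfile.mkdtemp(), "best_model.h5")
checkpoint = keras.callbacks.ModelCheckpoint(
    ckpt_path, monitor="val_loss", save_best_only=True)

history = model.fit(X_train, y_train, batch_size=batch_size, epochs=epochs,
                    validation_split=0.2, callbacks=[checkpoint], verbose=0)
```

With validation_split=0.2, Keras holds out the last 20% of the training arrays for validation, so val_loss is available to the checkpoint callback after every epoch.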

To plot the validation curve and metrics, we can run the function model_evaluate_and_plot(model_sm1,history_sm1,X_test1,y_test1), which is in the
aux_functions library.

To do some preliminary testing we use the functions:
regenerate_test_cfs_vector_and_compare(calc_diffuse_cfs4_big,X_test3a,y_test3a,xvar,cfs=4)
and 
regenerate_pred_cfs_vector_and_compare(calc_diffuse_cfs4_big,X_test3a,y_test3a,y_pred3a,xvar,cfs=4)


# Detailed explanation of Stage 3. 

## Please refer to ising_regen_test.ipynb

Essentially, we load the data and run many simulations, keeping track of the error metrics.
The other tests and plots presented here are reasonably self-explanatory if you go through the notebook.
As long as everything is set up correctly, the notebook should run from start to finish, but keep your fingers crossed the whole time.
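
To give a feel for what tracking error metrics over many simulations can look like, here is a small NumPy sketch. The choice of metrics (RMSE and Pearson correlation), the function names, and the noisy stand-in for a regenerated image are assumptions; the notebook's own metrics may differ.

```python
import numpy as np


def image_errors(reference, regenerated):
    """Return (RMSE, Pearson correlation) between two images of equal shape."""
    ref = np.asarray(reference, dtype=float).ravel()
    reg = np.asarray(regenerated, dtype=float).ravel()
    rmse = float(np.sqrt(np.mean((ref - reg) ** 2)))
    corr = float(np.corrcoef(ref, reg)[0, 1])
    return rmse, corr


def noisy_simulation(reference, rng, noise=0.1):
    # Hypothetical stand-in for one stochastic Monte Carlo + FT regeneration.
    return reference + rng.normal(0.0, noise, size=reference.shape)


rng = np.random.default_rng(0)
reference = rng.random((16, 16))

# Repeat the stochastic simulation and record both metrics for every run.
results = np.array([image_errors(reference, noisy_simulation(reference, rng))
                    for _ in range(20)])
mean_rmse, mean_corr = results.mean(axis=0)
spread_rmse, spread_corr = results.std(axis=0)
```

Because each Monte Carlo run is stochastic, the spread across runs is as informative as the mean: it indicates how much of the mismatch is simulation noise rather than a wrong encoding.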


# Detailed explanation of Stage 4.

## Please refer to predict_cfs_with_CNN.ipynb

After loading the modules, several helper functions need to be loaded.
These are:

#### process_image_and_plot()
#### smooth_compress()
#### transform_log_obs()

These are used to make corrections to the observed data before it is fed into the CNN. The idea is to adjust the observed data so that the CNN is more effective. It is difficult to know exactly which correction parameters should be used, so the notebook is set up so that one can load the pre-trained models and then make on-the-fly adjustments to the observed data before interpreting it with the model.


Models are loaded using a function from our preloaded module library,
e.g.
#### cfs4_model_smX13=reconstruct_small_cfs_model(15,'cfs4_sm_X13_incremental.h5')

The observed data is loaded from the .tif format and then converted to an np.array.

The function call

#### df_res,img_fix=process_image_and_plot(img_dcdnb_hk0_box1,threshold = 1.66, gamma = 20.0, maskoutbragg=False, ut=-2.218, mstd=1.5)

will make the corrections to the image and return a DataFrame with some statistics on the processing, together with the corrected image. It will also plot the before- and after-processing histograms.
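
For readers who want a feel for this kind of correction, here is a generic NumPy sketch that caps very bright pixels, applies a log transform, and standardises the result. It is not the implementation of process_image_and_plot(); the function name, the percentile cap, and the returned statistics are assumptions, and it deliberately does not try to reproduce the meaning of threshold, gamma, ut, or mstd.

```python
import numpy as np


def simple_correction(image, upper_percentile=99.0):
    """Cap bright outliers, log-transform, and standardise an image (illustrative only)."""
    img = np.asarray(image, dtype=float)
    cap = np.percentile(img, upper_percentile)
    clipped = np.clip(img, 0.0, cap)      # remove negatives and cap very bright pixels
    logged = np.log1p(clipped)            # compress the dynamic range
    fixed = (logged - logged.mean()) / logged.std()   # assumes a non-constant image
    stats = {
        "cap": float(cap),
        "mean_before": float(img.mean()),
        "std_before": float(img.std()),
        "mean_after": float(fixed.mean()),
        "std_after": float(fixed.std()),
    }
    return stats, fixed


rng = np.random.default_rng(0)
raw = rng.random((32, 32)) * 100.0
raw[5, 5] = 1.0e6                          # one very bright, Bragg-like pixel
stats, corrected = simple_correction(raw)
```

Comparing the before and after statistics, or the two histograms, is a quick way to judge whether a given set of correction parameters brings the observed image closer to the range of the simulated training images.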

To get an interpretation, we simply run the following code in a cell:

#### test_img=np.expand_dims(img_fix, axis=(0,-1))
#### corrin=cfs2_model_sm.predict(test_img)[0]
#### predict_and_regen_plot(calc_diffuse,test_img,corrin,cfs=2,iconc=0.5,icycles=200,ianneal=200)
  

The first line reshapes the image for prediction by the model. The second line produces the encoding predicted by the CNN from the input image. The last line takes a function that runs the decoding part of the prediction, namely the statistical Monte Carlo model followed by a Fourier transform (FT).
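
The reshaping in the first two lines can be checked in isolation with NumPy. The 64 x 64 image size and the zero-filled stand-in for the model output are assumptions made only so the snippet runs without a trained model.

```python
import numpy as np

# Hypothetical corrected image; the real one comes from process_image_and_plot().
img_fix = np.zeros((64, 64), dtype=np.float32)

# Add a batch axis in front and a channel axis at the end.
test_img = np.expand_dims(img_fix, axis=(0, -1))
print(test_img.shape)  # (1, 64, 64, 1)

# Stand-in for model.predict(test_img): one row per image in the batch.
fake_prediction = np.zeros((1, 15), dtype=np.float32)
corrin = fake_prediction[0]  # [0] selects the prediction for the single image
print(corrin.shape)  # (15,)
```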
