# Getting started - Reproduce the results
___

## Check the paths/folders hierarchy

The home path should not be an issue, since the scripts use a universal home path. However, the folder hierarchy may need to be adapted.

My main path is `/home/Cosmostat/Codes/BlendHunter`.

The `/BlendHunter` folder is where I cloned the Git repository: https://github.com/ablacan/BlendHunter.git

After cloning, you should have three folders, `/blendhunter`, `/notebooks` and `/sextractor`, plus this `getting_started` file and a `folders_creation.py` script.

`/blendhunter` contains all the scripts for data preparation, the train-validation split, the network and some tests.

`/sextractor` contains Axel's script to run SExtractor, as well as the scripts to run it on padded or non-padded images.

`/notebooks` contains the result visualizations, plots, visualizations of the extracted features, etc.

_________________________________________________________________________________________

### STEP 0

Run the `folders_creation.py` script to generate all the required folders, both for padded and non-padded images.

Also, place the true labels, the ellipticity components e1 and e2, and the parameters x and y extracted from the simulations in `/BlendHunter`.
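For reference, here is a minimal sketch of creating the shared folders, assuming the names used in this guide (`bh_results`, `bh_pad_results`, `sep_results`, `sep_pad_results`, `pretrained_weights`, `pretrained_weights_pad`). It is not the actual `folders_creation.py`. `create_shared_folders` is a hypothetical helper, and the per-noise `bh_*` folders are left out.

```python
from pathlib import Path

# Shared folders mentioned in this guide (assumed list; the real
# folders_creation.py may create more, e.g. the per-noise bh_* folders).
SHARED_FOLDERS = [
    "bh_results", "bh_pad_results",
    "sep_results", "sep_pad_results",
    "pretrained_weights", "pretrained_weights_pad",
]


def create_shared_folders(base: Path) -> list:
    """Create each shared folder under `base`, skipping existing ones."""
    created = []
    for name in SHARED_FOLDERS:
        path = base / name
        path.mkdir(parents=True, exist_ok=True)  # no error if it already exists
        created.append(path)
    return created
```

For example, `create_shared_folders(Path("/home/Cosmostat/Codes/BlendHunter"))`. Because of `exist_ok=True`, the call is safe to re-run.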
___

## 1. Data preparation
___

The first step is to generate all the datasets needed for training (6 noise levels with 5 noise realisations each), as well as the datasets needed to test the pre-trained weights on other noise levels.

All simulations can be found in `/axel_sims/larger_dataset`.


### 1 Generate 35 folders for training and testing

The `folders_creation.py` script (STEP 0) should have created 35 folders in `/BlendHunter`, named `/bh_sigma+nbreal` (or `/bh_pad(sigma+nbreal)` for padded images), where `sigma` is the noise standard deviation and `nbreal` is the number of the noise realisation.

The noise standard deviations used were `5.0`, `14.0`, `18.0`, `26.0`, `35.0` and `40.0`.

Each noise level has 5 noise realisations, numbered `''` (empty), `1`, `2`, `3` and `4`.

These folders (e.g. `/bh_5`, `/bh_51`, `/bh_52`, etc.) must exist before running the script below.
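The naming convention can be sketched as follows. `folder_name` and `all_folder_names` are hypothetical helpers. The sketch assumes sigma is written without its decimal part (as in `/bh_5`) and that padded folders use the `bh_pad` prefix. It only enumerates the six noise levels listed above.

```python
SIGMAS = [5.0, 14.0, 18.0, 26.0, 35.0, 40.0]
REALISATIONS = ["", "1", "2", "3", "4"]  # '' is the first realisation


def folder_name(sigma: float, nbreal: str, padded: bool = False) -> str:
    """Build a name such as 'bh_51' or 'bh_pad51' (assumed convention)."""
    prefix = "bh_pad" if padded else "bh_"
    return f"{prefix}{int(sigma)}{nbreal}"


def all_folder_names(padded: bool = False) -> list:
    """All names for the listed noise levels, realisation by realisation."""
    return [folder_name(s, r, padded) for s in SIGMAS for r in REALISATIONS]
```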
_______________________________________________________
### 2 Generate the train-valid-test datasets

The script `/blendhunter/prep_data_loop.py` will generate the train-validation-test datasets for each noise level and each noise realisation.
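As an illustration of a train-validation-test split, here is a minimal sketch based on shuffled indices. `split_indices` is hypothetical, and the default fractions (60/20/20) are placeholders, not the values used by `prep_data_loop.py`.

```python
import numpy as np


def split_indices(n_samples, train_frac=0.6, valid_frac=0.2, seed=0):
    """Shuffle sample indices and split them into train/valid/test subsets.

    The fractions are placeholders, not the values of prep_data_loop.py.
    """
    if train_frac + valid_frac >= 1.0:
        raise ValueError("train_frac + valid_frac must leave room for a test set")
    rng = np.random.default_rng(seed)  # fixed seed for a reproducible split
    idx = rng.permutation(n_samples)
    n_train = int(train_frac * n_samples)
    n_valid = int(valid_frac * n_samples)
    return idx[:n_train], idx[n_train:n_train + n_valid], idx[n_train + n_valid:]
```

The returned index arrays can then select images and labels, e.g. `images[train_idx]`.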

Currently, the script generates padded (7x7) images, but it can also generate the 35 non-padded datasets, as explained in the script's function.

It also saves the test images as NumPy arrays (`/blended_noisy(sigma+nbreal).npy` and `/not_blended_noisy(sigma+nbreal).npy`) so that SExtractor can be run on them.
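A minimal sketch of saving and reloading these test images with NumPy, assuming `suffix` is the `sigma+nbreal` string (e.g. `"51"`) and that both files live in the same folder. The helper names and the target folder are assumptions.

```python
from pathlib import Path

import numpy as np


def save_test_images(folder: Path, suffix: str, blended, not_blended) -> None:
    """Write blended_noisy<suffix>.npy and not_blended_noisy<suffix>.npy."""
    np.save(folder / f"blended_noisy{suffix}.npy", np.asarray(blended))
    np.save(folder / f"not_blended_noisy{suffix}.npy", np.asarray(not_blended))


def load_test_images(folder: Path, suffix: str):
    """Read back the (blended, not_blended) arrays for one noise realisation."""
    blended = np.load(folder / f"blended_noisy{suffix}.npy")
    not_blended = np.load(folder / f"not_blended_noisy{suffix}.npy")
    return blended, not_blended
```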

## 2. Run the network
___

### 1 Generate the results folders

STEP 0 should have created the `/bh_results` folder (and `/bh_pad_results` for padded images) in `/BlendHunter` to store the test results.

### 2 Add the `/weights` folders
For each noise realisation, a `weights` folder has to be added as `/BlendHunter/bh_sigma+nbreal/weights`. It stores that realisation's top weights, fine-tuning weights, bottleneck features, etc.
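One way to add the `weights` folders is sketched below. `add_weights_folders` is a hypothetical helper that assumes the per-noise folders are named `bh_` or `bh_pad` followed only by digits, so `/bh_results` and `/bh_pad_results` are skipped. The same pattern would also match the `/bh_sigma` folders of section 3.

```python
import re
from pathlib import Path

# Assumed naming: bh_<sigma><nbreal> or bh_pad<sigma><nbreal>, digits only.
DATA_FOLDER = re.compile(r"bh_(pad)?\d+")


def add_weights_folders(base: Path) -> list:
    """Create a `weights` subfolder in every per-noise bh_* folder of `base`."""
    created = []
    for folder in sorted(base.iterdir()):
        # fullmatch rejects names such as bh_results or bh_pad_results
        if folder.is_dir() and DATA_FOLDER.fullmatch(folder.name):
            weights = folder / "weights"
            weights.mkdir(exist_ok=True)
            created.append(weights)
    return created
```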


### 3 Run BH

The scripts `/blendhunter/run_bh_loop.py` and `/run_bh_loop_pad.py` train and test the network on all the datasets, for non-padded and padded images respectively.

The results are saved in `/bh_results` for non-padded images and as `/bh_pad_results/preds_pad(sigma+nbreal).npy` for padded images.

The training history can also be saved if needed.
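To gather the padded results afterwards, a minimal sketch could load every `preds_pad(sigma+nbreal).npy` file into a dictionary keyed by the `sigma+nbreal` suffix. The helper name is hypothetical, and since the content of the prediction arrays is not described here, the sketch loads them as-is.

```python
import re
from pathlib import Path

import numpy as np

PRED_FILE = re.compile(r"preds_pad(\d+)\.npy")


def load_padded_predictions(results_dir: Path) -> dict:
    """Map each '<sigma><nbreal>' suffix to its saved predictions array."""
    preds = {}
    for path in sorted(results_dir.glob("preds_pad*.npy")):
        match = PRED_FILE.fullmatch(path.name)
        if match:
            # allow_pickle covers object arrays; only use it on files you created
            preds[match.group(1)] = np.load(path, allow_pickle=True)
    return preds
```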


## 3. Test the pre-trained weights on other noise levels
___

### 1 Generate 14 other folders
STEP 0 should have created 14 new folders named `/bh_sigma` for non-padded images, and 14 more folders for padded images.

I created folders for the following noise standard deviations: `3, 7, 10, 12, 16, 20, 22, 24, 28, 30, 32, 37, 42, 44`.

### 2 Generate the datasets and the pre-trained weights folders

To generate the datasets, run the scripts `/blendhunter/prep_shift_noise.py` and `/prep_shift_noise_pad.py`.

It is possible to generate only the test images, but this requires modifying the `/blendhunter/data.py` script.

I also grouped the previously trained weights in `/BlendHunter/pretrained_weights` and `/pretrained_weights_pad` (both created in STEP 0).
Inside these folders, each set of weights should be copied as `/weights(sigma+nbreal)` (e.g. `/weights5`, `/weights351`, etc.).
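Copying the weights can be scripted. The sketch below assumes the `/bh_sigma+nbreal/weights` layout from section 2 and the `bh_pad` prefix for padded folders. `collect_pretrained_weights` is a hypothetical helper, and `shutil.copytree(..., dirs_exist_ok=True)` requires Python 3.8 or later.

```python
import shutil
from pathlib import Path


def collect_pretrained_weights(base: Path, suffixes, padded: bool = False) -> list:
    """Copy bh_<suffix>/weights to pretrained_weights/weights<suffix>.

    Folder names follow this guide; the bh_pad prefix is an assumption.
    """
    src_prefix = "bh_pad" if padded else "bh_"
    dest_root = base / ("pretrained_weights_pad" if padded else "pretrained_weights")
    dest_root.mkdir(parents=True, exist_ok=True)
    copied = []
    for suffix in suffixes:
        src = base / f"{src_prefix}{suffix}" / "weights"
        dest = dest_root / f"weights{suffix}"
        shutil.copytree(src, dest, dirs_exist_ok=True)  # overwrite on re-run
        copied.append(dest)
    return copied
```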

### 3 Run BH (testing only)

The script `/blendhunter/test_shift_sigmas.py` (and `test_shift_sigmas_pad.py`) tests the network using the weights in the `/pretrained_weights` folder.
For a given noise level, it tests the weights of every noise realisation on the 14 additional datasets.


The results are saved as dictionaries, for instance in `/BlendHunter/bh_results_pad/acc_weights(sigma+nbreal).npy`.
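Since the results are dictionaries stored in `.npy` files, reading them back requires `allow_pickle=True` and `.item()`. A minimal sketch follows; the helper names are hypothetical and the dictionary keys are not documented here.

```python
import numpy as np


def save_dict_npy(path, results: dict) -> None:
    # np.save wraps the dict in a 0-d object array, pickled on disk
    np.save(path, results)


def load_dict_npy(path) -> dict:
    # allow_pickle=True is needed for object arrays; .item() unwraps the dict
    return np.load(path, allow_pickle=True).item()
```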


## 5. Run SExtractor
___

### 1 Generate the results folder

STEP 0 should have created the `/BlendHunter/sep_results` and `/sep_pad_results` folders.

### 2 Run sep

The scripts `/BlendHunter/sextractor/sep_img.py` and `sep_padded_img.py` will run SExtractor for each noise level and each noise realisation.

The results are saved in `/sep_pad_results` as `/flags_pad(sigma+nbreal)` for padded images, and in `/sep_results` for non-padded images.


## 6. Visualize missed blends and false positives (bar plots)
___

### 1 Save errors
The scripts `/BlendHunter/notebooks/save_errors_script.py` and `/save_errors_pad.py` save the errors of BH and sep, sorted by type: missed blends, false positives, unidentified objects, distant objects, overlapping objects, etc.
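The two main error types can be counted from boolean labels, assuming `True` means blended for both the true labels and the predictions. This hypothetical helper is only an illustration, not the logic of `save_errors_script.py`, which also distinguishes unidentified, distant and overlapping objects.

```python
import numpy as np


def count_errors(y_true, y_pred) -> dict:
    """Count missed blends and false positives (True = blended)."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    return {
        # blend classified as not blended
        "missed_blends": int(np.sum(y_true & ~y_pred)),
        # non-blend classified as blended
        "false_positives": int(np.sum(~y_true & y_pred)),
    }
```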


### 2 Notebook
The notebook `/BlendHunter/notebooks/visualize_errors_barplots.ipynb` will display the results for padded and non padded images.

## 7. Visualize distance and ellipticity plots
___


### 0 Check for the ellipticity and distance data 

The `/BlendHunter` folder should contain NumPy arrays with the e1, e2, parameter x and parameter y data.

### 1 Ellipticity plots
These plots check whether the errors are correlated with the ellipticity components of the galaxies. Run the `/notebooks/plots_ellipticity.ipynb` notebook to visualize the accuracy as a function of e1 and e2. The plots were only made for non-padded images.
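To plot accuracy against e1 (or e2), one option is to bin the ellipticity values and compute the fraction of correct predictions in each bin. The sketch assumes one e1 value and one correct/incorrect flag per test image, in the same order. `binned_accuracy` is a hypothetical helper.

```python
import numpy as np


def binned_accuracy(values, correct, bins):
    """Fraction of correct predictions per bin of `values` (NaN if empty)."""
    values = np.asarray(values, dtype=float)
    correct = np.asarray(correct, dtype=bool)
    # np.digitize gives 1..len(bins)-1 for values inside [bins[0], bins[-1])
    idx = np.digitize(values, bins)
    acc = np.full(len(bins) - 1, np.nan)
    for i in range(1, len(bins)):
        mask = idx == i
        if mask.any():
            acc[i - 1] = correct[mask].mean()
    return acc
```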


### 2 Distance plots

Run the `/notebooks/plots_distances.ipynb` notebook to visualize the accuracy as a function of distance, for both padded and non-padded images. The distance is computed by a function from the imported parameters x and y.
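The distance function itself is not described here. A minimal sketch, assuming the quantity of interest is the Euclidean distance between (x, y) and a reference point (by default the origin, i.e. x and y taken as offsets between the two galaxies); `pair_distance` is a hypothetical helper.

```python
import numpy as np


def pair_distance(x, y, x0=0.0, y0=0.0):
    """Euclidean distance between (x, y) and the reference point (x0, y0)."""
    dx = np.asarray(x, dtype=float) - x0
    dy = np.asarray(y, dtype=float) - y0
    return np.hypot(dx, dy)  # sqrt(dx**2 + dy**2), element-wise
```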

## 8. Comparison between sep and bh

Run the `/notebooks/comparison_bh_sextractor.ipynb` notebook to compare the overall accuracy of BH and sep for both padded and non-padded images.
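A minimal sketch of the overall-accuracy comparison, assuming the BH and sep predictions have been converted to the same label encoding as the true labels. `accuracy` and `compare` are hypothetical helpers.

```python
import numpy as np


def accuracy(y_true, y_pred) -> float:
    """Fraction of images whose predicted label matches the true label."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    if y_true.shape != y_pred.shape:
        raise ValueError("label arrays must have the same shape")
    return float(np.mean(y_true == y_pred))


def compare(y_true, bh_pred, sep_pred) -> dict:
    """Overall accuracy of BH and sep on the same test set."""
    return {"bh": accuracy(y_true, bh_pred), "sep": accuracy(y_true, sep_pred)}
```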

## 9. Test on real data

Run the script `/blendhunter/test_real_data.py` to generate the Cosmos image dataset and test the pre-trained weights on it.
For now, the script only does this for padded images, but it is possible to run the tests on non-padded images as well.

## 10. Pre-trained weights accuracy plots

Unfinished