# Guide to using a trained model for predictions
This guide explains how to use a trained model to make predictions on a new data set.

## 1. Preparing the transformation configuration file

As with training, making predictions with <code>HARDy</code> requires a transformation configuration `.yaml` file. In this example, only the best-performing transformation, $\log(q)$ vs. $dI(q)/dq$, is considered. The configuration file is shown in the image below:

<img src="../images/scattering_new_data.png" width=310 align="center" />

Instructions for writing a transformation configuration file are available at <a href="https://hardy.readthedocs.io/en/latest/examples/How_to_write_Configuration_files.html">How to write configuration files</a>.
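To make the selected transformation concrete, the sketch below computes $\log(q)$ and a finite-difference estimate of $dI(q)/dq$ for a single scattering curve using only the standard library. It is an illustration, not HARDy's implementation: the helper name `log_q_der_I` is hypothetical, and the base-10 logarithm and the difference scheme (central in the interior, one-sided at the end points) are assumptions. $q$ is assumed strictly positive and increasing.

```python
import math
from typing import List, Sequence, Tuple


def log_q_der_I(q: Sequence[float], intensity: Sequence[float]) -> List[Tuple[float, float]]:
    """Return (log10(q), dI/dq) pairs for one curve (hypothetical helper, not HARDy's API).

    Assumes q is strictly positive and strictly increasing, with at least two points.
    """
    if len(q) != len(intensity) or len(q) < 2:
        raise ValueError("q and intensity must have the same length (>= 2)")
    n = len(q)
    derivative = []
    for i in range(n):
        if i == 0:
            # Forward difference at the first point
            d = (intensity[1] - intensity[0]) / (q[1] - q[0])
        elif i == n - 1:
            # Backward difference at the last point
            d = (intensity[-1] - intensity[-2]) / (q[-1] - q[-2])
        else:
            # Central difference using the two neighbouring points
            d = (intensity[i + 1] - intensity[i - 1]) / (q[i + 1] - q[i - 1])
        derivative.append(d)
    # Assumption: base-10 logarithm of q for the x-axis
    return [(math.log10(qi), di) for qi, di in zip(q, derivative)]
```

For a synthetic curve $I(q) = q^2$, the interior derivative values come out close to $2q$, which is a quick way to sanity-check the scheme.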

- Extracting the new data set into the Binder directory

```python
# '!' runs a shell command from the notebook (IPython syntax)
!tar -xzf new_data_set.tar.gz
```
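Outside a notebook, where the `!` shell escape is not available, the same extraction can be done with Python's standard `tarfile` module. The helper name `extract_archive` is hypothetical; the archive name is the one used in this guide.

```python
import tarfile
from pathlib import Path
from typing import List


def extract_archive(archive: str, destination: str = ".") -> List[str]:
    """Extract a .tar.gz archive into `destination` and return its member names."""
    with tarfile.open(archive, "r:gz") as tar:
        names = tar.getnames()
        if hasattr(tarfile, "data_filter"):
            # Python versions with extraction filters: block members that would
            # land outside `destination` and other unsafe entries
            tar.extractall(path=destination, filter="data")
        else:
            tar.extractall(path=destination)
    return names


# Same effect as `!tar -xzf new_data_set.tar.gz` run in the notebook directory
if Path("new_data_set.tar.gz").exists():
    extract_archive("new_data_set.tar.gz")
```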

## 2. Importing required modules

```python
import hardy
import os
```

```
Using TensorFlow backend.
```


## 3. Preparing the new data set to be fed into the trained model

### Applying transformations to the data

- Defining the location of the transformation configuration file

```python
transformation_config_path = './scattering_tform_config.yaml'
```

- Collecting the filenames of the new data set (only files with the `.csv` extension)

```python
new_data_file_list = [item for item in os.listdir('./new_data_set/') if item.endswith('.csv')]
```
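`os.listdir` returns names in an arbitrary order and also lists subdirectories. If a reproducible, files-only list is preferred, a `pathlib` variant such as the sketch below is one option. The helper name `list_csv_files` is hypothetical, and nothing in this guide indicates that HARDy requires a sorted list; the guide only uses the list's length later on.

```python
from pathlib import Path
from typing import List


def list_csv_files(directory: str) -> List[str]:
    """Return the sorted names of regular files ending in '.csv' in `directory`."""
    return sorted(
        p.name
        for p in Path(directory).iterdir()
        # Skip subdirectories that happen to end in '.csv'
        if p.is_file() and p.suffix == ".csv"
    )


# Usage, mirroring the cell above:
# new_data_file_list = list_csv_files('./new_data_set/')
```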

- Loading transformation information from the configuration file

```python
tform_command_list, tform_command_dict = hardy.arbitrage.import_tform_config(transformation_config_path)
```

```
Successfully Loaded 1 Transforms to Try!
```


- Defining the variables used to transform the data into images


```python
run_name = 'log_q_der_I'
new_datapath = './new_data_set/'
classes = ['sphere', 'cylinder', 'core-shell', 'ellipsoid']
project_name = 'new_data_set'
scale = 0.2
target_size = (100, 100)
```

<i>Please note that the order of <code>classes</code> must be the same as the order used to train the machine learning model. Likewise, <code>scale</code> and <code>target_size</code> must have the same values as those used for training.</i>
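Since a mismatch in any of these settings may not raise an error, it can help to compare them explicitly before transforming the data. The sketch below checks the prediction settings against a hypothetical `training_settings` dictionary; how the training values are recorded is an assumption (here they are typed in by hand), because this guide does not say where they are stored. The helper name `settings_mismatches` is also hypothetical.

```python
from typing import Any, Dict, List


def settings_mismatches(training: Dict[str, Any], prediction: Dict[str, Any]) -> List[str]:
    """Describe every training setting whose value differs at prediction time."""
    problems = []
    for key, expected in training.items():
        actual = prediction.get(key)  # None if the key is missing
        if actual != expected:
            problems.append(f"{key}: training used {expected!r}, prediction uses {actual!r}")
    return problems


# Hypothetical record of the values used when the model was trained
training_settings = {
    "classes": ['sphere', 'cylinder', 'core-shell', 'ellipsoid'],
    "scale": 0.2,
    "target_size": (100, 100),
}
# Values defined in the prediction cell above
prediction_settings = {
    "classes": ['sphere', 'cylinder', 'core-shell', 'ellipsoid'],
    "scale": 0.2,
    "target_size": (100, 100),
}

for problem in settings_mismatches(training_settings, prediction_settings):
    print(problem)
```

An empty result means the three settings agree; note that the comparison of `classes` is order-sensitive, which is exactly what the note above requires.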

- Using the <code>data_wrapper</code> function to generate the RGB images

```python
# classes='d' is a placeholder label for the unlabelled new data (see the note below)
hardy.data_wrapper(run_name=run_name, raw_datapath=new_datapath, tform_command_dict=tform_command_dict,
                   classes='d', scale=scale)
```

```
Loaded	44 of 44	Files	 at rate of 370 Files per Second
	 Success!	 About 0.0 Minutes...
Making rgb Images from Data...	Success in 2.12seconds!
That Took 2.29 Sec !
```

<i>Please note that the value of <code>classes</code> passed to the <code>data_wrapper</code> function is a string rather than the list of class names. This allows the same function to be used for both training and prediction. Since the new data set has no labels, <code>'d'</code> is used as a placeholder class.</i>

- The <code>data_wrapper</code> function applies the numerical and visual transformations and pickles the data into the <code>new_datapath</code> folder.

- To load the transformed data, the following code is used:

```python
transformed_data = hardy.handling.pickled_data_loader(new_datapath, run_name)
```

- The data now needs to be converted into an iterator that TensorFlow accepts. This can be done as follows:

```python
new_data_set = hardy.to_catalogue.test_set(image_list=transformed_data, target_size=target_size,
                                           classes=classes, color_mode='rgb',
                                           iterator_mode='arrays', batch_size=len(new_data_file_list),
                                           training=False)
```

```
The number of unique labels was found to be 1, expected 4
```


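<i>The <code>training</code> argument is set to <code>False</code> so that no class labels are attached to the data set. During training it is set to <code>True</code>, so that the model's predictions can be validated against the labels. The message that only 1 unique label was found instead of 4 is expected here, since every new file carries the single placeholder class <code>'d'</code>.</i>

The data set is now ready to be used for predictions.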

## 4. Making predictions

- Loading the model

```python
trained_model = hardy.cnn.save_load_model('./best_model.h5', load=True)
```

- Making predictions

```python
hardy.reporting.model_predictions(trained_model, new_data_set, classes, transformed_data)
```

| | Filenames | Predicted_Labels | Probabilities |
|---|---|---|---|
| 0 | 12ab | cylinder | [0.164, 0.745, 0.019, 0.072] |
| 1 | 36ab | cylinder | [0.146, 0.738, 0.019, 0.097] |
| 2 | 28ab | core-shell | [0.687, 0.086, 0.06, 0.167] |
| 3 | 24ab | cylinder | [0.096, 0.809, 0.001, 0.094] |
| 4 | 7ab | cylinder | [0.272, 0.597, 0.032, 0.099] |
| 5 | 5ab | cylinder | [0.163, 0.77, 0.001, 0.066] |
| 6 | 9ab | core-shell | [0.816, 0.023, 0.056, 0.106] |
| 7 | 10ab | cylinder | [0.15, 0.794, 0.0, 0.056] |
| 8 | 38ab | cylinder | [0.388, 0.477, 0.049, 0.086] |
| 9 | 26ab | cylinder | [0.204, 0.666, 0.021, 0.109] |


This generates a pandas DataFrame listing the file names, their predicted labels, and the predicted probability for each class, as shown in the image below:

<img src="../images/scattering_results_new_data.png" width=310 align="center" />
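The `Predicted_Labels` column appears to be the class with the largest probability, but the probability vectors should be read with care. In the ten rows shown above, the reported labels agree with the probabilities only if the vector positions are taken in alphabetical class order (core-shell, cylinder, ellipsoid, sphere) rather than in the order of `classes`; for example, file 28ab is labelled core-shell although its largest probability sits in the position that `classes` assigns to sphere. This is an observation from the output, not documented HARDy behaviour, so the sketch below (with the hypothetical helper `label_from_probabilities`) is offered as a way to check the ordering against your own results.

```python
from typing import Sequence

# Classes in the order defined earlier in this guide
classes = ['sphere', 'cylinder', 'core-shell', 'ellipsoid']


def label_from_probabilities(probabilities: Sequence[float], class_order: Sequence[str]) -> str:
    """Return the class whose probability is largest (argmax)."""
    best_index = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return class_order[best_index]


# Two probability vectors copied from the output above (files 28ab and 12ab)
row_28ab = [0.687, 0.086, 0.06, 0.167]
row_12ab = [0.164, 0.745, 0.019, 0.072]

# Reading the vector in the order of `classes`:
print(label_from_probabilities(row_28ab, classes))          # sphere
# Reading the vectors in alphabetical order reproduces the reported labels:
print(label_from_probabilities(row_28ab, sorted(classes)))  # core-shell
print(label_from_probabilities(row_12ab, sorted(classes)))  # cylinder
```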

____