This tutorial assumes that you have installed NeatMS and familiarised yourself with the tool through the documentation at https://readthedocs.org/NeatMS. The example data and the default model used here are available in the [NeatMS GitHub repository](https://github.com/bihealth/NeatMS). The example data consists of only 3 sample files.

---

# 1. Setting log output (Jupyter notebook specific)

NeatMS uses Python's standard logging API to ease its integration and maintenance in data processing workflows (e.g. [Galaxy](https://galaxyproject.org/), [Snakemake](https://snakemake.readthedocs.io/en/stable/)). The only purpose of the following code is to redirect the logs to standard output for this tutorial.

For more information about the Python logging API, please see the [official documentation](https://docs.python.org/3.6/library/logging.html).

In [None]:
import sys
import logging
logging.basicConfig(format='%(asctime)s | %(levelname)s : %(message)s',
                     level=logging.INFO, stream=sys.stdout)
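
Outside a notebook, for instance inside a workflow, it can be more convenient to send the same logs to a file. The following is a minimal sketch using only the standard `logging` module; the helper name `add_file_logging` and the file name `neatms_run.log` are hypothetical and not part of NeatMS.

```python
import logging
import os
import tempfile


def add_file_logging(log_path, level=logging.INFO):
    """Attach a file handler to the root logger and return it."""
    handler = logging.FileHandler(log_path, mode="w", encoding="utf-8")
    handler.setLevel(level)
    # Same message format as the notebook cell above
    handler.setFormatter(
        logging.Formatter("%(asctime)s | %(levelname)s : %(message)s")
    )
    root = logging.getLogger()
    root.setLevel(level)
    root.addHandler(handler)
    return handler


# Demo: write one message to a log file in a temporary directory
log_path = os.path.join(tempfile.mkdtemp(), "neatms_run.log")  # hypothetical name
handler = add_file_logging(log_path)
logging.getLogger("demo").info("logging to file works")
handler.flush()

with open(log_path, encoding="utf-8") as f:
    log_content = f.read()
print(log_content)
```

Because the handler is attached to the root logger, any library that logs through the standard API should end up in the same file.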

---

# 2. Import NeatMS

Importing NeatMS is as simple as importing any Python package.

In [None]:
import NeatMS as ntms

---

# 3. Create experiment object and load data

Let's create a NeatMS experiment object, which will automatically load the raw data and the aligned or unaligned features. Set the `raw_data_folder_path` and `feature_table_path` arguments; both absolute and relative paths (relative to this notebook) are accepted.

In [None]:
raw_data_folder_path = '../../data/test_data/mzML/'
# Using peaks that have been aligned across samples
feature_table_path = '../../data/test_data/aligned_features.csv'
# Using unaligned peaks (one individual peak table for each sample)
# feature_table_path = '../../data/test_data/unaligned_features/'
# Feature table format; NeatMS needs this to read the feature table correctly
input_data = 'mzmine'

experiment = ntms.Experiment(raw_data_folder_path, feature_table_path, input_data)
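
Since both paths are resolved relative to the notebook, a wrong path is an easy mistake to make. The sketch below checks the two inputs before they are handed to `ntms.Experiment`. The helper `check_inputs` is hypothetical (not part of NeatMS), and it assumes the raw files carry an `.mzML` extension, as the example folder name suggests.

```python
import tempfile
from pathlib import Path


def check_inputs(raw_data_folder_path, feature_table_path):
    """Return a list of problems found; the list is empty if inputs look fine."""
    problems = []
    raw_dir = Path(raw_data_folder_path)
    if not raw_dir.is_dir():
        problems.append(f"raw data folder not found: {raw_dir}")
    elif not any(p.suffix.lower() == ".mzml" for p in raw_dir.iterdir()):
        # Assumption: raw data files use the .mzML extension
        problems.append(f"no .mzML files in: {raw_dir}")
    if not Path(feature_table_path).exists():
        problems.append(f"feature table not found: {feature_table_path}")
    return problems


# Demo on a throwaway directory: first with nothing in place, then with files
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    missing_report = check_inputs(tmp / "mzML", tmp / "aligned_features.csv")
    (tmp / "mzML").mkdir()
    (tmp / "mzML" / "sample_1.mzML").touch()
    (tmp / "aligned_features.csv").touch()
    ok_report = check_inputs(tmp / "mzML", tmp / "aligned_features.csv")

print(missing_report)
print(ok_report)
```

In the notebook, the same function could be called with `raw_data_folder_path` and `feature_table_path` right before creating the experiment.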

---

# 4. Manual peak annotation (labelling)

First we create an annotation tool object, passing our experiment as an argument so the tool has access to the data.

In [None]:
annotation_tool = ntms.AnnotationTool(experiment)

Now let's launch the tool and label some peaks!

In [None]:
annotation_tool.launch_annotation_tool()

Here is how you can check how many peaks you have annotated so far.

In [None]:
annotation_table = experiment.feature_tables[0].annotation_table
print("Total number of annotated peaks:",len(annotation_table.labelled_peaks))
for annotation in annotation_table.annotations:
    print(annotation.label,len(annotation.peaks))

Important: not all of these peaks may be used for training, as some of them will not pass the `min_scan_number` threshold that we will set for the neural network. Refer to the documentation to learn more about this.

---

# 5. Save manually labelled peaks

Saving the experiment, which also saves the peaks you have labelled, is good practice to avoid losing the manual work put into it. Consider saving the experiment every few hundred peaks.

In [None]:
# You can give a name to your experiment before saving
# This name will be used as filename (default is `NeatMS_experiment`)
experiment.name = 'NeatMS__advanced_tuto'
experiment.save()

---

# 6. Load labelled peaks (experiment)

Only run this section if you have restarted the session; there is no point in loading the object we just saved if it is still in memory. Otherwise, skip section 6 entirely.

If you are starting a new session here, meaning that you have previously labelled peaks and are ready to train the neural network model, don't forget to first import the NeatMS library and set the log output correctly. You will also need to import the `pickle` package to load the experiment.

In [None]:
import sys
import logging
import pickle
logging.basicConfig(format='%(asctime)s | %(levelname)s : %(message)s',
                     level=logging.INFO, stream=sys.stdout)
import NeatMS as ntms

We are now ready to load the experiment; simply adjust the filename.

In [None]:
pkl_file = 'NeatMS__advanced_tuto.pkl'
with open(pkl_file, 'rb') as f:
    experiment = pickle.load(f)
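
If the filename is wrong, `open` fails with a fairly terse error. A small wrapper can make the failure more explicit, as sketched below. The function name `load_pickled_experiment` is hypothetical, and the demo uses a plain dictionary as a stand-in for a real experiment object. Keep in mind that `pickle` can execute arbitrary code on load, so only open files you created yourself or otherwise trust.

```python
import pickle
import tempfile
from pathlib import Path


def load_pickled_experiment(pkl_file):
    """Load a pickled object, failing early with the resolved path if missing."""
    path = Path(pkl_file)
    if not path.is_file():
        raise FileNotFoundError(f"No saved experiment at {path.resolve()}")
    with path.open("rb") as f:
        return pickle.load(f)


# Demo with a stand-in object (a real experiment would be loaded the same way)
with tempfile.TemporaryDirectory() as tmp:
    demo_file = Path(tmp) / "demo_experiment.pkl"
    stand_in = {"name": "demo", "labelled_peaks": 3}
    with demo_file.open("wb") as f:
        pickle.dump(stand_in, f)

    restored = load_pickled_experiment(demo_file)

    try:
        load_pickled_experiment(Path(tmp) / "missing.pkl")
        missing_raises = False
    except FileNotFoundError:
        missing_raises = True

print(restored, missing_raises)
```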

---

# 7. Neural network training

Let's train our network. First, we need to create an `nn_handler` object, the same way we did in the basic tutorial.

In [None]:
nn_handler = ntms.NN_handler(experiment)

Now let's create the 3 batches of data (training, test and validation). To have the same number of `High_quality`, `Low_quality` and `Noise` peaks in the training batch, set `normalise_class` to `True` (default: `False`). Check out the other parameters in the documentation.

In [None]:
nn_handler.create_batches(normalise_class=False)
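
To build an intuition for what equal class sizes mean in practice, here is a generic illustration of class balancing by downsampling every class to the size of the smallest one. This is not NeatMS's implementation, whose internals may differ; the function `balance_classes` and the toy peak identifiers are hypothetical.

```python
import random
from collections import Counter, defaultdict


def balance_classes(labelled_items, seed=0):
    """Downsample (label, item) pairs so every label has the same count."""
    by_label = defaultdict(list)
    for label, item in labelled_items:
        by_label[label].append(item)
    if not by_label:
        return []
    # Every class is reduced to the size of the smallest class
    smallest = min(len(items) for items in by_label.values())
    rng = random.Random(seed)  # fixed seed keeps the selection reproducible
    balanced = []
    for label, items in by_label.items():
        for item in rng.sample(items, smallest):
            balanced.append((label, item))
    return balanced


# Toy data: 5 high quality, 3 low quality and 2 noise peaks (made-up ids)
toy_peaks = (
    [("High_quality", i) for i in range(5)]
    + [("Low_quality", i) for i in range(5, 8)]
    + [("Noise", i) for i in range(8, 10)]
)
balanced = balance_classes(toy_peaks)
counts = Counter(label for label, _ in balanced)
print(counts)
```

The trade-off is visible in the toy data: balancing discards labelled peaks from the larger classes, so it costs training examples in exchange for an even class distribution.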

We now have the choice to load an existing model or create a new model. This tutorial does not show how to freeze part of the network for specific tuning using transfer learning. More information and examples are available in the documentation; please only consider this option if you have experience with, and feel comfortable manipulating, neural networks and the Keras/TensorFlow library.

In [None]:
# Uncomment the following line and comment out the other two to create a model from scratch
# nn_handler.create_model()
model_path = "../../data/model/neatms_default_model.h5"
nn_handler.create_model(model = model_path)

Before training, we can make sure that the entire network is not frozen. The model summary tells us how many parameters are trainable/non-trainable.

In [None]:
nn_handler.get_model_summary()

We are now ready to train our network: just pass the number of epochs as a parameter. If you want to train your model further, simply call `train_model()` once more; training will resume where it left off.

In [None]:
# As this is an example, we set the epochs to 100
# This will not be enough when properly training a model
nn_handler.train_model(100)

The final step is to get the optimal threshold to use with the model we just trained. This threshold must be given as an argument every time we use the model, so make sure to store it safely.
This threshold can also be chosen manually; please refer to the documentation for guidance.

In [None]:
nn_handler.get_threshold()

Now let's save our model for later use. You can add the threshold as a suffix to the model name so you don't lose it.

In [None]:
nn_handler.class_model.save('my_own_model_threshold.h5')
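
One way to add the threshold as a suffix is to format it into the filename and parse it back when the model is reused. The sketch below uses only the standard library; the helper names and the threshold value `0.22` are illustrative placeholders, not values produced by this tutorial.

```python
from pathlib import Path


def model_filename(base_name, threshold):
    """Build a model filename ending with the threshold, e.g. name_0.22.h5."""
    return f"{base_name}_{threshold:.2f}.h5"


def threshold_from_filename(filename):
    """Recover the threshold from a filename built by model_filename()."""
    stem = Path(filename).stem  # drops the trailing '.h5'
    return float(stem.rsplit("_", 1)[1])


example_threshold = 0.22  # placeholder; use the value you obtained for your model
filename = model_filename("my_own_model", example_threshold)
recovered = threshold_from_filename(filename)
print(filename, recovered)
```

The resulting string can then be passed to `nn_handler.class_model.save(...)` in place of the fixed name used above.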