[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/NeuralConceptDev/examples/blob/master/bike_model_training.ipynb)

# Introduction

Neural Concept provides APIs for training 3D deep learning models that learn to predict engineering simulations of different physical processes. In the [first tutorial](https://github.com/NeuralConceptDev/examples/blob/master/bike.ipynb), we showed how to perform predictions using a publicly available pretrained model.

In this tutorial, we show how to bring your own data and train your own models on the Neural Concept platform. By the end of the tutorial, you will learn:

* the dataset formats and how to upload your training data.
* the model configurations and how to create a model definition.
* how to start a training job.
* how to save a training checkpoint to create a trained model.

To get started, visit https://cloud.neuralconcept.com/register and create an account. This tutorial requires a member account, which you can get by requesting the free 14-day upgrade of your account. You can also send an email to contact@neuralconcept.com.

## Setup

In this section, we install the required packages and set up our credentials to use the client.

In [None]:
# Install the ncapi client
!pip install -U ncapi-client

In [None]:
import os
import getpass

os.environ["NCAPI_URL"] = "https://cloud.neuralconcept.com"

os.environ["NCAPI_USERNAME"] = "<INSERT USERNAME>"

pwd=getpass.getpass(prompt='Enter your NCAPI password: ', stream=None) 
os.environ["NCAPI_PASSWORD"] = pwd

from ncapi_client.client import Client

client = Client()
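
A check like the following could be run before constructing the client, to catch an unset variable or a placeholder that was never replaced. It is a minimal sketch: the helper name `missing_credentials` is hypothetical, and the only assumption about the client is the one the cell above already makes, namely that it reads these three environment variables.

```python
import os

# Variable names taken from the setup cell above.
REQUIRED_VARS = ("NCAPI_URL", "NCAPI_USERNAME", "NCAPI_PASSWORD")


def missing_credentials(env=os.environ, required=REQUIRED_VARS):
    """Return the required variables that are unset, empty, or still a placeholder."""
    missing = []
    for name in required:
        value = env.get(name, "")
        # "<INSERT ...>" is the placeholder style used in this notebook.
        if not value or value.startswith("<INSERT"):
            missing.append(name)
    return missing


# Demo on an explicit dictionary rather than the real environment.
example_env = {
    "NCAPI_URL": "https://cloud.neuralconcept.com",
    "NCAPI_USERNAME": "<INSERT USERNAME>",
}
print(missing_credentials(example_env))
```

Calling `missing_credentials()` with no arguments inspects the real environment instead of the example dictionary.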

## Dataset Formats

In this section, we look in detail at how to prepare your data for training. To illustrate, we use the same bike dataset as in the previous tutorial.

First, let's download the dataset from the public Google Cloud Storage bucket.

In [None]:
!gsutil -m cp -r gs://nc-public-examples/datasets/bike/ .

The dataset consists of three types of files: .stl, .csv and .json. There are 90 files of each type. Each tuple (geometry_<>.stl, output_fields_<>.csv, output_scalars_<>.json) represents one sample in the dataset.
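
To confirm that every sample has all three files, the directory contents can be grouped by sample index. The sketch below assumes the file-name pattern seen in this tutorial (for example geometry_0000.stl); the helper names `group_samples` and `incomplete_samples` are hypothetical and not part of the ncapi client.

```python
import re
from collections import defaultdict
from pathlib import Path

# File-name pattern assumed from the sample names shown in this tutorial,
# e.g. geometry_0000.stl, output_fields_0000.csv, output_scalars_0000.json.
SAMPLE_PATTERN = re.compile(
    r"^(geometry|output_fields|output_scalars)_(\d+)\.(stl|csv|json)$"
)


def group_samples(directory):
    """Map each sample index to the file names found for it."""
    samples = defaultdict(dict)
    for path in sorted(Path(directory).iterdir()):
        match = SAMPLE_PATTERN.match(path.name)
        if match:
            kind, index, _extension = match.groups()
            samples[index][kind] = path.name
    return dict(samples)


def incomplete_samples(samples):
    """Return the indices that lack one of the three expected files."""
    expected = {"geometry", "output_fields", "output_scalars"}
    return sorted(i for i, files in samples.items() if set(files) != expected)
```

After the download above, `group_samples("bike/raw")` should contain 90 entries and `incomplete_samples(...)` should return an empty list if the copy completed.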

The STL format is a standard file format for representing the surface geometry of a 3D object. It describes the surface as a collection of triangles, each defined by three vertices, where each vertex is a point in 3D coordinates.

For example, let's look at the first few lines of the first sample:

In [None]:
!head bike/raw/geometry_0000.stl
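
The triangle structure can be made concrete with a small reader for the ASCII variant of STL. This is a sketch that assumes the geometry files are ASCII STL (binary STL needs a different reader); `parse_ascii_stl` is a hypothetical helper, and the embedded example is a hand-written single triangle, not data from the bike dataset.

```python
def parse_ascii_stl(text):
    """Return a list of triangles, each a tuple of three (x, y, z) vertices."""
    triangles, current = [], []
    for line in text.splitlines():
        parts = line.split()
        # In ASCII STL, each vertex line reads: "vertex <x> <y> <z>".
        if len(parts) == 4 and parts[0] == "vertex":
            current.append(tuple(float(value) for value in parts[1:]))
            if len(current) == 3:
                triangles.append(tuple(current))
                current = []
    return triangles


# Hand-written example containing one facet.
example = """solid demo
facet normal 0 0 1
  outer loop
    vertex 0 0 0
    vertex 1 0 0
    vertex 0 1 0
  endloop
endfacet
endsolid demo
"""
triangles = parse_ascii_stl(example)
print(len(triangles), triangles[0])
```

To try it on a real sample, pass the contents of a geometry file, for instance `parse_ascii_stl(open("bike/raw/geometry_0000.stl").read())`.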

The output fields csv file contains the field values at each vertex.

In [None]:
!head bike/raw/output_fields_0000.csv

Here there are 7 different kinds of field values:

* p is the pressure field.
* Ux, Uy and Uz are the velocity components in the x, y and z directions, respectively.
* k is the turbulence kinetic energy.
* omega is the specific rate of dissipation.
* nut is the eddy viscosity.

Each row gives these values at the vertex defined by the coordinates x, y, z.
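
The number of output fields can also be read off the csv header programmatically. The sketch below assumes the first row is a header and that the coordinate columns are named x, y and z; the column order in the example string is made up for illustration, and `output_field_names` is a hypothetical helper.

```python
import csv
import io

# Coordinate column names assumed from the description above.
COORDINATE_COLUMNS = ("x", "y", "z")


def output_field_names(csv_text):
    """Return the header columns that are not vertex coordinates."""
    reader = csv.reader(io.StringIO(csv_text))
    header = [name.strip() for name in next(reader)]
    return [name for name in header if name not in COORDINATE_COLUMNS]


# Made-up example; the values are placeholders, not real simulation data.
example = "x,y,z,p,Ux,Uy,Uz,k,omega,nut\n0,0,0,1,2,3,4,5,6,7\n"
fields = output_field_names(example)
print(len(fields), fields)
```

The length of the returned list is the value that a model configuration would need for its number of output fields.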

The output scalars json file represents global outputs associated with the entire sample.

In [None]:
!cat bike/raw/output_scalars_0000.json
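
The scalars in such a file can be listed with the standard json module. This sketch assumes the file holds a single flat JSON object mapping scalar names to numbers, which should be checked against the output of the cell above; the keys in the example are invented and `scalar_names` is a hypothetical helper.

```python
import json


def scalar_names(json_text):
    """Return the sorted scalar names from a flat JSON object."""
    data = json.loads(json_text)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    return sorted(data)


# Invented keys and values, for illustration only.
example = '{"scalar_b": 0.5, "scalar_a": 1.25}'
names = scalar_names(example)
print(len(names), names)
```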

In this problem, we will train the model to predict these output fields and scalars from the input geometry alone. In general, our platform also supports input fields and scalars, in which case the model learns to predict using the geometry together with the input fields and scalars.

At the moment, we expect the data to be in this format of tuples of json, csv and stl files. We are working on integrations with more formats from your favorite simulation software.

## Creating a dataset and uploading the data

Once the dataset is prepared in the desired format, we can create a dataset through the ncapi client and upload the dataset files.

In [None]:
from ncapi_client.dataset import Dataset

bike_dataset = Dataset.add(client, 
                           name="bike_dataset",
                           files="bike/raw",
                           split=0.9,
                           max_degree=10
                          )

Here the split parameter specifies the split between training and validation (if not specified, it defaults to 0.75). The input geometry is converted to an adjacency representation, and the max_degree parameter controls the maximum number of neighbors a vertex can have.
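
To give an intuition for what an adjacency representation with a degree cap looks like, the sketch below builds neighbor lists from triangles given as vertex-index triples. It is only an illustration of the idea: the platform's actual conversion is not documented here, and both the function `vertex_adjacency` and its truncation rule are assumptions made for this example.

```python
from collections import defaultdict


def vertex_adjacency(triangles, max_degree):
    """Build a neighbor list per vertex index, keeping at most max_degree neighbors."""
    neighbors = defaultdict(set)
    for a, b, c in triangles:
        # Every edge of a triangle connects two neighboring vertices.
        for u, v in ((a, b), (b, c), (c, a)):
            neighbors[u].add(v)
            neighbors[v].add(u)
    # Keeping the lowest indices first is an arbitrary choice for this sketch.
    return {v: sorted(n)[:max_degree] for v, n in neighbors.items()}


# Four triangles sharing vertex 0 (a small fan), as vertex-index triples.
fan = [(0, 1, 2), (0, 2, 3), (0, 3, 4), (0, 4, 1)]
adjacency = vertex_adjacency(fan, max_degree=3)
print(adjacency[0], adjacency[1])
```

Vertex 0 touches four other vertices in this fan, so with max_degree=3 one of its neighbors is dropped, while the outer vertices keep all three of theirs.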

When the upload completes, the system automatically triggers a conversion job to convert the data to an internal format that is suitable for feeding into the neural network model.

The dataset conversion status can be checked by looking at the dataset info. The status will be marked as CONVERTED when the conversion process has finished.

In [None]:
bike_dataset.info

The status of the conversion job itself can be checked with this helper function:

In [None]:
bike_dataset.get_jobs()
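
Rather than re-running the status cells by hand, the check can be wrapped in a small polling loop. The helper below is a generic sketch and not part of the ncapi client: `wait_until` is a hypothetical name, and it takes any zero-argument function that returns True once the desired state is reached.

```python
import time


def wait_until(check, timeout_s=600.0, interval_s=10.0,
               sleep=time.sleep, clock=time.monotonic):
    """Call check() repeatedly until it returns True or the timeout elapses."""
    deadline = clock() + timeout_s
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


# Hypothetical usage. The `status` attribute name is an assumption, as is
# `info` being refreshed on each access; neither is documented here:
# wait_until(lambda: bike_dataset.info.status == "CONVERTED")
```

The `sleep` and `clock` parameters exist only so the loop can be exercised quickly without real waiting.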

## Models 

Next we look at configuring a model for our training. 


In [None]:
from ncapi_client.model import Model
model = Model.add(
    client,
    name='bike-model-config', 
    class_name='ncs.models.point_regressor.PointRegressor',
    num_output_fields=7,
    num_output_scalars=12, 
)

In [None]:
model.config

class_name is the class of the model to use for training. 
num_output_fields and num_output_scalars are the numbers of output fields and scalars in our dataset. We found these values by examining the dataset in the previous section.

To view a list of all possible model configurations, refer to the Python client API docs at https://storage.googleapis.com/nc-public-docs/ncapi-python-client/index.html 
For now, when you create a model from the Python client, model.config only shows the parameters that you changed; the other values are set to their defaults. If you want more options for customizing your model, we recommend that you use the GUI to create it. From there, you will be able to see the whole config file.

## Submitting a Training Job 

In this section, we will see how to submit a training job using the bike dataset we uploaded and the model configuration we created in the previous section.


In [None]:
from ncapi_client.training import Training

training = Training.submit(client, 
                           model_id=model.info.uuid, 
                           dataset_id=bike_dataset.info.uuid,
                           user_config=None)

To view a list of all possible training job configurations, refer to the python client api docs 
at https://storage.googleapis.com/nc-public-docs/ncapi-python-client/index.html .

A training job has now been created. Behind the scenes, the API will spin up a training worker running on a GPU instance, pull the dataset and the model configurations and start the training loop.

The training status can be checked by calling info. We recommend that you monitor your training from the GUI, where you can also start TensorBoard sessions to get a better overview of your training.

In [None]:
training.info

By default, training continues to run for "X" number of steps. We can choose to stop the job before training completes using:

In [None]:
training.stop()

## Creating a trained model from a training

The training job saves periodic checkpoints as it trains. We can create a trained model from these checkpoints.

The checkpoints can be listed using:

In [None]:
training.checkpoints

To create a saved model from the checkpoint at step 2000:

In [None]:
trained_model = training.save(checkpoint_id='model.ckpt-2000', name='trained_model_bike')

In [None]:
trained_model.info

If the checkpoint is not specified, the save method creates a trained model based on the last checkpoint.

In [None]:
latest_trained_model = training.save()
latest_trained_model.info
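
When choosing a checkpoint explicitly, note that identifiers such as model.ckpt-2000 should be compared by their numeric step, not as strings. The sketch below assumes the checkpoints are available as a list of identifier strings in that form; the exact shape of `training.checkpoints` is not shown in this tutorial, and both helper names are hypothetical.

```python
import re

# Identifier format assumed from the example id 'model.ckpt-2000' above.
CHECKPOINT_PATTERN = re.compile(r"^model\.ckpt-(\d+)$")


def checkpoint_step(checkpoint_id):
    """Extract the training step from an id such as 'model.ckpt-2000'."""
    match = CHECKPOINT_PATTERN.match(checkpoint_id)
    if match is None:
        raise ValueError(f"unexpected checkpoint id: {checkpoint_id!r}")
    return int(match.group(1))


def latest_checkpoint(checkpoint_ids):
    """Return the id with the highest training step."""
    return max(checkpoint_ids, key=checkpoint_step)


# Made-up ids for illustration.
ids = ["model.ckpt-500", "model.ckpt-2000", "model.ckpt-1000"]
print(latest_checkpoint(ids))
```

A plain string comparison would rank model.ckpt-500 above model.ckpt-2000, which is why the step is parsed as an integer first.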

The trained model can now be used for making batch predictions or starting an interactive session.

It is also possible to download the trained model files for later use:

In [None]:
with open("bike_trained_model.tar.gz", "wb") as f:
    f.write(trained_model.download().read())

In [None]:
!tar -xvzf bike_trained_model.tar.gz
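
The contents of the downloaded archive can also be inspected from Python without extracting it, which is a convenient way to find the config file and checkpoint paths inside. This is a sketch using the standard tarfile module; `archive_members` is a hypothetical helper name.

```python
import tarfile


def archive_members(path):
    """Return the names of the regular files stored in a .tar.gz archive."""
    with tarfile.open(path, "r:gz") as archive:
        return sorted(m.name for m in archive.getmembers() if m.isfile())
```

Calling it with the path of the archive written above returns the file names, including the directory prefix they will have once extracted.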

These downloaded files can be used to create a trained model later:

In [None]:
from ncapi_client.trained_model import TrainedModel

trained_model_copy = TrainedModel.add(
    client,
    "trained_model_bike_65ad6de0-6c98-4da1-8de2-19d6bb1013e0/config.yml",
    "trained_model_bike_65ad6de0-6c98-4da1-8de2-19d6bb1013e0/model.ckpt-2000",
)

In [None]:
trained_model_copy.info

## Delete Resources

In [None]:
trained_model.delete()
trained_model_copy.delete()
latest_trained_model.delete()
training.delete()
bike_dataset.delete()

## Summary

In this notebook, we showed how the Neural Concept API can be used to train a model on your own data, using the bike dataset as an example.

In a follow-up tutorial, we will explore new datasets across different application areas and take a deeper look at how to customize the models.