A training pipeline is available at `/srv/pipeline.py`, placed there by [the provisioning code](https://github.com/ComSensus/jupyterhub). `/srv` is a read-only directory shared among all users. The pipeline accepts the following options:

- `x_train` (required): training input, as a local filename (the file may hold a NumPy array, a Pandas DataFrame, or a similar format)
- `y_train`: training output, as a local filename
- `x_test`: test input, as a local filename
- `y_test`: test output, as a local filename
- `s3_dir` (required): S3 "directory" into which trained models and their metrics will be stored. The S3 bucket is provided by ComSensus.
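
The source of `/srv/pipeline.py` is not reproduced here, so the following is only a minimal sketch of how a command-line interface with the options above could be declared using `argparse`. The function name `build_parser` is hypothetical, and the `--task` and `--algorithms` flags are taken from the usage demonstration further down rather than from the option list; their accepted values are not assumed.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented options; not the real pipeline code."""
    parser = argparse.ArgumentParser(prog="pipeline.py")
    # Flags that appear in the usage demonstration; accepted values are not documented here.
    parser.add_argument("--task")
    parser.add_argument("--algorithms", nargs="+")
    # Documented options: only x_train and s3_dir are marked as required.
    parser.add_argument("--x_train", required=True, help="training input, local filename")
    parser.add_argument("--y_train", help="training output, local filename")
    parser.add_argument("--x_test", help="test input, local filename")
    parser.add_argument("--y_test", help="test output, local filename")
    parser.add_argument("--s3_dir", required=True, help='S3 "directory" for models and metrics')
    return parser


# Parse a command line shaped like the demonstration, leaving the optional files out.
args = build_parser().parse_args(
    ["--task", "classification", "--algorithms", "sgd", "kneighbors",
     "--x_train", "x_train", "--s3_dir", "test_run"]
)
print(args.algorithms, args.y_train)
```

In this sketch, omitted optional arguments such as `y_train` default to `None`, while leaving out `--x_train` or `--s3_dir` makes `argparse` exit with a usage error.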

The `requirements.txt` file lists the libraries used in the Model Suite and is used when provisioning the JupyterHub service.

The cells below demonstrate how to use the pipeline to classify MNIST data.

In [None]:
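from sklearn.datasets import fetch_openml

# Download MNIST (70,000 images of 28x28 pixels, flattened to 784 features) from OpenML
mnist = fetch_openml('mnist_784', version=1)
X, y = mnist["data"], mnist["target"]
# Conventional MNIST split: first 60,000 samples for training, remaining 10,000 for testing
X_train, X_test, y_train, y_test = X[:60000], X[60000:], y[:60000], y[60000:]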

In [None]:
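# Save data to local files, to be picked up by the pipeline.
# An open file handle is passed to np.save so that the files keep the exact
# names given below; passing a filename instead would append '.npy'.
import numpy as np
with open('x_train', 'wb') as f:
    np.save(f, X_train)
with open('y_train', 'wb') as f:
    np.save(f, y_train)
with open('x_test', 'wb') as f:
    np.save(f, X_test)
with open('y_test', 'wb') as f:
    np.save(f, y_test)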
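
Before starting a long training run, it can help to confirm that a file written this way reads back intact. The sketch below is an illustration only: it uses a small stand-in array and a temporary directory (both assumptions, not part of the notebook) instead of the MNIST data.

```python
import os
import tempfile

import numpy as np

# Small stand-in array; in the notebook this would be X_train, y_train, etc.
sample = np.arange(12, dtype=np.float64).reshape(3, 4)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "x_train")  # no '.npy' extension, as in the cell above
    with open(path, "wb") as f:
        np.save(f, sample)

    # np.load recognises the format from the file contents, not from the extension.
    restored = np.load(path)
    files_written = sorted(os.listdir(tmp_dir))

print(restored.shape, files_written)
```

If a saved array has object dtype (for example, labels stored as strings), `np.load` only reads it back when called with `allow_pickle=True`.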

In [None]:
%run /srv/pipeline.py \
    --task 'classification' \
    --algorithms 'sgd' 'kneighbors' \
    --x_train 'x_train' \
    --y_train 'y_train' \
    --x_test 'x_test' \
    --y_test 'y_test' \
    --s3_dir 'test_run'

# Models and their metrics will be saved to S3.
# For the example above, that is 's3://comsensus-jupyterhub/test_run/'
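
After a run finishes, the stored objects can be listed to confirm that something was written. The helper below is a hedged sketch: `list_run_keys` is a hypothetical name, the names of the files the pipeline writes are not documented here, and the function only assumes a client object that offers the `list_objects_v2` call of the boto3 S3 client.

```python
def list_run_keys(s3_client, bucket: str, s3_dir: str) -> list:
    """Return all object keys stored under the s3_dir prefix."""
    prefix = s3_dir.rstrip("/") + "/"
    keys = []
    kwargs = {"Bucket": bucket, "Prefix": prefix}
    while True:
        response = s3_client.list_objects_v2(**kwargs)
        # "Contents" is absent when nothing matches the prefix.
        keys.extend(obj["Key"] for obj in response.get("Contents", []))
        if not response.get("IsTruncated"):
            return keys
        # A single response holds at most 1,000 keys; continue from the token.
        kwargs["ContinuationToken"] = response["NextContinuationToken"]
```

In the notebook this could be called as `list_run_keys(boto3.client("s3"), "comsensus-jupyterhub", "test_run")`, assuming `boto3` is installed and the session has read access to the bucket.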