<img src="https://raw.githubusercontent.com/determined-ai/determined/master/determined-logo.png" align='right' width=150 />

# Hackathon 2020: Old School ML on Determined

<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/8/88/SpaCy_logo.svg/1200px-SpaCy_logo.svg.png" width=400 />


This notebook walks through named entity recognition (NER) model development with spaCy, a popular open source NLP library that, like Hugging Face, offers pretrained models and supports training custom models.

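We use the MITMovie dataset to extract named entities (actors, genres, directors, etc.) from text. Each token is annotated with a BIO tag: `B-` marks the first token of an entity, `I-` marks a continuation, and `O` marks a token outside any entity. Annotated examples look like: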

```
O	show
O	me
O	films
O	with
B-ACTOR	drew
I-ACTOR	barrymore
O	from
O	the
B-YEAR	1980s
```
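To make the annotation format concrete, here is a minimal sketch that groups tagged tokens into entity spans. The function name `bio_to_spans` is hypothetical and is not part of the notebook's code; it assumes each line holds a tag and a single token separated by whitespace, as in the sample above.

```python
from typing import List, Tuple


def bio_to_spans(lines: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (label, entity text) pairs."""
    spans = []      # list of (label, [tokens]) in order of appearance
    current = None  # the entity currently being extended, if any
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            current = None  # blank or malformed line closes any open entity
            continue
        tag, token = parts
        if tag.startswith("B-"):
            # A B- tag always opens a new entity.
            current = (tag[2:], [token])
            spans.append(current)
        elif tag.startswith("I-") and current is not None and current[0] == tag[2:]:
            # An I- tag extends the open entity when the labels match.
            current[1].append(token)
        else:
            # O tags (and stray I- tags) close any open entity.
            current = None
    return [(label, " ".join(tokens)) for label, tokens in spans]


sample = "O\tshow\nO\tme\nO\tfilms\nO\twith\nB-ACTOR\tdrew\nI-ACTOR\tbarrymore\nO\tfrom\nO\tthe\nB-YEAR\t1980s"
print(bio_to_spans(sample.splitlines()))
```

For the sample query, this yields one `ACTOR` span ("drew barrymore") and one `YEAR` span ("1980s").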

## Our First Experiment

For our first example, we run a simple single-GPU training job with fixed hyperparameters.

In [None]:
!det e create const.yaml .

## Run Hyperparameter Tuning

By writing a configuration file and adapting our code to the Determined trial interface, we can run a sophisticated hyperparameter search. Instructions for configuring different types of experiments [can be found in the Determined documentation](https://docs.determined.ai/latest/how-to/index.html). This experiment tunes the model's dropout along with the Adam optimizer's hyperparameters.

In [None]:
!cat adaptive.yaml
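The output of the cell above is not reproduced here. As a rough sketch of the shape such a search configuration might take, the Python dict below mirrors the YAML layout. Everything in it is an assumption for illustration: the description, the hyperparameter names and ranges, the metric name, and the trial budget are hypothetical, and the field names follow the Determined experiment configuration format as we understand it, so check the documentation for your version.

```python
import json

# Hypothetical search configuration, written as a dict that mirrors the YAML layout.
# None of these values come from the notebook's actual adaptive.yaml.
search_config = {
    "description": "spacy_ner_adaptive_search",  # hypothetical name
    "hyperparameters": {
        # Dropout rate sampled uniformly from a range.
        "dropout": {"type": "double", "minval": 0.1, "maxval": 0.5},
        # Learning rate sampled on a log scale: 10^-4 to 10^-2.
        "learning_rate": {"type": "log", "base": 10, "minval": -4, "maxval": -2},
        # Adam moment decay rates.
        "beta1": {"type": "double", "minval": 0.8, "maxval": 0.99},
        "beta2": {"type": "double", "minval": 0.9, "maxval": 0.999},
    },
    "searcher": {
        "name": "adaptive_asha",       # assumed adaptive search method
        "metric": "validation_loss",   # hypothetical metric name
        "smaller_is_better": True,
        "max_trials": 16,              # hypothetical budget
    },
}

print(json.dumps(search_config, indent=2))
```

The point of the layout is that each tunable value is declared as a range rather than a constant, and the `searcher` block names the metric that decides which trials are continued.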

## Create your Experiment

Now that the experiment is described in a configuration file, use the command line interface to submit it to the Determined cluster.

In [None]:
!det experiment create search.yaml .

# Model Registry

After training, we want to use the model in a real system. Determined provides a model registry for versioning trained models, which makes them easy to retrieve for inference.

In [None]:
experiment_id = 1
MODEL_NAME = "spacy-ner-movies"

In [None]:
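%%capture
from determined.experimental import Determined

# Connect to the cluster once and reuse the client.
client = Determined()
# Best checkpoint of the experiment, ranked by the searcher metric.
checkpoint = client.get_experiment(experiment_id).top_checkpoint()
# The model entry must already exist in the registry.
model = client.get_model(MODEL_NAME)
model.register_version(checkpoint.uuid)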

# Inference

Once your model is versioned in the model registry, using that model for inference is straightforward:

In [None]:
model = Determined().get_model(MODEL_NAME)
checkpoint_path = '/run/determined/checkpoints/' + checkpoint.uuid
trial = model.get_version().load(path=checkpoint_path)
inference_model = trial.model

In [None]:
from predict import predict
predict(inference_model, 'boris karloff is a fantastic actor, hes such a beast in horror films. loved him as frankensteins monster in frankenstein')

In [None]:
predict(inference_model, 'horror! tom hanks has covid19!')

In [None]:
predict(inference_model, 'KubeFlow is a frankenstein mess of low quality loosely coupled tools')
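The `predict` helper is imported from a module that is not shown in this notebook. As a minimal sketch of what such a helper could look like, assuming `inference_model` behaves like a spaCy pipeline (a callable that returns a document whose `ents` carry `text` and `label_`), the hypothetical `predict_entities` below collects the recognized entities. The `fake_nlp` stand-in exists only so the sketch runs without a trained model; its output is not a result from the real model.

```python
from types import SimpleNamespace
from typing import Callable, List, Tuple


def predict_entities(nlp: Callable, text: str) -> List[Tuple[str, str]]:
    """Run a spaCy-style pipeline on text and return (entity text, label) pairs."""
    doc = nlp(text)
    return [(ent.text, ent.label_) for ent in doc.ents]


def fake_nlp(text: str) -> SimpleNamespace:
    """Stand-in pipeline (hypothetical): tags one hard-coded name as ACTOR."""
    ents = []
    if "tom hanks" in text:
        ents.append(SimpleNamespace(text="tom hanks", label_="ACTOR"))
    return SimpleNamespace(ents=ents)


print(predict_entities(fake_nlp, "horror! tom hanks has covid19!"))
```

With a real pipeline, the same function would be called as `predict_entities(inference_model, text)`.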