# NLP training example
In this example, we'll train an NLP model for sentiment analysis of tweets using spaCy.

First, we download the spaCy English language model.

In [1]:
!python -m spacy download en_core_web_sm

✔ Download and installation successful
You can now load the model via spacy.load('en_core_web_sm')


Next, we import the libraries used throughout the notebook.

In [2]:
from __future__ import unicode_literals, print_function

import boto3
import json
import numpy as np
import pandas as pd
import spacy

## Data prep

Download the dataset from S3.

In [3]:
S3_BUCKET = "verta-strata"
S3_KEY = "english-tweets.csv"
FILENAME = S3_KEY

boto3.client('s3').download_file(S3_BUCKET, S3_KEY, FILENAME)
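
Re-running the cell above downloads the file again each time. As a minimal sketch, assuming a local copy can be reused safely, a small wrapper can skip the download when the file is already present. The helper name `download_if_missing` is hypothetical and not part of boto3; the only boto3 call it relies on is the S3 client's `download_file(Bucket, Key, Filename)`.

```python
import os


def download_if_missing(s3_client, bucket, key, filename):
    """Download s3://bucket/key to filename unless the file already exists.

    Hypothetical helper, not part of boto3. Returns True if a download
    was performed and False if the existing local copy was reused.
    """
    if os.path.exists(filename):
        return False
    # `download_file` is the S3 client method used in the cell above.
    s3_client.download_file(bucket, key, filename)
    return True
```

In this notebook it could be called as `download_if_missing(boto3.client('s3'), S3_BUCKET, S3_KEY, FILENAME)`. Note that this sketch does not detect a changed remote object, so delete the local file to force a fresh download.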

Load and shuffle the data, then clean it using our utils helper module.

In [4]:
import utils

data = pd.read_csv(FILENAME).sample(frac=1).reset_index(drop=True)
utils.clean_data(data)

data.head()

| Unnamed: 0 | text | sentiment |
|---|---|---|
| 0 | Raining. Cold. Working Morella St this mornin... | 0 |
| 1 | my daddy is so confused that I didn't eat my p... | 1 |
| 2 | didnt have 3g on my phone yesterday so i wasnt... | 0 |
| 3 | Crest white strips make your teeth hurt...ever... | 0 |
| 4 | David is sad for Jake. His dog got hit by a ca... | 0 |
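
The `utils` module is not shown in this notebook, so what `clean_data` does is not known here. As a rough illustration only, a tweet-cleaning step might look something like the sketch below. The function name `clean_tweet` and the three patterns are assumptions chosen for the example, not the actual implementation.

```python
import re

# Hypothetical patterns; the real `utils.clean_data` is not shown in this notebook.
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
WHITESPACE_RE = re.compile(r"\s+")


def clean_tweet(text):
    """Remove URLs and @mentions, then collapse runs of whitespace."""
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    return WHITESPACE_RE.sub(" ", text).strip()


example = "@jake  so sad about your dog http://example.com/x  :("
print(clean_tweet(example))
```

Applied to a DataFrame like the one above, such a function could be mapped over the text column with `data['text'] = data['text'].map(clean_tweet)`.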


## Set up ModelDB
ModelDB organizes our work and lets us log and version metadata.

In [5]:
from verta import Client

client = Client('https://dev.verta.ai')
client.set_project('Tweet Classification')
client.set_experiment('SpaCy')
run = client.set_experiment_run()

set email from environment
set developer key from environment
connection successfully established
set existing Project: Tweet Classification from personal workspace
set existing Experiment: SpaCy
created new ExperimentRun: Run 992941584404899849182


We'll first record our code, configuration, dataset, and environment versions to a ModelDB repository.

In [6]:
repo = client.set_repository('Verta Strata')
commit = repo.get_commit(branch='master')

set existing Repository: Verta Strata from personal workspace


In [7]:
from verta.code import Notebook
from verta.configuration import Hyperparameters
from verta.dataset import S3
from verta.environment import Python

code_ver = Notebook()
config_ver = Hyperparameters({'n_iter': 20})
dataset_ver = S3("s3://{}/{}".format(S3_BUCKET, S3_KEY))
env_ver = Python()

commit.update("notebooks/tweet-analysis", code_ver)
commit.update("config/hyperparams", config_ver)
commit.update("data/tweets", dataset_ver)
commit.update("env/python", env_ver)
commit.save("Deployment-ready sentiment analysis")

commit

(Branch: master)
Commit 2a0e3de24446239e677e735ca97525cb7cd91eda633dc12e5307cfecddb37a80 containing:
config/hyperparams (Blob)
data/tweets (Blob)
env/python (Blob)
notebooks/tweet-analysis (Blob)

## Train the model
We'll use a pre-trained model from spaCy and fine-tune it on our new dataset.

In [8]:
nlp = spacy.load('en_core_web_sm')

Fine-tune the model on the tweet data using our training helper module.

In [9]:
import training

training.train(nlp, data, n_iter=20)

Using 16000 examples (12800 training, 3200 evaluation)
Training the model...
LOSS 	  P  	  R  	  F  
16.023	0.726	0.694	0.710
0.366	0.751	0.721	0.736
0.107	0.756	0.730	0.743
0.095	0.758	0.727	0.743
0.083	0.760	0.718	0.738
0.071	0.757	0.727	0.741
0.061	0.760	0.730	0.744
0.051	0.765	0.727	0.745
0.046	0.751	0.724	0.737
0.036	0.758	0.732	0.745
0.033	0.752	0.731	0.741
0.029	0.753	0.734	0.743
0.026	0.744	0.723	0.734
0.023	0.750	0.731	0.740
0.021	0.745	0.724	0.734
0.021	0.747	0.728	0.737


KeyboardInterrupt: 
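
The log above shows 16 of the 20 requested iterations before the run was interrupted. Its columns appear to be loss, precision (P), recall (R), and F-score (F). As a quick check, assuming F is the usual harmonic mean of precision and recall (the `training` module is not shown, so this is inferred from the numbers), we can recompute it from the P and R columns and confirm the reported split ratio:

```python
def f_score(precision, recall):
    """Harmonic mean of precision and recall (the F1 score)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# P and R values copied from the first and last rows of the log above.
first = f_score(0.726, 0.694)
last = f_score(0.747, 0.728)
print(round(first, 3), round(last, 3))

# The reported split: 12800 of the 16000 examples are used for training.
train_fraction = 12800 / 16000
print(train_fraction)
```

The recomputed values agree with the F column (0.710 and 0.737), and the split works out to 80% training and 20% evaluation. Note that the loss keeps falling across iterations while F stays roughly between 0.73 and 0.75, which may indicate that additional iterations mostly fit the training set rather than improve evaluation quality.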

## Save and version the model
We log the model itself as an artifact to ModelDB.

In [None]:
run.log_model(nlp)

Finally, we link the commit to our Experiment Run.

In [None]:
run.log_commit(
    commit,
    {
        'notebook': "notebooks/tweet-analysis",
        'hyperparameters': "config/hyperparams",
        'training_data': "data/tweets",
        'python_env': "env/python",
    },
)

## Deployment

Great! You now have a model that you can run predictions against. Follow the next step of this tutorial to see how to deploy it and run those predictions.