# NLP training example
In this example, we'll train an NLP model for sentiment analysis of tweets using spaCy.

First, we download spaCy's small English model, `en_core_web_sm`.

In [1]:
!python -m spacy download en_core_web_sm

✔ Download and installation successful
You can now load the model via spacy.load('en_core_web_sm')


Next, we import the libraries used throughout the notebook.

In [2]:
from __future__ import unicode_literals, print_function

import boto3
import json
import numpy as np
import pandas as pd
import spacy

## Data prep

Download the dataset from S3.

In [3]:
S3_BUCKET = "verta-strata"
S3_KEY = "english-tweets.csv"
FILENAME = S3_KEY

boto3.client('s3').download_file(S3_BUCKET, S3_KEY, FILENAME)

Load and shuffle the data, then clean it with our `utils` helper module.

In [4]:
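import utils  # local helper module used by this notebook

# Load the CSV and shuffle it: sample(frac=1) returns every row in random order
data = pd.read_csv(FILENAME).sample(frac=1).reset_index(drop=True)
# The return value is not used, so clean_data is called for its in-place effect on `data`
utils.clean_data(data)

data.head()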

|   | text | sentiment |
|---|------|-----------|
| 0 | no, it's just bleurgh | 1 |
| 1 | YAY awesome news. I love Gavin and Stacey! Als... | 0 |
| 2 | so ready for the states!! cant wait to take off.. | 1 |
| 3 | lazy bum! hehe! Where do you work? | 1 |
| 4 | I would like to say good morning tweets!! If y... | 0 |
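The contents of `utils.clean_data` aren't shown in this notebook. Purely as an illustration, a tweet-cleaning step often strips URLs and @mentions and collapses whitespace. `clean_tweet` below is a hypothetical, standard-library-only sketch of that idea, not the actual helper.

```python
import re

# Hypothetical cleaning rules; the real utils.clean_data may do something different.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
WHITESPACE_RE = re.compile(r"\s+")


def clean_tweet(text: str) -> str:
    """Remove URLs first (they may contain '@'), then @mentions, then collapse whitespace."""
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    return WHITESPACE_RE.sub(" ", text).strip()


print(clean_tweet("@friend lazy bum!  hehe! http://example.com"))
# → lazy bum! hehe!
```

With pandas, such a function could be applied to the whole column, e.g. `data["text"] = data["text"].map(clean_tweet)`.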


## Train the model
We'll start from spaCy's pre-trained `en_core_web_sm` model and fine-tune it on our tweet dataset.

In [5]:
nlp = spacy.load('en_core_web_sm')

Then we train it on our data with our `training` helper module, running 20 iterations (`n_iter=20`).
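The `training` module isn't shown either. Its log format (`Using N examples (... training, ... evaluation)` followed by `LOSS P R F` columns) appears to follow spaCy v2's `train_textcat.py` example, which pairs each text with a `{"cats": {...}}` annotation and keeps the first 80% of the shuffled data for training. Assuming the helper does the same, 16000 tweets split into 12800 training and 3200 evaluation examples, matching the log that follows. The functions and the label names `LABEL_0`/`LABEL_1` are hypothetical; the notebook doesn't say what sentiment values 0 and 1 mean.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical category names, one per sentiment value (0 and 1).
LABELS = ("LABEL_0", "LABEL_1")

Example = Tuple[str, Dict[str, Dict[str, float]]]


def prepare_examples(texts: Sequence[str], sentiments: Sequence[int]) -> List[Example]:
    """Pair each text with a spaCy v2-style {"cats": {...}} annotation."""
    examples = []
    for text, sentiment in zip(texts, sentiments):
        # 1.0 for the category matching this row's sentiment, 0.0 for the other
        cats = {label: float(i == sentiment) for i, label in enumerate(LABELS)}
        examples.append((text, {"cats": cats}))
    return examples


def split_examples(
    examples: List[Example], train_frac: float = 0.8
) -> Tuple[List[Example], List[Example]]:
    """Keep the first train_frac of the (already shuffled) examples for training."""
    cut = int(len(examples) * train_frac)
    return examples[:cut], examples[cut:]


texts = [f"tweet {i}" for i in range(16000)]
sentiments = [i % 2 for i in range(16000)]
train, dev = split_examples(prepare_examples(texts, sentiments))
print(f"Using {len(texts)} examples ({len(train)} training, {len(dev)} evaluation)")
# → Using 16000 examples (12800 training, 3200 evaluation)
```

Splitting by position only works because the DataFrame was shuffled with `sample(frac=1)` when it was loaded.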

In [6]:
import training

training.train(nlp, data, n_iter=20)

Using 16000 examples (12800 training, 3200 evaluation)
Training the model...
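| Iteration | Loss | Precision (P) | Recall (R) | F-score (F) |
|-----------|------|---------------|------------|-------------|
| 1 | 15.932 | 0.754 | 0.718 | 0.736 |
| 2 | 0.367 | 0.746 | 0.750 | 0.748 |
| 3 | 0.110 | 0.755 | 0.744 | 0.749 |
| 4 | 0.096 | 0.761 | 0.741 | 0.751 |
| 5 | 0.085 | 0.759 | 0.740 | 0.749 |
| 6 | 0.073 | 0.758 | 0.733 | 0.745 |
| 7 | 0.062 | 0.748 | 0.731 | 0.740 |
| 8 | 0.052 | 0.743 | 0.722 | 0.733 |
| 9 | 0.046 | 0.748 | 0.725 | 0.736 |
| 10 | 0.039 | 0.751 | 0.718 | 0.734 |
| 11 | 0.032 | 0.744 | 0.720 | 0.732 |
| 12 | 0.031 | 0.743 | 0.719 | 0.731 |
| 13 | 0.026 | 0.740 | 0.719 | 0.730 |
| 14 | 0.023 | 0.738 | 0.718 | 0.728 |
| 15 | 0.022 | 0.729 | 0.713 | 0.721 |
| 16 | 0.019 | 0.728 | 0.716 | 0.722 |
| 17 | 0.019 | 0.734 | 0.717 | 0.726 |
| 18 | 0.018 | 0.734 | 0.718 | 0.726 |
| 19 | 0.017 | 0.733 | 0.718 | 0.726 |
| 20 | 0.016 | 0.732 | 0.717 | 0.724 |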
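The F column is the harmonic mean of the P (precision) and R (recall) columns, F = 2PR / (P + R). The sketch below recomputes it from the logged precision and recall and finds the iteration with the highest F-score. The (P, R) values are copied from the training log above; `f_score` and `LOG_ROWS` are our own illustrative names, not part of the `training` module.

```python
from typing import List, Tuple


def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (F1); 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) per iteration, copied from the training log.
LOG_ROWS: List[Tuple[float, float]] = [
    (0.754, 0.718), (0.746, 0.750), (0.755, 0.744), (0.761, 0.741),
    (0.759, 0.740), (0.758, 0.733), (0.748, 0.731), (0.743, 0.722),
    (0.748, 0.725), (0.751, 0.718), (0.744, 0.720), (0.743, 0.719),
    (0.740, 0.719), (0.738, 0.718), (0.729, 0.713), (0.728, 0.716),
    (0.734, 0.717), (0.734, 0.718), (0.733, 0.718), (0.732, 0.717),
]

# Iterations are numbered from 1 in the log, so shift the 0-based index.
best_index = max(range(len(LOG_ROWS)), key=lambda i: f_score(*LOG_ROWS[i]))
best_iteration = best_index + 1
print(f"Best F-score {f_score(*LOG_ROWS[best_index]):.3f} at iteration {best_iteration}")
# → Best F-score 0.751 at iteration 4
```

In this run the F-score peaks early and then drifts down while the loss keeps falling, which hints at overfitting: fewer iterations might do as well or better on the evaluation set.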


Now we save the model, along with its metadata (`nlp.meta`), to a well-known location in S3 so that we can fetch it later. Make sure it's a bucket you can write to!

In [7]:
filename = "/tmp/model.spacy"
with open(filename, 'wb') as f:
    f.write(nlp.to_bytes())

In [8]:
boto3.client('s3').upload_file(filename, S3_BUCKET, "models/01/model.spacy")

In [9]:
filename = "/tmp/model_metadata.json"
with open(filename, 'w') as f:
    json.dump(nlp.meta, f)

In [10]:
boto3.client('s3').upload_file(filename, S3_BUCKET, "models/01/model_metadata.json")
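Cells `In [7]` to `In [10]` write the model bytes and `nlp.meta` to fixed `/tmp` paths and upload each file to S3. The hypothetical helper `save_model_artifacts` below wraps that pattern in one function: it writes both files into a temporary directory and uploads them under a single key prefix. It assumes an uploader with the argument order of boto3's `S3.Client.upload_file(Filename, Bucket, Key)` and defaults to that method; passing a different callable lets you test it without touching S3.

```python
import json
import os
import tempfile
from typing import Callable, List, Optional

# Same positional order as boto3's S3.Client.upload_file(Filename, Bucket, Key).
Uploader = Callable[[str, str, str], None]


def save_model_artifacts(
    model_bytes: bytes,
    meta: dict,
    bucket: str,
    prefix: str,
    upload: Optional[Uploader] = None,
) -> List[str]:
    """Write the model and its metadata to temp files, upload both under `prefix`."""
    if upload is None:
        import boto3  # imported lazily so the helper can be used without boto3 installed

        upload = boto3.client("s3").upload_file

    keys = []
    # The temporary directory (and both files) is deleted when the block exits.
    with tempfile.TemporaryDirectory() as tmpdir:
        model_path = os.path.join(tmpdir, "model.spacy")
        with open(model_path, "wb") as f:
            f.write(model_bytes)

        meta_path = os.path.join(tmpdir, "model_metadata.json")
        with open(meta_path, "w") as f:
            json.dump(meta, f)

        for path in (model_path, meta_path):
            key = f"{prefix}/{os.path.basename(path)}"
            upload(path, bucket, key)
            keys.append(key)
    return keys
```

In this notebook the call would be `save_model_artifacts(nlp.to_bytes(), nlp.meta, S3_BUCKET, "models/01")`, which uploads to the same two keys as the cells above. Bumping the prefix (for example to a new version directory) keeps earlier models intact.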

## Deployment

Great! You now have a trained model that you can run predictions with. Follow the next step of this tutorial to see how.