# NLP training example
In this example, we'll train an NLP model for sentiment analysis of tweets using spaCy.

First, we download the spaCy language models.

In [1]:
!python -m spacy download en_core_web_sm
!python -m spacy download xx_ent_wiki_sm

✔ Download and installation successful
You can now load the model via spacy.load('en_core_web_sm')
✔ Download and installation successful
You can now load the model via spacy.load('xx_ent_wiki_sm')


Next, we import the libraries we need.

In [2]:
from __future__ import unicode_literals, print_function

import boto3
import json
import numpy as np
import pandas as pd
import spacy

## Data prep

Download the dataset from S3.

In [3]:
S3_BUCKET = "verta-strata"
S3_KEY = "english-tweets.csv"
FILENAME = S3_KEY

boto3.client('s3').download_file(S3_BUCKET, S3_KEY, FILENAME)
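
When re-running the notebook, it can be convenient to skip the download if the file is already on disk. The helper below is a minimal sketch of that idea; `download_if_missing` and its `fetch` callback are hypothetical names introduced for illustration and are not part of the tutorial's library.

```python
import os


def download_if_missing(path, fetch):
    """Call fetch(path) only when path does not exist yet.

    Hypothetical helper: `fetch` is any callable that writes the file to `path`.
    Returns True if the file was fetched, False if it was already present.
    """
    if os.path.exists(path):
        return False
    fetch(path)
    return True
```

In this notebook, the callback could wrap the existing call, for example `download_if_missing(FILENAME, lambda path: boto3.client('s3').download_file(S3_BUCKET, S3_KEY, path))`.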

Load the data and clean it using our helper library.

In [4]:
import utils

data = pd.read_csv(FILENAME).reset_index(drop=True)
utils.clean_data(data)

data.head()

Unnamed: 0,text,sentiment
0,it's such a pretty day today. Getting ready t...,1
1,it was the highlight of 2 years going to shows...,0
2,EXACTLY. i hate from 18-20 you have these two ...,0
3,Hyun Joong is even going to cook the chicken a...,0
4,I think a second trip to ikea in a week is needed,1
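
The implementation of `utils.clean_data` is not shown in this example. As a rough illustration of what tweet cleaning often involves, the sketch below strips URLs and @-mentions and collapses whitespace. The function name `clean_tweet` and the specific rules are assumptions, not the actual contents of `utils`.

```python
import re

# Assumed cleaning rules; the real utils.clean_data may differ.
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
WHITESPACE_RE = re.compile(r"\s+")


def clean_tweet(text):
    """Remove URLs and @-mentions, then collapse runs of whitespace."""
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    return WHITESPACE_RE.sub(" ", text).strip()
```

Applied to a DataFrame like the one above, this could be used as `data["text"] = data["text"].map(clean_tweet)`.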


## Train the model
We'll use a pre-trained model from spaCy and fine-tune it on our new dataset.

In [5]:
nlp = spacy.load('en_core_web_sm')

Update the model on the first 100 rows of our data using our training library.

In [6]:
import training

training.train(nlp, data[:100])

Using 16000 examples (80 training, 20 evaluation)
Training the model...
LOSS 	  P  	  R  	  F  
0.632	0.550	1.000	0.710
0.558	0.786	1.000	0.880
0.199	0.700	0.636	0.667
0.009	0.625	0.455	0.526
0.015	0.615	0.727	0.667
0.000	0.667	0.545	0.600
0.001	0.600	0.545	0.571
0.000	0.625	0.455	0.526
0.002	0.625	0.455	0.526
0.000	0.625	0.455	0.526
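The P, R, and F columns in the log above read as precision, recall, and F-score on the evaluation split. The code of `training.train` is not shown here, so the following is only a minimal sketch of how such scores are commonly computed for binary labels. The function name `precision_recall_f1` and the example labels are illustrative assumptions.

```python
def precision_recall_f1(gold, predicted):
    """Compute precision, recall and F1 for binary labels (1 = positive)."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g and p)        # true positives
    fp = sum(1 for g, p in pairs if not g and p)    # false positives
    fn = sum(1 for g, p in pairs if g and not p)    # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Made-up example: 20 evaluation items, 11 truly positive,
# and a model that predicts "positive" for every item.
gold = [1] * 11 + [0] * 9
predicted = [1] * 20
print("%.3f\t%.3f\t%.3f" % precision_recall_f1(gold, predicted))
```

A model that labels everything positive gets perfect recall but low precision. In the log, the loss approaches zero while F drops, which is consistent with overfitting to the 80 training examples, although the log alone does not prove it.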


Now we save the model and its metadata to a well-known location in S3 so that we can fetch them later.

In [7]:
filename = "/tmp/model.spacy"
with open(filename, 'wb') as f:
    f.write(nlp.to_bytes())

In [8]:
boto3.client('s3').upload_file(filename, S3_BUCKET, "models/01/model.spacy")

In [12]:
filename = "/tmp/model_metadata.json"
with open(filename, 'w') as f:
    f.write(json.dumps(nlp.meta))

In [13]:
boto3.client('s3').upload_file(filename, S3_BUCKET, "models/01/model_metadata.json")
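
Both uploads above share the prefix `models/01/`. If more model versions are added later, building the keys in one place keeps the location predictable. The helper below is a small sketch of that convention. The name `model_s3_key` and the reading of `01` as a zero-padded version number are assumptions.

```python
def model_s3_key(version, artifact):
    """Build an S3 key such as 'models/01/model.spacy'.

    Hypothetical helper: assumes the numeric folder is a zero-padded version.
    """
    return "models/{:02d}/{}".format(version, artifact)


# Keys for the two artifacts saved in this notebook.
model_key = model_s3_key(1, "model.spacy")
metadata_key = model_s3_key(1, "model_metadata.json")
```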

## Deployment

Great! You now have a model that you can run predictions against. Continue to the next step of this tutorial to see how.