# Solving the Definition Extraction Problem
## Approach 4: Using spaCy's Text Classifier
In this approach, we decided to give **spaCy's amazing model pipeline** a shot. Here is a summary of how spaCy’s models work, taken from the [spaCy docs](https://spacy.io/usage/training#basics):

- They are statistical and every “decision” they make is a prediction. This prediction is based on the examples the model has seen during training. To train a model, you first need training data. 


- The model is then shown the unlabelled text and makes a prediction. We then give the model feedback on its prediction in the form of an error gradient of the loss function, which measures the difference between the model's prediction and the expected output. The greater the difference, the more significant the gradient and the updates to our model.


- We want the model to come up with a theory that can be generalized across other examples. If you only test the model with the data it was trained on, you’ll have no idea how well it’s generalizing. That is why we also need evaluation data to test our model.

![](https://spacy.io/training-73950e71e6b59678754a87d6cf1481f9.svg)
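To make the predict / compare / update cycle above concrete, here is a minimal sketch in plain Python. It is not spaCy's implementation: the tiny logistic model, the `predict` and `update` helpers, and the learning rate are all assumptions chosen only to show that a larger error produces a larger update.

```python
import math

def predict(weights, features):
    """Return the predicted probability of the positive class."""
    z = sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

def update(weights, features, label, learning_rate=0.5):
    """Apply one gradient step of the log loss; return (new_weights, error)."""
    # For a logistic model, the gradient of the log loss with respect to the
    # pre-activation is simply (prediction - expected output).
    error = predict(weights, features) - label
    new_weights = [w - learning_rate * error * x for w, x in zip(weights, features)]
    return new_weights, error

# Hypothetical single training example: two features, expected label 1.0.
weights = [0.0, 0.0]
features, label = [1.0, 2.0], 1.0

errors = []
for _ in range(5):
    weights, error = update(weights, features, label)
    errors.append(abs(error))

# The error shrinks on every pass, so each update is smaller than the last.
print(errors)
```

Each pass through the loop repeats the same three steps: predict, measure the difference from the expected output, and move the weights against the gradient.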


In [None]:
#imports cell
from source.data_loader import DeftCorpusLoader
from source.classifiers import DeftSpacyClassifier

### Adding a text classifier to a spaCy model
We followed the step-by-step guide from [spaCy's example](https://spacy.io/usage/training#textcat) to build our own implementation of spaCy's text classifier for the DEFT corpus.

**What do we call it? Duh... the `DeftSpacyClassifier`!**

In [None]:
# Label names used as the keys of the spaCy category dicts
positive = "DEFINITION"
negative = "NOT DEFINITION"
deft_classifier = DeftSpacyClassifier(positive_label=positive, negative_label=negative)

### Loading the dataset and adjusting its labels for the spaCy format
We load the dataset as before. The main difference is that we now have to perform an extra step: changing the label format to match spaCy's labeling format. Instead of a binary vector of labels, each instance gets a dict indicating whether it is a definition or not.

Example: `{"DEFINITION": True, "NOT DEFINITION": False}`
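As a small worked illustration of that conversion, the sketch below maps a toy list of binary labels to category dicts. The helper name `to_spacy_cats` and the toy labels are hypothetical; only the dict shape comes from the example above.

```python
def to_spacy_cats(labels, positive="DEFINITION", negative="NOT DEFINITION"):
    """Convert binary labels (0/1) into spaCy-style category dicts."""
    return [{positive: bool(y), negative: not bool(y)} for y in labels]

# Toy labels: 1 means the sentence contains a definition, 0 means it does not.
cats = to_spacy_cats([1, 0, 1])

for cat in cats:
    print(cat)
```

Exactly one of the two keys is `True` for every instance, which is what a mutually exclusive two-class setup expects.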

In [None]:
deft_loader = DeftCorpusLoader("deft_corpus/data")
trainframe, devframe = deft_loader.load_classification_data(preprocess=True, clean=True)
train_cats = [{positive: bool(y), negative: not bool(y)} for y in trainframe["HasDef"]]
dev_cats = [{positive: bool(y), negative: not bool(y)} for y in devframe["HasDef"]]

### Start the training loop

In [None]:
deft_classifier.fit(trainframe["Sentence"], devframe["Sentence"],
                    train_cats, dev_cats, output_dir='./models/spacy-model')

### Reporting full evaluation scores on the dev data

In [None]:
deft_classifier.score(devframe["Sentence"], dev_cats)
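
The output of `score` is not shown here, so as a rough guide to what such a report typically contains, the sketch below computes precision, recall, and F1 for the positive label from gold category dicts and predicted scores. The `prf` helper, the 0.5 threshold, and the toy data are assumptions for illustration, not the internals of `DeftSpacyClassifier`.

```python
def prf(gold_cats, pred_scores, positive="DEFINITION", threshold=0.5):
    """Return (precision, recall, f1) for the positive label."""
    tp = fp = fn = 0
    for gold, scores in zip(gold_cats, pred_scores):
        predicted = scores[positive] >= threshold
        actual = gold[positive]
        if predicted and actual:
            tp += 1  # correctly flagged as a definition
        elif predicted and not actual:
            fp += 1  # flagged as a definition, but it is not one
        elif actual and not predicted:
            fn += 1  # a definition that was missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Toy gold labels and hypothetical predicted scores for four sentences.
gold = [{"DEFINITION": True}, {"DEFINITION": False},
        {"DEFINITION": True}, {"DEFINITION": False}]
pred = [{"DEFINITION": 0.9}, {"DEFINITION": 0.6},
        {"DEFINITION": 0.2}, {"DEFINITION": 0.1}]

precision, recall, f1 = prf(gold, pred)
print(precision, recall, f1)
```

Reporting all three numbers matters when definitions are the minority class, because accuracy alone can look good even if most definitions are missed.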