In this chapter, we will learn how to train spaCy's text classifier component, the **TextCategorizer**. We will also integrate **TensorFlow with spaCy**.

We will cover:

* [**Understanding the basics of text classification**](#Understanding-the-basics-of-text-classification)
* [**Training the spaCy text classifier**](#Training-the-spaCy-text-classifier)
* [**Sentiment analysis with spaCy**](#Sentiment-analysis-with-spaCy)
* Text classification with spaCy and Keras

## Understanding the basics of text classification

Text classifiers can come in different flavors. Some classifiers focus on the
overall emotion of the text, some classifiers focus on detecting the language
of the text, and some classifiers focus on only some words of the text, such as
verbs. The following are some of the most common types of text
classification and their use cases:


**Topic detection:** Topic detection is the task of understanding the topic of
a given text. For example, the text in a customer email could be asking
about a refund, asking for a past bill, or simply complaining about the
customer service.

**Sentiment analysis:** Sentiment analysis is the task of understanding
whether the text contains positive or negative emotions about a given
subject. Sentiment analysis is used often to analyze customer reviews
about products and services.

**Language detection:** Language detection is the first step of many NLP
systems, such as machine translation.
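
To make the topic detection case concrete, here is a toy sketch that assigns a customer email to a topic by counting keyword matches. This is only an illustration of the input/output shape of a text classifier, not how spaCy's TextCategorizer works; the keyword lists and the `detect_topic` helper are made up for this example.

```python
import re

# Hypothetical keyword lists for the three customer email topics
TOPIC_KEYWORDS = {
    "refund": {"refund", "return", "reimburse"},
    "billing": {"bill", "invoice", "receipt"},
    "complaint": {"rude", "unhelpful", "terrible"},
}

def detect_topic(text):
    """Return the topic with the most keyword hits, or "unknown" if none match."""
    tokens = set(re.findall(r"[a-z']+", text.lower()))
    scores = {topic: len(tokens & kws) for topic, kws in TOPIC_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"

print(detect_topic("Could you send me last month's bill?"))  # billing
print(detect_topic("I would like a refund for my order."))   # refund
```

A trained classifier replaces the hand-written keyword lists with weights learned from labelled examples, but the contract stays the same: text in, label (or label scores) out.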


<center><img src="images/intro.png" width="600"/></center>

The basics of text classification are widely known, so this section is kept brief.

## Training the spaCy text classifier

In chapters 2 and 3 we saw that the spaCy **pipeline** contains core components such as the POS tagger, NER, and the dependency parser. The pipeline can also hold a TextCategorizer component, which we will use here. To train the TextCategorizer, we need to provide training examples and testing examples.

First, we need to add this component to the NLP pipeline. The TextCategorizer
comes after the essential components. In the
following diagram, textcat refers to the TextCategorizer component.


<center><img src="images/cat.png" width="600"/></center>
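
The position of the component can be checked with `nlp.pipe_names`. The following is a minimal sketch that assumes a blank English pipeline (`spacy.blank("en")`) and a `sentencizer` as a stand-in for the essential components, so that it runs without a downloaded model; with `en_core_web_md` the list would contain that model's own components instead.

```python
import spacy

# Assumption: a blank pipeline stands in for en_core_web_md
nlp = spacy.blank("en")

# A rule-based component standing in for the "essential" components
nlp.add_pipe("sentencizer")

# add_pipe appends to the end of the pipeline by default
textcat = nlp.add_pipe("textcat")

print(nlp.pipe_names)  # ['sentencizer', 'textcat']
```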

The nice part is that the TextCategorizer uses a neural network architecture, yet we don't need to deal with the neural network ourselves; the API is user-friendly.

### Getting to know the TextCategorizer class



```Python
# Import spaCy and the default TextCategorizer model configurations
import spacy
import en_core_web_md
from spacy.pipeline.textcat import DEFAULT_SINGLE_TEXTCAT_MODEL  # for a single-label classifier
from spacy.pipeline.textcat_multilabel import DEFAULT_MULTI_TEXTCAT_MODEL  # for multi-label classification

# Next we provide a configuration to the TextCategorizer with two parameters:
# 1. threshold: the cutoff value
# 2. model: the model configuration

config1 = {
    "threshold": 0.5,
    "model": DEFAULT_SINGLE_TEXTCAT_MODEL
}

# config2 is the multi-label variant; it belongs with the "textcat_multilabel" component
config2 = {
    "threshold": 0.5,
    "model": DEFAULT_MULTI_TEXTCAT_MODEL
}

nlp = en_core_web_md.load()

textcat1 = nlp.add_pipe("textcat", config=config1)
textcat1
# Output: <spacy.pipeline.textcat.TextCategorizer at 0x149c554c820>
```
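
To get a feel for what the `threshold` value means, here is a small plain-Python sketch that treats it as a cutoff on per-label scores. The `labels_above_threshold` helper and the scores are made up for illustration; how spaCy applies the threshold internally is not shown here.

```python
def labels_above_threshold(cats, threshold=0.5):
    """Return the labels whose score reaches the cutoff, sorted by name."""
    return sorted(label for label, score in cats.items() if score >= threshold)

# Made-up scores in the same shape as a {label: score} dictionary
scores = {"refund": 0.82, "billing": 0.55, "complaint": 0.10}

print(labels_above_threshold(scores, threshold=0.5))  # ['billing', 'refund']
print(labels_above_threshold(scores, threshold=0.7))  # ['refund']
```

A higher threshold keeps fewer, more confident labels; a lower one keeps more labels at the cost of more false positives.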

```Python
train_data = [
    ("I loved this product, very easy to use.", {"cats": {"sentiment": 1}}),
    ("I'll definitely purchase again. I recommend this product.", {"cats": {"sentiment": 1}}),
    ("This is the best product ever. I loved the scent and the feel. Will buy again.", {"cats": {"sentiment": 1}}),
    ("Disappointed. This product didn't work for me at all", {"cats": {"sentiment": 0}}),
    ("I hated the scent. Won't buy again", {"cats": {"sentiment": 0}}),
    ("Truly horrible product. Very few amount of product for a high price. Don't recommend.", {"cats": {"sentiment": 0}})
]
```
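
Each training item is a `(text, annotations)` tuple whose annotations hold a `"cats"` dictionary mapping a label name to 1 or 0. If the raw data comes as simple `(text, is_positive)` pairs, a small helper can build this shape. The `to_textcat_annotations` function below is a hypothetical helper written for this note, not part of spaCy.

```python
def to_textcat_annotations(labelled_texts, label="sentiment"):
    """Convert (text, is_positive) pairs into (text, {"cats": {...}}) tuples."""
    return [
        (text, {"cats": {label: 1 if is_positive else 0}})
        for text, is_positive in labelled_texts
    ]

raw_reviews = [
    ("I loved this product, very easy to use.", True),
    ("I hated the scent. Won't buy again", False),
]

train_data_sketch = to_textcat_annotations(raw_reviews)
print(train_data_sketch[0])
print(train_data_sketch[1])
```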

```Python
import random
import spacy
from spacy.training import Example
from spacy.pipeline.textcat_multilabel import DEFAULT_MULTI_TEXTCAT_MODEL

nlp = spacy.load('en_core_web_md')
config = {"threshold": 0.7, "model": DEFAULT_MULTI_TEXTCAT_MODEL}

# The multi-label model belongs with the "textcat_multilabel" component,
# which also allows a single label such as "sentiment"
textcat = nlp.add_pipe("textcat_multilabel", config=config)

# Add the label to the component; it must match the key used under "cats" in train_data
textcat.add_label("sentiment")

# Convert the training data into Example objects
train_examples = [Example.from_dict(nlp.make_doc(text), label) for text, label in train_data]

# Initialize the weights of the component from the training examples
textcat.initialize(lambda: train_examples, nlp=nlp)

# Training loop
epochs = 20

# We only need to train the text categorizer, so we disable all other pipes
with nlp.select_pipes(enable="textcat_multilabel"):
    optimizer = nlp.resume_training()  # keeps the weights of the existing statistical models

    for i in range(epochs):
        random.shuffle(train_data)

        for text, label in train_data:
            doc = nlp.make_doc(text)

            example = Example.from_dict(doc, label)
            nlp.update([example], sgd=optimizer)
```

You can use this code as is; it works fine.
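
After training, the predicted scores can be read from `doc.cats`. The sketch below is a self-contained, shortened variant of the training loop. It assumes a blank English pipeline and the default `textcat_multilabel` settings instead of `en_core_web_md`, so it runs without a downloaded model; the exact score depends on the random initialization, so no value is shown.

```python
import random
import spacy
from spacy.training import Example

reviews = [
    ("I loved this product, very easy to use.", {"cats": {"sentiment": 1}}),
    ("This is the best product ever. Will buy again.", {"cats": {"sentiment": 1}}),
    ("I hated the scent. Won't buy again", {"cats": {"sentiment": 0}}),
    ("Truly horrible product. Don't recommend.", {"cats": {"sentiment": 0}}),
]

# Assumption: blank pipeline with only the text categorizer
nlp = spacy.blank("en")
textcat = nlp.add_pipe("textcat_multilabel")
textcat.add_label("sentiment")

examples = [Example.from_dict(nlp.make_doc(text), ann) for text, ann in reviews]

# initialize() sets up the model weights and returns an optimizer
optimizer = nlp.initialize(lambda: examples)

for _ in range(20):
    random.shuffle(examples)
    nlp.update(examples, sgd=optimizer)

# Predict on unseen text: doc.cats maps each label to a score between 0 and 1
doc = nlp("I loved the scent and will buy again.")
print(doc.cats)
```

With only a handful of training sentences the scores are not reliable; the point is the shape of the result, a `{label: score}` dictionary.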

## Sentiment analysis with spaCy

I am already familiar with everything covered in chapter 8, so I am moving straight on to the code here.

[**Codes**](https://www.kaggle.com/code/aravindanr22052001/notebook6cf4272bc2/edit)