## Processing pipeline

### What happens when you call nlp?

1. First, the tokenizer is applied, turning the string of text into a Doc object.
2. Next, the Doc passes through a series of pipeline components in order, for example the tagger, then the parser, then the entity recognizer.
3. Finally, the processed Doc is returned so you can work with it.
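
The three steps above can be sketched in plain Python. This is a toy model of the control flow only, not spaCy's implementation: `ToyDoc`, `toy_tokenizer`, `toy_tagger`, `toy_ner` and `toy_nlp` are hypothetical names invented for this illustration, and their rules are deliberately naive.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class ToyDoc:
    """Hypothetical stand-in for a Doc: tokens plus annotations."""
    tokens: List[str]
    tags: List[str] = field(default_factory=list)
    ents: List[str] = field(default_factory=list)


def toy_tokenizer(text: str) -> ToyDoc:
    # Step 1: turn the raw string into a document object.
    return ToyDoc(tokens=text.split())


def toy_tagger(doc: ToyDoc) -> ToyDoc:
    # Toy rule: digits are tagged "NUM", everything else "X".
    doc.tags = ["NUM" if tok.isdigit() else "X" for tok in doc.tokens]
    return doc


def toy_ner(doc: ToyDoc) -> ToyDoc:
    # Toy rule: treat capitalized tokens as entities.
    doc.ents = [tok for tok in doc.tokens if tok[0].isupper()]
    return doc


Component = Callable[[ToyDoc], ToyDoc]
PIPELINE: List[Tuple[str, Component]] = [("tagger", toy_tagger), ("ner", toy_ner)]


def toy_nlp(text: str) -> ToyDoc:
    doc = toy_tokenizer(text)
    # Step 2: each component receives the doc, annotates it, and returns it.
    for _name, component in PIPELINE:
        doc = component(doc)
    # Step 3: hand the processed doc back to the caller.
    return doc


doc = toy_nlp("Berlin has 4 million people")
print(doc.tokens)
print(doc.tags)
print(doc.ents)
```

The point of the sketch is the shape of the loop: the tokenizer creates the document once, and every later component takes that same document, adds its annotations, and passes it on.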

### What are the built-in pipeline components?

1. tagger: Part-of-speech tagger -> sets the token.tag attribute.
2. parser: Dependency parser -> sets the token.dep and token.head attributes. It is also responsible for detecting sentences and base noun phrases, also known as noun chunks.
3. ner: Named entity recognizer -> adds the detected entities to the doc.ents property. It also sets entity type attributes on the tokens that indicate whether a token is part of an entity.
4. textcat: Text classifier -> sets category labels that apply to the whole text and adds them to the doc.cats property.

**Text categories are always very specific, so the text classifier is not included in the pre-trained models. However, you can train it when building your own system.**

All information about which components make up the pipeline is contained in the model's meta JSON. It defines things like:
1. The language and the pipeline.
2. The components to instantiate.
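
As a rough illustration of what that meta JSON carries, the snippet below parses a heavily abridged, made-up excerpt with the standard `json` module. The two keys mirror the items listed above, and the pipeline value matches the component names printed later in this section; a real meta file contains many more fields, and the exact content here is an assumption for illustration only.

```python
import json

# Illustrative, abridged meta JSON (assumed content, not copied from a real model).
META_JSON = """
{
  "lang": "en",
  "pipeline": ["tagger", "parser", "ner"]
}
"""

meta = json.loads(META_JSON)

# The language tells the library which language class to set up.
print(meta["lang"])

# The pipeline lists the component names to instantiate, in order.
print(meta["pipeline"])
```

On a loaded model, the same information is available as a dictionary through the `nlp.meta` attribute.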

#### Attributes to access pipeline info
- Names of the pipeline components -> nlp.pipe_names attribute.
- List of (name, component) tuples -> nlp.pipeline attribute.


```python
import spacy

# Load the en_core_web_sm model
nlp = spacy.load("en_core_web_sm")

# Print the names of the pipeline components
print(nlp.pipe_names)

# Print the full pipeline of (name, component) tuples
print(nlp.pipeline)
```

```
['tagger', 'parser', 'ner']
[('tagger', <spacy.pipeline.pipes.Tagger object at 0x11d92d450>), ('parser', <spacy.pipeline.pipes.DependencyParser object at 0x11d7899f0>), ('ner', <spacy.pipeline.pipes.EntityRecognizer object at 0x11d789a60>)]
```
