# spaCy Pipelines

![pipeline](images/pipeline.png)

When you call `nlp` on a string of text, spaCy first applies the tokenizer to turn the string into a `Doc` object. Next, a series of pipeline components is applied to the doc in order: in this case, the tagger, then the parser, then the entity recognizer. Finally, the processed doc is returned so you can work with it.
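The flow described above can be sketched in plain Python. This is a toy model, not spaCy's implementation: `ToyDoc`, `toy_tokenizer`, `make_component`, and `run_pipeline` are hypothetical names invented for illustration, and the "components" only record that they ran instead of making real predictions.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDoc:
    """Stand-in for spaCy's Doc: the tokens plus whatever the components add."""
    tokens: list
    annotations: dict = field(default_factory=dict)


def toy_tokenizer(text):
    # Step 1: turn the raw string into a doc (whitespace split for simplicity).
    return ToyDoc(tokens=text.split())


def make_component(name, log):
    # Each component receives the doc, annotates it, and returns it.
    def component(doc):
        doc.annotations[name] = f"set by {name}"
        log.append(name)
        return doc
    return component


def run_pipeline(text, components):
    doc = toy_tokenizer(text)
    # Step 2: apply each component in order; each sees the previous one's output.
    for component in components:
        doc = component(doc)
    # Step 3: hand the processed doc back to the caller.
    return doc


call_order = []
pipeline = [make_component(n, call_order) for n in ("tagger", "parser", "ner")]
doc = run_pipeline("This is a sentence.", pipeline)
print(doc.tokens)
print(call_order)
```

The point of the sketch is the shape of the loop: the tokenizer creates the doc once, and every later component takes a doc and returns a doc, which is why the order of components matters.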

**Built-in Pipeline Components**

| Name        | Description             | Creates                                                   |
| ----------- | ----------------------- | --------------------------------------------------------- |
| **tagger**  | Part-of-speech tagger   | `Token.tag`, `Token.pos`                                  |
| **parser**  | Dependency parser       | `Token.dep`, `Token.head`, `Doc.sents`, `Doc.noun_chunks` |
| **ner**     | Named entity recognizer | `Doc.ents`, `Token.ent_iob`, `Token.ent_type`             |
| **textcat** | Text classifier         | `Doc.cats`                                                |

spaCy ships with the built-in pipeline components listed above:

* The part-of-speech tagger sets the `token.tag` and `token.pos` attributes.
* The dependency parser sets the `token.dep` and `token.head` attributes. It is also responsible for detecting sentences (`doc.sents`) and base noun phrases, also known as noun chunks (`doc.noun_chunks`).
* The named entity recognizer adds the detected entities to the `doc.ents` property. It also sets entity type attributes on the tokens that indicate whether a token is part of an entity.
* The text classifier sets category labels that apply to the whole text and adds them to the `doc.cats` property.

Because text categories are always very specific, the text classifier is not included in any of the pre-trained models by default, but you can use it to train your own system.

All models include a `meta.json` file:
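* The pipeline is defined in the model's `meta.json`, which lists the component names in the order they are applied.
* Built-in components need binary data to make predictions. This data ships with the model package and is loaded into each component when the model is loaded.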
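As a rough illustration of the first point, the snippet below parses an abridged `meta.json` and reads the component order from it. The JSON content is an assumption made for the example: a real file carries more fields, and the exact values depend on the model you installed.

```python
import json

# Abridged, illustrative meta.json; a real file carries more fields.
META_JSON = """
{
  "lang": "en",
  "name": "core_web_sm",
  "pipeline": ["tagger", "parser", "ner"]
}
"""

meta = json.loads(META_JSON)

# The "pipeline" entry is an ordered list of component names.
pipeline_names = meta["pipeline"]

for position, name in enumerate(pipeline_names, start=1):
    print(position, name)
```

In spaCy itself you would not normally parse this file by hand: after loading a model, `nlp.pipe_names` gives the component names and `nlp.pipeline` gives the `(name, component)` pairs.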
