## Custom pipeline components

spaCy lets you add your own functions to its built-in pipeline. These custom components run automatically when you call the nlp object on a text. They can also attach your own metadata to documents and tokens and update built-in attributes.

Components are added to the pipeline with the nlp.add_pipe method, which takes the component as its first argument. By default the component is added last. A different position can be specified by passing one of the following:
- A boolean flag: first=True or last=True.
- The name of an existing component to run relative to: before="ner" or after="tagger". If the named component does not exist, spaCy raises an error.
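
The positioning options above can be sketched on an empty pipeline. This is a minimal sketch rather than canonical usage: `noop` and `add_component` are hypothetical helpers, and the version check assumes that spaCy v2 accepts the function object in `nlp.add_pipe`, while v3 and later expect a name registered through `Language.component`.

```python
import spacy
from spacy.language import Language


def noop(doc):
    # A component receives a Doc and must return it
    return doc


def add_component(nlp, func, name, **position):
    # Hypothetical helper that hides the assumed v2/v3 difference in add_pipe
    if spacy.__version__.startswith("2."):
        nlp.add_pipe(func, name=name, **position)
    else:
        Language.component(name, func=func)
        nlp.add_pipe(name, **position)


nlp = spacy.blank("en")  # blank pipeline: tokenizer only, no components
add_component(nlp, noop, "middle")  # no position given: appended last
add_component(nlp, noop, "start", first=True)
add_component(nlp, noop, "end", last=True)
add_component(nlp, noop, "before_end", before="end")
add_component(nlp, noop, "after_start", after="start")
print(nlp.pipe_names)

# Referring to a component that is not in the pipeline raises an error
try:
    add_component(nlp, noop, "orphan", before="does_not_exist")
    missing_raised = False
except ValueError:
    missing_raised = True
print(missing_raised)
```

Using `spacy.blank("en")` keeps the sketch independent of any downloaded model, so only the custom components appear in `nlp.pipe_names`.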

**Note**: Custom components are added to the pipeline after the language class has been initialized, and they run after tokenization. They can only modify the Doc and can’t be used to update the weights of other components directly.
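
One way a component can attach its own metadata to a Doc is through a custom extension attribute. The sketch below assumes `Doc.set_extension` for registering the attribute; the attribute name `token_count` and the function `count_tokens` are hypothetical, and the version check again assumes the v2/v3 difference in how `nlp.add_pipe` receives a component.

```python
import spacy
from spacy.language import Language
from spacy.tokens import Doc

# Register a custom attribute, exposed as doc._.token_count
# (force=True allows this cell to be re-run without an error)
Doc.set_extension("token_count", default=None, force=True)


def count_tokens(doc):
    # The component only modifies the Doc it receives, then returns it
    doc._.token_count = len(doc)
    return doc


nlp = spacy.blank("en")  # tokenizer only, no model download needed
if spacy.__version__.startswith("2."):
    nlp.add_pipe(count_tokens)
else:
    Language.component("count_tokens", func=count_tokens)
    nlp.add_pipe("count_tokens")

doc = nlp("Short sentence")
print(doc._.token_count)
```

Because the component runs after tokenization, `len(doc)` is already available when it is called.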

### Simple component example

In [1]:
import spacy

# Define the custom component
def length_component(doc):
    # Get the doc's length
    doc_length = len(doc)
    print("This document is {} tokens long.".format(doc_length))
    # Return the doc
    return doc


# Load the small English model
nlp = spacy.load("en_core_web_sm")

# Add the component to the pipeline (last by default) and print the pipe names
nlp.add_pipe(length_component)
print(nlp.pipe_names)

# Process a text
doc = nlp("Short sentence")

['tagger', 'parser', 'ner', 'length_component']
This document is 2 tokens long.


### Complex component example

Create a custom component that uses the PhraseMatcher to find Harry Potter character names in the document and sets the matched spans as the doc.ents.

In [2]:
import spacy
from spacy.matcher import PhraseMatcher
from spacy.tokens import Span

nlp = spacy.load("en_core_web_sm")
characters = ["Harry Potter", "Ron Weasley", "Hermione Granger"]
potter_patterns = list(nlp.pipe(characters))
print("Potter characters:", potter_patterns)
matcher = PhraseMatcher(nlp.vocab)
matcher.add("POTTER", None, *potter_patterns)

# Define the custom component
def harry_potter_people(doc):
    # Apply the matcher to the doc
    matches = matcher(doc)
    # Create a Span for each match and use the match ID ('POTTER') as its label
    spans = [Span(doc, start, end, label=match_id) for match_id, start, end in matches]
    # Overwrite the doc.ents with the matched spans
    doc.ents = spans
    return doc


# Add the component to the pipeline after the 'ner' component
nlp.add_pipe(harry_potter_people, after="ner")
print(nlp.pipe_names)

# Process the text and print the text and label ID (an integer hash) for the doc.ents
doc = nlp("Harry Potter and Hermione Granger are a couple ? Impossible!")
print([(ent.text, ent.label) for ent in doc.ents])

Potter characters: [Harry Potter, Ron Weasley, Hermione Granger]
['tagger', 'parser', 'ner', 'harry_potter_people']
[('Harry Potter', 4957663859231939894), ('Hermione Granger', 4957663859231939894)]
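
The labels printed above are integers because ent.label holds the hash of the label string. A minimal sketch of getting the readable string back, assuming the `label_` attribute and a lookup in `nlp.vocab.strings`; the span here is built by hand on a blank pipeline rather than by the matcher:

```python
import spacy
from spacy.tokens import Span

nlp = spacy.blank("en")  # tokenizer only, no model download needed
doc = nlp("Harry Potter and Hermione Granger")

# Add the label string to the string store and get its hash ID
label_id = nlp.vocab.strings.add("POTTER")
span = Span(doc, 0, 2, label=label_id)

print(span.text, span.label)          # text and integer hash
print(span.text, span.label_)         # text and readable label
print(nlp.vocab.strings[span.label])  # hash resolved through the string store
```

Printing `ent.label_` instead of `ent.label` in the cell above would show the string form of each label.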
