In [2]:
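import spacy
from spacy.pipeline.tok2vec import DEFAULT_TOK2VEC_MODEL

# Config for a standalone tok2vec component (only needed when adding one by hand)
config = {"model": DEFAULT_TOK2VEC_MODEL}
nlp = spacy.load("en_core_web_sm")
# en_core_web_sm already ships a component named "tok2vec", so adding another
# one under the same name raises an error; kept commented out for reference
#nlp.add_pipe("tok2vec", config=config)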

### Observe the components

In [3]:
print(nlp.pipe_names)

['tok2vec', 'tagger', 'parser', 'attribute_ruler', 'lemmatizer', 'ner']
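Besides the names, `nlp.pipeline` gives the component objects themselves, as `(name, component)` tuples in processing order. A minimal sketch, assuming a blank English pipeline with two rule-based components in place of `en_core_web_sm`, so that no trained model needs to be downloaded:

```python
import spacy

# Assumption: blank pipeline + rule-based components stand in for en_core_web_sm
nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")
nlp.add_pipe("entity_ruler")

# nlp.pipeline holds (name, component) tuples in the order they run
for name, component in nlp.pipeline:
    print(name, type(component).__name__)

# get_pipe looks a component up by name; has_pipe checks that it exists first
if nlp.has_pipe("entity_ruler"):
    ruler = nlp.get_pipe("entity_ruler")
```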


### Select a component

In [4]:
tok2vec = nlp.get_pipe("tok2vec")
print(tok2vec.__doc__)

Apply a "token-to-vector" model and set its outputs in the doc.tensor
    attribute. This is mostly useful to share a single subnetwork between multiple
    components, e.g. to have one embedding and CNN network shared between a
    parser, tagger and NER.

    In order to use the `Tok2Vec` predictions, subsequent components should use
    the `Tok2VecListener` layer as the tok2vec subnetwork of their model. This
    layer will read data from the `doc.tensor` attribute during prediction.
    During training, the `Tok2Vec` component will save its prediction and backprop
    callback for each batch, so that the subsequent components can backpropagate
    to the shared weights. This implementation is used because it allows us to
    avoid relying on object identity within the models to achieve the parameter
    sharing.
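The claim that `Tok2Vec` writes its output to `doc.tensor` can be checked directly. A minimal sketch, assuming a blank English pipeline (instead of `en_core_web_sm`) with a standalone `tok2vec` built from the same `DEFAULT_TOK2VEC_MODEL` config used at the top of the notebook. `nlp.initialize()` gives it random weights, so the vectors carry no meaning; only their shape matters here.

```python
import spacy
from spacy.pipeline.tok2vec import DEFAULT_TOK2VEC_MODEL

nlp = spacy.blank("en")
nlp.add_pipe("tok2vec", config={"model": DEFAULT_TOK2VEC_MODEL})
nlp.initialize()  # allocate (random) weights; no training data is needed for this check

doc = nlp("Tok2Vec writes one vector per token.")
# One row per token; the number of columns is the width set in the model config
print(len(doc), doc.tensor.shape)
```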
    


### Disable and re-enable a component

print(nlp.pipe_names)
nlp.disable_pipe("tagger")
print(nlp.pipe_names)
nlp.enable_pipe("tagger")
print(nlp.pipe_names)
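`pipe_names` lists only the active components, while `component_names` also includes disabled ones. For a temporary change, `select_pipes` works as a context manager and restores the previous state on exit. A small sketch, again assuming a blank English pipeline with rule-based components rather than the trained model:

```python
import spacy

nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")
nlp.add_pipe("entity_ruler")

nlp.disable_pipe("sentencizer")
print(nlp.pipe_names)       # active components only
print(nlp.component_names)  # all components, including disabled ones
print(nlp.disabled)         # names of disabled components
nlp.enable_pipe("sentencizer")

# Disable a component only inside the with-block
with nlp.select_pipes(disable=["entity_ruler"]):
    print(nlp.pipe_names)
print(nlp.pipe_names)  # both components are active again
```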

### observe_inputs and observe_outputs functions

We want a quick way to observe the inputs a pipeline (or a single component) takes and the outputs it produces.


* How to disable components in the pipeline and refer to them by name
* How to print the names of the components in the pipeline
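One way to observe inputs and outputs per component is to run the pipeline by hand: tokenize with `nlp.make_doc`, then call each component from `nlp.pipeline` in turn and snapshot the `Doc` before and after. This roughly mirrors what `nlp(text)` does. A minimal sketch; `summarize` and `trace_pipeline` are hypothetical helper names (not spaCy API), and the blank pipeline with an "Explosion" pattern is an illustrative assumption:

```python
import spacy

# Assumption: rule-based components, so no trained model is required
nlp = spacy.blank("en")
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns([{"label": "ORG", "pattern": "Explosion"}])
nlp.add_pipe("sentencizer")


def summarize(doc):
    """Hypothetical helper: a small snapshot of the annotations on a Doc."""
    n_sents = len(list(doc.sents)) if doc.has_annotation("SENT_START") else None
    return {"ents": [(e.text, e.label_) for e in doc.ents], "n_sents": n_sents}


def trace_pipeline(nlp, text):
    """Hypothetical helper: run each component by hand, recording its input and output."""
    doc = nlp.make_doc(text)  # tokenizer only
    trace = []
    for name, component in nlp.pipeline:
        before = summarize(doc)
        doc = component(doc)
        trace.append((name, before, summarize(doc)))
    return doc, trace


doc, trace = trace_pipeline(nlp, "Explosion develops spaCy.")
for name, before, after in trace:
    print(f"{name}: in={before} out={after}")
```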


In [6]:
print(nlp.__call__.__doc__)

Apply the pipeline to some text. The text can span multiple sentences,
        and can contain arbitrary whitespace. Alignment into the original string
        is preserved.

        text (Union[str, Doc]): If `str`, the text to be processed. If `Doc`,
            the doc will be passed directly to the pipeline, skipping
            `Language.make_doc`.
        disable (List[str]): Names of the pipeline components to disable.
        component_cfg (Dict[str, dict]): An optional dictionary with extra
            keyword arguments for specific components.
        RETURNS (Doc): A container for accessing the annotations.

        DOCS: https://spacy.io/api/language#call
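The `disable` argument and the `Doc` input described in the docstring can be tried directly. A small sketch, assuming a blank English pipeline with an illustrative "Explosion" entity pattern:

```python
import spacy

nlp = spacy.blank("en")
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns([{"label": "ORG", "pattern": "Explosion"}])

text = "Explosion develops spaCy."
full = nlp(text)
skipped = nlp(text, disable=["entity_ruler"])  # disabled for this call only
print([(e.text, e.label_) for e in full.ents])
print([(e.text, e.label_) for e in skipped.ents])

# Passing a Doc skips Language.make_doc and runs the components on it directly
doc = nlp(nlp.make_doc(text))
print([(e.text, e.label_) for e in doc.ents])
```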
        


### NER on some text

In [16]:
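import pandas as pd
import spacy
from spacy import displacy


nlp = spacy.load("en_core_web_md")

# Keep the original casing: the NER model uses capitalization cues
# (e.g. token shape), so lowercasing the text makes it miss entities
with open("test.txt", encoding="utf-8") as fp:
    text = fp.read()

doc = nlp(text)

# Collect one row per entity and build the DataFrame once
# (calling pd.concat inside the loop copies the whole frame every time)
rows = [
    [ent.text, ent.label_, spacy.explain(ent.label_)]
    for ent in doc.ents
]
df = pd.DataFrame(rows, columns=["text", "label", "label_desc"])

# Tab-separated output, despite the .csv extension
df.to_csv("a.csv", sep="\t", index=False)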
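The same table can also keep each entity's character offsets, which point back into the original text, and give a quick count per label. A self-contained sketch, assuming a blank English pipeline with illustrative entity patterns instead of `en_core_web_md` and `test.txt`:

```python
import pandas as pd
import spacy

nlp = spacy.blank("en")
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns([
    {"label": "ORG", "pattern": "Explosion"},
    {"label": "GPE", "pattern": "Berlin"},
])

doc = nlp("Explosion is based in Berlin. Explosion maintains spaCy.")

# One dict per entity; start_char/end_char index into doc.text
rows = [
    {
        "text": ent.text,
        "label": ent.label_,
        "label_desc": spacy.explain(ent.label_),
        "start_char": ent.start_char,
        "end_char": ent.end_char,
    }
    for ent in doc.ents
]
df = pd.DataFrame(rows)
print(df)
print(df["label"].value_counts())
```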

In [17]:
displacy.render(doc, style="ent")
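Outside a notebook, `displacy.render` can return the markup as a string instead of displaying it, which is handy for saving the visualization. A small sketch, assuming a blank pipeline with an illustrative entity pattern; the output file name `entities.html` is arbitrary:

```python
import spacy
from spacy import displacy

nlp = spacy.blank("en")
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns([{"label": "ORG", "pattern": "Explosion"}])
doc = nlp("Explosion develops spaCy.")

# jupyter=False returns the markup; page=True wraps it in a full HTML page
html = displacy.render(doc, style="ent", jupyter=False, page=True)
with open("entities.html", "w", encoding="utf-8") as fp:
    fp.write(html)
```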