In [1]:
import spacy

In [2]:
nlp = spacy.load('en_core_web_sm')

In [42]:
# Create a Doc object with a unicode string (u-string)
doc = nlp(u"SpaCy is a library for advanced Natural Language Processing in Python \
and Cython. It's built on the very latest research, and was designed from day \
one to be used in real products. SpaCy comes with pretrained pipelines and currently \
supports tokenization and training for 60+ languages. It features state-of-the-art \
speed and neural network models for tagging, parsing, named entity recognition, \
text classification and more, multi-task learning with pretrained transformers \
like BERT, as well as a production-ready training system and easy model packaging, \
deployment and workflow management. SpaCy is commercial open-source software, released \
under the MIT license.")

# Print each token separately
for token in doc:
    print(f"Actual text: {token.text:{10}} Part of Speech: {token.pos_:{10}} Syntactic dependency: {token.dep_:{10}}")

Actual text: SpaCy      Part of Speech: PROPN      Syntactic dependency: nsubj
Actual text: is         Part of Speech: VERB       Syntactic dependency: ROOT
Actual text: a          Part of Speech: DET        Syntactic dependency: det
Actual text: library    Part of Speech: NOUN       Syntactic dependency: attr
Actual text: for        Part of Speech: ADP        Syntactic dependency: prep
Actual text: advanced   Part of Speech: ADJ        Syntactic dependency: amod
Actual text: Natural    Part of Speech: PROPN      Syntactic dependency: compound
Actual text: Language   Part of Speech: PROPN      Syntactic dependency: compound
Actual text: Processing Part of Speech: PROPN      Syntactic dependency: pobj
Actual text: in         Part of Speech: ADP        Syntactic dependency: prep
Actual text: Python     Part of Speech: PROPN      Syntactic dependency: pobj
Actual text: and        Part of Speech: CCONJ      Syntactic dependency: cc
...
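
The `pos_` and `dep_` values printed above are filled in by trained pipeline components. As a minimal sketch of what the tokenizer alone provides, the block below uses `spacy.blank("en")` and a short illustrative sentence. Both are assumptions made here so that no model download is needed; the notebook itself loads `en_core_web_sm`.

```python
import spacy

# Assumption: a blank English pipeline (tokenizer only) instead of en_core_web_sm.
nlp_blank = spacy.blank("en")
sample = nlp_blank("It's built on research.")

# Lexical attributes such as is_alpha and is_punct need no trained model.
for token in sample:
    print(f"{token.i:<3} {token.text:<10} alpha={token.is_alpha!s:<6} punct={token.is_punct}")

token_texts = [token.text for token in sample]
```

Note that the contraction "It's" is split into two tokens, which is why the token count is usually higher than a whitespace split would suggest.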

Check which components are currently in the nlp pipeline:

In [43]:
nlp.pipeline

[('tagger', <spacy.pipeline.Tagger at 0x19cdb7ab708>),
 ('parser', <spacy.pipeline.DependencyParser at 0x19cdb7aa528>),
 ('ner', <spacy.pipeline.EntityRecognizer at 0x19cdb7aaac8>)]

In [44]:
nlp.pipe_names

['tagger', 'parser', 'ner']
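
Besides listing the component names, a pipeline can be asked whether a specific component is present. The sketch below is illustrative only: it assumes a blank English pipeline (so nothing has to be downloaded) and adds the rule-based `sentencizer` as an example component. The version check is there because `add_pipe` takes a component name in spaCy v3 but a component object in v2.

```python
import spacy

# Assumption: a blank pipeline, which starts without any components.
nlp_blank = spacy.blank("en")
names_before = list(nlp_blank.pipe_names)
print(names_before)                 # no components yet
print(nlp_blank.has_pipe("ner"))    # the blank pipeline has no entity recognizer

# Add one rule-based component; the call differs between spaCy v2 and v3.
if spacy.__version__.startswith("2."):
    nlp_blank.add_pipe(nlp_blank.create_pipe("sentencizer"))
else:
    nlp_blank.add_pipe("sentencizer")

print(nlp_blank.pipe_names)
```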

If we are not sure what a "Part of speech" or "Syntactic dependency" abbreviation means, we can pass it to spacy.explain() to get a short description:

In [45]:
print(f"Part of speech: {spacy.explain(doc[0].pos_)}\n"
      f"Syntactic Dependency: {spacy.explain(doc[0].dep_)}")

Part of speech: proper noun
Syntactic Dependency: nominal subject
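
`spacy.explain()` works on plain label strings, so it can also be called without a Doc. A small sketch: the labels below are taken from the output above, apart from `no_such_label`, a made-up string included to show that unknown terms yield `None` (recent spaCy versions may also emit a warning in that case).

```python
import spacy

# Labels seen in the token table, plus one hypothetical unknown label.
labels = ["PROPN", "nsubj", "no_such_label"]

# Build a small label -> description lookup.
descriptions = {label: spacy.explain(label) for label in labels}

for label, description in descriptions.items():
    print(f"{label:<15} {description}")
```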


spaCy can also split a Doc object into sentences, which are available through doc.sents. In this pipeline the sentence boundaries come from the dependency parser.

In [51]:
for i, sentence in enumerate(doc.sents):
    print(f"{i+1}. {sentence}")

1. SpaCy is a library for advanced Natural Language Processing in Python and Cython.
2. It's built on the very latest research, and was designed from day one to be used in real products.
3. SpaCy comes with pretrained pipelines and currently supports tokenization and training for 60+ languages.
4. It features state-of-the-art speed and neural network models for tagging, parsing, named entity recognition, text classification and more, multi-task learning with pretrained transformers like BERT, as well as a production-ready training system and easy model packaging, deployment and workflow management.
5. SpaCy is commercial open-source software, released under the MIT license.
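
Each item yielded by `doc.sents` is a Span, so it carries token offsets as well as text. The sketch below illustrates this with the rule-based `sentencizer` on a blank pipeline. This is a substitution made here to avoid a model download, not the parser-based segmentation used above, and the sample text is illustrative.

```python
import spacy

# Assumption: rule-based sentence splitting instead of the dependency parser.
nlp_rules = spacy.blank("en")
if spacy.__version__.startswith("2."):
    nlp_rules.add_pipe(nlp_rules.create_pipe("sentencizer"))   # spaCy v2 style
else:
    nlp_rules.add_pipe("sentencizer")                          # spaCy v3 style

sample = nlp_rules("SpaCy is a library for Python. It is released under the MIT license.")
sentences = list(sample.sents)

# start/end are token indices into the Doc, with end being exclusive.
for sent in sentences:
    print(sent.start, sent.end, sent.text)
```

The rule-based splitter only looks at punctuation, so on text with abbreviations or missing punctuation its boundaries may differ from those produced by the parser.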
