In [1]:
import spacy

In [2]:
nlp=spacy.load("en_core_web_sm")

In [3]:
doc=nlp(u"The quick brown fox jumped over the lazy dog's back")

In [4]:
print(doc.text)

The quick brown fox jumped over the lazy dog's back


In [6]:
print(doc[4].pos_)

VERB


In [7]:
print(doc[4].tag_)

VBD


In [10]:
for token in doc:
    print(f"{token.text} {token.pos_} {token.tag_} {spacy.explain(token.tag_)}")

The DET DT determiner
quick ADJ JJ adjective (English), other noun-modifier (Chinese)
brown ADJ JJ adjective (English), other noun-modifier (Chinese)
fox NOUN NN noun, singular or mass
jumped VERB VBD verb, past tense
over ADP IN conjunction, subordinating or preposition
the DET DT determiner
lazy ADJ JJ adjective (English), other noun-modifier (Chinese)
dog NOUN NN noun, singular or mass
's PART POS possessive ending
back NOUN NN noun, singular or mass


In [11]:
doc1=nlp(u"I read books on NLP")

In [12]:
word=doc1[1]

In [13]:
word.text

'read'

In [14]:
spacy.explain(word.tag_)

'verb, non-3rd person singular present'

In [15]:
doc2=nlp(u"I read a book on NLP")

In [17]:
word=doc2[1]

In [18]:
spacy.explain(word.tag_)

'verb, past tense'

In [19]:
doc=nlp(u"The quick brown fox jumped over the lazy dog's back")

doc: the Doc object that the spaCy pipeline returns after processing a text. It holds the individual tokens together with their annotations, such as POS tags and named entities.

spacy.attrs.POS: the integer ID that spaCy uses for the coarse-grained part-of-speech attribute of a token. It is a constant that identifies which attribute is meant; it does not contain any tags itself.

count_by(): a method of the Doc object that counts how often each value of a given attribute occurs across all tokens in the Doc. Passing spacy.attrs.POS asks for the number of occurrences of each part-of-speech tag in the document.

POS_counts: the Python dictionary returned by count_by(). Its keys are part-of-speech tag IDs (integers), and its values are the corresponding counts.
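As a rough picture of what this count amounts to, here is a minimal sketch in plain Python. It does not call spaCy and is not how count_by() is implemented; the `pos_tags` list is copied by hand from the token listing printed earlier, and it uses tag names as keys where count_by() uses integer IDs.

```python
from collections import Counter

# Coarse POS tags copied by hand from the token listing printed earlier
# (one entry per token of the example sentence); an illustrative stand-in for doc.
pos_tags = ["DET", "ADJ", "ADJ", "NOUN", "VERB", "ADP",
            "DET", "ADJ", "NOUN", "PART", "NOUN"]

# Count how many tokens carry each tag. count_by() returns the same kind of
# mapping, but keyed by integer attribute IDs instead of tag names.
pos_counts = Counter(pos_tags)

for name, count in sorted(pos_counts.items()):
    print(name, count)
```

The counts add up to the number of tokens in the sentence, which is a quick sanity check for any such frequency table.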

In [21]:
POS_counts=doc.count_by(spacy.attrs.POS)

In [22]:
POS_counts

{90: 2, 84: 3, 92: 3, 100: 1, 85: 1, 94: 1}

In [23]:
doc.vocab[83].text

'LANG'

In [24]:
doc[2].pos

84

In [25]:
for k, v in sorted(POS_counts.items()):
    print(f"{k} {doc.vocab[k].text:5} {v}")

84 ADJ   3
85 ADP   1
90 DET   2
92 NOUN  3
94 PART  1
100 VERB  1
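The aligned name column in the output above comes from a field width in the f-string format spec. A minimal sketch in plain Python, with the counts copied by hand from that output; `pos_counts` and `width` are illustrative names, not spaCy objects, and the width is computed from the data here instead of being fixed.

```python
# Tag names and counts copied by hand from the output above.
pos_counts = {"ADJ": 3, "ADP": 1, "DET": 2, "NOUN": 3, "PART": 1, "VERB": 1}

# Pad every name to the length of the longest one so the counts line up.
width = max(len(name) for name in pos_counts)

# A nested replacement field ({width}) supplies the padding width at run time.
lines = [f"{name:{width}} {count}" for name, count in sorted(pos_counts.items())]

for line in lines:
    print(line)
```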


In [27]:
TAG_counts = doc.count_by(spacy.attrs.TAG)
for k, v in sorted(TAG_counts.items()):
    print(f"{k} {doc.vocab[k].text:5} {v}")

74 POS   1
1292078113972184607 IN    1
10554686591937588953 JJ    3
15267657372422890137 DT    2
15308085513773655218 NN    3
17109001835818727656 VBD   1


In [28]:
len(doc.vocab)

791

Visualizing Parts of Speech

In [29]:
from spacy import displacy

In [30]:
displacy.render(doc,style='dep',jupyter=True)