## Parts Of Speech
### Breaking down large texts into parts of speech and sentences
#### Loading imports

In [1]:
import spacy
nlp = spacy.load('en_core_web_sm')
from spacy import displacy

#### Creating a Doc object from the file peterrabbit.txt

In [3]:
with open('../TextFiles/peterrabbit.txt') as f:
    doc = nlp(f.read())


#### Printing the token text, the POS tag, the fine-grained TAG tag, and the description of the fine-grained tag for every token in the third sentence

In [26]:
sentences = list(doc.sents)
third = sentences[2]  # index 2 is the third sentence
for token in third:
    # Skip whitespace-only tokens such as newlines
    if token.text.strip():
        print(f"{token.text:{10}} {token.pos_:{10}} {token.tag_:{10}} {spacy.explain(token.tag_):{10}}")


Flopsy     PROPN      NNP        noun, proper singular
,          PUNCT      ,          punctuation mark, comma
Mopsy      PROPN      NNP        noun, proper singular
,          PUNCT      ,          punctuation mark, comma
Cotton     PROPN      NNP        noun, proper singular
-          PUNCT      HYPH       punctuation mark, hyphen
tail       NOUN       NN         noun, singular or mass
,          PUNCT      ,          punctuation mark, comma
and        CCONJ      CC         conjunction, coordinating
Peter      PROPN      NNP        noun, proper singular
.          PUNCT      .          punctuation mark, sentence closer
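
The column alignment in the output above comes from the nested width fields in the f-string: `{token.text:{10}}` pads the value to at least 10 characters, and longer values simply overflow the column. A minimal sketch in plain Python, using two rows copied from the output (the names `rows` and `width` are illustrative, not from the notebook):

```python
# Rows copied from the output above: (text, POS, fine-grained tag, description)
rows = [
    ("Flopsy", "PROPN", "NNP", "noun, proper singular"),
    (",", "PUNCT", ",", "punctuation mark, comma"),
]

width = 10  # minimum column width; longer values are not truncated

# The inner {width} is evaluated first and becomes the field width
lines = [
    f"{text:{width}} {pos:{width}} {tag:{width}} {desc:{width}}"
    for text, pos, tag, desc in rows
]

for line in lines:
    print(line)
```

Because the width is only a minimum, the description column stays readable even though most descriptions are longer than 10 characters.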


#### Providing a frequency list of POS tags from the entire document

In [37]:
for k, v in sorted(doc.count_by(spacy.attrs.POS).items()):
    print(f"{k:{3}}.  {doc.vocab[k].text:{5}}: {v:{5}}")

 84.  ADJ  :    50
 85.  ADP  :   123
 86.  ADV  :    67
 87.  AUX  :    48
 89.  CCONJ:    61
 90.  DET  :   118
 92.  NOUN :   171
 93.  NUM  :     8
 94.  PART :    29
 95.  PRON :    81
 96.  PROPN:    73
 97.  PUNCT:   174
 98.  SCONJ:    20
100.  VERB :   136
103.  SPACE:    99
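
The listing above is ordered by tag ID. Ranking by count is often easier to read; the sketch below does that with the counts copied from the output, so it runs without spaCy. The dictionary name `pos_counts` is illustrative.

```python
# Counts copied from the frequency list above, keyed by POS name
pos_counts = {
    "ADJ": 50, "ADP": 123, "ADV": 67, "AUX": 48, "CCONJ": 61,
    "DET": 118, "NOUN": 171, "NUM": 8, "PART": 29, "PRON": 81,
    "PROPN": 73, "PUNCT": 174, "SCONJ": 20, "VERB": 136, "SPACE": 99,
}

# Sort from most to least frequent
ranked = sorted(pos_counts.items(), key=lambda item: item[1], reverse=True)

for name, count in ranked:
    print(f"{name:{5}}: {count:{5}}")
```

With a live `doc`, the same ranking could start from `doc.count_by(spacy.attrs.POS)`, mapping each ID to its name with `doc.vocab[k].text` as in the cell above.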


#### Finding the percentage of tokens that are nouns

In [45]:
pos_counts = doc.count_by(spacy.attrs.POS)
# 92 is the ID of NOUN in the frequency list above
pos_counts[92] / sum(pos_counts.values())

0.1359300476947536
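
The value above is a fraction, roughly 13.6% when expressed as a percentage. The denominator includes the 99 `SPACE` tokens; whether to count them is a judgment call. The sketch below reproduces the figure from the counts in the frequency list and also shows the share with whitespace tokens excluded. Variable names are illustrative.

```python
# Counts copied from the frequency list earlier in this notebook
pos_counts = {
    "ADJ": 50, "ADP": 123, "ADV": 67, "AUX": 48, "CCONJ": 61,
    "DET": 118, "NOUN": 171, "NUM": 8, "PART": 29, "PRON": 81,
    "PROPN": 73, "PUNCT": 174, "SCONJ": 20, "VERB": 136, "SPACE": 99,
}

total = sum(pos_counts.values())
noun_share = pos_counts["NOUN"] / total

# Same calculation, ignoring whitespace-only tokens
non_space_total = total - pos_counts["SPACE"]
noun_share_no_space = pos_counts["NOUN"] / non_space_total

print(f"Nouns among all tokens:       {noun_share:.1%}")
print(f"Nouns among non-space tokens: {noun_share_no_space:.1%}")
```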

#### Displaying the Dependency Parse for the third sentence

In [54]:
displacy.render(list(doc.sents)[2], style="dep", jupyter=True)

#### Showing the first two named entities from Beatrix Potter's *The Tale of Peter Rabbit*

In [59]:
for ent in doc.ents[0:2]:
    print(f"{ent.text:{15}} -{ent.label_:{10}} -{spacy.explain(ent.label_):{30}}")

Peter Rabbit    -PERSON     -People, including fictional   
Beatrix Potter  -PERSON     -People, including fictional   


#### Counting the total number of sentences in *The Tale of Peter Rabbit*

In [62]:
len(list(doc.sents))

68

#### Finding how many sentences contain named entities

In [67]:
count = 0
for sent in doc.sents:
    # Re-parse each sentence on its own and check whether it has any entities
    if nlp(sent.text).ents:
        count += 1
print(count)

40
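
The loop above runs `nlp` again on every sentence, which repeats work and may yield different entities than the full-document parse, since the model sees less context. An alternative is to read `sent.ents` from the existing parse. The sketch below illustrates this on a tiny hand-annotated `Doc` so that it does not depend on a trained model; it assumes spaCy v3, where the `Doc` constructor accepts `sent_starts` and `ents`. The words and annotations are made up for illustration.

```python
from spacy.tokens import Doc
from spacy.vocab import Vocab

# Hypothetical two-sentence example with one hand-labelled entity
words = ["Peter", "ran", ".", "He", "hid", "."]
sent_starts = [True, False, False, True, False, False]
ents = ["B-PERSON", "O", "O", "O", "O", "O"]  # token-level IOB tags

doc = Doc(Vocab(), words=words, sent_starts=sent_starts, ents=ents)

# Count sentences whose span contains at least one entity, without re-parsing
with_ents = sum(1 for sent in doc.sents if sent.ents)
print(with_ents)
```

Applied to the notebook's `doc`, this would be `sum(1 for sent in doc.sents if sent.ents)`; the result is not guaranteed to match the count above.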


#### Displaying the named entity visualization for the first sentence

In [69]:
displacy.render(nlp(list(doc.sents)[0].text), style="ent", jupyter=True)