This Python script performs Part-of-Speech (POS) analysis on a short text describing the Faculty of Mathematics and Informatics at Vilnius University. It uses spaCy for tasks 1 to 5 and NLTK for tasks 6 and 7.

1. Printing POS Tags for the Third Sentence:
Uses spaCy to print each token of the third sentence with its coarse POS tag, its fine-grained tag, and a description of the fine-grained tag.

2. Providing POS Tag Frequencies:
Counts how often each POS tag occurs in the text and lists the counts.

3. Printing Adjectives:
Prints every token that spaCy tags as an adjective (ADJ).

4. Printing Prepositions:
Prints every token that spaCy tags as an adposition (ADP), the coarse tag that covers English prepositions.

5. Calculating Percentage of Verbs:
Computes the share of tokens tagged VERB as a percentage of all tokens, punctuation included.

6. Printing Tokens and POS Tags of the Third Sentence (Using NLTK):
Prints the tokens of the third sentence and their Penn Treebank POS tags using NLTK.

7. Removing Cardinal Numbers:
Removes every token that NLTK tags as a cardinal number (CD) from the text.

In [3]:
import spacy

text = "The Faculty of Mathematics and Informatics of Vilnius University delivers teaching and research through four institutes: Data Science and Digital Technologies, Computer Science, Mathematics, and Applied Mathematics. The Faculty offers 9 bachelor’s and 6 master’s courses. Doctoral studies are conducted in the areas of informatics, computer engineering, and mathematics. As of May 2018, the Dean of the Faculty is Associate Professor Paulius Drungilas, a long-time academic at Vilnius University and one of the University’s youngest professors. According to Professor Drungilas, more than 90 % of all students at MIF are state-financed, and are thus the best secondary education graduates of their generation."

#### 1. Print each token of the third sentence with its Part-of-Speech (POS) tag, its fine-grained tag (TAG), and the description of that tag, using the spaCy library

In [11]:
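# Load the small English pipeline and process the text once.
nlp = spacy.load("en_core_web_sm")
doc = nlp(text)

# doc.sents is a generator, so materialize it before indexing the third sentence.
third_sentence = list(doc.sents)[2]

for token in third_sentence:
    # Columns: token, coarse POS tag, fine-grained tag, description of the fine-grained tag.
    print(f"{token.text:<12} {token.pos_:<8} {token.tag_:<6} {spacy.explain(token.tag_)}")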


Doctoral     ADJ      JJ     adjective (English), other noun-modifier (Chinese)
studies      NOUN     NNS    noun, plural
are          AUX      VBP    verb, non-3rd person singular present
conducted    VERB     VBN    verb, past participle
in           ADP      IN     conjunction, subordinating or preposition
the          DET      DT     determiner
areas        NOUN     NNS    noun, plural
of           ADP      IN     conjunction, subordinating or preposition
informatics  NOUN     NNS    noun, plural
,            PUNCT    ,      punctuation mark, comma
computer     NOUN     NN     noun, singular or mass
engineering  NOUN     NN     noun, singular or mass
,            PUNCT    ,      punctuation mark, comma
and          CCONJ    CC     conjunction, coordinating
mathematics  NOUN     NNS    noun, plural
.            PUNCT    .      punctuation mark, sentence closer


#### 2. Provide the list of POS tag frequencies in the text

In [14]:
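from spacy.attrs import POS

# Reuses `doc` from task 1. count_by returns a dict mapping integer attribute IDs to counts.
pos_frequencies = doc.count_by(POS)

for pos_id in sorted(pos_frequencies):
    # Look up the tag label for the integer ID in the string store.
    pos_text = doc.vocab.strings[pos_id]
    print(f"{pos_text:<8} : {pos_frequencies[pos_id]}")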

ADJ      : 6
ADP      : 15
ADV      : 1
AUX      : 4
CCONJ    : 8
DET      : 9
NOUN     : 21
NUM      : 7
PART     : 2
PRON     : 1
PROPN    : 28
PUNCT    : 17
VERB     : 5
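
To read the table by size rather than by tag name, the counts can be ranked. The sketch below is plain Python and does not call spaCy: it copies the counts printed above into a dictionary (`pos_counts` is an illustrative name, not part of the original script) and sorts them in descending order.

```python
# Counts copied from the output above; `pos_counts` is an illustrative name.
pos_counts = {
    "ADJ": 6, "ADP": 15, "ADV": 1, "AUX": 4, "CCONJ": 8, "DET": 9, "NOUN": 21,
    "NUM": 7, "PART": 2, "PRON": 1, "PROPN": 28, "PUNCT": 17, "VERB": 5,
}

# Sort by count, largest first; ties keep their alphabetical order because sorted() is stable.
ranked = sorted(pos_counts.items(), key=lambda item: item[1], reverse=True)
total_tokens = sum(pos_counts.values())

for tag, count in ranked:
    print(f"{tag:<8} : {count}")
print(f"{'TOTAL':<8} : {total_tokens}")
```

The total of 124 tokens is the denominator behind the verb percentage in task 5.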


#### 3. Print all adjectives in the text

In [15]:
# Reuses `doc` from task 1; keep every token whose coarse POS tag is ADJ.
adjectives = [token.text for token in doc if token.pos_ == "ADJ"]
print(adjectives)

['Doctoral', 'long', 'youngest', 'more', 'best', 'secondary']


#### 4. Print all prepositions in the text

In [16]:

# Reuses `doc` from task 1. ADP (adposition) is the coarse tag that covers prepositions.
prepositions = [token.text for token in doc if token.pos_ == "ADP"]
print(prepositions)

['of', 'of', 'through', 'in', 'of', 'As', 'of', 'of', 'at', 'of', 'to', 'than', 'of', 'at', 'of']
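
The list above repeats the same few words, so a tally can be easier to read. This is a small sketch in plain Python, assuming the list is copied from the output above; `preposition_counts` is an illustrative name.

```python
from collections import Counter

# Preposition list copied from the output above.
prepositions = ['of', 'of', 'through', 'in', 'of', 'As', 'of', 'of', 'at',
                'of', 'to', 'than', 'of', 'at', 'of']

# Lowercase first so that sentence-initial "As" is counted together with "as".
preposition_counts = Counter(word.lower() for word in prepositions)

for word, count in preposition_counts.most_common():
    print(f"{word:<8} : {count}")
```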


#### 5. What percentage of tokens are verbs?

In [17]:
# Reuses `doc` from task 1. Only tokens tagged VERB are counted; auxiliaries (AUX) are not.
verb_count = sum(1 for token in doc if token.pos_ == "VERB")
# len(doc) counts every token, punctuation included.
percentage = verb_count / len(doc) * 100
print(f"{percentage:.2f}%")

4.03%
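
The result depends on what counts as a verb and what counts as a token. As a rough illustration, the sketch below recomputes the share from the frequency table in task 2 under three definitions. It is plain arithmetic on those printed counts, not a spaCy call, and the `share` helper is a hypothetical name.

```python
# Counts taken from the POS frequency table in task 2.
verb, aux, punct, total = 5, 4, 17, 124


def share(part: int, whole: int) -> str:
    """Format part/whole as a percentage with two decimals."""
    return f"{part / whole * 100:.2f}%"


verbs_only = share(verb, total)              # same definition as the cell above
verbs_and_aux = share(verb + aux, total)     # auxiliaries counted as verbs
verbs_no_punct = share(verb, total - punct)  # punctuation removed from the total

print(verbs_only, verbs_and_aux, verbs_no_punct)
```

Counting auxiliaries such as "are" and "is" as verbs raises the share to 7.26%, while excluding punctuation from the total gives 4.67%.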


#### 6. Print the tokens and POS tags of the third sentence using the NLTK library

In [18]:
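# Needs NLTK's sentence tokenizer and POS tagger data, installed once via nltk.download().
from nltk.tokenize import word_tokenize, sent_tokenize
from nltk import pos_tag

# Split into sentences, then tokenize and tag only the third one.
sentences = sent_tokenize(text)
pos_tags_third_sentence = pos_tag(word_tokenize(sentences[2]))
print(pos_tags_third_sentence)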

[('Doctoral', 'JJ'), ('studies', 'NNS'), ('are', 'VBP'), ('conducted', 'VBN'), ('in', 'IN'), ('the', 'DT'), ('areas', 'NNS'), ('of', 'IN'), ('informatics', 'NNS'), (',', ','), ('computer', 'NN'), ('engineering', 'NN'), (',', ','), ('and', 'CC'), ('mathematics', 'NNS'), ('.', '.')]
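
Both libraries report Penn Treebank style tags for this sentence, so their results can be compared position by position. The following sketch is plain Python over the two printed outputs. It assumes the two tokenizations line up one to one, which holds for this sentence but is not guaranteed in general. The names `spacy_tags`, `nltk_pairs`, and `mismatches` are illustrative.

```python
# Fine-grained tags printed by spaCy for the third sentence (task 1 output).
spacy_tags = ["JJ", "NNS", "VBP", "VBN", "IN", "DT", "NNS", "IN", "NNS", ",",
              "NN", "NN", ",", "CC", "NNS", "."]

# (token, tag) pairs printed by NLTK above.
nltk_pairs = [('Doctoral', 'JJ'), ('studies', 'NNS'), ('are', 'VBP'),
              ('conducted', 'VBN'), ('in', 'IN'), ('the', 'DT'), ('areas', 'NNS'),
              ('of', 'IN'), ('informatics', 'NNS'), (',', ','), ('computer', 'NN'),
              ('engineering', 'NN'), (',', ','), ('and', 'CC'),
              ('mathematics', 'NNS'), ('.', '.')]

# Collect every position where the two taggers disagree.
mismatches = [
    (token, spacy_tag, nltk_tag)
    for (token, nltk_tag), spacy_tag in zip(nltk_pairs, spacy_tags)
    if spacy_tag != nltk_tag
]
print(f"{len(nltk_pairs)} tokens, {len(mismatches)} mismatches")
```

For this sentence the two taggers agree on every token.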


#### 7. Remove all cardinal numbers from the given text

In [19]:
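from nltk.tokenize import word_tokenize
from nltk import pos_tag

# Tag the whole text, then keep every token whose Penn Treebank tag is not CD (cardinal number).
pos_tags_text = pos_tag(word_tokenize(text))
# Joining with single spaces also puts a space before each punctuation token in the result.
text_without_numbers = " ".join(word for word, pos in pos_tags_text if pos != "CD")
print(text_without_numbers)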

The Faculty of Mathematics and Informatics of Vilnius University delivers teaching and research through institutes : Data Science and Digital Technologies , Computer Science , Mathematics , and Applied Mathematics . The Faculty offers bachelor ’ s and master ’ s courses . Doctoral studies are conducted in the areas of informatics , computer engineering , and mathematics . As of May , the Dean of the Faculty is Associate Professor Paulius Drungilas , a long-time academic at Vilnius University and of the University ’ s youngest professors . According to Professor Drungilas , more than % of all students at MIF are state-financed , and are thus the best secondary education graduates of their generation .
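
Two side effects are visible in this output. The word "one" is removed along with the digits, because it is also tagged CD, and joining tokens with spaces leaves gaps before punctuation and inside possessives. The second effect can be cleaned up afterwards. The sketch below is one possible approach using only the standard `re` module, not NLTK's own detokenizer. `tidy_spacing` is a hypothetical helper, and it only handles the patterns seen in this text.

```python
import re


def tidy_spacing(joined: str) -> str:
    """Undo the extra spaces that " ".join() leaves around punctuation."""
    # Re-attach a detached possessive: "bachelor ’ s" -> "bachelor’s".
    joined = re.sub(r"\s+([’'])\s+s\b", r"\1s", joined)
    # Remove the space in front of closing punctuation.
    return re.sub(r"\s+([.,:;!?])", r"\1", joined)


# Second sentence of the output above.
sample = "The Faculty offers bachelor ’ s and master ’ s courses ."
print(tidy_spacing(sample))
```

The percent sign is deliberately left out of the punctuation set, since the source text itself writes "90 %" with a space.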
