# Basic NLP Course

## Part-of-Speech (POS) Tagging

Part-of-Speech (POS) tagging is the Natural Language Processing (NLP) task of assigning a part of speech, such as noun, verb, or adjective, to each word in a text.

- **Purpose**: POS tags show the grammatical role each word plays, which helps in analyzing the structure of a sentence and the relationships between its words.
- **Example**: In the sentence "The quick brown fox jumps over the lazy dog," POS tagging assigns:
    - "The" → Determiner
    - "quick" → Adjective
    - "brown" → Adjective
    - "fox" → Noun
    - "jumps" → Verb
    - "over" → Preposition
    - "the" → Determiner
    - "lazy" → Adjective
    - "dog" → Noun
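
The mapping above can be written down as data. The following is a minimal sketch with hand-written tags (not model output), using the Universal POS abbreviations that spaCy exposes as `token.pos_`. The names `TAGGED` and `UPOS_NAMES` are illustrative. Note that "Preposition" corresponds to `ADP` (adposition) in that scheme.

```python
# Hand-written tags for the example sentence; these are NOT produced by a model.
# Labels are Universal POS abbreviations (the coarse scheme behind token.pos_).
TAGGED = [
    ("The", "DET"), ("quick", "ADJ"), ("brown", "ADJ"), ("fox", "NOUN"),
    ("jumps", "VERB"), ("over", "ADP"), ("the", "DET"), ("lazy", "ADJ"),
    ("dog", "NOUN"),
]

# Human-readable names for the abbreviations used above.
UPOS_NAMES = {
    "DET": "determiner",
    "ADJ": "adjective",
    "NOUN": "noun",
    "VERB": "verb",
    "ADP": "adposition (covers prepositions)",
}

# Print one aligned row per word: word, tag, description.
for word, tag in TAGGED:
    print(f"{word:6} {tag:5} {UPOS_NAMES[tag]}")
```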

In [1]:
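import spacy

# Load the small English pipeline; this assumes the model is already installed
# (python -m spacy download en_core_web_sm).
nlp = spacy.load("en_core_web_sm")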

In [2]:
qa_report = """
Quality Assurance Analysis and Report
=====================================

Summary:
---------
The quality assurance analysis was conducted to evaluate the performance, reliability, and compliance of the system with the defined standards and requirements.

Findings:
---------
1. Performance:
    - The system meets the performance benchmarks in 85% of the test cases.
    - Identified latency issues in module X during peak load conditions.

2. Reliability:
    - The system demonstrated 99.5% uptime over the evaluation period.
    - A rare edge case caused a crash in module Y.

3. Compliance:
    - The system complies with 95% of the defined standards.
    - Minor deviations were observed in data encryption protocols.

Actions:
---------
1. Optimize module X to handle peak load conditions more efficiently.
2. Investigate and resolve the crash in module Y for the identified edge case.
3. Review and update data encryption protocols to ensure full compliance with standards.
4. Conduct a follow-up quality assurance analysis after implementing the above actions.

Conclusion:
-----------
The system demonstrates high reliability and compliance with minor areas for improvement. Addressing the identified issues will further enhance the system's quality and performance.
"""

print(qa_report)



In [3]:
doc = nlp(qa_report)

# For each token: text, coarse POS tag, dependency label, and syntactic head
for token in doc:
    print(f"{token.text:10} {token.pos_:10} {token.dep_:10} {token.head.text:10}")


          SPACE      dep        Analysis  
Quality    PROPN      compound   Analysis  
Assurance  PROPN      compound   Analysis  
Analysis   PROPN      nsubjpass  conducted 
and        CCONJ      cc         Analysis  
Report     PROPN      conj       Analysis  

          SPACE      dep        Report    
=          NOUN       punct      Analysis  
=          PRON       conj       Analysis  
=          NOUN       conj       Analysis  
=          NOUN       conj       Analysis  
=          NOUN       conj       Analysis  
=          NOUN       conj       Analysis  
=          NOUN       conj       Analysis  
=          NOUN       conj       Analysis  
=          NOUN       conj       Analysis  
=          NOUN       conj       Analysis  
=          NOUN       conj       Analysis  
=          NOUN       conj       Analysis  
=          NOUN       conj       Analysis  
=          NOUN       conj       Analysis  
=          NOUN       conj       Analysis  
=          NOUN       conj      
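
The output above shows the `=` characters of the heading underline being tagged as nouns and pronouns, which is noise rather than language. One possible remedy, sketched here as an assumption and not something the notebook does, is to drop underline-only lines before the text reaches the pipeline. The helper `strip_rule_lines` and the pattern `RULE_LINE` are hypothetical names; the sketch uses only the standard library.

```python
import re

# A line consisting only of three or more '=' or '-' characters
# (optionally padded with spaces or tabs) is treated as a heading underline.
RULE_LINE = re.compile(r"^[ \t]*[=-]{3,}[ \t]*$")


def strip_rule_lines(text: str) -> str:
    """Return text with underline-only lines removed; other lines are kept as-is."""
    kept = [line for line in text.splitlines() if not RULE_LINE.match(line)]
    return "\n".join(kept)


sample = (
    "Summary:\n"
    "---------\n"
    "The analysis was conducted.\n"
    "\n"
    "Findings:\n"
    "=====\n"
    "- item\n"
)
cleaned = strip_rule_lines(sample)
print(cleaned)
```

List bullets such as `- item` survive because they do not match the pattern. With spaCy loaded, the cleaned text could then be passed on as `nlp(strip_rule_lines(qa_report))`.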

In [4]:
# doc.sents yields sentences, so the loop variable is named `sent`
for sent in doc.sents:
    for token in sent:
        print(token.text, "|", token.pos_, "|", spacy.explain(token.pos_))


 | SPACE | space
Quality | PROPN | proper noun
Assurance | PROPN | proper noun
Analysis | PROPN | proper noun
and | CCONJ | coordinating conjunction
Report | PROPN | proper noun

 | SPACE | space
= | NOUN | noun
= | PRON | pronoun
= | NOUN | noun
= | NOUN | noun
= | NOUN | noun
= | NOUN | noun
= | NOUN | noun
= | NOUN | noun
= | NOUN | noun
= | NOUN | noun
= | NOUN | noun
= | NOUN | noun
= | NOUN | noun
= | NOUN | noun
= | NOUN | noun
= | NOUN | noun
= | NOUN | noun
= | NOUN | noun
= | NOUN | noun
= | NOUN | noun
= | NOUN | noun
= | NOUN | noun
= | NOUN | noun
= | NOUN | noun
= | NOUN | noun
= | NOUN | noun
= | NOUN | noun
= | NOUN | noun
= | NOUN | noun
= | NOUN | noun
= | NOUN | noun
= | NOUN | noun
= | NOUN | noun
= | NOUN | noun
= | NOUN | noun
= | NOUN | noun
= | NOUN | noun


 | SPACE | space
Summary | NOUN | noun
: | PUNCT | punctuation

 | SPACE | space
--------- | PUNCT | punctuation

 | SPACE | space
The | DET | determiner
quality | NOUN | noun
assurance | NOUN | noun
analys

In [8]:
for sent in doc.sents:
    for token in sent:
        # pos_ is the coarse universal tag; tag_ is the fine-grained tag
        print(token.text, "|", token.pos_, "|", spacy.explain(token.pos_), "|", token.tag_, "|", spacy.explain(token.tag_))


 | SPACE | space | _SP | whitespace
Quality | PROPN | proper noun | NNP | noun, proper singular
Assurance | PROPN | proper noun | NNP | noun, proper singular
Analysis | PROPN | proper noun | NNP | noun, proper singular
and | CCONJ | coordinating conjunction | CC | conjunction, coordinating
Report | PROPN | proper noun | NNP | noun, proper singular

 | SPACE | space | _SP | whitespace
= | NOUN | noun | NNS | noun, plural
= | PRON | pronoun | PRP | pronoun, personal
= | NOUN | noun | NNS | noun, plural
= | NOUN | noun | NN | noun, singular or mass
= | NOUN | noun | NN | noun, singular or mass
= | NOUN | noun | NN | noun, singular or mass
= | NOUN | noun | NN | noun, singular or mass
= | NOUN | noun | NN | noun, singular or mass
= | NOUN | noun | NN | noun, singular or mass
= | NOUN | noun | NN | noun, singular or mass
= | NOUN | noun | NN | noun, singular or mass
= | NOUN | noun | NN | noun, singular or mass
= | NOUN | noun | NN | noun, singular or mass
= | NOUN | noun | NN | noun, sing
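
The two tag columns above differ in granularity: `pos_` is the coarse category, while `tag_` is a finer label that can split one category into several. A small sketch of that relationship, using only the (fine, coarse) pairs visible in the output above and standard-library code; the name `OBSERVED` is illustrative:

```python
from collections import defaultdict

# (fine-grained tag, coarse tag) pairs copied from the printed output above.
OBSERVED = [
    ("_SP", "SPACE"),
    ("NNP", "PROPN"),
    ("CC", "CCONJ"),
    ("NNS", "NOUN"),
    ("PRP", "PRON"),
    ("NN", "NOUN"),
]

# Group the fine-grained tags under the coarse tag they appeared with.
fine_by_coarse = defaultdict(set)
for fine, coarse in OBSERVED:
    fine_by_coarse[coarse].add(fine)

for coarse in sorted(fine_by_coarse):
    print(f"{coarse:6} <- {', '.join(sorted(fine_by_coarse[coarse]))}")
```

Here `NOUN` collects both `NN` (singular or mass) and `NNS` (plural), which is the kind of distinction the coarse tag hides.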

In [10]:
# Show only tokens tagged as other (X), symbol, punctuation, or whitespace
for sent in doc.sents:
    for token in sent:
        if token.pos_ in ['X', 'SYM', 'PUNCT', 'SPACE']:
            print(token.text, "|", token.pos_, "|", spacy.explain(token.pos_), "|", token.tag_, "|", spacy.explain(token.tag_))


 | SPACE | space | _SP | whitespace

 | SPACE | space | _SP | whitespace


 | SPACE | space | _SP | whitespace
: | PUNCT | punctuation | : | punctuation mark, colon or ellipsis

 | SPACE | space | _SP | whitespace
--------- | PUNCT | punctuation | NFP | superfluous punctuation

 | SPACE | space | _SP | whitespace
, | PUNCT | punctuation | , | punctuation mark, comma
, | PUNCT | punctuation | , | punctuation mark, comma
. | PUNCT | punctuation | . | punctuation mark, sentence closer


 | SPACE | space | _SP | whitespace
: | PUNCT | punctuation | : | punctuation mark, colon or ellipsis

 | SPACE | space | _SP | whitespace

 | SPACE | space | _SP | whitespace
1 | X | other | LS | list item marker
. | PUNCT | punctuation | . | punctuation mark, sentence closer
: | PUNCT | punctuation | : | punctuation mark, colon or ellipsis

     | SPACE | space | _SP | whitespace
- | PUNCT | punctuation | : | punctuation mark, colon or ellipsis
. | PUNCT | punctuation | . | punctuation mark, sentence cl

In [11]:
filtered_tokens = [token for token in doc if token.pos_ in ['X', 'SYM', 'PUNCT', 'SPACE']]
print(filtered_tokens)

[
, 
, 

, :, 
, ---------, 
, ,, ,, ., 

, :, 
, 
, 1, ., :, 
    , -, ., 
    , -, ., 

, 2, ., :, 
    , -, ., 
    , -, 

, ., :, 
    , -, ., 
    , -, ., 

, :, 
, 
, 1, ., ., 
, 2, ., ., 
, 3, ., ., 
, 4, ., -, ., 

, :, 
, -----------, 
, ., ., 
]
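
The filter above collects the noise-like tokens. In practice the complement is often what is wanted: everything except those tags. A minimal sketch of that inverse filter follows. It relies only on a `.pos_` attribute, so it is demonstrated with stand-in objects instead of a loaded spaCy model; `content_tokens`, `NOISE_POS`, and `fake_doc` are hypothetical names.

```python
from types import SimpleNamespace

# Coarse tags treated as noise, matching the list used in the cell above.
NOISE_POS = frozenset({"X", "SYM", "PUNCT", "SPACE"})


def content_tokens(tokens, noise=NOISE_POS):
    """Return the tokens whose coarse POS tag is not in the noise set."""
    return [t for t in tokens if t.pos_ not in noise]


# Stand-ins for spaCy tokens: any object with .text and .pos_ works here.
fake_doc = [
    SimpleNamespace(text=text, pos_=pos)
    for text, pos in [
        ("\n", "SPACE"),
        ("Summary", "NOUN"),
        (":", "PUNCT"),
        ("1", "X"),
        ("Optimize", "VERB"),
    ]
]

kept = [t.text for t in content_tokens(fake_doc)]
print(kept)
```

With a real pipeline, the same function could be called as `content_tokens(doc)`.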


In [13]:
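# Count tokens per coarse POS tag; count_by returns {attribute ID: count}
counts = doc.count_by(spacy.attrs.POS)
for k, v in sorted(counts.items()):
    # doc.vocab[k].text turns the integer ID back into the tag name
    print(f"{k}: {doc.vocab[k].text:10} {v}")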

84: ADJ        6
85: ADP        18
86: ADV        3
87: AUX        3
89: CCONJ      7
90: DET        20
92: NOUN       109
93: NUM        4
94: PART       4
95: PRON       1
96: PROPN      7
97: PUNCT      37
100: VERB       25
101: X          6
103: SPACE      26
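
Raw counts are easier to compare as shares of the total. The sketch below copies the counts printed above into a plain dictionary (the name `pos_counts` is illustrative) and ranks the tags by frequency. Keep in mind that the `NOUN` figure likely includes the `=` underline characters that were tagged as nouns earlier in this notebook.

```python
# Counts copied from the notebook output above (tag name -> token count).
pos_counts = {
    "ADJ": 6, "ADP": 18, "ADV": 3, "AUX": 3, "CCONJ": 7,
    "DET": 20, "NOUN": 109, "NUM": 4, "PART": 4, "PRON": 1,
    "PROPN": 7, "PUNCT": 37, "VERB": 25, "X": 6, "SPACE": 26,
}

total = sum(pos_counts.values())

# Rank tags from most to least frequent and show each as a share of all tokens.
ranked = sorted(pos_counts.items(), key=lambda kv: kv[1], reverse=True)
for tag, n in ranked:
    print(f"{tag:6} {n:4d} {n / total:6.1%}")

print("total tokens:", total)
```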
