# Syntactic Processing Challenge


Continuing from the previous challenge, we now move beyond basic tokenisation to explore several crucial techniques in **Natural Language Processing (NLP)**: **Part-of-Speech (POS)** tagging and various forms of syntactic parsing, all on the same email summary dataset. We will use the spaCy library for processing and visualisation, with NLTK for comparison. The goal is to understand how machines analyse the grammatical structure of sentences, which is foundational for advanced tasks like *Information Extraction, Machine Translation, and Grammar Checking*.

#### Important Note for Our Journey
As you saw in the previous challenge, our email thread dataset has 4167 threads and 21684 emails. The dataset is large, so we won't process all of it at once. Instead, you'll see how each technique works on <b>tiny examples first</b> and then apply it at a <b>broader scale</b>.

### What we'll be doing in this challenge

- **POS Tagging & Comparison**: Apply spaCy's POS tagger and briefly contrast it with NLTK.

- **Phrase Chunking (Shallow Parsing)**: Extract Noun Phrases (NPs) and Verb Phrases (VPs).

- **Constituency Parsing**: Generate and visualise a phrase-structure tree (unavailable directly in spaCy but explained conceptually).

- **Dependency Parsing**: Visualise head-dependent relationships using spaCy's built-in tools.

- **Comparative Analysis**: Contrast the structural outputs of Constituency and Dependency parsing.

- **Practical Application**: Simple Subject-Verb Agreement checking.

**Let's get started now**

### Setup and Sample Text


In [None]:
# Setup
# Install the main libraries
!pip install spacy
!pip install nltk



Additionally, spaCy requires a language model. We will use the small English model (`en_core_web_sm`).



In [None]:
# Download the specific spaCy language model
!python -m spacy download en_core_web_sm


Collecting en-core-web-sm==3.8.0
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.8.0/en_core_web_sm-3.8.0-py3-none-any.whl (12.8 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 12.8/12.8 MB 72.7 MB/s eta 0:00:00
✔ Download and installation successful
You can now load the package via spacy.load('en_core_web_sm')
⚠ Restart to reload dependencies
If you are in a Jupyter or Colab notebook, you may need to restart Python in
order to load all the package's dependencies. You can do this by selecting the
'Restart kernel' or 'Restart runtime' option.


We'll start by importing the necessary libraries and defining our sample text.

In [1]:
import spacy
import nltk
from spacy import displacy
import json


nltk.download('punkt_tab')
nltk.download('averaged_perceptron_tagger_eng')

print(f"spaCy Version: {spacy.__version__}")
print(f"NLTK Version: {nltk.__version__}")


# Load the spaCy model
try:
    nlp = spacy.load("en_core_web_sm")
except OSError:
    print("\nAttempting to download spaCy model 'en_core_web_sm'...")
    spacy.cli.download("en_core_web_sm")
    nlp = spacy.load("en_core_web_sm")

# Also fetch the older NLTK resource names, for environments that still expect them
try:
    nltk.download('punkt', quiet=True)
    nltk.download('averaged_perceptron_tagger', quiet=True)
except Exception as e:
    print(f"NLTK download error: {e}")

[nltk_data] Downloading package punkt_tab to
[nltk_data]     /Users/aditikulkarni/nltk_data...
[nltk_data]   Package punkt_tab is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger_eng to
[nltk_data]     /Users/aditikulkarni/nltk_data...
[nltk_data]   Unzipping taggers/averaged_perceptron_tagger_eng.zip.


spaCy Version: 3.8.7
NLTK Version: 3.9.2


#### Loading the dataset

We will use the same dataset as in the previous exercise, but work with only a small sample of it.

In [2]:
import yaml

config_path='/Users/aditikulkarni/Documents/Masters/AI-Projects/04-ML-Models/deep-learning-nlp/'
# Load the environment.yml file
print (config_path + "/configs/environment.yml")
with open(config_path + "/configs/environment.yml", "r") as f:
    config = yaml.safe_load(f)

# Choose environment (local or aws)
env = "local"   # or "aws"

base_path = config[env]["base_path"]
raw_data_path = base_path + config[env]["raw_data"]
processed_data_path = base_path + config[env]["processed_data"]
models_path = base_path + config[env]["models"]

print("Raw data path:", raw_data_path)
print("Processed data path:",  processed_data_path)
print("Models path:",  models_path)

/Users/aditikulkarni/Documents/Masters/AI-Projects/04-ML-Models/deep-learning-nlp//configs/environment.yml
Raw data path: /Users/aditikulkarni/Documents/Masters/AI-Projects/04-ML-Models/deep-learning-nlp/data/raw/
Processed data path: /Users/aditikulkarni/Documents/Masters/AI-Projects/04-ML-Models/deep-learning-nlp/data/processed/
Models path: /Users/aditikulkarni/Documents/Masters/AI-Projects/04-ML-Models/deep-learning-nlp/models/
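The cell above assumes that `environment.yml` has one top-level key per environment, each holding a `base_path` plus relative sub-paths. The sketch below shows that assumed shape as a plain Python dict, as `yaml.safe_load` would return it. The paths are hypothetical placeholders, and `resolve_paths` is a hypothetical helper that joins them with `pathlib` so that stray or doubled slashes do not matter.

```python
from pathlib import PurePosixPath

# Assumed shape of configs/environment.yml after yaml.safe_load().
# All paths here are hypothetical placeholders.
config = {
    "local": {
        "base_path": "/path/to/project/",
        "raw_data": "data/raw/",
        "processed_data": "data/processed/",
        "models": "models/",
    },
    "aws": {
        "base_path": "/mnt/project/",
        "raw_data": "data/raw/",
        "processed_data": "data/processed/",
        "models": "models/",
    },
}


def resolve_paths(config, env):
    """Join base_path with each sub-path for the chosen environment."""
    section = config[env]
    base = PurePosixPath(section["base_path"])
    keys = ("raw_data", "processed_data", "models")
    return {key: str(base / section[key]) for key in keys}


paths = resolve_paths(config, "local")
print(paths["raw_data"])  # /path/to/project/data/raw
```

Joining with `pathlib` rather than string concatenation avoids results like the `deep-learning-nlp//configs` path printed above, although the doubled slash is harmless on POSIX systems.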


In [3]:
# Load the JSON data; the context managers close each file after reading
with open(raw_data_path + "/email_thread_details.json") as f:
    email_data = json.load(f)
with open(raw_data_path + "/email_thread_summaries.json") as f:
    email_summary = json.load(f)

In [4]:
## We pick one email summary record (index 100) and use its first sentence as our sample text
SAMPLE_TEXT = email_summary[100]['summary'].split(". ")[0]
print(SAMPLE_TEXT)

Bert sent multiple emails with attached files containing weekend notes for different dates


### Part-of-Speech (POS) Tagging

**Part-of-Speech (POS) Tagging** is the process of labelling each word in a text with its part of speech, based on both its definition and its context. Common Penn Treebank tags include **NN (Noun)**, **VB (Verb)**, **JJ (Adjective)**, and **DT (Determiner)**.

spaCy exposes two tagsets for every token:

- **UPOS (Universal POS)**: a coarse, language-independent tagset, available as `token.pos_`.
- **Penn Treebank**: a fine-grained, English-specific tagset, available as `token.tag_`.

#### POS Tagging using spaCy
spaCy processes the text and assigns tags to each token.

In [6]:
# Process the text with spaCy
doc_spacy = nlp(SAMPLE_TEXT)

# Show token -> POS tag mapping
print("\n--- spaCy POS Tagging (Token -> Tag) ---")
for token in doc_spacy:
    # token.text: the word
    # token.pos_: the simple Universal POS tag (UPOS)
    # token.tag_: the fine-grained, language-specific tag
    print(f"Token: '{token.text:7}' | UPOS Tag: {token.pos_:5} | Fine-grained Tag: {token.tag_}")


--- spaCy POS Tagging (Token -> Tag) ---
Token: 'Bert   ' | UPOS Tag: PROPN | Fine-grained Tag: NNP
Token: 'sent   ' | UPOS Tag: VERB  | Fine-grained Tag: VBD
Token: 'multiple' | UPOS Tag: ADJ   | Fine-grained Tag: JJ
Token: 'emails ' | UPOS Tag: NOUN  | Fine-grained Tag: NNS
Token: 'with   ' | UPOS Tag: ADP   | Fine-grained Tag: IN
Token: 'attached' | UPOS Tag: VERB  | Fine-grained Tag: VBN
Token: 'files  ' | UPOS Tag: NOUN  | Fine-grained Tag: NNS
Token: 'containing' | UPOS Tag: VERB  | Fine-grained Tag: VBG
Token: 'weekend' | UPOS Tag: NOUN  | Fine-grained Tag: NN
Token: 'notes  ' | UPOS Tag: NOUN  | Fine-grained Tag: NNS
Token: 'for    ' | UPOS Tag: ADP   | Fine-grained Tag: IN
Token: 'different' | UPOS Tag: ADJ   | Fine-grained Tag: JJ
Token: 'dates  ' | UPOS Tag: NOUN  | Fine-grained Tag: NNS


#### Brief Contrast with NLTK
NLTK (Natural Language Toolkit) is another widely used NLP library. Its default tagger emits Penn Treebank tags, the same tagset as spaCy's fine-grained `token.tag_`, so the two outputs can be compared token by token.

In [7]:
# Import necessary NLTK components
from nltk.tokenize import word_tokenize
from nltk.tag import pos_tag

# NLTK Process
tokens_nltk = word_tokenize(SAMPLE_TEXT)
tagged_nltk = pos_tag(tokens_nltk)

print("\n--- NLTK POS Tagging (Token -> Tag) ---")
for token, tag in tagged_nltk:
    print(f"Token: '{token:7}' | Tag: {tag}")


--- NLTK POS Tagging (Token -> Tag) ---
Token: 'Bert   ' | Tag: NNP
Token: 'sent   ' | Tag: VBD
Token: 'multiple' | Tag: JJ
Token: 'emails ' | Tag: NNS
Token: 'with   ' | Tag: IN
Token: 'attached' | Tag: JJ
Token: 'files  ' | Tag: NNS
Token: 'containing' | Tag: VBG
Token: 'weekend' | Tag: NN
Token: 'notes  ' | Tag: NNS
Token: 'for    ' | Tag: IN
Token: 'different' | Tag: JJ
Token: 'dates  ' | Tag: NNS


NLTK uses Penn Treebank tags like 'DT', 'JJ', 'NN', 'VBZ' (Verb, 3rd person singular present).

spaCy provides both the simple UPOS tag ('DET', 'ADJ', 'NOUN', 'VERB') and the fine-grained tag ('DT', 'JJ', 'NN', 'VBZ'), offering more flexibility. On this sentence the two taggers agree on every token except 'attached': spaCy tags it VBN (past participle), while NLTK tags it JJ (adjective).

POS tags like these underpin the parsing steps that follow: the chunking grammar is written over them, and they label the words in the constituency tree.
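To make the comparison concrete, the sketch below lines up the fine-grained tags from the two outputs above and reports where they differ. The tag lists are copied by hand from those outputs, and `tag_disagreements` is a hypothetical helper name. It assumes both tokenisers produced the same tokens, which holds for this sentence but not in general.

```python
# Fine-grained (Penn Treebank) tags copied from the spaCy and NLTK outputs above.
tokens = ["Bert", "sent", "multiple", "emails", "with", "attached", "files",
          "containing", "weekend", "notes", "for", "different", "dates"]
spacy_tags = ["NNP", "VBD", "JJ", "NNS", "IN", "VBN", "NNS",
              "VBG", "NN", "NNS", "IN", "JJ", "NNS"]
nltk_tags = ["NNP", "VBD", "JJ", "NNS", "IN", "JJ", "NNS",
             "VBG", "NN", "NNS", "IN", "JJ", "NNS"]


def tag_disagreements(tokens, tags_a, tags_b):
    """Return (token, tag_a, tag_b) for every position where the taggers differ."""
    return [(tok, a, b) for tok, a, b in zip(tokens, tags_a, tags_b) if a != b]


diffs = tag_disagreements(tokens, spacy_tags, nltk_tags)
agreement = 1 - len(diffs) / len(tokens)

print(diffs)                          # [('attached', 'VBN', 'JJ')]
print(f"Agreement: {agreement:.0%}")  # Agreement: 92%
```

A disagreement like this one matters downstream: a rule that looks for adjectives before a noun would match NLTK's tagging of 'attached files' but not spaCy's.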

### Phrase Chunking (Shallow Parsing)
**Phrase Chunking, or Shallow Parsing**, is the process of identifying simple, non-overlapping phrases (like Noun Phrases or Verb Phrases) within a sentence. Unlike full parsing, it does not analyse the internal structure of the phrase or its role in the sentence.

#### Method 1: spaCy's Built-in Noun Chunker
spaCy makes extracting Noun Chunks straightforward using the `doc.noun_chunks` property.

In [8]:
print("--- Extracted Noun Phrases (NPs) ---")
noun_phrases = []
for chunk in doc_spacy.noun_chunks:
    noun_phrases.append(chunk.text)
    print(f"Noun Phrase: '{chunk.text}' | Root: '{chunk.root.text}' | Root POS: '{chunk.root.pos_}'")


--- Extracted Noun Phrases (NPs) ---
Noun Phrase: 'Bert' | Root: 'Bert' | Root POS: 'PROPN'
Noun Phrase: 'multiple emails' | Root: 'emails' | Root POS: 'NOUN'
Noun Phrase: 'attached files' | Root: 'files' | Root POS: 'NOUN'
Noun Phrase: 'weekend notes' | Root: 'notes' | Root POS: 'NOUN'
Noun Phrase: 'different dates' | Root: 'dates' | Root POS: 'NOUN'


This process quickly identifies the candidate subjects and objects in the sentence.

The 'Root' is the head noun of the phrase, the word that the rest of the chunk modifies.

#### Method 2: NLTK Grammar-Based Chunking

NLTK allows us to define custom grammar rules using regular expressions over POS tags to perform chunking.

In [9]:
# Define a grammar to find Noun Phrases (NP) and Verb Phrases (VP)
# NP: an optional determiner or possessive pronoun (DT/PRP$), any number of adjectives (JJ*), then one or more nouns (NN*)
# VP: a verb (VB*) followed greedily by everything after it, including NP chunks already found
from nltk.chunk import RegexpParser # For grammar-based chunking

grammar = r"""
  NP: {<DT|PRP\$>?<JJ.*>*<NN.*>+}
  VP: {<VB.*><.*>*}
"""
chunk_parser = RegexpParser(grammar)
tree = chunk_parser.parse(tagged_nltk)

print("\n--- NLTK Grammar-Based Chunking (NP/VP) ---")
print(tree)
# tree.pretty_print() # Use this command in a live notebook environment


--- NLTK Grammar-Based Chunking (NP/VP) ---
(S
  (NP Bert/NNP)
  (VP
    sent/VBD
    (NP multiple/JJ emails/NNS)
    with/IN
    (NP attached/JJ files/NNS)
    containing/VBG
    (NP weekend/NN notes/NNS)
    for/IN
    (NP different/JJ dates/NNS)))


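The NLTK tree shows the chunks identified by the regex rules, such as `(NP Bert/NNP)`. Because the VP rule matches a verb followed by any tags, it greedily absorbs the rest of the sentence: a single VP spans everything from 'sent' onwards, with the later NPs nested inside it. The NP chunks match those found by spaCy's noun chunker and are the kind of unit an Information Extraction step would consume.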
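To show what "regular expressions over POS tags" means in practice, here is a minimal standard-library sketch of the NP rule alone. It encodes the tag sequence as a string of `<TAG>` items and runs an ordinary regex over it. `np_chunks` is a hypothetical helper, not NLTK's implementation, and the tagged input is copied from the NLTK output above.

```python
import re

# Same rule as the NP line of the grammar: <DT|PRP$>? <JJ.*>* <NN.*>+
NP_PATTERN = re.compile(r"(?:<(?:DT|PRP\$)>)?(?:<JJ[^>]*>)*(?:<NN[^>]*>)+")


def np_chunks(tagged):
    """Return the noun phrases found in a list of (token, tag) pairs."""
    # Encode the tag sequence as one string, e.g. "<NNP><VBD><JJ><NNS>".
    tag_string = "".join(f"<{tag}>" for _, tag in tagged)
    chunks = []
    for match in NP_PATTERN.finditer(tag_string):
        # Map character offsets back to token indices by counting "<".
        start = tag_string.count("<", 0, match.start())
        end = start + match.group().count("<")
        chunks.append(" ".join(tok for tok, _ in tagged[start:end]))
    return chunks


tagged = [("Bert", "NNP"), ("sent", "VBD"), ("multiple", "JJ"), ("emails", "NNS"),
          ("with", "IN"), ("attached", "JJ"), ("files", "NNS"),
          ("containing", "VBG"), ("weekend", "NN"), ("notes", "NNS"),
          ("for", "IN"), ("different", "JJ"), ("dates", "NNS")]

print(np_chunks(tagged))
```

On this input the sketch returns the same five noun phrases as the NP nodes in the tree above. Note that it depends on 'attached' being tagged JJ; with spaCy's VBN tag the third chunk would shrink to 'files'.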

#### Extracting Verb Phrases (VPs) - via simple heuristic
While spaCy doesn't have a dedicated `verb_chunks` property, we can use a simple heuristic based on POS tags to identify potential verb groups.

In [10]:
print("\n--- Potential Verb Phrases (VPs) (Heuristic) ---")
verb_phrases = []
current_vp = []
for token in doc_spacy:
    if token.pos_ in ('VERB', 'AUX', 'ADV'): # Look for verbs, auxiliaries, or adverbs around the verb
        current_vp.append(token.text)
    elif current_vp:
        verb_phrases.append(" ".join(current_vp))
        print(f"Verb Phrase: '{' '.join(current_vp)}'")
        current_vp = []
if current_vp:
    verb_phrases.append(" ".join(current_vp))
    print(f"Verb Phrase: '{' '.join(current_vp)}'")



--- Potential Verb Phrases (VPs) (Heuristic) ---
Verb Phrase: 'sent'
Verb Phrase: 'attached'
Verb Phrase: 'containing'
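The heuristic is simply "join maximal runs of verb-like tags", which can be written compactly with `itertools.groupby`. The sketch below is a standard-library restatement of it (the helper name `verb_groups` is hypothetical), run on hand-copied UPOS tags so that it needs no model. The second example uses the tags that spaCy assigns to "The other weekends mentioned were fairly quiet" in the mini-corpus output later in this notebook.

```python
from itertools import groupby

VERBAL = {"VERB", "AUX", "ADV"}


def verb_groups(tagged):
    """Join maximal runs of consecutive tokens whose UPOS tag is in VERBAL."""
    groups = []
    for is_verbal, run in groupby(tagged, key=lambda pair: pair[1] in VERBAL):
        if is_verbal:
            groups.append(" ".join(tok for tok, _ in run))
    return groups


sample = [("Bert", "PROPN"), ("sent", "VERB"), ("multiple", "ADJ"),
          ("emails", "NOUN"), ("with", "ADP"), ("attached", "VERB"),
          ("files", "NOUN"), ("containing", "VERB"), ("weekend", "NOUN"),
          ("notes", "NOUN"), ("for", "ADP"), ("different", "ADJ"),
          ("dates", "NOUN")]
weekends = [("The", "DET"), ("other", "ADJ"), ("weekends", "NOUN"),
            ("mentioned", "VERB"), ("were", "AUX"), ("fairly", "ADV"),
            ("quiet", "ADJ")]

print(verb_groups(sample))    # ['sent', 'attached', 'containing']
print(verb_groups(weekends))  # ['mentioned were fairly']
```

The second result shows the limits of a purely positional rule: 'mentioned' modifies 'weekends' and 'fairly' modifies 'quiet', yet both are merged with the main verb 'were' because they happen to be adjacent.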


**How this helps Information Extraction**:

By identifying the Noun Phrases and Verb Phrases, we can infer the basic relationship: [Bert] *does* [an action]. This is a first step towards extracting Subject-Verb-Object triples.

### Constituency Parsing

**Constituency Parsing (Phrase-Structure Grammar)** analyses a sentence by grouping words into nested, hierarchical phrases (constituents). It generates a **phrase-structure tree** (or parse tree), where internal nodes are phrase labels (like NP, VP, S) and leaf nodes are words.

#### Generating and Visualising the Tree (NLTK)

Since `spaCy` focuses on dependency parsing, we use NLTK to demonstrate constituency parsing. Rather than running a parser, we write the tree by hand in bracketed notation and load it with NLTK's `Tree` class, which can display the hierarchical structure.

In [11]:
from nltk.tree import Tree # For constituency tree visualisation


# Note: Full constituency parsing requires a grammar or a trained statistical parser (e.g., the Stanford Parser),
# which is computationally expensive or requires external libraries.
# For demonstration, we manually define the tree structure based on standard English grammar.

# Constituency tree structure for the sample sentence (similar to what a parser would generate)
tree_string = "(S (NP (NNP Bert)) (VP (VBD sent) (NP (JJ multiple) (NNS emails)) (PP (IN with) (NP (VBN attached) (NNS files) (VP (VBG containing) (NP (NN weekend) (NNS notes)) (PP (IN for) (NP (JJ different) (NNS dates))))))) (. .))"

# Create and print the NLTK Tree object
constituency_tree = Tree.fromstring(tree_string)

print("\n--- Constituency Parse Tree (NLTK Format) ---")
print(constituency_tree)
constituency_tree.pretty_print() # Use this command in a live notebook environment



--- Constituency Parse Tree (NLTK Format) ---
(S
  (NP (NNP Bert))
  (VP
    (VBD sent)
    (NP (JJ multiple) (NNS emails))
    (PP
      (IN with)
      (NP
        (VBN attached)
        (NNS files)
        (VP
          (VBG containing)
          (NP (NN weekend) (NNS notes))
          (PP (IN for) (NP (JJ different) (NNS dates)))))))
  (. .))
[pretty_print() output: an ASCII drawing of the same tree, truncated in this export]
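The bracketed notation is simple enough to read without NLTK, which helps show what "nested constituents" means. The sketch below is a minimal standard-library reader (all function names are hypothetical) that turns the same tree string into nested lists and lists every NP it contains.

```python
import re


def parse_tree(text):
    """Parse a bracketed string like "(NP (JJ big) (NN dog))" into nested lists."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def read():
        nonlocal pos
        if tokens[pos] == "(":
            pos += 1
            node = [tokens[pos]]      # first item is the phrase or POS label
            pos += 1
            while tokens[pos] != ")":
                node.append(read())
            pos += 1                  # skip the closing ")"
            return node
        word = tokens[pos]            # a leaf: the word itself
        pos += 1
        return word

    return read()


def leaves(node):
    """Return the words under a node, left to right."""
    if isinstance(node, str):
        return [node]
    return [word for child in node[1:] for word in leaves(child)]


def phrases(node, label):
    """Yield the text of every constituent with the given label, outermost first."""
    if isinstance(node, str):
        return
    if node[0] == label:
        yield " ".join(leaves(node))
    for child in node[1:]:
        yield from phrases(child, label)


tree_string = "(S (NP (NNP Bert)) (VP (VBD sent) (NP (JJ multiple) (NNS emails)) (PP (IN with) (NP (VBN attached) (NNS files) (VP (VBG containing) (NP (NN weekend) (NNS notes)) (PP (IN for) (NP (JJ different) (NNS dates))))))) (. .))"
tree = parse_tree(tree_string)

for np in phrases(tree, "NP"):
    print(np)
```

Unlike the flat chunks from shallow parsing, the NPs here overlap: 'attached files containing weekend notes for different dates' is one NP that contains 'weekend notes' and 'different dates' as smaller NPs.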

### Dependency Parsing

**Dependency Parsing** analyses the grammatical relationships between words in a sentence. It establishes a direct link between a **head** (governor) and its **dependents** (modifiers). Each link has a label that describes the nature of the relationship (e.g., `nsubj` for nominal subject, `dobj` for direct object).

#### Head-Dependent Relationships in spaCy
spaCy's `token.dep_` and `token.head` attributes allow us to access the dependency structure.

In [12]:
print("--- Dependency Parsing: Head-Dependent Relationships ---")
print(f"{'Token':10} | {'Dependency':12} | {'Head Token':10} | {'Head POS':4}")
print("-" * 40)
for token in doc_spacy:
    print(f"{token.text:10} | {token.dep_:12} | {token.head.text:10} | {token.head.pos_:4}")

--- Dependency Parsing: Head-Dependent Relationships ---
Token      | Dependency   | Head Token | Head POS
----------------------------------------
Bert       | nsubj        | sent       | VERB
sent       | ROOT         | sent       | VERB
multiple   | amod         | emails     | NOUN
emails     | dobj         | sent       | VERB
with       | prep         | sent       | VERB
attached   | amod         | files      | NOUN
files      | pobj         | with       | ADP 
containing | acl          | files      | NOUN
weekend    | compound     | notes      | NOUN
notes      | dobj         | containing | VERB
for        | prep         | notes      | NOUN
different  | amod         | dates      | NOUN
dates      | pobj         | for        | ADP 
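The table is a tree in disguise: every token points at exactly one head, and the ROOT points at itself. As a sketch of how to navigate it, the standard-library code below rebuilds the table as `(token, label, head index)` rows copied from the output above and walks it in both directions. The helper names are hypothetical; in spaCy itself the corresponding attributes are `token.children`, `token.subtree`, and `token.ancestors`.

```python
# (token, dependency label, index of head) copied from the table above.
# The ROOT token points at itself, as in spaCy.
rows = [
    ("Bert", "nsubj", 1), ("sent", "ROOT", 1), ("multiple", "amod", 3),
    ("emails", "dobj", 1), ("with", "prep", 1), ("attached", "amod", 6),
    ("files", "pobj", 4), ("containing", "acl", 6), ("weekend", "compound", 9),
    ("notes", "dobj", 7), ("for", "prep", 9), ("different", "amod", 12),
    ("dates", "pobj", 10),
]


def children_of(rows, head):
    """Indices of the tokens whose head is the given index."""
    return [i for i, (_, _, h) in enumerate(rows) if h == head and i != head]


def subtree(rows, head):
    """Indices of a token and all of its descendants, in sentence order."""
    found = [head]
    for child in children_of(rows, head):
        found.extend(subtree(rows, child))
    return sorted(found)


def path_to_root(rows, index):
    """Tokens visited when following head links from a token up to the ROOT."""
    path = [rows[index][0]]
    while rows[index][2] != index:
        index = rows[index][2]
        path.append(rows[index][0])
    return path


print(" ".join(rows[i][0] for i in subtree(rows, 6)))
print(" -> ".join(path_to_root(rows, 12)))
```

The first line prints the full phrase headed by 'files', and the second traces 'dates' back through its chain of heads to 'sent'.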


#### Visualisation of Dependency Graph
We use `displacy.render` to visualise the dependency graph, showing the links and their labels.

In [19]:
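from IPython.display import display, HTML
import IPython.core.display as core_display

# Compatibility shim: newer IPython releases may no longer expose display/HTML
# on IPython.core.display, which displacy may try to import from.
if not hasattr(core_display, "display"):
    core_display.display = display
if not hasattr(core_display, "HTML"):
    core_display.HTML = HTML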

# Visualise the dependency tree
print("\n--- Dependency Graph Visualisation ---")
displacy.render(doc_spacy, style="dep", jupyter=True, options={'compact': True})



--- Dependency Graph Visualisation ---


For comparison, here are NLTK's Penn Treebank tags for the same sentence again, this time in tabular form.

In [20]:
# NLTK Process
from nltk.tokenize import word_tokenize

tokens_nltk = word_tokenize(SAMPLE_TEXT)
tagged_nltk = pos_tag(tokens_nltk)

print("\n--- NLTK POS Tagging (Token -> Tag) ---")
print(f"{'Token':10} | {'Tag'}")
print("-" * 15)
for token, tag in tagged_nltk:
    print(f"{token:10} | {tag}")


--- NLTK POS Tagging (Token -> Tag) ---
Token      | Tag
---------------
Bert       | NNP
sent       | VBD
multiple   | JJ
emails     | NNS
with       | IN
attached   | JJ
files      | NNS
containing | VBG
weekend    | NN
notes      | NNS
for        | IN
different  | JJ
dates      | NNS



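Constituency parsing emphasises how words group into nested phrases (constituents), which makes it well suited to analysing sentence structure against grammar rules.

Dependency parsing emphasises the functional roles of words (subject, object, modifier), making it highly effective for tasks like Information Extraction and Semantic Parsing.

The two views overlap: the NP 'attached files containing weekend notes for different dates' in the constituency tree corresponds to 'files' together with all of its descendants in the dependency parse.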

### Grammatical Agreement Checking

**Grammatical Agreement** is a requirement that words or word forms in a sentence share the same value for some grammatical category, such as number (singular/plural), person (1st, 2nd, 3rd), or gender. Subject-Verb Agreement is the most common form in English.


#### Simple Subject-Verb Mismatch Detection
We'll use a new sentence to demonstrate a common error: using a singular verb with a plural subject (or vice versa).

In [21]:
# Mismatched sentence: plural subject ('employees') with a verb carrying the singular '-s' ending ('sents')
mismatch_text = "The employees sents multiple email."
doc_mismatch = nlp(mismatch_text)

print("\n--- Checking Subject-Verb Agreement Mismatch ---")

# Simple Logic: Iterate through tokens, find a verb (VERB)
# and check if its nominal subject (nsubj) is plural (NNS) or singular (NN/NNP).

for verb in doc_mismatch:
    if verb.pos_ == 'VERB':
        # Check the nominal subject (nsubj) of the verb
        subject = next((token for token in verb.children if token.dep_ == 'nsubj'), None)

        if subject:
            # Simple check based on POS tags and verb ending (heuristic)
            subject_is_plural = subject.tag_ == 'NNS'
            verb_is_singular_form = verb.text.endswith('s') and verb.tag_ == 'VBZ'

            if subject_is_plural and verb_is_singular_form:
                print(f"Mismatch Found: Subject '{subject.text}' (Plural) with Verb '{verb.text}' (Singular Form - VBZ).")
                print("Correction needed: 'sents' should be 'sent'.")
                # Visualise the mismatch to show the dependency
                displacy.render(doc_mismatch, style="dep", jupyter=True, options={'compact': True})
            else:
                print("Agreement check passed (for this simple rule).")


--- Checking Subject-Verb Agreement Mismatch ---
Mismatch Found: Subject 'employees' (Plural) with Verb 'sents' (Singular Form - VBZ).
Correction needed: 'sents' should be 'sent'.


Grammatical agreement checking is a core function of **grammar checking software** (like Grammarly or Microsoft Word). For **Language Learning Tools**, providing specific feedback on agreement errors is essential for student progress. Syntactic parsing (specifically dependency parsing) is necessary because it accurately identifies which noun is the actual subject (`nsubj`) of the main verb, even in complex sentences with intervening phrases.
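The cell above only catches a plural subject with a singular verb. As a sketch of the "or vice versa" case, the rule can be stated purely over Penn Treebank tags, as below. `agreement_error` is a hypothetical helper that takes the tags of an already identified subject and verb. Pronouns are left out because their tag (PRP) does not encode number, and past-tense verbs (VBD) are treated as compatible with any subject.

```python
PLURAL_SUBJECTS = {"NNS", "NNPS"}
SINGULAR_SUBJECTS = {"NN", "NNP"}


def agreement_error(subject_tag, verb_tag):
    """Describe a number mismatch between subject and verb tags, or return None."""
    if subject_tag in PLURAL_SUBJECTS and verb_tag == "VBZ":
        return "plural subject with 3rd-person-singular verb"
    if subject_tag in SINGULAR_SUBJECTS and verb_tag == "VBP":
        return "singular subject with non-3rd-person-singular verb"
    return None


# (subject tag, verb tag) pairs for a few illustrative sentences
print(agreement_error("NNS", "VBZ"))  # e.g. "The employees sends ..."
print(agreement_error("NN", "VBP"))   # e.g. "The employee send ..."
print(agreement_error("NNS", "VBP"))  # e.g. "The employees send ..." -> None
print(agreement_error("NNP", "VBD"))  # e.g. "Bert sent ..." -> None
```

In a spaCy pipeline the two tags would come from `subject.tag_` and `verb.tag_`, as in the cell above. A tagger can mis-tag words in ungrammatical input, so a rule like this is best treated as a heuristic; spaCy's morphological features (`token.morph`) may be a sturdier source of number information.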

#### Processing a Mini-Corpus

To conclude, we apply the key concepts (POS, Chunking, Dependency) to a small corpus to show how these techniques are used sequentially in a typical NLP pipeline.

In [22]:
MINI_CORPUS = email_summary[100]['summary'].split(". ")

print("\n--- Integrated Analysis of Mini-Corpus ---")

for i, sentence in enumerate(MINI_CORPUS):
    doc_corpus = nlp(sentence)
    print(f"\n[Sentence {i+1}]: {sentence}")

    # 1. POS Tags
    pos_tags = [(token.text, token.pos_) for token in doc_corpus]
    print(f"  POS Tags: {pos_tags}")

    # 2. Phrase Chunks (NPs)
    nps = [chunk.text for chunk in doc_corpus.noun_chunks]
    print(f"  Noun Phrases: {nps}")

    # 3. Key Dependencies (Subject-Verb-Object Triple)
    subject = ""
    verb = ""
    obj = ""

    # Simple extraction logic: Find ROOT (Verb), then its nsubj (Subject) and dobj (Object)
    for token in doc_corpus:
        if token.dep_ == 'ROOT':
            verb = token.text

            # Find Subject (nsubj) and Object (dobj) among the children
            for child in token.children:
                if child.dep_ == 'nsubj':
                    subject = child.text
                elif child.dep_ == 'dobj':
                    obj = child.text

    print(f"  S-V-O Triple: Subject='{subject}', Verb='{verb}', Object='{obj}'")
    displacy.render(doc_corpus, style="dep", jupyter=True, options={'compact': True})


--- Integrated Analysis of Mini-Corpus ---

[Sentence 1]: Bert sent multiple emails with attached files containing weekend notes for different dates
  POS Tags: [('Bert', 'PROPN'), ('sent', 'VERB'), ('multiple', 'ADJ'), ('emails', 'NOUN'), ('with', 'ADP'), ('attached', 'VERB'), ('files', 'NOUN'), ('containing', 'VERB'), ('weekend', 'NOUN'), ('notes', 'NOUN'), ('for', 'ADP'), ('different', 'ADJ'), ('dates', 'NOUN')]
  Noun Phrases: ['Bert', 'multiple emails', 'attached files', 'weekend notes', 'different dates']
  S-V-O Triple: Subject='Bert', Verb='sent', Object='emails'



[Sentence 2]: He mentioned problems with the confirmation phase of several cycles over the weekend of March 16th-17th, but resolved them with the help of the Help Desk and Production Support
  POS Tags: [('He', 'PRON'), ('mentioned', 'VERB'), ('problems', 'NOUN'), ('with', 'ADP'), ('the', 'DET'), ('confirmation', 'NOUN'), ('phase', 'NOUN'), ('of', 'ADP'), ('several', 'ADJ'), ('cycles', 'NOUN'), ('over', 'ADP'), ('the', 'DET'), ('weekend', 'NOUN'), ('of', 'ADP'), ('March', 'PROPN'), ('16th-17th', 'PROPN'), (',', 'PUNCT'), ('but', 'CCONJ'), ('resolved', 'VERB'), ('them', 'PRON'), ('with', 'ADP'), ('the', 'DET'), ('help', 'NOUN'), ('of', 'ADP'), ('the', 'DET'), ('Help', 'PROPN'), ('Desk', 'PROPN'), ('and', 'CCONJ'), ('Production', 'PROPN'), ('Support', 'PROPN')]
  Noun Phrases: ['He', 'problems', 'the confirmation phase', 'several cycles', 'the weekend', 'March 16th-17th', 'them', 'the help', 'the Help Desk', 'Production Support']
  S-V-O Triple: Subject='He', Verb='mentioned', Object='p


[Sentence 3]: The other weekends mentioned were fairly quiet
  POS Tags: [('The', 'DET'), ('other', 'ADJ'), ('weekends', 'NOUN'), ('mentioned', 'VERB'), ('were', 'AUX'), ('fairly', 'ADV'), ('quiet', 'ADJ')]
  Noun Phrases: ['The other weekends']
  S-V-O Triple: Subject='weekends', Verb='were', Object=''



[Sentence 4]: Bert also requested that if anyone no longer needed to be included in the distribution, they should contact a member of the TW team.
  POS Tags: [('Bert', 'PROPN'), ('also', 'ADV'), ('requested', 'VERB'), ('that', 'SCONJ'), ('if', 'SCONJ'), ('anyone', 'PRON'), ('no', 'ADV'), ('longer', 'ADV'), ('needed', 'VERB'), ('to', 'PART'), ('be', 'AUX'), ('included', 'VERB'), ('in', 'ADP'), ('the', 'DET'), ('distribution', 'NOUN'), (',', 'PUNCT'), ('they', 'PRON'), ('should', 'AUX'), ('contact', 'VERB'), ('a', 'DET'), ('member', 'NOUN'), ('of', 'ADP'), ('the', 'DET'), ('TW', 'PROPN'), ('team', 'NOUN'), ('.', 'PUNCT')]
  Noun Phrases: ['Bert', 'anyone', 'the distribution', 'they', 'a member', 'the TW team']
  S-V-O Triple: Subject='Bert', Verb='requested', Object=''


This section demonstrates a typical NLP workflow: tokens are tagged (POS), grouped (NP/VP), and then their functional relationships (S-V-O) are extracted using Dependency Parsing, which is foundational for Information Extraction.
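One detail of the cell above is worth noting: `split(". ")` removes the period between sentences but leaves the final one attached, which is why Sentence 4 ends with a `('.', 'PUNCT')` token while the others do not. A minimal sketch of a more uniform splitter follows; `split_sentences` is a hypothetical helper, and it still assumes that every ". " marks a sentence boundary.

```python
def split_sentences(summary):
    """Split on ". " and strip whitespace and trailing periods from each piece."""
    parts = (part.strip().rstrip(".") for part in summary.split(". "))
    return [part for part in parts if part]


text = "Bert sent multiple emails. The other weekends were quiet. "
print(split_sentences(text))
# ['Bert sent multiple emails', 'The other weekends were quiet']
```

This kind of splitting breaks on abbreviations such as "Mr. Smith". For real text, a model-based splitter such as the `doc.sents` iterator in spaCy is usually the safer choice.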

### Conclusion

In this challenge, we walked through the essential **syntactic processing** steps for email data. The techniques, from basic **POS Tagging** to more complex **Dependency Parsing**, provide different yet complementary views of a sentence's structure.

We leveraged `spaCy` for its efficiency in **shallow parsing (phrase chunking)** and for its robust dependency parsing features, alongside NLTK for grammar-based chunking and constituency trees. Understanding these structural relationships is the essential bridge between simple word processing and advanced **Natural Language Understanding (NLU)**. The practical example of Subject-Verb Agreement checking demonstrated the immediate, real-world utility of dependency parsing in building sophisticated language tools.








