# NLTK Complete Guide - Section 9: Chunking

This notebook covers:
- What is Chunking?
- Noun Phrase Chunking
- Chunk Grammar Rules
- Chinking (Excluding Patterns)
- Custom Chunkers
- Practical Applications

In [1]:
import nltk

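nltk.download('punkt', quiet=True)
nltk.download('averaged_perceptron_tagger', quiet=True)
# Recent NLTK releases (3.9 and later) look for these resource names instead
nltk.download('punkt_tab', quiet=True)
nltk.download('averaged_perceptron_tagger_eng', quiet=True)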

from nltk import pos_tag, word_tokenize
from nltk.chunk import RegexpParser
from nltk.tree import Tree

## 9.1 What is Chunking?

**Chunking** (also called **shallow parsing**) is an NLP technique that groups words into meaningful phrases based on their Part-of-Speech (POS) tags. Unlike full parsing, which builds complete syntactic trees, chunking identifies flat, non-overlapping segments of text.

### Why Use Chunking?

1. **Information Extraction**: Identify key entities and relationships in text
2. **Text Summarization**: Extract important noun phrases for summaries
3. **Question Answering**: Find candidate answers by extracting relevant phrases
4. **Named Entity Recognition**: Often used as a preprocessing step

### Common Chunk Types

| Chunk Type | Description | Examples |
|------------|-------------|----------|
| **NP** (Noun Phrase) | Groups around a noun | "the big dog", "a beautiful sunset" |
| **VP** (Verb Phrase) | Groups around a verb | "is running", "has been working" |
| **PP** (Prepositional Phrase) | Preposition + NP | "in the house", "on the table" |
| **ADVP** (Adverb Phrase) | Groups around an adverb | "very quickly", "extremely well" |

### How Chunking Works

Chunking uses **regular expression patterns** over POS tags to identify phrase boundaries. The process is:

1. **Tokenize** the text into words
2. **POS tag** each token
3. **Apply chunk grammar rules** to group tokens into phrases

In [2]:
sentence = "The quick brown fox jumps over the lazy dog."

# POS tag the sentence
tokens = word_tokenize(sentence)
tagged = pos_tag(tokens)

print("POS Tagged:")
print(tagged)

POS Tagged:
[('The', 'DT'), ('quick', 'JJ'), ('brown', 'NN'), ('fox', 'NN'), ('jumps', 'VBZ'), ('over', 'IN'), ('the', 'DT'), ('lazy', 'JJ'), ('dog', 'NN'), ('.', '.')]


In [3]:
# Define a simple noun phrase grammar
# NP: optional determiner + zero or more adjectives + one or more nouns
grammar = "NP: {<DT>?<JJ>*<NN.*>+}"

# Create a chunk parser
chunk_parser = RegexpParser(grammar)

# Parse the tagged sentence
tree = chunk_parser.parse(tagged)

print("Chunked:")
tree.pprint()

Chunked:
(S
  (NP The/DT quick/JJ brown/NN fox/NN)
  jumps/VBZ
  over/IN
  (NP the/DT lazy/JJ dog/NN)
  ./.)


## 9.2 Chunk Grammar Syntax

NLTK uses a special syntax for defining chunk patterns. The grammar is based on **regular expressions** that match sequences of POS tags.

### Basic Syntax Elements

| Symbol | Meaning | Example |
|--------|---------|---------|
| `<TAG>` | Match a specific POS tag | `<NN>` matches nouns |
| `<TAG1\|TAG2>` | Match either tag | `<NN\|NNS>` matches singular or plural nouns |
| `<TAG.*>` | Wildcard matching | `<NN.*>` matches NN, NNS, NNP, NNPS |
| `?` | Optional (0 or 1) | `<DT>?` matches zero or one determiner |
| `*` | Zero or more | `<JJ>*` matches any number of adjectives |
| `+` | One or more | `<NN>+` matches one or more nouns |
| `{pattern}` | **Chunk** - include this pattern | `{<DT><NN>}` groups DT+NN together |
| `}pattern{` | **Chink** - exclude this pattern | `}<VB>{` removes verbs from chunks |

### Understanding the Grammar Format

```
CHUNK_LABEL: {<pattern>}
```

- **CHUNK_LABEL**: The name for the chunk type (e.g., NP, VP, PP)
- **{...}**: Curly braces indicate what to include in the chunk
- **<...>**: Angle brackets contain POS tag patterns
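
One way to build intuition for this format is to picture the tag sequence written out as a single string and the chunk pattern as an ordinary regular expression over it. The sketch below is a simplified illustration of that idea using only Python's `re` module, not NLTK's actual implementation; `find_chunks` and the hand-translated `np_regex` are hypothetical names introduced here.

```python
import re

def find_chunks(tagged, tag_regex):
    """Return the word groups whose tag sequence matches tag_regex."""
    # Write the tags as one string, e.g. "<DT><JJ><NN>", and remember
    # which token starts at each character offset.
    offsets, parts, pos = {}, [], 0
    for index, (_, tag) in enumerate(tagged):
        offsets[pos] = index
        parts.append(f"<{tag}>")
        pos += len(tag) + 2
    tag_string = "".join(parts)

    chunks = []
    for match in re.finditer(tag_regex, tag_string):
        if match.start() == match.end():
            continue  # ignore empty matches
        first = offsets[match.start()]
        count = match.group().count("<")  # number of tags in the match
        words = [word for word, _ in tagged[first:first + count]]
        chunks.append(" ".join(words))
    return chunks

# Hand-written tags, copied from the tagger output shown earlier
tagged = [("The", "DT"), ("quick", "JJ"), ("brown", "NN"), ("fox", "NN"),
          ("jumps", "VBZ"), ("over", "IN"), ("the", "DT"), ("lazy", "JJ"),
          ("dog", "NN"), (".", ".")]

# Rough regex equivalent of the chunk pattern <DT>?<JJ>*<NN.*>+
np_regex = r"(<DT>)?(<JJ>)*(<NN[^<>]*>)+"

print(find_chunks(tagged, np_regex))
```

With these tags the sketch finds the same two groups as the `RegexpParser` example in section 9.1: "The quick brown fox" and "the lazy dog".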

### Common POS Tags Reference

| Tag | Description | Example |
|-----|-------------|---------|
| DT | Determiner | the, a, an |
| JJ | Adjective | big, red, fast |
| NN | Singular noun | dog, cat, house |
| NNS | Plural noun | dogs, cats |
| NNP | Proper noun | John, London |
| VB | Base verb | run, eat |
| VBD | Past tense verb | ran, ate |
| VBG | Gerund/present participle | running, eating |
| IN | Preposition | in, on, at |
| RB | Adverb | quickly, very |

In [4]:
# Different grammar patterns
grammars = {
    "Simple NP": "NP: {<DT>?<NN>}",
    "NP with adjectives": "NP: {<DT>?<JJ>*<NN>}",
    "NP with multiple nouns": "NP: {<DT>?<JJ>*<NN.*>+}",
    "Verb phrase": "VP: {<VB.*><RB>?}",
    "Prepositional phrase": "PP: {<IN><DT>?<NN.*>}",
}

sentence = "The quick brown fox jumps quickly over the lazy dog."
tagged = pos_tag(word_tokenize(sentence))

print(f"Sentence: {sentence}")
print(f"Tagged: {tagged}\n")

for name, grammar in grammars.items():
    parser = RegexpParser(grammar)
    tree = parser.parse(tagged)
    
    # Extract chunks
    chunks = []
    for subtree in tree:
        if isinstance(subtree, Tree):
            chunk_text = ' '.join(word for word, tag in subtree.leaves())
            chunks.append(chunk_text)
    
    print(f"{name}: {chunks}")

Sentence: The quick brown fox jumps quickly over the lazy dog.
Tagged: [('The', 'DT'), ('quick', 'JJ'), ('brown', 'NN'), ('fox', 'NN'), ('jumps', 'NNS'), ('quickly', 'RB'), ('over', 'IN'), ('the', 'DT'), ('lazy', 'JJ'), ('dog', 'NN'), ('.', '.')]

Simple NP: ['brown', 'fox', 'dog']
NP with adjectives: ['The quick brown', 'fox', 'the lazy dog']
NP with multiple nouns: ['The quick brown fox jumps', 'the lazy dog']
Verb phrase: []
Prepositional phrase: []

Note that the tagger labelled "jumps" as a plural noun (NNS) in this sentence. As a result the verb phrase pattern finds nothing, and the "NP with multiple nouns" pattern absorbs "jumps" into the first chunk. The prepositional phrase pattern also comes up empty, because it has no slot for the adjective in "over the lazy dog". Chunk quality depends directly on tag quality and on how much variation each pattern allows.


## 9.3 Noun Phrase Chunking (NP)

**Noun Phrases (NPs)** are the most commonly chunked phrase type. They typically consist of:

- **Head noun**: The main noun (required)
- **Determiner**: Articles like "the", "a", possessives like "my" (optional)
- **Modifiers**: Adjectives that describe the noun (optional)

### Common NP Patterns

| Pattern | Matches | Example |
|---------|---------|---------|
| `{<NN>}` | Single noun | "dog" |
| `{<DT><NN>}` | Determiner + noun | "the dog" |
| `{<DT>?<JJ>*<NN>}` | Optional det + adjectives + noun | "the big brown dog" |
| `{<DT>?<JJ>*<NN.*>+}` | Handles noun variants | "the quick brown fox" |
| `{<NNP>+}` | Proper noun sequences | "New York City" |
| `{<PRP>}` | Pronouns | "she", "they" |

### Multi-Rule Grammars

You can define multiple patterns for the same chunk type. NLTK applies the rules in order, and tokens already claimed by an earlier rule are not available to later ones, so more specific patterns should come first:

In [5]:
# Comprehensive NP grammar
np_grammar = r"""
    NP: {<DT|PRP\$>?<JJ>*<NN.*>+}   # Determiner + adjectives + nouns
        {<NNP>+}                      # Proper nouns (redundant here, <NN.*>+ already matches them)
        {<PRP>}                       # Pronouns
"""

np_parser = RegexpParser(np_grammar)

sentences = [
    "The big brown dog chased the small cat.",
    "My beautiful garden has colorful flowers.",
    "John and Mary visited New York City.",
    "She bought an expensive red sports car.",
]

print("Noun Phrase Chunking")
print("=" * 60)

for sent in sentences:
    tagged = pos_tag(word_tokenize(sent))
    tree = np_parser.parse(tagged)
    
    nps = [' '.join(w for w, t in subtree.leaves()) 
           for subtree in tree if isinstance(subtree, Tree)]
    
    print(f"\nSentence: {sent}")
    print(f"NPs: {nps}")

Noun Phrase Chunking
============================================================

Sentence: The big brown dog chased the small cat.
NPs: ['The big brown dog', 'the small cat']

Sentence: My beautiful garden has colorful flowers.
NPs: ['My beautiful garden', 'colorful flowers']

Sentence: John and Mary visited New York City.
NPs: ['John', 'Mary', 'New York City']

Sentence: She bought an expensive red sports car.
NPs: ['She', 'an expensive red sports car']


## 9.4 Verb Phrase Chunking (VP)

**Verb Phrases (VPs)** capture the action part of a sentence. They can include:

- **Main verb**: The action word (required)
- **Auxiliary verbs**: "is", "have", "will" (optional)
- **Adverbs**: Modifiers like "quickly", "always" (optional)
- **Modal verbs**: "can", "should", "must" (optional)

### Common VP Patterns

| Pattern | Matches | Example |
|---------|---------|---------|
| `{<VB>}` | Base verb | "run" |
| `{<VB.*>}` | Any verb form | "runs", "running", "ran" |
| `{<VB.*><RB>?}` | Verb + optional adverb | "runs quickly" |
| `{<MD><VB>}` | Modal + verb | "can swim" |
| `{<VB.*>+}` | Verb sequences | "has been running" |

### Verb POS Tags

| Tag | Description | Example |
|-----|-------------|---------|
| VB | Base form | run, eat |
| VBD | Past tense | ran, ate |
| VBG | Gerund (-ing) | running, eating |
| VBN | Past participle | run, eaten |
| VBP | Present, non-3rd person | run, eat |
| VBZ | Present, 3rd person | runs, eats |
| MD | Modal | can, will, should |

In [6]:
# Verb phrase grammar
vp_grammar = r"""
    VP: {<VB.*><RB.*>?<VB.*>*}  # Verb + optional adverb + more verbs
        {<MD><VB>}              # Modal + base verb
"""

vp_parser = RegexpParser(vp_grammar)

sentences = [
    "She is running quickly.",
    "They have been working hard.",
    "He can swim very fast.",
    "The dog was barking loudly.",
]

print("Verb Phrase Chunking")
print("=" * 60)

for sent in sentences:
    tagged = pos_tag(word_tokenize(sent))
    tree = vp_parser.parse(tagged)
    
    vps = [' '.join(w for w, t in subtree.leaves()) 
           for subtree in tree if isinstance(subtree, Tree)]
    
    print(f"\nSentence: {sent}")
    print(f"VPs: {vps}")

Verb Phrase Chunking
============================================================

Sentence: She is running quickly.
VPs: ['is running']

Sentence: They have been working hard.
VPs: ['have been working']

Sentence: He can swim very fast.
VPs: ['swim very']

Sentence: The dog was barking loudly.
VPs: ['was barking']
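
The third result, `'swim very'`, is a rule-ordering effect rather than a tagging error. The first rule claims `swim very` before the modal rule runs, and a token that already sits inside a chunk cannot be chunked again, so `<MD><VB>` never gets the chance to match `can swim`. The sketch below uses hand-written tags (an assumption, so that no tagger model is needed) to compare the original order with the modal rule moved first; `verb_phrases` is a hypothetical helper.

```python
from nltk.chunk import RegexpParser
from nltk.tree import Tree

# Hand-tagged input, an assumption standing in for pos_tag() output
tagged = [("He", "PRP"), ("can", "MD"), ("swim", "VB"),
          ("very", "RB"), ("fast", "RB"), (".", ".")]

original_order = r"""
    VP: {<VB.*><RB.*>?<VB.*>*}  # General rule first
        {<MD><VB>}              # Modal rule second
"""

modal_first = r"""
    VP: {<MD><VB>}              # Modal rule first
        {<VB.*><RB.*>?<VB.*>*}  # General rule second
"""

def verb_phrases(grammar, tagged_tokens):
    """Return the text of each top-level VP chunk."""
    tree = RegexpParser(grammar).parse(tagged_tokens)
    return [" ".join(word for word, tag in subtree.leaves())
            for subtree in tree if isinstance(subtree, Tree)]

print("Original order:", verb_phrases(original_order, tagged))
print("Modal first:   ", verb_phrases(modal_first, tagged))
```

With these tags the original order yields `['swim very']`, while putting the modal rule first yields `['can swim']`.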


## 9.5 Chinking (Excluding Patterns)

**Chinking** is the opposite of chunking - it **removes** elements from existing chunks. This is useful when you want to:

1. **Break apart large chunks** at certain boundaries
2. **Exclude specific word types** from phrases
3. **Create cleaner, more meaningful chunks**

### Chinking Syntax

```
}pattern{    # Note: braces are REVERSED compared to chunking
```

The reversed braces `}...{` indicate "exclude this pattern" from the chunk.

### How Chinking Works

1. First, a chunk rule groups tokens together
2. Then, a chink rule "punches holes" in the chunk
3. The result is the original chunk split at the chinked elements

### Chinking Strategy

A common approach is to:
1. **Chunk everything**: `{<.*>+}` - grab all tokens
2. **Chink at boundaries**: `}<VB.*|IN>{` - split at verbs and prepositions

This effectively creates noun phrases by excluding verbs and prepositions.

### Important: Grammar Rules Must Be on Separate Lines

When combining chunk and chink rules, each rule must be on its own line:

In [7]:
# Chunk everything, then exclude verbs and prepositions
chink_grammar = r"""
    NP: {<.*>+}         # Chunk everything
        }<VB.*|IN>{     # Chink verbs and prepositions
"""

chink_parser = RegexpParser(chink_grammar)

sentence = "The dog ran through the park and jumped over the fence."
tagged = pos_tag(word_tokenize(sentence))

print(f"Sentence: {sentence}\n")
print("Tagged:")
print(tagged)

tree = chink_parser.parse(tagged)
print("\nChunked (with chinking):")
tree.pprint()

Sentence: The dog ran through the park and jumped over the fence.

Tagged:
[('The', 'DT'), ('dog', 'NN'), ('ran', 'VBD'), ('through', 'IN'), ('the', 'DT'), ('park', 'NN'), ('and', 'CC'), ('jumped', 'VBD'), ('over', 'IN'), ('the', 'DT'), ('fence', 'NN'), ('.', '.')]

Chunked (with chinking):
(S
  (NP The/DT dog/NN)
  ran/VBD
  through/IN
  (NP the/DT park/NN and/CC)
  jumped/VBD
  over/IN
  (NP the/DT fence/NN ./.))
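
In this output the conjunction `and/CC` and the final `./.` remain inside noun phrases, because the chink rule only removes verbs and prepositions. One possible refinement, sketched below with hand-written tags so that it runs without a tagger model, is to add `CC` and the period tag to the chink pattern. The escaped `\.` is meant to match the literal `.` tag rather than act as a wildcard.

```python
from nltk.chunk import RegexpParser
from nltk.tree import Tree

# Hand-tagged input copied from the tagger output above
tagged = [("The", "DT"), ("dog", "NN"), ("ran", "VBD"), ("through", "IN"),
          ("the", "DT"), ("park", "NN"), ("and", "CC"), ("jumped", "VBD"),
          ("over", "IN"), ("the", "DT"), ("fence", "NN"), (".", ".")]

wider_chink = r"""
    NP: {<.*>+}               # Chunk everything
        }<VB.*|IN|CC|\.>{     # Chink verbs, prepositions, conjunctions, periods
"""

tree = RegexpParser(wider_chink).parse(tagged)

# Collect the text of each remaining NP chunk
noun_phrases = [" ".join(word for word, tag in subtree.leaves())
                for subtree in tree if isinstance(subtree, Tree)]
print(noun_phrases)
```

The remaining chunks should be the three plain noun phrases: "The dog", "the park", and "the fence".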


In [8]:
# Compare with and without chinking
sentence = "The cat sat on the mat near the door."
tagged = pos_tag(word_tokenize(sentence))

# Without chinking - everything becomes one chunk
no_chink = RegexpParser("NP: {<.*>+}")
tree1 = no_chink.parse(tagged)

# With chinking - breaks at prepositions
with_chink = RegexpParser(r"""
    NP: {<.*>+}
        }<IN>{
""")
tree2 = with_chink.parse(tagged)

print(f"Sentence: {sentence}\n")

print("Without chinking:")
tree1.pprint()

print("\nWith chinking (exclude prepositions):")
tree2.pprint()

Sentence: The cat sat on the mat near the door.

Without chinking:
(S
  (NP
    The/DT
    cat/NN
    sat/VBD
    on/IN
    the/DT
    mat/NN
    near/IN
    the/DT
    door/NN
    ./.))

With chinking (exclude prepositions):
(S
  (NP The/DT cat/NN sat/VBD)
  on/IN
  (NP the/DT mat/NN)
  near/IN
  (NP the/DT door/NN ./.))


## 9.6 Complex Multi-Level Grammar

NLTK's `RegexpParser` supports **cascaded chunking**, where you can define multiple chunk types and even nest them. Rules are applied in order from top to bottom.

### Cascaded Chunking Process

1. **First pass**: Identify basic chunks (NP, VP)
2. **Second pass**: Combine basic chunks into larger structures (PP, CLAUSE)
3. **Result**: Hierarchical phrase structure

### Building Complex Grammars

When designing multi-level grammars:

- **Order matters**: Define base chunks before compound chunks
- **Reference other chunks**: Use chunk labels in later patterns (e.g., `<NP>` after NP is defined)
- **Keep it simple**: Complex grammars can be hard to debug

### Example Structure

```
NP: {<DT>?<JJ>*<NN.*>+}    # Level 1: Basic noun phrases
VP: {<VB.*>+}               # Level 1: Basic verb phrases  
PP: {<IN><NP>}              # Level 2: Preposition + NP
CLAUSE: {<NP><VP><NP>?}     # Level 3: Subject + Verb + Object
```

In [9]:
# Multi-level chunking grammar
complex_grammar = r"""
    NP: {<DT|PRP\$>?<JJ>*<NN.*>+}  # Noun phrases
    VP: {<VB.*>+}                   # Verb phrases
    PP: {<IN><NP>}                  # Prepositional phrases
    CLAUSE: {<NP><VP><NP>?<PP>*}   # Simple clause
"""

complex_parser = RegexpParser(complex_grammar)

sentence = "The young student studies hard in the library."
tagged = pos_tag(word_tokenize(sentence))

print(f"Sentence: {sentence}\n")
print("Tagged:")
for word, tag in tagged:
    print(f"  {word}: {tag}")

tree = complex_parser.parse(tagged)
print("\nChunked tree:")
tree.pprint()

Sentence: The young student studies hard in the library.

Tagged:
  The: DT
  young: JJ
  student: NN
  studies: NNS
  hard: VBP
  in: IN
  the: DT
  library: NN
  .: .

Chunked tree:
(S
  (CLAUSE
    (NP The/DT young/JJ student/NN studies/NNS)
    (VP hard/VBP)
    (PP in/IN (NP the/DT library/NN)))
  ./.)
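
The tree above is shaped by two tagging errors: `studies` was tagged as a plural noun (NNS) and `hard` as a verb (VBP), so the verb ends up inside the subject NP and the adverb becomes the VP. To see what the same grammar does with the tags a reader would expect, the sketch below supplies hand-corrected tags (an assumption, not tagger output) and lists every chunk label in the resulting tree.

```python
from nltk.chunk import RegexpParser

complex_grammar = r"""
    NP: {<DT|PRP\$>?<JJ>*<NN.*>+}  # Noun phrases
    VP: {<VB.*>+}                   # Verb phrases
    PP: {<IN><NP>}                  # Prepositional phrases
    CLAUSE: {<NP><VP><NP>?<PP>*}   # Simple clause
"""

# Hand-corrected tags: "studies" as VBZ and "hard" as RB
tagged = [("The", "DT"), ("young", "JJ"), ("student", "NN"),
          ("studies", "VBZ"), ("hard", "RB"), ("in", "IN"),
          ("the", "DT"), ("library", "NN"), (".", ".")]

tree = RegexpParser(complex_grammar).parse(tagged)
tree.pprint()

# subtrees() walks the whole tree, root first, so nested chunks appear too
labels = [subtree.label() for subtree in tree.subtrees()]
print(labels)
```

With these tags the adverb `hard` sits between the VP and the PP, so the CLAUSE rule stops after the verb and the PP is left as a separate top-level chunk. Allowing an optional `<RB>` in the CLAUSE pattern would be one way to bridge that gap.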


## 9.7 Extracting Chunks Programmatically

The chunker returns a **tree structure** where:
- The root node is labeled 'S' (sentence)
- Chunk nodes are labeled with their chunk type (NP, VP, etc.)
- Leaf nodes contain the original (word, tag) tuples

### Key Methods for Working with Chunk Trees

| Method | Description |
|--------|-------------|
| `tree.subtrees()` | Iterate over all subtrees (including root) |
| `tree.leaves()` | Get all (word, tag) pairs |
| `subtree.label()` | Get the chunk label (NP, VP, S) |
| `isinstance(node, Tree)` | Check if node is a chunk (vs. unchunked word) |
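
These methods can be tried on a small tree built by hand, which avoids any dependence on tagger output. The tree below is a made-up example; the point is the difference between iterating over `tree` directly (top-level children only) and calling `tree.subtrees()` (every level, including the root).

```python
from nltk.tree import Tree

# A hand-built chunk tree with one NP nested inside a PP
tree = Tree("S", [
    Tree("NP", [("The", "DT"), ("dog", "NN")]),
    ("slept", "VBD"),
    Tree("PP", [("on", "IN"), Tree("NP", [("the", "DT"), ("mat", "NN")])]),
])

# Iterating over the tree visits only its direct children
top_level = [child.label() for child in tree if isinstance(child, Tree)]

# subtrees() takes an optional filter function and descends into nested chunks
all_nps = [" ".join(word for word, tag in subtree.leaves())
           for subtree in tree.subtrees(filter=lambda t: t.label() == "NP")]

print(top_level)       # labels of the top-level chunks
print(all_nps)         # text of every NP, nested or not
print(tree.leaves())   # all (word, tag) pairs in sentence order
```

Direct iteration sees only the top-level `NP` and `PP`, while `subtrees()` also reaches the NP nested inside the PP.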

### Extracting Chunks as Text

To get the actual words in a chunk:
```python
chunk_text = ' '.join(word for word, tag in subtree.leaves())
```

In [10]:
def extract_chunks(text, grammar):
    """Extract chunks from text using the given grammar.

    Returns a dict mapping each chunk label to a list of chunk strings.
    Nested chunks (from cascaded grammars) are reported as well.
    """
    parser = RegexpParser(grammar)
    tagged = pos_tag(word_tokenize(text))
    tree = parser.parse(tagged)

    chunks = {}
    for subtree in tree.subtrees():
        if subtree.label() != 'S':  # Skip the root node
            chunk_text = ' '.join(word for word, tag in subtree.leaves())
            # setdefault creates the list the first time a label is seen
            chunks.setdefault(subtree.label(), []).append(chunk_text)

    return chunks

In [11]:
grammar = r"""
    NP: {<DT|PRP\$>?<JJ>*<NN.*>+}
    VP: {<VB.*><RB>?}
"""

text = "The clever student quickly solved the difficult math problem."

print(f"Text: {text}\n")

chunks = extract_chunks(text, grammar)

for chunk_type, chunk_list in chunks.items():
    print(f"{chunk_type}: {chunk_list}")

Text: The clever student quickly solved the difficult math problem.

NP: ['The clever student', 'the difficult math problem']
VP: ['solved']


## 9.8 Practical: Information Extraction

One of the most useful applications of chunking is **information extraction** - automatically identifying structured information from unstructured text.

### Subject-Verb-Object (SVO) Extraction

Simple declarative sentences in English typically follow the **SVO pattern**:
- **Subject**: Who/what performs the action (usually first NP)
- **Verb**: The action (VP)
- **Object**: Who/what receives the action (usually second NP)

Example: "**The cat** (S) **chased** (V) **the mouse** (O)."

### Applications of SVO Extraction

1. **Knowledge graph construction**: Build relationships between entities
2. **Summarization**: Extract key facts from documents
3. **Question answering**: Find answers to "who did what" questions
4. **Sentiment analysis**: Identify what entity is associated with sentiment

### Limitations

Simple chunking-based SVO extraction has limitations:
- Doesn't handle passive voice well ("The mouse was chased by the cat")
- Struggles with complex sentences (subordinate clauses, relative pronouns)
- May miss indirect objects ("She gave **him** the book")

In [12]:
def extract_subject_verb_object(sentence):
    """Simple SVO extraction using chunking"""
    # Grammar for SVO patterns
    grammar = r"""
        NP: {<DT|PRP\$>?<JJ>*<NN.*>+|<PRP>|<NNP>+}
        VP: {<VB.*>}
    """
    
    parser = RegexpParser(grammar)
    tagged = pos_tag(word_tokenize(sentence))
    tree = parser.parse(tagged)
    
    chunks = []
    for subtree in tree:
        if isinstance(subtree, Tree):
            chunk_type = subtree.label()
            chunk_text = ' '.join(w for w, t in subtree.leaves())
            chunks.append((chunk_type, chunk_text))
    
    # Simple heuristic: first NP = subject, VP = verb, second NP = object
    nps = [text for ctype, text in chunks if ctype == 'NP']
    vps = [text for ctype, text in chunks if ctype == 'VP']
    
    return {
        'subject': nps[0] if len(nps) > 0 else None,
        'verb': vps[0] if len(vps) > 0 else None,
        'object': nps[1] if len(nps) > 1 else None,
    }

In [13]:
sentences = [
    "The cat chased the mouse.",
    "John loves pizza.",
    "The happy children played games.",
    "Scientists discovered a new planet.",
    "She wrote an interesting book.",
]

print("Subject-Verb-Object Extraction")
print("=" * 60)

for sent in sentences:
    svo = extract_subject_verb_object(sent)
    print(f"\n{sent}")
    print(f"  Subject: {svo['subject']}")
    print(f"  Verb:    {svo['verb']}")
    print(f"  Object:  {svo['object']}")

Subject-Verb-Object Extraction
============================================================

The cat chased the mouse.
  Subject: The cat
  Verb:    chased
  Object:  the mouse

John loves pizza.
  Subject: John
  Verb:    loves
  Object:  pizza

The happy children played games.
  Subject: The happy children
  Verb:    played
  Object:  games

Scientists discovered a new planet.
  Subject: Scientists
  Verb:    discovered
  Object:  a new planet

She wrote an interesting book.
  Subject: She
  Verb:    wrote
  Object:  an interesting book
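
The passive-voice limitation listed above is easy to reproduce. The sketch below applies the same first-NP / first-VP / second-NP heuristic to a hand-tagged passive sentence (the tags are an assumption, chosen so the example runs without a tagger model); `svo_from_tagged` is a hypothetical helper written for this illustration, with a simplified NP rule.

```python
from nltk.chunk import RegexpParser
from nltk.tree import Tree

def svo_from_tagged(tagged):
    """Apply the first-NP / first-VP / second-NP heuristic to tagged tokens."""
    grammar = r"""
        NP: {<DT>?<JJ>*<NN.*>+}
        VP: {<VB.*>}
    """
    tree = RegexpParser(grammar).parse(tagged)
    # Keep (label, text) for each top-level chunk, in sentence order
    chunks = [(subtree.label(), " ".join(w for w, t in subtree.leaves()))
              for subtree in tree if isinstance(subtree, Tree)]
    nps = [text for label, text in chunks if label == "NP"]
    vps = [text for label, text in chunks if label == "VP"]
    return {
        "subject": nps[0] if nps else None,
        "verb": vps[0] if vps else None,
        "object": nps[1] if len(nps) > 1 else None,
    }

# Hand-tagged passive sentence
passive = [("The", "DT"), ("mouse", "NN"), ("was", "VBD"), ("chased", "VBN"),
           ("by", "IN"), ("the", "DT"), ("cat", "NN"), (".", ".")]

print(svo_from_tagged(passive))
```

The heuristic reports `The mouse` as the subject, `was` as the verb, and `the cat` as the object, which inverts who did the chasing and drops the main verb `chased`.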


## 9.9 Chunk Parser Class

Creating a **reusable class** for chunking provides several benefits:

1. **Encapsulation**: Grammar and parsing logic in one place
2. **Convenience methods**: Easy access to specific chunk types
3. **Extensibility**: Easy to add new methods or grammars
4. **Consistency**: Same parsing behavior across your application

### Design Considerations

When building a chunking utility:
- Provide **sensible defaults** but allow customization
- Include methods for **common use cases** (get NPs, get VPs)
- Consider **caching** parsed results for performance
- Handle **edge cases** (empty text, no matches)
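
The caching and edge-case points can be sketched without committing to a particular grammar. The wrapper below is a hypothetical design, not part of NLTK: `CachedExtractor` and its `extract_fn` parameter are names invented here, and `extract_fn` stands for any function that maps a text string to a dict of chunk lists, such as the `extract_all` method of the class that follows.

```python
from functools import lru_cache

class CachedExtractor:
    """Wrap a text -> {label: [chunks]} function with caching and input checks."""

    def __init__(self, extract_fn, maxsize=256):
        # lru_cache memoizes results per distinct input string
        self._cached = lru_cache(maxsize=maxsize)(extract_fn)

    def extract_all(self, text):
        # Edge case: empty or whitespace-only input yields no chunks
        if not text or not text.strip():
            return {}
        result = self._cached(text)
        # Return copies so callers cannot mutate the cached lists
        return {label: list(items) for label, items in result.items()}

    def get(self, text, label):
        # Edge case: a label with no matches yields an empty list
        return self.extract_all(text).get(label, [])

# Stand-in extractor used only to show the caching behaviour
calls = []
def fake_extract(text):
    calls.append(text)
    return {"NP": [text.split()[0]]}

extractor = CachedExtractor(fake_extract)
print(extractor.get("dogs bark", "NP"))   # first call runs fake_extract
print(extractor.get("dogs bark", "VP"))   # second call is served from the cache
print(extractor.extract_all("   "))       # blank input never reaches fake_extract
print(len(calls))                         # number of real extractions performed
```

Caching on the raw text string is the simplest choice; it assumes the grammar does not change during the lifetime of the wrapper.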

In [14]:
class ChunkExtractor:
    """Reusable chunk extraction utility"""
    
    DEFAULT_GRAMMAR = r"""
        NP: {<DT|PRP\$>?<JJ>*<NN.*>+}
        VP: {<VB.*>+<RB>?}
        PP: {<IN><DT>?<JJ>*<NN.*>+}
    """
    
    def __init__(self, grammar=None):
        self.grammar = grammar or self.DEFAULT_GRAMMAR
        self.parser = RegexpParser(self.grammar)
    
    def parse(self, text):
        """Parse text and return tree"""
        tagged = pos_tag(word_tokenize(text))
        return self.parser.parse(tagged)
    
    def extract_all(self, text):
        """Extract all chunks as dict"""
        tree = self.parse(text)
        chunks = {}
        
        for subtree in tree.subtrees():
            if subtree.label() != 'S':
                ctype = subtree.label()
                ctext = ' '.join(w for w, t in subtree.leaves())
                
                if ctype not in chunks:
                    chunks[ctype] = []
                chunks[ctype].append(ctext)
        
        return chunks
    
    def get_noun_phrases(self, text):
        """Get noun phrases only"""
        chunks = self.extract_all(text)
        return chunks.get('NP', [])
    
    def get_verb_phrases(self, text):
        """Get verb phrases only"""
        chunks = self.extract_all(text)
        return chunks.get('VP', [])
    
    def get_prep_phrases(self, text):
        """Get prepositional phrases only"""
        chunks = self.extract_all(text)
        return chunks.get('PP', [])

In [15]:
# Use the class
chunker = ChunkExtractor()

text = "The young scientist conducted experiments in the modern laboratory."

print(f"Text: {text}\n")

print(f"Noun Phrases: {chunker.get_noun_phrases(text)}")
print(f"Verb Phrases: {chunker.get_verb_phrases(text)}")
print(f"Prep Phrases: {chunker.get_prep_phrases(text)}")

print(f"\nAll chunks:")
for ctype, chunks in chunker.extract_all(text).items():
    print(f"  {ctype}: {chunks}")

Text: The young scientist conducted experiments in the modern laboratory.

Noun Phrases: ['The young scientist', 'experiments', 'the modern laboratory']
Verb Phrases: ['conducted']
Prep Phrases: []

All chunks:
  NP: ['The young scientist', 'experiments', 'the modern laboratory']
  VP: ['conducted']
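
`Prep Phrases` is empty because the grammar stages run in order: by the time the PP stage runs, the NP stage has already wrapped `the modern laboratory` in an NP chunk, so the PP pattern finds no bare `<DT>`, `<JJ>`, or `<NN.*>` tags after `<IN>`. One way to address this, following the cascaded style of section 9.6, is to have the PP rule refer to `<NP>` instead. The sketch below compares the two grammars on hand-written tags (an assumption, so no tagger model is required); `chunks_by_label` is a hypothetical helper.

```python
from nltk.chunk import RegexpParser

original_grammar = r"""
    NP: {<DT|PRP\$>?<JJ>*<NN.*>+}
    VP: {<VB.*>+<RB>?}
    PP: {<IN><DT>?<JJ>*<NN.*>+}
"""

cascaded_grammar = r"""
    NP: {<DT|PRP\$>?<JJ>*<NN.*>+}
    VP: {<VB.*>+<RB>?}
    PP: {<IN><NP>}
"""

# Hand-tagged version of the example sentence
tagged = [("The", "DT"), ("young", "JJ"), ("scientist", "NN"),
          ("conducted", "VBD"), ("experiments", "NNS"), ("in", "IN"),
          ("the", "DT"), ("modern", "JJ"), ("laboratory", "NN"), (".", ".")]

def chunks_by_label(grammar, tagged_tokens):
    """Group chunk texts by label, including nested chunks."""
    tree = RegexpParser(grammar).parse(tagged_tokens)
    chunks = {}
    for subtree in tree.subtrees():
        if subtree.label() != "S":  # skip the root node
            text = " ".join(word for word, tag in subtree.leaves())
            chunks.setdefault(subtree.label(), []).append(text)
    return chunks

print("Original:", chunks_by_label(original_grammar, tagged))
print("Cascaded:", chunks_by_label(cascaded_grammar, tagged))
```

With the cascaded rule the PP `in the modern laboratory` is found, and the NP nested inside it is still reported under `NP` because `subtrees()` visits every level.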


## Summary

### Chunk Grammar Quick Reference

| Syntax | Meaning | Example |
|--------|---------|---------|
| `{pattern}` | Chunk (include) | `{<DT><NN>}` |
| `}pattern{` | Chink (exclude) | `}<IN>{` |
| `<TAG>` | Match exact tag | `<NN>` |
| `<TAG.*>` | Match tag prefix | `<NN.*>` matches NN, NNS, NNP |
| `<TAG1\|TAG2>` | Match either tag | `<NN\|NNS>` |
| `?` | Optional (0 or 1) | `<DT>?` |
| `*` | Zero or more | `<JJ>*` |
| `+` | One or more | `<NN>+` |

### Common Chunk Patterns

```
# Noun phrase - most common pattern
NP: {<DT|PRP\$>?<JJ>*<NN.*>+}

# Verb phrase - captures verb sequences
VP: {<VB.*>+}

# Prepositional phrase - preposition followed by a noun phrase
# (in a grammar where an NP rule runs first, use PP: {<IN><NP>} instead)
PP: {<IN><DT>?<JJ>*<NN.*>+}

# Chunk everything, then exclude verbs/prepositions
NP: {<.*>+}
    }<VB.*|IN>{
```

### Key Takeaways

1. **Chunking is shallow parsing** - identifies phrases without full syntactic analysis
2. **POS tags are essential** - chunking operates on tagged text
3. **Grammar rules use regex-like syntax** - `?`, `*`, `+` work as expected
4. **Chinking removes elements** - use reversed braces `}pattern{`
5. **Rules must be on separate lines** - when combining chunk and chink rules
6. **Order matters** - define base chunks before compound chunks

### When to Use Chunking

✅ **Good for:**
- Information extraction
- Named entity recognition preprocessing
- Simple phrase identification
- Text summarization
- Building search indexes

❌ **Not ideal for:**
- Complex grammatical analysis
- Handling long-distance dependencies
- Parsing nested structures
- Languages with free word order

### Next Steps

- **Section 10**: N-grams and Language Models
- **Section 13**: Sentiment Analysis (uses chunking for feature extraction)
- **Section 14**: Text Classification (chunking as preprocessing)