# Interactive Parts of Speech (POS) Tagging Workshop

Welcome to this interactive workshop on Parts of Speech (POS) Tagging in Natural Language Processing! This notebook will guide you through the fundamentals and advanced concepts of POS tagging, with hands-on examples and interactive visualizations.

## Table of Contents

1. [Introduction](#introduction)
2. [Basic Implementation](#basic)
3. [Interactive Concept Explanation](#interactive)
4. [Advanced Implementation](#advanced)
5. [Data Flow Visualization](#visualization)
6. [Challenges & Edge Cases](#challenges)
7. [Conclusion & Further Reading](#conclusion)

## Setup and Imports

In [5]:
# Import required libraries
import nltk
import spacy
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from ipywidgets import interact, interactive, fixed, widgets
from IPython.display import display, HTML
import torch
from transformers import pipeline

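# Download required NLTK data
nltk.download('punkt')
# NLTK releases that ship 'averaged_perceptron_tagger_eng' (3.9 and later)
# look up 'punkt_tab' when word_tokenize is called
nltk.download('punkt_tab')
nltk.download('averaged_perceptron_tagger_eng')
nltk.download('maxent_ne_chunker')
nltk.download('words')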

# Load spaCy model
nlp = spacy.load('en_core_web_sm')

# Set style for visualizations
plt.style.use('dark_background')
sns.set_palette('husl')


<a id='introduction'></a>
## 1. Introduction

Parts of Speech (POS) tagging is a fundamental task in Natural Language Processing that involves assigning grammatical categories (like noun, verb, adjective, etc.) to each word in a sentence. This process is crucial for understanding the syntactic structure of text and is used as a preprocessing step in many NLP applications.

### Key Concepts

1. **Parts of Speech Categories** (with their base Penn Treebank tags)
   - Nouns (NN): names of people, places, things, or ideas
   - Verbs (VB): actions or states of being
   - Adjectives (JJ): words that describe nouns
   - Adverbs (RB): words that modify verbs, adjectives, or other adverbs
   - Pronouns (PRP): words that replace nouns
   - Prepositions (IN): words that show relationships between words (the Penn Treebank `IN` tag also covers subordinating conjunctions such as "because")
   - Conjunctions (CC): coordinating conjunctions such as "and", "or", and "but"
   - Determiners (DT): words that introduce nouns, such as "the" and "a"

2. **Tagging Techniques**
   - Rule-based tagging
   - Statistical tagging (e.g., Hidden Markov Models, averaged perceptron)
   - Deep Learning-based tagging
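
To make the statistical approach concrete, here is a minimal sketch of Viterbi decoding for a Hidden Markov Model tagger. Everything in it is an assumption made for illustration: the three-tag set, the `START`, `TRANS`, and `EMIT` probabilities (hand-set, not estimated from a corpus), and the `SMOOTH` constant for unseen words. It is not how NLTK or spaCy tag text.

```python
import math

# Toy HMM parameters, hand-set for illustration only (not learned from data).
TAGS = ["DT", "NN", "VB"]
START = {"DT": 0.6, "NN": 0.3, "VB": 0.1}
TRANS = {
    "DT": {"DT": 0.05, "NN": 0.9, "VB": 0.05},
    "NN": {"DT": 0.1, "NN": 0.3, "VB": 0.6},
    "VB": {"DT": 0.5, "NN": 0.4, "VB": 0.1},
}
EMIT = {
    "DT": {"the": 0.9, "a": 0.1},
    "NN": {"dog": 0.4, "walk": 0.2, "park": 0.4},
    "VB": {"walk": 0.6, "park": 0.3, "dog": 0.1},
}
SMOOTH = 1e-6  # probability given to unseen (tag, word) pairs


def viterbi(words):
    """Return the most probable tag sequence under the toy HMM."""
    if not words:
        return []

    def emit(tag, word):
        return math.log(EMIT[tag].get(word.lower(), SMOOTH))

    # best[i][tag] = (log-probability of the best path ending in tag, previous tag)
    best = [{t: (math.log(START[t]) + emit(t, words[0]), None) for t in TAGS}]
    for word in words[1:]:
        column = {}
        for t in TAGS:
            prev, score = max(
                ((p, best[-1][p][0] + math.log(TRANS[p][t])) for p in TAGS),
                key=lambda pair: pair[1],
            )
            column[t] = (score + emit(t, word), prev)
        best.append(column)

    # Backtrack from the best final tag to recover the full path.
    tag = max(TAGS, key=lambda t: best[-1][t][0])
    path = [tag]
    for column in reversed(best[1:]):
        tag = column[tag][1]
        path.append(tag)
    return list(reversed(path))


print(viterbi("the dog walk".split()))  # ['DT', 'NN', 'VB']
print(viterbi("the walk".split()))      # ['DT', 'NN']
```

With these made-up numbers, "walk" is decoded as a verb after a noun and as a noun directly after a determiner, which is the kind of context sensitivity that word-only rules cannot provide.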

### Real-World Applications

1. **Syntactic Analysis**
   - Understanding sentence structure
   - Identifying grammatical relationships
   - Building parse trees

2. **Named Entity Recognition**
   - Identifying proper nouns
   - Extracting named entities
   - Entity classification

3. **Sentiment Analysis**
   - Identifying opinion-bearing words
   - Understanding context
   - Analyzing sentiment patterns

4. **Machine Translation**
   - Understanding word roles
   - Maintaining grammatical structure
   - Handling word order differences

### Example Use Cases

1. **Chatbots**
   - Understanding user intent
   - Generating grammatically correct responses
   - Handling complex queries

2. **Grammar Checking**
   - Identifying grammatical errors
   - Suggesting corrections
   - Improving writing quality

3. **Financial Text Analysis**
   - Extracting key financial terms
   - Understanding market sentiment
   - Analyzing financial reports

Let's start by implementing a basic POS tagger to understand the fundamentals!

<a id='basic'></a>
## 2. Basic Implementation

Let's implement a basic POS tagger from scratch to understand the fundamental concepts. We'll create a simple rule-based tagger that uses pattern matching and basic linguistic rules.

In [7]:
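def basic_pos_tagger(word):
    """A simple rule-based POS tagger based on word lists and suffixes"""
    lowered = word.lower()
    # Closed-class words first, so short function words are not caught
    # by the suffix rules below
    if lowered in ('the', 'a', 'an'):
        return 'DT'
    elif lowered in ('and', 'or', 'but'):
        return 'CC'
    elif lowered in ('in', 'on', 'at', 'to', 'for', 'of', 'with'):
        return 'IN'
    # Specific suffixes before the generic '-s' rule; otherwise '-ness',
    # '-less' and '-ous' (which all end in 's') would never be reached
    elif lowered.endswith(('tion', 'sion', 'ment', 'ness')):
        return 'NN'
    elif lowered.endswith(('able', 'ible', 'ful', 'less', 'ous')):
        return 'JJ'
    elif lowered.endswith('ly'):
        return 'RB'
    elif lowered.endswith(('ing', 'ed', 's')):
        return 'VB'  # Crude: also matches plural nouns such as 'dogs'
    else:
        return 'NN'  # Default to noun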

def compare_taggers(text):
    """Compare our basic tagger with NLTK and spaCy"""
    # Tokenize the text
    words = nltk.word_tokenize(text)
    
    # Get tags from different taggers
    basic_tags = [basic_pos_tagger(word) for word in words]
    nltk_tags = nltk.pos_tag(words)  # This returns a list of (word, tag) tuples
    
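    # Nothing to compare for empty input
    if not words:
        print("Please enter some text to analyze.")
        return
    
    # Process with spaCy, reusing the NLTK tokens so every column has the
    # same length (spaCy's own tokenizer can split text differently)
    from spacy.tokens import Doc
    doc = Doc(nlp.vocab, words=words)
    for _, component in nlp.pipeline:
        doc = component(doc)
    # token.tag_ holds fine-grained Penn Treebank-style tags, comparable to
    # NLTK's output; token.pos_ would give coarse Universal POS tags instead
    spacy_tags = [token.tag_ for token in doc]
    
    # Create comparison DataFrame
    df = pd.DataFrame({
        'Word': words,
        'Basic Tagger': basic_tags,
        'NLTK Tagger': [tag for word, tag in nltk_tags],
        'spaCy Tagger': spacy_tags
    })
    
    # Display results
    print(f"\nInput text: '{text}'\n")
    print("POS Tagging Results:")
    display(df)
    
    # Calculate agreement with NLTK (a reference tagger, not gold-standard labels)
    basic_accuracy = sum(1 for i in range(len(words)) 
                        if basic_tags[i] == nltk_tags[i][1]) / len(words)
    spacy_accuracy = sum(1 for i in range(len(words)) 
                        if spacy_tags[i] == nltk_tags[i][1]) / len(words)
    
    print("\nAccuracy (compared to NLTK):")
    print(f"Basic Tagger: {basic_accuracy:.2%}")
    print(f"spaCy Tagger: {spacy_accuracy:.2%}")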

# Create interactive widget
text_input = widgets.Textarea(
    value='The quick brown fox jumps over the lazy dog.',
    placeholder='Enter text to analyze...',
    description='Text:',
    style={'description_width': 'initial'},
    layout={'width': '80%', 'height': '100px'}
)

interact(compare_taggers, text=text_input)


### Understanding the Basic Implementation

Our basic POS tagger assigns a tag to each word in isolation, using a short list of function words and a handful of word endings. Here's how it works:

1. **Pattern Matching**
   - Checks each word against small word lists and common suffixes (plain `str.endswith` checks, not regular expressions)
   - Assigns the POS tag of the first rule that matches
   - Falls back to noun (`NN`) when no rule matches

2. **Limitations**
   - Cannot handle irregular forms
   - May miss context-dependent meanings
   - Limited to basic patterns

3. **Comparison with NLTK**
   - Shows accuracy compared to professional tagger
   - Highlights areas for improvement
   - Demonstrates the complexity of POS tagging

Try entering different sentences to see how our basic tagger performs compared to NLTK's more sophisticated implementation!
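
The limitations listed above can be reproduced without any library. The sketch below uses a hypothetical word-only tagger, `suffix_tag`, which is a simplified stand-in for the rule-based tagger and not a copy of it. Because the tag depends on spelling alone, the same word always gets the same tag regardless of the sentence around it.

```python
def suffix_tag(word):
    """Hypothetical word-only tagger: the tag depends on the spelling alone."""
    lowered = word.lower()
    if lowered in {"the", "a", "an"}:
        return "DT"
    if lowered.endswith("ly"):
        return "RB"
    if lowered.endswith(("ing", "ed")):
        return "VB"
    return "NN"  # default guess


def tag_sentence(sentence):
    """Tag a whitespace-tokenized sentence one word at a time."""
    return [(word, suffix_tag(word)) for word in sentence.split()]


# "book" is a verb in the first sentence and a noun in the second,
# but a tagger that only sees the word itself gives both the same tag.
verb_use = dict(tag_sentence("They book a flight"))
noun_use = dict(tag_sentence("They read the book"))
print(verb_use["book"], noun_use["book"])  # NN NN

# Irregular forms slip past suffix rules: "went" is a past-tense verb.
print(suffix_tag("went"))  # NN

# Suffix rules also over-trigger: "family" ends in "ly" but is a noun.
print(suffix_tag("family"))  # RB
```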

<a id='interactive'></a>
## 3. Interactive Concept Explanation

Let's explore how POS tagging works on different sentences through interactive visualizations. We'll create tools to help you understand the tagging process and see how different words are categorized.

In [8]:
def visualize_pos_tags(sentence):
    """Create an interactive visualization of POS tags"""
    # Get POS tags from NLTK
    tokens = nltk.word_tokenize(sentence)
    pos_tags = nltk.pos_tag(tokens)
    
    # Create a color-coded visualization
    plt.figure(figsize=(12, 6))
    
    # Define colors for different POS categories
    pos_colors = {
        'NN': '#FF9999',  # Nouns
        'VB': '#66B2FF',  # Verbs
        'JJ': '#99FF99',  # Adjectives
        'RB': '#FFCC99',  # Adverbs
        'PRP': '#FF99CC', # Pronouns
        'IN': '#99CCFF',  # Prepositions
        'CC': '#FFB366',  # Conjunctions
        'DT': '#FF99FF',  # Determiners
        'OTHER': '#CCCCCC' # Other categories
    }
    
    # Create bars for each word
    words = [word for word, _ in pos_tags]
    tags = [tag for _, tag in pos_tags]
    
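    # Get colors for each tag. Penn tags share a two-letter prefix per category
    # (NN, NNS, NNP -> 'NN'); pronoun tags (PRP, PRP$) need three letters
    colors = [pos_colors.get(tag[:3] if tag.startswith('PRP') else tag[:2],
                             pos_colors['OTHER']) for tag in tags]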
    
    # Create bar plot
    bars = plt.bar(range(len(words)), [1] * len(words), color=colors)
    
    # Customize the plot
    plt.xticks(range(len(words)), words, rotation=45, ha='right')
    plt.yticks([])
    plt.title('POS Tag Visualization')
    
    # Add legend
    legend_elements = [plt.Rectangle((0,0),1,1, facecolor=color, label=pos)
                      for pos, color in pos_colors.items()]
    plt.legend(handles=legend_elements, loc='upper right')
    
    plt.tight_layout()
    plt.show()
    
    # Display detailed tag information
    print("\nDetailed POS Tag Information:")
    for word, tag in pos_tags:
        print(f"{word}: {tag}")

# Create interactive widget
sentence_input = widgets.Textarea(
    value='The quick brown fox jumps over the lazy dog.',
    placeholder='Enter a sentence for POS visualization...',
    description='Sentence:',
    style={'description_width': 'initial'},
    layout={'width': '80%', 'height': '100px'}
)

interact(visualize_pos_tags, sentence=sentence_input)


### Understanding the Visualization

The interactive visualization above shows how different words in a sentence are tagged with their parts of speech. Here's what the colors represent:

1. **Nouns (Red)**
   - Names of people, places, things, or ideas
   - Can be singular or plural
   - Can be proper or common

2. **Verbs (Blue)**
   - Actions or states of being
   - Can be in different tenses
   - Can be main verbs or auxiliaries

3. **Adjectives (Green)**
   - Words that describe nouns
   - Can be comparative or superlative
   - Can be attributive or predicative

4. **Adverbs (Orange)**
   - Words that modify verbs, adjectives, or other adverbs
   - Often end in '-ly'
   - Can indicate manner, time, place, or degree

5. **Pronouns (Pink)**
   - Words that replace nouns
   - Can be personal, possessive, or demonstrative
   - Help avoid repetition

6. **Prepositions (Light Blue)**
   - Words that show relationships between words
   - Often indicate location, direction, or time
   - Form prepositional phrases

7. **Conjunctions (Light Orange)**
   - Words that connect words or phrases
   - Can be coordinating or subordinating
   - Help create complex sentences

8. **Determiners (Purple)**
   - Words that introduce nouns
   - Include articles and quantifiers
   - Help specify which noun is being referred to

Try entering different sentences to see how the POS tags change and how different words are categorized!
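
The color groups described above collapse many fine-grained Penn Treebank tags into a few coarse categories. A minimal sketch of that grouping follows; the `COARSE_PREFIXES` table and the `coarse_category` helper are illustrative names, not part of NLTK or of the plotting code.

```python
# Ordered (prefix, category) pairs; the first matching prefix wins.
COARSE_PREFIXES = [
    ("PRP", "Pronoun"),      # PRP, PRP$
    ("NN", "Noun"),          # NN, NNS, NNP, NNPS
    ("VB", "Verb"),          # VB, VBD, VBG, VBN, VBP, VBZ
    ("JJ", "Adjective"),     # JJ, JJR, JJS
    ("RB", "Adverb"),        # RB, RBR, RBS
    ("IN", "Preposition"),
    ("CC", "Conjunction"),
    ("DT", "Determiner"),
]


def coarse_category(tag):
    """Map a Penn Treebank tag to one of the coarse display categories."""
    for prefix, name in COARSE_PREFIXES:
        if tag.startswith(prefix):
            return name
    return "Other"


for tag in ["NNS", "VBZ", "PRP$", "JJR", "MD", "."]:
    print(tag, "->", coarse_category(tag))
```

Tags outside the eight groups, such as modals (`MD`) and punctuation, fall into "Other", which is why some bars in the plot appear gray.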

<a id='advanced'></a>
## 4. Advanced Implementation with Libraries

Now that we understand the basics, let's explore advanced POS tagging using popular NLP libraries. We'll compare different tagging methods and analyze their performance.

In [9]:
def compare_pos_taggers(text):
    """Compare different POS tagging methods"""
    # Tokenize the text
    tokens = nltk.word_tokenize(text)
    
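    # Get tags from different taggers. spaCy is run on the NLTK tokens so
    # that both taggers label exactly the same words and the DataFrame
    # columns below always have equal length
    from spacy.tokens import Doc
    nltk_tags = nltk.pos_tag(tokens)
    spacy_doc = Doc(nlp.vocab, words=tokens)
    for _, component in nlp.pipeline:
        spacy_doc = component(spacy_doc)
    spacy_tags = [(token.text, token.pos_) for token in spacy_doc]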
    
    # Create a DataFrame for comparison
    df = pd.DataFrame({
        'Word': [word for word, _ in nltk_tags],
        'NLTK Tag': [tag for _, tag in nltk_tags],
        'spaCy Tag': [tag for word, tag in spacy_tags]
    })
    
    # Display results
    print(f"\nInput text: '{text}'\n")
    print("POS Tagging Results:")
    display(df)
    
    # Calculate tag distribution
    print("\nTag Distribution:")
    nltk_dist = pd.Series([tag for _, tag in nltk_tags]).value_counts()
    spacy_dist = pd.Series([tag for _, tag in spacy_tags]).value_counts()
    
    plt.figure(figsize=(12, 6))
    plt.subplot(1, 2, 1)
    nltk_dist.plot(kind='bar')
    plt.title('NLTK Tag Distribution')
    plt.xticks(rotation=45)
    
    plt.subplot(1, 2, 2)
    spacy_dist.plot(kind='bar')
    plt.title('spaCy Tag Distribution')
    plt.xticks(rotation=45)
    
    plt.tight_layout()
    plt.show()
    
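    # Performance comparison
    import time
    
    def measure_performance(text, n_runs=100):
        """Measure average wall-clock time per call for each tagger.

        Note: nlp(text) runs spaCy's full pipeline (tokenizer, tagger, parser,
        NER, ...), while nltk.pos_tag only tags pre-tokenized input, so this
        is not a like-for-like comparison of the taggers alone.
        """
        tokens = nltk.word_tokenize(text)
        
        # NLTK performance (perf_counter is a monotonic, high-resolution clock)
        nltk_start = time.perf_counter()
        for _ in range(n_runs):
            nltk.pos_tag(tokens)
        nltk_time = (time.perf_counter() - nltk_start) / n_runs
        
        # spaCy performance
        spacy_start = time.perf_counter()
        for _ in range(n_runs):
            nlp(text)
        spacy_time = (time.perf_counter() - spacy_start) / n_runs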
        
        return {
            'NLTK': nltk_time,
            'spaCy': spacy_time
        }
    
    # Measure performance
    perf_results = measure_performance(text)
    
    print("\nPerformance Comparison (average time per run):")
    for tagger, time_taken in perf_results.items():
        print(f"{tagger}: {time_taken*1000:.2f} ms")

# Create interactive widget
text_input = widgets.Textarea(
    value='The quick brown fox jumps over the lazy dog.',
    placeholder='Enter text for advanced POS tagging...',
    description='Text:',
    style={'description_width': 'initial'},
    layout={'width': '80%', 'height': '100px'}
)

interact(compare_pos_taggers, text=text_input)


### Understanding the Advanced Implementation

This section demonstrates advanced POS tagging using popular NLP libraries. Here's what we're comparing:

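1. **NLTK Tagger**
   - Uses an averaged perceptron tagger
   - Outputs fine-grained Penn Treebank tags (for example `NN`, `NNS`, `VBZ`)
   - Good for general-purpose tagging

2. **spaCy Tagger**
   - Part of a complete NLP pipeline
   - Uses a trained statistical model (a neural network in current spaCy releases)
   - Exposes coarse Universal POS tags as `token.pos_` and fine-grained tags as `token.tag_`

3. **Performance Analysis**
   - Compares processing speed
   - Shows tag distribution
   - Highlights differences in tagging approaches

4. **Key Differences**
   - Tag granularity (the Penn Treebank tags shown for NLTK are finer than the Universal tags shown for spaCy)
   - Processing speed (depends on the setup: the timing above runs spaCy's full pipeline on a single short text, whereas spaCy is designed for batch processing with `nlp.pipe`)
   - Integration with other NLP tasks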

Try entering different texts to see how the taggers perform and compare their results!
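
Because the two taggers use different tag sets, their labels have to be mapped onto a common scheme before they can be compared directly. The sketch below shows one way to do that with a deliberately small, simplified `PENN_TO_UNIVERSAL` table. The table, the helper names, and the example tag sequences are all assumptions written by hand for illustration; they are not tagger output, and a real mapping needs more entries and context (auxiliary verbs, for instance, are `AUX` in the Universal scheme).

```python
# Simplified, partial mapping from Penn Treebank tags to Universal POS tags.
PENN_TO_UNIVERSAL = {
    "NN": "NOUN", "NNS": "NOUN",
    "NNP": "PROPN", "NNPS": "PROPN",
    "VB": "VERB", "VBD": "VERB", "VBG": "VERB",
    "VBN": "VERB", "VBP": "VERB", "VBZ": "VERB",
    "JJ": "ADJ", "JJR": "ADJ", "JJS": "ADJ",
    "RB": "ADV", "RBR": "ADV", "RBS": "ADV",
    "PRP": "PRON", "DT": "DET", "IN": "ADP",
    "CC": "CCONJ", "CD": "NUM", ".": "PUNCT", ",": "PUNCT",
}


def to_universal(penn_tag):
    """Return the Universal tag for a Penn tag, or 'X' if it is not in the table."""
    return PENN_TO_UNIVERSAL.get(penn_tag, "X")


def agreement(penn_tags, universal_tags):
    """Fraction of positions where the mapped Penn tag equals the Universal tag."""
    if len(penn_tags) != len(universal_tags):
        raise ValueError("tag sequences must be aligned token by token")
    if not penn_tags:
        return 0.0
    matches = sum(to_universal(p) == u for p, u in zip(penn_tags, universal_tags))
    return matches / len(penn_tags)


# Hand-written example sequences for an eight-token sentence.
penn = ["DT", "JJ", "NN", "VBZ", "IN", "DT", "NN", "."]
universal = ["DET", "ADJ", "NOUN", "VERB", "ADP", "DET", "NOUN", "PUNCT"]
print(agreement(penn, universal))  # 1.0

# One position differs (VERB vs. AUX), so 7 of 8 tags agree.
with_aux = ["DET", "ADJ", "NOUN", "AUX", "ADP", "DET", "NOUN", "PUNCT"]
print(agreement(penn, with_aux))  # 0.875
```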

<a id='visualization'></a>
## 5. Data Flow Visualization

Let's visualize how text flows through the POS tagging process, from tokenization to parsing. We'll create interactive visualizations to help understand the relationships between words and their grammatical roles.

In [15]:
def visualize_nlp_pipeline(text):
    """Visualize the NLP pipeline with POS tagging"""
    # Process text with spaCy
    doc = nlp(text)
    
    # Create figure with subplots
    plt.figure(figsize=(15, 10))
    
    # 1. Token and POS Tag Visualization
    plt.subplot(2, 2, 1)
    tokens = [token.text for token in doc]
    pos_tags = [token.pos_ for token in doc]
    plt.bar(range(len(tokens)), [1] * len(tokens))
    plt.xticks(range(len(tokens)), tokens, rotation=45, ha='right')
    plt.yticks([])
    plt.title('Tokens and POS Tags')
    
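    # Add POS tags above the bars; extend the y-axis so the labels
    # sit inside the plot area instead of overlapping the title
    plt.ylim(0, 1.3)
    for i, tag in enumerate(pos_tags):
        plt.text(i, 1.1, tag, ha='center', va='bottom')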
    
    # 2. Dependency label distribution
    plt.subplot(2, 2, 2)
    dep_labels = [token.dep_ for token in doc]
    dep_counts = pd.Series(dep_labels).value_counts()
    dep_counts.plot(kind='bar')
    plt.title('Dependency Distribution')
    plt.xticks(rotation=45)
    
    # 3. POS Tag Distribution
    plt.subplot(2, 2, 3)
    pos_counts = pd.Series(pos_tags).value_counts()
    pos_counts.plot(kind='bar')
    plt.title('POS Tag Distribution')
    plt.xticks(rotation=45)
    
    # 4. Word Relationships Network
    plt.subplot(2, 2, 4)
    import networkx as nx
    
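    # Create a directed graph. Nodes are token indices so that repeated
    # words (e.g. two occurrences of "the") stay separate nodes
    G = nx.DiGraph()
    
    # Add an edge from each head to its dependent
    for token in doc:
        G.add_node(token.i)
        if token.dep_ != 'ROOT':
            G.add_edge(token.head.i, token.i, label=token.dep_)
    
    # Draw the graph, labelling nodes with words and edges with dependencies
    pos = nx.spring_layout(G, seed=42)
    nx.draw(G, pos, labels={t.i: t.text for t in doc}, with_labels=True,
            node_color='lightblue', edge_color='gray', node_size=2000,
            font_size=8, font_color='black', font_weight='bold')
    nx.draw_networkx_edge_labels(G, pos, font_size=7,
                                 edge_labels=nx.get_edge_attributes(G, 'label'))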
    
    plt.title('Word Dependencies')
    
    plt.tight_layout()
    plt.show()
    
    # Print detailed analysis
    print("\nDetailed Analysis:")
    print("\n1. Sentence Structure:")
    for token in doc:
        if token.dep_ == 'ROOT':
            print(f"Root word: {token.text} ({token.pos_})")
    
    print("\n2. Key Phrases:")
    for chunk in doc.noun_chunks:
        print(f"Noun phrase: {chunk.text}")
    
    print("\n3. Named Entities:")
    for ent in doc.ents:
        print(f"{ent.text}: {ent.label_}")
    
    print("\n4. Dependencies:")
    for token in doc:
        if token.dep_ != 'ROOT':
            print(f"{token.text} -> {token.head.text} ({token.dep_})")

# Create interactive widget
text_input = widgets.Textarea(
    value='The beautiful cat gracefully jumped over the fence.',
    placeholder='Enter text to visualize...',
    description='Text:',
    style={'description_width': 'initial'},
    layout={'width': '80%', 'height': '100px'}
)

interact(visualize_nlp_pipeline, text=text_input)


### Understanding the Visualizations

The interactive visualization above shows four different aspects of the POS tagging process:

1. **Tokens and POS Tags**
   - Shows each word in the sentence
   - Displays its POS tag above
   - Helps understand word categorization

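2. **Dependency Distribution**
   - Counts how often each dependency label (such as `nsubj` or `det`) occurs
   - Summarizes which grammatical relations the sentence uses
   - Complements the network in panel 4, which shows the actual structure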

3. **Tag Distribution**
   - Shows frequency of different POS tags
   - Helps understand tag patterns
   - Useful for analysis

4. **Word Relationship Network**
   - Visualizes word connections
   - Shows dependency structure
   - Helps understand sentence complexity

Try entering different sentences to see how the visualizations change and how different sentence structures are represented!
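
A dependency parse can be stored compactly as one head index per token, which is essentially what the network panel draws. The sketch below works on a hand-written `heads` list for a shortened version of the example sentence; the indices are an assumption made for illustration, not parser output. It follows the convention that the root token points to itself.

```python
tokens = ["The", "cat", "jumped", "over", "the", "fence"]
# heads[i] is the index of token i's head; the root ("jumped") points to itself.
heads = [1, 2, 2, 2, 5, 3]


def children_by_index(heads):
    """Invert the head list into a mapping from each token to its dependents."""
    children = {i: [] for i in range(len(heads))}
    for child, head in enumerate(heads):
        if child != head:  # skip the root's self-reference
            children[head].append(child)
    return children


def depth(index, heads):
    """Number of edges between a token and the root."""
    steps = 0
    while heads[index] != index:
        index = heads[index]
        steps += 1
    return steps


children = children_by_index(heads)
for i, token in enumerate(tokens):
    dependents = [tokens[c] for c in children[i]]
    print(f"{token!r} (depth {depth(i, heads)}): dependents = {dependents}")

# Using indices keeps "The" (0) and "the" (4) apart even though they are the same word.
print(max(depth(i, heads) for i in range(len(tokens))))  # 3
```

Keying the structure by position rather than by word text is what prevents repeated words from collapsing into a single node.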

<a id='challenges'></a>
## 6. Challenges & Edge Cases

Let's explore common challenges in POS tagging and how different taggers handle edge cases. We'll look at ambiguous words, proper nouns, and domain-specific challenges.

In [16]:
def explore_challenges(text):
    """Explore POS tagging challenges and edge cases"""
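    # Process text with both taggers on the same (NLTK) tokens so the
    # DataFrame columns always line up
    from spacy.tokens import Doc
    tokens = nltk.word_tokenize(text)
    nltk_tags = nltk.pos_tag(tokens)
    doc = Doc(nlp.vocab, words=tokens)
    for _, component in nlp.pipeline:
        doc = component(doc)
    
    # Create a DataFrame for comparison. token.tag_ (fine-grained, Penn
    # Treebank-style) is used so spaCy's tags are comparable with NLTK's;
    # token.pos_ (Universal tags) would disagree with NLTK on almost every word
    df = pd.DataFrame({
        'Word': [token.text for token in doc],
        'NLTK Tag': [tag for word, tag in nltk_tags],
        'spaCy Tag': [token.tag_ for token in doc],
        'Dependency': [token.dep_ for token in doc]
    })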
    
    # Display results
    print(f"\nInput text: '{text}'\n")
    print("POS Tagging Results:")
    display(df)
    
    # Identify potential challenges
    print("\nPotential Challenges:")
    
    # 1. Ambiguous words
    ambiguous_words = []
    for i, row in df.iterrows():
        if row['NLTK Tag'] != row['spaCy Tag']:
            ambiguous_words.append(row['Word'])
    
    if ambiguous_words:
        print("\n1. Ambiguous Words:")
        for word in ambiguous_words:
            print(f"- {word}: NLTK and spaCy disagree on its POS tag")
    
    # 2. Proper nouns
    proper_nouns = [word for word, tag in nltk_tags if tag.startswith('NNP')]
    if proper_nouns:
        print("\n2. Proper Nouns:")
        for word in proper_nouns:
            print(f"- {word}: Identified as a proper noun")
    
    # 3. Compound words
    compound_words = []
    for i, token in enumerate(doc):
        if token.dep_ == 'compound':
            compound_words.append((token.text, token.head.text))
    
    if compound_words:
        print("\n3. Compound Words:")
        for word, head in compound_words:
            print(f"- {word} + {head}: Compound noun")
    
    # 4. Domain-specific terms
    domain_terms = []
    for token in doc:
        if token.pos_ == 'NOUN' and token.text.isupper():
            domain_terms.append(token.text)
    
    if domain_terms:
        print("\n4. Domain-Specific Terms:")
        for term in domain_terms:
            print(f"- {term}: Possible domain-specific term")
    
    # Create visualizations
    plt.figure(figsize=(12, 6))
    
    # 1. Tag agreement visualization
    plt.subplot(1, 2, 1)
    agreement = [1 if row['NLTK Tag'] == row['spaCy Tag'] else 0 for _, row in df.iterrows()]
    plt.bar(range(len(agreement)), agreement)
    plt.title('Tag Agreement')
    plt.xticks(range(len(df['Word'])), df['Word'], rotation=45, ha='right')
    plt.yticks([0, 1], ['Disagree', 'Agree'])
    
    # 2. Dependency visualization
    plt.subplot(1, 2, 2)
    dep_counts = df['Dependency'].value_counts()
    dep_counts.plot(kind='bar')
    plt.title('Dependency Distribution')
    plt.xticks(rotation=45)
    
    plt.tight_layout()
    plt.show()

# Create interactive widget
text_input = widgets.Textarea(
    value='The CEO of Apple Inc. announced a new iPhone model.',
    placeholder='Enter text to explore challenges...',
    description='Text:',
    style={'description_width': 'initial'},
    layout={'width': '80%', 'height': '100px'}
)

interact(explore_challenges, text=text_input)


### Understanding the Challenges

This section explores common challenges in POS tagging:

1. **Ambiguous Words**
   - Words that can have multiple POS tags
   - Context-dependent meanings
   - Different interpretations by different taggers

2. **Proper Nouns**
   - Names of people, organizations, places
   - Often capitalized
   - Can be compound or multi-word

3. **Compound Words**
   - Multi-word expressions
   - Special grammatical structures
   - Domain-specific terminology

4. **Domain-Specific Terms**
   - Technical vocabulary
   - Industry-specific terms
   - Abbreviations and acronyms

Try entering different types of text to see how the taggers handle these challenges!
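
When two taggers disagree, it helps to see which tag pairs are confused rather than only how often. The sketch below counts disagreements per tag pair with `collections.Counter`. The helper name `disagreements` and both tag sequences are hypothetical and written by hand for illustration; they are not actual NLTK or spaCy output.

```python
from collections import Counter


def disagreements(tokens, tags_a, tags_b):
    """Count (tag_a, tag_b) pairs where two aligned tag sequences differ.

    Returns the counts and, for each pair, the first token that showed it.
    """
    if not (len(tokens) == len(tags_a) == len(tags_b)):
        raise ValueError("tokens and tag sequences must have equal length")
    pairs = Counter()
    examples = {}
    for token, a, b in zip(tokens, tags_a, tags_b):
        if a != b:
            pairs[(a, b)] += 1
            examples.setdefault((a, b), token)
    return pairs, examples


tokens = ["The", "CEO", "of", "Apple", "Inc.", "announced",
          "a", "new", "iPhone", "model", "."]
# Hypothetical outputs of two taggers over the same tokens.
tagger_a = ["DT", "NNP", "IN", "NNP", "NNP", "VBD", "DT", "JJ", "NN", "NN", "."]
tagger_b = ["DT", "NN", "IN", "NNP", "NNP", "VBD", "DT", "JJ", "NNP", "NN", "."]

pairs, examples = disagreements(tokens, tagger_a, tagger_b)
for (a, b), count in pairs.items():
    print(f"{a} vs {b}: {count} time(s), e.g. {examples[(a, b)]!r}")
```

In this made-up example both disagreements involve the boundary between common and proper nouns, which is typical for acronyms and product names.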

<a id='conclusion'></a>
## 7. Conclusion & Further Reading

In this workshop, we've explored the fundamentals and advanced concepts of Parts of Speech (POS) tagging. Let's summarize what we've learned and look at where to go next.

In [18]:
def summarize_workshop(text):
    """Summarize the key concepts learned in the workshop"""
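    # Process text with both taggers on the same (NLTK) tokens so the
    # DataFrame columns below always have equal length
    from spacy.tokens import Doc
    tokens = nltk.word_tokenize(text)
    nltk_tags = nltk.pos_tag(tokens)
    doc = Doc(nlp.vocab, words=tokens)
    for _, component in nlp.pipeline:
        doc = component(doc)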
    
    # Create summary DataFrame
    df = pd.DataFrame({
        'Word': [token.text for token in doc],
        'POS Tag': [tag for word, tag in nltk_tags],
        'Dependency': [token.dep_ for token in doc],
        'Lemma': [token.lemma_ for token in doc]
    })
    
    # Display results
    print(f"\nFinal Analysis of: '{text}'\n")
    print("Complete POS Analysis:")
    display(df)
    
    # Create visualizations
    plt.figure(figsize=(15, 5))
    
    # 1. POS Tag Distribution
    plt.subplot(1, 3, 1)
    pos_counts = df['POS Tag'].value_counts()
    pos_counts.plot(kind='bar')
    plt.title('POS Tag Distribution')
    plt.xticks(rotation=45)
    
    # 2. Dependency Distribution
    plt.subplot(1, 3, 2)
    dep_counts = df['Dependency'].value_counts()
    dep_counts.plot(kind='bar')
    plt.title('Dependency Distribution')
    plt.xticks(rotation=45)
    
    # 3. Word vs Lemma Length
    plt.subplot(1, 3, 3)
    df['Word Length'] = df['Word'].str.len()
    df['Lemma Length'] = df['Lemma'].str.len()
    plt.scatter(df['Word Length'], df['Lemma Length'])
    plt.title('Word vs Lemma Length')
    plt.xlabel('Word Length')
    plt.ylabel('Lemma Length')
    
    plt.tight_layout()
    plt.show()
    
    # Print summary statistics
    print("\nSummary Statistics:")
    print(f"Total words: {len(tokens)}")
    print(f"Unique POS tags: {len(pos_counts)}")
    print(f"Unique dependencies: {len(dep_counts)}")
    
    # Print key insights using iloc to avoid FutureWarning
    print("\nKey Insights:")
    print("1. Most common POS tags:", pos_counts.index[0], "(", pos_counts.iloc[0], "occurrences)")
    print("2. Most common dependency:", dep_counts.index[0], "(", dep_counts.iloc[0], "occurrences)")
    print("3. Average word length:", df['Word Length'].mean())
    print("4. Average lemma length:", df['Lemma Length'].mean())

# Create interactive widget
text_input = widgets.Textarea(
    value='The quick brown fox jumps over the lazy dog.',
    placeholder='Enter text for final analysis...',
    description='Text:',
    style={'description_width': 'initial'},
    layout={'width': '80%', 'height': '100px'}
)

interact(summarize_workshop, text=text_input)


### Key Takeaways

1. **Fundamental Concepts**
   - Understanding POS tagging basics
   - Different tagging approaches
   - Rule-based vs. statistical methods

2. **Advanced Techniques**
   - Deep learning-based tagging
   - Context-aware tagging
   - Handling edge cases

3. **Practical Applications**
   - Text preprocessing
   - Information extraction
   - Natural language understanding

### Further Reading

1. **Academic Papers**
   - "Natural Language Processing (Almost) from Scratch" by Collobert et al.
   - "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding" by Devlin et al.
   - "Universal Dependencies v1: A Multilingual Treebank Collection" by Nivre et al.

2. **Online Resources**
   - [NLTK Documentation](https://www.nltk.org/)
   - [spaCy Documentation](https://spacy.io/)
   - [Stanford NLP Group](https://nlp.stanford.edu/)

3. **Related Topics**
   - Named Entity Recognition (NER)
   - Dependency Parsing
   - Constituency Parsing
   - Semantic Role Labeling

### Practice Exercises

1. **Basic Exercises**
   - Implement a custom POS tagger
   - Compare different tagging methods
   - Analyze tag distributions

2. **Advanced Challenges**
   - Handle domain-specific text
   - Implement context-aware tagging
   - Build a custom tagger for specific languages

3. **Real-World Projects**
   - Build a grammar checker
   - Create a text summarizer
   - Develop a question-answering system

### Next Steps

1. **Advanced Topics**
   - Deep learning for POS tagging
   - Multilingual POS tagging
   - Domain-specific tagging

2. **Practical Applications**
   - Text classification
   - Sentiment analysis
   - Machine translation

3. **Research Areas**
   - Novel tagging architectures
   - Cross-lingual transfer learning
   - Zero-shot POS tagging

Remember to experiment with different texts and analyze how POS tagging behaves in various contexts. The more you practice, the better you'll understand the nuances of natural language processing!