# Interactive Named Entity Recognition for Financial Document Analysis

Welcome to this interactive workshop on Named Entity Recognition (NER) focused on financial document analysis. This notebook will guide you through the fundamentals and advanced concepts of NER, with hands-on examples and interactive visualizations specifically tailored for financial text.

## Table of Contents

1. [Introduction](#introduction)
2. [Basic Implementation from Scratch](#basic)
3. [Interactive Concept Explanation](#interactive)
4. [Advanced Implementation with Libraries](#advanced)
5. [Data Flow Visualization](#visualization)
6. [User Interaction & Visualization](#user-interaction)
7. [Challenges & Edge Cases](#challenges)
8. [Conclusion & Further Reading](#conclusion)

## Setup and Imports

In [22]:
# Import required libraries
import re
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import spacy
from spacy import displacy
import nltk
from nltk import ne_chunk
from nltk.chunk import conlltags2tree, tree2conlltags
import ipywidgets as widgets
from ipywidgets import interact, interactive, fixed, interact_manual, Layout
from IPython.display import display, HTML, Markdown, clear_output
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import string
import warnings
warnings.filterwarnings('ignore')

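# Download required NLTK data
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')
nltk.download('maxent_ne_chunker')
nltk.download('words')
# Resource names used by newer NLTK releases (3.9+)
nltk.download('maxent_ne_chunker_tab')
nltk.download('punkt_tab')
nltk.download('averaged_perceptron_tagger_eng')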

# Load spaCy model
nlp = spacy.load('en_core_web_sm')

# Set style for visualizations
plt.style.use('seaborn-v0_8-whitegrid')
sns.set_palette("husl")

# Custom CSS for highlighting entities
def apply_custom_styles():
    display(HTML("""
    <style>
    .entity-PERSON { background: #ffccd5; border-radius: 3px; padding: 0 3px; }
    .entity-ORG { background: #c3eeff; border-radius: 3px; padding: 0 3px; }
    .entity-GPE { background: #c1ffba; border-radius: 3px; padding: 0 3px; }
    .entity-LOC { background: #d6ffba; border-radius: 3px; padding: 0 3px; }
    .entity-MONEY { background: #ffe5a8; border-radius: 3px; padding: 0 3px; }
    .entity-TIME { background: #e5ceff; border-radius: 3px; padding: 0 3px; }
    .entity-DATE { background: #cecdff; border-radius: 3px; padding: 0 3px; }
    .entity-PERCENT { background: #bbffee; border-radius: 3px; padding: 0 3px; }
    .entity-CARDINAL { background: #eeedff; border-radius: 3px; padding: 0 3px; }
    .entity-TICKER { background: #ffbadd; border-radius: 3px; padding: 0 3px; }
    .entity-COMPANY { background: #c3eeff; border-radius: 3px; padding: 0 3px; }
    .entity-PRODUCT { background: #bbcefb; border-radius: 3px; padding: 0 3px; }
    .entity-QUANTITY { background: #e5ffbb; border-radius: 3px; padding: 0 3px; }
    </style>
    """))

apply_custom_styles()

# Sample financial texts to use throughout the notebook
sample_texts = {
    "earnings_report": "Apple Inc. (AAPL) reported Q2 earnings of $1.52 per share, beating estimates by $0.15. Revenue was $97.3 billion, up 9% year-over-year. CEO Tim Cook mentioned strong iPhone sales in emerging markets.",
    "financial_news": "Microsoft (MSFT) stock rose 3.2% to $245.67 after announcing plans to invest $10 billion in OpenAI on January 15, 2023. The tech giant expects the partnership to generate significant revenue by 2025.",
    "sec_filing": "According to the 10-K filing, Tesla Inc. (TSLA) increased R&D spending to $3.1 billion in 2022, representing 5.7% of the total revenue. The company plans to launch new products in Q3 2023.",
    "market_analysis": "The S&P 500 fell 1.2% yesterday, with energy stocks like Exxon Mobil (XOM) and Chevron (CVX) dropping over 3%. The Federal Reserve's decision to maintain interest rates at 5.25% influenced market sentiment.",
    "complex_example": "In Q2 2023, Amazon.com Inc. (AMZN) acquired Zoox for $1.2 billion while reporting earnings of $2.63 per share. The deal closed on August 12, 2023, when AMZN was trading at $135.28."
}


<a id='introduction'></a>
## 1. Introduction

Named Entity Recognition (NER) is a subtask of information extraction that seeks to identify and classify named entities in text into predefined categories such as names of persons, organizations, locations, monetary values, percentages, and time expressions.
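In code, a recognized entity is usually stored as a labeled character span. A minimal sketch, assuming Python's half-open slice convention for offsets; the `Entity` dataclass and `span_of` helper are illustrative names, not part of spaCy or NLTK:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """One recognized entity: surface text, label and character offsets."""
    text: str
    label: str
    start: int  # index of the first character
    end: int    # index one past the last character (slice convention)


def span_of(doc: str, surface: str, label: str) -> Entity:
    """Locate the first occurrence of `surface` in `doc` and wrap it as an Entity."""
    start = doc.index(surface)  # raises ValueError if the text is absent
    return Entity(surface, label, start, start + len(surface))


doc = "CEO Tim Cook said Apple Inc. earned $97.3 billion, up 9%."
entities = [
    span_of(doc, "Tim Cook", "PERSON"),
    span_of(doc, "Apple Inc.", "ORG"),
    span_of(doc, "$97.3 billion", "MONEY"),
    span_of(doc, "9%", "PERCENT"),
]

for ent in entities:
    # Slicing the document with the stored offsets recovers the entity text
    print(f"{ent.label:8} {ent.start:3}-{ent.end:<3} {doc[ent.start:ent.end]}")
```

Keeping offsets rather than only the surface string is what lets later steps highlight, merge, or compare entities in the original text.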

### Importance in Financial Data Processing

In the financial domain, NER plays a crucial role in:

1. **Automated data extraction** from financial reports, SEC filings, and news
2. **Investment research** by identifying key companies, people, and financial metrics
3. **Risk assessment** through identification of entities associated with financial events
4. **Regulatory compliance** by extracting reportable information
5. **Market sentiment analysis** by linking sentiment to specific companies or financial instruments

### Real-World Use Cases

1. **Financial News Analysis**
   - Extract company names, ticker symbols, and financial metrics
   - Track merger and acquisition announcements
   - Monitor leadership changes and executive statements

2. **Earnings Reports Processing**
   - Identify revenue figures, EPS, and growth metrics
   - Extract forward-looking statements and guidance
   - Compare performance against analyst expectations

3. **SEC Filings Analysis**
   - Extract risk factors from 10-K reports
   - Identify related party transactions
   - Monitor insider trading activities
   - Track changes in business strategy and outlook

4. **Trading Algorithms**
   - Real-time processing of news for algorithmic trading
   - Event-driven trading based on corporate announcements
   - Sentiment-based trading strategies

### Key Concepts and Terminology

1. **Entity Types in Financial Context**
   - **ORGANIZATION**: Companies, institutions (e.g., Apple Inc., Federal Reserve)
   - **PERSON**: Executives, analysts, regulators (e.g., Jerome Powell, Elon Musk)
   - **MONEY**: Financial amounts (e.g., $1.2 million, €500,000)
   - **PERCENT**: Percentage values (e.g., 5.7%, 3.2%)
   - **DATE**: Time references (e.g., Q2 2023, January 15)
   - **TICKER**: Stock symbols (e.g., AAPL, MSFT)
   - **PRODUCT**: Financial products (e.g., mortgage-backed securities)
   - **LOCATION**: Geographic locations (e.g., Wall Street, Silicon Valley)

2. **Context-Aware Extraction**
   - Understanding the difference between "Apple" (company) and "apple" (fruit)
   - Distinguishing between uses of acronyms (e.g., "FB" as Facebook or feedback)
   - Resolving ambiguity in financial terminology

3. **NER Approaches**
   - **Rule-based**: Using patterns and dictionaries
   - **Statistical**: Machine learning models trained on labeled data
   - **Deep Learning**: Neural networks, particularly transformers
   - **Hybrid**: Combination of multiple approaches
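The rule-based approach in its simplest form is a dictionary (gazetteer) lookup, and even that shows why context matters: a case-sensitive, whole-word match tags "Apple" but not "apple". A minimal sketch; the `GAZETTEER` entries and the `gazetteer_tag` helper are hypothetical, and capitalization is only a crude context cue (it would still tag "Apple" at the start of a sentence about fruit):

```python
import re

# Hypothetical mini-gazetteer (surface form -> label); real lists are far larger.
GAZETTEER = {
    "Apple": "ORGANIZATION",
    "Federal Reserve": "ORGANIZATION",
    "Jerome Powell": "PERSON",
    "Wall Street": "LOCATION",
}


def gazetteer_tag(text, gazetteer=GAZETTEER):
    """Return (surface, label, start, end) for each case-sensitive, whole-word hit."""
    # Longer names first so they win over any shorter entry at the same position
    names = sorted(gazetteer, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, names)) + r")\b")
    return [(m.group(0), gazetteer[m.group(0)], m.start(), m.end())
            for m in pattern.finditer(text)]


text = ("Jerome Powell told Wall Street the Federal Reserve is watching Apple, "
        "not the apple harvest.")
for hit in gazetteer_tag(text):
    print(hit)
```

Listing longer names first matters because Python's regex alternation takes the first alternative that matches at a given position rather than the longest one.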

Let's start by implementing a basic NER system for financial text from scratch!

<a id='basic'></a>
## 2. Basic Implementation from Scratch

Let's build a basic NER system for financial text using Python and regular expressions. This approach will help us understand the fundamental concepts before moving to more sophisticated methods.

In [19]:
def basic_financial_ner(text):
    """
    A simple rule-based NER function for financial text using regular expressions.
    Returns {entity_type: [(entity_text, start_offset), ...]}.
    """
    patterns = {
        # Ticker symbols: 1-5 uppercase letters in parentheses (group 1 is the symbol)
        'TICKER': r'\(([A-Z]{1,5})\)',
        # Monetary values, with optional thousands separators and scale words
        'MONEY': r'\$\d+(?:,\d{3})*(?:\.\d+)?(?:\s?(?:billion|million|thousand|B|M|K)\b)?',
        # Percentages
        'PERCENT': r'\d+(?:\.\d+)?%',
        # Fiscal quarters (Q2 2023) and full dates (January 15, 2023)
        'DATE': r'(?:Q[1-4]\s?(?:20)?\d{2})|(?:January|February|March|April|May|June|July|August|September|October|November|December)\s\d{1,2},?\s\d{4}',
        # Potential company names (simple heuristic: capitalized words ending with Inc., Corp., Ltd., ...)
        'COMPANY': r'([A-Z][a-zA-Z\.\s]+(?:Inc\.|Corp\.|Ltd\.|LLC|Group|Company))',
        # Potential person names (simple heuristic: title or role followed by capitalized words)
        'PERSON': r'(?:Mr\.|Mrs\.|Ms\.|Dr\.|CEO|CFO|CTO)\s([A-Z][a-z]+(?:\s[A-Z][a-z]+){1,2})',
    }

    entities = {entity_type: [] for entity_type in patterns}
    for entity_type, pattern in patterns.items():
        # finditer gives every match its own position, so repeated values are
        # not all mapped to the first occurrence (as text.find would do)
        for match in re.finditer(pattern, text):
            # Use the captured group when the pattern has one, so the offset points
            # at the entity itself (e.g. "AAPL" rather than "(AAPL)", "Tim Cook"
            # rather than "CEO Tim Cook")
            group = 1 if match.re.groups else 0
            entities[entity_type].append((match.group(group), match.start(group)))

    return entities

import html  # used to escape text before inserting it into HTML

def highlight_entities(text, entities):
    """
    Highlight entities in the text using HTML spans.
    Overlaps are resolved left to right: the earlier (and, on ties, longer)
    entity is kept and any entity overlapping it is skipped.
    """
    # Collect (start, end, type) spans, skipping entities whose position was not found
    spans = [(pos, pos + len(entity_text), entity_type)
             for entity_type, entity_list in entities.items()
             for entity_text, pos in entity_list
             if pos != -1]
    # Sort by start position; on ties, longer spans first
    spans.sort(key=lambda s: (s[0], -(s[1] - s[0])))

    pieces, cursor = [], 0
    for start, end, entity_type in spans:
        if start < cursor:  # overlaps an entity that is already highlighted
            continue
        pieces.append(html.escape(text[cursor:start]))
        pieces.append(f'<span class="entity-{entity_type}">{html.escape(text[start:end])}</span>')
        cursor = end
    pieces.append(html.escape(text[cursor:]))

    return ''.join(pieces)

# Create a function that both analyzes and displays the results
def analyze_and_display(text_input):
    # interact passes the widget's current value (a string) and clears the
    # previous output itself; a Textarea widget is also accepted directly
    text = text_input.value if hasattr(text_input, 'value') else text_input
    
    # Extract entities
    entities = basic_financial_ner(text)
    
    # Count entities by type
    entity_counts = {entity_type: len(entity_list) for entity_type, entity_list in entities.items() 
                    if len(entity_list) > 0}
    
    # Display highlighted text
    display(HTML("<h3>Extracted Entities:</h3>"))
    display(HTML(highlight_entities(text, entities)))
    
    # Display entity counts
    display(HTML("<h3>Entity Counts:</h3>"))
    for entity_type, count in entity_counts.items():
        display(HTML(f"<b>{entity_type}</b>: {count}"))
    
    # Display extracted entities
    display(HTML("<h3>Entity Details:</h3>"))
    for entity_type, entity_list in entities.items():
        if entity_list:
            display(HTML(f"<b>{entity_type}</b>: {', '.join([entity[0] for entity in entity_list])}"))
    
    # Create a DataFrame for entities
    entity_data = []
    for entity_type, entity_list in entities.items():
        for entity_text, _ in entity_list:
            entity_data.append({
                "Entity": entity_text,
                "Type": entity_type
            })
    
    if entity_data:
        df = pd.DataFrame(entity_data)
        
        # Plot entity distribution
        plt.figure(figsize=(10, 5))
        ax = sns.countplot(x="Type", data=df)
        plt.title("Distribution of Entity Types")
        plt.xlabel("Entity Type")
        plt.ylabel("Count")
        plt.xticks(rotation=45)
        for p in ax.patches:
            # Bar heights are floats; show them as integer counts
            ax.annotate(f'{int(p.get_height())}',
                        (p.get_x() + p.get_width() / 2., p.get_height()),
                        ha='center', va='bottom')
        plt.tight_layout()
        plt.show()

# Re-apply the entity highlighting CSS (defined by apply_custom_styles in the setup cell)
apply_custom_styles()

# Create dropdown for examples
examples = widgets.Dropdown(
    options=list(sample_texts.keys()),
    value='earnings_report',
    description='Example:',
    layout=Layout(width='50%')
)

# Create text area for custom input
text_input = widgets.Textarea(
    value=sample_texts['earnings_report'],
    placeholder='Enter financial text...',
    description='Text:',
    layout=Layout(width='90%', height='100px')
)

# Update the text area when a different example is selected; interact
# re-runs analyze_and_display automatically whenever the value changes
def update_text(change):
    text_input.value = sample_texts[change['new']]

# Register callback for dropdown
examples.observe(update_text, names='value')

# Display the dropdown; interact displays text_input together with the results
display(examples)
interact(analyze_and_display, text_input=text_input);

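A note on recording entity positions: `str.find` returns only the first occurrence of a substring, so a value that appears twice in a text gets the same offset both times. `re.finditer` instead yields one match object per hit, each with its own `start()` and `end()`. A small sketch, assuming a simplified money pattern:

```python
import re

# Simplified money pattern, for illustration only
MONEY = r"\$\d+(?:\.\d+)?"
text = "Shares opened at $12.50, dipped to $11.75, and closed at $12.50."

# str.find: each repeated value maps back to its FIRST occurrence
find_offsets = [text.find(m) for m in re.findall(MONEY, text)]

# re.finditer: each match carries its own position
finditer_offsets = [m.start() for m in re.finditer(MONEY, text)]

print(find_offsets)
print(finditer_offsets)
```

With `str.find`, both "$12.50" entries point at the same offset, so a highlighter that relies on those offsets would wrap the first one twice and miss the second.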

<a id='interactive'></a>
## 3. Interactive Concept Explanation

Now, let's delve deeper into NER for financial text by creating an interactive tool that allows us to understand the entity extraction process and experiment with different patterns common in financial documents.
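When experimenting with patterns, a quick sanity check is to run each regex against its own example strings with `re.fullmatch`. A minimal sketch, assuming a simple MONEY pattern with no thousands-separator support (the `failing_examples` helper is hypothetical):

```python
import re

# Simple patterns to experiment with; edit them and re-run the check
patterns = {
    "MONEY": r"\$\d+(?:\.\d+)?(?:\s?(?:billion|million|thousand|B|M|K))?",
    "PERCENT": r"\d+(?:\.\d+)?%",
}
examples = {
    "MONEY": ["$1.2 million", "$97.3 billion", "$500K", "$10B", "$2,500"],
    "PERCENT": ["3.2%", "5.7%", "10%", "0.5%", "100%"],
}


def failing_examples(patterns, examples):
    """Map each entity type to the examples its regex does NOT fully match."""
    return {
        label: [ex for ex in examples[label] if not re.fullmatch(pattern, ex)]
        for label, pattern in patterns.items()
    }


print(failing_examples(patterns, examples))
```

Here the check flags `$2,500`, because `\d+` stops at the comma. Allowing an optional group such as `(?:,\d{3})*` after the leading digits is one way to cover it.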

In [20]:
# Financial NER patterns
financial_patterns = {
    'TICKER': {
        'pattern': r'\(([A-Z]{1,5})\)',
        'examples': ['AAPL', 'MSFT', 'TSLA', 'AMZN', 'GOOG'],
        'description': 'Stock ticker symbols usually appear as uppercase letters, often in parentheses after a company name. This pattern only matches the parenthesized form.'
    },
    'MONEY': {
        'pattern': r'\$\d+(?:,\d{3})*(?:\.\d+)?(?:\s?(?:billion|million|thousand|B|M|K)\b)?',
        'examples': ['$1.2 million', '$97.3 billion', '$500K', '$10B', '$2,500'],
        'description': 'Monetary values typically start with a currency symbol, followed by numbers and possibly scale indicators.'
    },
    'PERCENT': {
        'pattern': r'\d+(?:\.\d+)?%',
        'examples': ['3.2%', '5.7%', '10%', '0.5%', '100%'],
        'description': 'Percentage values consist of numbers followed by the percent symbol.'
    },
    'DATE': {
        'pattern': r'(?:Q[1-4]\s?(?:20)?\d{2})|(?:January|February|March|April|May|June|July|August|September|October|November|December)\s\d{1,2},?\s\d{4}',
        'examples': ['Q2 2023', 'January 15, 2023', 'Q4 2022', 'March 31, 2023', 'Q1 2024'],
        'description': 'Financial dates often refer to quarters (Q1-Q4) or specific dates relevant to financial events.'
    },
    'COMPANY': {
        'pattern': r'([A-Z][a-zA-Z\.\s]+(?:Inc\.|Corp\.|Ltd\.|LLC|Group|Company))',
        'examples': ['Apple Inc.', 'Microsoft Corp.', 'Tesla Inc.', 'Amazon.com Inc.', 'Alphabet Inc.'],
        'description': 'Company names typically start with capital letters and often end with corporate designations like Inc., Corp., or Ltd.'
    },
    'PERSON': {
        'pattern': r'(?:Mr\.|Mrs\.|Ms\.|Dr\.|CEO|CFO|CTO)\s([A-Z][a-z]+(?:\s[A-Z][a-z]+){1,2})',
        'examples': ['CEO Tim Cook', 'CFO Amy Hood', 'Dr. Lisa Su', 'Ms. Sarah Johnson', 'Mr. Jamie Dimon'],
        'description': 'Person names in financial contexts are often preceded by titles or roles (CEO, CFO, etc.) and consist of capitalized names.'
    }
}

# Re-apply the entity highlighting CSS (defined by apply_custom_styles in the setup cell)
apply_custom_styles()

# Function to highlight matches in text
def highlight_pattern_matches(text, pattern, entity_type):
    """Highlight matches for a specific pattern in the text.

    Returns the highlighted HTML and the list returned by re.findall
    (the captured group for patterns that have one)."""
    matches = re.findall(pattern, text)

    def wrap(match):
        whole = match.group(0)
        # TICKER is highlighted together with its parentheses; other patterns
        # with a capture group (e.g. PERSON) highlight only the captured part
        if match.re.groups and entity_type != 'TICKER':
            start = match.start(1) - match.start(0)
            end = match.end(1) - match.start(0)
            return (whole[:start]
                    + f'<span class="entity-{entity_type}">{match.group(1)}</span>'
                    + whole[end:])
        return f'<span class="entity-{entity_type}">{whole}</span>'

    # re.sub rewrites each match exactly once at its own position, unlike
    # str.replace, which also hits identical substrings elsewhere in the text
    highlighted_text = re.sub(pattern, wrap, text)

    return highlighted_text, matches

# Function to display pattern explanation
def display_pattern_info(entity_type, text):
    # interact redraws this output (and shows the widgets) on every change,
    # so there is no need to clear the output or redisplay the widgets here

    # Get pattern information
    pattern_info = financial_patterns[entity_type]
    
    # Display pattern information
    display(HTML(f"<h3>{entity_type} Entity Pattern</h3>"))
    display(HTML(f"<p><b>Regular Expression:</b> <code>{pattern_info['pattern']}</code></p>"))
    display(HTML(f"<p><b>Description:</b> {pattern_info['description']}</p>"))
    
    # Display examples
    display(HTML("<p><b>Examples:</b></p>"))
    examples_html = "<ul>"
    for example in pattern_info['examples']:
        examples_html += f'<li><span class="entity-{entity_type}">{example}</span></li>'
    examples_html += "</ul>"
    display(HTML(examples_html))
    
    # Apply pattern to text
    highlighted_text, matches = highlight_pattern_matches(
        text, pattern_info['pattern'], entity_type
    )
    
    # Display results
    display(HTML("<h4>Pattern Test Results:</h4>"))
    if len(matches) > 0:
        display(HTML(f"<p>Found {len(matches)} {entity_type} entities in the text:</p>"))
        display(HTML(f"<p>{highlighted_text}</p>"))
        
        # Display matches
        display(HTML("<p><b>Matches found:</b></p>"))
        matches_html = "<ul>"
        for match in matches:
            if entity_type == 'TICKER':
                match = f"({match})"
            matches_html += f'<li><span class="entity-{entity_type}">{match}</span></li>'
        matches_html += "</ul>"
        display(HTML(matches_html))
    else:
        display(HTML(f"<p>No {entity_type} entities found in the text.</p>"))

# Create interactive widgets for pattern explanation
entity_dropdown = widgets.Dropdown(
    options=list(financial_patterns.keys()),
    value='TICKER',
    description='Entity Type:',
    layout=Layout(width='50%')
)

# Separate text area for this demo. Reusing the name text_input would rebind
# the widget that the Section 2 example dropdown writes to.
pattern_text_input = widgets.Textarea(
    value=sample_texts['earnings_report'],
    placeholder='Enter financial text...',
    description='Text:',
    layout=Layout(width='90%', height='100px')
)

# Use interact to make it interactive; it displays both widgets above the output
interact(display_pattern_info, entity_type=entity_dropdown, text=pattern_text_input);

# Function to visualize the extraction process
def visualize_extraction_process(text):
    """
    Visualize the step-by-step process of entity extraction
    """
    # interact clears this output and displays text_area itself on each change
    
    # Stage 1: Original Text
    display(HTML("<h3>NER Extraction Process</h3>"))
    display(HTML("<h4>Stage 1: Original Text</h4>"))
    display(HTML(f"<p>{text}</p>"))
    
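    # Stage 2: Tokenization (simple whitespace split for illustration; real
    # tokenizers also separate punctuation such as commas and parentheses)
    display(HTML("<h4>Stage 2: Tokenization</h4>"))
    tokens = text.split()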
    tokenized_html = "<p>"
    for token in tokens:
        tokenized_html += f'<span style="border: 1px solid #ccc; margin: 2px; padding: 2px;">{token}</span> '
    tokenized_html += "</p>"
    display(HTML(tokenized_html))
    
    # Stage 3: Pattern Matching
    display(HTML("<h4>Stage 3: Pattern Matching</h4>"))
    
    # Process text with all patterns
    all_matches = {}
    for entity_type, pattern_info in financial_patterns.items():
        _, matches = highlight_pattern_matches(text, pattern_info['pattern'], entity_type)
        if matches:
            if entity_type == 'TICKER':
                matches = [f"({match})" for match in matches]
            all_matches[entity_type] = matches
    
    # Display matches for each pattern
    for entity_type, matches in all_matches.items():
        display(HTML(f"<p><b>{entity_type} matches:</b> {', '.join(matches)}</p>"))
    
    # Highlight all entities in the text
    highlighted_text = text
    for entity_type, pattern_info in financial_patterns.items():
        highlighted_text, _ = highlight_pattern_matches(
            highlighted_text, pattern_info['pattern'], entity_type
        )
    
    display(HTML("<p><b>Text with all entities highlighted:</b></p>"))
    display(HTML(f"<p>{highlighted_text}</p>"))
    
    # Stage 4: Entity Classification
    display(HTML("<h4>Stage 4: Entity Classification</h4>"))
    
    # Create a table of all entities
    if all_matches:
        entity_table = """
        <table style="border-collapse: collapse; width: 80%;">
          <tr>
            <th style="border: 1px solid #ddd; padding: 8px;">Entity</th>
            <th style="border: 1px solid #ddd; padding: 8px;">Type</th>
          </tr>
        """
        
        for entity_type, matches in all_matches.items():
            for match in matches:
                entity_table += f"""
                <tr>
                  <td style="border: 1px solid #ddd; padding: 8px;">{match}</td>
                  <td style="border: 1px solid #ddd; padding: 8px;"><span class="entity-{entity_type}">{entity_type}</span></td>
                </tr>
                """
        
        entity_table += "</table>"
        display(HTML(entity_table))
    else:
        display(HTML("<p>No entities found in the text.</p>"))
    
    # Stage 5: Visualization
    display(HTML("<h4>Stage 5: Entity Distribution</h4>"))
    
    # Create bar chart of entity counts
    if all_matches:
        entity_counts = {entity_type: len(matches) for entity_type, matches in all_matches.items()}
        
        plt.figure(figsize=(10, 5))
        plt.bar(list(entity_counts.keys()), list(entity_counts.values()))
        plt.title("Entity Type Distribution")
        plt.xlabel("Entity Type")
        plt.ylabel("Count")
        plt.xticks(rotation=45)
        
        # Add count labels on bars
        for i, (entity_type, count) in enumerate(entity_counts.items()):
            plt.text(i, count + 0.1, str(count), ha='center')
            
        plt.tight_layout()
        plt.show()

# Create text area for extraction process demo
text_area = widgets.Textarea(
    value=sample_texts['sec_filing'],
    placeholder='Enter text to analyze process...',
    description='Text:',
    layout=Layout(width='90%', height='100px')
)

# Use interact to make it interactive
interact(visualize_extraction_process, text=text_area);

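When every pattern runs over the same text, as in the extraction-process demo, one character range can be claimed by more than one entity type (for example, the digits inside a MONEY match also look like a plain number). A common way to turn candidate matches into a single labeling is greedy leftmost-longest selection. A minimal sketch, assuming spans are `(start, end, label)` tuples; the `resolve_overlaps` helper and the CARDINAL pattern are illustrative additions, not part of `financial_patterns`:

```python
import re


def resolve_overlaps(spans):
    """Keep a non-overlapping subset of (start, end, label) spans.

    Spans are visited left to right; when two start at the same position the
    longer one is tried first. A span is kept only if it does not overlap the
    last kept span (greedy leftmost-longest)."""
    kept = []
    for start, end, label in sorted(spans, key=lambda s: (s[0], -(s[1] - s[0]))):
        if not kept or start >= kept[-1][1]:
            kept.append((start, end, label))
    return kept


# Illustrative patterns; CARDINAL deliberately overlaps the other two
patterns = {
    "MONEY": r"\$\d+(?:\.\d+)?(?:\s?(?:billion|million))?",
    "PERCENT": r"\d+(?:\.\d+)?%",
    "CARDINAL": r"\d+(?:\.\d+)?",
}
text = "Revenue was $97.3 billion, up 9%."

# Collect every candidate match from every pattern, with its position
candidates = [(m.start(), m.end(), label)
              for label, pattern in patterns.items()
              for m in re.finditer(pattern, text)]

for start, end, label in resolve_overlaps(candidates):
    print(label, text[start:end])
```

Greedy selection is simple and predictable, but it is not guaranteed to keep the best overall set of spans; one possible refinement is to break ties with a per-type priority.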