<a href="https://colab.research.google.com/github/Orefle2003/AnswerTime-MetricNLP/blob/model-experiments-1/distilbert_statistical_sentiment_analysis_v1.4.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
!pip install spacy transformers torch
!pip install datasets bertopic
!python -m spacy download en_core_web_sm




In [None]:
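# Caution: this pin conflicts with blis 1.0.1 and thinc 8.3.2 (spaCy dependencies that require numpy>=2.0.0), as pip's resolver error reports
!pip install numpy==1.24.3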


ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
blis 1.0.1 requires numpy<3.0.0,>=2.0.0, but you have numpy 1.24.3 which is incompatible.
thinc 8.3.2 requires numpy<2.1.0,>=2.0.0; python_version >= "3.9", but you have numpy 1.24.3 which is incompatible.



# Model 1

A baseline that combines DistilBERT fine-tuned on SST-2 (`distilbert-base-uncased-finetuned-sst-2-english`) with spaCy NER (Named Entity Recognition) to compute a rudimentary sentiment distribution per entity.

In [None]:
from transformers import pipeline
import spacy
import pandas as pd
from datasets import load_dataset
from collections import defaultdict

# Load spaCy model for Named Entity Recognition
nlp = spacy.load('en_core_web_sm')

# Load a fine-tuned sentiment analysis model for better predictions
sentiment_model = pipeline("sentiment-analysis", model="distilbert-base-uncased-finetuned-sst-2-english", truncation=True)

# Load a slightly larger sample of the Yelp reviews dataset for testing
dataset = load_dataset("yelp_polarity", split="test[:200]")  # Increased dataset size to 200 reviews
reviews = pd.DataFrame(dataset)
documents = reviews['text'].tolist()

# Rough pre-trim: keep only the first 512 whitespace-separated words of each document.
# Words are not model tokens, so the pipeline's truncation=True is what
# actually enforces the model's 512-token limit.
def truncate_document(doc, max_length=512):
    return ' '.join(doc.split()[:max_length])

# Apply the word-level trim to all documents
truncated_documents = [truncate_document(doc) for doc in documents]

# Function to extract relevant entities (e.g., restaurant names)
def extract_relevant_entities(documents):
    entities = set()
    for doc in documents:
        spacy_doc = nlp(doc)
        for ent in spacy_doc.ents:
            # Only include entities such as organizations, places, etc.
            if ent.label_ in ['ORG', 'GPE', 'LOC']:
                entities.add(ent.text.lower())
    return entities

# Extract entities from the documents
relevant_entities = extract_relevant_entities(truncated_documents)
print("\nExtracted Entities:", relevant_entities)

# Function to analyze sentiment for each entity
def analyze_entity_sentiment(documents, entities):
    entity_sentiment_counts = defaultdict(lambda: {'positive': 0, 'negative': 0, 'neutral': 0, 'total': 0})

    for doc in documents:
        doc_lower = doc.lower()
        label = None  # Scored lazily so each document hits the model at most once
        for entity in entities:
            if entity in doc_lower:  # Substring check for an entity mention
                if label is None:
                    # doc[:512] keeps the first 512 characters (not tokens)
                    label = sentiment_model(doc[:512])[0]['label']
                entity_sentiment_counts[entity]['total'] += 1
                if label == 'POSITIVE':
                    entity_sentiment_counts[entity]['positive'] += 1
                elif label == 'NEGATIVE':
                    entity_sentiment_counts[entity]['negative'] += 1
                else:
                    # Not reached with this SST-2 checkpoint, which only emits POSITIVE or NEGATIVE
                    entity_sentiment_counts[entity]['neutral'] += 1

    # Calculate and print the percentage of positive mentions for each entity
    for entity, counts in entity_sentiment_counts.items():
        if counts['total'] > 0:
            positive_pct = (counts['positive'] / counts['total']) * 100
            print(f"{positive_pct:.2f}% of reviewers mentioned '{entity}' positively out of {counts['total']} mentions.")
        else:
            print(f"No mentions of '{entity}' found.")

# Run the sentiment analysis function
analyze_entity_sentiment(truncated_documents, relevant_entities)


Hardware accelerator e.g. GPU is available in the environment, but no `device` argument is passed to the `Pipeline` object. Model will be on CPU.



Extracted Entities: {'pennsylvania', "max's allegheny tavern", 'the port authority', 'california', 'seattle', 'soho', 'totally', 'funds availability policy', 'ct', 'hmmmmm', "men's", 'the original fish market', 'the pittsburgh international airport', 'shopping.\\n\\nwe', 'the cultural district', "the people's special biryani", 'new haven', 'waterfront', 'penzeys', 'shadyside', 'crab & tomato', 'dairy queen', 'reyna foods', 'san diego', 'oakland', 'betos', 'izzazu', 'imho', 'the red alert', 'blues', 'mcn', 'ktm', 'nyc', 'rock bottom brewery', 'tomaso', 'front desk staff', 'port authority', 'piggy', 'un-clean', 'carnegie mellon university', 'mxc', '2011.\\n\\nthe university of pittsburgh', 'd&b', 'atm', 'awhile ago &', 'potato gnocchi', 'byob', 'olive garden', 'brighton heights', 'pgh', 'zero', 'dq', 'yelp', 'squirrel hill', 'the fish sandwich &', 'oclv', 'primanti', 'lol', 'us', 'korean bbq sauce &', 'steeler', 'koh samui thailand', 'immediately', 'hint', 'caramel', 'la prima', '\\""sp
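
The entity set above contains short strings such as 'us', 'ct', and 'lol', and the cell matches entities with a plain substring test (`entity in doc.lower()`), so 'us' also matches inside a word like "because". A minimal sketch of whole-word matching follows; the helper name `mentions_entity` is hypothetical and not part of the notebook.

```python
import re

def mentions_entity(entity, text):
    """Return True if `entity` occurs in `text` as a whole word or phrase."""
    # (?<!\w) and (?!\w) require a non-word character (or the string edge)
    # on both sides, so 'us' no longer matches inside 'because'.
    pattern = r"(?<!\w)" + re.escape(entity.lower()) + r"(?!\w)"
    return re.search(pattern, text.lower()) is not None

review = "We stopped at Dairy Queen because the line was short."
print("us" in review.lower())                  # True: substring test hits 'because'
print(mentions_entity("us", review))           # False: no standalone 'us'
print(mentions_entity("dairy queen", review))  # True
```

Swapping this check in for the substring test would likely lower the mention counts for very short entities, so the percentages printed by the cell would change.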

# Model 2

Entity-Aware Sentiment Analyzer (EASA)

This model extracts relevant entities, keeps only those that pass a frequency filter, and analyzes the sentiment of the reviews that mention them.

In [None]:
from transformers import pipeline
import spacy
import pandas as pd
from datasets import load_dataset
from collections import defaultdict

# Load spaCy model for Named Entity Recognition
nlp = spacy.load('en_core_web_sm')

# Load a fine-tuned sentiment analysis model for better predictions
sentiment_model = pipeline("sentiment-analysis", model="distilbert-base-uncased-finetuned-sst-2-english", truncation=True)

# Load a larger sample of the Yelp reviews dataset for testing
dataset = load_dataset("yelp_polarity", split="test[:1000]")  # Dataset size increased to 1,000 reviews
reviews = pd.DataFrame(dataset)
documents = reviews['text'].tolist()

# Rough pre-trim: keep only the first 512 whitespace-separated words of each document.
# Words are not model tokens, so the pipeline's truncation=True is what
# actually enforces the model's 512-token limit.
def truncate_document(doc, max_length=512):
    return ' '.join(doc.split()[:max_length])

# Apply the word-level trim to all documents
truncated_documents = [truncate_document(doc) for doc in documents]

# Function to extract relevant entities with refined filters
def extract_relevant_entities(documents):
    entities = defaultdict(int)  # Track entity frequencies
    for doc in documents:
        spacy_doc = nlp(doc)
        for ent in spacy_doc.ents:
            # Include only organizations (ORG), geopolitical entities (GPE), and locations (LOC)
            if ent.label_ in ['ORG', 'GPE', 'LOC']:
                entity_text = ent.text.strip().lower()
                # Filter out likely noise: keep multi-word entities only
                if len(entity_text.split()) > 1:
                    entities[entity_text] += 1

    # Apply additional filtering: keep only entities mentioned more than once
    filtered_entities = {entity for entity, count in entities.items() if count > 1}
    return filtered_entities

# Extract entities from the documents
relevant_entities = extract_relevant_entities(truncated_documents)
print("\nFiltered Entities:", relevant_entities)

# Function to analyze sentiment for each entity
def analyze_entity_sentiment(documents, entities):
    entity_sentiment_counts = defaultdict(lambda: {'positive': 0, 'negative': 0, 'neutral': 0, 'total': 0})

    for doc in documents:
        doc_lower = doc.lower()
        label = None  # Scored lazily so each document hits the model at most once
        for entity in entities:
            if entity in doc_lower:  # Substring check for an entity mention
                if label is None:
                    # doc[:512] keeps the first 512 characters (not tokens)
                    label = sentiment_model(doc[:512])[0]['label']
                entity_sentiment_counts[entity]['total'] += 1
                if label == 'POSITIVE':
                    entity_sentiment_counts[entity]['positive'] += 1
                elif label == 'NEGATIVE':
                    entity_sentiment_counts[entity]['negative'] += 1
                else:
                    # Not reached with this SST-2 checkpoint, which only emits POSITIVE or NEGATIVE
                    entity_sentiment_counts[entity]['neutral'] += 1

    # Calculate and print the percentage of positive mentions for each entity
    for entity, counts in entity_sentiment_counts.items():
        if counts['total'] > 0:
            positive_pct = (counts['positive'] / counts['total']) * 100
            print(f"{positive_pct:.2f}% of reviewers mentioned '{entity}' positively out of {counts['total']} mentions.")
        else:
            print(f"No mentions of '{entity}' found.")

# Run the sentiment analysis function
analyze_entity_sentiment(truncated_documents, relevant_entities)





Filtered Entities: {'the home depot', 'the capital brewery', 'zimbrick vw', 'la tolteca', 'el nopalito', 'home depot', 'the phoenix zoo', 'time warner', 'port authority', 'caring hands', 'southern california', 'athenian express', 'rock bottom', 'the queen city', 'super 8', "tully's ii", 'university city', 'capital grille', 'cooperstown sports grill', 'time warner cable', 'us airways center', 'cracker barrel', 'taco bell', "common ground's", 'the us airways center', 'us airways', 'front row', 'whole foods', 'gabriel brothers', 'south hills', 'penn mac', 'hyde park', 'china palace', 'north carolina', 'thai house', 'alice springs chicken', 'big burrito', 'wiener schnitzel', 'san diego', '4th ward', 'common ground', 'the phoenix art museum', 'giant eagle', 'new york', 'the pear and gorgonzola'}
100.00% of reviewers mentioned 'gabriel brothers' positively out of 1 mentions.
66.67% of reviewers mentioned 'rock bottom' positively out of 3 mentions.
66.67% of reviewers mentioned 'whole foods'
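
Percentages such as "100.00% ... out of 1 mentions" rest on very few reviews. One common way to show that uncertainty is a Wilson score interval around the positive share. The sketch below is an addition, not part of the notebook: the function name `wilson_interval` and the choice of z = 1.96 (roughly a 95% interval) are assumptions.

```python
import math

def wilson_interval(positive, total, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    if total == 0:
        return (0.0, 1.0)  # No data: the share could be anything
    p = positive / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return (max(0.0, centre - half_width), min(1.0, centre + half_width))

# Counts taken from the output above: 1 of 1 and 2 of 3 positive mentions.
for name, pos, total in [("gabriel brothers", 1, 1), ("rock bottom", 2, 3)]:
    low, high = wilson_interval(pos, total)
    print(f"{name}: {pos}/{total} positive, interval {low:.0%} to {high:.0%}")
```

For a single positive mention the interval runs from roughly 21% to 100%, which makes it clear that "100.00%" from one review says little about the entity.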

# Model 3

Topic Modeling with Sentiment Mapping

*   Method: Use a topic modeling library like BERTopic to extract topics discussed in the reviews and map sentiment scores to each topic.
*   Goal: Provide insights such as:
    *   "Topic A (service) is mentioned positively in 70% of reviews."
    *   "Topic B (food quality) has a 40% negative sentiment."

In [None]:
from transformers import pipeline
from bertopic import BERTopic
from sklearn.feature_extraction.text import CountVectorizer
from datasets import load_dataset
import pandas as pd
from collections import defaultdict
from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS

# Load fine-tuned sentiment analysis model
sentiment_model = pipeline("sentiment-analysis", model="distilbert-base-uncased-finetuned-sst-2-english", truncation=True)

# Load Yelp reviews dataset
dataset = load_dataset("yelp_polarity", split="test[:200]")  # Use a subset of the dataset for testing
reviews = pd.DataFrame(dataset)
documents = reviews['text'].tolist()

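# Rough pre-trim to the first 512 whitespace-separated words per document.
# Words are not model tokens; the pipeline's truncation=True enforces the 512-token limit.
def truncate_document(doc, max_length=512):
    return ' '.join(doc.split()[:max_length])

truncated_documents = [truncate_document(doc) for doc in documents]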

# Generate a dynamic stop-word list based on term frequency
from collections import Counter
all_words = ' '.join(truncated_documents).lower().split()
most_common_words = [word for word, _ in Counter(all_words).most_common(50)]  # Top 50 most frequent words
dynamic_stop_words = list(set(most_common_words + list(ENGLISH_STOP_WORDS)))  # Convert to list

# Generate topics using BERTopic
vectorizer_model = CountVectorizer(stop_words=dynamic_stop_words, ngram_range=(1, 3))
topic_model = BERTopic(vectorizer_model=vectorizer_model, nr_topics="auto")  # Automatically reduce noise in topics
topics, probs = topic_model.fit_transform(truncated_documents)

# Group documents by topic for sentiment analysis (each "sentence" below is a whole truncated review)
sentences_per_topic = defaultdict(list)
for doc, topic in zip(truncated_documents, topics):
    if topic != -1:  # Exclude outliers
        sentences_per_topic[topic].append(doc)

# Perform sentiment analysis and aggregate scores by topic
topic_sentiment = defaultdict(lambda: {"positive": 0, "negative": 0, "total": 0})

for topic, sentences in sentences_per_topic.items():
    for sentence in sentences:
        sentiment = sentiment_model(sentence[:512])  # Keeps the first 512 characters (not tokens)
        label = sentiment[0]['label']
        topic_sentiment[topic]["total"] += 1
        if label == "POSITIVE":
            topic_sentiment[topic]["positive"] += 1
        elif label == "NEGATIVE":
            topic_sentiment[topic]["negative"] += 1

# Print insights for each topic
for topic, sentiment in topic_sentiment.items():
    total = sentiment["total"]
    if total > 0:
        positive_pct = (sentiment["positive"] / total) * 100
        negative_pct = (sentiment["negative"] / total) * 100
        print(f"Topic {topic}:")
        print(f"  - Positive Sentiment: {positive_pct:.2f}%")
        print(f"  - Negative Sentiment: {negative_pct:.2f}%")
        print(f"  - Example Keywords: {topic_model.get_topic(topic)}")
        print()




Topic 0:
  - Positive Sentiment: 55.17%
  - Negative Sentiment: 44.83%
  - Example Keywords: [('time', 0.011660498826360156), ('great', 0.010151014846837175), ('restaurant', 0.009931514282906272), ('service', 0.009089217054508936), ('got', 0.00894784924488207), ('pittsburgh', 0.008762045898978698), ('didnt', 0.008695396582672058), ('best', 0.008246705358259936), ('really', 0.007844646413896067), ('dont', 0.007775306383641421)]

Topic 1:
  - Positive Sentiment: 58.33%
  - Negative Sentiment: 41.67%
  - Example Keywords: [('cut', 0.04372502113793714), ('hair', 0.029751338797600876), ('said', 0.02462031972409449), ('wasnt', 0.022532779472775342), ('went', 0.018493102473054294), ('helmet', 0.01788457257315994), ('jacket', 0.01788457257315994), ('dont', 0.017132090067817372), ('ended', 0.017108257845638093), ('oh', 0.015509589846997074)]
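
`topic_model.get_topic(topic)` returns (word, weight) pairs, which print as the long tuple lists shown above. A small sketch of a formatter that turns those pairs into a compact label; the helper name `format_keywords` is hypothetical, and the input is copied from the Topic 1 output.

```python
def format_keywords(word_weights, top_n=5):
    """Join the top_n highest-weighted words into a short, readable label."""
    ranked = sorted(word_weights, key=lambda pair: pair[1], reverse=True)
    return ", ".join(word for word, _ in ranked[:top_n])

# (word, weight) pairs copied from the Topic 1 output above, first five only.
topic_1 = [
    ("cut", 0.04372502113793714),
    ("hair", 0.029751338797600876),
    ("said", 0.02462031972409449),
    ("wasnt", 0.022532779472775342),
    ("went", 0.018493102473054294),
]
print(f"Topic 1: {format_keywords(topic_1, top_n=3)}")  # Topic 1: cut, hair, said
```

In the reporting loop, the "Example Keywords" line could print `format_keywords(topic_model.get_topic(topic))` instead of the raw list.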



# Model 4
Comparison and Benchmark Analysis


Perform comparative analysis across subsets of text to extract metrics that highlight differences in sentiment, frequency, or themes. This can provide insights into how entities, time periods, or categories compare against each other.

In [None]:
from transformers import pipeline
from datasets import load_dataset
import spacy
from collections import defaultdict, Counter
import pandas as pd

# Load fine-tuned sentiment analysis model
sentiment_model = pipeline("sentiment-analysis", model="distilbert-base-uncased-finetuned-sst-2-english", truncation=True)

# Load spaCy model for Named Entity Recognition (NER)
nlp = spacy.load("en_core_web_sm")

# Load Yelp reviews dataset
dataset = load_dataset("yelp_polarity", split="test[:200]")  # Subset for testing
reviews = pd.DataFrame(dataset)
documents = reviews['text'].tolist()

# Rough pre-trim to the first 512 whitespace-separated words per document.
# Words are not model tokens; the pipeline's truncation=True enforces the 512-token limit.
def truncate_document(doc, max_length=512):
    return ' '.join(doc.split()[:max_length])

truncated_documents = [truncate_document(doc) for doc in documents]

# Function to extract all entities and their frequency
def extract_entities(documents):
    entity_counts = Counter()
    entity_context = defaultdict(list)  # Store the documents in which each entity occurs
    for doc in documents:
        spacy_doc = nlp(doc)
        for ent in spacy_doc.ents:
            if ent.label_ in ["ORG", "GPE"]:  # Focus on organizations and geopolitical entities
                key = ent.text.lower()
                entity_counts[key] += 1
                entity_context[key].append(doc)  # One entry per mention, so a repeated mention adds the document again
    return entity_counts, entity_context

# Extract entities and their contexts
entity_counts, entity_context = extract_entities(truncated_documents)

# Filter entities dynamically based on frequency (here, more than 2 mentions),
# keeping the documents in which each surviving entity occurs
relevant_entities = {entity: entity_context[entity] for entity, count in entity_counts.items() if count > 2}

# Analyze sentiment for each relevant entity
group_sentiment = defaultdict(lambda: {"positive": 0, "negative": 0, "neutral": 0, "total": 0})

for entity, docs in relevant_entities.items():
    for doc in docs:
        sentiment = sentiment_model(doc[:512])  # Keeps the first 512 characters (not tokens)
        label = sentiment[0]['label']
        group_sentiment[entity]["total"] += 1
        if label == "POSITIVE":
            group_sentiment[entity]["positive"] += 1
        elif label == "NEGATIVE":
            group_sentiment[entity]["negative"] += 1
        else:
            group_sentiment[entity]["neutral"] += 1

# Print comparative insights
for entity, sentiment in group_sentiment.items():
    total = sentiment["total"]
    if total > 0:
        positive_pct = (sentiment["positive"] / total) * 100
        negative_pct = (sentiment["negative"] / total) * 100
        print(f"Entity: {entity.capitalize()}")
        print(f"  - Positive Sentiment: {positive_pct:.2f}%")
        print(f"  - Negative Sentiment: {negative_pct:.2f}%")
        print(f"  - Total Mentions: {total}")
        print()




Entity: Waterfront
  - Positive Sentiment: 66.67%
  - Negative Sentiment: 33.33%
  - Total Mentions: 3

Entity: Pittsburgh
  - Positive Sentiment: 51.16%
  - Negative Sentiment: 48.84%
  - Total Mentions: 43

Entity: D&b
  - Positive Sentiment: 0.00%
  - Negative Sentiment: 100.00%
  - Total Mentions: 4

Entity: Bbb
  - Positive Sentiment: 0.00%
  - Negative Sentiment: 100.00%
  - Total Mentions: 4

Entity: Oakland
  - Positive Sentiment: 33.33%
  - Negative Sentiment: 66.67%
  - Total Mentions: 3

Entity: Whole foods
  - Positive Sentiment: 100.00%
  - Negative Sentiment: 0.00%
  - Total Mentions: 6

Entity: Casbah
  - Positive Sentiment: 100.00%
  - Negative Sentiment: 0.00%
  - Total Mentions: 7

Entity: Atm
  - Positive Sentiment: 100.00%
  - Negative Sentiment: 0.00%
  - Total Mentions: 3

Entity: Wiener schnitzel
  - Positive Sentiment: 0.00%
  - Negative Sentiment: 100.00%
  - Total Mentions: 3

Entity: Really
  - Positive Sentiment: 0.00%
  - Negative Sentiment: 100.00%
  - Tot
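
To make the per-entity figures above comparable, each entity can be set against a shared benchmark. The sketch below uses the pooled positive share of the listed entities as that benchmark, which is an assumption rather than something the notebook defines. The function name `compare_to_benchmark` is hypothetical, and the counts are reconstructed from the percentages and totals printed above.

```python
# (positive, total) mention counts reconstructed from the output above.
entity_counts = {
    "pittsburgh": (22, 43),
    "casbah": (7, 7),
    "whole foods": (6, 6),
    "d&b": (0, 4),
    "waterfront": (2, 3),
}

def compare_to_benchmark(counts):
    """Return the pooled positive % and (entity, positive %, difference) rows, best first."""
    pooled_pos = sum(pos for pos, _ in counts.values())
    pooled_total = sum(total for _, total in counts.values())
    benchmark = 100 * pooled_pos / pooled_total if pooled_total else 0.0
    rows = [
        (entity, 100 * pos / total, 100 * pos / total - benchmark)
        for entity, (pos, total) in counts.items()
        if total > 0
    ]
    return benchmark, sorted(rows, key=lambda row: row[2], reverse=True)

benchmark, rows = compare_to_benchmark(entity_counts)
print(f"Benchmark (pooled positive share): {benchmark:.2f}%")
for entity, pct, diff in rows:
    print(f"{entity:<12} {pct:6.2f}%  {diff:+7.2f} pts vs benchmark")
```

Because a review that mentions several entities is counted once per entity, the pooled share is a benchmark over mentions, not over distinct reviews.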

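# Model 5

Sentiment Anomaly Detection

Score each entity by its average sentiment (+1 per positive mention, -1 per negative mention), then flag entities whose z-score against the mean across all entities exceeds a sensitivity threshold.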

In [None]:
from transformers import pipeline
from datasets import load_dataset
import spacy
from collections import defaultdict, Counter
import numpy as np
import pandas as pd


# Load fine-tuned sentiment analysis model
sentiment_model = pipeline("sentiment-analysis", model="distilbert-base-uncased-finetuned-sst-2-english", truncation=True)

# Load spaCy model for Named Entity Recognition (NER)
nlp = spacy.load("en_core_web_sm")

# Load Yelp reviews dataset
dataset = load_dataset("yelp_polarity", split="test[:200]")  # Subset for testing
reviews = pd.DataFrame(dataset)
documents = reviews['text'].tolist()

# Rough pre-trim to the first 512 whitespace-separated words per document.
# Words are not model tokens; the pipeline's truncation=True enforces the 512-token limit.
def truncate_document(doc, max_length=512):
    return ' '.join(doc.split()[:max_length])

truncated_documents = [truncate_document(doc) for doc in documents]

# Function to extract entities from the dataset
def extract_entities(documents):
    entity_context = defaultdict(list)  # Store the documents in which each entity occurs
    for doc in documents:
        spacy_doc = nlp(doc)
        for ent in spacy_doc.ents:
            if ent.label_ in ["ORG", "GPE"]:  # Focus on organizations and geopolitical entities
                entity_context[ent.text.lower()].append(doc)  # One entry per mention
    return entity_context

# Extract entities and their contexts
entity_context = extract_entities(truncated_documents)

# Perform sentiment analysis for each entity
entity_sentiment = defaultdict(lambda: {"positive": 0, "negative": 0, "neutral": 0, "total": 0, "score": 0})

for entity, docs in entity_context.items():
    for doc in docs:
        sentiment = sentiment_model(doc[:512])  # Keeps the first 512 characters (not tokens)
        label = sentiment[0]['label']
        entity_sentiment[entity]["total"] += 1
        if label == "POSITIVE":
            entity_sentiment[entity]["positive"] += 1
            entity_sentiment[entity]["score"] += 1  # Assign +1 for positive sentiment
        elif label == "NEGATIVE":
            entity_sentiment[entity]["negative"] += 1
            entity_sentiment[entity]["score"] -= 1  # Assign -1 for negative sentiment

# Calculate overall sentiment metrics for anomaly detection.
# Each entity's average score lies in [-1, 1]; every entity has at least one mention, so total > 0.
scores = [data["score"] / data["total"] for data in entity_sentiment.values()]
mean_score = np.mean(scores)
std_dev_score = np.std(scores)

# z-score threshold: lower values flag more entities. At 0.05, almost every
# entity with more than one mention is reported.
sensitivity_threshold = 0.05

# Detect anomalies based on z-scores
anomalies = {}
for entity, data in entity_sentiment.items():
    if data["total"] > 1 and std_dev_score > 0:  # Need repeated mentions and a non-zero spread
        entity_avg_score = data["score"] / data["total"]
        z_score = (entity_avg_score - mean_score) / std_dev_score
        if abs(z_score) > sensitivity_threshold:
            anomalies[entity] = {
                "z_score": z_score,
                "positive": (data["positive"] / data["total"]) * 100,
                "negative": (data["negative"] / data["total"]) * 100,
                "total_mentions": data["total"]
            }

# Print anomalies
print("Detected Anomalies:")
for entity, data in anomalies.items():
    print(f"Entity: {entity.capitalize()}")
    print(f"  - Z-Score: {data['z_score']:.2f}")
    print(f"  - Positive Sentiment: {data['positive']:.2f}%")
    print(f"  - Negative Sentiment: {data['negative']:.2f}%")
    print(f"  - Total Mentions: {data['total_mentions']}")
    print()




Detected Anomalies:
Entity: Pittsburgh
  - Z-Score: -0.08
  - Positive Sentiment: 51.16%
  - Negative Sentiment: 48.84%
  - Total Mentions: 43

Entity: D&b
  - Z-Score: -1.14
  - Positive Sentiment: 0.00%
  - Negative Sentiment: 100.00%
  - Total Mentions: 4

Entity: Unos
  - Z-Score: 0.93
  - Positive Sentiment: 100.00%
  - Negative Sentiment: 0.00%
  - Total Mentions: 2

Entity: Hint
  - Z-Score: -1.14
  - Positive Sentiment: 0.00%
  - Negative Sentiment: 100.00%
  - Total Mentions: 2

Entity: Bbb
  - Z-Score: -1.14
  - Positive Sentiment: 0.00%
  - Negative Sentiment: 100.00%
  - Total Mentions: 5

Entity: Arl
  - Z-Score: -1.14
  - Positive Sentiment: 0.00%
  - Negative Sentiment: 100.00%
  - Total Mentions: 2

Entity: Whole foods
  - Z-Score: 0.93
  - Positive Sentiment: 100.00%
  - Negative Sentiment: 0.00%
  - Total Mentions: 5

Entity: Giant eagle
  - Z-Score: -0.10
  - Positive Sentiment: 50.00%
  - Negative Sentiment: 50.00%
  - Total Mentions: 2

Entity: Pennsylvania
  - Z-S
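
The z-score step can be isolated into a small function that is easy to test without loading any model. The sketch below is a variant, not the cell's exact method: it computes the mean and standard deviation only over entities with at least `min_mentions` mentions (the cell uses all entities), so its z-scores will differ from those printed above. The function name `sentiment_z_scores` is hypothetical, and the counts are reconstructed from the output above.

```python
from statistics import mean, pstdev

def sentiment_z_scores(counts, min_mentions=2):
    """Map entity -> z-score of its average sentiment score in [-1, 1].

    counts maps entity -> (positive, negative) mention counts.
    """
    averages = {
        entity: (pos - neg) / (pos + neg)
        for entity, (pos, neg) in counts.items()
        if pos + neg >= min_mentions
    }
    if len(averages) < 2:
        return {}  # Not enough entities to define a spread
    mu = mean(averages.values())
    sigma = pstdev(averages.values())  # Population std, like np.std's default
    if sigma == 0:
        return {entity: 0.0 for entity in averages}
    return {entity: (avg - mu) / sigma for entity, avg in averages.items()}

# (positive, negative) counts reconstructed from the output above.
counts = {
    "pittsburgh": (22, 21),
    "d&b": (0, 4),
    "unos": (2, 0),
    "whole foods": (5, 0),
    "giant eagle": (1, 1),
    "bbb": (0, 5),
}
for entity, z in sorted(sentiment_z_scores(counts).items(), key=lambda item: item[1]):
    print(f"{entity:<12} z = {z:+.2f}")
```

With so few mentions per entity, most averages sit at exactly -1 or +1, so the z-scores cluster at two values; a higher `min_mentions` would give more stable results.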

# Features to Implement

*   Advanced Topic Modeling
*   Outlier Detection
*   Reviewer Demographics Analysis
*   Comparative Analysis
*   Emotion Detection
*   Word Cloud Visualization
*   Contextual Sentiment
*   Entity Mention Heatmaps
*   Temporal Trends

*   Statistical Co-occurrence Analysis
*   Frequency Analysis for Common Themes
*   Distribution of Ratings
*   Key Phrase and Adjective Extraction
*   Net Promoter Insights
*   Percentages of Neutral Sentiments
*   Correlation Between Sentiment and Length
*   Keyword Trends
*   Predictive Insights
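
The item "Percentages of Neutral Sentiments" needs a neutral class, but the SST-2 checkpoint used throughout returns only POSITIVE or NEGATIVE, so the `neutral` counters in the cells above never increment. One possible approach, an assumption and not something the notebook implements, is to treat low-confidence predictions as neutral using the `score` field of each pipeline result. The helper name `to_three_way` and the 0.75 cutoff are illustrative choices.

```python
def to_three_way(prediction, neutral_below=0.75):
    """Map a pipeline result like {'label': 'POSITIVE', 'score': 0.98} to three classes."""
    if prediction["score"] < neutral_below:
        return "neutral"  # Model is not confident either way
    return prediction["label"].lower()

# Hand-written stand-ins for sentiment_model(...) results, not real model output.
predictions = [
    {"label": "POSITIVE", "score": 0.99},
    {"label": "NEGATIVE", "score": 0.97},
    {"label": "POSITIVE", "score": 0.58},
    {"label": "NEGATIVE", "score": 0.62},
]
labels = [to_three_way(p) for p in predictions]
neutral_pct = 100 * labels.count("neutral") / len(labels)
print(labels)                           # ['positive', 'negative', 'neutral', 'neutral']
print(f"Neutral: {neutral_pct:.2f}%")   # Neutral: 50.00%
```

The cutoff would need tuning against labeled data; a model trained with an explicit neutral class is the more direct alternative.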
