<a href="https://colab.research.google.com/github/Monsterp99/LLM-prototype/blob/main/Intent_Analyzer_w_AI_Capability.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**MUVERA Semantic Analyzer: Quick README**

A notebook-ready pipeline that scores page content against user intent and business messaging, then exports results to Excel.

**What it does**

- Loads page text from Excel.

- Builds TF-IDF embeddings → reduces with SVD.

- Optional fast similarity via FAISS; otherwise cosine similarity.

- Detects primary intent (dental-implant domain signals).

- Scores business alignment (value props, personas, differentiators, pain-point solutions).

- Produces a single MUVERA score (0–1) and recommendations.

- Exports a ready-to-share XLSX.
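
The FAISS and cosine paths listed above can give the same similarities because an inner product over L2-normalized vectors equals cosine similarity. The sketch below checks that identity in plain Python on made-up vectors; `l2_normalize`, `dot`, and `cosine` are illustrative helpers, not functions from the notebook.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def dot(a, b):
    """Plain inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    """Cosine similarity computed directly from the raw vectors."""
    denom = math.sqrt(dot(a, a)) * math.sqrt(dot(b, b))
    return dot(a, b) / denom if denom > 0 else 0.0

# Toy vectors (hypothetical, for illustration only)
page = [3.0, 4.0, 0.0]
query = [1.0, 0.0, 1.0]

# Inner product after normalization, as an inner-product index would compute it
ip_on_unit_vectors = dot(l2_normalize(page), l2_normalize(query))

print(round(ip_on_unit_vectors, 6), round(cosine(page, query), 6))
```

The identity only holds if the vectors that are indexed are the normalized ones, so the normalization step has to act on the same array that is added to the index.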

**Setup**

1.  Run the provided cell (installs packages and defines everything).
2.  Put your input Excel where the notebook can read it.


**Run**

```
analyzer, results_df, detailed_results = run_muvera_from_excel('path/to/input.xlsx')
show_page_details(detailed_results, page_index=0)  # optional
```

**Input (Excel)**

- Required content column: first header containing content, text, body, or extracted (case-insensitive).

- URL column: any header containing url (optional; if missing, a synthetic URL is used).

*Minimal example*

| url | content |
|---|---|
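
The column rules above could be implemented roughly as follows. This is a sketch under assumptions: `find_columns` and `CONTENT_KEYWORDS` are hypothetical names, "first header" is read as the first header in column order that contains any keyword, and the notebook's own loader may differ. The synthetic URL format is not specified here, so the sketch returns `None` when no URL column exists.

```python
CONTENT_KEYWORDS = ("content", "text", "body", "extracted")

def find_columns(headers):
    """Return (content_column, url_column) using case-insensitive substring matching."""
    content_col = next(
        (h for h in headers if any(k in str(h).lower() for k in CONTENT_KEYWORDS)),
        None,
    )
    if content_col is None:
        # Mirrors the troubleshooting message in this README
        raise ValueError("No content column found")
    # The URL column is optional; None signals that the caller must supply a fallback
    url_col = next((h for h in headers if "url" in str(h).lower()), None)
    return content_col, url_col

content_col, url_col = find_columns(["Page URL", "Extracted Text", "Title"])
print(content_col, url_col)
```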

**Output**

- File: muvera_results/muvera_analysis_results.xlsx

*Sheets:*

- MUVERA Scores (per page): url_short, muvera_score, primary_intent, intent_confidence, raw alignments for the 4 business groups, and issue counts.

- Intent Analysis: primary intent + confidence.

- Recommendations: prioritized actions (only if any exist).

- Console summary: pages analyzed, average score, high-priority issues.

**How scoring works (brief)**

Intent: compares your page embedding to averaged “signal” embeddings per intent; highest similarity = primary intent.

Business alignment: similarity to each business message group (value props, personas, differentiators, pain points) with weights.

Final score: linear blend of intent relevance (30%), business alignment (60%), and a small quality proxy (10%).
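
The three steps above can be sketched in plain Python on toy two-dimensional vectors. Several things here are assumptions: the helper names are hypothetical, the overall business alignment is taken to be the sum of the per-group weighted scores (group weights 0.3 / 0.2 / 0.25 / 0.25), and the final score is clamped to the 0–1 range. The notebook's `_calculate_muvera_score` may combine the terms differently.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    denom = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / denom if denom > 0 else 0.0

def mean_vector(vectors):
    """Element-wise average of a list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def primary_intent(page_vec, signal_vecs_by_intent):
    """Pick the intent whose averaged signal vector is most similar to the page."""
    scores = {
        intent: cosine(page_vec, mean_vector(vecs))
        for intent, vecs in signal_vecs_by_intent.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]

def business_alignment(raw_alignments, weights):
    """Assumed aggregation: weighted sum of per-group alignments."""
    return sum(raw_alignments[group] * weights[group] for group in weights)

def blend(intent_relevance, alignment, quality, weights=(0.3, 0.6, 0.1)):
    """Linear blend of the three components, clamped to [0, 1] (assumption)."""
    w_intent, w_business, w_quality = weights
    score = w_intent * intent_relevance + w_business * alignment + w_quality * quality
    return min(1.0, max(0.0, score))

# Toy data, for illustration only
signals = {"cost": [[1.0, 0.0], [0.8, 0.2]], "recovery": [[0.0, 1.0], [0.2, 0.8]]}
intent, confidence = primary_intent([0.9, 0.1], signals)

group_weights = {"unique_value_props": 0.3, "audience_personas": 0.2,
                 "competitive_differentiators": 0.25, "pain_point_solutions": 0.25}
raw = {"unique_value_props": 0.5, "audience_personas": 0.4,
       "competitive_differentiators": 0.2, "pain_point_solutions": 0.6}
alignment = business_alignment(raw, group_weights)
final_score = blend(confidence, alignment, quality=0.5)
print(intent, round(final_score, 3))
```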

**Helpful snippets**

```
# Sort for lowest scores (quick wins)
results_df.sort_values('muvera_score').head(10)

# Access a page's full result dict
page = detailed_results[0]
page['business_alignment'], page['recommendations']
```

**Customization (quick)**

- Change domain: edit self.intent_categories (signals) and self.business_context (core messages + weights) in MUVERASemanticAnalyzer.__init__.

- Tune weights: adjust the 0.3 / 0.6 / 0.1 blend in _calculate_muvera_score.

- Preprocessing: extend _clean_content to strip HTML/boilerplate.
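
As one possible preprocessing extension, the helper below strips script and style blocks, removes remaining tags, and collapses whitespace. `strip_html` is a hypothetical function you could call from inside `_clean_content`; it is a rough regex-based sketch and not a full HTML parser, so heavily malformed markup may need a dedicated library instead.

```python
import html
import re

# Remove whole <script>/<style> blocks, including their contents
_SCRIPT_STYLE = re.compile(r"<(script|style)\b[^>]*>.*?</\1\s*>", re.IGNORECASE | re.DOTALL)
# Remove any remaining tags
_TAG = re.compile(r"<[^>]+>")
_WHITESPACE = re.compile(r"\s+")

def strip_html(raw):
    """Return visible text from an HTML fragment (rough, regex-based)."""
    text = _SCRIPT_STYLE.sub(" ", str(raw))
    text = _TAG.sub(" ", text)
    text = html.unescape(text)  # decode entities such as &amp;
    return _WHITESPACE.sub(" ", text).strip()

sample = "<div><h1>Implant cost</h1><script>track();</script><p>Plans &amp; financing</p></div>"
print(strip_html(sample))
```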

**Troubleshooting**

- “No content column found”: ensure a header includes content, text, body, or extracted.

- FAISS not available: safe to ignore; sklearn fallback is automatic.

- Very low scores: likely vocabulary mismatch; expand signals/messages to match your content language.
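
To check for the vocabulary mismatch mentioned above, a quick diagnostic is to measure how many words of each signal appear anywhere in your page text. The sketch below uses a simple word tokenizer and hypothetical helper names; it ignores the stop-word filtering and n-grams that the TF-IDF vectorizer applies, so treat the numbers as a rough indication only.

```python
import re

def tokenize(text):
    """Lowercase word tokens (letters and digits only)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def signal_coverage(signals, corpus):
    """Map each signal to the fraction of its tokens found in the corpus vocabulary."""
    vocab = set()
    for doc in corpus:
        vocab.update(tokenize(doc))
    coverage = {}
    for signal in signals:
        tokens = tokenize(signal)
        hits = sum(1 for token in tokens if token in vocab)
        coverage[signal] = hits / len(tokens) if tokens else 0.0
    return coverage

# Toy corpus and signals, for illustration only
corpus = ["Dental implant cost and financing options", "Recovery after implant surgery"]
coverage = signal_coverage(["dental implant cost", "osseointegration", "payment plan"], corpus)

# Signals with zero coverage cannot contribute to similarity for this corpus
missing = [signal for signal, frac in coverage.items() if frac == 0.0]
print(missing)
```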

**Limits**

- Intent detection is similarity-based (no supervised training).

- Alignment measures messaging proximity, not factual correctness.

- Scores are most useful for comparative analysis within your corpus.
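
Because absolute values depend on the corpus, one way to read scores comparatively is to convert them to within-corpus ranks. The sketch below is an illustration with a hypothetical `percentile_ranks` helper and made-up scores; it reports, for each page, the fraction of pages that score strictly lower.

```python
def percentile_ranks(scores):
    """For each score, the fraction of scores in the list that are strictly lower."""
    n = len(scores)
    return [sum(1 for other in scores if other < s) / n for s in scores]

# Made-up MUVERA scores for four pages
ranks = percentile_ranks([0.12, 0.30, 0.18, 0.30])
print(ranks)
```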


**What is [MUVERA](https://arxiv.org/abs/2405.19504)?**

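In this notebook, MUVERA is a semantic analysis pipeline that evaluates how well a piece of content aligns with both user intent and business messaging. The name comes from the linked paper, which describes multi-vector retrieval via fixed dimensional encodings; the pipeline here is a simplified TF-IDF and SVD approach, not an implementation of that method.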

It combines NLP techniques (TF-IDF, SVD dimensionality reduction, and similarity modeling) to generate a single MUVERA Score (0–1) that reflects:

- How clearly the content matches specific user intents (e.g., cost, eligibility, recovery).

- How strongly it reinforces brand and business themes (value propositions, audience needs, differentiators).

The goal: to make content optimization measurable, so teams can identify which pages communicate intent effectively and which need refinement.

In [None]:
# QUICK SETUP: Copy and run this entire cell first
# This loads all the MUVERA functions you need

# Install required packages
!pip install scikit-learn faiss-cpu openpyxl -q

# Import all required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import re
from collections import defaultdict, Counter
import warnings
import os
from pathlib import Path
warnings.filterwarnings('ignore')

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.decomposition import TruncatedSVD
from sklearn.cluster import KMeans
import pickle

# Try to import FAISS
try:
    import faiss
    FAISS_AVAILABLE = True
    print("✅ FAISS available for fast similarity search")
except ImportError:
    print("⚠️  FAISS not available, using sklearn fallback")
    FAISS_AVAILABLE = False

class MUVERASemanticAnalyzer:
    """MUVERA-grade semantic analyzer"""

    def __init__(self):
        # Custom dental implant intent categories
        self.intent_categories = {
            'comparative_analysis_final_decision': {
                'description': 'User is in final-decision stage, actively comparing dental implants against alternatives.',
                'signals': ['compared to', 'technology advancements in oral surgery', 'early intervention oral surgery benefits', 'dental implants long-term oral health comparison', 'better value full mouth restoration', 'pros and cons single tooth implants', 'choosing denture vs single tooth implant', 'dental bridge vs full mouth implants', 'single tooth implant vs bridge cost effectiveness', 'pros and cons of dental implants', 'full mouth restoration vs partial treatments cost', 'functionality comparison full mouth restoration', 'oral surgery vs non surgical treatments', 'corporate vs private oral surgery outcomes', 'dental bridge longevity vs implant', 'dental bridge vs partial denture', 'dental implants vs removable dentures', 'full arch implants vs traditional dental implants', 'single tooth implant vs bridge cost', 'dental implants vs bridges vs dentures comparison', 'comparison', 'pros and cons of full mouth restoration', 'dental bridge vs implant cost comparison', 'pros and cons', 'aesthetic differences dental implants vs bridges', 'right choice full mouth restoration', 'surgical extraction vs root canal', 'long term benefits full mouth restoration', 'single tooth implant vs bridge', 'functionality difference bridge vs single tooth implant', 'dental bridge vs single tooth implant', 'single tooth implant vs partial denture', 'dental bridge vs maryland bridge', 'long term oral surgery success rates', 'choose dental bridge or implant', 'aesthetics difference bridge vs single tooth implant', 'better than', 'single tooth implant longevity comparison', 'implant supported bridge vs traditional', 'full mouth restoration vs dentures', 'dental bridge lifespan compared to alternatives', 'vs', 'versus', 'aesthetic improvements full mouth restoration', 'choosing between bridge and dental implant', 'local anesthesia vs sedation oral surgery', 'quality of life after full mouth restoration', 'pros and cons different oral surgery options',
                            'dental implants vs dentures cost comparison', 'best option dental implants', 'full mouth restoration vs individual repairs', 'cost differences oral surgery vs non surgical', 'bone grafting vs ridge expansion', 'longevity dental implants vs dentures', 'when single tooth implants best option', 'when dental bridge better than implant'],
                'business_weight': 0.143
            },
            'core_need_assessment': {
                'description': 'User is evaluating whether they genuinely need dental implants and if they\'re the right solution.',
                'signals': ['oral surgery dental implant placement', 'when to get a dental bridge', 'delaying tooth replacement risks', 'are dental implants worth it', 'full mouth restoration candidates', 'jawbone loss from missing tooth', 'oral surgery for tooth extractions', 'types of dental bridges', 'signs you need full mouth restoration', 'full mouth restoration severe dental problems', 'do I need', 'signs you need dental implants', 'for oral surgery need', 'are single tooth implants worth it', 'dental bridge bite improvement', 'dental implants preserve jawbone', 'implants vs bridges vs dentures', 'emotional benefits full mouth restoration', 'dental bridge oral health benefits', 'dental bridge candidacy', 'benefits of dental implants', 'missing teeth oral health effects', 'reasons to choose full mouth restoration', 'consequences of not replacing missing tooth', 'dental bridge function and appearance', 'oral surgery overall health benefits', 'full mouth restoration vs individual procedures', 'front tooth implant vs molar implant', 'tooth replacement', 'dental bridge vs no replacement', 'full mouth restoration for seniors', 'missing teeth', 'full mouth restoration health benefits', 'signs you need a dental bridge', 'dental implants vs dentures benefits', 'oral surgery facial trauma repair', 'dental implant candidacy', 'severe dental issues full mouth restoration', 'am I candidate', 'do nothing after tooth loss vs implant', 'full mouth restoration best option', 'candidates for oral surgery consultation', 'single tooth implant candidacy', 'single tooth implant preserves adjacent teeth', 'jaw correction oral surgery', 'replace missing teeth urgency', 'reasons for dental implant single tooth', 'fixed dental bridge benefits', 'reasons patients need oral surgery', 'signs you need oral surgery', 'impacted teeth oral surgery treatment', 'dental implants for seniors', 'why replace missing tooth', 'single dental implant oral health benefits'],
                'business_weight': 0.012
            },
            'cost_financing_considerations': {
                'description': 'User explores financial investment and payment options for implants.',
                'signals': ['dental implant financing options', 'dental implant cost', 'location based pricing single tooth implants', 'dental implant provider cost comparison', 'hidden costs oral surgery', 'hidden costs dental implants', 'single tooth implant insurance coverage', 'dental bridge pricing factors', 'traveling abroad for dental implants', 'dental implant cost factors', 'payment plans full mouth restoration', 'full mouth restoration insurance coverage', 'financing', 'payment plan', 'hsa fsa full mouth restoration', 'full mouth restoration average cost', 'oral surgery provider cost comparison', 'hidden costs dental bridge', 'cost', 'traveling for single tooth implant', 'dental bridge long-term value', 'dental bridge cost', 'full mouth restoration financing options', 'price', 'oral surgery pricing factors', 'long-term value of oral surgery treatments', 'oral surgery financing options', 'dental implants insurance coverage', 'hidden costs single tooth implants', 'oral surgery insurance coverage', 'dental bridge insurance coverage', 'travel for dental bridge cost savings', 'single tooth implant financing options', 'full mouth restoration investment benefits', 'payment plans for dental implants', 'how much', 'dental bridge financing options', 'full mouth restoration cost comparison', 'travel abroad full mouth restoration', 'hsa fsa funds for oral surgery', 'cost of oral surgery procedures', 'payment plans for oral surgery', 'hsa fsa dental bridge', 'dental bridge vs implant cost', 'hidden costs full mouth restoration', 'single tooth implant provider cost comparison', 'single tooth implant cost factors', 'dental bridge cost by location', 'long term cost savings full mouth restoration', 'traveling for oral surgery', 'single tooth implant average cost', 'hsa fsa single tooth implants', 'dental implant long term value', 'payment plans for single tooth implants', 'hsa fsa dental implants'],
                'business_weight': 0.093
            },
            'location_considerations': {
                'description': 'User seeks services within a convenient geographic area.',
                'signals': ['traveling vs local dental bridge care', 'comparing oral surgery reviews', 'reviews for single tooth implant providers', 'single tooth implant provider success rates', 'local vs travel single tooth implant', 'success rates full mouth restoration providers', 'best full mouth restoration dentist near me', 'private practice for single tooth implant', 'dental bridge success rates by provider', 'questions for dental bridge consultation', 'single tooth implant provider questions', 'success rates dental implant providers', 'proximity dental bridge follow-up care', 'experience matters full mouth restoration', 'choosing dental bridge center technology', 'appointments', 'technology at dental implant centers', 'private practice oral surgery benefits', 'best dentist for single tooth implant', 'local follow-up dental implants', 'private practice dental bridge benefits', 'questions for full mouth restoration providers', 'comparing dental implant reviews', 'private practice full mouth restoration', 'private vs corporate dental implant center', 'local dental bridge specialist', 'experience matters dental implant dentist', 'in my area', 'find dentist', 'questions to ask dental implant provider', 'comparing reviews full mouth restoration', 'experience matters for oral surgeon', 'verify dental bridge dentist experience', 'oral surgery success rates by provider', 'local follow-up after oral surgery', 'technology at oral surgery centers', 'implant technology differences single tooth', 'experienced dentist for single tooth implants', 'single tooth implant location cost differences', 'location', 'local vs travel oral surgery options', 'best dental bridge dentist near me', 'local vs travel dental implant provider', 'cost differences by location oral surgery', 'local vs travel full mouth restoration', 'best dental implant dentist near me', 'near me', 'best oral surgeon near me', 'dental bridge center technology advances',
                            'full mouth restoration cost differences by location', 'dental implant costs by location', 'technology at full mouth restoration centers', 'local follow-up full mouth restoration', 'questions to ask oral surgeon consultation', 'follow-up care after single tooth implant'],
                'business_weight': 0.55
            },
            'pre_procedure_evaluation_eligibility': {
                'description': 'User determines medical and dental eligibility for the procedure.',
                'signals': ['smoking impact on oral surgery', 'oral surgery patient evaluation', 'gum health bone health oral surgery', 'eligible', 'missing teeth surrounding stability', 'key questions full mouth restoration consultation', 'medical conditions affecting oral surgery', 'bone density and dental implants', 'preparing for full mouth restoration appointment', 'dental bridge consultation expectations', 'full mouth restoration medical conditions', 'preparing for oral surgery consultation', 'consultation', 'gum and bone health for single tooth implant', 'full mouth restoration consultation process', 'dental bridge candidacy evaluation', 'bone density', 'gum health and dental bridge eligibility', 'single tooth implant medical conditions', 'why dental implant surgery postponed', 'gum health before full mouth restoration', 'dental implant imaging and consultation', 'dental implant age requirements', 'dental bridge pre-treatment imaging', 'reasons for oral surgery postponement', 'dental bridge for seniors', 'pre-surgical imaging for oral surgery', 'single tooth implants for smokers', 'dental implant medical conditions', 'oral surgery for seniors', 'smoking and dental implants', 'single tooth implant consultation questions', 'bone health and full mouth restoration', 'qualify', 'bone grafting before oral surgery', 'single tooth implant disqualification reasons', 'single tooth implant consultation preparation', 'full mouth restoration bruxism patients', 'questions to ask oral surgeon', 'dental bridge with bone loss', 'dental implant evaluation process', 'smoking impact full mouth restoration', 'full mouth restoration disqualification reasons', 'questions for dental implant consultation', 'preparing for dental implant consultation', 'pre-implant imaging and evaluation', 'medical history', 'single tooth implant candidacy evaluation', 'questions to ask before dental bridge', 'full mouth restoration candidacy evaluation', 'dental bridge eligibility for smokers', 
'dental bridge disqualification reasons', 'dental implant candidacy evaluation', 'single tooth implant age requirements', 'medical conditions dental bridge success'],
                'business_weight': 0.111
            },
            'procedure_clinical_information': {
                'description': 'User seeks detailed technical and clinical information about the procedure.',
                'signals': ['bone grafting full mouth restoration', 'correcting bite alignment oral surgery', 'minimal bone single tooth implants', 'oral surgery procedure steps overview', 'temporary options before single tooth implant', 'dental bridge timeline', 'oral surgery day expectations', 'same day dental bridge options', 'single tooth implant healing timeline', 'single vs multiple tooth implants', 'single tooth implant surgery day expectations', 'full mouth restoration procedure steps', '3d imaging for dental implants', 'digital dental bridge fitting technology', 'bone grafting for single tooth implants', 'full mouth restoration healing timeline', 'function and aesthetics full mouth restoration', 'single tooth implant procedure timeline', 'immediate load vs traditional single tooth implant', 'dental bridge placement on natural teeth', 'dental bridge materials explained', 'bite alignment for dental bridges', 'stages of full mouth restoration', 'sedation options full mouth restoration', 'temporary vs final restorations oral surgery', 'technology in modern oral surgery', 'immediate load vs traditional dental implants', 'anesthesia during dental bridge placement', 'oral surgery healing timeline', 'bone grafting during oral surgery', 'dental implant sedation options', 'single tooth implant procedure steps', 'dental implant surgery expectations', 'dental implant procedure steps', 'temporary options full mouth restoration', 'full mouth restoration surgery expectations', 'how it works', 'surgery', 'osseointegration', 'oral surgery recovery time by type', 'clinical details', 'bone grafting and dental implants', 'implant procedure', 'temporary crown after dental implant', 'immediate load vs staged full mouth restoration', 'anesthesia for single tooth implant surgery', 'dental implant healing timeline', 'temporary dental bridge expectations', 'digital technology single tooth implants', 'dental bridge appointment stages', 'immediate vs staged oral surgeries',
                            'dental bridge procedure steps', 'digital technology full mouth restoration', 'sedation options oral surgery', 'dental implant surgery follow-up'],
                'business_weight': 0.078
            },
            'recovery_longevity_aftercare': {
                'description': 'User is concerned about recovery, durability, and aftercare requirements.',
                'signals': ['tips for faster single tooth implant healing', 'caring for your mouth after oral surgery', 'signs of complications full mouth restoration', 'foods after dental bridge placement', 'first 24 hours after dental implant surgery', 'full mouth restoration longevity', 'dental bridge complications and solutions', 'single tooth implant lifespan', 'dental implant lifespan', 'extending dental bridge lifespan', 'full mouth restoration recovery timeline', 'best foods after oral surgery', 'signs of dental implant complications', 'pain management after dental implant surgery', 'signs of complications after single tooth implants', 'best foods after full mouth restoration', 'follow-up visits single tooth implant', 'single tooth implant annual checkups', 'dental implant recovery timeline', 'speeding up full mouth restoration recovery', 'how long', 'resuming activities after oral surgery', 'speeding up dental bridge healing', 'first 24 hours after oral surgery', 'pain management after full mouth restoration', 'long term care single tooth implant', 'healing', 'pain management after oral surgery', 'speeding up oral surgery healing', 'dental bridge discomfort management', 'recovery', 'dental bridge recovery timeline', 'oral surgery recovery timeline', 'dental implant annual checkups importance', 'speed up dental implant healing', 'dental implant long-term care', 'longevity', 'best foods after dental implant surgery', 'signs of oral surgery complications', 'first 24 hours after single tooth implant', 'single tooth implant recovery timeline', 'resuming activities after dental implant surgery', 'annual checkups after full mouth restoration', 'resuming activities after full mouth restoration', 'aftercare', 'first 24 hours after dental bridge placement', 'long-term dental bridge care', 'long-term follow-up after oral surgery', 'best foods after single tooth implant', 'managing pain after single tooth implant surgery', 'extend oral surgery success',
                            'care after full mouth restoration', 'signs you need to replace dental bridge', 'first 24 hours after full mouth restoration', 'dental bridge annual checkups'],
                'business_weight': 0.008
            },
            'reviews_provider_assessment': {
                'description': 'User evaluates professional qualifications of potential providers.',
                'signals': ['corporate vs private oral surgery', 'research single tooth implant providers', 'dental bridge provider credentials', 'poor full mouth restoration provider warning signs', 'corporate vs private practice full mouth restoration', 'oral surgeon credentials checklist', 'provider experience dental implant outcomes', 'communication single tooth implant provider', 'oral surgery specialist for complex cases', 'full mouth restoration provider credentials', 'questions for full mouth restoration dentist', 'comparing dental bridge providers', 'questions for dental bridge consultation', 'poor oral surgeon warning signs', 'researching full mouth restoration providers', 'corporate vs private single tooth implants', 'single tooth implant dentist questions', 'compare single tooth implant providers', 'dental bridge provider research tips', 'corporate vs private dental bridge providers', 'comparing full mouth restoration providers', 'experience', 'comparing dental implant providers', 'second opinion for dental implants', 'provider experience dental bridge outcomes', 'dental bridge specialist complex cases', 'experience impact oral surgery outcomes', 'second opinions for oral surgery', 'second opinion dental bridge treatment', 'single tooth implant provider credentials', 'dental implant provider credentials checklist', 'communication with dental implant providers', 'dental bridge provider red flags', 'single tooth implant provider red flags', 'second opinion full mouth restoration', 'provider experience full mouth restoration', 'questions for dental implant dentist', 'doctor credentials', 'second opinions for single tooth implants', 'provider experience single tooth implants', 'communication with dental bridge provider', 'specialists for complex full mouth restoration cases', 'compare oral surgery providers', 'specialist for complex single tooth implant', 'dental implant provider research tips', 'dental implant specialists complex cases',
                            'communication full mouth restoration provider', 'specialist', 'poor dental implant provider warning signs', 'corporate vs private practice dental implants', 'dentist qualifications', 'researching oral surgeons near me', 'oral surgeon communication importance', 'who', 'questions to ask oral surgeon consultation'],
                'business_weight': 0.108
            },
            'reviews_testimonials': {
                'description': 'User seeks validation via patient testimonials and before/after galleries.',
                'signals': ['dental implant success stories', 'patient questions after dental implants', 'single tooth implant healing patient experiences', 'common themes in full mouth restoration reviews', 'oral surgery healing patient experiences', 'patient questions after dental bridge', 'dental bridge positive experiences', 'dental bridge provider review influence', 'patient questions after oral surgery', 'patient concerns before oral surgery', 'dental implant recovery patient feedback', 'patient questions after full mouth restoration', 'patient concerns before full mouth restoration', 'positive experiences single tooth implants', 'dental implant patient testimonials', 'patient tips dental implant prep', 'tips from patients preparing for oral surgery', 'smile improvements from single tooth implant', 'dental bridge patient testimonials', 'success stories full mouth restoration', 'dental bridge impact eating and speaking', 'reviews', 'patient questions after single tooth implants', 'patient concerns before dental implants', 'full mouth restoration patient testimonials importance', 'positive experiences full mouth restoration', 'dental bridge patient advice', 'reading single tooth implant reviews', 'testimonials', 'patient advice before oral surgery', 'life change after full mouth restoration', 'common themes oral surgery reviews', 'positive oral surgery patient experiences', 'before and after', 'single tooth implant patient testimonials', 'patient concerns before single tooth implant', 'interpreting full mouth restoration reviews', 'dental bridge success stories', 'common themes in single tooth implant reviews', 'patient advice before dental implants', 'patient advice before full mouth restoration', 'patient concerns dental bridges', 'a dental implant reviews choosing provider', 'interpreting oral surgery reviews', 'reading dental bridge reviews', 'patient advice single tooth implants', 'single tooth implant process patient reviews',
                            'positive dental implant patient reviews', 'full mouth restoration patient testimonials', 'oral surgery success stories', 'common themes dental implant reviews', 'oral surgery patient testimonials', 'patient experiences', 'patient feedback dental bridge consultations'],
                'business_weight': 0.007
            }
        }

        # Business context vectors
        self.business_context = {
            'unique_value_props': {
                'core_messages': [
                    'permanent dental implant solution',
                    'same day teeth replacement',
                    'lifetime warranty coverage',
                    'FDA approved implant technology',
                    'experienced implant specialists',
                    'proprietary treatment methods'
                ],
                'weight': 0.3
            },
            'audience_personas': {
                'core_messages': [
                    'seniors with missing teeth',
                    'adults frustrated with dentures',
                    'people with dental anxiety',
                    'patients seeking permanent solution',
                    'individuals comparing implant options',
                    'budget conscious patients'
                ],
                'weight': 0.2
            },
            'competitive_differentiators': {
                'core_messages': [
                    'unlike removable dentures',
                    'no daily removal required',
                    'stronger than natural teeth',
                    'comprehensive care approach',
                    'proprietary treatment protocol',
                    'all-in-one solution'
                ],
                'weight': 0.25
            },
            'pain_point_solutions': {
                'core_messages': [
                    'eliminates loose denture problems',
                    'restores confident eating ability',
                    'prevents bone loss progression',
                    'reduces ongoing dental costs',
                    'solves speech clarity issues',
                    'no more denture adhesives'
                ],
                'weight': 0.25
            }
        }

        # Initialize components
        self.content_vectorizer = None
        self.business_vectorizer = None
        self.intent_classifier = None
        self.similarity_index = None
        self.page_embeddings = {}
        self.business_embeddings = {}
        self.url_mapping = {}

        # Legacy triplet extraction
        self.triplet_auditor = TripletAuditor()

    def initialize_vector_processing(self, content_corpus):
        """Initialize vector processing system"""
        print("🚀 Initializing MUVERA vector processing...")

        # Create content embeddings
        self.content_vectorizer = TfidfVectorizer(
            max_features=10000,
            ngram_range=(1, 3),
            stop_words='english',
            min_df=1,
            max_df=0.95
        )

        # Fit vectorizer
        tfidf_matrix = self.content_vectorizer.fit_transform(content_corpus)

        # Apply SVD for dimensionality reduction
        n_components = min(256, tfidf_matrix.shape[1], len(content_corpus))
        self.svd_compressor = TruncatedSVD(n_components=n_components, random_state=42)
        compressed_embeddings = self.svd_compressor.fit_transform(tfidf_matrix)

        # Initialize similarity search
        if FAISS_AVAILABLE and len(content_corpus) > 5:
            dimension = compressed_embeddings.shape[1]
            self.similarity_index = faiss.IndexFlatIP(dimension)
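            # normalize_L2 works in place, so normalize one float32 copy and index that same array
            embeddings_f32 = np.ascontiguousarray(compressed_embeddings, dtype='float32')
            faiss.normalize_L2(embeddings_f32)
            self.similarity_index.add(embeddings_f32)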
            self.use_faiss = True
        else:
            self.similarity_matrix = cosine_similarity(compressed_embeddings)
            self.use_faiss = False

        self.compressed_embeddings = compressed_embeddings

        # Create business and intent embeddings
        self._create_business_embeddings()
        self._initialize_intent_classifier()

        print(f"✅ Vector processing initialized with {n_components}D embeddings")

    def _create_business_embeddings(self):
        """Create business context embeddings"""
        all_business_messages = []
        business_labels = []

        for category, config in self.business_context.items():
            for message in config['core_messages']:
                all_business_messages.append(message)
                business_labels.append(category)

        business_tfidf = self.content_vectorizer.transform(all_business_messages)
        business_compressed = self.svd_compressor.transform(business_tfidf)

        self.business_embeddings = {}
        for i, (embedding, label) in enumerate(zip(business_compressed, business_labels)):
            if label not in self.business_embeddings:
                self.business_embeddings[label] = []
            self.business_embeddings[label].append(embedding)

        for category in self.business_embeddings:
            self.business_embeddings[category] = np.mean(self.business_embeddings[category], axis=0)

    def _initialize_intent_classifier(self):
        """Initialize intent classification"""
        intent_texts = []
        intent_labels = []

        for intent, config in self.intent_categories.items():
            for signal in config['signals']:
                intent_texts.append(f"User wants to {signal} dental implants")
                intent_labels.append(intent)

        intent_tfidf = self.content_vectorizer.transform(intent_texts)
        intent_compressed = self.svd_compressor.transform(intent_tfidf)

        # Group the compressed signal vectors by intent
        self.intent_embeddings = {}
        for embedding, label in zip(intent_compressed, intent_labels):
            self.intent_embeddings.setdefault(label, []).append(embedding)

        # Represent each intent by the mean of its signal vectors
        for intent in self.intent_embeddings:
            self.intent_embeddings[intent] = np.mean(self.intent_embeddings[intent], axis=0)

    def encode_content_to_fde(self, content):
        """Transform content into Fixed Dimensional Encoding"""
        cleaned_content = self._clean_content(content)
        tfidf_vector = self.content_vectorizer.transform([cleaned_content])
        fde_vector = self.svd_compressor.transform(tfidf_vector)
        fde_normalized = fde_vector / (np.linalg.norm(fde_vector) + 1e-10)
        return fde_normalized[0]

    def detect_user_intent(self, content):
        """Detect user intent using semantic similarity"""
        content_fde = self.encode_content_to_fde(content)

        intent_scores = {}
        for intent, intent_embedding in self.intent_embeddings.items():
            similarity = np.dot(content_fde, intent_embedding)
            intent_scores[intent] = float(similarity)

        primary_intent = max(intent_scores, key=intent_scores.get)
        confidence = intent_scores[primary_intent]

        return {
            'primary_intent': primary_intent,
            'confidence': confidence,
            'all_scores': intent_scores,
            'business_weight': self.intent_categories[primary_intent]['business_weight']
        }

    def calculate_business_semantic_alignment(self, content_fde):
        """Calculate alignment with business context"""
        business_scores = {}

        for category, business_embedding in self.business_embeddings.items():
            alignment_score = np.dot(content_fde, business_embedding)
            business_weight = self.business_context[category]['weight']

            business_scores[category] = {
                'raw_alignment': float(alignment_score),
                'weighted_score': float(alignment_score * business_weight),
                'weight': business_weight
            }

        return business_scores

    def _find_similar_content(self, content_fde, k=5):
        """Find similar content"""
        if self.use_faiss and hasattr(self, 'similarity_index'):
            query_vector = content_fde.reshape(1, -1).astype('float32')
            faiss.normalize_L2(query_vector)
            distances, indices = self.similarity_index.search(query_vector, k)

            similar_items = []
            for i, (distance, idx) in enumerate(zip(distances[0], indices[0])):
                if idx != -1:
                    similar_items.append({
                        'rank': i + 1,
                        'similarity_score': float(distance),
                        'content_index': int(idx)
                    })
        else:
            if hasattr(self, 'similarity_matrix'):
                content_similarities = cosine_similarity([content_fde], self.compressed_embeddings)[0]
                similar_indices = np.argsort(content_similarities)[::-1][:k]

                similar_items = []
                for i, idx in enumerate(similar_indices):
                    similar_items.append({
                        'rank': i + 1,
                        'similarity_score': float(content_similarities[idx]),
                        'content_index': int(idx)
                    })
            else:
                similar_items = []

        return similar_items

    def muvera_content_scoring(self, content, url=None):
        """MUVERA-grade content scoring"""
        content_fde = self.encode_content_to_fde(content)
        intent_analysis = self.detect_user_intent(content)
        business_alignment = self.calculate_business_semantic_alignment(content_fde)
        similar_content = self._find_similar_content(content_fde, k=5)
        muvera_score = self._calculate_muvera_score(intent_analysis, business_alignment)

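        # Create friendly URL name: prefer the explicit mapping, otherwise derive one.
        # (Using .get(url, url) here would make the fallback below unreachable.)
        url_short = self.url_mapping.get(url)
        if not url_short and url:
            url_short = Path(str(url)).stem if isinstance(url, str) else str(url)
            if len(url_short) > 50:
                url_short = url_short[:50] + "..."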

        return {
            'url': url,
            'url_short': url_short,
            'muvera_score': muvera_score,
            'intent_analysis': intent_analysis,
            'business_alignment': business_alignment,
            'similar_content': similar_content,
            'recommendations': self._generate_muvera_recommendations(intent_analysis, business_alignment)
        }

    def _calculate_muvera_score(self, intent_analysis, business_alignment):
        """Calculate weighted MUVERA score"""
        # Intent relevance (30%)
        intent_component = intent_analysis['confidence'] * intent_analysis['business_weight'] * 0.3

        # Business alignment (60%)
        business_component = 0
        total_weight = sum(scores['weight'] for scores in business_alignment.values())

        for category, scores in business_alignment.items():
            business_component += scores['weighted_score']

        business_component = (business_component / total_weight) * 0.6 if total_weight > 0 else 0

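        # Content quality (10%): placeholder term. It depends only on how many
        # intent categories were scored, not on the content itself.
        quality_component = min(1.0, len(intent_analysis['all_scores']) / 4) * 0.1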

        total_score = intent_component + business_component + quality_component
        return min(1.0, max(0.0, total_score))

    def _generate_muvera_recommendations(self, intent_analysis, business_alignment):
        """Generate recommendations"""
        recommendations = []

        if intent_analysis['confidence'] < 0.3:
            recommendations.append({
                'type': 'INTENT_CLARITY',
                'priority': 'HIGH',
                'message': f"Content lacks clear user intent. Strengthen {intent_analysis['primary_intent']} signals."
            })

        for category, scores in business_alignment.items():
            if scores['raw_alignment'] < 0.2:
                recommendations.append({
                    'type': 'BUSINESS_ALIGNMENT',
                    'priority': 'HIGH' if scores['weight'] > 0.2 else 'MEDIUM',
                    'message': f"Weak {category.replace('_', ' ')} alignment. Add relevant messaging."
                })

        return recommendations

    def _clean_content(self, content):
        """Clean content for processing"""
        if not content:
            return ""

        cleaned = re.sub(r'\s+', ' ', str(content)).strip()
        cleaned = re.sub(r'Page \d+', '', cleaned)
        cleaned = re.sub(r'Table of Contents', '', cleaned)

        return cleaned

    def _create_results_dataframe(self, results):
        """Create summary DataFrame"""
        summary_data = []

        for result in results:
            summary_data.append({
                'url_short': result['url_short'],
                'muvera_score': result['muvera_score'],
                'primary_intent': result['intent_analysis']['primary_intent'],
                'intent_confidence': result['intent_analysis']['confidence'],
                'business_weight': result['intent_analysis']['business_weight'],
                'unique_value_props_alignment': result['business_alignment']['unique_value_props']['raw_alignment'],
                'audience_personas_alignment': result['business_alignment']['audience_personas']['raw_alignment'],
                'competitive_differentiators_alignment': result['business_alignment']['competitive_differentiators']['raw_alignment'],
                'pain_point_solutions_alignment': result['business_alignment']['pain_point_solutions']['raw_alignment'],
                'recommendations_count': len(result['recommendations']),
                'high_priority_issues': len([r for r in result['recommendations'] if r['priority'] == 'HIGH'])
            })

        return pd.DataFrame(summary_data)

class TripletAuditor:
    """Legacy triplet extraction for human review"""

    def __init__(self):
        self.basic_patterns = [
            r'\b(\w+(?:\s+\w+){0,2})\s+(provides?|offers?|helps?|solves?|eliminates?)\s+([^.]{1,60})',
            r'\b(\w+(?:\s+\w+){0,2})\s+(is|are|has|have)\s+([^.]{1,60})',
            r'\b(\w+(?:\s+\w+){0,2})\s+(includes?|contains?|features?)\s+([^.]{1,60})'
        ]

    def extract_triplets_for_audit(self, content):
        """Extract basic triplets for human review only"""
        if not content:
            return []

        triplets = []
        for pattern in self.basic_patterns:
            matches = re.finditer(pattern, str(content), re.IGNORECASE)
            for match in matches:
                groups = match.groups()
                if len(groups) >= 3:
                    triplets.append({
                        'subject': groups[0].strip(),
                        'predicate': groups[1].strip(),
                        'object': groups[2].strip(),
                        'audit_only': True
                    })

        return triplets[:10]

def run_muvera_from_excel(excel_file_path, output_folder="muvera_results"):
    """Run MUVERA analysis from Excel file"""

    print("🚀 Starting MUVERA Analysis from Excel File")
    print("=" * 50)

    # Load Excel file
    print(f"📊 Loading text data from: {excel_file_path}")

    try:
        text_df = pd.read_excel(excel_file_path)
        print(f"✅ Loaded {len(text_df)} rows from Excel file")

        # Check columns
        print(f"📋 Available columns: {list(text_df.columns)}")

        # Look for URL and content columns
        url_col = None
        content_col = None

        for col in text_df.columns:
            # Headers may be non-string (e.g. numeric), so cast before matching
            if 'url' in str(col).lower():
                url_col = col
                break

        for col in text_df.columns:
            if any(word in str(col).lower() for word in ['content', 'text', 'body', 'extracted']):
                content_col = col
                break

        if url_col is None:
            print("⚠️  No URL column found, using index as identifier")
            text_df['URL'] = text_df.index.astype(str)
            url_col = 'URL'

        if content_col is None:
            print("❌ No content column found")
            print("💡 Please ensure your Excel file has a column with 'content', 'text', 'body', or 'extracted' in the name")
            return None, None, None

        # Rename columns for consistency
        text_df = text_df.rename(columns={url_col: 'URL', content_col: 'content'})

        # Clean data
        original_count = len(text_df)
        text_df = text_df.dropna(subset=['content'])
        text_df = text_df[text_df['content'].astype(str).str.strip() != '']

        if len(text_df) < original_count:
            print(f"📝 Removed {original_count - len(text_df)} rows with empty content")

        if text_df.empty:
            print("❌ No valid content found")
            return None, None, None

    except Exception as e:
        print(f"❌ Error loading Excel file: {e}")
        return None, None, None

    # Run MUVERA analysis
    print("\n🧠 Running MUVERA semantic analysis...")
    analyzer = MUVERASemanticAnalyzer()

    # Prepare content
    content_corpus = [analyzer._clean_content(content) for content in text_df['content']]

    # Initialize vector processing
    analyzer.initialize_vector_processing(content_corpus)

    # Analyze each page
    results = []
    total_pages = len(text_df)
    # Count by position: index labels have gaps once empty rows are dropped
    for position, (index, row) in enumerate(text_df.iterrows(), start=1):
        url = row['URL']
        content = row['content']

        # Create page name
        if pd.isna(url):
            page_name = f"page_{index}"
        else:
            # Strip trailing slashes so URLs ending in '/' still yield a name
            page_name = str(url).rstrip('/').split('/')[-1]
            if len(page_name) > 50:
                page_name = page_name[:50] + "..."

        print(f"Analyzing page {position}/{total_pages}: {page_name}")

        analysis = analyzer.muvera_content_scoring(content, url)
        results.append(analysis)

    # Create results DataFrame
    results_df = analyzer._create_results_dataframe(results)

    # Export results
    print(f"\n📊 Exporting results to {output_folder}/...")
    os.makedirs(output_folder, exist_ok=True)

    # Main results
    main_results_file = f"{output_folder}/muvera_analysis_results.xlsx"

    with pd.ExcelWriter(main_results_file, engine='openpyxl') as writer:
        results_df.to_excel(writer, sheet_name='MUVERA Scores', index=False)

        # Intent analysis
        intent_data = []
        for result in results:
            intent_data.append({
                'URL': result['url_short'],
                'Primary Intent': result['intent_analysis']['primary_intent'],
                'Intent Confidence': round(result['intent_analysis']['confidence'], 3),
                'Business Weight': result['intent_analysis']['business_weight']
            })

        intent_df = pd.DataFrame(intent_data)
        intent_df.to_excel(writer, sheet_name='Intent Analysis', index=False)

        # Recommendations
        rec_data = []
        for result in results:
            for rec in result['recommendations']:
                rec_data.append({
                    'URL': result['url_short'],
                    'Type': rec['type'],
                    'Priority': rec['priority'],
                    'Recommendation': rec['message']
                })

        if rec_data:
            rec_df = pd.DataFrame(rec_data)
            rec_df.to_excel(writer, sheet_name='Recommendations', index=False)

    print("\n✅ MUVERA ANALYSIS COMPLETE!")
    print(f"📁 Results saved: {main_results_file}")
    print("\n📈 QUICK SUMMARY:")
    print(f"Pages analyzed: {len(results_df)}")
    print(f"Average MUVERA score: {results_df['muvera_score'].mean():.3f}")
    print(f"High priority issues: {results_df['high_priority_issues'].sum()}")

    return analyzer, results_df, results

def show_page_details(results, page_index):
    """Show detailed analysis for a specific page"""
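    # run_muvera_from_excel returns None for the results when loading fails
    if not results:
        print("No results to show. Run run_muvera_from_excel first.")
        return

    if page_index >= len(results):
        print(f"Page index {page_index} not found. Max index: {len(results)-1}")
        return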

    result = results[page_index]
    print(f"=== PAGE ANALYSIS: {result['url_short']} ===")
    print(f"MUVERA Score: {result['muvera_score']:.3f}")
    print(f"Primary Intent: {result['intent_analysis']['primary_intent']}")
    print(f"Intent Confidence: {result['intent_analysis']['confidence']:.3f}")

    print("\nBusiness Alignment:")
    for category, scores in result['business_alignment'].items():
        print(f"  {category.replace('_', ' ').title()}: {scores['raw_alignment']:.3f}")

    if result['recommendations']:
        print("\nRecommendations:")
        for rec in result['recommendations']:
            print(f"  [{rec['priority']}] {rec['message']}")

print("✅ MUVERA Analyzer loaded successfully!")
print("\nNow run your analysis with:")
print("analyzer, results_df, detailed_results = run_muvera_from_excel('your_file_path')")

✅ FAISS available for fast similarity search
✅ MUVERA Analyzer loaded successfully!

Now run your analysis with:
analyzer, results_df, detailed_results = run_muvera_from_excel('your_file_path')
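
The weighting applied by `_calculate_muvera_score` can be checked by hand. Below is a minimal standalone sketch, not the notebook's own code: the function name `combine_score`, the category weights, and the sample numbers are all hypothetical, chosen only to make the arithmetic easy to follow.

```python
def combine_score(intent_confidence, intent_business_weight, business_alignment, n_intents):
    """Recombine the three weighted components into a score clamped to [0, 1].

    business_alignment maps category -> (raw_alignment, weight).
    """
    # Intent relevance (30%)
    intent_component = intent_confidence * intent_business_weight * 0.3

    # Business alignment (60%): weighted mean of the raw alignments
    total_weight = sum(weight for _, weight in business_alignment.values())
    weighted_sum = sum(raw * weight for raw, weight in business_alignment.values())
    business_component = (weighted_sum / total_weight) * 0.6 if total_weight > 0 else 0.0

    # Third term (10%): driven only by the number of intent categories
    quality_component = min(1.0, n_intents / 4) * 0.1

    total = intent_component + business_component + quality_component
    return min(1.0, max(0.0, total))


# Hypothetical inputs: (raw_alignment, weight) per business category
sample_alignment = {
    'unique_value_props': (0.10, 0.3),
    'audience_personas': (0.10, 0.3),
    'competitive_differentiators': (0.10, 0.2),
    'pain_point_solutions': (0.10, 0.2),
}

score = combine_score(
    intent_confidence=0.2,
    intent_business_weight=0.5,
    business_alignment=sample_alignment,
    n_intents=4,
)
print(f"{score:.3f}")  # 0.03 (intent) + 0.06 (business) + 0.10 (third term)
```

With four or more intent categories the third term is a constant 0.1, so in this sketch differences between pages come only from the intent and business terms.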


In [None]:
# Your exact file path
your_file_path = "/content/drive/MyDrive/Tag/TAG_CC_text_extraction 7.8.25.xlsx"

# Run the analysis
analyzer, results_df, detailed_results = run_muvera_from_excel(your_file_path)

🚀 Starting MUVERA Analysis from Excel File
📊 Loading text data from: /content/drive/MyDrive/Tag/TAG_CC_text_extraction 7.8.25.xlsx
✅ Loaded 15 rows from Excel file
📋 Available columns: ['url', 'text']

🧠 Running MUVERA semantic analysis...
🚀 Initializing MUVERA vector processing...
✅ Vector processing initialized with 15D embeddings
Analyzing page 1/15: 
Analyzing page 2/15: 
Analyzing page 3/15: 
Analyzing page 4/15: 
Analyzing page 5/15: 
Analyzing page 6/15: 
Analyzing page 7/15: 
Analyzing page 8/15: 
Analyzing page 9/15: 
Analyzing page 10/15: 
Analyzing page 11/15: 
Analyzing page 12/15: 
Analyzing page 13/15: 
Analyzing page 14/15: 
Analyzing page 15/15: 

📊 Exporting results to muvera_results/...

✅ MUVERA ANALYSIS COMPLETE!
📁 Results saved: muvera_results/muvera_analysis_results.xlsx

📈 QUICK SUMMARY:
Pages analyzed: 15
Average MUVERA score: 0.116
High priority issues: 60
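
The summary reports only totals. To see which pages and which recommendation types drive them, the returned `detailed_results` list can be inspected directly. The following is a minimal sketch, assuming the record layout returned by `muvera_content_scoring`; the helper names `lowest_scoring` and `count_high_priority_by_type` and the toy records are hypothetical.

```python
from collections import Counter


def count_high_priority_by_type(detailed_results):
    """Count HIGH-priority recommendations per recommendation type."""
    counts = Counter()
    for result in detailed_results:
        for rec in result['recommendations']:
            if rec['priority'] == 'HIGH':
                counts[rec['type']] += 1
    return counts


def lowest_scoring(detailed_results, n=5):
    """Return (url_short, muvera_score) pairs for the n lowest-scoring pages."""
    ranked = sorted(detailed_results, key=lambda result: result['muvera_score'])
    return [(result['url_short'], result['muvera_score']) for result in ranked[:n]]


# Toy records standing in for the real detailed_results (values are made up)
toy_results = [
    {'url_short': 'page-a', 'muvera_score': 0.31,
     'recommendations': [
         {'type': 'BUSINESS_ALIGNMENT', 'priority': 'MEDIUM', 'message': '...'}]},
    {'url_short': 'page-b', 'muvera_score': 0.08,
     'recommendations': [
         {'type': 'INTENT_CLARITY', 'priority': 'HIGH', 'message': '...'},
         {'type': 'BUSINESS_ALIGNMENT', 'priority': 'HIGH', 'message': '...'}]},
    {'url_short': 'page-c', 'muvera_score': 0.12,
     'recommendations': [
         {'type': 'BUSINESS_ALIGNMENT', 'priority': 'HIGH', 'message': '...'}]},
]

print(lowest_scoring(toy_results, n=2))
print(count_high_priority_by_type(toy_results))
```

On a real run, pass the `detailed_results` returned by `run_muvera_from_excel` in place of `toy_results`.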
