# 🔍 Semantic Search Tester

This notebook tests semantic search over the movie embeddings.

**Goal:** Debug why semantic search returns no results.

## 1. Setup and Configuration

In [4]:
import os
import json
import chromadb
from chromadb.utils import embedding_functions
from pathlib import Path
from dotenv import load_dotenv

# Load environment
load_dotenv()

# Configuration
CHROMA_PATH = r"..\data\vector_database"
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")

print(f"📂 ChromaDB Path: {CHROMA_PATH}")
print(f"🔑 API Key found: {bool(OPENAI_API_KEY)}")
# Mask the key; guard the whole expression so a missing key does not raise TypeError
masked_key = f"{OPENAI_API_KEY[:10]}...{OPENAI_API_KEY[-4:]}" if OPENAI_API_KEY else "None"
print(f"🔑 API Key: {masked_key}")

📂 ChromaDB Path: ..\data\vector_database
🔑 API Key found: True
🔑 API Key: sk-proj-R2...nGsA
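
The key-masking print in the cell above slices the key before checking that it exists, which fails when the variable is unset. A small helper can make the guard explicit. This is a minimal sketch: `mask_secret` and its `prefix`/`suffix` parameters are hypothetical names, not part of the notebook or of any library.

```python
from typing import Optional


def mask_secret(secret: Optional[str], prefix: int = 10, suffix: int = 4) -> str:
    """Return a masked form of a secret that is safe to print in a notebook."""
    if not secret:
        # Covers both None (variable not set) and the empty string
        return "None"
    if len(secret) <= prefix + suffix:
        # Too short to reveal any part without exposing most of it
        return "*" * len(secret)
    return f"{secret[:prefix]}...{secret[-suffix:]}"


print(mask_secret(None))
print(mask_secret("sk-example-0123456789abcdef"))
```

With this helper the print becomes `print(f"API Key: {mask_secret(OPENAI_API_KEY)}")`, and a missing `.env` entry shows up as `None` instead of a traceback.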


## 2. Connecting to ChromaDB

In [5]:
# Connect to ChromaDB
print("🔌 Connecting to ChromaDB...")

try:
    client = chromadb.PersistentClient(path=CHROMA_PATH)
    print("✅ ChromaDB client created")
    
    # Create OpenAI embedding function
    openai_ef = embedding_functions.OpenAIEmbeddingFunction(
        api_key=OPENAI_API_KEY,
        model_name="text-embedding-3-small"
    )
    print("✅ OpenAI embedding function created")
    
    # Get collection (note: get_or_create_collection silently creates an
    # empty collection if the name does not exist yet)
    collection = client.get_or_create_collection(
        name="movie_descriptions",
        embedding_function=openai_ef
    )
    print("✅ Collection 'movie_descriptions' loaded")
    
except Exception as e:
    print(f"❌ Error: {e}")
    raise

🔌 Connecting to ChromaDB...
✅ ChromaDB client created
✅ OpenAI embedding function created
✅ Collection 'movie_descriptions' loaded
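
Because `get_or_create_collection` creates a collection when the name is missing, a typo in the name or a wrong path produces an empty collection rather than an error, which is one way to end up with "no results". A stricter variant, sketched below, uses `get_collection`, which raises instead. The helper name `open_existing_collection` is hypothetical, and `StubClient` is only a stand-in so the sketch runs without a database; with the real notebook you would pass the `client` and `openai_ef` objects created above.

```python
def open_existing_collection(client, name, embedding_function):
    """Return the named collection, or None if it cannot be opened."""
    try:
        # get_collection raises when the collection does not exist, whereas
        # get_or_create_collection would quietly create an empty one
        return client.get_collection(name=name, embedding_function=embedding_function)
    except Exception as exc:  # exact exception type varies across chromadb versions
        print(f"Could not open collection '{name}': {exc}")
        return None


class StubClient:
    """Stand-in for a chromadb client so this sketch runs without a database."""

    def get_collection(self, name, embedding_function=None):
        if name != "movie_descriptions":
            raise ValueError(f"Collection {name} does not exist")
        return f"<collection {name}>"


print(open_existing_collection(StubClient(), "movie_descriptions", None))
print(open_existing_collection(StubClient(), "movie_description", None))
```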


## 3. Collection Statistics

In [6]:
# Get collection stats
print("\n" + "="*60)
print("📊 COLLECTION STATISTICS")
print("="*60)

count = collection.count()
print(f"\n📦 Total number of documents: {count}")

if count == 0:
    print("\n⚠️ WARNING: The collection is EMPTY!")
    print("   You must first embed your movies with embedding_manager.py")
else:
    print(f"✅ Collection contains {count} movies")
    
    # Peek at first few items
    print("\n🔍 Preview of the first 3 documents:")
    peek = collection.peek(limit=3)
    
    for i in range(len(peek['ids'])):
        print(f"\n   [{i+1}] ID: {peek['ids'][i]}")
        print(f"       Title: {peek['metadatas'][i].get('title', 'N/A')}")
        print(f"       Database: {peek['metadatas'][i].get('database', 'N/A')}")
        print(f"       Table: {peek['metadatas'][i].get('table', 'N/A')}")
        print(f"       Description: {peek['documents'][i][:100]}...")


📊 COLLECTION STATISTICS

📦 Total number of documents: 19925
✅ Collection contains 19925 movies

🔍 Preview of the first 3 documents:

   [1] ID: ama0000
       Title: The Grand Seduction
       Database: movie.db
       Table: amazon_prime_titles
       Description: A small fishing village must procure a local doctor to secure a lucrative business contract. When un...

   [2] ID: ama0001
       Title: Take Care Good Night
       Database: movie.db
       Table: amazon_prime_titles
       Description: A Metro Family decides to fight a Cyber Criminal threatening their stability and pride....

   [3] ID: ama0002
       Title: Secrets of Deception
       Database: movie.db
       Table: amazon_prime_titles
       Description: After a man discovers his wife is cheating on him with a neighborhood kid he goes down a furious pat...


## 4. Query Test Function

In [7]:
def test_query(query_text: str, n_results: int = 5, where_filter: dict = None):
    """
    Run a semantic query and print the results.
    """
    print("\n" + "="*80)
    print(f"🔍 QUERY: '{query_text}'")
    print("="*80)
    
    if where_filter:
        print(f"🎯 Filter: {where_filter}")
    
    try:
        # Execute query
        results = collection.query(
            query_texts=[query_text],
            n_results=n_results,
            where=where_filter
        )
        
        # Check if we got results
        if not results['ids'] or len(results['ids'][0]) == 0:
            print("\n❌ NO RESULTS FOUND")
            print("\nPossible causes:")
            print("  1. The collection is empty")
            print("  2. The filter is too restrictive")
            print("  3. No matching movies")
            return None
        
        # Display results
        print(f"\n✅ {len(results['ids'][0])} results found\n")
        
        # 'distances' may be missing or None depending on what was included
        distances = results.get('distances')
        
        for i in range(len(results['ids'][0])):
            distance = distances[0][i] if distances else None
            # NOTE: 1 - distance is a cosine similarity only if the collection
            # uses the cosine space; with another metric it can go negative
            similarity = (1 - distance) * 100 if distance is not None else None
            
            print(f"\n{'─'*80}")
            print(f"🎬 RESULT #{i+1}")
            if similarity is not None:
                print(f"📊 Similarity: {similarity:.1f}% (distance: {distance:.4f})")
            print(f"🆔 ID: {results['ids'][0][i]}")
            print(f"📽️ Title: {results['metadatas'][0][i].get('title', 'N/A')}")
            print(f"💾 Database: {results['metadatas'][0][i].get('database', 'N/A')}")
            print(f"📊 Table: {results['metadatas'][0][i].get('table', 'N/A')}")
            print("\n📝 Description:")
            print(f"   {results['documents'][0][i]}")
        
        print(f"\n{'='*80}\n")
        
        return results
        
    except Exception as e:
        print(f"\n❌ ERROR during query: {e}")
        import traceback
        traceback.print_exc()
        return None

print("✅ test_query() function defined")

✅ test_query() function defined
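
`test_query` reports `(1 - distance) * 100` as a similarity. That reading holds for a collection configured with the cosine space. If the collection was instead created with Chroma's default `l2` space, which returns squared Euclidean distances, and the embeddings are unit length (as OpenAI embeddings are), the cosine similarity is `1 - distance / 2`. Whether this collection uses `l2` is an assumption here, since it depends on how `embedding_manager.py` created it. The function names below are hypothetical.

```python
import math


def cosine_from_squared_l2(distance: float) -> float:
    """Cosine similarity implied by a squared L2 distance between unit vectors.

    For unit vectors a and b: ||a - b||^2 = 2 - 2 * cos(a, b).
    """
    return 1.0 - distance / 2.0


def cosine_from_cosine_distance(distance: float) -> float:
    """Cosine similarity for a collection that uses the cosine space."""
    return 1.0 - distance


# Sanity check on two hand-made unit vectors, 60 degrees apart
a = (1.0, 0.0)
b = (math.cos(math.pi / 3), math.sin(math.pi / 3))
squared_l2 = sum((x - y) ** 2 for x, y in zip(a, b))
cosine = sum(x * y for x, y in zip(a, b))
assert math.isclose(cosine_from_squared_l2(squared_l2), cosine)

# A distance reported by the notebook, read under the l2 assumption
print(f"{cosine_from_squared_l2(0.9771):.2f}")
```

Under that assumption, a distance of 0.9771 corresponds to a cosine similarity of about 0.51, and distances slightly above 1 no longer yield negative percentages. In Chroma versions that configure the metric through collection metadata, inspecting `collection.metadata` for an `hnsw:space` entry is one way to see which metric is in use.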


## 5. Simple Query Tests

In [8]:
# Test 1: very simple, generic query
test_query("action movie", n_results=3)


🔍 QUERY: 'action movie'

✅ 3 results found


────────────────────────────────────────────────────────────────────────────────
🎬 RESULT #1
📊 Similarity: 2.3% (distance: 0.9771)
🆔 ID: ama5445
📽️ Title: Dark Disciple
💾 Database: movie.db
📊 Table: amazon_prime_titles

📝 Description:
   Action and Thriller collide in this amazing, indie film. As an ominous assassin targets a small, seaside town, police soon discover he is virtually unstoppable. But with the body count rising, Detective Steve Teal knows he must confront the killer in a climactic clash that will have you on the edge of your seat!

───────────────────────────────────────────────────────────────────

{'ids': [['ama5445', 'net0801', 'ama8899']],
 'embeddings': None,
 'documents': [['Action and Thriller collide in this amazing, indie film. As an ominous assassin targets a small, seaside town, police soon discover he is virtually unstoppable. But with the body count rising, Detective Steve Teal knows he must confront the killer in a climactic clash that will have you on the edge of your seat!',
   'A group of mixed martial arts fighters stars in this action thriller that follows a quartet of brawlers as they prepare for a major underground event.',
   'Tony Jaa, the fighting superstar "destined for film\'s martial arts pantheon," (New York Daily News) electrifies as a religious young warrior who swears an oath of peace. But when a gangster steals the head of Ong-Bak, his village\'s deity, Ting heads for Bangkok to get it back. In a film Time Magazine calls "exhilarating" with relentless, fever-pitched action free of stunt ...']],
 'uris': None,
 'included': ['metadatas', 'documents', 

In [9]:
# Test 2: space query (as in your example)
test_query("space action adventure science fiction", n_results=5)


🔍 QUERY: 'space action adventure science fiction'

✅ 5 results found


────────────────────────────────────────────────────────────────────────────────
🎬 RESULT #1
📊 Similarity: -0.6% (distance: 1.0062)
🆔 ID: dis0474
📽️ Title: Cosmos: A Spacetime Odyssey
💾 Database: movie.db
📊 Table: disney_plus_titles

📝 Description:
   A 13-part adventure across the universe of space and time.

────────────────────────────────────────────────────────────────────────────────
🎬 RESULT #2
📊 Similarity: -1.0% (distance: 1.0097)
🆔 ID: ama2527
📽️ Title: Explorers
💾 Database: movie.db
📊 Table: amazon_prime_titles

📝 Description:

{'ids': [['dis0474', 'ama2527', 'net5047', 'net3187', 'net4623']],
 'embeddings': None,
 'documents': [['A 13-part adventure across the universe of space and time.',
   'This adventurous space tale stars Ethan Hawke and young star River Phoenix as misfit best friends whose dreams of space travel become a reality when they create an interplanetary spacecraft in their homemade laboratory and embark on a secret adventure to another galaxy where they find that things are not always as different as they seem.',
   'Orbiting above a planet on the brink of war, scientists test a device to solve an energy crisis and end up face-to-face with a dark alternate reality.',
   "With humankind's future at stake, a group of scientists and a powerful telepath venture into the void aboard a spaceship full of secrets.",
   'Travel the vast Skylander universe in this animated series as a ragtag group of academy graduates build trust and heart in their fight against evil.']],
 'uris': None,
 'included': ['

In [10]:
# Test 3: descriptive query
test_query("A detective investigating a murder mystery", n_results=5)


🔍 QUERY: 'A detective investigating a murder mystery'

✅ 5 results found


────────────────────────────────────────────────────────────────────────────────
🎬 RESULT #1
📊 Similarity: 35.1% (distance: 0.6487)
🆔 ID: net5390
📽️ Title: Against the Tide
💾 Database: movie.db
📊 Table: netflix_titles

📝 Description:
   A detective and a psychologist investigating a string of murders form a crime-solving team with the novelist whose work inspired the killings.

────────────────────────────────────────────────────────────────────────────────
🎬 RESULT #2
📊 Similarity: 32.6% (distance: 0.6743)
🆔 ID: ama5371
📽️ Title: Morti's Law

{'ids': [['net5390', 'ama5371', 'ama9348', 'net8683', 'net3299']],
 'embeddings': None,
 'documents': [['A detective and a psychologist investigating a string of murders form a crime-solving team with the novelist whose work inspired the killings.',
   'Two unequal detectives need to solve the mysterious murder case of a hotel manager. But the deeper they dig the more both of them realize that the case has gotten much more personally then they thought.',
   'An unscrupulous police detective corners the lead suspect of a murder investigation in executing a sinister plan.',
   'While investigating a series of murders and the nightclub that links them, a detective’s case takes an alarming turn when his wife goes missing.',
   "A reporter must hunt for the truth behind a strange murder after she crosses paths with a young cop and becomes the investigation's prime suspect."]],
 'uris': None,
 'included': ['metadatas', 'documents', 'distances'],
 'data': None,
 'metadatas': [[{'table': 'ne

In [11]:
# Test 4: Romance
test_query("romantic love story", n_results=5)


🔍 QUERY: 'romantic love story'

✅ 5 results found


────────────────────────────────────────────────────────────────────────────────
🎬 RESULT #1
📊 Similarity: 17.2% (distance: 0.8283)
🆔 ID: ama9634
📽️ Title: Pretty Woman
💾 Database: movie.db
📊 Table: amazon_prime_titles

📝 Description:
   A classic rags-to-riches love story.

────────────────────────────────────────────────────────────────────────────────
🎬 RESULT #2
📊 Similarity: 16.5% (distance: 0.8348)
🆔 ID: ama7195
📽️ Title: Preetam
💾 Database: movie.db
📊 Table: amazon_prime_titles

📝 Description:
   A tale of love that comes alive through the beauty of

{'ids': [['ama9634', 'ama7195', 'ama8741', 'net7462', 'ama3250']],
 'embeddings': None,
 'documents': [['A classic rags-to-riches love story.',
   'A tale of love that comes alive through the beauty of Konkan between a dark skinned guy who falls head over heels in love with the most beautiful girl in his village. Will he succeed in completing his love story...',
   'A beautiful love story that travels through the rain. The essence of this story is the misconception of the hero about the situational care and concern showed by the heroine as love and the bond he developed towards her with the illusion of conversations.',
   'A modern love story is connected to an ancient folk tale as star-crossed lovers from different social classes are kept apart by their families.',
   'A unique tale of romance, with three couples who have very distinct views on what everlasting love is.']],
 'uris': None,
 'included': ['metadatas', 'documents', 'distances'],
 'data': None,
 'metadatas': [[{'title': 'P

## 6. Tests with Filters

In [12]:
# Test with a filter on a specific table
test_query(
    "action movie",
    n_results=5,
    where_filter={"table": "netflix_titles"}
)


🔍 QUERY: 'action movie'
🎯 Filter: {'table': 'netflix_titles'}

✅ 5 results found


────────────────────────────────────────────────────────────────────────────────
🎬 RESULT #1
📊 Similarity: -4.8% (distance: 1.0475)
🆔 ID: net0801
📽️ Title: Never Back Down 2: The Beatdown
💾 Database: movie.db
📊 Table: netflix_titles

📝 Description:
   A group of mixed martial arts fighters stars in this action thriller that follows a quartet of brawlers as they prepare for a major underground event.

────────────────────────────────────────────────────────────────────────────────
🎬 RESULT #2
📊 Similarity: -5.9% (distance: 1.0591)
🆔 ID: 

{'ids': [['net0801', 'net8171', 'net3804', 'net0215', 'net8307']],
 'embeddings': None,
 'documents': [['A group of mixed martial arts fighters stars in this action thriller that follows a quartet of brawlers as they prepare for a major underground event.',
   "This high-octane thriller tells the story of a man on a mission to reclaim what was taken from him and the agent who's determined to stop him.",
   'Leveraging his ability to withstand pain, a young man trains to follow in the footsteps of his martial-arts hero in this high-action, meta comedy.',
   'Based on a true story, this action film follows an incident that stunned a nation in the early 1990s. In Mumbai, India, the notorious gangster Maya holds off veteran cop Khan and a force of more than 200 policemen in a six-hour bloody gunfight.',
   'Jackie Chan and Jet Li star in this rousing adventure about a martial arts movie fan who finds a mystical staff that transports him to ancient China.']],
 'uris': None,
 'included': ['m
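
The filter above matches a single table by equality. To search several tables at once, Chroma's `where` syntax also offers an `$in` operator in recent versions; its availability in the installed version is an assumption worth checking. The helper below is a minimal sketch with a hypothetical name, `build_table_filter`, that returns a dict suitable for the `where_filter` argument of `test_query`.

```python
from typing import Optional, Sequence


def build_table_filter(tables: Optional[Sequence[str]]) -> Optional[dict]:
    """Build a `where` filter restricting results to the given tables."""
    if not tables:
        return None  # no filter: search the whole collection
    if len(tables) == 1:
        return {"table": tables[0]}  # simple equality, as in the cell above
    return {"table": {"$in": list(tables)}}  # match any of the listed tables


print(build_table_filter(["netflix_titles"]))
print(build_table_filter(["netflix_titles", "disney_plus_titles"]))
```

For example, `test_query("action movie", where_filter=build_table_filter(["netflix_titles", "disney_plus_titles"]))` would restrict the search to those two tables.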

## 7. Metadata Check

In [13]:
# Get a sample of documents to check metadata structure
print("\n" + "="*60)
print("🔍 METADATA CHECK")
print("="*60)

# Only the first 10 documents are sampled, so the tables and databases
# listed below are not an exhaustive inventory of the collection
sample = collection.get(limit=10)

print(f"\n📊 Sample of {len(sample['ids'])} documents:\n")

# Check which tables/databases are present
tables = set()
databases = set()

for metadata in sample['metadatas']:
    if 'table' in metadata:
        tables.add(metadata['table'])
    if 'database' in metadata:
        databases.add(metadata['database'])

print(f"📋 Tables present: {sorted(tables)}")
print(f"💾 Databases present: {sorted(databases)}")

# Display sample metadata
print("\n🔍 Example metadata (first document):\n")
if sample['metadatas']:
    print(json.dumps(sample['metadatas'][0], indent=2))


🔍 METADATA CHECK

📊 Sample of 10 documents:

📋 Tables present: ['amazon_prime_titles']
💾 Databases present: ['movie.db']

🔍 Example metadata (first document):

{
  "table": "amazon_prime_titles",
  "title": "The Grand Seduction",
  "database": "movie.db"
}
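
The check above looks at 10 documents only, so it reports a single table even though the query results elsewhere in this notebook come from `netflix_titles` and `disney_plus_titles` as well. Counting values over all metadata records gives the full picture. The sketch below uses a hypothetical helper, `count_by_key`, and a few stand-in records taken from titles shown in this notebook.

```python
from collections import Counter
from typing import Iterable, Mapping


def count_by_key(metadatas: Iterable[Mapping], key: str) -> Counter:
    """Count how many metadata records carry each value of `key`."""
    return Counter(m.get(key, "N/A") for m in metadatas if m is not None)


# Stand-in records shaped like the metadata shown above
sample_metadatas = [
    {"table": "amazon_prime_titles", "title": "The Grand Seduction", "database": "movie.db"},
    {"table": "netflix_titles", "title": "Against the Tide", "database": "movie.db"},
    {"table": "amazon_prime_titles", "title": "Explorers", "database": "movie.db"},
]
print(count_by_key(sample_metadatas, "table"))
```

Against the real collection, `collection.get(include=["metadatas"])["metadatas"]` should supply every record, at the cost of loading all metadata into memory.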


## 8. Custom Query Test

In [14]:
# Try your own queries here
custom_query = "horror movie haunted house"  # Edit this query
test_query(custom_query, n_results=5)


🔍 QUERY: 'horror movie haunted house'

✅ 5 results found


────────────────────────────────────────────────────────────────────────────────
🎬 RESULT #1
📊 Similarity: 22.8% (distance: 0.7716)
🆔 ID: ama8802
📽️ Title: RiffTrax Live: House on Haunted Hill
💾 Database: movie.db
📊 Table: amazon_prime_titles

📝 Description:
   Yes, horror classic House on Haunted Hill provides a mesmerizing walk down "people actually used to find this SCARY?!?" lane. The Vincent Price horror classic riffed Live! This feature is a parody and contains the original movie combined with a comedic commentary by Mike, Kevin and Bill from RiffTrax (formerly of MST3K aka Mystery Science Theater 3000) .

─────────────────────────────────────────

{'ids': [['ama8802', 'ama8302', 'net6744', 'net1867', 'ama6653']],
 'embeddings': None,
 'documents': [['Yes, horror classic House on Haunted Hill provides a mesmerizing walk down "people actually used to find this SCARY?!?" lane. The Vincent Price horror classic riffed Live! This feature is a parody and contains the original movie combined with a comedic commentary by Mike, Kevin and Bill from RiffTrax (formerly of MST3K aka Mystery Science Theater 3000) .',
   'When a group of people venture into a haunted house, paranormal activity unfolds and evil is unleashed from deep within.',
   'Possessed lovers, witches, haunted houses and more bring tales of horror to the screen in this anthology series.',
   'A group of daring teens finds themselves in a fight for their lives inside a haunted house when a sinister spirit crashes their Halloween party.',
   "When a crew is hired to shoot a horror film, it's lights, camera, action...and murder. A creepy old house, a reclusive grandmother, and

## 9. Testing the Tool Function (as in albert_v7)

In [15]:
# Reproduce exactly the logic of the semantic_search tool
def semantic_search_tool(query: str, n_results: int = 5, table_filter: str = None) -> str:
    """Exact replica of the tool in albert_v7.py"""
    try:
        # Get or create ChromaDB collection
        os.makedirs(CHROMA_PATH, exist_ok=True)
        client = chromadb.PersistentClient(path=CHROMA_PATH)
        
        openai_ef = embedding_functions.OpenAIEmbeddingFunction(
            api_key=OPENAI_API_KEY,
            model_name="text-embedding-3-small"
        )
        
        collection = client.get_or_create_collection(
            name="movie_descriptions",
            embedding_function=openai_ef
        )
        
        # Build filter if specified
        where_filter = None
        if table_filter:
            where_filter = {"table": table_filter}
        
        # Query collection
        results = collection.query(
            query_texts=[query],
            n_results=n_results,
            where=where_filter
        )
        
        # Format results
        formatted_results = []
        if results['ids'] and len(results['ids'][0]) > 0:
            for i in range(len(results['ids'][0])):
                formatted_results.append({
                    "id": results['ids'][0][i],
                    "title": results['metadatas'][0][i].get('title', 'Unknown'),
                    "description": results['documents'][0][i],
                    "database": results['metadatas'][0][i].get('database', 'unknown'),
                    "table": results['metadatas'][0][i].get('table', 'unknown'),
                    "similarity_score": 1 - results['distances'][0][i] if 'distances' in results else None
                })
        
        return json.dumps(formatted_results, indent=2, default=str)
    
    except Exception as e:
        return json.dumps({"error": f"Semantic search error: {str(e)}"})

# Test the tool
print("\n" + "="*60)
print("🧪 TESTING THE SEMANTIC_SEARCH TOOL (as in albert_v7)")
print("="*60)

result_json = semantic_search_tool("space action adventure", n_results=5)
result = json.loads(result_json)

if isinstance(result, list) and len(result) > 0:
    print(f"\n✅ Tool returned {len(result)} results\n")
    for i, movie in enumerate(result, 1):
        print(f"{i}. {movie['title']} (similarity: {movie['similarity_score']:.2%})")
elif isinstance(result, dict) and 'error' in result:
    print(f"\n❌ ERROR: {result['error']}")
else:
    print("\n❌ No results")

print(f"\n📋 Full JSON:\n{result_json}")


🧪 TESTING THE SEMANTIC_SEARCH TOOL (as in albert_v7)

✅ Tool returned 5 results

1. GT Serie 1 (similarity: -1.89%)
2. Carmen Sandiego: To Steal or Not to Steal (similarity: -3.65%)
3. LEGO City Adventures (similarity: -5.54%)
4. Skylanders Academy (similarity: -5.92%)
5. Last Action Hero (similarity: -7.63%)

📋 Full JSON:
[
  {
    "id": "ama2281",
    "title": "GT Serie 1",
    "description": "Global Adventure Trip",
    "database": "movie.db",
    "table": "amazon_prime_titles",
    "similarity_score": -0.018930912017822266
  },
  {
    "id": "net2828",
    "title": "Carmen Sandiego: To Steal or Not to Steal",
    "description": "You drive the action in this interactive adventure, helping Carmen save Ivy and Zack when V.I.L.E. captures them during a heist in Shanghai.",
    "database": "movie.db",
    "table": "netflix_titles",
    "similarity_score": -0.03649115562438965
  },
  {
    "id": "ama1879",
    "title": "LEGO City Adventures",
    "description": "LEGO\u00a

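The tool returns one of three JSON shapes: a list of hits, an empty list, or an `{"error": ...}` object, and `similarity_score` can be `null` when no distance is available. The sketch below handles all of these when rendering a summary. `summarize_tool_output` is a hypothetical helper, not part of `albert_v7.py`.

```python
import json
from typing import List


def summarize_tool_output(result_json: str) -> List[str]:
    """Turn the tool's JSON string into printable summary lines."""
    payload = json.loads(result_json)
    if isinstance(payload, dict) and "error" in payload:
        return [f"ERROR: {payload['error']}"]
    if not payload:
        return ["No results"]
    lines = []
    for rank, movie in enumerate(payload, 1):
        score = movie.get("similarity_score")
        # The tool emits null (None) when no distance is available
        score_text = f"{score:.2%}" if score is not None else "n/a"
        lines.append(f"{rank}. {movie.get('title', 'Unknown')} (similarity: {score_text})")
    return lines


for line in summarize_tool_output('[{"title": "GT Serie 1", "similarity_score": null}]'):
    print(line)
```
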
## 10. Full Diagnostic

In [16]:
print("\n" + "="*80)
print("🔬 FULL DIAGNOSTIC")
print("="*80)

# 1. Collection stats
count = collection.count()
print("\n1️⃣ Collection Stats:")
print(f"   - Number of documents: {count}")
print(f"   - Collection empty: {count == 0}")

# 2. ChromaDB path
print("\n2️⃣ Paths:")
print(f"   - ChromaDB: {CHROMA_PATH}")
print(f"   - Exists: {os.path.exists(CHROMA_PATH)}")
if os.path.exists(CHROMA_PATH):
    files = os.listdir(CHROMA_PATH)
    print(f"   - Files in directory: {len(files)}")

# 3. API Key
print("\n3️⃣ OpenAI API:")
print(f"   - API Key present: {bool(OPENAI_API_KEY)}")
print(f"   - API Key length: {len(OPENAI_API_KEY) if OPENAI_API_KEY else 0}")

# 4. Sample query
print("\n4️⃣ Test Query:")
has_results = False  # stays False if the query raises
try:
    test_results = collection.query(
        query_texts=["action"],
        n_results=1
    )
    has_results = bool(test_results['ids']) and len(test_results['ids'][0]) > 0
    print(f"   - Query 'action' returns results: {has_results}")
except Exception as e:
    print(f"   - Error during query: {e}")

# 5. Conclusion
print("\n" + "="*80)
if count == 0:
    print("⚠️ PROBLEM: Collection is empty!")
    print("   → You must embed your movies with embedding_manager.py")
elif not OPENAI_API_KEY:
    print("⚠️ PROBLEM: No OpenAI API key!")
    print("   → Check your .env file")
elif not has_results:
    print("⚠️ PROBLEM: The sample query returned no results!")
    print("   → Check the query error above and the embedding function")
else:
    print("✅ Everything looks correct!")
    print("   → Semantic search should work")
print("="*80)


🔬 FULL DIAGNOSTIC

1️⃣ Collection Stats:
   - Number of documents: 19925
   - Collection empty: False

2️⃣ Paths:
   - ChromaDB: ..\data\vector_database
   - Exists: True
   - Files in directory: 2

3️⃣ OpenAI API:
   - API Key present: True
   - API Key length: 164

4️⃣ Test Query:
   - Query 'action' returns results: True

✅ Everything looks correct!
   → Semantic search should work
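
The conclusion logic in the diagnostic cell can also be kept as a small pure function, which makes it easy to test and to extend with further checks. This is a minimal sketch: `diagnose` and its messages are hypothetical, and the inputs are the values the diagnostic cell already computes.

```python
from typing import List


def diagnose(count: int, has_api_key: bool, query_ok: bool) -> List[str]:
    """Return a list of detected problems; an empty list means all checks passed."""
    problems = []
    if count == 0:
        problems.append("Collection is empty: embed the movies first")
    if not has_api_key:
        problems.append("No OpenAI API key: check the .env file")
    if count > 0 and has_api_key and not query_ok:
        problems.append("Sample query returned nothing or raised an error")
    return problems


# Values reported by the diagnostic run above
print(diagnose(count=19925, has_api_key=True, query_ok=True))
```

Unlike an `if`/`elif` chain, this version reports every failed check at once, so an empty collection and a missing key are both surfaced in a single run.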
