# RAG Flow Testing Notebook

This notebook runs the main RAG flow in simple steps:
1) Start a new session (create user),
2) Upload a document (which triggers embedding on the backend),
3) Ask questions, both with no chat history (which creates a new chat) and with an existing chat (which reuses the same chat).

Run the cells in order, and make sure the backend is running at http://127.0.0.1:8011.
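Before running the cells, it can help to confirm that the backend is reachable. The sketch below uses a hypothetical helper (`backend_is_up` is not part of the project). It relies only on the standard library and assumes, as the upload cell's health check does, that the backend answers `GET /health` with status 200 when it is ready.

```python
import http.client
import urllib.request


def backend_is_up(base_url: str, timeout: float = 2.0) -> bool:
    """Return True if GET {base_url}/health answers with HTTP 200 (hypothetical helper)."""
    url = f"{base_url.rstrip('/')}/health"  # /health is the endpoint the upload cell probes
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError (non-2xx status) and socket timeouts are all OSError subclasses;
        # HTTPException covers malformed HTTP responses
        return False


print("Backend up:", backend_is_up("http://127.0.0.1:8011"))
```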

In [1]:
# Setup: imports and helpers
import requests
from pprint import pprint
import time
import os

BASE_URL = "http://127.0.0.1:8011"

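def print_response(r):
    """Print the status code and the JSON body (or the raw text if the body is not JSON)."""
    print(f"Status: {r.status_code}")
    try:
        pprint(r.json())
    except ValueError:  # requests' JSON decode errors subclass ValueError
        print(r.text)
    print('-' * 60)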

print('Setup complete. Base URL:', BASE_URL)

Setup complete. Base URL: http://127.0.0.1:8011
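The setup cell imports `os` but hard-codes `BASE_URL`. One option, sketched below, is to read the URL from an environment variable and fall back to the local default, so the same notebook can target another host without editing code. The variable name `RAG_BASE_URL` is a hypothetical choice; the project does not define it.

```python
import os

DEFAULT_BASE_URL = "http://127.0.0.1:8011"


def resolve_base_url(env_var: str = "RAG_BASE_URL") -> str:
    """Return the backend URL from `env_var` (hypothetical name), or the local default if unset or blank."""
    value = os.environ.get(env_var, "").strip()
    # Drop a trailing slash so f"{BASE_URL}/users/" never produces "//users/"
    return (value or DEFAULT_BASE_URL).rstrip("/")


BASE_URL = resolve_base_url()
print("Base URL:", BASE_URL)
```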


## 1) Start a new session (create user)

In [2]:
username = 'rag_test_user'
resp = requests.post(f'{BASE_URL}/users/', json={'username': username})
print_response(resp)

if resp.status_code in (200, 201):  # accept both success codes
    user_id = resp.json().get('id')  # your API returns 'id', not 'user_id'
    print('Created user_id:', user_id)
else:
    raise SystemExit('Failed to create user - stop testing')


Status: 201
{'created_at': '2025-10-24T11:54:35.608132',
 'id': '5310b735-7619-4905-97df-956d77b30709',
 'total_chats': 0}
------------------------------------------------------------
Created user_id: 5310b735-7619-4905-97df-956d77b30709
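This cell and the upload cell both have to cope with the backend naming IDs differently (`id` for users, `document_id` for documents). A small hypothetical helper, `first_present`, makes that lookup explicit. Unlike an `or` chain, it skips only keys that are missing or `None`, so a present but falsy value is still returned.

```python
from typing import Any, Mapping, Optional


def first_present(payload: Mapping[str, Any], *keys: str) -> Optional[Any]:
    """Return the value of the first key in `keys` that exists in `payload` and is not None."""
    for key in keys:
        value = payload.get(key)
        if value is not None:
            return value
    return None


# Response bodies copied from the outputs in this notebook
user_payload = {"created_at": "2025-10-24T11:54:35.608132",
                "id": "5310b735-7619-4905-97df-956d77b30709",
                "total_chats": 0}
upload_payload = {"document_id": "c30316b8-5536-4061-be7a-10c1737d5979",
                  "filename": "okk.pdf", "total_chunks": 3}

print(first_present(user_payload, "id", "user_id"))
print(first_present(upload_payload, "document_id", "id"))
```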


## 2) Prepare and upload a document (triggers embedding on backend)

In [3]:
# Browse and select a file to upload
import os
from pathlib import Path

# Enter a file path when prompted, or press Enter to choose from the current directory
file_path = input("Enter the full path to your file (or press Enter to browse): ").strip()

# If no path provided, show current directory files
if not file_path:
    print("\nCurrent directory:", os.getcwd())
    print("\nAvailable files in current directory:")
    files = [f for f in os.listdir('.') if os.path.isfile(f) and not f.endswith('.ipynb')]
    
    if not files:
        raise SystemExit("No files found in current directory. Please provide a file path.")
    
    for i, f in enumerate(files, 1):
        size = os.path.getsize(f)
        print(f"{i}. {f} ({size:,} bytes)")
    
    file_choice = input("\nEnter file number or full path: ").strip()
    
    # Check if user entered a number (selecting from list) or a path
    if file_choice.isdigit() and 1 <= int(file_choice) <= len(files):
        file_path = files[int(file_choice) - 1]
    else:
        file_path = file_choice

# Convert to absolute path if it's relative
file_path = os.path.abspath(file_path)

# Validate that the path points to an existing regular file (not a directory)
if not os.path.isfile(file_path):
    raise SystemExit(f"❌ File not found: {file_path}\nMake sure the path is correct!")

print(f"\n✅ Selected file: {file_path}")
print(f"   File size: {os.path.getsize(file_path):,} bytes")

# Check if backend is reachable before uploading
try:
    health_check = requests.get(f'{BASE_URL}/health', timeout=2)
    print(f"   Backend status: {health_check.status_code}")
except requests.RequestException as e:
    raise SystemExit(f"❌ Cannot reach backend at {BASE_URL}. Is it running?\nError: {e}")

# Upload the document to the backend
# NOTE: user_id is sent as a multipart form field, not in the URL path
print(f"\n📤 Uploading to: {BASE_URL}/documents/upload")
with open(file_path, 'rb') as fh:
    upload_files = {'file': (os.path.basename(file_path), fh)}
    form_data = {'user_id': user_id}  # Sent as a form field alongside the file
    upload_resp = requests.post(f'{BASE_URL}/documents/upload', files=upload_files, data=form_data)
print_response(upload_resp)

if upload_resp.status_code in (200, 201):
    # The backend returns the new ID under 'document_id' (fall back to 'id' just in case)
    resp_data = upload_resp.json()
    document_id = resp_data.get('document_id') or resp_data.get('id')
    print(f'✅ Uploaded document_id: {document_id}')
else:
    # Show the raw body: error responses are not guaranteed to be JSON
    error_detail = upload_resp.text or "No error details"
    raise SystemExit(f'❌ Document upload failed with status {upload_resp.status_code}\nDetails: {error_detail}')


Current directory: c:\Users\BRT.MIBRAHIM\OneDrive - CMA CGM\fastapi_mini_project\scripts

Available files in current directory:
1. okk.pdf (907,936 bytes)

✅ Selected file: c:\Users\BRT.MIBRAHIM\OneDrive - CMA CGM\fastapi_mini_project\scripts\okk.pdf
   File size: 907,936 bytes
   Backend status: 200

📤 Uploading to: http://127.0.0.1:8011/documents/upload
Status: 200
{'document_id': 'c30316b8-5536-4061-be7a-10c1737d5979',
 'file_type': '.pdf',
 'filename': 'okk.pdf',
 'message': 'Document processed and added to shared storage. 3 new chunks '
            'added. Total user chunks: 3',
 'total_chunks': 3}
------------------------------------------------------------
✅ Uploaded document_id: c30316b8-5536-4061-be7a-10c1737d5979
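The file-selection logic in this cell can be factored into two testable functions. This is a sketch rather than the notebook's own code: `list_upload_candidates` and `resolve_choice` are hypothetical names, they use `pathlib` instead of `os.path`, and the listing is sorted so menu numbers stay stable between runs (`os.listdir` returns entries in arbitrary order).

```python
from pathlib import Path
from typing import List


def list_upload_candidates(directory: Path) -> List[Path]:
    """Regular files in `directory`, excluding notebooks, sorted by name."""
    return sorted(p for p in directory.iterdir() if p.is_file() and p.suffix != ".ipynb")


def resolve_choice(choice: str, candidates: List[Path]) -> Path:
    """Map a 1-based menu number to a candidate; treat anything else as a path."""
    choice = choice.strip()
    if choice.isdigit() and 1 <= int(choice) <= len(candidates):
        return candidates[int(choice) - 1].resolve()
    # Not a valid menu number: interpret it as a (possibly relative or ~-prefixed) path
    return Path(choice).expanduser().resolve()


for i, path in enumerate(list_upload_candidates(Path.cwd()), 1):
    print(f"{i}. {path.name} ({path.stat().st_size:,} bytes)")
```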

In [4]:
# Optionally wait a few seconds for embedding/background processing (adjust if your backend is slow)
print('Waiting 3 seconds for backend processing (if any)...')
time.sleep(3)

# Get the documents list to verify indexing
# NOTE: user_id is sent as a query parameter, not in the URL path
docs_resp = requests.get(f'{BASE_URL}/documents', params={'user_id': user_id})
print("\n📚 User's documents:")
print_response(docs_resp)

if docs_resp.status_code == 200:
    docs = docs_resp.json()
    print(f"✅ Found {len(docs)} document(s) for this user")
else:
    print("⚠️ Could not retrieve documents list")

Waiting 3 seconds for backend processing (if any)...

📚 User's documents:
Status: 200
[{'document_id': 'c30316b8-5536-4061-be7a-10c1737d5979',
  'file_size': 907936,
  'file_type': '.pdf',
  'filename': 'okk.pdf',
  'total_chunks': 3,
  'upload_date': '2025-10-24T11:54:40.767000'}]
------------------------------------------------------------
✅ Found 1 document(s) for this user
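The fixed `time.sleep(3)` in this cell can wait too long or not long enough, depending on how fast embedding runs. A common alternative is to poll until a condition holds or a deadline passes. The sketch below is generic; `wait_until` is a hypothetical helper and the default interval and timeout are arbitrary.

```python
import time
from typing import Callable


def wait_until(predicate: Callable[[], bool], timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Call `predicate` until it returns True or `timeout` seconds pass.

    Returns True on success, False if the deadline was reached first.
    """
    deadline = time.monotonic() + timeout  # monotonic clock is unaffected by wall-clock changes
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Simulated readiness check that succeeds on the third call
attempts = iter([False, False, True])
print(wait_until(lambda: next(attempts), timeout=5, interval=0.1))  # True
```

In this notebook, the predicate could call `GET /documents` with `params={'user_id': user_id}` and return True once an entry's `document_id` equals the uploaded `document_id`.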


## 3) Ask me anything: no previous chat (creates a new chat)

In [5]:
# Send a first question WITHOUT chat_id - this creates a new chat automatically
question = 'What does the uploaded document say about embeddings?'

print(f'❓ You asked: {question}\n')

# Get list of chats BEFORE sending message (to compare after)
chats_before = requests.get(f'{BASE_URL}/chat/message/collection', params={'user_id': user_id})
chat_ids_before = [c['chatId'] for c in chats_before.json().get('chats', [])] if chats_before.status_code == 200 else []

# Send the message (creates new chat automatically)
send_resp = requests.post(
    f'{BASE_URL}/orchestrator/chat',
    json={'query': question, 'user_id': user_id}
)

if send_resp.status_code == 200:
    resp_data = send_resp.json()
    
    # Extract the bot's answer from the messages array
    messages = resp_data.get('messages', [])
    bot_message = next((m for m in messages if m.get('userType') == 'bot'), None)
    
    if bot_message:
        answer = bot_message.get('content', '')
        print(f'ü§ñ Agent Answer:\n{answer}\n')
        print('-' * 80)
    
    # Get chat_id by checking which new chat was created
    time.sleep(0.5)  # Short pause so the backend can persist the new chat (tune if needed)
    chats_after = requests.get(f'{BASE_URL}/chat/message/collection', params={'user_id': user_id})
    
    if chats_after.status_code == 200:
        chat_ids_after = [c['chatId'] for c in chats_after.json().get('chats', [])]
        new_chats = [cid for cid in chat_ids_after if cid not in chat_ids_before]
        
        if new_chats:
            chat_id = new_chats[0]  # Get the newly created chat
            print(f'✅ New Chat ID: {chat_id}')
        else:
            # No new chat detected: fall back to the first chat in the list
            # (assumes the API returns chats newest first)
            all_chats = chats_after.json().get('chats', [])
            if all_chats:
                chat_id = all_chats[0]['chatId']
                print(f'✅ Using most recent Chat ID: {chat_id}')
            else:
                raise SystemExit('Could not determine chat_id')
    else:
        print(f'⚠️ Failed to retrieve chat list (status {chats_after.status_code})')
        print('Response:', chats_after.text)
        raise SystemExit('Failed to retrieve chat list')
else:
    print(f'❌ Failed to send message (status {send_resp.status_code})')
    print_response(send_resp)
    raise SystemExit('Failed to create chat')

❓ You asked: What does the uploaded document say about embeddings?

🤖 Agent Answer:
The uploaded document does not contain any information about embeddings.

--------------------------------------------------------------------------------
✅ New Chat ID: 62afe0a9-f4c6-4ee4-bac6-bc0a9d6e8b5e
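The answer extraction and the before/after comparison of chat lists in this cell are plain data transformations, so they can be pulled out into pure functions and reused in the next step. A sketch, assuming the response shapes this cell relies on (`messages` entries with `userType` and `content`; chat lists whose entries carry `chatId`). The function names are hypothetical.

```python
from typing import Any, Dict, Iterable, List, Optional


def extract_bot_answer(payload: Dict[str, Any]) -> Optional[str]:
    """Return the content of the first 'bot' message in an orchestrator response, if any."""
    for message in payload.get("messages", []):
        if message.get("userType") == "bot":
            return message.get("content", "")
    return None


def find_new_chat_id(before_ids: Iterable[str], chats_after: List[Dict[str, Any]]) -> Optional[str]:
    """Return the first chatId in `chats_after` that was not in `before_ids`."""
    seen = set(before_ids)  # set membership is O(1), unlike scanning a list each time
    for chat in chats_after:
        chat_id = chat.get("chatId")
        if chat_id is not None and chat_id not in seen:
            return chat_id
    return None


# Illustrative payload shaped like the ones this cell handles
reply = {"messages": [
    {"userType": "user", "content": "What does the uploaded document say about embeddings?"},
    {"userType": "bot", "content": "The uploaded document does not contain any information about embeddings."},
]}
print(extract_bot_answer(reply))
print(find_new_chat_id([], [{"chatId": "62afe0a9-f4c6-4ee4-bac6-bc0a9d6e8b5e"}]))
```

Returning `None` instead of raising leaves the fallback policy (such as the "most recent chat" branch in the cell) to the caller.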


## 4) Ask again: reuse the existing chat (append to the same chat)

In [7]:
# Send another message to the SAME chat (reuse chat_id)
follow_up = 'what was my message before this one?'

print(f'❓ You asked: {follow_up}')
print(f'   (Using chat_id: {chat_id})\n')

# Pass chat_id as a query parameter to append to the existing chat
send2 = requests.post(
    f'{BASE_URL}/orchestrator/chat',
    params={'chat_id': chat_id},
    json={'query': follow_up, 'user_id': user_id}
)

if send2.status_code == 200:
    resp_data = send2.json()
    
    # Extract the bot's answer
    messages = resp_data.get('messages', [])
    bot_message = next((m for m in messages if m.get('userType') == 'bot'), None)
    
    if bot_message:
        answer = bot_message.get('content', '')
        print(f'🤖 Agent Answer:\n{answer}\n')
        print('-' * 80)

    print('✅ Message delivered to existing chat')
else:
    print(f'❌ Message delivery failed (status {send2.status_code})')
    print_response(send2)

# Show full conversation history (optional)
print('\n\nüìú Full Conversation History:')
hist = requests.get(f'{BASE_URL}/chat/message/chat/{chat_id}')

if hist.status_code == 200:
    all_messages = hist.json()
    print(f'Total messages in chat: {len(all_messages)}\n')
    
    for i, msg in enumerate(all_messages, 1):
        user_msg = msg.get('user_message', '')
        bot_msg = msg.get('assistant_message', '')
        print(f'Message {i}:')
        print(f'  User: {user_msg}')
        print(f'  Bot: {bot_msg[:100]}...' if len(bot_msg) > 100 else f'  Bot: {bot_msg}')
        print()
else:
    print('⚠️ Could not retrieve conversation history')

❓ You asked: what was my message before this one?
   (Using chat_id: 62afe0a9-f4c6-4ee4-bac6-bc0a9d6e8b5e)

🤖 Agent Answer:
You asked, "What does the uploaded document say about embeddings?"

--------------------------------------------------------------------------------
✅ Message delivered to existing chat


📜 Full Conversation History:
Total messages in chat: 3

Message 1:
  User: What does the uploaded document say about embeddings?
  Bot: The uploaded document does not contain any information about embeddings. [Tool: rag_search]

Message 2:
  User: what was my message before this one?
  Bot: I'm sorry, but I don't have the ability to access previous messages or maintain a conversation histo...

Message 3:
  User: what was my message before this one?
  Bot: You asked, "What does the uploaded document say about embeddings?" [Tool: general_chat]

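The history printout can likewise be separated from the HTTP call. A sketch, assuming each history entry carries `user_message` and `assistant_message` keys as the loop in the last cell expects; `format_history` is a hypothetical name, and the 100-character cut-off mirrors the cell.

```python
from typing import Any, Dict, List


def format_history(messages: List[Dict[str, Any]], width: int = 100) -> str:
    """Render chat history as numbered User/Bot pairs, truncating bot replies longer than `width`."""
    lines: List[str] = []
    for i, msg in enumerate(messages, 1):
        user_msg = msg.get("user_message", "")
        bot_msg = msg.get("assistant_message", "")
        if len(bot_msg) > width:
            bot_msg = bot_msg[:width] + "..."
        lines += [f"Message {i}:", f"  User: {user_msg}", f"  Bot: {bot_msg}", ""]
    return "\n".join(lines)


history = [
    {"user_message": "What does the uploaded document say about embeddings?",
     "assistant_message": "The uploaded document does not contain any information about embeddings."},
]
print(f"Total messages in chat: {len(history)}\n")
print(format_history(history))
```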