# RAG Backend API Demo

This notebook demonstrates the complete end-to-end flow of the MVP.

## Features:
1. Document ingestion
2. RAG Chat
3. Multi-Turn Conversations
4. Interview Booking
5. Session Management

**Note:** Make sure to run the backend server before executing this notebook.
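
A quick way to confirm that the server is up before running the cells is a plain TCP connection check. The sketch below is only an illustration: `backend_is_reachable` is a hypothetical helper, not part of the backend, and it only tests that something is listening on the port, not that the API itself is healthy.

```python
import socket
from urllib.parse import urlparse


def backend_is_reachable(base_url: str, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the URL's host and port succeeds."""
    parsed = urlparse(base_url)
    host = parsed.hostname or "localhost"
    # Fall back to the scheme's default port when the URL omits one.
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


print(backend_is_reachable("http://localhost:8000"))
```

If this prints `False`, start the backend first; the remaining cells would otherwise fail with connection errors.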

In [None]:
import requests
from datetime import datetime
from pprint import pprint
API_BASE_URL = "http://localhost:8000"
HEADERS = {"Content-Type": "application/json"}


print(f"API Base URL: {API_BASE_URL}")
print(f" Timestamp: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")

API Base URL: http://localhost:8000
 Timestamp: 2025-11-14 16:17:14


## 1. Initial Setup

In [15]:
from pinecone import Pinecone
import os
from dotenv import load_dotenv

load_dotenv()

PINECONE_API_KEY = os.getenv("PINECONE_API_KEY")
PINECONE_INDEX_NAME = os.getenv("PINECONE_INDEX_NAME")

pc = Pinecone(api_key=PINECONE_API_KEY)

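try:
    index = pc.Index(PINECONE_INDEX_NAME)
    # Warning: this removes every vector in the index's default namespace,
    # so the demo starts from a clean state.
    index.delete(delete_all=True)
    print(f"Index: {PINECONE_INDEX_NAME}")

except Exception as e:
    print(f"⚠️ Error clearing index: {e}")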


Index: rag-documents-cohere


In [None]:
sample_content = """
Product Documentation: AI-Powered Interview Scheduler

Overview:
Our AI-powered interview scheduling platform streamlines the hiring process by automating 
interview coordination, candidate communication, and schedule optimization.

Key Features:
1. Automated Scheduling: Intelligent matching of candidate and interviewer availability
2. Multi-timezone Support: Seamlessly coordinate across different time zones
3. Calendar Integration: Sync with Google Calendar, Outlook, and other major platforms
4. AI-Powered Recommendations: Smart suggestions for optimal interview times
5. Candidate Self-Service: Allow candidates to pick from available slots

Technical Specifications:
- Built with FastAPI and Python 3.11+
- Uses PostgreSQL for data persistence
- Redis for caching and session management
- Pinecone for vector similarity search
- Groq LLM for intelligent conversation
- RESTful API architecture

Interview Types Supported:
- Phone Screening (30 minutes)
- Technical Interview (60 minutes)
- Behavioral Interview (45 minutes)
- Final Round Interview (90 minutes)

Contact Information:
For support, email: support@example.com
Website: https://interview-scheduler.example.com

Current Hiring Roles for AI Engineer Position:

1) Senior AI Engineer - Full-time
    - Seniority: Senior (5+ years)
    - Location: Remote or Hybrid
    - Responsibilities:
      * Design and implement machine learning models for interview matching algorithms
      * Develop NLP pipelines for resume parsing and candidate-job matching
      * Build recommendation systems for optimal interview scheduling
      * Work with LLMs for conversational AI and candidate screening
      * Optimize embedding models and vector search for semantic matching
    - Tech stack: Python, PyTorch/TensorFlow, Transformers, LangChain, Cohere, OpenAI API
    - Requirements: PhD or Masters in CS/AI, strong publication record, experience with production ML systems

2) ML Engineer (Embeddings & RAG) - Full-time
    - Seniority: Mid to Senior (3+ years)
    - Location: Remote
    - Responsibilities:
      * Design and optimize RAG (Retrieval-Augmented Generation) pipelines
      * Fine-tune embedding models for domain-specific search
      * Implement vector databases and similarity search algorithms
      * Evaluate and improve context retrieval quality
      * A/B test different embedding strategies and chunking methods
    - Tech stack: Python, Cohere, Pinecone, Qdrant, sentence-transformers, NumPy
    - Requirements: Strong understanding of embeddings, vector databases, and information retrieval

3) AI Research Engineer - Contract
    - Seniority: Senior (4+ years)
    - Location: Remote
    - Responsibilities:
      * Research state-of-the-art LLM techniques for interview automation
      * Experiment with prompt engineering and chain-of-thought reasoning
      * Develop evaluation metrics for AI-generated responses
      * Prototype new AI features for candidate assessment
    - Tech stack: Python, GPT-4, Claude, Llama, prompt engineering frameworks
    - Requirements: Strong research background, experience with LLM APIs, ability to publish findings

4) MLOps Engineer - Full-time
    - Seniority: Mid to Senior (3+ years)
    - Location: Remote or Hybrid
    - Responsibilities:
      * Build CI/CD pipelines for ML model deployment
      * Monitor model performance and data drift
      * Implement A/B testing infrastructure for ML experiments
      * Optimize inference latency and cost
      * Manage model versioning and experiment tracking
    - Tech stack: Docker, Kubernetes, MLflow, W&B, Prometheus, FastAPI
    - Requirements: Experience deploying ML models to production, strong DevOps skills

5) Data Scientist (AI/ML) - Full-time
    - Seniority: Mid (2+ years)
    - Location: Remote
    - Responsibilities:
      * Analyze user behavior and interview success metrics
      * Build predictive models for interview outcomes
      * Create dashboards and reports on AI system performance
      * Conduct statistical analysis of matching algorithms
    - Tech stack: Python, pandas, scikit-learn, SQL, Jupyter, Tableau
    - Requirements: Strong statistics background, experience with A/B testing, data visualization

How to Apply:
- Send your resume to careers@example.com with "AI Engineer - [Role Name]" in subject
- Include a cover letter explaining your AI/ML experience
- Portfolio or GitHub links showcasing ML projects strongly preferred
- Research papers or technical blog posts are a plus

Interview Process:
1. Initial screening call (30 min)
2. Technical assessment - ML coding challenge
3. System design interview - AI architecture
4. Team interview - cultural fit and collaboration
5. Final round with leadership

Benefits:
- Competitive salary with equity
- Remote-first culture
- Professional development budget
- Conference attendance support
- Health insurance and 401k
"""

with open("sample_document.txt", "w", encoding="utf-8") as f:
    f.write(sample_content)

print("Sample document created: sample_document.txt")


Sample document created: sample_document.txt


## 2. Document Ingestion
Upload a text or PDF file to the system for processing. Ingestion covers these steps:

- Extract text from the document
- Chunk the text
- Generate embeddings
- Store in vector database
- Save metadata in the database (PostgreSQL)
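
To make the `chunk_size` and `chunk_overlap` parameters used in the next cell concrete, here is a minimal sketch of fixed-size chunking with overlap. It is not the backend's implementation: the notebook requests `"semantic"` chunking, whose internals are not shown here. The function name `chunk_text` and the choice of characters as the unit are assumptions made for illustration.

```python
def chunk_text(text: str, chunk_size: int = 500, chunk_overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows that overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    # Each new window starts this many characters after the previous one.
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


chunks = chunk_text("a" * 1200, chunk_size=500, chunk_overlap=50)
print([len(c) for c in chunks])  # [500, 500, 300]
```

The overlap means the last 50 characters of one chunk reappear at the start of the next, so a sentence that straddles a boundary is still fully contained in at least one chunk.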

In [17]:
url = f"{API_BASE_URL}/api/ingest"
print(" Starting document ingestion...")

with open("sample_document.txt", "rb") as f:
    files = {"file": ("sample_document.txt", f, "text/plain")}
    
    data = {
        "chunking_type": "semantic", 
        "chunk_size": 500,
        "chunk_overlap": 50
    }
    
    try:
        response = requests.post(url, files=files, data=data, timeout=30)
    except requests.exceptions.Timeout:
        print(" Request timed out after 30 seconds")
    except requests.exceptions.ConnectionError:
        print(f" Could not connect to API at {API_BASE_URL}")
    else:
        if response.status_code == 200:
            print(" Document ingested successfully!")
            result = response.json()
            print(f" Document ID: {result.get('document_id', 'N/A')}")
            print(f" Filename: {result.get('filename', 'N/A')}")
            print(f" Total Chunks: {result.get('total_chunks', 0)}")
            print(f" Chunking Strategy: {result.get('chunking_strategy', 'N/A')}")
            print(f" Vector Store: {result.get('vector_store', 'N/A')}")
            print(f"Created At: {result.get('created_at', 'N/A')}")
            print(f" Message: {result.get('message', 'N/A')}")
            
            print(f"\n Document is now ready for RAG queries!")
        else:
            print(f" Document ingestion failed with status: {response.status_code}")
            print(f"Response: {response.text}")

 Starting document ingestion...
 Document ingested successfully!
 Document ID: e315ffde-045b-45e8-be7b-092edf5f0933
 Filename: sample_document.txt
 Total Chunks: 3
 Chunking Strategy: semantic
 Vector Store: pinecone
Created At: 2025-11-14T10:52:06.087060
 Message: Document ingested successfully

 Document is now ready for RAG queries!


## 3. RAG Chat
Now that we have documents ingested, let's ask questions using RAG (Retrieval-Augmented Generation).

The system will:
1. Embed your query
2. Search Pinecone for similar document chunks
3. Send the retrieved context and the query to the LLM (Groq)
4. Return an informed answer
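
The retrieval and prompt-assembly steps above can be sketched in plain Python. This is a simplified illustration under stated assumptions: toy 3-dimensional vectors stand in for real embeddings, an in-memory list stands in for Pinecone, and the helper names (`cosine_similarity`, `retrieve`, `build_prompt`) and the prompt wording are hypothetical rather than taken from the backend.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query_vec, indexed_chunks, top_k=5):
    """Return the top_k (score, text) pairs, ranked by cosine similarity."""
    scored = [(cosine_similarity(query_vec, vec), text) for vec, text in indexed_chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


def build_prompt(query, contexts):
    """Combine the retrieved chunks and the user query into one LLM prompt."""
    context_block = "\n\n".join(text for _, text in contexts)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context_block}\n\nQuestion: {query}"
    )


# Toy vectors stand in for real embeddings (assumption for illustration).
indexed_chunks = [
    ([1.0, 0.0, 0.0], "Phone Screening (30 minutes)"),
    ([0.0, 1.0, 0.0], "Built with FastAPI and Python 3.11+"),
    ([0.9, 0.1, 0.0], "Technical Interview (60 minutes)"),
]
top = retrieve([1.0, 0.0, 0.0], indexed_chunks, top_k=2)
print(build_prompt("How long is a phone screening?", top))
```

Here `top_k` plays the same role as the `top_k` field in the chat payload: it caps how many chunks are passed to the LLM as context.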

In [None]:
import uuid
session_id = str(uuid.uuid4())

print(f"New Chat Session ID: {session_id}\n")

url = f"{API_BASE_URL}/api/chat"
payload = {
    "query": "Which roles are u hiring?",
    "session_id": session_id,
    "top_k": 5
}

response = requests.post(url, json=payload, headers=HEADERS)

if response.status_code == 200:
    result = response.json()
    print("Chat response received!\n")
    print(f"Answer:\n{result['answer']}\n")
    
    contexts = result.get('retrieved_contexts', [])
    print(f"Number of sources used: {len(contexts)}")
    print(f"\nSource chunks:")
    for i, ctx in enumerate(contexts, 1):
        print(f"\n  Source {i}:")
        print(f"    File: {ctx.get('filename', 'N/A')}")
        print(f"    Similarity Score: {ctx.get('similarity_score', 0):.4f}")
        print(f"    Text Preview: {ctx.get('chunk_text', '')[:150]}...")
    
    if result.get('booking_detected'):
        print(f"\nBooking detected! ID: {result.get('booking_id', 'N/A')}")

else:
    print(f"Request failed: {response.status_code}")
    print(response.text)

New Chat Session ID: 5ee00b12-823f-4b29-a6c9-7c5a649dead8

Chat response received!

Answer:
We are currently hiring for the following AI Engineer positions:

1. **Senior AI Engineer** - Full-time: Design and implement machine learning models for interview matching algorithms, develop NLP pipelines, and build recommendation systems.
2. **ML Engineer (Embeddings & RAG)** - Full-time: Design and optimize RAG pipelines, fine-tune embedding models, and implement vector databases and similarity search algorithms.
3. **AI Research Engineer** - Contract: Research state-of-the-art LLM techniques for interview automation, experiment with prompt engineering, and develop evaluation metrics for AI-generated responses.
4. **MLOps Engineer** - Full-time: Build CI/CD pipelines for ML model deployment, monitor model performance, and implement A/B testing infrastructure.
5. **Data Scientist (AI/ML)** - Full-time: Analyze user behavior, build predictive models, create dashboards, and conduct statistical 

## 4. Multi-turn Conversation

Continue the conversation with follow-up questions. The system maintains chat history in Redis.
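
As a rough picture of what per-session history involves, the sketch below keeps a capped list of messages for each `session_id`. It uses an in-memory dictionary as a stand-in for Redis. The class name `ChatHistoryStore`, the message shape, and the cap of recent messages are all assumptions for illustration, not the backend's actual schema.

```python
from collections import defaultdict


class ChatHistoryStore:
    """Keeps the most recent messages per session (in-memory stand-in for Redis)."""

    def __init__(self, max_messages: int = 10):
        if max_messages <= 0:
            raise ValueError("max_messages must be positive")
        self.max_messages = max_messages
        self._sessions = defaultdict(list)

    def append(self, session_id: str, role: str, content: str) -> None:
        history = self._sessions[session_id]
        history.append({"role": role, "content": content})
        # Drop the oldest messages once the cap is exceeded.
        excess = len(history) - self.max_messages
        if excess > 0:
            del history[:excess]

    def get(self, session_id: str) -> list[dict]:
        # Return a copy so callers cannot mutate the stored history.
        return list(self._sessions.get(session_id, []))


store = ChatHistoryStore(max_messages=4)
store.append("demo-session", "user", "Which roles are you hiring for?")
store.append("demo-session", "assistant", "We are hiring for five AI roles.")
store.append("demo-session", "user", "What interview types are supported?")
print(store.get("demo-session"))
```

Reusing the same `session_id` across requests, as the next cell does, is what lets the server look up earlier turns and pass them to the LLM alongside the new query.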

In [None]:
payload = {
    "query": "What interview types are supported and how long is each?",
    "session_id": session_id,
    "top_k": 5
}

response = requests.post(f"{API_BASE_URL}/api/chat", json=payload, headers=HEADERS)

if response.status_code == 200:
    result = response.json()
    print(f" Answer:\n{result['answer']}\n")
    print(f" This conversation has context from previous messages")
else:
    print(f"Request failed: {response.status_code}")
    print(response.text)

 Answer:
Our AI-powered interview scheduling platform supports the following types of interviews, along with their respective durations:

1. **Phone Screening**: 30 minutes
2. **Technical Interview**: 60 minutes
3. **Behavioral Interview**: 45 minutes
4. **Final Round Interview**: 90 minutes

These interview types and durations can be easily scheduled and managed through our platform, which streamlines the hiring process and optimizes interview coordination.

 This conversation has context from previous messages


## 5. Interview Booking Extraction

The system can automatically detect and extract interview booking information from conversations.

Try asking to schedule an interview with specific details.
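
To illustrate what "extracting booking information" means, here is a deliberately simple regex-based sketch. The backend's real detection logic is not shown in this notebook and may well rely on the LLM instead. `extract_booking`, the two patterns, and the example address are hypothetical, and the sketch only handles dates written like "November 12, 2025 at 2:00 PM".

```python
import re
from datetime import datetime

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
# Matches e.g. "November 12, 2025 at 2:00 PM" (date, time, AM/PM captured separately).
DATETIME_RE = re.compile(
    r"([A-Z][a-z]+ \d{1,2}, \d{4})\s+at\s+(\d{1,2}:\d{2})\s*([AP]M)"
)


def extract_booking(message: str):
    """Return email, date and time found in the message, or None if incomplete."""
    email = EMAIL_RE.search(message)
    when = DATETIME_RE.search(message)
    if not (email and when):
        return None
    date_part, time_part, meridiem = when.groups()
    try:
        # %B and %p assume English month names and AM/PM markers.
        parsed = datetime.strptime(
            f"{date_part} {time_part} {meridiem}", "%B %d, %Y %I:%M %p"
        )
    except ValueError:
        return None
    return {
        "email": email.group(0),
        "date": parsed.strftime("%Y-%m-%d"),
        "time": parsed.strftime("%H:%M"),
    }


message = (
    "I'd like to book a technical interview. The candidate's email is "
    "candidate@example.com. They prefer November 12, 2025 at 2:00 PM."
)
print(extract_booking(message))
```

The normalized `date` and `time` fields use the same `YYYY-MM-DD` and 24-hour `HH:MM` formats that appear in the bookings listed later in this notebook.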

In [20]:
payload = {
    "query": "I'd like to book a technical interview for Anuj Bhandari. His email is anuzb50@gmail.com. He prefers November 12, 2025 at 2:00 PM. He has 5 years of Python experience.",
    "session_id": session_id,
    "top_k": 3
}

response = requests.post(f"{API_BASE_URL}/api/chat", json=payload, headers=HEADERS)

if response.status_code == 200:
    result = response.json()
    print("Booking request processed!\n")
    print(f"Answer:\n{result['answer']}\n")
    
    if result.get('booking_detected'):
        print("Booking information successfully extracted and saved!")
        print(f"Booking ID: {result.get('booking_id', 'N/A')}")
        print("\nRun the next cell to see all bookings!")
    else:
        print("No booking information detected in this message")
        print("\nDebug info:")
        print(f"   Response keys: {list(result.keys())}")
        print(f"   Booking detected: {result.get('booking_detected', False)}")
else:
    print(f"Request failed: {response.status_code}")
    print(response.text)

Booking request processed!

Answer:
I'd be happy to help you schedule a technical interview for Anuj Bhandari.

Here are the details:

* Candidate Name: Anuj Bhandari
* Email: anuzb50@gmail.com
* Interview Type: Technical Interview (60 minutes)
* Preferred Date and Time: November 12, 2025, at 2:00 PM
* Relevant Experience: 5 years of Python experience

I've checked our availability, and we have a slot open on November 12, 2025, at 2:00 PM. I'm going to go ahead and book the technical interview for Anuj.

You will receive a confirmation email with the interview details, and Anuj will receive a separate email with instructions on how to join the interview.

Please note that our technical interview will assess Anuj's problem-solving skills, coding abilities, and experience with Python. If you have any specific requirements or questions you'd like us to cover during the interview, please let me know.

Is there anything else I can assist you with?

✅ Interview booking created successfully!


## 6. Retrieve All Bookings

Get all interview bookings stored in the system.

In [21]:
response = requests.get(f"{API_BASE_URL}/api/bookings")

if response.status_code == 200:
    data = response.json()
    bookings = data.get('bookings', [])
    total = data.get('total', 0)
    
    print(f"Retrieved {total} booking(s)\n")
    
    if bookings:
        for i, booking in enumerate(bookings, 1):
            print(f"{'='*60}")
            print(f"Booking #{i}")
            print(f"{'='*60}")
            print(f"ID: {booking.get('booking_id', 'N/A')}")
            print(f"Name: {booking.get('name', 'N/A')}")
            print(f"Email: {booking.get('email', 'N/A')}")
            print(f"Date: {booking.get('date', 'N/A')}")
            print(f"Time: {booking.get('time', 'N/A')}")
            print(f"Additional Notes: {booking.get('additional_notes', 'N/A')}")
            print(f"Status: {booking.get('status', 'N/A')}")
            print(f"Created At: {booking.get('created_at', 'N/A')}")
            print()
    else:
        print("No bookings found in the system")
else:
    print(f"Failed to retrieve bookings: {response.status_code}")
    print(response.text)

Retrieved 3 booking(s)

Booking #1
ID: d9c91267-6f75-4efb-9a12-d344e5d88dc4
Name: Anuj Bhandari
Email: anuzb50@gmail.com
Date: 2025-11-12
Time: 14:00
Additional Notes: He has 5 years of Python experience.
Status: pending
Created At: 2025-11-14T10:52:20.846883

Booking #2
ID: 1523ea44-5332-4d98-b980-bb32636902fe
Name: Anuj Bhandari
Email: anuzb50@gmail.com
Date: 2025-11-12
Time: 14:00
Additional Notes: He has 5 years of Python experience.
Status: pending
Created At: 2025-11-14T10:43:12.979284

Booking #3
ID: b3c696d4-90b1-4298-9ee2-7f28c025ea3e
Name: Anuj Bhandari
Email: anuzb50@gmail.com
Date: 2025-12-15
Time: 14:00
Additional Notes: He has 5 years of Python experience.
Status: pending
Created At: 2025-11-14T10:42:44.722871



## 7. Advanced Examples

Let's try some more complex queries to demonstrate the RAG capabilities.

In [22]:
import uuid

new_session_id = str(uuid.uuid4())

queries = [
    "What technologies are used to build the platform?",
    "How can I integrate the system with my calendar?",
    "What's the contact email for support?",
]

print(f"New Session: {new_session_id}\n")
print("="*70)

for i, query in enumerate(queries, 1):
    print(f"\nQuery {i}: {query}")
    print("-"*70)
    
    payload = {
        "query": query,
        "session_id": new_session_id,
        "top_k": 3
    }
    
    response = requests.post(f"{API_BASE_URL}/api/chat", json=payload, headers=HEADERS)
    
    if response.status_code == 200:
        result = response.json()
        print(f"Answer: {result['answer']}\n")
        contexts = result.get('retrieved_contexts', [])
        print(f"Sources: {len(contexts)} chunks used")
        if contexts:
            print(f"Top similarity score: {contexts[0].get('similarity_score', 0):.4f}")
    else:
        print(f"Error: {response.status_code}")
    
    print("="*70)

New Session: c566f346-5f09-4171-bcfb-7fb30917207a


Query 1: What technologies are used to build the platform?
----------------------------------------------------------------------
Answer: The AI-Powered Interview Scheduler platform is built using the following technologies:

1. **FastAPI**: A modern, fast (high-performance), web framework for building APIs.
2. **Python 3.11+**: The programming language used for development.
3. **PostgreSQL**: A powerful, open-source relational database management system used for data persistence.
4. **Redis**: An in-memory data store used for caching and session management.
5. **Pinecone**: A vector database used for vector similarity search.
6. **Groq LLM**: A large language model used for intelligent conversation.

Additionally, the platform also integrates with popular calendar platforms, including:

1. **Google Calendar**
2. **Outlook**

The platform follows a **RESTful API architecture**, ensuring a scalable and maintainable design.

Sources: 2 