# DAY 2: RAG - Passing Documents and Citations

### What this file does:
Demonstrates how to pass custom documents directly to OCI Generative AI (Cohere models) for chat responses with citations. This is a basic form of Retrieval-Augmented Generation (RAG) without external storage: the documents are provided in the request itself.

**Supported models:** cohere.command-r-08-2024, cohere.command-r-plus-08-2024, cohere.command-a-03-2025. For other models (e.g., Llama), use the generic API.

**Documentation to reference:**
- OCI GenAI Chat: https://docs.oracle.com/en-us/iaas/Content/generative-ai/chat-models.htm
- Cohere Documents and Citations: https://docs.cohere.com/v2/docs/rag-citations#citation-modes
- OCI Python SDK: https://github.com/oracle/oci-python-sdk/tree/master/src/oci/generative_ai_inference

**Relevant Slack channels:**
- #generative-ai-users or #igiu-innovation-lab: *For questions*
- #igiu-ai-learning: *For errors running code*

**Env setup:**
- sandbox.yaml: Needs an "oci" section with configFile, profile, and compartment keys.
- .env: Loaded for environment variables, if needed.
- Configure the Jupyter working directory to match your workspace so that relative paths in the Python code resolve:
    - VS Code menu -> Settings > Extensions > Jupyter > Notebook File Root
    - Change from `${fileDirname}` to `${workspaceFolder}`
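
The sandbox.yaml requirement above can be checked before any OCI call is made. The following is a minimal sketch, not part of the notebook: `missing_oci_keys` and `REQUIRED_OCI_KEYS` are hypothetical names, and it assumes the loaded config behaves like a nested mapping (the notebook itself indexes it as `scfg["oci"]["configFile"]`).

```python
# Hypothetical helper: report which required "oci" keys are absent.
REQUIRED_OCI_KEYS = ("configFile", "profile", "compartment")

def missing_oci_keys(cfg):
    """Return the required keys missing from cfg["oci"] (all of them if the section is absent)."""
    if "oci" not in cfg:
        return list(REQUIRED_OCI_KEYS)
    section = cfg["oci"] or {}  # an empty YAML section loads as None
    return [key for key in REQUIRED_OCI_KEYS if key not in section]

# Example with a plain dict standing in for the loaded sandbox.yaml.
example_cfg = {"oci": {"configFile": "~/.oci/config", "profile": "DEFAULT"}}
print(missing_oci_keys(example_cfg))
```

In the notebook, the same function could be called on `scfg` right after it is loaded, raising an error if the returned list is non-empty.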


**How to run in notebook:**
- Ensure dependencies are installed (`uv sync`).
- Run cells in order.
- Update LLM_MODEL if using a different Cohere variant.

In [None]:
# Import required libraries
from oci.generative_ai_inference import GenerativeAiInferenceClient
from oci.generative_ai_inference.models import OnDemandServingMode, CohereChatRequest, ChatDetails
import oci
import os
from dotenv import load_dotenv
from envyaml import EnvYAML

In [None]:
# Constants
SANDBOX_CONFIG_FILE = "sandbox.yaml"
load_dotenv()
LLM_MODEL = "cohere.command-a-03-2025"  # Options: cohere.command-r-08-2024, cohere.command-r-plus-08-2024
llm_service_endpoint = "https://inference.generativeai.us-chicago-1.oci.oraclecloud.com"

### Step 1: Load configuration and initialize the OCI client for inference

In [None]:
# Load sandbox config
scfg = EnvYAML(SANDBOX_CONFIG_FILE)
if scfg is None or "oci" not in scfg:
    raise RuntimeError("Invalid sandbox configuration.")

# Set up OCI config
config = oci.config.from_file(os.path.expanduser(scfg["oci"]["configFile"]), scfg["oci"]["profile"])

# Initialize LLM client
llm_client = GenerativeAiInferenceClient(
    config=config,
    service_endpoint=llm_service_endpoint,
    retry_strategy=oci.retry.NoneRetryStrategy(),
    timeout=(10, 240)
)

### Step 2: Set Up the Chat Request

In [None]:
# Configure the Cohere chat request: preamble, sampling parameters, and citation quality.
# The documents used for citations are attached in the next cell.
llm_chat_request = CohereChatRequest()
# llm_chat_request.preamble_override = "Provide factual answers based on documents provided. Include citations if possible. Say you can't answer if not in documents."
llm_chat_request.is_stream = False
llm_chat_request.max_tokens = 500  # Max tokens to generate
llm_chat_request.temperature = 1.0  # Higher = more random (default 0.3)
# llm_chat_request.seed = 7555  # For deterministic results (not guaranteed)
llm_chat_request.top_p = 0.7  # Nucleus sampling (default 0.75)
llm_chat_request.top_k = 0  # Top-k sampling (0 = off, max 500)
llm_chat_request.frequency_penalty = 0.0  # Reduce repetition (max 1.9)
# llm_chat_request.documents = get_documents()  # Restricts to provided docs
llm_chat_request.citation_quality = CohereChatRequest.CITATION_QUALITY_FAST  # Or CITATION_QUALITY_ACCURATE for better (slower) citations

In [None]:
# Sample documents for testing
docs = [
    {
        "title": "Oracle",
        "snippet": "Oracle database services and products offer customers cost-optimized and high-performance versions of Oracle Database, the world's leading converged, multi-model database management system, as well as in-memory, NoSQL and MySQL databases. Oracle Autonomous Database, available on premises via Oracle Cloud@Customer or in the Oracle Cloud Infrastructure, enables customers to simplify relational database environments and reduce management workloads.",
        "website": "https://www.oracle.com/database",
        "id": "ORA001"
    },
    {
        "title": "Amazon",
        "snippet": "AWS provides the broadest selection of purpose-built databases allowing you to save, grow, and innovate faster. Purpose Built: Choose from 15+ purpose-built database engines including relational, key-value, document, in-memory, graph, time series, wide column, and ledger databases. Performance at Scale: Get relational databases that are 3-5X faster than popular alternatives, or non-relational databases that give you microsecond to sub-millisecond latency. Fully Managed: AWS continuously monitors your clusters to keep your workloads running with self-healing storage and automated scaling, so that you can focus on application development. Secure & Highly Available: AWS databases are built for business-critical, enterprise workloads, offering high availability, reliability, and security.",
        "website": "https://aws.amazon.com/free/database",
        "id": "AWS001"
    }
]

llm_chat_request.documents = docs
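
When adding your own material, each document can follow the same shape as the samples above (title, snippet, website, id). The sketch below is illustrative only: `make_documents` is a hypothetical helper, and the ID scheme (prefix plus a three-digit counter) is an assumption modeled on the sample IDs, not a requirement stated in this notebook.

```python
# Hypothetical helper: turn text chunks into document dicts shaped like the samples.
def make_documents(chunks, title, website, id_prefix):
    """Build one document per non-empty chunk, with sequential IDs such as DOC001."""
    cleaned = [chunk.strip() for chunk in chunks if chunk.strip()]
    return [
        {
            "title": title,
            "snippet": snippet,
            "website": website,
            "id": f"{id_prefix}{index:03d}",
        }
        for index, snippet in enumerate(cleaned, start=1)
    ]

# Example: two usable chunks and one blank chunk that is skipped.
extra_docs = make_documents(
    ["First paragraph of my article.", "   ", "Second paragraph. "],
    title="My Article",
    website="https://example.com/article",
    id_prefix="DOC",
)
print([d["id"] for d in extra_docs])
```

The result could then be assigned with `llm_chat_request.documents = docs + extra_docs` to query across both sets.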

In [None]:
# Set up chat details
chat_detail = ChatDetails()
chat_detail.serving_mode = OnDemandServingMode(model_id=LLM_MODEL)
chat_detail.compartment_id = scfg["oci"]["compartment"]
chat_detail.chat_request = llm_chat_request

### Step 3: Ask a Question and Get Response with Citations

In [None]:
# Send a query to the model and print the response along with its citations.
# llm_chat_request.seed = 7555  # Uncomment to try deterministic results
llm_chat_request.message = "Tell me about AWS databases."

# Get response
llm_response = llm_client.chat(chat_detail)

# Print results
print("************************** Chat Result **************************")
print(llm_response.data.chat_response.text)
print("************************** Citations **************************")
print(llm_response.data.chat_response.citations)
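
Printing the raw citations list can be hard to read. The sketch below shows one way to format it, under the assumption that each citation object exposes `start`, `end`, `text`, and `document_ids` attributes; verify this against your SDK version. `format_citations` is a hypothetical helper, and `SimpleNamespace` objects stand in for real citation objects so the example runs without an OCI call.

```python
from types import SimpleNamespace

# Hypothetical helper: one readable line per citation.
# Assumes attributes start, end, text, document_ids on each citation.
def format_citations(citations):
    """Return a list of strings like '[0:3] "AWS" <- AWS001'."""
    lines = []
    for citation in citations or []:  # the response may carry no citations
        ids = ", ".join(citation.document_ids or [])
        lines.append(f'[{citation.start}:{citation.end}] "{citation.text}" <- {ids}')
    return lines

# Stand-in data; in the notebook, pass llm_response.data.chat_response.citations.
sample_citations = [
    SimpleNamespace(start=0, end=3, text="AWS", document_ids=["AWS001"]),
]
for line in format_citations(sample_citations):
    print(line)
```

The `start` and `end` values are treated here as character offsets into the response text, which is also an assumption worth checking against an actual response.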

### Step 4: Update History for Context

In [None]:
# Add previous chat messages so the model has conversation history for context.
previous_chat_message = oci.generative_ai_inference.models.CohereUserMessage(message="Tell me something about Oracle.")
previous_chat_reply = oci.generative_ai_inference.models.CohereChatBotMessage(message="Oracle is one of the largest vendors in the enterprise IT market and the shorthand name of its flagship product. The database software sits at the center of many corporate IT")

llm_chat_request.chat_history = [previous_chat_message, previous_chat_reply]
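
As the history grows, you may want to cap how much of it is sent with each request. This is a minimal sketch, assuming the history is a flat list that alternates user and chatbot messages as in the cell above; `trim_history` is a hypothetical helper and strings stand in for the message objects.

```python
# Hypothetical helper: keep only the most recent user/chatbot pairs.
def trim_history(history, max_turns):
    """Return the last max_turns (user, chatbot) pairs from an alternating history list."""
    if max_turns <= 0:
        return []
    return list(history[-2 * max_turns:])

# Strings stand in for CohereUserMessage / CohereChatBotMessage objects.
history = ["user 1", "bot 1", "user 2", "bot 2", "user 3", "bot 3"]
print(trim_history(history, 2))
```

In the notebook this could be applied as `llm_chat_request.chat_history = trim_history(llm_chat_request.chat_history, 2)` before sending the next message.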

### Step 5: Follow-Up Question with History

In [None]:
# Ask a follow-up question that relies on the chat history for context.
# Optionally clear the documents to test history on its own:
# llm_chat_request.documents = []

llm_chat_request.message = "Tell me more about its databases."  # Refers to previous Oracle context

llm_response = llm_client.chat(chat_detail)

# Print results
print("************************** Chat Result **************************")
print(llm_response.data.chat_response.text)
print("************************** Citations **************************")
print(llm_response.data.chat_response.citations)

## Experimentation Section
Try these variations to learn more:
- Change the model to cohere.command-r-plus-08-2024 and compare responses.
- Adjust temperature (e.g., 0.3 for factual, 1.0 for creative) and observe changes.
- Add your own documents (e.g., snippets from PDFs) and query about them.
- Enable/disable chat_history and see how context affects answers.
- Set citation_quality to ACCURATE for more precise citations (may be slower).
- Query something not in documents to test handling of unknowns.

## Practice Exercises
1. **Custom Documents**: Replace sample docs with text from a favorite book or article. Ask questions and check citations.
2. **History Depth**: Add more messages to chat_history and experiment with long conversations.
3. **Parameter Tuning**: Create a loop to test different temperature/top_p values and rate response quality.
4. **Discussion**: How do citations improve trust in AI responses? What are the limitations of this approach compared to full RAG?
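
For the parameter tuning exercise, one possible starting point is a small grid sweep. The sketch below is an assumption-laden outline: `sweep` and `fake_ask` are hypothetical names, and `fake_ask` is a stand-in so the example runs offline. In the notebook, the callable would set `llm_chat_request.temperature` and `llm_chat_request.top_p`, call `llm_client.chat(chat_detail)`, and return the response text.

```python
from itertools import product

# Hypothetical helper: run one query per (temperature, top_p) combination.
def sweep(ask, temperatures, top_ps):
    """Call ask(temperature, top_p) for every combination and collect the results."""
    results = []
    for temperature, top_p in product(temperatures, top_ps):
        results.append({
            "temperature": temperature,
            "top_p": top_p,
            "text": ask(temperature, top_p),
        })
    return results

# Stand-in for the real model call, so the sketch runs without OCI access.
def fake_ask(temperature, top_p):
    return f"reply at temperature={temperature}, top_p={top_p}"

for row in sweep(fake_ask, [0.3, 1.0], [0.7, 0.9]):
    print(row["temperature"], row["top_p"], row["text"])
```

Collecting results into a list of dicts keeps the ratings step separate from the calls, so you can review or score the responses afterwards.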

For help or ideas, reach out in #igiu-innovation-lab or #igiu-ai-learning.