# 03: Scientific Workflow and RAG

This notebook focuses on the `ScientificWorkflowAgent`'s Retrieval-Augmented Generation (RAG) capabilities. You will learn how to upload various document types (CSV, PDF, DOCX, images, TXT, TEX), process them to create embeddings, and then query their content using natural language.

## Getting Started

Before running this notebook, ensure that the Docker services for Chapter 06 are up and running. You can start them by navigating to the `chapter-06-advanced-multi-agent-orchestration` directory and executing the lifecycle script:

```bash
./start-chapter-resources.sh
```

Also, ensure you have an SSH session with port forwarding for OpenWebUI (`8902`), the Agent Gateway (`8083`), and the main MCP server (`8080`):

```bash
ssh -L 8902:localhost:8902 -L 8083:localhost:8083 -L 8080:localhost:8080 user@remote-server
```

## 1. Uploading and Processing Documents

The RAG workflow begins by placing a document into the shared `data/uploaded_files` volume. Once the file is in place, we call the `process_uploaded_file` tool on the main MCP server to generate embeddings and index the document for RAG.

For this example, we'll create a dummy CSV file and manually place it into the shared volume. In a real application, files would be uploaded via a dedicated UI or API that places them into this shared volume.

First, define the content of the dummy CSV file and the path at which the MCP server will see it. The cell below only prints instructions; you still need to create the file yourself in your local `data/uploaded_files` directory (e.g., `docs/tutorial-branches/chapter-06-advanced-multi-agent-orchestration/data/uploaded_files/dummy_data.csv`).
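
If the notebook kernel runs on the machine that hosts the shared volume, the manual step can be scripted. The sketch below is a minimal example under that assumption; `write_dummy_csv` is a hypothetical helper, not part of the framework, and the directory you pass must be the host path that Docker mounts as `data/uploaded_files`.

```python
from pathlib import Path


def write_dummy_csv(upload_dir, filename: str, content: str) -> Path:
    """Write `content` to `upload_dir/filename`, creating the directory if needed.

    Hypothetical helper: `upload_dir` is assumed to be the host-side directory
    that is mounted into the MCP server container as the shared upload volume.
    """
    target_dir = Path(upload_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / filename
    # Terminate the file with a newline so CSV parsers see a complete last row.
    target.write_text(content + "\n", encoding="utf-8")
    return target
```

With the variables from the next cell defined, a call such as `write_dummy_csv("data/uploaded_files", dummy_filename, csv_content)` would create the file, provided the relative path resolves to the mounted directory from the notebook's working directory.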


In [None]:
import requests
import json
import os
import uuid
import io

# Configuration
DEFAULT_MCP_SERVER_URL = "http://localhost:8080/mcp" # Main MCP server for processing tools
AGENT_GATEWAY_URL = "http://localhost:8083/v1" # Agent Gateway for chat interactions
MODEL_ID = "agentic-framework/scientific-agent-v1"
API_KEY = "not-a-real-key" # Dummy API key if auth is disabled

# Create a dummy CSV file content
csv_content = """name,age,city
Alice,30,New York
Bob,24,London
Charlie,35,Paris"""
dummy_filename = "dummy_data.csv"

# Define the full path where the file should be manually placed
file_path_on_server = f"/app/data/uploaded_files/{dummy_filename}"

print(f"Please manually create a file named '{dummy_filename}' in your local 'data/uploaded_files/' directory (e.g., docs/tutorial-branches/chapter-06-advanced-multi-agent-orchestration/data/uploaded_files/).")
print(f"Paste the following content into it:\n---\n{csv_content}\n---")
print(f"Once created, the file will be accessible by the MCP server at: {file_path_on_server}")

# Call the process_uploaded_file tool on the main MCP server
print("\nProcessing manually placed file with MCP tool 'process_uploaded_file'...")
process_params = {
    "file_path_on_server": file_path_on_server,
    "original_filename": dummy_filename,
    "mcp_session_id": str(uuid.uuid4()) # Use a new session ID for this example
}

mcp_call_payload = {
    "jsonrpc": "2.0",
    "method": "tools/call",
    "params": {
        "name": "process_uploaded_file",
        "arguments": process_params
    },
    "id": str(uuid.uuid4())
}

try:
    process_response = requests.post(DEFAULT_MCP_SERVER_URL, json=mcp_call_payload)
    process_response.raise_for_status()
    process_result = process_response.json()
    print("Processing successful:")
    print(json.dumps(process_result, indent=2))
    
    # Extract the file_id for later use
    file_id = process_result["result"]["result"]["file_id"]
    print(f"\nExtracted file_id: {file_id}")
    
except requests.exceptions.RequestException as e:
    print(f"Error during file processing: {e}")
except json.JSONDecodeError:
    print(f"Error decoding JSON response: {process_response.text if 'process_response' in locals() else ''}")
except KeyError as e:
    print(f"Missing key in JSON response: {e}. Full response: {process_result}")
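
Both MCP calls in this notebook wrap their arguments in the same JSON-RPC 2.0 `tools/call` envelope, and the first one reads `file_id` from `result.result.file_id`. The sketch below factors those two steps into small helpers. `build_tool_call` and `extract_file_id` are hypothetical names introduced here for illustration; the nested response shape is taken from the cell above and may differ in your deployment.

```python
import uuid
from typing import Any, Optional


def build_tool_call(tool_name: str, arguments: dict) -> dict:
    """Wrap a tool invocation in a JSON-RPC 2.0 `tools/call` request body."""
    return {
        "jsonrpc": "2.0",
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
        "id": str(uuid.uuid4()),  # Fresh request ID for every call
    }


def extract_file_id(response: Any) -> Optional[str]:
    """Return response["result"]["result"]["file_id"], or None if unavailable.

    A JSON-RPC error reply carries an "error" member instead of "result",
    so it is treated as "no file_id" rather than raising KeyError.
    """
    if isinstance(response, dict) and "error" in response:
        return None
    node = response
    for key in ("result", "result", "file_id"):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node
```

Returning `None` instead of raising lets the calling cell decide whether a missing `file_id` should stop the notebook or only print a warning.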


## 2. Querying Processed Documents (RAG)

Once a document is processed and indexed, you can use the `QueryProcessedDocumentData` tool (exposed by the `ScientificWorkflowAgent`) to ask questions about its content. The agent will use its RAG capabilities to retrieve relevant information and formulate an answer.
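
To make the retrieval step concrete, here is a toy illustration of the idea, not the agent's actual implementation. It swaps real embeddings for simple word-count vectors and ranks text chunks by cosine similarity to the query. The function names and the way the CSV rows are rendered as text chunks are assumptions made for this example only.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding: counts of lowercase alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list, top_k: int = 1) -> list:
    """Return the `top_k` chunks most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(chunks, key=lambda chunk: cosine(query_vec, embed(chunk)), reverse=True)
    return ranked[:top_k]


# Assumed chunking: one text chunk per row of the dummy CSV.
chunks = [
    "name: Alice, age: 30, city: New York",
    "name: Bob, age: 24, city: London",
    "name: Charlie, age: 35, city: Paris",
]
best = retrieve("What is Alice's age?", chunks)
print(best[0])
```

In a RAG pipeline the retrieved chunk is then passed to the language model as context, which is what allows it to answer from the document rather than from its training data.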


In [None]:
import openai
import json

# Ensure file_id is set from the previous cell's output
if 'file_id' not in locals():
    print("Please run the previous cell to upload and process a document first.")
else:
    user_query = f"What is Alice's age from the document with file_id {file_id}?"
    
    client = openai.OpenAI(
        base_url=AGENT_GATEWAY_URL,
        api_key=API_KEY,
    )
    
    print(f"Sending query to agent: {user_query}")
    
    try:
        response = client.chat.completions.create(
            model=MODEL_ID,
            messages=[
                {"role": "system", "content": "You are a helpful assistant."},
                {"role": "user", "content": user_query}
            ],
            stream=False,
        )
        
        print("\n--- Agent Response ---")
        if response.choices:
            print(response.choices[0].message.content)
        else:
            print("The agent did not return any choices.")
            
    except openai.APIConnectionError as e:
        print(f"Failed to connect to the Agent Gateway: {e}")
    except openai.APIStatusError as e:
        print(f"The Agent Gateway returned an error status code: {e.status_code}. Response: {e.response.text}")
    except Exception as e:
        print(f"An unexpected error occurred: {e}")


## 3. Listing Uploaded Files

You can list all files that have been uploaded and processed in the current session using the `ListUploadedFiles` tool. This is useful for keeping track of the documents available for RAG.


In [None]:
import requests
import json
import uuid  # Needed for the JSON-RPC request ID below

# Call the list_uploaded_files tool on the main MCP server
print("Listing uploaded files for the current session...")
list_params = {
    # Reuse the session ID that was sent with process_uploaded_file (not the file_id)
    "mcp_session_id": process_params["mcp_session_id"]
}

mcp_call_payload = {
    "jsonrpc": "2.0",
    "method": "tools/call",
    "params": {
        "name": "list_uploaded_files",
        "arguments": list_params
    },
    "id": str(uuid.uuid4())
}

try:
    list_response = requests.post(DEFAULT_MCP_SERVER_URL, json=mcp_call_payload)
    list_response.raise_for_status()
    list_result = list_response.json()
    print("Uploaded files:")
    print(json.dumps(list_result, indent=2))
except requests.exceptions.RequestException as e:
    print(f"Error listing files: {e}")
except json.JSONDecodeError:
    print(f"Error decoding JSON response: {list_response.text}")


This notebook demonstrated how to upload, process, and query documents using the `ScientificWorkflowAgent`'s RAG capabilities. In the next notebook, we will explore the `sandbox_mcp_server` for secure code execution.
