# Test REST APIs

This notebook provides example commands for interacting with the AI-Q Research Assistant backend APIs. Before using this notebook, ensure you have followed the deployment guide to deploy the NVIDIA RAG blueprint and the AI-Q Research Assistant.

Update the following environment variables with the IP address of the server where you are running these tests.

In [None]:
import os
import subprocess
import time

# Replace your-server-ip with the IP address of your server.
os.environ["INFERENCE_ORIGIN"] = "http://your-server-ip:3838"
os.environ["RAG_BASE_URL"] = "http://your-server-ip"

# Helper functions that run a curl command and report how long it took

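def run_curl(curl_cmd, max_output_lines=50, save_to_file=None):
    """Run a curl command in a shell, print its (possibly truncated) output, and report the elapsed time."""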
    print(f"\nExecuting: {curl_cmd}")
    start = time.time()

    # Run the command
    result = subprocess.run(curl_cmd, shell=True, capture_output=True, text=True)

    duration = time.time() - start
    
    # Handle large outputs
    output_lines = result.stdout.strip().split('\n') if result.stdout.strip() else []
    
    if save_to_file and result.stdout.strip():
        with open(save_to_file, 'w') as f:
            f.write(result.stdout.strip())
        print(f"\n--- Response saved to {save_to_file} ---")
        print(f"Output contains {len(output_lines)} lines")
        
        # Show the first and last few lines; size the preview so the two halves never overlap
        if len(output_lines) > max_output_lines:
            preview = max(1, min(20, max_output_lines // 2))
            print(f"\n--- First {preview} lines ---")
            for line in output_lines[:preview]:
                print(line)
            print(f"\n... ({len(output_lines) - 2 * preview} lines truncated) ...")
            print(f"\n--- Last {preview} lines ---")
            for line in output_lines[-preview:]:
                print(line)
        else:
            print("\n--- Full Response ---")
            print(result.stdout.strip())
    else:
        print("\n--- Response ---")
        if len(output_lines) > max_output_lines:
            print(f"Output too large ({len(output_lines)} lines). Showing first {max_output_lines} lines:")
            for line in output_lines[:max_output_lines]:
                print(line)
            print(f"\n... ({len(output_lines) - max_output_lines} more lines truncated)")
            print("💡 Tip: Use save_to_file parameter to save full output to a file")
        else:
            print(result.stdout.strip())

    if result.stderr:
        print("\n--- Errors ---")
        print(result.stderr.strip())

    print(f"\n⏱️  Completed in {duration:.2f} seconds\n")

def run_curl_streaming(curl_cmd, save_to_file=None, show_progress=True):
    """
    Special handler for streaming endpoints that can produce large outputs
    """
    print(f"\nExecuting streaming command: {curl_cmd}")
    start = time.time()
    
    if save_to_file:
        # Write output straight to the file to avoid the IOPub limit
        # (passing a file handle also works for file names that contain spaces)
        with open(save_to_file, 'w') as f:
            subprocess.run(curl_cmd, shell=True, stdout=f, stderr=subprocess.STDOUT)
        
        duration = time.time() - start
        print(f"\n✅ Streaming response saved to {save_to_file}")
        
        # Show file size and first few lines
        try:
            with open(save_to_file, 'r') as f:
                content = f.read()
                lines = content.split('\n')
                print(f"📄 File size: {len(content)} characters, {len(lines)} lines")
                
                if len(lines) > 10:
                    print("\n--- First 10 lines of response ---")
                    for line in lines[:10]:
                        print(line)
                    print("... (see full response in file)")
                else:
                    print("\n--- Full Response ---")
                    print(content)
        except Exception as e:
            print(f"Error reading saved file: {e}")
            
        print(f"\n⏱️  Completed in {duration:.2f} seconds\n")
    else:
        # For non-file output, use regular run_curl with truncation
        run_curl(curl_cmd, max_output_lines=30)
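
The cells in this notebook write JSON request bodies inline inside f-strings, which means every brace has to be doubled. As an optional alternative, the sketch below builds an equivalent command string from a Python dict. The `build_curl_post` helper is hypothetical and not part of the original notebook; it only assumes the standard-library `json` and `shlex` modules.

```python
import json
import shlex

def build_curl_post(url, payload):
    """Return a curl command string that POSTs `payload` as JSON to `url`."""
    body = json.dumps(payload)
    return (
        f"curl -s -X POST {shlex.quote(url)} "
        "-H 'accept: application/json' "
        "-H 'Content-Type: application/json' "
        f"-d {shlex.quote(body)}"  # shlex.quote protects quotes and spaces in the body
    )

# Illustrative payload only; adjust the fields to the endpoint you are calling.
payload = {"collection_name": "test_collection", "embedding_dimension": 2048}
cmd = build_curl_post("http://your-server-ip:3838/collection", payload)
print(cmd)
```

The resulting string can be passed to `run_curl` or `run_curl_streaming` in place of a hand-written command.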

## Create a Collection

The following command creates a collection. If the request fails, check the logs of the following containers for troubleshooting:
 - `docker logs aira-backend`
 - `docker logs ingestor-server`

In [None]:
run_curl(f"""curl -s -X POST {os.environ['INFERENCE_ORIGIN']}/collection \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{{
    "collection_name": "test_collection",
    "vdb_endpoint": "http://milvus:19530",
    "embedding_dimension": 2048,
    "metadata_schema": []
  }}'""")
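
One way to tell whether a request failed before turning to the container logs is to have curl append the HTTP status code to its output with its `-w '\n%{http_code}'` option. The sketch below separates the body from that trailing status line. The `split_body_and_status` helper and the sample output are hypothetical; a real response body from the backend will look different.

```python
def split_body_and_status(raw_output):
    """Split curl output that ends with a status-code line into (body, status)."""
    body, _, status = raw_output.rstrip().rpartition("\n")
    return body, int(status)

# Made-up output for illustration; not an actual backend response.
sample_output = '{"message": "example"}\n200'
body, status = split_body_and_status(sample_output)

if status >= 400:
    print(f"Request failed with HTTP {status}; check the container logs.")
else:
    print(f"HTTP {status}: {body}")
```

Inside an f-string the option has to be written as `-w '\\n%{{http_code}}'` so that Python leaves the backslash and braces for curl to interpret. Because `run_curl` prints the output rather than returning it, this helper would be applied to output captured directly with `subprocess.run`.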

## Upload a File to the Collection

The following command uploads a file to the test collection. If the request fails, check the logs of the following container for troubleshooting:

- `docker logs ingestor-server`

In [None]:
# Update the path to simple.pdf if the file is not in the notebook's working directory

run_curl(f"""curl -X POST {os.environ['INFERENCE_ORIGIN']}/documents \
  -H 'accept: application/json' \
  -F 'documents=@simple.pdf' \
  -F 'data={{\"collection_name\": \"test_collection\"}}'""")
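
Because the upload depends on `simple.pdf` being reachable from the notebook's working directory, it can help to confirm the path before calling curl. The following is a minimal sketch; the `resolve_upload` helper is hypothetical and not part of the original notebook.

```python
from pathlib import Path

def resolve_upload(path):
    """Return the absolute path of `path`, or raise if it is not an existing file."""
    candidate = Path(path).expanduser().resolve()
    if not candidate.is_file():
        raise FileNotFoundError(f"Upload file not found: {candidate}")
    return candidate

try:
    pdf_path = resolve_upload("simple.pdf")
    print(f"Ready to upload {pdf_path}")
except FileNotFoundError as err:
    # Reported instead of raised so the notebook cell keeps running
    print(err)
```

The returned path can then be used in the `-F 'documents=@...'` argument of the upload command.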

## Ask RAG Questions About Your PDF

The following endpoint lets you ask the RAG server questions about the PDF you uploaded. If you encounter errors, check the logs of the following container for troubleshooting:

- `docker logs rag-server`

In [None]:
run_curl(f"""curl -X POST "{os.environ['RAG_BASE_URL']}:8081/v1/generate" \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{{
    "messages": [
      {{
        "role": "user",
        "content": "Give me a summary of the topic"
      }}
    ],
    "use_knowledge_base": true,
    "temperature": 0.2,
    "top_p": 0.7,
    "max_tokens": 1024,
    "reranker_top_k": 10,
    "vdb_top_k": 100,
    "vdb_endpoint": "http://milvus:19530",
    "collection_name": "test_collection",
    "enable_query_rewriting": false,
    "enable_reranker": false,
    "enable_guardrails": false,
    "enable_citations": true,
    "stop": []
  }}'""")

## Generate a Sample Research Plan

This command replicates the first part of the AI-Q Research Assistant workflow by creating a research plan from a collection, report topic, and report organization prompt.

Troubleshooting steps:
- If no response is returned, check the backend logs by running `docker logs aira-backend`.
- If a 403 or 404 is returned, check that the Nemotron API key and model configuration in the AIRA config.yaml file are correct. Try making a direct shell request to the Nemotron model.

In [None]:
# Using streaming function to handle potentially large output
run_curl_streaming(f"""curl -s -X POST {os.environ['INFERENCE_ORIGIN']}/generate_query/stream \
        -H 'accept: application/json' \
        -H 'Content-Type: application/json' \
        -d '{{
            "topic": "An awesome report",
            "report_organization": "A comprehensive report with an introduction, body, and conclusion with a witty joke",
            "num_queries": 3,
            "llm_name": "nemotron"
    }}'""", save_to_file="research_plan.json")
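
The exact format of the streamed output is not described in this notebook. If the saved file turns out to contain server-sent-event style lines prefixed with `data:` (an assumption to verify against your own output), a small parser like the sketch below can pull out the JSON payloads. The `iter_sse_json` helper, the sample text, and the `text` key are all hypothetical.

```python
import json

def iter_sse_json(text):
    """Yield the decoded JSON payload of each `data:` line found in `text`."""
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("data:"):
            continue  # ignore blank lines and any other event fields
        payload = line[len("data:"):].strip()
        if not payload or payload == "[DONE]":
            continue
        try:
            yield json.loads(payload)
        except json.JSONDecodeError:
            continue  # skip payloads that are not valid JSON

# Made-up sample for illustration; the real stream may use different keys.
sample = 'data: {"text": "Hello"}\n\ndata: {"text": " world"}\ndata: [DONE]\n'
chunks = list(iter_sse_json(sample))
print("".join(chunk["text"] for chunk in chunks))
```

To apply it to a saved response, read the file (for example `research_plan.json`) into a string and pass that string to `iter_sse_json`.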

## Generate a Report

The following command generates a report based on the topic, report organization, and queries. If you encounter any errors while running it, you can troubleshoot by checking the logs of the following container:
- `docker logs aira-backend`

*Note that the following command saves the response to the report.txt file instead of printing the full output to the console.*


In [None]:
# Using streaming function to handle large report output
run_curl_streaming(f"""curl -s -X POST {os.environ['INFERENCE_ORIGIN']}/generate_summary/stream \
        -H 'accept: application/json' \
        -H 'Content-Type: application/json' \
        -d '{{
        "topic": "An awesome report",
        "report_organization": "A comprehensive report with an introduction, body, and conclusion with a witty joke",
        "queries": [
            {{
            "query": "overview of the topic",
            "report_section": "overview",
            "rationale": "key information"
            }},
            {{
            "query": "big mac ingredients",
            "report_section": "web search",
            "rationale": "just for fun"
            }}
        ],
        "search_web": true,
        "rag_collection": "test_collection",
        "reflection_count": 1,
        "llm_name": "nemotron"
    }}'""", save_to_file="report.txt")
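
Each entry in the `queries` list above carries a `query`, a `report_section`, and a `rationale`. When editing the list by hand it is easy to drop one of them, so a quick check before sending the request can save a round trip. The sketch below assumes all three keys are expected by the backend, which this notebook does not state explicitly; the `find_incomplete_queries` helper is hypothetical.

```python
# Keys taken from the example request; treating them as required is an assumption.
REQUIRED_QUERY_KEYS = {"query", "report_section", "rationale"}

def find_incomplete_queries(queries):
    """Return (index, missing_keys) pairs for entries lacking an expected key."""
    problems = []
    for index, entry in enumerate(queries):
        missing = REQUIRED_QUERY_KEYS - set(entry)
        if missing:
            problems.append((index, sorted(missing)))
    return problems

queries = [
    {"query": "overview of the topic", "report_section": "overview", "rationale": "key information"},
    {"query": "big mac ingredients", "report_section": "web search"},  # rationale omitted on purpose
]
print(find_incomplete_queries(queries))
```

An empty list means every entry has all three keys.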

## Ask Q&A

The following command replicates the human-in-the-loop interactions available in the web frontend, including editing the report plan, editing the draft report, and doing Q&A.

Troubleshooting tips: check the logs of the following container:
- `docker logs aira-backend`

The Q&A endpoint relies on the `instruct_llm` configured in the AI-Q Research Assistant config.yaml file. Verify this configuration and try a direct shell command against that LLM.

In [None]:
# Using streaming function to handle Q&A response
run_curl_streaming(f"""curl -s -X POST {os.environ['INFERENCE_ORIGIN']}/artifact_qa/stream \
        -H "accept: application/json" \
        -H "Content-Type: application/json" \
        -d '{{
            "additional_context": "",
            "artifact": "\\n\\n# An extensive report on the topic of the users favorite PDF",
            "chat_history": [],
            "question": "edit the title to something more professional",
            "rewrite_mode": "entire",
            "use_internet": false,
            "rag_collection": "test_collection"
        }}'""", save_to_file="qa_response.txt")