# Grand Jury Server Testing Notebook

This notebook provides interactive testing for the Grand Jury FastAPI server endpoints.

## Prerequisites
1. Start the server: `uvicorn app.main:app --reload`
2. Confirm the server is reachable at `http://localhost:8000`
3. Install the required packages if needed: `uv add requests pandas`

In [3]:
# Import required libraries
import requests
import json
import pandas as pd
from datetime import datetime, timedelta
from typing import Dict, List, Any
import warnings
warnings.filterwarnings('ignore')

print("✅ Libraries imported successfully")

✅ Libraries imported successfully


In [4]:
# Configuration
BASE_URL_LOCAL = "http://localhost:8000"
BASE_URL_RENDER = "https://grandjury-server.onrender.com"
API_KEY = "test-key"  # From VALID_KEYS in main.py
BASE_URL = BASE_URL_RENDER  # Hosted Render deployment; switch to BASE_URL_LOCAL to test a local server

# Headers for authenticated requests
AUTH_HEADERS = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json"
}

# Headers for non-authenticated requests
HEADERS = {
    "Content-Type": "application/json"
}

print(f"🔧 Configuration set for server: {BASE_URL}")
print(f"🔑 Using API key: {API_KEY}")

🔧 Configuration set for server: https://grandjury-server.onrender.com
🔑 Using API key: test-key


In [5]:
# Test server connectivity
def test_server_health():
    """Check if the server is running and accessible."""
    try:
        response = requests.get(f"{BASE_URL}/docs", timeout=5)
        if response.status_code == 200:
            print("✅ Server is running and accessible")
            print(f"📖 API docs available at: {BASE_URL}/docs")
            return True
        else:
            print(f"⚠️ Server responded with status: {response.status_code}")
            return False
    except requests.exceptions.RequestException as e:
        print(f"❌ Cannot connect to server: {e}")
        print("💡 Make sure to start the server first:")
        print("   uvicorn app.main:app --reload")
        return False

# Test server health
server_ok = test_server_health()

✅ Server is running and accessible
📖 API docs available at: https://grandjury-server.onrender.com/docs


## Sample Data Generation

The cell below defines a small hand-written dataset for testing the verdict endpoints.

In [5]:
# Custom sample data: 3 of the 4 listed voters cast votes.
# Inferences 1 and 2 have one vote each, inference 3 has three, inference 4 has two.
sample_data = [
    {
        "inference_id": 1,
        "input": "input1",
        "output": "output1",
        "inference_time": datetime(2024, 7, 6, 19, 22, 30, 608736).isoformat(),
        "vote": True,
        "voter_id": 101,
        "vote_time": datetime(2024, 7, 7, 19, 22, 30, 608761).isoformat(),
        "voter_prompt_id": 0
    },
    {
        "inference_id": 2,
        "input": "input2",
        "output": "output2",
        "inference_time": datetime(2024, 7, 6, 19, 22, 30, 608736).isoformat(),
        "vote": True,
        "voter_id": 102,
        "vote_time": datetime(2024, 7, 7, 19, 22, 30, 608770).isoformat(),
        "voter_prompt_id": 0
    },
    # inference 3 has all 3 voters
    {
        "inference_id": 3,
        "input": "input3",
        "output": "output3",
        "inference_time": datetime(2024, 7, 6, 19, 22, 30, 608736).isoformat(),
        "vote": True,
        "voter_id": 101,
        "vote_time": datetime(2024, 8, 6, 19, 22, 30, 608777).isoformat(),
        "voter_prompt_id": 0
    },
    {
        "inference_id": 3,
        "input": "input3",
        "output": "output3",
        "inference_time": datetime(2024, 7, 6, 19, 22, 30, 608736).isoformat(),
        "vote": False,
        "voter_id": 102,
        "vote_time": datetime(2024, 8, 6, 19, 22, 30, 608784).isoformat(),
        "voter_prompt_id": 0
    },
    {
        "inference_id": 3,
        "input": "input3",
        "output": "output3",
        "inference_time": datetime(2024, 7, 6, 19, 22, 30, 608736).isoformat(),
        "vote": None,
        "voter_id": 103,
        "vote_time": datetime(2024, 8, 6, 19, 22, 30, 608790).isoformat(),
        "voter_prompt_id": 0
    },
    # inference 4 has 2 voters
    {
        "inference_id": 4,
        "input": "input4",
        "output": "output4",
        "inference_time": datetime(2024, 7, 6, 19, 22, 30, 608736).isoformat(),
        "vote": False,
        "voter_id": 101,
        "vote_time": datetime(2024, 10, 6, 19, 22, 30, 608796).isoformat(),
        "voter_prompt_id": 0
    },
    {
        "inference_id": 4,
        "input": "input4",
        "output": "output4",
        "inference_time": datetime(2024, 7, 6, 19, 22, 30, 608736).isoformat(),
        "vote": False,
        "voter_id": 102,
        "vote_time": datetime(2024, 10, 6, 19, 22, 30, 608803).isoformat(),
        "voter_prompt_id": 0
    }
]

# Define the voter list
voter_list = [101, 102, 103, 104]

# Show results
print(f"📊 Generated {len(sample_data)} sample vote records")
print(f"👥 Voter list: {voter_list}")
print("\n📋 Sample of the data:")
df_sample = pd.DataFrame(sample_data)
df_sample

📊 Generated 7 sample vote records
👥 Voter list: [101, 102, 103, 104]

📋 Sample of the data:


|   | inference_id | input | output | inference_time | vote | voter_id | vote_time | voter_prompt_id |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | input1 | output1 | 2024-07-06T19:22:30.608736 | True | 101 | 2024-07-07T19:22:30.608761 | 0 |
| 1 | 2 | input2 | output2 | 2024-07-06T19:22:30.608736 | True | 102 | 2024-07-07T19:22:30.608770 | 0 |
| 2 | 3 | input3 | output3 | 2024-07-06T19:22:30.608736 | True | 101 | 2024-08-06T19:22:30.608777 | 0 |
| 3 | 3 | input3 | output3 | 2024-07-06T19:22:30.608736 | False | 102 | 2024-08-06T19:22:30.608784 | 0 |
| 4 | 3 | input3 | output3 | 2024-07-06T19:22:30.608736 | None | 103 | 2024-08-06T19:22:30.608790 | 0 |
| 5 | 4 | input4 | output4 | 2024-07-06T19:22:30.608736 | False | 101 | 2024-10-06T19:22:30.608796 | 0 |
| 6 | 4 | input4 | output4 | 2024-07-06T19:22:30.608736 | False | 102 | 2024-10-06T19:22:30.608803 | 0 |


## 1. Testing Scoring Endpoint

Test the `/api/v1/evaluate` endpoint, which requires API-key authentication.

In [6]:
def test_scoring_endpoint():
    """Test the /api/v1/evaluate endpoint."""
    print("🧪 Testing Scoring Endpoint...")
    
    payload = {
        "previous_score": 0.5,
        "previous_timestamp": (datetime.now() - timedelta(hours=2)).isoformat(),
        "votes": [0.8, 0.6, 0.9, 0.7, 0.5],
        "reputations": [1.0, 0.8, 1.2, 0.9, 0.7]
    }
    
    try:
        response = requests.post(
            f"{BASE_URL}/api/v1/evaluate",
            headers=AUTH_HEADERS,
            json=payload
        )
        
        print(f"Status Code: {response.status_code}")
        
        if response.status_code == 200:
            result = response.json()
            print("✅ Success! Response:")
            print(f"   Score: {result['score']:.4f}")
            print(f"   Freshness: {result['freshness']:.4f}")
            print(f"   Timestamp: {result['timestamp']}")
            return result
        else:
            print(f"❌ Error: {response.text}")
            return None
            
    except Exception as e:
        print(f"❌ Connection error: {e}")
        return None

# Test scoring endpoint
if server_ok:
    scoring_result = test_scoring_endpoint()
else:
    print("⚠️ Skipping scoring test - server not accessible")

🧪 Testing Scoring Endpoint...
Status Code: 200
✅ Success! Response:
   Score: 0.7261
   Freshness: 1.0000
   Timestamp: 2025-07-06T21:14:20.002376Z
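
The score in this run is consistent with a reputation-weighted mean of the votes. The sketch below recomputes it locally from the same payload. That the server uses exactly this formula is an assumption inferred from a single response, and `weighted_mean` is a hypothetical helper, not part of the server code. It does not model `freshness` or the effect of `previous_score`.

```python
def weighted_mean(votes, reputations):
    """Reputation-weighted average of the votes (assumed formula)."""
    total_weight = sum(reputations)
    if total_weight == 0:
        raise ValueError("reputations must not sum to zero")
    return sum(v * r for v, r in zip(votes, reputations)) / total_weight

# Same values as the payload sent to /api/v1/evaluate
votes = [0.8, 0.6, 0.9, 0.7, 0.5]
reputations = [1.0, 0.8, 1.2, 0.9, 0.7]

local_score = weighted_mean(votes, reputations)
print(f"Local estimate: {local_score:.4f}")  # Local estimate: 0.7261
```

A mismatch between this local estimate and the returned `score` would suggest the server blends in `previous_score` or applies time decay for that request.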


## 2. Testing Vote Time Histogram

Test the `/api/v1/verdict/histogram` endpoint.

In [39]:
def test_vote_histogram(data, duration_minutes=60, gross=True):
    """Test the vote time histogram endpoint."""
    print(f"📊 Testing Vote Time Histogram (duration: {duration_minutes}min, gross: {gross})...")
    
    payload = {
        "data": data,
        "duration_minutes": duration_minutes,
        "gross": gross
    }
    
    try:
        response = requests.post(
            f"{BASE_URL}/api/v1/verdict/histogram",
            headers=HEADERS,
            json=payload
        )
        
        print(f"Status Code: {response.status_code}")
        
        if response.status_code == 200:
            result = response.json()
            print(f"✅ Success! Found {len(result)} time buckets")
            
            # Display first few buckets
            for i, (timestamp, count) in enumerate(list(result.items())[:5]):
                print(f"   {timestamp}: {count} votes")
            
            if len(result) > 5:
                print(f"   ... and {len(result) - 5} more buckets")
            
            return result
        else:
            print(f"❌ Error: {response.text}")
            return None
            
    except Exception as e:
        print(f"❌ Connection error: {e}")
        return None

# Test histogram endpoint
if server_ok:
    histogram_result = test_vote_histogram(sample_data, duration_minutes=120, gross=True)
else:
    print("⚠️ Skipping histogram test - server not accessible")

📊 Testing Vote Time Histogram (duration: 120min, gross: True)...
Status Code: 200
✅ Success! Found 3 time buckets
   2024-07-07 18:00:00: 2 votes
   2024-08-06 18:00:00: 3 votes
   2024-10-06 18:00:00: 2 votes
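
The bucket labels above are what you get by rounding each `vote_time` down to a 120-minute boundary counted from midnight. A minimal standard-library sketch of that bucketing follows. The server's implementation is not shown in this notebook, so `floor_to_bucket` and the midnight alignment are assumptions.

```python
from collections import Counter
from datetime import datetime, timedelta

def floor_to_bucket(ts, minutes):
    """Round a timestamp down to the start of its bucket, aligned to midnight."""
    day_start = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    width = timedelta(minutes=minutes)
    return day_start + ((ts - day_start) // width) * width

# vote_time values from the sample data
vote_times = [
    "2024-07-07T19:22:30.608761", "2024-07-07T19:22:30.608770",
    "2024-08-06T19:22:30.608777", "2024-08-06T19:22:30.608784",
    "2024-08-06T19:22:30.608790",
    "2024-10-06T19:22:30.608796", "2024-10-06T19:22:30.608803",
]

buckets = Counter(
    str(floor_to_bucket(datetime.fromisoformat(t), 120)) for t in vote_times
)
for start, count in sorted(buckets.items()):
    print(f"{start}: {count} votes")
```

Under these assumptions the sketch yields the same three buckets (2, 3, and 2 votes) as the server response.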


## 3. Testing Vote Completeness

Test the `/api/v1/verdict/completeness` endpoint.

In [40]:
def test_vote_completeness(data, voter_list, inference_ids=None, gross=False):
    """Test the vote completeness endpoint."""
    print(f"📋 Testing Vote Completeness (gross: {gross})...")
    
    payload = {
        "data": data,
        "voter_list": voter_list,
        "gross": gross
    }
    
    if inference_ids is not None:
        payload["inference_ids"] = inference_ids
    
    try:
        response = requests.post(
            f"{BASE_URL}/api/v1/verdict/completeness",
            headers=HEADERS,
            json=payload
        )
        
        print(f"Status Code: {response.status_code}")
        
        if response.status_code == 200:
            result = response.json()
            print("✅ Success! Completeness results:")
            
            if isinstance(result, dict):
                for inf_id, completeness in result.items():
                    print(f"   Inference {inf_id}: {completeness:.1%}")
            else:
                print(f"   Overall completeness: {result:.1%}")
            
            return result
        else:
            print(f"❌ Error: {response.text}")
            return None
            
    except Exception as e:
        print(f"❌ Connection error: {e}")
        return None

# Test completeness endpoint
if server_ok:
    print("Testing per-inference completeness:")
    completeness_result = test_vote_completeness(
        sample_data, 
        voter_list, 
        inference_ids=[1, 2, 3, 4], 
        gross=False
    )
    
    print("\nTesting overall completeness:")
    completeness_overall = test_vote_completeness(
        sample_data, 
        voter_list, 
        gross=True
    )
else:
    print("⚠️ Skipping completeness test - server not accessible")

Testing per-inference completeness:
📋 Testing Vote Completeness (gross: False)...
Status Code: 200
✅ Success! Completeness results:
   Inference 1: 25.0%
   Inference 2: 25.0%
   Inference 3: 75.0%
   Inference 4: 50.0%

Testing overall completeness:
📋 Testing Vote Completeness (gross: True)...
Status Code: 200
✅ Success! Completeness results:
   Overall completeness: 75.0%
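
These percentages are consistent with counting distinct voters and dividing by the size of `voter_list`: per inference when `gross=False`, across all records when `gross=True`. The sketch below reproduces the numbers locally under that assumption. `completeness` is a hypothetical helper, and the server may define the metric differently.

```python
def completeness(records, voter_list, gross=False):
    """Share of listed voters who voted, overall or per inference (assumed definition)."""
    allowed = set(voter_list)
    if gross:
        voters = {r["voter_id"] for r in records if r["voter_id"] in allowed}
        return len(voters) / len(allowed)
    per_inference = {}
    for r in records:
        if r["voter_id"] in allowed:
            per_inference.setdefault(r["inference_id"], set()).add(r["voter_id"])
    return {inf: len(v) / len(allowed) for inf, v in per_inference.items()}

# (inference_id, voter_id) pairs from the sample data
pairs = [(1, 101), (2, 102), (3, 101), (3, 102), (3, 103), (4, 101), (4, 102)]
records = [{"inference_id": i, "voter_id": v} for i, v in pairs]
voters = [101, 102, 103, 104]

print(completeness(records, voters))              # {1: 0.25, 2: 0.25, 3: 0.75, 4: 0.5}
print(completeness(records, voters, gross=True))  # 0.75
```

Note that voter 103's `None` vote still counts as participation here, which is what the 75% reported for inference 3 implies.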


## 4. Testing Population Confidence

Test the `/api/v1/verdict/population-confidence` endpoint.

In [41]:
def test_population_confidence(data, voter_list, inference_ids=None):
    """Test the population confidence endpoint."""
    
    payload = {
        "data": data,
        "voter_list": voter_list
    }
    
    if inference_ids is not None:
        payload["inference_ids"] = inference_ids
    
    try:
        response = requests.post(
            f"{BASE_URL}/api/v1/verdict/population-confidence",
            headers=HEADERS,
            json=payload
        )
        
        print(f"Status Code: {response.status_code}")
        
        if response.status_code == 200:
            result = response.json()
            print("✅ Success! Population confidence results:")
            
            if isinstance(result, dict):
                for inf_id, confidence in result.items():
                    print(f"   Inference {inf_id}: {confidence:.1%}")
            else:
                print(f"   Overall confidence: {result:.1%}")
            
            return result
        else:
            print(f"❌ Error: {response.text}")
            return None
            
    except Exception as e:
        print(f"❌ Connection error: {e}")
        return None

# Test population confidence endpoint
if server_ok:
    print("Testing per-inference confidence:")
    confidence_result = test_population_confidence(
        sample_data, 
        voter_list, 
        inference_ids=[1, 2, 3, 4],
    )
    
    print("\nTesting overall confidence:")
    confidence_overall = test_population_confidence(
        sample_data, 
        voter_list
    )
else:
    print("⚠️ Skipping confidence test - server not accessible")

Testing per-inference confidence:
Status Code: 200
✅ Success! Population confidence results:
   Inference 1: 25.0%
   Inference 2: 25.0%
   Inference 3: 75.0%
   Inference 4: 50.0%

Testing overall confidence:
Status Code: 200
✅ Success! Population confidence results:
   Overall confidence: 75.0%


## 5. Testing Majority Good Votes

Test the `/api/v1/verdict/majority-good` endpoint.

In [42]:
def test_majority_good_votes(data, good_vote=True, threshold=0.5):
    """Test the majority good votes endpoint."""
    print(f"👍 Testing Majority Good Votes (good_vote: {good_vote}, threshold: {threshold})...")
    
    payload = {
        "data": data,
        "good_vote": good_vote,
        "threshold": threshold
    }
    
    # print(f"Payload: {json.dumps(payload, indent=2)}")
    
    try:
        response = requests.post(
            f"{BASE_URL}/api/v1/verdict/majority-good",
            headers=HEADERS,
            json=payload
        )
        
        print(f"Status Code: {response.status_code}")
        
        if response.status_code == 200:
            result = response.json()
            print("✅ Success! Majority good votes result:")
            print(result)
            print(f"   Number of inferences with majority good votes: {result['count']}")
            
            return result
        else:
            print(f"❌ Error: {response.text}")
            return None
            
    except Exception as e:
        print(f"❌ Connection error: {e}")
        return None

# Test majority good votes endpoint
if server_ok:
    print("Testing with boolean True votes:")
    majority_result = test_majority_good_votes(
        sample_data, 
        good_vote=True, 
        threshold=0.5
    )
    
    print("\nTesting with a lower threshold (0.3):")
    majority_result_string = test_majority_good_votes(
        sample_data, 
        good_vote=True, 
        threshold=0.3
    )
else:
    print("⚠️ Skipping majority votes test - server not accessible")

Testing with boolean True votes:
👍 Testing Majority Good Votes (good_vote: True, threshold: 0.5)...
Status Code: 200
✅ Success! Majority good votes result:
{'count': 2}
   Number of inferences with majority good votes: 2

Testing with a lower threshold (0.3):
👍 Testing Majority Good Votes (good_vote: True, threshold: 0.3)...
Status Code: 200
✅ Success! Majority good votes result:
{'count': 3}
   Number of inferences with majority good votes: 3
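
Both counts are reproduced if an inference qualifies when the share of its votes equal to `good_vote` is strictly greater than `threshold`, with a `None` vote counted in the denominator. That rule is an assumption: excluding `None` votes while keeping the strict comparison gives the same counts for this data, so the sample cannot tell the two apart. `count_majority_good` is a hypothetical local helper.

```python
from collections import defaultdict

def count_majority_good(records, good_vote=True, threshold=0.5):
    """Count inferences whose share of good votes exceeds the threshold."""
    votes_by_inference = defaultdict(list)
    for r in records:
        votes_by_inference[r["inference_id"]].append(r["vote"])
    return sum(
        1
        for votes in votes_by_inference.values()
        if sum(v == good_vote for v in votes) / len(votes) > threshold
    )

# (inference_id, vote) pairs from the sample data
pairs = [(1, True), (2, True), (3, True), (3, False), (3, None),
         (4, False), (4, False)]
records = [{"inference_id": i, "vote": v} for i, v in pairs]

print(count_majority_good(records, threshold=0.5))  # 2
print(count_majority_good(records, threshold=0.3))  # 3
```

Inference 3 is the one that flips: one good vote out of three is a share of about 0.33, which clears 0.3 but not 0.5.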


## 6. Testing Votes Distribution

Test the `/api/v1/verdict/votes-distribution` endpoint.

In [43]:
def test_votes_distribution(data, inference_ids=None):
    """Test the votes distribution endpoint."""
    print("📊 Testing Votes Distribution...")
    
    payload = {
        "data": data
    }
    
    if inference_ids is not None:
        payload["inference_ids"] = inference_ids
    
    try:
        response = requests.post(
            f"{BASE_URL}/api/v1/verdict/votes-distribution",
            headers=HEADERS,
            json=payload
        )
        
        print(f"Status Code: {response.status_code}")
        
        if response.status_code == 200:
            result = response.json()
            print("✅ Success! Votes distribution:")
            print(result)
            
            for inf_id, count in sorted(result.items()):
                print(f"   Inference {inf_id}: {count} votes")
            
            return result
        else:
            print(f"❌ Error: {response.text}")
            return None
            
    except Exception as e:
        print(f"❌ Connection error: {e}")
        return None

# Test votes distribution endpoint
if server_ok:
    print("Testing distribution for all inferences:")
    distribution_result = test_votes_distribution(sample_data)
    
    print("\nTesting distribution for specific inferences:")
    distribution_specific = test_votes_distribution(
        sample_data, 
        inference_ids=[1]
    )
else:
    print("⚠️ Skipping distribution test - server not accessible")

Testing distribution for all inferences:
📊 Testing Votes Distribution...
Status Code: 200
✅ Success! Votes distribution:
{'3': 3, '4': 2, '2': 1, '1': 1}
   Inference 1: 1 votes
   Inference 2: 1 votes
   Inference 3: 3 votes
   Inference 4: 2 votes

Testing distribution for specific inferences:
📊 Testing Votes Distribution...
Status Code: 200
✅ Success! Votes distribution:
{'1': 1}
   Inference 1: 1 votes
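
Because the response is JSON, the inference IDs come back as string keys, so `sorted(result.items())` orders them lexicographically. That is harmless for single-digit IDs but would place `'10'` before `'2'`. A small sketch of a numeric sort, using a made-up response for illustration:

```python
# Hypothetical response containing a two-digit inference ID (illustration only).
result = {"3": 3, "10": 2, "2": 1, "1": 1}

lexicographic = [k for k, _ in sorted(result.items())]
numeric = [k for k, _ in sorted(result.items(), key=lambda kv: int(kv[0]))]

print(lexicographic)  # ['1', '10', '2', '3']
print(numeric)        # ['1', '2', '3', '10']
```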


## 7. Custom Testing Section

Use this section to create your own custom tests with different data.

In [44]:
# Create custom test data
custom_data = [
    {
        "inference_id": 1,
        "voter_id": 101,
        "vote": True,
        "vote_time": "2025-07-06T10:00:00",
        "input": "Custom test input 1",
        "output": "Custom test output 1"
    },
    {
        "inference_id": 1,
        "voter_id": 102,
        "vote": True,
        "vote_time": "2025-07-06T10:15:00",
        "input": "Custom test input 1",
        "output": "Custom test output 1"
    },
    {
        "inference_id": 1,
        "voter_id": 103,
        "vote": False,
        "vote_time": "2025-07-06T10:30:00",
        "input": "Custom test input 1",
        "output": "Custom test output 1"
    }
]

custom_voters = [101, 102, 103, 104, 105]

print("📝 Custom test data created")
print(f"   Records: {len(custom_data)}")
print(f"   Voters: {custom_voters}")
print(f"   Data: {custom_data}")

📝 Custom test data created
   Records: 3
   Voters: [101, 102, 103, 104, 105]
   Data: [{'inference_id': 1, 'voter_id': 101, 'vote': True, 'vote_time': '2025-07-06T10:00:00', 'input': 'Custom test input 1', 'output': 'Custom test output 1'}, {'inference_id': 1, 'voter_id': 102, 'vote': True, 'vote_time': '2025-07-06T10:15:00', 'input': 'Custom test input 1', 'output': 'Custom test output 1'}, {'inference_id': 1, 'voter_id': 103, 'vote': False, 'vote_time': '2025-07-06T10:30:00', 'input': 'Custom test input 1', 'output': 'Custom test output 1'}]


In [45]:
# Test custom data with the majority good votes and completeness endpoints
if server_ok:
    print("🧪 Testing custom data for majority good votes:")
    custom_majority = test_majority_good_votes(
        custom_data, 
        good_vote=True, 
        threshold=0.6  # 60% threshold
    )
    
    print("\n🧪 Testing custom data for completeness:")
    custom_completeness = test_vote_completeness(
        custom_data,
        custom_voters,
        gross=True
    )
else:
    print("⚠️ Skipping custom tests - server not accessible")

🧪 Testing custom data for majority good votes:
👍 Testing Majority Good Votes (good_vote: True, threshold: 0.6)...
Status Code: 200
✅ Success! Majority good votes result:
{'count': 1}
   Number of inferences with majority good votes: 1

🧪 Testing custom data for completeness:
📋 Testing Vote Completeness (gross: True)...
Status Code: 200
✅ Success! Completeness results:
   Overall completeness: 60.0%


## Summary

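This notebook exercises the Grand Jury server's scoring endpoint and five verdict endpoints (histogram, completeness, population confidence, majority good votes, and votes distribution). You can: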

1. **Modify test data** in the cells above to test different scenarios
2. **Change parameters** like thresholds, voter lists, and inference IDs
3. **Add new test cases** by creating custom data
4. **Debug issues** by examining the detailed responses

### Useful Tips:
- Run cells individually to test specific endpoints
- Check the server logs for debugging information
- Change `BASE_URL` if your server runs on a different host or port
- Use different API keys by changing the `API_KEY` variable
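
One way to switch servers or keys without editing the configuration cell is to read them from environment variables. This is a sketch only: the variable names `GRANDJURY_BASE_URL` and `GRANDJURY_API_KEY` are hypothetical, and the defaults mirror values used earlier in this notebook.

```python
import os

# Hypothetical variable names; fall back to the notebook's existing values.
BASE_URL = os.environ.get("GRANDJURY_BASE_URL", "http://localhost:8000").rstrip("/")
API_KEY = os.environ.get("GRANDJURY_API_KEY", "test-key")

AUTH_HEADERS = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json",
}

print(f"Configuration set for server: {BASE_URL}")
```

Stripping the trailing slash keeps paths such as `f"{BASE_URL}/docs"` from containing a double slash.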

### API Documentation:
Visit `/docs` on the server you are testing (`http://localhost:8000/docs` for a local server) for interactive API documentation.