# Test Agent Structured Output

This notebook tests the Clarity and Rigor agents with LangChain's `with_structured_output()` to verify that:
1. Agents return properly structured suggestions
2. Each suggestion has separate issue (stored in the `text` field), `explanation`, and `suggested_fix` fields
3. Suggestions from both agents can be combined when the agents run in parallel

In [1]:
import sys
from pathlib import Path

# Add project root to path
project_root = Path.cwd().parent
sys.path.insert(0, str(project_root))

print(f"Project root: {project_root}")

Project root: /Users/arnabbhattacharya/Desktop/Peerly-Demo


In [2]:
import asyncio
from app.agents.clarity_agent import ClarityAgent
from app.agents.rigor_agent import RigorAgent
from app.models.schemas import Section
from app.config.settings import settings

# Verify settings loaded
print(f"OpenAI Model: {settings.openai_model}")
print(f"API Key loaded: {'Yes' if settings.openai_api_key else 'No'}")

OpenAI Model: gpt-4o-mini
API Key loaded: Yes


## Test Sample: Technical Research Paper Section

We'll use a sample introduction section that has both clarity and rigor issues.

In [3]:
# Create a test section with intentional issues
test_section = Section(
    title="Introduction",
    content="""
Machine learning has revolutionized various domains. Deep learning models achieve 
state-of-the-art results. Our novel algorithm improves upon existing approaches by 
utilizing advanced techniques. The convergence rate is faster than baseline methods.

We conducted experiments on a dataset and observed significant improvements. The results 
demonstrate the superiority of our approach. Our method achieves better performance 
compared to traditional algorithms.
""",
    section_type="introduction",
    line_start=1,
    line_end=10
)

print("Test Section:")
print(f"Title: {test_section.title}")
print(f"Type: {test_section.section_type}")
print(f"Lines: {test_section.line_start}-{test_section.line_end}")
print(f"\nContent:\n{test_section.content}")

Test Section:
Title: Introduction
Type: introduction
Lines: 1-10

Content:

Machine learning has revolutionized various domains. Deep learning models achieve 
state-of-the-art results. Our novel algorithm improves upon existing approaches by 
utilizing advanced techniques. The convergence rate is faster than baseline methods.

We conducted experiments on a dataset and observed significant improvements. The results 
demonstrate the superiority of our approach. Our method achieves better performance 
compared to traditional algorithms.



## Test 1: Clarity Agent

Test the Clarity Agent to see if it properly identifies clarity issues with structured output.

In [4]:
# Initialize Clarity Agent
clarity_agent = ClarityAgent()

# Run clarity analysis
print("Running Clarity Agent...\n")
clarity_suggestions = await clarity_agent.review_section(test_section)

print(f"Found {len(clarity_suggestions)} clarity suggestions\n")
print("=" * 80)

Running Clarity Agent...

Found 5 clarity suggestions



In [7]:
type(clarity_suggestions[0])
clarity_suggestions[0]

SuggestionItem(text="The phrase 'advanced techniques' is vague and lacks specificity.", line=1, severity=<SeverityLevel.INFO: 'info'>, explanation='This statement does not clarify what these advanced techniques are, leaving readers unsure about the actual improvements made by the algorithm.', suggested_fix="Specify the advanced techniques used in the algorithm, such as 'utilizing convolutional neural networks and transfer learning'.")
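
The repr above shows which fields each suggestion carries. For readers without the project on hand, the shape can be approximated with the standard library. This is only a sketch: the project's real `SuggestionItem` and `SeverityLevel` definitions are not shown in this notebook, so the `SuggestionSketch` name, the dataclass form, and the `WARNING` and `ERROR` members are assumptions, and the example strings are shortened for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class SeverityLevel(str, Enum):
    """Severity values; only INFO appears in the notebook output."""
    INFO = "info"
    WARNING = "warning"  # assumed member, not confirmed by the notebook
    ERROR = "error"      # assumed member, not confirmed by the notebook


@dataclass
class SuggestionSketch:
    """Hypothetical stand-in mirroring the fields in the repr above."""
    text: str                # the issue itself
    line: int
    severity: SeverityLevel
    explanation: str
    suggested_fix: str


# Illustrative values, shortened from the first clarity suggestion.
example = SuggestionSketch(
    text="The phrase 'advanced techniques' is vague and lacks specificity.",
    line=1,
    severity=SeverityLevel.INFO,
    explanation="Readers cannot tell which techniques are meant.",
    suggested_fix="Name the techniques used.",
)
print(example.severity.value)
```

Keeping the issue, explanation, and fix in separate fields is what lets the display and verification cells below address each part independently.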

In [8]:
# Display clarity suggestions
for i, suggestion in enumerate(clarity_suggestions, 1):
    print(f"\n📋 Clarity Suggestion {i}")
    print(f"Severity: {suggestion.severity.value.upper()}")
    print(f"Line: {suggestion.line}")
    print(f"\n🔴 Issue:")
    print(f"   {suggestion.text}")
    print(f"\n💡 Explanation:")
    print(f"   {suggestion.explanation}")
    print(f"\n✅ Suggested Fix:")
    print(f"   {suggestion.suggested_fix}")
    print("\n" + "=" * 80)


📋 Clarity Suggestion 1
Severity: INFO
Line: 1

🔴 Issue:
   The phrase 'advanced techniques' is vague and lacks specificity.

💡 Explanation:
   This statement does not clarify what these advanced techniques are, leaving readers unsure about the actual improvements made by the algorithm.

✅ Suggested Fix:
   Specify the advanced techniques used in the algorithm, such as 'utilizing convolutional neural networks and transfer learning'.


📋 Clarity Suggestion 2
Line: 1

🔴 Issue:
   The statement 'The convergence rate is faster than baseline methods' is unclear without context.

💡 Explanation:
   It does not specify which baseline methods are being referenced, making it difficult for readers to understand the significance of the claim.

✅ Suggested Fix:
   Clarify which baseline methods are being compared, for example, 'The convergence rate is faster than the standard gradient descent methods used in similar studies'.


📋 Clarity Suggestion 3
Severity: INFO
Line: 1


## Test 2: Rigor Agent

Test the Rigor Agent to see if it properly identifies rigor issues with structured output.

In [9]:
# Initialize Rigor Agent
rigor_agent = RigorAgent()

# Run rigor analysis
print("Running Rigor Agent...\n")
rigor_suggestions = await rigor_agent.review_section(test_section)

print(f"Found {len(rigor_suggestions)} rigor suggestions\n")
print("=" * 80)

Running Rigor Agent...

Found 5 rigor suggestions



In [10]:
# Display rigor suggestions
for i, suggestion in enumerate(rigor_suggestions, 1):
    print(f"\n📊 Rigor Suggestion {i}")
    print(f"Severity: {suggestion.severity.value.upper()}")
    print(f"Line: {suggestion.line}")
    print(f"\n🔴 Issue:")
    print(f"   {suggestion.text}")
    print(f"\n💡 Explanation:")
    print(f"   {suggestion.explanation}")
    print(f"\n✅ Suggested Fix:")
    print(f"   {suggestion.suggested_fix}")
    print("\n" + "=" * 80)


📊 Rigor Suggestion 1
Severity: INFO
Line: 1

🔴 Issue:
   The section makes unverified claims about the superiority of the algorithm without providing supporting evidence or citations.

💡 Explanation:
   This undermines the credibility of the claims, as readers cannot assess the validity of the results or compare them to existing literature.

✅ Suggested Fix:
   Include references to previous studies or datasets that support the claims of superiority, and provide quantitative results from experiments.


📊 Rigor Suggestion 2
Severity: INFO
Line: 1

🔴 Issue:
   The statement about the 'faster convergence rate' lacks mathematical justification or proof.

💡 Explanation:
   Without a formal proof or derivation, the claim may be seen as anecdotal and could mislead readers regarding the algorithm's performance.

✅ Suggested Fix:
   Provide a mathematical analysis or proof that demonstrates the convergence rate of the algorithm compared to baseline methods.


📊 Rigor S

## Test 3: Verify Structured Fields

Verify that all suggestions have properly populated fields.

In [12]:
# Verification function
def verify_suggestion_structure(suggestions, agent_name):
    print(f"\n{'='*80}")
    print(f"Verifying {agent_name} Suggestions Structure")
    print(f"{'='*80}\n")
    
    for i, suggestion in enumerate(suggestions, 1):
        print(f"Suggestion {i}:")
        
        # Check text/issue
        has_text = bool(suggestion.text and len(suggestion.text) > 0)
        print(f"  ✅ Has issue: {has_text}" if has_text else f"  ❌ Missing issue")
        
        # Check explanation
        has_explanation = bool(suggestion.explanation and len(suggestion.explanation) > 0)
        print(f"  ✅ Has explanation: {has_explanation}" if has_explanation else f"  ❌ Missing explanation")
        
        # Check suggested_fix
        has_fix = bool(suggestion.suggested_fix and len(suggestion.suggested_fix) > 0)
        print(f"  ✅ Has suggested_fix: {has_fix}" if has_fix else f"  ❌ Missing suggested_fix")
        
        # Check severity
        has_severity = bool(suggestion.severity)
        print(f"  ✅ Has severity: {suggestion.severity.value}" if has_severity else f"  ❌ Missing severity")
        
        all_fields = has_text and has_explanation and has_fix and has_severity
        status = "✅ PASS" if all_fields else "❌ FAIL"
        print(f"\n  Status: {status}\n")
    
    # Summary
    complete_count = sum(1 for s in suggestions if s.text and s.explanation and s.suggested_fix and s.severity)
    print(f"\nSummary: {complete_count}/{len(suggestions)} suggestions have all required fields\n")

In [13]:
# Verify Clarity Agent suggestions
verify_suggestion_structure(clarity_suggestions, "Clarity Agent")


Verifying Clarity Agent Suggestions Structure

Suggestion 1:
  ✅ Has issue: True
  ✅ Has explanation: True
  ✅ Has suggested_fix: True
  ✅ Has severity: info

  Status: ✅ PASS

Suggestion 2:
  ✅ Has issue: True
  ✅ Has explanation: True
  ✅ Has suggested_fix: True

  Status: ✅ PASS

Suggestion 3:
  ✅ Has issue: True
  ✅ Has explanation: True
  ✅ Has suggested_fix: True
  ✅ Has severity: info

  Status: ✅ PASS

Suggestion 4:
  ✅ Has issue: True
  ✅ Has explanation: True
  ✅ Has suggested_fix: True
  ✅ Has severity: info

  Status: ✅ PASS

Suggestion 5:
  ✅ Has issue: True
  ✅ Has explanation: True
  ✅ Has suggested_fix: True
  ✅ Has severity: info

  Status: ✅ PASS


Summary: 5/5 suggestions have all required fields



In [14]:
# Verify Rigor Agent suggestions
verify_suggestion_structure(rigor_suggestions, "Rigor Agent")


Verifying Rigor Agent Suggestions Structure

Suggestion 1:
  ✅ Has issue: True
  ✅ Has explanation: True
  ✅ Has suggested_fix: True
  ✅ Has severity: info

  Status: ✅ PASS

Suggestion 2:
  ✅ Has issue: True
  ✅ Has explanation: True
  ✅ Has suggested_fix: True
  ✅ Has severity: info

  Status: ✅ PASS

Suggestion 3:
  ✅ Has issue: True
  ✅ Has explanation: True
  ✅ Has suggested_fix: True
  ✅ Has severity: info

  Status: ✅ PASS

Suggestion 4:
  ✅ Has issue: True
  ✅ Has explanation: True
  ✅ Has suggested_fix: True
  ✅ Has severity: info

  Status: ✅ PASS

Suggestion 5:
  ✅ Has issue: True
  ✅ Has explanation: True
  ✅ Has suggested_fix: True
  ✅ Has severity: info

  Status: ✅ PASS


Summary: 5/5 suggestions have all required fields
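
The printed checks above rely on someone reading the output. The same rule can be expressed as assertions so that an incomplete suggestion fails loudly. The sketch below is one possible form, not part of the project: `missing_fields`, `assert_complete`, and `REQUIRED_FIELDS` are hypothetical names, and `SimpleNamespace` objects stand in for real suggestions.

```python
from types import SimpleNamespace

# Field names taken from the SuggestionItem repr shown earlier in the notebook.
REQUIRED_FIELDS = ("text", "explanation", "suggested_fix", "severity")


def missing_fields(suggestion):
    """Return the names of required fields that are absent or empty."""
    return [name for name in REQUIRED_FIELDS if not getattr(suggestion, name, None)]


def assert_complete(suggestions):
    """Raise AssertionError naming the first incomplete suggestion."""
    for index, suggestion in enumerate(suggestions, 1):
        missing = missing_fields(suggestion)
        assert not missing, f"Suggestion {index} is missing: {', '.join(missing)}"


# Stand-in objects; in the notebook you would pass clarity_suggestions instead.
good = SimpleNamespace(text="Vague phrase", explanation="Unclear",
                       suggested_fix="Be specific", severity="info")
bad = SimpleNamespace(text="Vague phrase", explanation="",
                      suggested_fix=None, severity="info")

print(missing_fields(good))  # []
print(missing_fields(bad))   # ['explanation', 'suggested_fix']
```

In the notebook, `assert_complete(clarity_suggestions)` and `assert_complete(rigor_suggestions)` would then pass silently or stop the cell with the offending field names.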



## Test 4: Test with RAG Guidelines

Test agents with RAG-retrieved guidelines to ensure they incorporate domain knowledge.

In [15]:
# Sample RAG guidelines
sample_guidelines = """
Technical Writing Best Practices:
- Always define technical terms when first introduced
- Avoid vague claims like "significant improvement" - provide specific metrics
- Include statistical significance tests (p-values) for experimental comparisons
- State all assumptions explicitly before mathematical derivations
"""

print("Testing with RAG guidelines...\n")
clarity_with_rag = await clarity_agent.review_section(test_section, guidelines=sample_guidelines)

print(f"Clarity suggestions with RAG: {len(clarity_with_rag)}")
print("\nFirst suggestion with guidelines:")
if clarity_with_rag:
    s = clarity_with_rag[0]
    print(f"Issue: {s.text}")
    print(f"Explanation: {s.explanation}")
    print(f"Fix: {s.suggested_fix}")

Testing with RAG guidelines...

Clarity suggestions with RAG: 5

First suggestion with guidelines:
Issue: The statement 'Machine learning has revolutionized various domains' is vague and lacks specificity.
Explanation: This claim does not specify which domains are affected or how they have been revolutionized, making it difficult for readers to grasp the significance of the statement.
Fix: Specify the domains impacted by machine learning, such as healthcare, finance, or transportation, and provide examples of how it has changed those fields.
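
The suggestion count is the same with and without guidelines, so the count alone does not show whether the guidelines changed anything. One rough way to compare two runs is to diff their issue texts. The helper below is a hypothetical sketch (`diff_issue_texts` is not part of the project), it uses stand-in data, and it matches on exact text only. Model output can also vary between runs without any guidelines, so differences are a hint rather than proof.

```python
from types import SimpleNamespace


def diff_issue_texts(baseline, with_guidelines):
    """Split issue texts into those unique to each run and those shared."""
    base = {s.text.strip().lower() for s in baseline}
    guided = {s.text.strip().lower() for s in with_guidelines}
    return {
        "only_baseline": sorted(base - guided),
        "only_guided": sorted(guided - base),
        "shared": sorted(base & guided),
    }


# Stand-in data; in the notebook, pass clarity_suggestions and clarity_with_rag.
baseline_run = [
    SimpleNamespace(text="Vague phrase 'advanced techniques'"),
    SimpleNamespace(text="Unclear baseline methods"),
]
guided_run = [
    SimpleNamespace(text="Vague phrase 'advanced techniques'"),
    SimpleNamespace(text="No metric given for 'significant improvements'"),
]

result = diff_issue_texts(baseline_run, guided_run)
print(result["only_guided"])
```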


## Test 5: Integration Test with Full Workflow

Test the full review workflow combining both agents.

In [16]:
# Run both agents in parallel
print("Running complete review workflow...\n")

clarity_task = clarity_agent.review_section(test_section)
rigor_task = rigor_agent.review_section(test_section)

clarity_results, rigor_results = await asyncio.gather(clarity_task, rigor_task)

print(f"Total Clarity Suggestions: {len(clarity_results)}")
print(f"Total Rigor Suggestions: {len(rigor_results)}")
print(f"Total Suggestions: {len(clarity_results) + len(rigor_results)}")

# Group by severity
all_suggestions = clarity_results + rigor_results
severity_counts = {}
for s in all_suggestions:
    severity_counts[s.severity.value] = severity_counts.get(s.severity.value, 0) + 1

print(f"\nSuggestions by severity:")
for severity, count in severity_counts.items():
    print(f"  {severity}: {count}")

Running complete review workflow...

Total Clarity Suggestions: 5
Total Rigor Suggestions: 5
Total Suggestions: 10

Suggestions by severity:
  info: 9
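
The concurrent run and severity tally above can be reproduced without an API key by swapping the agents for stubs. This is a minimal sketch under stated assumptions: `fake_review` and `run_both` are hypothetical stand-ins for `review_section`, the severities are made up, and `collections.Counter` replaces the manual dictionary tally.

```python
import asyncio
from collections import Counter
from types import SimpleNamespace


async def fake_review(agent_name, delay, severities):
    """Stand-in for an agent call: waits, then returns suggestion-like objects."""
    await asyncio.sleep(delay)
    return [SimpleNamespace(agent=agent_name, severity=sev) for sev in severities]


async def run_both():
    # gather() runs both coroutines concurrently and returns results in argument order.
    clarity, rigor = await asyncio.gather(
        fake_review("clarity", 0.01, ["info", "warning"]),
        fake_review("rigor", 0.01, ["info"]),
    )
    return clarity + rigor


combined = asyncio.run(run_both())
severity_counts = Counter(s.severity for s in combined)
print(dict(severity_counts))
```

Inside a notebook cell an event loop is already running, so `combined = await run_both()` would replace the `asyncio.run(...)` call.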


## Conclusion

This notebook verifies that:
1. ✅ Clarity Agent returns structured suggestions with separate fields
2. ✅ Rigor Agent returns structured suggestions with separate fields
3. ✅ All suggestions have `issue`, `explanation`, and `suggested_fix` populated
4. ✅ Severity levels are automatically determined
5. ✅ Agents can incorporate RAG guidelines
6. ✅ Both agents can run in parallel for efficient processing