# Deep Search Agent Testing & Configuration Notebook

This notebook provides a testing environment where subject-matter experts (SMEs) can experiment with and tune the DeepLitSearchAgent. DeepLitSearchAgent specializes in agentic RAG workflows for literature search, but it can be repurposed for other agentic search tasks in AKD, such as code search. In essence, it is a multi-agent system that performs deep search on a given query, configured for a specific set of sources and parameters.

## Features
- **Configurable Parameters**: Easily adjust search behavior, thresholds, and agent settings
- **Editable Prompts**: Directly modify system prompts in cells
- **Interactive Testing**: Run searches with different configurations and see results
- **Result Analysis**: Visualize and analyze search results and quality metrics

## Deep Search Agent Workflow

The DeepLitSearchAgent follows a multi-stage workflow with embedded components working together:

### 🔄 **Main Workflow Stages**
1. **Query Triage** → Determines if query needs clarification or can proceed directly
2. **Clarification** → Asks follow-up questions to refine vague queries (if needed)  
3. **Instruction Building** → Transforms user query into detailed research instructions
4. **Iterative Deep Search** → Performs multiple search iterations with quality improvement
5. **Research Synthesis** → Compiles findings into comprehensive report
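
The stage ordering above can be sketched as a small planning helper. This is only an illustration of the control flow, not the agent's actual implementation: the `Stage` enum and `plan_stages` function are hypothetical names, and the sketch models only the optional clarification branch.

```python
from enum import Enum
from typing import List


class Stage(Enum):
    """Hypothetical labels for the five workflow stages listed above."""
    TRIAGE = "triage"
    CLARIFICATION = "clarification"
    INSTRUCTION_BUILDING = "instruction_building"
    DEEP_SEARCH = "deep_search"
    SYNTHESIS = "synthesis"


def plan_stages(needs_clarification: bool) -> List[Stage]:
    """Return the ordered stages for one run.

    Clarification is included only when triage decides the query is too vague.
    """
    stages = [Stage.TRIAGE]
    if needs_clarification:
        stages.append(Stage.CLARIFICATION)
    stages += [Stage.INSTRUCTION_BUILDING, Stage.DEEP_SEARCH, Stage.SYNTHESIS]
    return stages
```

For a query that triage marks as ready, `plan_stages(False)` skips the clarification stage and goes straight from triage to instruction building.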

### 🎯 **What You Can Tweak**

#### **Configuration Parameters:**
- `MAX_RESEARCH_ITERATIONS` → Number of search refinement cycles (more = deeper but slower)
- `QUALITY_THRESHOLD` → When to stop iterating based on result quality (higher = more thorough)
- `MIN_RELEVANCY_SCORE` → Minimum score to include results (higher = more selective)
- `FULL_CONTENT_THRESHOLD` → Score needed to fetch full content (impacts depth vs. speed)
- `ENABLE_PER_LINK_ASSESSMENT` → Enable detailed relevancy scoring per result
- `USE_SEMANTIC_SCHOLAR` → Include academic paper searches from Semantic Scholar
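
To make the interaction between `MIN_RELEVANCY_SCORE` and `FULL_CONTENT_THRESHOLD` concrete, here is a minimal sketch of how two such thresholds could partition results. The `classify_results` helper and the bucket names are hypothetical, and whether the agent uses inclusive or exclusive comparisons is an assumption.

```python
from typing import Any, Dict, List

MIN_RELEVANCY_SCORE = 0.3      # same defaults as the configuration panel
FULL_CONTENT_THRESHOLD = 0.7


def classify_results(
    results: List[Dict[str, Any]],
    min_score: float = MIN_RELEVANCY_SCORE,
    full_threshold: float = FULL_CONTENT_THRESHOLD,
) -> Dict[str, List[Dict[str, Any]]]:
    """Split results into dropped / summary-only / full-content buckets.

    Assumption: a missing score counts as 0.0 and comparisons are inclusive.
    """
    buckets: Dict[str, List[Dict[str, Any]]] = {
        "dropped": [],
        "kept_summary_only": [],
        "kept_full_content": [],
    }
    for result in results:
        score = result.get("relevancy_score") or 0.0
        if score < min_score:
            buckets["dropped"].append(result)
        elif score >= full_threshold:
            buckets["kept_full_content"].append(result)
        else:
            buckets["kept_summary_only"].append(result)
    return buckets
```

Raising `min_score` shrinks the kept set, while lowering `full_threshold` sends more of the kept results to full content fetching, which trades speed for depth.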

#### **System Prompts (edit directly in cells below):**
- **Triage Agent** → Controls when clarification is needed vs. direct research
- **Clarification Agent** → Shapes what clarifying questions are asked
- **Instruction Builder** → Determines how user queries become research briefs
- **Deep Research Agent** → Guides search strategy and synthesis approach
- **Relevancy Assessor** → Defines quality standards for filtering results

#### **Key Iteration Improvements:**
- **Query Refinement** → Later searches use insights from earlier results
- **Relevancy Learning** → Results should get more targeted over iterations  
- **Quality Assessment** → Multi-rubric scoring improves result filtering
- **Content Depth** → High-scoring results trigger full content fetching

### 📊 **Monitoring Effectiveness**
The notebook tracks relevancy improvement across iterations, showing whether the agent is learning and refining its search strategy effectively.
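
One simple way to quantify relevancy improvement is to compare the mean relevancy score of each iteration with the previous one. The sketch below assumes the per-iteration scores have already been collected into a dictionary; `relevancy_trend` and its input shape are hypothetical and not part of the AKD API.

```python
from statistics import mean
from typing import Dict, List


def relevancy_trend(scores_by_iteration: Dict[int, List[float]]) -> List[Dict[str, float]]:
    """Compute the mean score per iteration and its change from the previous one."""
    rows: List[Dict[str, float]] = []
    previous = None
    for iteration in sorted(scores_by_iteration):
        scores = scores_by_iteration[iteration]
        if not scores:
            continue  # skip iterations that returned no scored results
        average = mean(scores)
        delta = 0.0 if previous is None else average - previous
        rows.append({"iteration": iteration, "mean_score": average, "delta": delta})
        previous = average
    return rows
```

A run of positive `delta` values suggests that query refinement is helping; deltas near zero or negative suggest the extra iterations add little.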

## Setup and Imports

In [1]:
import asyncio
import json
import os
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from typing import Dict, List, Any
from IPython.display import display, HTML, Markdown
import warnings
warnings.filterwarnings('ignore')

# AKD imports
from akd.agents.search.deep_search import DeepLitSearchAgent, DeepLitSearchAgentConfig
from akd.agents.search._base import LitSearchAgentInputSchema
from akd.configs.project import get_project_settings

print("✅ Setup complete!")

✅ Setup complete!


## Configuration Panel

Adjust these parameters to experiment with different search behaviors:

In [15]:
# Research Parameters
MAX_RESEARCH_ITERATIONS = 10  # Reduce for faster testing
QUALITY_THRESHOLD = 0.7      # Quality score threshold (0-1)
MAX_CLARIFYING_ROUNDS = 3    # Number of clarification rounds

# Search Behavior
AUTO_CLARIFY = True          # Automatically ask clarifying questions
USE_SEMANTIC_SCHOLAR = True  # Include Semantic Scholar searches
ENABLE_STREAMING = False     # Disable for notebook testing

# Link Assessment
ENABLE_PER_LINK_ASSESSMENT = True   # Enable relevancy assessment per link
MIN_RELEVANCY_SCORE = 0.3           # Minimum score to include results
FULL_CONTENT_THRESHOLD = 0.7        # Score threshold for full content fetch
ENABLE_FULL_CONTENT_SCRAPING = True # Enable full content scraping

# ISSN Whitelist Filtering
ENABLE_ISSN_WHITELIST_FILTER = True  # Enable ISSN whitelist filtering of sources
ISSN_WHITELIST_FILE_PATH = None       # Path to 'docs/issn_whitelist.json' or custom file; None uses default
ISSN_VALIDATION_TIMEOUT_SECONDS = 25  # Timeout for CrossRef lookups
ISSN_VALIDATION_MAX_CONCURRENCY = 8   # Max concurrent CrossRef requests

# Debug Settings
DEBUG_MODE = True            # Enable detailed logging

print("📊 Configuration loaded:")
print(f"   Max Iterations: {MAX_RESEARCH_ITERATIONS}")
print(f"   Quality Threshold: {QUALITY_THRESHOLD}")
print(f"   Debug Mode: {DEBUG_MODE}")
print(f"   Semantic Scholar: {USE_SEMANTIC_SCHOLAR}")
print(f"   Per-link Assessment: {ENABLE_PER_LINK_ASSESSMENT}")
print(f"   ISSN Whitelist Enabled: {ENABLE_ISSN_WHITELIST_FILTER}")
print(f"   ISSN Whitelist Path: {ISSN_WHITELIST_FILE_PATH or 'default'}")

📊 Configuration loaded:
   Max Iterations: 10
   Quality Threshold: 0.7
   Debug Mode: True
   Semantic Scholar: True
   Per-link Assessment: True
   ISSN Whitelist Enabled: True
   ISSN Whitelist Path: default
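
The two ISSN validation settings describe a bounded-concurrency lookup with a per-request timeout. A minimal sketch of that pattern using `asyncio` follows. The `validate_issns` function and the injected `lookup` coroutine are hypothetical stand-ins for whatever the agent does internally, and treating a timed-out lookup as "not whitelisted" is an assumption.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List


async def validate_issns(
    issns: List[str],
    lookup: Callable[[str], Awaitable[bool]],
    timeout_seconds: float = 25,
    max_concurrency: int = 8,
) -> Dict[str, bool]:
    """Run `lookup` for each ISSN with bounded concurrency and a timeout."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def check(issn: str) -> bool:
        async with semaphore:  # at most `max_concurrency` lookups in flight
            try:
                return await asyncio.wait_for(lookup(issn), timeout=timeout_seconds)
            except asyncio.TimeoutError:
                return False  # assumption: a timeout counts as not validated

    verdicts = await asyncio.gather(*(check(issn) for issn in issns))
    return dict(zip(issns, verdicts))
```

In a notebook cell this would be called with `await validate_issns(...)`. A lower `max_concurrency` is gentler on the remote service, and a lower timeout keeps one slow lookup from stalling the whole batch.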


## System Prompts - Edit Directly

Modify the system prompts used by the agent components by editing the strings below:

In [16]:
# TRIAGE AGENT PROMPT - Edit this prompt to customize triage behavior
TRIAGE_AGENT_PROMPT = """IDENTITY and PURPOSE:
You are an expert query triage specialist who determines the optimal path for research requests. Your role is to quickly assess whether a query has sufficient context for immediate research or needs clarification first.

DECISION CRITERIA:
1. **Needs Clarification If:**
   - The query is too vague or broad (e.g., "Tell me about AI")
   - Key parameters are missing (timeframe, scope, specific aspects)
   - Multiple interpretations are possible
   - The research goal or intended use is unclear

2. **Ready for Instructions If:**
   - The query has clear scope and boundaries
   - Specific aspects or questions are identified
   - The depth/type of research needed is apparent
   - Any ambiguity wouldn't significantly impact research quality

3. **Direct Research (Rare) If:**
   - The query is extremely specific and detailed
   - All necessary context is provided
   - No clarification could improve the research

OUTPUT INSTRUCTIONS:
- Make a quick, decisive routing decision
- Provide brief reasoning (1-2 sentences)
- Err on the side of clarity - better to clarify than to research the wrong thing
- Consider the research domain (scientific, technical, historical, etc.)"""

print("📝 Triage Agent Prompt loaded")

📝 Triage Agent Prompt loaded


In [17]:
# CLARIFYING AGENT PROMPT - Edit this prompt to customize clarification behavior
CLARIFYING_AGENT_PROMPT = """IDENTITY and PURPOSE:
You are an expert research assistant who helps users clarify their research requests to ensure comprehensive and accurate results.

If the user hasn't specifically asked for research (unlikely), ask them what research they would like you to do.

GUIDELINES:
1. **Be concise while gathering all necessary information** 
   - Ask 2–3 clarifying questions to gather more context for research
   - Make sure to gather all the information needed to carry out the research task in a concise, well-structured manner
   - Use bullet points or numbered lists if appropriate for clarity
   - Don't ask for unnecessary information, or information that the user has already provided

2. **Maintain a Friendly and Professional Tone**
   - For example, instead of saying "I need a bit more detail on Y," say, "Could you share more detail on Y?"
   - Be encouraging and show genuine interest in helping with the research

3. **Focus on Research-Relevant Clarifications**
   - Ask about scope, depth, time period, specific aspects of interest
   - Clarify any ambiguous terms or concepts
   - Understand the intended use or application of the research

OUTPUT INSTRUCTIONS:
- Return 2-3 focused clarifying questions
- Each question should help narrow down or better define the research scope
- Questions should be clear and easy to answer"""

print("📝 Clarifying Agent Prompt loaded")

📝 Clarifying Agent Prompt loaded


In [18]:
# RESEARCH INSTRUCTION AGENT PROMPT - Edit this prompt to customize instruction building
RESEARCH_INSTRUCTION_AGENT_PROMPT = """IDENTITY and PURPOSE:
You are an expert research instruction designer who transforms user queries and clarifications into detailed, actionable research briefs for deep research execution.

Based on the following guidelines, take the user's query (and any clarifications), and rewrite it into detailed research instructions. OUTPUT ONLY THE RESEARCH INSTRUCTIONS, NOTHING ELSE.

GUIDELINES:
1. **Maximize Specificity and Detail**
   - Include all known user preferences and explicitly list key attributes or dimensions to consider
   - It is of utmost importance that all details from the user are included in the expanded prompt
   - Be explicit about depth, breadth, and type of analysis required

2. **Fill in Unstated But Necessary Dimensions as Open-Ended**
   - If certain attributes are essential for meaningful output but the user has not provided them, explicitly state that they are open-ended or default to "no specific constraint"
   - Guide the research to explore these dimensions comprehensively

3. **Avoid Unwarranted Assumptions**
   - If the user has not provided a particular detail, do not invent one
   - Instead, state the lack of specification and guide the deep research model to treat it as flexible or accept all possible options

4. **Use the First Person**
   - Phrase the request from the perspective of the user
   - Example: "I need research on..." rather than "The user needs..."

5. **Structure and Formatting Requirements**
   - Explicitly request appropriate headers and formatting for clarity
   - If the research would benefit from tables, comparisons, or structured data, explicitly request them

6. **Source Requirements**
   - Specify preference for peer-reviewed sources, primary research, or authoritative publications
   - Request proper citations and attribution for all claims
   - If domain-specific sources are important, mention them explicitly

7. **Language and Style**
   - Maintain scientific rigor and objectivity
   - Request evidence-based conclusions
   - Ask for identification of conflicting viewpoints or contradictory evidence

8. **Expected Deliverables**
   - Be clear about what constitutes a complete research output
   - Specify if synthesis, analysis, or recommendations are needed
   - Request identification of gaps or areas needing further research

IMPORTANT: Ensure the instructions are comprehensive yet focused on the user's actual needs"""

print("📝 Research Instruction Agent Prompt loaded")

📝 Research Instruction Agent Prompt loaded


## Run a Quick Deep Search

- Set `USER_QUERY` below.
- Runs the `DeepLitSearchAgent`.
- Shows a brief summary, research report, and top results.
- Saves a JSON to `notebooks/` with a timestamped filename.


In [19]:
# Enter your query and run the deep search
import asyncio
import json
from datetime import datetime
from pathlib import Path
from typing import Any, Dict, List

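from akd.agents.search._base import LitSearchAgentInputSchema
# Imported here so this cell also works when the setup cell has not been run
from akd.agents.search.deep_search import DeepLitSearchAgent, DeepLitSearchAgentConfig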

# Ensure DEBUG_MODE exists
DEBUG_MODE = globals().get("DEBUG_MODE", False)
config = DeepLitSearchAgentConfig(
        max_research_iterations=3,
        use_semantic_scholar=True,
        enable_per_link_assessment=True,
        enable_full_content_scraping=True,
        min_relevancy_score=0.3,
        full_content_threshold=0.7,
        enable_streaming=False,
        enable_issn_whitelist_filter=True,  # field name as used in the Agent Initialization cell
        debug=DEBUG_MODE,
    )

# Ensure `agent` exists (fallback to default construction if earlier cell wasn't run)
try:
    agent  # type: ignore[name-defined]
except NameError:
    from akd.agents.search.deep_search import DeepLitSearchAgent, DeepLitSearchAgentConfig
    # Pass the config built above directly instead of wrapping it in a second
    # DeepLitSearchAgentConfig, so the settings defined in this cell are applied.
    agent = DeepLitSearchAgent(config=config, debug=DEBUG_MODE)

# 1) Set your query here
USER_QUERY = "Find research papers on studies that use climate and hydrological modeling, LiDAR-derived snowpack data, and precipitation."

# 2) Prepare input payload for the agent
agent_input = LitSearchAgentInputSchema(
    query=USER_QUERY,
    category="science",
    max_results=50,
)

# 3) Run the agent (async)
print("🔎 Running DeepLitSearchAgent... (this may take a minute)")
output = await agent.arun(agent_input)

# 4) Display a concise summary
num_results = len(output.results)
iterations = getattr(output, "iterations_performed", 1)
print(f"\n✅ Done. Results: {num_results}, Iterations: {iterations}")

# The first result is the synthesized research report if present
report = None
if output.results and isinstance(output.results[0], dict) and output.results[0].get("url") == "deep-research://report":
    report = output.results[0]

if report:
    from IPython.display import Markdown, display
    display(Markdown("### Research Report"))
    display(Markdown(report.get("content", "(no report)")))

# Show top 5 links
print("\nTop results:")
shown = 0
for idx, item in enumerate(output.results):
    if idx == 0 and item.get("url") == "deep-research://report":
        continue
    title = item.get("title") or "Untitled"
    url = item.get("url") or ""
    score = item.get("relevancy_score")
    score_str = f" (score: {score:.2f})" if isinstance(score, (int, float)) else ""
    print(f"- {title}{score_str}\n  {url}")
    shown += 1
    if shown >= 5:
        break

# 5) Save full output to notebooks/ with timestamp
save_dir = Path("notebooks")
save_dir.mkdir(parents=True, exist_ok=True)

fname = f"deep_search_results_{datetime.now().strftime('%Y%m%d_%H%M%S')}.json"
path = save_dir / fname

# Convert pydantic model to plain dict for JSON
as_dict: Dict[str, Any] = {
    "category": output.category,
    "iterations_performed": iterations,
    "results": output.results,
}

with path.open("w", encoding="utf-8") as f:
    json.dump(as_dict, f, ensure_ascii=False, indent=2)

print(f"\n💾 Saved results to: {path}")


2025-08-09 02:37:47.454 | DEBUG    | akd._base:arun:231 - Running DeepLitSearchAgent with params: queries=[] category='science' max_results=50 query='Find research papers on studies that use climate and hydrological modeling, LiDAR-derived snowpack data, and precipitation.'
2025-08-09 02:37:47.455 | DEBUG    | akd.agents.search.deep_search:_handle_triage:291 - Starting triage for query: Find research papers on studies that use climate and hydrological modeling, LiDAR-derived snowpack data, and precipitation.
2025-08-09 02:37:47.456 | DEBUG    | akd.agents.search.components.triage:process:71 - Triaging query: Find research papers on studies that use climate and hydrological modeling, LiDAR-derived snowpack data, and precipitation.
2025-08-09 02:37:47.456 | DEBUG    | akd._base

🔎 Running DeepLitSearchAgent... (this may take a minute)


2025-08-09 02:37:49.992 | DEBUG    | akd.agents.search.components.triage:process:77 - Triage decision: Ready for Instructions
2025-08-09 02:37:49.993 | DEBUG    | akd.agents.search.components.triage:process:78 - Needs clarification: False
2025-08-09 02:37:49.993 | DEBUG    | akd.agents.search.deep_search:_handle_triage:296 - Triage decision: Ready for Instructions
2025-08-09 02:37:49.993 | DEBUG    | akd.agents.search.deep_search:_handle_triage:297 - Reasoning: The query is specific and outlines clear parameters, focusing on research papers that involve climate and hydrological modeling, LiDAR-derived snowpack data, and precipitation.
2025-08-09 02:37:49.994 | DEBUG    | akd.agents.search.deep_search:_build


✅ Done. Results: 15, Iterations: 1


### Research Report

# Research Report on Integration of Climate and Hydrological Modeling with LiDAR-Derived Snowpack and Precipitation Data

## Introduction
The integration of climate and hydrological models with LiDAR-derived snowpack data and precipitation data is a crucial advancement in understanding hydrological processes and snowpack dynamics. This report synthesizes current research on methodologies and applications of these integrated models, focusing on their implications for water resource management and climate change adaptation.

## Literature Review
### Methodologies in Climate and Hydrological Modeling
1. **LiDAR-Derived Snowpack Data**: LiDAR technology provides high-resolution snow depth measurements, which are crucial for accurate snow water equivalent (SWE) estimation. Studies like those by Harpold et al. (2014) have demonstrated the utility of LiDAR in capturing snowpack dynamics across diverse terrains ([Harpold et al., 2014](https://agupubs.onlinelibrary.wiley.com/doi/abs/10.1002/2013wr013935)).

2. **Hydrological Modeling**: Models such as the Soil and Water Assessment Tool (SWAT) have been enhanced by integrating LiDAR-derived data to improve predictions of snowmelt and runoff in mountainous regions ([Ougahi & Rowan, 2024](https://www.mdpi.com/2072-4292/16/2/264/pdf?version=1704817456)).

3. **Climate Modeling**: The use of LiDAR data in conjunction with climate models helps in understanding the impact of snowpack changes on regional climate systems. This integration is crucial for predicting future water availability under climate change scenarios.

### Impact of Precipitation Data
- **Accuracy and Effectiveness**: Precipitation data significantly influence the accuracy of hydrological models. Studies have shown that high-resolution precipitation data improve model predictions of snowpack dynamics and water flow ([Bao et al., 2025](https://www.sciencedirect.com/science/article/abs/pii/S136481522500060X)).

- **Case Studies**: In regions like the Western United States, integrating precipitation data with LiDAR-derived snowpack information has enhanced the understanding of snowmelt processes and water resource management ([Mital et al., 2022](https://journals.ametsoc.org/downloadpdf/journals/aies/1/4/AIES-D-22-0010.1.pdf)).

### Applications and Implications
- **Water Resource Management**: The integration of these models aids in better water resource planning and management, especially in snow-dominated watersheds. It helps in predicting water availability and managing flood risks.

- **Climate Change Adaptation**: Understanding snowpack dynamics through these integrated models is vital for developing strategies to adapt to climate change impacts, particularly in regions dependent on snowmelt for water supply.

## Analysis and Synthesis
The integration of LiDAR-derived snowpack data with climate and hydrological models provides a comprehensive approach to understanding and managing water resources. While these models offer significant improvements in accuracy, challenges remain in data integration and model calibration. Further research is needed to refine these models and expand their applicability across different climatic regions.

## Conclusion
The integration of climate and hydrological models with LiDAR-derived snowpack and precipitation data represents a significant advancement in hydrological science. These models enhance our ability to predict and manage water resources in the face of climate change, offering valuable insights for policymakers and resource managers.

## References
- Harpold, A. A., et al. (2014). LiDAR‐derived snowpack data sets from mixed conifer forests across the Western United States. [Wiley Online Library](https://agupubs.onlinelibrary.wiley.com/doi/abs/10.1002/2013wr013935).
- Ougahi, J. H., & Rowan, J. S. (2024). Combining Hydrological Models and Remote Sensing to Characterize Snowpack Dynamics in High Mountains. [MDPI](https://www.mdpi.com/2072-4292/16/2/264/pdf?version=1704817456).
- Bao, Q., et al. (2025). Quantifying the impact of different precipitation data on hydrological modeling. [ScienceDirect](https://www.sciencedirect.com/science/article/abs/pii/S136481522500060X).
- Mital, U., et al. (2022). Modeling Spatial Distribution of Snow Water Equivalent by Combining Meteorological and Satellite Data with Lidar Maps. [AMS Journals](https://journals.ametsoc.org/downloadpdf/journals/aies/1/4/AIES-D-22-0010.1.pdf).


Top results:
- Hierarchical Conditional Multi-Task Learning for Streamflow Modeling (score: 1.00)
  http://arxiv.org/abs/2410.14137v1
- On the ability of LIDAR snow depth measurements to determine or evaluate the HRU discretization in a land surface model (score: 1.00)
  https://www.mdpi.com/2306-5338/7/2/20
- Remote sensing, hydrological modeling and in situ ... (score: 1.00)
  https://www.sciencedirect.com/science/article/abs/pii/S0022169418302804
- Towards the assimilation of satellite reflectance into semi-distributed ensemble snowpack simulations (score: 1.00)
  http://arxiv.org/abs/1910.10966v1
- Snow redistribution for the hydrological modeling of alpine catchments (score: 1.00)
  https://wires.onlinelibrary.wiley.com/doi/abs/10.1002/wat2.1232

💾 Saved results to: notebooks/deep_search_results_20250809_023906.json
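
For follow-up analysis, the saved JSON can be reloaded and ranked without rerunning the agent. The sketch below assumes the file has the shape written by the run cell (a top-level `results` list of dictionaries with optional `relevancy_score` fields). `load_ranked_results` is a hypothetical helper, not part of AKD.

```python
import json
from pathlib import Path
from typing import Any, Dict, List

REPORT_URL = "deep-research://report"  # marker the run cell uses for the synthesized report


def load_ranked_results(path: Path) -> List[Dict[str, Any]]:
    """Load a saved results file and return the link results, best score first."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    links = [r for r in data.get("results", []) if r.get("url") != REPORT_URL]
    # Results without a score sort last (treated as 0.0)
    return sorted(links, key=lambda r: r.get("relevancy_score") or 0.0, reverse=True)
```

The returned list can be passed to `pd.DataFrame(...)` for plotting score distributions. This is useful when, as in the run above, the top results all report the same score and the ranking alone says little.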


In [14]:
output.results

[{'url': 'deep-research://report',
  'title': 'Deep Research Report',
  'content': '# Research Report on Integration of Climate and Hydrological Modeling with LiDAR-Derived Snowpack and Precipitation Data\n\n## Introduction\nThe integration of climate and hydrological models with LiDAR-derived snowpack data and precipitation data is a growing area of research that enhances our understanding of hydrological processes and snowpack dynamics. This report synthesizes current research on methodologies and applications of these integrated models, focusing on their implications for water resource management and climate change adaptation.\n\n## Literature Review\n### Methodologies in Climate and Hydrological Modeling\n1. **LiDAR-Derived Snowpack Data**: LiDAR technology provides high-resolution snow depth measurements, which are crucial for accurate snow water equivalent (SWE) estimation. Studies like those by Harpold et al. (2014) have demonstrated the utility of LiDAR in capturing snowpack dy

In [8]:
# DEEP RESEARCH AGENT PROMPT - Edit this prompt to customize research behavior
DEEP_RESEARCH_AGENT_PROMPT = """IDENTITY and PURPOSE:
You are an expert deep research agent with advanced capabilities in scientific literature search, synthesis, and analysis. You perform comprehensive, iterative research to produce high-quality, evidence-based reports.

CORE CAPABILITIES:
1. **Iterative Search Strategy**
   - Start with broad searches to understand the landscape
   - Progressively refine queries based on initial findings
   - Identify and pursue promising research threads
   - Recognize when sufficient depth has been achieved

2. **Source Evaluation**
   - Prioritize peer-reviewed and authoritative sources
   - Assess credibility and potential biases
   - Note publication dates and relevance
   - Identify consensus vs. controversial findings

3. **Synthesis and Analysis**
   - Connect findings across multiple sources
   - Identify patterns, trends, and relationships
   - Highlight contradictions or conflicting evidence
   - Draw evidence-based conclusions

4. **Research Quality Assurance**
   - Maintain scientific rigor throughout
   - Provide proper attribution for all claims
   - Acknowledge limitations and gaps
   - Avoid overgeneralization or speculation

RESEARCH PROCESS:
1. Parse and understand the detailed research instructions
2. Plan initial search strategy and keywords
3. Execute searches and evaluate results
4. Identify knowledge gaps and refine approach
5. Iterate until quality threshold is met
6. Synthesize findings into comprehensive report

OUTPUT REQUIREMENTS:
- Well-structured research report with clear sections
- Executive summary of key findings
- Detailed evidence with proper citations
- Identification of gaps or areas for future research
- Objective presentation of conflicting viewpoints
- Tables, comparisons, or visualizations where helpful

QUALITY STANDARDS:
- Comprehensive coverage of the topic
- Balanced representation of different perspectives
- Clear distinction between evidence and interpretation
- Appropriate depth for the intended use
- Professional, academic writing style"""

print("📝 Deep Research Agent Prompt loaded")

📝 Deep Research Agent Prompt loaded


In [9]:
# RELEVANCY ASSESSOR PROMPT - Edit this prompt to customize relevancy assessment
MULTI_RUBRIC_RELEVANCY_SYSTEM_PROMPT = """IDENTITY and PURPOSE:
You are an expert literature relevance assessor with deep expertise in academic research, scientific methodology, and content quality evaluation. Your task is to evaluate content against a given query using six specific relevancy rubrics to ensure high-quality literature search results.

INTERNAL ASSISTANT STEPS:
1. Carefully read and understand the query to identify its main topic, scope, and research requirements.
2. Systematically evaluate the content across the following six relevancy dimensions:
   - Topic Alignment: Does the content directly address the main concepts in the query?
   - Content Depth: Is the treatment of the topic comprehensive or surface-level?
   - Recency Relevance: Is the content current enough, given the norms of the field?
   - Methodological Relevance: Are the methods or approaches used sound and appropriate?
   - Evidence Quality: Is the evidence credible, strong, and well-supported by reliable sources?
   - Scope Relevance: Does the scope of the content match what the query is seeking?
3. Synthesize your findings into an overall relevance judgment.
4. For each rubric, provide clear, specific reasoning to justify your assessment.

OUTPUT INSTRUCTIONS:
- Be strict in your assessments — content must meet high standards across multiple dimensions.
- For literature search, prioritize methodological soundness and evidence quality.
- Mark content as:
  - ALIGNED only if it directly addresses the main topic — not if it's merely tangentially related.
  - COMPREHENSIVE only if the content provides substantial, detailed coverage.
  - METHODOLOGICALLY_SOUND only for rigorous, appropriate research approaches.
  - HIGH_QUALITY_EVIDENCE only for credible, well-supported claims from reliable sources.
- Always provide specific, actionable reasoning for each assessment.
- Be conservative in your judgments to maintain the quality of literature search results."""

print("📝 Multi-Rubric Relevancy System Prompt loaded")

📝 Multi-Rubric Relevancy System Prompt loaded
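
The six rubrics named in the prompt have to be reduced to a single number before they can be compared against `MIN_RELEVANCY_SCORE`. How the agent does this is not shown in this notebook. The sketch below is one plausible scheme, stated purely as an assumption: each rubric is a pass/fail flag and the overall score is the fraction of rubrics passed. `RubricAssessment` and `overall_score` are hypothetical names.

```python
from dataclasses import dataclass, fields


@dataclass
class RubricAssessment:
    """One pass/fail flag per rubric from the prompt (hypothetical structure)."""
    topic_alignment: bool
    content_depth: bool
    recency_relevance: bool
    methodological_relevance: bool
    evidence_quality: bool
    scope_relevance: bool


def overall_score(assessment: RubricAssessment) -> float:
    """Return the fraction of rubrics passed, in the range 0.0 to 1.0."""
    flags = [getattr(assessment, f.name) for f in fields(assessment)]
    return sum(flags) / len(flags)
```

Under this scheme a result passing three of six rubrics scores 0.5, which would clear a `MIN_RELEVANCY_SCORE` of 0.3 but fall short of a `FULL_CONTENT_THRESHOLD` of 0.7.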


## Agent Initialization

Initialize the DeepLitSearchAgent with your configured parameters:

In [10]:
config = DeepLitSearchAgentConfig(
    max_research_iterations=MAX_RESEARCH_ITERATIONS,
    quality_threshold=QUALITY_THRESHOLD,
    auto_clarify=AUTO_CLARIFY,
    max_clarifying_rounds=MAX_CLARIFYING_ROUNDS,
    enable_streaming=ENABLE_STREAMING,
    use_semantic_scholar=USE_SEMANTIC_SCHOLAR,
    enable_per_link_assessment=ENABLE_PER_LINK_ASSESSMENT,
    min_relevancy_score=MIN_RELEVANCY_SCORE,
    full_content_threshold=FULL_CONTENT_THRESHOLD,
    enable_full_content_scraping=ENABLE_FULL_CONTENT_SCRAPING,
    enable_issn_whitelist_filter=ENABLE_ISSN_WHITELIST_FILTER,
    issn_whitelist_file_path=ISSN_WHITELIST_FILE_PATH,
    issn_validation_timeout_seconds=ISSN_VALIDATION_TIMEOUT_SECONDS,
    issn_validation_max_concurrency=ISSN_VALIDATION_MAX_CONCURRENCY,
    debug=DEBUG_MODE
)

print("🚀 Initializing DeepLitSearchAgent...")
agent = DeepLitSearchAgent(config=config, debug=DEBUG_MODE)
print("✅ Agent initialized successfully!")

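print("\n📋 Active Configuration:")
config_dict = config.model_dump()
for key, value in config_dict.items():
    # Mask credentials so they are not written into saved notebook output
    if any(word in key.lower() for word in ("key", "token", "secret")):
        value = "***REDACTED***"
    print(f"   {key}: {value}")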

🚀 Initializing DeepLitSearchAgent...
✅ Agent initialized successfully!

📋 Active Configuration:
   debug: True
   base_url: https://api.openai.com/v1
   api_key: ***REDACTED***
   model_name: gpt-4o-mini
   temperature: 0.0
   system_prompt: IDENTITY and PURPOSE
This is a conversation with a helpful and friendly AI assistant.

OUTPUT INSTRUCTIONS
- Always respond using the proper JSON schema.
- Always use the available additional information and context to enhance the response.
   max_iterations: 5
   max_research_iterations: 10
   quality_threshold: 0.7
   auto_clarify: True
   max_clarifying_rounds: 3
   enable_streaming: False
   use_semantic_scholar: True
   enable_per_link_assessment: True
   min_relevancy_score: 0.3
   full_content_threshold: 0.7
   enable_full_content_scraping: True
   enable_issn_whitelist_filter: False
   issn_wh