Gap Analysis Agent by movinam · Pull Request #86 · NASA-IMPACT/akd-core

movinam · 2025-07-28T15:10:23Z

Summary:

This PR addresses Issues #54 and #22. It introduces the gap agent, which models scientific literature as a knowledge graph in order to analyse gaps in the content returned by the Search Tool.

Details:

Fetches paper details via Semantic Scholar.
Groups sections and classifies them under key section categories.
Creates a knowledge graph from each paper's metadata and section data.
Traverses the graph to select appropriate nodes for a given pre-defined or user-defined gap.
Generates local answers per selected node for the chosen gap.
Passes the local answers to a final writer, which produces the final output.
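
The graph construction and node selection steps can be pictured with a minimal sketch. It assumes a plain adjacency structure with paper nodes and section nodes, and a simple category filter as the selection rule. The names `Node`, `build_graph`, `select_nodes`, the category labels, and the toy papers are all hypothetical and are not taken from the PR's implementation, whose traversal logic may differ.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A graph node: either a paper or one classified section of a paper."""
    node_id: str
    kind: str  # "paper" or "section"
    attrs: dict = field(default_factory=dict)


def build_graph(papers):
    """Build nodes and paper -> section edges from metadata and section data."""
    nodes, edges = {}, {}
    for paper in papers:
        pid = paper["paperId"]
        nodes[pid] = Node(pid, "paper", {"title": paper["title"]})
        edges[pid] = []
        for i, section in enumerate(paper["sections"]):
            sid = f"{pid}:s{i}"
            nodes[sid] = Node(sid, "section", section)
            edges[pid].append(sid)
    return nodes, edges


def select_nodes(nodes, edges, categories):
    """Walk each paper's sections and keep those in the wanted categories."""
    selected = []
    for section_ids in edges.values():
        for sid in section_ids:
            if nodes[sid].attrs["category"] in categories:
                selected.append(sid)
    return selected


# Toy input (made up for illustration only).
papers = [
    {
        "paperId": "p1",
        "title": "Toy paper one",
        "sections": [
            {"category": "methods", "text": "..."},
            {"category": "results", "text": "..."},
        ],
    },
    {
        "paperId": "p2",
        "title": "Toy paper two",
        "sections": [{"category": "results", "text": "..."}],
    },
]

nodes, edges = build_graph(papers)
# Hypothetical rule: an "evidence" gap looks at results sections only.
evidence_nodes = select_nodes(nodes, edges, categories={"results"})
```

Each selected node would then receive its own local answer before the final writer merges them.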

Usage:

from akd.agents.gap_analysis.gap_analysis import GapAgent, GapAgentConfig, GapInputSchema
from akd.tools.scrapers import DoclingScraperConfig
from akd.tools.search import (
    SearxNGSearchTool,
    SearxNGSearchToolInputSchema,
    SemanticScholarSearchToolConfig,
)

# Scraper and Semantic Scholar configuration (api_key is left blank here)
docling_config = DoclingScraperConfig(
    do_table_structure=True, pdf_mode='accurate', export_type='html', debug=False
)
s2_config = SemanticScholarSearchToolConfig(
    api_key="",
    debug=False,
    external_id="ARXIV",
    fields=["paperId", "title", "externalIds", "isOpenAccess", "openAccessPdf"],
)

# Search
query = 'methods and estimation to map landslides in Nepal'
search_tool = SearxNGSearchTool.from_params(engines=['arxiv'], debug=True)
# Category has to be set to None to make the tool adhere to the engines
search_output = await search_tool.arun(SearxNGSearchToolInputSchema(queries=[query], category=None, max_results=5))

gap_agent_config = GapAgentConfig(docling_config=docling_config,
                                  s2_tool_config=s2_config,
                                  model_name='gpt-4o-mini', api_key="",
                                  debug=True)
gap_agent = GapAgent(gap_agent_config)


# for pre-defined gaps
gap_output_1 = await gap_agent.arun(GapInputSchema(search_results=search_output.results, gap="evidence"))
# for user-defined gaps
defined_gap = "Are there findings that have not been replicated or independently validated across different studies?"
gap_output_2 = await gap_agent.arun(GapInputSchema(search_results=search_output.results, gap=defined_gap))

# For generated output
output = gap_output_1.output
# For generated answers per selected node
attributed_source_answers = gap_output_1.attributed_source_answers
# For the graph created from the ingested papers
G = gap_output_1.G
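
The usage snippet calls `await` at the top level, which works in a notebook or IPython session but not in a plain Python script. A minimal sketch of the script form follows. `StubAgent` is a hypothetical stand-in so the example runs without API keys; in real code it would be replaced by the `GapAgent` and `GapInputSchema` objects shown above.

```python
import asyncio


class StubAgent:
    """Hypothetical stand-in for GapAgent; returns a canned string."""

    async def arun(self, gap: str) -> str:
        await asyncio.sleep(0)  # yield to the event loop like a real async call
        return f"analysis for gap: {gap}"


async def main() -> list:
    agent = StubAgent()
    outputs = []
    # One pre-defined gap and one user-defined gap, run one after the other.
    for gap in ("evidence", "Are there findings that have not been replicated?"):
        outputs.append(await agent.arun(gap))
    return outputs


# In a script, the coroutine has to be driven by an event loop explicitly.
outputs = asyncio.run(main())
```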

Limitations:

The section grouper and classifier rely on content parsed as HTML. Support for the markdown format still needs to be added.
The agent is restricted to Docling in order to use its export_to_html functionality. This will be replaced with the composite parser once markdown support is added to the section grouper.
It is recommended to run the agent with the search engine set to arXiv so that PDFs are available, since the code relies on each SearchResultItem's pdf_url having a value.
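
Because of the `pdf_url` dependency, one defensive option is to drop results without a PDF link before handing them to the agent. The sketch below assumes only that each result exposes a `pdf_url` attribute; `ResultStub` and `with_pdf` are hypothetical names used for illustration, not part of akd.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ResultStub:
    """Hypothetical stand-in for a search result item; only pdf_url matters."""
    title: str
    pdf_url: Optional[str] = None


def with_pdf(results):
    """Keep only results whose pdf_url is set and non-empty."""
    return [r for r in results if getattr(r, "pdf_url", None)]


results = [
    ResultStub("Paper A", "https://example.org/a.pdf"),
    ResultStub("Paper B"),      # no PDF link at all
    ResultStub("Paper C", ""),  # empty PDF link
]
usable = with_pdf(results)
```

Whether the agent already skips such items internally is not stated in this PR, so the filter is a precaution rather than a requirement.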

Checks

Closes Automatic Knowledge Graph Creation from Literature for Gap Identification #54
Closes Identify Research Gaps existing in the literature #22
Tested Changes
Stakeholder Approval

NISH1001 · 2025-07-28T15:35:56Z

@movinam thanks for the PR. will look sometime today and test as well.

NISH1001

Pass 1 comments

github-actions · 2025-09-03T18:04:27Z

❌ Tests failed (exit code: 1)

📊 Test Results

Passed: 219
Failed: 6
Warnings: 110
Coverage: 70%

Branch: feature/gap-agent
PR: #86
Commit: 79e9dd71d7f7a7b5ec2fe82143d15c4bcc93215d

📋 Full coverage report and logs are available in the workflow run.

github-actions · 2025-09-03T18:18:23Z

❌ Tests failed (exit code: 1)

📊 Test Results

Passed: 219
Failed: 6
Warnings: 110
Coverage: 70%

Branch: feature/gap-agent
PR: #86
Commit: aa2aaae4839eedafb4609a435f995a821c70abd6

📋 Full coverage report and logs are available in the workflow run.

NISH1001

On networkx Graph serialization

github-actions · 2025-09-03T18:33:12Z

❌ Tests failed (exit code: 1)

📊 Test Results

Passed: 219
Failed: 6
Warnings: 110
Coverage: 70%

Branch: feature/gap-agent
PR: #86
Commit: 39d63b6a8b62baad1e7ca0dcf3026ccc568abc01

📋 Full coverage report and logs are available in the workflow run.

NISH1001

Review: Pass II

NISH1001

Pass III

NISH1001 · 2025-09-03T18:54:29Z

@movinam Also one thing:
Can we also get some runtime stats for running the gap agent, say, x seconds per 5 papers? That will help us decide where we can use the gap agent in the akd workflow. If it is fast, we could potentially integrate it into deep search on every iteration. If it is slow, we need to rethink it as a one-time process that runs in the background on the backend.

Any time usage stats will help us refine the scope.
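
One way such stats could be collected is a small timing wrapper around the async call. This is a sketch only: `timed` is a hypothetical helper, and `fake_run` stands in for `gap_agent.arun` so the example runs without the agent or any API keys.

```python
import asyncio
import time


async def timed(coro_fn, *args, **kwargs):
    """Await coro_fn(*args, **kwargs) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = await coro_fn(*args, **kwargs)
    return result, time.perf_counter() - start


async def fake_run(n_papers: int) -> int:
    """Hypothetical stand-in for gap_agent.arun; sleeps briefly per paper."""
    await asyncio.sleep(0.01 * n_papers)
    return n_papers


async def main():
    stats = {}
    for n in (1, 2):
        _, elapsed = await timed(fake_run, n)
        stats[n] = {"total_s": elapsed, "per_paper_s": elapsed / n}
    return stats


stats = asyncio.run(main())
```

Recording both the total and the per-paper time makes it easier to see whether the cost grows linearly with the number of papers.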

movinam · 2025-09-03T20:08:15Z

Runtime statistics are dependent on the number of search results, the length of the papers, docling configuration and how long docling takes to process a batch of papers.

Using the default settings, it takes approximately 3-5 minutes for 5 papers and 5-7 minutes for 10 papers. We can do a better analysis per domain when we benchmark the agent.
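
Taken at face value, these ranges can be converted to a rough per-paper cost. The arithmetic below uses only the figures reported in this comment and says nothing about where the time is spent.

```python
# Reported wall-clock ranges: number of papers -> (min, max) minutes.
reported = {5: (3, 5), 10: (5, 7)}

# Convert each range to seconds per paper.
per_paper_seconds = {
    n: (lo * 60 / n, hi * 60 / n) for n, (lo, hi) in reported.items()
}
# 5 papers  -> 36 to 60 seconds per paper
# 10 papers -> 30 to 42 seconds per paper
```

The per-paper cost drops slightly for the larger batch, which might point to some fixed overhead or batching effect; that reading is a guess until the agent is benchmarked.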

I think to integrate to deep search we have to restrict the gap agent to just the abstracts. I can add that as an enhancement in a future PR.

github-actions · 2025-09-03T20:21:43Z

❌ Tests failed (exit code: 1)

📊 Test Results

Passed: 219
Failed: 6
Warnings: 112
Coverage: 70%

Branch: feature/gap-agent
PR: #86
Commit: be495d5295b70ccf317d50ddec3bde9b172f0f6f

📋 Full coverage report and logs are available in the workflow run.

NISH1001 · 2025-09-04T02:52:41Z

> Runtime statistics are dependent on the number of search results, the length of the papers, docling configuration and how long docling takes to process a batch of papers.
>
> Using the default settings, it takes approximately 3-5 minutes for 5 papers and 5-7 minutes for 10 papers. We can do a better analysis per domain when we benchmark the agent.
>
> I think to integrate to deep search we have to restrict the gap agent to just the abstracts. I can add that as an enhancement in a future PR.

I think running the gap analysis only at the abstract level would defeat the purpose and would not justify the gaps. We probably need:

a lighter scraper to read PDFs (something other than a VLM, maybe the traditional fitz approach)
a reducer/summarizer that compresses the paper context nicely. I presume the large runtime is also caused by using the full text. The summarizer would, say, just compress the paper, probably by summarizing each section. The abstract might not capture the nuances needed for detailed gap analysis.

This is a topic of discussion for another thread though.
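
As a starting point for that discussion, the per-section reducer idea could look roughly like the sketch below. It is a naive extractive stand-in that keeps the first few sentences of each section. The function name `compress_sections`, the sentence-splitting rule, and the toy text are assumptions for illustration; a real summarizer would likely be model-based.

```python
import re


def compress_sections(sections, max_sentences=2):
    """Naive extractive reducer: keep the first few sentences of each section."""
    compressed = {}
    for name, text in sections.items():
        # Split on whitespace that follows sentence-ending punctuation.
        sentences = re.split(r"(?<=[.!?])\s+", text.strip())
        compressed[name] = " ".join(sentences[:max_sentences])
    return compressed


# Toy section text (made up for illustration only).
sections = {
    "methods": "We collect imagery. We label it by hand. We train a model.",
    "results": "The model works on the test area.",
}
short = compress_sections(sections, max_sentences=2)
```

Unlike an abstract-only approach, this keeps at least some content from every section, at the cost of possibly missing details that appear later in a section.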

movinam added 14 commits July 23, 2025 13:30

Modify searxng to prioritise engines, then category

dbae7c8

Temp fix to avoid TypeError: Cannot mix str and non-str arguments

9dc6606

Merge branch 'develop' of https://github.com/NASA-IMPACT/accelerated-…

5de2359

…discovery into feature/gap-agent

Merge branch 'develop' of https://github.com/NASA-IMPACT/accelerated-…

eb52989

…discovery into feature/gap-agent

Merge branch 'feature/semantic-apis' of https://github.com/NASA-IMPAC…

b828285

…T/accelerated-discovery into feature/gap-agent

Merge branch 'feature/semantic-apis' of https://github.com/NASA-IMPAC…

bb8dbf2

…T/accelerated-discovery into feature/gap-agent

Merge branch 'feature/semantic-apis' of https://github.com/NASA-IMPAC…

3a43457

…T/accelerated-discovery into feature/gap-agent

Add prompts and structures for gap analysis

1446d61

Add graph utility functions for papers

8e0126a

Add parsing utility functions for papers

00c5fe2

Modify graph utils to include better docs

ed00f71

Add gap agent config files

47f7714

Update config to include Base Settings

3ca3fe8

Add gap analysis agent

988d859

NISH1001 reviewed Jul 28, 2025

Comment thread akd/agents/gap_analysis/gap_analysis.py Outdated

NISH1001 requested changes Jul 28, 2025

Comment thread akd/agents/gap_analysis/parsing_utils.py

Comment thread akd/configs/gap_config.py Outdated

movinam added 7 commits July 31, 2025 22:23

Merge branch 'develop' of https://github.com/NASA-IMPACT/accelerated-…

8a74ec4

…discovery into feature/gap-agent

Refactor initialisation

5c9713c

Merge branch 'bugfix/s2-tool' of https://github.com/NASA-IMPACT/accel…

80a9308

…erated-discovery into feature/gap-agent

Update fetch_paper_items to manage empty results

8e38f8d

Add pre-defined gap queries

dd7d67c

Merge branch 'develop' of https://github.com/NASA-IMPACT/accelerated-…

8ae9dfe

…discovery into feature/gap-agent

Reformat GapAgentOutput

78c8d48

movinam marked this pull request as ready for review August 5, 2025 12:14

movinam added 4 commits August 15, 2025 10:51

Merge branch 'develop' of https://github.com/NASA-IMPACT/accelerated-…

3dd23bd

…discovery into feature/gap-agent

Modify prompt to adhere to output structure

1e48756

Merge branch 'develop' of https://github.com/NASA-IMPACT/accelerated-…

4934550

…discovery into feature/gap-agent

Add test notebook

77d0721

github-actions Bot added a commit that referenced this pull request Sep 3, 2025

Auto-merge PR #86 (feature/gap-agent) into integration for testing

555fff9

Remove old configurations

3b9fc24

movinam temporarily deployed to integration September 3, 2025 18:09 — with GitHub Actions Inactive

NISH1001 requested changes Sep 3, 2025

Comment thread akd/agents/gap_analysis/gap_analysis.py Outdated

Comment thread akd/agents/gap_analysis/gap_analysis.py Outdated

github-actions Bot added a commit that referenced this pull request Sep 3, 2025

Auto-merge PR #86 (feature/gap-agent) into integration for testing

0e18f3b

Move default configs to default_factory

843734f

movinam temporarily deployed to integration September 3, 2025 18:24 — with GitHub Actions Inactive

NISH1001 reviewed Sep 3, 2025

Comment thread akd/agents/gap_analysis/gap_analysis.py Outdated

NISH1001 reviewed Sep 3, 2025

Comment thread akd/agents/gap_analysis/gap_analysis.py

github-actions Bot added a commit that referenced this pull request Sep 3, 2025

Auto-merge PR #86 (feature/gap-agent) into integration for testing

67147e2

NISH1001 requested changes Sep 3, 2025

Dump graph as json

692c52d

NISH1001 reviewed Sep 3, 2025

Comment thread akd/agents/gap_analysis/graph_utils.py Outdated

NISH1001 reviewed Sep 3, 2025

Comment thread akd/agents/gap_analysis/gap_analysis.py

movinam added 2 commits September 3, 2025 16:10

Refactor variables

833a04c

Remove unused function

cb5c4f3

movinam temporarily deployed to integration September 3, 2025 20:12 — with GitHub Actions Inactive

github-actions Bot added a commit that referenced this pull request Sep 3, 2025

Auto-merge PR #86 (feature/gap-agent) into integration for testing

7c0cc7c

NISH1001 approved these changes Sep 4, 2025

NISH1001 merged commit e9b08e4 into develop Sep 4, 2025
2 checks passed

NISH1001 deleted the feature/gap-agent branch September 4, 2025 02:57
