# Awesome Deep Research Projects

For a more detailed report, please refer to [A Comprehensive Survey of Deep Research: Systems, Methodologies, and Applications](https://arxiv.org/abs/2506.12594).

## Projects

| Category | Subcategory | Name | URL |
| --- | --- | --- | --- |
| Open-source | Agent Framework | XAgent | https://github.com/OpenBMB/XAgent |
| Open-source | Agent Framework | AutoGen | https://github.com/microsoft/autogen |
| Open-source | Agent Framework | Qwen-Agent | https://github.com/QwenLM/Qwen-Agent |
| Open-source | Agent Framework | OpenAI Agents SDK | https://github.com/openai/openai-agents-python |
| Open-source | Agent Framework | n8n | https://github.com/n8n-io/n8n |
| Open-source | Agent Framework | AutoChain | https://github.com/Forethought-Technologies/AutoChain |
| Open-source | Agent Framework | AgentGPT | https://github.com/reworkd/AgentGPT |
| Open-source | Agent Framework | Open-operator | https://github.com/browserbase/open-operator |
| Open-source | Agent Framework | BabyAGI | https://github.com/yoheinakajima/babyagi |
| Open-source | Agent Framework | AutoGPT | https://github.com/Significant-Gravitas/AutoGPT |
| Open-source | Agent Framework | MetaGPT | https://github.com/geekan/MetaGPT |
| Open-source | Agent Framework | LlamaIndex | https://github.com/run-llama/llama_index |
| Open-source | Agent Framework | LangGraph | https://github.com/langchain-ai/langgraph |
| Open-source | Agent Framework | Google ADK | https://google.github.io/adk-docs/ |
| Open-source | Agent Framework | CrewAI | https://github.com/crewAIInc/crewAI |
| Open-source | Agent Framework | Agno | https://github.com/agno-agi/agno |
| Open-source | Agent Framework | Temporal | https://github.com/temporalio/temporal |
| Open-source | Agent Framework | Orkes | https://orkes.io/use-cases/agentic-workflows |
| Open-source | Agent Framework | Pydantic-AI | https://github.com/pydantic/pydantic-ai |
| Open-source | Agent Framework | Letta | https://github.com/letta-ai/letta |
| Open-source | Agent Framework | Mastra | https://github.com/mastra-ai/mastra |
| Open-source | Agent Framework | Semantic Kernel | https://github.com/microsoft/semantic-kernel |
| Open-source | Agent Orchestration Platform | Dify | https://github.com/langgenius/dify |
| Closed-source | Agent Orchestration Platform | Coze Space | https://www.coze.cn/space-preview |
| Closed-source | Agent Orchestration Platform | Flowise | https://flowiseai.com/ |
| Closed-source | AI Assistant Tools | NotebookLM | https://notebooklm.google/ |
| Closed-source | AI Assistant Tools | MGX.dev | https://mgx.dev |
| Closed-source | AI Assistant Tools | You | https://you.com/about |
| Closed-source | AI Assistant Tools | Microsoft Copilot | https://www.microsoft.com/en-us/microsoft-copilot/organizations |
| Closed-source | Workflow | Claude Research | https://www.anthropic.com/news/research |
| Open-source | Workflow | Google-gemini/gemini-fullstack-langgraph-quickstart | https://github.com/google-gemini/gemini-fullstack-langgraph-quickstart |
| Open-source | Workflow | Dzhng/deep-research | https://github.com/dzhng/deep-research |
| Open-source | Workflow | Jina-AI/node-DeepResearch | https://github.com/jina-ai/node-DeepResearch |
| Open-source | Workflow | LangChain-AI/open_deep_research | https://github.com/langchain-ai/open_deep_research |
| Open-source | Workflow | TheBlewish/Automated-AI-Web-Researcher-Ollama | https://github.com/TheBlewish/Automated-AI-Web-Researcher-Ollama |
| Open-source | Workflow | Btahir/open-deep-research | https://github.com/btahir/open-deep-research |
| Open-source | Workflow | Nickscamara/open-deep-research | https://github.com/nickscamara/open-deep-research |
| Open-source | Workflow | Mshumer/OpenDeepResearcher | https://github.com/mshumer/OpenDeepResearcher |
| Open-source | Workflow | Grapeot/deep_research_agent | https://github.com/grapeot/deep_research_agent |
| Open-source | Workflow | Smolagents/open_deep_research | https://github.com/huggingface/smolagents/tree/main/examples/open_deep_research |
| Open-source | Workflow | Assafelovic/GPT-Researcher | https://github.com/assafelovic/gpt-researcher/ |
| Open-source | Workflow | HKUDS/Auto-Deep-Research | https://github.com/HKUDS/Auto-Deep-Research |
| Open-source | Workflow | AgentLaboratory | https://github.com/SamuelSchmidgall/AgentLaboratory |
| Closed-source | Multi-modal Agent UI | Manus | https://manus.im/ |
| Closed-source | Multi-modal Agent UI | Flowith (Oracle Mode) | https://flowith.net/ |
| Open-source | Multi-modal Agent UI | OpenManus | https://github.com/FoundationAgents/OpenManus |
| Open-source | Multi-modal Agent UI | Camel-AI/OWL | https://github.com/camel-ai/owl |
| Open-source | Multi-modal Agent UI | TARS | https://github.com/bytedance/UI-TARS-desktop |
| Open-source | Multi-modal Agent UI | Nanobrowser | https://github.com/nanobrowser/nanobrowser |
| Open-source | Multi-modal Agent UI | JARVIS | https://github.com/microsoft/JARVIS |
| Closed-source | Multi-modal Agent UI | Devin | https://devin.ai/ |
| Closed-source | Foundation Models | OpenAI Deep Research | https://openai.com/index/introducing-deep-research/ |
| Closed-source | Foundation Models | Gemini Deep Research | https://blog.google/products/gemini/google-gemini-deep-research/ |
| Closed-source | Foundation Models | Perplexity Deep Research | https://www.perplexity.ai/hub/blog/introducing-perplexity-deep-research |
| Closed-source | Foundation Models | Grok 3 Beta | https://x.ai/news/grok-3 |
| Closed-source | Foundation Models | AutoGLM-Research | https://autoglm-research.zhipuai.cn/ |
| Closed-source | Foundation Models | DeepSeek-R1 | https://arxiv.org/abs/2501.12948 |
| Closed-source | Developer Tools | Vercel | https://vercel.com/ |
| Closed-source | Developer Tools | Bolt | https://bolt.new/ |
| Closed-source | Developer Tools | Cursor | https://www.cursor.com/ |
| Closed-source | Developer Tools | GitHub Copilot | https://github.com/features/copilot |
| Open-source | Developer Tools | Cline | https://github.com/cline/cline |
| Open-source | Developer Tools | GPT-pilot | https://github.com/Pythagora-io/gpt-pilot |
| Open-source | Developer Tools | Restate | https://restate.dev/ |
| Open-source | Developer Tools | OpenAI Codex | https://github.com/openai/codex |
| Closed-source | Research/Academic Search | Elicit | https://elicit.com/ |
| Closed-source | Research/Academic Search | ResearchRabbit | https://www.researchrabbit.ai/ |
| Closed-source | Research/Academic Search | STORM | https://storm.genie.stanford.edu/ |
| Closed-source | Research/Academic Search | Consensus | https://consensus.app/ |
| Closed-source | Research/Academic Search | Scite | https://scite.ai/ |
| Closed-source | Research/Academic Search | SciSpace | https://scispace.com/ |
| Closed-source | Research/Academic Search | FutureHouse Platform | https://www.futurehouse.org/research-announcements/launching-futurehouse-platform-ai-agents |
| Open-source | Research/Academic Search | PaperQA | https://github.com/Future-House/paper-qa |
| Open-source | Research/Academic Search | HKUDS/AI-Researcher | https://github.com/HKUDS/AI-Researcher |
| Open-source | Model Training Frameworks | Agent-RL/ReSearch | https://github.com/Agent-RL/ReSearch |
| Open-source | Model Training Frameworks | DSPy | https://github.com/stanfordnlp/dspy |
| Open-source | Model Training Frameworks | GAIR-NLP/DeepResearcher | https://github.com/GAIR-NLP/DeepResearcher |
| Open-source | Model Training Frameworks | ModelTC/lightllm | https://github.com/ModelTC/lightllm |
| Open-source | Other LLM Tools | Ollama | https://github.com/ollama/ollama |
| Open-source | Other LLM Tools | vLLM | https://github.com/vllm-project/vllm |
| Open-source | Other LLM Tools | Web-LLM | https://github.com/mlc-ai/web-llm |

## Papers

| Category | Paper Title | URL |
| --- | --- | --- |
| AI Agent Frameworks & Development | AutoAgent: A Fully-Automated and Zero-Code Framework for LLM Agents | https://arxiv.org/pdf/2502.05957 |
| AI Agent Frameworks & Development | Building Effective Agents | https://www.anthropic.com/engineering/building-effective-agents |
| AI Agent Frameworks & Development | OpenAgents: An Open Platform for Language Agents in the Wild | https://arxiv.org/pdf/2310.10634 |
| AI Agent Frameworks & Development | Agentic Reasoning: Reasoning LLMs with Tools for the Deep Research | https://arxiv.org/pdf/2502.04644 |
| AI Agent Frameworks & Development | AutoGLM: Autonomous Foundation Agents for GUIs | https://arxiv.org/pdf/2411.00820 |
| AI Agent Frameworks & Development | TapeAgents: A Holistic Framework for Agent Development and Optimization | https://arxiv.org/pdf/2412.08445 |
| AI Agent Frameworks & Development | How to Think About Agent Frameworks | https://blog.langchain.dev/how-to-think-about-agent-frameworks/ |
| AI for Scientific Research | Towards an AI Co-Scientist | https://storage.googleapis.com/coscientist_paper/ai_coscientist.pdf |
| AI for Scientific Research | DeepResearcher: Scaling Deep Research via Reinforcement Learning | https://arxiv.org/pdf/2504.03160 |
| AI for Scientific Research | AI Achieves Silver-Medal Standard Solving IMO Problems | https://deepmind.google/discover/blog/ai-solves-imo-problems-at-silver-medal-level/ |
| AI for Scientific Research | Accelerating Scientific Research Through Multi-LLM Frameworks | https://arxiv.org/pdf/2502.07960 |
| AI for Scientific Research | The AI Scientist: Fully Automated Open-Ended Scientific Discovery | https://arxiv.org/pdf/2408.06292 |
| AI for Scientific Research | Transforming Science with LLMs: Survey on AI-Assisted Discovery | https://arxiv.org/pdf/2502.05151 |
| AI for Scientific Research | AI's Deep Research Revolution in Biomedical Literature | https://journals.lww.com/jcma/citation/9900/ai_s_deep_research_revolution__transforming.508.aspx |
| AI for Scientific Research | Unlocking AI Researchers' Potential in Scientific Discovery | https://arxiv.org/pdf/2503.05822 |
| AI for Scientific Research | Empowering Biomedical Discovery with AI Agents | https://arxiv.org/pdf/2404.02831 |
| AI for Scientific Research | Automated Scientific Discovery Systems | https://arxiv.org/abs/2305.02251 |
| LLM Tool Integration & API Control | ToolLLM: Mastering 16K+ Real-World APIs | https://arxiv.org/pdf/2307.16789 |
| LLM Tool Integration & API Control | MetaGPT: Multi-Agent Collaborative Framework | https://arxiv.org/pdf/2308.00352 |
| LLM Tool Integration & API Control | AutoGen: Next-Gen LLM Apps via Multi-Agent Conversation | https://arxiv.org/pdf/2308.08155 |
| LLM Tool Integration & API Control | LLaVA-Plus: Creating Multimodal Agents with Tools | https://arxiv.org/pdf/2311.05437 |
| LLM Tool Integration & API Control | ChemCrow: Augmenting LLMs with Chemistry Tools | https://arxiv.org/pdf/2304.05376 |
| LLM Tool Integration & API Control | TORL: Scaling Tool-Integrated Reinforcement Learning | https://arxiv.org/pdf/2503.23383 |
| Deep Research Systems | OpenAI's 'Deep Research' Tool: Usefulness for Scientists | https://www.nature.com/articles/d41586-025-00377-9 |
| Deep Research Systems | OpenAI's Deep Research: Functionality and Applications | https://www.youreverydayai.com/openais-deep-research-how-it-works-and-what-to-use-it-for/ |
| Deep Research Systems | Deep Research System Card | https://cdn.openai.com/deep-research-system-card.pdf |
| Deep Research Systems | Gemini Launches Deep Research on Gemini 2.5 Pro | https://www.ctol.digital/news/gemini-deep-research-launch-2-5-pro-vs-openai/ |
| Deep Research Systems | Deep Research Now Available on Gemini 2.5 Pro Experimental | https://blog.google/products/gemini/deep-research-gemini-2-5-pro-experimental/ |
| Deep Research Systems | ChatGPT's Deep Research vs. Google's Gemini 1.5 Pro: Comparison | https://whitebeardstrategies.com/ai-prompt-engineering/chatgpts-deep-research-vs-googles-gemini-1-5-pro-with-deep-research-a-detailed-comparison/ |
| Deep Research Systems | ChatGPT Deep Research vs. Perplexity: Comparative Analysis | https://blog.getbind.co/2025/02/03/chatgpt-deep-research-is-it-better-than-perplexity/ |
| Deep Research Systems | Sonar by Perplexity [Technical Documentation] | https://docs.perplexity.ai/guides/model-cards#research-models |
| RAG Technology | Ragnarök: Reusable RAG Framework for TREC 2024 | http://arxiv.org/pdf/2406.16828 |
| RAG Technology | From Documents to Dialogue: KG-RAG Enhanced AI Assistants | https://arxiv.org/pdf/2502.15237 |
| RAG Technology | GEAR-Up: AI-Augmented Scholarly Search for Systematic Reviews | https://arxiv.org/pdf/2312.09948 |
| RAG Technology | Survey on RAG for Large Language Models | https://arxiv.org/pdf/2405.06211 |
| RAG Technology | Knowledge Retrieval Based on Generative AI | https://arxiv.org/pdf/2501.04635 |
| LLM Reasoning & Optimization | Self-Consistency Improves Chain-of-Thought Reasoning | https://arxiv.org/pdf/2203.11171 |
| LLM Reasoning & Optimization | Chain-of-Thought Prompting Elicits Reasoning in LLMs | https://arxiv.org/pdf/2201.11903 |
| LLM Reasoning & Optimization | Training LLMs to Follow Instructions with Human Feedback | https://arxiv.org/pdf/2203.02155 |
| LLM Reasoning & Optimization | Debate Enhances Weak-to-Strong Generalization | https://arxiv.org/pdf/2501.13124 |
| LLM Reasoning & Optimization | Mask-DPO: Factuality Alignment for LLMs | https://arxiv.org/pdf/2503.02846 |
| LLM Reasoning & Optimization | QuestBench: Can LLMs Ask Optimal Questions? | https://arxiv.org/abs/2503.22674 |
| Multi-Agent Systems | AgentVerse: Multi-Agent Collaboration and Emergent Behaviors | https://arxiv.org/pdf/2308.10848 |
| Multi-Agent Systems | MetaAgents: Human Behavior Simulation for Task Coordination | https://arxiv.org/pdf/2310.06500 |
| Multi-Agent Systems | CAMEL: Communicative Agents for LLM Society Exploration | https://arxiv.org/pdf/2303.17760 |
| Multi-Agent Systems | Many Heads Improve Scientific Idea Generation | https://arxiv.org/pdf/2410.09403 |
| Multi-Agent Systems | Why Multi-Agent LLM Systems Fail | https://arxiv.org/pdf/2503.13657 |
| Multi-Agent Systems | Multi-Agent System for Cosmological Parameter Analysis | https://arxiv.org/pdf/2412.00431 |
| Code & Software Development | CodeA11y: Accessible Web Development with AI | https://arxiv.org/pdf/2502.10884 |
| Code & Software Development | AutoDev: Automated AI-Driven Development | https://arxiv.org/pdf/2403.08299 |
| Code & Software Development | ChatDev: Communicative Agents for Software Development | https://aclanthology.org/2024.acl-long.810.pdf |
| Code & Software Development | Natural Language as a Programming Language | https://drops.dagstuhl.de/storage/00lipics/lipics-vol071-snapl2017/LIPIcs.SNAPL.2017.4/LIPIcs.SNAPL.2017.4.pdf |
| Code & Software Development | AIDE: AI-Driven Code Exploration | https://arxiv.org/pdf/2502.13138 |
| Code & Software Development | AI-Assisted Programming: Big Code NLP | https://arxiv.org/pdf/2307.02503 |
| Code & Software Development | AI-Assisted SQL Authoring at Industry Scale | https://arxiv.org/pdf/2407.13280 |
| Code & Software Development | Steward: Natural Language Web Automation | https://arxiv.org/pdf/2409.15441 |
| Domain-Specific AI Tools | MatPilot: AI Materials Scientist | https://arxiv.org/pdf/2411.08063 |
| Domain-Specific AI Tools | EvoPat: Multi-LLM Patent Summarization Agent | https://arxiv.org/pdf/2412.18100 |
| Domain-Specific AI Tools | ChartCitor: Fine-Grained Chart Attribution Framework | https://arxiv.org/pdf/2502.00989 |
| Domain-Specific AI Tools | PatentGPT: Knowledge-Based Patent Drafting | https://arxiv.org/pdf/2409.00092 |
| Domain-Specific AI Tools | SciAgents: Multi-Agent Scientific Discovery | https://arxiv.org/pdf/2409.05556 |
| Domain-Specific AI Tools | Dolphin: Closed-Loop Open-Ended Auto-Research | https://arxiv.org/pdf/2501.03916 |
| Domain-Specific AI Tools | SeqMate: Automating RNA Sequencing with LLMs | https://arxiv.org/pdf/2407.03381 |
| Domain-Specific AI Tools | Knowledge Synthesis of Photosynthesis via LLMs | https://arxiv.org/pdf/2502.01059 |
| Domain-Specific AI Tools | GeoLLM: Geospatial Knowledge Extraction from LLMs | https://arxiv.org/pdf/2310.06213 |
| HCI & AI User Experience | System Usability Scale: Evolution and Future | https://doi.org/10.1080/10447318.2018.1455307 |
| HCI & AI User Experience | CARE: Collaborative AI Reading Environment | https://arxiv.org/pdf/2302.12611 |
| HCI & AI User Experience | VISAR: Visual Argumentative Writing Assistant | https://arxiv.org/pdf/2304.07810 |
| HCI & AI User Experience | AdaptoML-UX: User-Centered AutoML Toolkit | https://arxiv.org/pdf/2410.17469 |
| HCI & AI User Experience | AI Assistants for Semi-Automated Data Wrangling | https://arxiv.org/pdf/2211.00192 |
| HCI & AI User Experience | Documentation Matters: Human-Centered AI Systems | https://arxiv.org/pdf/2102.12592 |
| HCI & AI User Experience | Need Help? Proactive Programming Assistants | https://arxiv.org/abs/2410.04596 |
| HCI & AI User Experience | Large-Scale Survey on AI Programming Assistant Usability | https://arxiv.org/abs/2303.17125 |
| AI Evaluation & Benchmarking | TruthfulQA: Measuring Model Mimicry of Human Falsehoods | https://arxiv.org/pdf/2109.07958 |
| AI Evaluation & Benchmarking | HotpotQA: Dataset for Multi-hop Question Answering | https://arxiv.org/pdf/1809.09600 |
| AI Evaluation & Benchmarking | WebArena: Web Agent Benchmark | https://github.com/web-arena-x/webarena |
| AI Evaluation & Benchmarking | Measuring Short-Form Factuality in LLMs | https://cdn.openai.com/papers/simpleqa.pdf |
| AI Evaluation & Benchmarking | Survey on LLM-Generated Text Detection | https://arxiv.org/pdf/2310.14724 |
| AI Evaluation & Benchmarking | Evaluating AI-Assisted Code Generation Tools | https://arxiv.org/pdf/2304.10778 |
| AI Evaluation & Benchmarking | Benchmarking ChatGPT, Codeium, and GitHub Copilot | https://arxiv.org/pdf/2409.19922 |
| AI Evaluation & Benchmarking | FinEval: Chinese Financial Knowledge Benchmark | https://arxiv.org/pdf/2308.09975 |
| AI Evaluation & Benchmarking | Knowledge-Based Evaluation Methodology for AI Assistants | https://arxiv.org/pdf/2406.05603 |
| AI Evaluation & Benchmarking | GRADE Guidelines: Rating Evidence Quality | https://pubmed.ncbi.nlm.nih.gov/21208779/ |
| AI Evaluation & Benchmarking | Holistic Evaluation of Language Models | https://arxiv.org/pdf/2211.09110 |
| AI Evaluation & Benchmarking | AGIEval: A Human-Centric Benchmark for Evaluating Foundation Models | https://arxiv.org/pdf/2304.06364 |
| AI Evaluation & Benchmarking | GAIA: A Benchmark for General AI Assistants | https://arxiv.org/pdf/2311.12983 |
| AI Evaluation & Benchmarking | MMLU Benchmark: Testing LLMs' Multi-Task Capabilities | https://www.bracai.eu/post/mmlu-benchmark |
| AI Evaluation & Benchmarking | Enabling AI Scientists to Recognize Innovation: A Domain-Agnostic Algorithm for Assessing Novelty | https://arxiv.org/pdf/2503.01508 |
| AI Evaluation & Benchmarking | The Impact of AI and Peer Feedback on Research Writing Skills: A Study Using the CGScholar Platform Among Kazakhstani Scholars | https://arxiv.org/pdf/2503.05820 |
| AI Evaluation & Benchmarking | Supporting the Development of Machine Learning for Fundamental Science in a Federated Cloud with the AI_INFN Platform | https://arxiv.org/pdf/2502.21266 |
| AI Evaluation & Benchmarking | EAIRA: Establishing a Methodology for Evaluating AI Models as Scientific Research Assistants | https://arxiv.org/pdf/2502.20309 |
| AI Evaluation & Benchmarking | Bridging Logic Programming and Deep Learning for Explainability through ILASP | https://arxiv.org/pdf/2502.09227 |
| AI Evaluation & Benchmarking | Self-Explanation in Social AI Agents | https://arxiv.org/pdf/2501.13945 |
| AI Evaluation & Benchmarking | Fine-Grained Appropriate Reliance: Human-AI Collaboration with a Multi-Step Transparent Decision Workflow for Complex Task Decomposition | https://arxiv.org/pdf/2501.10909 |
| AI Evaluation & Benchmarking | CATER: Leveraging LLM to Pioneer a Multidimensional, Reference-Independent Paradigm in Translation Quality Evaluation | https://arxiv.org/pdf/2412.11261 |
| AI Evaluation & Benchmarking | GigaCheck: Detecting LLM-generated Content | https://arxiv.org/pdf/2410.23728 |
| AI Evaluation & Benchmarking | Vital Insight: Assisting Experts' Context-Driven Sensemaking of Multi-modal Personal Tracking Data Using Visualization and Human-In-The-Loop LLM Agents | https://arxiv.org/pdf/2410.14879 |
| AI Evaluation & Benchmarking | Aligning AI-Driven Discovery with Human Intuition | https://arxiv.org/pdf/2410.07 |
| AI Evaluation & Benchmarking | Theoretical Physics Benchmark (TPBench): A Dataset and Study of AI Reasoning Capabilities in Theoretical Physics | https://arxiv.org/pdf/2502.15815 |
| AI Evaluation & Benchmarking | Insect-Foundation: A Foundation Model and Large Multimodal Dataset for Vision-Language Insect Understanding | https://arxiv.org/pdf/2502.09906 |
| AI Evaluation & Benchmarking | Minerva: A Programmable Memory Test Benchmark for Language Models | https://arxiv.org/pdf/2502.03358 |
| AI Evaluation & Benchmarking | UGPhysics: A Comprehensive Benchmark for Undergraduate Physics Reasoning with Large Language Models | https://arxiv.org/pdf/2502.00334 |
| AI Evaluation & Benchmarking | Learning to Coordinate with Experts | https://arxiv.org/pdf/2502.09583 |
| AI Evaluation & Benchmarking | Auto-Bench: An Automated Benchmark for Scientific Discovery in LLMs | https://arxiv.org/pdf/2502.15224 |
| AI Evaluation & Benchmarking | How Well Do LLMs Generate Code for Different Application Domains? Benchmark and Evaluation | https://arxiv.org/pdf/2412.18573 |
| AI Evaluation & Benchmarking | LLM4DS: Evaluating Large Language Models for Data Science Code Generation | https://arxiv.org/pdf/2411.11908 |
| AI Evaluation & Benchmarking | RedCode: Risky Code Execution and Generation Benchmark for Code Agents | https://arxiv.org/pdf/2411.07781 |
| AI Evaluation & Benchmarking | SeafloorAI: A Large-Scale Vision-Language Dataset for Seafloor Geological Survey | https://arxiv.org/pdf/2411.00172 |
| AI Evaluation & Benchmarking | INQUIRE: A Natural World Text-to-Image Retrieval Benchmark | https://arxiv.org/pdf/2411.02537 |
| AI Evaluation & Benchmarking | AAAR-1.0: Assessing AI's Potential to Assist Research | https://arxiv.org/pdf/2410.22394 |
| AI Evaluation & Benchmarking | AutoPenBench: Benchmarking Generative Agents for Penetration Testing | https://arxiv.org/pdf/2410.03225 |
| AI Evaluation & Benchmarking | CodeMMLU: A Multi-Task Benchmark for Assessing Code Understanding Capabilities of CodeLLMs | https://arxiv.org/pdf/2410.01999 |
| AI Evaluation & Benchmarking | UniSumEval: Towards Unified, Fine-Grained, Multi-Dimensional Summarization Evaluation for LLMs | https://arxiv.org/pdf/2409.19898 |
| AI Evaluation & Benchmarking | CI-Bench: Benchmarking Contextual Integrity of AI Assistants on Synthetic Data | https://arxiv.org/pdf/2409.13903 |
| AI Evaluation & Benchmarking | ChemDFM-X: Towards Large Multimodal Model for Chemistry | https://arxiv.org/pdf/2409.13194 |
| AI Evaluation & Benchmarking | DSBench: How Far Are Data Science Agents to Becoming Data Science Experts? | https://arxiv.org/pdf/2409.07703 |
| AI Evaluation & Benchmarking | GMAI-MMBench: A Comprehensive Multimodal Evaluation Benchmark Towards General Medical AI | https://arxiv.org/pdf/2408.03361 |
| AI Evaluation & Benchmarking | MMSci: A Dataset for Graduate-Level Multi-Discipline Multimodal Scientific Understanding | https://arxiv.org/pdf/2407.04903 |
| AI Evaluation & Benchmarking | SciCode: A Research Coding Benchmark Curated by Scientists | https://arxiv.org/pdf/2407.13168 |
| AI Evaluation & Benchmarking | MASSW: A New Dataset and Benchmark Tasks for AI-Assisted Scientific Workflows | https://arxiv.org/pdf/2406.06357 |
| AI Evaluation & Benchmarking | Turing Tests for an AI Scientist | https://arxiv.org/pdf/2405.13352 |
| AI Evaluation & Benchmarking | LHRS-Bot: Empowering Remote Sensing with VGI-Enhanced Large Multimodal Language Model | https://arxiv.org/pdf/2402.02544 |
| AI Evaluation & Benchmarking | OceanGPT: A Large Language Model for Ocean Science Tasks | https://arxiv.org/pdf/2310.02031 |
| AI Evaluation & Benchmarking | LatEval: An Interactive LLMs Evaluation Benchmark with Incomplete Information from Lateral Thinking Puzzles | https://arxiv.org/pdf/2308.10855 |
| AI Evaluation & Benchmarking | BOLAA: Benchmarking and Orchestrating LLM-Augmented Autonomous Agents | https://arxiv.org/pdf/2308.05960 |
| AI Evaluation & Benchmarking | MegaWika: Millions of Reports and Their Sources Across 50 Diverse Languages | https://arxiv.org/pdf/2307.07049 |
| AI Evaluation & Benchmarking | Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering | https://arxiv.org/pdf/2209.09513 |
| AI Evaluation & Benchmarking | Benchmarking Agentic Workflow Generation | https://arxiv.org/abs/2410.07869 |
| AI Evaluation & Benchmarking | TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks | https://arxiv.org/abs/2412.14161 |

## Citation

```bibtex
@misc{xu2025comprehensive,
    title={A Comprehensive Survey of Deep Research: Systems, Methodologies, and Applications},
    author={Renjun Xu and Jingwen Peng},
    year={2025},
    eprint={2506.12594},
    archivePrefix={arXiv},
    primaryClass={cs.AI}
}
```
