For a more detailed report, please refer to the survey *A Comprehensive Survey of Deep Research: Systems, Methodologies, and Applications* (https://arxiv.org/abs/2506.12594).
Category | Subcategory | Name | URL |
---|---|---|---|
Open-source | Agent Framework | XAgent | https://github.com/OpenBMB/XAgent |
Open-source | Agent Framework | AutoGen | https://github.com/microsoft/autogen |
Open-source | Agent Framework | Qwen-Agent | https://github.com/QwenLM/Qwen-Agent |
Open-source | Agent Framework | OpenAI Agents SDK | https://github.com/openai/openai-agents-python |
Open-source | Agent Framework | n8n | https://github.com/n8n-io/n8n |
Open-source | Agent Framework | AutoChain | https://github.com/Forethought-Technologies/AutoChain |
Open-source | Agent Framework | AgentGPT | https://github.com/reworkd/AgentGPT |
Open-source | Agent Framework | Open-operator | https://github.com/browserbase/open-operator |
Open-source | Agent Framework | BabyAGI | https://github.com/yoheinakajima/babyagi |
Open-source | Agent Framework | AutoGPT | https://github.com/Significant-Gravitas/AutoGPT |
Open-source | Agent Framework | MetaGPT | https://github.com/geekan/MetaGPT |
Open-source | Agent Framework | Llama_index | https://github.com/run-llama/llama_index |
Open-source | Agent Framework | LangGraph | https://github.com/langchain-ai/langgraph |
Open-source | Agent Framework | Google ADK | https://google.github.io/adk-docs/ |
Open-source | Agent Framework | CrewAI | https://github.com/crewAIInc/crewAI |
Open-source | Agent Framework | Agno | https://github.com/agno-agi/agno |
Open-source | Agent Framework | Temporal | https://github.com/temporalio/temporal |
Open-source | Agent Framework | Orkes | https://orkes.io/use-cases/agentic-workflows |
Open-source | Agent Framework | Pydantic-AI | https://github.com/pydantic/pydantic-ai |
Open-source | Agent Framework | Letta | https://github.com/letta-ai/letta |
Open-source | Agent Framework | Mastra | https://github.com/mastra-ai/mastra |
Open-source | Agent Framework | Semantic Kernel | https://github.com/microsoft/semantic-kernel |
Open-source | Agent Orchestration Platform | Dify | https://github.com/langgenius/dify |
Closed-source | Agent Orchestration Platform | Coze Space | https://www.coze.cn/space-preview |
Open-source | Agent Orchestration Platform | Flowise | https://flowiseai.com/ |
Closed-source | AI Assistant Tools | NotebookLM | https://notebooklm.google/ |
Closed-source | AI Assistant Tools | MGX.dev | https://mgx.dev |
Closed-source | AI Assistant Tools | You.com | https://you.com/about |
Closed-source | AI Assistant Tools | Microsoft Copilot | https://www.microsoft.com/en-us/microsoft-copilot/organizations |
Closed-source | Workflow | Claude Research | https://www.anthropic.com/news/research |
Open-source | Workflow | Google-gemini/gemini-fullstack-langgraph-quickstart | https://github.com/google-gemini/gemini-fullstack-langgraph-quickstart |
Open-source | Workflow | Dzhng/deep-research | https://github.com/dzhng/deep-research |
Open-source | Workflow | Jina-AI/node-DeepResearch | https://github.com/jina-ai/node-DeepResearch |
Open-source | Workflow | LangChain-AI/open_deep_research | https://github.com/langchain-ai/open_deep_research |
Open-source | Workflow | TheBlewish/Automated-AI-Web-Researcher-Ollama | https://github.com/TheBlewish/Automated-AI-Web-Researcher-Ollama |
Open-source | Workflow | Btahir/open-deep-research | https://github.com/btahir/open-deep-research |
Open-source | Workflow | Nickscamara/open-deep-research | https://github.com/nickscamara/open-deep-research |
Open-source | Workflow | Mshumer/OpenDeepResearcher | https://github.com/mshumer/OpenDeepResearcher |
Open-source | Workflow | Grapeot/deep_research_agent | https://github.com/grapeot/deep_research_agent |
Open-source | Workflow | Smolagents/open_deep_research | https://github.com/huggingface/smolagents/tree/main/examples/open_deep_research |
Open-source | Workflow | Assafelovic/GPT-Researcher | https://github.com/assafelovic/gpt-researcher/ |
Open-source | Workflow | HKUDS/Auto-Deep-Research | https://github.com/HKUDS/Auto-Deep-Research |
Open-source | Workflow | AgentLaboratory | https://github.com/SamuelSchmidgall/AgentLaboratory |
Closed-source | Multi-modal Agent UI | Manus | https://manus.im/ |
Closed-source | Multi-modal Agent UI | Flowith (Oracle Mode) | https://flowith.net/ |
Open-source | Multi-modal Agent UI | OpenManus | https://github.com/FoundationAgents/OpenManus |
Open-source | Multi-modal Agent UI | Camel-AI/OWL | https://github.com/camel-ai/owl |
Open-source | Multi-modal Agent UI | UI-TARS Desktop | https://github.com/bytedance/UI-TARS-desktop |
Open-source | Multi-modal Agent UI | Nanobrowser | https://github.com/nanobrowser/nanobrowser |
Open-source | Multi-modal Agent UI | JARVIS | https://github.com/microsoft/JARVIS |
Closed-source | Multi-modal Agent UI | Devin | https://devin.ai/ |
Closed-source | Foundation Models | OpenAI Deep Research | https://openai.com/index/introducing-deep-research/ |
Closed-source | Foundation Models | Gemini Deep Research | https://blog.google/products/gemini/google-gemini-deep-research/ |
Closed-source | Foundation Models | Perplexity Deep Research | https://www.perplexity.ai/hub/blog/introducing-perplexity-deep-research |
Closed-source | Foundation Models | Grok 3 Beta | https://x.ai/news/grok-3 |
Closed-source | Foundation Models | AutoGLM-Research | https://autoglm-research.zhipuai.cn/ |
Open-source | Foundation Models | DeepSeek-R1 | https://arxiv.org/abs/2501.12948 |
Closed-source | Developer Tools | Vercel | https://vercel.com/ |
Closed-source | Developer Tools | Bolt | https://bolt.new/ |
Closed-source | Developer Tools | Cursor | https://www.cursor.com/ |
Closed-source | Developer Tools | GitHub Copilot | https://github.com/features/copilot |
Open-source | Developer Tools | Cline | https://github.com/cline/cline |
Open-source | Developer Tools | GPT-pilot | https://github.com/Pythagora-io/gpt-pilot |
Open-source | Developer Tools | Restate | https://restate.dev/ |
Open-source | Developer Tools | OpenAI Codex | https://github.com/openai/codex |
Closed-source | Research/Academic Search | Elicit | https://elicit.com/ |
Closed-source | Research/Academic Search | ResearchRabbit | https://www.researchrabbit.ai/ |
Closed-source | Research/Academic Search | STORM | https://storm.genie.stanford.edu/ |
Closed-source | Research/Academic Search | Consensus | https://consensus.app/ |
Closed-source | Research/Academic Search | Scite | https://scite.ai/ |
Closed-source | Research/Academic Search | Scispace | https://scispace.com/ |
Closed-source | Research/Academic Search | FutureHouse Platform | https://www.futurehouse.org/research-announcements/launching-futurehouse-platform-ai-agents |
Open-source | Research/Academic Search | PaperQA | https://github.com/Future-House/paper-qa |
Open-source | Research/Academic Search | HKUDS/AI-Researcher | https://github.com/HKUDS/AI-Researcher |
Open-source | Model Training Frameworks | Agent-RL/ReSearch | https://github.com/Agent-RL/ReSearch |
Open-source | Model Training Frameworks | DSPy | https://github.com/stanfordnlp/dspy |
Open-source | Model Training Frameworks | Gair-NLP/DeepResearcher | https://github.com/GAIR-NLP/DeepResearcher |
Open-source | Model Training Frameworks | ModelTC/lightllm | https://github.com/ModelTC/lightllm |
Open-source | Other LLM Tools | Ollama | https://github.com/ollama/ollama |
Open-source | Other LLM Tools | vLLM | https://github.com/vllm-project/vllm |
Open-source | Other LLM Tools | WebLLM | https://github.com/mlc-ai/web-llm |
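
Most of the Workflow entries above implement the same core loop: a model plans a search query, reads the results, folds them into working notes, and repeats until it has enough to write a report. Below is a minimal sketch of that pattern; the `llm` and `search_web` callables are hypothetical stand-ins, not the API of any listed repo.

```python
# Minimal sketch of the iterative deep-research loop that most of the
# "Workflow" repos above implement. `llm` and `search_web` are
# hypothetical stand-ins for a chat-model call and a search-API call.
from typing import Callable


def deep_research(question: str,
                  llm: Callable[[str], str],
                  search_web: Callable[[str], str],
                  max_steps: int = 5) -> str:
    notes: list[str] = []
    for _ in range(max_steps):
        # Ask the model what to look up next, given the notes so far.
        query = llm(f"Question: {question}\nNotes so far: {notes}\n"
                    "Reply with ONE web search query, or DONE if enough.")
        if query.strip() == "DONE":
            break
        # Search, then fold the relevant evidence back into the notes.
        results = search_web(query)
        notes.append(llm(f"Summarize what in the following is relevant "
                         f"to {question!r}:\n{results}"))
    # Synthesize the final report from the accumulated notes.
    return llm(f"Write a cited report answering {question!r} "
               f"using these notes:\n{notes}")
```

The listed projects differ mainly in how they parallelize this loop, ground their citations, and recover from dead-end queries.
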
Category | Paper Title | URL |
---|---|---|
AI Agent Frameworks & Development | AutoAgent: A Fully-Automated and Zero-Code Framework for LLM Agents | https://arxiv.org/pdf/2502.05957 |
AI Agent Frameworks & Development | Building effective agents | https://www.anthropic.com/engineering/building-effective-agents |
AI Agent Frameworks & Development | OpenAgents: An Open Platform for Language Agents in the Wild | https://arxiv.org/pdf/2310.10634 |
AI Agent Frameworks & Development | Agentic Reasoning: Reasoning LLMs with Tools for the Deep Research | https://arxiv.org/pdf/2502.04644 |
AI Agent Frameworks & Development | AutoGLM: Autonomous Foundation Agents for GUIs | https://arxiv.org/pdf/2411.00820 |
AI Agent Frameworks & Development | TapeAgents: A Holistic Framework for Agent Development and Optimization | https://arxiv.org/pdf/2412.08445 |
AI Agent Frameworks & Development | How to think about agent frameworks | https://blog.langchain.dev/how-to-think-about-agent-frameworks/ |
AI for Scientific Research | Towards an AI Co-Scientist | https://storage.googleapis.com/coscientist_paper/ai_coscientist.pdf |
AI for Scientific Research | DeepResearcher: Scaling Deep Research via Reinforcement Learning | https://arxiv.org/pdf/2504.03160 |
AI for Scientific Research | AI Achieves Silver-Medal Standard Solving IMO Problems | https://deepmind.google/discover/blog/ai-solves-imo-problems-at-silver-medal-level/ |
AI for Scientific Research | Accelerating Scientific Research Through Multi-LLM Frameworks | https://arxiv.org/pdf/2502.07960 |
AI for Scientific Research | The AI Scientist: Fully Automated Open-Ended Scientific Discovery | https://arxiv.org/pdf/2408.06292 |
AI for Scientific Research | Transforming Science with LLMs: Survey on AI-Assisted Discovery | https://arxiv.org/pdf/2502.05151 |
AI for Scientific Research | AI's Deep Research Revolution in Biomedical Literature | https://journals.lww.com/jcma/citation/9900/ai_s_deep_research_revolution__transforming.508.aspx |
AI for Scientific Research | Unlocking AI Researchers' Potential in Scientific Discovery | https://arxiv.org/pdf/2503.05822 |
AI for Scientific Research | Empowering Biomedical Discovery with AI Agents | https://arxiv.org/pdf/2404.02831 |
AI for Scientific Research | Automated Scientific Discovery Systems | https://arxiv.org/abs/2305.02251 |
LLM Tool Integration & API Control | ToolLLM: Mastering 16K+ Real-World APIs | https://arxiv.org/pdf/2307.16789 |
LLM Tool Integration & API Control | MetaGPT: Multi-Agent Collaborative Framework | https://arxiv.org/pdf/2308.00352 |
LLM Tool Integration & API Control | AutoGen: Next-Gen LLM Apps via Multi-Agent Conversation | https://arxiv.org/pdf/2308.08155 |
LLM Tool Integration & API Control | LLaVA-Plus: Creating Multimodal Agents with Tools | https://arxiv.org/pdf/2311.05437 |
LLM Tool Integration & API Control | ChemCrow: Augmenting LLMs with Chemistry Tools | https://arxiv.org/pdf/2304.05376 |
LLM Tool Integration & API Control | TORL: Scaling Tool-Integrated Reinforcement Learning | https://arxiv.org/pdf/2503.23383 |
Deep Research Systems | OpenAI's 'Deep Research' Tool: Usefulness for Scientists | https://www.nature.com/articles/d41586-025-00377-9 |
Deep Research Systems | OpenAI's Deep Research: Functionality and Applications | https://www.youreverydayai.com/openais-deep-research-how-it-works-and-what-to-use-it-for/ |
Deep Research Systems | Deep Research System Card | https://cdn.openai.com/deep-research-system-card.pdf |
Deep Research Systems | Gemini Launches Deep Research on Gemini 2.5 Pro | https://www.ctol.digital/news/gemini-deep-research-launch-2-5-pro-vs-openai/ |
Deep Research Systems | Deep Research Now Available on Gemini 2.5 Pro Experimental | https://blog.google/products/gemini/deep-research-gemini-2-5-pro-experimental/ |
Deep Research Systems | ChatGPT's Deep Research vs. Google's Gemini 1.5 Pro: Comparison | https://whitebeardstrategies.com/ai-prompt-engineering/chatgpts-deep-research-vs-googles-gemini-1-5-pro-with-deep-research-a-detailed-comparison/ |
Deep Research Systems | ChatGPT Deep Research vs Perplexity: Comparative Analysis | https://blog.getbind.co/2025/02/03/chatgpt-deep-research-is-it-better-than-perplexity/ |
Deep Research Systems | Sonar by Perplexity [Technical Documentation] | https://docs.perplexity.ai/guides/model-cards#research-models |
RAG Technology | Ragnarök: Reusable RAG Framework for TREC 2024 | http://arxiv.org/pdf/2406.16828 |
RAG Technology | From Documents to Dialogue: KG-RAG Enhanced AI Assistants | https://arxiv.org/pdf/2502.15237 |
RAG Technology | GEAR-Up: AI-Augmented Scholarly Search for Systematic Reviews | https://arxiv.org/pdf/2312.09948 |
RAG Technology | Survey on RAG for Large Language Models | https://arxiv.org/pdf/2405.06211 |
RAG Technology | Knowledge Retrieval Based on Generative AI | https://arxiv.org/pdf/2501.04635 |
LLM Reasoning & Optimization | Self-Consistency Improves Chain-of-Thought Reasoning | https://arxiv.org/pdf/2203.11171 |
LLM Reasoning & Optimization | Chain-of-Thought Prompting Elicits Reasoning in LLMs | https://arxiv.org/pdf/2201.11903 |
LLM Reasoning & Optimization | Training LLMs to Follow Instructions with Human Feedback | https://arxiv.org/pdf/2203.02155 |
LLM Reasoning & Optimization | Debate Enhances Weak-to-Strong Generalization | https://arxiv.org/pdf/2501.13124 |
LLM Reasoning & Optimization | Mask-DPO: Factuality Alignment for LLMs | https://arxiv.org/pdf/2503.02846 |
LLM Reasoning & Optimization | QuestBench: Can LLMs Ask Optimal Questions? | https://arxiv.org/abs/2503.22674 |
Multi-Agent Systems | AgentVerse: Multi-Agent Collaboration and Emergent Behaviors | https://arxiv.org/pdf/2308.10848 |
Multi-Agent Systems | MetaAgents: Human Behavior Simulation for Task Coordination | https://arxiv.org/pdf/2310.06500 |
Multi-Agent Systems | CAMEL: Communicative Agents for LLM Society Exploration | https://arxiv.org/pdf/2303.17760 |
Multi-Agent Systems | Many Heads Improve Scientific Idea Generation | https://arxiv.org/pdf/2410.09403 |
Multi-Agent Systems | Why Multi-Agent LLM Systems Fail | https://arxiv.org/pdf/2503.13657 |
Multi-Agent Systems | Multi-Agent System for Cosmological Parameter Analysis | https://arxiv.org/pdf/2412.00431 |
Code & Software Development | CodeA11y: Accessible Web Development with AI | https://arxiv.org/pdf/2502.10884 |
Code & Software Development | AutoDev: Automated AI-Driven Development | https://arxiv.org/pdf/2403.08299 |
Code & Software Development | ChatDev: Communicative Agents for Software Development | https://aclanthology.org/2024.acl-long.810.pdf |
Code & Software Development | Natural Language as a Programming Language | https://drops.dagstuhl.de/storage/00lipics/lipics-vol071-snapl2017/LIPIcs.SNAPL.2017.4/LIPIcs.SNAPL.2017.4.pdf |
Code & Software Development | AIDE: AI-Driven Code Exploration | https://arxiv.org/pdf/2502.13138 |
Code & Software Development | AI-Assisted Programming: Big Code NLP | https://arxiv.org/pdf/2307.02503 |
Code & Software Development | AI-Assisted SQL Authoring at Industry Scale | https://arxiv.org/pdf/2407.13280 |
Code & Software Development | Steward: Natural Language Web Automation | https://arxiv.org/pdf/2409.15441 |
Domain-Specific AI Tools | MatPilot: AI Materials Scientist | https://arxiv.org/pdf/2411.08063 |
Domain-Specific AI Tools | EvoPat: Multi-LLM Patent Summarization Agent | https://arxiv.org/pdf/2412.18100 |
Domain-Specific AI Tools | ChartCitor: Fine-Grained Chart Attribution Framework | https://arxiv.org/pdf/2502.00989 |
Domain-Specific AI Tools | PatentGPT: Knowledge-Based Patent Drafting | https://arxiv.org/pdf/2409.00092 |
Domain-Specific AI Tools | SciAgents: Multi-Agent Scientific Discovery | https://arxiv.org/pdf/2409.05556 |
Domain-Specific AI Tools | Dolphin: Closed-Loop Open-Ended Auto-Research | https://arxiv.org/pdf/2501.03916 |
Domain-Specific AI Tools | SeqMate: Automating RNA Sequencing with LLMs | https://arxiv.org/pdf/2407.03381 |
Domain-Specific AI Tools | Knowledge Synthesis of Photosynthesis via LLMs | https://arxiv.org/pdf/2502.01059 |
Domain-Specific AI Tools | GeoLLM: Geospatial Knowledge Extraction from LLMs | https://arxiv.org/pdf/2310.06213 |
HCI & AI User Experience | System Usability Scale: Evolution and Future | https://doi.org/10.1080/10447318.2018.1455307 |
HCI & AI User Experience | CARE: Collaborative AI Reading Environment | https://arxiv.org/pdf/2302.12611 |
HCI & AI User Experience | VISAR: Visual Argumentative Writing Assistant | https://arxiv.org/pdf/2304.07810 |
HCI & AI User Experience | AdaptoML-UX: User-Centered AutoML Toolkit | https://arxiv.org/pdf/2410.17469 |
HCI & AI User Experience | AI Assistants for Semi-Automated Data Wrangling | https://arxiv.org/pdf/2211.00192 |
HCI & AI User Experience | Documentation Matters: Human-Centered AI Systems | https://arxiv.org/pdf/2102.12592 |
HCI & AI User Experience | Need Help? Proactive Programming Assistants | https://arxiv.org/abs/2410.04596 |
HCI & AI User Experience | Large-Scale Survey on AI Programming Assistant Usability | https://arxiv.org/abs/2303.17125 |
AI Evaluation & Benchmarking | TruthfulQA: Measuring Model Mimicry of Human Falsehoods | https://arxiv.org/pdf/2109.07958 |
AI Evaluation & Benchmarking | HotpotQA: Dataset for Multi-hop Question Answering | https://arxiv.org/pdf/1809.09600 |
AI Evaluation & Benchmarking | WebArena: Web Agent Benchmark | https://github.com/web-arena-x/webarena |
AI Evaluation & Benchmarking | Measuring Short-Form Factuality in LLMs | https://cdn.openai.com/papers/simpleqa.pdf |
AI Evaluation & Benchmarking | Survey on LLM-Generated Text Detection | https://arxiv.org/pdf/2310.14724 |
AI Evaluation & Benchmarking | Evaluating AI-Assisted Code Generation Tools | https://arxiv.org/pdf/2304.10778 |
AI Evaluation & Benchmarking | Benchmarking ChatGPT, Codeium, and GitHub Copilot | https://arxiv.org/pdf/2409.19922 |
AI Evaluation & Benchmarking | FinEval: Chinese Financial Knowledge Benchmark | https://arxiv.org/pdf/2308.09975 |
AI Evaluation & Benchmarking | Knowledge-Based Evaluation Methodology for AI Assistants | https://arxiv.org/pdf/2406.05603 |
AI Evaluation & Benchmarking | GRADE Guidelines: Rating Evidence Quality | https://pubmed.ncbi.nlm.nih.gov/21208779/ |
AI Evaluation & Benchmarking | Holistic Evaluation of Language Models | https://arxiv.org/pdf/2211.09110 |
AI Evaluation & Benchmarking | AGIEval: A Human-Centric Benchmark for Evaluating Foundation Models | https://arxiv.org/pdf/2304.06364 |
AI Evaluation & Benchmarking | GAIA: A Benchmark for General AI Assistants | https://arxiv.org/pdf/2311.12983 |
AI Evaluation & Benchmarking | MMLU Benchmark: Testing LLMs' Multi-Task Capabilities | https://www.bracai.eu/post/mmlu-benchmark |
AI Evaluation & Benchmarking | Enabling AI Scientists to Recognize Innovation: A Domain-Agnostic Algorithm for Assessing Novelty | https://arxiv.org/pdf/2503.01508 |
AI Evaluation & Benchmarking | The Impact of AI and Peer Feedback on Research Writing Skills: A Study Using the CGScholar Platform Among Kazakhstani Scholars | https://arxiv.org/pdf/2503.05820 |
AI Evaluation & Benchmarking | Supporting the development of Machine Learning for fundamental science in a federated Cloud with the AI_INFN platform | https://arxiv.org/pdf/2502.21266 |
AI Evaluation & Benchmarking | EAIRA: Establishing a Methodology for Evaluating AI Models as Scientific Research Assistants | https://arxiv.org/pdf/2502.20309 |
AI Evaluation & Benchmarking | Bridging Logic Programming and Deep Learning for Explainability through ILASP | https://arxiv.org/pdf/2502.09227 |
AI Evaluation & Benchmarking | Self-Explanation in Social AI Agents | https://arxiv.org/pdf/2501.13945 |
AI Evaluation & Benchmarking | Fine-Grained Appropriate Reliance: Human-AI Collaboration with a Multi-Step Transparent Decision Workflow for Complex Task Decomposition | https://arxiv.org/pdf/2501.10909 |
AI Evaluation & Benchmarking | CATER: Leveraging LLM to Pioneer a Multidimensional, Reference-Independent Paradigm in Translation Quality Evaluation | https://arxiv.org/pdf/2412.11261 |
AI Evaluation & Benchmarking | GigaCheck: Detecting LLM-generated Content | https://arxiv.org/pdf/2410.23728 |
AI Evaluation & Benchmarking | Vital Insight: Assisting Experts' Context-Driven Sensemaking of Multi-modal Personal Tracking Data Using Visualization and Human-In-The-Loop LLM Agents | https://arxiv.org/pdf/2410.14879 |
AI Evaluation & Benchmarking | Aligning AI-driven discovery with human intuition | https://arxiv.org/pdf/2410.07 |
AI Evaluation & Benchmarking | Theoretical Physics Benchmark (TPBench) -- a Dataset and Study of AI Reasoning Capabilities in Theoretical Physics | https://arxiv.org/pdf/2502.15815 |
AI Evaluation & Benchmarking | Insect-Foundation: A Foundation Model and Large Multimodal Dataset for Vision-Language Insect Understanding | https://arxiv.org/pdf/2502.09906 |
AI Evaluation & Benchmarking | Minerva: A Programmable Memory Test Benchmark for Language Models | https://arxiv.org/pdf/2502.03358 |
AI Evaluation & Benchmarking | UGPhysics: A Comprehensive Benchmark for Undergraduate Physics Reasoning with Large Language Models | https://arxiv.org/pdf/2502.00334 |
AI Evaluation & Benchmarking | Learning to Coordinate with Experts | https://arxiv.org/pdf/2502.09583 |
AI Evaluation & Benchmarking | Auto-Bench: An Automated Benchmark for Scientific Discovery in LLMs | https://arxiv.org/pdf/2502.15224 |
AI Evaluation & Benchmarking | How Well Do LLMs Generate Code for Different Application Domains? Benchmark and Evaluation | https://arxiv.org/pdf/2412.18573 |
AI Evaluation & Benchmarking | LLM4DS: Evaluating Large Language Models for Data Science Code Generation | https://arxiv.org/pdf/2411.11908 |
AI Evaluation & Benchmarking | RedCode: Risky Code Execution and Generation Benchmark for Code Agents | https://arxiv.org/pdf/2411.07781 |
AI Evaluation & Benchmarking | SeafloorAI: A Large-scale Vision-Language Dataset for Seafloor Geological Survey | https://arxiv.org/pdf/2411.00172 |
AI Evaluation & Benchmarking | INQUIRE: A Natural World Text-to-Image Retrieval Benchmark | https://arxiv.org/pdf/2411.02537 |
AI Evaluation & Benchmarking | AAAR-1.0: Assessing AI's Potential to Assist Research | https://arxiv.org/pdf/2410.22394 |
AI Evaluation & Benchmarking | AutoPenBench: Benchmarking Generative Agents for Penetration Testing | https://arxiv.org/pdf/2410.03225 |
AI Evaluation & Benchmarking | CodeMMLU: A Multi-Task Benchmark for Assessing Code Understanding Capabilities of CodeLLMs | https://arxiv.org/pdf/2410.01999 |
AI Evaluation & Benchmarking | UniSumEval: Towards Unified, Fine-Grained, Multi-Dimensional Summarization Evaluation for LLMs | https://arxiv.org/pdf/2409.19898 |
AI Evaluation & Benchmarking | CI-Bench: Benchmarking Contextual Integrity of AI Assistants on Synthetic Data | https://arxiv.org/pdf/2409.13903 |
AI Evaluation & Benchmarking | ChemDFM-X: Towards Large Multimodal Model for Chemistry | https://arxiv.org/pdf/2409.13194 |
AI Evaluation & Benchmarking | DSBench: How Far Are Data Science Agents to Becoming Data Science Experts? | https://arxiv.org/pdf/2409.07703 |
AI Evaluation & Benchmarking | GMAI-MMBench: A Comprehensive Multimodal Evaluation Benchmark Towards General Medical AI | https://arxiv.org/pdf/2408.03361 |
AI Evaluation & Benchmarking | MMSci: A Dataset for Graduate-Level Multi-Discipline Multimodal Scientific Understanding | https://arxiv.org/pdf/2407.04903 |
AI Evaluation & Benchmarking | SciCode: A Research Coding Benchmark Curated by Scientists | https://arxiv.org/pdf/2407.13168 |
AI Evaluation & Benchmarking | MASSW: A New Dataset and Benchmark Tasks for AI-Assisted Scientific Workflows | https://arxiv.org/pdf/2406.06357 |
AI Evaluation & Benchmarking | Turing Tests For An AI Scientist | https://arxiv.org/pdf/2405.13352 |
AI Evaluation & Benchmarking | LHRS-Bot: Empowering Remote Sensing with VGI-Enhanced Large Multimodal Language Model | https://arxiv.org/pdf/2402.02544 |
AI Evaluation & Benchmarking | OceanGPT: A Large Language Model for Ocean Science Tasks | https://arxiv.org/pdf/2310.02031 |
AI Evaluation & Benchmarking | LatEval: An Interactive LLMs Evaluation Benchmark with Incomplete Information from Lateral Thinking Puzzles | https://arxiv.org/pdf/2308.10855 |
AI Evaluation & Benchmarking | BOLAA: Benchmarking and Orchestrating LLM-augmented Autonomous Agents | https://arxiv.org/pdf/2308.05960 |
AI Evaluation & Benchmarking | MegaWika: Millions of Reports and Their Sources Across 50 Diverse Languages | https://arxiv.org/pdf/2307.07049 |
AI Evaluation & Benchmarking | Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering | https://arxiv.org/pdf/2209.09513 |
AI Evaluation & Benchmarking | Benchmarking Agentic Workflow Generation | https://arxiv.org/abs/2410.07869 |
AI Evaluation & Benchmarking | TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks | https://arxiv.org/abs/2412.14161 |
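
Agent benchmarks in this category (e.g., GAIA, SimpleQA) typically score a system by normalized or quasi-exact match against gold answers. Below is a minimal harness sketch under that assumption; `agent` is a hypothetical callable, and real harnesses add timeouts, sandboxed tools, and per-difficulty breakdowns.

```python
# Minimal sketch of a GAIA/SimpleQA-style scoring loop: run the agent
# on each task and compare normalized answers. `agent` is hypothetical.
from typing import Callable


def normalize(text: str) -> str:
    # Collapse case and whitespace so trivial formatting differences
    # do not count as errors.
    return " ".join(text.lower().split())


def evaluate(agent: Callable[[str], str], tasks: list[dict]) -> float:
    correct = sum(
        normalize(agent(task["question"])) == normalize(task["answer"])
        for task in tasks
    )
    return correct / len(tasks)


if __name__ == "__main__":
    # Dry run with a toy task and a constant agent.
    toy = [{"question": "What is 2 + 2?", "answer": "4"}]
    print(evaluate(lambda q: "4", toy))  # 1.0
```
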
```bibtex
@misc{xu2025comprehensive,
  title={A Comprehensive Survey of Deep Research: Systems, Methodologies, and Applications},
  author={Renjun Xu and Jingwen Peng},
  year={2025},
  eprint={2506.12594},
  archivePrefix={arXiv},
  primaryClass={cs.AI}
}
```