IdeaScout helps researchers discover ideas from other fields that may transfer to their own research problems.
IdeaScout is a profile-guided toolkit for cross-domain research idea discovery.
Most paper search tools help you find papers that are already close to your topic.
IdeaScout is designed for a different use case:
Find methods, mechanisms, and ideas from other fields that can be adapted to your own task.
For example, a computer vision paper may contain a representation editing idea useful for speech.
A robotics paper may contain a temporal modeling idea useful for audio or video.
A multimodal learning paper may contain an alignment mechanism useful for another domain.
You define your own research profile, including your target task, preferred mechanisms, negative filters, and scoring criteria. IdeaScout then filters a large paper collection, asks an LLM to infer each paper's core idea, and ranks papers by how likely their ideas are to transfer to your research direction.
IdeaScout is not a replacement for reading papers.
It is a tool for reducing the search space and finding promising cross-domain inspiration.
IdeaScout is not meant to answer only:
Is this paper related to my topic?
Instead, it asks:
Can this paper's core idea be transferred to my research problem?
This makes it useful for early-stage research ideation. You can use it to mine ideas from fields such as:
- computer vision;
- natural language processing;
- multimodal learning;
- generative modeling;
- robotics;
- medical imaging;
- speech processing;
- representation learning.
The output is not just a list of similar papers.
It is a ranked list of papers whose ideas may be reusable in your own domain.
Research often calls for more than papers about the same task.
We also want to know:
- Can a method from another field inspire my own work?
- Can a mechanism from CV, NLP, multimodal learning, or generative modeling be adapted to my problem?
- Which papers are worth reading because their ideas are transferable, even if their topics are different?
IdeaScout is built for this type of exploration.
It helps you:
- search for transferable ideas, not only related papers;
- discover useful mechanisms from other research fields;
- define your own research profile and scoring criteria;
- rank papers by transferability, novelty, and feasibility;
- run large LLM-based scoring jobs with resume and auto-retry;
- browse scored papers through a lightweight web portal.
IdeaScout separates idea discovery into two stages:
- Rule-based candidate filtering: a fast stage that selects candidate papers using profile keywords, preferred mechanisms, and negative filters.
- LLM-based idea scoring: a semantic stage where an LLM reads each candidate paper's title and abstract, infers the core idea, identifies the transferable mechanism, and scores the paper against the user's profile.
| Step | What it does |
|---|---|
| 1. Define a profile | Describe your research task, preferred mechanisms, negative filters, and scoring dimensions. |
| 2. Filter candidates | Quickly prune a large paper collection using rule-based heuristics. |
| 3. Score with an LLM | Ask Codex to infer each paper's core idea and judge whether it transfers to your task. |
| 4. Export or browse | Export ranked CSV / JSONL files or inspect them through the web portal. |
- Profile-guided idea discovery
- Cross-domain paper screening
- Rule-based candidate filtering
- LLM-based core-idea inference and scoring
- Custom scoring dimensions
- Resume support for long-running jobs
- Auto-retry for quota or transient failures
- JSONL and CSV export
- FastAPI web portal
- Example profiles and example input files
research-idea-scout/
├── README.md
├── LICENSE
├── CITATION.cff
├── pyproject.toml
├── requirements.txt
├── assets/
│   ├── pipeline_overview.png
│   └── screenshots/
│       ├── portal_home.png
│       ├── portal_article_library.png
│       └── portal_article_detail.png
├── configs/
│   ├── profile_template.yaml
│   ├── profile_speechprivacy_accent_example.yaml
│   └── profile_cv_domain_adaptation_example.yaml
├── examples/
│   └── example_input.jsonl
├── idea_scout/
│   ├── __init__.py
│   ├── io_utils.py
│   ├── profile.py
│   ├── filter_candidates.py
│   ├── codex_idea_score.py
│   ├── run_autoretry.py
│   ├── export_rankings.py
│   ├── prepare_portal_ready.py
│   └── check_progress.py
├── scripts/
│   ├── filter_candidates.py
│   ├── score_with_codex.py
│   ├── run_autoretry.py
│   ├── export_rankings.py
│   ├── prepare_portal_ready.py
│   └── check_progress.py
└── web/
    ├── README.md
    ├── import_jsonl.py
    └── app/
        ├── __init__.py
        ├── main.py
        ├── static/
        │   └── style.css
        └── templates/
            ├── base.html
            ├── home.html
            ├── articles.html
            └── article_detail.html
git clone https://github.com/YOUR_USERNAME/research-idea-scout.git
cd research-idea-scout
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
To use Codex-based scoring, make sure the Codex CLI is available:
codex login --device-auth
printf 'Reply only OK\n' | codex exec -
Expected output:
OK
IdeaScout expects a JSONL file where each line is one paper.
Minimum fields:
{
  "title": "A paper title",
  "abstract": "The paper abstract.",
  "venue": "ICLR",
  "year": 2025,
  "url": "https://example.com/paper"
}
Example:
{"title":"Representation Surgery for Concept Editing","abstract":"We propose a method for identifying and editing concept directions in neural representations...","venue":"ICLR","year":2025,"url":"https://example.com/paper1"}
{"title":"Temporal Style Transfer for Motion Generation","abstract":"This paper introduces a temporal style factorization method for controllable motion generation...","venue":"CVPR","year":2026,"url":"https://example.com/paper2"}
Copy the template:
cp configs/profile_template.yaml configs/my_profile.yaml
Edit configs/my_profile.yaml.
Example:
project_name: My Research Project
description: >
  I want to discover transferable ideas from cross-domain machine learning papers
  that may help my own research problem.
target_tasks:
  - name: Main task
    description: >
      Describe your core research problem here.
preferred_mechanisms:
  - latent representation editing
  - modular adapters
  - cross-modal alignment
  - controllable generation
  - concept erasure
  - temporal modeling
positive_keywords:
  - representation editing
  - disentanglement
  - subspace
  - latent direction
  - retrieval augmentation
  - routing
  - controllable generation
negative_keywords:
  - survey
  - benchmark only
  - dataset only
  - leaderboard
  - pure application
scoring_dimensions:
  - key: transferability_to_my_task
    name: Transferability to my task
    description: Whether the paper's core idea can be adapted to my research task.
    weight: 2.0
  - key: method_novelty
    name: Method novelty
    description: Whether the paper contains a genuinely interesting method or theory idea.
    weight: 1.2
  - key: implementation_feasibility
    name: Implementation feasibility
    description: Whether the idea looks practical enough to implement or test.
    weight: 1.0
The profile is the main control interface. A precise profile gives more useful rankings.
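Because the profile drives everything downstream, it can help to sanity-check it before a long run. The sketch below is a minimal illustration, not IdeaScout's own validation: it works on a plain dict of the shape that PyYAML's `yaml.safe_load` would return for the file above. The choice of `REQUIRED_KEYS` is an assumption, and `weighted_score` shows only one plausible way the `weight` values could combine per-dimension scores. IdeaScout's actual ranking formula may differ.

```python
# Assumed minimum set of keys; the real loader may require more or fewer.
REQUIRED_KEYS = ("project_name", "target_tasks", "scoring_dimensions")


def validate_profile(profile):
    """Return a list of problems found in a profile dict (empty if none)."""
    problems = [f"missing key: {k}" for k in REQUIRED_KEYS if k not in profile]
    for dim in profile.get("scoring_dimensions", []):
        if "key" not in dim:
            problems.append("scoring dimension without a key")
        elif not isinstance(dim.get("weight"), (int, float)) or dim["weight"] <= 0:
            problems.append(f"dimension {dim['key']}: weight must be a positive number")
    return problems


def weighted_score(scores, dimensions):
    """Weighted mean of per-dimension scores (an illustrative formula only)."""
    total_weight = sum(d["weight"] for d in dimensions)
    return sum(scores[d["key"]] * d["weight"] for d in dimensions) / total_weight


profile = {
    "project_name": "My Research Project",
    "target_tasks": [{"name": "Main task"}],
    "scoring_dimensions": [
        {"key": "transferability_to_my_task", "weight": 2.0},
        {"key": "method_novelty", "weight": 1.2},
        {"key": "implementation_feasibility", "weight": 1.0},
    ],
}
# Illustrative per-dimension scores for one paper.
scores = {
    "transferability_to_my_task": 8.0,
    "method_novelty": 7.0,
    "implementation_feasibility": 6.0,
}

print(validate_profile(profile))
print(round(weighted_score(scores, profile["scoring_dimensions"]), 2))
```

Here the profile passes validation and the weighted mean comes out to about 7.24. Raising the weight of one dimension pulls the combined score toward that dimension, which is the main lever a profile author has over the ranking.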
Run rule-based filtering:
python scripts/filter_candidates.py \
--input examples/example_input.jsonl \
--profile configs/my_profile.yaml \
--output-keep data/candidates.jsonl \
--output-reject data/rejected.jsonl \
--output-summary reports/filter_summary.json \
--target-total 2000 \
--min-score 1.0
This produces:
data/candidates.jsonl
data/rejected.jsonl
reports/filter_summary.json
The filtering step is fast and does not call an LLM.
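To make the idea of rule-based filtering concrete, here is a minimal sketch of keyword scoring. It is an assumption about how such a filter could work, not the logic in `scripts/filter_candidates.py`: the functions `rule_score` and `filter_candidates` are hypothetical, each positive keyword hit adds one point, each negative hit subtracts one, and the `min_score` and `target_total` parameters only mirror the names of the command-line flags above.

```python
def rule_score(paper, positive_keywords, negative_keywords):
    """Count positive keyword hits in title + abstract, minus negative hits."""
    text = f"{paper.get('title', '')} {paper.get('abstract', '')}".lower()
    hits = sum(1.0 for kw in positive_keywords if kw.lower() in text)
    penalties = sum(1.0 for kw in negative_keywords if kw.lower() in text)
    return hits - penalties


def filter_candidates(papers, positive_keywords, negative_keywords,
                      min_score=1.0, target_total=2000):
    """Split papers into (kept, rejected); kept is sorted by score and capped."""
    scored = [(rule_score(p, positive_keywords, negative_keywords), p) for p in papers]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable: ties keep input order
    passing = [p for s, p in scored if s >= min_score]
    kept = passing[:target_total]
    rejected = [p for s, p in scored if s < min_score] + passing[target_total:]
    return kept, rejected


papers = [
    {"title": "Representation Surgery for Concept Editing",
     "abstract": "We propose a method for identifying and editing concept directions."},
    {"title": "Temporal Style Transfer for Motion Generation",
     "abstract": "A temporal style factorization method for controllable motion generation."},
    {"title": "A Survey of Controllable Generation",  # hypothetical paper
     "abstract": "We review recent work."},
]
kept, rejected = filter_candidates(
    papers,
    positive_keywords=["concept editing", "controllable", "disentanglement"],
    negative_keywords=["survey"],
)
print([p["title"] for p in kept])
print([p["title"] for p in rejected])
```

In this toy run the first two papers are kept and the survey is rejected, because its single positive hit is cancelled by the negative keyword. Plain substring matching is cheap but literal, so keyword phrasing in the profile matters.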
Before running a large job, test one paper first:
python -u scripts/score_with_codex.py \
--input data/candidates.jsonl \
--profile configs/my_profile.yaml \
--output data/test_scores.jsonl \
--failures-output data/test_failures.jsonl \
--top-k 1 \
--max-new-items 1 \
--codex-cmd "codex exec"
If the test works, run the full scoring job:
nohup python -u scripts/run_autoretry.py \
--input data/candidates.jsonl \
--profile configs/my_profile.yaml \
--output data/idea_scores.jsonl \
--failures-output data/idea_score_failures.jsonl \
--top-k 2000 \
--codex-cmd "codex exec" \
--batch-size 1 \
--sleep-between-rounds 2 \
--sleep-on-quota 3600 \
--sleep-on-error 600 \
--timeout 900 \
> logs/run_idea_scores_$(date +%F-%H%M%S).out 2>&1 &
Check progress:
python scripts/check_progress.py \
--output data/idea_scores.jsonl \
--target-total 2000
Or monitor continuously:
watch -n 30 'python scripts/check_progress.py --output data/idea_scores.jsonl --target-total 2000'
To inspect the latest log:
tail -f $(ls -t logs/run_idea_scores_*.out | head -1)
Export the top-ranked papers to CSV:
python scripts/export_rankings.py \
--input data/idea_scores.jsonl \
--output data/top100_ideas.csv \
--top-k 100
This gives a ranked CSV file that can be opened in Excel, Numbers, LibreOffice, or any spreadsheet viewer.
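For readers who want to post-process the scores themselves, the export step can be approximated in a few lines. This is a sketch under assumptions, not the code in `scripts/export_rankings.py`: `export_top_k` is a hypothetical function, it assumes each scored record carries a numeric `rank_score` field, and the column list is an arbitrary choice.

```python
import csv
import io
import json


def export_top_k(jsonl_lines, out_file, top_k=100,
                 columns=("rank", "title", "venue", "year", "rank_score")):
    """Sort scored records by rank_score (descending) and write the top k as CSV."""
    records = [json.loads(line) for line in jsonl_lines if line.strip()]
    records.sort(key=lambda r: r.get("rank_score", 0.0), reverse=True)
    # extrasaction="ignore" drops fields that are not listed in `columns`.
    writer = csv.DictWriter(out_file, fieldnames=columns, extrasaction="ignore")
    writer.writeheader()
    for rank, record in enumerate(records[:top_k], start=1):
        writer.writerow({**record, "rank": rank})
    return min(top_k, len(records))


# Two made-up scored records, written to an in-memory buffer.
lines = [
    '{"title": "Paper A", "venue": "ICLR", "year": 2025, "rank_score": 6.1}',
    '{"title": "Paper B", "venue": "CVPR", "year": 2026, "rank_score": 7.55}',
]
buffer = io.StringIO()
export_top_k(lines, buffer, top_k=1)
print(buffer.getvalue())
```

To write a real file instead of a buffer, open the destination with `open(path, "w", newline="", encoding="utf-8")`, which is the mode the `csv` module documentation recommends.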
Each scored paper contains the original metadata plus compact LLM-generated fields.
Example:
{
  "title": "Representation Surgery for Concept Editing",
  "venue": "ICLR",
  "year": 2025,
  "is_suitable": true,
  "priority": "keep",
  "idea_core": "The paper identifies editable concept directions in neural representations.",
  "transferable_mechanism": "Subspace intervention can be reused for controlled representation editing.",
  "fit_reason": "The mechanism aligns well with the user-defined profile.",
  "risk_or_limitation": "The abstract does not show whether all constraints are preserved.",
  "score_overall_fit": 8.0,
  "score_theory_novelty": 7.0,
  "scores": {
    "transferability_to_my_task": 8.0,
    "method_novelty": 7.0,
    "implementation_feasibility": 6.0
  },
  "rank_score": 7.55
}
IdeaScout includes a lightweight FastAPI web portal for browsing scored papers.
The portal provides:
- a dashboard with corpus-level statistics;
- an article library with search, filtering, and sorting;
- article detail pages with core ideas, transferable mechanisms, risks, and score cards.
First, import an IdeaScout JSONL output file into the portal database:
python web/import_jsonl.py \
--input data/idea_scores.jsonl \
--db web/ideascout_portal.db
Then start the web server:
python -m uvicorn web.app.main:app \
--host 127.0.0.1 \
--port 8080
Open:
http://127.0.0.1:8080
When running the portal on a remote server, use SSH port forwarding:
ssh -N -L 8080:127.0.0.1:8080 user@server
Then open the same local URL in your browser:
http://127.0.0.1:8080
IdeaScout includes example profiles for different research directions.
configs/profile_speechprivacy_accent_example.yaml
This profile looks for ideas related to:
- multi-attribute speech disentanglement;
- selective attribute obfuscation;
- accent conversion;
- representation editing;
- leakage control;
- privacy-utility evaluation.
configs/profile_cv_domain_adaptation_example.yaml
This profile looks for ideas related to:
- domain generalization;
- distribution shift;
- test-time adaptation;
- robust representations;
- feature alignment.
These are examples only. The intended use is that each researcher creates their own profile.
If you see errors like:
401 Unauthorized
token_invalidated
refresh_token_invalidated
Your session has ended
Run:
codex logout || true
codex login --device-auth
printf 'Reply only OK\n' | codex exec -
Then restart the same scoring command. IdeaScout will resume from the existing output file.
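Resuming from an existing output file can be pictured as follows. This is a minimal sketch of the general technique, not IdeaScout's implementation: `paper_key`, `load_done_keys`, and `score_with_resume` are hypothetical names, identifying a paper by its URL (falling back to its title) is an assumption, and `score_fn` stands in for the LLM call and is expected to return the original metadata plus scores.

```python
import json
import os


def paper_key(paper):
    """Identify a paper by URL when present, else by title (an assumption)."""
    return paper.get("url") or paper.get("title", "")


def load_done_keys(output_path):
    """Collect keys of papers already written to the output JSONL file."""
    done = set()
    if not os.path.exists(output_path):
        return done
    with open(output_path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            try:
                done.add(paper_key(json.loads(line)))
            except json.JSONDecodeError:
                continue  # skip lines that are not valid JSON
    return done


def score_with_resume(papers, output_path, score_fn):
    """Score only papers not yet in the output, appending each result at once."""
    done = load_done_keys(output_path)
    new_count = 0
    with open(output_path, "a", encoding="utf-8") as f:
        for paper in papers:
            key = paper_key(paper)
            if key in done:
                continue
            f.write(json.dumps(score_fn(paper), ensure_ascii=False) + "\n")
            f.flush()  # push each record out so an interruption loses little work
            done.add(key)
            new_count += 1
    return new_count
```

Running `score_with_resume` twice on the same input scores nothing the second time, which is the property that makes restarting after an authentication error safe.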
If Codex hits a usage limit, the auto-retry runner will sleep and try again later.
Typical log message:
[SLEEP_QUOTA] sleeping 3600s
Already processed papers are written to disk immediately, so progress is not lost.
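The sleep-and-retry behavior described above follows a common pattern, sketched below. This is not the code in `scripts/run_autoretry.py`: `QuotaError` and `run_with_retry` are hypothetical names, how the real runner detects a usage limit is not shown here, and only the `[SLEEP_QUOTA]` message is taken from the log line above. The default sleep durations mirror the `--sleep-on-quota` and `--sleep-on-error` values used earlier.

```python
import time


class QuotaError(Exception):
    """Hypothetical marker for a usage-limit failure."""


def run_with_retry(run_round, sleep_on_quota=3600, sleep_on_error=600,
                   max_rounds=None, sleep=time.sleep, log=print):
    """Call run_round() until it returns True, sleeping after each failure.

    run_round is any callable that returns True when all work is done,
    returns False when more rounds are needed, and raises on failure.
    """
    rounds = 0
    while max_rounds is None or rounds < max_rounds:
        rounds += 1
        try:
            if run_round():
                return True
        except QuotaError:
            log(f"[SLEEP_QUOTA] sleeping {sleep_on_quota}s")
            sleep(sleep_on_quota)
        except Exception as exc:
            # Illustrative message; the real runner's wording is not documented here.
            log(f"error: {exc}; sleeping {sleep_on_error}s")
            sleep(sleep_on_error)
    return False
```

Passing `sleep` and `log` as parameters keeps the loop easy to test without waiting an hour. Combined with writing each result to disk immediately, a retry only repeats the work that was in flight when the failure occurred.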
If log output appears delayed, use unbuffered Python:
python -u scripts/run_autoretry.py ...
For background jobs:
nohup python -u scripts/run_autoretry.py ... > logs/run.out 2>&1 &
To check whether a job is still running:
ps -ef | grep -E 'run_autoretry|score_with_codex|codex exec' | grep -v grep
A practical workflow for large paper collections is:
- Collect papers from conference websites, OpenReview, DBLP, Semantic Scholar, or other sources.
- Convert them into a JSONL file with title and abstract.
- Write a research profile for your own task.
- Run rule-based filtering to keep 1k to 5k candidates.
- Run LLM-based idea scoring.
- Export the top 50 to 200 papers.
- Browse the results in the web portal.
- Read only the most promising papers in depth.
- Use high-ranked ideas to design new methods or experiments.
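The conversion step in this workflow depends on where the papers come from. As one hedged example, the sketch below turns a CSV export into the JSONL input format. The function `csv_to_jsonl` is hypothetical, and it assumes the CSV has header columns named `title`, `abstract`, `venue`, `year`, and `url`. Real exports from the sources listed above will likely need their columns renamed first.

```python
import csv
import io
import json


def csv_to_jsonl(csv_file, jsonl_file):
    """Write one JSON object per CSV row; skip rows lacking title or abstract."""
    written = 0
    for row in csv.DictReader(csv_file):
        title = (row.get("title") or "").strip()
        abstract = (row.get("abstract") or "").strip()
        if not title or not abstract:
            continue  # the scorer needs both fields to infer a core idea
        record = {"title": title, "abstract": abstract}
        for field in ("venue", "url"):
            if row.get(field):
                record[field] = row[field].strip()
        year = (row.get("year") or "").strip()
        if year.isdigit():
            record["year"] = int(year)
        jsonl_file.write(json.dumps(record, ensure_ascii=False) + "\n")
        written += 1
    return written


# Made-up CSV content: one complete row and one row without an abstract.
source = io.StringIO(
    "title,abstract,venue,year,url\n"
    "Paper A,An abstract.,ICLR,2025,https://example.com/a\n"
    "No abstract,,ICLR,2025,\n"
)
target = io.StringIO()
print(csv_to_jsonl(source, target))
print(target.getvalue())
```

In this toy input only the first row is written, since the second has no abstract. Dropping incomplete rows early keeps the later LLM stage from spending calls on papers it cannot assess.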
Planned future features:
- PDF full-text parsing
- OpenReview paper collectors
- Semantic Scholar integration
- Web-based upload of JSONL files
- Multi-profile comparison
- Multi-LLM backend support
- Mechanism-based clustering
- BibTeX export
- Citation graph support
Contributions are welcome.
Good first contributions include:
- adding new example profiles;
- improving prompt templates;
- adding paper collectors;
- improving export and ranking tools;
- improving the web portal;
- adding visualization support.
This project is released under the MIT License.



