Pipeline Phases

Raghav Kattel edited this page Jun 1, 2026 · 1 revision

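The paper pipeline runs its phases in sequence, numbered 0 through 10 with an additional Phase 1.5 between Phases 1 and 2. Each phase must complete before the next begins, although work inside a phase may run in parallel. Verification and review phases can loop back for revision.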

Phase 0: Voice Calibration

Input: 2-3 paragraphs of the author's published writing
Output: Voice style profile JSON

The Research Director analyzes:

  • Sentence length distribution
  • Vocabulary level and word choice
  • Paragraph structure and transitions
  • Punctuation habits
  • Common constructions

This profile is loaded by every Writer agent. Writers match the author's voice at the sentence level.
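
As a rough illustration, one profile feature (the sentence length distribution) could be computed and serialized as shown below. The function name `sentence_length_profile`, the JSON keys, and the regex-based sentence splitter are assumptions made for this sketch; the actual profile schema is not documented on this page.

```python
import json
import re
import statistics


def sentence_length_profile(text: str) -> dict:
    """Summarize sentence lengths (in words) for a writing sample.

    Hypothetical helper: the real profile schema is not documented here.
    """
    # Naive splitter: break after '.', '!' or '?' followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    lengths = [len(s.split()) for s in sentences]
    return {
        "sentence_count": len(lengths),
        "mean_words": statistics.mean(lengths) if lengths else 0.0,
        "median_words": statistics.median(lengths) if lengths else 0.0,
        "max_words": max(lengths, default=0),
    }


sample = "We ran the test. It failed twice! Why did it fail?"
profile = sentence_length_profile(sample)
print(json.dumps(profile, indent=2))
```

A production profile would also need the other features listed above (vocabulary, transitions, punctuation habits), and a sturdier sentence splitter that handles abbreviations.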

Phase 1: Literature Review

Input: Research topic query
Output: Merged, deduplicated literature corpus

5 parallel scouts hit different sources:

| Scout | Source | Max Results |
| --- | --- | --- |
| 1 | arXiv OAI-PMH | 200 |
| 2 | Semantic Scholar | 100 |
| 3 | CrossRef | 50 |
| 4 | OpenAlex | 50 |
| 5 | Field-specific (NCBI, etc.) | Varies |
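
The fan-out to the scouts could be sketched with a thread pool, as below. `run_scout` is a hypothetical stand-in that returns fake records instead of calling any real API, and `run_all_scouts` and `MAX_RESULTS` are names invented for this example; only the sources and caps come from the table above.

```python
from concurrent.futures import ThreadPoolExecutor

# Per-scout result caps from the table above. The field-specific scout's
# cap varies, so it is left out of this sketch.
MAX_RESULTS = {
    "arXiv OAI-PMH": 200,
    "Semantic Scholar": 100,
    "CrossRef": 50,
    "OpenAlex": 50,
}


def run_scout(source: str, query: str, limit: int) -> list[dict]:
    """Hypothetical stand-in for a real API client; returns fake records."""
    records = [{"source": source, "title": f"{query} result {i}"} for i in range(3)]
    return records[:limit]


def run_all_scouts(query: str) -> list[dict]:
    """Run every scout concurrently and concatenate their results."""
    with ThreadPoolExecutor(max_workers=len(MAX_RESULTS)) as pool:
        futures = [
            pool.submit(run_scout, source, query, limit)
            for source, limit in MAX_RESULTS.items()
        ]
        # Collect in submission order so the merged corpus is deterministic.
        return [paper for future in futures for paper in future.result()]


corpus = run_all_scouts("sparse attention")
print(len(corpus))
```

Threads suit this kind of I/O-bound work; a real implementation would also need per-source rate limiting and error handling, which are omitted here.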

Results are deduplicated by title similarity, compared on the first 80 characters of each title. Each paper record includes the title, authors, abstract, citation count, key claims, methodology, and limitations.
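
One way to read the deduplication rule is an exact match on a normalized 80-character title prefix. The sketch below takes that reading. The normalization (lowercasing, collapsing punctuation) and the keep-first policy are assumptions, the helper names are hypothetical, and the paper titles are made up for the example.

```python
import re


def title_key(title: str, prefix_len: int = 80) -> str:
    """Normalize a title and keep its first `prefix_len` characters."""
    # Lowercase and collapse punctuation/whitespace so trivial formatting
    # differences between sources do not block a match.
    normalized = re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()
    return normalized[:prefix_len]


def deduplicate(papers: list[dict]) -> list[dict]:
    """Keep the first paper seen for each normalized title prefix."""
    seen: set[str] = set()
    unique = []
    for paper in papers:
        key = title_key(paper["title"])
        if key not in seen:
            seen.add(key)
            unique.append(paper)
    return unique


papers = [
    {"title": "Sparse Attention for Long Documents", "source": "arXiv"},
    {"title": "Sparse attention for long documents.", "source": "CrossRef"},
    {"title": "Graph Pooling Revisited", "source": "OpenAlex"},
]
merged = deduplicate(papers)
print([p["source"] for p in merged])
```

Keeping the first record is the simplest policy; merging fields across duplicates (for example, taking the highest citation count) would be a natural extension.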

Phase 1.5: Novelty Generation

Input: Literature corpus
Output: 50+ hypotheses scored by novelty × tractability

All 6 novelty engines run in parallel:

| Engine | Angle | Output |
| --- | --- | --- |
| Contrarian | Invert field claims | 10 counter-hypotheses |
| Cross-Pollinator | Import from distant fields | Top 5 analogies |
| Assumption Excavator | Find unstated assumptions | 5 testable assumptions |
| Counterfactual Generator | Rewrite field history | 5 counterfactual histories |
| Paradox Sifter | Cross-reference limitations | Paradoxes + elephants |
| Heretic | 50 wild guesses from title alone | 50 hypotheses + haunting idea |

The orchestrator scores all hypotheses by:

  • Novelty (1-10): Has this been explored?
  • Tractability (1-10): Can we test this?
  • Evidence (1-10): How much partial evidence exists?

Top 3 become the paper's contribution.
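
This page does not say how the three scores combine. The sketch below assumes one plausible reading: rank by novelty × tractability (as in the phase output description) and use evidence only as a tie-breaker. The `Hypothesis` class and `top_hypotheses` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hypothesis:
    claim: str
    novelty: int        # 1-10
    tractability: int   # 1-10
    evidence: int       # 1-10


def top_hypotheses(pool: list[Hypothesis], k: int = 3) -> list[Hypothesis]:
    """Return the k best hypotheses under the assumed ranking rule."""
    # Assumed ranking: novelty x tractability first, evidence as tie-breaker.
    return sorted(
        pool,
        key=lambda h: (h.novelty * h.tractability, h.evidence),
        reverse=True,
    )[:k]


pool = [
    Hypothesis("A", novelty=9, tractability=4, evidence=5),  # product 36
    Hypothesis("B", novelty=6, tractability=6, evidence=7),  # product 36, more evidence
    Hypothesis("C", novelty=8, tractability=7, evidence=3),  # product 56
    Hypothesis("D", novelty=3, tractability=9, evidence=9),  # product 27
]
print([h.claim for h in top_hypotheses(pool)])  # ['C', 'B', 'A']
```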

Phase 2: Hypothesis Selection

Top hypotheses are selected, either by user input or automatically. Each includes:

  • Primary claim with evidence base
  • Gap in literature it fills
  • Proposed experimental approach

Phase 3: Methodology Design

The Methodology Designer agent:

  1. Recommends correct statistical tests
  2. Performs power analysis
  3. Designs experimental protocol
  4. Flags confounds and controls
  5. Outputs reproducible analysis plan
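
For the power analysis step, a minimal sketch for the simplest case (a two-sided, two-sample comparison of means) is shown below. It uses the standard normal approximation; the function name is hypothetical, and the agent's actual method is not specified on this page.

```python
import math
from statistics import NormalDist


def sample_size_per_group(effect_size: float, alpha: float = 0.05,
                          power: float = 0.80) -> int:
    """Per-group n for a two-sided, two-sample comparison of means.

    Normal approximation: n = 2 * ((z_{1-alpha/2} + z_{power}) / d)^2,
    where d is the standardized effect size (Cohen's d).
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


print(sample_size_per_group(0.5))  # 63 per group for a medium effect
```

An exact calculation based on the t distribution gives slightly larger values, so this figure is best treated as a lower bound when planning.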

Phase 4: Data Engineering

The Data Engineer agent:

  1. Writes Python analysis code
  2. Generates publication-ready figures (SciencePlots, 300 DPI, colorblind-safe)
  3. Computes all statistics
  4. Outputs: analysis.py, figures/*.pdf, statistical_report.json

Phase 5: Parallel Writing

5 Writer subagents write simultaneously:

| Writer | Section |
| --- | --- |
| 1 | Abstract |
| 2 | Introduction |
| 3 | Methods |
| 4 | Results |
| 5 | Related Work + Discussion |

Hard constraints (all 41 Humanizer patterns):

  • No significance inflation ("pivotal", "transformative")
  • No AI vocabulary ("showcasing", "underscores")
  • No em dashes (ZERO tolerated)
  • No synonym cycling
  • No passive voice without actor
  • No filler phrases
  • No "further research is needed"
  • No "state-of-the-art"

Phase 6: Verification

3 parallel verification modules:

Citation Verifier: Every citation checked against Semantic Scholar AND CrossRef. If a paper doesn't exist or doesn't contain the claimed result, it's flagged as hallucinated.

Statistical Auditor: Every p-value, test statistic, and error bar validated. Checks for p-hacking, multiple comparisons, power issues.
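
As an example of a multiple-comparisons check, the sketch below applies the Holm-Bonferroni step-down procedure to a set of reported p-values. This page does not name the correction the Statistical Auditor uses; Holm's method is chosen here only as a standard option, and the function name is hypothetical.

```python
def holm_bonferroni(p_values: list[float], alpha: float = 0.05) -> list[bool]:
    """Return, per hypothesis, whether it is rejected under Holm's method."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    rejected = [False] * m
    for rank, idx in enumerate(order):
        # Compare the (rank + 1)-th smallest p-value to alpha / (m - rank).
        if p_values[idx] <= alpha / (m - rank):
            rejected[idx] = True
        else:
            break  # Step-down: stop at the first non-rejection.
    return rejected


p_values = [0.01, 0.04, 0.03, 0.005]
print(holm_bonferroni(p_values))  # [True, False, False, True]
```

All four p-values are below 0.05 individually, but only two survive the correction, which is the kind of discrepancy an auditor would flag.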

AI-Pattern Detector: Every sentence scanned for all 41 patterns. Pattern density must stay below 2 per 1,000 words, and any em dash triggers a rejection.
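
The two gates (pattern density and the em dash rule) could be checked as sketched below. Only the phrases named on this page are included because the full 41-pattern list is not shown here; the `audit` function and its return keys are assumptions for illustration.

```python
import re

# Only phrases named on this page; the full 41-pattern list is not shown here.
BANNED_PHRASES = [
    "pivotal", "transformative", "showcasing", "underscores",
    "further research is needed", "state-of-the-art",
]
EM_DASH = "\u2014"


def audit(text: str, max_per_1000_words: float = 2.0) -> dict:
    """Count banned-phrase hits and em dashes, then apply the Phase 6 gates."""
    lowered = text.lower()
    hits = sum(
        len(re.findall(r"\b" + re.escape(phrase) + r"\b", lowered))
        for phrase in BANNED_PHRASES
    )
    em_dashes = text.count(EM_DASH)
    words = len(text.split())
    density = 1000 * hits / words if words else 0.0
    return {
        "hits": hits,
        "em_dashes": em_dashes,
        "density_per_1000": density,
        # Pass only with zero em dashes and density below the threshold.
        "passed": em_dashes == 0 and density < max_per_1000_words,
    }


clean = "We measured latency on three clusters and report the median."
flagged = "This pivotal result" + EM_DASH + "which underscores our claim."
print(audit(clean)["passed"], audit(flagged)["passed"])  # True False
```

Simple phrase matching covers vocabulary patterns only; structural patterns such as synonym cycling or actorless passive voice would need a different kind of check.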

Phase 7: Adversarial Review

10 reviewer personas read the complete paper independently:

| Persona | Focus |
| --- | --- |
| Theorist | Formal proofs, mathematical rigor |
| Empiricist | Experimental design, baselines |
| Pragmatist | Practical applicability |
| Skeptic | "Your results are wrong" |
| Historian | Prior art, citation accuracy |
| Methodologist | Statistical correctness |
| Ethicist | Societal implications |
| Competitor | Novelty relative to existing work |
| Student | Clarity and accessibility |
| Dreamer | "What if you went further?" |

Each returns a score (1-10), strengths, weaknesses, and a recommendation. All 10 must pass.
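
A review record and the all-must-pass gate might look like the sketch below. The `Review` class, the recommendation values ("accept" or "revise"), and the rule that a review passes only when it recommends acceptance are all assumptions; this page does not define the pass criterion.

```python
from dataclasses import dataclass, field


@dataclass
class Review:
    persona: str
    score: int              # 1-10
    recommendation: str     # assumed values: "accept" or "revise"
    strengths: list[str] = field(default_factory=list)
    weaknesses: list[str] = field(default_factory=list)


def failing_personas(reviews: list[Review]) -> list[str]:
    """Personas whose review does not pass (assumed: recommendation != 'accept')."""
    return [r.persona for r in reviews if r.recommendation != "accept"]


reviews = [
    Review("Theorist", 8, "accept"),
    Review("Skeptic", 4, "revise", weaknesses=["No ablation for the main claim"]),
    Review("Student", 7, "accept"),
]
blockers = failing_personas(reviews)
print("all passed" if not blockers else f"revise for: {blockers}")
```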

Phase 8: Revision

Writers receive annotated critiques and revise. The loop continues until all 10 reviewers accept or the orchestrator determines that the critiques are adequately addressed.
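
The control flow of the revision loop can be sketched as follows. The `max_rounds` cap stands in for the orchestrator's judgment that critiques are adequately addressed; that substitution, the function names, and the toy reviewer and reviser are all assumptions made for this example.

```python
from typing import Callable


def revision_loop(
    draft: str,
    review: Callable[[str], list[str]],
    revise: Callable[[str, list[str]], str],
    max_rounds: int = 5,
) -> tuple[str, int]:
    """Revise until `review` returns no critiques or `max_rounds` is reached.

    Returns the final draft and the number of revisions applied.
    """
    for completed in range(max_rounds):
        critiques = review(draft)
        if not critiques:
            return draft, completed
        draft = revise(draft, critiques)
    return draft, max_rounds


# Toy stand-ins: the reviewer objects to the word "novel" until it is gone.
def toy_review(text: str) -> list[str]:
    return ["remove 'novel'"] if "novel" in text else []


def toy_revise(text: str, critiques: list[str]) -> str:
    return text.replace("novel ", "", 1)


final, rounds = revision_loop("A novel novel method.", toy_review, toy_revise)
print(final, rounds)  # A method. 2
```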

Phase 9: Style Audit

The Style Auditor runs the complete paper through:

  • All 41 Humanizer patterns scan
  • Em dash count (must be ZERO)
  • Pattern density (< 1 per 2000 words)
  • Voice consistency with author profile

PASS → Formatting. FAIL → Back to Writer with line-level annotations.

Phase 10: Formatting → Submission

The Formatter:

  1. Loads venue-specific LaTeX template
  2. Generates BibTeX from verified citations
  3. Embeds figures at 300 DPI
  4. Compiles to PDF
  5. Outputs: paper.tex, references.bib, paper.pdf
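
Step 2 could be sketched as a small renderer from citation records to BibTeX entries. The record fields (`key`, `authors`, `title`, `venue`, `year`) and the function name `to_bibtex` are assumptions, and the entry shown is a placeholder, not a real reference.

```python
def to_bibtex(entry: dict) -> str:
    """Render one verified citation as a BibTeX @article entry."""
    fields = {
        "author": " and ".join(entry["authors"]),
        "title": entry["title"],
        "journal": entry["venue"],
        "year": str(entry["year"]),
    }
    body = ",\n".join(f"  {name} = {{{value}}}" for name, value in fields.items())
    return f"@article{{{entry['key']},\n{body}\n}}"


# Placeholder record for illustration only.
citation = {
    "key": "doe2024example",
    "authors": ["Doe, Jane", "Roe, Richard"],
    "title": "An Example Title",
    "venue": "Journal of Examples",
    "year": 2024,
}
bibtex = to_bibtex(citation)
print(bibtex)
```

A fuller version would pick the entry type (`@inproceedings`, `@misc`, and so on) per venue and escape LaTeX special characters in titles.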

Supported venues: NeurIPS, ICML, ICLR, Nature, arXiv (templates are stubs; contributions welcome!)