ROADMAP 260304 #23

@chenhaot

  • Evaluation
    One of the big problems in this space is that there is no public benchmark for what a thorough review should look like. We need a scalable way to collect one.

  • Algorithm design
    The current approach uses incremental summarization, which will struggle with long-range dependencies.
    Test recursive language modeling for this task.
    Turn the key idea into a Claude skill.
    Integrate with https://github.com/ChicagoHAI/MechEvalAgent/ for execution-grounded evaluation.

  • Interaction
    This tool would be much more useful if users could address the comments interactively.
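
To make the long-range dependency concern under Algorithm design concrete, here is a minimal sketch of incremental summarization. It is not the project's actual pipeline: `incremental_summarize` and `keep_last_words` are hypothetical names, and the word-truncating summarizer is only a toy stand-in for an LLM call with a bounded summary budget. It is meant to show how a detail from an early chunk can be gone by the time a later chunk needs it.

```python
from typing import Callable, List


def incremental_summarize(
    chunks: List[str],
    summarize: Callable[[str], str],
) -> str:
    """Fold chunks left to right, carrying only a running summary."""
    summary = ""
    for chunk in chunks:
        # Each step sees the previous summary plus one new chunk,
        # never the full document.
        summary = summarize((summary + " " + chunk).strip())
    return summary


def keep_last_words(limit: int) -> Callable[[str], str]:
    """Toy stand-in for an LLM summarizer (assumption): keep the last `limit` words."""
    def _summarize(text: str) -> str:
        return " ".join(text.split()[-limit:])
    return _summarize


# Hypothetical three-section paper; section 3 depends on a definition in section 1.
chunks = [
    "Section 1 defines metric M on dataset D.",
    "Section 2 describes the model architecture in detail.",
    "Section 3 reports results using metric M.",
]
final_summary = incremental_summarize(chunks, keep_last_words(8))
```

With an 8-word budget, `final_summary` no longer mentions the dataset introduced in section 1, even though section 3 relies on the metric defined there. A real summarizer would degrade less abruptly, but the bounded running state is the same structural limitation.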

Labels: documentation, enhancement, help wanted
