CASSANDRA-21301: add AGENTS.md and CLAUDE.md #4734

Open
maoling wants to merge 1 commit into apache:trunk from maoling:CASSANDRA-21301

@maoling maoling commented Apr 12, 2026

Comment thread AGENTS.md

## Environment

- Java 11 (default) or 17.

we support 21 in runtime now as well

Comment thread AGENTS.md
- Commit messages should reference the JIRA issue. Disclose that AI assistance was used in the PR description.

```
CASSANDRA-XXXXX: Brief description of the change
```

the format does not match the agreed commit message format described in https://cassandra.apache.org/_/development/how_to_commit.html
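
One way to make this check deterministic is a small filter over the commit message. The sketch below is hypothetical and not part of this PR: it assumes the agreed format includes a line shaped like `patch by <Authors>; reviewed by <Reviewers> for CASSANDRA-#####`. The linked page remains the source of truth, and the function name is invented for illustration.

```bash
#!/bin/sh
# Hypothetical check: succeeds when stdin contains a line such as
#   patch by <Authors>; reviewed by <Reviewers> for CASSANDRA-#####
# The line shape is an assumption; confirm it against how_to_commit.html.
has_patch_by_line() {
    grep -Eiq '^patch by .+; reviewed by .+ for CASSANDRA-[0-9]+$'
}
```

It could be fed from `git log -1 --format=%B | has_patch_by_line`, so a harness gets a yes/no answer instead of guessing at the format.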

Comment thread AGENTS.md

```bash
# Run a single unit test class
ant testsome -Dtest.name=org.apache.cassandra.service.StorageServiceServerTest -Dtest.methods=testGetAllRangesEmpty
```

Differentiating how to run unit vs. integration could be useful. Though I think it's more likely that just adding a SKILL.md for running tests that has a distilled extraction of guidance from build.xml is probably the better way to do that.

So - maybe a quick example here + pointing to build.xml w/some string bread crumbs to know what to search for could be a decent 1st step.


this is an area where i feel we should have a script. For example, the above is 100% wrong, and it is what 90% of Cassandra devs use =D

CI uses testclasslist, which has different JVM arguments, so if you test using testsome and it fails in CI you will struggle to figure out why... they are not the same!

We also have different test scopes and they have their own commands or special flags... this is way too much to put in an agents file and really should just be a script to make it idiot proof (LLMs do stupid things, try to make this deterministic when possible)
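
As a rough illustration of the "make it a script" idea, a wrapper could first classify a test file deterministically, before any Ant target is chosen. The sketch below is hypothetical: the directory names (`test/unit`, `test/distributed`, `test/long`, `test/burn`, `test/simulator`), the scope labels, and both function names are assumptions. It deliberately does not pick an Ant target or flags, since those should come from build.xml.

```bash
#!/bin/sh
# Hypothetical helpers; the directory-to-scope mapping is an assumption.

# Print the test scope implied by a test source path.
test_scope_for() {
    case "$1" in
        test/unit/*)        echo unit ;;
        test/distributed/*) echo jvm-dtest ;;
        test/long/*)        echo long ;;
        test/burn/*)        echo burn ;;
        test/simulator/*)   echo simulator ;;
        *) echo "unknown test scope for: $1" >&2; return 1 ;;
    esac
}

# Convert a source path to a fully qualified class name,
# assuming the package starts at the first "org/" directory.
test_class_for() {
    case "$1" in
        */org/*) ;;
        *) echo "no org/ package root in: $1" >&2; return 1 ;;
    esac
    p="org/${1#*/org/}"
    printf '%s\n' "${p%.java}" | tr / .
}
```

A real script would then map each scope to the same target and JVM arguments that CI uses, which is the part that cannot be guessed here.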

Comment thread AGENTS.md
Comment on lines +15 to +22
## Build

```bash
ant build # compile all classes (includes Accord submodule)
ant jar # build the main JAR
ant clean # remove locally created artifacts
ant realclean # remove entire build directory and downloaded artifacts
```

not a blocker for this PR at all, but i find it's best to not let the harness know about / touch ant, as it's wasteful of tokens. Locally I have 2 scripts:

ai-ci-test <test> -- runs the test and strips the output so it's "success" *or* the failing task
ai-build -- `ant clean && ant build` but strips the output so it's "success" or the failing task
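
The scripts themselves are not shown at this point in the thread, so the following is only a guess at the general technique, not the reviewer's implementation. It is a filter that reads Ant output on stdin and prints either a single success line or the output of the last target that started before the failure. The function name and the exact output layout are assumptions.

```bash
#!/bin/sh
# Hypothetical filter: condense Ant output to "BUILD SUCCESSFUL", or to the
# output of the target that was running when the build failed.
condense_ant_output() {
    awk '
        # A line like "_build_java:" starts a new target; reset the buffer.
        /^[A-Za-z0-9_.-]+:$/ { target = $0; sub(/:$/, "", target); buf = "" }
        /^BUILD SUCCESSFUL/  { ok = 1 }
        /^BUILD FAILED/      { failed = 1 }
        { buf = buf $0 "\n" }
        END {
            if (ok && !failed) { print "BUILD SUCCESSFUL"; exit 0 }
            print "BUILD FAILED"
            print "Failed target: " (target == "" ? "None" : target)
            printf "%s", buf
            exit 1
        }
    '
}
```

Used as `ant build 2>&1 | condense_ant_output`, the pipeline's exit status is the filter's, so a harness sees both a short message and a failing status.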

Comment thread AGENTS.md
```bash
ant realclean # remove entire build directory and downloaded artifacts
```

Do NOT run `ant build` if you only need to verify a small change compiles.

how do you test compile then?

Comment thread AGENTS.md
```bash
ant testsome -Dtest.name=org.apache.cassandra.service.StorageServiceServerTest -Dtest.methods=testGetAllRangesEmpty
```

- When fixing a bug, first create a regression test that reproduces the failure, then implement the fix and verify.

not a blocker for this patch, but this is an area where skills can help. How to work with CQLTester, how to work with jvm-dtest, how to work with qt or stateful, how to work with the simulator; etc.

> first create a regression test that reproduces the failure

do we have any example JIRA where this went well? I find Claude makes really horrible tests. It will try to create a fake unit test that makes no sense, then it defines success as having made the logic handle its test... but when you actually go through Cassandra, everything is broken.

Comment thread AGENTS.md
- When fixing a bug, first create a regression test that reproduces the failure, then implement the fix and verify.
- Provide test(s) coverage for all new or modified code.

how? this is vague so LLMs are likely to make up random numbers; if we care about this (which we don't track) it should be deterministic

Comment thread AGENTS.md
Comment on lines +41 to +42
ant check # runs checkstyle, RAT license check, and builds
ant checkstyle # checkstyle only

is there a reason the harness should use checkstyle vs check? the more we add the more likely it will do the wrong thing

Comment thread AGENTS.md
Comment on lines +45 to +53
## Code Style
Cassandra enforces style via Checkstyle (`ant checkstyle`). Key rules are included in `checkstyle.xml` file.
General style:
- 4-space indentation, no tabs.
- Braces on a new line below control statements (Allman style).
- Brace-less style for single-line control statements.
- Match existing code style in the file you are editing.
- All new files must include the Apache License 2.0 header.
- Concise English documentation is required for complex classes and methods; trivial ones may not require them.

Sadly this is misleading and the actual style guide is at https://cassandra.apache.org/_/development/code_style.html

The only way for an agent to know our style guide is to read that page, and this is manual and lacks deterministic checks


That style guide comes from the code right? We could just have a .md redirect pointing to that file for digging into specifics around style. Not sure when an LLM would be triggered to check that vs. just inferring from the local context from files.

@dcapwell dcapwell Apr 14, 2026

isn't this in the website project?

$ # in cassandra dir
$ fd code_s
$


✓  ~/src/github/apache/cassandra-website git:(trunk) $ fd code_s
site-content/source/modules/ROOT/pages/development/code_style.adoc

Comment thread AGENTS.md
Comment on lines +63 to +66
```
Co-authored-by: GitHub Copilot
Co-authored-by: Claude
Co-authored-by: gemini-code-assist
```

i don't think this is agreed to yet. Mick proposed in Slack (informally) to use the Linux kernel way of Assisted-by; as of this moment we don't have an agreed syntax.

Comment thread AGENTS.md
```
Co-authored-by: gemini-code-assist
```

- Do NOT modify submodule references without understanding the implications. Submodule changes must be committed and pushed before the parent Cassandra commit.

do we need this comment? this tells the LLM to hack around Accord rather than make clean fixes. Why are we asking harnesses to own git?

Comment thread AGENTS.md
- 🚫 Never commit secrets, credentials, or API keys.
- 🚫 Never run the full test suite (`ant test`) — it takes hours. Run targeted tests only.
- 🚫 Never bypass Checkstyle violations without a suppression comment explaining why.
- 🚫 Never create summary or documentation files unless explicitly asked.

i feel like this should be removed; harnesses are better at maintaining docs than us humans... so actually making sure features are documented should be desired?

Comment thread AGENTS.md
- 🚫 Never bypass Checkstyle violations without a suppression comment explaining why.
- 🚫 Never create summary or documentation files unless explicitly asked.
- ⚠️ Ask before modifying the CQL grammar (`src/antlr/Cql.g`) — changes cascade widely.
- ⚠️ Ask before modifying `modules/accord/` — it is a separate repository.

this is a git thing, this tells the harness to hack when the proper solution is to update accord.

@jmckenzie-dev

Broadly, LLMs aren't as good at being told what not to do vs. being told what to do. So we should try and err on the side of being "positively oriented" when possible.

@dcapwell - do you have some scripts for the CI and testing stuff we could bundle in with this PR?

@dcapwell

yeah, i should be able to add the 3 scripts i use...

@rustyrazorblade

> Broadly, LLMs aren't as good at being told what not to do vs. being told what to do. So we should try and err on the side of being "positively oriented" when possible.

There's some truth to this, but I think it's still useful to say what not to do. I've found in my own repos telling the LLM not to write mock-echo tests where it just verifies the mocked thing returns results, to be quite helpful. I think it helps a lot to provide the positive direction in addition to the negative.

Comment thread AGENTS.md

## Git Workflow
- Do NOT commit unless explicitly asked.
- Commit messages should reference the JIRA issue. Disclose that AI assistance was used in the PR description.

honestly I think we should remove this. Logically this is only needed if the committer pushes this on the author (many don't), and is done after the review.

For agent files you need them to work in 100% of sessions, and this section isn't applicable the majority of the time.

For example, how does it learn the JIRA? How does it learn the reviewers? How does it learn that a reviewer gave code feedback and should be included as a co-author? What should the commit message be, and is it in sync with JIRA (we are required to push context to JIRA as a source of truth)?

@dcapwell

going to push the review branch in a second; here is example usage of the scripts:

$ ai-build
BUILD FAILED
==================================================

Output from failed target 'None':
----------------------------------------
Buildfile: /Users/dcapwell/src/github/apache/cassandra/trunk/build.xml
BUILD FAILED
/Users/dcapwell/src/github/apache/cassandra/trunk/build.xml:223: Unsupported JDK version used: 25
Total time: 0 seconds
$ setjdk 21
$ ai-build
BUILD SUCCESSFUL
$ vim src/java/org/apache/cassandra/config/Config.java # make it not compile
$ ai-build
BUILD FAILED
Failed target: _build_java
==================================================

Output from failed target '_build_java':
----------------------------------------
_build_java:
     [echo] Compiling for Java 21...
    [javac] Compiling 3194 source files to /Users/dcapwell/src/github/apache/cassandra/trunk/build/classes/main
    [javac] Note: Annotation processing is enabled because one or more processors were found
    [javac]   on the class path. A future release of javac may disable annotation processing
    [javac]   unless at least one processor is specified by name (-processor), or a search
    [javac]   path is specified (--processor-path, --processor-module-path), or annotation
    [javac]   processing is enabled explicitly (-proc:only, -proc:full).
    [javac]   Use -Xlint:-options to suppress this message.
    [javac]   Use -proc:none to disable annotation processing.
    [javac] /Users/dcapwell/src/github/apache/cassandra/trunk/src/java/org/apache/cassandra/config/Config.java:18: error: ';' expected
    [javac] package org.apache.cassandra.config
    [javac]                                    ^
    [javac] 1 error
BUILD FAILED
/Users/dcapwell/src/github/apache/cassandra/trunk/build.xml:693: The following error occurred while executing this line:
/Users/dcapwell/src/github/apache/cassandra/trunk/build.xml:678: Compile failed; see the compiler error output for details.
Total time: 10 seconds
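
The first failure in the transcript above comes from an unsupported JDK. A wrapper could fail fast on that before Ant is invoked at all. The sketch below is hypothetical: the accepted set (11, 17, 21) is taken from the comments and transcript in this thread rather than from build.xml, and the function names are invented for illustration.

```bash
#!/bin/sh
# Hypothetical pre-flight check; the accepted set is an assumption drawn
# from this thread, not read from build.xml.

# Print the major version for strings like "21.0.2", "11", or "1.8.0_292".
jdk_major() {
    v="${1#1.}"                      # legacy scheme: 1.8.0_292 -> 8.0_292
    printf '%s\n' "${v%%[._+-]*}"    # keep only the leading number
}

# Succeed when the given version string has an accepted major version.
is_supported_jdk() {
    case "$(jdk_major "$1")" in
        11|17|21) return 0 ;;
        *)        return 1 ;;
    esac
}
```

The version string could be taken from the first line of `java -version 2>&1`, which prints to standard error, for example with `sed -n 's/.* version "\([^"]*\)".*/\1/p'`.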

Comment thread CLAUDE.md
@@ -0,0 +1 @@
@AGENTS.md
\ No newline at end of file

this isn't a symlink in git, so when you checkout you get

$ cat CLAUDE.md
@AGENTS.md%
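
If a real symlink is what is wanted (the thread does not settle this), one way to create it is sketched below. The function name is hypothetical. The link target is kept relative so that it resolves inside any checkout.

```bash
#!/bin/sh
# Hypothetical helper: replace CLAUDE.md with a relative symlink to AGENTS.md.
# $1 is the repository root and defaults to the current directory.
link_claude_to_agents() {
    dir="${1:-.}"
    [ -f "$dir/AGENTS.md" ] || { echo "AGENTS.md not found in $dir" >&2; return 1; }
    rm -f "$dir/CLAUDE.md"
    ln -s AGENTS.md "$dir/CLAUDE.md"
}
```

After `git add CLAUDE.md`, Git records the entry as a symlink (mode 120000). Checkouts on platforms without symlink support may still materialize it as a plain file containing the target path.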

@jmckenzie-dev

> There's some truth to this, but I think it's still useful to say what not to do. I've found in my own repos telling the LLM not to write mock-echo tests where it just verifies the mocked thing returns results, to be quite helpful. I think it helps a lot to provide the positive direction in addition to the negative.

As is tradition, I communicated the tip of the iceberg on what was in my head. 😭

What I meant to get across: we should absolutely do both, but be aware that this is a current limitation so lean on the positive hard knowing they'll generally comply with that and then all caps "THE WORLD WILL END IF YOU DO THIS BAD THING" + copy/paste multiple times to tweak the attention mechanism and get it to maybe respect what we tell it not to do.

So yeah - quite helpful, but understanding that the "mathematical vibe" is probably like 4:1 in terms of impact on saying what an LLM should do vs. what they should not in terms of their compliance. Plus it varies based on model too.

Wild West.

5 participants