roachtest: copyfrom: command is too large #121413

Closed
cockroach-teamcity opened this issue Mar 30, 2024 · 5 comments · Fixed by #124637
Labels
branch-release-24.1 Used to mark GA and release blockers and technical advisories for 24.1 C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot. P-2 Issues/test failures with a fix SLA of 3 months T-sql-queries SQL Queries Team

Comments

@cockroach-teamcity
Member

cockroach-teamcity commented Mar 30, 2024

roachtest.copyfrom/crdb-nonatomic/sf=1/nodes=1 failed with artifacts on release-24.1 @ 5d952f80b3e1efe2e9aaed73f1fd68433880fcb7:

(copyfrom.go:101).runTest: COMMAND_PROBLEM: exit status 1
(monitor.go:154).Wait: monitor failure: monitor user task failed: t.Fatal() was called
test artifacts and logs in: /artifacts/copyfrom/crdb-nonatomic/sf=1/nodes=1/run_1

Parameters:

  • ROACHTEST_arch=amd64
  • ROACHTEST_cloud=gce
  • ROACHTEST_coverageBuild=false
  • ROACHTEST_cpu=4
  • ROACHTEST_encrypted=false
  • ROACHTEST_metamorphicBuild=false
  • ROACHTEST_ssd=0
Help

See: roachtest README

See: How To Investigate (internal)

See: Grafana

/cc @cockroachdb/sql-queries

This test on roachdash | Improve this report!

Jira issue: CRDB-37235

@cockroach-teamcity cockroach-teamcity added branch-release-24.1 Used to mark GA and release blockers and technical advisories for 24.1 C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot. release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. T-sql-queries SQL Queries Team labels Mar 30, 2024
@cockroach-teamcity cockroach-teamcity added this to the 24.1 milestone Mar 30, 2024
@yuzefovich
Member

ERROR:  command is too large: 67901396 bytes (max: 67108864)

We might need to adjust the test a bit.
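For scale, the limit in the error is exactly 64 MiB, and the rejected command exceeds it by a small margin. A minimal sketch of the arithmetic, using only the two numbers from the error message (the `overage` helper is a hypothetical name for illustration, not a CockroachDB function):

```go
package main

import "fmt"

// maxCommandSize is the "max" value reported in the error: 64 MiB.
const maxCommandSize int64 = 64 << 20 // 67108864

// overage reports how many bytes size exceeds limit by, and the same
// excess as a percentage of limit. Hypothetical helper for illustration.
func overage(size, limit int64) (int64, float64) {
	excess := size - limit
	return excess, float64(excess) / float64(limit) * 100
}

func main() {
	excess, pct := overage(67901396, maxCommandSize)
	fmt.Printf("over by %d bytes (%.2f%%)\n", excess, pct)
	// prints "over by 792532 bytes (1.18%)"
}
```

The command is only about 1.2% over the limit, so a modest reduction in batch size would be enough to clear it.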

@yuzefovich yuzefovich removed the release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. label Apr 1, 2024
@yuzefovich yuzefovich changed the title from "roachtest: copyfrom/crdb-nonatomic/sf=1/nodes=1 failed" to "roachtest: copyfrom: command is too large" Apr 1, 2024
@DrewKimball
Collaborator

Related to #117070

@DrewKimball
Collaborator

Marking this as P-3, since it's an issue with our testing.

@DrewKimball DrewKimball added the P-3 Issues/test failures with no fix SLA label Apr 9, 2024
@rytaft rytaft added X-duplicate Closed as a duplicate of another issue. and removed X-duplicate Closed as a duplicate of another issue. labels Apr 17, 2024
@yuzefovich yuzefovich added P-2 Issues/test failures with a fix SLA of 3 months and removed P-3 Issues/test failures with no fix SLA labels May 9, 2024
@yuzefovich
Member

It's interesting to note that in the last 3 failures (earlier ones no longer have artifacts) we have this right before the error:

W240504 06:29:19.370716 3372 kv/kvclient/kvcoord/txn_interceptor_pipeliner.go:731 ⋮ [T1,Vsystem,n1,client=10.142.0.18:51820,hostssl,user=‹importer›] 266  a transaction has hit the intent tracking limit (kv.transaction.max_intents_bytes); is it a bulk operation? Intent cleanup will be slower. txn: "unnamed" meta={id=99bc6814 key=/Table/104/1/‹96192›/‹2›/‹0› iso=Serializable pri=0.00689756 epo=0 ts=1714804155.827999065,0 min=1714804155.827999065,0 seq=0} lock=true stat=PENDING rts=1714804155.827999065,0 wto=false gul=1714804156.327999065,0 ba: ‹31924 CPut, 1 EndTxn, 255392 InitPut›
E240504 06:29:19.380100 3372 9@sql/conn_executor.go:3097 ⋮ [T1,Vsystem,n1,client=10.142.0.18:51820,hostssl,user=‹importer›] 267  error executing ‹CopyIn: COPY lineitem FROM STDIN WITH (FORMAT CSV, DELIMITER '|')›: command is too large: 67901396 bytes (max: 67108864)

I wonder whether hitting this intent tracking limit somehow makes the raft command larger.
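The batch summary at the end of the warning (`31924 CPut, 1 EndTxn, 255392 InitPut`) can be checked mechanically. The sketch below parses that summary and computes the ratio of `InitPut` to `CPut` requests. `parseBatchSummary` is a hypothetical helper written for this note, and reading the ratio as "writes per row" is an assumption, not something the log states.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// opCount matches one "<count> <request type>" pair in a batch summary.
var opCount = regexp.MustCompile(`(\d+) (\w+)`)

// parseBatchSummary turns a summary such as
// "31924 CPut, 1 EndTxn, 255392 InitPut" into a map from request type
// to count. Hypothetical helper for illustration.
func parseBatchSummary(s string) map[string]int {
	counts := make(map[string]int)
	for _, m := range opCount.FindAllStringSubmatch(s, -1) {
		n, err := strconv.Atoi(m[1])
		if err != nil {
			continue // skip counts that do not fit in an int
		}
		counts[m[2]] = n
	}
	return counts
}

func main() {
	c := parseBatchSummary("31924 CPut, 1 EndTxn, 255392 InitPut")
	fmt.Println(c["InitPut"]/c["CPut"], c["InitPut"]%c["CPut"])
	// prints "8 0"
}
```

The batch holds exactly 8 `InitPut` requests per `CPut`, which would be consistent with each row producing one write plus eight additional index writes.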

@yuzefovich
Member

I don't understand why the failure would be non-deterministic, but I think the main problem is that our estimate of MaxCommandSize / 3 is too inflexible and too aggressive for the TPCH lineitem table: it has 8 secondary indexes, so each input row produces 9 KV operations. I'll send a patch to make the fraction depend on the number of indexes in the table.
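One way to picture an index-aware budget is sketched below. This is not the patch that fixed the issue: the function name `copyBatchByteLimit`, the choice of divisor, and the floor of 3 are all assumptions made for illustration. The only inputs taken from this thread are the 64 MiB limit, the existing `MaxCommandSize / 3` estimate, and the 8 secondary indexes on `lineitem`.

```go
package main

import "fmt"

// maxCommandSize is the limit reported in the error: 64 MiB.
const maxCommandSize int64 = 64 << 20

// copyBatchByteLimit returns a per-batch input budget that shrinks as
// the table gains secondary indexes. Hypothetical sketch: it divides
// the command size limit by the number of KV operations per row
// (one for the primary index plus one per secondary index), and never
// uses a divisor smaller than the existing fixed value of 3.
func copyBatchByteLimit(maxCmdSize int64, numSecondaryIndexes int) int64 {
	divisor := int64(1 + numSecondaryIndexes)
	if divisor < 3 {
		divisor = 3
	}
	return maxCmdSize / divisor
}

func main() {
	// No secondary indexes: same as the fixed MaxCommandSize / 3 estimate.
	fmt.Println(copyBatchByteLimit(maxCommandSize, 0)) // prints 22369621
	// lineitem with 8 secondary indexes: 9 KV operations per row.
	fmt.Println(copyBatchByteLimit(maxCommandSize, 8)) // prints 7456540
}
```

Under these assumptions the `lineitem` budget drops to a third of the fixed estimate, while tables with few indexes keep the current behavior.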
