
roachtest: fails to terminate cleanly when device full #32384

Open
tbg opened this issue Nov 15, 2018 · 6 comments
Labels
A-testing Testing tools and infrastructure C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. T-testeng TestEng Team

Comments

@tbg
Member

tbg commented Nov 15, 2018

I was looking at the history of restore2TB/nodes=10 and wondered why it had a relatively fast passing result on release-2.1. Looking at the logs, I found that roachtest had killed itself: it panicked because the agent ran out of disk space.

For these runs, roachtest itself is in charge of posting issues. This is problematic because nobody watches the watchman: when roachtest itself dies, nothing posts an issue about it. Regarding the discussion about having roachtest post its own issues in more places, I think we may want to take the opposite route and not have it post anything anymore.

```
##teamcity[publishArtifacts '/home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20181114-1012425/scaledata/jobcoordinator/nodes=3/** => scaledata/jobcoordinator/nodes=3']
[04:59:09] schemachange/tpcc/warehouses=1000/nodes=5 (3m:58s)
[04:59:36] [ 545] rebalance/3to5: waiting for reblance (36m7s)
[04:59:36] [ 545] restore2TB/nodes=10: running restore (1h8m45s)
[04:59:36] [ 545] scaledata/jobcoordinator/nodes=6: ??? (0s)
[04:59:36] [ 545] schemachange/kv: loading fixture (10s)
[04:59:36] [ 545] schemachange/tpcc/warehouses=1000/nodes=5: ??? (0s)
[04:59:58] panic: write /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20181114-1012425/rebalance/3to5/test.log: no space left on device [recovered]
[04:59:58] 	panic: write /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20181114-1012425/rebalance/3to5/test.log: no space left on device [recovered]
[04:59:58] 	panic: write /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20181114-1012425/rebalance/3to5/test.log: no space left on device
[04:59:58]
[04:59:58] goroutine 399624 [running]:
[04:59:58] main.(*monitor).Go.func1.1(0xc420c27f67, 0xc420c27f88)
[04:59:58] 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:1457 +0xed
[04:59:58] panic(0x25f4500, 0xc421bc50e0)
[04:59:58] 	/usr/local/go/src/runtime/panic.go:502 +0x229
[04:59:58] main.(*monitor).Go.func1.2(0xc420c27f67)
[04:59:58] 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:1476 +0x85
[04:59:58] panic(0x25f4500, 0xc421bc50e0)
[04:59:58] 	/usr/local/go/src/runtime/panic.go:502 +0x229
[04:59:58] main.(*logger).Printf(0xc420c77b80, 0x290346d, 0x3, 0xc420c27d40, 0x1, 0x1)
[04:59:58] 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/log.go:219 +0xb3
[04:59:58] main.waitForRebalance(0x2e2ec40, 0xc421b04700, 0xc420c77b80, 0xc420790640, 0x4045000000000000, 0x0, 0x0)
[04:59:58] 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/allocator.go:232 +0x2de
[04:59:58] main.registerAllocator.func1.3(0x2e2ec40, 0xc421b04700, 0xc4218d2767, 0x0)
[04:59:58] 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/allocator.go:73 +0xbf
[04:59:58] main.(*monitor).Go.func1(0x0, 0x0)
[04:59:58] 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:1481 +0xd8
[04:59:58] github.com/cockroachdb/cockroach/vendor/golang.org/x/sync/errgroup.(*Group).Go.func1(0xc421b04740, 0xc420a68460)
[04:59:58] 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/golang.org/x/sync/errgroup/errgroup.go:58 +0x57
[04:59:58] created by github.com/cockroachdb/cockroach/vendor/golang.org/x/sync/errgroup.(*Group).Go
[04:59:58] 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/golang.org/x/sync/errgroup/errgroup.go:55 +0x66
[04:59:58] + exit_status=0
[04:59:58] ++ find artifacts/20181114-1012425 -name stats.json
[04:59:58] + for file in '$(find ${artifacts#${PWD}/} -name stats.json)'
[04:59:58] + gsutil cp artifacts/20181114-1012425/interleavedpartitioned/8.logs/stats.json gs://cockroach-nightly/artifacts/20181114-1012425/interleavedpartitioned/8.logs/stats.json
[05:00:01] Process SyncManager-1:
[05:00:01] Traceback (most recent call last):
[05:00:01]   File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
[05:00:01]     self.run()
[05:00:01]   File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
[05:00:01]     self._target(*self._args, **self._kwargs)
[05:00:01]   File "/usr/lib/python2.7/multiprocessing/managers.py", line 550, in _run_server
[05:00:01]     server = cls._Server(registry, address, authkey, serializer)
[05:00:01]   File "/usr/lib/python2.7/multiprocessing/managers.py", line 162, in __init__
[05:00:01]     self.listener = Listener(address=address, backlog=16)
[05:00:01]   File "/usr/lib/python2.7/multiprocessing/connection.py", line 127, in __init__
[05:00:01]     address = address or arbitrary_address(family)
[05:00:01]   File "/usr/lib/python2.7/multiprocessing/connection.py", line 90, in arbitrary_address
[05:00:01]     return tempfile.mktemp(prefix='listener-', dir=get_temp_dir())
[05:00:01]   File "/usr/lib/python2.7/multiprocessing/util.py", line 139, in get_temp_dir
[05:00:01]     tempdir = tempfile.mkdtemp(prefix='pymp-')
[05:00:01]   File "/usr/lib/python2.7/tempfile.py", line 331, in mkdtemp
[05:00:01]     dir = gettempdir()
[05:00:01]   File "/usr/lib/python2.7/tempfile.py", line 275, in gettempdir
[05:00:01]     tempdir = _get_default_tempdir()
[05:00:01]   File "/usr/lib/python2.7/tempfile.py", line 217, in _get_default_tempdir
[05:00:01]     ("No usable temporary directory found in %s" % dirlist))
[05:00:01] IOError: [Errno 2] No usable temporary directory found in ['/home/agent/temp/buildTmp', '/home/agent/temp/buildTmp', '/home/agent/temp/buildTmp', '/tmp', '/var/tmp', '/usr/tmp', '/home/agent/work/.go/src/github.com/cockroachdb/cockroach']
[05:00:02] OSError: No space left on device.
```

Epic CRDB-10428

Jira issue: CRDB-4752

@tbg tbg added the A-testing Testing tools and infrastructure label Nov 15, 2018
@tbg
Member Author

tbg commented Nov 15, 2018

In addition to not posting an issue, we're also not marking the incomplete tests as failures in TeamCity, which our Go test output parser does.

@petermattis
Collaborator

Well, that's not good. Will an external poster be able to post if the disk is full?

@tbg
Member Author

tbg commented Nov 15, 2018

Yes. The external poster would see the Go test output and fail all tests that weren't explicitly terminated; that's the "test ended in panic" message we see in such cases. It would not, however, fail tests that were never mentioned in the logs. To work around that, roachtest should emit a header for every test it's going to run and immediately pause it via RUN/PAUSE/CONT.

@petermattis petermattis added the C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. label Nov 19, 2018
@github-actions

github-actions bot commented Jun 5, 2021

We have marked this issue as stale because it has been inactive for
18 months. If this issue is still relevant, removing the stale label
or adding a comment will keep it active. Otherwise, we'll close it in
5 days to keep the issue queue tidy. Thank you for your contribution
to CockroachDB!

@knz
Contributor

knz commented Jun 5, 2021

still relevant

@exalate-issue-sync exalate-issue-sync bot added T-testeng TestEng Team and removed T-dev-inf labels Mar 4, 2022
@github-actions

We have marked this issue as stale because it has been inactive for
18 months. If this issue is still relevant, removing the stale label
or adding a comment will keep it active. Otherwise, we'll close it in
10 days to keep the issue queue tidy. Thank you for your contribution
to CockroachDB!

@knz knz added this to Triage in Test Engineering via automation Sep 20, 2023

4 participants