misc: temporary fix for "No space left on device" errors #1335
41a13ec fixed a longstanding bug that the lab was relying on. Before the bug
was fixed, the get_wwn_id_map function was doing:

    try:
        r = remote.run(
            args=[
                'ls',
                '-l',
                '/dev/disk/by-id/wwn-*',
            ],
            stdout=StringIO(),
        )
        stdout = r.stdout.getvalue()
    except Exception:
        log.info('Failed to get wwn devices! Using /dev/sd* devices...')
        return dict((d, d) for d in devs)

The bug was that "remote.run" was putting single quotes around the string
"/dev/disk/by-id/wwn-*" because it wasn't enclosed in Raw(...). The single
quotes were causing the command to fail, triggering the except clause, and that
was happening 100% of the time.

The fix in 41a13ec caused the command to start succeeding, which caused
execution to continue. As a result, MON stores and OSDs started getting created
on the wrong devices, and tests that were previously succeeding started to fail
due to "No space left on device".

In short, the wwn devices on today's smithis are not big enough for
/var/lib/ceph.

This commit "fixes the fix" by dropping the dead code and always returning the
value that qa/tasks/ceph.py has come to expect.

Fixes: https://tracker.ceph.com/issues/42313
Signed-off-by: Nathan Cutler <ncutler@suse.com>
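The quoting behaviour described above can be illustrated with a small standalone sketch. It does not use teuthology: `render_command` and `identity_device_map` are hypothetical helpers, `shlex.quote` stands in for whatever quoting `remote.run` applies, and `raw_indexes` stands in for wrapping an argument in `Raw(...)`. The identity map mirrors the fallback in the `except` clause, on the assumption that this is the value the commit now always returns.

```python
import shlex


def render_command(args, raw_indexes=()):
    """Join args into one shell command line.

    Every argument is shell-quoted except those whose position is listed in
    raw_indexes (a stand-in for Raw(...); not teuthology's real code).
    """
    return ' '.join(
        arg if i in raw_indexes else shlex.quote(arg)
        for i, arg in enumerate(args)
    )


def identity_device_map(devs):
    """Map each device to itself, like the fallback in the except clause."""
    return dict((d, d) for d in devs)


args = ['ls', '-l', '/dev/disk/by-id/wwn-*']

# Quoted: the shell sees a literal file name containing '*', so no expansion.
quoted = render_command(args)

# Raw: the glob reaches the shell unquoted and can be expanded.
unquoted = render_command(args, raw_indexes={2})

print(quoted)
print(unquoted)
print(identity_device_map(['/dev/sdb', '/dev/sdc']))
```

With the glob quoted, `ls` is asked for a file literally named `wwn-*`, which does not exist, so the command exits non-zero. That matches the failure that used to trigger the fallback every time.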
This is not a precise description. In fact, since the scratch contents are used in the devs parameter, get_wwn_id_map just returns an empty dictionary:
the logs if we revert this behavior with this patch.
retest this please
please remove this line:
because this ticket is not related to this issue.
👍
Disagree -- this fixes the issue described in that ticket. Please stop fighting getting teuthology working again.
It is a different problem. Please take a look at the log in the description: it is using a teuthology version from before this change.
Again -- stop fighting this. I updated the description of the ticket.
@dillaman thank you, you're my hero.