-
Notifications
You must be signed in to change notification settings - Fork 212
Collect log using SerialConsole when SshShell fails to connect #3845
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
ssh may not succeed due to many reasons, at present if ssh fails it is either very difficult or impossible to triage the issue without reproducing it. This change uses SerialConsole to run commands for log collection.
@squirrelsc @LiliDeng , this PR is for early design review. This will enhance triaging of failures for both engineers and AI |
|
||
# Prepare the RunCommandInput for Azure | ||
command = RunCommandInput( | ||
command_id="RunShellScript", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this Linux specific? Consider whether you can add support for Windows. If not possible, think about how to exclude the Feature for non-Posix systems.
5a994dd
to
ff40ee5
Compare
ff40ee5
to
bc92811
Compare
df59804
to
a69b551
Compare
Introduce NonSshExecutor. It Internally uses SerialConsole/RunCommand
a69b551
to
84d4f4a
Compare
:return: A string containing the output of the executed commands. | ||
""" | ||
out = [] | ||
serial_console = self._node.features[SerialConsole] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Check if the feature is not available, raise an exception.
self.log.debug( | ||
f"Failed to collect logs using non-ssh executor: {log_error}" | ||
) | ||
raise e |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
raise
if self.features.is_supported(NonSshExecutor): | ||
non_ssh_executor = self.features[NonSshExecutor] | ||
out = non_ssh_executor.execute() | ||
return "\n".join(out) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Log could be output in this method directly, instead of return and output in upper level.
execution method for scenarios where SSH connectivity is not possible or desired. | ||
""" | ||
|
||
COMMANDS_TO_EXECUTE = [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The commands are not related to NonSshExecutor
, they should be moved to node level.
password = self._connection_info.password | ||
if not password: | ||
raise RequireUserPasswordException( | ||
"The password is not set, and generate is False." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The password has not been set and generated.
@@ -190,6 +191,34 @@ def check_initramfs( | |||
f"{initramfs_logs} {filesystem_exception_logs}" | |||
) | |||
|
|||
def login(self) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ensure_login
ssh may not succeed due to multiple reasons, at present if ssh fails it is either very difficult or impossible to triage the issue without reproducing it.
This change uses NonSshExecutor to run commands for log collection.