Add collectInfo command #10596
Automated checks report:
Some checks failed. Please fix the reported issues and reply 'alluxio-bot, check this please' to re-run checks.
Merged build finished. Test FAILed.
This change adds wrappers to ShellUtils for executing a command over SSH, together with SCP commands. The `ShellUtils` class is refactored, with inner classes extracted to standalone classes. PR #10596 relies on this utility as it extensively invokes commands over SSH. pr-link: #10658 change-id: cid-d4d74b1d4b1aff513fe298effee509f620e5821c
b2f984b
to
fc2d17e
Compare
Good idea. Now
Merged build finished. Test PASSed.
@jiacheliu3 Thanks! Can you update the PR title and description with the updated syntax?
    new AlluxioCommand(mAlluxioPath, "fs mount"), null);
registerCommand("version",
    new AlluxioCommand(mAlluxioPath, "version -r"), null);
registerCommand("job",
@bradyoo is there a better API for getting JobService information instead of dumping the list of job IDs? If not, I'd rather we just collect the number of jobs instead of the jobs themselves. A list of job IDs will be of almost no use to an external reviewer.
Job IDs might not be immediately useful, but a string dump of the job type and config for each would be the most useful IMO.
FYI @ns1123 job ls in 2.2 prints Job Type
Cool.
registerCommand("job",
    new AlluxioCommand(mAlluxioPath, "job ls"), null);
registerCommand("journal",
    new AlluxioCommand(mAlluxioPath, String.format("fs ls -R %s",
This assumes the journal is local, not in a UFS. The journal is pretty important for our review, so I'd definitely try to handle at least an HDFS journal.
Added logic to check if the journal dir is local or HDFS.
Local -> ls -R journalDir
HDFS -> hdfs dfs -ls -R journalDir
Is there any more logic that would be helpful?
This pretty much covers all scenarios that we want to.
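For readers following this thread, the local-versus-HDFS check discussed above could look roughly like the sketch below. It is only an illustration, not the code in this PR: the class name `JournalListingSketch` and the method `listCommand` are hypothetical, and it assumes the journal location is given either as a plain path or as a URI with a `file` or `hdfs` scheme.

```java
import java.net.URI;
import java.util.Arrays;

public class JournalListingSketch {
  /** Picks a recursive listing command based on the scheme of the journal location. */
  public static String[] listCommand(String journalDir) {
    URI uri = URI.create(journalDir);
    String scheme = uri.getScheme();
    if (scheme == null || scheme.equals("file")) {
      // Local journal: plain recursive ls on the path component.
      String path = scheme == null ? journalDir : uri.getPath();
      return new String[] {"ls", "-R", path};
    }
    if (scheme.equals("hdfs")) {
      // HDFS journal: delegate to the hdfs CLI, which must be on the PATH.
      return new String[] {"hdfs", "dfs", "-ls", "-R", journalDir};
    }
    // Other UFS types are not handled in this sketch.
    throw new IllegalArgumentException("Unsupported journal scheme: " + scheme);
  }

  public static void main(String[] args) {
    System.out.println(Arrays.toString(listCommand("/opt/alluxio/journal")));
    System.out.println(Arrays.toString(listCommand("hdfs://namenode:9000/alluxio/journal")));
  }
}
```

Returning the command as an argument array keeps it usable with `ProcessBuilder` without going through a shell.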
shell/src/main/java/alluxio/cli/bundler/command/CollectEnvCommand.java
@Override
protected void registerCommands() {
  registerCommand("ps", new ShellCommand(new String[]{"ps", "-ef", "|grep alluxio*"}), null);
Do we want to also grab information about a presto/spark/yarn process if it's colocated?
Added attempts to grab Spark/Yarn/Hdfs/Presto.
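As an illustration of what grabbing colocated processes could involve, here is a minimal sketch that builds one `ps` listing per process family. It is not the code that was added in this PR: the class name, the pattern list, and the use of `bash -c` are assumptions. The pipe is passed through a shell because a pipe character handed directly to `ps` as an argument is not interpreted as a pipeline.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ProcessListingSketch {
  // Hypothetical process-name patterns; the actual list in the PR may differ.
  static final String[] PATTERNS = {"alluxio", "spark", "yarn", "hdfs", "presto"};

  /** Builds one shell invocation per pattern, keyed by the pattern name. */
  public static Map<String, String[]> buildCommands() {
    Map<String, String[]> commands = new LinkedHashMap<>();
    for (String p : PATTERNS) {
      // Writing the pattern as '[a]lluxio' keeps grep from matching its own command line.
      String pattern = "'[" + p.charAt(0) + "]" + p.substring(1) + "'";
      commands.put(p, new String[] {"bash", "-c", "ps -ef | grep -i " + pattern});
    }
    return commands;
  }

  public static void main(String[] args) {
    for (Map.Entry<String, String[]> e : buildCommands().entrySet()) {
      System.out.println(e.getKey() + ": " + String.join(" ", e.getValue()));
    }
  }
}
```

Note that `grep` exits with a non-zero status when nothing matches, so a caller would likely want to treat that as "process not present" rather than as a failure.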
Map<String, String> procs = new HashMap<>();

// Get Jps output
String[] jpsCommand = new String[]{"jps"};
Interestingly, jstack will only work if run within the same cgroup. Even sudo will not be able to jstack another user process correctly. WRT too many java processes, that is one of the pieces of information we want to be aware of. I think it's perfectly fine to grab all java processes on the machine.
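To make the "grab all Java processes" idea concrete, the sketch below parses `jps` output into a pid-to-name map, which a caller could then iterate over to run `jstack` on each pid. This is a hypothetical helper, not the implementation in this PR; it assumes the default `jps` output format of one `<pid> <name>` pair per line.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class JpsOutputSketch {
  /** Parses jps output lines of the form "<pid> <name>" into a pid -> name map. */
  public static Map<String, String> parse(String jpsOutput) {
    Map<String, String> procs = new LinkedHashMap<>();
    for (String line : jpsOutput.split("\\R")) {
      String trimmed = line.trim();
      if (trimmed.isEmpty()) {
        continue;
      }
      String[] parts = trimmed.split("\\s+", 2);
      // jps can print a pid with no name, so fall back to an empty string.
      procs.put(parts[0], parts.length > 1 ? parts[1] : "");
    }
    return procs;
  }

  public static void main(String[] args) {
    String sample = "1201 AlluxioMaster\n1342 AlluxioWorker\n2210 Jps\n";
    parse(sample).forEach((pid, name) -> System.out.println(pid + " -> " + name));
  }
}
```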
shell/src/main/java/alluxio/cli/bundler/command/CollectMetricsCommand.java
This is done. Thanks for the catch!
Overall looks good to me. Only a couple minor comments.
@ns1123 PTAL now the
Merged build finished. Test PASSed.
@bf8086 @ns1123 if you have ideas for improvements that are not low-hanging fruit (like a smarter extra command option), I propose we leave a note there and I can address it in the phase 2 improvement of this command. This PR is getting a little too long to keep track of and stay focused on all aspects.
LGTM
LGTM. Thanks @jiacheliu3 for doing this! Looking forward to getting to use it.
LGTM.
alluxio-bot, merge this please.
Add the CollectInfo command set to the Alluxio command line. This command collects information on every host in the Alluxio cluster, packs it into a local tarball on each host, then gathers all the tarballs back to the host where the command was issued.
This command is designed to help with troubleshooting by collecting information from Alluxio clusters. The collected information includes logs and config files; the command also executes a set of commands on each host and includes their output in the tarball.
Added commands:
This will run one or all checks on the local machine and write the output into files in `targetDir`. At the end of execution a tarball will be produced containing all the information collected, put in the `targetDir` as well.
`bin/alluxio collectInfo --local [all/collectConfig/collectEnv/collectLog/collectMetrics/collectAlluxioInfo] targetDir`
This will run the corresponding `collectInfo --local` command on all hosts in the Alluxio cluster defined in `conf/masters` and `conf/workers` via SSH. After that the tarballs generated by each individual `infoBundle` call will be SCPed into a local temp directory, and then put into one final tarball in the `targetDir`.
`bin/alluxio collectInfo [all/collectConfig/collectEnv/collectLog/collectMetrics/collectAlluxioInfo] targetDir`
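The cluster-wide flow described above (read hosts, run the local command over SSH, copy tarballs back) can be sketched as follows. This is an illustration under stated assumptions, not the PR's implementation: the class and method names are hypothetical, the host files are assumed to list one host per line with `#` comments, the remote `bin/alluxio` path is assumed to be relative to the SSH login directory, and the tarball name `bundle.tar.gz` is made up.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class ClusterCollectSketch {
  /** Extracts host names from host-file lines, skipping blank lines and # comments. */
  public static Set<String> parseHosts(List<String> lines) {
    Set<String> hosts = new LinkedHashSet<>();
    for (String line : lines) {
      String host = line.trim();
      if (host.isEmpty() || host.startsWith("#")) {
        continue;
      }
      hosts.add(host);
    }
    return hosts;
  }

  /** Builds the SSH invocation that runs the local collection on one host. */
  public static List<String> sshCollect(String host, String component, String targetDir) {
    return Arrays.asList("ssh", host,
        "bin/alluxio collectInfo --local " + component + " " + targetDir);
  }

  /** Builds the SCP invocation that copies one host's tarball into a local temp directory. */
  public static List<String> scpFetch(String host, String remoteTarball, String localTmpDir) {
    return Arrays.asList("scp", host + ":" + remoteTarball, localTmpDir);
  }

  public static void main(String[] args) {
    // Stand-ins for the contents of conf/masters and conf/workers.
    Set<String> hosts = parseHosts(Arrays.asList("# masters", "master-1"));
    hosts.addAll(parseHosts(Arrays.asList("worker-1", "", "master-1")));
    for (String host : hosts) {
      System.out.println(String.join(" ", sshCollect(host, "all", "/tmp/collect")));
      System.out.println(String.join(" ",
          scpFetch(host, "/tmp/collect/bundle.tar.gz", "/tmp/gathered")));
    }
  }
}
```

Collecting hosts into a set means a machine that appears in both host files is only visited once.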
Phase 2:
This PR is Phase 1 of the CollectInfo feature. It provides basic implementations and defines the structure.
Some features are not implemented in this PR and are scheduled for a separate Phase 2 PR. These features include the following:
- `-f` option to force overwrite existing work.
- `-c` or `--components` option to copy only certain logs.
- `-h` or `--hosts` option to only invoke `collectInfo` on certain hosts.
- `conf/` files that contain credentials.