Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ZOOKEEPER-4566: Create tool for recursive snapshot analysis #1902

Closed
wants to merge 4 commits into from

Conversation

BukrosSzabolcs
Copy link
Contributor

add a tool to recursively collect and display child count and data
stored in sub-trees

add a tool to recursively collect and display child count and data
stored in sub-trees
}

public void run(String snapshotFileName, String startingNode, int maxDepth) throws IOException {
InputStream is =
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PATH_TRAVERSAL_IN: This API (java/io/FileInputStream.(Ljava/lang/String;)V) reads a file whose location might be specified by user input

Reply with "@sonatype-lift help" for info about LiftBot commands.
Reply with "@sonatype-lift ignore" to tell LiftBot to leave out the above finding from this PR.
Reply with "@sonatype-lift ignoreall" to tell LiftBot to leave out all the findings from this PR and from the status bar in Github.

When talking to LiftBot, you need to refresh the page to see its response. Click here to get to know more about LiftBot commands.


Was this a good recommendation?
[ 🙁 Not relevant ] - [ 😕 Won't fix ] - [ 😑 Not critical, will fix ] - [ 🙂 Critical, will fix ] - [ 😊 Critical, fixing now ]

InputStream is =
new CheckedInputStream(new BufferedInputStream(new FileInputStream(snapshotFileName)),
new Adler32());
InputArchive ia = BinaryInputArchive.getArchive(is);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RESOURCE_LEAK: resource of type java.util.zip.CheckedInputStream acquired by call to new() at line 69 is not released after line 72.

Reply with "@sonatype-lift help" for info about LiftBot commands.
Reply with "@sonatype-lift ignore" to tell LiftBot to leave out the above finding from this PR.
Reply with "@sonatype-lift ignoreall" to tell LiftBot to leave out all the findings from this PR and from the status bar in Github.

When talking to LiftBot, you need to refresh the page to see its response. Click here to get to know more about LiftBot commands.


Was this a good recommendation?
[ 🙁 Not relevant ] - [ 😕 Won't fix ] - [ 😑 Not critical, will fix ] - [ 🙂 Critical, will fix ] - [ 😊 Critical, fixing now ]

DataTree dataTree = new DataTree();
Map<Long,Integer> sessions = new HashMap<Long,Integer>();

fileSnap.deserialize(dataTree, sessions, ia);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RESOURCE_LEAK: resource of type java.io.DataInputStream acquired by call to getArchive(...) at line 72 is not released after line 79.
Note: potential exception at line 79

Reply with "@sonatype-lift help" for info about LiftBot commands.
Reply with "@sonatype-lift ignore" to tell LiftBot to leave out the above finding from this PR.
Reply with "@sonatype-lift ignoreall" to tell LiftBot to leave out all the findings from this PR and from the status bar in Github.

When talking to LiftBot, you need to refresh the page to see its response. Click here to get to know more about LiftBot commands.


Was this a good recommendation?
[ 🙁 Not relevant ] - [ 😕 Won't fix ] - [ 😑 Not critical, will fix ] - [ 🙂 Critical, will fix ] - [ 😊 Critical, fixing now ]

# use POSIX interface, symlink is followed automatically
ZOOBIN="${BASH_SOURCE-$0}"
ZOOBIN="$(dirname "${ZOOBIN}")"
ZOOBINDIR="$(cd "${ZOOBIN}"; pwd)"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SC2164: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.

Reply with "@sonatype-lift help" for info about LiftBot commands.
Reply with "@sonatype-lift ignore" to tell LiftBot to leave out the above finding from this PR.
Reply with "@sonatype-lift ignoreall" to tell LiftBot to leave out all the findings from this PR and from the status bar in Github.

When talking to LiftBot, you need to refresh the page to see its response. Click here to get to know more about LiftBot commands.


Was this a good recommendation?
[ 🙁 Not relevant ] - [ 😕 Won't fix ] - [ 😑 Not critical, will fix ] - [ 🙂 Critical, will fix ] - [ 😊 Critical, fixing now ]

if [ -e "$ZOOBIN/../libexec/zkEnv.sh" ]; then
. "$ZOOBINDIR"/../libexec/zkEnv.sh
else
. "$ZOOBINDIR"/zkEnv.sh
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SC1091: Not following: ./zkEnv.sh was not specified as input (see shellcheck -x).

Reply with "@sonatype-lift help" for info about LiftBot commands.
Reply with "@sonatype-lift ignore" to tell LiftBot to leave out the above finding from this PR.
Reply with "@sonatype-lift ignoreall" to tell LiftBot to leave out all the findings from this PR and from the status bar in Github.

When talking to LiftBot, you need to refresh the page to see its response. Click here to get to know more about LiftBot commands.


Was this a good recommendation?
[ 🙁 Not relevant ] - [ 😕 Won't fix ] - [ 😑 Not critical, will fix ] - [ 🙂 Critical, will fix ] - [ 😊 Critical, fixing now ]

ZOOBINDIR="$(cd "${ZOOBIN}"; pwd)"

if [ -e "$ZOOBIN/../libexec/zkEnv.sh" ]; then
. "$ZOOBINDIR"/../libexec/zkEnv.sh
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SC1091: Not following: ./../libexec/zkEnv.sh was not specified as input (see shellcheck -x).

Reply with "@sonatype-lift help" for info about LiftBot commands.
Reply with "@sonatype-lift ignore" to tell LiftBot to leave out the above finding from this PR.
Reply with "@sonatype-lift ignoreall" to tell LiftBot to leave out all the findings from this PR and from the status bar in Github.

When talking to LiftBot, you need to refresh the page to see its response. Click here to get to know more about LiftBot commands.


Was this a good recommendation?
[ 🙁 Not relevant ] - [ 😕 Won't fix ] - [ 😑 Not critical, will fix ] - [ 🙂 Critical, will fix ] - [ 😊 Critical, fixing now ]

. "$ZOOBINDIR"/zkEnv.sh
fi

"$JAVA" -cp "$CLASSPATH" $JVMFLAGS \
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SC2086: Double quote to prevent globbing and word splitting.

Reply with "@sonatype-lift help" for info about LiftBot commands.
Reply with "@sonatype-lift ignore" to tell LiftBot to leave out the above finding from this PR.
Reply with "@sonatype-lift ignoreall" to tell LiftBot to leave out all the findings from this PR and from the status bar in Github.

When talking to LiftBot, you need to refresh the page to see its response. Click here to get to know more about LiftBot commands.


Was this a good recommendation?
[ 🙁 Not relevant ] - [ 😕 Won't fix ] - [ 😑 Not critical, will fix ] - [ 🙂 Critical, will fix ] - [ 😊 Critical, fixing now ]

Copy link
Contributor

@symat symat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice tool, thanks for sharing!! A few minor comments:

*/
public static void main(String[] args) throws Exception {
if (args.length != 3) {
System.err.println("USAGE: SnapshotSumFormatter snapshot_file starting_node max_depth");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you be a bit more "helpful" :) here? like:

SnapshotSumFormatter is a tool traversing the whole snapshot and summing up the data in the subtrees.

USAGE:   

SnapshotSumFormatter  <snapshot_file>  <starting_node>  <max_depth>

    snapshot_file:    path to the zookeeper snapshot
    starting_node:    the path in the zookeeper tree where the traversal should begin
    max_depth:        the depth where you would like to get the output (data in paths longer than the depth will be still summed up, but won't be printed out)

Or something like this...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just saw your documentation in the code above the class, that might be even better...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@symat Thank you for your feedback!
I'm trying to follow the patterns in SnapshotFormatter and will apply or ignore sonatypes based on that. If that is acceptable for you.
I have extended the Tools documentation with a section on the new tool. Renamed it to SnapshotRecursiveSummar / zkSnapshotRecursiveSummaryToolkit.sh and added an extended usage based on your suggestion.

what about also printing the number of the children ZNodes?

That is already printed. Please check the example output in the jira ticket. Or you would like it presented in some other way?

}

public void run(String snapshotFileName, String startingNode, int maxDepth) throws IOException {
File snapshotFile = new File(snapshotFileName);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PATH_TRAVERSAL_IN: This API (java/io/File.(Ljava/lang/String;)V) reads a file whose location might be specified by user input

Reply with "@sonatype-lift help" for info about LiftBot commands.
Reply with "@sonatype-lift ignore" to tell LiftBot to leave out the above finding from this PR.
Reply with "@sonatype-lift ignoreall" to tell LiftBot to leave out all the findings from this PR and from the status bar in Github.

When talking to LiftBot, you need to refresh the page to see its response. Click here to get to know more about LiftBot commands.


Was this a good recommendation?
[ 🙁 Not relevant ] - [ 😕 Won't fix ] - [ 😑 Not critical, will fix ] - [ 🙂 Critical, will fix ] - [ 😊 Critical, fixing now ]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sonatype-lift ignore

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've recorded this as ignored for this pull request. If you change your mind, just comment @sonatype-lift unignore.

DataTree dataTree = new DataTree();
Map<Long,Integer> sessions = new HashMap<Long,Integer>();

fileSnap.deserialize(dataTree, sessions, ia);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RESOURCE_LEAK: resource of type java.io.DataInputStream acquired by call to getArchive(...) at line 69 is not released after line 76.

Reply with "@sonatype-lift help" for info about LiftBot commands.
Reply with "@sonatype-lift ignore" to tell LiftBot to leave out the above finding from this PR.
Reply with "@sonatype-lift ignoreall" to tell LiftBot to leave out all the findings from this PR and from the status bar in Github.

When talking to LiftBot, you need to refresh the page to see its response. Click here to get to know more about LiftBot commands.


Was this a good recommendation?
[ 🙁 Not relevant ] - [ 😕 Won't fix ] - [ 😑 Not critical, will fix ] - [ 🙂 Critical, will fix ] - [ 😊 Critical, fixing now ]

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't get this warning message: the inputStream is defined in the try() clause which means it will be automatically released at the end of the block. What's wrong?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BinaryInputArchive.getArchive(is) creates a new DataInputStream based on the inputStream. The try() only closes the inputStream, the DataInputStream is technically never closed. I see this as a non-issue and this is how SnapshotFormatter is written too, so it should be fine like this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, because it's a standalone tool. You can ignore that.

# use POSIX interface, symlink is followed automatically
ZOOBIN="${BASH_SOURCE-$0}"
ZOOBIN="$(dirname "${ZOOBIN}")"
ZOOBINDIR="$(cd "${ZOOBIN}"; pwd)"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SC2164: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.

Reply with "@sonatype-lift help" for info about LiftBot commands.
Reply with "@sonatype-lift ignore" to tell LiftBot to leave out the above finding from this PR.
Reply with "@sonatype-lift ignoreall" to tell LiftBot to leave out all the findings from this PR and from the status bar in Github.

When talking to LiftBot, you need to refresh the page to see its response. Click here to get to know more about LiftBot commands.


Was this a good recommendation?
[ 🙁 Not relevant ] - [ 😕 Won't fix ] - [ 😑 Not critical, will fix ] - [ 🙂 Critical, will fix ] - [ 😊 Critical, fixing now ]

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can ignore this.

if [ -e "$ZOOBIN/../libexec/zkEnv.sh" ]; then
. "$ZOOBINDIR"/../libexec/zkEnv.sh
else
. "$ZOOBINDIR"/zkEnv.sh
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SC1091: Not following: ./zkEnv.sh was not specified as input (see shellcheck -x).

Reply with "@sonatype-lift help" for info about LiftBot commands.
Reply with "@sonatype-lift ignore" to tell LiftBot to leave out the above finding from this PR.
Reply with "@sonatype-lift ignoreall" to tell LiftBot to leave out all the findings from this PR and from the status bar in Github.

When talking to LiftBot, you need to refresh the page to see its response. Click here to get to know more about LiftBot commands.


Was this a good recommendation?
[ 🙁 Not relevant ] - [ 😕 Won't fix ] - [ 😑 Not critical, will fix ] - [ 🙂 Critical, will fix ] - [ 😊 Critical, fixing now ]

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here.

ZOOBINDIR="$(cd "${ZOOBIN}"; pwd)"

if [ -e "$ZOOBIN/../libexec/zkEnv.sh" ]; then
. "$ZOOBINDIR"/../libexec/zkEnv.sh
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SC1091: Not following: ./../libexec/zkEnv.sh was not specified as input (see shellcheck -x).

Reply with "@sonatype-lift help" for info about LiftBot commands.
Reply with "@sonatype-lift ignore" to tell LiftBot to leave out the above finding from this PR.
Reply with "@sonatype-lift ignoreall" to tell LiftBot to leave out all the findings from this PR and from the status bar in Github.

When talking to LiftBot, you need to refresh the page to see its response. Click here to get to know more about LiftBot commands.


Was this a good recommendation?
[ 🙁 Not relevant ] - [ 😕 Won't fix ] - [ 😑 Not critical, will fix ] - [ 🙂 Critical, will fix ] - [ 😊 Critical, fixing now ]

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe you could try adding:

# shellcheck source=somefile

Based on: https://github.com/koalaman/shellcheck/wiki/SC1091

Otherwise feel free to ignore this one too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@anmolnar I'm not sure adding shellcheck is helpful here. The libexec folder does not exist at this point so we can not verify if it has the file. I could add # shellcheck source=/dev/null but that basically circumvents the shellcheck and at that point I would prefer not to touch the script at all.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean

# shellcheck source=/bin/zkEnv.sh

Don't waste too much time on this, feel free to ignore the comment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, adding it.

. "$ZOOBINDIR"/zkEnv.sh
fi

"$JAVA" -cp "$CLASSPATH" $JVMFLAGS \
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SC2086: Double quote to prevent globbing and word splitting.

Reply with "@sonatype-lift help" for info about LiftBot commands.
Reply with "@sonatype-lift ignore" to tell LiftBot to leave out the above finding from this PR.
Reply with "@sonatype-lift ignoreall" to tell LiftBot to leave out all the findings from this PR and from the status bar in Github.

When talking to LiftBot, you need to refresh the page to see its response. Click here to get to know more about LiftBot commands.


Was this a good recommendation?
[ 🙁 Not relevant ] - [ 😕 Won't fix ] - [ 😑 Not critical, will fix ] - [ 🙂 Critical, will fix ] - [ 😊 Critical, fixing now ]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sonatype-lift ignoreall

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ignoreall command is active on this PR, all the existing Lift issues are ignored.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not adding double-quotes to $JVMFLAGS instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixing this

Copy link
Contributor

@symat symat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you @BukrosSzabolcs for updating the PR!
Generally it looks great now.

Please fix the spotbugs error reported by the CI, and then I think we are good:

[ERROR] Boxing/unboxing to parse a primitive org.apache.zookeeper.server.SnapshotRecursiveSummary.main(String[]) [org.apache.zookeeper.server.SnapshotRecursiveSummary] At SnapshotRecursiveSummary.java:[line 57] DM_BOXED_PRIMITIVE_FOR_PARSING

(p.s. you can run checkstyle / spotbugs locally: mvn verify spotbugs:check checkstyle:check -DskipTests)

@BukrosSzabolcs
Copy link
Contributor Author

@symat I have fixed the spotbug issues. The test failure looks unrelated. Should we re-run it?

@symat
Copy link
Contributor

symat commented Jul 12, 2022

@symat I have fixed the spotbug issues. The test failure looks unrelated. Should we re-run it?

I don't think we need to re-run. Seems to be a flaky test (and totally unrelated).

@eolivelli , @anmolnar - can you take a look? This is a handy tool Szabolcs created to identify "heavy ZK users" in snapshots.

Copy link
Contributor

@anmolnar anmolnar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm.
Please try to fix sonatype issues.

DataTree dataTree = new DataTree();
Map<Long,Integer> sessions = new HashMap<Long,Integer>();

fileSnap.deserialize(dataTree, sessions, ia);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't get this warning message: the inputStream is defined in the try() clause which means it will be automatically released at the end of the block. What's wrong?

# use POSIX interface, symlink is followed automatically
ZOOBIN="${BASH_SOURCE-$0}"
ZOOBIN="$(dirname "${ZOOBIN}")"
ZOOBINDIR="$(cd "${ZOOBIN}"; pwd)"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can ignore this.

ZOOBINDIR="$(cd "${ZOOBIN}"; pwd)"

if [ -e "$ZOOBIN/../libexec/zkEnv.sh" ]; then
. "$ZOOBINDIR"/../libexec/zkEnv.sh
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe you could try adding:

# shellcheck source=somefile

Based on: https://github.com/koalaman/shellcheck/wiki/SC1091

Otherwise feel free to ignore this one too.

if [ -e "$ZOOBIN/../libexec/zkEnv.sh" ]; then
. "$ZOOBINDIR"/../libexec/zkEnv.sh
else
. "$ZOOBINDIR"/zkEnv.sh
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here.

. "$ZOOBINDIR"/zkEnv.sh
fi

"$JAVA" -cp "$CLASSPATH" $JVMFLAGS \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not adding double-quotes to $JVMFLAGS instead?

@symat
Copy link
Contributor

symat commented Jul 13, 2022

I'm merging it now to master. Thanks @BukrosSzabolcs for the contribution!!

@asfgit asfgit closed this in 05b2159 Jul 13, 2022
anuragmadnawat1 pushed a commit to anuragmadnawat1/zookeeper that referenced this pull request Nov 2, 2022
add a tool to recursively collect and display child count and data
stored in sub-trees

Author: Szabolcs Bukros <szabolcs@cloudera.com>

Reviewers: Andor Molnar <andor@apache.org>, Mate Szalay-Beko <symat@apache.org>

Closes apache#1902 from BukrosSzabolcs/ZOOKEEPER-4566
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants