New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BEAM-11260][BEAM-11261] Remove inappropriate assumptions about repo from linkage check script #13342
Conversation
|
||
if [ ! -z "$1" ]; then | ||
ARTIFACTS=$1 | ||
DEFAULT_ARTIFACT_LISTS=" \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I noticed that we loop over this. But the gradle property uses a comma-separated list of artifacts. Is this just a mistake or is the idea to have a bunch of separate lists that are checked separately?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comman-separated list: Linkage Checker analyzes a dependency graph generated from the list of artifact, as if a project depends on the list of artifacts. For example, running Linkage Checker for "io.grpc:grpc-context:1.23.0,com.google.guava:guava:28.0-jre", checks a dependency graph of a pseudo project that depends on the two artifacts.
Space-separated list: different invocation of Linkage Checker.
I will improve documentation / comments regarding this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I understand what the comma-separated list and space-separated list do. I just don't understand why we use both. I trust that there is a good reason, but I do not know it. Like if you had two subsets of artifacts that are not expected to work together?
I would expect that we do want to run all the GCP modules together with the core SDK and, separately, run all the hadoop modules together with the core SDK, etc. Or maybe we just run them all together.
So my proposal would be to drop the space-separated and only keep the comma-separated logic. But like I said, I trust there is a reason. So I am just making this proposal so you can tell me why it does not work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just don't understand why we use both.
In general, running Linkage Check with multiple artifact may hide linkage errors that exist in dependencies of the one of the individual artifacts. (This is the reason to use space-separated list.)
On the other hand, running Linkage Checker with multiple artifacts is quicker than running Linkage Checker for individual artifacts one by one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approving this PR. I see this PR is not dropping the space-separated list.
R: @suztomo Especially curious if you see the question about having a bash for loop vs doing the linkage check on a list of artifacts. Do we need both? |
if [ ! -z "$(git diff)" ]; then | ||
echo "Uncommited change detected. Please commit changes and ensure 'git diff' is empty." | ||
exit 1 | ||
fi | ||
|
||
STARTING_REF=$(git rev-parse --abbrev-ref HEAD) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is so it can be run from a different branch than either of the changes. For example I used this to test this branch. More generally, this is helpful to test any changes to the script. We can run them against known other refs with changes.
fi | ||
|
||
if [ "$BRANCH_NAME" = "master" ]; then | ||
echo "Please run this script on a branch other than master" | ||
if [ "$BASELINE_REF" = "$PROPOSED_REF" ]; then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess these should actually get rev-parse
to check that they are different hashes.
echo "Please run this script in the Beam project root:" | ||
echo " /bin/bash sdks/java/build-tools/beam-linkage-check.sh" | ||
exit 1 | ||
echo "Directory 'buildSrc' not found. Please run this script from the root directory of a clone of the Beam git repo." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could also possibly do a smarter check using git status
or ./gradlew model
|
||
BASELINE_FILE=${OUTPUT_DIR}/baseline-${ARTIFACT}.xml | ||
for ARTIFACT_LIST in $ARTIFACT_LISTS; do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have a space-separated list of comma-separated lists seems weird. I kept it, but maybe we can simplify?
if [ "$MODE" = "validate" ]; then | ||
echo "`date`:" "Done: ${RESULT}" | ||
ACCUMULATED_RESULT=$((ACCUMULATED_RESULT | RESULT)) | ||
fi | ||
done | ||
} | ||
|
||
echo "Establishing baseline linkage for $(git rev-parse --abbrev-ref $BASELINE_REF)" | ||
git -c advice.detachedHead=false checkout $BASELINE_REF | ||
runLinkageCheck baseline |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only nontrivial shared code between runLinkageCheck baseline
and runLinkageCheck validate
is the loop over the artifact lists. So if we didn't need the loop, we may as well just inline the functions.
for ARTIFACT_LIST in $ARTIFACT_LISTS; do | ||
echo "`date`:" "Running linkage check (${MODE}) for ${ARTIFACT_LISTS}" | ||
|
||
BASELINE_FILE=${OUTPUT_DIR}/baseline-${ARTIFACT_LIST}.xml |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would be cleaner if the checkLinkage
task just put something in build/reports/linkageChecker
and this script copied it out to a temp dir for the purpose of comparison.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you elaborate on the benefit of using a temp dir, rather than the build/linkagecheck
directory?
Current:
- checkJavaLinkage with baseline mode writes the baseline XML to
build/linkagecheck
. - checkJavaLinkage with validate mode reads the baseline XML to filter existing errors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is a stylistic "nit" really. My expectation of gradle commands is that they operate on the current source tree. There are exceptions, sure. But my thinking was:
:checkJavaLinkage
always writes tobuild/linkagecheck
and the report is the linkage report for the current state of the worktree.beam-linkage-check.sh
operates at level above that. It produces the above file for two separate states of the source tree and compares them.
It is not that important. Just recording my thoughts so we can discuss the script a bit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for explanation. The setup came from my preference of handling XML files in Java rather than in shell scripts. (The operation is simple. Linkage Checker is just calculating the difference of two Set
s.)
|
||
if [ ! -z "$1" ]; then | ||
ARTIFACTS=$1 | ||
DEFAULT_ARTIFACT_LISTS=" \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comman-separated list: Linkage Checker analyzes a dependency graph generated from the list of artifact, as if a project depends on the list of artifacts. For example, running Linkage Checker for "io.grpc:grpc-context:1.23.0,com.google.guava:guava:28.0-jre", checks a dependency graph of a pseudo project that depends on the two artifacts.
Space-separated list: different invocation of Linkage Checker.
I will improve documentation / comments regarding this.
fi | ||
|
||
if [ "$BRANCH_NAME" = "master" ]; then | ||
echo "Please run this script on a branch other than master" | ||
if [ "$BASELINE_REF" = "$PROPOSED_REF" ]; then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice enhancement.
function cleanup() { | ||
git checkout $STARTING_REF | ||
} | ||
trap cleanup EXIT |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
It seems @suztomo is the perfect reviewer here and you are already advancing on this so I -me from the review. The approval of Tomo is enough for this to be merged. |
|
||
if [ ! -z "$1" ]; then | ||
ARTIFACTS=$1 | ||
DEFAULT_ARTIFACT_LISTS=" \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approving this PR. I see this PR is not dropping the space-separated list.
Thanks! Thanks for answering my questions, too! |
This updates the beam linkage check script to be simpler to use and also removes incorrect assumptions. Now you just pass the refs you want to compare and that is that. A few minor fixes, too.
Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
R: @username
).[BEAM-XXX] Fixes bug in ApproximateQuantiles
, where you replaceBEAM-XXX
with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.CHANGES.md
with noteworthy changes.See the Contributor Guide for more tips on how to make review process smoother.
Post-Commit Tests Status (on master branch)
Pre-Commit Tests Status (on master branch)
See .test-infra/jenkins/README for trigger phrase, status and link of all Jenkins jobs.
GitHub Actions Tests Status (on master branch)
See CI.md for more information about GitHub Actions CI.