Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BEAM-11260][BEAM-11261] Remove inappropriate assumptions about repo from linkage check script #13342

Merged
merged 1 commit into from Dec 3, 2020

Conversation

kennknowles
Copy link
Member

This updates the beam linkage check script to be simpler to use and also removes incorrect assumptions. Now you just pass the refs you want to compare and that is that. A few minor fixes, too.


Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

  • Choose reviewer(s) and mention them in a comment (R: @username).
  • Format the pull request title like [BEAM-XXX] Fixes bug in ApproximateQuantiles, where you replace BEAM-XXX with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
  • Update CHANGES.md with noteworthy changes.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

See the Contributor Guide for more tips on how to make review process smoother.

Post-Commit Tests Status (on master branch)

Lang SDK Dataflow Flink Samza Spark Twister2
Go Build Status --- Build Status --- Build Status ---
Java Build Status Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
Build Status Build Status
Build Status
Build Status
Build Status
Python Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
--- Build Status ---
XLang Build Status Build Status Build Status --- Build Status ---

Pre-Commit Tests Status (on master branch)

--- Java Python Go Website Whitespace Typescript
Non-portable Build Status Build Status
Build Status
Build Status
Build Status
Build Status Build Status Build Status Build Status
Portable --- Build Status --- --- --- ---

See .test-infra/jenkins/README for trigger phrase, status and link of all Jenkins jobs.

GitHub Actions Tests Status (on master branch)

Build python source distribution and wheels
Python tests
Java tests

See CI.md for more information about GitHub Actions CI.


if [ ! -z "$1" ]; then
ARTIFACTS=$1
DEFAULT_ARTIFACT_LISTS=" \
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed that we loop over this. But the gradle property uses a comma-separated list of artifacts. Is this just a mistake or is the idea to have a bunch of separate lists that are checked separately?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comman-separated list: Linkage Checker analyzes a dependency graph generated from the list of artifact, as if a project depends on the list of artifacts. For example, running Linkage Checker for "io.grpc:grpc-context:1.23.0,com.google.guava:guava:28.0-jre", checks a dependency graph of a pseudo project that depends on the two artifacts.

Space-separated list: different invocation of Linkage Checker.

I will improve documentation / comments regarding this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I understand what the comma-separated list and space-separated list do. I just don't understand why we use both. I trust that there is a good reason, but I do not know it. Like if you had two subsets of artifacts that are not expected to work together?

I would expect that we do want to run all the GCP modules together with the core SDK and, separately, run all the hadoop modules together with the core SDK, etc. Or maybe we just run them all together.

So my proposal would be to drop the space-separated and only keep the comma-separated logic. But like I said, I trust there is a reason. So I am just making this proposal so you can tell me why it does not work.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just don't understand why we use both.

In general, running Linkage Check with multiple artifact may hide linkage errors that exist in dependencies of the one of the individual artifacts. (This is the reason to use space-separated list.)

On the other hand, running Linkage Checker with multiple artifacts is quicker than running Linkage Checker for individual artifacts one by one.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving this PR. I see this PR is not dropping the space-separated list.

@kennknowles
Copy link
Member Author

R: @suztomo

Especially curious if you see the question about having a bash for loop vs doing the linkage check on a list of artifacts. Do we need both?

if [ ! -z "$(git diff)" ]; then
echo "Uncommited change detected. Please commit changes and ensure 'git diff' is empty."
exit 1
fi

STARTING_REF=$(git rev-parse --abbrev-ref HEAD)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is so it can be run from a different branch than either of the changes. For example I used this to test this branch. More generally, this is helpful to test any changes to the script. We can run them against known other refs with changes.

fi

if [ "$BRANCH_NAME" = "master" ]; then
echo "Please run this script on a branch other than master"
if [ "$BASELINE_REF" = "$PROPOSED_REF" ]; then
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess these should actually get rev-parse to check that they are different hashes.

echo "Please run this script in the Beam project root:"
echo " /bin/bash sdks/java/build-tools/beam-linkage-check.sh"
exit 1
echo "Directory 'buildSrc' not found. Please run this script from the root directory of a clone of the Beam git repo."
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could also possibly do a smarter check using git status or ./gradlew model


BASELINE_FILE=${OUTPUT_DIR}/baseline-${ARTIFACT}.xml
for ARTIFACT_LIST in $ARTIFACT_LISTS; do
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have a space-separated list of comma-separated lists seems weird. I kept it, but maybe we can simplify?

if [ "$MODE" = "validate" ]; then
echo "`date`:" "Done: ${RESULT}"
ACCUMULATED_RESULT=$((ACCUMULATED_RESULT | RESULT))
fi
done
}

echo "Establishing baseline linkage for $(git rev-parse --abbrev-ref $BASELINE_REF)"
git -c advice.detachedHead=false checkout $BASELINE_REF
runLinkageCheck baseline
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only nontrivial shared code between runLinkageCheck baseline and runLinkageCheck validate is the loop over the artifact lists. So if we didn't need the loop, we may as well just inline the functions.

for ARTIFACT_LIST in $ARTIFACT_LISTS; do
echo "`date`:" "Running linkage check (${MODE}) for ${ARTIFACT_LISTS}"

BASELINE_FILE=${OUTPUT_DIR}/baseline-${ARTIFACT_LIST}.xml
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would be cleaner if the checkLinkage task just put something in build/reports/linkageChecker and this script copied it out to a temp dir for the purpose of comparison.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you elaborate on the benefit of using a temp dir, rather than the build/linkagecheck directory?

Current:

  • checkJavaLinkage with baseline mode writes the baseline XML to build/linkagecheck.
  • checkJavaLinkage with validate mode reads the baseline XML to filter existing errors.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a stylistic "nit" really. My expectation of gradle commands is that they operate on the current source tree. There are exceptions, sure. But my thinking was:

  • :checkJavaLinkage always writes to build/linkagecheck and the report is the linkage report for the current state of the worktree.
  • beam-linkage-check.sh operates at level above that. It produces the above file for two separate states of the source tree and compares them.

It is not that important. Just recording my thoughts so we can discuss the script a bit.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for explanation. The setup came from my preference of handling XML files in Java rather than in shell scripts. (The operation is simple. Linkage Checker is just calculating the difference of two Sets.)


if [ ! -z "$1" ]; then
ARTIFACTS=$1
DEFAULT_ARTIFACT_LISTS=" \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comman-separated list: Linkage Checker analyzes a dependency graph generated from the list of artifact, as if a project depends on the list of artifacts. For example, running Linkage Checker for "io.grpc:grpc-context:1.23.0,com.google.guava:guava:28.0-jre", checks a dependency graph of a pseudo project that depends on the two artifacts.

Space-separated list: different invocation of Linkage Checker.

I will improve documentation / comments regarding this.

fi

if [ "$BRANCH_NAME" = "master" ]; then
echo "Please run this script on a branch other than master"
if [ "$BASELINE_REF" = "$PROPOSED_REF" ]; then
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice enhancement.

Comment on lines +74 to +77
function cleanup() {
git checkout $STARTING_REF
}
trap cleanup EXIT
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

@iemejia
Copy link
Member

iemejia commented Nov 18, 2020

It seems @suztomo is the perfect reviewer here and you are already advancing on this so I -me from the review. The approval of Tomo is enough for this to be merged.

@iemejia iemejia removed their request for review November 18, 2020 09:58

if [ ! -z "$1" ]; then
ARTIFACTS=$1
DEFAULT_ARTIFACT_LISTS=" \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving this PR. I see this PR is not dropping the space-separated list.

@kennknowles
Copy link
Member Author

Thanks! Thanks for answering my questions, too!

@kennknowles kennknowles merged commit d70dd01 into apache:master Dec 3, 2020
@kennknowles kennknowles deleted the linkage-check-fixes branch December 3, 2020 14:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants