Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BEAM-11260][BEAM-11261] Remove inappropriate assumptions about repo from linkage check script #13342

Merged
merged 1 commit into from Dec 3, 2020
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
80 changes: 45 additions & 35 deletions sdks/java/build-tools/beam-linkage-check.sh
Expand Up @@ -36,46 +36,60 @@ set -o pipefail
set -e

# These default artifacts are common causes of linkage errors.
ARTIFACTS="beam-sdks-java-core
beam-sdks-java-io-google-cloud-platform
beam-runners-google-cloud-dataflow-java
beam-sdks-java-io-hadoop-format"

if [ ! -z "$1" ]; then
ARTIFACTS=$1
DEFAULT_ARTIFACT_LISTS=" \
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed that we loop over this. But the gradle property uses a comma-separated list of artifacts. Is this just a mistake or is the idea to have a bunch of separate lists that are checked separately?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comman-separated list: Linkage Checker analyzes a dependency graph generated from the list of artifact, as if a project depends on the list of artifacts. For example, running Linkage Checker for "io.grpc:grpc-context:1.23.0,com.google.guava:guava:28.0-jre", checks a dependency graph of a pseudo project that depends on the two artifacts.

Space-separated list: different invocation of Linkage Checker.

I will improve documentation / comments regarding this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I understand what the comma-separated list and space-separated list do. I just don't understand why we use both. I trust that there is a good reason, but I do not know it. Like if you had two subsets of artifacts that are not expected to work together?

I would expect that we do want to run all the GCP modules together with the core SDK and, separately, run all the hadoop modules together with the core SDK, etc. Or maybe we just run them all together.

So my proposal would be to drop the space-separated and only keep the comma-separated logic. But like I said, I trust there is a reason. So I am just making this proposal so you can tell me why it does not work.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just don't understand why we use both.

In general, running Linkage Check with multiple artifact may hide linkage errors that exist in dependencies of the one of the individual artifacts. (This is the reason to use space-separated list.)

On the other hand, running Linkage Checker with multiple artifacts is quicker than running Linkage Checker for individual artifacts one by one.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving this PR. I see this PR is not dropping the space-separated list.

beam-sdks-java-core \
beam-sdks-java-io-google-cloud-platform \
beam-runners-google-cloud-dataflow-java \
beam-sdks-java-io-hadoop-format \
"

BASELINE_REF=$1
PROPOSED_REF=$2
ARTIFACT_LISTS=$3

if [ -z "$ARTIFACT_LISTS" ]; then
ARTIFACT_LISTS=$DEFAULT_ARTIFACT_LISTS
fi

BRANCH_NAME=$(git symbolic-ref --short HEAD)
if [ -z "$BASELINE_REF" ] || [ -z "$PROPOSED_REF" ] || [ -z "$ARTIFACT_LISTS" ] ; then
echo "Usage: $0 <baseline ref> <proposed ref> [artifact lists]"
exit 1
fi

if [ ! -d buildSrc ]; then
echo "Please run this script in the Beam project root:"
echo " /bin/bash sdks/java/build-tools/beam-linkage-check.sh"
exit 1
echo "Directory 'buildSrc' not found. Please run this script from the root directory of a clone of the Beam git repo."
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could also possibly do a smarter check using git status or ./gradlew model

fi

if [ "$BRANCH_NAME" = "master" ]; then
echo "Please run this script on a branch other than master"
if [ "$BASELINE_REF" = "$PROPOSED_REF" ]; then
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess these should actually get rev-parse to check that they are different hashes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice enhancement.

echo "Baseline and proposed refs are identical; cannot compare their linkage errors!"
exit 1
fi

OUTPUT_DIR=build/linkagecheck
mkdir -p $OUTPUT_DIR

if [ ! -z "$(git diff)" ]; then
echo "Uncommited change detected. Please commit changes and ensure 'git diff' is empty."
exit 1
fi

STARTING_REF=$(git rev-parse --abbrev-ref HEAD)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is so it can be run from a different branch than either of the changes. For example I used this to test this branch. More generally, this is helpful to test any changes to the script. We can run them against known other refs with changes.

function cleanup() {
git checkout $STARTING_REF
}
trap cleanup EXIT
Comment on lines +74 to +77
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!


echo "Comparing linkage of artifact lists $ARTIFACT_LISTS using baseline $BASELINE_REF and proposal $PROPOSED_REF"

OUTPUT_DIR=build/linkagecheck
mkdir -p $OUTPUT_DIR

ACCUMULATED_RESULT=0

function runLinkageCheck () {
COMMIT=$1
BRANCH=$2
MODE=$3 # "baseline" or "validate"
for ARTIFACT in $ARTIFACTS; do
echo "`date`:" "Running linkage check (${MODE}) for ${ARTIFACT} in ${BRANCH}"
MODE=$1 # "baseline" or "validate"

BASELINE_FILE=${OUTPUT_DIR}/baseline-${ARTIFACT}.xml
for ARTIFACT_LIST in $ARTIFACT_LISTS; do
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have a space-separated list of comma-separated lists seems weird. I kept it, but maybe we can simplify?

echo "`date`:" "Running linkage check (${MODE}) for ${ARTIFACT_LISTS}"

BASELINE_FILE=${OUTPUT_DIR}/baseline-${ARTIFACT_LIST}.xml
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would be cleaner if the checkLinkage task just put something in build/reports/linkageChecker and this script copied it out to a temp dir for the purpose of comparison.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you elaborate on the benefit of using a temp dir, rather than the build/linkagecheck directory?

Current:

  • checkJavaLinkage with baseline mode writes the baseline XML to build/linkagecheck.
  • checkJavaLinkage with validate mode reads the baseline XML to filter existing errors.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a stylistic "nit" really. My expectation of gradle commands is that they operate on the current source tree. There are exceptions, sure. But my thinking was:

  • :checkJavaLinkage always writes to build/linkagecheck and the report is the linkage report for the current state of the worktree.
  • beam-linkage-check.sh operates at level above that. It produces the above file for two separate states of the source tree and compares them.

It is not that important. Just recording my thoughts so we can discuss the script a bit.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for explanation. The setup came from my preference of handling XML files in Java rather than in shell scripts. (The operation is simple. Linkage Checker is just calculating the difference of two Sets.)

if [ "$MODE" = "baseline" ]; then
BASELINE_OPTION="-PjavaLinkageWriteBaseline=${BASELINE_FILE}"
echo "`date`:" "to create a baseline (existing errors before change) $BASELINE_FILE"
Expand All @@ -88,29 +102,25 @@ function runLinkageCheck () {
fi

set +e
./gradlew -Ppublishing -PjavaLinkageArtifactIds=$ARTIFACT ${BASELINE_OPTION} :checkJavaLinkage
set -x
./gradlew -Ppublishing -PskipCheckerFramework -PjavaLinkageArtifactIds=$ARTIFACT_LIST ${BASELINE_OPTION} :checkJavaLinkage
RESULT=$?
set -e
set +x
if [ "$MODE" = "validate" ]; then
echo "`date`:" "Done: ${RESULT}"
ACCUMULATED_RESULT=$((ACCUMULATED_RESULT | RESULT))
fi
done
}

echo "Establishing baseline linkage for $(git rev-parse --abbrev-ref $BASELINE_REF)"
git -c advice.detachedHead=false checkout $BASELINE_REF
runLinkageCheck baseline
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only nontrivial shared code between runLinkageCheck baseline and runLinkageCheck validate is the loop over the artifact lists. So if we didn't need the loop, we may as well just inline the functions.


BRANCH_NAME=`git rev-parse --abbrev-ref HEAD`
BRANCH_COMMIT=`git rev-parse --short=8 HEAD`

git fetch
MASTER_COMMIT=`git rev-parse --short=8 origin/master`
git -c advice.detachedHead=false checkout $MASTER_COMMIT
runLinkageCheck $MASTER_COMMIT master baseline


# Restore original branch
git checkout $BRANCH_NAME
runLinkageCheck $BRANCH_COMMIT $BRANCH_NAME validate
echo "Checking linkage for $(git rev-parse --abbrev-ref $PROPOSED_REF)"
git -c advice.detachedHEad=false checkout $PROPOSED_REF
runLinkageCheck validate

if [ "${ACCUMULATED_RESULT}" = "0" ]; then
echo "No new linkage errors"
Expand Down