Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Cloud Bigtable Change Stream integration tests #29127

Merged
merged 8 commits into from
Oct 31, 2023

Conversation

tonytanger
Copy link
Contributor

@tonytanger tonytanger commented Oct 24, 2023

Add a basic set of integration tests for Cloud Bigtable Change Stream.

Add functionality

  • to customize the Bigtable instance name for integration tests.
  • to change the endpoint of Bigtable services under test.

Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

  • Mention the appropriate issue in your description (for example: addresses #123), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, comment fixes #<ISSUE NUMBER> instead.
  • Update CHANGES.md with noteworthy changes.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

See the Contributor Guide for more tips on how to make review process smoother.

To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md

GitHub Actions Tests Status (on master branch)

Build python source distribution and wheels
Python tests
Java tests
Go tests

See CI.md for more information about GitHub Actions CI or the workflows README to see a list of phrases to trigger workflows.

Change-Id: I68a877d5686f1898686b18491c6b4aff5e699862
Change-Id: I490fca7ba2f24b15faa288d6f2c3f209db59f948
… set instanceId

Change-Id: I4d5e5347dbba09a0e1ee170b8aa911d2f0a772ef
…set it explicitly prior to this

Change-Id: I5ee3c663d3120ee85970ae5f24a962b9535323b3
Copy link
Contributor

@Abacn Abacn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding the test. Had a few comment about the overall structure/scope of the change. Not yet reviewed the test (preferably bigtable team could review)

@@ -28,10 +28,4 @@ public interface BigtableTestOptions extends TestPipelineOptions {
String getInstanceId();

void setInstanceId(String value);

@Description("Project for Bigtable")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting that this file has been moved back and forth: 4833b3f
what is the consideration that moving this back to main?

Also, what is the context that removing "setBigtableProject" option? In theory the project that Bigtable instance locates can be different from the project used to run Dataflow.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few reasons.

  1. I'm moving it so I can register it in GcpIoPipelineOptionsRegistrar.java for reasons explained in the comment below.
  2. I'm removing setBigtableProject because of the reason below, we can't actually set bigtableProject to pass to the tests anyways. So I don't think anything is depending on this so it should be safely removable.
  3. By registering bigtableProject, it conflicts with some of the tests we have because they also try to register bigtableProject. So it was either remove here or change it in the tests.
  4. I took a brief look at firestore and spanner, neither has an option to set a different project. So it doesn't seem like it's a common use case.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this goes to main it will affect all users. For example, the name "InstanceId" sounds generic. In the past we received similar reports concerning this: #27256 (comment)

in fact, other gcp component (e.g. Spanner) also has "instanceId" settings so could introduce confusion.

If it stay within Test scope one can still register it using another Registrar class. GcpIoPipelineOptionsRegistrar is just an auto service class. It loads the options at initialization. Besides it, one can register custom pipeline option classes within Bigtable test classes. This reduces the scope to necessary level.

ack for removal of setBigtableProject

@@ -38,6 +39,7 @@ public Iterable<Class<? extends PipelineOptions>> getPipelineOptions() {
.add(PubsubOptions.class)
.add(FirestoreOptions.class)
.add(TestBigQueryOptions.class)
.add(BigtableTestOptions.class)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is the only place that refers to BigtableTestOption class, it's not necessary to move the Option class. We can register the option in the test class.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe there's an alternate solution. I want to be able to pass in "instanceId" as a parameter to the IT. But gcp project defines integrationTest to explicitly take a few parameters and pass them to beam beamTestPipelineOptions. So I added instanceId. But, unless instanceId is always defined (by registering it for all ITs), other tests fail because they don't have instanceId defined but we're passing instanceId.

So this ended up being the solution. Happy to try something else.

Copy link
Contributor

@Abacn Abacn Oct 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In each IT class, one can get a TestPipelineOptions instance like this

private static PipelineOptions testOptions = TestPipeline.testingPipelineOptions();

and then one can use testOptions.as(BigtableTestOptions.class) to get BigtableTestOptions, e.g.

options = TestPipeline.testingPipelineOptions().as(BigQueryIOJsonOptions.class);

No Registrar involved. Current Bigtable ITs is doing similar things and it should suffice.

https://github.com/apache/beam/blob/ba0259de7f9f2d9e06108de20ef57b16ef8e3cc4/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigtable/changestreams/it/BigtableChangeStreamIT.java#L80C5-L80C12

unless instanceId is always defined (by registering it for all ITs), other tests fail because they don't have instanceId defined but we're passing instanceId.

That is because "registering it for all ITs". If we do not register it for all ITs then irrelevant tests won't fail.

@@ -144,4 +146,24 @@ private static Cell createCell(ByteString value, long timestamp, String... label
}
return builder.build();
}

public static BigtableIO.ReadChangeStream buildTestPipelineInput(
Copy link
Contributor

@Abacn Abacn Oct 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if this is the only method that org.apache.beam.sdk.io.gcp.bigtable.changestreams.it refers to, it is not necessary to change the Util to public class. Just spin up the pipeline in place or create BigtableChangeStreamTestUtils in the changestream package.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem, which we (bigtable team) dug ourselves into, is that endTime is a package private field. We don't want to expose endTime as an option because it's generally not well supported. But endTime is useful to terminate tests.

So we have to create the pipeline in the same package as BigtableIO so we actually can't create the pipeline with endTime in the changestreams package.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ack. If there are other options needed then good to promote the scope of this Util class

@github-actions
Copy link
Contributor

Checks are failing. Will not request review until checks are succeeding. If you'd like to override that behavior, comment assign set of reviewers

@tonytanger
Copy link
Contributor Author

@jackdingilian to review the integration tests.

…package

Change-Id: I228dc61ca0b27131cd38a3ab24f136d0f924d9f7
Copy link
Contributor

@jackdingilian jackdingilian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tests LGTM

dataSettingsBuilder.setInstanceId(instanceId);
tableAdminSettingsBuilder.setInstanceId(instanceId);
dataSettingsBuilder.setAppProfileId(appProfileId);
// TODO: Remove this later. But for now, disable direct path.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we still need this? We should get this from the default config now I think, but I don't remember 100% why it was necessary here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We still disable direct path in BigtableChangeStreamAccessor. So it would make sense to keep it disabled when we write/read from bigtable.

@github-actions
Copy link
Contributor

Assigning reviewers. If you would like to opt out of this review, comment assign to next reviewer:

R: @Abacn for label java.
R: @ahmedabu98 for label io.
R: @mutianf for label bigtable.

Available commands:

  • stop reviewer notifications - opt out of the automated review tooling
  • remind me after tests pass - tag the comment author after tests pass
  • waiting on author - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)

The PR bot will only process comments in the main thread (not review comments).

@tonytanger
Copy link
Contributor Author

@Abacn Could you please take a look at the structure again. And if my responses make sense. Happy to make changes. Jack from Cloud Bigtable approved the test content.

Copy link
Contributor

@Abacn Abacn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, commented about the context of why we wanted test options stay in test and possible solutions to keep the TestOption in test scope. Actually FIreStoreOptions is an antipattern (see above) and deemed change

@@ -28,10 +28,4 @@ public interface BigtableTestOptions extends TestPipelineOptions {
String getInstanceId();

void setInstanceId(String value);

@Description("Project for Bigtable")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this goes to main it will affect all users. For example, the name "InstanceId" sounds generic. In the past we received similar reports concerning this: #27256 (comment)

in fact, other gcp component (e.g. Spanner) also has "instanceId" settings so could introduce confusion.

If it stay within Test scope one can still register it using another Registrar class. GcpIoPipelineOptionsRegistrar is just an auto service class. It loads the options at initialization. Besides it, one can register custom pipeline option classes within Bigtable test classes. This reduces the scope to necessary level.

ack for removal of setBigtableProject

@@ -38,6 +39,7 @@ public Iterable<Class<? extends PipelineOptions>> getPipelineOptions() {
.add(PubsubOptions.class)
.add(FirestoreOptions.class)
.add(TestBigQueryOptions.class)
.add(BigtableTestOptions.class)
Copy link
Contributor

@Abacn Abacn Oct 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In each IT class, one can get a TestPipelineOptions instance like this

private static PipelineOptions testOptions = TestPipeline.testingPipelineOptions();

and then one can use testOptions.as(BigtableTestOptions.class) to get BigtableTestOptions, e.g.

options = TestPipeline.testingPipelineOptions().as(BigQueryIOJsonOptions.class);

No Registrar involved. Current Bigtable ITs is doing similar things and it should suffice.

https://github.com/apache/beam/blob/ba0259de7f9f2d9e06108de20ef57b16ef8e3cc4/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigtable/changestreams/it/BigtableChangeStreamIT.java#L80C5-L80C12

unless instanceId is always defined (by registering it for all ITs), other tests fail because they don't have instanceId defined but we're passing instanceId.

That is because "registering it for all ITs". If we do not register it for all ITs then irrelevant tests won't fail.

@@ -144,4 +146,24 @@ private static Cell createCell(ByteString value, long timestamp, String... label
}
return builder.build();
}

public static BigtableIO.ReadChangeStream buildTestPipelineInput(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ack. If there are other options needed then good to promote the scope of this Util class

systemProperty "beamTestPipelineOptions", JsonOutput.toJson([
"--runner=DirectRunner",
"--project=${gcpProject}",
"--tempRoot=${gcpTempRoot}",
"--firestoreDb=${firestoreDb}",
"--host=${host}",
"--instanceId=${instanceId}",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Abacn Want to bring your attention here. I want to be able to run something like this

./gradlew :sdks:java:io:google-cloud-platform:integrationTest --tests=org.apache.beam.sdk.io.gcp.bigtable.changestreams.it.BigtableChangeStreamIT -PgcpProject=my-test-project -PinstanceId=my-test-instance

BigtableTestOptions has a field instanceId but in order to pass a specific string that's not the default string, the only that I found to do it is by adding it here.

This works fine for ChangeStreamIT because it registers BigtableTestOptions which defines instanceId, so we can pass --instanceId to beamTestPipelineOptions. However, all other integration tests that do not register BigtableTestOptions will fail because we are passing --instanceId to beamTestPipelineOptions but they don't have an option that reigsters instanceId.

That's why I moved the BigtableTestOptions to the main directory and registered it by default.

Is there an alternative way to pass instanceId to my test that does not require explicitly passing instanceId here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This works fine for ChangeStreamIT because it registers BigtableTestOptions which defines instanceId, so we can pass --instanceId to beamTestPipelineOptions. However, all other integration tests that do not register BigtableTestOptions will fail because we are passing --instanceId to beamTestPipelineOptions but they don't have an option that reigsters instanceId.

Thanks, understand. For irrelevant test this was due to --instanceId pipeline option not recognized, as you stated. This happens because currently all gcp integration tests are triggered by a single gradle command. Restructure test is a more substantial change. For now, we can keep it in main if this is the only approach to make it work, but consider to make the option specialized, like

  • use "--bigtableInstanceId" instanceId is too generic. The options in main scope will affect all users including customers

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Done.

…ecifying

Change-Id: Ic66c4c061ed2f5979f6a530905e3cbbddd238f18
…es to avoid conflicts

Change-Id: I489850e07812058e8c7ebb3c9878eae9d4bc9f06
Copy link
Contributor

@Abacn Abacn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@Abacn Abacn merged commit 9fcd0c9 into apache:master Oct 31, 2023
22 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants