-
Notifications
You must be signed in to change notification settings - Fork 4.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Cloud Bigtable Change Stream integration tests #29127
Changes from 5 commits
cf6e2c7
f7f4161
1196ab0
b092b37
ba0259d
38305e6
1d186d3
19802eb
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -28,10 +28,4 @@ public interface BigtableTestOptions extends TestPipelineOptions { | |
String getInstanceId(); | ||
|
||
void setInstanceId(String value); | ||
|
||
@Description("Project for Bigtable") | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Interesting that this file has been moved back and forth: 4833b3f Also, what is the context that removing "setBigtableProject" option? In theory the project that Bigtable instance locates can be different from the project used to run Dataflow. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. A few reasons.
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If this goes to main it will affect all users. For example, the name "InstanceId" sounds generic. In the past we received similar reports concerning this: #27256 (comment) in fact, other gcp component (e.g. Spanner) also has "instanceId" settings so could introduce confusion. If it stay within Test scope one can still register it using another Registrar class. GcpIoPipelineOptionsRegistrar is just an auto service class. It loads the options at initialization. Besides it, one can register custom pipeline option classes within Bigtable test classes. This reduces the scope to necessary level. ack for removal of |
||
@Default.String("") | ||
String getBigtableProject(); | ||
|
||
void setBigtableProject(String value); | ||
} |
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
|
@@ -21,6 +21,7 @@ | |||||
import org.apache.beam.sdk.annotations.Internal; | ||||||
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryOptions; | ||||||
import org.apache.beam.sdk.io.gcp.bigquery.TestBigQueryOptions; | ||||||
import org.apache.beam.sdk.io.gcp.bigtable.BigtableTestOptions; | ||||||
import org.apache.beam.sdk.io.gcp.firestore.FirestoreOptions; | ||||||
import org.apache.beam.sdk.io.gcp.pubsub.PubsubOptions; | ||||||
import org.apache.beam.sdk.options.PipelineOptions; | ||||||
|
@@ -38,6 +39,7 @@ public Iterable<Class<? extends PipelineOptions>> getPipelineOptions() { | |||||
.add(PubsubOptions.class) | ||||||
.add(FirestoreOptions.class) | ||||||
.add(TestBigQueryOptions.class) | ||||||
.add(BigtableTestOptions.class) | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If this is the only place that refers to BigtableTestOption class, it's not necessary to move the Option class. We can register the option in the test class. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Maybe there's an alternate solution. I want to be able to pass in "instanceId" as a parameter to the IT. But gcp project defines So this ended up being the solution. Happy to try something else. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. In each IT class, one can get a TestPipelineOptions instance like this Line 69 in 8a28100
and then one can use testOptions.as(BigtableTestOptions.class) to get BigtableTestOptions, e.g. Line 372 in 8a28100
No Registrar involved. Current Bigtable ITs is doing similar things and it should suffice.
That is because "registering it for all ITs". If we do not register it for all ITs then irrelevant tests won't fail. |
||||||
.build(); | ||||||
} | ||||||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -31,12 +31,14 @@ | |
import com.google.bigtable.v2.Mutation; | ||
import com.google.protobuf.ByteString; | ||
import java.util.List; | ||
import org.apache.beam.sdk.io.gcp.bigtable.changestreams.dao.BigtableClientOverride; | ||
import org.apache.beam.sdk.schemas.Schema; | ||
import org.apache.beam.sdk.values.KV; | ||
import org.apache.beam.vendor.guava.v32_1_2_jre.com.google.common.collect.ImmutableList; | ||
import org.apache.beam.vendor.guava.v32_1_2_jre.com.google.common.primitives.Longs; | ||
import org.joda.time.Instant; | ||
|
||
class BigtableTestUtils { | ||
public class BigtableTestUtils { | ||
|
||
static final String BOOL_COLUMN = "boolColumn"; | ||
static final String LONG_COLUMN = "longColumn"; | ||
|
@@ -144,4 +146,27 @@ private static Cell createCell(ByteString value, long timestamp, String... label | |
} | ||
return builder.build(); | ||
} | ||
|
||
// We have to build the pipeline at this package level and not changestreams package because | ||
// endTime is package private and we can only create a pipeline with endTime here. Setting endTime | ||
// allows the tests to predictably terminate. | ||
public static BigtableIO.ReadChangeStream buildTestPipelineInput( | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. if this is the only method that org.apache.beam.sdk.io.gcp.bigtable.changestreams.it refers to, it is not necessary to change the Util to public class. Just spin up the pipeline in place or create BigtableChangeStreamTestUtils in the changestream package. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The problem, which we (bigtable team) dug ourselves into, is that So we have to create the pipeline in the same package as There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. ack. If there are other options needed then good to promote the scope of this Util class |
||
String projectId, | ||
String instanceId, | ||
String tableId, | ||
String appProfileId, | ||
String metadataTableName, | ||
Instant startTime, | ||
Instant endTime, | ||
BigtableClientOverride clientOverride) { | ||
return BigtableIO.readChangeStream() | ||
.withProjectId(projectId) | ||
.withInstanceId(instanceId) | ||
.withTableId(tableId) | ||
.withAppProfileId(appProfileId) | ||
.withMetadataTableTableId(metadataTableName) | ||
.withStartTime(startTime) | ||
.withEndTime(endTime) | ||
.withBigtableClientOverride(clientOverride); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Abacn Want to bring your attention here. I want to be able to run something like this
BigtableTestOptions
has a fieldinstanceId
but in order to pass a specific string that's not the default string, the only that I found to do it is by adding it here.This works fine for ChangeStreamIT because it registers
BigtableTestOptions
which definesinstanceId
, so we can pass--instanceId
tobeamTestPipelineOptions
. However, all other integration tests that do not registerBigtableTestOptions
will fail because we are passing--instanceId
tobeamTestPipelineOptions
but they don't have an option that reigstersinstanceId
.That's why I moved the BigtableTestOptions to the main directory and registered it by default.
Is there an alternative way to pass
instanceId
to my test that does not require explicitly passinginstanceId
here?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, understand. For irrelevant test this was due to
--instanceId
pipeline option not recognized, as you stated. This happens because currently all gcp integration tests are triggered by a single gradle command. Restructure test is a more substantial change. For now, we can keep it in main if this is the only approach to make it work, but consider to make the option specialized, likeThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Done.