
[FLINK-3229] Flink streaming consumer for AWS Kinesis #1911

Closed
wants to merge 30 commits

Conversation

tzulitai
Contributor

I've been using this consumer for a while in non-production environments.
I understand we should have good test coverage for each PR, but since Kinesis is a hosted service, reliable integration tests are hard to pull off. To speed up merging the Kinesis connector for the next release, I'm submitting the consumer now for some early reviews.
On the other hand, since @rmetzger is submitting a separate PR for the Kinesis producer, I'd like to postpone writing more tests for the consumer, as well as the corresponding documentation changes, until both the consumer and producer are in place.

@rmetzger
Contributor

Thank you for opening a pull request for the consumer.
How about we proceed like this:

  • I'm trying to get the producer code merged within the next 24 hours (feel free to test it a bit if you want)
  • In the meantime, I'm testing and reviewing your code
  • Once the producer has been merged, we integrate the consumer code into the maven module / code structure from my producer code.
  • I'll review the consumer again and we merge it ;)

@tzulitai
Contributor Author

tzulitai commented Apr 19, 2016

@rmetzger Sure, that seems reasonable. I'll wait until the producer is merged and resubmit a new PR for the integrated consumer :)

@rmetzger
Contributor

Cool. You don't need to resubmit a new PR. By pushing new commits to your FLINK-3229 branch, the pull request will update automatically.


this.regionId = configProps.getProperty(KinesisConfigConstants.CONFIG_AWS_REGION, KinesisConfigConstants.DEFAULT_AWS_REGION);
AmazonKinesisClient client = new AmazonKinesisClient(AWSUtil.getCredentialsProvider(configProps).getCredentials());
client.setRegion(Region.getRegion(Regions.fromName(this.regionId)));
Contributor

@rmetzger rmetzger Apr 19, 2016

I had to set the endpoint here as well to make it work.
Which AWS region were you using? (Maybe there's something like a default endpoint that works for one region.)

Contributor Author

I'm using the "ap-northeast-1" region, which isn't the default.
Setting the region on the AmazonKinesisClient should set the endpoint too, no?

Contributor

I thought so too, but it didn't work for me.
I'll investigate the issue further ...

Contributor

I found out what I was doing wrong. The code was using the default region ID because I forgot to set it.
I'm currently fixing some issues in the consumer and I'll make the region a required argument.
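The fix described above (making the region a required setting) could be sketched as follows. The property-key handling mirrors the `KinesisConfigConstants` style used in this PR, but the helper itself and its names are hypothetical:

```java
import java.util.Properties;

class RegionValidation {
    // Hypothetical helper: fail fast if the user did not set the AWS region,
    // instead of silently falling back to a default region/endpoint.
    static String requireRegion(Properties configProps, String regionKey) {
        String regionId = configProps.getProperty(regionKey);
        if (regionId == null || regionId.trim().isEmpty()) {
            throw new IllegalArgumentException(
                "The AWS region ('" + regionKey + "') must be set in the config.");
        }
        return regionId;
    }
}
```

With this in place, forgetting the region surfaces as an immediate configuration error rather than a confusing connection failure against the wrong endpoint.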

@rmetzger
Contributor

Quick update on our plan: I've merged the Kinesis producer. If you want, you can rebase this pull request on the current master.

@tzulitai
Contributor Author

tzulitai commented Apr 20, 2016

@rmetzger Hi Robert,
I'm rebasing my PR, but I could not find the merged Kinesis producer / maven module in the current apache/flink master. Please correct me if I'm missing anything. Thanks :)

@zentol
Contributor

zentol commented Apr 20, 2016

@tzulitai The producer wasn't merged yet.

@tzulitai
Contributor Author

ah ok :) see it now, thanks.

@rmetzger
Contributor

The problem was that the GitHub mirror needed some time to sync with the commit. But now it's there.

@tzulitai
Contributor Author

Quick update:

  1. Rebased and integrated the consumer code into the maven module that came with the producer merge.
  2. Appended documentation for the consumer.
  3. Moved the producer's KinesisSerializationSchema into org.apache.flink.streaming.connectors.kinesis.serialization package, where I originally placed deserialization related classes for the consumer.

@rmetzger
Contributor

Great, thank you. I'll review the PR soon.

@@ -45,6 +45,7 @@ under the License.
<module>flink-connector-rabbitmq</module>
<module>flink-connector-twitter</module>
<module>flink-connector-nifi</module>
<module>flink-connector-kinesis</module>
Contributor

I think we have to remove this line again. The module is included in the profile below (you have to activate the "include-kinesis" maven build profile)

Contributor Author

@tzulitai tzulitai Apr 21, 2016

Thanks, I missed the "include-kinesis" profile defined below. We'll probably need a more general profile name in the future though (e.g. include-aws-connectors), for when we start including more Amazon-licensed libraries for other connectors such as DynamoDB.
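For context, the profile mentioned above looks roughly like this in the connectors pom (a sketch; the exact profile body in flink may differ), and is activated with `mvn clean install -Pinclude-kinesis`:

```xml
<profile>
  <!-- The Kinesis connector pulls in Amazon-licensed dependencies,
       so the module is only built when this profile is explicitly activated. -->
  <id>include-kinesis</id>
  <modules>
    <module>flink-connector-kinesis</module>
  </modules>
</profile>
```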

… module (only include when profile "include-kinesis" is activated)

GetRecordsResult getRecordsResult = kinesisProxy.getRecords(nextShardItr, 100);

final List<Record> fetchedRecords = getRecordsResult.getRecords();
Contributor

The records returned here might be aggregated: http://docs.aws.amazon.com/streams/latest/dev/kinesis-kpl-consumer-deaggregation.html
So we need to add code to deaggregate them here.

Contributor Author

Noted, thanks for pointing this out as I did not realize this. Will include in the implementation.

Contributor Author

@tzulitai tzulitai Apr 23, 2016

https://github.com/awslabs/amazon-kinesis-client/blob/master/src/main/java/com/amazonaws/services/kinesis/clientlibrary/types/UserRecord.java

@rmetzger It seems that to determine whether or not a record is aggregated, we will need to rely on some protobuf magic. The KCL has implemented this implicitly in the above-mentioned code, starting from line 201. I don't really understand why AGGREGATED_RECORD_MAGIC is set this way. Should we import the KCL solely to use this class?
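For reference, the detection in the linked `UserRecord` class boils down to comparing the first bytes of the raw record payload against a fixed magic prefix that the KPL prepends to aggregated records (the byte values below are taken from the linked KCL source; class and method names here are illustrative, not the KCL API):

```java
import java.util.Arrays;

class AggregationCheck {
    // Magic prefix the KPL prepends to aggregated records
    // (see AGGREGATED_RECORD_MAGIC in the KCL's UserRecord class).
    static final byte[] AGGREGATED_RECORD_MAGIC =
        new byte[] {(byte) 0xF3, (byte) 0x89, (byte) 0x9A, (byte) 0xC2};

    // Returns true if the record data starts with the magic prefix, meaning
    // it must be deaggregated (protobuf-decoded) into individual user records.
    static boolean isAggregated(byte[] data) {
        if (data == null || data.length < AGGREGATED_RECORD_MAGIC.length) {
            return false;
        }
        return Arrays.equals(
            Arrays.copyOfRange(data, 0, AGGREGATED_RECORD_MAGIC.length),
            AGGREGATED_RECORD_MAGIC);
    }
}
```

The actual protobuf decoding of the aggregated payload is what makes reusing the KCL's implementation attractive rather than reimplementing it.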

Contributor

It seems that we have to add the KCL as a dependency for this class, yes.
When you add the dependency, can you exclude the dependency to aws-java-sdk-dynamodb? I don't want to pull in too many dependencies ;) (maybe there are more deps we can safely remove)

Contributor Author

Sure, no problem. I'll investigate if there are more deps that we can remove along with the new dependency.

Contributor Author

Update:
Ended up excluding aws-java-sdk-dynamodb and aws-java-sdk-cloudwatch.
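The resulting dependency declaration would look roughly like this (a sketch; the version property name is an assumption following the renaming done later in this PR):

```xml
<dependency>
  <groupId>com.amazonaws</groupId>
  <artifactId>amazon-kinesis-client</artifactId>
  <version>${aws.kinesis-kcl.version}</version>
  <exclusions>
    <!-- Not needed for record deaggregation; keeps the dependency footprint small. -->
    <exclusion>
      <groupId>com.amazonaws</groupId>
      <artifactId>aws-java-sdk-dynamodb</artifactId>
    </exclusion>
    <exclusion>
      <groupId>com.amazonaws</groupId>
      <artifactId>aws-java-sdk-cloudwatch</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```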

@rmetzger
Contributor

I could not get the example to work with the current Jackson version; it only worked after upgrading to 2.7.3.
Did you test the Kinesis consumer from a separate project (adding flink-kinesis-consumer as a dependency)?


this.deserializer = checkNotNull(deserializer, "deserializer can not be null");

this.shards = new KinesisProxy(configProps).getShardList(streams);
Contributor

I wonder if we need to query the list of shards from the client. The problem (also of the Kafka consumer) is that we expect the client to be able to connect to Kinesis.
Imagine somebody submitting a Flink job from their laptop to an EC2 instance running Flink. The laptop would need to be able to access Kinesis as well.

I think we can get the shard list on the parallel tasks as well, and then modulo by the hash of the shard id, so that there is a defined assignment to a parallel worker.
This would also allow us to elegantly handle reshards (if a worker detects a reshard, it queries again for the shard list and assigns the appropriate shards)
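The assignment scheme described above, where each parallel subtask independently hashes the shard id modulo the parallelism, can be sketched as follows (names are illustrative, not the connector's actual API):

```java
import java.util.ArrayList;
import java.util.List;

class ShardAssignment {
    // Deterministically assigns each shard to exactly one parallel subtask:
    // every subtask computes the same hash, so no coordination is needed,
    // and after a reshard each subtask can re-run this to pick up its shards.
    static List<String> shardsForSubtask(List<String> shardIds,
                                         int subtaskIndex, int parallelism) {
        List<String> assigned = new ArrayList<>();
        for (String shardId : shardIds) {
            // Mask the sign bit: hashCode() may be negative.
            int target = (shardId.hashCode() & Integer.MAX_VALUE) % parallelism;
            if (target == subtaskIndex) {
                assigned.add(shardId);
            }
        }
        return assigned;
    }
}
```

Because the mapping depends only on the shard id and the parallelism, the client submitting the job never needs to contact Kinesis itself.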

Contributor Author

@tzulitai tzulitai Apr 22, 2016

My main concern when deciding to fetch the complete shard list at the client is Amazon's limit on the DescribeStream operation (which we need to use to access the shard list) of 10 transactions per second: http://docs.aws.amazon.com/kinesis/latest/APIReference/API_DescribeStream.html.

Many parallel tasks calling this operation simultaneously for big Kinesis streams might cause issues, which would make it hard to decide on an appropriate backoff time & retry limit for the DescribeStream operation (the consumer's current defaults are a 1 second backoff and a retry limit of 3).

Other than this concern, I think it would be absolutely fine to implement this only on the parallel tasks. And certainly the implementation would be much cleaner and friendlier to future enhancements for Kinesis-side resharding, as you mentioned :)
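The retry behaviour described above (sleep, then retry a limited number of times when the rate-limited DescribeStream call fails) might look like this; the class, constant, and method names are illustrative, not the actual KinesisProxy API:

```java
class DescribeStreamRetry {
    // Illustrative defaults matching the values mentioned in the discussion.
    static final long BACKOFF_MILLIS = 1000L;
    static final int MAX_RETRIES = 3;

    interface DescribeCall<T> {
        T call() throws Exception;
    }

    // Runs the (rate-limited) call, sleeping between attempts; after the
    // initial try plus MAX_RETRIES retries, the last exception is rethrown.
    static <T> T callWithRetries(DescribeCall<T> describe) throws Exception {
        Exception last = null;
        for (int attempt = 0; attempt <= MAX_RETRIES; attempt++) {
            try {
                return describe.call();
            } catch (Exception e) { // e.g. a LimitExceededException from AWS
                last = e;
                Thread.sleep(BACKOFF_MILLIS);
            }
        }
        throw last;
    }
}
```

A fixed backoff like this is the simplest option; an exponential backoff would tolerate bursts of parallel callers better, which is exactly the concern raised above.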

Contributor

Oh, that limit is indeed a problem when running Flink with a parallelism higher than 10. Let's leave this as it is then.
Thanks for the explanation.

Contributor Author

@tzulitai tzulitai Apr 25, 2016

I'm currently trying to mock the KinesisProxy here for unit testing the whole FlinkKinesisConsumer.
I'm considering two ways of modifying this part to make the class testable:

  1. Introduce a serializable concrete KinesisProxyFactory class that is instantiated as a default field within FlinkKinesisConsumer. The KinesisProxyFactory will have a non-static method to create the proxy. This line would then be changed to
    this.shards = this.kinesisProxyFactory.createProxy(configProps).getShardList(streams), which is much easier to mock.

  2. Change this line to
    this.shards = createProxy(configProps).getShardList(streams), where createProxy is a protected method within FlinkKinesisConsumer that does the actual instantiation of KinesisProxy. In our tests, we can write a TestableFlinkKinesisConsumer that extends FlinkKinesisConsumer and overrides the protected createProxy to return a mocked KinesisProxy.

The downside of approach 1 is that FlinkKinesisConsumer will carry an extra KinesisProxyFactory field. On the other hand, approach 2 is also quite tedious because TestableFlinkKinesisConsumer will also need to implement dummies for all variants of FlinkKinesisConsumer's constructors ...

Which one might be better? Or are there better solutions besides the above?
I'm not very familiar with writing good tests, so any suggestion here would be quite helpful!

Contributor

I would go for approach 1). I think it's cleaner for the production code.
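Approach 1 could be sketched roughly as follows; all class and method names here are hypothetical simplifications of the real classes discussed in this thread:

```java
import java.io.Serializable;
import java.util.Arrays;
import java.util.List;
import java.util.Properties;

// Hypothetical stand-in for the real KinesisProxy's shard-listing behaviour.
interface ShardLister {
    List<String> getShardList(List<String> streams);
}

// Serializable factory carried as a field so that tests can swap in a mock.
class ShardListerFactory implements Serializable {
    ShardLister create(Properties configProps) {
        // Production code would construct a real KinesisProxy here;
        // this stub just returns a fixed shard list for illustration.
        return streams -> Arrays.asList("shardId-000", "shardId-001");
    }
}

class ConsumerSketch {
    final List<String> shards;

    ConsumerSketch(Properties configProps, List<String> streams,
                   ShardListerFactory factory) {
        // The factory call replaces `new KinesisProxy(configProps)` directly,
        // which was the hard-to-mock line.
        this.shards = factory.create(configProps).getShardList(streams);
    }
}
```

A test can then subclass (or mock) the factory to inject any shard list it likes, without touching the consumer's constructors.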

Contributor Author

Thanks for the advice :)

Contributor Author

Update for this:
I ended up leaving this line as it is and used PowerMock for the tests. I tried approach 1, but testing was still hard: since the factory is created in the constructor, it was difficult to inject mocks for either the factory or the returned KinesisProxy.

…'s Shard as a field

This change also includes,
1) Remove regionName from KinesisStreamShard since we won't be needing it for the consumer as all streams
are limited to be in the same AWS region.
2) Introduce access methods getStartingHashKey() and getEndingHashKey(), since this info is carried as well with AWS's Shard.
3) Reimplement ReferenceKinesisShardTopologies in the tests according to the changes to how
KinesisStreamShard is instantiated.
1) Update AWS Java SDK version to 1.10.71
2) Add AWS KCL library dependency, excluding dependency for DynamoDB and Cloudwatch
3) Rename property `kinesis-producer.version` to `aws.kinesis-kpl.version` for more consistent property naming pattern
with the Java SDK and KCL library.
…ly on AWS SDK responses and not on timeout implementation
1) Remove javadoc for non-public facing classes / methods
2) Finish off any WIP javadoc messages
3) Make note in javadocs of some classes of the fact that the AWS libraries have similar implementations
@tzulitai
Contributor Author

@rmetzger I've addressed your comments with the latest commits. Thanks in advance for your help on reviewing them :) Please let me know if there is anything else to address!

@tzulitai
Contributor Author

I wonder why one of the Travis CI builds is failing on flink-ml. My current FLINK-3229 branch is a few commits behind the current apache/master; does that have anything to do with it?

@rmetzger
Contributor

The build failure is unrelated to your changes. It's just an instability in the testing infrastructure.

rmetzger and others added 4 commits April 26, 2016 14:51
@rmetzger
Contributor

I'm currently busy with some other ongoing tasks. I hope to get back to this PR soon.

Changes include:
1. Consolidate all FlinkKinesisConsumer related tests within FlinkKinesisConsumerTest
2. Add a new TestableFlinkKinesisConsumer that helps with tests in FlinkKinesisConsumerTest
3. Add tests for KinesisDataFetcher and ShardConsumerThread that uses mocked KinesisProxy behaviours
…alue configurations are given parsable values
@tzulitai
Contributor Author

tzulitai commented Apr 28, 2016

@rmetzger
No problem :) In the meantime, I've made a few more commits, including more test coverage for the consumer, minor refactoring mainly to ease testing, and user configuration changes.

@rmetzger
Contributor

rmetzger commented May 3, 2016

I'm currently working on a custom branch based on this pull request.
It seems that we are running into some dependency issues when using the kinesis-connector in AWS EMR.

It seems that there is a clash with the protobuf versions (kinesis needs 2.6.x, but Flink has 2.5.0 in its classpath).

I'll keep you posted.

@tzulitai
Contributor Author

tzulitai commented May 5, 2016

Thanks Robert. I'll keep an eye on your FLINK-3229-review branch for the changes (I'm assuming you're working on FLINK-3229-review for the protobuf problem; please tell me if I'm wrong :))

Also, if there is anything I can do to help further improve the PR (e.g. tests on other environments), please don't hesitate to let me know :)

@rmetzger
Contributor

rmetzger commented May 5, 2016

Yep, that's the right branch.
I tried working on different approaches, but it's just an annoying problem with protobuf.
I'll probably work on it tomorrow again.

@rmetzger
Contributor

As discussed in the JIRA, I'm going to follow the "relocation approach" for fixing the protobuf issue. But we won't release the kinesis connector to mvn central.
In the meantime, we'll try to come up with a better solution regarding the protobuf issue.
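The "relocation approach" refers to shading: bundling the newer protobuf into the connector jar under a different package name so it cannot clash with the protobuf 2.5.0 on Flink's classpath. With the maven-shade-plugin this looks roughly like the following (a sketch; the shaded package name is an assumption, not the pattern actually used in flink):

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals><goal>shade</goal></goals>
      <configuration>
        <relocations>
          <!-- Move the bundled protobuf classes to a private namespace. -->
          <relocation>
            <pattern>com.google.protobuf</pattern>
            <shadedPattern>org.apache.flink.kinesis.shaded.com.google.protobuf</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
```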

@asfgit asfgit closed this in 8673cee May 18, 2016
mbode pushed a commit to mbode/flink that referenced this pull request May 27, 2016