Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FLINK-7762, FLINK-8167] Clean up and harden WikipediaEditsSource #5102

Closed
wants to merge 2 commits into from

Conversation

uce
Copy link
Contributor

@uce uce commented Nov 29, 2017

What is the purpose of the change

This pull requests addresses two related issues with the WikipediaEditsSource. It makes the WikipediaEditsSourceTest a proper test instead of unnecessarily starting a FlinkMiniCluster and addresses a potential test instability.

In general, the WikipediaEditsSource is not in good shape and could benefit from further refactoring. One potential area for improvement is integration with the asynchronous channel listener that reports events like errors or being kicked out of a channel, etc.

I did not do this due to time constraints and the fact that this is not a production source. In general, it is questionable whether we should keep the test as is or remove it since it depends on connectivity to an IRC channel.

Brief change log

  • Harden WikipediaEditsSource with eager sanity checks
  • Make WikipediaEditsSourceTest proper test

Verifying this change

This change is a rework/code cleanup without any new test coverage.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): yes, but only to flink-test-utils-junit
  • The public API, i.e., is any changed class annotated with @Public(Evolving): no
  • The serializers: no
  • The runtime per-record code paths (performance sensitive): no
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: no
  • The S3 file system connector: no

Documentation

  • Does this pull request introduce a new feature? no
  • If yes, how is the feature documented? not applicable

- Minor eager sanity checks
- Use UUID suffix for nickname. As reported in FLINK-8167, the current
  nickname suffix can result in nickname clashes which lead to test
  failures.
…er test

The WikipediaEditsSourceTest unnecessarily implements an integration
test that starts a FlinkMiniCluster and executes a small Flink program.

This simply creates a source and executes run in a separate thread until
a single WikipediaEditEvent is received.
@aljoscha
Copy link
Contributor

The changes look good! But you comment about needing that IRC channel seem valid. If we don't have the test, however, we would have no way of knowing that the code at least works. (We could also drop the source completely)

@uce
Copy link
Contributor Author

uce commented Nov 29, 2017

I would be in favour of removing this or moving it to Bahir, but it is currently used in a doc example. Besides that I doubt that this is of much value to users.

If you think it's OK with it, let's merge this for now and think about the example thing really needs it.

@tzulitai
Copy link
Contributor

LGTM.

+1 to merge this now, and we think about removing the source as well as reworking the doc example to not use it (I think it might be nice to directly kick off the docs with an intro example that read / writes to Kafka), as a separate issue.

@zentol
Copy link
Contributor

zentol commented Dec 4, 2017

merging.

asfgit pushed a commit that referenced this pull request Dec 4, 2017
…er test

The WikipediaEditsSourceTest unnecessarily implements an integration
test that starts a FlinkMiniCluster and executes a small Flink program.

This simply creates a source and executes run in a separate thread until
a single WikipediaEditEvent is received.

This closes #5102.
@asfgit asfgit closed this in e7ca105 Dec 4, 2017
glaksh100 pushed a commit to lyft/flink that referenced this pull request Jun 6, 2018
…er test

The WikipediaEditsSourceTest unnecessarily implements an integration
test that starts a FlinkMiniCluster and executes a small Flink program.

This simply creates a source and executes run in a separate thread until
a single WikipediaEditEvent is received.

This closes apache#5102.
@uce uce deleted the 7762-8167-wikiedits branch May 19, 2019 09:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants