
Conversation

@ArtHustonHitachi

Thank you very much for contributing to Apache Flink - we are happy that you want to help us improve Flink. To help the community review your contribution in the best possible way, please go through the checklist below, which will get the contribution into a shape in which it can be best reviewed.

Please understand that we do not do this to make contributions to Flink a hassle. In order to uphold a high standard of quality for code contributions, while at the same time managing a large number of contributions, we need contributors to prepare the contributions well, and give reviewers enough contextual information for the review. Please also understand that contributions that do not follow this guide will take longer to review and thus typically be picked up with lower priority by the community.

Contribution Checklist

  • Make sure that the pull request corresponds to a JIRA issue. Exceptions are made for typos in JavaDoc or documentation files, which need no JIRA issue.

  • Name the pull request in the form "[FLINK-XXXX] [component] Title of the pull request", where FLINK-XXXX should be replaced by the actual issue number. Skip component if you are unsure about which is the best component.
    Typo fixes that have no associated JIRA issue should be named following this pattern: [hotfix] [docs] Fix typo in event time introduction or [hotfix] [javadocs] Expand JavaDoc for PunctuatedWatermarkGenerator.

  • Fill out the template below to describe the changes contributed by the pull request. That will give reviewers the context they need to do the review.

  • Make sure that the change passes the automated tests, i.e., mvn clean verify passes. You can set up Travis CI to do that following this guide.

  • Each pull request should address only one issue, not mix up code from multiple issues.

  • Each commit in the pull request should have a meaningful commit message (including the JIRA id).

  • Once all items of the checklist are addressed, remove the above text and this checklist, leaving only the filled out template below.

(The sections below can be removed for hotfixes of typos)

What is the purpose of the change

(For example: This pull request makes task deployment go through the blob server, rather than through RPC. That way we avoid re-transferring them on each deployment (during recovery).)

Brief change log

(for example:)

  • The TaskInfo is stored in the blob store at job creation time as a persistent artifact
  • The deployment RPC transmits only the blob storage reference
  • TaskManagers retrieve the TaskInfo from the blob cache
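
A minimal, hypothetical sketch of the example above (the BlobStore interface, the key handling, and the method names are illustrative stand-ins, not actual Flink classes): the TaskInfo is written to the blob store once, and only its key travels over the deployment RPC.

```java
// Illustrative only: models the flow described in the example change log.
final class BlobDeploymentSketch {

    /** Hypothetical blob store abstraction; the TaskManager side is a cache. */
    interface BlobStore {
        String put(byte[] payload);   // returns a blob key
        byte[] get(String blobKey);   // TaskManagers read through their blob cache
    }

    /** Done once at job creation time; the blob is a persistent artifact. */
    static String storeTaskInfo(BlobStore blobStore, byte[] serializedTaskInfo) {
        return blobStore.put(serializedTaskInfo);
    }

    /** The deployment RPC carries only the key; the payload is fetched locally. */
    static byte[] fetchTaskInfoOnDeployment(BlobStore blobCache, String blobKey) {
        return blobCache.get(blobKey);
    }
}
```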

Verifying this change

(Please pick one of the following options)

This change is a trivial rework / code cleanup without any test coverage.

(or)

This change is already covered by existing tests, such as (please describe tests).

(or)

This change added tests and can be verified as follows:

(example:)

  • Added integration tests for end-to-end deployment with large payloads (100MB)
  • Extended integration test for recovery after master (JobManager) failure
  • Added test that validates that TaskInfo is transferred only once across recoveries
  • Manually verified the change by running a 4 node cluster with 2 JobManagers and 4 TaskManagers, a stateful streaming program, and killing one JobManager and two TaskManagers during the execution, verifying that recovery happens correctly.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): (yes / no)
  • The public API, i.e., is any changed class annotated with @Public(Evolving): (yes / no)
  • The serializers: (yes / no / don't know)
  • The runtime per-record code paths (performance sensitive): (yes / no / don't know)
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (yes / no / don't know)
  • The S3 file system connector: (yes / no / don't know)

Documentation

  • Does this pull request introduce a new feature? (yes / no)
  • If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented)

aljoscha and others added 30 commits November 13, 2017 12:05
Before, we had it in places that require it. This doesn't work when
running mvn javadoc:aggregate because this only runs for the root
pom and then cannot find the "bundle" dependencies.
- do not shade everything, especially not JDK classes!
-> instead define include patterns explicitly
- do not shade core Flink classes (only those imported from flink-hadoop-fs)
- hack around Hadoop loading (unshaded/non-relocated) classes based on names in
  the core-default.xml by overwriting the Configuration class (we may need to
  extend this for the mapred-default.xml and hdfs-default.xml):
-> provide a core-default-shaded.xml file with shaded class names and copy and
  adapt the Configuration class of the respective Hadoop version to load this
  file instead of core-default.xml.
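
A minimal sketch of the idea in the last bullet above, under the assumption that registering an extra default resource is enough for illustration (the actual change copies and adapts Hadoop's Configuration class; the class name ShadedHadoopConfig below is hypothetical):

```java
import org.apache.hadoop.conf.Configuration;

// Illustrative only: points Hadoop at a default resource that lists the
// shaded (relocated) class names instead of the original ones.
public final class ShadedHadoopConfig {

    static {
        // Registers core-default-shaded.xml as a default resource for all
        // Configuration instances created afterwards.
        Configuration.addDefaultResource("core-default-shaded.xml");
    }

    private ShadedHadoopConfig() {}

    public static Configuration create() {
        return new Configuration();
    }
}
```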

Add checkstyle suppression pattern for the Hadoop Configuration classes

Also fix the (integration) tests not working because they tried to load the
relocated classes, which are apparently not available there.

Remove minimizeJar from shading of flink-s3-fs-presto because this was
causing "java.lang.ClassNotFoundException:
org.apache.flink.fs.s3presto.shaded.org.apache.commons.logging.impl.LogFactoryImpl"
since these classes are not statically imported and thus removed when
minimizing.

Fix s3-fs-presto not shading org.HdrHistogram

Fix log4j being relocated in the S3 fs implementations

Add shading checks to travis
This uses traps to ensure that we properly do cleanups, remove config
values, and shut down things.
…ateBackend and MemoryStateBackend.

(cherry picked from commit 2906698)
This is a workaround for a strange javassist bug. The issue should go away
once we upgrade the netty dependency.

Please check the ticket for more information.

This closes #5007.
As of FLINK-4500 the Cassandra connector will wait for pending updates to finish upon checkpoint.

This closes #5002.
tzulitai and others added 29 commits February 6, 2018 18:30
…store

Previously, the key and namespace serializers for the
HeapInternalTimerService were not reconfigured on restore to be compatible
with previously written serializers.

This caused an immediate error when restoring savepoints in Flink 1.4.0,
since in Flink 1.4.0 we changed the base registrations in the Kryo
serializer. That change requires serializer reconfiguration.

This commit fixes this by also writing the serializer configuration
snapshots of the key and namespace serializers into savepoints, and using
them to reconfigure the new serializers on restore. This improvement also
comes along with making the written data for timer service snapshots
versioned. Backwards compatibility with previous non-versioned formats
is not broken.
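
A rough sketch of the versioned layout described above, using plain java.io types and hypothetical names (the real code uses Flink's serialization utilities, not these helpers):

```java
import java.io.DataOutputStream;
import java.io.IOException;

// Illustrative only: version header first, then the serializer config
// snapshots for the key and namespace serializers, then the timer data itself.
final class TimerSnapshotWriteSketch {

    static final int VERSION = 1;

    static void write(DataOutputStream out,
                      byte[] keySerializerConfigSnapshot,
                      byte[] namespaceSerializerConfigSnapshot,
                      byte[] timerData) throws IOException {
        out.writeInt(VERSION);                               // new: versioned format
        writeBlock(out, keySerializerConfigSnapshot);        // new: key serializer config
        writeBlock(out, namespaceSerializerConfigSnapshot);  // new: namespace serializer config
        writeBlock(out, timerData);                          // the timers, as before
    }

    private static void writeBlock(DataOutputStream out, byte[] block) throws IOException {
        out.writeInt(block.length);
        out.write(block);
    }
}
```

On restore, the reader would check the version and fall back to the old, non-versioned parsing for older snapshots, which is how backwards compatibility can be preserved.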
… AbstractEventTimeWindowCheckpointingITCase

After adding the TypeSerializerConfigSnapshots of timer serializers to
the timer snapshots, the size of the timer snapshots has potentially
doubled. This caused the AbstractEventTimeWindowCheckpointingITCase to
fail, because the configured max memory state size and Akka
framesize were too small. This commit doubles those sizes.

This closes #5362.
…askManagerRunner

Previously, the YarnTaskManagerRunner contained a code path that existed
for the sole purpose of injecting mock runners. Having code paths in
production code just to facilitate tests is in general a bad idea.

This commit fixes this by making YarnTaskManagerRunner a factory-like
class, which creates a Runner that contains all the runner's properties,
such as configuration. Unit tests can then test against the contained
configuration in the created Runner to validate that everything is
configured properly.

This closes #5172.
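
A hedged sketch of the described shape (class and method names below are illustrative, not the actual Flink API): the factory builds a Runner that carries its effective configuration, so a unit test can inspect it without any mock-injection code path in production.

```java
import org.apache.flink.configuration.Configuration;

// Illustrative only.
final class TaskManagerRunnerFactorySketch {

    /** The created runner exposes the configuration it was built with. */
    static final class Runner {
        private final Configuration configuration;
        private final Runnable startAction;

        Runner(Configuration configuration, Runnable startAction) {
            this.configuration = configuration;
            this.startAction = startAction;
        }

        Configuration getConfiguration() {
            return configuration;
        }

        void run() {
            startAction.run();
        }
    }

    static Runner createRunner(Configuration baseConfig, String resourceId) {
        Configuration effective = new Configuration(baseConfig);
        effective.setString("illustrative.resource-id", resourceId); // hypothetical key
        return new Runner(effective, () -> { /* start the actual task manager here */ });
    }
}
```

A test can then call createRunner(...) and assert on getConfiguration() instead of injecting mocks into the production start-up path.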
Before, the condition was being read via in.read() and not
in.readFully().
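
For context, a small illustration of the difference (the helper below is hypothetical, not the code this commit touches): read(byte[]) may return after filling only part of the buffer, while readFully(byte[]) does not return until the whole buffer is filled (or throws EOFException).

```java
import java.io.DataInputStream;
import java.io.IOException;

final class ReadFullyIllustration {

    static byte[] readRecord(DataInputStream in, int length) throws IOException {
        byte[] buffer = new byte[length];

        // Buggy pattern: a single read() call gives no guarantee that
        // 'length' bytes were actually read into the buffer.
        // int bytesRead = in.read(buffer);

        // Correct pattern: readFully() blocks until the buffer is full.
        in.readFully(buffer);
        return buffer;
    }
}
```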
…onnector shading

- Do not shade Elasticsearch dependencies
- Do not shade Flink Elasticsearch Connector classes
- Also shade log4j-api dependency in Elasticsearch 5 connector. This is
  required for the log4j-to-slf4j bridge adapter to work properly.
- Add NOTICE files for license statements for all ES connectors

This closes #5426.
This closes #5243.
…s a data stream as keyed stream (backport from 1.5 branch)

This closes #5439.
…rterIsClosed()

The test is inherently unstable as it will always fail if any other
server is started on the port between the closing of the reporter and
the polling of metrics.

This closes #5473.
…aultRegistry

It appeared as if the HTTPServer wasn't actually doing anything, but it
internally accessed the singleton registry that we also access to
register metrics.
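
Assuming the HTTPServer here is the Prometheus simpleclient one (an assumption; the commit text does not name the library), a minimal sketch of how both sides meet in the CollectorRegistry.defaultRegistry singleton:

```java
import io.prometheus.client.Counter;
import io.prometheus.client.exporter.HTTPServer;

import java.io.IOException;

// Illustrative only.
final class DefaultRegistrySketch {

    public static void main(String[] args) throws IOException {
        // register() with no arguments registers the metric with
        // CollectorRegistry.defaultRegistry.
        Counter requests = Counter.build()
                .name("requests_total")
                .help("Total requests.")
                .register();
        requests.inc();

        // The HTTPServer scrapes that same default registry internally, so the
        // counter above is exposed even though no registry is passed explicitly.
        HTTPServer server = new HTTPServer(9091);
    }
}
```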
This is preparation for modifying a new ITCase to use modern state
features.
This new test does not pretend to use legacy state but instead uses
the more modern operator state varieties.
…aConsumer

This commit fixes the incorrect use of the parent of the user code class
loader. Since the Kafka 010 / 011 versions directly reuse 09 code, this fix
resolves the issue for all versions.

This commit also extends the Kafka010Example so that it uses a custom
watermark assigner. This allows our end-to-end tests to catch this
bug.
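
A hedged sketch of the class-loading point (hypothetical helper, not the actual Kafka connector code): user-supplied classes have to be resolved against the user code class loader itself, not its parent, which cannot see the user jars.

```java
final class UserClassLoadingSketch {

    static Object instantiateUserClass(String className, ClassLoader userCodeClassLoader)
            throws ReflectiveOperationException {
        // Buggy variant: userCodeClassLoader.getParent() does not know the user jars.
        // Class<?> clazz = Class.forName(className, true, userCodeClassLoader.getParent());

        // Fixed variant: resolve against the user code class loader itself.
        Class<?> clazz = Class.forName(className, true, userCodeClassLoader);
        return clazz.getDeclaredConstructor().newInstance();
    }
}
```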
This allows the test to perform the cleanup procedure (as well as
printing any error logs) if an interruption occurs while waiting for
the test data to be written to Kafka, thereby increasing visibility
into why the test was stalling.

This closes #5568.
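
An illustrative pattern for the behavior described above (not the actual test code): run the cleanup, and surface the error logs, even when the wait for the Kafka writes is interrupted.

```java
import java.util.concurrent.CountDownLatch;

// Illustrative only.
final class CleanupOnInterruptSketch {

    static void awaitWritesThenCleanup(CountDownLatch dataWritten,
                                       Runnable dumpErrorLogs,
                                       Runnable cleanup) {
        try {
            dataWritten.await();                 // wait for test data to land in Kafka
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();  // preserve the interrupt flag
            dumpErrorLogs.run();                 // make the stall reason visible
        } finally {
            cleanup.run();                       // always perform the cleanup procedure
        }
    }
}
```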