New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BEAM-79] merge gearpump-runner into master #3611
Conversation
They have apparently deleted the SNAPSHOT jar and now builds are failing.
The only way it's safe to split a compressed file is if the file is not compressed. This can only happen when the source itself is splittable, and that in turn will result in the inner source's reader being returned. A CompressedReader will only be created in the event that the file is NOT splittable. So remove all the logic handling splittable compressed readers, and instead go with the logic when we know/assume the file is compressed. * TextIO: test compression with larger files It is important for correctness that we test with large files because otherwise the compressed file may be larger than the uncompressed file, which could mask bugs * TextIOTest: flesh out more * TextIOTest: add large uncompressed file
If the test hangs due to bugs, the infrastructure should kill it.
If provided with an Unbounded PCollection, Write will fail due to restriction of calling finalize only once. This error message fails in a deep stack trace based on it not being possible to apply a GroupByKey. Instead, throw immediately on application with a specific error message.
These writes should be forbidden based on the boundedness of the input PCollection. As Write explicitly forbids the application of the transform to an Unbounded PCollection, this will be equivalent in most cases; In cases where the input PCollection is Bounded, due to an UnboundedReadFromBoundedSource, the write will function as expected and does not need to be forbidden.
Aggregator is the model level concept. Counter was specific to the Dataflow Runner, and is now not needed as part of Beam.
This cleans up any state stored within the Transform Evaluator Factory.
This is a more focused interface that interacts with a DoFn before it is available for use and after it has completed and the reference is lost. It is required to properly support setup and teardown, as the fields in a ThreadLocal cannot all be cleaned up without additional tracking. Part of BEAM-452.
These tests are not yet functional in all runners, and this makes them easier to ignore.
Until we implement it for Dataflow runner.
Remove unused codes Fix kryo exception Fix PCollectionView translation upgrade to gearpump 0.8.4-SNAPSHOT Fix side input handling in DoFnFunction Respect WindowFn#getOutputTime in gearpump-runner Activate Gearpump local-validates-runner-tests in precommit Update against master changes Update gearpump-runner against master changes Update gearpump-runner against master changes Update gearpump-runner against master changes. [BEAM-972] Add more unit test to Gearpump runner [BEAM-972] Add unit tests to Gearpump runner [BEAM-79] Fix gearpump-runner merge conflicts and test failure enable ParDoTest [BEAM-79] Add SideInput support for GearpumpRunner [BEAM-79] Support merging windows in GearpumpRunner [BEAM-79] Fix PostCommit test confs for Gearpump runner note thread is interrupted on InterruptedException Remove cache for Gearpump on travis reduce timeout to wait for result fix ParDo.BoundMulti translation return encoded key for GroupByKey translation support OutputTimeFn update to latest gearpump dsl function interface fix group by window activate ROS on Gearpump by default update ROS configurations [BEAM-1180] Implement GearpumpPipelineResult [BEAM-79] Upgrade to beam-0.5.0-incubating-SNAPSHOT [BEAM-79] Update to latest Gearpump API Fix NoOpAggregatorFactory Remove print to stdout Skip window assignment when windows don't change Add Window.Bound translator Upgrade Gearpump version [BEAM-79] fix gearpump runner build failure [BEAM-79] update GearpumpPipelineResult [BEAM-79] Port Gearpump runner from OldDoFn to new DoFn upgrade gearpump-runner to 0.4.0-incubating-SNAPSHOT remove "pipeline" in runner name post-merge fix [BEAM-79] fix integration-test failure fix import order !fixup Minor javadoc clean-up Added even more javadoc to TextIO#withHeader and TextIO#withFooter (2). Added even more javadoc to TextIO#withHeader and TextIO#withFooter. Added javadoc to TextIO#withHeader and TextIO#withFooter. Reverted header and footer to be of type String. Revised according to comments following a code review. Add header/footer support to TextIO.Write [BEAM-242] Enable and fix checkstyle in Flink runner examples Remove timeout in JAXBCoderTest Be more accepting in UnboundedReadDeduplicatorTest BigQuery: limit max job polling time to 1 minute [BEAM-242] Enable checkstyle and fix checkstyle errors in Flink runner [BEAM-456] Add MongoDbIO FluentBackoff: a replacement for a variety of custom backoff implementations Remove the DataflowRunner instructions from examples Put classes in runners-core package into runners.core namespace Delegate populateDipslayData to wrapped combineFn's Fixed Combine display data Cloud Datastore naming clean-up DatastoreIO SplitQueryFn integration test Add Latest CombineFn and PTransforms Remove empty unused method in TestStreamEvaluatorFactory Test that multiple instances of TestStream are supported Correct some accidental renames Fix condition in FlinkStreamingPipelineTranslator Address comments of Flink Side-Input PR [BEAM-569] Define maxNumRecords default value to Long.MAX_VALUE in JmsIO Add LeaderBoardTest take advantage of setup/teardown for KafkaWriter Returned KafkaIO getWatermark log line in debug mode [BEAM-572] Remove Spark Reference in WordCount Update Dataflow Container Version [BEAM-313] Provide a context for SparkRunner DataflowRunner: get PBegin from PInput [BEAM-592] Fix SparkRunner Dependency Problem in WordCount Fix javadoc in Kinesis Organize imports in Kinesis kinesis: a connector for Amazon Kinesis [BEAM-589] Fixing IO.Read transformation Query latest timestamp travis.yml: disable updating snapshots Added support for reporting aggregator values to Spark sinks [BEAM-294] Rename dataflow references to beam Modified BigtableIO to use DoFn setup/tearDown methods instead of startBundle/finishBundle checkstyle: prohibit API client repackaged Guava Make WriteTest more resilient to Randomness Update DoFn javadocs to remove references to OldDoFn and Dataflow [BEAM-545] Promote JobName to PipelineOptions Move the samples data to gs://apache-beam-samples/ Cleanup some javadoc that referring Dataflow BigQueryIO.Write: raise size limit to 11 TiB Optimize imports Update checkstyle.xml to put all imports in one group Fix Exception Unwrapping in TestFlinkRunner Make ParDoLifecycleTest Serializable to Fix Test with TupleTag Use AllPanes as the PaneExtractor in IterableAssert ...
…branch Don't call .testingPipelineOptions() a second time GCP IO ITs now all use --project option Select SDK distribution based on the selected SDK name [BEAM-2373] Upgrade commons-compress dependency version to 1.14 Define the projectId in the SpannerIO Read Test (utest, not itest) Use SDK harness container for FnAPI jobs when worker_harness_container_image is not specified. Add a separate image tag to use with the SDK harness container. Ditch apache commons Add PubSub I/O support to Python DirectRunner Only use ASCII 'a' through 'z' for temporary Spanner tables ReduceFnRunner.onTrigger: add short circuit for empty pane, and move inputWM and pane after the short circuit. WindowingStrategy: add OnTimeBehavior to control whether to emit empty ON_TIME pane. Removed OnceTriggerStateMachine Visit composite nodes when checking for picklability. Upgrade beam bigtable client dependency to 0.9.7.1 Add a Combine Test for Sliding Windows without Context [BEAM-2389] moved GcpCoreApiSurfaceTest to corresponding module, adapted exposed packagees Add Experimental annotation to AMQP and refine Kind for the Experimental IOs [BEAM-2488] Elasticsearch IO should read also in replica shards Use PCollectionViews.toAdditionalInputs in Combine Use PCollectionViews.toAdditionalInputs in ParDo Use PCollectionViews.toAdditionalInputs in ParDoMultiOverrideFactory Fix getAdditionalInputs for SplittableParDo transforms Add utility to expand list of PCollectionViews Read api with naive implementation Pre read api refactoring. Extract `SpannerConfig` and `AbstractSpannerFn` Bump spanner version [BEAM-1187] Improve logging to contain the number of retries done due to IOException and unsuccessful response codes. Add WindowFn#assignsToOneWindow Use installed distribution name for sdk name [BEAM-2522] upgrading jackson to 2.8.9 (mitigating apache#1599) Enable grpc controller in fn_api_runner Removed uses of proto builder clone method [BEAM-2514] Improve error message on missing required value [BEAM-1237] Create AmqpIO Implement streaming GroupByKey in Python DirectRunner Bump Dataflow worker to 0623 Reintroduces DoFn.ProcessContinuation (Dataflow worker compatibility part) Remove old deprecated PubSub code Fix a typo in function args Avoid pickling the entire pipeline per-transform. Fix python fn API data plane remote grpc port access [BEAM-2745] Add Jenkins Suite for Python Performance Test [BEAM-2489] Use dynamic ES port in HIFIOWithElasticTest [BEAM-2497] Fix the reading of concat gzip files Allow output from FinishBundle in DoFnTester DataflowRunner: Reject merging windowing for stateful ParDo
… to gearpump 0.8.4 Fix ParDoTest#testPipelineOptionsParameter Upgrade to gearpump 0.8.4 Fix javadoc generation for AmqpIO, CassandraIO and HCatalogIO Simplified ByteBuddyOnTimerInvokerFactory Fix bad merge Made DataflowRunner TransformTranslator public Process timer firings for a window together Ignore processing time timers in expired windows Add timeout to initialization of partition in KafkaIO [BEAM-2534] Handle offset gaps in Kafka messages. Fix PValue input in _PubSubReadEvaluator Update SDK dependencies Disallow Combiner Lifting for multi-window WindowFns [BEAM-2553] Update Maven exec plugin to 1.6.0 to incorporate messaging improvements Website Mergebot Job Update Python SDK version [maven-release-plugin] prepare for next development iteration [maven-release-plugin] prepare branch release-2.1.0 For GCS operations use an http client with a default timeout value. [BEAM-2530] Fix compilation of modules with Java 9 that depend on jdk.tools Make modules that depend on Hadoop and Spark use the same version property Fix DoFn javadoc: StateSpec does not require a key Add support for PipelineOptions parameters Properly convert milliseconds whether there's less than 3/more than 9 digits. TimeUtil did not properly convert (and returned null) when the number of digits for fractions of seconds was less than 3 digits or more than 9 digits. The solution is to pad with zeros when there is less than 3 digits and to truncate when there is more than 3.
ca6e4f1
to
16334ca
Compare
16334ca
to
daa7566
Compare
0f10e8d
to
025c0e9
Compare
025c0e9
to
49d4ed5
Compare
R: @kennknowles |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I only reviewed the poms.
Actually, I suggest you open a pull request from gearpump-runner
to master
both on the main apache/beam fork. This makes it very clearly exactly what is proposed.
Then, you can separately do little PRs from manuzhaing:whatever
to origin:gearpump-runner
to address comments and those commits will automatically show up in the main PR.
Some things can wait until after the merge, but getting the slower tests to be postcommit-only must happen before the merge.
runners/gearpump/pom.xml
Outdated
<profile> | ||
<id>local-validates-runner-tests</id> | ||
<!-- TODO: once on master branch, move to postcommit only --> | ||
<activation><activeByDefault>true</activeByDefault></activation> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, now this should not be active by default. It should run only on postcommit.
@@ -89,6 +89,18 @@ | |||
</dependencies> | |||
</profile> | |||
|
|||
<!-- Include the Apache Gearpump (incubating) runner with -P gearpump-runner --> | |||
<profile> | |||
<id>gearpump-runner</id> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Take a look in this file for the jenkins-precommit
bits. You can add an execution for the Gearpump runner.
@kennknowles split this into #3636 and #3637. I will leave it open for a while since it is referenced on mailing list |
Great. Those PRs look just right. Fine to leave this open - but links will keep working even when close. |
d56606d
to
a56c599
Compare
a56c599
to
b0ed584
Compare
Changes Unknown when pulling b0ed584 on manuzhang:gearpump-runner into ** on apache:master**. |
Follow this checklist to help us incorporate your contribution quickly and easily:
[BEAM-XXX] Fixes bug in ApproximateQuantiles
, where you replaceBEAM-XXX
with the appropriate JIRA issue.mvn clean verify
to make sure basic checks pass. A more thorough check will be performed on your pull request automatically.