sync-up from master branch #3648

Closed
wants to merge 390 commits into from

@mingmxu mingmxu commented Jul 26, 2017

BEAM-2672 requires a new function added in #3628

manuzhang and others added 30 commits December 16, 2016 16:48
  Fix NoOpAggregatorFactory
  Remove print to stdout
  Skip window assignment when windows don't change
  Add Window.Bound translator
  Upgrade Gearpump version
Adjustments in gearpump-runner:

  [BEAM-79] Upgrade to beam-0.5.0-incubating-SNAPSHOT
  [BEAM-79] Update to latest Gearpump API

From master:

  Disable automatic archiving of Maven builds
  [BEAM-59] initial interfaces and classes of Beam FileSystem.
  Change counter name in TestDataflowRunner
  More escaping in Jenkins timestamp spec
  Add RunnableOnService test for Metrics
  Fix seed job fetch spec
  Show timestamps on log lines in Jenkins
  [BEAM-1165] Fix unexpected file creation when checking dependencies
  [BEAM-1178] Make naming of logger objects consistent
  [BEAM-716] Fix javadoc on with* methods [BEAM-959] Improve check preconditions in JmsIO
  [BEAM-716] Use AutoValue in JmsIO
  Fix grammar error (repeated for)
  Empty TestPipeline need not be run
  [BEAM-85, BEAM-298] Make TestPipeline a JUnit Rule checking proper usage
  Change counter name in TestDataflowRunner
  BigQueryIO: fix streaming write, typo in API
  [BEAM-853] Force streaming execution on batch pipelines for testing. Expose the adapted source.
  Use empty SideInputReader, fixes NPE in SimpleDoFnRunnerTest
  Test that SimpleDoFnRunner wraps exceptions in startBundle and finishBundle
  Add timer support to DoFnRunner(s)
  Make TimerSpec and StateSpec fields accessible
  View.asMap: minor javadoc fixes
  Revert "Move InMemoryTimerInternals to runners-core"
  Revert "Moves DoFnAdapters to runners-core"
  Revert "Removes ArgumentProvider.windowingInternals"
  Revert "Removes code for wrapping DoFn as an OldDoFn"
  checkstyle: missed newline in DistributionCell
  Make {Metric,Counter,Distribution}Cell public
  Add PTransformOverrideFactory to the Core SDK
  Move ActiveWindowSet and implementations to runners-core
  Update Dataflow worker to beam-master-20161216
  [BEAM-1108] Remove outdated language about experimental autoscaling
  [BEAM-450] Shade modules to separate paths
  [BEAM-362] Port runners to runners-core AggregatorFactory
  Move InMemoryTimerInternals to runners-core
  Delete deprecated TimerCallback
  Remove deprecated methods of InMemoryTimerInternals
  Don't incorrectly log error in MetricsEnvironment
  Renames ParDo.getNewFn to getFn
  Moves DoFnAdapters to runners-core
  Removes unused code from NoOpOldDoFn
  Removes ArgumentProvider.windowingInternals
  Removes code for wrapping DoFn as an OldDoFn
  Removes OldDoFn from ParDo
  Pushes uses of OldDoFn deeper inside Flink runner
  Remove ParDo.of(OldDoFn) from Apex runner
  Converts all easy OldDoFns to DoFn
  [BEAM-1022] Add testing coverage for BigQuery streaming writes
  Fix mvn command args in Apex postcommit Jenkins job
  [BEAM-932] Enable findbugs validation (and fix existing issues)
  Fail to split in FileBasedSource if filePattern expands to empty.
  [BEAM-1154] Get side input from proper window in ReduceFn
  [BEAM-1153] GcsUtil: use non-batch API for single file size requests.
  Fix NPE in StatefulParDoEvaluatorFactoryTest mocking
  [BEAM-1033] Retry Bigquery Verifier when Query Fails
  Implement GetDefaultOutputCoder in DirectGroupByKey
  SimpleDoFnRunner observes window if SideInputReader is nonempty
  Better comments and cleanup
  Allow empty string value for ValueProvider types.
  starter: fix typo in pom.xml
  Revert "Allow stateful DoFn in DataflowRunner"
  Re-exclude UsesStatefulParDo tests for Dataflow
  Some minor changes and fixes for sorter module
  [BEAM-1149] Explode windows when fn uses side inputs
  Add Jenkins postcommit for RunnableOnService in Apex runner
  Update version from 0.5.0-SNAPSHOT to 0.5.0-incubating-SNAPSHOT
  Update Maven Archetype versions after cutting the release branch
  Move PerKeyCombineFnRunner to runners-core
  Update Dataflow worker to beam-master-20161212
  [maven-release-plugin] prepare for next development iteration
  [maven-release-plugin] prepare branch release-0.4.0-incubating
  Fix version of Kryo in examples/java jenkins-precommit profile
  Revert 91cc606 "This closes #1586": Kryo + UBRFBS
  [BEAM-909] improve starter archetype
  Fix JDom malformed comment in Apex runner.
  [BEAM-927] Fix findbugs and re-enable Maven plugin in JmsIO
  [BEAM-807] Replace OldDoFn with DoFn.
  [BEAM-757] Use DoFnRunner in the implementation of DoFn via FlatMapFunction.
  FileBasedSinkTest: fix tests in Windows OS by using IOChannelUtils.resolve().
  FileBasedSink: ignore exceptions when removing temp output files for issues in Windows OS.
  [BEAM-1142] Upgrade maven-invoker to address maven bug ARCHETYPE-488.
  Add Tests for Kryo Serialization of URFBS
  Add no-arg constructor for UnboundedReadFromBoundedSource
  Revise WindowedWordCount for runner and execution mode portability
  Factor out ShardedFile from FileChecksumMatcher
  Add IntervalWindow coder to the standard registry
  Stop expanding PValues in DirectRunner visitors
  Migrate AppliedPTransform to use AutoValue
  Enable and fix DirectRunnerTest case missing @Test
  [BEAM-1130] SparkRunner ResumeFromCheckpointStreamingTest Failing.
  [BEAM-1133] Add maxNumRecords per micro-batch for Spark runner options.
  BigQueryIO.Write: support runtime schema and table
  Fix handling of null ValueProviders in DisplayData
  [BEAM-551] Fix handling of default for VP
  [BEAM-1120] Move some DataflowRunner configurations from code to properties
  [BEAM-551] Fix toString for FileBasedSource
  [BEAM-921] spark-runner: register sources and coders to serialize with java serializer
  [BEAM-551] Fix handling of TextIO.Sink
  ...
  note thread is interrupted on InterruptedException
  Remove cache for Gearpump on travis
  reduce timeout to wait for result
  fix ParDo.BoundMulti translation
  return encoded key for GroupByKey translation
  support OutputTimeFn
  update to latest gearpump dsl function interface
  fix group by window
  activate ROS on Gearpump by default
  update ROS configurations
  [BEAM-1180] Implement GearpumpPipelineResult
  enable ParDoTest
  [BEAM-79] Add SideInput support for GearpumpRunner
@coveralls

Coverage Status

Changes Unknown when pulling 9088a3e on master into DSL_SQL.

@coveralls

Coverage Status

Changes Unknown when pulling 07e8cd5 on master into DSL_SQL.

@coveralls

Coverage Status

Changes Unknown when pulling d035a34 on master into DSL_SQL.

@coveralls

Coverage Status

Changes Unknown when pulling 84a2379 on master into DSL_SQL.

@coveralls

Coverage Status

Changes Unknown when pulling b0b6421 on master into DSL_SQL.

szewi and others added 4 commits August 14, 2017 13:46
We should always stage the user's JAR. If we don't find any files and
none were specified, then the pipeline will not execute, and this should
fail early rather than later.

If choosing file load jobs on an unbounded PCollection,
a triggering frequency must be specified to control how
often load jobs are generated.
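
The precondition described above can be sketched as a small validation step. This is only an illustration in plain Python, not the BigQueryIO code from the commit: the function name `check_file_loads_config` and both parameters are hypothetical.

```python
from datetime import timedelta
from typing import Optional


def check_file_loads_config(is_bounded: bool,
                            triggering_frequency: Optional[timedelta]) -> None:
    """Hypothetical check: file loads on unbounded input need a frequency.

    A bounded input can be written with a single round of load jobs, so no
    frequency is needed. An unbounded input never "finishes", so something
    has to say how often a new load job should be generated.
    """
    if not is_bounded and triggering_frequency is None:
        raise ValueError(
            "File load jobs on an unbounded collection require a "
            "triggering frequency.")


# Bounded input: no frequency required.
check_file_loads_config(is_bounded=True, triggering_frequency=None)
# Unbounded input: a frequency must be supplied.
check_file_loads_config(is_bounded=False,
                        triggering_frequency=timedelta(minutes=5))
```

Running the check when the write is configured, rather than when the first load job would be issued, surfaces the misconfiguration before any data is processed.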
@coveralls

Coverage Status

Changes Unknown when pulling f7e8f88 on master into DSL_SQL.

@coveralls

Coverage Status

Changes Unknown when pulling f7e8f88 on master into DSL_SQL.

@coveralls

Coverage Status

Changes Unknown when pulling 3a8b0b6 on master into DSL_SQL.

@coveralls

Coverage Status

Changes Unknown when pulling 724eda3 on master into DSL_SQL.

Colin Phipps and others added 2 commits August 16, 2017 10:53
The approach used is as described in
https://landing.google.com/sre/book/chapters/handling-overload.html#client-side-throttling-a7sYUg.
By backing off individual workers in response to high error rates, we relieve
pressure on the Datastore service, increasing the chance that the workload can
complete successfully. This matches the implementation in the Java SDK.
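
For readers unfamiliar with the technique, here is a minimal sketch of client-side adaptive throttling as the linked SRE book chapter describes it: each client tracks recent requests and accepts, and rejects locally with probability max(0, (requests - K * accepts) / (requests + 1)). The class name, method names, and default values below are assumptions made for illustration; this is not the code from the commit or from the Java SDK.

```python
import random
from collections import deque


class AdaptiveThrottler:
    """Illustrative client-side throttler (hypothetical names and defaults)."""

    def __init__(self, window_seconds=120.0, overload_ratio=2.0,
                 rng=random.random):
        self._window = window_seconds   # how far back history is kept
        self._k = overload_ratio        # the multiplier K in the formula
        self._rng = rng                 # injectable for deterministic tests
        self._events = deque()          # (timestamp, accepted) pairs
        self._requests = 0
        self._accepts = 0

    def _expire(self, now):
        # Drop events that have fallen out of the sliding window.
        while self._events and self._events[0][0] <= now - self._window:
            _, accepted = self._events.popleft()
            self._requests -= 1
            self._accepts -= accepted

    def reject_probability(self, now):
        self._expire(now)
        excess = self._requests - self._k * self._accepts
        return max(0.0, excess / (self._requests + 1.0))

    def should_throttle(self, now):
        # True means: skip the remote call and back off locally.
        return self._rng() < self.reject_probability(now)

    def record(self, now, accepted):
        # Record every attempt; locally throttled attempts count as
        # not accepted, as do attempts the backend rejected.
        self._expire(now)
        flag = 1 if accepted else 0
        self._events.append((now, flag))
        self._requests += 1
        self._accepts += flag
```

While the backend accepts most requests, the probability stays at zero and the client sends everything. Once the number of requests exceeds K times the number of accepts, a growing share of requests is dropped locally, which is the back-off behaviour the commit message refers to.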

mingmxu commented Aug 16, 2017

Closing for further tasks.

@mingmxu mingmxu closed this Aug 16, 2017