This repository was archived by the owner on Nov 11, 2022. It is now read-only.

Forward-integrate branch 'master' into 'v2' #518

Merged: davorbonaci merged 20 commits into v2 from master on Jan 3, 2017


@davorbonaci (Contributor)

No description provided.

dhalperi and others added 20 commits December 13, 2016 07:59
This is a spiritual backport of apache/beam#1060

Same checkstyle.xml changes; the fixes are very similar but were recreated because the code has diverged.
checkstyle: improve Javadoc checking
* Templated jobs should use custom IO for BigQuery

* Fix assignment
Backport apache/beam#1230 (#501)

Update PubsubUnboundedSink.java

Update PubsubUnboundedSource.java

Update PubsubIOTest.java

Update PubsubUnboundedSinkTest.java

Update PubsubUnboundedSourceTest.java

Fixups

Fixups

* Fixups
* Update BigQueryIO.java

* Update BigQueryIO.java

* Update BigQueryIO.java

* Update BigQueryIOTest.java

* Fixups

* Fixups

* Fixups

* Fixups
Typically, input file patterns are validated during Pipeline
construction, but standard Read transforms include an option to disable
validation. This is generally useful but can lead to cases where a
Pipeline executes successfully with empty inputs.

This changes the behavior to fail execution on empty file-based inputs
even when validation is disabled.

(cherry picked from commit 9fc9d66212bf26087622ad6e042db982d5232c55)
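The fail-on-empty behavior described above can be illustrated with a small, self-contained sketch. It is not the SDK's implementation: the class `EmptyInputCheck`, the helper `matchOrFail`, and the error message are hypothetical, and matching a glob against an in-memory list of names (via `java.nio.file.PathMatcher`) stands in for real file-pattern expansion. The point is only that an empty match is treated as an execution-time error rather than as a valid empty input.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class EmptyInputCheck {
  /**
   * Hypothetical helper: filters candidate file names by a glob pattern and
   * fails if nothing matches, instead of silently returning an empty input.
   */
  static List<String> matchOrFail(List<String> candidates, String glob) {
    PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
    List<String> matched = new ArrayList<>();
    for (String name : candidates) {
      if (matcher.matches(Paths.get(name))) {
        matched.add(name);
      }
    }
    if (matched.isEmpty()) {
      // Fail at execution time even if construction-time validation was skipped.
      throw new IllegalStateException("Unable to find any files matching " + glob);
    }
    return matched;
  }

  public static void main(String[] args) {
    List<String> files = List.of("part-0.txt", "part-1.txt", "notes.md");
    System.out.println(matchOrFail(files, "*.txt")); // prints [part-0.txt, part-1.txt]
    try {
      matchOrFail(files, "*.csv");
    } catch (IllegalStateException e) {
      System.out.println("Failed as expected: " + e.getMessage());
    }
  }
}
```

Failing loudly here trades a slightly stricter contract for catching misconfigured input paths before a pipeline "succeeds" on no data.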
Backport apache/beam#1327

Includes:

* Limit max memory for ExternalSorter and BufferedExternalSorter to 2047 MB to prevent int overflow within Hadoop's sorting library
* Fix int overflow for large memory values in InMemorySorter
* Add note about estimated disk use to README.MD
* Fix to make Hadoop's sorting library put all temp files under the specified directory
* Have Hadoop clean up the temp directory on exit
* Stop shading hadoop dependencies. Some context:
** The existing shading is broken (modules that depend on this one cannot use it successfully).
** Hadoop's use of reflection in several instances makes shading the dependency "in a good way" nearly impossible. It requires a couple of rather brittle hacks, and, for clients that depend on certain conflicting versions of Hadoop, these hacks can mean it doesn't meet its intended goal of preventing conflicts anyway.
** From what I can tell, there's no good way to shade this to make it universally usable, so leaving it unshaded seems like a reasonable default.
** Without shading Hadoop, this module can be successfully used from Beam's wordcount example (which already has pre-existing Hadoop dependencies).
* TopWikipediaSessions: remove outdated autoscaling language
* DataflowPipelineWorkerPoolOptions
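The 2047 MB cap on the sorter memory listed above follows from Java `int` arithmetic: 2048 MB is exactly 2^31 bytes, one more than `Integer.MAX_VALUE`. The sketch below shows that arithmetic under the assumption that a megabyte value is converted to a byte count held in an `int`; the class `SorterMemoryCap` and its methods are hypothetical and do not reproduce the extension's or Hadoop's actual code.

```java
public class SorterMemoryCap {
  // Assumption for illustration: memory is configured in MB and converted
  // to a byte count stored in a Java int, as the commit message implies.
  static final int MAX_MEMORY_MB = 2047;

  /** Converts MB to bytes with int arithmetic; silently wraps on overflow. */
  static int toBytesInt(int memoryMb) {
    return memoryMb * 1024 * 1024;
  }

  /** Rejects values whose byte count would not fit in an int. */
  static int checkedMemoryMb(int memoryMb) {
    if (memoryMb <= 0 || memoryMb > MAX_MEMORY_MB) {
      throw new IllegalArgumentException(
          "memoryMb must be in (0, " + MAX_MEMORY_MB + "], got " + memoryMb);
    }
    return memoryMb;
  }

  public static void main(String[] args) {
    System.out.println(toBytesInt(2047)); // 2146435072, below Integer.MAX_VALUE (2147483647)
    System.out.println(toBytesInt(2048)); // -2147483648: 2^31 wraps to Integer.MIN_VALUE
    // Widening to long before multiplying yields the true byte count.
    System.out.println(2048L * 1024 * 1024); // 2147483648
    try {
      checkedMemoryMb(2048);
    } catch (IllegalArgumentException e) {
      System.out.println("Rejected: " + e.getMessage());
    }
  }
}
```

Capping the input keeps an int-based consumer safe, while code that controls its own arithmetic (as the InMemorySorter fix suggests) can instead widen to `long`.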
Version management: prep for 1.9.0 release
@googlebot

So there's good news and bad news.

👍 The good news is that everyone who needs to sign a CLA (the pull request submitter and all commit authors) has done so. Everything is all good there.

😕 The bad news is that it appears that one or more commits were authored by someone other than the pull request submitter. We need to confirm that they're okay with their commits being contributed to this project. Please have them confirm that here in the pull request.

Note to project maintainer: This is a terminal state, meaning the cla/google commit status will not change from this state. It's up to you to confirm consent of the commit author(s) and merge this pull request when appropriate.

@davorbonaci merged commit d300bb3 into v2 on Jan 3, 2017

8 participants