[BEAM-7] Initial Dataflow code drop #1

Merged 1,575 commits on Feb 26, 2016
6bb5987
Fix javadoc @link warnings
peihe Jan 14, 2016
86fd527
Adding "DocInclude" metadata comments to the "game" example
devin-donnelly Jan 14, 2016
b298fe1
Adding worker ID to the upload id logging
laraschmidt Jan 14, 2016
79d9892
DataflowAssert: throw when .equals(Object) is called
dhalperi Jan 15, 2016
e821982
Generalize the 'game' example BigQuery write classes
amygdala Jan 15, 2016
848e6a0
Use .jar for staged directory packages
Jan 15, 2016
a904090
DefaultProjectFactory: make it use new gcloud properties files
dhalperi Jan 15, 2016
ddbeb97
Checkstyle: support disabling specific analyzers
dhalperi Jan 15, 2016
2fc26c5
Add README.md for the "game" example series
davorbonaci Jan 16, 2016
cabbfea
Stops holding initializationStateLock while opening the reader
jkff Jan 18, 2016
b2c2214
Upgrade JaCoCo to 0.7.5
andreich Jan 18, 2016
e9bb2a2
Fix typo: wrong table column name
amygdala Jan 19, 2016
608fd21
Fix typo in OutputTimeFn
kennknowles Jan 19, 2016
2713af1
Implement typeFromId(DatabindContext,String) within CoderUtils
lukecwik Jan 19, 2016
02acad4
Fixes a bug in custom unbounded readers
jkff Jan 21, 2016
4114f62
Adapt to be able to upload to maven-central
mrunesson Jan 24, 2016
1c224b6
Aditional info to simplify release.
mrunesson Jan 25, 2016
7fbb206
fix typo
hildrum Jan 27, 2016
096e4c2
Merge pull request #106 from hildrum/patch-1
dhalperi Jan 27, 2016
575b733
Fix review comments.
mrunesson Jan 30, 2016
52e7e0e
Merge pull request #104 from mrunesson/maven-central
davorbonaci Jan 31, 2016
acb83b7
Version management
lukecwik Jan 26, 2016
96ec971
BigQueryTableRowIterator: elide columns with null values
dhalperi Jan 19, 2016
0ed16ff
Rollback "BigQueryTableRowIterator: elide columns with null values"
dhalperi Jan 19, 2016
7d82215
Deterministically choose freshest aggregations in pipeline results
robertwb Jan 20, 2016
73cee98
Split streaming status pages into servlets
Jan 20, 2016
6db9723
Expose dependent realtime watermark via Windmill protos
kennknowles Jan 20, 2016
219d22a
Add ByteStringCoder, a coder for ByteStrings
dhalperi Jan 20, 2016
1e5524a
CustomSources: remove dead code
dhalperi Jan 20, 2016
7509f87
Ignore Eclipse project files in root .gitignore
dhalperi Jan 20, 2016
c144180
BigQueryTableRowIterator: elide columns with null values
dhalperi Jan 20, 2016
d87e2e2
Split out StatusDataProviders
Jan 20, 2016
7248ecf
Proto changes for multi-worker support.
dpmills Jan 20, 2016
cb11062
Use the standard set of status pages for Batch
Jan 20, 2016
46a1ece
Add FinishedTriggers abstraction with BitSet and Set implementations
kennknowles Jan 20, 2016
a46d3bd
CustomSources: add logs and normalize log levels
dhalperi Jan 21, 2016
a923d77
SingletonAssert: add a notEqualTo matcher
sammcveety Jan 21, 2016
6e5d743
Implement Counter#merge
tgroh Jan 21, 2016
054f1e6
Optimize mergeAccumulators by reusing an existing accumulator
peihe Jan 21, 2016
baa8e2f
Handle when Dataflow service tells us that values are sorted in GBK
lukecwik Jan 21, 2016
c41d154
Continues unifying ReaderIterator and Source.Reader
jkff Jan 21, 2016
169e340
Fix Coder.Context equality and hashCode
lukecwik Jan 22, 2016
1c47353
Update prebuilt proto libraries for Dataflow to 2016-01-20 version
kennknowles Jan 22, 2016
b5b8d95
Upgrade Jackson dependency from 2.4.5 to 2.7.0
lukecwik Jan 22, 2016
61a2e27
Expose event time and synchronized upstream processing time
kennknowles Jan 22, 2016
cfda3ff
Update Triggers to new shouldFire() based semantics
kennknowles Jan 22, 2016
4fa7bd3
StreamingWriteFn: check if table exists before creating
dhalperi Jan 22, 2016
25b4e87
Let shuffle reader and writer update counters.
mwegiel Jan 23, 2016
8929f3e
Provide a reasonable default to Window.Bound#apply
tgroh Jan 23, 2016
4ed7f13
Allow 'game' examples to append to existing BigQuery tables.
jeffgardner Jan 25, 2016
ec5dfe8
Create StreamingGroupAlsoByWindowsDoFnRunner
peihe Jan 25, 2016
e468cad
DatastoreIO: do not split when QuerySplitter fails
dhalperi Jan 26, 2016
50f98a7
BigQueryTableInserter: retry rateLimitExceeded API calls
dhalperi Jan 26, 2016
ab733b4
AvroCoder: more efficient use of Avro APIs
dhalperi Jan 26, 2016
46a8082
Modify GameStats window definitions for blog post
amygdala Jan 26, 2016
186a7ff
Add new property names
lukecwik Jan 26, 2016
ede56ef
Add increment support with positive infinity
lukecwik Jan 27, 2016
ec6a695
BigQueryTableRowIterator: handle query failures
dhalperi Jan 27, 2016
e006292
Add CounterSet#merge(CounterSet)
tgroh Jan 28, 2016
b08ec20
Allow unflattened results from a BigQuery query-based export
sammcveety Jan 28, 2016
97cf4f2
Clean up GameStats session branch for blog post
amygdala Jan 28, 2016
76b8689
Account for per-window overhead in WindmillStateCache
dpmills Jan 28, 2016
e64446c
Add support to configure the shuffle client library with a property.
Jan 28, 2016
9b0b395
Create worker maven module
lukecwik Jan 29, 2016
ce8c03f
Add explicit record type for Ism files
lukecwik Jan 29, 2016
8b4c34e
Upgrade protobuf runtime to version 3.0.0-beta-1
dhalperi Jan 29, 2016
61853d1
Refactor LateDataDroppingDoFnRunner
tgroh Jan 29, 2016
f7feb71
BigQueryIO: remove DirectRunner dependence on Dataflow native reader
dhalperi Jan 31, 2016
0adaa91
runners.CountingSource: rename to TestCountingSource
dhalperi Feb 1, 2016
ef71d47
Validate DoFn.createAggregate is not called during pipeline processing
swegner Feb 1, 2016
12be5db
DataflowPipelineRunner: retry source splitting when too many bundles
dhalperi Feb 1, 2016
da4f771
DataflowWorker: elide some for loops for unused log levels
dhalperi Feb 1, 2016
4638244
BigQueryReader: simplify and remove some redundant code
dhalperi Feb 1, 2016
ea0e7bb
Create internal-only classifier for Dataflow SDK
lukecwik Feb 2, 2016
fc98f52
Change BQ mode to append instead of truncate
ravwojdyla Feb 4, 2016
47178ec
Disable surefire forking in Travis-CI
dhalperi Feb 10, 2016
793a652
Merge pull request #110 from dhalperi/pr108-take2
davorbonaci Feb 10, 2016
0166acc
Move Dataflow worker mains to worker maven module
lukecwik Feb 2, 2016
e852da2
Removes dependency on LegacyReaderIterator from more readers
jkff Feb 2, 2016
579c8f3
Add support to export BigQuery files in Avro format
sammcveety Feb 2, 2016
ed87d95
Move some status pages to worker module
lukecwik Feb 2, 2016
748ada7
CombineWithContext support in batch PartialGroupByKeyOperation
peihe Feb 2, 2016
e595281
Fix TimeTrigger#isCompatible
kennknowles Feb 2, 2016
4942eef
Move StructuralByteArray to the coder package
peihe Feb 2, 2016
5f06ee5
AvroUtils: default missing field schema mode to NULLABLE, and test
dhalperi Feb 3, 2016
d476d5b
Add ByteCount for BigQueryIO streaming write
tucosh Feb 3, 2016
4dce3c2
Default to exporting BigQuery files in Avro format
sammcveety Feb 3, 2016
ad60609
Add InProcessPipelineRunner, Evaluation Interfaces
tgroh Feb 4, 2016
d0e28bc
Upgrade maven shade plugin to avoid infinite loop
lukecwik Feb 4, 2016
460dc20
Add InMemoryWatermarkManager
tgroh Feb 4, 2016
b88c2c4
Have worker maven module depend on unshaded/unrelocated test jar
lukecwik Feb 4, 2016
396e0cc
Separate CustomSources into worker and non-worker pieces
kennknowles Feb 4, 2016
cb11e9d
Most methods in com.google.common will soon have @CheckReturnValue ap…
kluever Feb 4, 2016
7d7a49a
Move ReaderFactory.Registry to ReaderRegistry
kennknowles Feb 4, 2016
1ddc6fd
Add Ism file format version 2
lukecwik Feb 4, 2016
bee2bb3
Always explicitly pass ReaderRegistry to MapTaskExecutorFactory
kennknowles Feb 4, 2016
97b9ed3
Move worker tests to worker maven module
lukecwik Feb 4, 2016
2db0f78
CustomSourcesTest: use a smaller byte limit to reduce test runtime
dhalperi Feb 4, 2016
5addd1b
Explicitly pass ReaderFactory from worker classes to SideInputUtils
kennknowles Feb 4, 2016
be3326c
Move MapTaskExecutorFactory and system DoFns and their factories
lukecwik Feb 5, 2016
7794247
Move several operations and dependencies to worker module
lukecwik Feb 5, 2016
085469d
Split InMemoryBundle into Read/Write only interfaces
tgroh Feb 5, 2016
4c1e911
ApiSurfaceTest: whitelist com.google.auth package
dhalperi Feb 5, 2016
f41716a
Make SinkFactory a real class; remove some reflection
kennknowles Feb 5, 2016
91a418f
Marks getAllowedTimestampSkew as deprecated.
robertwb Feb 5, 2016
6100a17
Add the transforms required to create an Ism side input
lukecwik Feb 5, 2016
867afdd
Don't wrap system DoFn exceptions.
swegner Feb 5, 2016
abcbdf4
Move BatchModeExecutionContext and dependents to worker module
kennknowles Feb 6, 2016
64eefa3
Handle SynchronizedProcessingTime in InMemoryWatermarkManager
tgroh Feb 6, 2016
1452daa
Move streaming worker code into worker module
kennknowles Feb 6, 2016
c27412e
Move Windmill-specific code and dependents to worker module
kennknowles Feb 7, 2016
fba2172
Move Reader classes and dependents to worker module
kennknowles Feb 7, 2016
9be2b2d
Move over IO sink factories and dependants
lukecwik Feb 8, 2016
7209750
Render the /threadz page using plain text.
Feb 8, 2016
d6e5cc9
Move remainder of status servlets
lukecwik Feb 8, 2016
9894b42
Decouple StateContext from ReduceFn
kennknowles Feb 8, 2016
314e8dd
Allows specifying if an OffsetBasedSource supports dynamic splitting
jkff Feb 8, 2016
12dd58c
Create a top level GroupingTable interface and its factory class
peihe Feb 8, 2016
a2d8cd5
Experimental Cloud Bigtable sink
dhalperi Feb 8, 2016
a1be256
Tighten BigtableIO's API surface exposure
dhalperi Feb 9, 2016
f087da6
Separate Timers interface from ReduceFn
kennknowles Feb 9, 2016
546ee4f
Merge TimeTrigger into its only subclass
kennknowles Feb 9, 2016
d58d7da
ApiSurface: add debug messages
dhalperi Feb 9, 2016
5ae27f0
Migrate AvroIO to internally use Read.from(AvroSource)
lukecwik Feb 9, 2016
38702c3
Upgrade to SLF4J 1.7.14 so that we can use slf4j-api MDC
lukecwik Feb 9, 2016
c71eb08
Eagerly merge all window state
Feb 9, 2016
26f5dc4
Add the Ism side input reader
lukecwik Feb 9, 2016
c36a2e9
Move over various worker classes to worker maven module
lukecwik Feb 9, 2016
1067af6
Move over several shuffle classes to the worker maven module
lukecwik Feb 10, 2016
478b1de
Add a DoFnRunner for StreamingGroupAlsoByWindows with side inputs
peihe Feb 10, 2016
f833d2e
Handle Timers in InMemoryWatermarkManager
tgroh Feb 10, 2016
62e516b
Move over worker logging classes to worker maven module
lukecwik Feb 10, 2016
38b5bc0
Move worker implementation classes of aggregators to worker maven module
lukecwik Feb 10, 2016
39422b9
Remove Base64Utils which is dead code
lukecwik Feb 10, 2016
a0e2715
Move over several Dataflow worker specific classes to the worker mave…
lukecwik Feb 11, 2016
a6c3e40
ByteKey: a key represented byte[]
dhalperi Feb 11, 2016
35dbb0c
Remove duplicate definition of MetricUpdate utilities
lukecwik Feb 11, 2016
b494fef
Enable new Ism side input format by default for batch Dataflow jobs
lukecwik Feb 11, 2016
3f77c39
Use shaded guava from within worker maven module
lukecwik Feb 12, 2016
3c57e54
ByteKey: add a static instance EMPTY for the empty key
dhalperi Feb 12, 2016
0e204fe
Move miscellaneous worker classes to worker maven module
lukecwik Feb 12, 2016
aaf27d3
Allow StreamingDataflowWorker to run against a remote gRPC windmill
dpmills Feb 12, 2016
2d88e6f
Move CustomSources logic to live in more appropriate places
lukecwik Feb 12, 2016
f0198dd
Add filename-based compression selection to CompressedSource
kennknowles Feb 12, 2016
dbcc0af
Make internal state APIs explicitly keyed
kennknowles Feb 12, 2016
6edc6e0
Add KeyedCombiningValueStateTag containing a KeyedCombineFn
kennknowles Feb 12, 2016
da1a5bf
Explicitly select GroupAlsoByWindow in default GBK
tgroh Feb 13, 2016
7a98381
Add ByteKeyRange and ByteKeyRangeTracker
dhalperi Feb 13, 2016
ced1724
Make some ReduceFnRunner implementation details private
kennknowles Feb 15, 2016
f108866
Treat ReduceFn and Trigger exceptions as system exceptions
kennknowles Feb 16, 2016
841e2a2
Decouple NonEmptyPanes from ReduceFn somewhat
kennknowles Feb 16, 2016
857ee11
Decouple TriggerRunner from ReduceFn
kennknowles Feb 16, 2016
897d744
Support reading uncompressed files when gzip is expected
kennknowles Feb 16, 2016
290d69f
Explicitly provide StateInternals to ReduceFnRunner
kennknowles Feb 16, 2016
0467210
Fix typo in TimerInternals comment.
charlesccychen Feb 16, 2016
18d07e3
Fix propagation of streaming keys
kennknowles Feb 16, 2016
72521c7
Do not assume channel is nonempty in TextIO gzip detection
kennknowles Feb 16, 2016
5f5c26f
Make WatermarkHold package-private
kennknowles Feb 16, 2016
20dd707
Minor revision of ReduceFnRunner docs
kennknowles Feb 16, 2016
6e902d7
Minor revision of ActiveWindowSet javadoc
kennknowles Feb 16, 2016
7f354a1
Make empty flatten work in streaming
dpmills Feb 16, 2016
0ebc999
Minor revision of TriggerRunner javadoc
kennknowles Feb 16, 2016
50721f5
Improve IntraBundleParallelizationTest
tgroh Feb 16, 2016
c1dc715
Move specialized GroupAlsoByWindow implementations
tgroh Feb 16, 2016
c1a2e1f
Declare SideInputs on Combine.GroupedValues
tgroh Feb 17, 2016
75843ea
Update Dataflow API version to v1b3-rev19-1.21.0
tgroh Feb 17, 2016
e305154
Add InProcess Clock abstraction, nanoTime implementation
tgroh Feb 17, 2016
aa7f07f
Improve TypeDescriptor inference of SimpleFunction
Feb 17, 2016
3f24ca3
Fix AfterSynchronizedProcessingTime.java
dpmills Feb 17, 2016
d30c981
Remove debug statement from test
lukecwik Feb 17, 2016
54999c5
Make sure every hold is accompanied by a proximate timer
Feb 17, 2016
36732d4
Add support for splitting a compressed source for uncompressed files
lukecwik Feb 17, 2016
059ca33
Add --worker_harness_container_image PipelineOption
davorbonaci Feb 17, 2016
b31d7c6
Use new keyed state API to remove ReduceFn.Factory
kennknowles Feb 17, 2016
cc0db49
Implement CopyOnAccessInMemoryStateInternals
tgroh Feb 17, 2016
41c04e3
Handle null input bundles in InMemoryWatermarkManager
tgroh Feb 17, 2016
b7505c5
Remove unneccessary type from GroupAlsoByWindowViaWindowSet
tgroh Feb 17, 2016
1415479
Never drop late data in Reshuffle
dpmills Feb 18, 2016
0ad8ed1
BigtableIO: add a bounded source for Google Cloud Bigtable
dhalperi Feb 18, 2016
09d755d
ApproximateUnique[Test]: improve efficiency and cleanup test
dhalperi Feb 18, 2016
ede5bac
Refactor to use try-with-resources
swegner Feb 18, 2016
72e436f
StateSamplerTest: fewer repeats per reused counter name
dhalperi Feb 18, 2016
5708945
Add InProcessSideInputContainer
tgroh Feb 18, 2016
72f8387
Update FileIOChannelFactory.expand() to only return file resources
lukecwik Feb 18, 2016
9877acc
Add validation for single objects in GcsUtil.expand
lukecwik Feb 19, 2016
47eec4a
Rename StateContext to StateAccessor
peihe Feb 19, 2016
c6382a1
ByteKeyRange: parameterize expensive combinatorial tests
dhalperi Feb 19, 2016
4533b90
Migrate TextIO.Read to be a custom source
lukecwik Feb 19, 2016
966b8f0
Make earliest Watermark State available in CopyOnAccessState
tgroh Feb 19, 2016
af9b77c
Add ForwardingPTransform
tgroh Feb 19, 2016
2f627c3
Move test for post-GBK timestamp ordering to worker module
kennknowles Feb 19, 2016
ade6ae8
Move DataflowWorkerLoggingMDC to the worker module
kennknowles Feb 19, 2016
e0d7d0c
Rolling back since this breaks tests
lukecwik Feb 19, 2016
ba55083
Clear empty CopyOnAccessInMemoryStateInternals
tgroh Feb 19, 2016
9005cf5
Add Timers and State to InProcessTransformResult
tgroh Feb 19, 2016
f801524
Migrate TextIO.Read to be a custom source
lukecwik Feb 19, 2016
d9c4b1e
Remove InProcessPipelineRunner evaluator package
tgroh Feb 19, 2016
b968d0a
Proto2Coder: fix static modifier check
dhalperi Feb 19, 2016
bc857b8
Proto2Coder: minor cleanup
dhalperi Feb 20, 2016
8461818
Add InProcessTimerInternals
tgroh Feb 20, 2016
234d5eb
Move VerifyDynamicWorkRebalancing and dependents to worker module
kennknowles Feb 20, 2016
015e1af
Allow TransformEvaluatorFactory#forApplication to throw
tgroh Feb 20, 2016
0e77510
Add TypeDescriptors to Primitive PTransforms in Tests
tgroh Feb 20, 2016
d58b7db
Remove native Dataflow text reader
lukecwik Feb 22, 2016
209364e
Encode elements in InProcessCreate
tgroh Feb 22, 2016
c857afa
Expose base output file name on FileBasedSink
lukecwik Feb 22, 2016
7ff52a0
Move over Google Cloud Dataflow worker utilities to worker module
lukecwik Feb 22, 2016
2e171b5
Add new test proto messages that use map fields
dhalperi Feb 22, 2016
c11af5f
Migrate AvroIO.Write to a custom sink
lukecwik Feb 23, 2016
1d0c6d0
Move Google Cloud Dataflow worker utilities to worker module
lukecwik Feb 23, 2016
7871cbb
Port state to new, future-free API
kennknowles Feb 23, 2016
24288e7
Move some worker-and-example-only dependencies out of sdk
kennknowles Feb 23, 2016
c0a814b
Change visibility of FileBasedSource subclass methods and fix return …
lukecwik Feb 23, 2016
13a042a
Make TestPipeline slightly less DataflowPipelineRunner-centric
kennknowles Feb 23, 2016
6b372ec
Add used-but-undeclared dependency on google-http-client
kennknowles Feb 23, 2016
d7b5189
Migrate TextIO.Write to a custom sink
lukecwik Feb 23, 2016
9f546ef
Move Google Cloud Dataflow native sinks to worker module
lukecwik Feb 23, 2016
51068d1
Reverts "Move Google Cloud Dataflow native sinks to worker module"
sammcveety Feb 23, 2016
45f5951
Add the slf4j-jdk bridge to the integration tests
Feb 23, 2016
3904c90
Revert "Migrate TextIO.Write to a custom sink"
sammcveety Feb 23, 2016
635541a
Add KeyedWorkItemCoder
tgroh Feb 23, 2016
2e89a4b
Revert "Migrate AvroIO.Write to a custom sink"
sammcveety Feb 23, 2016
01a0da0
Resubmit "Migrate AvroIO.Write to a custom sink"
dhalperi Feb 24, 2016
96b02f4
Add GroupByKey InProcess override
tgroh Feb 24, 2016
6926d8e
Proto2Coder: recompute the extension registry when mutated
dhalperi Feb 24, 2016
6613031
Support CombineFnWithContext in GroupAlsoByWindows
peihe Feb 24, 2016
87b28e7
Add InProcess Override for CreatePCollectionView
tgroh Feb 24, 2016
db708bb
Update maven-dependency-plugin to latest
kennknowles Feb 24, 2016
01fd859
ProtoCoder: a Coder for Protocol Buffers Messages
dhalperi Feb 24, 2016
f7fc939
Add used-but-undeclared findbugs JSR305 dependencies
kennknowles Feb 24, 2016
d15d924
Handle PCollectionList.empty() in FlattenEvaluatorFactory
tgroh Feb 24, 2016
1cc0211
Set worker harness container image to INVALID until next release
herohde Feb 24, 2016
045e343
Handle Undeclared Side Outputs in ParDoInProcessEvaluator
tgroh Feb 24, 2016
639e9d9
Rollback revert "Migrate TextIO.Write to a custom sink"
lukecwik Feb 24, 2016
ca98da2
Update Timers and State in the InProcess ParDoEvaluator
tgroh Feb 25, 2016
7b28d23
Rollback reverts "Move Google Cloud Dataflow native sinks to worker m…
lukecwik Feb 25, 2016
8b5257f
Use a static variable for CoderCalled in WriteTest
tgroh Feb 25, 2016
510a55d
Honor user requested shard limits for AvroIO.Write on DirectPipelineR…
lukecwik Feb 25, 2016
6c71040
Adjust dependencies to avoid pulling in unneeded stax-api
kennknowles Feb 25, 2016
3eb3092
Handle multiple requests in InProcess Read Primitives
tgroh Feb 25, 2016
06c8911
Finish Flattenning InProcess package
tgroh Feb 25, 2016
fba9147
Ensure a TypedPValue has a Coder on finishSpecifying
tgroh Feb 25, 2016
3111646
Switch to the start state when lazily initializing
Feb 25, 2016
89e6241
Fix SDK deps and enable strict enforcement
kennknowles Feb 25, 2016
c290b5e
Fix worker dependencies and turn on strict checking
kennknowles Feb 25, 2016
d4dcaaa
Update worker harness container image
davorbonaci Feb 26, 2016
41e5cc9
Revert "Add a first README.md file (at least to trigger the github mi…
francesperry Feb 26, 2016
394390f
Dataflow code drop!
francesperry Feb 26, 2016
3623a23
Update README for initial code drop.
francesperry Feb 26, 2016
2efe761
Remove Google-specific contribution rules
francesperry Feb 26, 2016
26 changes: 26 additions & 0 deletions .gitattributes
@@ -0,0 +1,26 @@
# The default behavior, which overrides 'core.autocrlf', is to use Git's
# built-in heuristics to determine whether a particular file is text or binary.
# Text files are automatically normalized to the user's platforms.
* text=auto

# Explicitly declare text files that should always be normalized and converted
# to native line endings.
.gitattributes text
.gitignore text
LICENSE text
*.avsc text
*.html text
*.java text
*.md text
*.properties text
*.proto text
*.py text
*.sh text
*.xml text
*.yml text

# Declare files that will always have CRLF line endings on checkout.
# *.sln text eol=crlf

# Explicitly denote all files that are truly binary and should not be modified.
# *.jpg binary
16 changes: 16 additions & 0 deletions .gitignore
@@ -0,0 +1,16 @@
target/

# Ignore IntelliJ files.
.idea/
*.iml
*.ipr
*.iws

# Ignore Eclipse files.
.classpath
.project
.settings/

# The build process generates the dependency-reduced POM, but it shouldn't be
# committed.
dependency-reduced-pom.xml
35 changes: 35 additions & 0 deletions .travis.yml
@@ -0,0 +1,35 @@
language: java

sudo: false

notifications:
  email:
    recipients:
      - dataflow-sdk-build-notifications+travis@google.com
    on_success: change
    on_failure: always

matrix:
  include:
    # On OSX, run with default JDK only.
    - os: osx
      env: MAVEN_OVERRIDE=""
    # On Linux, run with specific JDKs only.
    - os: linux
      env: CUSTOM_JDK="oraclejdk8" MAVEN_OVERRIDE="-DforkCount=0"
    - os: linux
      env: CUSTOM_JDK="oraclejdk7" MAVEN_OVERRIDE="-DforkCount=0"
    - os: linux
      env: CUSTOM_JDK="openjdk7" MAVEN_OVERRIDE="-DforkCount=0"

before_install:
  - if [ "$TRAVIS_OS_NAME" == "osx" ]; then export JAVA_HOME=$(/usr/libexec/java_home); fi
  - if [ "$TRAVIS_OS_NAME" == "linux" ]; then jdk_switcher use "$CUSTOM_JDK"; fi

install:
  - travis_retry mvn install clean -U -DskipTests=true

script:
  - travis_retry mvn versions:set -DnewVersion=manual_build
  - travis_retry mvn $MAVEN_OVERRIDE install -U
  - travis_retry travis/test_wordcount.sh
202 changes: 202 additions & 0 deletions LICENSE
@@ -0,0 +1,202 @@

Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/

TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION

1. Definitions.

"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.

"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.

"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.

"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.

"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.

"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.

"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).

"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.

"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."

"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.

2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.

3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.

4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:

(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and

(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and

(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and

(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.

You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.

5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.

6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.

7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.

8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.

9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.

END OF TERMS AND CONDITIONS

APPENDIX: How to apply the Apache License to your work.

To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.

Copyright [yyyy] [name of copyright owner]

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.