[BEAM-7] Initial Dataflow code drop #1

Merged
merged 1,575 commits into from Feb 26, 2016

@francesperry francesperry commented Feb 26, 2016

Initial contribution of the Google Cloud Dataflow Java SDK to Apache Beam.

Caveat: There is still a lot to do before this becomes usable as Apache Beam. In particular:

  • Reorganize directories.
  • Incorporate additional drops by Google, Cloudera, and dataArtisans.
  • Make major backwards incompatible API changes.
  • Rename from Dataflow to Beam.

Beaming with joy ;-D

peihe and others added 30 commits Jan 15, 2016
----Release Notes----
[]
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=112105439
----Release Notes----
[]
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=112118850
----Release Notes----
[]
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=112173826
Users should not need to compare DataflowAssert objects using Java equality.
Doing so is nearly always a sign of a broken test that silently checks nothing.

Throw an UnsupportedOperationException instead, and direct users to
isEqualTo (Singleton) or containsInAnyOrder (Iterable).

This change caught a broken test.
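
A minimal sketch of the idea, not the SDK's actual code: SketchAssert and its boolean-returning containsInAnyOrder are hypothetical names invented for illustration. It only shows how overriding equals to throw turns an accidental equality comparison into an immediate, visible failure.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for an assertion builder; this is not the SDK's DataflowAssert. */
final class SketchAssert<T> {
  private final List<T> actual;

  private SketchAssert(List<T> actual) {
    this.actual = new ArrayList<>(actual);
  }

  static <T> SketchAssert<T> that(List<T> actual) {
    return new SketchAssert<>(actual);
  }

  /** Order-insensitive comparison (returns a boolean here only to keep the sketch small). */
  boolean containsInAnyOrder(List<T> expected) {
    List<T> remaining = new ArrayList<>(actual);
    for (T element : expected) {
      if (!remaining.remove(element)) {
        return false; // an expected element is missing from the actual contents
      }
    }
    return remaining.isEmpty(); // true only if no unexpected extras are left over
  }

  /** Java equality is never the intended check, so fail loudly instead of returning false. */
  @Override
  public boolean equals(Object other) {
    throw new UnsupportedOperationException(
        "equals is not supported; use isEqualTo or containsInAnyOrder instead");
  }
}
```

With this shape, a test that mistakenly calls equals stops with an exception rather than discarding a boolean result and passing.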

----Release Notes----
[]
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=112200184
Generalize the 'game' example BigQuery write classes to take a map that specifies how
to generate the output fields.

----Release Notes----

[]
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=112253306
Some tools don't support .zip in the class path.

----Release Notes----

[]
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=112261905
gcloud moved where it stores the credentials configured on the command line.
Since there is still no support in standard libraries to get the default
project, update DefaultProjectFactory to support the new location.

Note that users who have not upgraded gcloud are still supported.

----Release Notes----
The DataflowPipelineRunner will now prefer the default project configuration
produced by newer versions of the gcloud utility. Users with old gcloud clients
are still supported.
[]
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=112281533
From: http://stackoverflow.com/a/4023351/1715495

----Release Notes----
[]
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=112287922
Fix Javadoc issue in HourlyTeamScore pipeline.

----Release Notes----
[]
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=112311676
initializationStateLock should be held for short, bounded amounts of time,
because it is acquired on the dynamic work rebalancing code path
(requestDynamicSplit) which must be effectively non-blocking.
NativeReader.iterator() can do I/O and thus can take an unbounded amount
of time, so it shouldn't be done under the lock.
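
The pattern described above can be sketched roughly as follows. LazyReaderState, its Supplier-based factory, and isInitialized are hypothetical names for illustration and do not reflect the worker's real classes; only the lock name is taken from the message.

```java
import java.util.Iterator;
import java.util.function.Supplier;

/** Hypothetical holder for reader state; illustrative only. */
final class LazyReaderState<T> {
  private final Object initializationStateLock = new Object();
  private final Supplier<Iterator<T>> slowIteratorFactory;
  private Iterator<T> iterator; // guarded by initializationStateLock

  LazyReaderState(Supplier<Iterator<T>> slowIteratorFactory) {
    this.slowIteratorFactory = slowIteratorFactory;
  }

  /** Called by the reading thread. The factory may block on I/O, so it runs with no lock held. */
  void initialize() {
    Iterator<T> created = slowIteratorFactory.get(); // potentially slow, outside the lock
    synchronized (initializationStateLock) {
      iterator = created; // short, bounded critical section: a single assignment
    }
  }

  /** Called from a latency-sensitive path; it only ever waits for the short critical section. */
  boolean isInitialized() {
    synchronized (initializationStateLock) {
      return iterator != null;
    }
  }
}
```

The slow call happens before the lock is taken, so a concurrent caller of isInitialized never waits on I/O.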
----Release Notes----
[]
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=112375806
----Release Notes----
[]
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=112415033
----Release Notes----

[]
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=112466110
----Release Notes----

[]
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=112480742
This resolves the user issue on SO:
http://stackoverflow.com/questions/34780459/runtimeexception-from-cloud-dataflow-related-to-serializing-coder
Since Jackson 2.3, TypeIdResolvers have been expected to implement
this method, because typeFromId(String) was deprecated.
Newer versions of Jackson enforce this.

----Release Notes----

[]
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=112487029
Custom unbounded readers are read in bundles of at most
10k elements or 10 seconds. A recent change accidentally removed
the 10k element limit. This change reintroduces it and
adds a test.

The previous test also was passing vacuously because
the iteration limit was incorrect (it would always
have only one iteration).
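
A rough sketch of a read loop bounded by both limits, under stated assumptions: BundleReader and readBundle are hypothetical names, and a plain Iterator stands in for a custom unbounded reader. The constants mirror the 10k element and 10 second figures in the message.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper; not the worker's actual reader code. */
final class BundleReader {
  static final int MAX_ELEMENTS = 10_000;
  static final long MAX_MILLIS = 10_000;

  /** Reads until the source is exhausted, the element cap is hit, or the deadline passes. */
  static <T> List<T> readBundle(Iterator<T> source, int maxElements, long maxMillis) {
    List<T> bundle = new ArrayList<>();
    long deadlineNanos = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(maxMillis);
    while (bundle.size() < maxElements               // element limit
        && System.nanoTime() - deadlineNanos < 0     // time limit
        && source.hasNext()) {
      bundle.add(source.next());
    }
    return bundle;
  }
}
```

A test that feeds more than MAX_ELEMENTS elements and checks that several bundles come back exercises the element limit, which a single-iteration test cannot do.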
----Release Notes----
[]
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=112723469
Adapt join-library module to be able to upload to maven-central
Updating version numbers from 1.4.0-SNAPSHOT to 1.5.0-SNAPSHOT

----Release Notes----

[]
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=113022038
As in 6a11a72, this makes BigQueryIO.Read work in the
DirectPipelineRunner as it does in the DataflowPipelineRunner.

----Release Notes----
[]
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=112496161
----Release Notes----
[]
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=112515243
----Release Notes----
[]
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=112529131
Also updates /heapz so that it downloads the heapdump rather than just
telling you where on the worker it is.

----Release Notes----

[]
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=112535088
----Release Notes----
[]
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=112546981
This is a deterministic coder for ByteString. In the
wholeStream context, it simply writes the string. Otherwise,
it writes the string delimited with its length (encoded as a
VarInt).
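
To illustrate the two encodings, here is a small sketch that uses a plain byte[] in place of ByteString. LengthPrefixedBytes and its methods are hypothetical names, and the varint layout shown (base 128, 7 bits per byte, low-order group first) is an assumption about the format rather than a copy of the SDK's VarInt class.

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical encoder; illustrative only. */
final class LengthPrefixedBytes {
  /** Writes a non-negative int as a base-128 varint: high bit set on every byte but the last. */
  static void writeVarInt(int value, ByteArrayOutputStream out) {
    do {
      int bits = value & 0x7F;
      value >>>= 7;
      out.write(value != 0 ? (bits | 0x80) : bits);
    } while (value != 0);
  }

  /** In the whole-stream context the payload is written raw; otherwise it is length-delimited. */
  static byte[] encode(byte[] payload, boolean wholeStream) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    if (!wholeStream) {
      writeVarInt(payload.length, out); // the prefix lets a decoder find the end of the value
    }
    out.write(payload, 0, payload.length);
    return out.toByteArray();
  }
}
```

The prefix is only needed when more data may follow the value in the same stream; when the value occupies the rest of the stream, its end is already known.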

----Release Notes----
[]
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=112586805
----Release Notes----
[]
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=112587034
Users who check out and edit the SDK in Eclipse should
use m2e's Eclipse import wizard, and should not want to
commit their actual project configurations.

----Release Notes----
[]
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=112597945
charlesccychen pushed a commit to cosmoskitten/beam that referenced this issue Aug 19, 2019
It turned out that the 2 additional tests, now deleted, tested nothing
relevant. The Combine test maps values of different sizes to a Long value.
The deleted tests differed only in initial size, so in practice they were
testing the same thing as test apache#1.
VrishaliShah pushed a commit to VrishaliShah/beam that referenced this issue Feb 5, 2020
steveniemitz referenced this issue in twitter-forks/beam May 27, 2020
Co-authored-by: steve <sniemitz@twitter.com>
Co-authored-by: Kanishk Karanawat <kkaranawat@twitter.com>
steveniemitz referenced this issue in twitter-forks/beam May 28, 2020
Co-authored-by: steve <sniemitz@twitter.com>
Co-authored-by: Kanishk Karanawat <kkaranawat@twitter.com>
steveniemitz referenced this issue in twitter-forks/beam May 28, 2020
Co-authored-by: steve <sniemitz@twitter.com>
Co-authored-by: Kanishk Karanawat <kkaranawat@twitter.com>
steveniemitz referenced this issue in twitter-forks/beam May 29, 2020
Co-authored-by: steve <sniemitz@twitter.com>
Co-authored-by: Kanishk Karanawat <kkaranawat@twitter.com>
steveniemitz referenced this issue in twitter-forks/beam May 29, 2020
Co-authored-by: steve <sniemitz@twitter.com>
Co-authored-by: Kanishk Karanawat <kkaranawat@twitter.com>
steveniemitz referenced this issue in twitter-forks/beam Jun 10, 2020
Co-authored-by: steve <sniemitz@twitter.com>
Co-authored-by: Kanishk Karanawat <kkaranawat@twitter.com>
steveniemitz referenced this issue in twitter-forks/beam Jul 1, 2020
Co-authored-by: steve <sniemitz@twitter.com>
Co-authored-by: Kanishk Karanawat <kkaranawat@twitter.com>
saavannanavati added a commit to saavannanavati/beam that referenced this issue Aug 21, 2020
udim pushed a commit that referenced this issue Aug 25, 2020
…odule of the Python SDK (#12657)

* Add myself to authors

* Add blog post #1: improved annotation support

* Add draft of blog post #2: performance runtime type checking

* Finish blog post #2

* Remove white space

* Resolve PR comments

Co-authored-by: Saavan Nanavati <saavan.nanavati@utexas.edu>
steveniemitz referenced this issue in twitter-forks/beam Sep 16, 2020
Co-authored-by: steve <sniemitz@twitter.com>
Co-authored-by: Kanishk Karanawat <kkaranawat@twitter.com>
steveniemitz referenced this issue in twitter-forks/beam Sep 18, 2020
Co-authored-by: steve <sniemitz@twitter.com>
Co-authored-by: Kanishk Karanawat <kkaranawat@twitter.com>
steveniemitz referenced this issue in twitter-forks/beam Sep 18, 2020
Co-authored-by: steve <sniemitz@twitter.com>
Co-authored-by: Kanishk Karanawat <kkaranawat@twitter.com>
ibzib added a commit to ibzib/beam that referenced this issue Sep 30, 2020
…odule of the Python SDK (apache#12657)

* Add myself to authors

* Add blog post apache#1: improved annotation support

* Add draft of blog post apache#2: performance runtime type checking

* Finish blog post apache#2

* Remove white space

* Resolve PR comments

Co-authored-by: Saavan Nanavati <saavan.nanavati@utexas.edu>
steveniemitz referenced this issue in twitter-forks/beam Nov 7, 2020
Co-authored-by: steve <sniemitz@twitter.com>
Co-authored-by: Kanishk Karanawat <kkaranawat@twitter.com>
nikie added a commit to nikie/beam that referenced this issue Nov 17, 2020
pabloem pushed a commit that referenced this issue Dec 1, 2020
…ovider for Python SDK

* Support for NestedValueProvider for Python SDK

* Fix typo

* Update CHANGES.md

* Update value_provider_test.py

* Fix NestedValueProvider docstrings. (#1)

* Fix isort and doc errors. (#2)

* Update CHANGES.md

Co-authored-by: Eugene Nikolaiev <eugene.nikolayev@gmail.com>
kennknowles pushed a commit that referenced this issue Jan 25, 2021
pabloem pushed a commit that referenced this issue Feb 17, 2021
Debeziumio PoC (#7)

* New DebeziumIO class.

* Merge connector code

* DebeziumIO and MySqlConnector integrated.

* Added FormatFuntion param to Read builder on DebeziumIO.

* Added arguments checker to DebeziumIO.

* Add simple JSON mapper object (#1)

* Add simple JSON mapper object

* Fixed Mapper.

* Add SqlServer connector test

* Added PostgreSql Connector Test

PostgreSql now works with Json mapper

* Added PostgreSql Connector Test

PostgreSql now works with Json mapper

* Fixing MySQL schema DataException

Using file instead of schema should fix it

* MySQL Connector updated from 1.3.0 to 1.3.1

Co-authored-by: osvaldo-salinas <osvaldo.salinas@wizeline.com>
Co-authored-by: Carlos Dominguez <carlos.dominguez@carlos.dominguez>
Co-authored-by: Carlos Domínguez <carlos.dominguez@wizeline.com>

* Add debeziumio tests

* Debeziumio testing json mapper (#3)

* Some code refactors. Use a default DBHistory if not provided

* Add basic tests for Json mapper

* Debeziumio time restriction (#5)

* Add simple JSON mapper object

* Fixed Mapper.

* Add SqlServer connector test

* Added PostgreSql Connector Test

PostgreSql now works with Json mapper

* Added PostgreSql Connector Test

PostgreSql now works with Json mapper

* Fixing MySQL schema DataException

Using file instead of schema should fix it

* MySQL Connector updated from 1.3.0 to 1.3.1

* Some code refactors. Use a default DBHistory if not provided

* Adding time-based restriction

Stop polling after specified amount of time

* Add basic tests for Json mapper

* Adding new restriction

Uses a time-based restriction

* Adding optional restriction

Uses an optional time-based restriction

Co-authored-by: juanitodread <juanitodread@gmail.com>
Co-authored-by: osvaldo-salinas <osvaldo.salinas@wizeline.com>

* Upgrade DebeziumIO connector (#4)

* Address comments (Change dependencies to testCompile, Set JsonMapper/Coder as default, refactors) (#8)

* Revert file

* Change dependencies to testCompile
* Move Counter sample to unit test

* Set JsonMapper as default mapper function
* Set String Coder as default coder when using JsonMapper
* Change logs from info to debug

* Debeziumio javadoc (#9)

* Adding javadoc

* Added some titles and examples

* Added SourceRecordJson doc

* Added Basic Connector doc

* Added KafkaSourceConsumer doc

* Javadoc cleanup

* Removing BasicConnector

No usages of this class were found overall

* Editing documentation

* Debeziumio fetched records restriction (#10)

* Adding javadoc

* Adding restriction by number of fetched records

Also adding a quick-fix for null value within SourceRecords
Minor fix on both MySQL and PostgreSQL Connectors Tests

* Run either by time or by number of records

* Added DebeziumOffsetTrackerTest

Tests both restrictions: By amount of time and by Number of records

* Removing comment

* DebeziumIO test for DB2. (#11)

* DebeziumIO test for DB2.

* DebeziumIO javadoc.

* Clean code: removed commented code lines on DebeziumIOConnectorTest.java

* Clean code: removing unused imports and using readAsJson().

Co-authored-by: Carlos Domínguez <74681048+carlosdominguezwl@users.noreply.github.com>

* Debezium limit records (now configurable) (#12)

* Adding javadoc

* Records Limit is now configurable

(It was fixed before)

* Debeziumio dockerize (#13)

* Add mysql docker container to tests

* Move debezium mysql integration test to its own file

* Add assertion to verify that the results contains a record.

* Debeziumio readme (#15)

* Adding javadoc

* Adding README file

* Add number of records configuration to the DebeziumIO component (#16)

* Code refactors (#17)

* Remove/ignore null warnings

* Remove DB2 code

* Remove docker dependency in DebeziumIO unit test and max number of records to MySql integration test

* Change access modifiers accordingly

* Remove incomplete integration tests (Postgres and SqlServer)

* Add experimental tag

* Debezium testing stoppable consumer (#18)

* Add try-catch-finally, stop SourceTask at finally.

* Fix warnings

* stopConsumer and processedRecords local variables removed. UT for task stop use case added

* Fix minor code style issue

Co-authored-by: juanitodread <juanitodread@gmail.com>

* Fix style issues (check, spotlessApply) (#19)

Co-authored-by: Osvaldo Salinas <osvaldo.salinas@osvaldo.salinas>
Co-authored-by: alejandro.maguey <alejandro.maguey@wizeline.com>
Co-authored-by: osvaldo-salinas <osvaldo.salinas@wizeline.com>
Co-authored-by: Carlos Dominguez <carlos.dominguez@carlos.dominguez>
Co-authored-by: Carlos Domínguez <carlos.dominguez@wizeline.com>
Co-authored-by: Carlos Domínguez <74681048+carlosdominguezwl@users.noreply.github.com>
Co-authored-by: Alejandro Maguey <alexmaguey1@gmail.com>
Co-authored-by: Hassan Reyes <hassanreyes@users.noreply.github.com>

Add missing apache license to README.md

Enabling integration test for DebeziumIO (#20)

Rename connector package cdc=>debezium. Update doc references (#21)

Fix code style on DebeziumIOMySqlConnectorIT
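
The "run either by time or by number of records" stopping rule mentioned in the DebeziumIO commit message above could look roughly like the sketch below. PollingLimit and shouldContinue are hypothetical names for illustration and are not taken from DebeziumIO or its offset tracker.

```java
/** Hypothetical stop condition; either limit may be absent (null). */
final class PollingLimit {
  private final Integer maxRecords; // null means no record limit
  private final Long maxMillis;     // null means no time limit

  PollingLimit(Integer maxRecords, Long maxMillis) {
    this.maxRecords = maxRecords;
    this.maxMillis = maxMillis;
  }

  /** Polling continues only while every configured limit is still unmet. */
  boolean shouldContinue(int fetchedRecords, long elapsedMillis) {
    if (maxRecords != null && fetchedRecords >= maxRecords) {
      return false; // record budget used up
    }
    if (maxMillis != null && elapsedMillis >= maxMillis) {
      return false; // time budget used up
    }
    return true;
  }
}
```

Keeping both limits optional means a caller can configure one, both, or neither without separate code paths.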
steveniemitz referenced this issue in twitter-forks/beam Mar 11, 2021
Co-authored-by: steve <sniemitz@twitter.com>
Co-authored-by: Kanishk Karanawat <kkaranawat@twitter.com>
steveniemitz referenced this issue in twitter-forks/beam Apr 26, 2021
Co-authored-by: steve <sniemitz@twitter.com>
Co-authored-by: Kanishk Karanawat <kkaranawat@twitter.com>