
[BEAM-6064] Add an option to avoid insert_ids on BQ in exchange for faster insertions #12489

Merged
merged 18 commits into apache:master from BQ-no-insert-ids on Aug 12, 2020

Conversation

@pabloem (Member) commented Aug 6, 2020

Please add a meaningful description for your change here
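
For context, here is a hedged usage sketch of the feature named in the title: opting out of per-row insert_ids when streaming rows into BigQuery from the Python SDK, trading BigQuery's best-effort deduplication for faster inserts. The ignore_insert_ids keyword and the table spec are assumptions (the keyword name follows the later "Renaming option for consistency with Java" commit); check the released sdks/python/apache_beam/io/gcp/bigquery.py for the exact API.

```python
# Hedged sketch, not the author's example. Requires GCP credentials and an
# existing BigQuery dataset; the table spec below is hypothetical.
import apache_beam as beam

with beam.Pipeline() as pipeline:
  _ = (
      pipeline
      | beam.Create([{'id': 1, 'name': 'a'}, {'id': 2, 'name': 'b'}])
      | beam.io.WriteToBigQuery(
          'my-project:my_dataset.my_table',  # hypothetical table spec
          schema='id:INTEGER,name:STRING',
          method=beam.io.WriteToBigQuery.Method.STREAMING_INSERTS,
          # Assumed option name: skips per-row insert IDs, so BigQuery does
          # no best-effort deduplication, in exchange for higher throughput.
          ignore_insert_ids=True))
```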


Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

  • Choose reviewer(s) and mention them in a comment (R: @username).
  • Format the pull request title like [BEAM-XXX] Fixes bug in ApproximateQuantiles, where you replace BEAM-XXX with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
  • Update CHANGES.md with noteworthy changes.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

See the Contributor Guide for more tips on how to make the review process smoother.

Post-Commit Tests Status (on master branch)

[Status badge table: Go, Java, Python, and XLang post-commit suites for the SDK and the Dataflow, Flink, Samza, Spark, and Twister2 runners; badges not reproduced here.]

Pre-Commit Tests Status (on master branch)

[Status badge table: non-portable and portable pre-commit suites for Java, Python, Go, and Website; badges not reproduced here.]

See .test-infra/jenkins/README for trigger phrase, status and link of all Jenkins jobs.

GitHub Actions Tests Status (on master branch)

[Status badge: "Build python source distribution and wheels".]

See CI.md for more information about GitHub Actions CI.

@pabloem (Member, Author) commented Aug 7, 2020

retest this please

@pabloem pabloem changed the title from "Bq no insert ids" to "[BEAM-6064] Add an option to avoid insert_ids on BQ in exchange for faster insertions" on Aug 7, 2020
@pabloem (Member, Author) commented Aug 8, 2020

Run Python PreCommit

@pabloem (Member, Author) commented Aug 8, 2020

Run Python2_PVR_Flink PreCommit

1 similar comment
@pabloem (Member, Author) commented Aug 8, 2020

Run Python2_PVR_Flink PreCommit

@pabloem (Member, Author) commented Aug 8, 2020

Run Python PreCommit

@pabloem (Member, Author) commented Aug 10, 2020

Run Python 3.8 PostCommit

@pabloem (Member, Author) commented Aug 10, 2020

postcommit failure only in x-lang tests
happy to add an integration test for this functionality if you think that's a good idea @chamikaramj

@chamikaramj (Contributor) left a comment

LGTM. Thanks.

sdks/python/apache_beam/io/gcp/bigquery.py: 2 review comments (outdated, resolved)
@pabloem (Member, Author) commented Aug 11, 2020

Run Python 3.8 PostCommit

@pabloem (Member, Author) commented Aug 11, 2020

Run Python PreCommit

@pabloem (Member, Author) commented Aug 11, 2020

Run Python 3.8 PostCommit

@pabloem (Member, Author) commented Aug 12, 2020

Run Portable_Python PreCommit

@pabloem pabloem merged commit 36cf935 into apache:master Aug 12, 2020
@pabloem (Member, Author) commented Aug 12, 2020

Merging as failures are unrelated, and intending to get this into the release branch. Sorry about that.

@pabloem pabloem deleted the BQ-no-insert-ids branch August 12, 2020 20:13
manesioz added a commit to manesioz/beam that referenced this pull request Aug 18, 2020
* [BEAM-9421] Add Java snippets to NLP documentation.

* [BEAM-9980] add groovy functions for python versions

* [BEAM-9980] update dataflow test-suites to switch python versions using in tests

* [BEAM-10599] Add documentation about CI on GitHub Action (apache#12405)

[BEAM-10599] Add documentation about CI on GitHub Action (apache#12405)

* Fix link for S3FileSystem (apache#12450)

Link to S3FileSystemRegistrar was incorrectly pointing at the Hadoop filesystem package.

* [BEAM-7390] Add min code snippets

* [BEAM-7390] Add max code snippets (apache#12409)

[BEAM-7390] Add max code snippets (apache#12409)

* [BEAM-7390] Add mean code snippets (apache#12437)

[BEAM-7390] Add mean code snippets (apache#12437)

* [BEAM-10499] Adds a descriptive toString to SamzaRunner KeyedTimerData

* [BEAM-9839] OnTimerContext should not create a new one when processing each element/timer in FnApiDoFnRunner (apache#12391)

* Fix dictionary changes size error in pickler.py (apache#12458)

* [BEAM-9891] TPC-DS module initialization, tables and queries stored (apache#12436)

* [BEAM-9891] TPC-DS module init, table schemas and queries stored

* Address comments.

* Fix SDF/Process input id.

* Interactive: clean up when pipeline is out of scope (apache#12339)

* Interactive: clean up when pipeline is out of scope

1. Completed the cleanup routine for all internal states held by the
   current interactive environment.
2. Utilized the environment inspector to determine whether a pipeline is
   out of scope: not assigned to variable and has no inspectable
   PCollections.
3. Invoked the cleanup every time the user defined pipelines in watched
   scope are refreshed.

Change-Id: Ia0791b865def88e81e7b1595b8430d3a9df9516e

* Fixed a test that didn't start a test stream server. With the new cleanup routine, all test stream servers held by the current interactive environment will be stopped in the test. If the gRPC server has never been started (as happens in tests), the stop operation will hang for a long time.

Change-Id: I2ae7ecf5e3ac11f32888887d82cd885fc64cc82f

Co-authored-by: Ning Kang <ningk@google.com>

* Merge pull request apache#12331 [BEAM-10601] DICOM API Beam IO connector

* First commit, after modifying codes based on design doc feedbacks 7/20

* fix some comments

* fix style and add license

* fix style lint

* minor fix

* add pagination support

* add file path support to storeinstance

* fix some typos

* removed path support and added fileio supports

* fix bug in client

* add unit tests

* Update dicomio_test.py

fix typo

* fix patching

* remove non-ASCII character

* add google.auth support and fix client

* try inject dependency

* roll back injection

* add dependency

* change place to inject

* change the order

* fix typos and pydocs

* fix style

* fix annoying style

* Add concurrent support

* fixed bugs and docs style, added custom client support, timestamp recording, and flush tests

* fix py2 support issues

* fix some minor bugs

* fix style and modify tests

* fix format

* fix test skip

* Update sdks/python/apache_beam/io/gcp/dicomio.py

Co-authored-by: Pablo <pabloem@users.noreply.github.com>

* Update sdks/python/apache_beam/io/gcp/dicomio.py

Co-authored-by: Pablo <pabloem@users.noreply.github.com>

* Update sdks/python/apache_beam/io/gcp/dicomio.py

Co-authored-by: Pablo <pabloem@users.noreply.github.com>

* Update sdks/python/apache_beam/io/gcp/dicomio.py

Co-authored-by: Pablo <pabloem@users.noreply.github.com>

* function name change

Co-authored-by: Pablo <pabloem@users.noreply.github.com>

* [BEAM-10631] Fix performance of Schema#indexOf (apache#12456)

Schema#indexOf uses String.format to prepare the error message.
This causes performance issues if the schema has options, because
Schema.Options#toString allocates a TreeMap.

Use the lazy formatter built into Preconditions, which
doesn't call Schema#toString unless needed.

* [BEAM-10289] Go Dynamic splitting full implementation.

Includes core behavior, documentation, and tests.

* [BEAM-10289] Avoiding blocking when dynamic splitting in Go.

Adds a timeout/default case when getting the splittable unit, a blocking
operation. Also includes some fixup.

* Edit lesson name and task description

* [BEAM-7390] Add sample code snippets

* Fix add field method in SQL walkthrough

* [BEAM-10470] Handle null state from waitUntilFinish

* [BEAM-10545] HtmlView module

1. Added a HtmlView module to render given HTML and execute given
   scripts from the provider model.
2. Integrated react framework to the jest testing framework and eslint.
3. Added line length limit (80) to eslint and prettier. Note prettier's
   printWidth is not a hard limit as max-len rule. Sometimes, the code
   needs to be written in a specific way to meet both eslint and
   prettier. An example, long strings (if not url) should be broken into
   suitable pieces.
4. The jlpm(yarn/npm) installations are:
   jlpm add --dev react-dom @types/react-dom eslint-plugin-react
5. @types/react is resolved to the version bundled with jupyter
   apputils.

* documentation(xlang-java): updating docs as per comments

* documentation(xlang-python): updating docs as per comments

* documentation(xlang-python): resolving ascii errors

* Merge pull request apache#12203 from [BEAM-6928] Make Python SDK custom Sink the default Sink for BigQuery

* Making Beam sink the default sink for BigQuery

* Sharing the Beam sink update in CHANGES.md

* Fixing precommit

* Fix formatter

* addressing comments

* [BEAM-10635] Fix forward the google-api-core version

1. For the Python container base image, bumped the google-api-core
   version from 1.20.0 to 1.21.0, since google-cloud-bigquery 1.26.1
   requires google-api-core<2.0dev,>=1.21.0.
2. The google-cloud-bigquery version advancement happened in commit:
   apache@a315672

* [BEAM-7996] Add map & nil encoding to Go SDK.

* [BEAM-10618] subprocess_server.py: Fallback to AF_INET6 family when finding free port (apache#12438)

* Use IPv6 socket when finding free port

* Only use AF_INET6 as a fallback
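
A rough sketch of the fallback idea described above (not the actual subprocess_server.py code): try to bind an ephemeral port on an IPv4 socket first, and only fall back to an AF_INET6 socket if that fails.

```python
# Illustrative sketch of "only use AF_INET6 as a fallback" when picking a
# free local port; the function name is made up for the example.
import socket


def pick_free_port():
  for family in (socket.AF_INET, socket.AF_INET6):  # IPv6 only as a fallback
    try:
      with socket.socket(family, socket.SOCK_STREAM) as s:
        s.bind(('localhost', 0))  # port 0 lets the OS choose a free port
        return s.getsockname()[1]
    except OSError:
      continue
  raise RuntimeError('No free port found on AF_INET or AF_INET6')


print(pick_free_port())
```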

* Add ElementLimiters to all Cache Managers.

Change-Id: I093e1dc1f99c31b1e0bb868fc3bcfc507bd24c8d

* [BEAM-10629] Added KnownBuilderInstances to ExternalTransformRegistrar (apache#12454)

Co-authored-by: Scott Lukas <slukas@google.com>

* [BEAM-10637] fix: test stream service start/stop (apache#12464)

[BEAM-10637] fix: test stream service start/stop (apache#12464)

Fixed the start/stop logic so that a controller:

1. can only be started/stopped once;
2. can only be stopped if it has been started and not already stopped;
3. does not hang indefinitely, but no-ops on redundant or invalid start/stop calls.

Co-authored-by: Ning Kang <ningk@google.com>

* Migrate shared tag from tfx-bsl (apache#12468)

Migrate shared tag from tfx-bsl (apache#12468)

* [BEAM-10633] UdfImpl should be able to return java.util.List.

* [BEAM-10543] Add new parameters to Kafka Read cross language configuration

* [BEAM-10543] Add new parameters to python wrapper of Kafka Read

* [BEAM-10543] Modify Kafka python cross-language integration test to use new parameters

* [BEAM-10543] Run Kafka cross-language integration test in python postcommit suite instead of separate task

* Merge pull request apache#12149: [BEAM-9897] Add cross-language support to SnowflakeIO.Read

* [BEAM-9897] improve credentials mechanism

* [BEAM-9897] add xlang support for SnowflakeIO.read

* [BEAM-9897] fix: python lint

* [BEAM-9897] refactor: revert auth mechanism and add missing docs

* [BEAM-9897] feat: add custom expansion-service

* [BEAM-9897] fix: CI

* Merge pull request apache#12151: [BEAM-9896] Add streaming for SnowflakeIO.Write to Java SDK

* [BEAM-9896] Added Snowflake streaming write with debug mode and unit tests.

* [BEAM-9896] Removed default /data directory in Snowflake write and added parametrized quotation mark.

* [BEAM-9896] Changed default name.

* [BEAM-9896] Added enum for streaming log level.

* [BEAM-9896] Spotless Apply

* [BEAM-9896] Updated javadocs, error messages and parsing filenames in streaming

* [BEAM-9896] Updated CHANGES.md

* [BEAM-9896] Added final keyword to parameters in Snowflake Batch and Streaming configs

* Simplify common patterns for pandas methods.

* Use new infrastructure to simplify pandas implementation.

* [BEAM-9615] Add initial Schema to Go conversions.

* [BEAM-7390] Add sum code snippets

* Update nexmark dashboard links.

* [BEAM-7996] Add support for MapType and Nulls in container types for Python RowCoder (apache#12426)

* Add support for encoding Maps and Nulls (in container types) in Python RowCoder

* fixup! Add support for encoding Maps and Nulls (in container types) in Python RowCoder

* fixup! Add support for encoding Maps and Nulls (in container types) in Python RowCoder

* Don't specify nested in row coder tests

* fixup! Add support for encoding Maps and Nulls (in container types) in Python RowCoder

* Don't mutate value when reading rows

* fixup! Add support for encoding Maps and Nulls (in container types) in Python RowCoder

* fixup! Add support for encoding Maps and Nulls (in container types) in Python RowCoder

* Python: Don't mutate value

* fixup! Add support for encoding Maps and Nulls (in container types) in Python RowCoder
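
A hedged Python sketch of what the Map/Null RowCoder support above enables: a schema'd NamedTuple with a map field and a nullable field round-tripped through RowCoder. The type name and values are illustrative, not from the PR.

```python
# Illustrative only: a row type with a map field and a nullable field,
# encoded and decoded with the Python RowCoder.
import typing

import apache_beam as beam


class MovieRating(typing.NamedTuple):
  title: str
  year: typing.Optional[int]          # nullable field
  scores: typing.Mapping[str, float]  # map field


beam.coders.registry.register_coder(MovieRating, beam.coders.RowCoder)

coder = beam.coders.registry.get_coder(MovieRating)
row = MovieRating(title='Solaris', year=None, scores={'imdb': 8.1})
assert coder.decode(coder.encode(row)) == row  # expected to round-trip
```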

* [BEAM-10646] Remove SparkPortableExecutionTest.testExecution.

Everything SparkPortableExecutionTest.testExecution tests is already covered by validates runner tests.

* [BEAM-10646] Don't wait for test to time out if pipeline fails.

* Additional edits to task description

* [BEAM-10648] Remove unused BigQuery queryTempDataset value

* [BEAM-10258] Support type hint annotations on PTransform's expand() (apache#12009)

* [BEAM-10258] Support type hint annotations on PTransform's expand()

* Fixup: apply YAPF

* Moving PCollectionTypeConstraint to typehints.py

* Uses Generic[T] instead of PCollectionTypeConstraint

* Fixup: apply YAPF

* Remove unused imports

* Force user to wrap typehints in PCollections

* Add unit tests for various usages of typehints on PTransforms

* Add tests that use typehints on real pipelines

* Fixup: apply YAPF

* Fix bad merge

* Support PDone, PBegin, and better handling of error cases

* Fix test syntax

* Refactors strip_pcoll_input() and strip_pcoll_output() to a shared function

* Add unit tests

* Add more tests

* Add website documentation

* Fix linting issues

* Fix linting issue by using multi-line function annotations

* Fix more lint errors

* Fix import order, and other changes for PR

* Fix ungrouped-imports error

* Alphabetically order the imports

* Fixup: apply YAPF

* Fixes a bug where a type can have an empty __args__ attribute

* Fix bug in website snippet code

* Fixup: apply YAPF

* Fixup: apply YAPF

* Fix NoneType error

* Fix NoneType error part 2

* Use classes instead of strings during typecheck, and add tests

* Resolve circular import error and fix readability issues

* Fix lint errors

* Add back accidentally removed test

* Support None as an output annotation

* Show incorrect type in error message

Co-authored-by: Udi Meiri <udim@users.noreply.github.com>

* Allow Pipeline as an input

* Fix import bug

* Alphabetically order imports inside function (but really this is just to force re-run the tests)

* Display warning instead of throwing error for oddly formed type hints

* Convert to Beam types

* Add test for generic TypeVars

* Fix bug by skipping DoOutputsTuple

* Fix typo

* Add test for DoOutputsTuple

* Fix lint errors

Co-authored-by: Udi Meiri <udim@users.noreply.github.com>
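
A short sketch of what the [BEAM-10258] commit series above enables: the pipeline type checker reads standard typing annotations on a PTransform's expand() instead of requiring the with_input_types/with_output_types decorators. The transform below is illustrative.

```python
# Illustrative composite transform whose input/output types are declared via
# annotations on expand() (the feature added by BEAM-10258).
import apache_beam as beam
from apache_beam import pvalue


class IntToStr(beam.PTransform):
  def expand(self, pcoll: pvalue.PCollection[int]) -> pvalue.PCollection[str]:
    return pcoll | beam.Map(str)


with beam.Pipeline() as p:
  _ = p | beam.Create([1, 2, 3]) | IntToStr()
```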

* [BEAM-10522] Added SnowflakeIO connector guide (apache#12296)

* Added SnowflakeIO connector guide

* Snowflake IO Connector guide - added requested changes

* Adjusted documentation to the version that is available in Apache Beam

* Added link to guide for SnowflakeIO documentation

* Added link in the table of contents for Snowflake I/O connector

Co-authored-by: Sławomir Andrian <>

* Update stepik course

* [BEAM-10240] Support ZetaSQL DATETIME functions in BeamSQL (apache#12348)

* [BEAM-9615] Improve error handling on schema conv

* Updating changes.md (apache#12424)

Updating changes.md (apache#12424)

* [BEAM-10645] Create context for allowing non-parallel dataframe operations. (apache#12476)

* [BEAM-10647] Fixes get_query_location bug in BigQueryWrapper

* removed duplicate test

* fixed typo in comments

* removed duplicate tests

* [BEAM-10630] Check load tests as part of the release process (apache#12455)

In the past, we have seen performance regressions in releases. We should make
sure that the release guide includes checking available performance
measurements.

* Revert "Merge pull request apache#12408: [BEAM-10602] Display Python streaming metrics in Grafana dashboard"

This reverts commit cdc2475, reversing
changes made to 835805d.

Revert "Merge pull request apache#12451: [BEAM-10602] Use python_streaming_pardo_5 table for latency results"

This reverts commit 2f47b82, reversing
changes made to d971ba1.

* [BEAM-7390] Add top code snippets (apache#12482)

[BEAM-7390] Add top code snippets (apache#12482)

* GH-Actions workflow checks are GCP variables set [depends on BEAM-10599] (apache#12381)

GH-Actions workflow checks are GCP variables set [depends on BEAM-10599] (apache#12381)

* [BEAM-10662] Fix GCP variable check in build python wheels workflow

* fix precommit errors (apache#12500)

* Support NULL query parameters in ZetaSQL and fix nullable ARRAY bug

* Add max count to utils.to_element_list

Change-Id: I9a2fbf1532b3d22a612e7a09f4f1fb2b9635c40b

* Merge pull request apache#12473 from [BEAM-10601] DICOM API Beam IO connector e2e test

* add integration test

* fix lint

* fix style and update changes.md

* resolved comments and fix dependency bug

* fix dependency

* fix dependency

* add documentation, use the right storage, and reduce the number of pipelines.

* add comment

* modify client a little

* [BEAM-10653] Modularize BeamSqlDslUdfUdafTest.

Split tests with multiple, independent branches into separate test
cases.

The only part removed was in testUdaf. The two cases in testUdaf were
different in the past, but converged at some point.
https://github.com/apache/beam/blob/d8ff78b65bbe7e3a2239249f034a538ca65b0706/dsls/sql/src/test/java/org/apache/beam/dsls/sql/BeamSqlDslUdfUdafTest.java#L52

* [BEAM-10619] Report ratio of implemented pandas tests (apache#12440)

* pandas_doctest_test now logs a report about the number of skipped vs wont implement vs passing tests

* wont_implement_ok

* fix tests

* yapf

* [BEAM-10289] Fixing bug in Go harness split response.

Should be checking for nil-ness in the primary/residual elements.

* [BEAM-9977] Implement ReadFromKafkaViaSDF

* [BEAM-8460] Exclude category containing failing tests for spark/flink to restore green test signal. (apache#12503)

* OrderedListState API

* [BEAM-10656] Enable bundle finalization within the Java direct runner. (apache#12488)

* [BEAM-10656] Enable bundle finalization within the Java direct runner.

This is towards making all UnboundedSources execute as splittable dofns within the direct runner using the SDF unbounded source wrapper since it relies on bundle finalization to handle checkpoints.

* Move value conversion logic out of ExpressionConverter

* Simplify ZetaSqlBeamTranslationUtils

* [BEAM-10471] change the test condition for testEstimatedSizeBytes to greater than 0 to ensure that the dataset is at least split.

* BEAM-10668 - Replace toLowerCase().equals() with equalsIgnoreCase

* [BEAM-10361] upgrade Kotlin version in example (apache#12497)

* [BEAM-9558] Remove usage of empty data/timers to signify last.

* Upgrade to ZetaSQL 2020.08.1

* [BEAM-9891] Added ZetaSQL planner support and uploaded 100G data (apache#12502)

* [BEAM-9891] Added ZetaSQL planner support

* [BEAM-9891] Added ZetaSQL planner support and uploaded 100G data

* Added instruction comment to run query96 using ZetaSQL

Co-authored-by: Yuwei Fu <fuyuwei@google.com>

* [BEAM-10300] Improve JdbcIOTest.testFluentBackOffConfiguration stability (apache#12517)

* Move commitThread control out of try

* Commit connection on main thread after pipeline has finished.

* [BEAM-10672] Added streaming option to combine test

* [BEAM-10672] Added experimental dataflow param

* [BEAM-10672] Spotless Apply

* [BEAM-10672] Updated streaming commands

* [BEAM-601] Run KinesisIOIT with localstack (apache#12422)

* [BEAM-601] Run KinesisIOIT with testcontainers with localstack

* [BEAM-601] Add kinesis integration test to Java postcommit

* Fixes after Alexey's code review

* Extending archiveJunit post-commit task with stability history

* Moving /tmp directory cleanup of CI workers to Inventory Jenkins job

* [BEAM-10676] Use the fire timestamp as the output timestamp for timers

By default, the Python SDK adds a timer output timestamp equal to the current
timestamp of an element. This is problematic because:

1. We hold back the output watermark on the current element's timestamp for
   every timer
2. It doesn't match the behavior in the Java SDK which defaults to using the
   fire timestamp as the timer output timestamp (and adds a hold on it)
3. There is no way for the user to influence this behavior because there is no
   user-facing API

We should use the fire timestamp as the default output timestamp.
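
A hedged sketch of the behavior being changed: a Python DoFn that sets an event-time timer. After this change, the timer's fire timestamp (rather than the current element's timestamp) is assumed to become the default output timestamp and watermark hold for whatever the timer callback emits.

```python
# Illustrative stateful DoFn; it must be applied to a keyed PCollection.
# After this change, the output of on_flush() is assumed to be timestamped
# at the timer's fire time by default.
import apache_beam as beam
from apache_beam.transforms.timeutil import TimeDomain
from apache_beam.transforms.userstate import TimerSpec, on_timer
from apache_beam.utils.timestamp import Duration


class EmitLater(beam.DoFn):
  FLUSH_TIMER = TimerSpec('flush', TimeDomain.WATERMARK)

  def process(self,
              element,
              ts=beam.DoFn.TimestampParam,
              flush=beam.DoFn.TimerParam(FLUSH_TIMER)):
    # Fire one minute past the element's event timestamp.
    flush.set(ts + Duration(seconds=60))

  @on_timer(FLUSH_TIMER)
  def on_flush(self):
    yield 'flushed'
```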

* Work around AutoValueSchema not working with SchemaFieldName

* [BEAM-10572] Eliminate nullability errors from :sdks:java:extensions:sql:datacatalog (apache#12366)

* [BEAM-7996] Add Python SqlTransform test that includes a MAP input and output (apache#12530)

* [BEAM-10679] improving XLang KafkaIO streaming test

* fix formatting

* [BEAM-10663] Disable python kafka integration tests (apache#12526)

* [BEAM-10663] Disable python kafka integration tests

* BEAM-10633 -> BEAM-10663

Co-authored-by: Brian Hulette <hulettbh@gmail.com>

* Change SqlAnalyzer code to use the updated ZetaSQL API

* Use primitive string[] to replace Array<string> type

* Merge pull request apache#12485 from [BEAM-6064] Improvements to BQ streaming insert performance

* [BEAM-6064] Improvements to BQ streaming insert performance

* Fixup

* Fixup

* Fixup

* Fixup

* fixup

* fixup

* [BEAM-8125] Add verifyDeterministic test to SchemaCoderTest (apache#12521)

* Add verifyDeterministic test to SchemaCoderTest

* Use verifyDeterministic

* [BEAM-10571] Use schemas in ExternalConfigurationPayload (apache#12481)

* Update external_transforms.proto to use schemas, implement in Python and Java

* Use map in xlang KafkaIO, Update KafkaIOExternalTest

* Update PubsubIOExternalTest

* Update XVR

* spotless

* spotbugs

* Remove byte array comments

* [BEAM-10644] Mark Beam 2.24.0 as the last release with Py2 and Py35 support. (apache#12525)

* [BEAM-10681] Set metrics supported in Spark portable runner.

* Use new ZetaSQL value create API (apache#12536)

* fix incorrect coder issue

* use Create

* Use input timestamp as the output timestamp for processing timers

* Add test

* lint

* [BEAM-7705] Add BigQuery Java samples (apache#12118)

* [BEAM-10602] Add latency/checkpoint duration as separate panels in ParDo Load Test

The Flink streaming tests were reported in a separate table and made available
through this dashboard: https://apache-beam-testing.appspot.com/explore?dashboard=5751884853805056

Turns out, this is not optimal for the new Grafana-based dashboard. We have to
change the table name because the query capability of InfluxDb is very limited.

This way the results will be shown together with the other Runners' load test results.

* [BEAM-10686] Simplify names for GCP variables check in GA

* Fix format string in PipelineValidator (apache#12522)

* Fix some typos (apache#12539)

* Linkage Checker 1.5.0 (apache#12545)

* [BEAM-9980] version switchable dataflow tasks to be invoked

* [BEAM-9980] :sdks:python:test-suites:dataflow included in settings.gradle

* [BEAM-10688] Euphoria assumes that all type descriptors are resolvable to coders (and serializable) which isn't true.

Euphoria should be propagating forward coders and not type descriptors as a longer term solution.
I also fixed the tests to specify @RunWith because in certain scenarios JUnit may fail to detect the tests.

* [BEAM-10670] Use fraction of remainder if consumed fraction is unknown

This prevents an NPE for UnboundedSources that returned null.

* [BEAM-10670] Improve splitting logic to prefer splits up to the desired number of splits, and also remove a check that isn't possible since the restriction could be completed unknowingly by the last tryClaim call.

* [BEAM-10670] Fix passing forward the self-checkpoint from the UnboundedSource to respect the contract of tryClaim

* [BEAM-10684] Fix jdbc cross-language transform (apache#12543)

* [BEAM-10610] Clean logging cruft from loop back.

* Follow the same way that BigQuery handles unspecified or duplicate ZetaSQL STRUCT field name (apache#12550)

* Merge pull request apache#12489 from [BEAM-6064] Add an option to avoid insert_ids on BQ in exchange for faster insertions

* [BEAM-6064] Improvements to BQ streaming insert performance

* Fixup

* Fixup

* Adding support for high-throughput lower-guarantees BQ inserts

* Fixup

* Fixup

* Fixup

* Fixup

* fixup

* fixup

* Fixup

* Renaming option for consistency with Java

* fixup

* Fix BCJ to stop caching when the cache signature has changed.

Change-Id: I52a4f36b09e6f6899d7c59756c9702ba983e083b

* [BEAM-10289] Adding required transform ID to Go channel split.

* [BEAM-2762] Generate Python coverage reports during pre-commit (apache#12257)

* Add pytest-cov dependency to the test dependencies

* Adds call to coverage / codecov integration

* Move codecov to deps

* Move pytest-cov to py38-cloud env

* Add py38-cloud-coverage env

* Remove unused cover env, and add py38-cloud-coverage to envlist

* Add task to gradle

* Only runs testPy38CloudCoverage

* Add dependsOn to gradle

* Move to py38 gradle

* Fix Gradle error

* Fix syntax error

* Move task definition to common

* Change suffix

* Fix naming bug

* Add comment to trigger tests

* Use environmental variable for the Codecov token

* Fix bug that evicting computed PCollections was changing list while iterating.

Change-Id: I271efeef53f99c8083a6d37b89085cddd63bf56a

* Enable dataflow streaming engine when running runner_v2 and streaming.

* Fix formatter.

* Moving to 2.25.0-SNAPSHOT on master branch.

* [BEAM-10694] Work around serialization issue with ReaderContext by memoizing the serialized form and propagating it forward. (apache#12556)

* [BEAM-10694] Work around serialization issue with ReaderContext by memoizing the serialized form and propagating it forward.

* fixup! Address spotbugs failure

* [BEAM-10672] Fixes after review in combine python load test

* [BEAM-9680] Add Filter with ParDo lesson to Go SDK Katas (apache#12506)

* Import WordExtractingDoFn from wordcount_with_metrics

streaming wordcount tests rely on the output counters

* [BEAM-9547] Lift associative aggregations. (apache#12469)

* Merge pull request apache#12427 from [BEAM-2855] nexmark python suite implement queries 0, 1, 2 and 9

* changed parser and serialization code to use the same JSON format to represent models, added the AuctionPrice model, and corrected the behavior of query 2. Corrected the behavior of queries 0 and 1 to align with the Nexmark specification, created a fieldname file to map field names to string literals, and created nexmark_query_util to hold transforms that are reused across different queries.

* changed repr and aligned code style.
Changed repr for all models to eliminate spaces in their json string representation; aligned code style for some files

* refactoring code to have a better code practice

* implemented coder for all models, query9

* yapf style, implemented query0 to use the coder

* yapf style change

* naming changes

* nexmark launcher deserialization, timestamping and result counting and printing to file

* pylint style changes

* fieldname changes to comply with python convention

* resolve issues brought up in code review

* resolve py2.7 lint error for import and python3 type hint

* fix 3.7 lint errors

* resolve issues brought up in code review

* better formatting for nexmark_util docs

* exported StreamCoderImpl to fix issue with pydoc

Co-authored-by: Leiyi Zhang <leiyiz@google.com>

* [BEAM-10500] Make KeyedTimerDataCoder encode output timestamp (apache#12535)

* Extending ApproximateQuantiles functionality to deal with non-uniform weights. (apache#12420)

* Extending ApproximateQuantiles functionality to deal with non-uniform weights.

* Extending ApproximateQuantiles functionality to deal with non-uniform weights.

* Extending ApproximateQuantiles functionality to deal with non-uniform weights.

* Extending ApproximateQuantiles functionality to deal with non-uniform weights.

* Added example to ApproximateQuantiles docstring, made weighted argument the last and modified _interpolate for readability.

* Added example to ApproximateQuantiles docstring, made weighted argument the last and modified _interpolate for readability.

* Added example to ApproximateQuantiles docstring, made weighted argument the last and modified _interpolate for readability.

* Added example to ApproximateQuantiles docstring, made weighted argument the last and modified _interpolate for readability.

* Added example to ApproximateQuantiles docstring, made weighted argument the last and modified _interpolate for readability.
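
A hedged usage sketch of the weighted variant described above. The (value, weight) element shape and the weighted=True keyword follow the commit messages; check the ApproximateQuantiles docstring for the exact contract.

```python
# Illustrative only: weighted approximate quantiles, assuming elements are
# supplied as (value, weight) pairs when weighted=True.
import apache_beam as beam
from apache_beam.transforms.stats import ApproximateQuantiles

with beam.Pipeline() as p:
  _ = (
      p
      | beam.Create([(1, 1.0), (2, 1.0), (3, 0.5), (100, 0.1)])
      | ApproximateQuantiles.Globally(3, weighted=True)
      | beam.Map(print))
```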

* fix logic issue in metric name namespace filtering (apache#12570)

* fix logic issue in metric name namespace filtering

* refactor boolean logic to be more concise

Co-authored-by: Leiyi Zhang <leiyiz@google.com>

* Fix Py3 incompatibility in stager.py.

* Better error on BQ schema parse (apache#12549)

* Better error on BQ schema parse

* Better error on BQ schema parse

* Fixup

* [BEAM-10702] Do not implicitly decompress artifacts

* Adds a Julia Set test on portable local runner

* Address review comments

* Use unbounded wrapper for Kafka Read.

* [BEAM-10691] Use FlinkStateInternals#addWatermarkHoldUsage for timer output timestamp

* [BEAM-9615] Map user types to Schema reps. (apache#12554)

* fixed a typo in S3TestUtils (apache#12582)

applied spotless

* [BEAM-10612] Add flink 1.11 runner

- Fix deprecated OperatorStateStore.getOperatorState => getListState
- Fix WindowedValue OutputTag in DoFnOperatorTest and ExecutableStageDoFnOperatorTest
- Fix FlinkStreamingTransformTranslatorsTest
- Fix SourceTransformation => LegacySourceTransformation rename
- Fix timeServiceManager access change in AbstractStreamOperator
- Fix RemoteMiniClusterImpl RPC service port work around
- Abstract version specific env and PackagedProgram logic in FlinkRunnerTest
- Suppress checkstyle false alarm on AbstractStreamOperatorCompat

* Remove redundant setMaxNumRecords and consumerFactoryFn.

* Scale progress with respect to windows observation.

* [BEAM-9547] Implement some methods for deferred Series. (apache#12534)

* [BEAM-10670] Make Read execute as a splittable DoFn by default for the Java DirectRunner.

* Fix broken build.

* [BEAM-8025] Update tests to use TemporaryFolder instead of rolling their own implementation.

This prevents cross-test contamination if multiple instances of the same test are executing at the same time on different executors, simplifies the implementation, and migrates it to a reliable mechanism.
Example failure:
https://ci-beam.apache.org/job/beam_PreCommit_Java_Phrase/2620/testReport/junit/org.apache.beam.runners.samza.runtime/SamzaTimerInternalsFactoryTest/testProcessingTimeTimers/

* Merge pull request apache#12575: [BEAM-10707] Adding Python docs precommit as separate suite.

* Adding Python docs precommit as separate suite.

* Adding Python docs precommit as separate suite.

* Adding Python docs precommit as separate suite.

* Adding Python docs precommit as separate suite.

* Adding Python docs precommit as separate suite.

* Adding Python docs precommit as separate suite.

* Adding to PR template

* Adding to jenkins readme

* indentation

* fix spotless

* [BEAM-10672] Changed the jobType variable into mode to declare whether it is a streaming or batch job

* Improve CI documentation

Remove incorrect sentence

Add link to GCP IAM Roles

* [BEAM-10670] Make key coder deterministic by using upstream PCollection which uses random byte[] as the key.

This is necessary for some runners that require deterministic key encodings.

* [BEAM-10697] Remove testPy2Cython from precommit

Removing this specific target since it has been flaky.
Remainder work for removing Python 2.7 support is tracked
in https://issues.apache.org/jira/browse/BEAM-7372.

* [BEAM-9891] Generate query execution summary table after finishing jobs (apache#12601)

* [BEAM-9891] Generate query execution summary table after finishing jobs

* Print error message using LOG, check PipelineResult's state

Co-authored-by: Yuwei Fu <fuyuwei@google.com>

* [BEAM-9919] Added an External Transform API to Go SDK (apache#12445)

* [BEAM-10715] Update Jet Runner validates runner testing (apache#12567)

* Adapt Jet Runner to recent framework changes

* Add pre-commit checks for Jet Runner

* Make spotless checker happy

* Make checkstyle happy

* Remove Jet Runner validation tests from pre-commit

* Update runners/jet/build.gradle

Co-authored-by: Lukasz Cwik <lcwik@google.com>

* [BEAM-10557] Implemented SchemaIOProvider for DataStoreV1, Refactored tests (apache#12341)

* Implemented SchemaIOProvider for DataStoreV1, refactored tests

* Modified SchemaIOTableProviderWrapper#getTableStatistics

* Update sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/SchemaIOTableProviderWrapper.java

Co-authored-by: Scott Lukas <slukas@google.com>
Co-authored-by: Brian Hulette <hulettbh@gmail.com>

Co-authored-by: Michal Walenia <michal.walenia@polidea.com>
Co-authored-by: yoshiki.obata <yoshiki.obata@gmail.com>
Co-authored-by: Robert Bradshaw <robertwb@google.com>
Co-authored-by: Tobiasz Kędzierski <tobiasz.kedzierski@polidea.com>
Co-authored-by: viktorjonsson <viktor.g.jonsson@gmail.com>
Co-authored-by: Rui Wang <amaliujia@users.noreply.github.com>
Co-authored-by: David Cavazos <dcavazos@google.com>
Co-authored-by: Ahmet Altay <aaltay@gmail.com>
Co-authored-by: Borzoo <borzoo.esmailloo@gmail.com>
Co-authored-by: Rehman Murad Ali <rehmanmuradali0@gmail.com>
Co-authored-by: Yichi Zhang <zyichi@google.com>
Co-authored-by: fuyuwei <50607209+Imfuyuwei@users.noreply.github.com>
Co-authored-by: Boyuan Zhang <boyuanz@google.com>
Co-authored-by: Ning Kang <kawaigin@gmail.com>
Co-authored-by: Ning Kang <ningk@google.com>
Co-authored-by: Pablo <pabloem@users.noreply.github.com>
Co-authored-by: JIahao wu <jiahaowu@google.com>
Co-authored-by: Boyuan Zhang <36090911+boyuanzz@users.noreply.github.com>
Co-authored-by: Gleb Kanterov <kanterov@users.noreply.github.com>
Co-authored-by: Daniel Oliveira <daniel.o.programmer@gmail.com>
Co-authored-by: Maximilian Michels <mxm@apache.org>
Co-authored-by: Alexey Romanenko <33895511+aromanenko-dev@users.noreply.github.com>
Co-authored-by: Damon Douglas <douglas.damon@gmail.com>
Co-authored-by: Israel Herraiz <ihr@google.com>
Co-authored-by: Andrew Pilloud <apilloud@google.com>
Co-authored-by: Chamikara Jayalath <chamikara@apache.org>
Co-authored-by: Kevin Puthusseri <kevinsijo@google.com>
Co-authored-by: Andrew Pilloud <apilloud@users.noreply.github.com>
Co-authored-by: lostluck <13907733+lostluck@users.noreply.github.com>
Co-authored-by: Brian Hulette <bhulette@google.com>
Co-authored-by: Sam Rohde <srohde@google.com>
Co-authored-by: sclukas77 <66493473+sclukas77@users.noreply.github.com>
Co-authored-by: Scott Lukas <slukas@google.com>
Co-authored-by: Harrison Green <harrisonmichaelgreen@gmail.com>
Co-authored-by: amaliujia <amaliujia@163.com>
Co-authored-by: Piotr Szuberski <piotr.szuberski@polidea.com>
Co-authored-by: purbanow <37292156+purbanow@users.noreply.github.com>
Co-authored-by: Kasia Kucharczyk <2536609+kkucharc@users.noreply.github.com>
Co-authored-by: Robert Bradshaw <robertwb@gmail.com>
Co-authored-by: Robert Burke <lostluck@users.noreply.github.com>
Co-authored-by: Tyson Hamilton <tysonjh@google.com>
Co-authored-by: Kyle Weaver <kcweaver@google.com>
Co-authored-by: Filipe Regadas <filiperegadas@gmail.com>
Co-authored-by: Saavan Nanavati <66381097+saavannanavati@users.noreply.github.com>
Co-authored-by: Udi Meiri <udim@users.noreply.github.com>
Co-authored-by: Sławomir Andrian <slawomir.andrian@polidea.com>
Co-authored-by: ZijieSong946 <66277532+ZijieSong946@users.noreply.github.com>
Co-authored-by: Kamil Gałuszka <kamil.galuszka@solution4future.com>
Co-authored-by: Etta Rapp <ettarapp@google.com>
Co-authored-by: Robin Qiu <robinyq@google.com>
Co-authored-by: Lukasz Cwik <lukecwik@gmail.com>
Co-authored-by: Reuven Lax <relax@google.com>
Co-authored-by: Yueyang Qiu <robinyqiu@gmail.com>
Co-authored-by: Etienne Chauchot <echauchot@apache.org>
Co-authored-by: Colm O hEigeartaigh <coheigea@apache.org>
Co-authored-by: Yuwei Fu <fuyuwei@google.com>
Co-authored-by: Kasia Kucharczyk <katarzyna.kucharczyk@polidea.com>
Co-authored-by: Damian Gadomski <damian.gadomski@polidea.com>
Co-authored-by: Jayendra <jayendra0parmar@gmail.com>
Co-authored-by: Heejong Lee <heejong@gmail.com>
Co-authored-by: Brian Hulette <hulettbh@gmail.com>
Co-authored-by: tvalentyn <tvalentyn@users.noreply.github.com>
Co-authored-by: Tomo Suzuki <suztomo@google.com>
Co-authored-by: Luke Cwik <lcwik@google.com>
Co-authored-by: Damon Douglas <damondouglas@users.noreply.github.com>
Co-authored-by: Leiyi Zhang <35821728+leiyiz@users.noreply.github.com>
Co-authored-by: Leiyi Zhang <leiyiz@google.com>
Co-authored-by: Ihor Indyk <ihor.indyk@gmail.com>
Co-authored-by: Valentyn Tymofieiev <valentyn@google.com>
Co-authored-by: Eugene Kirpichov <ekirpichov@gmail.com>
Co-authored-by: Jan Lukavsky <je.ik@seznam.cz>
Co-authored-by: Neville Li <neville.lyh@gmail.com>
Co-authored-by: Kamil Wasilewski <kamil.wasilewski@polidea.com>
Co-authored-by: Udi Meiri <ehudm@google.com>
Co-authored-by: Kevin Sijo Puthusseri <25983646+pskevin@users.noreply.github.com>
Co-authored-by: Jozsef Bartok <jozsi@hazelcast.com>