2 changes: 1 addition & 1 deletion CHANGES.md
@@ -107,7 +107,7 @@ should handle this. ([#25252](https://github.com/apache/beam/issues/25252)).
* Add `UseDataStreamForBatch` pipeline option to the Flink runner. When it is set to true, Flink runner will run batch
jobs using the DataStream API. By default the option is set to false, so the batch jobs are still executed
using the DataSet API.
* `upload_graph` as one of the Experiments options for DataflowRunner is no longer required when the graph is larger than 10MB for Java SDK ([PR#28621](https://github.com/apache/beam/pull/28621).
* `upload_graph` as one of the Experiments options for DataflowRunner is no longer required when the graph is larger than 10MB for Java SDK ([PR#28621](https://github.com/apache/beam/pull/28621)).
* state amd side input cache has been enabled to a default of 100 MB. Use `--max_cache_memory_usage_mb=X` to provide cache size for the user state API and side inputs. (Python) ([#28770](https://github.com/apache/beam/issues/28770)).
* Beam YAML stable release. Beam pipelines can now be written using YAML and leverage the Beam YAML framework which includes a preliminary set of IO's and turnkey transforms. More information can be found in the YAML root folder and in the [README](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/yaml/README.md).

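The `--max_cache_memory_usage_mb` flag in the entry above is passed like any other pipeline option. A minimal sketch from the Python SDK follows; only the flag name comes from the changelog, the pipeline contents and the 200 MB value are placeholders:

```python
# Minimal sketch: raising the user state / side-input cache size via the new flag.
# Only --max_cache_memory_usage_mb comes from the changelog entry above; the
# pipeline itself is a placeholder.
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions(['--max_cache_memory_usage_mb=200'])

with beam.Pipeline(options=options) as pipeline:
    (pipeline
     | beam.Create(['a', 'b', 'c'])
     | beam.Map(str.upper))
```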
18 changes: 9 additions & 9 deletions sdks/python/apache_beam/io/gcp/bigquery.py
@@ -72,7 +72,8 @@
or a table. Pipeline construction will fail with a validation error if neither
or both are specified.

When reading via `ReadFromBigQuery`, bytes are returned decoded as bytes.
When reading via `ReadFromBigQuery` using `EXPORT`,
bytes are returned decoded as bytes.
This is due to the fact that ReadFromBigQuery uses Avro exports by default.
When reading from BigQuery using `apache_beam.io.BigQuerySource`, bytes are
returned as base64-encoded bytes. To get base64-encoded bytes using
@@ -2597,6 +2598,8 @@ def expand(self, input):


class ReadFromBigQuery(PTransform):
# pylint: disable=line-too-long,W1401

"""Read data from BigQuery.

This PTransform uses a BigQuery export job to take a snapshot of the table
@@ -2653,8 +2656,7 @@ class ReadFromBigQuery(PTransform):
:data:`None`, then the temp_location parameter is used.
bigquery_job_labels (dict): A dictionary with string labels to be passed
to BigQuery export and query jobs created by this transform. See:
https://cloud.google.com/bigquery/docs/reference/rest/v2/\
Job#JobConfiguration
https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#JobConfiguration
use_json_exports (bool): By default, this transform works by exporting
BigQuery data into Avro files, and reading those files. With this
parameter, the transform will instead export to JSON files. JSON files
@@ -2666,11 +2668,10 @@ class ReadFromBigQuery(PTransform):
types (datetime.date, datetime.datetime, datetime.datetime,
and datetime.datetime respectively). Avro exports are recommended.
To learn more about BigQuery types, and Time-related type
representations, see: https://cloud.google.com/bigquery/docs/reference/\
standard-sql/data-types
representations,
see: https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types
To learn more about type conversions between BigQuery and Avro, see:
https://cloud.google.com/bigquery/docs/loading-data-cloud-storage-avro\
#avro_conversions
https://cloud.google.com/bigquery/docs/loading-data-cloud-storage-avro\#avro_conversions
temp_dataset (``apache_beam.io.gcp.internal.clients.bigquery.\
DatasetReference``):
Temporary dataset reference to use when reading from BigQuery using a
@@ -2690,8 +2691,7 @@ class ReadFromBigQuery(PTransform):
(`PYTHON_DICT`). There is experimental support for producing a
PCollection with a schema and yielding Beam Rows via the option
`BEAM_ROW`. For more information on schemas, see
https://beam.apache.org/documentation/programming-guide/\
#what-is-a-schema)
https://beam.apache.org/documentation/programming-guide/#what-is-a-schema)
"""
class Method(object):
EXPORT = 'EXPORT' # This is currently the default.
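Putting the documented parameters together, here is a hedged usage sketch. The project, dataset, bucket, and label values are placeholders; `method` and `output_type` take the values described in the docstring above:

```python
# Illustrative sketch of the documented ReadFromBigQuery parameters; the table,
# bucket, and label values are placeholders, not taken from the docstring.
import apache_beam as beam

with beam.Pipeline() as pipeline:
    rows = (
        pipeline
        | beam.io.ReadFromBigQuery(
            table='my-project:my_dataset.my_table',         # placeholder table
            gcs_location='gs://my-bucket/bq-export-tmp',    # staging area for export files
            method=beam.io.ReadFromBigQuery.Method.EXPORT,  # current default, per the Method class above
            bigquery_job_labels={'team': 'analytics'},      # forwarded to the export/query job
            output_type='PYTHON_DICT')                      # default; 'BEAM_ROW' is experimental
        | beam.Map(print))
```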
69 changes: 68 additions & 1 deletion website/www/site/content/en/blog/beam-2.52.0.md
@@ -41,7 +41,7 @@ should handle this. ([#25252](https://github.com/apache/beam/issues/25252)).
* Add `UseDataStreamForBatch` pipeline option to the Flink runner. When it is set to true, Flink runner will run batch
jobs using the DataStream API. By default the option is set to false, so the batch jobs are still executed
using the DataSet API.
* `upload_graph` as one of the Experiments options for DataflowRunner is no longer required when the graph is larger than 10MB for Java SDK ([PR#28621](https://github.com/apache/beam/pull/28621).
* `upload_graph` as one of the Experiments options for DataflowRunner is no longer required when the graph is larger than 10MB for Java SDK ([PR#28621](https://github.com/apache/beam/pull/28621)).
* state amd side input cache has been enabled to a default of 100 MB. Use `--max_cache_memory_usage_mb=X` to provide cache size for the user state API and side inputs. (Python) ([#28770](https://github.com/apache/beam/issues/28770)).
* Beam YAML stable release. Beam pipelines can now be written using YAML and leverage the Beam YAML framework which includes a preliminary set of IO's and turnkey transforms. More information can be found in the YAML root folder and in the [README](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/yaml/README.md).

@@ -69,69 +69,136 @@ as a workaround, a copy of "old" `CountingSource` class should be placed into a
According to git shortlog, the following people contributed to the 2.52.0 release. Thank you to all contributors!

Ahmed Abualsaud

Ahmet Altay

Aleksandr Dudko

Alexey Romanenko

Anand Inguva

Andrei Gurau

Andrey Devyatkin

BjornPrime

Bruno Volpato

Bulat

Chamikara Jayalath

Damon

Danny McCormick

Devansh Modi

Dominik Dębowczyk

Ferran Fernández Garrido

Hai Joey Tran

Israel Herraiz

Jack McCluskey

Jan Lukavský

JayajP

Jeff Kinard

Jeffrey Kinard

Jiangjie Qin

Jing

Joar Wandborg

Johanna Öjeling

Julien Tournay

Kanishk Karanawat

Kenneth Knowles

Kerry Donny-Clark

Luís Bianchin

Minbo Bae

Pranav Bhandari

Rebecca Szper

Reuven Lax

Ritesh Ghorse

Robert Bradshaw

Robert Burke

RyuSA

Shunping Huang

Steven van Rossum

Svetak Sundhar

Tony Tang

Vitaly Terentyev

Vivek Sumanth

Vlado Djerek

Yi Hu

aku019

brucearctor

caneff

damccorm

ddebowczyk92

dependabot[bot]

dpcollins-google

edman124

gabry.wu

illoise

johnjcasey

jonathan-lemos

kennknowles

liferoad

magicgoody

martin trieu

nancyxu123

pablo rodriguez defino

tvalentyn