Commit

Update DataframeTransform docstring
TheNeuralBit committed Aug 18, 2020
1 parent c0c552c commit 66d258d
Showing 1 changed file with 9 additions and 1 deletion.
10 changes: 9 additions & 1 deletion sdks/python/apache_beam/dataframe/transforms.py
@@ -44,7 +44,15 @@ class DataframeTransform(transforms.PTransform):
   """A PTransform for applying function that takes and returns dataframes
   to one or more PCollections.

-  For example, if pcoll is a PCollection of dataframes, one could write::
+  DataframeTransform will accept a PCollection with a schema and batch it
+  into dataframes if necessary. In this case the proxy can be omitted:
+
+      (pcoll | beam.Row(key=..., foo=..., bar=...)
+             | DataframeTransform(lambda df: df.group_by('key').sum()))
+
+  It is also possible to process a PCollection of dataframes directly, in this
+  case a proxy must be provided. For example, if pcoll is a PCollection of
+  dataframes, one could write::

       pcoll | DataframeTransform(lambda df: df.group_by('key').sum(), proxy=...)
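
The docstring leaves the proxy elided as `proxy=...`. As a rough illustration of what such a value could look like, the sketch below assumes the proxy is an empty pandas DataFrame carrying the same column names and dtypes as the real batches. The sample data and the helper name `sum_by_key` are hypothetical and not part of the commit. Note that pandas spells the method `groupby`; the docstring's `group_by` is left as committed in the diff above. The Beam call itself appears only as a comment, so the snippet runs with pandas alone.

```python
import pandas as pd

# Hypothetical batch standing in for one element of a PCollection of dataframes.
batch = pd.DataFrame({
    "key": ["a", "b", "a"],
    "foo": [1, 2, 3],
    "bar": [10.0, 20.0, 30.0],
})

# Assumed shape of a proxy: zero rows, same columns and dtypes as the batches.
proxy = batch[:0]


def sum_by_key(df):
    """The per-dataframe function from the docstring, using pandas' `groupby`."""
    return df.groupby("key").sum()


# Applying the function to a concrete batch shows what each dataframe yields.
result = sum_by_key(batch)
print(proxy.dtypes)
print(result)

# In a pipeline this would correspond to the docstring's second form (not run here):
#   pcoll | DataframeTransform(sum_by_key, proxy=proxy)
```

Slicing an existing dataframe with `[:0]` is one convenient way to get an empty frame with matching dtypes; building it by hand with explicit dtypes would serve the same purpose.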
