[BEAM-2103] Documenting Python 3 support #9133

Closed
63 changes: 62 additions & 1 deletion website/src/_includes/section-menu/documentation.html
@@ -168,7 +168,68 @@
</li>
</ul>
</li>
</ul>

<li class="section-nav-item--collapsible">
<span class="section-nav-list-title">Java</span>

<ul class="section-nav-list">
<li><a href="{{ site.baseurl }}/documentation/transforms/java/overview/">Overview</a></li>
<li class="section-nav-item--collapsible">
<span class="section-nav-list-title">Element-wise</span>

<ul class="section-nav-list">
<li><a href="{{ site.baseurl }}/documentation/transforms/java/elementwise/filter/">Filter</a></li>
<li><a href="{{ site.baseurl }}/documentation/transforms/java/elementwise/flatmapelements/">FlatMapElements</a></li>
<li><a href="{{ site.baseurl }}/documentation/transforms/java/elementwise/keys/">Keys</a></li>
<li><a href="{{ site.baseurl }}/documentation/transforms/java/elementwise/kvswap/">KvSwap</a></li>
<li><a href="{{ site.baseurl }}/documentation/transforms/java/elementwise/mapelements/">MapElements</a></li>
<li><a href="{{ site.baseurl }}/documentation/transforms/java/elementwise/pardo/">ParDo</a></li>
<li><a href="{{ site.baseurl }}/documentation/transforms/java/elementwise/partition/">Partition</a></li>
<li><a href="{{ site.baseurl }}/documentation/transforms/java/elementwise/regex/">Regex</a></li>
<li><a href="{{ site.baseurl }}/documentation/transforms/java/elementwise/reify/">Reify</a></li>
<li><a href="{{ site.baseurl }}/documentation/transforms/java/elementwise/tostring/">ToString</a></li>
<li><a href="{{ site.baseurl }}/documentation/transforms/java/elementwise/withkeys/">WithKeys</a></li>
<li><a href="{{ site.baseurl }}/documentation/transforms/java/elementwise/withtimestamps/">WithTimestamps</a></li>
<li><a href="{{ site.baseurl }}/documentation/transforms/java/elementwise/values/">Values</a></li>
</ul>
</li>
<li class="section-nav-item--collapsible">
<span class="section-nav-list-title">Aggregation</span>

<ul class="section-nav-list">
<li><a href="{{ site.baseurl }}/documentation/transforms/java/aggregation/approximatequantiles/">ApproximateQuantiles</a></li>
<li><a href="{{ site.baseurl }}/documentation/transforms/java/aggregation/approximateunique/">ApproximateUnique</a></li>
<li><a href="{{ site.baseurl }}/documentation/transforms/java/aggregation/cogroupbykey/">CoGroupByKey</a></li>
<li><a href="{{ site.baseurl }}/documentation/transforms/java/aggregation/combine/">Combine</a></li>
<li><a href="{{ site.baseurl }}/documentation/transforms/java/aggregation/combinewithcontext/">CombineWithContext</a></li>
<li><a href="{{ site.baseurl }}/documentation/transforms/java/aggregation/count/">Count</a></li>
<li><a href="{{ site.baseurl }}/documentation/transforms/java/aggregation/distinct/">Distinct</a></li>
<li><a href="{{ site.baseurl }}/documentation/transforms/java/aggregation/groupbykey/">GroupByKey</a></li>
<li><a href="{{ site.baseurl }}/documentation/transforms/java/aggregation/groupintobatches/">GroupIntoBatches</a></li>
<li><a href="{{ site.baseurl }}/documentation/transforms/java/aggregation/latest/">Latest</a></li>
<li><a href="{{ site.baseurl }}/documentation/transforms/java/aggregation/max/">Max</a></li>
<li><a href="{{ site.baseurl }}/documentation/transforms/java/aggregation/mean/">Mean</a></li>
<li><a href="{{ site.baseurl }}/documentation/transforms/java/aggregation/min/">Min</a></li>
<li><a href="{{ site.baseurl }}/documentation/transforms/java/aggregation/sample/">Sample</a></li>
<li><a href="{{ site.baseurl }}/documentation/transforms/java/aggregation/sum/">Sum</a></li>
<li><a href="{{ site.baseurl }}/documentation/transforms/java/aggregation/top/">Top</a></li>
</ul>
</li>
<li class="section-nav-item--collapsible">
<span class="section-nav-list-title">Other</span>

<ul class="section-nav-list">
<li><a href="{{ site.baseurl }}/documentation/transforms/java/other/create/">Create</a></li>
<li><a href="{{ site.baseurl }}/documentation/transforms/java/other/flatten/">Flatten</a></li>
<li><a href="{{ site.baseurl }}/documentation/transforms/java/other/passert/">PAssert</a></li>
<li><a href="{{ site.baseurl }}/documentation/transforms/java/other/view/">View</a></li>
<li><a href="{{ site.baseurl }}/documentation/transforms/java/other/window/">Window</a></li>
</ul>
</li>
</ul>
</li>
</ul>

</li>

<li class="section-nav-item--collapsible">
@@ -0,0 +1,43 @@
---
layout: section
title: "ApproximateQuantiles"
permalink: /documentation/transforms/java/aggregation/approximatequantiles/
section_menu: section-menu/documentation.html
---
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
# ApproximateQuantiles
<table align="left">
<a target="_blank" class="button"
href="https://beam.apache.org/releases/javadoc/current/index.html?org/apache/beam/sdk/transforms/ApproximateQuantiles.html">
<img src="https://beam.apache.org/images/logos/sdks/java.png" width="20px" height="20px"
alt="Javadoc" />
Javadoc
</a>
</table>
<br>
Takes a comparison function and the desired number of quantiles *n*, either
globally or per-key. Using an approximation algorithm, it returns the
minimum value, *n-2* intermediate values, and the maximum value.

## Examples
**Example**: to compute the quartiles of a `PCollection` of integers, we
would use `ApproximateQuantiles.globally(5)`. This will produce a list
containing 5 values: the minimum value, Quartile 1 value, Quartile 2
value, Quartile 3 value, and the maximum value.
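
To make the shape of that output concrete, here is a minimal plain-Java sketch that computes *exact* quantiles by sorting. It is not Beam code and not the approximation algorithm that `ApproximateQuantiles` uses; the names `QuantileSketch` and `exactQuantiles` are hypothetical, and the nearest-rank index rule is just one reasonable choice.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper in plain Java; not part of the Beam SDK.
class QuantileSketch {
  // Returns n values: the minimum, n-2 evenly spaced intermediate values, and the maximum.
  static List<Integer> exactQuantiles(List<Integer> input, int n) {
    if (n < 2 || input.isEmpty()) {
      throw new IllegalArgumentException("need n >= 2 and a non-empty input");
    }
    List<Integer> sorted = new ArrayList<>(input);
    Collections.sort(sorted);
    List<Integer> quantiles = new ArrayList<>();
    for (int i = 0; i < n; i++) {
      // Map quantile i (out of n-1 steps) onto the nearest index in the sorted list.
      int index = (int) Math.round((double) i * (sorted.size() - 1) / (n - 1));
      quantiles.add(sorted.get(index));
    }
    return quantiles;
  }

  public static void main(String[] args) {
    List<Integer> values = new ArrayList<>();
    for (int v = 1; v <= 9; v++) {
      values.add(v);
    }
    // Five quantiles: minimum, three quartiles, maximum.
    System.out.println(exactQuantiles(values, 5)); // prints [1, 3, 5, 7, 9]
  }
}
```

With *n* = 5, the second, third, and fourth entries are the three quartiles. An approximate implementation trades exactness for bounded memory, so its output may differ slightly from this exact version.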

## Related transforms
* [ApproximateUnique]({{ site.baseurl }}/documentation/transforms/java/aggregation/approximateunique)
estimates the number of distinct elements or distinct values in key-value pairs
* [Combine]({{ site.baseurl }}/documentation/transforms/java/aggregation/combine)
@@ -0,0 +1,40 @@
---
layout: section
title: "ApproximateUnique"
permalink: /documentation/transforms/java/aggregation/approximateunique/
section_menu: section-menu/documentation.html
---
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
# ApproximateUnique
<table align="left">
<a target="_blank" class="button"
href="https://beam.apache.org/releases/javadoc/current/index.html?org/apache/beam/sdk/transforms/ApproximateUnique.html">
<img src="https://beam.apache.org/images/logos/sdks/java.png" width="20px" height="20px"
alt="Javadoc" />
Javadoc
</a>
</table>
<br>
Transforms for estimating the number of distinct elements in a collection
or the number of distinct values associated with each key in a collection
of key-value pairs.

## Examples
See [BEAM-7703](https://issues.apache.org/jira/browse/BEAM-7703) for updates.
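
Until an official example is available, the following plain-Java sketch shows the two quantities the transform estimates, computed exactly with hash sets. It is an illustration only: `DistinctCountSketch` and its methods are hypothetical names, nothing here calls the Beam SDK, and an approximate algorithm would return estimates rather than these exact counts.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical helper in plain Java; not part of the Beam SDK.
class DistinctCountSketch {
  // Exact number of distinct elements in a collection.
  static <T> long distinctGlobally(List<T> elements) {
    return new HashSet<>(elements).size();
  }

  // Exact number of distinct values associated with each key.
  static <K extends Comparable<K>, V> Map<K, Long> distinctPerKey(List<Map.Entry<K, V>> pairs) {
    Map<K, Set<V>> seen = new TreeMap<>();
    for (Map.Entry<K, V> pair : pairs) {
      seen.computeIfAbsent(pair.getKey(), k -> new HashSet<>()).add(pair.getValue());
    }
    Map<K, Long> counts = new TreeMap<>();
    for (Map.Entry<K, Set<V>> entry : seen.entrySet()) {
      counts.put(entry.getKey(), (long) entry.getValue().size());
    }
    return counts;
  }

  public static void main(String[] args) {
    List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
    pairs.add(new AbstractMap.SimpleEntry<>("a", 1));
    pairs.add(new AbstractMap.SimpleEntry<>("a", 1));
    pairs.add(new AbstractMap.SimpleEntry<>("a", 2));
    pairs.add(new AbstractMap.SimpleEntry<>("b", 7));
    System.out.println(distinctPerKey(pairs)); // prints {a=2, b=1}
  }
}
```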

## Related transforms
* [Count]({{ site.baseurl }}/documentation/transforms/java/aggregation/count)
counts the number of elements within each aggregation.
* [Distinct]({{ site.baseurl }}/documentation/transforms/java/aggregation/distinct)
@@ -0,0 +1,73 @@
---
layout: section
title: "CoGroupByKey"
permalink: /documentation/transforms/java/aggregation/cogroupbykey/
section_menu: section-menu/documentation.html
---
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
# CoGroupByKey
<table align="left">
<a target="_blank" class="button"
href="https://beam.apache.org/releases/javadoc/current/index.html?org/apache/beam/sdk/transforms/join/CoGroupByKey.html">
<img src="https://beam.apache.org/images/logos/sdks/java.png" width="20px" height="20px"
alt="Javadoc" />
Javadoc
</a>
</table>
<br>
Aggregates all input elements by their key and allows downstream processing
to consume all values associated with the key. While `GroupByKey` performs
this operation over a single input collection and thus a single type of
input values, `CoGroupByKey` operates over multiple input collections.
Consequently, the result for each key is a tuple of the values associated
with that key in each input collection.

See more information in the [Beam Programming Guide]({{ site.baseurl }}/documentation/programming-guide/#cogroupbykey).

## Examples
**Example**: Say you have two different files with user data; one file has
names and email addresses and the other file has names and phone numbers.

You can join those two data sets, using the name as a common key and the
other data as the associated values. After the join, you have one data set
that contains all of the information (email addresses and phone numbers)
associated with each name.

```java
PCollection<KV<UID, Integer>> pt1 = /* ... */;
PCollection<KV<UID, String>> pt2 = /* ... */;

// One tag per input collection; the tags are used later to pull values out of the result.
final TupleTag<Integer> t1 = new TupleTag<>();
final TupleTag<String> t2 = new TupleTag<>();
PCollection<KV<UID, CoGbkResult>> result =
  KeyedPCollectionTuple.of(t1, pt1).and(t2, pt2)
    .apply(CoGroupByKey.create());
result.apply(ParDo.of(new DoFn<KV<UID, CoGbkResult>, /* some result */>() {
  @ProcessElement
  public void processElement(ProcessContext c) {
    KV<UID, CoGbkResult> e = c.element();
    CoGbkResult result = e.getValue();
    // Retrieve all integers associated with this key from pt1.
    Iterable<Integer> allIntegers = result.getAll(t1);
    // Retrieve the string associated with this key from pt2.
    // Note: This will fail if multiple values had the same key in pt2.
    String string = result.getOnly(t2);
    ...
  }
}));
```
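
For the names, email addresses, and phone numbers scenario described above, the shape of the joined result can be sketched in plain Java. This is an illustration under assumptions, not Beam code: `CoGroupSketch`, `Joined`, and the use of two-element `String[]` pairs as key-value records are all hypothetical choices made for this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration in plain Java; not part of the Beam SDK.
class CoGroupSketch {
  // One result per key: the values from each input collection, kept separate.
  static final class Joined {
    final List<String> emails = new ArrayList<>();
    final List<String> phones = new ArrayList<>();
  }

  // Each input record is a {name, value} pair; the name is the common key.
  static Map<String, Joined> coGroup(List<String[]> emailsByName, List<String[]> phonesByName) {
    Map<String, Joined> result = new TreeMap<>();
    for (String[] kv : emailsByName) {
      result.computeIfAbsent(kv[0], k -> new Joined()).emails.add(kv[1]);
    }
    for (String[] kv : phonesByName) {
      result.computeIfAbsent(kv[0], k -> new Joined()).phones.add(kv[1]);
    }
    return result;
  }

  public static void main(String[] args) {
    List<String[]> emails = new ArrayList<>();
    emails.add(new String[] {"amy", "amy@example.com"});
    emails.add(new String[] {"carl", "carl@example.com"});
    List<String[]> phones = new ArrayList<>();
    phones.add(new String[] {"amy", "111-222-3333"});
    phones.add(new String[] {"amy", "333-444-5555"});
    for (Map.Entry<String, Joined> entry : coGroup(emails, phones).entrySet()) {
      System.out.println(entry.getKey() + " " + entry.getValue().emails + " " + entry.getValue().phones);
    }
  }
}
```

A key that appears in only one input still produces a result; its entry for the other input is simply empty, which is why code that expects exactly one value per key needs to handle the zero-value and multi-value cases.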

## Related transforms
* [CombineWithContext]({{ site.baseurl }}/documentation/transforms/java/aggregation/combinewithcontext) to combine elements.
* [GroupByKey]({{ site.baseurl }}/documentation/transforms/java/aggregation/groupbykey) takes one input collection.
90 changes: 90 additions & 0 deletions website/src/documentation/transforms/java/aggregation/combine.md
@@ -0,0 +1,90 @@
---
layout: section
title: "Combine"
permalink: /documentation/transforms/java/aggregation/combine/
section_menu: section-menu/documentation.html
---
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
# Combine
<table align="left">
<a target="_blank" class="button"
href="https://beam.apache.org/releases/javadoc/current/index.html?org/apache/beam/sdk/transforms/Combine.html">
<img src="https://beam.apache.org/images/logos/sdks/java.png" width="20px" height="20px"
alt="Javadoc" />
Javadoc
</a>
</table>
<br>
A user-defined `CombineFn` may be applied to combine all elements in a
`PCollection` (global combine) or to combine all elements associated
with each key.

While the result is similar to applying a `GroupByKey` followed by
counting the number of values in each `Iterable`, the two approaches differ
in the code you must write and in the performance of the pipeline.
Writing a `ParDo` that counts the number of elements in each value
is straightforward. However, as described in the execution
model, it requires all values associated with each key to be
processed by a single worker, which introduces significant communication overhead.
Using a `CombineFn` requires the code to be structured as an associative and
commutative operation, but it allows partial sums to be precomputed.

See more information in the [Beam Programming Guide]({{ site.baseurl }}/documentation/programming-guide/#combine).
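
The benefit of an associative and commutative operation can be sketched in plain Java. This is a simplified illustration, not how any Beam runner is implemented: the class `PartialSumSketch`, the notion of a "bundle" as a list handled by one simulated worker, and the method names are assumptions made for this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration in plain Java; none of these names come from the Beam SDK.
class PartialSumSketch {
  // Each simulated worker reduces its own bundle to one partial sum.
  static long partialSum(List<Integer> bundle) {
    long sum = 0;
    for (int value : bundle) {
      sum += value;
    }
    return sum;
  }

  // Only the partial sums cross worker boundaries, not the raw elements.
  static long combine(List<List<Integer>> bundles) {
    List<Long> partials = new ArrayList<>();
    for (List<Integer> bundle : bundles) {
      partials.add(partialSum(bundle));
    }
    long total = 0;
    for (long partial : partials) {
      total += partial;
    }
    return total;
  }

  public static void main(String[] args) {
    List<List<Integer>> bundles = Arrays.asList(
        Arrays.asList(1, 2, 3), Arrays.asList(4, 5), Arrays.asList(6));
    System.out.println(combine(bundles)); // prints 21
  }
}
```

Because addition is associative and commutative, the total is the same however the elements are split into bundles or ordered within them. An operation without those properties could not safely be pre-aggregated this way.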

## Examples
**Example 1**: Global combine
Use the global combine to combine all of the elements in a given `PCollection`
into a single value, represented in your pipeline as a new `PCollection` containing
one element. The following example code shows how to apply the Beam-provided
sum combine function to produce a single sum value for a `PCollection` of integers.

```java
// Sum.ofIntegers() combines the elements in the input PCollection. The resulting PCollection, called sum,
// contains one value: the sum of all the elements in the input PCollection.
PCollection<Integer> pc = ...;
PCollection<Integer> sum = pc.apply(
  Combine.globally(Sum.ofIntegers()));
```

**Example 2**: Keyed combine
Use a keyed combine to combine all of the values associated with each key
into a single output value for each key. As with the global combine, the
function passed to a keyed combine must be associative and commutative.

```java
// PCollection is grouped by key and the Double values associated with each key are combined into a Double.
PCollection<KV<String, Double>> salesRecords = ...;
PCollection<KV<String, Double>> totalSalesPerPerson =
  salesRecords.apply(Combine.<String, Double, Double>perKey(
    Sum.ofDoubles()));
// The combined value is of a different type than the original collection of values per key. PCollection has
// keys of type String and values of type Integer, and the combined value is a Double.
// MeanInts is a user-defined CombineFn that is not shown here.
PCollection<KV<String, Integer>> playerAccuracy = ...;
PCollection<KV<String, Double>> avgAccuracyPerPlayer =
  playerAccuracy.apply(Combine.<String, Integer, Double>perKey(
    new MeanInts()));
```

```py
# PCollection is grouped by key and the numeric values associated with each key
# are averaged into a float.
player_accuracies = ...
{% github_sample /apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/snippets_test.py tag:combine_per_key
%}
```

## Related transforms
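* [CombineWithContext]({{ site.baseurl }}/documentation/transforms/java/aggregation/combinewithcontext)
* [GroupByKey]({{ site.baseurl }}/documentation/transforms/java/aggregation/groupbykey)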
@@ -0,0 +1,37 @@
---
layout: section
title: "CombineWithContext"
permalink: /documentation/transforms/java/aggregation/combinewithcontext/
section_menu: section-menu/documentation.html
---
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
# CombineWithContext
<table align="left">
<a target="_blank" class="button"
href="https://beam.apache.org/releases/javadoc/current/index.html?org/apache/beam/sdk/transforms/CombineWithContext.html">
<img src="https://beam.apache.org/images/logos/sdks/java.png" width="20px" height="20px"
alt="Javadoc" />
Javadoc
</a>
</table>
<br>
A class of transforms that contains combine functions that have access to `PipelineOptions` and side inputs through `CombineWithContext.Context`.

## Examples
See [BEAM-7703](https://issues.apache.org/jira/browse/BEAM-7703) for updates.

## Related transforms
* [Combine]({{ site.baseurl }}/documentation/transforms/java/aggregation/combine)
for combining all values associated with a key to a single result