
# Beam Katas

<!--
  ~  Licensed to the Apache Software Foundation (ASF) under one
  ~  or more contributor license agreements.  See the NOTICE file
  ~  distributed with this work for additional information
  ~  regarding copyright ownership.  The ASF licenses this file
  ~  to you under the Apache License, Version 2.0 (the
  ~  "License"); you may not use this file except in compliance
  ~  with the License.  You may obtain a copy of the License at
  ~
  ~      http://www.apache.org/licenses/LICENSE-2.0
  ~
  ~  Unless required by applicable law or agreed to in writing, software
  ~  distributed under the License is distributed on an "AS IS" BASIS,
  ~  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  ~  See the License for the specific language governing permissions and
  ~  limitations under the License.
  -->

<html>
<h2>ParDo</h2>
<p>
  ParDo is a Beam transform for generic parallel processing. The ParDo processing paradigm is
  similar to the “Map” phase of a Map/Shuffle/Reduce-style algorithm: a ParDo transform considers
  each element in the input PCollection, performs some processing function (your user code) on
  that element, and emits zero, one, or multiple elements to an output PCollection.
</p>
<p>
  <b>Kata:</b> Write a simple ParDo that multiplies each input element by 10.
</p>
<br>
<div class="hint">
  Override the <a href="https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.core.html#apache_beam.transforms.core.DoFn.process">
  process</a> method.
</div>
<div class="hint">
  Use <a href="https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.core.html#apache_beam.transforms.core.ParDo">
  ParDo</a> with a
  <a href="https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.core.html#apache_beam.transforms.core.DoFn">DoFn</a>.
</div>
<div class="hint">
  Refer to the Beam Programming Guide
  <a href="https://beam.apache.org/documentation/programming-guide/#pardo">"ParDo"</a> section for
  more information.
</div>
</html>
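
The "zero, one, or multiple elements" behavior described above can be modeled in plain Python, without Beam. The following is a minimal sketch under stated assumptions: `par_do`, `multiply_by_ten`, `keep_even`, and `repeat` are hypothetical names invented for illustration and are not part of the Beam API. It also preserves input order, which a real PCollection does not guarantee.

```python
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")
U = TypeVar("U")


def par_do(elements: Iterable[T], process: Callable[[T], Iterable[U]]) -> List[U]:
    """Hypothetical stand-in for ParDo: call `process` per element and flatten."""
    output: List[U] = []
    for element in elements:
        # Each call may yield zero, one, or many outputs.
        output.extend(process(element))
    return output


def multiply_by_ten(element: int) -> Iterator[int]:
    yield element * 10  # exactly one output per input


def keep_even(element: int) -> Iterator[int]:
    if element % 2 == 0:
        yield element  # odd inputs produce no output


def repeat(element: int) -> Iterator[int]:
    for _ in range(element):
        yield element  # `element` copies per input


print(par_do([1, 2, 3], multiply_by_ten))  # [10, 20, 30]
print(par_do([1, 2, 3], keep_even))        # [2]
print(par_do([1, 2, 3], repeat))           # [1, 2, 2, 3, 3, 3]
```

In Beam, the `process` method of a `DoFn` plays the role of the functions passed to `par_do` here, and the runner decides how the per-element calls are distributed.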

```python
!pip install apache-beam -qqq

import apache_beam as beam
from apache_beam.runners.interactive import interactive_runner
```

## Python Collection

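```python
class MultiplyByTenDoFn(beam.DoFn):

    # Beam calls process() once per input element;
    # every yielded value becomes an output element.
    def process(self, element):
        yield element * 10

# Applying the transform directly to a Python list runs it locally
# and returns the results as a list.
[1, 2, 3, 4, 5] | beam.ParDo(MultiplyByTenDoFn())
```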

[10, 20, 30, 40, 50]

## Beam PCollection

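```python
# Build a pipeline that runs on the InteractiveRunner (suited to notebooks).
p = beam.Pipeline(interactive_runner.InteractiveRunner())

class MultiplyByTenDoFn(beam.DoFn):

    def process(self, element):
        yield element * 10

# Create a PCollection from an in-memory list, then apply the DoFn to each element.
(p | 'Create' >> beam.Create([1, 2, 3, 4, 5])
   | 'MultiplyByTen' >> beam.ParDo(MultiplyByTenDoFn()))

p.run()
```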

Running...

Using 0 cached PCollections
Executing 2 of 2 transforms.

MultiplyByTen produced {30, 10, 40, 20, 50}

Create produced {5, 4, 2, 1, 3}

<apache_beam.runners.interactive.interactive_runner.PipelineResult at 0x7f2a431c4250>