## Map with a function

In [2]:
import apache_beam as beam

def strip_header_and_newline(text):
    return text.strip('# \n')

with beam.Pipeline() as pipeline:
    plants = (
        pipeline
        | 'Gardening plants' >> beam.Create([
            '# 🍓Strawberry\n',
            '# 🥕Carrot\n',
            '# 🍆Eggplant\n',
            '# 🍅Tomato\n',
            '# 🥔Potato\n',
        ])
        | 'Strip header' >> beam.Map(strip_header_and_newline)
        | beam.Map(print)
    )

🍓Strawberry
🥕Carrot
🍆Eggplant
🍅Tomato
🥔Potato


## Map with a lambda function

In [3]:
import apache_beam as beam

with beam.Pipeline() as pipe:
    plants = (
        pipe
        | 'Gardening plants' >> beam.Create([
            '# 🍓Strawberry\n',
            '# 🥕Carrot\n',
            '# 🍆Eggplant\n',
            '# 🍅Tomato\n',
            '# 🥔Potato\n',
        ])
        | 'Strip header' >> beam.Map(lambda text: text.strip('# \n'))
        | beam.Map(print)
    )

🍓Strawberry
🥕Carrot
🍆Eggplant
🍅Tomato
🥔Potato



## Map with multiple arguments

You can pass functions that take multiple arguments to `Map`.
`Map` calls the function with each element first, followed by any additional positional or keyword arguments you give to `Map`.

In this example, `strip` takes `text` and `chars` as arguments.

In [4]:
import apache_beam as beam

def strip(text, chars=None):
    return text.strip(chars)

with beam.Pipeline() as pipe:
    plants = (
        pipe
        | beam.Create([
            '# 🍓Strawberry\n',
            '# 🥕Carrot\n',
            '# 🍆Eggplant\n',
            '# 🍅Tomato\n',
            '# 🥔Potato\n',
        ])
        | 'Strip header' >> beam.Map(strip, chars='# \n')
        | beam.Map(print)
    )

🍓Strawberry
🥕Carrot
🍆Eggplant
🍅Tomato
🥔Potato
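
The example above passes `chars` by keyword. As a rough illustration of how the extra arguments reach the function, here is a plain-Python sketch. `apply_map` is a hypothetical stand-in written for this note, not Beam's implementation; it only mimics the "element first, extras after" calling convention.

```python
def strip(text, chars=None):
    return text.strip(chars)

def apply_map(fn, elements, *args, **kwargs):
    """Hypothetical stand-in for Map: call fn once per element, forwarding extras."""
    return [fn(element, *args, **kwargs) for element in elements]

lines = ['# 🍓Strawberry\n', '# 🥕Carrot\n']

# The extra argument can be forwarded by keyword or by position.
by_keyword = apply_map(strip, lines, chars='# \n')
by_position = apply_map(strip, lines, '# \n')

print(by_keyword)
print(by_position)
```

In the pipeline itself, the positional form would be written `beam.Map(strip, '# \n')`.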


## MapTuple for key-value pairs

In [5]:
import apache_beam as beam

with beam.Pipeline() as pipeline:
    plants = (
        pipeline
        | 'Gardening plants' >> beam.Create([
            ('🍓', 'Strawberry'),
            ('🥕', 'Carrot'),
            ('🍆', 'Eggplant'),
            ('🍅', 'Tomato'),
            ('🥔', 'Potato'),
        ])
        | 'Format' >> beam.MapTuple(lambda icon, plant: f'{icon}|{plant}')
        | beam.Map(print)
    )

🍓|Strawberry
🥕|Carrot
🍆|Eggplant
🍅|Tomato
🥔|Potato
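
`MapTuple` unpacks each tuple element into separate function arguments, so the function can name the parts instead of indexing into the tuple. The plain-Python sketch below models that unpacking with `*pair`; it illustrates the calling convention only and does not use Beam. `format_plant` is a name chosen for this sketch.

```python
def format_plant(icon, plant):
    return f'{icon}|{plant}'

pairs = [('🍓', 'Strawberry'), ('🥕', 'Carrot')]

# What MapTuple does conceptually: spread each tuple across the parameters.
unpacked = [format_plant(*pair) for pair in pairs]

# The same result without unpacking, indexing into each tuple by hand.
indexed = [f'{pair[0]}|{pair[1]}' for pair in pairs]

print(unpacked)
```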


## Map with side inputs as singletons

If a `PCollection` holds a single value, such as the average from another computation,
pass it as a *singleton* so that the function receives that value directly.

Side inputs are a way to pass additional arguments to a `PTransform` from the contents of another `PCollection`.

In [7]:
import apache_beam as beam

with beam.Pipeline() as pipe:
    char = (pipe | 'Create chars' >> beam.Create(['# \n']))

    plants = (
        pipe
        | 'Gardening plants' >> beam.Create([
            '# 🍓Strawberry\n',
            '# 🥕Carrot\n',
            '# 🍆Eggplant\n',
            '# 🍅Tomato\n',
            '# 🥔Potato\n',
        ])
        | 'Strip header' >> beam.Map(
            lambda text, chars:
            text.strip(chars),
            chars=beam.pvalue.AsSingleton(char),
        )
        | beam.Map(print)
    )

🍓Strawberry
🥕Carrot
🍆Eggplant
🍅Tomato
🥔Potato


## Map with side inputs as iterators

If the `PCollection` has multiple values, pass the `PCollection` as an *iterator*.
This accesses elements lazily as they are needed,
so it is possible to iterate over large `PCollection`s that won't fit into memory.

> **Note**: You can pass the `PCollection` as a *list* with `beam.pvalue.AsList(pcollection)`,
> but this requires that all the elements fit into memory.


In [13]:
import apache_beam as beam

with beam.Pipeline() as pipe:
    chars = pipe | 'Create chars' >> beam.Create(['#', ' ', '\n'])

    plants = (
        pipe
        | 'Gardening plants' >> beam.Create([
            '# 🍓Strawberry\n',
            '# 🥕Carrot\n',
            '# 🍆Eggplant\n',
            '# 🍅Tomato\n',
            '# 🥔Potato\n',
        ])
        | 'Strip header' >> beam.Map(
            lambda text, chars:
            text.strip(''.join(chars)),
            chars=beam.pvalue.AsIter(chars),
        )
        | beam.Map(print)
    )

🍓Strawberry
🥕Carrot
🍆Eggplant
🍅Tomato
🥔Potato
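
A `PCollection` is unordered, so the lambda above should not rely on the order in which `chars` is iterated. It does not need to: `str.strip` treats its argument as a set of characters rather than a prefix or suffix. The following plain-Python check (no Beam involved) tries every ordering of the three characters.

```python
from itertools import permutations

text = '# 🍓Strawberry\n'

# Strip with the characters joined in every possible order and collect
# the distinct results.
results = {
    text.strip(''.join(order))
    for order in permutations(['#', ' ', '\n'])
}

print(results)
```

Only one distinct result comes back, which is why joining the side input in arbitrary order is safe here.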


## Map with side inputs as dictionaries

If a `PCollection` is small enough to fit into memory, you can pass it as a *dictionary*.
Each element must be a `(key, value)` pair.
If the `PCollection` won't fit into memory, use `beam.pvalue.AsIter(pcollection)` instead.

In [14]:
import apache_beam as beam

def replace_duration(plant, durations):
    # Return a new dict rather than modifying the input element in place.
    return {**plant, 'duration': durations[plant['duration']]}

with beam.Pipeline() as pipeline:
    durations = pipeline | 'Durations' >> beam.Create([
        (0, 'annual'),
        (1, 'biennial'),
        (2, 'perennial'),
    ])

    plants_details = (
        pipeline
        | 'Gardening plants' >> beam.Create([
            {'icon': '🍓', 'name': 'Strawberry', 'duration': 2},
            {'icon': '🥕', 'name': 'Carrot', 'duration': 1},
            {'icon': '🍆', 'name': 'Eggplant', 'duration': 2},
            {'icon': '🍅', 'name': 'Tomato', 'duration': 0},
            {'icon': '🥔', 'name': 'Potato', 'duration': 2},
        ])
        | 'Replace duration' >> beam.Map(
            replace_duration,
            durations=beam.pvalue.AsDict(durations),
        )
        | beam.Map(print)
    )

{'icon': '🍓', 'name': 'Strawberry', 'duration': 'perennial'}
{'icon': '🥕', 'name': 'Carrot', 'duration': 'biennial'}
{'icon': '🍆', 'name': 'Eggplant', 'duration': 'perennial'}
{'icon': '🍅', 'name': 'Tomato', 'duration': 'annual'}
{'icon': '🥔', 'name': 'Potato', 'duration': 'perennial'}
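
Inside the function, the dictionary side input behaves like an ordinary `dict`, so looking up a key that is not in it raises `KeyError`. The plain-Python sketch below shows one possible way to handle that with `dict.get`. The `default` parameter, the `'unknown'` label, and the cactus element are assumptions made up for this sketch; they are not part of the example above.

```python
def replace_duration(plant, durations, default='unknown'):
    # Look up the duration code; fall back to `default` when it is missing.
    return {**plant, 'duration': durations.get(plant['duration'], default)}

# Model the side input as the dict built from the (key, value) pairs.
durations = dict([(0, 'annual'), (1, 'biennial'), (2, 'perennial')])

known = replace_duration(
    {'icon': '🍅', 'name': 'Tomato', 'duration': 0}, durations)
# Hypothetical element whose code 7 has no entry in `durations`.
missing = replace_duration(
    {'icon': '🌵', 'name': 'Cactus', 'duration': 7}, durations)

print(known)
print(missing)
```

Whether a fallback or a hard failure is preferable depends on the data; failing fast makes bad codes visible, while a default keeps the pipeline running.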
