# Installation
Run the following command to install the Apache Beam Python SDK (`apache-beam`):

Note: To run pipelines in Google Colab, you do not need to install or configure a separate runner, because the SDK includes the local DirectRunner. However, each Colab session starts in a fresh environment, so Apache Beam has to be reinstalled every time a new session is created.

In [0]:
!pip install apache-beam

# Data encoding and Type safety

When Beam runners execute your pipeline, they often need to materialize the intermediate data in your PCollections, which requires converting elements to and from byte strings. The Beam SDKs use objects called Coders to describe how the elements of a given PCollection may be encoded and decoded.


In [0]:
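from apache_beam import coders

# Uncomment to replace the default coder for int; the lookup below would then return FloatCoder
#coders.registry.register_coder(int, coders.FloatCoder)

# Look up the coder Beam uses for Python int elements
coders.registry.get_coder(int)

VarIntCoder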

# Type Safety

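Type safety is the prevention of type errors. A type error occurs when a program attempts to perform an operation on a value that does not support that operation, for example applying the even-number test `element % 2 == 0` to the string `'1'` instead of the integer `1`. Beam type hints let the SDK detect such mismatches while the pipeline is being constructed, before it runs.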
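In plain Python, such an error only appears when the offending operation actually runs. A minimal sketch with a hypothetical helper `is_even` that applies the even-number test: for `str` elements, `%` means string formatting, so every string fails with `TypeError` at run time. In a pipeline without type hints, that would only surface while the job is executing.

```python
def is_even(element):
  # Only meaningful for int elements
  return element % 2 == 0

failures = []
for element in ['1', '2', '3']:
  try:
    is_even(element)
  except TypeError as exc:
    # For a str, % is string formatting, so Python raises TypeError here
    failures.append((element, str(exc)))

print(is_even(2))   # True: the int case works
print(failures)     # each str element fails, but only when the check runs
```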


In [0]:
# import beam module
import apache_beam as beam

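p = beam.Pipeline()

# Outline type hint: the decorator declares that this DoFn accepts only int elements
@beam.typehints.with_input_types(int)
class FilterEvensDoFn(beam.DoFn):
  def process(self, element):
    if element % 2 == 0:
      yield element

# beam.Create() infers str as the element type, which conflicts with the int hint,
# so Beam raises TypeCheckError when ParDo is applied, before p.run() is reached
evens = ( p
         | beam.Create(['1','2','3'])
         | beam.ParDo(FilterEvensDoFn())
        )

p.run()

In [0]:
import apache_beam as beam

p = beam.Pipeline()

# Inline type hint: with_input_types(int) is attached to the Filter transform itself;
# the str elements from beam.Create() violate it, so this cell fails the same way
evens = ( p
         | beam.Create(['one','two','three'])
         | beam.Filter(lambda x: x % 2 == 0).with_input_types(int)
        )

p.run()

TypeCheckError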

# An example of type hints

- **Type hints** can be provided in two ways:
  - Inline: during pipeline construction, on the transforms themselves
  - Outline: as properties of the DoFn, using decorators
- **Kinds of type hints:**
  - Simple type hints: primitive types such as int and str, and user-defined classes
  - Parameterized type hints: nested types, mainly for Python container objects, e.g. List, Tuple, List[Tuple[int, str, str]]
  - Special type hints: the special types introduced in PEP 484, such as Any, Union, and Optional

In [0]:
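import apache_beam as beam
import typing

class Employee(object):
  def __init__(self, id, name):
    self.id = id
    self.name = name

# Custom coder: stores an Employee as the UTF-8 bytes of "id:name"
class EmployeeCoder(beam.coders.Coder):

  def encode(self, employee):
    return ('%s:%s' % (employee.id, employee.name)).encode('utf-8')

  def decode(self, s):
    return Employee(*s.decode('utf-8').split(':'))

  def is_deterministic(self):
    # Equal employees always produce identical bytes, so this coder can encode grouping keys
    return True

# Use EmployeeCoder whenever a PCollection's element type is Employee
beam.coders.registry.register_coder(Employee, EmployeeCoder)

def split_file(line):
  # Each line of data.txt is expected to have the form "name,id,salary"
  name, emp_id, salary = line.split(',')
  return Employee(emp_id, name), int(salary)

# Build a fresh pipeline: the `p` left over from the cells above already contains
# transforms from the failed examples
with beam.Pipeline() as p:
  result = (
      p
      | beam.io.ReadFromText('data.txt')
      | beam.Map(split_file)
      # The Tuple[Employee, int] hint tells Beam to encode the keys with EmployeeCoder
      | beam.CombinePerKey(sum).with_input_types(typing.Tuple[Employee, int])
      # Print each employee's name with their total salary
      | beam.Map(lambda kv: print(kv[0].name, kv[1]))
  )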