<a href="https://colab.research.google.com/github/MJMortensonWarwick/data_engineering_for_data_scientists/blob/main/0_3_Functional_versus_Procedural.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 0_3_Functional versus Procedural

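This short notebook sums the USD amounts in a list of raw transaction strings twice: first with a procedural loop, then with a functional filter/map/reduce pipeline. The procedural version comes first: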

In [1]:
# Procedural approach

# We have a list of raw transaction strings
raw_data = ["100,USD", "200,EUR", "invalid", "50,USD"]
clean_data = []  # Mutable state: the loop below appends to this list in place

for row in raw_data:
    if "," in row:
        amount, currency = row.split(",")
        if currency == "USD":
            # Changing the state of 'clean_data'
            clean_data.append(int(amount))

print(sum(clean_data))

150
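
Because the loop changes `clean_data` in place, the total depends on what the list already holds. The sketch below illustrates this by wrapping the same loop in a hypothetical helper, `collect_usd` (an illustrative name, not part of the original notebook), and running it twice against the same list without resetting it:

```python
raw_data = ["100,USD", "200,EUR", "invalid", "50,USD"]
clean_data = []  # shared, mutable state

def collect_usd(rows, target):
    # Hypothetical helper: the same loop as above, appending into 'target'
    for row in rows:
        if "," in row:
            amount, currency = row.split(",")
            if currency == "USD":
                target.append(int(amount))

collect_usd(raw_data, clean_data)
first_total = sum(clean_data)

# Running the loop again without resetting the list appends the same amounts again
collect_usd(raw_data, clean_data)
second_total = sum(clean_data)

print(first_total, second_total)
```

The first total is 150 and the second is 300: the same input gives a different answer depending on the hidden state of the list.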


And now the functional equivalent:

In [4]:
import functools

raw_data = ["100,USD", "200,EUR", "invalid", "50,USD"]

# A Pure Function to extract amount, assuming the row has already been filtered for USD
def get_usd_amount(row):
    return int(row.split(",")[0])

# The Pipeline (Filter/Map/Reduce)
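# 1. Filter rows that contain a comma and whose currency field is exactly "USD"
#    (a plain '"USD" in x' substring test would also accept rows such as "100,USDT")
filtered_usd_rows = filter(lambda x: "," in x and x.split(",")[1] == "USD", raw_data)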

# 2. Map (Transform) to numbers using the simplified pure function
amounts = map(get_usd_amount, filtered_usd_rows)

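# 3. Reduce (Sum) the amounts; the initial value 0 means an empty input gives 0
#    instead of raising TypeError, matching sum() in the procedural version
total = functools.reduce(lambda a, b: a + b, amounts, 0)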

print(total)

150
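
In everyday Python the same filter/map/reduce shape is often written with `map`, a generator expression and the built-in `sum`. The following is a minimal sketch of that style, not the notebook's own method. It assumes every two-field row has an integer amount, and `parse_row` and `usd_total` are illustrative names that do not appear in the original notebook:

```python
raw_data = ["100,USD", "200,EUR", "invalid", "50,USD"]

def parse_row(row):
    # Pure function: returns (amount, currency), or None for a malformed row.
    # Assumption: the amount field of a two-field row is an integer string.
    parts = row.split(",")
    if len(parts) != 2:
        return None
    amount, currency = parts
    return int(amount), currency

def usd_total(rows):
    # Map each row to a parsed tuple, keep only USD tuples, then sum the amounts.
    parsed = map(parse_row, rows)
    return sum(p[0] for p in parsed if p is not None and p[1] == "USD")

print(usd_total(raw_data))
```

This prints 150 for the sample data. Neither function touches any state outside its arguments, so `usd_total` can be called repeatedly, or on an empty list (where it returns 0), without the result changing.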
