# The lambda keyword creates small anonymous functions, a core tool of the functional programming style that PySpark uses for distributed computing on Big Data.

In [1]:
x = ['Python', 'programming', 'is', 'awesome!']
print(sorted(x))

['Python', 'awesome!', 'is', 'programming']


In [2]:
print(sorted(x, key=lambda arg: arg.lower()))

['awesome!', 'is', 'programming', 'Python']


Other common functional programming functions exist in Python as well, such as filter(), map(), and reduce(). All these functions can make use of lambda functions or standard functions defined with def in a similar manner.
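
To illustrate that last point, the sort key from In [2] can be written either as a lambda or as a named function. This is only a minimal sketch; the helper name `lowercase_key` is hypothetical and does not appear elsewhere in this notebook.

```python
x = ['Python', 'programming', 'is', 'awesome!']

# A named function defined with def (hypothetical helper name).
def lowercase_key(arg):
    return arg.lower()

# Both calls pass a one-argument function as the sort key.
sorted_with_def = sorted(x, key=lowercase_key)
sorted_with_lambda = sorted(x, key=lambda arg: arg.lower())

# The two approaches produce the same ordering.
print(sorted_with_def == sorted_with_lambda)
```

A lambda is convenient for a short, one-off expression, while def is usually the better fit once the logic needs a name, a docstring, or more than one statement.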

# Filter:

In [3]:
x = ['Python', 'programming', 'is', 'awesome!']

print(list(filter(lambda arg: len(arg) < 8, x)))

['Python', 'is']


In [4]:
# filter() can replace a helper function plus an explicit for-loop like this one:

def is_less_than_8_characters(item):
    return len(item) < 8

x = ['Python', 'programming', 'is', 'awesome!']
results = []

for item in x:
    if is_less_than_8_characters(item):
        results.append(item)

print(results)

['Python', 'is']
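
For comparison, the same predicate can be handed straight to filter(), which removes the loop and the manual append. This sketch reuses the `is_less_than_8_characters` function from In [4]:

```python
def is_less_than_8_characters(item):
    return len(item) < 8

x = ['Python', 'programming', 'is', 'awesome!']

# filter() keeps only the items for which the function returns True.
# It returns a lazy iterator, so list() is used to materialize the results.
results = list(filter(is_less_than_8_characters, x))

print(results)
```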


# Map: 

In [5]:
x = ['Python', 'programming', 'is', 'awesome!']

print(list(map(lambda arg: arg.upper(), x)))

['PYTHON', 'PROGRAMMING', 'IS', 'AWESOME!']


In [6]:
# Again, map() is simpler than the equivalent for-loop:

results = []

x = ['Python', 'programming', 'is', 'awesome!']
for item in x:
    results.append(item.upper())

print(results)

['PYTHON', 'PROGRAMMING', 'IS', 'AWESOME!']
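
One detail the examples above rely on is that map() returns a lazy iterator rather than a list, which is why the result is wrapped in list(). The sketch below shows this, and also passes the existing method `str.upper` directly instead of a lambda. The variable name `mapped` is chosen only for illustration.

```python
x = ['Python', 'programming', 'is', 'awesome!']

# map() does no work yet; it returns an iterator that computes items on demand.
mapped = map(str.upper, x)

# Consuming the iterator produces the transformed items.
results = list(mapped)
print(results)

# The iterator is now exhausted, so a second pass yields nothing.
leftover = list(mapped)
print(leftover)
```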


# Reduce: 

Unlike filter() and map(), reduce() does not return a new iterable. Instead, it applies the given two-argument function cumulatively to the items, from left to right, reducing the iterable to a single value:

In [7]:
from functools import reduce

x = ['Python', 'programming', 'is', 'awesome!']

print(reduce(lambda val1, val2: val1 + val2, x))

Pythonprogrammingisawesome!
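
To make the accumulation explicit, here is a rough loop equivalent of the call above, followed by a variant that uses the optional third argument of reduce() as a starting value. The names `joined` and `total_length` are illustrative only.

```python
from functools import reduce

x = ['Python', 'programming', 'is', 'awesome!']

# Roughly what reduce() does: carry a running value through the items.
joined = x[0]
for word in x[1:]:
    joined = joined + word
print(joined)

# With an initial value of 0, the accumulator can have a different type
# than the items: here it sums the lengths of the strings.
total_length = reduce(lambda total, word: total + len(word), x, 0)
print(total_length)
```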


In [9]:
# In IPython or Jupyter, uncomment the next line to display the built-in help for reduce():
# reduce?

lambda, map(), filter(), and reduce() are concepts that exist in many languages and can be used in regular Python programs. Soon, you’ll see these concepts extend to the PySpark API to process large amounts of data.

# Sets: 

Sets are another common piece of functionality that exists in standard Python and is widely useful in Big Data processing. Sets are similar to lists, except that they have no ordering and cannot contain duplicate values. You can think of a set as similar to the keys in a Python dict.
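
A short sketch of these properties, using a word list with deliberate duplicates (the variable names are illustrative only):

```python
words = ['Python', 'is', 'awesome!', 'Python', 'is']

# Building a set drops the duplicate values.
unique_words = set(words)
print(len(unique_words))

# Sets have no defined order, so sort them when a stable display is needed.
ordered = sorted(unique_words)
print(ordered)

# Membership tests work the same way as on dict keys.
counts = {'Python': 2, 'is': 2, 'awesome!': 1}
same_as_keys = unique_words == set(counts.keys())
print(same_as_keys)
```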

There are two reasons that PySpark is based on the functional paradigm:

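1. Spark is written in Scala, a language with strong support for functional programming.
2. Functional code is much easier to parallelize, because functions that do not modify shared state can be applied to separate pieces of the data independently.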
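
The second reason can be sketched in plain Python. The example below is not PySpark and is not how Spark distributes work; it uses a thread pool from the standard library as a stand-in, on the assumption that the mapped function (the hypothetical `shout`) depends only on its argument.

```python
from concurrent.futures import ThreadPoolExecutor

x = ['Python', 'programming', 'is', 'awesome!']

# A function with no side effects: the result depends only on the input.
def shout(word):
    return word.upper()

# Sequential version, using the built-in map().
sequential = list(map(shout, x))

# The same function applied by a pool of workers. Because each call is
# independent, the items can be processed in any order or at the same time.
with ThreadPoolExecutor(max_workers=2) as pool:
    parallel = list(pool.map(shout, x))

print(sequential == parallel)
```

Because `shout` never touches shared state, swapping map() for the pool's map() needs no other code changes, which is the property that makes this style a good fit for distributed processing.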