# NB14: Functional Programming with Collections

## Programming Fundamentals

## L.EIC/2022-23

#### João Correia Lopes$^{1}$, Nuno Macedo$^{1}$, Pedro Vasconcelos$^{2}$
$^{1}$FEUP/DEI & INESC TEC\
$^{2}$FCUP/DCC & LIACC

> “There are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies, and the other way is to make it so complicated that there are no obvious deficiencies. The first method is far more difficult.”

Tony Hoare

## Goals

By the end of this class, the student should be able to:

- Describe the notions of pure functions, immutable datatypes and functional programming

- Understand the use of combined collections (such as lists of tuples) for organizing data

- Simplify list processing using *sequence processing functions*: `map()`, `filter()` and `reduce()`

- Understand the use of *lambda forms* for defining short functions

- Understand the use of the `key` argument to specify the comparison criterion for list sorting

## Bibliography

- A. M. Kuchling, *Functional Programming HOWTO*, Release 0.32 [[HTML]](https://docs.python.org/3/howto/functional.html)

# 14 Functional programming with collections

## 14.1 Programming paradigms

### Programming paradigms

A *programming paradigm* is a method for decomposing programming problems into simpler components.

**Procedural programming**:
- this is the oldest paradigm and the closest to machine code
- decomposes a program into a sequence of instructions that achieve the desired computation
- examples: Pascal, C

**Declarative programming**:
- a program is a mathematical specification of *what* should be computed, and the language figures out the sequence of steps to achieve that goal
- examples: SQL, Prolog

**Object-oriented programming**:
- a program is a collection of objects that maintain internal state and communicate by sending messages
- examples: Smalltalk, Java, C++


**Functional programming**:
- decomposes a problem into a set of functions that perform transformations on immutable values
- examples: Scheme, ML, Haskell


### Multi-paradigm languages

- Some paradigms fit certain problems better than others
- Some (most?) languages are **multi-paradigm**: they support different programming paradigms in a single language

Examples:
- Python and C++ support objects but do not force object-oriented programming
- Scala and OCaml combine object-oriented and functional programming
- Python and C++ support some functional programming ideas for manipulating collections  


### Functional programming

- Functional programming decomposes a problem into a set of functions

- Ideally, functions only take inputs and produce outputs, and don’t have any internal state that affects the output produced for a given input

- Well-known functional languages include the ML family (Standard ML, OCaml, and other variants) and Haskell
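
As a minimal sketch of the second point (the function names are made up for illustration), compare a function whose result depends on hidden state with a pure one:

```python
# Impure: the result depends on hidden state that changes on every call.
_seen = []

def running_total_impure(x):
    _seen.append(x)          # side effect: modifies state outside the call
    return sum(_seen)

# Pure: the result depends only on the arguments.
def total(values):
    return sum(values)

first = running_total_impure(5)
second = running_total_impure(5)
print(first, second)                    # 5 10: same input, different output
print(total([5, 5]), total([5, 5]))     # 10 10: same input, same output
```

The pure version is easier to test and to reason about, because a call can always be replaced by its result.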

## 14.2 Functional Programming

> "Programming in a functional language consists in building
>   definitions and using the computer to evaluate expressions."<sup>1</sup>

- The primary role of the programmer is to construct a function to solve a given problem

- This function, which may involve a number of subsidiary functions, is expressed in notation that obeys normal mathematical principles

- The primary role of the computer is to act as an evaluator or calculator: its job is to evaluate expressions and print results

<sup>1</sup> Bird & Wadler, Introduction to Functional Programming, Prentice-Hall, 1988

### Haskell

- Some of Python's features were influenced by Haskell, a purely functional programming language

- To get a better appreciation of what a functional language is, let's look at features in Haskell:

  - **Pure Functions** --- do not have side effects (that is, they do not change the state of the program; given the same input, a pure function will always produce the same output)

  - **Immutability** --- data cannot be changed after it is created

  - **Higher Order Functions** --- functions can accept other functions as parameters and functions can return new functions as output (this allows us to abstract over actions, giving us flexibility in our code's behavior)

- Haskell has also influenced *iterators* and *generators* in Python through its **lazy evaluation**
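
These ideas also have counterparts in Python. The following is only an illustrative sketch (the names `twice`, `add_three` and `naturals` are hypothetical), showing a higher-order function, an immutable tuple and a lazily evaluated generator:

```python
# Higher-order function: takes a function and returns a new function.
def twice(f):
    def apply_twice(x):
        return f(f(x))
    return apply_twice

def add_three(x):
    return x + 3

add_six = twice(add_three)
print(add_six(10))            # 16

# Immutability: a tuple cannot be modified after it is created.
point = (1, 2)
try:
    point[0] = 5
except TypeError as err:
    print("tuples are immutable:", err)

# Lazy evaluation: a generator computes values only on demand,
# so it can describe a conceptually infinite sequence.
def naturals():
    n = 0
    while True:
        yield n
        n += 1

gen = naturals()
first_three = (next(gen), next(gen), next(gen))
print(first_three)            # (0, 1, 2)
```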

## 14.3 Advanced collection concepts

- We can use *sequences of tuples* to model data from common problems  

- `map`, `filter` and `reduce` are functions that can simplify common sequence processing

- Later we will also look at *list comprehensions* that do similar things

- *Lambda forms* are a nice feature for defining short functions that simplify using functions over collections (but aren't essential for Python programming)

### Sequence of tuples

- We can use a sequence of tuples to represent structured data

- For example:
a *color* can be represented by three integers `(r,g,b)` corresponding to the *red*, *green* and *blue* components

- Therefore we use a list of triples to represent a list of colors

```
colors = [(0,0,0), (0,127,255), (127,127,127), (255,255,255)]
```

### Working with lists of tuples

- We've already seen that we can use a `for` loop to iterate over lists of tuples
- We can also take advantage of *unpacking* to extract the red, green and blue components

In [None]:
colors = [(0,0,0), (128,0,0), (127,127,127), (255,255,255)]
for r,g,b in colors:
    print(f'red={r}, green={g}, blue={b}')

### Combining lists of tuples

- We can use the `zip` function to combine two lists into a list of pairs
- For example: let us combine the colors with their names
- `zip` produces a *lazy* sequence: we need to convert it into a list to view the results


In [None]:
colors = [(0,0,0), (128,0,0), (127,127,127), (255,255,255)]
names = ["black",  "maroon", "lightgrey", "white"]
print(zip(names, colors))
print(list(zip(names,colors)))

However, we don't need to convert into a list if we only want to iterate:

In [None]:
for (name,(r,g,b)) in zip(names,colors):
    print(f'color {name} has red={r},green={g},blue={b}')

If the lists have different lengths, then zip truncates the result to the *shorter* list:

In [None]:
some_names = ["black", "maroon"]
print(list(zip(some_names,colors)))

## 14.4 Sequence Processing Functions

### The `map` function

- `map(function, sequence)` applies a function to every element in a sequence

- The result is a *new sequence* of the transformed elements (in Python 3, a lazy one, as with `zip`)

- `map` behaves similarly to the following definition:

```
  def map(function, sequence):
      result = []
      for v in sequence:
        result.append(function(v))
      return result  
```

Example:

```
 >>> list(map(int, ["10", "12", "14", 3.1415926, 5]))
 [10, 12, 14, 3, 5]
```
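
The definition above builds a list, whereas the built-in `map` in Python 3 returns a lazy iterator. A closer approximation, offered only as a sketch (`my_map` is a hypothetical name, and the real `map` also accepts several sequences), is a generator function:

```python
def my_map(function, sequence):
    """Yield function(v) for each v, computing values only on demand."""
    for v in sequence:
        yield function(v)

lazy = my_map(int, ["10", "12", "14"])
print(lazy)                      # a generator object: nothing computed yet
as_list = list(lazy)
print(as_list)                   # [10, 12, 14]

# The built-in map produces the same values.
print(as_list == list(map(int, ["10", "12", "14"])))   # True
```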


### Example 1: complementary colors

The *complement* of an RGB color is obtained by subtracting each component from 255.

Let us write a function to compute this.

In [None]:
def rgb_compl(color):
    r,g,b = color
    return (255-r, 255-g, 255-b)

We can apply this transformation to every color in a list using map:

In [None]:
print(colors)
print(list(map(rgb_compl, colors)))

Note that the complement of `(0,0,0)` (black) is `(255,255,255)` (white) and vice-versa.

### Example 2: converting RGB to grayscale

We can convert a color in RGB format into grayscale using the [NTSC color model](https://www.mathworks.com/help/matlab/ref/rgb2gray.html) by the formula

$$ y = 0.299\times \frac{R}{255} + 0.587\times \frac{G}{255} + 0.114 \times \frac{B}{255} $$

The result $y$ is a value between 0 and 1: 0 corresponds to full black (darkest color) and 1 to full white (lightest color).

In [None]:
def rgb2grayscale(color):
    r,g,b = color
    return 0.299*(r/255) + 0.587*(g/255) + 0.114*(b/255)

print(list(map(rgb2grayscale, colors)))

### Processing Pipeline

- Now that we have seen simple transformations, we can combine
them to obtain more sophisticated transformations

- For example: let us first transform each color to its complement and then convert it to grayscale


In [None]:
compl_colors = map(rgb_compl, colors)
gray_colors = map(rgb2grayscale, compl_colors)
print(list(gray_colors))

Alternatively, we can perform the two transformations as a pipeline:

In [None]:
gray_colors = map(rgb2grayscale, map(rgb_compl, colors))
print(list(gray_colors))

### List Comprehensions (next!)

- A popular Python feature that also appears prominently in functional programming languages is the *list comprehension*

- A list comprehension is an expression that combines a function, a `for` statement, and an optional `if` statement

- The most important thing about a list comprehension is that it builds a new list by applying a calculation to the items of another iterable

```
gray_colors = [rgb2grayscale(c2) for c2 in [rgb_compl(c1) for c1 in colors]]
```

- Watch out for next NB!
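
As a small preview, and only as a sketch (the sample data and the helper names `red_component` and `has_red` are defined here so the snippet runs on its own), the same selection and transformation can be written with `map` and `filter` or with a list comprehension:

```python
colors = [(0, 0, 0), (128, 0, 0), (127, 127, 127), (255, 255, 255)]

def red_component(color):
    return color[0]

def has_red(color):
    return color[0] > 0

# map + filter: red component of every color that has some red
with_functions = list(map(red_component, filter(has_red, colors)))

# The same computation as a list comprehension:
# an expression, a for clause and an optional if clause
with_comprehension = [c[0] for c in colors if c[0] > 0]

print(with_functions)        # [128, 127, 255]
print(with_comprehension)    # [128, 127, 255]
```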

### The `filter` function

- `filter(function, sequence)` selects elements from a sequence for which the function gives `True`

- `filter` behaves as if it had the following definition:

```
  def filter(function, sequence):
     result = []
     for v in sequence:
        if function(v):
            result.append(v)
     return result
```
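
A quick check with integers, as a sketch (`is_even` and `numbers` are illustrative names). Unlike the list-building definition above, the built-in `filter` returns a lazy iterator, so it is converted to a list before printing:

```python
def is_even(n):
    return n % 2 == 0

numbers = [1, 2, 3, 4, 5, 6]
evens = filter(is_even, numbers)
print(evens)            # a filter object: a lazy iterator
even_list = list(evens)
print(even_list)        # [2, 4, 6]
print(numbers)          # [1, 2, 3, 4, 5, 6]: the original list is unchanged
```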


### Example 1: filtering colors

Let's come back to the RGB colors example.

In [None]:
colors = [(0,0,0), (128,0,0), (127,127,127), (255,255,255)]

Here is a function to check if a color has all components greater than zero:

In [None]:
def greater_than_0(color):
    r, g, b = color
    return r > 0 and g > 0 and b > 0

Now let us find all the colors that satisfy this condition i.e. have all components greater than zero.

In [None]:
print(list(filter(greater_than_0, colors)))

### Example 2: Dice rolls

Let's roll some dice:

In [None]:
import random

rolls = []
for _ in range(100):
    d1 = random.randint(1,6)
    d2 = random.randint(1,6)
    rolls.append((d1,d2))
print(rolls)

Let us now find all [hardway combinations](https://www.bestonlinecasinos.com/craps/bets/hardway/), that is, rolls where both dice show the same value and the sum is 4, 6, 8 or 10.


In [None]:
def is_hardways(pair):
    (d1, d2) = pair
    return d1 == d2 and d1+d2 in (4, 6, 8, 10)

Now, filter the hardways from the original rolls:

In [None]:
hardways = filter(is_hardways, rolls)
print(list(hardways))

What if we try to count them afterwards?

In [None]:
print(len(list(hardways)))

What happened?

Lazy iterators have an internal state: **they can only be used once**.

We need to convert the iterator into a list to be able to use it more than once.

In [None]:
hardways = list(filter(is_hardways, rolls))
print(hardways)
print(len(hardways))

### The `reduce` function

- A very general sequence processing function

- Not a built-in function in Python 3; it is part of the `functools` module

- `reduce(function, sequence)` applies a given function to an accumulator and each item of a sequence, from left to right, so as to reduce the sequence to a single value

- By default, the initial value for the accumulator is the first element of the sequence

- Sometimes a `for` loop may be more readable!



### Reducing a List

- If `seq = [x1, x2, x3, x4]` then
  `reduce(f, seq)` computes
  `f(f(f(x1,x2),x3),x4)`

- More generally, for a sequence `[x1, x2, ..., xn]`
  then `reduce(f, seq)` computes
  `f(f(...f(f(x1,x2),x3)..), xn)`

- `reduce` behaves similarly to the following definition:

```
   def reduce(function, sequence):
       acc = sequence[0]
       for s in sequence[1:]:
           acc = function(acc, s)
       return acc
```
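
The definition above fails on an empty sequence because there is no first element. `functools.reduce` accepts an optional third argument, an initial value for the accumulator, which covers this case. A short sketch:

```python
import functools

def plus(a, b):
    return a + b

# Without an initial value, the first element starts the accumulator.
total = functools.reduce(plus, [1, 2, 3, 4])           # ((1+2)+3)+4
print(total)                                           # 10

# With an initial value, the reduction starts from it instead.
total_from_100 = functools.reduce(plus, [1, 2, 3, 4], 100)
print(total_from_100)                                  # 110

# An empty sequence needs an initial value ...
empty_total = functools.reduce(plus, [], 0)
print(empty_total)                                     # 0

# ... otherwise reduce raises a TypeError.
try:
    functools.reduce(plus, [])
except TypeError as err:
    print("error:", err)
```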





### Example 1: summing all list elements

We can use `reduce` to combine all values in a list of numbers
using a function `plus` which adds two arguments.



In [None]:
import functools

def plus(a, b):
    return a+b

print(functools.reduce(plus, [47, 11, 42, 13]))

The following diagram shows the intermediate steps of the calculation:

![reduce](https://raw.githubusercontent.com/fp-leic/public/main/notebooks/14/reduce.png)


This is similar to the built-in function `sum`:

In [None]:
print(sum([47, 11, 42, 13]))

However, simply by changing the function we can:
* perform the multiplication of all values
* compute the maximum
* ...

In [None]:
def times(a, b):
   return a*b

print(functools.reduce(times, [47, 11, 42, 13]))

In [None]:
print(functools.reduce(max, [47, 11, 42, 13]))

### Other functions similar to `reduce`

The built-in functions `sum`, `any` and `all` are all special cases of `reduce`.

$\Rightarrow$
<https://github.com/fp-leic/public/tree/master/lectures/14/reduce.py>


Let's define some functions:
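
The referenced file is not reproduced here; the following is only a sketch of how such functions might be written with `reduce` (the names `my_sum`, `my_any` and `my_all` are made up for illustration). An initial value is passed as the third argument so that empty lists are handled too.

```python
import functools

def my_sum(numbers):
    def plus(a, b):
        return a + b
    return functools.reduce(plus, numbers, 0)

def my_any(values):
    def either(acc, v):
        return acc or bool(v)
    return functools.reduce(either, values, False)

def my_all(values):
    def both(acc, v):
        return acc and bool(v)
    return functools.reduce(both, values, True)

print(my_sum([47, 11, 42, 13]))     # 113
print(my_any([0, "", 3]))           # True
print(my_all([1, 2, 0]))            # False
print(my_sum([]), my_any([]), my_all([]))   # 0 False True
```

Unlike the built-in `any` and `all`, these versions always traverse the whole sequence instead of stopping at the first decisive element.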

### MapReduce programming model

- Google popularized a parallel programming model called *MapReduce*
  [[Google, 2004]](https://ai.google/research/pubs/pub62)

- Many practical large-scale computations can be expressed as a combination of a *map* followed by a *reduce*

- The program can be automatically parallelized to run on the cloud

- Definitions: [[wiki]](https://en.wikipedia.org/wiki/MapReduce)
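
On a single machine, and ignoring parallelism entirely, the shape of such a computation can be sketched with `map` and `functools.reduce`. The example (counting the total number of words in a few lines of text) and its names are illustrative assumptions, not Google's API:

```python
import functools

lines = ["the quick brown fox", "jumps over", "the lazy dog"]

def count_words(line):          # map step: one result per input item
    return len(line.split())

def plus(a, b):                 # reduce step: combine two partial results
    return a + b

counts = list(map(count_words, lines))
total_words = functools.reduce(plus, counts, 0)
print(counts)        # [4, 2, 3]
print(total_words)   # 9
```

Because each call to `count_words` is independent of the others, the map step could in principle be distributed over many machines before the partial results are combined.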

## 14.5 Advanced List Sorting

### List Sorting

Consider a list of tuples that came from a spreadsheet `csv` file:

```
   # (county_code, county_name, state, jobs)
   job_data = [
       ('121','Wyoming','NY',8722),
       ('123','Yates','NY',5094),
       # ...
       ('001','Albany','NY',162692),
       ('003','Allegany','NY',11986),
   ]
```



- We can sort this list with the `list.sort()` method

```
   job_data.sort()
```

- By default, `sort` compares the elements of the list with each other using `<`

- For tuples this is the *lexicographical ordering*:
   *   `('001', 'Albany', 'NY', 162692) < ('121', 'Wyoming', 'NY', 8722)` because `'001' < '121'`

- In this case: first we sort by county number, then county name, then state and finally jobs

- What if we want to sort by some other criteria?

### Sorting with a key

- The `sort()` method of a list can accept a keyword parameter, `key`, that provides a key extraction function

- This function returns a value which should be used for comparison purposes

- To sort our `job_data` by the second field (county name), we can use a function like the following:

```
   def by_county(row):
       return row[1]

   job_data.sort(key=by_county)
```

$\Rightarrow$
<https://github.com/fp-leic/public/tree/master/lectures/14/sort.py>

Our "database":

In [None]:
job_data = [
   ('121', 'Yates', 'NY', 5094),
   ('122', 'Wyoming', 'NY', 8722),
   ('001', 'Albany', 'NY', 162692),
   ('003', 'Allegany', 'NY', 11986),
]

Let's do the default sort:

In [None]:
print(job_data)
job_data.sort()
print(job_data)

Now let us sort by increasing number of jobs:

In [None]:
def by_jobs(row):
    return row[3]   # 4th element of the tuple

job_data.sort(key=by_jobs)
print(job_data)

Sort by county name:

In [None]:
def by_county(row):
  return row[1]     # 2nd element of the tuple

job_data.sort(key=by_county)
print(job_data)
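
Two common variations, sketched here under the assumption of the same sample rows (the function `by_state_then_county` is a hypothetical name): the built-in `sorted()` returns a new list instead of sorting in place, `reverse=True` gives decreasing order, and a key function may return a tuple to sort by several fields.

```python
job_data = [
    ('121', 'Yates', 'NY', 5094),
    ('122', 'Wyoming', 'NY', 8722),
    ('001', 'Albany', 'NY', 162692),
    ('003', 'Allegany', 'NY', 11986),
]

def by_jobs(row):
    return row[3]

# sorted() returns a new list and leaves job_data untouched
most_jobs_first = sorted(job_data, key=by_jobs, reverse=True)
print(most_jobs_first[0])    # ('001', 'Albany', 'NY', 162692)

# A key function may return a tuple: sort by state, then by county name
def by_state_then_county(row):
    return (row[2], row[1])

by_name = sorted(job_data, key=by_state_then_county)
print(by_name[0][1], by_name[-1][1])   # Albany Yates
```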

## 14.6 Lambda forms

- The functions `map()` and `filter()` and the method `list.sort()` are often used with small functions as arguments

- Instead of coming up with a new name for each such function, Python allows us to define an *anonymous* function using a *lambda form*

- This is useful when we need a short function that is used only once

- A *lambda form* is like a defined function: it has parameters and computes a value

- The body of a *lambda*, however, can only be a single expression, limiting it to relatively simple cases



In [None]:
lambda x: 2*x+1   # a function with argument x and result 2*x+1

What can we do with a lambda? We can apply it to a single value just like a named function:

In [None]:
(lambda x: 2*x+1)(3)

More interestingly, we can apply it to every element in a list using map:

In [None]:
list(map(lambda x: 2*x+1, [1,2,3,4,5]))

We can also use a lambda with sort; here is our previous example of sorting by increasing jobs:

In [None]:
job_data.sort(key=lambda row: row[3])
print(job_data)
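
Lambdas work equally well with `filter` and `functools.reduce`. A brief sketch on a list of integers (the variable names are illustrative):

```python
import functools

numbers = [1, 2, 3, 4, 5]

odd_numbers = list(filter(lambda x: x % 2 == 1, numbers))
product = functools.reduce(lambda a, b: a * b, numbers)
squares_of_odds = list(map(lambda x: x * x, odd_numbers))

print(odd_numbers)       # [1, 3, 5]
print(product)           # 120
print(squares_of_odds)   # [1, 9, 25]
```

When the body grows beyond a simple expression, a named function defined with `def` is usually easier to read.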

### Why lambda?

The name *lambda* comes from the *lambda calculus*, a mathematical model of computation that is the foundation for functional programming. [Wikipedia](https://en.wikipedia.org/wiki/Lambda_calculus)



### Map, Filter, and Reduce Functions

Python Tutorial || Learn Python Programming -- Socratica

In [None]:
from IPython.display import YouTubeVideo
YouTubeVideo('hUes6y2b--0')

-- João Correia Lopes, Nuno Macedo & Pedro Vasconcelos