In [None]:
# Initialize Otter
import otter
grader = otter.Notebook("ps5.ipynb")

### Question 1
This question will continue where we left off in lecture, by using the MapReduce class to perform data processing.

In [None]:
from functools import reduce
import itertools
import gzip


class MapReduce:
    @property
    def reduce_init(self):
        # override as necessary if the init parameter needs to change
        return None

    def mapper(self, x):
        raise NotImplementedError()

    def reducer(self, accum, x):
        raise NotImplementedError()

    def postprocess(self, reduced):
        # override if necessary
        return reduced

    def run(self, iterable):
        mapped = map(self.mapper, iterable)
        reduced = reduce(self.reducer, mapped, self.reduce_init)
        processed = self.postprocess(reduced)
        return processed

(To make things more flexible, we have also added an optional `.postprocess()` method that can be used to do additional processing after the reduction step.)
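
As a rough illustration of when `.postprocess()` is useful, the sketch below computes an average, which cannot be accumulated directly but is easy to derive from a running `(total, count)` pair. The `MeanLength` class and its input are hypothetical and are not part of the assignment; the base class is repeated in condensed form only so that the snippet runs on its own.

```python
from functools import reduce


class MapReduce:
    # Condensed copy of the class above so this snippet runs on its own.
    reduce_init = None

    def mapper(self, x):
        raise NotImplementedError()

    def reducer(self, accum, x):
        raise NotImplementedError()

    def postprocess(self, reduced):
        return reduced

    def run(self, iterable):
        mapped = map(self.mapper, iterable)
        return self.postprocess(reduce(self.reducer, mapped, self.reduce_init))


class MeanLength(MapReduce):
    """Hypothetical example: average length of the strings in an iterable."""

    @property
    def reduce_init(self):
        return (0, 0)  # (total length, number of items)

    def mapper(self, x):
        return len(x)

    def reducer(self, accum, x):
        total, count = accum
        return (total + x, count + 1)

    def postprocess(self, reduced):
        # Turn the accumulated pair into the final answer.
        total, count = reduced
        return total / count if count else 0.0


mean_length = MeanLength().run(["a", "bcd", "ef"])  # (1 + 3 + 2) / 3 == 2.0
```

The reducer only ever sees the pair; the division happens once, at the end, in `.postprocess()`.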

Questions 1(a)-1(c) concern the Enron dataset seen in lecture:

In [None]:
def enron(n=None):
    i1 = gzip.open("email-Enron.txt.gz", "rt")
    i2 = itertools.islice(i1, 4, None)  # slice off header
    return itertools.islice(i2, n)


For each question below, implement a subclass of `MapReduce` such that calling `.run(enron(n))` produces the desired output. For example, if the question asked you to calculate the total number of e-mails, your solution could be:

In [None]:
class NumEmails(MapReduce):
    @property
    def reduce_init(self):
        return 0

    def mapper(self, x):
        return 1

    def reducer(self, accum, x):
        return accum + x


NumEmails().run(enron(100))

**1(a)** (3 pts) Define a user's *importance* to be the number of unique people who e-mailed them (not including themself). Write a MapReduce class that returns a `collections.Counter` mapping each user ID to their importance when run.

In [None]:
class Importance(MapReduce):
    ...

    def postprocess(self, reduced):
        from collections import Counter

        return Counter({k: len(v) for k, v in reduced.items()})

In [None]:
grader.check("q1a")

**1(b)** (4 pts) Define a user's *forgetfulness* to be the number of times they e-mailed themself. Write a MapReduce class that returns a `Counter` that maps each user who e-mailed themself at least once to their forgetfulness score.

In [None]:
class Forgetful(MapReduce):
    ...

In [None]:
grader.check("q1b")

**1(c)** (5 pts) Define a user's *professor score* to be the number of unique individuals who e-mailed that user and never got a response back. Write a MapReduce class that returns a `Counter` mapping each user with a nonzero professor score to their score.

In [None]:
class ProfessorScore(MapReduce):
    ...

In [None]:
grader.check("1c")

Questions 1(d)-1(e) concern the following Facebook dataset:

In [None]:
import gzip
import itertools


def fb(n=None):
    return itertools.islice(gzip.open("fb.txt.gz", "rt"), n)

Each line of the file contains a list of integers. The first integer is a user id, and the remaining integers are the ids of that user's Facebook friends. User 0 has a lot of friends:

In [None]:
next(fb())

Notes:
- It is *not* necessarily the case that every friend of a user is also present in the dataset.
- In this dataset, friendship is symmetric: $A$ is a friend of $B$ implies that $B$ is a friend of $A$. However, this is only indicated *once* in the data: if $A$ is a friend of $B$ and $A < B$, then $B$ appears in the friend list of $A$.
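
To make the "stored once" convention concrete, here is a minimal sketch that parses lines and rebuilds the full, symmetric friend sets. It assumes the integers on each line are whitespace-separated; the `sample` lines and the names `parse_line` and `friends_of` are made up for illustration and are not taken from `fb.txt.gz`.

```python
def parse_line(line):
    """Split one line into (user id, list of friend ids)."""
    user, *friends = map(int, line.split())
    return user, friends


# Made-up lines in the assumed format, not taken from fb.txt.gz.
sample = ["0 1 2 3\n", "1 2\n", "2\n"]

# Each friendship is listed once, under the smaller id,
# so record it in both directions to recover the symmetric relation.
friends_of = {}
for line in sample:
    user, friends = parse_line(line)
    friends_of.setdefault(user, set())
    for f in friends:
        friends_of[user].add(f)
        friends_of.setdefault(f, set()).add(user)
```

In this toy input, user `3` never gets a line of its own, yet it still ends up with `0` as a friend once both directions are recorded.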

**1(d)** (5 pts) Define a *triangle* to be any set of three users $(A, B, C)$ such that they are all friends. For example, `126` is friends with both `0` and `1`, so `{0, 1, 126}` is a triangle. Write a MapReduce class that returns the set of all triangles in the dataset. Each triangle should be represented as a `frozenset` of three user ids.

In [None]:
class Triangles(MapReduce):
    ...

In [None]:
grader.check("1d")

**1(e)** (5 pts) Social networks love to remind you how many friends you have in common with other people. Write a MapReduce class that returns, for every pair of friends in the dataset, the number of friends that they have in common (not including each other). The returned data structure should be a `dict` whose keys are `frozenset`s containing two user ids, and whose values are integers giving the number of friends in common.

In [None]:
class CommonFriends(MapReduce):
    ...

In [None]:
grader.check("1e")

## Question 2: Conway's Game of Life

[Conway's Game of Life](https://en.wikipedia.org/wiki/Conway%27s_Game_of_Life) 
is a game devised by the late mathematician John Conway.

The game is played on an $m$-by-$n$ board, which we
will represent as an $m$-by-$n$ NumPy array.
Each cell of the board (i.e., each entry of our matrix) is
either alive (which we will represent as a $1$) or dead (which we will
represent as a $0$).

The game proceeds in steps. At each step of the game, the board evolves according to the following rules:

-   A live cell with fewer than two live neighbors becomes a dead cell.

-   A live cell with more than three live neighbors becomes a dead cell.

-   A live cell with two or three live neighbors remains alive.

-   A dead cell with *exactly* three live neighbors becomes alive.

-   All other dead cells remain dead.
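
The rules above can be read as a function of just two quantities: whether the cell is currently alive, and how many of its neighbors are alive. The following is a minimal sketch of that reading for a single cell; the name `next_cell_state` is hypothetical, and how you count neighbors and apply the rule across the whole board is left to your implementation.

```python
def next_cell_state(alive, live_neighbors):
    """Return 1 if the cell is alive after one step, else 0."""
    if alive:
        # A live cell survives only with two or three live neighbors.
        return 1 if live_neighbors in (2, 3) else 0
    # A dead cell comes alive only with exactly three live neighbors.
    return 1 if live_neighbors == 3 else 0
```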

The neighbors of a cell are the 8 cells adjacent to it, i.e., left,
right, above, below, upper-left, lower-left, upper-right and
lower-right. Thus, in a $5\times 5$ game of life, the neighbors of the middle cell are (shown in black):
```
⬜⬜⬜⬜⬜
⬜⬛⬛⬛⬜
⬜⬛⬜⬛⬜
⬜⬛⬛⬛⬜
⬜⬜⬜⬜⬜
```
For cells that are on the boundary, we will follow the convention that the board is 
*toroidal*, meaning that it wraps around to the other side. So the neighbors of the top-leftmost
square are (again shown in black):
```
⬜⬛⬜⬜⬛
⬛⬛⬜⬜⬛
⬜⬜⬜⬜⬜
⬜⬜⬜⬜⬜
⬛⬛⬜⬜⬛
```
In matrix notation, this means that the top-left neighbor of cell $(i,j)$ is $((i-1) \bmod m, (j-1) \bmod n)$, and similarly for the other seven neighbors.
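
As a small sketch of the wrap-around convention, the function below lists the eight neighbor coordinates of a cell, relying on the fact that Python's `%` returns a non-negative result for a positive modulus (so `-1 % 5` is `4`). The name `toroidal_neighbors` is hypothetical; it is one possible way to enumerate neighbors, not a required part of your class.

```python
def toroidal_neighbors(i, j, m, n):
    """Return the 8 neighbors of cell (i, j) on an m-by-n toroidal board."""
    return [
        ((i + di) % m, (j + dj) % n)
        for di in (-1, 0, 1)
        for dj in (-1, 0, 1)
        if (di, dj) != (0, 0)  # skip the cell itself
    ]


# Neighbors of the top-left cell of a 5x5 board, matching the picture above.
corner = toroidal_neighbors(0, 0, 5, 5)
```

Note that on very small boards (fewer than three rows or columns) the same coordinate can appear more than once in this list.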

Write a class `GameOfLife` that plays this game:

In [None]:
import numpy as np


class GameOfLife:
    ...

Instances of the class should behave as follows:

**2(a)** (2 pts) The constructor of `GameOfLife` should accept a single argument, which is a two-dimensional NumPy integer array, and perform validation. A starting board is valid if it contains only zeros and ones. (You may assume that the input is a 2-d NumPy integer array, but your constructor should check the part about zeros and ones.)

In [None]:
grader.check("2a")

**2(b)** (3 pts) Instances of `GameOfLife` should return a string representation of the current state of the game:
```
>>> I = np.eye(5, dtype=int)  # 5x5 identity matrix
>>> g = GameOfLife(I)
>>> print(g)
⬛⬜⬜⬜⬜
⬜⬛⬜⬜⬜
⬜⬜⬛⬜⬜
⬜⬜⬜⬛⬜
⬜⬜⬜⬜⬛
```
In the string representation, use the Unicode characters "⬛" to denote a live cell and "⬜" to denote a dead cell. In Python, these can be inputted as:

In [None]:
"\u2b1b", "\u2b1c"  # live cell (black square), dead cell (white square)

In [None]:
grader.check("2b")

**2(c)** (10 pts) `GameOfLife` instances should be iterable. Calling `next` on the instance should return the next state of the game, represented as a Numpy array, which is determined by applying the rules stated above. If the game terminates, meaning that the board remains the same from one turn to the next, then the iterator terminates (by raising a `StopIteration`).

Here is an example of the game played using a $5 \times 5$ grid and a pattern that oscillates back and forth:

```
>>> blinker = np.zeros((5, 5), dtype=int) 
>>> blinker[1:4, 2] = 1
>>> g = GameOfLife(blinker)
>>> print(g)
⬜⬜⬜⬜⬜
⬜⬜⬛⬜⬜
⬜⬜⬛⬜⬜
⬜⬜⬛⬜⬜
⬜⬜⬜⬜⬜
>>> next(g)
[[0 0 0 0 0]
 [0 0 0 0 0]
 [0 1 1 1 0]
 [0 0 0 0 0]
 [0 0 0 0 0]]
>>> print(g)
⬜⬜⬜⬜⬜
⬜⬜⬜⬜⬜
⬜⬛⬛⬛⬜
⬜⬜⬜⬜⬜
⬜⬜⬜⬜⬜
>>> next(g)
[[0 0 0 0 0]
 [0 0 1 0 0]
 [0 0 1 0 0]
 [0 0 1 0 0]
 [0 0 0 0 0]]
>>> print(g)
⬜⬜⬜⬜⬜
⬜⬜⬛⬜⬜
⬜⬜⬛⬜⬜
⬜⬜⬛⬜⬜
⬜⬜⬜⬜⬜
```
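
For reference, the iterator behavior described above (return the next state on each call to `next`, and raise `StopIteration` once a step changes nothing) can be sketched on a much simpler object than a game board. The `UntilStable` class below is a hypothetical toy that works on plain integers; it only illustrates the `__iter__`/`__next__` protocol.

```python
class UntilStable:
    """Iterate x -> f(x) and stop once a step leaves the value unchanged."""

    def __init__(self, f, start):
        self.f = f
        self.state = start

    def __iter__(self):
        # An iterator returns itself from __iter__.
        return self

    def __next__(self):
        new_state = self.f(self.state)
        if new_state == self.state:
            # Nothing changed, so the iteration is over.
            raise StopIteration
        self.state = new_state
        return new_state


# 20 -> 10 -> 5 -> 2 -> 1 -> 0, then 0 // 2 == 0 ends the iteration.
halving = list(UntilStable(lambda x: x // 2, 20))
```

With NumPy arrays, `==` compares elementwise, so a whole-array comparison such as `np.array_equal` is needed in place of the plain `==` used here.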

*Hint*: the main challenge in this exercise is dealing with the toroidal board. To avoid having to consider special cases, use the fact that the neighbors of cell $(i,j)$ are the cells $((i + a) \bmod m, (j + b) \bmod n)$ for $a, b \in \{-1, 0, 1\}$ with $(a, b) \neq (0, 0)$, where $x \bmod y$ denotes the modulus (i.e., `x % y` in Python).

In [None]:
grader.check("2c")

**2(d)** (just for fun) The following code will print out a Game of Life as it runs:

In [None]:
import ipywidgets as widgets
import time
import itertools

out = widgets.Output()


def play_gol(game, stop=None, wait=0.5):
    """Play the game for `stop` steps, printing each state and pausing `wait` seconds between frames."""
    for step in itertools.islice(game, stop):
        with out:
            print(game)
            out.clear_output(wait=True)
        time.sleep(wait)


out

Example:

In [None]:
glider = np.zeros((10, 10), dtype=int)
glider[5, 3:6] = 1
glider[4, 5] = 1
glider[3, 4] = 1
g = GameOfLife(glider)
# uncomment next line to view game
# play_gol(g, 10, 0.1)  

Try running it with your own invented initial state. Can you make anything interesting happen? Some people have developed extremely elaborate games, for example:

In [None]:
tr = str.maketrans(".X", "01")
gosper = np.array(
    [list(map(int, row.strip().translate(tr))) for row in open("gosper-glider-gun.txt")]
)
# uncomment next line to view game
# play_gol(GameOfLife(gosper), 100, wait=0.01)

---

To double-check your work, the cell below will rerun all of the autograder tests.

In [None]:
grader.check_all()

## Submission

Make sure you have run all cells in your notebook in order before running the cell below, so that all images/graphs appear in the output. The cell below will generate a zip file for you to submit. **Please save before exporting!**

Upload this .zip file to Gradescope for grading.

In [None]:
# Save your notebook first, then run this cell to export your submission.
grader.export(pdf=False)