# Project: Conway's Game of Life - CPU vs GPU Implementation 

In this project, we are going to implement **Conway's Game of Life**, a classic cellular automaton, in two ways: first using NumPy (to run on the CPU) and then using CuPy (to run on the GPU). We'll also visualise the evolution of the Game of Life grid to see the computation in action.

## What is Conway's Game of Life?
It's a zero-player game devised by John Conway, played on a grid of cells that live or die according to a few simple rules:
- Each cell can be "alive" (1) or "dead" (0).
- At each time step (generation), the following rules apply to every cell simultaneously:
  - Any live cell with fewer than 2 live neighbours dies (underpopulation).
  - Any live cell with 2 or 3 live neighbours lives on to the next generation (survival).
  - Any live cell with more than 3 live neighbours dies (overpopulation).
  - Any dead cell with exactly 3 live neighbours becomes a live cell (reproduction).
- Neighbours are the 8 cells touching a given cell horizontally, vertically, or diagonally.

From these simple rules emerges a lot of interesting behaviour: stable patterns, oscillators, spaceships (patterns that move), etc. It's a good example of a grid-based simulation that can benefit from parallel computation, because the state of each cell in the next generation can be computed independently (based only on the current generation).
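
The four rules collapse into a small decision function. The sketch below is purely illustrative: `next_state` is a hypothetical helper name and is not part of the implementations that follow.

```python
def next_state(alive: bool, live_neighbours: int) -> bool:
    """Return whether a cell is alive in the next generation."""
    if alive:
        # Survival needs 2 or 3 live neighbours; anything else is
        # underpopulation (fewer than 2) or overpopulation (more than 3)
        return live_neighbours in (2, 3)
    # Reproduction: a dead cell needs exactly 3 live neighbours
    return live_neighbours == 3


# Tabulate the outcome for every possible neighbour count (0 to 8)
for n in range(9):
    print(n, next_state(True, n), next_state(False, n))
```

Because the result depends only on the cell's current state and its neighbour count, the same decision can be evaluated for every cell at once, which is what the array-based versions below do.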

## Visualisation of Game of Life
To make this project more visually engaging, below is an **animated GIF** showing an example of a Game of Life simulation starting from a random initial configuration. White pixels represent live cells, and black pixels represent dead cells. You can see patterns forming, moving, and changing over time.

An example evolution of Conway's Game of Life over a few generations (white = alive, black = dead).

The animation demonstrates how random initial clusters of cells can evolve into interesting patterns. Notice that some cells blink on and off or form moving patterns.

Now, let's implement the simulation ourselves.

## Implementation using NumPy (CPU) 
We will use a 2D grid (NumPy array) to represent the Game of Life board. A straightforward way to update the grid is:
- For each cell, count the number of living neighbours around it.
- Apply the Game of Life rules to determine whether the cell will be alive or dead in the next generation.

A naive implementation might use nested loops to check neighbours, but that would be slow in Python. Instead, we can use NumPy's array operations to do this more efficiently by operating on whole arrays at once. 

### NumPy Code for Game of Life


```python
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.animation as animation
from IPython.display import Image, display

def life_step(grid):
    neighbours = (
        np.roll(np.roll(grid, 1, axis=0), 1, axis=1) +
        np.roll(np.roll(grid, 1, axis=0), -1, axis=1) +
        np.roll(np.roll(grid, -1, axis=0), 1, axis=1) +
        np.roll(np.roll(grid, -1, axis=0), -1, axis=1) +
        np.roll(grid, 1, axis=0) +
        np.roll(grid, -1, axis=0) +
        np.roll(grid, 1, axis=1) +
        np.roll(grid, -1, axis=1)
    )
    # Alive next if exactly 3 neighbours, or stays alive with 2
    return np.where((neighbours == 3) | ((grid == 1) & (neighbours == 2)), 1, 0)

# --- Parameters ---
N = 100         # grid size
frames = 50     # number of frames in GIF
interval = 200  # ms between frames

# Initialize random grid
grid = np.random.choice([0, 1], size=(N, N), p=[0.8, 0.2])

# Set up plot
fig, ax = plt.subplots(figsize=(6, 6))
im = ax.imshow(grid, cmap='gray')  # 'gray' draws 1 (alive) as white, 0 (dead) as black
ax.set_axis_off()

def update(frame):
    global grid
    grid = life_step(grid)
    im.set_data(grid)
    return [im]

# Build the animation
anim = animation.FuncAnimation(
    fig, update,
    frames=frames,
    interval=interval,
    blit=True
)

# Save to GIF (requires pillow)
anim.save('game_of_life_cpu.gif', writer='pillow', dpi=80)

# Display in notebook
display(Image(filename='game_of_life_cpu.gif'))

```

We use `np.roll` to shift the grid up, down, left, right, and along the diagonals so that each cell's neighbours line up with it. For instance, `np.roll(grid, 1, axis=0)` shifts every row down by one (element [i,j] moves to [i+1,j], so position [i,j] now holds the value of the neighbour above), and `np.roll(grid, 1, axis=1)` shifts every column right by one (so position [i,j] now holds the value of the neighbour to the left). By combining these shifts, we cover all eight neighbours.
The sum of those rolled arrays in `neighbours` gives, for each cell, the number of alive neighbours around it.
Then, we use `np.where` to apply the rules in a vectorised way across the whole array:
- `(neighbours == 3)` covers the reproduction rule (a dead cell with three neighbours becomes alive) and also keeps a live cell with three neighbours alive.
- `(grid == 1) & (neighbours == 2)` corresponds to a live cell with 2 neighbours staying alive.
- The logical OR of those two masks marks the cells that should be alive. We set those to 1 and all others to 0.

This approach avoids explicit Python loops and is quite efficient on the CPU for moderately large grids (NumPy uses C under the hood).
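
The direction of each shift is easy to check on a tiny array. The following is a minimal sketch using a 3x3 array of distinct values, chosen only for illustration, so that every element can be tracked.

```python
import numpy as np

a = np.arange(9).reshape(3, 3)
# a is:
# [[0 1 2]
#  [3 4 5]
#  [6 7 8]]

# Shift by +1 along axis 0: rows move towards higher indices,
# and the last row wraps around to the top
down = np.roll(a, 1, axis=0)

# Shift by +1 along axis 1: columns move towards higher indices,
# and the last column wraps around to the left
right = np.roll(a, 1, axis=1)

# After the shift, position [i, j] holds the value that used to sit at
# [i-1, j] (the neighbour above) or [i, j-1] (the neighbour to the left)
print(down)
print(right)
```

Summing the eight shifted copies therefore places, at every position, the total of that cell's eight neighbours.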

**Boundary conditions**: The above code uses `np.roll`, which means the edges wrap around (the grid is conceptually on a torus). This is a common approach for Game of Life. If you wanted edges to be always dead or non-wrapping, you’d have to handle boundaries separately (e.g., pad the array with zeros, count, then remove padding).
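
The padding idea could look roughly like this. It is a sketch under the assumption that everything outside the grid counts as dead; `life_step_dead_edges` is a hypothetical name, and slicing the padded array stands in for `np.roll`.

```python
import numpy as np

def life_step_dead_edges(grid):
    """One generation where cells beyond the border count as dead (no wrap-around)."""
    rows, cols = grid.shape
    # Surround the grid with a one-cell border of zeros
    padded = np.pad(grid, 1, mode="constant", constant_values=0)
    # Sum the eight shifted windows of the padded array; each window has
    # the original shape, so the padding is dropped automatically
    neighbours = sum(
        padded[1 + di : 1 + di + rows, 1 + dj : 1 + dj + cols]
        for di in (-1, 0, 1)
        for dj in (-1, 0, 1)
        if (di, dj) != (0, 0)
    )
    return np.where((neighbours == 3) | ((grid == 1) & (neighbours == 2)), 1, 0)

# Three live cells in a row along the top edge
grid = np.zeros((4, 5), dtype=int)
grid[0, 1:4] = 1
result = life_step_dead_edges(grid)
print(result)
```

With wrap-around, the cell at `[3, 2]` on the bottom row would see the three live cells on the top row as neighbours and come alive; with dead edges it stays dead.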

## Implementation using CuPy (GPU)
The power of **CuPy** is that we can write code very similar to NumPy, but the arrays live on the GPU and the computations happen on the GPU. In many cases, you can take NumPy code and switch to CuPy by just changing the import.

```python
import cupy as cp
import matplotlib.pyplot as plt
import matplotlib.animation as animation
from IPython.display import Image, display

def life_step(grid):
    # count neighbours on GPU
    neighbours = (
        cp.roll(cp.roll(grid, 1, axis=0), 1, axis=1) +   # up-left
        cp.roll(cp.roll(grid, 1, axis=0), -1, axis=1) +  # up-right
        cp.roll(cp.roll(grid, -1, axis=0), 1, axis=1) +  # down-left
        cp.roll(cp.roll(grid, -1, axis=0), -1, axis=1) + # down-right
        cp.roll(grid, 1, axis=0) +    # up
        cp.roll(grid, -1, axis=0) +   # down
        cp.roll(grid, 1, axis=1) +    # left
        cp.roll(grid, -1, axis=1)     # right
    )
    # apply rules on GPU
    return cp.where((neighbours == 3) | ((grid == 1) & (neighbours == 2)),
                    1, 0)

# Parameters
N = 100         # grid size
frames = 50     # number of frames
interval = 200  # ms between frames

# Initialize a random grid on GPU (20% alive)
grid = (cp.random.random((N, N)) < 0.2).astype(cp.int32)

# Set up Matplotlib figure (needs initial CPU array)
fig, ax = plt.subplots(figsize=(6, 6))
im = ax.imshow(cp.asnumpy(grid), cmap='gray')  # 'gray' draws 1 (alive) as white, 0 (dead) as black
ax.set_axis_off()

def update(frame):
    global grid
    # step on GPU
    grid = life_step(grid)
    # copy back to CPU for display
    im.set_data(cp.asnumpy(grid))
    return [im]

# Create GPU-backed animation
anim = animation.FuncAnimation(
    fig, update,
    frames=frames,
    interval=interval,
    blit=True
)

# Save and display
anim.save('game_of_life_gpu.gif', writer='pillow', dpi=80)
display(Image(filename='game_of_life_gpu.gif'))
```

You’ll notice this code is almost identical to the NumPy version. The key differences:
- We import `cupy as cp` instead of `numpy as np`.
- We create the initial grid directly on the GPU with `cp.random.random((N, N)) < 0.2`, cast to `int32`, instead of `np.random.choice`. Each cell is still alive with probability 0.2, as in the NumPy version.
- After running on GPU, if we want to inspect or visualize the final array in Python, we use `cp.asnumpy` to transfer the data back to the CPU as a NumPy array.
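
Because the two versions differ only in the module prefix, one option is to pass the array module in as a parameter. The sketch below assumes a hypothetical helper `life_step_xp` with an `xp` argument. It is shown running with NumPy, and the CuPy call is left as a comment because it needs a CUDA-capable GPU.

```python
import numpy as np

def life_step_xp(grid, xp=np):
    """One Game of Life generation using the array module `xp` (NumPy or CuPy)."""
    # Sum the eight shifted copies of the grid (the (0, 0) shift is the cell itself)
    neighbours = sum(
        xp.roll(xp.roll(grid, di, axis=0), dj, axis=1)
        for di in (-1, 0, 1)
        for dj in (-1, 0, 1)
        if (di, dj) != (0, 0)
    )
    return xp.where((neighbours == 3) | ((grid == 1) & (neighbours == 2)), 1, 0)

# CPU usage with NumPy: a vertical blinker turns into a horizontal one
grid = np.zeros((5, 5), dtype=int)
grid[1:4, 2] = 1
result = life_step_xp(grid, xp=np)
print(result)

# GPU usage (assumes CuPy and a CUDA device are available):
#   import cupy as cp
#   gpu_result = life_step_xp(cp.asarray(grid), xp=cp)
```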

This GPU version will likely run faster than the CPU version for sufficiently large grids (especially if N is large, like 1000 or more). However, keep in mind a few things:
- For very small grids (say 10x10), the GPU overhead might not be worth it.
- We are using `cp.roll`, which, under the hood, launches GPU kernels. There might be more optimized ways (like a single custom kernel to count neighbours) that could be even faster, but our goal here was to keep it simple and similar to the NumPy approach.
- We haven’t explicitly measured time here, but you could use Python’s `time` module or `%timeit` in a Jupyter notebook to compare the CPU and GPU versions. Just remember to synchronise GPU calls when timing: CuPy operations are asynchronous by default, so call `cp.cuda.Stream.null.synchronize()` before stopping the timer, or use CuPy’s built-in benchmarking utilities (such as `cupyx.profiler.benchmark`).
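
A timing harness along these lines might look as follows. This is a sketch, not a rigorous benchmark: `time_steps` and `life_step_cpu` are hypothetical names, the grid size and step count are arbitrary, and the GPU call is left as a comment because it requires CuPy and a CUDA device.

```python
import time
import numpy as np

def life_step_cpu(grid):
    """Vectorised NumPy step (same logic as life_step above)."""
    neighbours = sum(
        np.roll(np.roll(grid, di, axis=0), dj, axis=1)
        for di in (-1, 0, 1)
        for dj in (-1, 0, 1)
        if (di, dj) != (0, 0)
    )
    return np.where((neighbours == 3) | ((grid == 1) & (neighbours == 2)), 1, 0)

def time_steps(step_fn, grid, n_steps=20, sync=None):
    """Average seconds per generation; `sync` is called before reading the clock."""
    step_fn(grid)  # warm-up run (on a GPU this also absorbs kernel compilation)
    if sync is not None:
        sync()
    start = time.perf_counter()
    for _ in range(n_steps):
        grid = step_fn(grid)
    if sync is not None:
        sync()  # wait for queued GPU work to finish before stopping the timer
    return (time.perf_counter() - start) / n_steps

rng = np.random.default_rng(0)
cpu_grid = (rng.random((500, 500)) < 0.2).astype(np.int32)
cpu_time = time_steps(life_step_cpu, cpu_grid)
print(f"NumPy: {cpu_time * 1e3:.2f} ms per generation")

# GPU timing (assumes CuPy, a CUDA device, and the CuPy life_step defined above):
#   gpu_grid = cp.asarray(cpu_grid)
#   gpu_time = time_steps(life_step, gpu_grid, sync=cp.cuda.Stream.null.synchronize)
```

Without the `sync` call, the GPU loop would mostly measure how long it takes to queue the kernels rather than how long they take to run.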

As you can see, before visualising the GPU simulation we need to transfer the data to the CPU (`cp.asnumpy`) and then use `matplotlib` as usual. It is possible to display GPU data directly, for example through CUDA's interoperability with graphics APIs such as OpenGL or Direct3D, but that is outside the scope of this course.

## Inefficient NumPy Implementation
```python
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.animation as animation
from IPython.display import Image, display

def life_step_naive(grid):
    N, M = grid.shape
    new_grid = np.zeros((N, M), dtype=int)
    # For each cell, count neighbours by explicit loops
    for i in range(N):
        for j in range(M):
            count = 0
            # Examine the 8 neighbours manually
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    if di == 0 and dj == 0:
                        continue
                    ni, nj = i + di, j + dj
                    # wrap around edges
                    if ni < 0:
                        ni = N - 1
                    elif ni >= N:
                        ni = 0
                    if nj < 0:
                        nj = M - 1
                    elif nj >= M:
                        nj = 0
                    count += grid[ni, nj]
            # apply the Game of Life rules one cell at a time
            if grid[i, j] == 1:
                # stays alive only if 2 or 3 neighbours
                new_grid[i, j] = 1 if (count == 2 or count == 3) else 0
            else:
                # dead cell becomes alive only if exactly 3 neighbours
                new_grid[i, j] = 1 if (count == 3) else 0
    return new_grid

# --- Parameters (same as before) ---
N = 100
frames = 50
interval = 200

# Initialize random grid
grid = np.random.choice([0, 1], size=(N, N), p=[0.8, 0.2])

# Set up plot
fig, ax = plt.subplots(figsize=(6, 6))
im = ax.imshow(grid, cmap='gray')  # 'gray' draws 1 (alive) as white, 0 (dead) as black
ax.set_axis_off()

def update(frame):
    global grid
    grid = life_step_naive(grid)
    im.set_data(grid)
    return [im]

anim = animation.FuncAnimation(
    fig, update,
    frames=frames,
    interval=interval,
    blit=True
)

anim.save('game_of_life_naive.gif', writer='pillow', dpi=80)
display(Image(filename='game_of_life_naive.gif'))
```

The loop-based version above is very inefficient compared to the vectorised NumPy version introduced earlier. The main changes responsible for this are:
- **Nested Python loops**: Instead of eight `np.roll` calls and one `np.where`, we run two nested loops over `i, j` (10^4 iterations for `N = 100`) and, inside them, two more loops over `di, dj` (9 checks each), for roughly 9x10^4 Python-level operations per step.
- **Manual edge-wrapping logic**: Branching (`if ni < 0 … elif …`) for each neighbour check, instead of the single fast shift that `np.roll` does in C.
- **Per-cell rule application**: The Game of Life rule is applied with a Python `if/else` for each cell instead of a single vectorised Boolean mask.
- **Rebuilding a new NumPy array element by element**: Writing into `new_grid[i, j]` from Python is orders of magnitude slower than a one-shot `np.where`.

Together, these overheads make this version run considerably slower, particularly as `N` increases, and it does not leverage any low-level C loops or GPU acceleration.
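
One way to confirm that the two CPU approaches compute the same thing, and to see the gap in speed, is to run both on the same random grid. The sketch below uses hypothetical names (`step_vectorised`, `step_loops`) and a compact loop version in which the modulo operator replaces the explicit wrap-around branches. The timings it prints depend on the machine.

```python
import time
import numpy as np

def step_vectorised(grid):
    """Vectorised step: eight rolled copies summed, then one Boolean mask."""
    neighbours = sum(
        np.roll(np.roll(grid, di, axis=0), dj, axis=1)
        for di in (-1, 0, 1)
        for dj in (-1, 0, 1)
        if (di, dj) != (0, 0)
    )
    return np.where((neighbours == 3) | ((grid == 1) & (neighbours == 2)), 1, 0)

def step_loops(grid):
    """Loop-based step; modulo indexing wraps around the edges."""
    rows, cols = grid.shape
    new_grid = np.zeros_like(grid)
    for i in range(rows):
        for j in range(cols):
            count = 0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    if (di, dj) != (0, 0):
                        count += grid[(i + di) % rows, (j + dj) % cols]
            if count == 3 or (grid[i, j] == 1 and count == 2):
                new_grid[i, j] = 1
    return new_grid

rng = np.random.default_rng(42)
grid = (rng.random((50, 50)) < 0.2).astype(int)

# Both versions should produce exactly the same next generation
same = np.array_equal(step_vectorised(grid), step_loops(grid))
print("Implementations agree:", same)

# Rough single-run timings (machine-dependent)
for name, fn in [("vectorised", step_vectorised), ("loops", step_loops)]:
    start = time.perf_counter()
    fn(grid)
    print(f"{name}: {(time.perf_counter() - start) * 1e3:.2f} ms for one generation")
```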
