# Profiling and discussion

## Lorenz 96

### Version Histories and Learnings

#### First Version: Understanding Data Types and Profiling
- **Initial Approach**: Started with a basic implementation, focusing on getting the logic correct.
- **Key Learning**: Realized the importance of specifying data types for optimization in NumPy.
- **Performance**: Profiling with `timeit` showed a runtime of 1.03 milliseconds; the array size used for this measurement was not recorded.
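
As a small illustration of the data type point (a sketch with made-up values, not the original implementation): NumPy infers an integer dtype from a list of Python ints, and assigning a fractional value into such an array silently truncates it. Passing `dtype=float` at creation avoids this.

```python
import numpy as np

values = [8, 8, 9, 8]  # hypothetical initial state given as Python ints

as_default = np.array(values)             # integer dtype inferred from the ints
as_float = np.array(values, dtype=float)  # float64 requested explicitly

# Writing a fractional result into the integer array truncates it silently.
as_default[0] = 7.92
as_float[0] = 7.92

print(as_default.dtype.kind, as_default[0])  # i 7
print(as_float.dtype, as_float[0])           # float64 7.92
```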

#### Second Version: Embracing Vectorization
- **Optimization Attempt**: Replaced nested loops with NumPy's vectorization capabilities.
- **Trade-off**: Noticed slower performance for smaller step sizes but gains in larger simulations.
- **Performance**: Runtime improved to 228 microseconds for larger steps.
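
The step from loops to vectorization can be sketched as follows. This assumes the update rule that appears in the `lorenz96_old` listing later in this document, x_i ← α(β·x_i + (x_{i-2} − x_{i+1})·x_{i-1} + γ) with periodic indices; the names `step_loop` and `step_roll` are hypothetical and are not the functions that were actually profiled.

```python
import numpy as np

ALPHA, BETA, GAMMA = 1 / 101, 100, 8  # default constants from lorenz96_old


def step_loop(state):
    """One update using an explicit Python loop (illustrative baseline)."""
    n = len(state)
    new_state = np.empty_like(state)
    for i in range(n):
        # Negative indices wrap automatically; (i + 1) % n wraps the right edge.
        new_state[i] = ALPHA * (
            BETA * state[i]
            + (state[i - 2] - state[(i + 1) % n]) * state[i - 1]
            + GAMMA
        )
    return new_state


def step_roll(state):
    """The same update expressed with whole-array operations."""
    # np.roll(state, k)[i] == state[i - k], so shifts of 2, 1 and -1
    # give x[i-2], x[i-1] and x[i+1] respectively.
    return ALPHA * (
        BETA * state
        + (np.roll(state, 2) - np.roll(state, -1)) * np.roll(state, 1)
        + GAMMA
    )


state = np.full(10, 8.0)
state[2] = 9.0
print(np.allclose(step_loop(state), step_roll(state)))  # True
```

Each `np.roll` call has a fixed cost and returns a new array, so for short arrays this overhead can outweigh the loop iterations it saves, which is consistent with the trade-off noted above.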

#### Third Version: Memory Efficiency
- **Consultation Insights**: ChatGPT suggested improvements in memory management.
- **Change Implemented**: Avoided unnecessary array creation for each iteration.
- **Result**: The performance did not improve as expected, indicating a need for further refinement.
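
A minimal sketch of what avoiding per-iteration array creation can look like. It uses a toy neighbour-averaging stencil as a stand-in (not the Lorenz 96 update), and the function names are hypothetical. Two buffers are allocated once, results are written into the scratch buffer through `out=` and in-place operators, and the two references are swapped at the end of each step.

```python
import numpy as np


def iterate_allocating(state, nsteps):
    """Allocates fresh arrays on every step."""
    state = np.array(state, dtype=float)
    for _ in range(nsteps):
        # Two rolled copies plus two temporaries are created per step.
        state = 0.5 * (np.roll(state, 1) + np.roll(state, -1))
    return state


def iterate_preallocated(state, nsteps):
    """Reuses two buffers and swaps them instead of allocating."""
    current = np.array(state, dtype=float)
    scratch = np.empty_like(current)
    for _ in range(nsteps):
        # Interior points: write the neighbour sum straight into scratch.
        np.add(current[:-2], current[2:], out=scratch[1:-1])
        # Periodic end points.
        scratch[0] = current[-1] + current[1]
        scratch[-1] = current[-2] + current[0]
        scratch *= 0.5  # in-place scaling, no temporary array
        current, scratch = scratch, current  # swap references, no copy
    return current


start = np.insert(np.full(9, 8.0), 2, 9.0)
print(np.allclose(iterate_allocating(start, 5), iterate_preallocated(start, 5)))  # True
```

For arrays of a few dozen elements, allocation is cheap compared with the Python-level overhead of each NumPy call, so a change like this may not show up in `timeit`. That is one possible explanation for the result above, not something measured here.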

#### Fourth Version: Strategic Vectorization
- **Algorithmic Adjustment**: Divided the computation into sections to optimize the vectorization process.
- **Outcome**: Achieved significant speed improvements without compromising the model's integrity.
- **Performance**: Solidified the gains, maintaining a runtime of 228 microseconds.

### Post-Assignment Enhancements

#### Continued Learning and Application
- **Advanced Techniques**: After the assignment, I delved into more advanced NumPy features to improve performance further.
- **Code Quality**: Refactored the code for better readability and maintainability, aligning with Python's best practices.

#### Algorithmic Refinements
- **Boundary Conditions**: Investigated more efficient methods for handling periodic boundary conditions.
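
For reference, here are three common NumPy ways of fetching the left neighbour of every element under periodic boundary conditions. This is a sketch on a made-up array; which option is fastest depends on array size and was not measured here.

```python
import numpy as np

state = np.arange(6, dtype=float)  # hypothetical state: [0, 1, 2, 3, 4, 5]
n = len(state)

# 1. np.roll: shifts the whole array and wraps around the ends.
left_roll = np.roll(state, 1)

# 2. Modular index array: index arithmetic with % n.
idx = np.arange(n)
left_mod = state[(idx - 1) % n]

# 3. Wrapped ghost cells, then plain slicing.
padded = np.pad(state, 1, mode="wrap")  # [5, 0, 1, 2, 3, 4, 5, 0]
left_pad = padded[:-2]

print(left_roll)  # [5. 0. 1. 2. 3. 4.]
```

The sectioned approach in `lorenz96_old` is a fourth option: slice the interior directly and treat the few wrapped elements by hand.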

#### Emphasizing Code Quality
- **Refactoring**: Modularized the code for better clarity and reuse.
- **Commenting**: Enhanced comments and documentation for future users and contributors.

During the assignment I did not record which array sizes I tested. Below I compare the function as it stood at the time of the assignment with the improved post-assignment version, first on a small array and then on a larger one.

Old version of function:

In [1]:
import numpy as np


def lorenz96_old(initial_state, nsteps, constants=(1 / 101, 100, 8)):
    """
    Perform iterations of the Lorenz 96 update.

    Parameters
    ----------
    initial_state : array_like or list
        Initial state of lattice in an array of floats.
    nsteps : int
        Number of steps of Lorenz 96 to perform.
    constants : tuple of float, optional
        The coefficients (alpha, beta, gamma) of the update rule.

    Returns
    -------
    numpy.ndarray
        Final state of lattice in an array of floats
    """

    alpha, beta, gamma = constants
    state = np.array(initial_state, dtype=float)
    N = len(state)
    new_state = np.empty_like(state)  # Create a new state array

    for _ in range(nsteps):
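        # Compute the first element (indices wrap around periodically)
        new_state[0] = alpha * (
            (beta * state[0]) + (state[N - 2] - state[1]) * state[N - 1] + gamma
        )

        # Compute the second element: x[i-2] wraps to state[N - 1], x[i-1] is state[0]
        new_state[1] = alpha * (
            beta * state[1] + (state[N - 1] - state[2]) * state[0] + gamma
        )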
        
        # Compute the elements between 2 and N-2
        new_state[2:N - 1] = alpha * (
            beta * state[2:N - 1] +
            (state[0:N - 3] - state[3:N]) * state[1:N - 2] +
            gamma
        )

        # Compute the last element
        new_state[N - 1] = alpha * (
            beta * state[N - 1] + (state[N - 3] - state[0]) * state[N - 2] + gamma
        )

        # Update the state array
        state[:] = new_state


    return state

In [2]:
initial_state = np.full(49, 8.0) # Create an array of 49 8.0s
initial_state = np.insert(initial_state, 2, 9.0) # Insert a 9.0 at index 2
nsteps = 50 # Number of steps to perform

In [3]:
from automata import lorenz96
%timeit lorenz96(initial_state, nsteps)

710 µs ± 9.85 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)


In [4]:
%timeit lorenz96_old(initial_state, nsteps)

234 µs ± 3.51 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)


In [11]:
initial_state = np.full(9, 8.0) # Create an array of 9 8.0s
initial_state = np.insert(initial_state, 2, 9.0) # Insert a 9.0 at index 2
nsteps = 1 # Number of steps to perform

In [12]:
print(lorenz96(initial_state, nsteps))
print()
print(lorenz96_old(initial_state, nsteps))

[7.92079208 8.         8.99009901 8.07920792 8.         8.
 8.         8.         8.         8.        ]

[8.         7.92079208 8.99009901 8.         8.07920792 8.
 8.         8.         8.         8.        ]
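
The two printed arrays are not identical: the perturbed values sit at indices 0 and 3 in the `lorenz96` output and at indices 1 and 4 in the `lorenz96_old` output. The timings therefore compare two functions that do not return the same result. A quick programmatic check, with the arrays copied from the printed output above:

```python
import numpy as np

# Values copied from the two print() calls above.
new_out = np.array([7.92079208, 8.0, 8.99009901, 8.07920792, 8.0,
                    8.0, 8.0, 8.0, 8.0, 8.0])
old_out = np.array([8.0, 7.92079208, 8.99009901, 8.0, 8.07920792,
                    8.0, 8.0, 8.0, 8.0, 8.0])

print(np.allclose(new_out, old_out))  # False

# Indices at which the two versions disagree.
mismatch = np.flatnonzero(~np.isclose(new_out, old_out))
print(mismatch)  # [0 1 3 4]
```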


In [15]:
import cProfile


pr = cProfile.Profile()
pr.enable()
lorenz96(initial_state, nsteps)
pr.disable()

pr.print_stats(sort='time')

         3290 function calls (2690 primitive calls) in 0.003 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
  300/150    0.001    0.000    0.002    0.000 numeric.py:1147(roll)
      101    0.000    0.000    0.000    0.000 {built-in method numpy.array}
        1    0.000    0.000    0.003    0.003 automata.py:8(lorenz96)
  450/150    0.000    0.000    0.002    0.000 {built-in method numpy.core._multiarray_umath.implement_array_function}
  300/150    0.000    0.000    0.002    0.000 <__array_function__ internals>:177(roll)
      150    0.000    0.000    0.000    0.000 numeric.py:1348(normalize_axis_tuple)
      150    0.000    0.000    0.000    0.000 <__array_function__ internals>:177(empty_like)
        2    0.000    0.000    0.000    0.000 {built-in method builtins.compile}
      150    0.000    0.000    0.000    0.000 numeric.py:1398(<listcomp>)
      150    0.000    0.000    0.000    0.000 {method 'reshape' of 'numpy.nda
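
The listing above is cut off. One way to keep profiler output short and complete is to route it through `pstats` and print only the top few rows. The sketch below profiles a hypothetical stand-in function, `workload`, rather than `automata.lorenz96`.

```python
import cProfile
import io
import pstats

import numpy as np


def workload(nsteps=50):
    """Hypothetical stand-in for the function being profiled."""
    state = np.full(50, 8.0)
    for _ in range(nsteps):
        state = 0.5 * (np.roll(state, 1) + np.roll(state, -1))
    return state


profiler = cProfile.Profile()
profiler.enable()
workload()
profiler.disable()

# Collect the report in a string, sorted by internal time, top 5 rows only.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.strip_dirs().sort_stats("tottime").print_stats(5)
report = buffer.getvalue()
print(report)
```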

This test used a small array, so the effects of vectorisation are not apparent. Let's try a bigger array:

In [9]:
# Example of a larger initial state
initial_state = np.full(999, 8.0)  # A larger array
initial_state = np.insert(initial_state, 2, 9.0)  # Insert a 9.0 at index 2
nsteps = 50  # Same number of steps as in the first benchmark

In [10]:
%timeit lorenz96(initial_state, nsteps)

818 µs ± 3.68 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)


In [11]:
%timeit lorenz96_old(initial_state, nsteps)

314 µs ± 1.82 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
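
In both measurements the `np.roll`-based `lorenz96` is slower than `lorenz96_old`: 710 µs against 234 µs at 50 elements, and 818 µs against 314 µs at 1000 elements. To record timings together with the array sizes they belong to, the standard `timeit` module can be driven from a loop. The sketch below uses a hypothetical `candidate` function; any callable with the same signature could be substituted.

```python
import timeit

import numpy as np


def make_state(n):
    """n values of 8.0 with a 9.0 at index 2, built as in the cells above."""
    return np.insert(np.full(n - 1, 8.0), 2, 9.0)


def candidate(state, nsteps):
    """Hypothetical function under test."""
    state = np.array(state, dtype=float)
    for _ in range(nsteps):
        state = 0.5 * (np.roll(state, 1) + np.roll(state, -1))
    return state


calls = 20
results = {}
for n in (10, 50, 1000):
    state = make_state(n)
    timer = timeit.Timer(lambda: candidate(state, 50))
    # repeat() returns one total time (for `calls` calls) per repetition;
    # the minimum is the least disturbed by other activity on the machine.
    results[n] = min(timer.repeat(repeat=3, number=calls)) / calls

for n, seconds in results.items():
    print(f"N = {n:5d}: {seconds * 1e6:8.1f} us per call")
```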


## Game of Life

In [87]:
import numpy as np


# version 2
def life(initial_state, nsteps, rules="basic", periodic=False):
    """
    Perform iterations of Conway’s Game of Life.
    Parameters
    ----------
    initial_state : array_like or list of lists
        Initial 2d state of grid in an array of ints.
    nsteps : int
        Number of steps of Life to perform.
    rules : str
        Choose a set of rules from "basic", "2colour" or "3d".
    periodic : bool
        If True, use periodic boundary conditions.
    Returns
    -------
    numpy.ndarray
         Final state of grid in array of ints.
    """

    # Convert the input to an integer array
    state = np.array(initial_state, dtype=int)

    # Determine if we're working in 2D or 3D
    if rules == "3d":
        if state.ndim != 3:
            raise ValueError("Invalid grid dimension!")
        rows, cols, depth = state.shape
        for _ in range(nsteps):
            next_state = state.copy()

            for i in range(rows):
                for j in range(cols):
                    for k in range(depth):
                        total = 0  # Count the neighbors

                        for x in [-1, 0, 1]:
                            for y in [-1, 0, 1]:
                                for z in [-1, 0, 1]:
                                    if x == 0 and y == 0 and z == 0:
                                        continue  # Skip the current cell
                                    ni, nj, nk = i + x, j + y, k + z
                                    if periodic:  # Handle periodic boundary conditions
                                        ni %= rows
                                        nj %= cols
                                        nk %= depth
                                    elif (
                                        ni < 0
                                        or ni >= rows
                                        or nj < 0
                                        or nj >= cols
                                        or nk < 0
                                        or nk >= depth
                                    ):
                                        continue  # Skip out of bounds

                                    total += state[ni, nj, nk]

                        # Apply the rules only after all 26 neighbours are counted
                        if state[i, j, k] != 0:  # If cell is alive
                            if total < 5 or total > 6:  # Die
                                next_state[i, j, k] = 0
                        else:  # Dead cell
                            if total == 4:
                                next_state[i, j, k] = 1  # Birth

            state = next_state
        return state
    else:
        if state.ndim != 2:
            raise ValueError("Invalid grid dimension!")
        rows, cols = state.shape
        for _ in range(nsteps):
            next_state = state.copy()

            for i in range(rows):
                for j in range(cols):
                    total = 0
                    blue_neighbours = 0
                    red_neighbours = 0
                    for x in [-1, 0, 1]:
                        for y in [-1, 0, 1]:
                            if x == 0 and y == 0:
                                continue
                            ni, nj = i + x, j + y
                            if periodic:
                                ni %= rows
                                nj %= cols
                            elif ni < 0 or ni >= rows or nj < 0 or nj >= cols:
                                continue

                            if state[ni][nj] > 0:
                                total += 1
                                if state[ni][nj] == 1:
                                    blue_neighbours += 1
                                else:
                                    red_neighbours += 1

                    if rules == "basic":
                        if state[i][j] == 1:  # If cell is alive
                            if total < 2 or total > 3:  # Die
                                next_state[i][j] = 0
                        else:  # Dead cell
                            if total == 3:
                                next_state[i][j] = 1

                    elif rules == "2colour":
                        if state[i][j] == 1 or state[i][j] == 2:  # If cell is alive
                            if total < 2 or total > 3:  # Die
                                next_state[i][j] = 0
                        else:  # Dead cell
                            if total == 3:
                                if blue_neighbours > red_neighbours:
                                    next_state[i][j] = 1
                                else:
                                    next_state[i][j] = 2

            state = next_state

        return state


initial_state = (
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
)

In [89]:
%timeit life(initial_state, 50, rules="basic", periodic=False)

1.85 ms ± 3.85 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


## Initial Code Issues:
The first iteration of the code I received from ChatGPT was riddled with bugs and missing logic, such as the handling of separate colours in the "2colour" extension.

## Code Skeleton Development:
I developed a skeleton to validate array inputs for specified extensions, extracting rows, columns, and optionally, depth. The primary focus was on the 2D code version.

## Unified Function Approach:
Although ChatGPT proposed separate functions, I merged all the code into a single function, which reduces function-call overhead at the cost of readability. The aim was a lower `timeit` duration; other routine refinements were applied alongside to fix and tidy the code.

## Optimization Steps:
With working code validated by tests, I pursued several optimization approaches:

- **Simplifying Conditionals:** Simplified conditionals and logical paths to reduce the work done per cell.
- **Periodic Boundary Management:** Used modular arithmetic to handle periodic boundary conditions with fewer conditionals.
- **Utilizing NumPy:** Used NumPy for array operations, which are generally more efficient than Python loops.
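
As an illustration of the NumPy route for the "basic" rules, the neighbour count can be computed for the whole grid at once by padding it and summing eight shifted views. This is a sketch under the same rules as the `life` function above; `count_neighbours` and `life_step_basic` are hypothetical names, and the "2colour" and "3d" variants are not covered.

```python
import numpy as np


def count_neighbours(state, periodic=False):
    """Number of live neighbours of every cell in a 2D grid of 0s and 1s."""
    if periodic:
        padded = np.pad(state, 1, mode="wrap")  # ghost cells copy the far edge
    else:
        padded = np.pad(state, 1, mode="constant", constant_values=0)
    rows, cols = state.shape
    total = np.zeros_like(state)
    for dx in (0, 1, 2):
        for dy in (0, 1, 2):
            if dx == 1 and dy == 1:
                continue  # skip the cell itself
            total += padded[dx:dx + rows, dy:dy + cols]
    return total


def life_step_basic(state, periodic=False):
    """One step of the basic rules without a per-cell Python loop."""
    state = np.asarray(state, dtype=int)
    total = count_neighbours(state, periodic)
    born = (state == 0) & (total == 3)
    survive = (state == 1) & ((total == 2) | (total == 3))
    return (born | survive).astype(int)


blinker = np.array([
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
])
print(life_step_basic(blinker))
```

The horizontal blinker becomes vertical after one step and returns to its starting state after two.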

## Indexing Issue:
A perplexing issue arose: switching to what should have been faster array indexing increased the runtime. The test cases still passed, but the cause remains undetermined, and ChatGPT was unable to resolve it.
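
One way to investigate is to time the indexing styles in isolation. The sketch below compares chained indexing (`a[i][j]`) and tuple indexing (`a[i, j]`) on a NumPy array with chained indexing on a plain nested list. The grid and function names are made up for the experiment. Reading single elements of a NumPy array from Python creates a NumPy scalar object on every access, which is generally slower than indexing a list, so this may be relevant; it has not been checked against the original code.

```python
import timeit

import numpy as np

grid = np.zeros((50, 50), dtype=int)  # hypothetical test grid
grid[25, 24:27] = 1
grid_list = grid.tolist()  # the same data as nested Python lists


def sum_chained(a):
    """Element-by-element sum using a[i][j]; works for arrays and lists."""
    total = 0
    for i in range(len(a)):
        for j in range(len(a[0])):
            total += a[i][j]
    return total


def sum_tuple(a):
    """Element-by-element sum using a[i, j]; NumPy arrays only."""
    rows, cols = a.shape
    total = 0
    for i in range(rows):
        for j in range(cols):
            total += a[i, j]
    return total


calls = 20
timings = {
    "ndarray a[i][j]": timeit.timeit(lambda: sum_chained(grid), number=calls),
    "ndarray a[i, j]": timeit.timeit(lambda: sum_tuple(grid), number=calls),
    "list a[i][j]": timeit.timeit(lambda: sum_chained(grid_list), number=calls),
}
for label, seconds in timings.items():
    print(f"{label:>16}: {seconds / calls * 1e6:8.1f} us per call")
```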