# Introduction to Optimisation

## 1.1 Introduction

- Having found the bottlenecks in your code, we can consider how to start improving them.
- A big part of knowing how to optimise your code lies in understanding, at a high level, how a computer executes it.
- For this reason, we need to know how some common data structures and algorithms work.
- We also need to know how Python works. If you learned to code out of necessity, just to get something working, you may not have fully learnt Pythonic programming, and this can show in your code's performance.
- In this section we will learn transferable, more theoretical knowledge that helps you make informed choices when you write code in any language, since the same data structures and algorithms are used across many languages.

## 1.2 Performance vs Maintainability 

- Premature optimisation is the root of all evil - Donald Knuth.
- Firstly, you should not obsess over optimising code the first time you write it; optimisation usually comes with iteration.
- Secondly, we need to consider both performance and maintainability, since more optimised code is often harder to understand, both for you and for other developers on your team.
- If you have not looked at the code for a long time, you might find it difficult to understand later.
- So use profiling to find the most inefficient parts of your code, and optimise those first.
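
As a minimal sketch of what "profile first" can look like, the example below uses the standard library's `cProfile` and `pstats` modules to rank functions by cumulative time. The functions `slow_part`, `fast_part` and `main` are hypothetical stand-ins for your own code, not part of any real project.

```python
import cProfile
import io
import pstats


def slow_part(n):
    # Hypothetical bottleneck: a pure-Python loop doing arithmetic
    total = 0
    for i in range(n):
        total += i * i
    return total


def fast_part(n):
    # Hypothetical cheap step
    return n + 1


def main():
    return slow_part(200_000) + fast_part(10)


# Record timing information while main() runs
profiler = cProfile.Profile()
profiler.enable()
result = main()
profiler.disable()

# Sort by cumulative time and print the five most expensive entries
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream).sort_stats("cumulative")
stats.print_stats(5)
report = stream.getvalue()
print(report)
```

In a report like this, the function that dominates the cumulative time is usually the first candidate for optimisation.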

## 1.3 Performance of Python

- Now, we are going to learn a bit more about Python.
- As you might know, Python is an interpreted language.
- There are also compiled languages, such as C.
- You may have heard that Python is slow because it is an interpreted language.
- Let's break down why that is.

Compiled languages (like C):
- Your source code is translated beforehand into machine code that the CPU can run directly. This means the program runs very fast, because there’s no middleman at runtime. But it also means the program is tied to the platform it was compiled for — if you move to another system, you may need to recompile it.

Interpreted languages (like Python):
- Here, the source code isn’t turned into machine code ahead of time. Instead, the interpreter reads and executes the code as the program runs. This makes things more flexible and portable, but slower, because the interpreter is constantly working in the background.
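
To make the interpreter's work a little more visible, here is a minimal sketch that assumes CPython, the reference implementation. CPython first compiles source code to bytecode, and the interpreter then executes those instructions one at a time. The standard library's `dis` module lists them. The function `add` is a hypothetical example, and instruction names differ between Python versions, so no expected output is shown.

```python
import dis


def add():
    a = 1
    b = 2
    c = a + b
    return c


# Collect the names of the bytecode instructions CPython executes for add()
opnames = [instruction.opname for instruction in dis.get_instructions(add)]
print(opnames)
```

Each of these instructions is dispatched by the interpreter at runtime, which is part of the overhead described above.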

- In C, integers and other basic types are just raw memory. The programmer must track how data is used, but in return, the compiler can directly optimise the code for the hardware. This gives high performance.

In [None]:
// C code
int a = 1;
int b = 2;
int c = a + b;

- In Python, everything is treated as an object. The interpreter adds extra information to keep track of types and memory at runtime. This makes programming much easier and more flexible, but the extra book-keeping costs time and memory.

In [None]:
# Python code
a = 1
b = 2
c = a + b
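
The short sketch below illustrates the "everything is an object" point: even a plain integer has a type and methods. The last line relies on `sys.getrefcount`, which is specific to CPython and is included here only as an illustration of the extra book-keeping.

```python
import sys

a = 1

# Even a plain integer is a full object with a type and methods
print(type(a))                 # <class 'int'>
print(isinstance(a, object))   # True
print(a.bit_length())          # 1

# CPython-specific: each object carries a reference count used for memory management
print(sys.getrefcount(a) > 0)  # True
```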

- We can test this hands-on.
- Let's see how much memory strings, arrays and integers take up in C.
- All measurements are in bytes.

In [None]:
// C code to test sizes of different data types
// https://www.onlinegdb.com/online_c_compiler
#include <stdio.h>
#include <string.h>

int main() {
    // Strings in C are just arrays of chars, ending with '\0'
    printf("sizeof(\"\") = %zu\n", sizeof(""));      // 1 (just '\0')
    printf("sizeof(\"a\") = %zu\n", sizeof("a"));    // 2 ('a' + '\0')
    printf("sizeof(\"ab\") = %zu\n", sizeof("ab"));  // 3 ('a','b','\0')
    
    char a = 'a'; // character literal (a single char, not a string)
    printf("sizeof(a) = %zu\n", sizeof(a)); // 1

    // Arrays: size depends on number of elements * size of element
    int arr1[1];
    int arr10[10];

    printf("sizeof(arr1) = %zu\n", sizeof(arr1));   // 4
    printf("sizeof(arr10) = %zu\n", sizeof(arr10)); // 40 (10 * 4)

    // Integers: an int is typically 4 bytes on modern platforms (the C standard does not fix the size)
    int x = 1;
    printf("sizeof(x) = %zu\n", sizeof(x)); // 4

    return 0;
}

- Now, let's see the same for Python data.
- As you can see, each value takes up more memory than its C counterpart, because Python objects also store internal information used by the interpreter.

In [None]:
import sys

# Example values in bytes; exact numbers depend on the Python version and platform
sys.getsizeof("")  # 41
sys.getsizeof("a")  # 42
sys.getsizeof("ab")  # 43

sys.getsizeof([])  # 56
sys.getsizeof(["a"])  # 64

sys.getsizeof(1)  # 28
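
As a rough sketch of how this per-object overhead adds up, the example below compares a list of integers with the standard library's `array` module, which stores raw C values side by side. The `array` module is not mentioned above and is used here only as an illustration. The list total is an estimate, and exact numbers depend on the Python version and platform.

```python
import sys
from array import array

n = 1000
values = list(range(n))

# A list stores pointers; each int is a separate object with its own overhead
list_total = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)

# array("i") stores raw C ints next to each other, much like a C array
packed = array("i", values)
packed_total = sys.getsizeof(packed)

print(list_total, packed_total)
```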

- We gain simpler code, but we lose some performance.
- Most of the time, Python is “fast enough” for everyday tasks. But there are some cases where performance really matters — like heavy math, simulations, or processing large datasets.
- To deal with this, Python can “borrow speed” from lower-level languages such as C, Fortran, or Rust. Many libraries take advantage of this trick: they do the heavy lifting in fast, low-level code, and then return the result to you as a normal Python object.
- Example: we’ll see this with NumPy later, but even parts of the Python standard library work this way.

Instead of writing step-by-step low-level instructions yourself, it’s usually best to:

- 1) Tell the interpreter or library what you want done (at a high level).

- 2) Let the library figure out how to do it under the hood.

This has two big benefits:

- 1) The library can run the heavy work in fast low-level code, adding overhead only once when it hands back the Python object.

- 2) Your code becomes more readable: others can see your intent clearly, without being distracted by low-level details.
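
The idea above can be sketched with a built-in function. The example compares a hand-written loop with `sum()`, whose loop runs in C on CPython. The function names `total_with_loop` and `total_with_builtin` are hypothetical, and no timings are shown because they depend on your machine and Python version.

```python
import timeit


def total_with_loop(numbers):
    # Step-by-step version: every iteration runs in the interpreter
    total = 0
    for number in numbers:
        total += number
    return total


def total_with_builtin(numbers):
    # High-level version: sum() does the looping in C (on CPython)
    return sum(numbers)


numbers = list(range(100_000))

# Both approaches must agree before their speed is worth comparing
assert total_with_loop(numbers) == total_with_builtin(numbers)

loop_time = timeit.timeit(lambda: total_with_loop(numbers), number=20)
builtin_time = timeit.timeit(lambda: total_with_builtin(numbers), number=20)
print(f"loop:    {loop_time:.4f} s")
print(f"builtin: {builtin_time:.4f} s")
```

On a typical CPython installation the built-in version is expected to be faster, and it also states the intent more directly.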

## 1.4 Ensuring reproducible results + pytest Overview

- When you optimise existing code, especially after a simple first iteration, it is easy to make subtle mistakes.
- To make sure your changes do not introduce errors, it is crucial to check that the code remains correct.
- The best tool for that is testing, which is an integral part of any development process.
- It helps to catch unexpected behaviour and ensures everything works as intended.

- The most commonly used testing package for Python is pytest, and it is also great for beginners.
- It automatically finds and runs your test functions.
- Test functions check your code's output against what you expect.

How pytest finds tests
- 1) Put your tests in a `tests/` directory (a common convention).
- 2) Name your test files using one of these patterns:
    - `test_*.py` (e.g. `test_math.py`), or
    - `*_test.py` (e.g. `math_test.py`).
- 3) Inside those files, any function whose name starts with `test_` will be run by pytest.

In [None]:
# file: test_demonstration.py

# A simple function to be tested; in practice it would usually be imported from your own package
def squared(x):
    return x**2

# A simple test case (the expected value is deliberately wrong, so the test fails)
def test_example():
    assert squared(5) == 24

Here, we’re checking whether `squared(5)` is equal to 24.
- From the terminal, run `pytest`.
- pytest will search for tests, run them, and show the results.
- In this case the test fails because `squared(5)` is actually 25, not 24.

What pytest tells you
- Which tests passed or failed.
- Why a test failed (it shows the expected and the actual value).
- Whilst not designed for benchmarking, it does provide the total time the test suite took to execute.
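
Applied to optimisation, a test can compare a new, faster implementation against the original one. The sketch below is a hypothetical example: the file name and both `sum_of_squares_*` functions are made up for illustration, and the closed-form formula stands in for whatever optimisation you actually make.

```python
# file: test_sum_of_squares.py (hypothetical example)


def sum_of_squares_reference(n):
    # Original, easy-to-read implementation, kept as the trusted baseline
    total = 0
    for i in range(n):
        total += i * i
    return total


def sum_of_squares_optimised(n):
    # Candidate replacement using the closed form for 0^2 + 1^2 + ... + (n-1)^2
    return (n - 1) * n * (2 * n - 1) // 6


def test_optimised_matches_reference():
    # Compare both versions over several inputs, including edge cases
    for n in [0, 1, 2, 10, 1000]:
        assert sum_of_squares_optimised(n) == sum_of_squares_reference(n)
```

If a later change to the optimised version breaks its behaviour, this test fails and points at the mismatch.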

This is just the beginning! pytest has more advanced features, many of which are common to other testing frameworks, such as:
- Fixtures: prepare data or state for your tests.
- Mocking: fake parts of the system, so you can test in isolation.
- Skipping: tell pytest to skip certain tests.

You’ll likely explore these features as your projects grow.