
# Performance

http://docs.micropython.org/en/latest/pyboard/reference/constrained.html

[Writing fast and efficient MicroPython (YouTube)](https://www.youtube.com/watch?v=hHec4qL00x0&t=1261s) - Damien George

![innerworkings](images/performance/1_micropython_inner_workings.png)

![innerworkings](images/performance/2_micropython_bytecodes.png)

![innerworkings](images/performance/3_bytecode_example.png)

![innerworkings](images/performance/4_memory_allocation.png)

![innerworkings](images/performance/5_cpu_time.png)

![innerworkings](images/performance/6_ram.png)

# Example: blinking leds

``` python
import machine

led = machine.Pin('LED_BLUE')
N = 200000

# Baseline: every pass looks up the global `led` and its on/off methods
for i in range(N):
    led.on()
    led.off()
```

57.93 kblinks/sec

## Reasonable optimization (182.39 kblinks/sec)

![innerworkings](images/performance/7_example.png)
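The slide itself is only available as an image, so the following is a minimal sketch of the kind of "reasonable" optimizations usually applied to the loop above: run the code inside a function, cache the bound methods in locals, and unroll the loop body. The `FakeLed` class and the `blink` helper are assumptions added so that the sketch also runs on desktop Python; they are not part of the original example.

```python
try:
    import machine  # only available on MicroPython boards
    led = machine.Pin('LED_BLUE')
except ImportError:
    # Assumption: a stand-in LED so the sketch also runs on desktop Python.
    class FakeLed:
        def __init__(self):
            self.toggles = 0

        def on(self):
            self.toggles += 1

        def off(self):
            self.toggles += 1

    led = FakeLed()


def blink(led, n):
    # Inside a function, names are fast locals instead of global lookups.
    on = led.on    # cache the bound methods once,
    off = led.off  # instead of looking them up on every pass
    # Unroll the body 4x to amortize loop overhead (n should be a multiple of 4).
    for _ in range(n // 4):
        on(); off()
        on(); off()
        on(); off()
        on(); off()


blink(led, 200000)
```

Each change removes work from the inner loop without leaving plain Python; the actual speed-up depends on the board and firmware version.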

## Hardcore optimizations

### @micropython.viper

- directly writing GPIO registers
- 12890 kblinks/sec

### @micropython.asm_thumb

- directly writing GPIO registers, in assembler
- 27359 kblinks/sec
- about 470 times faster than the initial code (27359 / 57.93)

![innerworkings](images/performance/8_example_assembler.png)

Compiler can emit machine code

> ahead-of-time (AOT) compilation is the act of compiling a higher-level programming language such as C or C++, or an intermediate representation such as Java bytecode or .NET Framework Common Intermediate Language (CIL) code, into a native (system-dependent) machine code so that the resulting binary file can execute natively.

## Flash

> For reasons connected with the physical architecture of the flash memory part of this capacity may be inaccessible as a filesystem. In such cases this space may be employed by incorporating user modules into a firmware build which is then flashed to the device.


### custom build

![image.png](https://cdn-learn.adafruit.com/assets/assets/000/035/165/medium800/microcontrollers_Screen_Shot_2016-08-27_at_9.49.51_PM.png?1472360111)

**scripts/**: raw Python code stored in the board's flash memory.  

**modules/**: using cross-compiler to store python modules as bytecode




### frozen modules

Frozen modules store the Python source with the firmware.

**scripts/**: This saves you from having to copy that code onto the board's filesystem, but doesn't save a lot of memory or processing time.

### frozen bytecode

Frozen bytecode uses the cross compiler to convert the source to bytecode, which is then stored with the firmware.


### Steps:

1. Clone the MicroPython repository.
2. Acquire the (platform specific) toolchain to build the firmware.
3. Build the cross compiler.
4. Place the modules to be frozen in a specified directory (dependent on whether the module is to be frozen as source or as bytecode).
5. Build the firmware. A specific command may be required to build frozen code of either type - see the platform documentation.
6. Flash the firmware to the device.

# RAM

- compilation phase
- execution phase

## Compilation Phase

#### Compiling at runtime requires extra RAM

- import module -> compiled to bytecode -> RAM
- instantiated objects -> RAM
- The compiler itself requires RAM


* **Limit module imports**
* __Avoid global objects in imported modules__

> In general it is best to avoid code which runs on import; a better approach is to have initialisation code which is run by the application after all modules have been imported.


When a module is imported, MicroPython compiles the code to bytecode which is then executed by the MicroPython virtual machine (VM). The bytecode is stored in RAM. The compiler itself requires RAM, but this becomes available for use when the compilation has completed.
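A minimal sketch of the "no work on import" advice, assuming a hypothetical module that needs a working buffer. The names `init`, `main` and `_buffer` are illustrative only. Importing the module allocates nothing large; the application calls `init()` once all imports (and their compilation) are done.

```python
# Hypothetical module: nothing heavy happens at import time.
_buffer = None  # allocated by init(), not while the module is being imported


def init(size=256):
    """Allocate module state; meant to be called after all imports are done."""
    global _buffer
    if _buffer is None:
        _buffer = bytearray(size)
    return _buffer


def main():
    # The application decides when the allocation happens.
    buf = init()
    return len(buf)


print(main())
```

Deferring the allocation leaves more free RAM for the compiler while the remaining modules are still being imported.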

### Frozen modules

### Frozen bytecode

On most platforms frozen bytecode saves even more RAM, as the bytecode is run directly from flash rather than being stored in RAM.

## Execution phase

## Constants

```
from micropython import const
ROWS = const(33)
_COLS = const(0x10)
a = ROWS
b = _COLS
```

- the compiler substitutes the identifier with its numeric value in the bytecode
  - this avoids a dictionary lookup at runtime.
- the argument can be anything that evaluates to an integer at compile time

In both instances where the constant is assigned to a variable the compiler will avoid coding a lookup to the name of the constant by substituting its literal value. This saves bytecode and hence RAM. However the ROWS value will occupy at least two machine words, one each for the key and value in the globals dictionary. The presence in the dictionary is necessary because another module might import or use it. This RAM can be saved by prepending the name with an underscore as in _COLS: this symbol is not visible outside the module so will not occupy RAM.

The argument to const() may be anything which, at compile time, evaluates to an integer e.g. 0x100 or 1 << 8. It can even include other const symbols that have already been defined, e.g. 1 << BIT.

## Constant data structures

- Store data as bytes in frozen bytecode.
- Use ustruct to convert between bytes and Python built-in types
    - strings
    - numeric data



Where there is a substantial volume of constant data and the platform supports execution from Flash, RAM may be saved as follows. The data should be located in Python modules and frozen as bytecode. The data must be defined as bytes objects. The compiler ‘knows’ that bytes objects are immutable and ensures that the objects remain in flash memory rather than being copied to RAM. The ustruct module can assist in converting between bytes types and other Python built-in types.

When considering the implications of frozen bytecode, note that in Python, strings, floats, bytes, integers and complex numbers are immutable. Accordingly these will be frozen into flash. Thus, in the line

`mystring = "The quick brown fox"`

the actual string “The quick brown fox” will reside in flash. At runtime a reference to the string is assigned to the variable mystring. The reference occupies a single machine word. In principle a long integer could be used to store constant data:

`bar = 0xDEADBEEF0000DEADBEEF`

As in the string example, at runtime a reference to the arbitrarily large integer is assigned to the variable bar. That reference occupies a single machine word.
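A sketch of how constant records might be stored in a single `bytes` object and decoded on demand with `ustruct`. The table contents, the record layout `"<Hh"` and the `record` helper are hypothetical; the fallback import of `struct` is there so the snippet also runs where `ustruct` is not available.

```python
try:
    import ustruct as struct  # MicroPython name used in the text
except ImportError:
    import struct             # same API on desktop Python

# Hypothetical calibration table: three records of (uint16 id, int16 offset),
# little-endian, kept in one immutable bytes object.
_RECORD = "<Hh"
_TABLE = b"\x01\x00\xf6\xff\x02\x00\x05\x00\x03\x00\x00\x00"


def record(index):
    """Decode one record straight from the table without copying it."""
    size = struct.calcsize(_RECORD)
    return struct.unpack_from(_RECORD, _TABLE, index * size)


print(record(0), record(1), record(2))
```

When such a module is frozen as bytecode, the table can stay in flash and only the small decoded tuples are created in RAM.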

### String concatenation

```

var1 = "foo" "bar"
var2 = """\
foo\
bar"""
```

Both forms above are concatenated at compile time. By contrast,

`var = "foo" + "bar"`

creates "foo", "bar" and the resulting string at runtime.

> The best way to create dynamic strings is by means of the string format() method:

`var = "Temperature {:5.2f} Pressure {:06d}\n".format(temp, press)`

## Buffers


When accessing devices such as instances of UART, I2C and SPI interfaces, using pre-allocated buffers avoids the creation of needless objects. Consider these two loops:

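```
# Loop 1: spi.read() allocates a new 100-byte object on every pass
while True:
    var = spi.read(100)
    # process data

# Loop 2: spi.readinto() fills the same pre-allocated buffer each time
buf = bytearray(100)
while True:
    spi.readinto(buf)
    # process data in buf
```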

The first creates a buffer on each pass whereas the second re-uses a pre-allocated buffer; this is both faster and more efficient in terms of memory fragmentation.
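The same pattern can be tried without hardware. In this sketch `io.BytesIO` stands in for a device object that exposes `readinto()`; that substitution, and the `checksum_stream` helper, are assumptions for illustration. A `memoryview` is used so that looking at only the bytes actually read does not copy the buffer.

```python
import io

# Assumption: io.BytesIO stands in for a device that exposes readinto().
device = io.BytesIO(bytes(range(10)) * 3)

buf = bytearray(8)      # allocated once, reused for every read
view = memoryview(buf)  # slicing a memoryview does not copy the data


def checksum_stream(dev):
    """Sum all bytes from dev, reading through the shared buffer."""
    total = 0
    while True:
        n = dev.readinto(buf)
        if not n:
            break
        for b in view[:n]:  # only the bytes filled by this read
            total += b
    return total


print(checksum_stream(device))
```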

## Heap

## Heap fragmentation

<img width="500px" src="https://upload.wikimedia.org/wikipedia/commons/4/4a/External_Fragmentation.svg" align="right" alt="Diagram of external memory fragmentation" title="https://commons.wikimedia.org/wiki/File:External_Fragmentation.svg">


- <font color="red">Minimise</font> the repeated creation and destruction of objects!
- instantiate large buffers/objects early
- call `gc.collect()` periodically (takes a few ms)

> The discourse on this is somewhat involved. For a ‘quick fix’ issue the following periodically:

```
import gc

gc.collect()
gc.threshold(gc.mem_free() // 10 + gc.mem_alloc())
```

This will provoke a GC when more than 10% of the currently free heap becomes occupied.

## Commands

In [2]:
import gc
import micropython
gc.collect()
micropython.mem_info()
print('-----------------------------')
print('Initial free: {} allocated: {}'.format(gc.mem_free(), gc.mem_alloc()))
def func():
    a = bytearray(10000)
gc.collect()
print('Func definition free: {} allocated: {}'.format(gc.mem_free(), gc.mem_alloc()))
func()
print('Func run free: {} allocated: {}'.format(gc.mem_free(), gc.mem_alloc()))
gc.collect()
print('Garbage collect free: {} allocated: {}'.format(gc.mem_free(), gc.mem_alloc()))
print('-----------------------------')
micropython.mem_info(1)


- `gc.collect()`: force a garbage collection.
- `micropython.mem_info()`: print a summary of RAM utilisation.
- `gc.mem_free()`: return the free heap size in bytes.
- `gc.mem_alloc()`: return the number of bytes currently allocated.
- `micropython.mem_info(1)`: print a table of heap utilisation.

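Error messages can be terse or surprising. For example, this snippet opens the file in the default read-only mode and then tries to write to it: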

```
with open("file.txt") as f:
    f.write("hello world")
```

## Performance

- Locally scope your variables
- Avoid floating-point arithmetic
- Use pre-allocated buffers (instead of appending to a list)

Some MicroPython ports allocate floating-point numbers on the heap. Other ports may lack a dedicated floating-point coprocessor and perform floating-point arithmetic in software, at considerably lower speed than integer arithmetic.
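The three points above can be combined in one small sketch. The function name `scale_samples` and the percent-based gain are illustrative assumptions: the work happens on local names inside a function, the gain is an integer percentage rather than a float, and the output list is allocated once instead of being grown with `append`.

```python
def scale_samples(samples, out, gain_percent):
    """Scale integer samples into a pre-allocated list using integer math only."""
    n = len(samples)  # local names are resolved faster than globals
    for i in range(n):
        # A gain in percent keeps the arithmetic in integers (no float objects).
        out[i] = samples[i] * gain_percent // 100
    return out


samples = [100, 250, 400]
out = [0] * len(samples)  # allocated once, instead of appending in a loop
print(scale_samples(samples, out, 150))
```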

- Caching object references

In [None]:
class Tank(object):
    def __init__(self):
        self.targets = bytearray(100)
    def identify_target(self, obj_display):
        # Cache attribute lookups in locals before the code that uses them
        targets_ref = self.targets
        fb = obj_display.framebuffer
        (...)