# [10 Techniques to Speed Up Python Runtime](https://towardsdatascience.com/10-techniques-to-speed-up-python-runtime-95e213e925dc)
## Compare good writing style and bad writing style with the code runtime

Python is a scripting language. Compared with compiled languages like C/C++, Python has some disadvantages in efficiency and performance. However, we could use some techniques to speed up the efficiency of Python code. In this article, I will show you the speed-up techniques I usually used in my work.
The test environment is Python 3.7, with macOS 10.14.6, and 2.3 GHz Intel Core i5.

In [1]:
from time import perf_counter, perf_counter_ns
from functools import wraps

def timethis(func):
    """Decorator that reports the execution time."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        start_1 = perf_counter()
        start_2 = perf_counter_ns()
        result = func(*args, **kwargs)
        print(f"{func.__name__}: {round(perf_counter() - start_1, 3)} s")
        print(f"{func.__name__}: {perf_counter_ns() - start_2:_} ns")
        return result
    return wrapper

## 0. Optimization principles
Before diving into the details of code optimization, we need to understand some basic principles of code optimization.
1. Make sure that the code can work normally first. Because making the correct program faster is much easier than making the fast program correct.
2. Weigh the cost of optimization. Optimization comes with a cost. For example, less runtime usually needs more space usage, or less space usage usually need more runtime.
3. Optimization cannot sacrifice code readability.

## 1. Proper Data Types Usage in Python
### 1.1 Replace list with set to check whether an element is in a sequence

In [4]:
import random

In [3]:
# Bad
@timethis
def bad():
    randome_elements = random.sample(range(0, 10000000), 1000)
    list_seq = list(range(1_000_000)) # list

    counter = 0
    for ele in randome_elements:
        if ele in list_seq:
            counter += 1

bad()

bad: 10.42s


According to the TimeComplexity of Python, the average case of $x$ in $s$ operation of list is $O(n)$. On the other hand, the average case of $x$ in $s$ operation of set is $O(1)$.

In [6]:
# Good
@timethis
def good():
    randome_elements = random.sample(range(0, 10000000), 1000)
    list_seq = set(range(1_000_000)) # set

    counter = 0
    for ele in randome_elements:
        if ele in list_seq:
            counter += 1

good()

good: 0.06s


### 1.2 Dictionary initialization with defaultdict

In [7]:
# Bad
@timethis
def bad():
    sentence = 'When I originally wrote this section, there were clear situations where one of the first two approaches was faster. It seems that all three approaches now exhibit similar performance (within about 10% of each other), more or less independent of the properties of the list of words.'

    for i in range(100_000):
        wdict = {}  ###
        for word in sentence:
            try:
                wdict[word] += 1
            except KeyError:
                wdict[word] = 1
        del wdict

bad()

bad: 2.66s


We should use **defaultdict** for the initialization.

In [9]:
# Good
@timethis
def good():
    sentence = 'When I originally wrote this section, there were clear situations where one of the first two approaches was faster. It seems that all three approaches now exhibit similar performance (within about 10% of each other), more or less independent of the properties of the list of words.'
    for i in range(100_000):
        wdict = defaultdict(int)  ###
        for word in sentence:
            wdict[word] += 1
        del wdict

good()

good: 2.25s


## 2. Replace list comprehension with generator expressions

In [22]:
# Bad
@timethis
def bad():
    nums_sum_list_comprehension = sum([num**2 for num in range(1_000_000)])

bad()

bad: 0.283 s
bad: 283_231_300 ns


In [23]:
# Good
@timethis
def good():
    nums_sum_generator_expression = sum((num**2 for num in range(1_000_000)))

good()

good: 0.267 s
good: 266_883_100 ns


Another benefit of generator expression is that we can get the result without building and holding the entire list object in memory before iteration. In other words, generator expression saves memory usage.

In [28]:
import sys
# Bad
nums_squared_list = [num**2 for num in range(1_000_000)]
print(sys.getsizeof(nums_squared_list))

# Good
nums_squared_generator = (num**2 for num in range(1_000_000))
print(sys.getsizeof(nums_squared_generator))

8448728
112


## 3. Replace global variables with local variables
We should put the global variables into the function. The local variable is fast than the global variable.

In [33]:
import math

In [34]:
# Bad
size = 10000  ###

@timethis
def bad():
    for x in range(size):
        for y in range(size):
            z = math.sqrt(x) + math.sqrt(y)
bad()

bad: 25.24 s
bad: 25_239_795_500 ns


In [35]:
# Good
@timethis
def good():
    size = 10000  ###
    for x in range(size):
        for y in range(size):
            z = math.sqrt(x) + math.sqrt(y)
good()

good: 25.402 s
good: 25_402_074_500 ns


## 4. Avoid dot operation
### 4.1 Avoid function access
Every time we use . to access the function, it will trigger specific methods, like **__getattribute__()** and **__getattr__()**. These methods will use the dictionary operation, which will cause a time cost. We can use **from xx import xx** to remove such costs.

In [3]:
# Bad
import math ###

def computeSqrt(size: int):
    result = []
    for i in range(size):
        result.append(math.sqrt(i)) ###
    return result

@timethis
def main():
    size = 10000
    for _ in range(size):
        result = computeSqrt(size)

main()

main: 15.035 s
main: 15_034_908_100 ns


In [2]:
# Good 1
from math import sqrt  ###

def computeSqrt(size: int):
    result = []
    for i in range(size):
        result.append(sqrt(i))  ###
    return result

@timethis
def main():
    size = 10000
    for _ in range(size):
        result = computeSqrt(size)

main()

main: 12.83 s
main: 12_830_631_900 ns


According to the technique 3, We also can assign the global function to a local function.

In [2]:
# Good 2
import math

def computeSqrt(size: int):
    result = []
    sqrt = math.sqrt  ###
    for i in range(size):
        result.append(sqrt(i))  ###
    return result

@timethis
def main():
    size = 10000
    for _ in range(size):
        result = computeSqrt(size)

main()

main: 12.747 s
main: 12_747_164_900 ns


Furthermore, we could assign the list.append() method to a local function.

In [2]:
# Good 3
import math

def computeSqrt(size: int):
    result = []
    append = result.append
    sqrt = math.sqrt    ###
    for i in range(size):
        append(sqrt(i))  ###
    return result

@timethis
def main():
    size = 10000
    for _ in range(size):
        result = computeSqrt(size)

main()

main: 10.654 s
main: 10_654_640_700 ns


### 4.2 Avoid class property access

In [2]:
# Bad
import math
from typing import List

class DemoClass:
    def __init__(self, value: int):
        self._value = value  ###
    
    def computeSqrt(self, size: int) -> List[float]:
        result = []
        append = result.append
        sqrt = math.sqrt
        for _ in range(size):
            append(sqrt(self._value))  ###
        return result

@timethis
def main():
    size = 10000
    for _ in range(size):
        demo_instance = DemoClass(size)
        result = demo_instance.computeSqrt(size)

main()

main: 12.587 s
main: 12_587_650_900 ns


The speed of accessing **self._value** is slower than accessing a local variable. We could assign the class property to a local variable to speed up the runtime.

In [2]:
# Good
import math
from typing import List

class DemoClass:
    def __init__(self, value: int):
        self._value = value   ###
    
    def computeSqrt(self, size: int) -> List[float]:
        result = []
        append = result.append
        sqrt = math.sqrt
        value = self._value  ###
        for _ in range(size):
            append(sqrt(value))  ###
        return result

@timethis
def main():
    size = 10000
    for _ in range(size):
        demo_instance = DemoClass(size)
        demo_instance.computeSqrt(size)

main()

main: 10.347 s
main: 10_347_134_500 ns


## 5. Avoid Unnecessary Abstraction

In [3]:
# Bad
class DemoClass:
    def __init__(self, value: int):
        self.value = value

    @property
    def value(self) -> int:
        return self._value

    @value.setter
    def value(self, x: int):
        self._value = x

@timethis
def main():
    size = 1000000
    for i in range(size):
        demo_instance = DemoClass(size)
        value = demo_instance.value
        demo_instance.value = i

main()

main: 0.497 s
main: 496_722_000 ns


When use additional processing layers (such as decorators, property access, descriptors) to wrap the code, it will make the code slow. In most cases, it is necessary to reconsider whether it is necessary to use these layers. Some C/C++ programmers might follow the coding style that using the getter/setter function to access the property. But we could use a more simple writing style.

In [4]:
# Good
class DemoClass:
    def __init__(self, value: int):
        self.value = value  ###

@timethis
def main():
    size = 1000000
    for i in range(size):
        demo_instance = DemoClass(size)
        value = demo_instance.value
        demo_instance.value = i

main()

main: 0.273 s
main: 272_911_800 ns


## 6. Avoid Data Duplication
### 6.1 Avoid meaningless data copying

In [5]:
# Bad
@timethis
def main():
    size = 10000
    for _ in range(size):
        value = range(size)
        value_list = [x for x in value]  ###
        square_list = [x * x for x in value_list]  ###

main()

main: 5.918 s
main: 5_918_349_300 ns


The *value_list* is meaningless.

In [6]:
# Good
@timethis
def main():
    size = 10000
    for _ in range(size):
        value = range(size)
        square_list = [x * x for x in value]  ###

main()

main: 4.363 s
main: 4_363_241_000 ns


In [7]:
# Good 2
@timethis
def main():
    size = 10000
    for _ in range(size):
        square_list = [x * x for x in range(size)]  ###

main()

main: 4.355 s
main: 4_355_035_200 ns


### 6.2 Avoid temp variable when changing the value

In [8]:
# Bad
@timethis
def main():
    size = 100_000_000
    for _ in range(size):
        a = 3
        b = 5
        temp = a
        a = b
        b = temp

main()

main: 3.952 s
main: 3_952_630_900 ns


The temp is no need.

In [9]:
# Good
@timethis
def main():
    size = 100000000
    for _ in range(size):
        a = 3
        b = 5
        a, b = b, a  ###

main()

main: 3.683 s
main: 3_682_950_700 ns


### 6.3 Replace + with join() when concatenating strings

In [10]:
# Bad
import string
from typing import List

def concatString(string_list: List[str]) -> str:
    result = ''
    for str_i in string_list:
        result += str_i  ###
    return result

@timethis
def main():
    string_list = list(string.ascii_letters * 100)
    for _ in range(10000):
        result = concatString(string_list)

main()

main: 10.27 s
main: 10_269_917_600 ns


When using $a + b$ to concatenate strings, Python will apply for memory space, and copy a and b to the newly applied memory space respectively. This is because the string data type in Python is an immutable object. If concatenating $n$ string, it will generate $n-1$ intermediate results and every intermediate result will apply for memory space and copy the new string.
On the other hand, **join()** will save time. It will first calculate the total memory space that needs to be applied, and then apply for the required memory at one time, and copy each string element into the memory.

In [11]:
# Good
import string
from typing import List

def concatString(string_list: List[str]) -> str:
    return ''.join(string_list)  ###

@timethis
def main():
    string_list = list(string.ascii_letters * 100)
    for _ in range(10000):
        result = concatString(string_list)

main()

main: 0.343 s
main: 343_470_900 ns


## 7. Utilize the Short Circuit Evaluation of if Statement

In [12]:
# Bad
from typing import List

def concatString(string_list: List[str]) -> str:
    abbreviations1 = {'cf.', 'e.g.', 'ex.', 'etc.', 'flg.', 'i.e.', 'Mr.', 'vs.'}
    abbreviations2 = {'Miss.', 'Mrs'}
    abbr_count = 0
    result = ''
    for str_i in string_list:
        if str_i not in abbreviations2 and str_i in abbreviations1:  ###
            result += str_i
    return result

@timethis
def main():
    for _ in range(1_000_000):
        string_list = ['Mr.', 'Hat', 'is', 'Chasing', 'the', 'black', 'cat', '.']
        result = concatString(string_list)

main()

main: 0.736 s
main: 735_913_600 ns


Python uses a short circuit technique to speed truth value evaluation. If the first statement is false then the whole thing must be false, so it returns that value. Otherwise, if the first value is true it checks the second and returns that value.
Therefore, to save runtime, we can follow the below rules:

**if a and b**: The variable $a$ should have a high probability of False, so Python won't calculate $b$.
**if a or b**: The variable $a$ should have a higher probability of True, so Python won't calculate $b$.

In [13]:
# Good
from typing import List

def concatString(string_list: List[str]) -> str:
    abbreviations1 = {'cf.', 'e.g.', 'ex.', 'etc.', 'flg.', 'i.e.', 'Mr.', 'vs.'}
    abbreviations2 = {'Miss.', 'Mrs'}
    abbr_count = 0
    result = ''
    for str_i in string_list:
        if str_i in abbreviations1 and str_i not in abbreviations2:  ###
            result += str_i
    return result

@timethis
def main():
    for _ in range(1000000):
        string_list = ['Mr.', 'Hat', 'is', 'Chasing', 'the', 'black', 'cat', '.']
        result = concatString(string_list)

main()

main: 0.623 s
main: 623_331_100 ns


## 8. Loop optimization
### 8.1 Replace while with for

In [14]:
# Bad
def computeSum(size: int) -> int:
    sum_ = 0
    i = 0
    while i < size:
        sum_ += i
        i += 1
    return sum_

@timethis
def main():
    size = 10000
    for _ in range(size):
        sum_ = computeSum(size)

main()

main: 6.138 s
main: 6_138_392_200 ns


**for** loop is faster than **while** loop.

In [15]:
# Good
def computeSum(size: int) -> int:
    sum_ = 0
    for i in range(size):  ### explicit for loop
        sum_ += i
    return sum_

@timethis
def main():
    size = 10000
    for _ in range(size):
        sum_ = computeSum(size)

main()

main: 3.714 s
main: 3_714_053_800 ns


### 8.2 Replace explicit for loop with implicit for loop
We use the above example.

In [16]:
# Good
def computeSum(size: int) -> int:
    return sum(range(size))  ### implicit for loop

@timethis
def main():
    size = 10000
    for _ in range(size):
        sum = computeSum(size)

main()

main: 1.45 s
main: 1_449_999_700 ns


### 8.3 Reduce the calculation of inner for loop

In [2]:
# Bad
import math

@timethis
def main():
    size = 10000
    sqrt = math.sqrt
    for x in range(size):
        for y in range(size):
            z = sqrt(x) + sqrt(y)  ###

main() 

main: 18.71 s
main: 18_709_775_400 ns


We move the *sqrt(x)* from inner **for** loop to outer **for** loop.

In [2]:
# Good
import math

@timethis
def main():
    size = 10000
    sqrt = math.sqrt
    for x in range(size):
        sqrt_x = sqrt(x)  ### 
        for y in range(size):
            z = sqrt_x + sqrt(y)

main() 

main: 9.919 s
main: 9_919_253_800 ns


## 9. Use numba.jit
**Numba** can compile the Python function JIT into machine code for execution, which greatly improves the speed of the code. For more information about **numba**, see the [homepage](http://numba.pydata.org/).

We use the above example.

In [2]:
# Bad
def computeSum(size: float) -> int:
    sum = 0
    for i in range(size):
        sum += i
    return sum

@timethis
def main():
    size = 10000
    for _ in range(size):
        sum = computeSum(size)

main()

main: 4.082 s
main: 4_082_331_600 ns


In [3]:
# Good
import numba

@numba.jit
def computeSum(size: float) -> int:
    sum = 0
    for i in range(size):
        sum += i
    return sum

@timethis
def main():
    size = 10000
    for _ in range(size):
        sum = computeSum(size)

main()

main: 0.451 s
main: 451_180_600 ns


## 10. Use cProfile to Locate Time Cost Function

`cProfile` will output the time usage of each function. So we can find the time cost function.

In [4]:
import cProfile

def computeSum(size: int) -> int:
    return sum(range(size)) 

def main():
    size = 10000
    for _ in range(size):
        sum = computeSum(size)

cProfile.run("main()")

         20004 function calls in 1.648 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    10000    0.006    0.000    1.646    0.000 <ipython-input-4-0a7c4410d013>:3(computeSum)
        1    0.002    0.002    1.648    1.648 <ipython-input-4-0a7c4410d013>:6(main)
        1    0.000    0.000    1.648    1.648 <string>:1(<module>)
        1    0.000    0.000    1.648    1.648 {built-in method builtins.exec}
    10000    1.640    0.000    1.640    0.000 {built-in method builtins.sum}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}




## Reference

https://wiki.python.org/moin/PythonSpeed/PerformanceTips

https://realpython.com/introduction-to-python-generators/#building-generators-with-generator-expressions

Writing Solid Python Code 91 Suggestions

Python Cookbook, Third edition

https://zhuanlan.zhihu.com/p/143052860

https://pybit.es/faster-python.html