<a href="https://colab.research.google.com/github/dancher00/HPPL/blob/main/HPPL2024_Lec2_profiling_experiments.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

High-Performance Computing in Python.
=====

Addenda to Lecture 1:

Nice lectures (in Russian, 2015):

https://www.youtube.com/playlist?list=PLlb7e2G7aSpTTNp7HBYzCBByaE1h54ruW

Code styling guidelines (optional, but will help you to write readable code):
https://peps.python.org/pep-0008/


Lecture 2. Profiling and basic optimization techniques
-----

* Profiling is the process of measuring a program's performance in order to find sections of code that run slowly or inefficiently.
* The goal of profiling is to collect detailed information about where time is spent and how resources (for example CPU, GPU, and RAM) are used during execution.
* This information helps developers identify bottlenecks and optimize their code.

Python offers several profilers of varying complexity and granularity. We start with the simplest one, `timeit`.

`timeit`
------
```
Syntax:
timeit.timeit(stmt='pass', setup='pass', timer=<default timer>, number=1000000, globals=None)

Parameters:

stmt    - the statement to measure (a string or a callable); defaults to 'pass'.
setup   - code run once before the measurement; defaults to 'pass'. (Typically used to import the modules the statement needs.)
timer   - the timer function; defaults to time.perf_counter, so it rarely needs to be set.
number  - how many times stmt is executed; defaults to 1,000,000.
globals - optional namespace in which the code is executed.

Returns the total time in seconds for all `number` executions (not the time per execution).
```
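
Besides strings, `timeit` also accepts a callable for `stmt`, and `timeit.repeat` returns one total per repetition. The following is a minimal sketch, assuming a hypothetical helper `add` that is not part of the lecture code. Dividing the smallest total by `number` gives a per-call estimate that is less sensitive to background noise; note that the `lambda` wrapper adds a small call overhead of its own.

```python
import timeit

def add(a, b):
    # hypothetical function under test (not from the lecture)
    return a + b

number = 10_000
# repeat() returns a list with one total time (in seconds) per repetition
totals = timeit.repeat(lambda: add(10, 20), repeat=5, number=number)

# The minimum is usually the most reproducible figure: larger values
# mostly reflect interference from other processes.
best_per_call = min(totals) / number
print(f"best of {len(totals)}: {best_per_call * 1e9:.1f} ns per call")
```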



**To discuss:**
* working with Linux
* downloading and reading files
* lab assignment with answers

In [None]:
# importing the required module
import timeit

# code snippet to be executed only once
mysetup = "import numpy as np"

# code snippet whose execution time is to be measured
mycode = '''
def sum(a,b):
    return a+b
a=sum(10,20)
'''

# timeit statement
print(timeit.timeit(setup=mysetup,
                    stmt=mycode,
                    number=10000))

print(timeit.repeat(setup=mysetup,
                    stmt=mycode,
                    repeat=5,
                    number=10000))

In [None]:
# via the CLI
# python -m timeit -s "import numpy as np" -n 1000 -r 10 "x=10+20"
!python -m timeit -s "import numpy as np" -n 1000 -r 10 "x=10+20"

In [None]:
# more convenient as a line magic or a cell magic
# (the -n and -r arguments are optional)
%timeit -n 1000 -r 10 x=10+20

In [None]:
%%timeit
# cell magic: %%timeit must be the first line of the cell; the whole cell body is timed

x = 10 + 20
y = 'hello' + ' world'

Experiments
----

In [None]:
import numpy as np

a = np.ones(2**20, dtype=np.int32)

# reduction with a binary operator op (here +):
#   a[0] op a[1] op a[2] op ...
# identity element (unity): 0
# commutative: a op b = b op a
# associative: (a op b) op c = a op (b op c)


def my_reduce(a):
    res = 0
    for element in a:
        res = res + element
    return res


In [None]:
%%timeit -n 200 -r 2
my_reduce(a)

118 ms ± 3.71 ms per loop (mean ± std. dev. of 2 runs, 200 loops each)


In [None]:
from functools import reduce

def operator(a, b):
    return a+b


In [None]:
%%timeit -n 20 -r 2
reduce(lambda x,y: x+y, a)

155 ms ± 555 µs per loop (mean ± std. dev. of 2 runs, 20 loops each)


In [None]:
def op_sum(a,b):
    return a+b



def my_reduce_2(a, op, unity):
    '''
    Reduce the iterable a with the binary function op(x, y),
    starting from the identity element unity.
    '''
    res = unity
    for element in a:
        res = op(res, element)
    return res

In [None]:
print(my_reduce(a))
print(my_reduce_2(a, op_sum, 0))
print(my_reduce_2(a, lambda x,y: x+y, 0))
print(my_reduce_2(a, lambda x,y: x*y, 1))

In [None]:
%%timeit
my_reduce(a)

In [None]:
%%timeit
my_reduce_2(a, op_sum, 0)

In [None]:
%%timeit
b = np.sum(a)
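
Before comparing timings it is worth checking that the variants compute the same thing. The sketch below uses a smaller array than the cells above (an assumption, to keep it quick) and the standard `operator.add` in place of the hand-written `op_sum`; `np.add.reduce` is the ufunc form of the same reduction.

```python
from functools import reduce
import operator

import numpy as np

a = np.ones(2**10, dtype=np.int32)  # smaller than in the cells above, to run quickly

loop_sum = 0
for element in a:          # explicit Python loop, as in my_reduce
    loop_sum += element

functional_sum = reduce(operator.add, a, 0)   # third argument is the initial value
ufunc_sum = np.add.reduce(a)                  # reduction performed inside NumPy
numpy_sum = np.sum(a)

print(loop_sum, functional_sum, ufunc_sum, numpy_sum)
```

All four values should agree; only the time needed to obtain them differs.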

In [None]:
# lambda functions
f = lambda x: x + 2
print(type(f))      # a lambda is an ordinary function object
print(f(2.2+1j))

# two ways to define a function of two variables:

f1 = lambda x,y: x+y
f2 = lambda x: lambda y: x+y

print(f1(2,4))
print(f2(2)(4))

<class 'function'>
(4.2+1j)
6
6
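
The curried form `f2` fixes the first argument and returns a new one-argument function. Here is a minimal sketch of the same idea using `functools.partial` from the standard library; the names `add`, `curried`, `add_two_curried` and `add_two_partial` are illustrative and not from the lecture.

```python
from functools import partial

def add(x, y):
    return x + y

curried = lambda x: lambda y: x + y   # same shape as f2 above

add_two_curried = curried(2)          # fixes x = 2, still expects y
add_two_partial = partial(add, 2)     # partial application gives the same behaviour

print(add_two_curried(4), add_two_partial(4))
```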


In [None]:
sqr_1 = lambda x: x*x
sqr_2 = lambda x: x**2

In [None]:
%%timeit
sqr_1(10.0+2.0j)

73.3 ns ± 1.33 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)


In [None]:
%%timeit
sqr_2(10.0+2.0j)

121 ns ± 27.4 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)


In [None]:
my_list = list(range(100000))
%timeit list(map(lambda x: x*x, my_list))
%timeit [x*x for x in my_list]

11.7 ms ± 2.83 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)
4.62 ms ± 128 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
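
For numeric data, a third option is to let NumPy perform the loop in compiled code. The sketch below checks that all three variants give the same values and times them with `timeit`. The repetition counts are arbitrary choices, and absolute timings depend on the machine.

```python
import timeit

import numpy as np

my_list = list(range(100_000))
my_array = np.arange(100_000, dtype=np.int64)

via_map = list(map(lambda x: x * x, my_list))
via_comprehension = [x * x for x in my_list]
via_numpy = my_array * my_array            # elementwise, no Python-level loop

# the largest value is 99_999**2 < 2**63, so int64 does not overflow here
assert via_map == via_comprehension == via_numpy.tolist()

for label, fn in [
    ("map + lambda", lambda: list(map(lambda x: x * x, my_list))),
    ("comprehension", lambda: [x * x for x in my_list]),
    ("numpy", lambda: my_array * my_array),
]:
    seconds = min(timeit.repeat(fn, repeat=3, number=10)) / 10
    print(f"{label:>14}: {seconds * 1e3:.3f} ms per call")
```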


In [None]:
# Matrix multiplication

nofrows = 100
nofcols = 100
x = np.random.rand(nofrows,nofcols)
y = np.random.rand(nofrows,nofcols)

A = np.matmul(x,y)   # matmul
B = x@y              # matmul
C = x * y            # elementwise

%timeit x@y

109 µs ± 17 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)


In [None]:
def stupidmatmul(x,y):
    '''
    Naive triple-loop matrix multiplication, for timing comparison only.
    Never multiply matrices this way in real code.
    '''
    # take the output shape from the inputs instead of the globals nofrows/nofcols
    z = np.zeros((x.shape[0], y.shape[1]), dtype=np.double)
    for i in range(x.shape[0]):
        for j in range(y.shape[1]):
            for k in range(x.shape[1]):
                z[i,j]=z[i,j]+x[i,k]*y[k,j]

    return z

In [None]:
%timeit stupidmatmul(x,y)

842 ms ± 229 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [None]:
# slowdown of the naive loop (842 ms) relative to x@y (about 100 µs)
842e-3/100e-6

8420.0
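
The ratio above compares timings only. A quick sketch of the accompanying correctness check follows: a shape-aware triple loop (a hypothetical `naive_matmul`, written for this example) is compared against `@` with `np.allclose`, on small matrices so that it runs quickly. For n×n inputs the loop executes n³ multiply-add steps in the interpreter, which is where the slowdown comes from.

```python
import numpy as np

def naive_matmul(x, y):
    # triple loop over (i, j, k); illustrative only
    n, m = x.shape
    m2, p = y.shape
    assert m == m2, "inner dimensions must match"
    z = np.zeros((n, p), dtype=np.double)
    for i in range(n):
        for j in range(p):
            for k in range(m):
                z[i, j] += x[i, k] * y[k, j]
    return z

rng = np.random.default_rng(0)   # fixed seed, chosen arbitrarily
x = rng.random((20, 30))
y = rng.random((30, 10))

# Compare with a tolerance: the summation order differs from the library routine
print(np.allclose(naive_matmul(x, y), x @ y))
```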

In [None]:
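# Monte-Carlo calculation of integrals: estimating pi.
# The fraction of random points in the unit square that fall inside
# the quarter circle x**2+y**2 < 1 approaches pi/4.

N = 2**20
x = np.random.rand(N)
y = np.random.rand(N)

def StupidPi(x,y):
    counter = 0
    for n in range(len(x)):
        if x[n]**2+y[n]**2 < 1:
            counter = counter+1
    return counter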

print(StupidPi(x,y)/N*4)

3.145538330078125


In [None]:
x**2+y**2<1

array([ True,  True,  True, ..., False,  True,  True])

In [None]:
def BetterPi(x,y):
    return np.sum(x**2+y**2<1)

In [None]:
%timeit StupidPi(x,y)
%timeit BetterPi(x,y)

640 ms ± 9.48 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
10.4 ms ± 894 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
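
A Monte-Carlo estimate comes with a statistical error that shrinks like 1/√N. Below is a minimal sketch, assuming a seeded `np.random.default_rng` generator (the cells above use `np.random.rand`) and the binomial standard error for the hit fraction.

```python
import numpy as np

rng = np.random.default_rng(12345)   # arbitrary seed, for reproducibility
N = 2**20
x = rng.random(N)
y = rng.random(N)

inside = np.count_nonzero(x**2 + y**2 < 1)   # points inside the quarter circle
p_hat = inside / N                            # estimates pi / 4
pi_estimate = 4 * p_hat

# Binomial standard error of p_hat, scaled by 4 to apply to the pi estimate
std_error = 4 * np.sqrt(p_hat * (1 - p_hat) / N)

print(f"pi ~ {pi_estimate:.5f} +/- {std_error:.5f}")
```

Halving the error requires four times as many samples, so vectorizing the inner loop matters more as N grows.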


In [None]:
# Numpy array sorting

a = np.random.rand(100).astype(np.float32)
sorted_a = np.sort(a)


# Floating-point addition is commutative but not associative:
# (0.1+0.2)+0.3 != 0.1+(0.2+0.3), so the order of summation changes the result

In [None]:
a.dtype

dtype('float32')

In [None]:
print(np.dot(a,a))
print(np.dot(sorted_a, sorted_a))


np.dot(a,a) == np.dot(sorted_a, sorted_a)    # bad idea: exact equality of floating-point results

np.isclose(np.dot(a,a), np.dot(sorted_a, sorted_a))    # compare with a tolerance instead

30.333647
30.333645


np.True_
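
The two dot products differ in the last digits because the order of summation matters in floating-point arithmetic. A short sketch in plain Python (double precision) shows the effect; `math.fsum` tracks the lost low-order bits and returns a correctly rounded sum.

```python
import math

left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left == right)             # False: same numbers, different grouping

values = [0.1] * 10
naive_total = sum(values)        # accumulates a rounding error at each step
exact_total = math.fsum(values)  # error-compensated summation

print(naive_total, exact_total)
print(math.isclose(naive_total, exact_total))  # compare with a tolerance instead of ==
```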

In [None]:
a = np.array([2**32-1], dtype=np.uint32)
print(a)
print(a+1)

[4294967295]
[0]
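
NumPy integer arrays have a fixed width, and array arithmetic wraps around silently on overflow. The following small sketch shows how to inspect the limits with `np.iinfo` and how to avoid the wrap by widening the dtype before the operation.

```python
import numpy as np

limits = np.iinfo(np.uint32)
print(limits.min, limits.max)

a = np.array([limits.max], dtype=np.uint32)
wrapped = a + 1                        # stays uint32: result is taken modulo 2**32
widened = a.astype(np.uint64) + 1      # 64 bits leave room for the carry

print(wrapped, widened)
```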


In [None]:
# 4-bit illustration of the wrap-around:
#     1 1 1 1
#   + 0 0 0 1
#   ---------
#     0 0 0 0    (the carry out of the top bit is discarded)



