<a href="https://colab.research.google.com/github/aleksejalex/EIEE9E_2025_ZS/blob/main/PyPEF_04_numpy.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# EIEE9E, lecture 04. Numerical computations in Python: package NumPy.

Prepared by: Aleksej Gaj ( pythonforstudents24@gmail.com )

🔗 Course website: [https://aleksejgaj.cz/pef_python/](https://aleksejgaj.cz/pef_python/)


In this tutorial we get familiar with:
 - the NumPy library: general philosophy, arrays, operations on arrays, functions


### Solution of the task from the previous notebook:
**The task was:** write a function that takes a list, checks which items are numbers (only types `int` and `float` are allowed; the solution below skips everything else) and returns the median of those numbers.
  

In [None]:
def calculate_median(numbers:list):
    valid_numbers = []
    for num in numbers:
        if isinstance(num, (int, float)):  # a tuple of types checks several types at once
            valid_numbers.append(num)

    valid_numbers.sort()
    length = len(valid_numbers)

    if length == 0:
        return None
    elif length % 2 == 0:
        middle = length // 2
        return (valid_numbers[middle - 1] + valid_numbers[middle]) / 2
    else:
        return valid_numbers[length // 2]

# Example usage:
number_list = [1000, 2.5, 3, 4, 5, 10, 2000]
#number_list = [1, 2, 3, 4]
median = calculate_median(number_list)
print("Median:", median)
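
As a cross-check, the standard library already offers `statistics.median`. The sketch below repeats the filtering step in a compact form (the helper name `keep_numbers` is made up for this illustration) and applies it to the same example list as above.

```python
import statistics

def keep_numbers(items):
    # hypothetical helper: keep only int and float items, as calculate_median does
    return [item for item in items if isinstance(item, (int, float))]

number_list = [1000, 2.5, 3, 4, 5, 10, 2000]
print(statistics.median(keep_numbers(number_list)))          # middle value of the sorted list
print(statistics.median(keep_numbers([1, 2, "three", 4])))   # the string is skipped
```

Both approaches should agree; writing the function by hand is still a good exercise in loops and conditions.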


#### 🔥 A strange result you may observe:

In [None]:
10*25.37

Why is that? 🤔

Python (like many programming languages) uses the [IEEE 754 standard](https://en.wikipedia.org/wiki/IEEE_754) to represent floating-point numbers, which means that **every number is stored in binary format** with a limited number of digits.

The number `25.37` can't be represented exactly in binary, so Python stores the closest binary approximation. This small error carries over into `10 * 25.37`, which is why we see a number like 253.70000000000002.
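
A short sketch of how such values can be inspected and compared using only the standard library: `Decimal(25.37)` (built from the float, not from a string) prints the exact value that is stored, and `math.isclose` compares floats with a tolerance instead of `==`.

```python
import math
from decimal import Decimal

# Decimal built from a float shows the exact binary value stored for 25.37
print(Decimal(25.37))

product = 10 * 25.37
print(product == 253.7)               # exact comparison of two floats
print(math.isclose(product, 253.7))   # comparison with a small relative tolerance
```

In practice, comparing floats with a tolerance is usually safer than comparing them with `==`.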


Possible, but *rarely* used, fix:

In [None]:
from decimal import Decimal

result = Decimal('10') * Decimal('25.37')
print(result)

More common fix: **rounding**

In [None]:
result = 10 * 25.37
print(f"{result:.2f}")   # 2 means 2 places after decimal point

...and this is also a way to show the error of the binary approximation:

In [None]:
result = 10 * 25.37
print(f"{result:.40f}")

## 🕓 Recall: containers.
Python has several built-in containers:


| Type         | Description                                | Example                   |
|--------------|--------------------------------------------|---------------------------|
| list         | Ordered collection of items                | `my_list = [1, 2, 3]`     |
| tuple        | Immutable ordered collection of items      | `my_tuple = (1, 2, 3)`    |
| set          | Unordered collection of unique items       | `my_set = {1, 2, 3}`      |
| dict         | Collection of key-value pairs              | `my_dict = {'eggs': 6, 'apples': 3, 'cookies': "chocolate"}`|


## 🕓 Technical recall 1: printing objects

We already know that there are (at least) 2 ways to print out a variable:
 - via the command `print()`
 - by leaving the variable on the last line of a cell

The first way is universal: it always works. The second is *specific to interactive sessions such as Jupyter notebooks*.
(If you leave just a variable on a line in a script, nothing is displayed.)

But what actually happens? Everything in Python is an object, and we have already seen what objects look like, so there must be some method that is called behind the scenes.

Let's have a look:


In [None]:
class Cat:
    def __init__(self, weight = 10):
        self.weight = weight

    def __repr__(self):
        # string representation of the object
        # this method is called when the object is displayed in IPython (just the variable in a notebook)
        return f'Just a cat with weight of {self.weight} kg'

    def __str__(self):
        # string representation of the object's value
        # this method is called when the object is printed
        return 'This cat was printed, meow 🐾'

In [None]:
my_cat = Cat()

In [None]:
my_cat

In [None]:
print(my_cat)

So:
 - method `__repr__` returns a **string representation of the object itself**
 - method `__str__` returns a **string representation of the object's value**


#### Why bother? 🤨

This explains the cases where `print(a)` and leaving just `a` at the end of a cell give different results (see this behaviour with `numpy.array` below).
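
The same difference can be seen with built-in types. The sketch below uses `datetime.date` from the standard library as an illustration (any class that defines both methods would do).

```python
import datetime

day = datetime.date(2025, 10, 1)

print(repr(day))   # datetime.date(2025, 10, 1)  <- what a notebook shows for a bare `day`
print(str(day))    # 2025-10-01                  <- what print(day) shows

# inside an f-string, !r asks for repr() instead of str()
print(f"{day!r} vs {day}")
```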

## 🕓 Technical recall 2: how to import packages and how to use them

In [None]:
# magic command of interactive Python (IPython), which runs .ipynb notebooks; it lists the defined variables
%whos

4 ways to import in Python:

In [None]:
# 1. directly (works, but every call needs the full module name)
import math

In [None]:
%whos

✨ We see `math` there as an object of type `module`.

In [None]:
math.cos(2 * math.pi)

In [None]:
# 2. via an alias (most common, highly recommended)
import math as mt

In [None]:
mt.cos(2 * mt.pi)

In [None]:
# 3. selectively (good when you need only one or two objects from the library)
from math import cos, pi

In [None]:
cos(2*pi)

In [None]:
# 4. everything from the library (NOT recommended, unless the library is small or you really need all objects inside)
from math import *

In [None]:
%whos
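
One reason why importing everything is discouraged: two libraries may define the same name, and the later import silently replaces the earlier one. As an illustration (the choice of `cmath` is ours, it is not used elsewhere in this notebook), both `math` and `cmath` define `sqrt`, but they behave differently. Keeping the module prefix keeps them apart.

```python
import math
import cmath

print(cmath.sqrt(-1))        # complex square root, works for negative input

try:
    math.sqrt(-1)            # the real-valued sqrt rejects negative input
except ValueError as err:
    print("math.sqrt failed:", err)
```

With `from math import *` followed by `from cmath import *`, the bare name `sqrt` would refer to whichever module was imported last.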

## NumPy: Numerical Python

<img src="https://numpy.org/doc/stable/_static/numpylogo.svg" alt="logo" width="400">

[documentation](https://numpy.org/)

= a library created to work with *numerical data*:

 - efficiently stores and operates on multi-dimensional data structures (**arrays**, like vectors and matrices)
 - implements mathematical operations on those arrays

In [None]:
# import
import numpy as np

### Array - basic type of variable in NumPy

 - very similar to vector/matrix in algebra
 - type of variable: `ndarray`

| Feature                          | Python Lists (`list`)                                 | NumPy Arrays (`ndarray`)                                 |
|----------------------------------|-----------------------------------------------|----------------------------------------------|
| Data Types                       | Can contain elements of different data types  | Homogeneous data type ⚠️                       |
| Performance                      | Slower for large datasets                     | Faster for large datasets                    |
| Mathematical Operations          | Limited functionality for mathematical ops    | Rich set of mathematical operations         |
| Indexing and Slicing             | Basic indexing and slicing                    | Advanced indexing and slicing                |
| Iteration                        | Basic iteration                                | Vectorized operations                        |
| Memory Efficiency                | Less memory efficient                         | More memory efficient                        |


Example:

$$ \vec{v} = \begin{pmatrix} 1 \\ 2 \\ 3 \\ 4 \end{pmatrix} $$

In [None]:
v = np.array([1,2,3,4])

v

In [None]:
print(v)

In [None]:
type(v)

A NumPy array can be created:
 - from a Python container, like `list` or `tuple`
 - via a NumPy function (for example `np.zeros`)
 - by loading from a file



In [None]:
mylist = [1,2,3]   # Python's "native" list
arr_from_list = np.array(mylist)

print(arr_from_list)
print(mylist)

🗒️ Note: `list` is printed with commas while `np.array` is printed without them.

2-D array (matrix):

$$
\mathbb{A} = \begin{pmatrix}
1 & 2 & 3 \\
4 & 5 & 6 \\
7 & 8 & 9 \\
\end{pmatrix} $$

In [None]:
A = np.array([[1,2,3], [4,5,6], [7,8,9]])

A

Matrix transposition:

In [None]:
A.T

Shape of array, size of array

 - shape ... number of elements in each dimension
 - size ... total number of elements in the array

In [None]:
v.shape

In [None]:
A.shape

In [None]:
v.size

In [None]:
A.size
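
A compact summary of these attributes, as a small sketch. It also shows `ndim` (the number of dimensions) and the built-in `len()`, which are not used above.

```python
import numpy as np

v = np.array([1, 2, 3, 4])
A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

print(v.shape, v.ndim, v.size)   # (4,) 1 4
print(A.shape, A.ndim, A.size)   # (3, 3) 2 9
print(len(A))                    # 3 -> len() counts along the first dimension only
```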

An `ndarray` must have all its elements of the same type. The type can be checked via the `dtype` attribute:

In [None]:
A.dtype

The type of the elements of an `ndarray` can be set when creating the object:

In [None]:
B = np.array([[1, 2], [101, 102]], dtype='int')

B

Common options are: `int`, `float`, `complex`, `bool`, `object`.
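
A small sketch of what the "same type for all elements" rule means in practice: when the input mixes integers and floats, NumPy picks one type that can hold all of them, and `astype` converts an existing array to another type.

```python
import numpy as np

mixed = np.array([1, 2.5, 3])        # int and float together
print(mixed.dtype)                   # float64 -> the integers were converted to floats

floats = np.array([1.9, -1.9])
as_int = floats.astype(int)          # conversion to int drops the fractional part
print(as_int)                        # [ 1 -1]
```

Note that `astype(int)` truncates towards zero; it does not round.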

### Array-generating functions
 = functions that will automatically create an array, so we don't have to specify its elements manually

In [None]:
x = np.arange(start=0, stop=10, step=1)

x

In [None]:
x = np.arange(start=-1, stop=1, step=0.1)

x
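
With a non-integer `step`, the stop value is excluded and rounding errors can influence how many elements `np.arange` produces. When the number of points matters more than the step, `np.linspace` is often the more predictable choice; a minimal sketch:

```python
import numpy as np

# 21 evenly spaced points from -1 to 1, both endpoints included
x_lin = np.linspace(-1, 1, 21)

print(len(x_lin))
print(x_lin[0], x_lin[-1])
```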

In [None]:
zero_mtx = np.zeros([2,3])

zero_mtx

In [None]:
ones_mtx = np.ones([3,2])

ones_mtx

In [None]:
eye_mtx = np.eye(5, dtype = int)

eye_mtx

In [None]:
eye_mtx.dtype

In [None]:
# Array of random numbers - each is drawn from the uniform distribution on [0, 1)
np.random.rand(3,4)

In [None]:
# Array of random numbers - each is drawn from the standard normal (Gaussian) distribution with mean 0 and variance 1
np.random.randn(3,4)

Just a visualisation of the difference:

In [None]:
import matplotlib.pyplot as plt
plt.figure(figsize=(4,3), dpi=120)
plt.plot(np.random.randn(5000), 'b.', alpha = 0.3)
plt.plot(np.random.rand(5000), 'r.', alpha=0.3)
plt.show()

In [None]:
import matplotlib.pyplot as plt
plt.figure(figsize=(4, 3), dpi=120)
plt.hist(np.random.rand(5000), bins=np.linspace(-4, 4, 50), alpha=0.7)  # alpha < 1 makes the bars semi-transparent
plt.hist(np.random.randn(5000), bins=np.linspace(-4, 4, 50), alpha=0.7)  # same bins, so both histograms are comparable
plt.show()


(We will learn plotting figures next time.)

In [None]:
# reminder: array with random integers
np.random.randint(0, 10, size=(3, 3))
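
The functions above belong to NumPy's older random interface. A sketch of the same three kinds of arrays with the newer `Generator` interface, where a seed (the value 42 is an arbitrary choice here) makes the results reproducible:

```python
import numpy as np

rng = np.random.default_rng(seed=42)   # random number generator with a fixed seed

uniform = rng.random((3, 4))                 # uniform on [0, 1)
normal = rng.standard_normal((3, 4))         # standard normal: mean 0, variance 1
integers = rng.integers(0, 10, size=(3, 3))  # integers 0..9 (upper bound excluded)

print(uniform)
print(normal)
print(integers)
```

Running the cell again with the same seed gives the same numbers, which is handy when debugging.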

### Array arithmetic

Consider these two matrices:

$$ \mathbb{A} =
\begin{pmatrix}
1 & 2 \\
3 & 4 \\
\end{pmatrix} ,
\quad \quad
\mathbb{B} =
\begin{pmatrix}
5 & 6 \\
7 & 8 \\
\end{pmatrix}
$$



In [None]:
A = np.array([[1, 2],[3, 4]])
B = np.array([[5, 6],[7, 8]])

In [None]:
# sum of those matrices:
A+B

In [None]:
A-B

In [None]:
# matrix product (matrix multiplication)
np.dot(A, B)

In [None]:
# or equivalently:
A @ B

You can verify it by hand 😜:

$$
\mathbb{A} \cdot \mathbb{B} =
\begin{pmatrix}
1 & 2 \\
3 & 4 \\
\end{pmatrix}
\cdot
\begin{pmatrix}
5 & 6 \\
7 & 8 \\
\end{pmatrix}
=
\begin{pmatrix}
19 & 22 \\
43 & 50 \\
\end{pmatrix}
$$
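
The "by hand" computation can also be written as explicit loops. This is only a sketch to show what the matrix product does (each element is a row of the first matrix times a column of the second); in real code `A @ B` is the way to go.

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

C = np.zeros((2, 2), dtype=int)
for i in range(2):          # row of A
    for j in range(2):      # column of B
        for k in range(2):  # sum over the shared index
            C[i, j] += A[i, k] * B[k, j]

print(C)
print(np.array_equal(C, A @ B))   # True -> same result as the built-in operator
```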

In [None]:
# element-wise product
np.multiply(A, B)

In [None]:
# or equivalently:
A * B

### Intermezzo: speed comparison (evidence that it makes sense to use NumPy)

NumPy is much better optimized for speed of computation. Here is a demonstration:

 - below is the function `my_dot_product`, which computes the dot product of two `list`s (not `ndarray`s) with a plain Python loop
 - it is then called with two anonymous lists of randomly generated numbers

The time is measured in a Jupyter notebook via the "magic command" `%timeit`. Note that both timed statements also include generating the random data.

In [None]:
def my_dot_product(list1, list2):
    result = 0
    for i in range(len(list1)):
        result = result + list1[i] * list2[i]

    return result

1. let's measure how quickly this function runs:

In [None]:
%timeit my_dot_product(list(np.random.randint(0, 10, size=(10000,1))), list(np.random.randint(0, 10, size=(10000,1))))

2. now let's perform the same multiplication via NumPy's native functionality:

In [None]:
%timeit np.random.randint(0, 10, size=(10000,1)).T @ np.random.randint(0, 10, size=(10000,1))

The difference is obvious 😉

In [None]:
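19800 / 360   # presumably the ratio of the two times measured in one run, i.e. the speed-up factor; your numbers will differ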
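
Outside of a notebook, the same comparison can be sketched with the standard-library `timeit` module. In this variant the random data is generated once, before the measurement, so that only the multiplication itself is timed. The function name `dot_loop`, the seed and the number of repetitions are choices made for this illustration.

```python
import timeit
import numpy as np

def dot_loop(list1, list2):
    # same idea as my_dot_product above: plain Python loop over two lists
    result = 0
    for a, b in zip(list1, list2):
        result += a * b
    return result

rng = np.random.default_rng(0)
arr1 = rng.integers(0, 10, size=10_000)
arr2 = rng.integers(0, 10, size=10_000)
list1, list2 = arr1.tolist(), arr2.tolist()   # the same data as plain Python lists

# total time of 100 repetitions of each variant, in seconds
t_loop = timeit.timeit(lambda: dot_loop(list1, list2), number=100)
t_numpy = timeit.timeit(lambda: arr1 @ arr2, number=100)

print(f"loop:  {t_loop:.4f} s")
print(f"NumPy: {t_numpy:.4f} s")
```

The exact times depend on the machine, so no particular ratio should be expected.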

### Working with arrays
~ quite similar to lists

In [None]:
matrix = np.array([[1, 2], [3, 4]], dtype=int)
matrix

Access to elements:

In [None]:
matrix[1, 1]

Rewriting elements:

In [None]:
matrix[1, 1] = 100
matrix

Rewriting a whole row/column:

In [None]:
matrix[1,:] = -1
matrix

From n-D array to 1-D array ("flattening matrices into long vectors"):

In [None]:
matrix.flatten()

Slicing: allows you to access a portion of an array
> `array[start:end:step]`

In [None]:
arr = np.array([1, 2, 3, 4, 5])
arr[1:4]
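
A few more slicing patterns as a sketch, including the `step` part and slicing of a 2-D array. The last lines show one difference from lists: a slice of an array is a *view*, so writing into it changes the original array.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])
print(arr[::2])     # [1 3 5]      every second element
print(arr[::-1])    # [5 4 3 2 1]  reversed
print(arr[-2:])     # [4 5]        last two elements

matrix = np.array([[1, 2, 3], [4, 5, 6]])
print(matrix[:, 1])    # [2 5]  second column
print(matrix[0, 1:])   # [2 3]  first row, from the second element on

part = arr[1:4]     # a view into arr, not a copy
part[0] = 99
print(arr)          # arr[1] is now 99 as well
```

If an independent copy is needed, `arr[1:4].copy()` can be used.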

Reshaping: changes the shape of an array while keeping its elements:

In [None]:
arr = np.array([[1, 2, 3], [4, 5, 6]])
reshaped_arr = np.reshape(arr, (3, 2))
reshaped_arr
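
Two details about reshaping, shown as a sketch: one dimension can be given as `-1` and NumPy computes it from the number of elements, and a shape that does not fit the number of elements raises an error.

```python
import numpy as np

arr = np.arange(6)                  # [0 1 2 3 4 5]

print(arr.reshape(2, -1).shape)     # (2, 3) -> the -1 was computed as 3
print(arr.reshape(-1, 2).shape)     # (3, 2)

try:
    arr.reshape(4, -1)              # 6 elements cannot be split into 4 rows
except ValueError as err:
    print("reshape failed:", err)
```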

Concatenation of `ndarray`s:

In [None]:
C = np.concatenate([A, B], axis=0)
C

In [None]:
D = np.concatenate([A, B], axis=1)
D
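
For 2-D arrays, NumPy also offers the shortcuts `np.vstack` (stack vertically) and `np.hstack` (stack horizontally). The sketch below checks that they give the same result as `np.concatenate` with `axis=0` and `axis=1`.

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

C = np.vstack([A, B])   # rows of B below the rows of A, shape (4, 2)
D = np.hstack([A, B])   # columns of B next to the columns of A, shape (2, 4)

print(np.array_equal(C, np.concatenate([A, B], axis=0)))   # True
print(np.array_equal(D, np.concatenate([A, B], axis=1)))   # True
```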

### Mathematical functions

NumPy provides a wide range of mathematical functions that can be applied to an `int` or a `float` as well as to an `ndarray`. On an `ndarray` they act *element-wise*.

*Examples:*

#### Trigonometric functions:

In [None]:
arr = np.array([0, np.pi*2, np.pi])
print("arr = ", arr)

# Sine function
sine_values = np.sin(arr)
print("Sine:", sine_values)

# Cosine function
cosine_values = np.cos(arr)
print("Cosine:", cosine_values)

# Tangent function
tangent_values = np.tan(arr)
print("Tangent:", tangent_values)
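
Some of the printed values are tiny numbers instead of exact zeros. This is the floating-point rounding discussed at the beginning of this notebook: `np.pi` is only an approximation of $\pi$. A sketch of how to compare such results with a tolerance, using `np.isclose`:

```python
import numpy as np

arr = np.array([0, np.pi * 2, np.pi])
sine_values = np.sin(arr)

print(sine_values == 0)              # exact comparison, element by element
print(np.isclose(sine_values, 0))    # comparison with a small tolerance
```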

#### Exponential and logarithms:

In [None]:
arr = np.array([1, 2, 3])

# Exponential function (e^x)
exp_values = np.exp(arr)
print("Exponential:", exp_values)

# Natural logarithm (base e)
log_values = np.log(arr)
print("Natural Logarithm:", log_values)

# Common logarithm (base 10)
log10_values = np.log10(arr)
print("Common Logarithm:", log10_values)

# Logarithm of base 2
log2_values = np.log2(arr)
print("Base 2 Logarithm:", log2_values)

#### Basic statistical functions:

In [None]:
arr = np.array([1, 2, 3, 4, 5])

# Mean
mean_value = np.mean(arr)
print("Mean:", mean_value)

# Median
median_value = np.median(arr)
print("Median:", median_value)

# Standard deviation
std_value = np.std(arr)
print("Standard Deviation:", std_value)

# Variance
var_value = np.var(arr)
print("Variance:", var_value)

# Minimum value
min_value = np.min(arr)
print("Minimum Value:", min_value)

# Maximum value
max_value = np.max(arr)
print("Maximum Value:", max_value)
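
Two options that are often useful with these functions, as a sketch: by default `np.std` and `np.var` divide by $N$ (population formulas), and `ddof=1` switches to the sample formulas that divide by $N-1$; for 2-D arrays, `axis` selects the direction along which the statistic is computed.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])
print(np.var(arr))            # population variance: divides by N
print(np.var(arr, ddof=1))    # sample variance: divides by N - 1

M = np.array([[1, 2, 3], [4, 5, 6]])
print(np.mean(M))             # mean of all elements
print(np.mean(M, axis=0))     # mean of each column
print(np.mean(M, axis=1))     # mean of each row
```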


## What was omitted?

 - reading an array from a text file --> next time
 - plotting values --> next time (different package)
 - more advanced functionality: Fast Fourier Transform (`np.fft`), linear algebra such as the singular value decomposition (`np.linalg`), LU decomposition (provided by SciPy rather than NumPy), ....
 - ....


 some [tutorials on NumPy](https://numpy.org/numpy-tutorials/index.html)


😵‍💫 If you are a little lost, don't worry: next time we start plotting graphs, so we will need to work with NumPy arrays and some functions, and we will see these things one more time **in practice**.

## ✍️ Try it yourself: *functions again*
*task:* write a function `gauss()` that returns the value of the Gaussian probability density function at a point `x`.
Make the mean value $\mu$ (`mu`) and the standard deviation $\sigma$ (`sigma`) optional arguments.

Hint/reminder:
$$
    y = \frac{1}{\sqrt{2 \pi \sigma^2} } e^{- \frac{(x - \mu)^2}{2 \sigma^2}}
$$

Default values: $\mu = 0$, $\sigma = 1$.

(more on the Gaussian curve >> [here](https://en.wikipedia.org/wiki/Normal_distribution))

In [None]:
def gauss():
    pass

x = 1
y = gauss(x)
y = gauss(x, mu=2, sigma=0.2)

## ✍️ Try it yourself:

*task:* write a function that takes a matrix and:
 - checks whether the matrix is square
 - if it's square, returns its determinant (`np.linalg.det(A)`)
 - if it's *not* square, reshapes it so that it is square and then returns its determinant
 - if reshaping is not possible (the number of elements is not a perfect square), fills in the missing elements with ones so that it becomes square

(use library NumPy)

In [None]:
def generalized_det(A):
    pass

In [None]:
matrix1 = np.array([[1,2,3], [4,5,6]])
print(generalized_det(matrix1))