We recommend not reading the notebooks on GitHub, because the visualizations are not displayed correctly there and the code cannot be run. Instead, open the notebook in Google Colab using the following link:

<a target="_blank" href="https://colab.research.google.com/github/asia281/StaszicAI/blob/main/20-05/Staszicowe_intro_to_Numpy.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

In [None]:
#@title Visualization code, no need to read it (we recommend collapsing this cell)

import numpy as np
import math
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation
from IPython.display import HTML

# -----------------------
# Visualization Code
# -----------------------

def plot_vector_sum(a, b):
    # Create a figure with arrows for vectors
    plt.figure(figsize=(6, 6))
    plt.grid(True)
    sstr = lambda x: f'({x[0]}, {x[1]})'

    # Plot vectors as arrows
    # Vector a, starting at the origin
    plt.quiver(0, 0, *a, angles='xy', scale_units='xy', scale=1, color='r', label=f'a = {sstr(a)}')
    plt.plot(a[0], a[1], 'bo')  # End point of vector a as a blue dot
    # Vector b, shifted to start at the end of vector a
    plt.quiver(*a, *b, angles='xy', scale_units='xy', scale=1, color='r', label=f'b = {sstr(b)}')
    # Ending position of the sum of vectors as a blue dot
    plt.plot(*(a + b), 'bo')
    # Resultant sum of vectors as an arrow
    plt.quiver(0, 0, *(a + b), angles='xy', scale_units='xy', scale=1, color='g', label=f'a + b = {sstr(a + b)}')

    # Set axis limits
    plt.xlim(0, 5)
    plt.ylim(0, 5)

    # Add legend
    plt.legend()

    # Show plot
    plt.show()

# NumPy


We start by importing the libraries.

In [None]:
import numpy as np
import matplotlib.pyplot as plt

## Array creation

There are multiple ways to create an array; try the examples below.

### From nested lists

In [None]:
array_1d = np.array([1, 2, 3, 4, 5])
array_1d

In [None]:
array_2d = np.array(
  [
    [1,  2,  3,  4,  5 ],
    [6,  7,  8,  9,  10],
    [11, 12, 13, 14, 15]
  ]
)
array_2d

In [None]:
array_3d = np.array(
  [
    [
      [1,  2,  3,  4,  5 ],
      [6,  7,  8,  9,  10],
      [11, 12, 13, 14, 15],
    ],
    [
      [16, 17, 18, 19, 20],
      [21, 22, 23, 24, 25],
      [26, 27, 28, 29, 30],
    ]
  ]
)
array_3d

In [None]:
# TODO: create a 2x2 2D array whose consecutive elements are 4, 3, 2, 1

### Using zeros / ones / full



In [None]:
shape = (2, 3)

np.zeros(shape)

In [None]:
np.ones(shape)

In [None]:
fill_value = 42
np.full(shape, fill_value)

In [None]:
# TODO: create a 1x3 array filled with the number 3

### Using zeros_like / ones_like / full_like

In [None]:
np.zeros_like(array_2d)

In [None]:
np.ones_like(array_2d)

In [None]:
fill_value = 42
np.full_like(array_2d, fill_value)

### Using arange / linspace (1D arrays only)

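These are used to fill a 1D array with evenly spaced values: `arange` takes values from the half-open interval $[start, stop)$ with a given step, while `linspace` returns a given number of points from the closed interval $[start, stop]$.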
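The difference between the two interval conventions is easy to miss. A minimal sketch using a step of 0.25, which is exactly representable in floating point so that rounding does not interfere (with non-integer steps such as 0.1, the NumPy documentation recommends preferring `linspace`):

```python
import numpy as np

# arange: half-open interval [start, stop), defined by the step size
a = np.arange(0, 1, 0.25)               # 0, 0.25, 0.5, 0.75 (stop=1 is excluded)
# linspace: closed interval [start, stop] by default, defined by the number of points
b = np.linspace(0, 1, 5)                # 0, 0.25, 0.5, 0.75, 1.0 (stop is included)
# endpoint=False makes linspace half-open, like arange
c = np.linspace(0, 1, 4, endpoint=False)

print(len(a), len(b), len(c))           # 4 5 4
```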

In [None]:
start = 0
stop  = 1
step  = 0.1

np.arange(start, stop, step)

In [None]:
np.arange(start, stop + 1e-4, step)  # nudge stop slightly so that 1.0 is included

In [None]:
n_steps = int((stop-start)/step) + 1
np.linspace(start, stop, n_steps)

In [None]:
# TODO: create a 1D array with consecutive values 1, 2, ..., 10

### Using identity / diag (2D arrays only)
Used to create identity or diagonal (with an optional offset) matrices.

In [None]:
n = 4
np.identity(n)

In [None]:
np.diag([1, 2, 3, 4])

In [None]:
offset = 1
np.diag([1, 2, 3], offset)

### Using tile & repeat
Used to create an array filled with a repeating pattern.
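The two functions repeat different things: `tile` copies the whole array as a block, while `repeat` copies each element in place. A short comparison sketch (variable names are illustrative):

```python
import numpy as np

pattern = np.array([0, 1, 2])

# tile repeats the whole array as a block
tiled = np.tile(pattern, 3)        # [0 1 2 0 1 2 0 1 2]
# repeat repeats each element in place
repeated = np.repeat(pattern, 3)   # [0 0 0 1 1 1 2 2 2]

# tile with a tuple builds a 2D grid of copies: 2 rows, 1 copy per row
grid = np.tile(pattern, (2, 1))    # shape (2, 3)
print(tiled, repeated, grid.shape)
```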

In [None]:
array = np.array([0,1,2])

In [None]:
np.tile(array, 2)

In [None]:
np.tile(array, (2, 1))

In [None]:
np.repeat(array, 2)

In [None]:
np.repeat(array, 2, axis=0)

In [None]:
np.repeat(array.reshape(1, -1), 2, axis=0)

In [None]:
# TODO: create the array array([1, 1, 1, 2, 2, 2, 3, 3, 3])

In [None]:
# TODO: create the array array([(1, 2, 3)^10]) (i.e. (1, 2, 3) repeated 10 times)

## NumPy dtypes

Full list of NumPy dtypes is available here: https://numpy.org/doc/stable/user/basics.types.html

Among them are 5 basic numerical types, representing booleans (`bool`), integers (`int`), unsigned integers (`uint`), floating-point numbers (`float`) and complex numbers (`complex`).

Data types can be passed as the `dtype` keyword argument that many NumPy functions and methods accept, in particular array-creation routines.
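A minimal sketch of the `dtype` keyword in other creation routines, plus one detail of casting worth knowing: converting floats to integers with `astype` truncates toward zero instead of rounding. The arrays here are arbitrary examples:

```python
import numpy as np

# The dtype keyword is accepted by array-creation routines such as zeros and arange
small = np.zeros(3, dtype=np.uint8)
steps = np.arange(5, dtype=float)

print(small.dtype, small.itemsize)  # uint8 1  (one byte per element)
print(steps.dtype)                  # float64

# Casting float -> int with astype truncates toward zero; it does not round
truncated = np.array([1.7, -1.7]).astype(int)
print(truncated)                    # [ 1 -1]
```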

See examples below.

In [None]:
np.array([1, 2, 3, 4, 5], dtype=int)

In [None]:
np.array([1, 2, 3, 4, 5], dtype=float)

Types can be cast with the `astype` method:

In [None]:
float_array = np.array([1, 2, 3, 4, 5], dtype=float)
int_array   = float_array.astype(int)  # cast float -> int

float_array, float_array.dtype, int_array, int_array.dtype

## Mathematical operations
As stated before

> NumPy arrays facilitate advanced mathematical and other types of operations on large numbers of data. Typically, such operations are executed more efficiently and with less code than is possible using Python’s built-in sequences.

Let us observe some examples.



In [None]:
x = np.arange(0, 3, 1)
y = np.arange(3, 6, 1)

x, y

#### Elementwise

Basic mathematical operations are applied to arrays elementwise.
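This is a key difference from Python's built-in lists, where the same operators mean something else. A short comparison sketch:

```python
import numpy as np

py_list = [1, 2, 3]
arr = np.array([1, 2, 3])

# For Python lists, * means repetition and + means concatenation
print(py_list * 2)        # [1, 2, 3, 1, 2, 3]
print(py_list + py_list)  # [1, 2, 3, 1, 2, 3]

# For NumPy arrays, the same operators act on each element
print(arr * 2)            # [2 4 6]
print(arr + arr)          # [2 4 6]
```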

### **Multiplying a vector by a number**

Multiplying a vector by a number means multiplying each of its components by that number. For example, if we multiply the vector `a` by `3`, the result is:

$$ d = 3 \cdot a = [3 \cdot a_1, 3 \cdot a_2] $$

Example in `numpy`:


In [None]:
2 * x

In [None]:
x * 2

In [None]:
np.sin(x)

In [None]:
x**2

The same applies to binary operators (taking two arguments) when used with arrays of the same shape.

In [None]:
x + y

In [None]:
x * y

In [None]:
x / y

In [None]:
x // y

In [None]:
y ** x

In [None]:
a = np.array([1, 2])
b = np.array([3, 1])

c = a + b
print("Result of the addition a + b:", c)

In [None]:
plot_vector_sum(a, b)  # Visualize the sum of vectors a and b

## (Additional) Saving / loading data

### Using savetxt / loadtxt

Used to store a single 1D/2D array in a human-readable text format. A file created with `savetxt` can easily be opened with a text editor.

In [None]:
data = np.array(
  [
    [1, 2],
    [3, 4],
  ]
)

np.savetxt('data.txt', data)
!cat data.txt

It can also be loaded back seamlessly with `loadtxt`.


In [None]:
np.loadtxt('data.txt')

The number format can be specified with the `fmt` parameter; see https://numpy.org/doc/stable/reference/generated/numpy.savetxt.html for reference.

In [None]:
np.savetxt('data.txt', data, fmt='%.2f')
!cat data.txt

In [None]:
np.savetxt('data.txt', data, fmt='%d')
!cat data.txt

Note, however, that the dtype is not preserved: `loadtxt` returns floats unless a `dtype` is passed explicitly.

In [None]:
np.loadtxt('data.txt')

In [None]:
np.loadtxt('data.txt', dtype=int)

The `savetxt` function can also save an array in CSV format; we only need to specify the proper delimiter

In [None]:
np.savetxt('data.csv', data, delimiter=',', fmt='%.2f')
!cat data.csv

and load it back in the same way

In [None]:
np.loadtxt('data.csv', delimiter=',')

### Using save / load
Used to store an array in NumPy's binary `.npy` format. Unfortunately, the resulting file is not human-readable. On the other hand, these functions handle arrays of arbitrary shape and preserve the underlying dtype.
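A minimal round-trip sketch, which also shows `np.savez` for bundling several named arrays into one `.npz` archive. It writes into a temporary directory (Python's `tempfile`) so no files are left behind; the file names and the keys `first`/`second` are arbitrary choices for illustration:

```python
import os
import tempfile
import numpy as np

original = np.arange(12, dtype=np.int16).reshape(2, 3, 2)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.npy")
    np.save(path, original)          # the .npy header stores dtype and shape
    restored = np.load(path)

    # np.savez bundles several named arrays into a single .npz archive
    archive_path = os.path.join(tmp, "bundle.npz")
    np.savez(archive_path, first=original, second=np.ones(3))
    with np.load(archive_path) as archive:
        names = sorted(archive.files)
        second = archive["second"]   # loaded into memory before the file is closed

print(restored.dtype, restored.shape)  # int16 (2, 3, 2)
print(names)                           # ['first', 'second']
```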

In [None]:
data = np.array(
  [
    [
      [1, 2],
      [3, 4],
    ]
  ]
)

np.save('data.npy', data)

!cat data.npy

In [None]:
np.load('data.npy')

In [None]:
np.load('data.npy').dtype == data.dtype

## Shape manipulation
It is possible to change the structure (shape) of the data in an array. See the examples below.

#### Reshape

In [None]:
data = np.arange(0, 12, 1, dtype=int)
data

In [None]:
data.reshape(3, 4)

In [None]:
data.reshape(4, 3)

In [None]:
data_reshaped = data.reshape(2, 3, 2)
data_reshaped

In [None]:
shape = data_reshaped.shape
shape

In [None]:
# TODO: create the array a = array([3, 3, 3, 3]), then reshape it into a = array([[3, 3], [3, 3]])

#### Inferring **one** dimension in `reshape` (with `-1`)
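Passing `-1` asks NumPy to compute that dimension from the total number of elements. A minimal sketch of when this works and when it raises an error:

```python
import numpy as np

data = np.arange(12)

# -1 tells reshape to infer that dimension: 12 elements / 3 rows = 4 columns
inferred = data.reshape(3, -1)
print(inferred.shape)  # (3, 4)

# The inferred size must be a whole number, otherwise reshape fails
try:
    data.reshape(5, -1)
except ValueError as err:
    print("ValueError:", err)

# Only one dimension can be inferred at a time
try:
    data.reshape(-1, -1)
except ValueError as err:
    print("ValueError:", err)
```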

In [None]:
data.reshape(6, -1)

In [None]:
data.reshape(-1, 6)

In [None]:
data.reshape(3, 2, -1)

#### Flatten

In [None]:
data_reshaped.flatten()

#### Adding / deleting dummy dimensions
Facilitates vectorized arithmetic in conjunction with the broadcasting mechanism (explained later).
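Besides `np.expand_dims`, a dummy dimension can be added by indexing with `np.newaxis` (an alias for `None`), and `squeeze` can be restricted to a single axis. A minimal sketch:

```python
import numpy as np

data = np.arange(4)  # shape (4,)

# Indexing with np.newaxis inserts a dimension of size 1
row = data[np.newaxis, :]   # shape (1, 4), same as np.expand_dims(data, 0)
col = data[:, np.newaxis]   # shape (4, 1), same as np.expand_dims(data, 1)

# squeeze can be limited to one axis; that axis must have size 1
back = col.squeeze(axis=1)  # shape (4,)

print(row.shape, col.shape, back.shape)  # (1, 4) (4, 1) (4,)
```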

In [None]:
data

In [None]:
data.shape

In [None]:
data_expanded1 = np.expand_dims(data, 0)
data_expanded1

In [None]:
data_expanded1.shape

In [None]:
data_expanded2 = np.expand_dims(data, 1)
data_expanded2

In [None]:
data_expanded2.shape

In [None]:
data_squeezed = data_expanded2.squeeze()
data_squeezed

In [None]:
data_squeezed.shape

#### Transposing

For 2D arrays it behaves as expected (performs the matrix transpose).

In [None]:
data = np.arange(0, 4, 1).reshape(2, 2)
data

In [None]:
data.transpose()

For higher-dimensional arrays, it is best understood by observing how the shape changes.

In [None]:
shape = (1, 2, 3, 4)
data = np.ones(shape)
data.shape

The default behavior is to reverse the order of the dimensions.

In [None]:
data.transpose().shape

However, any other permutation of the dimensions can be specified.

In [None]:
dimensions_permutation = [1, 0, 3, 2]
data.transpose(dimensions_permutation).shape

`.T` is shorthand for `.transpose()` (with no arguments).

In [None]:
data.T.shape

## Concatenation & stacking
The following methods are used to merge multiple arrays (with compatible shapes) into one array.

### concatenate
Used to join a sequence of arrays along an **existing** axis.

In [None]:

import numpy as np

# TODO: create the arrays array_1 = array([1, 1, 1, 1]), array_2 = array([2, 2, 2, 2])
array_1 = ...
array_2 = ...

array_1, array_2

In [None]:
np.concatenate([array_1, array_2, array_1])

In [None]:
# TODO: change these arrays to: array_1 = array([[1, 1, 1, 1]]), array_2 = array([[2, 2, 2, 2]])
array_1 = ...
array_2 = ...

array_1, array_2

The default axis to join along is 0

In [None]:
np.concatenate([array_1, array_2, array_1])

But it can be changed

In [None]:
np.concatenate([array_1, array_2, array_1], axis=1)

### stack

Used to stack a sequence of arrays along a **new** axis.
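The difference from `concatenate` is visible in the output shapes. A minimal sketch showing that `stack` behaves like adding a dummy axis to each input and then concatenating along it (array names are illustrative):

```python
import numpy as np

a = np.full(4, 1)
b = np.full(4, 2)

concatenated = np.concatenate([a, b])   # joins along existing axis 0 -> shape (8,)
stacked = np.stack([a, b])              # adds a new axis 0 -> shape (2, 4)

# stack == add a dummy axis to every input, then concatenate along it
manual = np.concatenate([a[np.newaxis], b[np.newaxis]])

print(concatenated.shape, stacked.shape)  # (8,) (2, 4)
print(np.array_equal(stacked, manual))    # True
```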

In [None]:
shape = (4,)
array_1 = np.full(shape, 1)
array_2 = np.full(shape, 2)

array_1, array_2

In [None]:
np.stack([array_1, array_2, array_1])

In [None]:
np.stack([array_1, array_2, array_1], axis=1)

#### Broadcasting

When arrays have different shapes the broadcasting mechanism comes into play.

The term broadcasting describes how NumPy treats arrays with different shapes during arithmetic operations. Subject to certain constraints, the smaller array is “broadcast” across the larger array so that they have compatible shapes. Broadcasting provides a means of vectorizing array operations so that looping occurs in C instead of Python. It does this without making needless copies of data and usually leads to efficient algorithm implementations.

When operating on two arrays, NumPy compares their shapes element-wise. It starts with the trailing (i.e. rightmost) dimensions and works its way left. Two dimensions are compatible when

- they are equal, or
- one of them is 1

If these conditions are not met, a `ValueError: operands could not be broadcast together` exception is thrown, indicating that the arrays have incompatible shapes. The size of the resulting array is the size that is not 1 along each axis of the inputs.

from: https://numpy.org/devdocs/user/basics.broadcasting.html
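The compatibility rule quoted above can be written out directly. A minimal sketch: `broadcast_shape` is a hypothetical helper written for illustration, and the cross-check uses `np.broadcast_shapes`, available in NumPy 1.20 and later. The shorter shape is first padded with 1s on the left, which matches how NumPy treats missing leading dimensions:

```python
import numpy as np

def broadcast_shape(shape_a, shape_b):
    """Hypothetical helper: result shape of broadcasting two shapes."""
    # Left-pad the shorter shape with 1s so both have the same number of dimensions
    ndim = max(len(shape_a), len(shape_b))
    a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)

    result = []
    for dim_a, dim_b in zip(a, b):  # compare dimension by dimension
        if dim_a == dim_b or dim_b == 1:
            result.append(dim_a)
        elif dim_a == 1:
            result.append(dim_b)
        else:
            raise ValueError(f"incompatible dimensions {dim_a} and {dim_b}")
    return tuple(result)

# Cross-check on the shapes used in the cells below
for x_shape, y_shape in [((2, 4), (2, 1)), ((2, 4), (3, 1, 1)), ((1, 4), (3, 2, 1))]:
    assert broadcast_shape(x_shape, y_shape) == np.broadcast_shapes(x_shape, y_shape)
    print(x_shape, "+", y_shape, "->", broadcast_shape(x_shape, y_shape))
```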

In [None]:
x_shape = (2, 4)
y_shape = (2, 1)

x = np.ones(x_shape)
y = np.ones(y_shape)

(x + y).shape

In [None]:
# Try to guess before running this cell
(x + y)

In [None]:
x_shape =    (2, 4)
y_shape = (3, 1, 1)

x = np.ones(x_shape)
y = np.ones(y_shape)

(x + y).shape

In [None]:
# Try to guess before running this cell
(x + y)

In [None]:
x_shape = (   1, 4)
y_shape = (3, 2, 1)

x = np.ones(x_shape)
y = np.ones(y_shape)

(x + y).shape

In [None]:
# Try to guess before running this cell
(x + y)

### Matrix / tensor operations
The NumPy package wouldn't be complete without ways to manipulate arrays beyond elementwise operations, such as the most common linear algebra functionality. See the examples below.



## **Multiplying a matrix by a matrix**

Multiplying two matrices consists of:
1. multiplying each column of the second matrix by the first matrix (as a matrix-vector product), and
2. writing the results as the columns of the new matrix.

If we have two matrices:

$$
A = \begin{pmatrix}
a_{11} & a_{12} \\
a_{21} & a_{22} \\
\end{pmatrix}
,
B = \begin{pmatrix}
b_{11} & b_{12} \\
b_{21} & b_{22} \\
\end{pmatrix}
$$

then their product, denoted $C = A \cdot B$, is computed as

$$
C = \begin{pmatrix}
a_{11} \cdot b_{11} + a_{12} \cdot b_{21} & a_{11} \cdot b_{12} + a_{12} \cdot b_{22} \\
a_{21} \cdot b_{11} + a_{22} \cdot b_{21} & a_{21} \cdot b_{12} + a_{22} \cdot b_{22} \\
\end{pmatrix}
$$

Larger matrices are multiplied in the same way: the element at position $(i, j)$ of the resulting matrix is the dot product of the $i$-th row of the first matrix and the $j$-th column of the second matrix.

Remember, however, that only some pairs of matrices can be multiplied: the number of columns of the first matrix must equal the number of rows of the second!
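The row-times-column rule can be spelled out as three nested loops. This is a teaching sketch only (`matmul_naive` is a hypothetical name); it is far slower than NumPy's built-in `@`, which should be used in practice:

```python
import numpy as np

def matmul_naive(A, B):
    """Illustrative triple loop: C[i, j] = dot product of row i of A and column j of B."""
    n, k = A.shape
    k2, m = B.shape
    if k != k2:  # columns of A must equal rows of B
        raise ValueError(f"cannot multiply {A.shape} by {B.shape}")
    C = np.zeros((n, m))
    for i in range(n):
        for j in range(m):
            for t in range(k):
                C[i, j] += A[i, t] * B[t, j]
    return C

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
C = matmul_naive(A, B)
print(C)                          # [[19. 22.]
                                  #  [43. 50.]]
print(np.array_equal(C, A @ B))   # True
```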


#### Matrix multiplication

It can be performed with the `matmul` function

In [None]:
a = np.eye(3)
a[2,2] = 2
b = np.arange(0, 3, 1).reshape(3, 1)

a

In [None]:
b

In [None]:
np.matmul(a, b)

or with the `@` operator

In [None]:
a @ b

Be mindful of its varying behavior for different input shapes:

- If both arguments are 2-D they are multiplied like conventional matrices.
- If either argument is N-D, N > 2, it is treated as a stack of matrices residing in the last two indexes and broadcast accordingly.
- If the first argument is 1-D, it is promoted to a matrix by prepending a 1 to its dimensions. After matrix multiplication the prepended 1 is removed.
- If the second argument is 1-D, it is promoted to a matrix by appending a 1 to its dimensions. After matrix multiplication the appended 1 is removed.

from: https://numpy.org/doc/stable/reference/generated/numpy.matmul.html
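A minimal sketch that checks the listed rules by looking only at output shapes (arrays of ones are used since only shapes matter here):

```python
import numpy as np

M = np.ones((2, 3))
v = np.ones(3)

# 1-D second operand: treated as a (3, 1) column, then the appended 1 is removed
print((M @ v).shape)                    # (2,)
# 1-D first operand: treated as a (1, 2) row, then the prepended 1 is removed
print((np.ones(2) @ M).shape)           # (3,)

# N-D operand: a stack of 5 matrices of shape (2, 3); the (3, 4) matrix is broadcast
stack = np.ones((5, 2, 3))
print((stack @ np.ones((3, 4))).shape)  # (5, 2, 4)
```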

#### Dot product

It can be computed with the `dot` function.

In [None]:
x = np.arange(0, 3, 1)
y = np.arange(3, 6, 1)

x, y

In [None]:
# TODO: what is the result of this cell?
x * y

In [None]:
# TODO: and of this one?
np.dot(x, y)

#### (Additional) Einstein summation convention
The Einstein summation convention provides a concise way of expressing multilinear operations. You can read about it here https://mathworld.wolfram.com/EinsteinSummation.html and about its NumPy implementation here https://numpy.org/doc/stable/reference/generated/numpy.einsum.html.

Take a look at the following illustrative examples.
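A few more `einsum` patterns, each checked against the equivalent NumPy call. A minimal sketch; the arrays are arbitrary examples:

```python
import numpy as np

A = np.tri(3)
B = np.arange(9.0).reshape(3, 3)
x = np.ones(3)

# 'i,i->' multiplies matching entries and sums over the repeated index i: a dot product
print(np.einsum('i,i->', x, x))                           # 3.0

# 'ij,jk->ik' sums over the shared index j: ordinary matrix multiplication
print(np.allclose(np.einsum('ij,jk->ik', A, B), A @ B))   # True

# 'ii->' sums the diagonal: the trace
print(np.einsum('ii->', B), np.trace(B))                  # 12.0 12.0
```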


In [None]:
data = np.tri(3)
data

In [None]:
np.einsum('ij->ji', data)  # transpose

In [None]:
np.einsum('ij->i', data)  # row sum

In [None]:
np.einsum('ij->j', data)  # column sum

In [None]:
np.einsum('ii->i', data)  # diagonal

In [None]:
vector = np.ones(3)
vector

In [None]:
np.einsum('ij,j->i', data, vector)  # matrix multiplication (matrix x vector)

In [None]:
np.einsum('ij,jk->ik', data, vector.reshape(3, 1))  # matrix multiplication (matrix x matrix)

## Other operations

### Aggregations: mean / var
Very often we need to compute aggregations, such as the mean or variance, along specific dimensions of an array.

In [None]:
data = np.arange(0, 20, 1).reshape(4, 5)
data

By default, the whole array is aggregated into a single value

In [None]:
data.mean()

But aggregations can also be applied along a chosen axis
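The `axis` argument names the dimension that is collapsed. A minimal sketch, which also shows `keepdims=True`: the reduced axis is kept with size 1, so the result broadcasts back against the original array (here used to center each row):

```python
import numpy as np

data = np.arange(0, 20, 1).reshape(4, 5)

# axis=0 collapses the rows: one result per column -> shape (5,)
col_means = data.mean(axis=0)
# axis=1 collapses the columns: one result per row -> shape (4,)
row_means = data.mean(axis=1)

# keepdims=True keeps shape (4, 1), so (4, 5) - (4, 1) broadcasts row by row
centered = data - data.mean(axis=1, keepdims=True)

print(col_means.shape, row_means.shape)  # (5,) (4,)
print(centered.mean(axis=1))             # [0. 0. 0. 0.]
```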

In [None]:
data.mean(axis=0)

In [None]:
data.mean(axis=1)

In [None]:
data.var(axis=0)

### Sorting
Arrays can be sorted. By default, sorting is performed along the last axis

In [None]:
data = np.array([
  [1, 4, 2, 8, 5],
  [2, 3, 1, 9, 6],
  [8, 2, 3, 7, 4],
])

np.sort(data)

but the sorting axis can be specified explicitly.

In [None]:
np.sort(data, axis=0)

In [None]:
np.sort(data, axis=1)

You can also get the "sorting permutations" that can be used in conjunction with advanced indexing (introduced below).
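A minimal sketch of what the "sorting permutation" is good for: `np.take_along_axis` applies it along an axis to reproduce `np.sort`, and the same permutation can reorder a second, aligned array (the `labels` array is an illustrative example):

```python
import numpy as np

data = np.array([
    [1, 4, 2, 8, 5],
    [2, 3, 1, 9, 6],
    [8, 2, 3, 7, 4],
])

order = np.argsort(data, axis=1)  # per-row positions that would sort each row

# Applying the permutation with take_along_axis reproduces np.sort
resorted = np.take_along_axis(data, order, axis=1)
print(np.array_equal(resorted, np.sort(data, axis=1)))  # True

# The permutation of row 0 reorders aligned labels by ascending value
labels = np.array(list("abcde"))
print(labels[order[0]])  # ['a' 'c' 'b' 'e' 'd']
```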

In [None]:
np.argsort(data, axis=0)

In [None]:
np.argsort(data, axis=1)

In [None]:
# TODO: create the array array([(1, 2)^8]) and sort it

## Indexing

Indexing allows selecting items and subarrays. It can also be used to modify fragments of an array.

### Basic indexing

In [None]:
data = np.arange(0, 16, 1)
data

In [None]:
data[10]

In [None]:
data[-2]

In [None]:
data[:5]

In [None]:
data[::2]

In [None]:
data[-3:]

In [None]:
# TODO: take the fragment array([3, 4]) using the operations introduced above

In [None]:
# TODO: do you have an idea how to reverse the array using ::?

The same works for higher-dimensional arrays.


In [None]:
data = data.reshape(4, 4)
data

In [None]:
data[1, 1]

In [None]:
data[1, :]

Trailing colons `:` do not need to be specified.

In [None]:
data[:-2]

Note how column selection is achieved. In this case the leading `:` can't be omitted.

In [None]:
data[:, -2:]

A single group of consecutive colons `:` can be replaced with `...` (ellipsis).

In [None]:
data_reshaped = data.reshape(4, 2, 2)
data_reshaped

In [None]:
data_reshaped[..., 0]

Note the difference

In [None]:
data_reshaped[:, 0]

### View & copies

The basic slicing operations presented above return views of the data in the underlying array. When the underlying array is modified, the changes are reflected in the view.

In [None]:
data = np.arange(0, 16, 1).reshape(4, 4)
data

In [None]:
view = data[:, 3]
view

In [None]:
data *= 2
data

In [None]:
view

To "detach" a view from the underlying array, use the `copy` method.

In [None]:
view_copy = view.copy()
data *= 2
view, view_copy

Views are only possible when selecting with slices: in that case NumPy only needs to remember an offset and a stride for each dimension of the original data buffer.
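A minimal sketch that makes this visible with `np.shares_memory` and the `strides` attribute (given in bytes). The dtype is fixed to `int64` so that the stride values are the same on every platform:

```python
import numpy as np

data = np.arange(16, dtype=np.int64).reshape(4, 4)

sliced = data[:, 3]    # basic slicing -> view into the same buffer
fancy = data[:, [3]]   # integer-array (advanced) indexing -> independent copy

print(np.shares_memory(data, sliced))  # True
print(np.shares_memory(data, fancy))   # False

# The view only stores its own strides: 32 bytes = one row of four 8-byte ints
print(data.strides, sliced.strides)    # (32, 8) (32,)

sliced[0] = -1         # writing through the view changes the original array
print(data[0, 3])      # -1
```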

You can read more about this here: https://scipy-cookbook.readthedocs.io/items/ViewsVsCopies.html

### (Advanced) Advanced indexing

It is also possible to index with integer or boolean arrays. Unlike slicing, this kind of indexing always returns a copy rather than a view.

In [None]:
data = np.arange(0, 16, 1).reshape(4, 4)
data

We can select specific rows

In [None]:
data[[0, 3]]

and columns.

In [None]:
data[:, [0, 3]]

Selecting rows and columns together is a little more complicated

In [None]:
rows = [
 [0, 0],
 [3, 3]
]
columns = [
 [0, 3],
 [0, 3]
]
data[rows, columns]

The shape of the index arrays controls the shape of the output.
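For selecting a full submatrix, `np.ix_` builds broadcastable row and column index arrays from plain 1-D lists. A minimal sketch, contrasted with passing the two lists directly, which pairs them up elementwise instead:

```python
import numpy as np

data = np.arange(0, 16, 1).reshape(4, 4)

# np.ix_ turns the 1-D lists into index arrays of shapes (2, 1) and (1, 2),
# which broadcast to select every (row, column) combination
rows_idx, cols_idx = np.ix_([0, 3], [0, 3])
print(rows_idx.shape, cols_idx.shape)  # (2, 1) (1, 2)

corners = data[np.ix_([0, 3], [0, 3])]
print(corners)                         # [[ 0  3]
                                       #  [12 15]]

# Passing the lists directly selects only the pairs (0, 0) and (3, 3)
print(data[[0, 3], [0, 3]])            # [ 0 15]
```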

In [None]:
rows = [0, 0, 3, 3]
columns = [0, 3, 0, 3]
data[rows, columns]

Boolean arrays can be used as well, though they work somewhat differently.

In [None]:
index = (data % 3 == 0)
index

In [None]:
data[index]

Its most common use case is to update array entries based on some condition.
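If the original array should stay unchanged, `np.where(condition, x, y)` is a non-mutating alternative: it picks from `x` where the condition holds and from `y` elsewhere. A minimal sketch that rebuilds `data` from scratch so it does not depend on the cells above:

```python
import numpy as np

data = np.arange(0, 16, 1).reshape(4, 4)
mask = data % 3 == 0

# Returns a new array; data itself is left untouched
replaced = np.where(mask, -1, data)

print(mask.sum())     # 6 entries are divisible by 3
print(replaced[0])    # [-1  1  2 -1]
print(data[0])        # [0 1 2 3]
```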

In [None]:
data[index] = -1
data

The detailed behavior of indexing with boolean arrays is described here: https://numpy.org/doc/stable/reference/arrays.indexing.html#advanced-indexing

# NumPy Exercises

### Exercise: multiplication table
Create the "{0, ..., 10} × {0, ..., 10} multiplication table" using only
- multiplication of 1D arrays, dummy dimensions and the broadcasting mechanism.

In [None]:
### YOUR CODE BEGINS HERE ###

### YOUR CODE ENDS HERE ###

# Matplotlib

An introduction to plotting.

Let's start by creating a line graph:

In [None]:
import matplotlib.pyplot as plt

study_hours = [2, 4, 3, 5, 1]
test_scores = [75, 88, 82, 95, 68]

# Create the line plot
plt.plot(study_hours, test_scores)

# Add labels and title
plt.xlabel("Study Hours")
plt.ylabel("Test Scores")
plt.title("Study Hours vs. Test Scores")

# Show the plot
plt.show()

Next, let's explore bar plots:

In [None]:
genres = ["Comedy", "Action", "Drama", "Sci-Fi"]
votes = [10, 8, 5, 3]

# Create the bar chart
plt.bar(genres, votes)

# Add labels and title
plt.xlabel("Movie Genres")
plt.ylabel("Number of Votes")
plt.title("Favorite Movie Genres in Class")

# Show the plot
plt.show()


In [None]:
# TODO: create a line plot whose graph is a straight line from point (1, 1) to point (10, 10)

In [None]:
# TODO: create a line plot whose graph is a straight line passing through the points (1, 1), (2, 2), ..., (10, 10)