# Lesson 2-extra: More about NumPy, Pandas

> Instructors: [Yuki Oyama](mailto:y.oyama@lrcs.ac), [Prprnya](mailto:nya@prpr.zip)
>
> The Christian F. Weichman Department of Chemistry, Lastoria Royal College of Science

This material is licensed under <a href="https://creativecommons.org/licenses/by-nc-sa/4.0/">CC BY-NC-SA 4.0</a><img src="https://mirrors.creativecommons.org/presskit/icons/cc.svg" alt="" style="max-width: 1em;max-height:1em;margin-left: .2em;"><img src="https://mirrors.creativecommons.org/presskit/icons/by.svg" alt="" style="max-width: 1em;max-height:1em;margin-left: .2em;"><img src="https://mirrors.creativecommons.org/presskit/icons/nc.svg" alt="" style="max-width: 1em;max-height:1em;margin-left: .2em;"><img src="https://mirrors.creativecommons.org/presskit/icons/sa.svg" alt="" style="max-width: 1em;max-height:1em;margin-left: .2em;">


Lesson 2 gave us a basic understanding of NumPy, but the library still has many useful features worth learning. For large and complicated datasets, however, we will want a more powerful tool: Pandas. The first half of this lesson covers more advanced features of NumPy, and the second half introduces Pandas.

In [None]:
import numpy as np

## Indexing and Masking

Our lesson begins by expanding the concept of indexing. Besides integer indexing and slicing, NumPy arrays support several other useful indexing methods.

### Arbitrary Indexing

In [None]:
mat = np.array([
    [0.0, 0.1, 0.2, 0.3, 0.4],
    [1.0, 1.1, 1.2, 1.3, 1.4],
    [2.0, 2.1, 2.2, 2.3, 2.4],
    [3.0, 3.1, 3.2, 3.3, 3.4],
])
mat

In [None]:
rows = [2, 2, 0]
cols = [0, 1, 3]

mat[rows, cols]
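
Indexing with two lists pairs them element by element, so the cell above collects `mat[2, 0]`, `mat[2, 1]`, and `mat[0, 3]`. The following is a minimal sketch of that pairing; it restates `mat` so it runs on its own, and the names `picked` and `by_hand` are only illustrative.

```python
import numpy as np

mat = np.array([
    [0.0, 0.1, 0.2, 0.3, 0.4],
    [1.0, 1.1, 1.2, 1.3, 1.4],
    [2.0, 2.1, 2.2, 2.3, 2.4],
    [3.0, 3.1, 3.2, 3.3, 3.4],
])
rows = [2, 2, 0]
cols = [0, 1, 3]

# Index arrays are matched position by position: (2, 0), (2, 1), (0, 3)
picked = mat[rows, cols]

# The same selection written as an explicit loop over the (row, col) pairs
by_hand = np.array([mat[r, c] for r, c in zip(rows, cols)])

print(picked)
print(np.array_equal(picked, by_hand))
```

The result has one element per (row, column) pair, not a 3 x 3 block.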

### Masking

In [None]:
primes = np.array([2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37])
primes

In [None]:
mask = primes % 4 == 3
mask

In [None]:
primes[mask]
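
Masks can also be combined. As a small sketch (the second condition `primes > 10` is an assumption chosen only for illustration), element-wise `&` joins two conditions, and each comparison needs its own parentheses because `&` binds more tightly than `==` and `>`.

```python
import numpy as np

primes = np.array([2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37])

# Element-wise AND of two boolean arrays
both = (primes % 4 == 3) & (primes > 10)
selected = primes[both]

# True counts as 1, so summing a mask counts the matching elements
count = int(both.sum())

print(selected)
print(count)
```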

## Array Manipulation

NumPy provides a series of functions that let us reshape and rearrange arrays, much like kneading dough.

### Reshape

In [None]:
arr1d = np.arange(12)
arr1d

In [None]:
arr1d.size, arr1d.ndim, arr1d.shape

In [None]:
arr2d = arr1d.reshape(3, 4)
arr2d

In [None]:
arr2d.size, arr2d.ndim, arr2d.shape

In [None]:
# Raises ValueError: 12 elements cannot be rearranged into a 2 x 5 array
arr1d.reshape(2, 5)

In [None]:
arr3d = arr1d.reshape(2, 3, 2)
arr3d
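
A related convenience, sketched here under the same `np.arange(12)` setup: one dimension may be given as `-1`, and NumPy infers it from the total number of elements. The names `auto_cols` and `auto_rows` are illustrative only.

```python
import numpy as np

arr1d = np.arange(12)

# -1 asks NumPy to work out the missing dimension from the element count
auto_cols = arr1d.reshape(3, -1)
auto_rows = arr1d.reshape(-1, 6)
print(auto_cols.shape, auto_rows.shape)  # (3, 4) (2, 6)

# For a contiguous array like this one, reshape returns a view of the same data
print(np.shares_memory(arr1d, auto_cols))
```

Because the reshaped array here is a view, writing to it also changes `arr1d`.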

### Flatten

In [None]:
arr3d.flatten()
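
One detail worth checking by hand: `flatten()` always returns a copy, whereas the related method `ravel()` returns a view when it can. A short sketch, rebuilding `arr3d` so the block is self-contained:

```python
import numpy as np

arr3d = np.arange(12).reshape(2, 3, 2)

flat = arr3d.flatten()
flat[0] = 99  # modifies only the copy

print(arr3d[0, 0, 0])                        # still 0
print(np.shares_memory(arr3d, flat))         # copy: no shared memory
print(np.shares_memory(arr3d, arr3d.ravel()))  # view of the same data here
```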

### Transpose

In [None]:
arr2d.transpose()

In [None]:
arr3d.transpose()
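
With no arguments, `transpose()` reverses the order of the axes. Since the shape of `arr3d`, `(2, 3, 2)`, reads the same in reverse, the effect is easier to see on a different shape. The sketch below assumes a hypothetical `(2, 3, 4)` array named `cube`, and also passes an explicit axis order.

```python
import numpy as np

# Hypothetical array, chosen so every axis has a different length
cube = np.arange(24).reshape(2, 3, 4)

reversed_axes = cube.transpose()     # axes (0, 1, 2) -> (2, 1, 0)
swapped = cube.transpose(1, 0, 2)    # swap only the first two axes

print(reversed_axes.shape, swapped.shape)  # (4, 3, 2) (3, 2, 4)

# The element at [i, j, k] moves to [k, j, i] after a full reversal
print(reversed_axes[3, 2, 1] == cube[1, 2, 3])
```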

## Vectorization

NumPy arrays naturally support batch (element-wise) operations. However, plain Python functions, such as those in the `math` module, usually accept only single values, so applying them to an array would force us back to regular loops and iterations. Fortunately, NumPy provides a feature called [**vectorization**](https://numpy.org/doc/stable/reference/generated/numpy.vectorize.html): `np.vectorize` wraps a Python function so that it accepts arrays, and it can be applied with decorator syntax. Note that `np.vectorize` is provided mainly for convenience, not performance; internally it is essentially a loop.

In [None]:
squares = np.arange(5) ** 2
squares

In [None]:
np.sqrt(squares)

In [None]:
import math
# Raises TypeError: math.sqrt accepts a single number, not an array
math.sqrt(squares)

In [None]:
@np.vectorize
def sqrt(n: int | float) -> float:
    return math.sqrt(n)

sqrt(squares)
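
To see what the wrapper does, the sketch below compares it with an explicit loop and with NumPy's own `np.sqrt`. The names `vsqrt` and `looped` are illustrative, and `np.vectorize` is called as a plain function here instead of as a decorator.

```python
import math
import numpy as np

squares = np.arange(5) ** 2

# np.vectorize can wrap an existing function directly
vsqrt = np.vectorize(math.sqrt)

# Roughly what the wrapper does for us: call the function once per element
looped = np.array([math.sqrt(n) for n in squares])

print(vsqrt(squares))
print(np.array_equal(vsqrt(squares), looped))
print(np.allclose(vsqrt(squares), np.sqrt(squares)))
```

When a native NumPy function such as `np.sqrt` exists, it is usually the better choice; the wrapper is most useful for functions that have no array-aware equivalent.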

## Broadcasting

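When an operation combines arrays with different shapes, NumPy applies a feature called [**broadcasting**](https://numpy.org/doc/stable/user/basics.broadcasting.html). The shapes are compared dimension by dimension, starting from the last one; two dimensions are compatible when they are equal or when one of them is 1. A dimension of size 1 (or a missing leading dimension) is then treated as if it were repeated, so that both operands behave like arrays of the same shape. If the shapes are not compatible, NumPy raises a `ValueError`.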


In [None]:
a = np.array([[1, 2], [3, 4]])
b = np.array([2, 5])
a, b

In [None]:
a + b

In [None]:
a = np.array([[1, 2], [3, 4]])
b = np.array([2])
a * b

In [None]:
a = np.array([[1, 2], [3, 4]])
b = np.array([[1, 1, 1],
              [2, 2, 2],
              [3, 3, 3]])
# Raises ValueError: shapes (2, 2) and (3, 3) cannot be broadcast together
a + b

In [None]:
a = np.array([[1, 2], [3, 4], [5, 6], [7, 8]])
b = np.array([8, 7])
a + b
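
The stretching step can be made visible with `np.broadcast_to`. In this sketch `a` and `b` repeat the last cell, while `col` is a hypothetical column vector added only to show broadcasting along the other axis.

```python
import numpy as np

a = np.array([[1, 2], [3, 4], [5, 6], [7, 8]])
b = np.array([8, 7])

# Shape (2,) is treated as (1, 2) and repeated along the first axis
stretched = np.broadcast_to(b, a.shape)
print(stretched)
print(np.array_equal(a + b, a + stretched))

# Hypothetical column vector: shape (4, 1) is repeated along the last axis
col = np.array([[10], [20], [30], [40]])
print(a + col)
```

`np.broadcast_to` returns a read-only view, so no data is actually copied.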

In [None]:
# TODO: More precise examples

## Linear Algebra in NumPy (`numpy.linalg`)

Since a matrix can be viewed as a two-dimensional array, it is natural for NumPy to implement various concepts from linear algebra; many of these routines live in the `numpy.linalg` module.

In [None]:
vec1 = np.array([1, 1, 1])
vec2 = np.array([-1, -1, 1])
vec1, vec2

In [None]:
norm1 = np.linalg.norm(vec1)
norm2 = np.linalg.norm(vec2)
norm1, norm2

In [None]:
np.vdot(vec1, vec2)

In [None]:
np.cross(vec1, vec2)
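
As a quick sanity check on these operations, the sketch below confirms that the cross product is perpendicular to both inputs and that the norm equals the square root of a vector's dot product with itself. The name `normal` is illustrative.

```python
import numpy as np

vec1 = np.array([1, 1, 1])
vec2 = np.array([-1, -1, 1])

normal = np.cross(vec1, vec2)
print(normal)

# A cross product is perpendicular to both inputs, so both dot products vanish
print(np.vdot(normal, vec1), np.vdot(normal, vec2))

# The Euclidean norm is the square root of the vector's dot product with itself
print(np.isclose(np.linalg.norm(vec1), np.sqrt(np.vdot(vec1, vec1))))
```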

In [None]:
mat1 = np.array([
    [1, 0, 0],
    [0, 0, -1],
    [0, 1, 0],
])
mat2 = np.array([
    [0, -1, 0],
    [1, 0, 0],
    [0, 0, 1],
])
mat = mat1 @ mat2
mat

In [None]:
x = np.array([1, 0, 0])
y = np.array([0, 1, 0])
z = np.array([0, 0, 1])
mat @ x, mat @ y, mat @ z
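
Multiplying a matrix by a basis vector picks out the matching column, which is what the cell above shows. Both `mat1` and `mat2` can also be read as 90-degree rotations (about the x-axis and the z-axis, respectively), so their product should be orthogonal, and the order of multiplication should matter. A minimal sketch of these checks:

```python
import numpy as np

mat1 = np.array([
    [1, 0, 0],
    [0, 0, -1],
    [0, 1, 0],
])
mat2 = np.array([
    [0, -1, 0],
    [1, 0, 0],
    [0, 0, 1],
])
mat = mat1 @ mat2

# mat @ x with x = (1, 0, 0) returns the first column of mat
x = np.array([1, 0, 0])
print(np.array_equal(mat @ x, mat[:, 0]))

# Orthogonal matrix: its transpose is its inverse
print(np.array_equal(mat.T @ mat, np.eye(3)))

# Matrix multiplication does not commute in general
print(np.array_equal(mat1 @ mat2, mat2 @ mat1))
```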

In [None]:
# TODO: Trace, determinant, eigenvalues and eigenvectors
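
One possible way to explore the quantities named in the TODO above, offered only as a sketch and not as the planned lesson content. It reuses the product matrix from earlier (written out explicitly so the block runs alone) and relies on `np.trace`, `np.linalg.det`, and `np.linalg.eig`.

```python
import numpy as np

# The product mat1 @ mat2 from the previous cells, written out explicitly
mat = np.array([
    [0, -1, 0],
    [0, 0, -1],
    [1, 0, 0],
])

trace = np.trace(mat)          # sum of the diagonal entries
det = np.linalg.det(mat)       # returned as a float
values, vectors = np.linalg.eig(mat)  # eigenvectors are the columns of `vectors`

print(trace, det)
print(values)

# Each column v of `vectors` satisfies mat @ v = lambda * v
print(np.allclose(mat @ vectors, vectors * values))
```

The eigenvalues of this matrix are complex, so `values` and `vectors` come back with a complex dtype.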

## Pandas

- `Series` and `DataFrame`
- loading and saving files
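
As a first taste of the two topics listed above, here is a minimal sketch. The element data and column names are illustrative choices, not part of the lesson, and an in-memory `io.StringIO` buffer stands in for a real file so that nothing is written to disk.

```python
import io
import pandas as pd

# A Series is a one-dimensional labelled array
masses = pd.Series([1.008, 4.0026, 6.94], index=["H", "He", "Li"], name="atomic_mass")
print(masses["He"])

# A DataFrame is a table whose columns are Series sharing one index
df = pd.DataFrame({
    "symbol": ["H", "He", "Li"],
    "atomic_number": [1, 2, 3],
    "atomic_mass": [1.008, 4.0026, 6.94],
})

# Save to CSV text, then load it back; a file path works the same way
buffer = io.StringIO()
df.to_csv(buffer, index=False)
buffer.seek(0)
loaded = pd.read_csv(buffer)
print(loaded)
```

Passing `index=False` keeps the row labels out of the CSV, so the loaded table has the same three columns as the original.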