## Inspecting Arrays

```python
print("Shape:", arr.shape)
print("Data type:", arr.dtype)
```

---

# 3. Indexing, Slicing, and Modifying

## Indexing

```python
# Single element
print(arr[0])  # First element
```

## Slicing

```python
# Slice elements 1 to 3
print(arr[1:4])
```

## Modifying Elements

```python
# Change first element
arr[0] = 100
print(arr)
```
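
One detail worth keeping in mind when modifying elements: a basic slice of a NumPy array is a view onto the same memory, not a copy. The sketch below uses a made-up array `base` as a stand-in for `arr` (an assumption, since `arr` is defined elsewhere) to show the difference between a view and an explicit copy.

```python
import numpy as np

# Hypothetical stand-in for `arr`
base = np.array([10, 20, 30, 40, 50])

# A basic slice shares memory with the original array
view = base[1:4]
view[0] = -1                      # this also changes base[1]

# An explicit copy owns its own data
independent = base[1:4].copy()
independent[1] = 999              # base is left untouched

print(base)
print(np.shares_memory(base, view), np.shares_memory(base, independent))
```

If you need to modify a slice without touching the original, calling `.copy()` first is the usual approach.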

---

# 4. Array Operations

NumPy supports element-wise operations.

## Basic Arithmetic

```python
# Create two arrays
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# Element-wise addition
print(a + b)

# Element-wise multiplication
print(a * b)

# Scalar multiplication
print(2 * a)
```

## Mathematical Functions

```python
print(np.sin(a))
print(np.log(b))
```

---

# 5. Useful Functions

## Aggregations

```python
# Sum
print(np.sum(arr))

# Mean
print(np.mean(arr))

# Max
print(np.max(arr))
```

## Reshaping Arrays

```python
# Reshape a 1D array to 2D
arr2d = np.arange(12).reshape((3, 4))
print(arr2d)
```
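
As a small complement to the reshape above, one dimension can be given as `-1` and NumPy infers it from the total number of elements. This is a minimal sketch; the variable names `flat` and `grid` are illustrative only.

```python
import numpy as np

flat = np.arange(12)

# Let NumPy infer the number of columns: 12 elements / 3 rows = 4 columns
grid = flat.reshape(3, -1)
print(grid.shape)

# The total size must be compatible, otherwise a ValueError is raised
try:
    flat.reshape(5, -1)
    reshape_failed = False
except ValueError as err:
    reshape_failed = True
    print("Cannot reshape:", err)
```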

---

# 6. Mini-Exercises

### 6.1 Create an array of 10 random numbers and compute their mean

```python
# Your code here
random_array = np.random.rand(10)
print(random_array)
print("Mean:", np.mean(random_array))
```

### 6.2 Create a 5x5 array of zeros and change the center element to 1

```python
# Your code here
matrix = np.zeros((5, 5))
matrix[2, 2] = 1
print(matrix)
```

### 6.3 Create two 1D arrays and compute their dot product

```python
# Your code here
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

print("Dot product:", np.dot(a, b))
```

---

# Congratulations! 🎉

You've learned the basics of **NumPy**! 

Next, we move on to **Pandas** to learn how to handle data tables efficiently.

---

# Quick Recap
- **NumPy arrays** are faster and more memory-efficient than Python lists.
- Use **element-wise operations** for speed.
- Useful functions: `sum()`, `mean()`, `reshape()`, etc.

See you in the next notebook!


# NumPy

NumPy stands for Numerical Python. It is the main mathematical workhorse for Python.

If you want to learn more about it, visit [https://numpy.org/](https://numpy.org/).

In [2]:
# Import numpy
import numpy as np

## Creating Arrays
# From a Python list

In [3]:
# Create a 1-dimensional array (i.e. a vector)
arr_1d = np.array([1, 2, 3, 4, 5])
print(arr_1d)
print(type(arr_1d))

[1 2 3 4 5]
<class 'numpy.ndarray'>


In [12]:
# Create a 2-dimensional array (i.e. a matrix)
arr_2d = np.array([[1, 2, 3], [4, 5, 6]])
print(arr_2d)
print(type(arr_2d))

[[1 2 3]
 [4 5 6]]
<class 'numpy.ndarray'>


In [13]:
# Unlike lists, NumPy arrays apply the basic arithmetic operators element-wise
print(arr_1d * 2)
#
print("-" * 20)
#
print(arr_2d + 5.8)

[ 2  4  6  8 10]
--------------------
[[ 6.8  7.8  8.8]
 [ 9.8 10.8 11.8]]


In [14]:
# You can inspect an array's shape and data type through its attributes
print("Shape:", arr_2d.shape)
print("Data type:", arr_1d.dtype)

Shape: (2, 3)
Data type: int64


In [82]:
list_arr_1d = [1, 2, 3, 4, 5]
#
print(list_arr_1d)
print(arr_1d)
#
print("*" * 20)
#
print(list_arr_1d.__add__)
print(arr_1d.__add__)

[1, 2, 3, 4, 5]
[1 2 3 4 5]
********************
<method-wrapper '__add__' of list object at 0x000001A8DFBF8940>
<method-wrapper '__add__' of numpy.ndarray object at 0x000001A8C7366970>
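
Both objects define `__add__`, but the two implementations do different things. The following sketch, using illustrative names, contrasts how the same operators behave on a list and on an array.

```python
import numpy as np

values = [1, 2, 3]

# On a list, * repeats the sequence and + concatenates
list_times_two = values * 2
list_plus_list = values + values

# On an array, the same operators work element by element
array_times_two = np.array(values) * 2
array_plus_array = np.array(values) + np.array(values)

# Adding a float to a list is not defined at all
list_add_failed = False
try:
    values + 5.8
except TypeError:
    list_add_failed = True

print(list_times_two, array_times_two)
```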


# Slicing and accessing elements

Indexing and slicing work as with conventional Python lists, with one index or slice per axis, separated by commas.

In [98]:
# The most general form
# Select everything
print(arr_2d[:, :])

[[1 2 3]
 [4 5 6]]


In [101]:
# Select the first row (all columns)
print(arr_2d[0, :])

#
print("*" * 20)

# The trailing : can be omitted when selecting rows
print(arr_2d[0,])

[1 2 3]
********************
[1 2 3]


In [106]:
# Select rows with a list of indices, then with slices
print(arr_2d[[0, 1], :])
#
print("*" * 20)
#
print(arr_2d[0:1, :])
#
print("*" * 20)
#
print(arr_2d[0:2, :])

[[1 2 3]
 [4 5 6]]
********************
[[1 2 3]]
********************
[[1 2 3]
 [4 5 6]]


In [108]:
# Select columns 0 and 1 (all rows)
print(arr_2d[:, [0, 1]])


[[1 2]
 [4 5]]


In [109]:
print(arr_2d[:,0])
#
print("-" * 20)
#
print(arr_2d[1])

[1 4]
--------------------
[4 5 6]
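
The outputs above differ subtly in their number of dimensions: an integer index drops the axis, while a slice or a list of indices keeps it. A minimal sketch with a made-up matrix `m`:

```python
import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6]])

row_1d = m[0, :]       # integer index: the row axis is dropped
row_2d = m[0:1, :]     # slice: the row axis is kept with length 1
col_2d = m[:, [0]]     # list of indices: the column axis is kept

print(row_1d.shape, row_2d.shape, col_2d.shape)
```

This matters later for matrix products, where a `(3,)` vector and a `(1, 3)` matrix do not behave the same way.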


# Important methods

In [31]:
print(arr_2d)
#
print("*" * 20)
#
print(arr_2d.flatten())

[[1 2 3]
 [4 5 6]]
********************
[1 2 3 4 5 6]


In [70]:
# Getting particular values
print("The mean value:", np.mean(a=arr_2d))
print("The minimum value:", np.min(a=arr_2d))
print("The maximum value:", np.max(a=arr_2d))
print("The standard deviation:", np.std(a=arr_2d))

The mean value: 3.5
The minimum value: 1
The maximum value: 6
The standard deviation: 1.707825127659933
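
Note that `np.std` divides by `n` by default (population standard deviation). If you want the sample standard deviation, which divides by `n - 1`, the `ddof` parameter can be set to 1. A short sketch on the same values as `arr_2d`:

```python
import numpy as np

data = np.array([[1, 2, 3],
                 [4, 5, 6]])

population_std = np.std(data)          # ddof=0: divide by n
sample_std = np.std(data, ddof=1)      # ddof=1: divide by n - 1

print("Population std:", population_std)
print("Sample std:", sample_std)
```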


# Special matrices

In [None]:
## Special arrays can be built for linear algebra

# Zeros
zeros = np.zeros((3, 3))
print(zeros)

# Ones
ones = np.ones((2, 4))
print(ones)

# Identity matrix
print(np.eye(N=3))

# Random numbers
randoms = np.random.randn(2, 4)
print(randoms)

[[0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]]
[[1. 1. 1. 1.]
 [1. 1. 1. 1.]]
[[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]
[[-0.98278376 -1.75741086 -0.12597606  0.4714257 ]
 [-0.14363501 -1.75270714  1.65420769 -0.77414674]]


In [20]:
# Create two matrices
x1 = np.random.randn(2, 4)
x2 = np.random.randn(4, 3)

# See what they look like
print(x1)
print(x2)

[[-1.04010323 -0.74950739 -0.68388976  1.28730243]
 [ 0.53775621 -0.22328257  0.27027873 -0.25773923]]
[[-1.37674233  0.9886126  -1.56275261]
 [ 0.02439585 -0.90002453 -0.01067206]
 [-0.20918737  0.82149002  0.40323609]
 [ 0.40419462  2.02269558  0.8242591 ]]


In [15]:
# Compute their matrix product
x1.dot(x2)


In [None]:
# Compute their matrix product using the @ operator
x1 @ x2

array([[ 0.03208695,  0.10824856, -0.08137401],
       [-1.40884911, -2.15640244,  1.50336391]])
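
The matrix product only works when the inner dimensions agree: a `(2, 4)` matrix times a `(4, 3)` matrix gives a `(2, 3)` result, while the reverse order fails. The sketch below illustrates this with freshly generated matrices (the names `left` and `right` and the seed are arbitrary choices).

```python
import numpy as np

rng = np.random.default_rng(0)
left = rng.standard_normal((2, 4))
right = rng.standard_normal((4, 3))

# (2, 4) @ (4, 3) -> (2, 3); .dot and @ agree for 2D arrays
product = left @ right
same_as_dot = np.allclose(product, left.dot(right))

# (4, 3) @ (2, 4): inner dimensions 3 and 2 differ, so this raises
try:
    right @ left
    mismatch = False
except ValueError as err:
    mismatch = True
    print("Shapes do not align:", err)

print(product.shape)
```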

In [30]:
# You can inspect an array's shape and data type through its attributes
print("Shape:", x1.shape)
print("Data type:", x1.dtype)
x1.diagonal()

Shape: (2, 4)
Data type: float64


array([ 0.10587821, -1.55012716])

# Creating and reading your own

In [22]:
# You can build your own arrays by passing nested lists to np.array
ex_array = np.array([[1, 2, 3],
                     [4, 5, 6],
                     [7, 8, 9]])

# See what it looks like
print(ex_array)

[[1 2 3]
 [4 5 6]
 [7 8 9]]


In [23]:
# np.genfromtxt also reads delimited formats other than .txt, such as .csv
tricho = np.genfromtxt(fname="data/trichoptera.csv", skip_header=1, delimiter=",")

# See what it looks like
print(tricho)
print(tricho.shape)

[[0. 0. 1. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 1. ... 0. 0. 0.]
 ...
 [0. 0. 1. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]]
(220, 56)
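
If you do not have the data file at hand, the same call can be tried on an in-memory text buffer. The sketch below uses a tiny invented CSV (the column names and values are made up for illustration) and shows that an empty field is read as `nan`.

```python
import io
import numpy as np

# A tiny, made-up CSV with a header row and one missing value
csv_text = "site,sp_a,sp_b\n1,0,3\n2,,1\n"

table = np.genfromtxt(io.StringIO(csv_text), delimiter=",", skip_header=1)

print(table)
print(table.shape)
```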


# Summing along axes

In [27]:
# Use the axis parameter to define which axis you want to add along

# Total sum
print("Total sum of the trichoptera dataset:")
print(tricho.sum(axis=None))

# Column sums (axis=0 sums over the rows)
print("\nColumn sums of the trichoptera dataset:")
print(tricho.sum(axis=0))

# Row sums (axis=1 sums over the columns)
print("\nRow sums of the trichoptera dataset:")
print(tricho.sum(axis=1))

Total sum of the trichoptera dataset:
1651.0

Column sums of the trichoptera dataset:
[ 95.  59. 127. 148.  64. 121.  73.  60.  54.  81.  98.  65.  52.  62.
  37.  71.  34.  57.  39.  28.  16.  26.  37.  26.   8.  11.   4.   7.
   9.   9.  13.   8.   5.   3.   5.   6.   1.   3.   3.   3.   1.   5.
   2.   2.   2.   1.   1.   1.   1.   1.   1.   1.   1.   1.   1.   1.]

Row sums of the trichoptera dataset:
[ 1.  1.  2.  2.  1.  1.  2.  1.  3.  2.  2.  1.  1.  3.  2.  2.  2.  1.
  0.  0.  1.  0.  7.  6. 12.  9. 11.  8.  7.  6.  9.  4.  7.  7.  8. 17.
  8. 11.  9.  5.  1.  8. 11.  2. 14.  9. 16. 19. 10. 11. 13. 12. 14. 11.
 14. 15. 12. 12. 13. 11. 12.  2.  0.  4. 12.  3. 14. 11. 17. 15. 14.  9.
 15. 12. 10. 13. 13. 14. 12. 14. 16. 15. 10.  4.  2.  7. 10.  7. 19.  7.
 19. 16. 13. 10. 12. 14. 17. 10. 11. 16. 15. 14. 13. 16. 10.  3.  0.  4.
 15.  9. 14.  6. 15. 12. 10. 11.  8. 17. 12.  8.  6. 12. 12. 10. 14. 13.
  6.  5.  0.  5. 14.  8.  9.  3. 12.  9. 12.  9.  8.  6.  7.  5.  5.  7.
  9.  8

# Array manipulation

In [None]:
# I actually didn't tell you the whole story. The 220 points refer to the SAME 22
# points sampled repeatedly over 10 periods.
# (it's more complicated but let's stick to that)

In [48]:
# First flatten the array
# This gives you a 10*22*56=12320 long 1D array.
#
# Then reshape it into a 10x22x56 3D array
tricho_3d = tricho.flatten().reshape(10, 22, 56)
#tricho_3d = tricho.reshape(10, 22, 56)
#
print(tricho_3d)

[[[0. 0. 1. ... 0. 0. 0.]
  [0. 0. 0. ... 0. 0. 0.]
  [0. 0. 1. ... 0. 0. 0.]
  ...
  [0. 0. 0. ... 0. 0. 0.]
  [0. 0. 0. ... 0. 0. 0.]
  [0. 0. 0. ... 0. 0. 0.]]

 [[1. 0. 1. ... 0. 0. 0.]
  [0. 0. 1. ... 0. 0. 0.]
  [1. 1. 1. ... 0. 0. 0.]
  ...
  [0. 0. 1. ... 0. 0. 0.]
  [0. 0. 1. ... 0. 0. 0.]
  [0. 0. 0. ... 0. 0. 0.]]

 [[1. 1. 1. ... 0. 0. 0.]
  [1. 1. 1. ... 0. 0. 0.]
  [1. 1. 1. ... 0. 0. 0.]
  ...
  [0. 0. 1. ... 0. 0. 0.]
  [0. 0. 1. ... 0. 0. 0.]
  [0. 0. 0. ... 0. 0. 0.]]

 ...

 [[1. 0. 1. ... 0. 0. 0.]
  [0. 0. 1. ... 0. 0. 0.]
  [1. 0. 1. ... 0. 0. 0.]
  ...
  [0. 0. 1. ... 0. 0. 0.]
  [1. 0. 1. ... 0. 0. 0.]
  [0. 0. 0. ... 0. 0. 0.]]

 [[0. 0. 1. ... 0. 0. 0.]
  [0. 0. 1. ... 0. 0. 0.]
  [0. 0. 1. ... 0. 0. 0.]
  ...
  [0. 0. 1. ... 0. 0. 0.]
  [0. 0. 1. ... 0. 0. 0.]
  [0. 0. 0. ... 0. 0. 0.]]

 [[0. 0. 1. ... 0. 0. 0.]
  [0. 0. 1. ... 0. 0. 0.]
  [0. 0. 1. ... 0. 0. 0.]
  ...
  [0. 0. 1. ... 0. 0. 0.]
  [0. 0. 0. ... 0. 0. 0.]
  [0. 0. 0. ... 0. 0. 0.]]]


In [49]:
# See information about it
print(tricho_3d.shape)

(10, 22, 56)
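
The reshape only makes sense if the 220 rows are ordered period by period, with the 22 sites in the same order within each period. Under that assumption, row `t * 22 + s` of the 2D array ends up at position `[t, s]` of the 3D array. The sketch below checks this on a synthetic stand-in array (`stacked` is not the real dataset).

```python
import numpy as np

n_periods, n_sites, n_species = 10, 22, 56

# Synthetic stand-in for the (220, 56) table: every value is unique
stacked = np.arange(n_periods * n_sites * n_species, dtype=float)
stacked = stacked.reshape(n_periods * n_sites, n_species)

# Reshape directly; flattening first is not required
cube = stacked.reshape(n_periods, n_sites, n_species)

# Row t * n_sites + s of the table is site s in period t of the cube
t, s = 3, 7
rows_match = np.array_equal(cube[t, s], stacked[t * n_sites + s])
print(cube.shape, rows_match)
```

If the file were instead ordered site by site, the reshape would need to be `(22, 10, 56)`, possibly followed by a transpose.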


In [None]:
# Use the axis parameter to define which axis you want to add along
print("\nSums along the first axis (time) of the 3d array:")
print(tricho_3d.sum(axis=0).shape)
#
print("\nSums along the second axis (sites) of the 3d array:")
print(tricho_3d.sum(axis=1).shape)
#
print("\nSums along the third axis (species) of the 3d array:")
print(tricho_3d.sum(axis=2).shape)



Sums along the first axis (time) of the 3d array:
(22, 56)

Sums along the second axis (sites) of the 3d array:
(10, 56)

Sums along the third axis (species) of the 3d array:
(10, 22)
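
A useful rule of thumb: the axis you sum along is the one that disappears from the shape. The sketch below illustrates this on an array of ones with the same shape, and also shows summing over several axes at once and keeping the summed axis with `keepdims` (the variable names are illustrative).

```python
import numpy as np

cube = np.ones((10, 22, 56))   # (time, sites, species)

over_time = cube.sum(axis=0)                    # time axis removed
per_period = cube.sum(axis=(1, 2))              # sites and species removed
kept = cube.sum(axis=2, keepdims=True)          # species axis kept with length 1

print(over_time.shape, per_period.shape, kept.shape)
```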


In [54]:
tricho_3d.sum(axis=2)

array([[ 1.,  1.,  2.,  2.,  1.,  1.,  2.,  1.,  3.,  2.,  2.,  1.,  1.,
         3.,  2.,  2.,  2.,  1.,  0.,  0.,  1.,  0.],
       [ 7.,  6., 12.,  9., 11.,  8.,  7.,  6.,  9.,  4.,  7.,  7.,  8.,
        17.,  8., 11.,  9.,  5.,  1.,  8., 11.,  2.],
       [14.,  9., 16., 19., 10., 11., 13., 12., 14., 11., 14., 15., 12.,
        12., 13., 11., 12.,  2.,  0.,  4., 12.,  3.],
       [14., 11., 17., 15., 14.,  9., 15., 12., 10., 13., 13., 14., 12.,
        14., 16., 15., 10.,  4.,  2.,  7., 10.,  7.],
       [19.,  7., 19., 16., 13., 10., 12., 14., 17., 10., 11., 16., 15.,
        14., 13., 16., 10.,  3.,  0.,  4., 15.,  9.],
       [14.,  6., 15., 12., 10., 11.,  8., 17., 12.,  8.,  6., 12., 12.,
        10., 14., 13.,  6.,  5.,  0.,  5., 14.,  8.],
       [ 9.,  3., 12.,  9., 12.,  9.,  8.,  6.,  7.,  5.,  5.,  7.,  9.,
         8.,  8., 11.,  5.,  2.,  0.,  5., 14.,  4.],
       [10.,  7., 14.,  9., 13.,  8., 11.,  9.,  8.,  3.,  4.,  4.,  7.,
         6.,  8.,  8.,  5.,  3.,  0., 

In [42]:
# axis=None (the default) gives you the sum of the entire dataset
print("Total sum of the 3-dimensional array:")
print(tricho_3d.sum(axis=None))

# Use the axis parameter to define which axis you want to add along
print("\nSums along the first axis (time) of the 3d array:")
print(tricho_3d.sum(axis=0))
#
print("\nSums along the second axis (sites) of the 3d array:")
print(tricho_3d.sum(axis=1))
#
print("\nSums along the third axis (species) of the 3d array:")
print(tricho_3d.sum(axis=2))


Total sum of the 3-dimensional array:
1651.0

Sums along the first axis (time) of the 3d array:
[[ 6.  3. 10. ...  0.  0.  0.]
 [ 3.  2.  9. ...  0.  0.  0.]
 [ 7.  4. 10. ...  0.  0.  0.]
 ...
 [ 0.  1.  8. ...  0.  0.  0.]
 [ 4.  3.  8. ...  0.  0.  0.]
 [ 2.  2.  0. ...  0.  0.  0.]]

Sums along the second axis (sites) of the 3d array:
[[ 0.  0.  5.  0.  0.  4.  0.  0.  0.  7.  0.  0.  0.  0.  0.  1.  0.  0.
   0.  0. 12.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
   0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  1.  0.  0.  0.  0.  0.  0.  1.
   0.  0.]
 [ 9.  1. 13. 18.  1. 15.  3. 11. 22. 18. 10.  6.  1.  0.  0. 12.  0. 10.
   0.  1.  1.  1.  0.  6.  0.  3.  0.  2.  1.  0.  2.  3.  0.  0.  0.  0.
   0.  0.  0.  0.  1.  0.  1.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
   1.  0.]
 [15. 15. 14. 16.  0. 13. 16. 12. 10.  7. 18. 10.  0.  4. 11. 13.  6. 14.
   4.  9.  2.  0.  7.  3.  0.  1.  0.  1.  4.  0.  6.  0.  0.  0.  2.  0.
   0.  1.  1.  1.  0.  0.  0.  0.  1.  0.  

In [96]:
# You can also select a particular slice
tricho_3d[0:2, :, 2].sum()

np.float64(18.0)


# Linear algebra

In [55]:
# Create a square matrix that could be a covariance matrix
# between two variables
S = np.array([[1.0, 0.8],
              [0.8, 1.0]])

In [56]:
# Compute its determinant
print("The determinant of S:")
print(np.linalg.det(S))
print(type(np.linalg.det(S)))

The determinant of S:
0.3599999999999999
<class 'numpy.float64'>


In [57]:
# Get the inverse of the S matrix
Sm1 = np.linalg.inv(S)

print("The inverse of S:")
print(Sm1)
print(type(Sm1))

The inverse of S:
[[ 2.77777778 -2.22222222]
 [-2.22222222  2.77777778]]
<class 'numpy.ndarray'>


In [58]:
print("\nThe result of Sm1 x S:")
print(Sm1 @ S)
print(type(Sm1 @ S))

print("\nThe result of S x Sm1:")
print(S @ Sm1)
print(type(S @ Sm1))


The result of Sm1 x S:
[[1.00000000e+00 0.00000000e+00]
 [2.12175956e-16 1.00000000e+00]]
<class 'numpy.ndarray'>

The result of S x Sm1:
[[1.00000000e+00 2.12175956e-16]
 [0.00000000e+00 1.00000000e+00]]
<class 'numpy.ndarray'>
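
The off-diagonal `2.12175956e-16` is floating-point rounding, not a real non-zero value. Rather than comparing to the identity by eye, `np.allclose` can do the check. The sketch below also shows `np.linalg.solve`, which solves a linear system without forming the inverse explicitly; the right-hand side `b` is an arbitrary example vector.

```python
import numpy as np

S = np.array([[1.0, 0.8],
              [0.8, 1.0]])
S_inv = np.linalg.inv(S)

# Compare with the identity up to rounding error
is_identity = np.allclose(S_inv @ S, np.eye(2))

# Solve S x = b directly instead of computing inv(S) @ b
b = np.array([1.0, 2.0])
x_solve = np.linalg.solve(S, b)
x_inv = S_inv @ b

print(is_identity, np.allclose(x_solve, x_inv))
```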


In [60]:
# Create a singular matrix (the second row is twice the first)
S2 = np.array([[1.0, 0.5],
               [2.0, 1.0]])

print("The determinant of S2:")
print(np.linalg.det(S2))
print(type(np.linalg.det(S2)))

The determinant of S2:
0.0
<class 'numpy.float64'>


In [62]:
# Try to invert matrix S2, which is singular
# Uncomment at your own risk
# (There ain't no risk: the inverse does not exist, so NumPy raises a LinAlgError)
#np.linalg.inv(S2)
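
For the curious, here is a sketch of what happens when the line is uncommented, wrapped in a `try`/`except` so that the notebook keeps running. The rank check at the end is an extra illustration, not part of the original cell.

```python
import numpy as np

S2 = np.array([[1.0, 0.5],
               [2.0, 1.0]])

try:
    np.linalg.inv(S2)
    inverted = True
except np.linalg.LinAlgError as err:
    inverted = False
    print("Inversion failed:", err)

# The rank is 1, not 2, because the rows are proportional
rank = np.linalg.matrix_rank(S2)
print("Rank of S2:", rank)
```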

In [5]:
#np.linalg.cholesky
#np.linalg.eig
#np.linalg.qr
#np.linalg.svd
#np.linalg.inv

Sm1 = np.linalg.inv(S)

print("The inverse of S:")
print(Sm1)

print("\nThe result of Sm1 S:")
print(Sm1 @ S)

print("\nThe result of S Sm1:")
print(S @ Sm1)

The inverse of S:
[[ 2.77777778 -2.22222222]
 [-2.22222222  2.77777778]]

The result of Sm1 S:
[[1.00000000e+00 0.00000000e+00]
 [2.12175956e-16 1.00000000e+00]]

The result of S Sm1:
[[1.00000000e+00 2.12175956e-16]
 [0.00000000e+00 1.00000000e+00]]


# The return of list unpacking

In [None]:
print(np.linalg.eig(S))

EigResult(eigenvalues=array([1.8, 0.2]), eigenvectors=array([[ 0.70710678, -0.70710678],
       [ 0.70710678,  0.70710678]]))


SVDResult(U=array([[-0.70710678, -0.70710678],
       [-0.70710678,  0.70710678]]), S=array([1.8, 0.2]), Vh=array([[-0.70710678, -0.70710678],
       [-0.70710678,  0.70710678]]))

In [4]:
import numpy as np


# Perform eigenanalysis
S = np.array([[1.0, 0.8],
              [0.8, 1.0]])


lam, U = np.linalg.eig(S)


In [87]:
U.mean(axis=1)

array([-5.55111512e-17,  7.07106781e-01])

In [90]:
print(U)
print(U * U)
print(U @ U)

[[ 0.70710678 -0.70710678]
 [ 0.70710678  0.70710678]]
[[0.5 0.5]
 [0.5 0.5]]
[[-2.22044605e-16 -1.00000000e+00]
 [ 1.00000000e+00 -2.22044605e-16]]
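
`U * U` is an element-wise product and `U @ U` is a matrix product, so neither is the check you usually want for eigenvectors. A minimal sketch of the more informative checks, assuming the same symmetric matrix `S` as above:

```python
import numpy as np

S = np.array([[1.0, 0.8],
              [0.8, 1.0]])
lam, U = np.linalg.eig(S)

# Each column of U is an eigenvector: S u_j = lam_j u_j
# (U * lam scales column j by lam[j] through broadcasting)
satisfies_definition = np.allclose(S @ U, U * lam)

# For a symmetric matrix the eigenvectors are orthonormal: U^T U = I
is_orthonormal = np.allclose(U.T @ U, np.eye(2))

# S can be rebuilt from its eigenvalues and eigenvectors
rebuilt = U @ np.diag(lam) @ U.T

print(satisfies_definition, is_orthonormal, np.allclose(rebuilt, S))
```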


In [None]:
def pca(X):
    n = X.shape[0]
    X = X - X.mean(axis=0)  # center each column (variable)
    S = 1/(n - 1.0) * X.T @ X
    _, U = np.linalg.eig(S)
    F = X @ U
    return F

In [97]:
def pca(X):
    n = X.shape[0]
    X = X - X.mean(axis=0)  # center each column (variable)
    S = 1/(n - 1.0) * X.T @ X
    eig_out = np.linalg.eig(S)
    U = eig_out[1]
    F = X @ U
    return F
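
As a possible variation on the function above, the sketch below swaps in `np.linalg.eigh`, which is intended for symmetric matrices such as a covariance matrix, and sorts the components by decreasing variance, since the order returned by `np.linalg.eig` is not guaranteed. The name `pca_scores` and the random test data are illustrative assumptions, and columns are centered with `axis=0`.

```python
import numpy as np

def pca_scores(X):
    """Project the rows of X onto its principal axes (hypothetical helper)."""
    n = X.shape[0]
    Xc = X - X.mean(axis=0)             # center each column (variable)
    S = Xc.T @ Xc / (n - 1.0)           # sample covariance matrix
    eigvals, U = np.linalg.eigh(S)      # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]   # reorder to descending variance
    return Xc @ U[:, order], eigvals[order]

# Made-up data: 50 observations of 3 variables
rng = np.random.default_rng(42)
X = rng.standard_normal((50, 3))

F, variances = pca_scores(X)
print(F.shape)
print(variances)
```

The variance of each column of `F` equals the corresponding eigenvalue, which gives a quick sanity check.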

In [17]:
# Understanding _
# In IPython/Jupyter, _ holds the result of the last evaluated cell
3

3

In [18]:
_

3

# Masked matrices

In [63]:
# Generate a 5x5 matrix with values drawn from 0, 1, 2, 3 and -999 (a missing-value code)
ex_array = np.random.choice(a=[0.0, 1.0, 2.0, 3.0, -999.0], size=(5, 5))

print(ex_array, "\n")

mymask = ex_array == -999

print(mymask)

print("\nmymask is a: ", type(mymask))

ex_mask = np.ma.masked_array(ex_array, mymask)

print("\nex_mask is a: ", type(ex_mask), "\n")

print(ex_mask)

# Replace values that are equal to -999 with np.nan
ex_nan = ex_array.copy()

ex_nan[ex_nan == -999] = np.nan

print("\nex_nan is a: ", type(ex_nan), "\n")

print(ex_nan)

[[-999. -999.    1. -999.    2.]
 [   1.    2.    2.    3.    2.]
 [-999. -999.    1. -999. -999.]
 [   3.    3.    2.    2.    0.]
 [   0.    1.    2. -999.    2.]] 

[[ True  True False  True False]
 [False False False False False]
 [ True  True False  True  True]
 [False False False False False]
 [False False False  True False]]

mymask is a:  <class 'numpy.ndarray'>

ex_mask is a:  <class 'numpy.ma.MaskedArray'> 

[[-- -- 1.0 -- 2.0]
 [1.0 2.0 2.0 3.0 2.0]
 [-- -- 1.0 -- --]
 [3.0 3.0 2.0 2.0 0.0]
 [0.0 1.0 2.0 -- 2.0]]

ex_nan is a:  <class 'numpy.ndarray'> 

[[nan nan  1. nan  2.]
 [ 1.  2.  2.  3.  2.]
 [nan nan  1. nan nan]
 [ 3.  3.  2.  2.  0.]
 [ 0.  1.  2. nan  2.]]


In [None]:
# Print out the mean of these arrays
print("The mean of ex_array is:", ex_array.mean())
print("The mean of ex_mask is:", ex_mask.mean())
print("The mean of ex_nan is:", ex_nan.mean())
print("The nanmean of ex_nan is:", np.nanmean(ex_nan))

The mean of ex_array is: -238.4
The mean of ex_mask is: 1.7894736842105263
The mean of ex_nan is: nan
The nanmean of ex_nan is: 1.7894736842105263
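
The three means above can be reproduced on a small fixed array, which makes the results easy to verify by hand. The sketch below uses `np.ma.masked_equal` as a shortcut for building the mask and the masked array in one step; the array `raw` is made up for illustration.

```python
import numpy as np

# Made-up data where -999 marks a missing value
raw = np.array([[1.0, -999.0, 3.0],
                [-999.0, 2.0, 0.0]])

# Option 1: mask the missing-value code
masked = np.ma.masked_equal(raw, -999.0)

# Option 2: replace the code with nan and use the nan-aware functions
with_nan = np.where(raw == -999.0, np.nan, raw)

print("Valid values:", masked.count())
print("Masked mean:", masked.mean())
print("nanmean:", np.nanmean(with_nan))
print("Plain mean of nan array:", with_nan.mean())
```

Both approaches ignore the two missing entries and average the remaining four values, while the plain mean of the `nan` array propagates `nan`.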


In [77]:
(1 + 1/2**53) == 1
52/2

26.0