# My NumPy Notebook
>**[Back to index](../README.md)**

NumPy is a library for ___scientific computing___ in Python. It provides a ___multidimensional array object___, various derived objects (such as masked arrays and matrices), and an assortment of routines for fast operations on arrays, including mathematical, logical, shape manipulation, sorting, selecting, I/O, discrete Fourier transforms, basic linear algebra, basic statistical operations, random simulation and much more.

The array object in NumPy is called ___ndarray___. NumPy provides many supporting functions that make working with an ndarray easy.

NumPy arrays are stored in one **contiguous block of memory, unlike lists**, so processes can access and manipulate them very **efficiently**. This property is called **locality of reference** in computer science. It is one of the main reasons why NumPy is faster than Python lists.
- - - - - 

# Differences between NumPy Arrays and Python Lists:

## Memory Storage for NumPy Arrays

### Contiguous Memory

NumPy arrays store their elements in a single, contiguous block of memory.

Each element has the same data type, so they all occupy the same number of bytes, which makes indexing and arithmetic operations very efficient.


### Metadata Stored

Pointer to the data buffer: A reference to the block of memory where the actual array elements are stored.

Data type (dtype): Information about the type of elements (e.g., int32, float64) and how many bytes each element requires.

Shape: The dimensions of the array (e.g., (3, 3) for a 3x3 matrix).

Strides: The number of bytes to step in each dimension when traversing the array.


### NumPy Array Example

```python
import numpy as np
arr = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int32)
```

**Element size:** Each int32 element takes 4 bytes.

**Data block:** Stores the values [1, 2, 3, 4, 5, 6] contiguously in memory.

**Metadata:**
- Pointer to the data block.
- Data type (int32).
- Shape (2, 3).
- Strides (12, 4): 12 bytes to step to the next row, 4 bytes to step to the next column.
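
The metadata listed above can be read directly from the array's attributes. The following is a minimal sketch that rebuilds the same `arr` and prints each piece:

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int32)

print(arr.dtype)                  # int32
print(arr.itemsize)               # 4 bytes per element
print(arr.shape)                  # (2, 3)
print(arr.strides)                # (12, 4)
print(arr.flags['C_CONTIGUOUS'])  # True: one row-major block of memory
print(arr.nbytes)                 # 24 = 6 elements * 4 bytes
```

The row stride is 12 because one row holds 3 elements of 4 bytes each.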

## Memory Storage for Python Lists

### Pointers to Objects
Python lists are heterogeneous, so they do not store the actual data values directly.

Instead, they store pointers (references) to objects in memory. Each element of the list can point to data of a different type.

### Metadata Stored
Pointer to the list object: A reference to the memory location where the list resides.

Size: The number of elements in the list.

Capacity: The total space allocated for the list (to avoid frequent resizing during dynamic growth).

Pointers to elements: Each slot in the list stores a pointer to the actual data.

### Python List Example

```python
py_list = [1, "two", 3.0]
```

#### Metadata:

Pointer to the list object

Size: 3

Capacity: the number of element slots currently allocated for the list (at least 3).

Element pointers

py_list[0]: Points to an int object storing 1.

py_list[1]: Points to a str object storing "two".

py_list[2]: Points to a float object storing 3.0.

#### Each element is a full Python object, which includes

Type information: Identifies whether the object is an integer, string, float, etc.

Reference count: Tracks how many references point to the object for memory management.

Data value: The actual data of the object.
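
The per-object overhead described above can be estimated with `sys.getsizeof`. This is a rough sketch: the exact byte counts depend on the Python version and platform, and the names `py_list`, `np_arr` and `n` are only illustrative.

```python
import sys
import numpy as np

n = 1000
py_list = list(range(n))
np_arr = np.arange(n, dtype=np.int64)

# A list stores pointers; every int it points to is a separate Python object.
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(x) for x in py_list)

# An ndarray stores the raw values in one buffer: n * 8 bytes for int64.
array_bytes = np_arr.nbytes

print(list_bytes, array_bytes)
```

On a typical 64-bit CPython build the list total is several times larger than the 8000 bytes used by the array data.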



# Import NumPy

By convention, NumPy is imported as `np`.

If NumPy is not installed, install it from a terminal with:

```bash
pip install numpy
```
> **Note:**
> Run the following code to import NumPy before starting:


In [2]:
import numpy as np

# Basic initializations

## Basic initialization of a NumPy array

The input list should be homogeneous: all elements should be of the same type, and every nested list at the same level must have the same length.

When creating a NumPy array without specifying the data type:
- **For integer values, NumPy will use `np.int32` or `np.int64` depending on your system**
- **For floating point values, NumPy will use `np.float64`**
- For boolean values, NumPy will use `np.bool_`
- For strings, NumPy will use `np.str_` with length equal to the longest string
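
The boolean and string cases from the list above, plus a mixed int/float input, can be checked with a short sketch like this (the variable names are illustrative):

```python
import numpy as np

bool_arr = np.array([True, False])
str_arr = np.array(["a", "abc"])
mixed_arr = np.array([1, 2.5])   # ints mixed with floats are upcast to float

print(bool_arr.dtype)    # bool
print(str_arr.dtype)     # <U3 on little-endian machines: longest string has 3 characters
print(mixed_arr.dtype)   # float64
```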


In [12]:
# np.array(list / tuple) -> ndarray
arrayA = np.array([[1, 2, 3], [4, 5, 6]])
print(arrayA)
print(arrayA.dtype)

arrayZ = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
print(arrayZ)
print(arrayZ.dtype)

[[1 2 3]
 [4 5 6]]
int64
[[1. 2. 3.]
 [4. 5. 6.]]
float64


## Initialize a NumPy array with specified data type
Data type:
- np.int8
- np.int16
- np.int32
- np.int64
- np.float16
- np.float32
- np.float64
- np.bool_
- np.str_


In [7]:
# np.array(list / tuple, dtype=np.data type / dtype='data type') -> ndarray
arrayB = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.float32)
print(arrayB)

[[1. 2. 3.]
 [4. 5. 6.]]


# Get functions

## Get the dimension of a NumPy array

In [6]:
# arrayName.ndim -> int (number of dimensions)
arrayA.ndim

2

## Get the shape of a NumPy array

In [10]:
# arrayName.shape -> tuple (int of rows, int of columns, ...)
arrayA.shape

(2, 3)

## Get the data type of elements in a NumPy array

In [11]:
# arrayName.dtype -> dtype('type of data')
arrayA.dtype

dtype('int64')

## Get the size of each element of a NumPy array in bytes

In [9]:
# arrayName.itemsize -> int (element size in bytes)
arrayB.itemsize
# (arrayB.dtype = np.float32)

4

## Get the size of a NumPy array (total number of elements)

In [18]:
# arrayName.size -> int (total number of elements)
arrayA.size

6

## Get the total size of NumPy array elements in bytes (only the elements)

In [10]:
# arrayName.nbytes -> int (number of bytes taken by the elements; an attribute, not a method)
# arrayA = [[1, 2, 3],
#           [4, 5, 6]] (default data type with integers: int64)
arrayA.nbytes

48

## Get a specific element(s) from a NumPy array

### Get an element

In [11]:
# arrayName[row, column, ...] -> np.dataType(value)
arrayA[1, 1]

5

### Get a range of elements

In [26]:
arrayC = np.array([[1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                   [11,12,13,14,15,16,17,18,19,20],
                   [21,22,23,24,25,26,27,28,29,30],
                   [31,32,33,34,35,36,37,38,39,40],
                   [41,42,43,44,45,46,47,48,49,50]], dtype=np.int16)

arrayName[(row) start: end: step, (column) start: end: step, ...] -> ndarray
end is **exclusive**; step is **optional**, default is 1.


In [20]:
arrayC[0:3:2, 0:7:3]     # rows 0 and 2, columns 0, 3 and 6
tmp = arrayC[0:3:2]      # rows 0 and 2, all columns
Tmp = arrayC[0:,0:7:3]   # all rows, columns 0, 3 and 6
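
The cell above prints nothing, so here is a small sketch of what the first slice returns. It rebuilds the same 5x10 array with `np.arange` (an assumption made only to keep the example self-contained) and also shows that a basic slice is a view on the original data.

```python
import numpy as np

# Rebuild the 5x10 array from above: values 1..50, row by row.
arrayC = np.arange(1, 51, dtype=np.int16).reshape(5, 10)

sub = arrayC[0:3:2, 0:7:3]   # rows 0 and 2, columns 0, 3 and 6
print(sub)
# [[ 1  4  7]
#  [21 24 27]]

# Basic slicing returns a view: writing to it changes the original array.
sub[0, 0] = -1
print(arrayC[0, 0])   # -1
```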

# Redefine value(s) in a NumPy array

## Redefine 1 single value to 1 single value
>arrayName[row, column, ...] = value

In [6]:
arrayC[0, 0] = 21
print(arrayC)

[[21  2  3  4  5  6  7  8  9 10]
 [11 12 13 14 15 16 17 18 19 20]
 [21 22 23 24 25 26 27 28 29 30]
 [31 32 33 34 35 36 37 38 39 40]
 [41 42 43 44 45 46 47 48 49 50]]


## Redefine a range of values to 1 single value

In [7]:
# arrayName[(row) start: end: step, (column) start: end: step, ...] = value
arrayC[1, 0:7:2] = 21
print(arrayC)

[[21  2  3  4  5  6  7  8  9 10]
 [21 12 21 14 21 16 21 18 19 20]
 [21 22 23 24 25 26 27 28 29 30]
 [31 32 33 34 35 36 37 38 39 40]
 [41 42 43 44 45 46 47 48 49 50]]


## Redefine a range of values to a list/NumPy array
### Redefine arrayC starting from the 2nd row and 1st column with a NumPy array
>Tips:
>1. The list/NumPy array must have the same **shape** as the range to be redefined, or a shape that can be broadcast to it.
>2. The values are cast to the **data type** of the target array, so they should be representable in that type.


In [35]:
# arrayName[(row) start: end: step, (column) start: end: step, ...] = list / ndarray / tuple
arrayC[1:3, 0:6] = np.array([[44, 55, 66, 77, 88, 99], [99, 88, 77, 66, 55, 44]])
#print(arrayC)

### Redefine arrayC starting from the 4th row and 7th column with a NumPy array

In [27]:
# The selected range has shape (2, 4); the 1-D array of 4 values is broadcast to both rows.
arrayC[3:, 6:] = np.array([21, 42, 63, 84])
#print(arrayC)
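
Slice assignment follows NumPy's broadcasting and casting rules. The sketch below uses a small stand-in array named `target` (not part of the notes above) to show a broadcast assignment, a float being cast to the integer dtype, and the error raised for an incompatible shape.

```python
import numpy as np

target = np.zeros((2, 4), dtype=np.int16)

# A 1-D array of length 4 is broadcast across both rows.
target[:, :] = np.array([21, 42, 63, 84])
print(target)

# Floats are cast to the target dtype (int16), so the fraction is dropped.
target[0, 0] = 3.9
print(target[0, 0])   # 3

# A shape that cannot be broadcast raises ValueError.
try:
    target[:, :] = np.array([1, 2, 3])
except ValueError as err:
    print("ValueError:", err)
```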

# Special initialization of NumPy arrays

## All zeros/ones array (default data type: np.float64)

In [22]:
# np.zeros(tuple for size / list for size / ndarray for size) -> ndarray
arr = np.zeros((2,3))
print(arr)
print(arr.shape)
print(arr.dtype)
print()
arr = np.ones(np.array([2,3,3]), dtype=np.int32)
print(arr)
print(arr.shape)
print(arr.dtype)

[[0. 0. 0.]
 [0. 0. 0.]]
(2, 3)
float64

[[[1 1 1]
  [1 1 1]
  [1 1 1]]

 [[1 1 1]
  [1 1 1]
  [1 1 1]]]
(2, 3, 3)
int32


## Array filled with a single value

In [26]:
# np.full(tuple for size / list for size / ndarray for size, value to fill) -> ndarray
arr = np.full((2, 2), 7)
print(arr)
arr = np.full(np.array([2,3]), 2.1)
print(arr)

[[7 7]
 [7 7]]
[[2.1 2.1 2.1]
 [2.1 2.1 2.1]]


## Array filled with a repeated sequence

In [27]:
# np.full(tuple for size / list for size / ndarray for size, sequence to fill) -> ndarray
arr = np.full(np.array([2,3]), [2.1, 3.1, 4.1])
print(arr)

[[2.1 3.1 4.1]
 [2.1 3.1 4.1]]


## full_like

In [30]:
# np.full_like(ndarray / list whose shape and dtype are copied, value to fill) -> ndarray
arr = np.full_like(arrayC, 21)
print(arr)
arr = np.full_like([[123, 456, 789],
                    [234, 567, 890]], 29)
print(arr)

[[21 21 21 21 21 21 21 21 21 21]
 [21 21 21 21 21 21 21 21 21 21]
 [21 21 21 21 21 21 21 21 21 21]
 [21 21 21 21 21 21 21 21 21 21]
 [21 21 21 21 21 21 21 21 21 21]]
[[29 29 29]
 [29 29 29]]


# Initializing with random values

## Random float values from 0 to 1 in a NumPy array with designated shape (default data type: np.float64)

In [33]:
# np.random.rand(int for rows, int for columns, ...(NOT TUPLE!)) -> ndarray
arr = np.random.rand(2, 3)
print(arr)
print(arr.dtype)

[[0.72386136 0.63283858 0.17287236]
 [0.52707211 0.91092857 0.62394526]]
float64


To pass the shape as a tuple, use `np.random.random_sample`:

In [37]:
# np.random.random_sample(tuple for shape) -> ndarray
np.random.random_sample((2, 3)) # same as np.random.rand(2, 3)

array([[0.82420618, 0.07794334, 0.69636217],
       [0.52502305, 0.69368714, 0.37053738]])

## Random integer values in a NumPy array with designated range and shape (default data type: np.int64 on most platforms)

In [42]:
# np.random.randint(low, high=None, size=tuple / list / ndarray for size) -> ndarray
arr = np.random.randint(8, 10, size=(3, 4)) # 8 <= each element < 10
print(arr)
print(arr.dtype)

[[9 8 9 9]
 [8 8 9 8]
 [8 8 8 8]]
int64
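
NumPy also offers the `Generator` interface (`np.random.default_rng`), which takes the shape as a tuple and can be seeded for reproducible results. This is an alternative to the functions used above, not what the notes themselves use; the seed value `42` is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(seed=42)   # the seed value is arbitrary

floats = rng.random((2, 3))              # floats in [0, 1), shape given as a tuple
ints = rng.integers(8, 10, size=(3, 4))  # integers with 8 <= value < 10

print(floats)
print(ints)
```

Creating another generator with the same seed produces the same sequence of values.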


## Identity matrix (default data type: np.float64)

In [47]:
# np.identity(int for dimension) -> ndarray
arr = np.identity(4)
print(arr)
print(arr.dtype)

[[1. 0. 0. 0.]
 [0. 1. 0. 0.]
 [0. 0. 1. 0.]
 [0. 0. 0. 1.]]
float64


## Repeat elements of a NumPy array

### axis=None: flatten the array and repeat each element

In [50]:
# np.repeat(ndarray to repeat, number of times to repeat, axis=None)
arr = np.repeat(np.array([[1, 2, 3],
                          [4, 5, 6]]), 3, axis=None)
print(arr)

[1 1 1 2 2 2 3 3 3 4 4 4 5 5 5 6 6 6]


### axis=int less than the array's number of dimensions: repeat each element along that axis

In [None]:
# np.repeat(ndarray to repeat, number of times to repeat, axis=None)
arr = np.repeat(np.array([[[ 1,  2,  3,  4,  5], [11, 12, 13, 14, 15]], 
                         [[21, 22, 23, 24, 25], [31, 32, 33, 34, 35]]]), 3, axis=1)
print(arr)
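
The cell above has no recorded output. The sketch below runs the same call and checks the shapes: only the chosen axis grows. The names `arr3d` and `result` are illustrative.

```python
import numpy as np

arr3d = np.array([[[ 1,  2,  3,  4,  5], [11, 12, 13, 14, 15]],
                  [[21, 22, 23, 24, 25], [31, 32, 33, 34, 35]]])
result = np.repeat(arr3d, 3, axis=1)

print(arr3d.shape)    # (2, 2, 5)
print(result.shape)   # (2, 6, 5): only axis 1 grows, from 2 to 2 * 3
print(result[0])      # [1..5] three times, then [11..15] three times
```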

In [68]:
# [[1, 2],
#  [3, 4]]
arr = np.array([[1, 2], [3, 4]])
result = np.repeat(arr, 2, axis=0)
print(result, end = '\n\n')

arr = np.array([[1, 2], [3, 4]])
result = np.repeat(arr, 2, axis=1)
print(result, end = '\n\n')

arr = np.array([[1, 2], [3, 4]])
result = np.repeat(arr, [2, 3], axis=0)  # with different repeat times
print(result)


[[1 2]
 [1 2]
 [3 4]
 [3 4]]

[[1 1 2 2]
 [3 3 4 4]]

[[1 2]
 [1 2]
 [3 4]
 [3 4]
 [3 4]]


# NumPy array copying

Simply applying

```python
ndarrayB = ndarrayA
```

will only point the variable `ndarrayB` to the same memory location as `ndarrayA`.

We can use the `copy()` method to actually copy an array with a different memory location.

In [69]:
# ndarray.copy() -> ndarray
arrayA = np.array([[1, 2, 3],
                   [4, 5, 6]])
arrayB = arrayA.copy()
print(arrayA, end = '\n\n')
print(arrayB, end = '\n\n')
arrayA[0, 0] = 0
print(arrayA, end = '\n\n')
print(arrayB, end = '\n\n')

[[1 2 3]
 [4 5 6]]

[[1 2 3]
 [4 5 6]]

[[0 2 3]
 [4 5 6]]

[[1 2 3]
 [4 5 6]]
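
Besides plain assignment and `copy()`, basic slicing gives a third case: a new array object that still shares the same data. A minimal sketch comparing the three, with illustrative variable names:

```python
import numpy as np

original = np.array([[1, 2, 3],
                     [4, 5, 6]])

alias = original          # same object, no new array is created
view = original[0, :]     # basic slice: new array object, shared data
copied = original.copy()  # independent data

original[0, 0] = 99

print(alias is original)                    # True
print(view[0])                              # 99, the view sees the change
print(copied[0, 0])                         # 1, the copy does not
print(np.shares_memory(original, view))     # True
print(np.shares_memory(original, copied))   # False
```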



# Simple NumPy Mathematics (element-wise)

## Adding / subtracting / multiplying / dividing an array by a value, or raising it to a power

When an operation on an integer array produces float results (for example, subtracting `0.1`), NumPy returns a new array with a float dtype. The original array is left unchanged.

In [24]:
# ndarray +, -, *, /, //, %, ** value -> ndarray
arrayA = np.array([[1, 2, 3],
                   [4, 5, 6]])
print(arrayA)
print(arrayA.dtype) # int64

# +
arrayB = arrayA + 10
print(arrayB)
print(arrayB.dtype) # int64

#-
arrayB = arrayA - 0.1
print(arrayB)
print(arrayB.dtype) # float64

# *
arrayB = arrayA * 2
print(arrayB)
print(arrayB.dtype) # int64

# /
# Note: true division of an integer array always returns float64, even if every result is a whole number.
arrayC = np.array([[2, 4, 6],
                   [8, 10, 12]], dtype=np.int32)
arrayB = arrayC / 2
print(arrayB)
print(arrayB.dtype) # float64

# //
# The result of floor division from an integer array stays integer type.
arrayB = arrayC // 2
print(arrayB)
print(arrayB.dtype)

# %
arrayB = arrayA % 2
print(arrayB)
print(arrayB.dtype)

# **
arrayB = arrayA ** 2
print(arrayB)
print(arrayB.dtype)

[[1 2 3]
 [4 5 6]]
int64
[[11 12 13]
 [14 15 16]]
int64
[[0.9 1.9 2.9]
 [3.9 4.9 5.9]]
float64
[[ 2  4  6]
 [ 8 10 12]]
int64
[[1. 2. 3.]
 [4. 5. 6.]]
float64
[[1 2 3]
 [4 5 6]]
int32
[[1 0 1]
 [0 1 0]]
int64
[[ 1  4  9]
 [16 25 36]]
int64


## Trigonometric functions

In [73]:
# np.sin/cos/tan...(ndarray) -> ndarray
arrayA = np.array([[0, np.pi/6],
                   [np.pi/3, np.pi/2]])
print(np.sin(arrayA))
print(np.cos(arrayA))
print(np.tan(arrayA), end = '\n\n')
arrayB = np.array([[0, 0.5],
                   [3**0.5/2, 1]])
print(np.arcsin(arrayB))
print(np.arccos(arrayB))
print(np.arctan(arrayB))

[[0.        0.5      ]
 [0.8660254 1.       ]]
[[1.00000000e+00 8.66025404e-01]
 [5.00000000e-01 6.12323400e-17]]
[[0.00000000e+00 5.77350269e-01]
 [1.73205081e+00 1.63312394e+16]]

[[0.         0.52359878]
 [1.04719755 1.57079633]]
[[1.57079633 1.04719755]
 [0.52359878 0.        ]]
[[0.         0.46364761]
 [0.71372438 0.78539816]]
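
Values such as `6.12323400e-17` for `cos(pi/2)` and `1.63312394e+16` for `tan(pi/2)` come from floating-point rounding: `np.pi/2` is only an approximation of the true angle. When comparing such results, a tolerance-based check is safer than `==`. A small sketch:

```python
import numpy as np

angles = np.array([0, np.pi/6, np.pi/3, np.pi/2])
cosines = np.cos(angles)

print(cosines[-1] == 0)             # False: the value is tiny but not exactly 0
print(np.isclose(cosines[-1], 0))   # True: equal within floating-point tolerance
print(np.allclose(np.sin(angles), [0, 0.5, 3**0.5 / 2, 1]))   # True
```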


# Linear algebra calculations
>[click here for more `linalg` functions](https://numpy.org/doc/stable/reference/routines.linalg.html)

## Matrix multiplication (matrix product)

In [79]:
# np.matmul(ndarray, ndarray) -> ndarray
arrayA = np.array([[1, 2, 3],
                   [4, 5, 6]])
arrayB = np.array([[2, 4],
                   [1, 1],
                   [3, 1]])
# print(np.matmul(arrayA, arrayB)) gives the same result as arrayA @ arrayB
print(arrayA @ arrayB)

[[13  9]
 [31 27]]
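
Matrix multiplication with `@` differs from the element-wise `*` operator, and it requires the inner dimensions to match. A short sketch using the same two matrices under the illustrative names `A` and `B`:

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])       # shape (2, 3)
B = np.array([[2, 4],
              [1, 1],
              [3, 1]])          # shape (3, 2)

print((A @ B).shape)   # (2, 2): inner dimensions (3 and 3) match
print(A * A)           # element-wise product, shape stays (2, 3)

# A @ A fails because the inner dimensions (3 and 2) do not match.
try:
    A @ A
except ValueError as err:
    print("ValueError:", err)
```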


## Determinant

In [82]:
# np.linalg.det(ndarray) -> float
print(np.linalg.det(np.array([[1, 2], 
                              [3, 4]])))

-2.0000000000000004
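
The determinant is computed in floating point, so the result can differ from the exact value (-2) in the last digits. Comparing with a tolerance avoids surprises:

```python
import numpy as np

m = np.array([[1, 2],
              [3, 4]])
det = np.linalg.det(m)

print(det)                   # close to -2; the last digits may vary by platform
print(np.isclose(det, -2))   # True
```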


# Statistics

## Max, min, mean, std, normalization

In [84]:
# np.max / np.min / np.mean / np.std(ndarray, axis=None)
arrayA = np.array([[[5090,   200,   3,   4,   5,   6,   7,   8,   9,  10],
                    [  11,  300,  13,  14,  15,  16,  17,  18,  19,  20]],
                    
                    [[101, 102, 103, 104, 105, 106, 107, 108, 109, 110],
                     [111, 112, 113, 114, 115, 116, 117, 118, 119, 120]]])
print(np.max(arrayA))
print(np.max(arrayA, axis=0))
# The 5090 term was originally 1, which is what is shown in the image below

5090
[[5090  200  103  104  105  106  107  108  109  110]
 [ 111  300  113  114  115  116  117  118  119  120]]


![arrayA visualization](/Users/ygzdysyy/Visual_Studio_Code_Projects/Python/Notes/NumPy_Notes/arrayA_visualization.jpg)
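
Only `np.max` is shown above. The sketch below covers the remaining functions on a small illustrative array named `data`. "Normalization" is not defined in these notes, so min-max scaling and z-scores are shown as two common interpretations.

```python
import numpy as np

data = np.array([[1.0, 2.0, 3.0],
                 [4.0, 5.0, 6.0]])

print(np.min(data))            # 1.0
print(np.mean(data))           # 3.5
print(np.mean(data, axis=0))   # [2.5 3.5 4.5], one mean per column
print(np.std(data))            # population standard deviation (ddof=0)

# Min-max normalization: rescale all values into [0, 1].
min_max = (data - data.min()) / (data.max() - data.min())

# Z-score normalization: zero mean and unit standard deviation.
z_score = (data - data.mean()) / data.std()

print(min_max)
print(z_score)
```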

## Summation

In [81]:
# np.sum(ndarray, axis=None) -> scalar when axis=None, otherwise ndarray
arrayA = np.array([[1, 2, 3], [4, 5, 6]])
print(np.sum(arrayA, axis=None))
print(np.sum(arrayA, axis=0))
print(np.sum(arrayA, axis=1))

21
[5 7 9]
[ 6 15]


# Reshape

In [85]:
# ndarray.reshape(tuple / list / ndarray for shape) -> ndarray
arrayA = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
print(arrayA)

arrayB = arrayA.reshape((4, 2))
print(arrayB)

print(arrayB.T)

[[1 2 3 4]
 [5 6 7 8]]
[[1 2]
 [3 4]
 [5 6]
 [7 8]]
[[1 3 5 7]
 [2 4 6 8]]
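
Two further details of `reshape` are worth a quick sketch: passing `-1` lets NumPy infer one dimension, and the total number of elements must stay the same. For a contiguous array like the one here, the result is a view on the same data.

```python
import numpy as np

a = np.arange(1, 9)     # [1 2 3 4 5 6 7 8]

b = a.reshape(-1, 2)    # -1 lets NumPy infer the size of that axis
print(b.shape)          # (4, 2)

# The total number of elements must stay the same.
try:
    a.reshape(3, 3)
except ValueError as err:
    print("ValueError:", err)

# For a contiguous array like this one, reshape returns a view.
b[0, 0] = 100
print(a[0])             # 100
```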


# Vector stacking

## Vertical vector stacking

In [90]:
# np.vstack(tuple / list of arrays (vector1, vector2, ...)) -> ndarray
arrayB = np.vstack((np.array([1, 2, 3]), np.array([4, 5, 6])))
print(arrayB)
arrayA = np.vstack((np.array([[1, 2, 3, 4], [4, 5, 6, 7]]), np.array([[7, 8, 9, 10], [10, 11, 12, 13]])))
print(arrayA)

[[1 2 3]
 [4 5 6]]
[[ 1  2  3  4]
 [ 4  5  6  7]
 [ 7  8  9 10]
 [10 11 12 13]]


## Horizontal vector stacking

In [91]:
# np.hstack(tuple / list of arrays (vector1, vector2, ...)) -> ndarray
arrayB = np.hstack((np.array([1, 2, 3]), np.array([4, 5, 6])))
print(arrayB)
arrayA = np.hstack((np.array([[1, 2, 3, 4], [4, 5, 6, 7]]), np.array([[7, 8, 9, 10], [10, 11, 12, 13]])))
print(arrayA)

[1 2 3 4 5 6]
[[ 1  2  3  4  7  8  9 10]
 [ 4  5  6  7 10 11 12 13]]
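
For 2-D inputs, both stacking functions can be expressed with `np.concatenate` and an explicit axis. The sketch below shows the equivalence and the error raised when the shapes do not line up; `top` and `bottom` are illustrative names.

```python
import numpy as np

top = np.array([[1, 2], [3, 4]])
bottom = np.array([[5, 6], [7, 8]])

v = np.concatenate((top, bottom), axis=0)   # same as np.vstack for 2-D inputs
h = np.concatenate((top, bottom), axis=1)   # same as np.hstack for 2-D inputs

print(np.array_equal(v, np.vstack((top, bottom))))   # True
print(np.array_equal(h, np.hstack((top, bottom))))   # True

# Shapes must agree along the axes that are not being stacked.
try:
    np.vstack((np.array([1, 2, 3]), np.array([4, 5])))
except ValueError as err:
    print("ValueError:", err)
```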


# Miscellaneous

## dtype casting

In [86]:
# ndarray.astype(data type) -> ndarray
arrayA = np.array([1, 2, 3, 4, 5], dtype=np.float64)
arrayB = arrayA.astype(np.int32)
print(arrayA)
print(arrayA.dtype)
print(arrayB)
print(arrayB.dtype)

[1. 2. 3. 4. 5.]
float64
[1 2 3 4 5]
int32
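
The example above uses whole numbers, so nothing is lost. With fractional values, `astype` to an integer type drops the fractional part rather than rounding. A sketch, using `np.rint` as one way to round first:

```python
import numpy as np

values = np.array([1.7, -1.7, 2.5])

truncated = values.astype(np.int32)          # drops the fractional part
rounded = np.rint(values).astype(np.int32)   # rounds first (ties go to the even number)

print(truncated)   # [ 1 -1  2]
print(rounded)     # [ 2 -2  2]
```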


## Load data from txt file

In [87]:
# np.genfromtxt('fileName', delimiter='delimiter', dtype=data type) -> ndarray
arrayA = np.genfromtxt('datafile.txt', delimiter=',', dtype=np.int32)
print(arrayA)

[[   1    2    3    4]
 [   5    6    7 5090]]
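
To try `np.genfromtxt` without a file on disk, a file-like object can be passed instead of a path. In the sketch below, the text is an assumed stand-in for the contents of `datafile.txt`, reconstructed from the output above.

```python
import io
import numpy as np

# Assumed stand-in for the contents of 'datafile.txt'.
text = io.StringIO("1,2,3,4\n5,6,7,5090\n")

loaded = np.genfromtxt(text, delimiter=',', dtype=np.int32)
print(loaded)
print(loaded.shape)   # (2, 4)
```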


## ndarray boolean operations

In [88]:
# boolean ndarray operation
arrayB = ~ np.array([True, False]) # NOT
print(arrayB)
arrayB = np.array([True, True, False, False]) & np.array([True, False, True, False]) # AND
print(arrayB)
arrayB = np.array([True, True, False, False]) | np.array([True, False, True, False]) # OR
print(arrayB)
arrayB = np.array([True, True, False, False]) ^ np.array([True, False, True, False]) # XOR
print(arrayB)

[False  True]
[ True False False False]
[ True  True  True False]
[False  True  True False]


## Boolean masking and advanced indexing

### Return a boolean ndarray indicating whether each element satisfies the boolean expression

In [89]:
# ndarray in boolean expression -> booleanNdarray
arrayA = np.array([1, 2, 3, 4, 5, 6, 5090])
arrayB = arrayA % 2 == 0
print(arrayB)
print(arrayB.dtype)

[False  True False  True False  True  True]
bool


### Combining boolean ndarrays

In [91]:
# booleanNdarray booleanOperator booleanNdarray -> booleanNdarray
arrayA = np.array([1, 2, 3, 4, 5, 6, 7, 5090])
arrayB = (arrayA % 2 == 0) & (arrayA < 10)
print(arrayB)

[False  True False  True False  True False False]


### Return all the values in the ndarray that satisfy the boolean expression

In [98]:
# ndarray[booleanNdarray] -> ndarray (1-dimensional)
arrayA = np.array([[1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                   [5, 6, 7, 8, 9, 10, 11, 12, 13, 14]])
arrayB = arrayA[arrayA > 7]
print(arrayB)

[ 8  9 10  8  9 10 11 12 13 14]
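
A boolean mask can also be used to replace values, either in place or through `np.where`, which builds a new array. A minimal sketch on the same data, with `a` and `capped` as illustrative names:

```python
import numpy as np

a = np.array([[1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
              [5, 6, 7, 8, 9, 10, 11, 12, 13, 14]])

capped = np.where(a > 7, 7, a)   # new array: values above 7 replaced by 7
print(capped.max())              # 7

a[a > 7] = 0                     # in place: zero out values above 7
print(a)
```

Unlike `a[a > 7]`, both forms keep the original 2-D shape.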


### Return a boolean ndarray indicating whether any / all values along a given axis satisfy a condition

In [92]:
# np.any / np.all(booleanNdarray, axis=None)
arrayA = np.array([[ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10],
                   [11, 12, 13, 14, 15, 16, 17, 18, 19, 20],
                   [21, 22, 23, 24, 25, 26, 27, 28, 29, 30]])
arrayB = np.any(arrayA > 25)
print(arrayB)
arrayB = np.any(arrayA > 25, axis=0)
print(arrayB)
arrayB = np.any(arrayA > 25, axis=1)
print(arrayB)

print()
arrayB = np.all(arrayA > 5)
print(arrayB)
arrayB = np.all(arrayA > 5, axis=0)
print(arrayB)
arrayB = np.all(arrayA > 5, axis=1)
print(arrayB)

True
[False False False False False  True  True  True  True  True]
[False False  True]

False
[False False False False False  True  True  True  True  True]
[False  True  True]


In [93]:
arrayA = np.array([[ 1,  2,  3,  4,  5],
                   [ 6,  7,  8,  9, 10],
                   [11, 12, 13, 14, 15],
                   [16, 17, 18, 19, 20],
                   [21, 22, 23, 24, 25],
                   [26, 27, 28, 29, 30]])
arrayB = arrayA[2:4, 0:2]
print(arrayB)
arrayB = arrayA[[0, 1, 2, 3], [1, 2, 3, 4]]  # integer-array indexing: picks elements (0,1), (1,2), (2,3), (3,4)
print(arrayB)


[[11 12]
 [16 17]]
[ 2  8 14 20]
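
Indexing with two index lists pairs them element by element. To select the full grid of rows and columns instead, `np.ix_` can be used. The sketch below also shows that integer-array indexing returns a copy, unlike a basic slice.

```python
import numpy as np

a = np.arange(1, 31).reshape(6, 5)   # same 6x5 array as above

pairs = a[[0, 1], [1, 2]]            # elements (0, 1) and (1, 2)
grid = a[np.ix_([0, 1], [1, 2])]     # rows 0-1 crossed with columns 1-2

print(pairs)   # [2 8]
print(grid)
# [[2 3]
#  [7 8]]

# Integer-array indexing returns a copy, so the original is unaffected.
pairs[0] = -1
print(a[0, 1])   # 2
```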
