## NumPy Crash Course

### NumPy in a Nutshell

![image-2.png](attachment:image-2.png)

### Array
**Points to Cover:**
- Creating NumPy arrays from lists, tuples, or using functions like `arange()`, `zeros()`, `ones()`, `linspace()`, etc.
- Difference between 1D, 2D, and multi-dimensional arrays.
- Understanding array data types and typecasting.
- Indexing, slicing, and modifying elements.

![image.png](attachment:image.png)

In [1]:
import numpy as np

In [2]:
d1  = np.array([1,2,3])
d1.shape

(3,)

In [3]:
d2 = np.array([[1,2,3], [4,5,6]])
print(d2)
d2.shape

[[1 2 3]
 [4 5 6]]


(2, 3)

In [4]:
np.array([[1,2,3], [4,5,6]]), np.array([[7,8,9], [11,22,33]])

(array([[1, 2, 3],
        [4, 5, 6]]),
 array([[ 7,  8,  9],
        [11, 22, 33]]))

In [5]:
d3 = np.array([[[1,2,3], [4,5,6]], [[7,8,9], [11,22,33]]])
print(d3)
d3.shape

[[[ 1  2  3]
  [ 4  5  6]]

 [[ 7  8  9]
  [11 22 33]]]


(2, 2, 3)

In [6]:
# A list of 1D sequences becomes a 2D array, a list of 2D lists becomes 3D, and so on
np.array([range(1,5), range(1,5)])

array([[1, 2, 3, 4],
       [1, 2, 3, 4]])

In [7]:
np.arange(0,10, 2)

array([0, 2, 4, 6, 8])

In [8]:
# 2x2 zeros array
z2 = np.zeros((2,2), dtype='int64')
print(z2)
z2.shape, z2.dtype

[[0 0]
 [0 0]]


((2, 2), dtype('int64'))

In [9]:
o2 = np.ones((2,2), dtype='int')
o2, o2.dtype

(array([[1, 1],
        [1, 1]]),
 dtype('int32'))

In [10]:
print(d1)

d1[0], d1[1], d1[2]

[1 2 3]


(1, 2, 3)

In [11]:
print(d2)

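d2[0,2]  # row 0, column 2 -> 3 (not displayed; a cell shows only its last expression)
d2[1,1]  # row 1, column 1 -> 5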

[[1 2 3]
 [4 5 6]]


5

In [12]:
d3

array([[[ 1,  2,  3],
        [ 4,  5,  6]],

       [[ 7,  8,  9],
        [11, 22, 33]]])

In [13]:
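d3[0][1,2]  # chained indexing: take block 0, then row 1, column 2
d3[0,1,2]   # equivalent single index: block 0, row 1, column 2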

6
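
The "Points to Cover" list for this section also mentions slicing and modifying elements, which the cells above only touch on through single-element indexing. The following is a minimal sketch of both; the array `m` and the other variable names are illustrative and not part of the original notebook.

```python
import numpy as np

m = np.array([[1, 2, 3], [4, 5, 6]])

print(m[0, :])     # every column of row 0
print(m[:, -1])    # last column of every row
print(m[:, 1:])    # columns 1 onward, for both rows

# A basic slice is a view: writing through it changes the original array.
view = m[:, 1:]
view[0, 0] = 20
after_view_write = m[0, 1]   # now 20

# A boolean mask can be used to assign to every element meeting a condition.
m[m > 10] = 0
print(m)
```

Because slices are views, calling `.copy()` on a slice is the usual way to get an independent array when the original must stay untouched.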

In [14]:
d3.dtype

d3_f = d3.astype(float)
d3_f
d3_f.dtype

dtype('float64')
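
Typecasting in the other direction (float to integer) deserves a short illustration, because `astype()` truncates rather than rounds. This is a small sketch with made-up values; the variable names are assumptions for illustration only.

```python
import numpy as np

floats = np.array([1.7, -1.7, 2.5])

# astype() to an integer type drops the fractional part (truncation toward zero).
truncated = floats.astype(np.int64)          # [ 1, -1,  2]

# Round explicitly first if rounding is what you want.
rounded = np.rint(floats).astype(np.int64)   # [ 2, -2,  2] (ties go to the even value)

# Mixing ints and floats in one array upcasts everything to float64.
mixed = np.array([1, 2.5])

print(truncated, rounded, mixed.dtype)
```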

### NaN and INF
**Points to Cover:**
- Introduction to `NaN` (Not a Number) and `INF` (Infinity) in NumPy.
- Generating `NaN` and `INF` values and understanding their impact on computations.
- Handling `NaN` and `INF` using functions like `np.isnan()`, `np.isinf()`, and `np.nan_to_num()`.
- Importance of handling missing or infinite values in machine learning datasets.


In [15]:
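# With NumPy floats (plain Python 1/0 raises ZeroDivisionError instead):
# 1/0 -> np.inf (emits a RuntimeWarning)
# 0/0 -> np.nan (emits a RuntimeWarning)

# np.inf + any finite number => np.inf
# np.inf + np.inf => np.inf
# np.inf - np.inf => np.nan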
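
The rules in the comments above can be checked directly. The sketch below uses `np.errstate` to silence the division warnings that NumPy would otherwise emit; the arrays `num` and `den` are illustrative.

```python
import numpy as np

num = np.array([1.0, -1.0, 0.0])
den = np.zeros(3)

# Suppress the divide-by-zero and invalid-value warnings for this block only.
with np.errstate(divide="ignore", invalid="ignore"):
    result = num / den            # [inf, -inf, nan]

inf_minus_inf = np.inf - np.inf   # nan
nan_equals_itself = np.nan == np.nan   # False: NaN never compares equal

print(result, inf_minus_inf, nan_equals_itself)
```

Since `NaN` is not equal to itself, comparisons such as `x == np.nan` always fail, which is why `np.isnan()` is needed to detect it.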

In [16]:
# nums = np.array([1,2,np.inf], dtype='int')  # fails: inf has no integer representation
nums = np.array([1,2,np.inf], dtype='float')

nums

array([ 1.,  2., inf])

In [54]:
# np.isnan, np.isinf

# sum(np.isinf(nums))
np.isinf(nums).sum()

1

In [62]:
nums2 = np.array([1,2,3,4,np.inf, np.nan])

In [64]:
nums2.dtype
np.isnan(nums2).sum()
np.isnan(nums2)

array([False, False, False, False, False,  True])

In [65]:
np.nan_to_num(nums2, nan=-1, posinf=1000, neginf=-1000)

array([   1.,    2.,    3.,    4., 1000.,   -1.])
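
Replacing values with `np.nan_to_num()` is one option. Two other common approaches, sketched here under the assumption that dropping or ignoring the bad values is acceptable for your data, are filtering with `np.isfinite()` and using NaN-aware reductions such as `np.nanmean()`.

```python
import numpy as np

values = np.array([1, 2, 3, 4, np.inf, np.nan])

# Keep only ordinary numbers (drops both inf and NaN).
finite_only = values[np.isfinite(values)]     # [1., 2., 3., 4.]

# A plain mean is poisoned by a single NaN.
plain_mean = values.mean()                    # nan

# NaN-aware reductions skip NaN entries (they do not skip inf).
with_gap = np.array([1.0, 2.0, np.nan])
nan_aware_mean = np.nanmean(with_gap)         # 1.5

print(finite_only, plain_mean, nan_aware_mean)
```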

### Statistical Operations
**Points to Cover:**
- Common statistical operations on NumPy arrays: `mean()`, `median()`, `std()`, `var()`, `min()`, `max()`, `sum()`, etc.
- Applying operations along different axes.
- Use of statistical functions in data preprocessing and feature scaling in machine learning.
- Understanding the role of these operations in evaluating model performance metrics.


In [69]:
d1

array([1, 2, 3])

In [79]:
np.mean(d1), d1.mean(), np.var(d1), d1.var()

(2.0, 2.0, 0.6666666666666666, 0.6666666666666666)

In [83]:
np.std(d1), d1.std(), np.min(d1), d1.min(), np.max(d1), d1.max()

(0.816496580927726, 0.816496580927726, 1, 1, 3, 3)

In [85]:
print(d2)
np.mean(d2), d2.mean(), np.var(d2), d2.var()

[[1 2 3]
 [4 5 6]]


(3.5, 3.5, 2.9166666666666665, 2.9166666666666665)

In [91]:
np.mean(d2, axis=0) # axis=0 collapses the rows: one mean per column
np.mean(d2, axis=1) # axis=1 collapses the columns: one mean per row

np.var(d2, axis=0)
np.var(d2, axis=1)

array([0.66666667, 0.66666667])
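
Only the last expression of the cell above is displayed, so the effect of `axis` is easy to miss. The sketch below prints each result for a fresh copy of the 2x3 array (named `m` here for illustration) and also shows `keepdims=True`, which keeps the reduced axis so the result broadcasts back against the original.

```python
import numpy as np

m = np.array([[1, 2, 3], [4, 5, 6]])

col_means = m.mean(axis=0)                 # one value per column: [2.5, 3.5, 4.5]
row_means = m.mean(axis=1)                 # one value per row:    [2., 5.]
col_sums = m.sum(axis=0)                   # [5, 7, 9]
overall_median = np.median(m)              # 3.5

# keepdims=True gives shape (2, 1) instead of (2,), so row-wise centering works.
row_means_2d = m.mean(axis=1, keepdims=True)
centered_rows = m - row_means_2d           # [[-1, 0, 1], [-1, 0, 1]]

print(col_means, row_means, col_sums, overall_median)
print(centered_rows)
```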

In [93]:
normalized_d2 = (d2-np.mean(d2, axis=0))/np.std(d2, axis=0)

In [94]:
normalized_d2

array([[-1., -1., -1.],
       [ 1.,  1.,  1.]])
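
The normalization above divides by the per-column standard deviation, which breaks when a column is constant (its standard deviation is zero). The helper below is a hypothetical sketch, not part of the original notebook, showing one way to guard against that; the name `standardize` and the `eps` threshold are assumptions.

```python
import numpy as np

def standardize(features, eps=1e-12):
    """Scale each column to zero mean and unit variance.

    Columns whose standard deviation is (near) zero are only centered,
    so they come out as all zeros instead of NaN or inf.
    """
    mean = features.mean(axis=0)
    std = features.std(axis=0)
    safe_std = np.where(std < eps, 1.0, std)
    return (features - mean) / safe_std

X = np.array([[1.0, 10.0, 5.0],
              [2.0, 20.0, 5.0],
              [3.0, 30.0, 5.0]])   # third column is constant

Z = standardize(X)
print(Z)
print(Z.mean(axis=0), Z.std(axis=0))
```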

### Shape, Reshape, Ravel, Flatten
**Points to Cover:**
- Understanding the shape of arrays using `shape`.
- Reshaping arrays using `reshape()` and `resize()`.
- Flattening arrays into 1D using `ravel()` and `flatten()`.
- Practical examples in reshaping datasets for model input in machine learning.
  

![image.png](attachment:image.png)

In [97]:
d21 = d2.ravel()

In [99]:
d2

array([[1, 2, 3],
       [4, 5, 6]])

In [100]:
d21[3] = 0  # ravel() returned a view here, so this also sets d2[1, 0] to 0

In [101]:
d21

array([1, 2, 3, 0, 5, 6])

In [104]:
d2.shape, d3.shape

((2, 3), (2, 2, 3))

In [108]:
d2.reshape(1, 6)

array([[1, 2, 3, 0, 5, 6]])

In [109]:
d21.shape

(6,)

In [110]:
d21f = d2.flatten()

In [112]:
d21f[5] = 0  # flatten() returned a copy, so d2 is not affected
d21f

array([1, 2, 3, 0, 5, 0])

In [113]:
d2

array([[1, 2, 3],
       [0, 5, 6]])
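
The cells above show that writing to the result of `ravel()` changed `d2`, while writing to the result of `flatten()` did not. A way to check this without modifying anything is `np.shares_memory()`. The sketch below uses a fresh array `m` (an illustrative name) and also shows a case where `ravel()` has to copy.

```python
import numpy as np

m = np.array([[1, 2, 3], [4, 5, 6]])

raveled = m.ravel()       # view when the data is already laid out contiguously
flattened = m.flatten()   # always a copy

ravel_shares = np.shares_memory(m, raveled)        # True
flatten_shares = np.shares_memory(m, flattened)    # False

# The transpose is not C-contiguous, so ravel() must copy in this case.
transpose_ravel_shares = np.shares_memory(m, m.T.ravel())   # False

print(ravel_shares, flatten_shares, transpose_ravel_shares)
```

In short, `ravel()` returns a view when it can and a copy when it must, so code that needs a guaranteed independent array should prefer `flatten()` or an explicit `.copy()`.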

In [None]:
# Image-like data:
# 100 images, each 28x28 pixels

In [118]:
images = np.random.rand(100, 28, 28)

In [120]:
images.shape

(100, 28, 28)

In [121]:
images = images.reshape(100, 28*28)
images.shape

(100, 784)
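
When reshaping batches like this, passing `-1` lets NumPy infer one dimension from the others, which avoids hard-coding `28*28`. The sketch below also illustrates `np.resize()`, mentioned in the list at the top of this section, which differs from `reshape()` in that it may repeat data to fill the new shape. The zero-filled `images` array stands in for real data.

```python
import numpy as np

images = np.zeros((100, 28, 28))

flat = images.reshape(len(images), -1)        # -1 is inferred as 784
restored = flat.reshape(-1, 28, 28)           # -1 is inferred as 100
with_channel = images.reshape(100, 28, 28, 1) # add a trailing channel axis

# reshape() refuses shapes whose total size does not match.
try:
    images.reshape(100, 500)
    mismatch_raised = False
except ValueError:
    mismatch_raised = True

# np.resize() returns a new array and repeats the input if more elements are needed.
repeated = np.resize(np.array([1, 2, 3]), (2, 3))   # [[1, 2, 3], [1, 2, 3]]

print(flat.shape, restored.shape, with_channel.shape, mismatch_raised)
print(repeated)
```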

### Sequence, Repetitions, and Random Numbers
**Points to Cover:**
- Generating sequences with `arange()` and `linspace()`.
- Repeating elements using `tile()` and `repeat()`.
- Generating random numbers with `random()`, `randint()`, `normal()`, and setting random seeds with `seed()`.
- Importance of random numbers in model initialization, data shuffling, and testing different scenarios in machine learning.


In [124]:
np.arange(1,10).reshape(3,3)

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [126]:
np.arange(1,10, 0.1)

array([1. , 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2. , 2.1, 2.2,
       2.3, 2.4, 2.5, 2.6, 2.7, 2.8, 2.9, 3. , 3.1, 3.2, 3.3, 3.4, 3.5,
       3.6, 3.7, 3.8, 3.9, 4. , 4.1, 4.2, 4.3, 4.4, 4.5, 4.6, 4.7, 4.8,
       4.9, 5. , 5.1, 5.2, 5.3, 5.4, 5.5, 5.6, 5.7, 5.8, 5.9, 6. , 6.1,
       6.2, 6.3, 6.4, 6.5, 6.6, 6.7, 6.8, 6.9, 7. , 7.1, 7.2, 7.3, 7.4,
       7.5, 7.6, 7.7, 7.8, 7.9, 8. , 8.1, 8.2, 8.3, 8.4, 8.5, 8.6, 8.7,
       8.8, 8.9, 9. , 9.1, 9.2, 9.3, 9.4, 9.5, 9.6, 9.7, 9.8, 9.9])

In [132]:
np.linspace(1,10, 91)

array([ 1. ,  1.1,  1.2,  1.3,  1.4,  1.5,  1.6,  1.7,  1.8,  1.9,  2. ,
        2.1,  2.2,  2.3,  2.4,  2.5,  2.6,  2.7,  2.8,  2.9,  3. ,  3.1,
        3.2,  3.3,  3.4,  3.5,  3.6,  3.7,  3.8,  3.9,  4. ,  4.1,  4.2,
        4.3,  4.4,  4.5,  4.6,  4.7,  4.8,  4.9,  5. ,  5.1,  5.2,  5.3,
        5.4,  5.5,  5.6,  5.7,  5.8,  5.9,  6. ,  6.1,  6.2,  6.3,  6.4,
        6.5,  6.6,  6.7,  6.8,  6.9,  7. ,  7.1,  7.2,  7.3,  7.4,  7.5,
        7.6,  7.7,  7.8,  7.9,  8. ,  8.1,  8.2,  8.3,  8.4,  8.5,  8.6,
        8.7,  8.8,  8.9,  9. ,  9.1,  9.2,  9.3,  9.4,  9.5,  9.6,  9.7,
        9.8,  9.9, 10. ])
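
The two outputs above differ in one important way: `arange()` excludes the stop value, while `linspace()` includes it by default and takes a number of points rather than a step. A short sketch comparing them (variable names are illustrative):

```python
import numpy as np

stepped = np.arange(1, 10, 0.1)                 # stop value 10 is excluded
spaced, step = np.linspace(1, 10, 91, retstep=True)   # stop value 10 is included

# endpoint=False makes linspace() behave like arange() with respect to the stop value.
spaced_open = np.linspace(1, 10, 90, endpoint=False)

print(stepped[-1], spaced[-1], step)
print(spaced.size, spaced_open.size)
```

With a fractional step, floating-point rounding can make the length of an `arange()` result hard to predict, so `linspace()` is usually the safer choice when the number of points matters.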

In [135]:
print(d1)
np.repeat(d1, 2)

[1 2 3]


array([1, 1, 2, 2, 3, 3])

In [136]:
np.tile(d1, 2)

array([1, 2, 3, 1, 2, 3])

In [140]:
d2

array([[1, 2, 3],
       [0, 5, 6]])

In [142]:
np.repeat(d2, 2, axis=1)

array([[1, 1, 2, 2, 3, 3],
       [0, 0, 5, 5, 6, 6]])

In [139]:
np.tile(d2, 2)

array([[1, 2, 3, 1, 2, 3],
       [0, 5, 6, 0, 5, 6]])
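
Both functions can also work along the row axis. The sketch below uses a fresh 2x3 array `m` (an illustrative name) to show `repeat()` with `axis=0` and `tile()` with a tuple of repetitions.

```python
import numpy as np

m = np.array([[1, 2, 3], [4, 5, 6]])

# repeat() duplicates each row in place: row 0, row 0, row 1, row 1.
rows_repeated = np.repeat(m, 2, axis=0)

# tile() stacks whole copies of the array: (2, 1) means 2 copies down, 1 across.
tiled_down = np.tile(m, (2, 1))

# (2, 2) tiles in both directions, giving shape (4, 6).
tiled_grid = np.tile(m, (2, 2))

print(rows_repeated)
print(tiled_down)
print(tiled_grid.shape)
```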

In [143]:
# Random number generation

from numpy import random

In [163]:
random.random((3,3))

array([[0.4753463 , 0.7572311 , 0.33699572],
       [0.29201093, 0.63430136, 0.15390651],
       [0.83500548, 0.05360529, 0.19261558]])

In [165]:
random.randint(0,100, size=(3,3))

array([[33, 24, 28],
       [95, 14, 59],
       [ 4, 73, 67]])

In [166]:
random.rand(5)

array([0.92462633, 0.53099471, 0.00914431, 0.25861502, 0.18459479])

In [176]:
x = random.normal(0,10, 1000)

In [177]:
x.std()

9.936000131224816

In [174]:
x.mean()

-0.004101604740558578

In [184]:
random.normal(0,10, 10)

array([  4.29804604,  -6.58550812,  -7.03004137,  -1.47357112,
        -6.65155323,   2.6635061 ,  13.78116472,   7.06820981,
       -21.7339405 ,   6.49432895])

In [194]:
random.seed(0)
random.normal(0,10, 10)

array([17.64052346,  4.00157208,  9.78737984, 22.40893199, 18.6755799 ,
       -9.7727788 ,  9.50088418, -1.51357208, -1.03218852,  4.10598502])
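
`random.seed()` sets a global state shared by every caller. NumPy also provides a `Generator` API through `np.random.default_rng()`, which keeps the state in an object you pass around. The sketch below is an alternative to the seeding shown above, not what the notebook itself uses; the seed value 42 is arbitrary.

```python
import numpy as np

# Two generators created with the same seed produce the same stream.
rng_a = np.random.default_rng(42)
rng_b = np.random.default_rng(42)

sample_a = rng_a.normal(loc=0, scale=10, size=5)
sample_b = rng_b.normal(loc=0, scale=10, size=5)

# Counterparts of randint() and shuffle-style operations on a Generator.
ints = rng_a.integers(0, 100, size=(3, 3))   # upper bound is exclusive
perm = rng_a.permutation(10)                 # shuffled copy of 0..9

print(np.array_equal(sample_a, sample_b))
print(ints)
print(perm)
```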

### where(), argmax(), argmin()
**Points to Cover:**
- Using `where()` to conditionally select elements from arrays.
- Finding the index of the maximum value using `argmax()`.
- Finding the index of the minimum value using `argmin()`.
- Applications in feature selection, anomaly detection, and optimizing model parameters.


In [204]:
d1 = np.arange(1,20)
d1

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17,
       18, 19])

In [199]:
np.where(d1>10, d1, 0)

array([ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0, 11, 12, 13, 14, 15, 16, 17,
       18, 19])
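
`np.where()` has a second calling form: with only a condition, it returns the indices where the condition holds instead of building a new array. The sketch below contrasts that with boolean masking; the array `a` mirrors the one above and the other names are illustrative.

```python
import numpy as np

a = np.arange(1, 20)

# One-argument form: a tuple of index arrays, one per dimension.
idx = np.where(a > 10)[0]          # positions 10..18

# Boolean masking returns the matching values directly.
selected = a[a > 10]               # values 11..19

# Three-argument form can also choose between two labels.
labels = np.where(a % 2 == 0, "even", "odd")

print(idx)
print(selected)
print(labels[:4])
```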

In [202]:
d1.max(), np.argmax(d1), d1[18]

(19, 18, 19)

In [205]:
d1.min(), np.argmin(d1)

(1, 0)

In [208]:
print(d2)
np.mean(d2, axis=0)

[[1 2 3]
 [0 5 6]]


array([0.5, 3.5, 4.5])

In [209]:
np.argmax(np.mean(d2, axis=0))

2

In [211]:
d2[:, 2]
d2[:, np.argmax(np.mean(d2, axis=0))]

array([3, 6])
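
On a 2D array, `argmax()` without an `axis` returns an index into the flattened array, which is a common source of confusion. The sketch below shows how to convert it to a (row, column) pair with `np.unravel_index()` and how `axis` changes the result; `m` holds the same values as `d2` at this point in the notebook.

```python
import numpy as np

m = np.array([[1, 2, 3], [0, 5, 6]])

flat_index = np.argmax(m)                          # 5: position in the flattened array
row, col = np.unravel_index(flat_index, m.shape)   # (1, 2)

per_column = np.argmax(m, axis=0)   # row index of the max in each column: [0, 1, 1]
per_row = np.argmax(m, axis=1)      # column index of the max in each row: [2, 2]

print(flat_index, (int(row), int(col)))
print(per_column, per_row)
```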

In [214]:
print(d3)

np.where(d3%2 !=0, d3, 0)

[[[ 1  2  3]
  [ 4  5  6]]

 [[ 7  8  9]
  [11 22 33]]]


array([[[ 1,  0,  3],
        [ 0,  5,  0]],

       [[ 7,  0,  9],
        [11,  0, 33]]])

### Save Preprocessed Arrays with File Read and Write
**Points to Cover:**
- Reading and writing arrays to and from files using `save()`, `load()`, `savetxt()`, `loadtxt()`.
- Understanding binary format with `.npy` and text format with `.txt`.
- Importance of file I/O in saving preprocessed data, model weights, and results.


In [216]:
d3p = np.where(d3%2 !=0, d3, 0)

In [222]:
# .npy is NumPy's binary file format

import os
os.makedirs('data', exist_ok=True)

In [223]:
np.save('data/d3p.npy', d3p)

In [228]:
for i, d3_2 in enumerate(d3p):
    print(d3_2)
    np.savetxt(f'data/d3p_slice_{i}.txt', d3_2)

[[1 0 3]
 [0 5 0]]
[[ 7  0  9]
 [11  0 33]]


In [229]:
np.load('data/d3p.npy')

array([[[ 1,  0,  3],
        [ 0,  5,  0]],

       [[ 7,  0,  9],
        [11,  0, 33]]])

In [231]:
np.load('data/d3p.npy').shape

(2, 2, 3)

In [233]:
s1 = np.loadtxt('data/d3p_slice_0.txt')
s2 = np.loadtxt('data/d3p_slice_1.txt')

In [235]:
np.array([s1, s2]).shape

(2, 2, 3)
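
Two details of the cells above are worth spelling out. `savetxt()` only accepts 1D or 2D arrays, which is why the 3D array was written slice by slice, and it writes floats by default, so `loadtxt()` returns `float64` unless told otherwise. The sketch below shows integer round-tripping and `np.savez()` for storing several arrays in one file. It writes to a temporary directory, and the file and key names are made up for illustration.

```python
import os
import tempfile

import numpy as np

arr = np.array([[1, 0, 3], [0, 5, 0]])

with tempfile.TemporaryDirectory() as tmp:
    # Text format: write integers as integers and read them back as integers.
    txt_path = os.path.join(tmp, "arr.txt")
    np.savetxt(txt_path, arr, fmt="%d")
    as_float = np.loadtxt(txt_path)              # default dtype is float64
    as_int = np.loadtxt(txt_path, dtype=int)

    # .npz bundles several named arrays into a single file.
    npz_path = os.path.join(tmp, "bundle.npz")
    np.savez(npz_path, features=arr, labels=np.array([0, 1]))
    with np.load(npz_path) as bundle:
        features = bundle["features"]
        labels = bundle["labels"]

print(as_float.dtype, as_int.dtype)
print(features, labels)
```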

### Concatenate and Sorting
**Points to Cover:**
- Concatenating arrays using `concatenate()`, `vstack()`, `hstack()`.
- Sorting arrays using `sort()`, `argsort()`.
- Practical uses in combining datasets, sorting features, and data preprocessing.


In [243]:
d11 = np.arange(1, 5)
d12 = np.arange(5, 9)

d11, d12

(array([1, 2, 3, 4]), array([5, 6, 7, 8]))

In [244]:
np.concatenate((d11, d12))

array([1, 2, 3, 4, 5, 6, 7, 8])

In [245]:
np.hstack((d11, d12))

array([1, 2, 3, 4, 5, 6, 7, 8])

In [246]:
np.vstack((d11, d12))

array([[1, 2, 3, 4],
       [5, 6, 7, 8]])
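
For 2D inputs, `concatenate()` takes an `axis` argument, and the shapes must match along every other axis. A brief sketch with two small illustrative arrays:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])   # shape (2, 2)
b = np.array([[5, 6]])           # shape (1, 2)

# axis=0 appends rows; column counts must match.
rows_added = np.concatenate((a, b), axis=0)       # shape (3, 2)

# axis=1 appends columns; row counts must match, so b is transposed to (2, 1).
cols_added = np.concatenate((a, b.T), axis=1)     # shape (2, 3)

# vstack and hstack are shorthands for these two cases on 2D arrays.
same_as_vstack = np.array_equal(rows_added, np.vstack((a, b)))
same_as_hstack = np.array_equal(cols_added, np.hstack((a, b.T)))

print(rows_added)
print(cols_added)
```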

In [249]:
np.random.shuffle(d1)  # shuffles d1 in place and returns None

In [250]:
d1

array([19, 17, 11,  3, 18, 14, 12,  1,  4,  5, 13, 16, 10,  6,  2,  9, 15,
        8,  7])

In [251]:
np.sort(d1)

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17,
       18, 19])

In [254]:
np.sort(d1)[::-1]

array([19, 18, 17, 16, 15, 14, 13, 12, 11, 10,  9,  8,  7,  6,  5,  4,  3,
        2,  1])

In [255]:
d1

array([19, 17, 11,  3, 18, 14, 12,  1,  4,  5, 13, 16, 10,  6,  2,  9, 15,
        8,  7])

In [256]:
np.argsort(d1)

array([ 7, 14,  3,  8,  9, 13, 18, 17, 15, 12,  2,  6, 10,  5, 16, 11,  1,
        4,  0], dtype=int64)
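
The indices returned by `argsort()` are mainly useful for reordering other arrays in the same way, for example ranking labels by score. The sketch below uses made-up `scores` and `names` arrays to show this, along with sorting a 2D array along an axis.

```python
import numpy as np

scores = np.array([0.2, 0.9, 0.5])
names = np.array(["a", "b", "c"])

# Ascending order of scores, reversed to get highest first.
order = np.argsort(scores)[::-1]      # [1, 2, 0]
ranked_names = names[order]           # ['b', 'c', 'a']
ranked_scores = scores[order]         # [0.9, 0.5, 0.2]

# On 2D arrays, sort() works independently along the chosen axis.
m = np.array([[3, 1, 2], [9, 7, 8]])
sorted_rows = np.sort(m, axis=1)      # [[1, 2, 3], [7, 8, 9]]

print(order, ranked_names, ranked_scores)
print(sorted_rows)
```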

### Working with Dates
**Points to Cover:**
- Handling dates in NumPy using `datetime64` and `timedelta64`.
- Creating date ranges and performing date arithmetic.
- Practical examples in time-series data analysis, including generating date indices and calculating time differences.


In [258]:
np.datetime64('2024-01-04')

numpy.datetime64('2024-01-04')

In [264]:
a1 = np.array(['2024-09-01', '2024-10-01', '2024-11-01'], dtype='datetime64')

In [263]:
a1[0]-a1[1]

numpy.timedelta64(-30,'D')

In [266]:
a1[2] - a1[1]

numpy.timedelta64(31,'D')
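
Subtracting two dates gives a `timedelta64`, and adding a `timedelta64` to a date gives another date. The sketch below shows both directions and how to turn a difference into a plain number; the dates are chosen for illustration.

```python
import numpy as np

day = np.datetime64("2024-02-28")
next_day = day + np.timedelta64(1, "D")       # 2024-02-29 (2024 is a leap year)

gap = np.datetime64("2024-11-01") - np.datetime64("2024-09-01")   # 61 days

# Dividing by a unit timedelta converts the difference to a float.
gap_in_days = gap / np.timedelta64(1, "D")    # 61.0

print(next_day, gap, gap_in_days)
```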

In [270]:
a2 = np.arange('2024-09-01', '2025-08-31', dtype='datetime64[D]')  # stop date is exclusive

In [272]:
data = np.random.random(len(a2))

In [276]:
a2.shape, data.shape

((364,), (364,))
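
The range has 364 entries because `arange()` excludes the stop date, so 2025-08-31 itself is missing. The sketch below shows one way to include it, builds a monthly range, and selects part of a series with a date mask. The `values` array is placeholder data and the variable names are illustrative.

```python
import numpy as np

start = np.datetime64("2024-09-01")
end = np.datetime64("2025-08-31")

# Add one day to the stop value so the end date is included.
days = np.arange(start, end + np.timedelta64(1, "D"), dtype="datetime64[D]")

# Month-resolution range: September 2024 through August 2025.
months = np.arange("2024-09", "2025-09", dtype="datetime64[M]")

# Pair the dates with data and select by date using a boolean mask.
values = np.zeros(len(days))
in_2025 = days >= np.datetime64("2025-01-01")
values_2025 = values[in_2025]

print(days.shape, days[0], days[-1])
print(months.shape, values_2025.shape)
```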