#### NumPy ndarray Overview

NumPy provides the **ndarray**: a fixed-type, N-dimensional container for numerical data that stores values in contiguous memory.

Why it is faster than a Python list:
- Fixed type = No per-element type checking at run time
- Contiguous memory = CPU reads neighbouring values sequentially, which is cache-friendly
- C language loops = The per-element loop runs in compiled code, not in the Python interpreter
- SIMD operations = Several values can be processed per CPU instruction
- Less memory overhead = More data fits in fast cache
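
The memory-overhead point can be checked with a rough sketch like the one below. It assumes 64-bit CPython; the names `values`, `list_bytes` and `array_bytes` are illustrative, and the list figure is only an estimate because small integers are shared objects.

```python
import sys
import numpy as np

n = 1000
values = list(range(n))
arr = np.arange(n, dtype=np.int64)

# The list stores n pointers; every element is a separate Python int object.
list_bytes = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)

# The array stores the raw 8-byte values back to back.
array_bytes = arr.nbytes

print(list_bytes, array_bytes)
```

On a typical build the list needs several times more memory than the array for the same numbers.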

##### Why It Matters

- **Speed**: Numerical operations are implemented in C/Fortran; vectorized operations avoid Python loops and their overhead.
- **Memory Efficiency**: Homogeneous storage (all elements same dtype) reduces memory per item compared to Python lists.
- **Convenience**: Broadcasted arithmetic, linear algebra, random generation, reductions, reshaping — all made consistent.
- **Ecosystem**: pandas, SciPy, scikit-learn, matplotlib all expect NumPy arrays.

##### When to Use

Numerical computing, vectorized math over many elements, matrix computations, preprocessing large numeric datasets.

In [68]:
import numpy as np
from random import random 

##### NumPy shines when there are large quantities of “homogeneous” (same-type) data to be processed on the CPU.

Most NumPy arrays have some restrictions. For instance:

- All elements of the array must be of the same type of data.

- Once created, the total size of the array can’t change.

- The shape must be “rectangular”, not “jagged”; e.g., each row of a two-dimensional array must have the same number of columns.
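
A small sketch of the first two restrictions, using made-up values: mixed inputs are upcast to a single dtype, and "growing" an array actually produces a new one.

```python
import numpy as np

# Mixed ints and floats are upcast to one common dtype.
mixed = np.array([1, 2, 3.5])
print(mixed.dtype)  # float64

# np.append builds a new array; the original keeps its size.
base = np.array([1, 2, 3])
grown = np.append(base, 4)
print(base.size, grown.size)  # 3 4
```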

In [2]:
a = np.array([1, 2, 3, 4, 5, 6])
a

array([1, 2, 3, 4, 5, 6])

If the dtype is 8 bytes and the shape is (2, 3) (2 rows, 3 columns), then:

- strides[1] = 8 bytes to move one column (next element in a row).
- strides[0] = 3 * 8 = 24 bytes to move one row (skip 3 elements).
- So the offset of index (i, j) from the base pointer = i*strides[0] + j*strides[1].
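
The offset formula can be verified with a short sketch. `byte_offset` is a hypothetical helper written for this illustration, not a NumPy function, and the example assumes a C-ordered 2-D array.

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]], dtype=np.int64)

def byte_offset(arr, i, j):
    """Byte offset of arr[i, j] from the start of the buffer (hypothetical helper)."""
    return i * arr.strides[0] + j * arr.strides[1]

raw = a.tobytes()                # C-order copy of the underlying bytes
offset = byte_offset(a, 1, 2)    # 1 * 24 + 2 * 8
value = np.frombuffer(raw, dtype=np.int64, count=1, offset=offset)[0]
print(offset, value)  # 40 6
```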

In [3]:
a = np.array([[1, 2, 3],
             [4, 5, 6]], dtype=np.int64)

print(a.shape)
print(a.ndim)
print(a.strides)
print(a.size)
print(a.dtype)
print(a.nbytes)
print(a.itemsize)

(2, 3)
2
(24, 8)
6
int64
48
8


In [246]:
elements = np.linspace(0, 10, num=15) 
print(elements)
elements = np.zeros((2,3))   
print(elements)
elements = np.ones((2,3))   
print(elements)
np.arange(0, 10, 2)    


[ 0.          0.71428571  1.42857143  2.14285714  2.85714286  3.57142857
  4.28571429  5.          5.71428571  6.42857143  7.14285714  7.85714286
  8.57142857  9.28571429 10.        ]
[[0. 0. 0.]
 [0. 0. 0.]]
[[1. 1. 1.]
 [1. 1. 1.]]


array([0, 2, 4, 6, 8])

In [5]:
elements = np.full((3,3), 7) 
print(elements) 
elements = np.identity(3) 
print(elements) 
np.eye(3)
elements = np.diag([1,2,3])   
print(elements)



[[7 7 7]
 [7 7 7]
 [7 7 7]]
[[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]
[[1 0 0]
 [0 2 0]
 [0 0 3]]


Use .astype() to cast: a.astype(np.float32). Casting creates a new array.
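
A minimal sketch of what "creates a new array" means in practice; the values are illustrative.

```python
import numpy as np

a = np.array([1, 2, 3], dtype=np.int64)
b = a.astype(np.float32)

b[0] = 99.0  # changes only the cast copy
print(a, b, np.shares_memory(a, b))

# Casting float -> int truncates toward zero rather than rounding.
truncated = np.array([1.9, -1.9]).astype(np.int64)
print(truncated)  # [ 1 -1]
```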

In [247]:
x = np.arange(0, 11, 2) 
print(x[2:7])
print(x[::2])

[ 4  6  8 10]
[0 4 8]


In [7]:
components = np.arange(12).reshape(3, 4)
print(components,"\n")
print(components[0:3, 1:3])

[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]] 

[[ 1  2]
 [ 5  6]
 [ 9 10]]


##### Basic slicing returns a view, not a copy; integer-array ("fancy") and boolean indexing return copies.

In [8]:
arr = np.arange(10)
s = arr[2:6] # .copy()
s[0] = 40

arr

array([ 0,  1, 40,  3,  4,  5,  6,  7,  8,  9])

@ or np.dot() is matrix multiplication for 2D arrays; for 1D arrays, as in the next cell, both compute the inner (dot) product.

In [9]:
a = np.array([1,2,3])
b = np.array([10,20,30])
print(a + b)
print(a * 2)
print(a @ b)
print(np.dot(a, b))

print(np.sin(a))

[11 22 33]
[2 4 6]
140
140
[0.84147098 0.90929743 0.14112001]


- _Vectorization_: instead of writing Python loops, use array ops — NumPy does the *heavy lifting in C*. This is the primary reason for speed-ups.

- _Ufuncs_: fast element-wise functions (np.add, np.sin, np.exp, np.maximum, etc.). They often have extra methods like .reduce, .accumulate
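
The ufunc methods mentioned above can be sketched as follows; the array `a` is just sample data.

```python
import numpy as np

a = np.array([1, 2, 3, 4])

total = np.add.reduce(a)            # same result as a.sum()
running = np.add.accumulate(a)      # same result as np.cumsum(a)
largest = np.maximum.reduce(a)      # same result as a.max()
pairwise = np.maximum(a, a[::-1])   # element-wise maximum of two arrays

print(total, running, largest, pairwise)
```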

In [254]:
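# Pitfall: a set is not a sequence, so NumPy stores it whole in a 0-d object array.
# Convert first, e.g. np.array(sorted({1, 2, 3, 4})), to get a numeric array.
arr = np.array(
    set(([1,2,3,4]))
)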
print(arr)


{1, 2, 3, 4}


In [11]:
A = np.ones((3,4))
print(A)
b = np.arange(4)
print(b)
A + b 

[[1. 1. 1. 1.]
 [1. 1. 1. 1.]
 [1. 1. 1. 1.]]
[0 1 2 3]


array([[1., 2., 3., 4.],
       [1., 2., 3., 4.],
       [1., 2., 3., 4.]])

In [12]:
arr = np.arange(10)
s = arr[2:5]
s.base is arr

True

- Slicing is a view — modifying a slice modifies the original.

- np.append is inefficient — it creates a new array repeatedly (not like list.append). Use lists and then np.array() or pre-allocate and fill.

- np.arange with floats can give surprising endpoints due to float rounding; use linspace when you need an exact number of points.

- Integer division vs true division: / always yields floats (true division); // yields floor division and keeps an integer dtype when both operands are integers.

- Boolean indexing returns a copy, not a view. E.g. b = a[a>0] is a copy.

- Non-contiguous arrays (e.g., transposed views) may be slower; np.ascontiguousarray() can help.

- Data type overflow: integer arrays wrap on overflow (no error). Be mindful of dtype range.

- Using object dtype kills vectorization; avoid unless necessary.
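
Three of the pitfalls above (view vs copy, boolean-mask copies, silent integer wrap-around) can be reproduced with a short sketch. The arrays are made-up examples.

```python
import numpy as np

a = np.array([-1, 2, -3, 4])

# Basic slice: a view that shares memory with a.
view = a[1:3]
# Boolean mask: a copy with its own memory.
positives = a[a > 0]
positives[0] = 99  # does not touch a
print(np.shares_memory(a, view), np.shares_memory(a, positives), a)

# int8 holds -128..127; adding past the top wraps around silently.
small = np.array([127], dtype=np.int8)
wrapped = small + np.array([1], dtype=np.int8)
print(wrapped)  # [-128]
```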

In [13]:
lst = [1,2,3]
arr = np.asarray(lst)
arr2 = np.asarray(arr) 
print(arr)
print(arr2)

[1 2 3]
[1 2 3]


In [14]:

elements = np.linspace(0, 10, num=15)
rounded_elements = np.round(elements, 2)
print(rounded_elements)



[ 0.    0.71  1.43  2.14  2.86  3.57  4.29  5.    5.71  6.43  7.14  7.86
  8.57  9.29 10.  ]


In [15]:
arr1 = np.ones((3,3))
np.fill_diagonal(arr1, 0)
print(arr1)

[[0. 1. 1.]
 [1. 0. 1.]
 [1. 1. 0.]]


In [16]:
elements = np.ones([3, 3]) 
print(elements * np.diag([1, 1, 1]))

[[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]


### Contiguous Memory (Data Stored Together)
Python List:
Memory: [pointer] -> [1] somewhere in memory
        [pointer] -> [2] somewhere else
        [pointer] -> [3] somewhere else

List items are scattered in computer memory
Computer must "jump around" to find each number
Like books scattered in different rooms of a house

NumPy Array:
Memory: [1][2][3][4][5] <- All together in one place

All numbers stored RIGHT NEXT TO each other
Computer reads them in one smooth motion
Like books arranged on one shelf - grab them all at once!

###  Vectorization (No Loops in Python)
Python List (with loop):
result = []
for num in [1, 2, 3, 4]:
    result.append(num + 5)  # Python interpreter runs this 4 times

Python must interpret each loop step
Interpretation is SLOW

NumPy Array (vectorized):
result = np.array([1, 2, 3, 4]) + 5  # The loop over elements runs in C code

The loop runs in C language (super fast)
Python just gives the order once
Like asking 1 expert to do 1000 tasks vs. explaining 1000 times

Why C is faster:

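C is a "compiled" language (converted to machine code ahead of time)
Python is "interpreted" (CPython compiles to bytecode, then executes it one instruction at a time with dynamic type checks)
Compiled C code runs directly on the CPU; Python bytecode runs inside the interpreter loop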
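
The two approaches can be compared with a rough timing sketch. The function names are hypothetical, the array size is arbitrary, and the measured times depend on the machine, so no specific speed-up is claimed here.

```python
import timeit
import numpy as np

data = list(range(100_000))
arr = np.arange(100_000)

def add_five_loop(values):
    """Interpreted loop: one trip through the interpreter per element."""
    result = []
    for num in values:
        result.append(num + 5)
    return result

def add_five_vectorized(values):
    """Single call; the per-element loop runs in compiled code."""
    return values + 5

loop_time = timeit.timeit(lambda: add_five_loop(data), number=10)
vec_time = timeit.timeit(lambda: add_five_vectorized(arr), number=10)
print(f"loop: {loop_time:.4f}s  vectorized: {vec_time:.4f}s")
```

Both functions produce the same numbers; only where the loop runs differs.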

Reshaping means changing the shape of an array.

- The shape of an array is the number of elements in each dimension.

- By reshaping we can add or remove dimensions or change the number of elements in each dimension, as long as the total number of elements stays the same.
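
A small sketch of the rule that a reshape must preserve the element count; `reshape_failed` is just a flag for the illustration.

```python
import numpy as np

arr = np.arange(12)

grid = arr.reshape(3, 4)   # 3 * 4 == 12, so this works
print(grid.shape, np.shares_memory(arr, grid))  # usually a view, not a copy

reshape_failed = False
try:
    arr.reshape(5, 3)      # 5 * 3 != 12
except ValueError as exc:
    reshape_failed = True
    print("reshape failed:", exc)
```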

In [197]:
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])

newarr = arr.reshape(3, 2, 2)
print(newarr)
newarr = arr.reshape(3, 4)
print(newarr)


[[[ 1  2]
  [ 3  4]]

 [[ 5  6]
  [ 7  8]]

 [[ 9 10]
  [11 12]]]
[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]


You are allowed to have one "unknown" dimension: pass -1 for it in the reshape method, and NumPy infers its size from the array length and the other dimensions.

In [198]:

arr = np.array([1, 2, 3, 4, 5, 6, 7, 8])

newarr = arr.reshape(2, -1, 2)

print(newarr)

[[[1 2]
  [3 4]]

 [[5 6]
  [7 8]]]


**Flattening** an array means converting a multidimensional array into a 1D array.

We can use reshape(-1) to do this.

In [199]:
arr = np.array([[1, 2, 3], [4, 5, 6]])

newarr = arr.reshape(-1)

print(newarr)

[1 2 3 4 5 6]


In [200]:
arr = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])

for value in np.nditer(arr):
    print(value)

1
2
3
4
5
6
7
8


In [201]:
arr = np.array([1, 2, 3])

for idx, i in np.ndenumerate(arr):
    print(idx, i)

(0,) 1
(1,) 2
(2,) 3


In [202]:
arr = np.array([1, 2, 3, 4, 5, 6])

newarr = np.array_split(arr, 3)

print(newarr)

[array([1, 2]), array([3, 4]), array([5, 6])]


In [203]:
arr = np.array([[1, 2], [3, 4], [5, 6], [7, 8], [9, 10], [11, 12]])

newarr = np.array_split(arr, 3)

print(newarr)


[array([[1, 2],
       [3, 4]]), array([[5, 6],
       [7, 8]]), array([[ 9, 10],
       [11, 12]])]


In [204]:
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8])
x = np.where(arr%2 == 0)
print(x)

(array([1, 3, 5, 7]),)


In [205]:
arr = np.array(['banana', 'cherry', 'apple'])

print(np.sort(arr))

['apple' 'banana' 'cherry']


In [206]:
arr = np.array([41, 42, 43, 44])

filter_arr = arr > 42

newarr = arr[filter_arr]

print(filter_arr)
print(newarr)

[False False  True  True]
[43 44]


In [207]:
A = np.array([[1, 1], [4, 0]])
I = np.eye(2)
result = A @ I  
print(result)  

[[1. 1.]
 [4. 0.]]


In [208]:
rng = np.random.default_rng()
arr = rng.random(5)
arr

array([0.56742193, 0.13946793, 0.99172281, 0.69080111, 0.05474674])

In [209]:
arr = np.random.randint(1, 10, size=(2, 4))
print(arr)

[[6 6 2 9]
 [1 1 4 8]]


In [210]:
arr = np.random.normal(loc=50, scale=50, size=5)
arr

array([ 62.1926485 , 146.47461183,  63.95246305,  83.77833376,
        48.67497061])

In [211]:
# np.empty allocates memory without initialising it, so the printed values are arbitrary
arr1 = np.empty(3)
print(arr1)

[1. 1. 4.]


Transpose

In [212]:
arr = np.array([[1, 2, 3], 
                [4, 5, 6]])

arr.T

array([[1, 4],
       [2, 5],
       [3, 6]])

In [213]:
m = np.matrix([[1,2],[3,4]])
np.asarray(m)     # converts to a base ndarray
print(m)
np.asanyarray(m)  # passes ndarray subclasses such as matrix through unchanged

[[1 2]
 [3 4]]


matrix([[1, 2],
        [3, 4]])

Boolean Indexing (Filtering)

In [214]:
arr = np.array([1, 2, 3, 4, 5])
arr[arr > 3] = 0
print(arr)

[1 2 3 0 0]


In [215]:
arr = np.array([5, 10, 15, 20, 25, 30])

# AND condition (&)
result = arr[(arr > 10) & (arr < 25)]
print(result)  

# OR condition (|)
result = arr[(arr < 10) | (arr > 25)]
print(result)  

# NOT condition (~)
result = arr[~(arr == 20)]
print(result)  

[15 20]
[ 5 30]
[ 5 10 15 25 30]


In [216]:
arr = np.array([[1, 2, 3],
                [4, 5, 6],
                [7, 8, 9]])


result = arr[arr > 6]
print(result) 


[7 8 9]


In [217]:
arr = np.array([[10, 20, 30],
                [40, 50, 60],
                [70, 80, 90]])

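# Fancy indexing: pair each row index with a column index to select the diagonal
rows = np.arange(arr.shape[0])
cols = [0, 1, 2]
arr[rows, cols] = 0
print(arr)

[[ 0 20 30]
 [40  0 60]
 [70 80  0]]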

3D and Higher Dimensional Indexing

In [218]:
arr = np.arange(24).reshape(2, 3, 4)
print(arr[:, :, 1])
print(arr)

flattened = arr.flatten()
print(flattened)

[[ 1  5  9]
 [13 17 21]]
[[[ 0  1  2  3]
  [ 4  5  6  7]
  [ 8  9 10 11]]

 [[12 13 14 15]
  [16 17 18 19]
  [20 21 22 23]]]
[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23]


### WHY reshape exists:

- Prepare data for different operations
- Matrix multiplication needs specific shapes
- Neural networks need specific input shapes
- Image processing (flatten/unflatten images)
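
As an illustration of the image case, here is a sketch with a hypothetical batch of 10 grayscale images of 28 x 28 pixels (the sizes are assumptions chosen for the example).

```python
import numpy as np

# Hypothetical batch: 10 grayscale images of 28 x 28 pixels.
images = np.zeros((10, 28, 28), dtype=np.uint8)

flat = images.reshape(len(images), -1)   # one row of 784 pixels per image
restored = flat.reshape(-1, 28, 28)      # back to the image layout
print(flat.shape, restored.shape)
```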

In [219]:
arr = np.array([[1, 2, 3],
                [4, 5, 6]])

print(arr.ravel())  

print(arr.ravel(order='F')) 

[1 2 3 4 5 6]
[1 4 2 5 3 6]


In [220]:
A = np.array([[1],
              [3],
              [4]])  

B = np.array([[5, 6, 7]])

print(np.dot(A.T, B.T))
result = A.T @ B.T
print(result)
print(np.matmul(A.T, B.T))

[[51]]
[[51]]
[[51]]


In [221]:
existing = np.array([[1, 2, 3], [4, 5, 6]])

arr = np.ones_like(existing)
print(arr)


[[1 1 1]
 [1 1 1]]


#### FromString

In [222]:
arr = np.fromstring("1-2-3-4-5", sep='-', dtype=int)
print(arr) 

[1 2 3 4 5]


In [223]:
# The argument is the output shape: (0, 1, 2, 3, 4) contains a zero, so the result is empty
np.random.random(np.arange(5))

array([], shape=(0, 1, 2, 3, 4), dtype=float64)

##### dtype - Data Type
This is SUPER IMPORTANT because it affects memory, speed, and precision!

In [224]:
# Reason 1: Memory Usage
arr_int8 = np.ones(1_000_000, dtype=np.int8)
arr_int64 = np.ones(1_000_000, dtype=np.int64)

print(arr_int8.nbytes)  
print(arr_int64.nbytes) 

1000000
8000000


In [225]:
# Reason 2: Precision
arr_f32 = np.array([1.123456789], dtype=np.float32)
print(arr_f32)  

arr_f64 = np.array([1.123456789], dtype=np.float64)
print(arr_f64) 

[1.1234568]
[1.12345679]


In [226]:
#  Converting existing array
arr = np.array([1, 2, 3])
arr2 = arr.astype(np.float64)
arr2

array([1., 2., 3.])


#### WHY transpose matters:

In [231]:
A = np.array([[1, 2],
              [3, 4]]) 

B = np.array([[5, 6]])

A @ B.T

array([[17],
       [39]])

In [None]:
            #   Maths Science English
scores = np.array([[85, 90, 78],  # Student 1
                   [92, 88, 95],  # Student 2
                   [78, 85, 82]]) # Student 3

subject_scores = scores.T

math_avg = subject_scores[0].mean()
print(math_avg)

85.0


In [None]:
arr = np.arange(24).reshape(2, 3, 4)
print("arr shape",arr.shape)  
print(arr, "\n")

transposed = arr.transpose(1, 0, 2)
print("transposed arr shape ",transposed.shape) 
print(transposed)

arr shape (2, 3, 4)
[[[ 0  1  2  3]
  [ 4  5  6  7]
  [ 8  9 10 11]]

 [[12 13 14 15]
  [16 17 18 19]
  [20 21 22 23]]] 

transposed arr shape  (3, 2, 4)
[[[ 0  1  2  3]
  [12 13 14 15]]

 [[ 4  5  6  7]
  [16 17 18 19]]

 [[ 8  9 10 11]
  [20 21 22 23]]]


In [None]:
a = [[1, 0], [0, 1]] 
b = [[4, 1], [2, 2]]
np.dot(a, b)

array([[4, 1],
       [2, 2]])

In [None]:
a = np.arange(3*4*5*6).reshape((3,4,5,6))
b = np.arange(3*4*5*6)[::-1].reshape((5,4,6,3))
print(np.dot(a, b)[2,3,2,1,2,2])
print(sum(a[2,3,2,:] * b[1,2,:,2]))
# print(b)

499128
499128


WHY axis parameter:

- axis=0: Join along rows (the result has more rows)
- axis=1: Join along columns (the result has more columns)
- Higher dimensions: same logic

CRITICAL: Shapes must match on every axis except the concatenation axis!
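
A minimal sketch of the shape rule, using made-up arrays: joining along axis=0 works when the column counts agree, and joining the same arrays along axis=1 fails because the row counts differ.

```python
import numpy as np

a = np.ones((2, 3))
b = np.zeros((4, 3))

rows = np.concatenate([a, b], axis=0)   # column counts match (3 and 3)
print(rows.shape)  # (6, 3)

try:
    np.concatenate([a, b], axis=1)      # row counts differ (2 vs 4)
    mismatch_raised = False
except ValueError:
    mismatch_raised = True
print(mismatch_raised)
```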

In [None]:
arr1 = np.ones((2, 3))  
arr2 = np.ones((2, 4))    
print(arr1,"\n\n", arr2,"\n")
result = np.concatenate([arr1, arr2], axis=1)
print(result.shape)  

[[1. 1. 1.]
 [1. 1. 1.]] 

 [[1. 1. 1. 1.]
 [1. 1. 1. 1.]] 

(2, 7)


### matmul differs from dot in two important ways:

- Multiplication by scalars is not allowed, use * instead.

- Stacks of matrices are broadcast together as if the matrices were elements, respecting the signature (n,k),(k,m)->(n,m):

#### matmul automatically detects that the first dimension (3) is a “stack” dimension

- It performs matrix multiplication for each “stack”

- This is batch matrix multiplication — a powerful feature often used in ML.
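
Both differences can be sketched with random sample data. The batched result is compared against an explicit loop over the stack, and a scalar operand is shown to be rejected; `scalar_rejected` is only a flag for the illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.random((3, 2, 4))
c = rng.random((3, 4, 2))

batched = np.matmul(a, c)                            # shape (3, 2, 2)
looped = np.stack([a[i] @ c[i] for i in range(3)])   # one matrix product per stack entry
print(batched.shape, np.allclose(batched, looped))

try:
    np.matmul(a, 2)          # scalars are not allowed; use a * 2 instead
    scalar_rejected = False
except ValueError:
    scalar_rejected = True
print(scalar_rejected)
```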

#### Rule for np.dot(a, b) when > 2D:

The sum-product is over the last axis of a and the second-to-last axis of b,
and no broadcasting of extra axes occurs.

So:

- a: (3, 2, 4) → last axis = 4

- c: (3, 4, 2) → second-to-last axis = 4 ✅ (they match)

Then, result shape =

- all of a's axes except the last, followed by all of c's axes except the second-to-last

→ (3, 2) + (3, 2) = (3, 2, 3, 2)
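
The rule can be checked element by element with random sample data: entry [i, j, k, m] of the result should equal the sum-product of a[i, j, :] and c[k, :, m]. The index values below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.random((3, 2, 4))
c = rng.random((3, 4, 2))

full = np.dot(a, c)
print(full.shape)  # (3, 2, 3, 2)

i, j, k, m = 2, 1, 0, 1
by_hand = np.sum(a[i, j, :] * c[k, :, m])
print(np.isclose(full[i, j, k, m], by_hand))
```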

In [256]:
a = np.ones([3, 2, 4])
c = np.ones([3, 4, 2])
print(np.dot(a, c).shape)
print(np.matmul(a, c).shape)
# print("With dot: ",np.dot(a, c), "\n\n")
# print("With matmul",np.matmul(a, c))

a = np.array(4)
b = np.array(5)
print(a * b)  # a @ b would raise a ValueError: matmul does not accept scalars

(3, 2, 3, 2)
(3, 2, 2)
20


In [243]:

arr = np.array([[[1, 2], [3, 4]],
                [[5, 6], [7, 8]]])

total = np.sum(arr, axis=2)  # named total so the built-in sum() is not shadowed
print(total)

[[ 3  7]
 [11 15]]


Alignment: In the cell below, the shapes of the two arrays are (4, 3) and (3,).
- The NumPy broadcasting rules compare dimensions from right to left.
- Dimension matching: Since arr2 has fewer dimensions, its shape is conceptually padded with a "1" on the left, becoming (1, 3).
- Stretching: The first dimension of arr2 is then "stretched" from size 1 to size 4 to match the first dimension of arr1. This is done virtually, without copying the data.
- Resulting operation: arr2 behaves like a (4, 3) array in which every row is [1, 2, 3], and it is added to arr1 element by element.

In [None]:
arr1 = np.array([
    [0, 0, 0],
    [10, 10, 10],
    [20, 20, 20],
    [30, 30, 30]
])  

arr2 = np.array([1, 2, 3])  
result = arr1 + arr2
print(result)

[[ 1  2  3]
 [11 12 13]
 [21 22 23]
 [31 32 33]]
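
The virtual "stretching" step can be made visible with np.broadcast_to, as in this sketch. The int64 dtype is set explicitly so the stride values are predictable.

```python
import numpy as np

arr2 = np.array([1, 2, 3], dtype=np.int64)

stretched = np.broadcast_to(arr2, (4, 3))   # read-only view, no data copied
print(stretched)
print(stretched.strides)                    # (0, 8): moving down a row advances 0 bytes
print(np.shares_memory(arr2, stretched))
```

A stride of 0 along the first axis means every row reads the same three values from memory.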


In [233]:
arr = np.ones((1, 3, 1, 4))
print(arr.shape) 

squeezed = arr.squeeze(axis=0)
print(squeezed.shape) 


squeezed = arr.squeeze(axis=(0, 2))
print(squeezed.shape) 

(1, 3, 1, 4)
(3, 1, 4)
(3, 4)


Masked Arrays