# Guide to NumPy for Data Science & AI 

**Course Module: Data Analysis with Python**

NumPy is the foundational library for numerical computing in Python. This notebook is designed to be a comprehensive reference for students transitioning from Python basics to the world of Data Science and Machine Learning.

### What is NumPy?
NumPy, or Numerical Python, is the cornerstone of the Python scientific computing stack. It provides a high-performance multidimensional array object and tools for working with these arrays. Its core is written in C, and it calls optimized C and Fortran libraries (such as BLAS and LAPACK) for linear algebra, which makes mathematical operations fast. Much of the data science ecosystem builds on NumPy: Pandas and Scikit-learn use NumPy arrays internally, and deep learning frameworks such as TensorFlow interoperate closely with them.

### Why is NumPy Essential?
1.  **Performance:** NumPy arrays are implemented in C and stored in a contiguous block of memory. This enables vectorized operations, largely implemented as universal functions (ufuncs), that are often orders of magnitude faster than equivalent Python loops over lists.
2.  **Memory Efficiency:** Being stored contiguously, NumPy arrays are also more memory-efficient than Python lists, which are arrays of pointers to Python objects.
3.  **Functionality & Convenience:** NumPy provides a massive library of high-level mathematical functions, linear algebra routines, random number generators, and more. The syntax is concise and powerful, allowing you to express complex mathematical ideas in a few lines of code.

---
## 1. Getting Started: Installation and Importing

Before we begin, ensure NumPy is installed (for example with `pip install numpy`). By convention, NumPy is imported under the alias `np`.

In [2]:
# Standard import convention for NumPy
import numpy as np

In [2]:
# Check the installed version to ensure compatibility
print(f"NumPy Version: {np.__version__}")

NumPy Version: 2.2.5


---
## 2. The NumPy `ndarray`: Creation and Core Attributes

### 2.1. Simple Array Creation
The most basic way to create an array is from a Python list or tuple using `np.array()`.

In [3]:
# Create a 1-dimensional (1D) array from a list
list_1d = [1, 2, 3, 4, 5]
arr_1d = np.array(list_1d)
print("1D Array from list:")
print(arr_1d)


1D Array from list:
[1 2 3 4 5]


In [4]:
# CHECKING THE TYPE of the object itself
print("Object type:", type(arr_1d))


Object type: <class 'numpy.ndarray'>


In [26]:
# Create a 2-dimensional (2D) array from a list of lists
list_2d = [[1, 2, 3], [4, 5, 6]]
arr_2d = np.array(list_2d)
print(f'Type: {arr_2d.dtype}')
print(f'Shape: {arr_2d.shape}')
print("\n2D Array from list of lists:")
print(arr_2d)

Type: int64
Shape: (2, 3)

2D Array from list of lists:
[[1 2 3]
 [4 5 6]]


In [6]:
print(arr_2d.dtype)

int64


In [27]:
# What would happen here?

list_2d = [[1, 2, 3], [4, 5, 6, 7]]
try:
    arr_2d = np.array(list_2d)
    print("\n2D Array from list of lists:")
    print(arr_2d)
except ValueError as e:
    print("\nError creating 2D array from uneven lists:")
    print(e)


Error creating 2D array from uneven lists:
setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (2,) + inhomogeneous part.


In [28]:
# And here?

list_2d = [[1, 2, 3], [4, 5, 'a']]
arr_2d = np.array(list_2d)
print("\n2D Array from list of lists:")
print(arr_2d)


2D Array from list of lists:
[['1' '2' '3']
 ['4' '5' 'a']]


In [23]:
arr_2d.dtype

dtype('<U21')
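
The string result above is one case of a general rule: `np.array()` picks a single dtype that can hold every element and upcasts as needed. A short sketch with a few mixed inputs (the variable names are illustrative only):

```python
import numpy as np

# Integers and floats mix: everything is upcast to float64
mixed_num = np.array([1, 2, 3.5])
print(mixed_num, mixed_num.dtype)

# Booleans and integers mix: True/False become 1/0 in an integer array
mixed_bool = np.array([True, False, 7])
print(mixed_bool, mixed_bool.dtype)

# A single string anywhere turns every element into a string
mixed_str = np.array([1, 2.5, "x"])
print(mixed_str, mixed_str.dtype)
```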

#### Practice Exercises (2.1)

**Exercise 1:** Create a 1D NumPy array containing the floating point numbers `[1.5, 2.5, 3.5, 4.5]`.

**Exercise 2:** Create a 3x3 NumPy array from a Python list of lists.

In [None]:
# Exercise 1

arr_source = [1.5, 2.5, 3.5, 4.5]
arr_1d = np.array(arr_source)

arr_1d

In [29]:
# Exercise 2

arr_source = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
arr_2d = np.array(arr_source)

arr_2d

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [None]:
# Solution for Exercise 1
ex1_arr = np.array([1.5, 2.5, 3.5, 4.5])
print("Solution 1:")
print(ex1_arr)

# Solution for Exercise 2
ex2_list = [[-1, 0, 1], [1, -1, 0], [0, 1, -1]]
ex2_arr = np.array(ex2_list)
print("\nSolution 2:")
print(ex2_arr)

### 2.2. Array Creation with Built-in Functions
NumPy provides several highly efficient functions for creating arrays from scratch.

In [30]:
# `np.arange()` is similar to Python's `range()` but returns a NumPy array.
arange_arr = np.arange(0, 10, 2) # (start, stop, step)
print("np.arange(0, 10, 2):\n", arange_arr)

np.arange(0, 10, 2):
 [0 2 4 6 8]


In [None]:
# `np.zeros()` creates an array filled with zeros.
zeros_arr = np.zeros((2, 4), dtype=int) # Shape is passed as a tuple. With dtype=int the elements print as "0"; with the default float64 dtype they print as "0."
print("\nnp.zeros((2, 4)):\n", zeros_arr)


np.zeros((2, 4)):
 [[0 0 0 0]
 [0 0 0 0]]


In [33]:
# `np.ones()` creates an array filled with ones.
ones_arr = np.ones((3, 2))
print("\nnp.ones((3, 2)):\n", ones_arr)


np.ones((3, 2)):
 [[1. 1.]
 [1. 1.]
 [1. 1.]]


In [None]:
# `np.linspace()` creates an array with a specific number of evenly spaced points (both endpoints included by default).
linspace_arr = np.linspace(0, 5, 5) # (start, stop, num_points)
print("\nnp.linspace(0, 5, 5):\n", linspace_arr)


np.linspace(0, 5, 5):
 [0.   1.25 2.5  3.75 5.  ]


In [39]:
# `np.full()` creates an array of a given shape filled with a specific value.
full_arr = np.full((2,3), 7)
print("\nnp.full((2,3), 7):\n", full_arr)



np.full((2,3), 7):
 [[7 7 7]
 [7 7 7]]


In [None]:
# `np.eye()` creates a 2D identity matrix (1s on the diagonal, 0s elsewhere).
# With a single argument N it returns a square N x N matrix; np.eye(N, M) returns an N x M matrix.
eye_arr = np.eye(4)
print("\nnp.eye(4):\n", eye_arr)


np.eye(4):
 [[1. 0. 0. 0.]
 [0. 1. 0. 0.]
 [0. 0. 1. 0.]
 [0. 0. 0. 1.]]
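
`np.eye()` is a little more flexible than the square case suggests. A brief sketch of the optional second dimension and the `k` offset, next to `np.identity()`, which only builds square matrices:

```python
import numpy as np

# Two arguments give a rectangular "identity-like" matrix: 2 rows, 3 columns
rect = np.eye(2, 3)
print(rect)

# k shifts the diagonal: k=1 puts the ones just above the main diagonal
upper = np.eye(3, k=1)
print(upper)

# np.identity() always returns a square matrix
square = np.identity(3)
print(square.shape)
```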


#### Practice Exercises (2.2)

**Exercise 1:** Create an array of 50 ones.

**Exercise 2:** Create a 4x4 array of zeros, but with `float` as the data type.

**Exercise 3:** Create an array with 10 numbers evenly spaced between 0 and 1.

In [None]:
# Exercise 1
ex1 = np.ones(50)

ex1

array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
       1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
       1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])

In [48]:
# Exercise 2

ex2 = np.zeros((4, 4), dtype=float)

ex2

array([[0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.]])

In [49]:
# Exercise 3

ex3 = np.linspace(0, 1, 10)

ex3

array([0.        , 0.11111111, 0.22222222, 0.33333333, 0.44444444,
       0.55555556, 0.66666667, 0.77777778, 0.88888889, 1.        ])

In [None]:
# Solution for Exercise 1
ex1_arr = np.ones(50)
print("Solution 1:")
print(ex1_arr)

# Solution for Exercise 2
ex2_arr = np.zeros((4,4), dtype=float)
print("\nSolution 2:")
print(ex2_arr)

# Solution for Exercise 3
ex3_arr = np.linspace(0, 1, 10)
print("\nSolution 3:")
print(ex3_arr)

### 2.3. Array Attributes: Dimensions, Shape, Size, and Type
These attributes provide metadata about the array.

In [None]:
# Let's create a sample 3D array for demonstration
arr = np.arange(24).reshape(2, 3, 4) # 2 matrices, each with 3 rows and 4 columns (3D)
# The number passed to `arange` must equal the product of the `reshape` dimensions (2 * 3 * 4 = 24)
print("Sample Array:\n", arr)

Sample Array:
 [[[ 0  1  2  3]
  [ 4  5  6  7]
  [ 8  9 10 11]]

 [[12 13 14 15]
  [16 17 18 19]
  [20 21 22 23]]]


In [None]:
# What would happen here?
# The number passed to `arange` must equal the product of the `reshape` dimensions (3 * 2 * 4 = 24)

arr = np.arange(24).reshape(3, 2, 4) 
print("Sample Array:\n", arr)

Sample Array:
 [[[ 0  1  2  3]
  [ 4  5  6  7]]

 [[ 8  9 10 11]
  [12 13 14 15]]

 [[16 17 18 19]
  [20 21 22 23]]]


In [79]:
# Is it possible to create a 4D array? If yes, how? If no, why?

arr = np.arange(24).reshape(3, 2, 2, 2) 

arr

array([[[[ 0,  1],
         [ 2,  3]],

        [[ 4,  5],
         [ 6,  7]]],


       [[[ 8,  9],
         [10, 11]],

        [[12, 13],
         [14, 15]]],


       [[[16, 17],
         [18, 19]],

        [[20, 21],
         [22, 23]]]])

In [80]:
# a.ndim: NUMBER OF DIMENSIONS
print(f"\nNumber of Dimensions (a.ndim): {arr.ndim}")


Number of Dimensions (a.ndim): 4


In [81]:
# a.shape: ARRAY SHAPE as a tuple with one entry per dimension (four entries for this 4D array)
print(f"Shape (a.shape): {arr.shape}")

Shape (a.shape): (3, 2, 2, 2)


In [82]:
# a.size: ARRAY SIZE (total ELEMENT COUNT)
print(f"Size/Element Count (a.size): {arr.size}")

Size/Element Count (a.size): 24


In [67]:
# a.dtype: NUMERIC 'TYPE' OF ELEMENTS
print(f"Numeric Type of elements (a.dtype): {arr.dtype}")

Numeric Type of elements (a.dtype): int64


In [68]:
# You can explicitly set the dtype during creation for different precision
float_arr = np.array([1, 2, 3], dtype=np.float64)
print(f"\nFloat array has dtype: {float_arr.dtype}")


Float array has dtype: float64


### 2.4. Array Attributes: Memory Usage
You can inspect how much memory your array is using.


In [70]:
arr = np.arange(10, dtype=np.int64) # 64-bit integers
print("Array with 64-bit integers:", arr)
print("Array dtype:", arr.dtype)

# a.itemsize: BYTES PER ELEMENT
# An int64 is 8 bytes (64 bits / 8 bits per byte)
print(f"Bytes per Element (a.itemsize): {arr.itemsize}")

# a.nbytes: TOTAL BYTES OF MEMORY USED
# This is equivalent to a.size * a.itemsize
print(f"Total Bytes of Memory (a.nbytes): {arr.nbytes}")

Array with 64-bit integers: [0 1 2 3 4 5 6 7 8 9]
Array dtype: int64
Bytes per Element (a.itemsize): 8
Total Bytes of Memory (a.nbytes): 80
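
Because `nbytes` is `size * itemsize`, the dtype directly controls memory use. A small sketch comparing three dtypes for the same element count (1,000 is an arbitrary choice):

```python
import numpy as np

n = 1_000
sizes = {}
for dtype in (np.int8, np.float32, np.float64):
    sample = np.zeros(n, dtype=dtype)
    sizes[np.dtype(dtype).name] = sample.nbytes
    # For a plain contiguous array, nbytes == size * itemsize
    print(f"{np.dtype(dtype).name:>8}: itemsize={sample.itemsize}, nbytes={sample.nbytes}")
```

Smaller dtypes save memory but hold a narrower range of values (an `int8` only covers -128 to 127), so the choice is a trade-off.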


### 2.5. Array Copy vs. View and Conversion to List

In [71]:
a = np.arange(10)
print("Original 'a':", a)

Original 'a': [0 1 2 3 4 5 6 7 8 9]


In [72]:
# Simple assignment is NOT a copy; it's just another name for the same object.
b = a
b[0] = 100
print("'a' was changed when 'b' was changed:", a)

'a' was changed when 'b' was changed: [100   1   2   3   4   5   6   7   8   9]


In [73]:
# Basic slicing creates a VIEW: a new array object that shares the original data. Modifying the view modifies the original.
c_view = a[1:4]
c_view[0] = 999
print("'a' was also changed by modifying the view 'c_view':", a)

'a' was also changed by modifying the view 'c_view': [100 999   2   3   4   5   6   7   8   9]


In [74]:
# To create a true, independent ARRAY COPY, you must use the `.copy()` method.
d_copy = a.copy()
d_copy[0] = -500
print("'a' is finally safe from changes to the copy 'd_copy':", a)
print("The copy 'd_copy' is independent:", d_copy)


'a' is finally safe from changes to the copy 'd_copy': [100 999   2   3   4   5   6   7   8   9]
The copy 'd_copy' is independent: [-500  999    2    3    4    5    6    7    8    9]
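
When in doubt about whether two arrays share data, you can check instead of guessing. A sketch using `np.shares_memory` and the `.base` attribute (the names below are new, so the cells above are not affected):

```python
import numpy as np

base_arr = np.arange(10)
alias = base_arr            # same object, just a second name
view = base_arr[1:4]        # basic slice -> view
copy_arr = base_arr.copy()  # independent copy

print(alias is base_arr)                   # True: two names, one object
print(np.shares_memory(base_arr, view))    # True: the view uses base_arr's buffer
print(view.base is base_arr)               # True: a view records the array it came from
print(np.shares_memory(base_arr, copy_arr))  # False: the copy owns separate memory
print(copy_arr.base is None)               # True
```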


In [75]:
# .tolist() provides a CONVERSION TO a standard Python LIST
a_list = a.tolist()
print(f"\n'a' converted to a Python list: {a_list}")
print("Type is now:", type(a_list))


'a' converted to a Python list: [100, 999, 2, 3, 4, 5, 6, 7, 8, 9]
Type is now: <class 'list'>


---
## 3. Indexing, Slicing, and Setting Elements

### 3.1. Basic ARRAY INDEXING
For **Multi-Dimensional Arrays**, you use a tuple of indices `[row, column]` to GET/SET elements.

In [3]:
arr_2d = np.array([[10, 11, 12, 13], [20, 21, 22, 23], [30, 31, 32, 33]])
print("2D Array with shape (ROWS,COLUMNS) of", arr_2d.shape, ":\n", arr_2d)
print("ELEMENT COUNT:", arr_2d.size)
print("NUMBER OF DIMENSIONS:", arr_2d.ndim)


2D Array with shape (ROWS,COLUMNS) of (3, 4) :
 [[10 11 12 13]
 [20 21 22 23]
 [30 31 32 33]]
ELEMENT COUNT: 12
NUMBER OF DIMENSIONS: 2


In [4]:
# Get a single element: arr[row, col]
element = arr_2d[1, 2] # GET/SET element
print("\nElement at row 1, col 2:", element)



Element at row 1, col 2: 22


In [5]:
# A SINGLE INDEX selects an entire ROW (here, the first row)
first_row = arr_2d[0]
print("\nGetting the entire first row (index 0):", first_row)



Getting the entire first row (index 0): [10 11 12 13]


In [6]:
# Set an element to a new value
arr_2d[0, 0] = 99
print("\nArray after setting arr[0,0] to 99:\n", arr_2d)


Array after setting arr[0,0] to 99:
 [[99 11 12 13]
 [20 21 22 23]
 [30 31 32 33]]


### 3.2. Array Slicing
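**SLICING WORKS MUCH LIKE STANDARD PYTHON SLICING** (`start:stop:step`), but can be applied to multiple dimensions at once. The `step` value is sometimes informally called a **STRIDE** (not to be confused with the array's `.strides` attribute, which records byte offsets between elements).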

In [None]:
arr_1d = np.arange(20)
print("1D Array:", arr_1d)

In [None]:
# Get a slice of elements (from index 2 up to, but not including, index 8)
print("Slice from index 2 to 8:", arr_1d[2:8])

In [None]:
# Slice with a stride of 3 (get every third element)
print("Slice with a stride of 3:", arr_1d[::3])

In [7]:
# Slice a 2D array
arr_2d = np.arange(1, 17).reshape(4, 4)
print("\n2D Array:\n", arr_2d)


2D Array:
 [[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]
 [13 14 15 16]]


In [8]:
# Get the first two rows and columns 1 through 2
slice_2d = arr_2d[:2, 1:3]
print("\nSlice of 2D array (first 2 rows, cols 1-2):\n", slice_2d)


Slice of 2D array (first 2 rows, cols 1-2):
 [[2 3]
 [6 7]]


In [11]:
# How to get the middle of the square?

arr_2d[1:3, 1:3]  # This will give you the middle 2x2 square of the 4x4 array

array([[ 6,  7],
       [10, 11]])

### 3.3. Setting Values with Slices, Fill, and Type Coercion

In [12]:
arr = np.arange(10)
print("Original:", arr)

Original: [0 1 2 3 4 5 6 7 8 9]


In [13]:
# Set a slice of the array to a new value
arr[5:] = 100
print("After setting slice:", arr)

After setting slice: [  0   1   2   3   4 100 100 100 100 100]


In [14]:
# Use FILL to set all values to a single value
arr.fill(7)
print("After using .fill(7):", arr)

After using .fill(7): [7 7 7 7 7 7 7 7 7 7]


In [15]:
# BEWARE OF TYPE COERCION: If you assign a value that doesn't match the array's dtype,
# NumPy will silently force it to match, which can lead to loss of data.
int_arr = np.zeros(5, dtype=np.int32)
print(f"\nInteger array of type {int_arr.dtype}: {int_arr}")
int_arr[0] = 3.14159 # Assigning a float to an int array
print(f"After assigning a float: {int_arr} <-- The float was truncated to an integer!" )


Integer array of type int32: [0 0 0 0 0]
After assigning a float: [3 0 0 0 0] <-- The float was truncated to an integer!
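
When a conversion is intentional, it is clearer to make it explicit with `astype()`, which returns a new array. A minimal sketch; note that casting float to int truncates toward zero, so round first if you want the nearest integer:

```python
import numpy as np

floats = np.array([1.7, 2.2, -3.9])

# astype() returns a NEW array; float -> int truncates toward zero
truncated = floats.astype(np.int32)
print(truncated)   # [ 1  2 -3]

# Round first if you want the nearest integer instead
rounded = np.round(floats).astype(np.int32)
print(rounded)     # [ 2  2 -4]
```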


### 3.4. Fancy Indexing
Fancy indexing allows selecting elements using arrays of indices instead of single scalars or slices. This is a very powerful feature.

#### Fancy Indexing: By Position
This uses an array of integers to specify the indices of the elements you want to select.

In [16]:
arr = np.arange(100, 110)
print("Original Array:", arr)

Original Array: [100 101 102 103 104 105 106 107 108 109]


In [17]:
# Pass a list or array of indices to select specific elements
indices = [1, 3, 8]
print("\nSelected elements at indices [1, 3, 8]:", arr[indices])


Selected elements at indices [1, 3, 8]: [101 103 108]


In [22]:
# For 2D arrays, you can pass a tuple of index arrays: one for rows, one for columns.
arr_2d = np.arange(12).reshape((3, 4))
print("\n2D Array:\n", arr_2d)



2D Array:
 [[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]


In [25]:
# Select elements at (row 0, col 1), (row 1, col 2), and (row 2, col 3)
rows = np.array([0, 1, 2])
cols = np.array([1, 2, 3])
print("\nSelected elements at (0,1), (1,2), (2,3):", arr_2d[rows, cols])


Selected elements at (0,1), (1,2), (2,3): [ 1  6 11]


In [27]:
# Select entire rows using fancy indexing
print("\nSelect full rows 0 and 2:", arr_2d[[0, 2]])


Select full rows 0 and 2: [[ 0  1  2  3]
 [ 8  9 10 11]]


In [28]:
# This is equivalent to

print("\nSelect full rows 0 and 2:", arr_2d[[0, 2], :]) # (rows, columns)


Select full rows 0 and 2: [[ 0  1  2  3]
 [ 8  9 10 11]]


In [32]:
# How do you choose columns?

arr_2d[:, [2, 3]] # This will select the columns at index 2 and 3 (the third and fourth columns)

array([[ 2,  3],
       [ 6,  7],
       [10, 11]])

In [41]:
# Choose row 0 and 2, and from column 1 to the end

arr_2d[[0, 2], 1:]  # This will select rows 0 and 2, and all columns from index 1 to the end

array([[ 1,  2,  3],
       [ 9, 10, 11]])

#### Fancy Indexing: With Booleans
You can use a boolean array (a mask) to select elements from another array where the mask is `True`.

In [43]:
data = np.random.randint(-10, 10, size=(4, 4))
print("Original Data:\n", data)

Original Data:
 [[ 1  6 -2  8]
 [-1  8  8 -9]
 [-8  9  9  1]
 [-1  5  6  5]]


In [44]:
# We can create a boolean condition.
# For example, let's find all numbers greater than 0.
is_positive = data > 0
print("\nBoolean Mask (data > 0):\n", is_positive)


Boolean Mask (data > 0):
 [[ True  True False  True]
 [False  True  True False]
 [False  True  True  True]
 [False  True  True  True]]


In [45]:
# Now we can use this boolean mask to select only the positive numbers.
# The result will be a flattened 1D array.
positive_numbers = data[is_positive]
print("\nSelected Positive Numbers:", positive_numbers)


Selected Positive Numbers: [1 6 8 8 8 9 9 1 5 6 5]


In [None]:
# This can be done in one step.
print("\nNumbers less than -5:", data[data < -5])

In [None]:
# You can also combine conditions using `&` (and) and `|` (or).
# Parentheses are required because `&` and `|` bind more tightly than comparisons such as `>` and `==`.
print("\nNumbers that are even AND positive:", data[(data % 2 == 0) & (data > 0)])
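
Since `data` above is random, here is a deterministic sketch of the same mask logic on a small made-up array. It also shows `~` (NOT) and that `.sum()` on a mask counts the `True` entries:

```python
import numpy as np

grid = np.array([[1, -4, 6],
                 [-7, 8, 3]])

positive = grid > 0
print(grid[positive & (grid % 2 == 0)])  # [6 8]: positive AND even
print(grid[~positive])                   # [-4 -7]: NOT positive
print(grid[(grid < -5) | (grid > 5)])    # [ 6 -7  8]: large magnitude
print(positive.sum())                    # 4: each True counts as 1
```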

#### Practice Exercises (3)

**Exercise 1 (Slicing):** Create a 6x6 array with numbers from 0 to 35. Select the bottom-right 3x3 sub-array.

**Exercise 2 (Fancy Indexing):** Create a 10x10 array of random integers between 1 and 100. Extract all numbers greater than 90.

**Exercise 3 (Fancy Indexing):** From the array in Exercise 2, extract the elements at positions (1,1), (3,5), and (8,0).

In [None]:
# Solution for Exercise 1
ex1_arr = np.arange(36).reshape(6, 6)
print("Original 6x6 Array:\n", ex1_arr)
bottom_right_3x3 = ex1_arr[3:, 3:]
print("\nSolution 1 (Bottom-Right 3x3):\n", bottom_right_3x3)

# Solution for Exercise 2
ex2_arr = np.random.randint(1, 101, size=(10,10))
print("\nOriginal 10x10 Array:\n", ex2_arr)
high_values = ex2_arr[ex2_arr > 90]
print("\nSolution 2 (Values > 90):", high_values)

# Solution for Exercise 3
rows_to_get = [1, 3, 8]
cols_to_get = [1, 5, 0]
specific_elements = ex2_arr[rows_to_get, cols_to_get]
print("\nSolution 3 (Specific Elements):", specific_elements)

---
## 4. Array Operations and Universal Functions (UFuncs)

### 4.1. Element-wise Array Operations
Standard arithmetic operators are overloaded to work element-wise on arrays. These are examples of Universal Functions (UFuncs).
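
Each operator is shorthand for a ufunc call: `+` dispatches to `np.add`, `*` to `np.multiply`, and so on. A small sketch showing the equivalence, plus the `reduce` method that binary ufuncs provide:

```python
import numpy as np

x = np.array([1, 2, 3, 4])

# The operator and the explicit ufunc call give identical results
print(np.array_equal(x + 10, np.add(x, 10)))     # True
print(np.array_equal(x * x, np.multiply(x, x)))  # True

# Ufuncs are objects with extra methods; reduce applies the operation across the array
print(np.add.reduce(x))       # 10, same as x.sum()
print(np.multiply.reduce(x))  # 24, same as x.prod()
```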

In [46]:
arr = np.arange(1, 6)
print("Original Array:", arr)

Original Array: [1 2 3 4 5]


In [49]:
# Basic arithmetic with a scalar
print("Original Array:", arr)
print("Array + 5:", arr + 5)
print("Array - 2:", arr - 2)
print("Array * 10:", arr * 10)
print("Array / 2:", arr / 2)
print("Array ** 2 (squared):", arr ** 2)


Original Array: [1 2 3 4 5]
Array + 5: [ 6  7  8  9 10]
Array - 2: [-1  0  1  2  3]
Array * 10: [10 20 30 40 50]
Array / 2: [0.5 1.  1.5 2.  2.5]
Array ** 2 (squared): [ 1  4  9 16 25]


In [50]:
np.full(5, 10)

array([10, 10, 10, 10, 10])

In [None]:
# Arithmetic between two arrays of the same shape
arr2 = np.full(5, 10)
print("\nSecond Array:", arr2)
print("arr + arr2:", arr + arr2)

In [None]:
# What would happen here? (Hint: compare the two shapes.)

np.arange(0, 5) + np.ones(4)

In [None]:
# Other common ufuncs
print("\nSquare root (np.sqrt):", np.sqrt(arr))
print("Exponential e^x (np.exp):", np.exp(arr))
print("Natural log (np.log):", np.log(arr))

### 4.2. Aggregation & Calculation Methods
These methods perform a computation on an array, often resulting in a single number.

In [52]:
arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print("Sample Array:\n", arr)

Sample Array:
 [[1 2 3]
 [4 5 6]
 [7 8 9]]


In [53]:
# Calculation methods on the entire array
print(f"Sum of all elements (sum): {arr.sum()}")
print(f"Product of all elements (prod): {arr.prod()}")
print(f"Min value (min): {arr.min()}")
print(f"Max value (max): {arr.max()}")

Sum of all elements (sum): 45
Product of all elements (prod): 362880
Min value (min): 1
Max value (max): 9


In [54]:
# ARGMIN and ARGMAX find the *index* of the min/max value in the flattened array
print(f"Index of min value (argmin): {arr.argmin()}")
print(f"Index of max value (argmax): {arr.argmax()}")

Index of min value (argmin): 0
Index of max value (argmax): 8
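
Because `argmin`/`argmax` return a position in the flattened array, `np.unravel_index` converts that flat index back into a (row, column) pair. A sketch using the same 3x3 values under a new name:

```python
import numpy as np

matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

flat_idx = matrix.argmax()                           # 8: position in the flattened array
row, col = np.unravel_index(flat_idx, matrix.shape)
print(row, col)                                      # 2 2
print(matrix[row, col])                              # 9

# With an axis argument, argmax returns one index per row (or column) instead
print(matrix.argmax(axis=1))                         # the max sits in column 2 of every row
```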


### 4.3. Statistics Array Methods
Key methods for descriptive statistics.

In [None]:
arr_stats = np.arange(1, 11, dtype=float)
arr_stats[5] = 20 # replace the value 6 with an outlier (20)
print("Sample Array for stats:\n", arr_stats)

# Statistics for the whole array
print(f"MEAN of all elements: {arr_stats.mean():.2f}")
print(f"MEDIAN (np.median): {np.median(arr_stats):.2f}")
print(f"STANDARD DEV. of all elements: {arr_stats.std():.2f}")
print(f"VARIANCE of all elements: {arr_stats.var():.2f}")
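
Note that `.std()` and `.var()` compute population statistics by default (`ddof=0`, dividing by N). For the sample standard deviation (dividing by N - 1), pass `ddof=1`. The sketch below recreates the same data and also shows why the median is more robust to the outlier than the mean:

```python
import numpy as np

values = np.arange(1, 11, dtype=float)
values[5] = 20  # same outlier as above: 6 is replaced by 20

print(values.mean())      # 6.9: pulled upward by the outlier
print(np.median(values))  # 6.0: average of the two middle values, 5 and 7

pop_std = values.std()            # ddof=0: divide by N
sample_std = values.std(ddof=1)   # ddof=1: divide by N - 1
print(pop_std < sample_std)       # True: the sample estimate is larger when N > 1
```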

### 4.4. Aggregations Along Axes
You can perform aggregations along a specific axis.
* `axis=0` collapses the rows (operates on each **COLUMN**)
* `axis=1` collapses the columns (operates on each **ROW**)

In [None]:
arr_2d = np.arange(12).reshape(3,4)
print("2D Array:\n", arr_2d)

In [None]:
# Get the sum of each column (axis=0)
print("Sum of each column (axis=0):", arr_2d.sum(axis=0))

In [None]:
# Get the mean of each row (axis=1)
print("Mean of each row (axis=1):", arr_2d.mean(axis=1))

In [None]:
# Get the max of each column (axis=0)
print("Max of each column (axis=0):", arr_2d.max(axis=0))

In [None]:
arr_3d = np.arange(24).reshape(2,4,3) # Depth = 2, Rows = 4, Columns = 3; the arange count (24) must equal 2 * 4 * 3
print("3D Array:\n", arr_3d)

3D Array:
 [[[ 0  1  2]
  [ 3  4  5]
  [ 6  7  8]
  [ 9 10 11]]

 [[12 13 14]
  [15 16 17]
  [18 19 20]
  [21 22 23]]]


In [None]:
# What happens here? Predict the shape of each result before running it.

# arr_3d.sum(axis=0)
# arr_3d.sum(axis=1)
# arr_3d.sum(axis=2)


In [None]:
# What about 4th and 5th dimension?

In [None]:
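arr_4d = np.arange(48).reshape(2, 2, 4, 3)  # 2 * 2 * 4 * 3 = 48 elements
arr_4d.sum(axis=0)  # summing over axis 0 leaves shape (2, 4, 3)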

### 4.5. Other Useful Methods: `clip` and `round`

In [None]:
arr = np.array([-5.6, 1.7, 8.2, 10.0, 15.1])
print("Original:", arr)

In [None]:
# .clip() limits values in an array. All values below a_min become a_min, and all above a_max become a_max.
# This is useful for preventing extreme values (outliers).
print("\nClipped between 0 and 10:", arr.clip(0, 10))

In [None]:
# .round() rounds elements to the given number of decimals.
print("\nRounded to 0 decimals:", arr.round())
print("Rounded to 1 decimal:", arr.round(decimals=1))
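
One detail worth knowing: for values exactly halfway between two integers, NumPy rounds to the nearest even value ("round half to even", also called banker's rounding) rather than always rounding up. A quick check:

```python
import numpy as np

halves = np.array([0.5, 1.5, 2.5, 3.5])
rounded_halves = np.round(halves)
print(rounded_halves)  # 0.5 -> 0, 1.5 -> 2, 2.5 -> 2, 3.5 -> 4
```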

#### Practice Exercises (4)

**Exercise 1:** Create a 3x5 array of random floats. Compute the sum of the entire array and the sum of each individual column.

**Exercise 2:** For the array from Exercise 1, find the minimum and maximum value for each row.

**Exercise 3:** You have an array of prices: `[20.50, 22.10, 19.80, 50.00, 15.25]`. Use `clip` to ensure no price is lower than \$20 and no price is higher than \$45.

In [None]:
# Solution for Exercise 1
ex1_arr = np.random.randn(3, 5)
print("Original 3x5 Array:\n", ex1_arr)
print(f"\nSum of entire array: {ex1_arr.sum():.2f}")
print(f"Sum of each column: {ex1_arr.sum(axis=0)}")

# Solution for Exercise 2
print("\nMin of each row:", ex1_arr.min(axis=1))
print("Max of each row:", ex1_arr.max(axis=1))

# Solution for Exercise 3
prices = np.array([20.50, 22.10, 19.80, 50.00, 15.25])
clipped_prices = prices.clip(20, 45)
print("\nOriginal Prices:", prices)
print("Clipped Prices:", clipped_prices)

---
## 5. Array Manipulation and Reshaping

### 5.1. Changing an Array's Shape (`reshape`, `transpose`, `T`)

In [None]:
arr = np.arange(12)
print("Original Array:", arr)

In [None]:
# `reshape()` returns an array with a new shape (a view of the original data whenever possible). The total number of elements must stay the same.
reshaped_arr = arr.reshape(3, 4)
print("\nReshaped to 3x4:\n", reshaped_arr)
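
`reshape` also accepts `-1` for one dimension, meaning "infer this size from the total element count". A short sketch:

```python
import numpy as np

flat = np.arange(12)

print(flat.reshape(3, -1).shape)  # (3, 4): 12 / 3 = 4 columns inferred
print(flat.reshape(-1, 2).shape)  # (6, 2)
print(flat.reshape(-1, 1).shape)  # (12, 1): a column vector

# Only one dimension may be -1, and the known sizes must divide the total evenly
try:
    flat.reshape(5, -1)
except ValueError as err:
    print("ValueError:", err)
```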


In [None]:
# Transposing an array with `.transpose()` swaps its axes. For a 2D array, rows become columns.
transposed_arr = reshaped_arr.transpose()
print("\nTransposed 4x3 array (using .transpose()):\n", transposed_arr)


In [None]:
# The `.T` attribute is a convenient shortcut for `.transpose()`.
print("\nTransposed 4x3 array (using .T):\n", reshaped_arr.T)

### 5.2. Flattening Arrays (`flatten`, `ravel`)
Both methods convert a multi-dimensional array to a 1D array. The key difference is memory management:
- `flatten()`: Always returns a new **copy** of the array.
- `ravel()`: Returns a **view** of the original array whenever possible. This is more memory-efficient.

In [None]:
arr = np.arange(6).reshape(2, 3)
print("Original array:\n", arr)

In [None]:
# Use a 3D array to show that both methods flatten ALL dimensions into one
arr = np.arange(12).reshape(2, 3, 2)
print(arr)

In [None]:
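flattened_arr = arr.flatten()  # flatten() always returns a copy
flattened_arr[0] = 100         # modify the copy...
print(flattened_arr)
print(arr[0, 0, 0])            # ...the original is unchanged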

In [None]:
# ravel() returns a view when possible, so modifying it affects the original
raveled_arr = arr.ravel()
raveled_arr[0] = 100
print(arr[0, 0, 0])  # 100: the original changed as well

### 5.3. Resizing and Squeezing (`resize`, `squeeze`)

In [None]:
# `a.resize(new_shape)` modifies the array IN-PLACE and can change the total size
# Note: if another array or variable references this array, resize() raises a ValueError.
arr_to_resize = np.arange(4)
print("Original:", arr_to_resize)
arr_to_resize.resize((3, 2)) # Size changes from 4 to 6. New elements are zero-filled.
print("Resized in-place to (3,2):\n", arr_to_resize)
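
Be careful not to confuse the method with the function `np.resize(a, new_shape)`: the function returns a new array and fills extra space by repeating the original data instead of zero-filling. A sketch comparing the two:

```python
import numpy as np

source = np.arange(4)

# Function form: returns a new array, repeating the data to fill (3, 2)
repeated = np.resize(source, (3, 2))
print(repeated)  # rows: [0 1], [2 3], [0 1]

# Method form: modifies the array in place and pads with zeros
padded = np.arange(4)
padded.resize((3, 2))
print(padded)    # rows: [0 1], [2 3], [0 0]

print(source)    # the function left the input unchanged: [0 1 2 3]
```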

In [None]:
# `a.squeeze()` removes single-dimensional entries (axes of length 1) from the shape of an array
arr_3d = np.array([[[1],[2],[3]]])
print(f"\nArray with shape {arr_3d.shape}")
squeezed_arr = arr_3d.squeeze()
print(f"Squeezed array has shape {squeezed_arr.shape}:")
print(squeezed_arr)

In [None]:
arr_2d = np.array([[1, 2, 3]]) # What is the difference between this 2D array and a 1D array?
print(f"\nArray with shape {arr_2d.shape}")

In [None]:
squeezed_arr = arr_2d.squeeze()
print(f"Squeezed array has shape {squeezed_arr.shape}:")
print(squeezed_arr)

### 5.4. Stacking and Splitting Arrays

In [None]:
arr1 = np.arange(1, 5).reshape(2, 2)
arr2 = np.arange(5, 9).reshape(2, 2)
print(arr1)
print(arr2)

In [None]:
# `np.vstack()` (vertical stack) stacks arrays row-wise
v_stacked = np.vstack((arr1, arr2))
print("Vertically stacked:\n", v_stacked)

In [None]:
# `np.hstack()` (horizontal stack) stacks arrays column-wise
h_stacked = np.hstack((arr1, arr2))
print("\nHorizontally stacked:\n", h_stacked)

In [None]:
# `np.concatenate()` is a more general function
concatenated_v = np.concatenate([arr1, arr2], axis=0) # Same as vstack
concatenated_h = np.concatenate([arr1, arr2], axis=1) # Same as hstack
print("\nConcatenated (axis=0):\n", concatenated_v)
print("\nConcatenated (axis=1):\n", concatenated_h)


In [None]:
# How to stack arrays with different shapes?

arr1 = np.arange(1, 17).reshape(4, 4)
arr2 = np.arange(5, 9).reshape(2, 2)

In [None]:
arr1 = np.arange(1, 5).reshape(2, 2)
arr2 = np.arange(5, 9).reshape(2, 2)

concatenated_h = np.concatenate([arr1, arr2], axis=2) # What should happen here?

In [None]:
# Join a sequence of arrays along a new axis

np.stack([arr1, arr2], axis=0).shape

In [None]:
# Now a more confusing case

arr1 = np.arange(1, 17).reshape(4, 4)
arr2 = np.arange(17, 33).reshape(4, 4)
print(arr1)
print(arr2)

In [None]:
np.stack([arr1, arr2], axis=2)
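
A way to untangle `np.stack`: it always adds a NEW axis, whose length is the number of input arrays, at the position given by `axis`, whereas `np.concatenate` joins along an existing axis. A sketch printing the shapes for two (4, 4) arrays:

```python
import numpy as np

first = np.arange(1, 17).reshape(4, 4)
second = np.arange(17, 33).reshape(4, 4)

for axis in range(3):
    print(f"stack axis={axis}:", np.stack([first, second], axis=axis).shape)
# axis=0 -> (2, 4, 4); axis=1 -> (4, 2, 4); axis=2 -> (4, 4, 2)

print("concatenate axis=0:", np.concatenate([first, second], axis=0).shape)  # (8, 4)

# With axis=2, entry [i, j] of the result pairs first[i, j] with second[i, j]
print(np.stack([first, second], axis=2)[0, 0])  # [ 1 17]
```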

In [None]:
# Splitting arrays
big_arr = np.arange(16).reshape(4, 4)
print("\nBig Array for splitting:\n", big_arr)

In [None]:
# `np.hsplit()` splits an array horizontally, i.e. along its columns (axis=1), into equal parts
h_split = np.hsplit(big_arr, 2)
print("\nHorizontally split (into 2):\n", h_split)

In [None]:
# `np.vsplit()` splits an array vertically, i.e. along its rows (axis=0), into equal parts
v_split = np.vsplit(big_arr, 4)
print("\nVertically split (into 4):\n", v_split)
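
`np.hsplit`, `np.vsplit`, and `np.split` require the array to divide evenly into the requested number of pieces and raise an error otherwise. `np.array_split` allows uneven pieces. A small sketch:

```python
import numpy as np

seq = np.arange(10)

# 10 elements into 3 pieces: the first piece gets the extra element
pieces = np.array_split(seq, 3)
print([p.tolist() for p in pieces])  # [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]

# np.split demands an equal division
try:
    np.split(seq, 3)
except ValueError as err:
    print("ValueError:", err)
```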

---
## 6. Sorting, Searching, and Set Logic

### 6.1. Sorting (`sort`, `argsort`)

In [None]:
arr = np.array([[3, 2, 4], [1, 5, 0]])
print("Original Array:\n", arr)

In [None]:
# `np.sort()` returns a new, sorted COPY of an array
sorted_copy = np.sort(arr, axis=1) # Sort each row
print("\nSorted copy (original is unchanged):\n", sorted_copy)

In [None]:
sorted_copy = np.sort(arr, axis=0) # Sort each column
print("\nSorted copy (original is unchanged):\n", sorted_copy)

In [None]:
# `a.sort()` sorts an array IN-PLACE (it modifies the original)
arr.sort(axis=1)
print("\nArray sorted along rows (in-place):\n", arr)
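
The section heading also mentions `argsort`, which returns the *indices* that would sort an array rather than the sorted values. This is handy for ordering one array by another. A sketch with made-up names and scores:

```python
import numpy as np

names = np.array(["Ana", "Ben", "Cleo"])
scores = np.array([72, 95, 88])

order = np.argsort(scores)  # indices of scores from lowest to highest
print(order)                # [0 2 1]
print(names[order])         # ['Ana' 'Cleo' 'Ben']

# Reverse the index array for descending order
print(names[order[::-1]])   # ['Ben' 'Cleo' 'Ana']
```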

### 6.2. Searching and Iterating (`nonzero`, `where`)

In [None]:
arr = np.array([[3, 0, 3], [1, 1, 0], [3, 2, 0]])
print("Array:\n", arr)

In [None]:
# `a.nonzero()`: Returns the indices of the non-zero elements.
non_zero_indices = arr.nonzero()
print("\nIndices of non-zero elements (rows, then cols):", non_zero_indices)

In [None]:
# `np.where()`: A powerful conditional function. `np.where(condition, value_if_true, value_if_false)`
arr_with_neg = np.array([-1, 5, -10, 8, -3])
replaced_arr = np.where(arr_with_neg < 0, 0, arr_with_neg)
print("\nUsing where to replace negative values with 0:", replaced_arr)


In [None]:
# `np.where(condition)`: with only a condition, returns the indices where it is True (like `np.nonzero`)
arr_with_neg = np.array([-1, 5, -10, 8, -3])
replaced_arr = np.where(arr_with_neg < 0)
print("\nIndices of negative values:", replaced_arr)


In [None]:
# `np.where()` on a matrix: returns one index array per dimension (rows, then columns)
arr_with_neg = np.array([[-1, 5, -10, 8], [-1, -1, 7, -3], [-1, -5, -10, -3], [1, 51, 0, -8]])
replaced_arr = np.where(arr_with_neg < 0)
print("\nRow and column indices of negative values:", replaced_arr)


In [None]:
# How to use replaced_arr to get the values?
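
For the question above, one approach: the tuple returned by `np.where(condition)` (or `np.nonzero`) can be passed straight back as an index, which gives the same values as boolean masking. A sketch on a small made-up matrix:

```python
import numpy as np

m = np.array([[-1, 5],
              [7, -3]])

idx = np.where(m < 0)      # (row indices, column indices)
print(m[idx])              # [-1 -3]
print(m[m < 0])            # the same values via a boolean mask

# np.argwhere pairs the indices up: one (row, col) row per match
print(np.argwhere(m < 0))
```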

---
## 7. Broadcasting
Broadcasting describes how NumPy treats arrays with different shapes during arithmetic operations. It's a powerful mechanism that allows for concise code without making explicit copies of data.

**The Broadcasting Rule:**
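In order to broadcast, NumPy compares the shapes element-wise starting from the trailing (last) dimension: each pair of sizes must be equal, or one of them must be 1. If one array has fewer dimensions, its shape is treated as if padded with 1s on the left. NumPy then stretches each size-1 dimension to match the other, conceptually, without actually copying the data.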

In [None]:
# Example 1: Array and a Scalar
# The scalar is broadcast to the shape of the array.
arr = np.arange(5)
result = arr + 100
print("Array 'arr' (shape 5,):", arr)
print("Scalar 100 is broadcast to shape (5,) and added element-wise.")
print("Result:", result)

In [None]:
# Example 2: 2D array and 1D array
arr_2d = np.arange(12).reshape(4, 3)
arr_1d = np.array([10, 20, 30])
print("\n2D Array (shape 4,3):\n", arr_2d)
print("1D Array (shape 3,):", arr_1d)

# Rule Check: arr_2d.shape -> (4, 3) | arr_1d.shape -> (3,)
# The trailing dimensions match (3 and 3). Broadcasting is possible.
result = arr_2d + arr_1d
print("\nResult of broadcasting arr_1d across rows of arr_2d:\n", result)


In [None]:
# Example 3: Broadcasting a column vector
# To broadcast down columns, we must reshape the 1D array to a column vector (shape (N,1))
col_vec = np.array([[100],[200],[300],[400]]) # Shape (4,1)
print("\nColumn vector (shape 4,1):\n", col_vec)

# Rule Check: arr_2d.shape -> (4, 3) | col_vec.shape -> (4, 1)
# Trailing dimensions: 3 vs 1 (one is 1, ok). Next dimensions: 4 vs 4 (equal, ok).
result_col = arr_2d + col_vec
print("\nResult of broadcasting column vector across columns of arr_2d:\n", result_col)
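
Writing the column vector by hand works, but you can also turn an existing 1D array into a column with `[:, np.newaxis]` (equivalent to `.reshape(-1, 1)`). A sketch assuming an array of the same (4, 3) shape:

```python
import numpy as np

grid = np.arange(12).reshape(4, 3)
offsets = np.array([100, 200, 300, 400])  # shape (4,)

col = offsets[:, np.newaxis]              # shape (4, 1)
print(col.shape)
print(np.array_equal(col, offsets.reshape(-1, 1)))  # True

result = grid + col                       # (4, 3) + (4, 1) -> (4, 3)
print(result[:, 0])                       # [100 203 306 409]

# Without the new axis, shapes (4, 3) and (4,) are incompatible (3 vs 4)
try:
    grid + offsets
except ValueError as err:
    print("ValueError:", err)
```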

In [None]:
mat = np.arange(24).reshape(2, 3, 4)
mat

In [None]:
print(mat.shape)

In [None]:
# Which of these arrays can be broadcast against `mat` (shape (2, 3, 4))? Why?
# np.array([1, 2, 3, 4])
# np.array([1, 2])
# np.array([[1], [2]])
# np.array([[[1]], [[2]]])
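
To check your answers, `np.broadcast_shapes` (available since NumPy 1.20) applies the broadcasting rule to shapes alone, without creating arrays. A sketch for the four candidates above, whose shapes are (4,), (2,), (2, 1), and (2, 1, 1), against `(2, 3, 4)`:

```python
import numpy as np

target = (2, 3, 4)
candidates = [(4,), (2,), (2, 1), (2, 1, 1)]

for shape in candidates:
    try:
        print(shape, "->", np.broadcast_shapes(target, shape))
    except ValueError:
        print(shape, "-> not broadcastable")
```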

---
## 8. Random Number Generation
Random numbers are critical for simulations, sampling, and initializing machine learning models.

### 8.1. Basic Random Functions

In [None]:
# `np.random.rand()`: Random values in a given shape from a uniform distribution over [0, 1).
print("Uniform distribution [0,1) in a 2x3 array:\n", np.random.rand(2, 3))

# `np.random.randn()`: Random values from a standard normal distribution (mean=0, variance=1).
print("\nStandard Normal distribution in a 2x3 array:\n", np.random.randn(2, 3))

# `np.random.randint()`: Random integers from low (inclusive) to high (exclusive).
print("\nRandom integers between 10 and 20, size=5:", np.random.randint(10, 20, size=5))


### 8.2. Seeding for Reproducibility
In any scientific or ML application, it's crucial that your results are reproducible. Setting a 'seed' ensures that the sequence of 'random' numbers generated is always the same.

In [None]:
# Set a seed for the random number generator
np.random.seed(42)
print("With seed=42, first call:", np.random.rand(4))
print("With seed=42, second call:", np.random.rand(4))

# Resetting the seed will reproduce the exact same sequence
np.random.seed(42)
print("\nAfter resetting seed to 42, first call is identical:", np.random.rand(4))
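
The `np.random.seed` / `np.random.rand` functions use NumPy's legacy global generator. For new code, the NumPy documentation recommends creating a `Generator` with `np.random.default_rng(seed)`; each generator keeps its own state, so seeding one does not affect others. Note that the same seed gives different numbers in the legacy and new APIs. A sketch of the equivalent calls:

```python
import numpy as np

rng = np.random.default_rng(42)

print(rng.random((2, 3)))           # uniform [0, 1), like np.random.rand(2, 3)
print(rng.standard_normal((2, 3)))  # like np.random.randn(2, 3)
ints = rng.integers(10, 20, size=5) # like np.random.randint(10, 20, size=5)
print(ints)

# Two generators built from the same seed produce the same sequence
draw_a = np.random.default_rng(7).random(4)
draw_b = np.random.default_rng(7).random(4)
print(np.array_equal(draw_a, draw_b))  # True
```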

### 8.3. Shuffling and Sampling

In [None]:
arr = np.arange(10)
print("Original array:", arr)

# `np.random.shuffle()`: Modifies a sequence in-place by shuffling its contents.
np.random.shuffle(arr)
print("Shuffled array (in-place):", arr)

# `np.random.permutation()`: Randomly permutes a sequence, or returns a permuted range. Returns a copy.
arr = np.arange(10)
permuted_arr = np.random.permutation(arr)
print("\nPermuted array (copy):", permuted_arr)
print("Original is unchanged:", arr)

# `np.random.choice()`: Generates a random sample from a given 1-D array
choices = np.random.choice(arr, size=5, replace=False) # Sample 5 unique elements
print("\nRandom choice of 5 unique elements:", choices)

---
## 9. Introduction to Linear Algebra (`np.linalg`)
NumPy's `linalg` module provides essential linear algebra functionality.

In [None]:
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
v = np.array([9, 10])
print("Matrix A:\n", A)
print("\nMatrix B:\n", B)
print("\nVector v:", v)

In [None]:
# Dot Product of two vectors
print("\nDot product of A[0] and v:", np.dot(A[0], v))

In [None]:
# Matrix-Vector dot product
print("\nMatrix-vector product (A dot v):\n", A.dot(v))

In [None]:
# Why does this work, and why is the result different from A.dot(v)?

v.dot(A)

In [None]:
# Matrix-Matrix dot product
print("\nMatrix-matrix product (A dot B):\n", A.dot(B))

In [None]:
# Matrix-Matrix Multiplication
# The `@` operator is the recommended way to do matrix multiplication since Python 3.5
print("\nMatrix-matrix product (A @ B):\n", A @ B)

In [None]:
# Home exercise: Convince yourself that A @ B != B @ A

In [None]:
# Diagonal of a matrix
print("\nDiagonal of A:", np.diag(A))
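
The cells above use `np.dot`, `@`, and `np.diag`; the `np.linalg` submodule itself adds routines such as the determinant, the inverse, and a linear-system solver. A sketch using the same `A` and `v` to solve A x = v:

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
v = np.array([9, 10])

det_A = np.linalg.det(A)
print(det_A)                      # approximately -2.0 (floating point)
A_inv = np.linalg.inv(A)

# Solve A @ x = v directly; generally preferred over inv(A) @ v for accuracy and speed
x = np.linalg.solve(A, v)
print(x)                          # approximately [-8, 8.5]
print(np.allclose(A @ x, v))      # True: x really solves the system
print(np.allclose(A_inv @ v, x))  # True: same answer via the inverse
```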

---
## 10. Saving and Loading Arrays
NumPy makes it easy to save your arrays to disk for later use. The standard `.npy` format is a fast and efficient way to store a single array.

In [None]:
arr_to_save = np.arange(20).reshape(5, 4)
print("Array to save:\n", arr_to_save)

In [None]:
# Method 1: `np.save` and `np.load` (Recommended)
# Saves the array in a binary file in NumPy .npy format.
np.save('my_saved_array.npy', arr_to_save)


In [None]:
# Now, load that array from the file
loaded_arr = np.load('my_saved_array.npy')
print("\nLoaded with np.load:\n", loaded_arr)
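
A `.npy` file holds one array. To bundle several arrays in one file, `np.savez` (or `np.savez_compressed`) writes an `.npz` archive whose arrays are retrieved by keyword. A sketch; the file goes into a temporary directory so nothing is left behind, and the names `features`/`labels` are illustrative:

```python
import os
import tempfile

import numpy as np

features = np.arange(20).reshape(5, 4)
labels = np.array([0, 1, 0, 1, 1])

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "dataset.npz")
    np.savez(path, features=features, labels=labels)  # keyword names become keys

    with np.load(path) as archive:  # the returned NpzFile supports `with`
        print(archive.files)
        loaded_features = archive["features"]
        loaded_labels = archive["labels"]

print(np.array_equal(loaded_features, features))  # True
```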


In [None]:


# Method 2: `a.dump()` and `np.load()`
# a.dump() pickles the array to a file; np.load() reads it back only with allow_pickle=True.
# Only load pickled files you trust: unpickling can execute arbitrary code.
arr_to_save.dump('my_dump.pkl')
loaded_dump = np.load('my_dump.pkl', allow_pickle=True)
print("\nLoaded from dump file:\n", loaded_dump)

---
# Conclusion

This concludes our deep dive into NumPy. You have learned about creating arrays, inspecting their attributes, performing a vast array of mathematical and logical operations, manipulating their shape, and using advanced features like broadcasting, random number generation, and linear algebra. These skills are not just foundational; they are a prerequisite for virtually all advanced data analysis and machine learning in Python.

**Next up: Data Analysis with Pandas!** Pandas uses NumPy arrays as its backbone and provides the `DataFrame` object, which will be our primary tool for exploring and cleaning tabular data.