# A Brief Introduction to NumPy

### Using Packages in Python

There are three common ways to import packages in Python, each with its own advantages and disadvantages.

**A. Importing the Full Package**

You can import the entire package using `import numpy`. This makes all of NumPy's functions available, but you must prefix every call with `numpy.`.

This method is very clear because it's obvious where each function comes from. However, typing `numpy.` every time can be tedious.

```python
import numpy

arr = numpy.array([1, 2, 3])
arr_sum = numpy.sum(arr)
```

**B. Importing Specific Functions**

If you only need one or two functions, you can import them directly using the `from ... import ...` syntax. This adds the function names directly to your namespace, so you don't need any prefix.

This is shorter, but it can cause "namespace pollution". If you import a function named `sum` from NumPy, it shadows Python's built-in `sum()` for the rest of the module. The two functions do not behave identically, so this can be confusing.

```python
from numpy import array, sum

arr = array([4, 5, 6])
arr_sum = sum(arr)
```
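To make the shadowing concrete, here is a small sketch (the variable names are illustrative only). Both functions accept a second positional argument, but the built-in treats it as a starting value while NumPy treats it as an axis, so the same call quietly returns different results.

```python
import builtins

from numpy import sum  # from here on, the bare name `sum` refers to numpy.sum

# Built-in sum: the second argument is the start value, so (-1) + 0+1+2+3+4
builtin_result = builtins.sum(range(5), -1)

# NumPy sum: the second argument is the axis, so all five values are added
numpy_result = sum(range(5), -1)

print(builtin_result, numpy_result)  # 9 10
```

The built-in is still reachable through the `builtins` module, but code that relies on the bare name `sum` now behaves differently.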

**C. Importing with an Alias**

This is the most popular and recommended method for NumPy. We use the `as` keyword to give the package a short "nickname" (an alias).

The standard, community-agreed-upon alias for NumPy is `np`. This gives us the best of both worlds:

1.  It's short (we type `np.` instead of `numpy.`).
2.  It's clear (we still use a prefix, so `np.sum()` is clearly from NumPy).

```python
import numpy as np

arr = np.array([10, 20, 30])
arr_sum = np.sum(arr)
```

In [None]:
import numpy as np

### Why NumPy? Python Lists vs. NumPy Arrays

Python is a dynamically typed language. This means a single Python list can hold different types of data (e.g., integers, strings, etc.). This flexibility is powerful, but it comes at a cost: in CPython, each item in the list is a full Python object, a C structure that carries its value, its type information, and a reference count, and the list itself only stores pointers to those objects.

NumPy introduces a fixed-type array. This "loses" flexibility but gains substantial efficiency for numerical data, because NumPy can store the values in one contiguous block of memory.
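The difference can be inspected directly. The following is a minimal sketch with illustrative variable names; it assumes an explicit 64-bit integer dtype so the byte counts are the same on every platform.

```python
import numpy as np

# A list can mix types; each element is a separate Python object
mixed_list = [1, "two", 3.0]
element_types = [type(item).__name__ for item in mixed_list]
print(element_types)  # ['int', 'str', 'float']

# An array has one dtype; every element occupies the same number of bytes
arr = np.array([1, 2, 3], dtype=np.int64)
print(arr.dtype, arr.itemsize, arr.nbytes)  # int64 8 24

# Mixed input is coerced to a single common type (here a Unicode string dtype)
coerced = np.array(mixed_list)
print(coerced.dtype.kind)  # U
```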

In [None]:
# Create an array from a list
arr = np.array([1, 4, 2, 5, 3])
print(f"Array from list: {arr}")

In [None]:
# NumPy will "upcast" types if necessary
arr = np.array([3.14, 4, 2, 3])
print(f"Array with upcasting (int -> float): {arr}")


In [None]:
# You can also specify the data type (dtype) explicitly
arr = np.array([1, 2, 3, 4], dtype='float')
print(f"Array with specified dtype: {arr}")

Array with specified dtype: [1. 2. 3. 4.]


In [None]:
# Create an array of zeros
zeros = np.zeros(10, dtype=int)
print(f"Array of 10 zeros: {zeros}")

In [None]:
# Create a 3x5 array of ones
ones = np.ones((3, 5), dtype=float)
print(f"3x5 array of ones:\n{ones}")

In [None]:
# Create an array of 5 values evenly spaced between 0 and 1
linspace = np.linspace(0, 1, 5)
print(f"Evenly spaced values: {linspace}")


In [None]:
# Create a 3x5 array of random integers from 0 to 9 (the upper bound, 10, is excluded)
random_ints = np.random.randint(0, 10, size=(3, 5))
print(f"3x5 random integer array:\n{random_ints}")

Array of 10 zeros: [0 0 0 0 0 0 0 0 0 0]
3x5 array of ones:
[[1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]]
Evenly spaced values: [0.   0.25 0.5  0.75 1.  ]
3x5 random integer array:
[[1 9 3 6 0]
 [5 7 8 5 4]
 [6 8 5 6 8]]


### ndim, shape, size, and Reshaping

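NumPy arrays have attributes that describe their dimensions and size.

`ndim`: The number of dimensions.

`shape`: The size of each dimension, as a tuple.

`size`: The total number of elements in the array.

You can also reshape an array, which changes its shape without changing its data. The new shape must hold exactly the same number of elements as the original.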
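As a quick sketch of the size constraint on reshaping (variable names are illustrative): passing `-1` for one dimension lets NumPy infer it, and a shape with the wrong element count raises a `ValueError`.

```python
import numpy as np

arr = np.arange(12)

# -1 asks NumPy to infer that dimension from the total size (12 / 3 = 4)
grid = arr.reshape(3, -1)
print(grid.shape)  # (3, 4)

# The element count must match: 12 values cannot fill a 5x5 grid
try:
    arr.reshape(5, 5)
    reshape_failed = False
except ValueError:
    reshape_failed = True
print(reshape_failed)  # True
```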

In [7]:
# Create a 3-dimensional array
x3 = np.random.randint(10, size=(3, 4, 5))

print(f"x3 ndim: {x3.ndim}")
print(f"x3 shape: {x3.shape}")
print(f"x3 size: {x3.size}")

x3 ndim: 3
x3 shape: (3, 4, 5)
x3 size: 60


In [8]:
# Reshaping is common
# Create a 1D array of 9 numbers
arr = np.arange(9)
print(f"Original 1D array: {arr}")

# Reshape it into a 3x3 array
grid = arr.reshape((3, 3))
print(f"Reshaped 3x3 array:\n{grid}")

Original 1D array: [0 1 2 3 4 5 6 7 8]
Reshaped 3x3 array:
[[0 1 2]
 [3 4 5]
 [6 7 8]]


### Array Indexing and Slicing
Accessing elements is similar to Python lists, but you can use comma-separated indices for multiple dimensions.

CRITICAL NOTE: Unlike Python lists, NumPy array slices are views, not copies. Modifying a slice will modify the original array.

TODO: add the example from the lesson.

In [9]:
# 1-dimensional array
x1 = np.arange(10)
print(f"x1: {x1}")

# Access one element
print(f"Element 7: {x1[7]}")

# Slice from index 4 up to (but not including) index 7
print(f"Slice [4:7]: {x1[4:7]}")

# Slice from the start up to (but not including) index 5
print(f"Slice [:5]: {x1[:5]}")

# Slice from index 5 to the end
print(f"Slice [5:]: {x1[5:]}")

x1: [0 1 2 3 4 5 6 7 8 9]
Element 7: 7
Slice [4:7]: [4 5 6]
Slice [:5]: [0 1 2 3 4]
Slice [5:]: [5 6 7 8 9]


In [10]:
# 2-dimensional array
numbers = np.arange(12).reshape(3, 4)
print(f"2D array:\n{numbers}")

# Get a single element (row 2, column 2)
print(f"Element [2, 2]: {numbers[2, 2]}")

# Get the first column (all rows, column 0)
print(f"First column: {numbers[:, 0]}")

# Get the second row (row 1, all columns)
print(f"Second row: {numbers[1, :]}")

2D array:
[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]
Element [2, 2]: 10
First column: [0 4 8]
Second row: [4 5 6 7]


In [11]:
# Proof that slices are VIEWS
print(f"Original array: {x1}")

# Create a slice
x1_slice = x1[5:8]
print(f"Slice: {x1_slice}")

# Modify the slice
x1_slice[0] = 99
print(f"Modified slice: {x1_slice}")

# The original array is changed!
print(f"Original array after slice modification: {x1}")

Original array: [0 1 2 3 4 5 6 7 8 9]
Slice: [5 6 7]
Modified slice: [99  6  7]
Original array after slice modification: [ 0  1  2  3  4 99  6  7  8  9]
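When an independent copy is needed, the `.copy()` method is one way to get it. The sketch below uses illustrative names and `np.shares_memory` to check whether two arrays are backed by the same data; the last few lines contrast this with a plain Python list, whose slices are already copies.

```python
import numpy as np

original = np.arange(10)

view = original[5:8]                # shares memory with `original`
independent = original[5:8].copy()  # owns its own data

independent[0] = 99                 # does not touch `original`
print(original)                     # [0 1 2 3 4 5 6 7 8 9]
print(np.shares_memory(original, view))         # True
print(np.shares_memory(original, independent))  # False

# For comparison, slicing a Python list already produces a copy
py_list = list(range(10))
list_slice = py_list[5:8]
list_slice[0] = 99
print(py_list[5])                   # 5
```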


### Array Concatenation

You can combine arrays using `np.concatenate`, which takes a sequence of arrays and an optional `axis` argument.

In [None]:
# Concatenate 1D arrays
x = np.array([1, 2, 3])
y = np.array([3, 2, 1])
z = np.concatenate([x, y])
print(f"Concatenated 1D: {z}")

In [None]:
# Concatenate 2D arrays (grid)
grid = np.array([[1, 2, 3], [4, 5, 6]])
print(f"\nOriginal grid:\n{grid}")

In [None]:
# Concatenate along the first axis (rows, axis=0)
grid_ax0 = np.concatenate([grid, grid], axis=0)
print(f"\nConcatenated axis=0:\n{grid_ax0}")

In [None]:
# Concatenate along the second axis (columns, axis=1)
grid_ax1 = np.concatenate([grid, grid], axis=1)
print(f"\nConcatenated axis=1:\n{grid_ax1}")

Concatenated 1D: [1 2 3 3 2 1]

Original grid:
[[1 2 3]
 [4 5 6]]

Concatenated axis=0:
[[1 2 3]
 [4 5 6]
 [1 2 3]
 [4 5 6]]

Concatenated axis=1:
[[1 2 3 1 2 3]
 [4 5 6 4 5 6]]
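`np.concatenate` expects all inputs to have the same number of dimensions. For mixed dimensions, NumPy also provides `np.vstack` and `np.hstack`; these are not covered in the text above, so treat the following as a supplementary sketch with illustrative array names.

```python
import numpy as np

row = np.array([1, 2, 3])        # shape (3,)
grid = np.array([[9, 8, 7],
                 [6, 5, 4]])     # shape (2, 3)
column = np.array([[99],
                   [99]])        # shape (2, 1)

# A 1D and a 2D array cannot be concatenated directly
try:
    np.concatenate([row, grid])
    concat_failed = False
except ValueError:
    concat_failed = True
print(concat_failed)  # True

# vstack stacks vertically, treating the 1D array as a single row
stacked_rows = np.vstack([row, grid])
print(stacked_rows.shape)  # (3, 3)

# hstack stacks horizontally, adding `column` as a new last column
stacked_cols = np.hstack([grid, column])
print(stacked_cols.shape)  # (2, 4)
```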


### Array Splitting

You can also split an array using `np.split`, passing a list of indices that mark the split points.

In [13]:
# Splitting an array
x = [1, 2, 3, 99, 99, 3, 2, 1]
# Split at indices 3 and 5
x1, x2, x3 = np.split(x, [3, 5])

print(f"Split part 1: {x1}")
print(f"Split part 2: {x2}")
print(f"Split part 3: {x3}")

Split part 1: [1 2 3]
Split part 2: [99 99]
Split part 3: [3 2 1]
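`np.split` also accepts an integer, meaning "this many equal parts". The sketch below (illustrative names) shows that case, and uses `np.array_split`, which is not mentioned above, for lengths that do not divide evenly.

```python
import numpy as np

# An integer second argument requests that many equal-sized parts
halves = np.split(np.arange(6), 2)
print([part.tolist() for part in halves])  # [[0, 1, 2], [3, 4, 5]]

# np.split refuses an uneven division
try:
    np.split(np.arange(7), 3)
    split_failed = False
except ValueError:
    split_failed = True
print(split_failed)  # True

# np.array_split allows parts of unequal length
parts = np.array_split(np.arange(7), 3)
print([len(part) for part in parts])  # [3, 2, 2]
```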


### Universal Functions (ufuncs)

The key to NumPy's speed is vectorized operations, implemented via universal functions ("ufuncs").

Instead of writing a slow Python loop, you can apply an operation element-by-element to the entire array at once. This is much faster because the loop runs in compiled C code rather than in the Python interpreter.

In [None]:
# This is a slow Python loop
def multiply(numbers, factor):
    output = list()
    for val in numbers:
        output.append(val * factor)
    return output

values_list = list(range(10))
print(f"Slow Python loop: {multiply(values_list, 2)}")

# This is the fast NumPy way (using a ufunc)
values_arr = np.arange(10)
print(f"Fast NumPy ufunc: {values_arr * 2}")

Slow Python loop: [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
Fast NumPy ufunc: [ 0  2  4  6  8 10 12 14 16 18]
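Arithmetic operators on arrays are shorthand for named ufuncs. As a small sketch (variable names are illustrative), `values * 2` gives the same result as calling `np.multiply` directly, and unary ufuncs such as `np.sqrt` work the same way.

```python
import numpy as np

values = np.arange(10)

# np.multiply is a ufunc object; the * operator dispatches to it for arrays
print(isinstance(np.multiply, np.ufunc))  # True
same = np.array_equal(values * 2, np.multiply(values, 2))
print(same)  # True

# Unary ufuncs also apply element by element
roots = np.sqrt(np.array([0, 1, 4, 9]))
print(roots)  # [0. 1. 2. 3.]
```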


In [15]:
# Ufuncs work between two arrays as well
arr1 = np.arange(5)
arr2 = np.arange(1, 6)

print(f"{arr1} + {arr2} = {arr1 + arr2}")
print(f"{arr1} / {arr2} = {arr1 / arr2}")

# They also work on multi-dimensional arrays
x_2d = np.arange(9).reshape((3, 3))
print(f"\n2D array:\n{x_2d}")
print(f"\n2 to the power of x_2d:\n{2 ** x_2d}")

[0 1 2 3 4] + [1 2 3 4 5] = [1 3 5 7 9]
[0 1 2 3 4] / [1 2 3 4 5] = [0.         0.5        0.66666667 0.75       0.8       ]

2D array:
[[0 1 2]
 [3 4 5]
 [6 7 8]]

2 to the power of x_2d:
[[  1   2   4]
 [  8  16  32]
 [ 64 128 256]]


### Aggregations (Sum, Min, Max, etc.)

Aggregations are functions that summarize the values in an array. A key concept here is the `axis` keyword, which names the dimension that gets collapsed.

`axis=0`: Collapse the rows, leaving one value per column (e.g., find the min of each column).

`axis=1`: Collapse the columns, leaving one value per row (e.g., find the min of each row).
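A deterministic sketch makes the effect of `axis` easier to check by hand. It assumes a small 2x3 array built with `np.arange` (the names are illustrative); the dimension named by `axis` disappears from the result's shape.

```python
import numpy as np

data = np.arange(6).reshape(2, 3)
# [[0 1 2]
#  [3 4 5]]

col_sums = data.sum(axis=0)  # collapses the 2 rows -> one value per column
row_sums = data.sum(axis=1)  # collapses the 3 columns -> one value per row

print(col_sums, col_sums.shape)  # [3 5 7] (3,)
print(row_sums, row_sums.shape)  # [ 3 12] (2,)
```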

In [None]:
rand_2d = np.random.random((2, 3)).round(2)
print(f"2x3 random array:\n{rand_2d}")

print(f"Sum of all elements: {np.sum(rand_2d)}")
print(f"Min of each column: {np.min(rand_2d, axis=0)}")
print(f"Max of each row: {np.max(rand_2d, axis=1)}")

2x3 random array:
[[0.8  0.52 0.27]
 [0.63 0.55 0.33]]
Sum of all elements: 3.1000000000000005
Min of each column: [0.63 0.52 0.27]
Max of each row: [0.8  0.63]


### Broadcasting

Broadcasting is a set of rules for applying binary ufuncs (like addition) to arrays of different shapes. If the dimensions are compatible, NumPy conceptually "stretches" the smaller array to match the larger one, without actually copying the data.

In [None]:
# Array and a scalar
a = np.array([0, 1, 2])
print(f"a: {a}")
print(f"a + 5: {a + 5}")  # 5 is "broadcast" to [5, 5, 5]

a: [0 1 2]
a + 5: [5 6 7]


In [None]:
# 2D array and a 1D array
ones = np.ones((3, 3))
a = np.array([0, 1, 2])
print(f"3x3 ones:\n{ones}")
print(f"\n1x3 array 'a': {a}")

# 'a' is broadcast across every row of 'ones'
print(f"\nones + a:\n{ones + a}")

3x3 ones:
[[1. 1. 1.]
 [1. 1. 1.]
 [1. 1. 1.]]

1x3 array 'a': [0 1 2]

ones + a:
[[1. 2. 3.]
 [1. 2. 3.]
 [1. 2. 3.]]


In [None]:
# A 1D array (a row) and a 2D column array
a = np.arange(3)  # shape (3,) -> treated as 1x3
b = np.arange(3).reshape((3, 1))  # shape (3, 1)

print(f"a (shape {a.shape}): {a}")
print(f"b (shape {b.shape}):\n{b}")

# 'a' is broadcast down 3 rows
# 'b' is broadcast across 3 columns
print(f"\na + b:\n{a + b}")

a (shape (3,)): [0 1 2]
b (shape (3, 1)):
[[0]
 [1]
 [2]]

a + b:
[[0 1 2]
 [1 2 3]
 [2 3 4]]
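The compatibility check compares shapes from the last dimension backwards: two dimensions are compatible when they are equal or when one of them is 1. The following is a sketch under that reading, with illustrative names; `np.broadcast` and `np.broadcast_to` are used only to inspect the result.

```python
import numpy as np

a = np.arange(3)                  # shape (3,)
b = np.arange(3).reshape((3, 1))  # shape (3, 1)

# (3,) is treated as (1, 3); paired with (3, 1) the result shape is (3, 3)
result_shape = np.broadcast(a, b).shape
print(result_shape)  # (3, 3)

# broadcast_to shows the "stretched" form of `a` as a read-only view
stretched = np.broadcast_to(a, (3, 3))
print(stretched)

# Trailing dimensions 2 and 3 are neither equal nor 1, so this fails
try:
    np.ones((3, 2)) + a
    broadcast_failed = False
except ValueError:
    broadcast_failed = True
print(broadcast_failed)  # True
```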


### Comparison Operators and Fancy Indexing

Comparison operators (like `<` or `>`) are also ufuncs. They return a boolean array.

You can use these boolean arrays to "mask" your data and select only the values you care about. This is one form of "fancy indexing" (the NumPy documentation calls it "advanced indexing").

Another form of fancy indexing is passing an array of indices to access elements.

In [None]:
x = np.array([1, 2, 3, 4, 5])

# Comparison ufunc
print(f"x < 3: {x < 3}")

print(f"Sum of (x < 3): {np.sum(x < 3)}")
print(f"Any > 8? {np.any(x > 8)}")
print(f"All < 8? {np.all(x < 8)}")

x < 3: [ True  True False False False]
Sum of (x < 3): 2
Any > 8? False
All < 8? True


In [21]:
# Boolean mask indexing
rand = np.random.randint(100, size=10)
print(f"Random array: {rand}")

# Select only the values less than 50
print(f"Values < 50: {rand[rand < 50]}")

Random array: [78 68 36 14 69 59 68 48 89 19]
Values < 50: [36 14 48 19]


In [22]:
# Index array ("fancy indexing")
print(f"Random array: {rand}")

# Select elements at index 3, 7, and 4
indices = [3, 7, 4]
print(f"Elements at [3, 7, 4]: {rand[indices]}")

Random array: [78 68 36 14 69 59 68 48 89 19]
Elements at [3, 7, 4]: [14 48 69]
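Two details are worth a short sketch (the array values and names are illustrative). Masks can be combined with `&` and `|`, with each comparison in parentheses, and selecting with a mask or an index list returns a copy rather than a view, while assigning through a mask modifies the array in place.

```python
import numpy as np

x = np.array([78, 68, 36, 14, 69])

# Combine conditions element-wise with & (and) or | (or)
mask = (x > 20) & (x < 70)
selected = x[mask]
print(selected)  # [68 36 69]

# The selection is a copy: changing it leaves `x` untouched
selected[0] = 0
print(x)  # [78 68 36 14 69]

# Assigning through a mask changes the original array in place
x[x < 50] = 0
print(x)  # [78 68  0  0 69]
```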
