# A Brief Introduction to NumPy

### 1. Using Packages in Python

There are three common ways to import packages in Python, each with its own advantages and disadvantages.

**A. Importing the Full Package**

You can import the entire package using `import numpy`. This makes all of NumPy's functions available, but you must prefix every function with `numpy.` (with a dot).

This method is very clear: when you work with many packages at once, it is obvious where each function comes from. However, typing the full package name every time can be tedious.

```python
import numpy

arr = numpy.array([1, 2, 3])
arr_sum = numpy.sum(arr)
```

**B. Importing Specific Functions**

If you only need one or two functions, you can import them directly using the `from ... import ...` syntax. This adds the function names directly to your environment, so you don't need any prefix.

This is shorter, but it can cause "namespace pollution". If you import a function named `sum` from NumPy, it shadows Python's built-in `sum()` function for the rest of the module. The two functions behave differently, so this can be confusing.

```python
from numpy import array, sum

arr = array([4, 5, 6])
arr_sum = sum(arr)
```
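
The shadowing described above can be seen in a minimal sketch. The list `nested` is illustrative data, not part of the original notebook; the built-in stays reachable through the standard `builtins` module.

```python
import builtins
from numpy import sum  # the name `sum` now refers to numpy.sum in this module

nested = [[1, 2], [3, 4]]  # illustrative data

# numpy.sum converts the nested list to a 2D array and adds every element.
numpy_total = sum(nested)
print(numpy_total)  # 10

# The built-in sum starts from 0 and tries 0 + [1, 2], which raises TypeError.
try:
    builtins.sum(nested)
    builtin_failed = False
except TypeError:
    builtin_failed = True
print(builtin_failed)  # True

# The built-in is hidden by the import, not destroyed.
print(sum is builtins.sum)  # False
```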

**C. Importing with an Alias**

This is the most popular and recommended method for NumPy. We use the `as` keyword to give the package a short "nickname" (an alias). The standard, community-agreed-upon alias for NumPy is `np`. 

This combines the advantages of the previous two methods:

1. It's short (we type `np.` instead of `numpy.`).
2. It's clear (we still use a prefix, so we know it's from NumPy).
3. It avoids namespace pollution (we don't overwrite built-in functions).

```python
import numpy as np

arr = np.array([10, 20, 30])
arr_sum = np.sum(arr)
```

In [4]:
import numpy as np

### 2. Why NumPy? Python Lists vs. NumPy Arrays

Python is a dynamically typed language. This means a single Python list can hold different types of data (e.g., integers, strings, etc.). This flexibility is powerful, but it comes at a cost: each item in the list is a full Python object, a C structure that stores its value, its type, and a reference count.

NumPy introduces a fixed-type array. This "loses" flexibility but gains **massive efficiency** for numerical data, as NumPy can store the data as a contiguous block of memory.
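
As a rough illustration of the fixed-type idea, the sketch below inspects a small array. The names `mixed_list` and `arr` are illustrative, and the dtype is set to `np.int64` explicitly so the sizes are the same on every platform.

```python
import numpy as np

mixed_list = [1, "two", 3.0]  # a list may mix types freely
print([type(item).__name__ for item in mixed_list])  # ['int', 'str', 'float']

arr = np.array([1, 2, 3], dtype=np.int64)  # every element is a 64-bit integer
print(arr.dtype)                  # int64
print(arr.itemsize)               # 8 (bytes per element)
print(arr.nbytes)                 # 24 (bytes for the whole data buffer)
print(arr.flags["C_CONTIGUOUS"])  # True: the elements sit side by side in memory
```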

Here's a quick comparison of performance between Python lists and NumPy arrays for a common operation: squaring each element and summing the results.

> I use the `%timeit` command available in Jupyter notebooks to measure execution time. Pay attention to the time units: results may be reported in `ms` (milliseconds), `µs` (microseconds), or `ns` (nanoseconds). A microsecond (`µs`) is 1,000 times shorter than a millisecond (`ms`), and a nanosecond (`ns`) is 1,000 times shorter than a microsecond (`µs`).

In [5]:
# Note the underscore: 10_000 is ten thousand, whereas 10,000 would pass two arguments (10 and 000)
python_list = list(range(10_000))
%timeit sum(x*x for x in python_list)


In [6]:
numpy_array = np.arange(10_000)
%timeit np.sum(numpy_array * numpy_array)

Exact timings depend on your machine, but for 10,000 elements the NumPy version is typically much faster than the generator expression.
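
Outside Jupyter, a similar comparison can be sketched with the standard `timeit` module. This is only an approximation of what `%timeit` does: the repeat count `repeats` is an arbitrary choice, and the dtype is fixed to `np.int64` so the sum cannot overflow on any platform.

```python
import timeit
import numpy as np

n = 10_000
python_list = list(range(n))
numpy_array = np.arange(n, dtype=np.int64)

# Both approaches must agree on the answer before timing them.
list_result = sum(x * x for x in python_list)
numpy_result = int(np.sum(numpy_array * numpy_array))
print(list_result == numpy_result)  # True

repeats = 20  # arbitrary; raise it for steadier measurements
list_time = timeit.timeit(lambda: sum(x * x for x in python_list), number=repeats)
numpy_time = timeit.timeit(lambda: np.sum(numpy_array * numpy_array), number=repeats)

# timeit returns the total time in seconds for all repeats.
print(f"list:  {list_time / repeats * 1e6:.1f} µs per run")
print(f"numpy: {numpy_time / repeats * 1e6:.1f} µs per run")
```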


### 3. Creating NumPy Arrays from Lists

NumPy arrays can be created from Python lists using `np.array()`.

This is one of the most common ways to create arrays when you have existing data. NumPy automatically infers the data type from the list elements.

In [7]:
# Create an array from a list
arr = np.array([1, 2, 3, 4])
print("Array from list:", arr)

Array from list: [1 2 3 4]


In [12]:
# You can also specify the data type (dtype) explicitly
arr = np.array([1, 2, 3, 4], dtype='float')
print("Array with specified dtype:", arr)

Array with specified dtype: [1. 2. 3. 4.]
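
To see the type inference at work, here is a small sketch. The variable names are illustrative; the exact integer width NumPy picks is platform-dependent, so only the dtype "kind" is checked for the integer case.

```python
import numpy as np

ints = np.array([1, 2, 3])
print(ints.dtype.kind)  # 'i': a signed integer type

# A single float in the list upcasts the whole array to float64.
mixed = np.array([1, 2, 3.5])
print(mixed.dtype)  # float64

# Converting floats to integers truncates toward zero.
truncated = np.array([1.9, 2.5]).astype(int)
print(truncated.tolist())  # [1, 2]
```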


In [13]:
# Create an array of zeros (default dtype is float)
zeros = np.zeros(10) 
print("Array of 10 zeros:", zeros) 

Array of 10 zeros: [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]


In [14]:
# Create a 3x5 array of ones
ones = np.ones((3, 5), dtype=int)
print("3x5 array of ones:\n", ones)

3x5 array of ones:
 [[1 1 1 1 1]
 [1 1 1 1 1]
 [1 1 1 1 1]]


In [None]:
# Create an array of 5 values evenly spaced between 0 and 1
linspace = np.linspace(0, 1, 5)
print("Evenly spaced values:", linspace)


Evenly spaced values: [0.   0.25 0.5  0.75 1.  ]


In [None]:
# Create a 3x5 array of random integers from 0 to 9 (the upper bound, 10, is exclusive)
random_ints = np.random.randint(0, 10, size=(3, 5))
print("3x5 random integer array:\n", random_ints)

3x5 random integer array:
[[3 4 4 0 2]
 [6 2 6 6 6]
 [1 6 7 2 1]]
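
If you need reproducible random arrays, one option is NumPy's `Generator` interface. This is a sketch of an alternative to the `np.random.randint` call above, not what the notebook itself uses; the seed value 42 is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # arbitrary seed, chosen for illustration
random_ints = rng.integers(0, 10, size=(3, 5))  # upper bound 10 is exclusive

print(random_ints.shape)  # (3, 5)

# Re-creating the generator with the same seed reproduces the same values.
repeat = np.random.default_rng(seed=42).integers(0, 10, size=(3, 5))
print(np.array_equal(random_ints, repeat))  # True
```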


### 4. ndim, shape and size

NumPy arrays have attributes that describe their dimensions and size.

`ndim`: The number of dimensions.

`shape`: The size of each dimension.

`size`: The total number of elements in the array.

In [17]:
x1 = np.array([1, 2, 3, 4])

print(f"x1 ndim: {x1.ndim}")
print(f"x1 shape: {x1.shape}")
print(f"x1 size: {x1.size}")

x1 ndim: 1
x1 shape: (4,)
x1 size: 4


In [18]:
x2 = np.array([[1, 2], [3, 4], [5, 6], [7, 8]])

print(f"x2 ndim: {x2.ndim}")
print(f"x2 shape: {x2.shape}")
print(f"x2 size: {x2.size}")

x2 ndim: 2
x2 shape: (4, 2)
x2 size: 8


In [None]:
# Create a 3-dimensional array
x3 = np.random.randint(10, size=(3, 4, 5))

print(f"x3 ndim: {x3.ndim}")
print(f"x3 shape: {x3.shape}")
print(f"x3 size: {x3.size}")

x3 ndim: 3
x3 shape: (3, 4, 5)
x3 size: 60
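
The three attributes are related, which the following sketch checks. It uses an array of zeros with the same shape as `x3` so the result does not depend on random values.

```python
import math
import numpy as np

x3 = np.zeros((3, 4, 5))

print(x3.ndim == len(x3.shape))        # True: ndim is the length of the shape tuple
print(x3.size == math.prod(x3.shape))  # True: 3 * 4 * 5 = 60
print(len(x3))                         # 3: len() gives the size of the first dimension
```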


### 5. Array Indexing and Slicing
Accessing elements is similar to Python lists, but you can use comma-separated indices for multiple dimensions.

You can also slice arrays with the same `start:stop` notation as lists; the element at the `stop` index is not included.

> CRITICAL NOTE: Unlike Python lists, NumPy array slices are views, not copies. Modifying a slice will modify the original array.

In [None]:
# 1-dimensional array
x1 = np.arange(10)
print(f"x1: {x1}")

# Access one element
print(f"Element 7: {x1[7]}")

# Slice from index 4 up to (but not including) index 7
print(f"Slice [4:7]: {x1[4:7]}")

# Slice from the start up to (but not including) index 5
print(f"Slice [:5]: {x1[:5]}")

# Slice from index 5 to the end
print(f"Slice [5:]: {x1[5:]}")

x1: [0 1 2 3 4 5 6 7 8 9]
Element 7: 7
Slice [4:7]: [4 5 6]
Slice [:5]: [0 1 2 3 4]
Slice [5:]: [5 6 7 8 9]


In [None]:
# 2-dimensional array
numbers = np.arange(12).reshape(3, 4)
print(f"2D array:\n{numbers}")

# Get a single element (row 2, column 2)
print(f"Element [2, 2]: {numbers[2, 2]}")

# Get the first column (all rows, column 0)
print(f"First column: {numbers[:, 0]}")

# Get the second row (row 1, all columns)
print(f"Second row: {numbers[1, :]}")

2D array:
[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]
Element [2, 2]: 10
First column: [0 4 8]
Second row: [4 5 6 7]


In [None]:
# Proof that slices are VIEWS
print(f"Original array: {x1}")

# Create a slice
x1_slice = x1[5:8]
print(f"Slice: {x1_slice}")

# Modify the slice
x1_slice[0] = 99
print(f"Modified slice: {x1_slice}")

# The original array is changed!
print(f"Original array after slice modification: {x1}")

Original array: [0 1 2 3 4 5 6 7 8 9]
Slice: [5 6 7]
Modified slice: [99  6  7]
Original array after slice modification: [ 0  1  2  3  4 99  6  7  8  9]
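
When you want an independent slice, one common approach is to call `.copy()` on it. The sketch below contrasts a view, a copy, and a plain list slice; `view`, `independent`, and the list names are illustrative.

```python
import numpy as np

x1 = np.arange(10)
view = x1[5:8]                # shares memory with x1
independent = x1[5:8].copy()  # owns its own data

print(np.shares_memory(x1, view))         # True
print(np.shares_memory(x1, independent))  # False

# Modifying the copy leaves the original untouched.
independent[0] = 99
print(x1[5])  # 5

# Python list slices are always copies, so the same holds for lists.
list_original = list(range(10))
list_slice = list_original[5:8]
list_slice[0] = 99
print(list_original[5])  # 5
```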


### 6. Reshaping Arrays

NumPy arrays can be reshaped. This is useful when you want to change the dimensions of an array without changing its data.

In [None]:
# Create a 1D array of 9 numbers
arr = np.arange(9)  # arange is like Python's built-in range
print(f"Original 1D array: {arr}")

# Reshape it into a 3x3 array
grid = arr.reshape((3, 3))
print(f"Reshaped 3x3 array:\n{grid}")

Original 1D array: [0 1 2 3 4 5 6 7 8]
Reshaped 3x3 array:
[[0 1 2]
 [3 4 5]
 [6 7 8]]
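
Two details are worth a quick sketch: `-1` lets NumPy infer one dimension, and the new shape must account for exactly the same number of elements. In this particular case the reshaped array is also a view of the original, which the last lines demonstrate.

```python
import numpy as np

arr = np.arange(12)

# -1 asks NumPy to infer this dimension: 12 elements / 3 rows = 4 columns.
grid = arr.reshape((3, -1))
print(grid.shape)  # (3, 4)

# 12 elements cannot fill a 5x3 array, so this raises ValueError.
try:
    arr.reshape((5, 3))
    reshape_failed = False
except ValueError:
    reshape_failed = True
print(reshape_failed)  # True

# Here reshape returned a view, so writing to grid changes arr as well.
grid[0, 0] = 99
print(arr[0])  # 99
```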


### 7. Array Concatenation

You can combine arrays using `np.concatenate`. 

In [None]:
# Concatenate 1D arrays
x = np.array([1, 2, 3])
y = np.array([3, 2, 1])
z = np.concatenate([x, y])
print(f"Concatenated 1D: {z}")

Concatenated 1D: [1 2 3 3 2 1]


In [19]:
# Concatenate 2D arrays (grid)
grid = np.array([[1, 2, 3], [4, 5, 6]])
print(f"\nOriginal grid:\n{grid}")

# Concatenate along the first axis (rows, axis=0)
grid_ax0 = np.concatenate([grid, grid], axis=0)
print(f"\nConcatenated axis=0:\n{grid_ax0}")

# Concatenate along the second axis (columns, axis=1)
grid_ax1 = np.concatenate([grid, grid], axis=1)
print(f"\nConcatenated axis=1:\n{grid_ax1}")


Original grid:
[[1 2 3]
 [4 5 6]]

Concatenated axis=0:
[[1 2 3]
 [4 5 6]
 [1 2 3]
 [4 5 6]]

Concatenated axis=1:
[[1 2 3 1 2 3]
 [4 5 6 4 5 6]]
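
For arrays with different numbers of dimensions, `np.vstack` and `np.hstack` can be more convenient than `np.concatenate`. The sketch below is illustrative; `row` and `column` are made-up data.

```python
import numpy as np

grid = np.array([[1, 2, 3], [4, 5, 6]])
row = np.array([7, 8, 9])         # 1D array with 3 elements
column = np.array([[10], [20]])   # 2D array with shape (2, 1)

# vstack treats the 1D row as shape (1, 3) and stacks it under the grid.
stacked_rows = np.vstack([grid, row])
print(stacked_rows.shape)  # (3, 3)

# hstack appends the column to the right of the grid.
stacked_columns = np.hstack([grid, column])
print(stacked_columns.shape)  # (2, 4)
```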


> Pay attention: in Python, you can concatenate lists with the `+` operator, but with NumPy arrays you should use `np.concatenate`, because `+` performs element-wise addition.

In [22]:
print([1, 2, 3] + [4, 5, 6])
print(np.array([1, 2, 3]) + np.array([4, 5, 6]))

[1, 2, 3, 4, 5, 6]
[5 7 9]


### 8. Array Splitting

You can also split an array using `np.split`.

In [None]:
# Splitting an array
x = [1, 2, 3, 99, 99, 3, 2, 1]
# Split at indices 3 and 5
x1, x2, x3 = np.split(x, [3, 5])

print(f"Split part 1: {x1}")
print(f"Split part 2: {x2}")
print(f"Split part 3: {x3}")

Split part 1: [1 2 3]
Split part 2: [99 99]
Split part 3: [3 2 1]
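
`np.split` also accepts an integer, meaning "this many equal parts". The sketch below shows that form, plus `np.array_split`, which tolerates uneven divisions; the input arrays are illustrative.

```python
import numpy as np

x = np.arange(9)

# An integer second argument splits into that many equal parts.
first, second, third = np.split(x, 3)
print(first, second, third)  # [0 1 2] [3 4 5] [6 7 8]

# 10 elements cannot be divided into 3 equal parts, so np.split raises ValueError.
try:
    np.split(np.arange(10), 3)
    split_failed = False
except ValueError:
    split_failed = True
print(split_failed)  # True

# np.array_split allows parts of unequal length.
parts = np.array_split(np.arange(10), 3)
print([len(part) for part in parts])  # [4, 3, 3]
```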


### 9. Universal Functions (ufuncs)

Instead of writing a slow Python loop, you can apply operations element-by-element on the entire array at once. This is much faster because the loop happens in compiled C code, not in the Python interpreter.

In [25]:
# This is a slow Python loop
def multiply(numbers, factor):
    output = list()
    for val in numbers:
        output.append(val * factor)
    return output

values_list = list(range(100))

%timeit multiply(values_list, 2)

3.74 μs ± 39.7 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)


In [26]:
# This is the fast NumPy way (using a ufunc)
values_arr = np.arange(100)

%timeit values_arr * 2

777 ns ± 2.32 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)


You can apply arithmetic operations to two arrays of the same shape without writing loops. Ufuncs also extend naturally to multi-dimensional arrays, allowing efficient and expressive mathematical computations.

Here's an example:

In [30]:
# Ufuncs work between two arrays as well
arr1 = np.array([12,10,98])
arr2 = np.array([2,5,7])

print(f"{arr1} + {arr2} = {arr1 + arr2}")
print(f"{arr1} / {arr2} = {arr1 / arr2}")

# They also work on multi-dimensional arrays
x_2d = np.arange(9).reshape((3, 3))
print(f"\n2D array:\n{x_2d}")
print(f"\n2 to the power of x_2d:\n{2 ** x_2d}")

[12 10 98] + [2 5 7] = [ 14  15 105]
[12 10 98] / [2 5 7] = [ 6.  2. 14.]

2D array:
[[0 1 2]
 [3 4 5]
 [6 7 8]]

2 to the power of x_2d:
[[  1   2   4]
 [  8  16  32]
 [ 64 128 256]]
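
The arithmetic operators used above are shorthand for named ufuncs. As a small sketch (the array `x` is illustrative), each operator gives the same result as its function form:

```python
import numpy as np

x = np.arange(5)  # [0 1 2 3 4]

print(isinstance(np.add, np.ufunc))  # True

# Operators and their ufunc equivalents produce identical arrays.
print(np.array_equal(x + 2, np.add(x, 2)))        # True
print(np.array_equal(x * 2, np.multiply(x, 2)))   # True
print(np.array_equal(2 ** x, np.power(2, x)))     # True

# Many ufuncs have no operator form, for example np.sqrt.
roots = np.sqrt(np.array([0, 1, 4, 9, 16]))
print(roots.tolist())  # [0.0, 1.0, 2.0, 3.0, 4.0]
```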


### 10. Aggregations (Sum, Min, Max, etc.)

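Aggregations are functions that summarize the values in an array. A key concept here is the `axis` keyword, which names the dimension that gets collapsed.

`axis=0`: Collapse the rows (the first dimension), giving one result per column (e.g., the min of each column).

`axis=1`: Collapse the columns (the second dimension), giving one result per row (e.g., the min of each row).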

In [None]:
rand_2d = np.random.random((2, 3)).round(2)
print(f"2x3 random array:\n{rand_2d}")

print(f"Sum of all elements: {np.sum(rand_2d)}")
print(f"Min of each column: {np.min(rand_2d, axis=0)}")
print(f"Max of each row: {np.max(rand_2d, axis=1)}")

2x3 random array:
[[0.69 0.33 0.99]
 [0.86 0.37 0.15]]
Sum of all elements: 3.3899999999999997
Min of each column: [0.69 0.33 0.15]
Max of each row: [0.99 0.86]
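
A deterministic sketch can make the effect of `axis` on the result shape easier to follow. The array `data` is illustrative; `keepdims=True` keeps the collapsed dimension with length 1.

```python
import numpy as np

data = np.arange(6).reshape(2, 3)  # [[0 1 2], [3 4 5]]

# axis=0 collapses the 2 rows: one sum per column.
col_sums = data.sum(axis=0)
print(col_sums.tolist(), col_sums.shape)  # [3, 5, 7] (3,)

# axis=1 collapses the 3 columns: one sum per row.
row_sums = data.sum(axis=1)
print(row_sums.tolist(), row_sums.shape)  # [3, 12] (2,)

# keepdims=True keeps the collapsed axis, giving shape (2, 1) instead of (2,).
row_sums_kept = data.sum(axis=1, keepdims=True)
print(row_sums_kept.shape)  # (2, 1)
```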


### 11. Broadcasting

Broadcasting is a set of rules for applying binary ufuncs (like addition) on arrays of different shapes. NumPy "stretches" the smaller array to match the larger one, if the dimensions are compatible.

In [31]:
# Array and a scalar
a = np.array([0, 1, 2])
print(f"a: {a}")
print(f"a + 5: {a + 5}")  # 5 is "broadcast" to [5, 5, 5]

a: [0 1 2]
a + 5: [5 6 7]


In [None]:
# 2D array and a 1D array
ones = np.ones((3, 3))
a = np.array([0, 1, 2])
print(f"3x3 ones:\n{ones}")
print(f"\n1x3 array 'a': {a}")

# 'a' is broadcast across every row of 'ones'
print(f"\nones + a:\n{ones + a}")

3x3 ones:
[[1. 1. 1.]
 [1. 1. 1.]
 [1. 1. 1.]]

1x3 array 'a': [0 1 2]

ones + a:
[[1. 2. 3.]
 [1. 2. 3.]
 [1. 2. 3.]]


In [None]:
# A 1D array (treated as a row) and a 2D column array
a = np.arange(3)  # shape (3,) -> treated as 1x3
b = np.arange(3).reshape((3, 1))  # shape (3, 1)

print(f"a (shape {a.shape}): {a}")
print(f"b (shape {b.shape}):\n{b}")

# 'a' is broadcast down 3 rows
# 'b' is broadcast across 3 columns
print(f"\na + b:\n{a + b}")

a (shape (3,)): [0 1 2]
b (shape (3, 1)):
[[0]
 [1]
 [2]]

a + b:
[[0 1 2]
 [1 2 3]
 [2 3 4]]
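
As a practical sketch of these rules, the example below centers each column of a small array and shows what happens when shapes are not compatible. The names `data`, `column_means`, and `row_offsets` are illustrative.

```python
import numpy as np

data = np.array([[1.0, 2.0, 3.0],
                 [3.0, 4.0, 5.0]])  # shape (2, 3)

# Shapes (2, 3) and (3,) are compatible: the means are broadcast across both rows.
column_means = data.mean(axis=0)
centered = data - column_means
print(centered.tolist())  # [[-1.0, -1.0, -1.0], [1.0, 1.0, 1.0]]

# Shapes (2, 3) and (2,) are not compatible: trailing dimensions 3 and 2 differ.
row_offsets = np.array([10.0, 20.0])
try:
    data + row_offsets
    broadcast_failed = False
except ValueError:
    broadcast_failed = True
print(broadcast_failed)  # True

# Reshaping to (2, 1) makes the offsets a column, which broadcasts across columns.
shifted = data + row_offsets.reshape((2, 1))
print(shifted.tolist())  # [[11.0, 12.0, 13.0], [23.0, 24.0, 25.0]]
```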


### 12. Comparison Operators and Fancy Indexing

Comparison operators (like `<` or `>`) are also ufuncs. They return a boolean array.

You can use these boolean arrays to "mask" your data and select only the values you care about. This is one form of "fancy indexing".

Another form of fancy indexing is passing an array of indices to access elements.

In [None]:
x = np.array([1, 2, 3, 4, 5])

# Comparison ufunc
print(f"x < 3: {x < 3}")

print(f"Sum of (x < 3): {np.sum(x < 3)}")
print(f"Any > 8? {np.any(x > 8)}")
print(f"All < 8? {np.all(x < 8)}")

x < 3: [ True  True False False False]
Sum of (x < 3): 2
Any > 8? False
All < 8? True


In [None]:
# Boolean mask indexing
rand = np.random.randint(100, size=10)
print(f"Random array: {rand}")

# Select only the values less than 50
print(f"Values < 50: {rand[rand < 50]}")

Random array: [39 46 93 20 45 18 44 39 79 95]
Values < 50: [39 46 20 45 18 44 39]


In [None]:
# Index array ("fancy indexing")
print(f"Random array: {rand}")

# Select elements at index 3, 7, and 4
indices = [3, 7, 4]
print(f"Elements at [3, 7, 4]: {rand[indices]}")

Random array: [39 46 93 20 45 18 44 39 79 95]
Elements at [3, 7, 4]: [20 39 45]
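
Masks can be combined with `&` (and), `|` (or), and `~` (not). The sketch below reuses the values printed above as a fixed array named `values` (an illustrative name), and also shows that indexing with a list of indices returns a copy rather than a view.

```python
import numpy as np

values = np.array([39, 46, 93, 20, 45, 18, 44, 39, 79, 95])

# Parentheses are required: & binds more tightly than the comparisons.
in_range = (values >= 20) & (values < 50)
print(values[in_range])   # [39 46 20 45 44 39]
print(values[~in_range])  # [93 18 79 95]

# Unlike slices, index-array selections are copies of the data.
picked = values[[3, 7, 4]]
picked[0] = -1
print(values[3])  # 20 (the original is unchanged)
```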
