# NumPy Fundamentals
## Table of Contents

1.  Introduction to NumPy
2.  Creating NumPy Arrays
3.  Array Attributes
4.  Array Indexing and Slicing
5.  Array Manipulation
6.  Basic Array Operations
7.  Aggregate Functions
8.  Linear Algebra with NumPy
9.  Random Number Generation
10. Summary

## Introduction to NumPy



NumPy, which stands for Numerical Python, is a fundamental library for scientific computing in Python. It provides support for large, multi-dimensional arrays and matrices, along with a collection of high-level mathematical functions to operate on these arrays. In data science, NumPy is critical because it forms the backbone for many other libraries like Pandas, SciPy, and Scikit-learn, enabling efficient numerical operations that are essential for data manipulation, analysis, and machine learning.

### Advantages of NumPy Arrays over Standard Python Lists

1.  **Efficiency (Speed and Memory Usage):**
    *   **Speed:** NumPy's core operations run in compiled C code, and most linear algebra routines are delegated to optimized BLAS/LAPACK libraries, making them significantly faster than equivalent operations on Python lists. The gap is most noticeable on large datasets, where vectorized operations avoid the per-element overhead of an interpreted loop and can be orders of magnitude faster.
    *   **Memory Usage:** NumPy arrays are more memory-efficient than Python lists. Unlike Python lists, which can store elements of different data types and thus store pointers to objects, NumPy arrays are homogeneous (all elements are of the same data type). This allows NumPy to store elements contiguously in memory, leading to better cache utilization and reduced memory overhead.

2.  **Functionality (Mathematical Operations):**
    *   NumPy provides a vast collection of mathematical functions that can be directly applied to entire arrays without explicit loops. This includes linear algebra routines, Fourier transforms, random number generation, and various elementary mathematical operations (e.g., trigonometric functions, logarithms, exponents). While Python lists can be used for some numerical tasks, they lack the optimized, built-in mathematical functions that make NumPy so powerful for scientific and data-intensive applications.

In [None]:
import numpy as np

my_arr = np.arange(1000000)
my_list = list(range(1000000))

%timeit my_arr2 = my_arr * 2
%timeit my_list2 = [x * 2 for x in my_list]
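
The memory claim above can be checked as well as the speed claim. The following is a minimal sketch, assuming CPython (where `sys.getsizeof` reports per-object sizes) and an illustrative element count `n` that is not part of the original benchmark. It compares the raw data buffer of an array with the pointers and integer objects a list has to hold.

```python
import sys
import numpy as np

n = 100_000  # illustrative size (assumption, not taken from the benchmark above)

arr = np.arange(n, dtype=np.int64)
lst = list(range(n))

# An array stores raw values back to back: n * itemsize bytes of data.
array_bytes = arr.nbytes

# A list stores one pointer per element, plus a separate int object per value.
list_bytes = sys.getsizeof(lst) + sum(sys.getsizeof(x) for x in lst)

print("Array data bytes:", array_bytes)
print("List bytes (pointers + int objects):", list_bytes)
```

The exact list figure depends on the Python version and platform, so treat it as an order-of-magnitude comparison rather than a precise measurement.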

<a id='Creating-Numpy-Arrays'></a>
## Creating NumPy Arrays

Now we demonstrate various ways to create NumPy arrays, including from Python lists, `arange`, `zeros`, `ones`, `full`, `empty`, `linspace`, and `eye`.


First, import NumPy under its conventional alias `np`.



In [None]:
import numpy as np

Create a NumPy array from a Python list and print it.



In [None]:
list_array = np.array([1, 2, 3, 4, 5])
print("Array from Python list:", list_array)

Use `np.arange()` to create a NumPy array with a range of values and then print it.



In [None]:
arange_array = np.arange(10)
print("Array created with arange():", arange_array)
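
Like Python's `range`, `np.arange()` also accepts `start`, `stop`, and `step` arguments, and the `stop` value is excluded. A short sketch with illustrative values:

```python
import numpy as np

# start=2, stop=20 (excluded), step=3
stepped = np.arange(2, 20, 3)
print(stepped.tolist())  # [2, 5, 8, 11, 14, 17]

# A negative step counts down; the stop value is still excluded.
countdown = np.arange(5, 0, -1)
print(countdown.tolist())  # [5, 4, 3, 2, 1]
```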

Use `np.zeros()` to create a NumPy array filled with zeros.



In [None]:
zeros_array = np.zeros((3, 4))
print("Array created with zeros():\n", zeros_array)

Use `np.ones()` to create a NumPy array filled with ones.



In [None]:
ones_array = np.ones((2, 3))
print("Array created with ones():\n", ones_array)

Use `np.full()` to create a NumPy array filled with a specific value.



In [None]:
full_array = np.full((2, 2), 7)
print("Array created with full():\n", full_array)

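Use `np.empty()` to create an uninitialized NumPy array. Its contents are whatever happened to be in that memory, so the printed values are arbitrary and should be overwritten before use.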



In [None]:
empty_array = np.empty((2, 3))
print("Array created with empty():\n", empty_array)

Use `np.linspace()` to create a NumPy array of evenly spaced numbers over a specified interval. The third argument is the number of samples, and both endpoints are included by default.



In [None]:
linspace_array = np.linspace(0, 10, 5)
print("Array created with linspace():", linspace_array)
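
The `endpoint` and `retstep` parameters of `np.linspace()` control whether the stop value is included and whether the spacing is returned. A small sketch using the same interval as above:

```python
import numpy as np

# Default: both endpoints included, so 5 points span 4 equal gaps.
inclusive = np.linspace(0, 10, 5)
print(inclusive.tolist())  # [0.0, 2.5, 5.0, 7.5, 10.0]

# endpoint=False leaves out the stop value.
exclusive = np.linspace(0, 10, 5, endpoint=False)
print(exclusive.tolist())  # [0.0, 2.0, 4.0, 6.0, 8.0]

# retstep=True also returns the spacing between samples.
samples, step = np.linspace(0, 10, 5, retstep=True)
print(step)  # 2.5
```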

Use `np.eye()` to create an identity matrix.



In [None]:
eye_array = np.eye(3)
print("Array created with eye():\n", eye_array)

## Array Attributes

Now we illustrate important array attributes such as `ndim` (number of dimensions), `shape` (dimensions of the array), `size` (total number of elements), and `dtype` (data type of elements).


Create a 2-dimensional NumPy array (also known as a Rank-2 tensor), which will serve as our example for demonstrating `ndim`, `shape`, `size`, and `dtype`.



In [None]:
sample_array = np.array([[1, 2, 3], [4, 5, 6]])
print("Created a 2-dimensional array:\n", sample_array)

In [None]:
print("Number of dimensions (ndim):", sample_array.ndim)

In [None]:
print("Shape of array (shape):", sample_array.shape)

In [None]:
print("Total number of elements (size):", sample_array.size)

In [None]:
print("Data type of elements (dtype):", sample_array.dtype)
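
These attributes are related to one another. The sketch below rebuilds the same 2x3 array under a different name (`sample`, chosen for illustration) and also shows `itemsize`, `nbytes`, and `astype()`, which are not used elsewhere in this notebook.

```python
import numpy as np

sample = np.array([[1, 2, 3], [4, 5, 6]])

# size is the product of the entries in shape.
print(sample.size == sample.shape[0] * sample.shape[1])  # True

# itemsize is bytes per element; nbytes is the size of the whole data buffer.
print(sample.nbytes == sample.size * sample.itemsize)  # True

# astype() returns a new array with the requested dtype; the original is unchanged.
as_float = sample.astype(np.float32)
print(as_float.dtype)     # float32
print(as_float.itemsize)  # 4
```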

## Array Indexing and Slicing

Access specific elements, rows, columns, and sub-arrays using integer indexing, slicing, and boolean indexing.


In [None]:
my_array = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
print("Original array:\n", my_array)

In [None]:
element = my_array[1, 2]
print("Element at second row, third column:", element)

In [None]:
first_row = my_array[0, :]
print("First row:", first_row)

In [None]:
last_column = my_array[:, -1]
print("Last column:", last_column)

In [None]:
sub_array = my_array[0:2, 1:3]
print("Sub-array (first two rows, middle two columns):\n", sub_array)

In [None]:
boolean_indexed_array = my_array[my_array > 7]
print("Elements greater than 7:", boolean_indexed_array)
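
One detail worth knowing when slicing: basic slices return views onto the original data, while boolean indexing returns a copy. The following sketch uses a fresh array (`grid`, a name chosen here) so that `my_array` above is left untouched.

```python
import numpy as np

grid = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

# Basic slicing returns a view: writing to it changes the original array.
window = grid[0:2, 1:3]
window[0, 0] = 99
print(grid[0, 1])  # 99

# Boolean indexing returns a copy: writing to it leaves the original alone.
large = grid[grid > 7]
large[0] = -1
print(np.shares_memory(grid, large))  # False

# Assigning through a boolean mask does modify the array in place.
grid[grid > 10] = 0
print(grid.tolist())  # [[1, 0, 3, 4], [5, 6, 7, 8], [9, 10, 0, 0]]
```

Calling `.copy()` on a slice is the usual way to get an independent array when the view behavior is not wanted.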

## Array Manipulation

This section covers reshaping arrays (`reshape`), flattening arrays (`flatten`, `ravel`), concatenating arrays (`concatenate`, `vstack`, `hstack`), and splitting arrays (`split`, `vsplit`, `hsplit`).


In [None]:
arr_1d = np.arange(12)
print("Original 1D array:\n", arr_1d)

In [None]:
arr_2d = arr_1d.reshape(3, 4)
print("Reshaped 2D array (3 rows, 4 columns):\n", arr_2d)

In [None]:
print("Original 1D array:", arr_1d)
print("Reshaped 2D array:\n", arr_2d)

In [None]:
flattened_array = arr_2d.flatten()
raveled_array = arr_2d.ravel()
print("Flattened array (using .flatten()):", flattened_array)
print("Raveled array (using .ravel()):", raveled_array)

# Note: ravel() returns a view when possible, flatten() always returns a copy.
# For this simple example, the output will look identical, but memory behavior differs.
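
The view-versus-copy difference noted in the comment can be made visible with `np.shares_memory()`. This sketch uses its own array (`matrix`, a name chosen here) and also shows `reshape` with `-1`, which asks NumPy to infer that dimension.

```python
import numpy as np

matrix = np.arange(12).reshape(3, 4)

flat_copy = matrix.flatten()  # always a copy
flat_view = matrix.ravel()    # a view here, because matrix is contiguous

print(np.shares_memory(matrix, flat_copy))  # False
print(np.shares_memory(matrix, flat_view))  # True

# Writing through the view changes the source; the copy is unaffected.
flat_view[0] = 100
print(matrix[0, 0])  # 100
print(flat_copy[0])  # 0

# -1 lets NumPy infer one dimension from the total number of elements.
inferred = matrix.reshape(-1, 6)
print(inferred.shape)  # (2, 6)
```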


In [None]:
arr_a = np.array([[1, 2], [3, 4]])
arr_b = np.array([[5, 6], [7, 8]])
print("Array arr_a:\n", arr_a)
print("Array arr_b:\n", arr_b)

In [None]:
concatenated_axis0 = np.concatenate((arr_a, arr_b), axis=0)
concatenated_axis1 = np.concatenate((arr_a, arr_b), axis=1)
print("Concatenated along axis 0 (vertical):\n", concatenated_axis0)
print("Concatenated along axis 1 (horizontal):\n", concatenated_axis1)

In [None]:
vstack_array = np.vstack((arr_a, arr_b))
hstack_array = np.hstack((arr_a, arr_b))
print("Vertical stack (vstack):\n", vstack_array)
print("Horizontal stack (hstack):\n", hstack_array)

In [None]:
arr_to_split = np.arange(16).reshape(4, 4)
print("Array to split (arr_to_split):\n", arr_to_split)

In [None]:
hsplit_arrays = np.hsplit(arr_to_split, 2)
vsplit_arrays = np.vsplit(arr_to_split, 2)
print("Horizontal split (hsplit):")
for i, arr in enumerate(hsplit_arrays):
    print(f"  Part {i+1}:\n{arr}")
print("\nVertical split (vsplit):")
for i, arr in enumerate(vsplit_arrays):
    print(f"  Part {i+1}:\n{arr}")

In [None]:
split_arrays_axis1 = np.split(arr_to_split, 4, axis=1)
print("Split into 4 parts along axis 1 (columns) using np.split():")
for i, arr in enumerate(split_arrays_axis1):
    print(f"  Part {i+1}:\n{arr}")
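
`np.split()` also accepts a list of cut points instead of a section count, and `np.array_split()` handles lengths that do not divide evenly. A brief sketch on a 1D array of illustrative values:

```python
import numpy as np

values = np.arange(10)

# A list of indices gives the cut points: [:3], [3:7], [7:].
parts = np.split(values, [3, 7])
print([p.tolist() for p in parts])  # [[0, 1, 2], [3, 4, 5, 6], [7, 8, 9]]

# With an integer, np.split requires an even division and raises otherwise.
try:
    np.split(values, 3)
    split_failed = False
except ValueError as exc:
    split_failed = True
    print("np.split failed:", exc)

# np.array_split allows unequal parts instead.
uneven = np.array_split(values, 3)
print([len(p) for p in uneven])  # [4, 3, 3]
```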

## Basic Array Operations

This section covers element-wise arithmetic operations (`+`, `-`, `*`, `/`), broadcasting rules, and the use of universal functions (ufuncs) for mathematical operations.


In [None]:
arr1 = np.array([[1, 2], [3, 4]])
arr2 = np.array([[5, 6], [7, 8]])
print("Array arr1:\n", arr1)
print("Array arr2:\n", arr2)

In [None]:
addition_result = arr1 + arr2
print("Element-wise addition:\n", addition_result)

In [None]:
subtraction_result = arr1 - arr2
print("Element-wise subtraction:\n", subtraction_result)

In [None]:
multiplication_result = arr1 * arr2
print("Element-wise multiplication:\n", multiplication_result)

In [None]:
division_result = arr1 / arr2
print("Element-wise division:\n", division_result)

In [None]:
scalar_addition_result = arr1 + 10
print("Scalar addition (arr1 + 10):\n", scalar_addition_result)

In [None]:
arr_broadcast = np.array([1, 2])
broadcasting_addition_result = arr1 + arr_broadcast
print("Broadcasting with 1D array (arr1 + arr_broadcast):\n", broadcasting_addition_result)
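
Broadcasting compares shapes from the trailing dimension backward; two dimensions are compatible when they are equal or when one of them is 1. The sketch below, with array names chosen for illustration, shows one compatible pair and one incompatible pair.

```python
import numpy as np

column = np.arange(3).reshape(3, 1)  # shape (3, 1)
row = np.arange(4)                   # shape (4,), treated as (1, 4)

# Size-1 axes stretch to match: (3, 1) + (1, 4) -> (3, 4)
table = column + row
print(table.shape)     # (3, 4)
print(table.tolist())  # [[0, 1, 2, 3], [1, 2, 3, 4], [2, 3, 4, 5]]

# Trailing dimensions 3 and 2 differ and neither is 1, so this fails.
try:
    np.ones((2, 3)) + np.ones(2)
    broadcast_failed = False
except ValueError as exc:
    broadcast_failed = True
    print("Incompatible shapes:", exc)
```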

In [None]:
ufunc_result = np.sqrt(arr1)
print("Applying np.sqrt() to arr1:\n", ufunc_result)

## Aggregate Functions

This section covers common aggregate functions such as `sum()`, `min()`, `max()`, `mean()`, `std()`, and `var()`, and how to apply them along specific axes.


In [None]:
agg_array = np.arange(1, 13).reshape(3, 4)
print("Original array (agg_array):\n", agg_array)

In [None]:
total_sum = np.sum(agg_array)
print("Sum of all elements:", total_sum)

In [None]:
min_element = np.min(agg_array)
max_element = np.max(agg_array)
print("Minimum element:", min_element)
print("Maximum element:", max_element)

In [None]:
mean_element = np.mean(agg_array)
print("Mean of all elements:", mean_element)

In [None]:
std_dev = np.std(agg_array)
variance = np.var(agg_array)
print("Standard deviation of all elements:", std_dev)
print("Variance of all elements:", variance)

In [None]:
sum_axis0 = np.sum(agg_array, axis=0)
print("Sum along axis 0 (columns):", sum_axis0)

In [None]:
sum_axis1 = np.sum(agg_array, axis=1)
print("Sum along axis 1 (rows):", sum_axis1)

In [None]:
min_axis0 = np.min(agg_array, axis=0)
max_axis0 = np.max(agg_array, axis=0)
print("Minimum along axis 0 (columns):", min_axis0)
print("Maximum along axis 0 (columns):", max_axis0)

In [None]:
min_axis1 = np.min(agg_array, axis=1)
max_axis1 = np.max(agg_array, axis=1)
print("Minimum along axis 1 (rows):", min_axis1)
print("Maximum along axis 1 (rows):", max_axis1)

In [None]:
mean_axis0 = np.mean(agg_array, axis=0)
std_axis0 = np.std(agg_array, axis=0)
var_axis0 = np.var(agg_array, axis=0)
print("Mean along axis 0 (columns):", mean_axis0)
print("Standard deviation along axis 0 (columns):", std_axis0)
print("Variance along axis 0 (columns):", var_axis0)

In [None]:
mean_axis1 = np.mean(agg_array, axis=1)
std_axis1 = np.std(agg_array, axis=1)
var_axis1 = np.var(agg_array, axis=1)
print("Mean along axis 1 (rows):", mean_axis1)
print("Standard deviation along axis 1 (rows):", std_axis1)
print("Variance along axis 1 (rows):", var_axis1)
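
Two optional parameters are often useful with these functions. `ddof` switches `std()` and `var()` from the population formula (the default, dividing by N) to the sample formula (dividing by N - 1), and `keepdims` keeps the reduced axis so the result broadcasts back against the input. A minimal sketch, rebuilding the same 3x4 array under the name `data`:

```python
import numpy as np

data = np.arange(1, 13).reshape(3, 4)

# ddof=0 (default) divides by N; ddof=1 divides by N - 1 (sample estimate).
population_var = np.var(data)
sample_var = np.var(data, ddof=1)
print(population_var < sample_var)  # True

# keepdims=True keeps the reduced axis with length 1.
row_means = data.mean(axis=1, keepdims=True)
print(row_means.shape)  # (3, 1)

# The (3, 1) means broadcast across each row, centering every row on zero.
centered = data - row_means
print(centered.sum(axis=1).tolist())  # [0.0, 0.0, 0.0]
```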

## Linear Algebra with NumPy

Linear algebra is a fundamental mathematical tool in data science, providing the theoretical and practical backbone for many algorithms, including machine learning models, dimensionality reduction techniques, and image processing. NumPy, with its efficient array operations, is perfectly suited for performing these linear algebra computations.

Here we introduce common linear algebra operations relevant to data science, such as dot product, matrix multiplication, transpose, and inverse using NumPy functions.

In [None]:
matrix_a = np.array([[1, 2], [3, 4]])
matrix_b = np.array([[5, 6], [7, 8]])
print("Matrix A:\n", matrix_a)
print("Matrix B:\n", matrix_b)

In [None]:
dot_product = np.dot(matrix_a, matrix_b)
print("Dot product of A and B:\n", dot_product)

In [None]:
matrix_multiplication = np.matmul(matrix_a, matrix_b)
print("Matrix multiplication of A and B (using np.matmul()):\n", matrix_multiplication)
# For 2D arrays, np.dot() and np.matmul() produce the same result for matrix multiplication.
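
Python's `@` operator performs the same matrix multiplication as `np.matmul()`, and it is easy to confuse with `*`, which is element-wise. A short sketch using the same values as `matrix_a` and `matrix_b`, under shorter names:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

# @ is the matrix-multiplication operator, equivalent to np.matmul.
print((a @ b).tolist())  # [[19, 22], [43, 50]]

# * is element-wise and gives a different result.
print((a * b).tolist())  # [[5, 12], [21, 32]]

# Matrix multiplication is not commutative in general.
print(np.array_equal(a @ b, b @ a))  # False
```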

In [None]:
transpose_a = matrix_a.T
print("Transpose of Matrix A (using .T attribute):\n", transpose_a)

In [None]:
singular_matrix = np.array([[1, 2], [2, 4]])
print("Singular Matrix (singular_matrix):\n", singular_matrix)

Attempt to calculate the inverse of `singular_matrix` using `np.linalg.inv()` to demonstrate that singular matrices do not have an inverse, and observe the expected `LinAlgError`.



In [None]:
try:
    inverse_singular = np.linalg.inv(singular_matrix)
    print("Inverse of singular_matrix:\n", inverse_singular)
except np.linalg.LinAlgError as e:
    print(f"Error calculating inverse of singular matrix: {e}\nThis matrix is singular and does not have an inverse.")

Create a non-singular 2x2 NumPy array, `non_singular_matrix`.



In [None]:
non_singular_matrix = np.array([[1, 2], [3, 4]])
print("Non-Singular Matrix (non_singular_matrix):\n", non_singular_matrix)

In [None]:
inverse_non_singular = np.linalg.inv(non_singular_matrix)
print("Inverse of non_singular_matrix:\n", inverse_non_singular)

In [None]:
verification_matrix = np.dot(non_singular_matrix, inverse_non_singular)
print("Verification (non_singular_matrix * its inverse, should be identity matrix):\n", verification_matrix)
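
Because of floating-point rounding, the product above may differ from the identity matrix by tiny amounts, so an exact equality check is unreliable. The sketch below uses `np.allclose()` for the comparison and, as a related aside, `np.linalg.solve()` for a linear system; the right-hand side `b` is an illustrative value, not from the notebook.

```python
import numpy as np

a = np.array([[1.0, 2.0], [3.0, 4.0]])
a_inv = np.linalg.inv(a)

# Compare with a tolerance rather than exact equality.
print(np.allclose(a @ a_inv, np.eye(2)))  # True

# To solve a @ x = b, np.linalg.solve avoids forming the inverse explicitly.
b = np.array([5.0, 6.0])  # illustrative right-hand side
x = np.linalg.solve(a, b)
print(np.allclose(a @ x, b))        # True
print(np.allclose(x, [-4.0, 4.5]))  # True
```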

## Random Number Generation

Generate random numbers and arrays from various distributions using NumPy's `random` module, including `rand`, `randn`, `randint`, and `normal`.


### Random Number Generation with NumPy's `random` module

NumPy's `random` module is essential for simulating data, statistical modeling, and machine learning tasks. It provides functions to generate arrays of random numbers from various probability distributions, making it a powerful tool for numerical experiments and data science applications.



In [None]:
uniform_rand_array = np.random.rand(2, 3)
print("2x3 array from uniform distribution (rand()):\n", uniform_rand_array)

In [None]:
standard_normal_array = np.random.randn(2, 3)
print("2x3 array from standard normal distribution (randn()):\n", standard_normal_array)

In [None]:
random_integers_array = np.random.randint(1, 11, size=(2, 3))
print("2x3 array of random integers (randint()):\n", random_integers_array)

In [None]:
normal_dist_array = np.random.normal(loc=5, scale=2, size=(2, 3))
print("2x3 array from normal distribution (normal()):\n", normal_dist_array)

In [None]:
np.random.seed(42)
reproducible_rand_array = np.random.rand(2, 3)
print("Reproducible 2x3 array after setting seed (rand()):\n", reproducible_rand_array)
print("\nSetting a random seed ensures that the 'random' numbers generated are the same each time the code is run, which is crucial for reproducible research and debugging.")
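
The functions above belong to NumPy's legacy global-state interface. NumPy also provides a `Generator` object, created with `np.random.default_rng()`, which its documentation recommends for new code. The sketch below shows rough equivalents of the calls used in this section; the seed `42` simply mirrors the example above.

```python
import numpy as np

rng = np.random.default_rng(42)  # seeded generator, independent of np.random.seed

uniform = rng.random((2, 3))                      # counterpart of rand(2, 3)
standard_normal = rng.standard_normal((2, 3))     # counterpart of randn(2, 3)
integers = rng.integers(1, 11, size=(2, 3))       # counterpart of randint(1, 11, ...)
normal = rng.normal(loc=5, scale=2, size=(2, 3))  # same parameters as np.random.normal

# Two generators created with the same seed produce the same stream.
rng_a = np.random.default_rng(42)
rng_b = np.random.default_rng(42)
print(np.array_equal(rng_a.random(3), rng_b.random(3)))  # True
```

The values drawn by a `Generator` differ from those of the legacy functions even with the same seed, since the two interfaces use different underlying algorithms.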

## Summary

*   **NumPy's Core Value**: NumPy is fundamental for scientific computing in Python, providing efficient support for large, multi-dimensional arrays and mathematical operations. It serves as a backbone for libraries like Pandas, SciPy, and Scikit-learn, offering significant performance advantages (speed and memory) over standard Python lists due to its compiled implementation and homogeneous, contiguous memory storage.
*   **Array Creation Versatility**: Various methods were demonstrated for creating arrays, including `np.array()` (from Python lists), `np.arange()` (sequential values), `np.zeros()`, `np.ones()`, `np.full()` (fixed values), `np.empty()` (uninitialized), `np.linspace()` (evenly spaced values), and `np.eye()` (identity matrices).
*   **Essential Array Attributes**: Key attributes provide crucial metadata about arrays:
    *   `ndim`: Indicates the number of array dimensions (e.g., `2` for a 2D array).
    *   `shape`: Describes the dimensions of the array (e.g., `(2, 3)` for 2 rows and 3 columns).
    *   `size`: Represents the total number of elements in the array (e.g., `6` for a `(2, 3)` array).
    *   `dtype`: Specifies the data type of the elements (e.g., `int64`, the default integer type on most 64-bit platforms).
*   **Flexible Indexing and Slicing**: NumPy offers robust ways to access array elements:
    *   **Integer Indexing**: Directly accessing specific elements (e.g., `my_array[1, 2]` returns `7`).
    *   **Slicing**: Extracting sub-arrays, rows, or columns (e.g., `my_array[0:2, 1:3]` returns the sub-array `[[2, 3], [6, 7]]`).
    *   **Boolean Indexing**: Selecting elements based on a conditional expression (e.g., `my_array[my_array > 7]` returns `[ 8 9 10 11 12]`).
*   **Powerful Array Manipulation**: Techniques for restructuring and combining arrays were shown:
    *   `reshape()`: Changing the dimensions of an array (e.g., a 1D array of 12 elements into a `(3, 4)` 2D array).
    *   `flatten()` and `ravel()`: Converting multi-dimensional arrays into 1D arrays.
    *   `np.concatenate()`, `np.vstack()`, `np.hstack()`: Combining multiple arrays vertically or horizontally.
    *   `np.split()`, `np.vsplit()`, `np.hsplit()`: Dividing arrays into multiple sub-arrays.
*   **Efficient Basic Operations**: NumPy supports element-wise arithmetic operations (`+`, `-`, `*`, `/`) and leverages **broadcasting** rules to perform operations on arrays of different shapes (e.g., adding a scalar `10` to every element, or adding a 1D array to a 2D array). It also provides highly optimized **universal functions (ufuncs)** like `np.sqrt()` for element-wise mathematical computations.
*   **Comprehensive Aggregate Functions**: Common statistical functions can be applied globally or along specific axes:
    *   Global: `np.sum()` (e.g., `78`), `np.min()` (e.g., `1`), `np.max()` (e.g., `12`), `np.mean()` (e.g., `6.5`), `np.std()`, `np.var()`.
    *   Axis-wise: These functions can be applied along `axis=0` (columns) or `axis=1` (rows) to compute statistics per dimension (e.g., sum along `axis=0` yields `[15 18 21 24]`).
*   **Linear Algebra Capabilities**: NumPy is essential for linear algebra operations vital in data science:
    *   `np.dot()` and `np.matmul()`: For dot products and matrix multiplication.
    *   `.T`: For transposing matrices.
    *   `np.linalg.inv()`: For calculating matrix inverses. It raises `np.linalg.LinAlgError` for singular matrices, which have no inverse, so the call can be wrapped in `try`/`except`.
*   **Random Number Generation**: The `np.random` module provides functions for generating random numbers from various distributions:
    *   `np.random.rand()`: Uniform distribution (0 to 1).
    *   `np.random.randn()`: Standard normal distribution.
    *   `np.random.randint()`: Random integers within a range.
    *   `np.random.normal()`: Normal distribution with specified mean and standard deviation.
    *   `np.random.seed()`: Ensures reproducibility of random number sequences, critical for consistent results in simulations and experiments.

