# Section 3 — Data Computations with NumPy

NumPy is the foundational Python library for numerical computing. Its efficient array-based operations underpin most machine learning and industrial data workflows.

In this section, you will learn how to:
- **Create NumPy arrays** (vectors and matrices) and generate special arrays (`eye`, `zeros`, `ones`, `linspace`, etc.)  
- **Perform indexing, slicing, and boolean selection** to access and manipulate elements  
- **Use built-in attributes** (`shape`, `ndim`, `dtype`, etc.) to understand array structure and memory usage  
- **Execute element-wise operations and apply universal functions** (`ufuncs`) for fast mathematical computations  
- **Compute basic statistics** (`mean`, `std`, `min`, etc.) to summarize and analyze data  
- **Manipulate array shape and arrangement** (`reshape`, `concatenate`, `split`, etc.) for flexible data handling

In [35]:
import numpy as np  # needed here because this cell appears before the import in 3.1
print(np.__version__)

2.0.2


## 3.1 Creating Arrays

NumPy provides powerful ways to create arrays (vectors and matrices), which are the core data structure for numerical computation.

- Arrays can be created from Python lists or tuples.
- Special arrays can be generated directly using NumPy functions (`zeros`, `ones`, `eye`, etc.).
- NumPy also provides utilities for creating sequences (`arange`, `linspace`) and random numbers.

In [1]:
# Import NumPy
import numpy as np

In [2]:
# Creating a 1D array from a Python list
arr1d = np.array([1, 2, 3, 4, 5])
print("1D Array:", arr1d)

1D Array: [1 2 3 4 5]


In [3]:
# Creating a 2D array from nested lists
arr2d = np.array([[1, 2, 3], [4, 5, 6]])
print("2D Array:\n", arr2d)

2D Array:
 [[1 2 3]
 [4 5 6]]


In [4]:
# Creating an identity matrix
identity = np.eye(4)
print("Identity Matrix (4x4):\n", identity)

Identity Matrix (4x4):
 [[1. 0. 0. 0.]
 [0. 1. 0. 0.]
 [0. 0. 1. 0.]
 [0. 0. 0. 1.]]


In [5]:
# Creating arrays of zeros and ones
zeros = np.zeros((2, 3))
ones = np.ones((3, 2))
print("Zeros:\n", zeros)
print("Ones:\n", ones)

Zeros:
 [[0. 0. 0.]
 [0. 0. 0.]]
Ones:
 [[1. 1.]
 [1. 1.]
 [1. 1.]]


In [6]:
# Using np.arange to create evenly spaced values
ar = np.arange(0, 10, 2)
print("Arange 0 to 10 step 2:", ar)

Arange 0 to 10 step 2: [0 2 4 6 8]


In [7]:
# Using np.linspace to create evenly spaced values between limits
lin = np.linspace(0, 1, 5)
print("Linspace 0 to 1 with 5 points:", lin)

Linspace 0 to 1 with 5 points: [0.   0.25 0.5  0.75 1.  ]


In [8]:
# Creating random arrays (uniform and integers)
np.random.seed(42)  # for reproducibility
rand_uniform = np.random.rand(2, 3)  # uniform [0, 1)
rand_int = np.random.randint(0, 10, (2, 3))  # integers
print("Random Uniform:\n", rand_uniform)
print("Random Integers:\n", rand_int)

Random Uniform:
 [[0.37454012 0.95071431 0.73199394]
 [0.59865848 0.15601864 0.15599452]]
Random Integers:
 [[7 4 3]
 [7 7 2]]
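
For new code, NumPy's documentation recommends the `Generator` interface over the legacy `np.random.seed` / `np.random.rand` functions used above. A minimal sketch of the same two draws with that interface follows. The variable names are illustrative, and the values will differ from the legacy output even with the same seed.

```python
import numpy as np

# default_rng returns a Generator that carries its own seeded state
rng = np.random.default_rng(42)

rand_uniform = rng.random((2, 3))            # uniform floats in [0, 1)
rand_int = rng.integers(0, 10, size=(2, 3))  # integers in [0, 10)

print("Random Uniform:\n", rand_uniform)
print("Random Integers:\n", rand_int)
```

Because the state lives in `rng` rather than in a global, separate parts of a program can hold independent, reproducible generators.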


> **Practical Tips for Students:**
>
> 1. `np.array()` is best when converting lists or tuples for **vectorized operations**.
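> 2. Use `arange` or `linspace` for **numerical sequences**; prefer `linspace` for floating-point ranges, since it fixes the number of points and includes the endpoint, whereas `arange` with a fractional step can gain or lose an element to rounding.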
> 3. `zeros`, `ones`, and `eye` are useful for **initialization** in ML or simulations.
> 4. Random arrays can **simulate datasets**, test algorithms, or initialize weights.
> 5. Always check **shape** and **dimension** before applying computations.
> 6. Consider practical applications: e.g., plotting, sampling, initializing parameters, and simulating data.
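
As a small illustration of tip 2, the sketch below builds a floating-point range both ways. With `arange` you pick the step and the length follows from it; with `linspace` you pick the length and the endpoint is guaranteed. The variable names are illustrative.

```python
import numpy as np

# arange: the step is fixed; the stop value is excluded
by_step = np.arange(0.0, 1.0, 0.1)

# linspace: the number of points is fixed; the stop value is included
by_count = np.linspace(0.0, 1.0, 11)

print("arange:  ", len(by_step), "points, last =", by_step[-1])
print("linspace:", len(by_count), "points, last =", by_count[-1])
```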

## 3.2 Array Indexing & Slicing

NumPy arrays can be indexed and sliced much like Python lists, and the same syntax extends to multi-dimensional arrays.

- Use `[i]` to access a single element.
- Use `[start:stop:step]` to slice elements.
- For 2D arrays, use `[row_index, column_index]` or slices like `[row_start:row_end, col_start:col_end]`.
- NumPy arrays support **Boolean indexing**, which selects elements based on a condition.
  This is very useful for filtering data efficiently without using loops.

In [9]:
# ---- 1D Array Indexing and Slicing ----

arr = np.array([10, 20, 30, 40, 50, 60])
print("Original array:", arr)

# Indexing
print("Element at index 2:", arr[2])

# Slicing
print("Elements from index 1 to 4:", arr[1:5])
print("Every second element:", arr[::2])

Original array: [10 20 30 40 50 60]
Element at index 2: 30
Elements from index 1 to 4: [20 30 40 50]
Every second element: [10 30 50]


In [10]:
# ---- 2D Array Indexing and Slicing ----
arr2d = np.array([[1, 2, 3],
                  [4, 5, 6],
                  [7, 8, 9]])

print("Original 2D array:\n", arr2d)

# Access element at row 1, column 2 (zero-based: second row, third column)
print("Element at (1,2):", arr2d[1, 2])

# Slice rows and columns
print("First two rows, last two columns:\n", arr2d[:2, 1:])

Original 2D array:
 [[1 2 3]
 [4 5 6]
 [7 8 9]]
Element at (1,2): 6
First two rows, last two columns:
 [[2 3]
 [5 6]]
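
One difference from Python lists is worth seeing in code: a basic NumPy slice is a *view* on the original data, not a copy. The sketch below shows that editing a slice changes the source array, and that `.copy()` avoids this. The variable names are illustrative.

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50, 60])

view = arr[1:4]   # basic slice: shares memory with arr
view[0] = 99      # this also changes arr[1]
print("After editing the slice:", arr)

independent = arr[1:4].copy()  # explicit copy: safe to modify
independent[0] = -1
print("After editing the copy: ", arr)
```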


In [11]:
# ---- Boolean Indexing ----
arr = np.array([10, 20, 30, 40, 50])

# Create a Boolean mask for elements greater than 25
mask = arr > 25
print("Boolean mask:", mask)

# Use mask to select elements
filtered = arr[mask]
print("Elements greater than 25:", filtered)

# Shorter version
print("Elements greater than 25 (one-liner):", arr[arr > 25])

Boolean mask: [False False  True  True  True]
Elements greater than 25: [30 40 50]
Elements greater than 25 (one-liner): [30 40 50]
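
Masks can also be combined. The sketch below uses `&` (and) and `|` (or), each condition wrapped in parentheses, and `np.where` to replace values that match a condition. It reuses the same example array; the other names are illustrative.

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50])

# Both conditions must hold (note the parentheses around each one)
in_range = arr[(arr > 15) & (arr < 45)]
print("Between 15 and 45:", in_range)

# Either condition may hold
extremes = arr[(arr < 20) | (arr > 40)]
print("Below 20 or above 40:", extremes)

# np.where(condition, value_if_true, value_if_false)
capped = np.where(arr > 25, 25, arr)
print("Capped at 25:", capped)
```

Python's `and` / `or` keywords do not work element-wise on arrays, which is why the bitwise operators are used here.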


## 3.3 Array Attributes and Basic Properties

NumPy arrays (`ndarray`) come with built-in attributes that describe their structure and memory usage.  
These are not just technical details — they are *practical tools* you will use frequently when analyzing data.

- **`shape`** → tells you the dimensions of your dataset (e.g., a shape of `(1000, 20)` means 1000 samples and 20 features).  
- **`ndim`** → shows if your data is a vector (1D), matrix (2D), or higher-dimensional tensor (common in ML).  
- **`size`**, **`itemsize`**, and **`nbytes`** → give insight into **memory usage**, which is critical when handling large datasets.  
- **`dtype`** → defines the type of data (integer, float, boolean, etc.), affecting both performance and correctness.
- **`T`** → gives the transpose of the array.
- **`real`** and **`imag`** → access the real and imaginary parts of complex-valued arrays.

In [12]:
# Checking array attributes: shape, size, dtype
arr = np.array([[1, 2, 3], [4, 5, 6]])

print("Array:\n", arr)
print("Shape:", arr.shape)     # (rows, columns)
print("Size:", arr.size)       # total number of elements
print("Data type:", arr.dtype) # type of elements

Array:
 [[1 2 3]
 [4 5 6]]
Shape: (2, 3)
Size: 6
Data type: int64


In [13]:
# Number of dimensions (ndim)
print("Number of dimensions:", arr.ndim)

Number of dimensions: 2


In [14]:
# Item size and total memory consumption
print("Item size (bytes):", arr.itemsize)
print("Total bytes consumed:", arr.nbytes)

Item size (bytes): 8
Total bytes consumed: 48
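
To make the link between `dtype` and memory concrete, the sketch below builds a hypothetical 1000 × 20 dataset and converts it from 64-bit to 32-bit floats. The array contents and names are placeholders chosen for illustration.

```python
import numpy as np

samples = np.zeros((1000, 20))        # default dtype is float64 (8 bytes per element)
compact = samples.astype(np.float32)  # 4 bytes per element

print("float64:", samples.itemsize, "bytes/item,", samples.nbytes, "bytes total")
print("float32:", compact.itemsize, "bytes/item,", compact.nbytes, "bytes total")

# nbytes is always size * itemsize
print(samples.nbytes == samples.size * samples.itemsize)
```

Halving the item size halves the memory footprint, at the cost of reduced numerical precision.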


In [15]:
# Transpose of the array (T)
print("Original array:\n", arr)
print("Transpose:\n", arr.T)

Original array:
 [[1 2 3]
 [4 5 6]]
Transpose:
 [[1 4]
 [2 5]
 [3 6]]


In [16]:
# Real and imaginary parts of a complex array
complex_arr = np.array([1+2j, 3+4j])
print("Complex array:", complex_arr)
print("Real part:", complex_arr.real)
print("Imaginary part:", complex_arr.imag)

Complex array: [1.+2.j 3.+4.j]
Real part: [1. 3.]
Imaginary part: [2. 4.]


## 3.4 Basic Operations & ufuncs

NumPy provides fast, vectorized operations that act on entire arrays without explicit loops.  
The element-wise functions behind these operations are called **universal functions (ufuncs)**.

Why this matters:
- In pure Python, looping over elements can be very slow.  
- With NumPy, operations are applied element-wise at **compiled C speed**, making them much faster.  
- This is one of the key reasons NumPy is widely used in data science, ML, and engineering.

Examples of array operations:
- Arithmetic operations: `+`, `-`, `*`, `/`, `**`  
- Comparison operations: `<`, `>`, `==`, `!=`  
- Aggregate methods: `sum()`, `min()`, `max()`, `mean()`  
- Universal functions (`ufuncs`): `np.sqrt()`, `np.exp()`, `np.power()` (also available as `np.pow()` in NumPy 2.0+), `np.log()`, `np.sin()`, etc.

In practice, you should **avoid explicit Python loops** when working with arrays —  
instead, rely on these **vectorized operations** for speed and clarity.
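
The contrast can be sketched as follows: the same squares computed once with an explicit Python loop and once with a single vectorized expression. The example only shows that the results match; actual speed differences depend on array size and hardware.

```python
import numpy as np

values = np.arange(1, 6)

# Explicit Python loop: one element at a time
squares_loop = np.array([v ** 2 for v in values])

# Vectorized: one expression over the whole array
squares_vec = values ** 2

print("Loop:      ", squares_loop)
print("Vectorized:", squares_vec)
```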

In [17]:
# Basic arithmetic operations on arrays
a = np.array([1, 2, 3, 4])
b = np.array([10, 20, 30, 40])

print("a + b:", a + b)
print("a - b:", a - b)
print("a * b:", a * b)   # element-wise multiplication
print("b / a:", b / a)   # element-wise division
print("a ** 2:", a ** 2) # element-wise power

a + b: [11 22 33 44]
a - b: [ -9 -18 -27 -36]
a * b: [ 10  40  90 160]
b / a: [10. 10. 10. 10.]
a ** 2: [ 1  4  9 16]


In [18]:
# Comparison operations on arrays
print("a > 2:", a > 2)
print("b == 20:", b == 20)
print("a != b:", a != b)

a > 2: [False False  True  True]
b == 20: [False  True False False]
a != b: [ True  True  True  True]


In [19]:
# Aggregate methods
print("Sum of a:", a.sum())
print("Minimum of b:", b.min())
print("Maximum of b:", b.max())
print("Mean of a:", a.mean())

Sum of a: 10
Minimum of b: 10
Maximum of b: 40
Mean of a: 2.5


In [20]:
# Universal functions (ufuncs)
print("Square root of a:", np.sqrt(a))
print("Exponential of a:", np.exp(a))
print("Natural log of b:", np.log(b))
print("Sine of a:", np.sin(a))

Square root of a: [1.         1.41421356 1.73205081 2.        ]
Exponential of a: [ 2.71828183  7.3890561  20.08553692 54.59815003]
Natural log of b: [2.30258509 2.99573227 3.40119738 3.68887945]
Sine of a: [ 0.84147098  0.90929743  0.14112001 -0.7568025 ]


In [21]:
# Example 1: Normalizing student grades (divide by the maximum so the top grade maps to 1)
grades = np.array([12, 15, 18, 10, 20])  # out of 20
normalized_grades = grades / grades.max()
print("Original grades:", grades)
print("Normalized grades:", normalized_grades)

Original grades: [12 15 18 10 20]
Normalized grades: [0.6  0.75 0.9  0.5  1.  ]
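
Dividing by the maximum maps the top grade to 1 but does not map the lowest grade to 0. A common alternative is min-max scaling, sketched below on the same grades; it is shown as a variation, not as a replacement for the example above.

```python
import numpy as np

grades = np.array([12, 15, 18, 10, 20])

# Min-max scaling: lowest value -> 0, highest value -> 1
min_max = (grades - grades.min()) / (grades.max() - grades.min())
print("Min-max scaled grades:", min_max)
```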


In [22]:
# Example 2: Modeling population growth with compound (exponential) growth
population = 700000   # initial population of Hamedan
years = np.arange(0, 6)   # year 0 (today) through year 5
growth_rate = 0.02        # 2% annual growth

future_population = population * np.power(1 + growth_rate, years)  # np.pow is an alias available only in NumPy 2.0+
print("Years:", years)
print("Future population estimates:", future_population.astype(int))

Years: [0 1 2 3 4 5]
Future population estimates: [700000 714000 728280 742845 757702 772856]


### Why this matters in practice

These basic operations and NumPy's universal functions (`ufuncs`) are the **building blocks** of most numerical and machine learning workflows:

- Element-wise arithmetic allows us to quickly scale, normalize, or transform features.
- Comparison operations are used to create masks or filter data (e.g., selecting all students with grades above 15).
- Aggregate methods (sum, mean, min, max) summarize large datasets and help in quick data exploration.
- Universal functions (`np.sqrt`, `np.exp`, `np.log`, `np.sin`, etc.) are highly optimized and work much faster than manually looping over elements in Python.  
  They are widely used in machine learning, statistics, and scientific computations.

In short, NumPy gives us **vectorized operations** — meaning we can process entire arrays of data in one go, which is both faster and more readable than using Python loops.

## 3.5 Basic Statistics

NumPy provides many built-in statistical functions that allow you to quickly summarize and analyze datasets.  
These are extremely important in **data analysis and preprocessing**, especially when working with large arrays.

Some of the most commonly used functions:

- `mean()` → average value of elements  
- `median()` → middle value in sorted data  
- `std()` → standard deviation (spread of data)  
- `var()` → variance (square of std)  
- `min()` / `max()` → smallest and largest values  
- `argmin()` / `argmax()` → indices of smallest and largest values  
- `sum()` → total sum of elements  
- `cumsum()` → cumulative sum (running total)  
- `prod()` → product of all elements  

💡 **Why it matters for ML & data analysis**:  
- Checking dataset statistics helps in **detecting outliers**.  
- Standard deviation and variance describe **data spread** (important for scaling features).  
- `argmin` / `argmax` are often used to **find best/worst cases** (e.g., max efficiency, min error).  

In [23]:
# Mean, Median, and Sum
arr = np.array([10, 20, 30, 40, 50])
print("Array:", arr)
print("Mean:", np.mean(arr))
print("Median:", np.median(arr))
print("Sum:", np.sum(arr))

Array: [10 20 30 40 50]
Mean: 30.0
Median: 30.0
Sum: 150


In [24]:
# Standard Deviation and Variance
data = np.array([2, 4, 4, 4, 5, 5, 7, 9])
print("Data:", data)
print("Standard Deviation:", np.std(data))
print("Variance:", np.var(data))

Data: [2 4 4 4 5 5 7 9]
Standard Deviation: 2.0
Variance: 4.0
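
By default, `np.std` and `np.var` divide by `N` (the population formulas). When the data is a sample and you want the estimate that divides by `N - 1`, pass `ddof=1`. A short sketch on the same data:

```python
import numpy as np

data = np.array([2, 4, 4, 4, 5, 5, 7, 9])

pop_var = np.var(data)             # divides by N
sample_var = np.var(data, ddof=1)  # divides by N - 1
sample_std = np.std(data, ddof=1)

print("Population variance:", pop_var)
print("Sample variance:    ", sample_var)
print("Sample std:         ", sample_std)
```

Libraries differ in their defaults (pandas, for instance, uses `ddof=1`), so it is worth checking which formula you are comparing against.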


In [25]:
# Minimum, Maximum, and Their Indices
scores = np.array([72, 88, 95, 60, 83])
print("Scores:", scores)
print("Minimum:", np.min(scores))
print("Maximum:", np.max(scores))
print("Index of Minimum:", np.argmin(scores))
print("Index of Maximum:", np.argmax(scores))

Scores: [72 88 95 60 83]
Minimum: 60
Maximum: 95
Index of Minimum: 3
Index of Maximum: 2


In [26]:
# Cumulative Sum and Product
arr2 = np.array([1, 2, 3, 4, 5])
print("Array:", arr2)
print("Cumulative Sum:", np.cumsum(arr2))
print("Product of all elements:", np.prod(arr2))

Array: [1 2 3 4 5]
Cumulative Sum: [ 1  3  6 10 15]
Product of all elements: 120


In [27]:
# Statistics along rows and columns in a 2D array
grades = np.array([
    [85, 90, 78],   # Student 1 scores in 3 subjects
    [88, 76, 92],   # Student 2
    [95, 89, 96]    # Student 3
])

print("Grades:\n", grades)

# Column-wise statistics (per subject)
print("Average score per subject:", np.mean(grades, axis=0))
print("Max score per subject:", np.max(grades, axis=0))

# Row-wise statistics (per student)
print("Total score per student:", np.sum(grades, axis=1))
print("Best score per student:", np.max(grades, axis=1))

Grades:
 [[85 90 78]
 [88 76 92]
 [95 89 96]]
Average score per subject: [89.33333333 85.         88.66666667]
Max score per subject: [95 90 96]
Total score per student: [253 256 280]
Best score per student: [90 92 96]


In [28]:
# Example: Hourly electricity load for 3 cities over 5 hours
load_data = np.array([
    [120, 135, 128],  # Hour 1: City A, B, C
    [125, 138, 130],  # Hour 2
    [130, 140, 135],  # Hour 3
    [128, 142, 138],  # Hour 4
    [132, 145, 140]   # Hour 5
])

print("Hourly Load Data (MW):\n", load_data)

# Average load per city
avg_city_load = np.mean(load_data, axis=0)
print("Average load per city:", avg_city_load)

# Max load per hour
max_hourly_load = np.max(load_data, axis=1)
print("Maximum load per hour:", max_hourly_load)

# Total load for each city
total_city_load = np.sum(load_data, axis=0)
print("Total load per city:", total_city_load)

Hourly Load Data (MW):
 [[120 135 128]
 [125 138 130]
 [130 140 135]
 [128 142 138]
 [132 145 140]]
Average load per city: [127.  140.  134.2]
Maximum load per hour: [135 138 140 142 145]
Total load per city: [635 700 671]
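
`argmax` and `argmin` accept the same `axis` argument, which answers "where" questions rather than "how much". The sketch below reuses the load data; the `city_names` labels are hypothetical and added only to make the output readable.

```python
import numpy as np

load_data = np.array([
    [120, 135, 128],
    [125, 138, 130],
    [130, 140, 135],
    [128, 142, 138],
    [132, 145, 140],
])
city_names = np.array(["A", "B", "C"])  # hypothetical labels for the columns

# For each hour (row), which city (column index) has the highest load?
busiest_city = np.argmax(load_data, axis=1)
print("Busiest city per hour:", city_names[busiest_city])

# For each city (column), which hour (row index) has the highest load?
peak_hour = np.argmax(load_data, axis=0)
print("Peak hour index per city:", peak_hour)
```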


## 3.6 Data Manipulation — Array Shape and Arrangement

Working with real-world datasets often requires **reshaping, combining, splitting, or reordering arrays**.  
NumPy provides a rich set of functions for these tasks:

### Reshaping and flattening
- `reshape()` → change the shape of an array without changing its data
- `ravel()` or `flatten()` → convert an array to 1D (`ravel` returns a view when possible; `flatten` always returns a copy)

### Transposing
- `T` or `transpose()` → swap axes or transpose matrices

### Combining arrays
- `concatenate()` → join arrays along an existing axis (for 2D arrays, `vstack` and `hstack` are the row-wise and column-wise shorthands)
- `stack()` → combine arrays along a new axis
- `column_stack()` → turn 1D arrays into the columns of a 2D array

### Splitting arrays
- `split()` → divide an array into multiple sub-arrays
- `hsplit()`, `vsplit()` → split arrays horizontally or vertically

### Modifying array elements
- `append()` → return a new array with elements added at the end (the original is unchanged)
- `repeat()` → repeat elements along an axis

### Sorting and flipping
- `sort()` → sort elements along an axis
- `flipud()` → flip array upside down
- `fliplr()` → flip array left to right

These operations are **crucial for preparing data** for ML workflows, such as:
- Reshaping inputs for models
- Combining different datasets
- Reordering data for visualization or analysis

In [29]:
# Reshaping and Transposing Arrays

# Original array
arr = np.arange(1, 13)  # 1D array from 1 to 12
print("Original array:", arr)

# Reshape to 3x4
arr_3x4 = arr.reshape(3, 4)
print("\nReshaped to 3x4:\n", arr_3x4)

# Transpose
print("\nTransposed (3x4 -> 4x3):\n", arr_3x4.T)

Original array: [ 1  2  3  4  5  6  7  8  9 10 11 12]

Reshaped to 3x4:
 [[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]

Transposed (3x4 -> 4x3):
 [[ 1  5  9]
 [ 2  6 10]
 [ 3  7 11]
 [ 4  8 12]]
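
Two related details are sketched below: passing `-1` lets `reshape` infer one dimension, and `ravel` and `flatten` both produce 1D output but differ in whether they copy. The variable names are illustrative.

```python
import numpy as np

arr = np.arange(1, 13)

# -1 means "work out this dimension from the total size"
matrix = arr.reshape(3, -1)
print("Inferred shape:", matrix.shape)

flat_view = matrix.ravel()    # a view here, since the data is contiguous
flat_copy = matrix.flatten()  # always an independent copy

print("ravel shares memory:  ", np.shares_memory(matrix, flat_view))
print("flatten shares memory:", np.shares_memory(matrix, flat_copy))
```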


In [30]:
# Concatenation and Stacking

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

# Concatenate along axis 0 (rows)
concat_rows = np.concatenate((a, b), axis=0)
print("Concatenate along rows:\n", concat_rows)

# Concatenate along axis 1 (columns)
concat_cols = np.concatenate((a, b), axis=1)
print("\nConcatenate along columns:\n", concat_cols)

# Stack: row-wise and column-wise
stacked_row = np.vstack((a, b))
stacked_col = np.hstack((a, b))
print("\nRow-wise stack:\n", stacked_row)
print("\nColumn-wise stack:\n", stacked_col)

Concatenate along rows:
 [[1 2]
 [3 4]
 [5 6]
 [7 8]]

Concatenate along columns:
 [[1 2 5 6]
 [3 4 7 8]]

Row-wise stack:
 [[1 2]
 [3 4]
 [5 6]
 [7 8]]

Column-wise stack:
 [[1 2 5 6]
 [3 4 7 8]]
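
For 2D inputs, `vstack` and `hstack` give the same results as `concatenate` along axis 0 and axis 1, as the output above shows. `np.stack` is different: it adds a new axis. A short sketch with the same two arrays:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

joined = np.concatenate((a, b), axis=0)  # existing axis: shape (4, 2)
stacked = np.stack((a, b), axis=0)       # new leading axis: shape (2, 2, 2)

print("concatenate shape:", joined.shape)
print("stack shape:      ", stacked.shape)
print("stacked[1] is b:\n", stacked[1])
```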


In [31]:
# Splitting Arrays

arr = np.arange(1, 13).reshape(3, 4)
print("Original array:\n", arr)

# Split into 2 sub-arrays horizontally (columns)
split_h = np.hsplit(arr, 2)
print("\nSplit horizontally (columns):")
for sub in split_h:
    print(sub)

# Split into 3 sub-arrays vertically (rows)
split_v = np.vsplit(arr, 3)
print("\nSplit vertically (rows):")
for sub in split_v:
    print(sub)

Original array:
 [[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]

Split horizontally (columns):
[[ 1  2]
 [ 5  6]
 [ 9 10]]
[[ 3  4]
 [ 7  8]
 [11 12]]

Split vertically (rows):
[[1 2 3 4]]
[[5 6 7 8]]
[[ 9 10 11 12]]
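
`np.split` can also cut at explicit row indices instead of into equal parts, and `np.array_split` tolerates sizes that do not divide evenly. The sketch below uses a made-up 6 × 2 array; the names `train` and `test` are illustrative.

```python
import numpy as np

samples = np.arange(1, 13).reshape(6, 2)

# Split at row index 4: rows 0-3 go to the first part, rows 4-5 to the second
train, test = np.split(samples, [4])
print("Train shape:", train.shape)
print("Test shape: ", test.shape)

# array_split allows uneven parts (np.split would raise an error here)
parts = np.array_split(np.arange(10), 3)
print("Part sizes:", [len(p) for p in parts])
```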


In [32]:
# Append, Repeat, and Sort

arr = np.array([3, 1, 4, 2])

# Append elements
arr_appended = np.append(arr, [5, 0])
print("Appended array:", arr_appended)

# Repeat elements
arr_repeated = np.repeat(arr, 2)
print("Repeated array:", arr_repeated)

# Sort
arr_sorted = np.sort(arr)
print("Sorted array:", arr_sorted)

Appended array: [3 1 4 2 5 0]
Repeated array: [3 3 1 1 4 4 2 2]
Sorted array: [1 2 3 4]
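
Two points about these functions are easy to miss: `np.append` returns a new array and leaves the original untouched, and `np.argsort` returns the ordering as indices, which lets you reorder a second array to match. In the sketch below the student names are hypothetical.

```python
import numpy as np

arr = np.array([3, 1, 4, 2])
longer = np.append(arr, [5, 0])
print("Original length:", len(arr), "| appended length:", len(longer))

scores = np.array([72, 88, 95, 60, 83])
students = np.array(["Ava", "Ben", "Cy", "Dee", "Eli"])  # hypothetical names

order = np.argsort(scores)       # indices that sort scores in ascending order
ranked = students[order[::-1]]   # reversed for highest score first
print("Ascending order of indices:", order)
print("Ranking (best first):", ranked)
```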


In [33]:
# Flip Arrays

arr_2d = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])

# Flip vertically
flip_up_down = np.flipud(arr_2d)
print("Flip up-down:\n", flip_up_down)

# Flip horizontally
flip_left_right = np.fliplr(arr_2d)
print("\nFlip left-right:\n", flip_left_right)

Flip up-down:
 [[7 8 9]
 [4 5 6]
 [1 2 3]]

Flip left-right:
 [[3 2 1]
 [6 5 4]
 [9 8 7]]


In [34]:
# Practical Example: Preparing Features for ML

# Suppose we have two feature vectors for 5 samples
feature_age = np.array([25, 30, 22, 35, 28])        # Age of 5 individuals
feature_income = np.array([50000, 60000, 45000, 80000, 52000])  # Income

# Reshape each feature into a column vector of shape (5, 1) so the features can be stacked side by side
age_col = feature_age.reshape(-1, 1)
income_col = feature_income.reshape(-1, 1)

# Stack features horizontally to create the design matrix X
X = np.hstack((age_col, income_col))
print("Design matrix X (samples × features):\n", X)

# Suppose we have target values
y = np.array([0, 1, 0, 1, 0])  # Binary labels

# Now X and y are ready for ML model training
print("\nTarget labels y:", y)

Design matrix X (samples × features):
 [[   25 50000]
 [   30 60000]
 [   22 45000]
 [   35 80000]
 [   28 52000]]

Target labels y: [0 1 0 1 0]
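
As an alternative sketch, `np.column_stack` builds the same design matrix directly from the 1D feature vectors, without the separate `reshape(-1, 1)` step. It is one option among several, not the only correct approach.

```python
import numpy as np

feature_age = np.array([25, 30, 22, 35, 28])
feature_income = np.array([50000, 60000, 45000, 80000, 52000])

# Each 1D array becomes one column of the result
X = np.column_stack((feature_age, feature_income))
print("Design matrix shape:", X.shape)
print(X)
```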


### Explanation for Students

In real-world ML workflows, arrays often represent datasets where **rows are samples** and **columns are features**.  

- `reshape` lets you convert between different shapes, for example, flattening a 2D image array into a 1D vector for a model.  
- `transpose` swaps rows and columns, which can be useful when aligning datasets for matrix operations.  
- `concatenate` and `stack` help combine multiple datasets, e.g., merging feature sets or appending new samples.  
- `split` allows dividing a dataset into training and testing subsets.  
- `flipud` and `fliplr` are helpful in image processing or data augmentation tasks.  

These operations are not just abstract—they are **practical tools** you will use repeatedly when preparing data for ML models.

## Exercises — NumPy Arrays

1. **Array Creation & Indexing**
   - Create a 1D array of integers from 10 up to (but not including) 50 with step 5 (8 elements).
   - Reshape it into a 2D array with 2 rows.
   - Access the second row and the last column.

2. **Boolean Indexing**
   - From the array above, select elements greater than 20 using Boolean indexing.

3. **Array Attributes**
   - For a 3×4 array of random floats, print its shape, number of dimensions, total size, item size, and total memory consumed.

4. **Basic Operations & ufuncs**
   - Create two arrays of the same shape. Perform element-wise addition, multiplication, and find the sine of one array.
   
5. **Basic Statistics**
   - For a 2D array of integers, compute the sum, mean, minimum, and maximum along each axis.

6. **Data Manipulation**
   - Create two small arrays and demonstrate `concatenate`, `vstack`, `hstack`, `split`, `append`, `repeat`, `flipud`, and `fliplr`.
   - Reshape a 3×4 array into 2×6 and then transpose it.

7. **Practical ML-oriented Exercise**
   - Suppose you have a dataset of 5 samples, each with 4 features (random integers 0–10).  
     - Normalize the dataset by subtracting the mean and dividing by the standard deviation for each feature.  
     - Split the dataset into two arrays: first 3 samples as training, last 2 as testing.

Next → **Section 4: Data Manipulation with `Pandas`**