# **Hour 1: Foundations of NumPy**

Welcome to Hour 1 of our NumPy workshop! In this notebook, you’ll learn:
- How to create NumPy arrays
- How to inspect array attributes (shape, type, size)
- How to index and slice arrays
- How arrays can represent structured datasets

Let’s begin!

## This notebook was prepared by Mr. Mrinal Das, who also delivered this session.
Mr. Mrinal Das, Sr. Technical Assistant, Department of MCA, Siliguri Institute of Technology

Email: mrinal.subscribe@gmail.com

Contact No.: +91 76797 74508


### **Creating NumPy Arrays**
We'll create arrays from Python lists and with built-in NumPy functions.

In [2]:
import numpy as np

# Creating a NumPy array from a Python list
list_data = [1, 2, 3, 4, 5]
array_from_list = np.array(list_data)  # Converts list to ndarray
print("Array from list:", array_from_list)

# Creating a 1D array
array_1d = np.array([10, 20, 30])
print("1D Array:", array_1d)

# Creating a 2D array (matrix-like)
array_2d = np.array([[1, 2, 3], [4, 5, 6]])  # 2 rows, 3 columns
print("2D Array:\n", array_2d)

# Create evenly spaced integers with arange (like Python range)
array_range = np.arange(0, 10, 2)  # From 0 to 8 with step size of 2
print("Range Array:", array_range)

# Create evenly spaced floats using linspace
array_linspace = np.linspace(0, 1, 5)  # 5 values from 0 to 1 (inclusive)
print("Linspace Array:", array_linspace)

# Create a 2x3 array of all zeros
zeros_array = np.zeros((2, 3))
print("Zeros Array:\n", zeros_array)

# Create a 3x2 array of all ones
ones_array = np.ones((3, 2))
print("Ones Array:\n", ones_array)

# Create a 2x2 array filled with value 7
full_array = np.full((2, 2), 7)
print("Full Array:\n", full_array)

Array from list: [1 2 3 4 5]
1D Array: [10 20 30]
2D Array:
 [[1 2 3]
 [4 5 6]]
Range Array: [0 2 4 6 8]
Linspace Array: [0.   0.25 0.5  0.75 1.  ]
Zeros Array:
 [[0. 0. 0.]
 [0. 0. 0.]]
Ones Array:
 [[1. 1.]
 [1. 1.]
 [1. 1.]]
Full Array:
 [[7 7]
 [7 7]]


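- `np.array()` converts a list (or nested lists) into a NumPy array.
- `np.arange(start, stop, step)` returns evenly spaced values from `start` up to, but not including, `stop`; with integer arguments the result is an integer array.
- `np.linspace(start, stop, num)` returns `num` evenly spaced floats, including both endpoints by default.
- `np.zeros()`, `np.ones()`, and `np.full()` initialize fixed-shape arrays. `zeros()` and `ones()` default to `float64` (note the `0.` and `1.` in the output), while `full()` infers its dtype from the fill value, so `7` gives integers.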
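The dtype rules above are easy to check directly. The sketch below (variable names are illustrative) shows that mixing ints and floats upcasts the whole array, that `dtype=` overrides a creation function's default, and how `arange` with a float step relates to `linspace` with `endpoint=False`.

```python
import numpy as np

# A list mixing ints and floats is upcast to a single float dtype
mixed = np.array([1, 2.5, 3])
print(mixed, mixed.dtype)

# zeros() defaults to float64; dtype= requests integers instead
int_zeros = np.zeros((2, 3), dtype=int)
print(int_zeros.dtype)

# arange accepts a float step; stop (1) is still excluded
quarter_steps = np.arange(0, 1, 0.25)
print(quarter_steps)

# linspace with endpoint=False also excludes stop, giving the same values here
no_endpoint = np.linspace(0, 1, 4, endpoint=False)
print(no_endpoint)
```

Because repeated float steps can accumulate rounding error, `linspace` is usually the safer choice when you need an exact number of points.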

### **Exploring Array Attributes**
We'll now inspect the shape, size, data type, and dimensions of arrays.

In [None]:
# Define a 2D array (2 rows, 3 columns)
sample_array = np.array([[1, 2, 3], [4, 5, 6]])

print("Sample Array:\n", sample_array)

# Shape: tuple of (rows, columns)
print("Shape:", sample_array.shape)

# ndim: number of dimensions (axes). 1D, 2D, etc.
print("Number of Dimensions:", sample_array.ndim)

# dtype: data type of the elements (int32, float64, etc.)
print("Data Type:", sample_array.dtype)

# size: total number of elements in the array
print("Total Elements:", sample_array.size)

Sample Array:
 [[1 2 3]
 [4 5 6]]
Shape: (2, 3)
Number of Dimensions: 2
Data Type: int64
Total Elements: 6


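- Use `.shape` to see how data is structured; for a 2D array it is `(rows, columns)`.
- `.ndim` tells you whether an array is flat (1D), tabular (2D), or higher-dimensional.
- `.dtype` determines how much memory each element uses and which operations are fast. The default integer type depends on the platform, so on some systems (e.g. Windows with NumPy versions before 2.0) you may see `int32` instead of `int64`.
- `.size` is the total number of elements, i.e. the product of the entries in `.shape` (rows × columns for a 2D array).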
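To see these attributes beyond two dimensions, and how dtype affects memory, here is a small sketch using an all-zeros 3D array (the name `cube` is illustrative). `itemsize` is the number of bytes per element and `nbytes` is `itemsize * size`.

```python
import numpy as np

# A 3D array: 2 "sheets", each with 3 rows and 4 columns
cube = np.zeros((2, 3, 4))
print("shape:", cube.shape)   # (2, 3, 4)
print("ndim:", cube.ndim)     # 3
print("size:", cube.size)     # 2 * 3 * 4 = 24

# float64 uses 8 bytes per element
print("float64 bytes:", cube.itemsize, cube.nbytes)  # 8 192

# Converting to float32 halves the memory footprint
cube32 = cube.astype(np.float32)
print("float32 bytes:", cube32.itemsize, cube32.nbytes)  # 4 96
```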

### **Representing Structured Data**
Let's build a matrix representing sales for 3 products across 3 months.


In [None]:
# Each row = a product; each column = a month (Jan to Mar)
sales_data = np.array([
    [250, 300, 400],  # Product A
    [150, 200, 250],  # Product B
    [100, 120, 130]   # Product C
])
print("Sales Data:\n", sales_data)

Sales Data:
 [[250 300 400]
 [150 200 250]
 [100 120 130]]


- Row 0: Product A’s sales in Jan, Feb, Mar.
- Columns = months, Rows = products.
- This layout mimics real-world data structures like spreadsheets.
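NumPy arrays do not store row or column labels themselves. One simple way to work with names, sketched below under the assumption of two hypothetical label lists (`products` and `months`) kept in the same order as the array, is to translate a label into its position and then index.

```python
import numpy as np

sales_data = np.array([
    [250, 300, 400],  # Product A
    [150, 200, 250],  # Product B
    [100, 120, 130],  # Product C
])

# Hypothetical label lists that mirror the row and column order
products = ["A", "B", "C"]
months = ["Jan", "Feb", "Mar"]

# Translate labels into positions, then index the array
row = products.index("B")
col = months.index("Mar")
print("Product B in Mar:", sales_data[row, col])  # 250
```

Libraries such as pandas attach labels to the data directly; in plain NumPy, parallel lists like these are one lightweight option.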

### **Indexing and Slicing Arrays**
Extract single elements with index notation, and rows, columns, and blocks with slice notation.

In [None]:
# Element at row 0, column 1 (Product A in Feb)
print("Element at [0,1]:", sales_data[0, 1])

# Entire first row (Product A)
print("First row:", sales_data[0])

# Last column (March for all products)
print("Last column:", sales_data[:, -1])

# Subarray: First 2 rows and first 2 columns
print("Block (first 2 rows, first 2 columns):\n", sales_data[:2, :2])

# Reverse the order of rows (Products C, B, A)
print("Reversed Rows:\n", sales_data[::-1])

# Every other column (Jan and Mar)
print("Every other column:\n", sales_data[:, ::2])

Element at [0,1]: 300
First row: [250 300 400]
Last column: [400 250 130]
Block (first 2 rows, first 2 columns):
 [[250 300]
 [150 200]]
Reversed Rows:
 [[100 120 130]
 [150 200 250]
 [250 300 400]]
Every other column:
 [[250 400]
 [150 250]
 [100 130]]


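- `arr[i, j]` fetches a single element.
- `arr[i]` selects a row; `arr[:, j]` selects a column.
- `arr[a:b, c:d]` slices out a block (rows `a` to `b-1`, columns `c` to `d-1`).
- The optional third slice component sets the step: `::-1` reverses an axis and `::2` takes every other element.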
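One behavior worth knowing early: a basic slice is a *view* that shares memory with the original array, so writing into the slice changes the original. The sketch below redefines `sales_data` so it runs on its own, then shows the view behavior, `.copy()`, and negative indices.

```python
import numpy as np

sales_data = np.array([
    [250, 300, 400],
    [150, 200, 250],
    [100, 120, 130],
])

# Basic slicing returns a view: it shares memory with sales_data
block = sales_data[:2, :2]
block[0, 0] = 999
print(sales_data[0, 0])  # 999, the original changed too

# .copy() gives an independent array
safe_block = sales_data[:2, :2].copy()
safe_block[0, 0] = -1
print(sales_data[0, 0])  # still 999

# Negative indices count from the end: -1 is the last row
print(sales_data[-1])  # [100 120 130]
```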

### **Exercise – Time-Series Array**
Simulate a time-series-like matrix and practice slicing rows and columns.

In [None]:
# Create a 3x4 array with values 1–12
time_series = np.arange(1, 13).reshape(3, 4)

print("Time-Series Data:\n", time_series)

# First column (Q1 for all entities)
print("Q1 (first column):", time_series[:, 0])

# Last 2 columns (Q3 and Q4)
print("Last 2 quarters (Q3, Q4):\n", time_series[:, -2:])

# Reverse the order of entities (rows)
print("Reversed rows:\n", time_series[::-1])

Time-Series Data:
 [[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]
Q1 (first column): [1 5 9]
Last 2 quarters (Q3, Q4):
 [[ 3  4]
 [ 7  8]
 [11 12]]
Reversed rows:
 [[ 9 10 11 12]
 [ 5  6  7  8]
 [ 1  2  3  4]]


- Simulates data for 3 entities (rows) over 4 time periods (columns, Q1 to Q4).
- Slicing columns selects specific time ranges.
- Reversing rows flips the order of entities, for example turning an ascending ranking into a descending one.

### **Summary: Hour 1 Highlights**

You learned how to:
- Create arrays using lists and functions like `arange()` and `zeros()`
- Use attributes like `.shape`, `.ndim`, `.dtype`, and `.size`
- Extract data via indexing and slicing
- Conceptualize arrays for real-world applications like sales or time-series data

Up next: arithmetic, aggregations, and conditional logic in **Hour 2**.

# **Hour 2: Numerical Operations & Conditional Logic**

In this session, you will learn how to:
- Perform arithmetic and element-wise operations
- Use aggregation functions like sum, mean, std
- Apply Boolean logic and filters to arrays
- Handle missing values (NaN) in numerical data

### **Arithmetic Operations**

NumPy supports fast, element-wise arithmetic between arrays or between an array and a scalar.


In [None]:
import numpy as np

# Define two simple arrays of the same shape
arr1 = np.array([10, 20, 30, 40])
arr2 = np.array([1, 2, 3, 4])

# Perform basic arithmetic operations
print("Addition:", arr1 + arr2)          # [11 22 33 44]
print("Subtraction:", arr1 - arr2)       # [ 9 18 27 36]
print("Multiplication:", arr1 * arr2)    # [ 10  40  90 160]
print("Division:", arr1 / arr2)          # [10. 10. 10. 10.]

# Array and scalar operation
print("Add 100 to arr1:", arr1 + 100)    # [110 120 130 140]

Addition: [11 22 33 44]
Subtraction: [ 9 18 27 36]
Multiplication: [ 10  40  90 160]
Division: [10. 10. 10. 10.]
Add 100 to arr1: [110 120 130 140]


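- Arithmetic operators are applied element by element.
- The arrays must have the same shape or be broadcast-compatible.
- A scalar is automatically broadcast across every element of the array.
- `/` always performs true division and returns floats, even for integer inputs (note `10.` in the output).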
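"Broadcast-compatible" means that, comparing shapes from the last dimension backwards, each pair of sizes is either equal or one of them is 1. A minimal sketch with illustrative arrays:

```python
import numpy as np

grid = np.array([[1, 2, 3],
                 [4, 5, 6]])          # shape (2, 3)
row_offsets = np.array([10, 20, 30])  # shape (3,)

# (2, 3) + (3,): the 1D array is stretched across both rows
by_row = grid + row_offsets
print(by_row)
# [[11 22 33]
#  [14 25 36]]

# A column vector of shape (2, 1) is stretched across the columns
col_offsets = np.array([[100], [200]])
by_col = grid + col_offsets
print(by_col)
# [[101 102 103]
#  [204 205 206]]

# Shapes (2, 3) and (2,) are incompatible: trailing sizes 3 and 2 differ
mismatch_raised = False
try:
    grid + np.array([1, 2])
except ValueError as err:
    mismatch_raised = True
    print("ValueError:", err)
```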

### **Aggregation Functions**

Use NumPy functions to summarize arrays with metrics such as sum, mean, and standard deviation.


In [None]:
# 2D array example
matrix = np.array([
    [5, 10, 15],
    [20, 25, 30]
])

# Full-array aggregations
print("Total Sum:", np.sum(matrix))
print("Mean Value:", np.mean(matrix))
print("Standard Deviation:", np.std(matrix))

# Axis-wise aggregations
print("Sum by column (axis=0):", np.sum(matrix, axis=0))
print("Mean by row (axis=1):", np.mean(matrix, axis=1))

Total Sum: 105
Mean Value: 17.5
Standard Deviation: 8.539125638299666
Sum by column (axis=0): [25 35 45]
Mean by row (axis=1): [10. 25.]


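- `axis=0` → aggregates down the rows, giving one result per column
- `axis=1` → aggregates across the columns, giving one result per row
- Other aggregation functions include `np.min()`, `np.max()`, and `np.median()`.
- `np.std()` computes the population standard deviation (`ddof=0`) by default; pass `ddof=1` for the sample standard deviation.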
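The sketch below reuses the `matrix` from the cell above to compare the two `ddof` settings and to show `np.max`, `np.argmax` (the position of the maximum), and `np.median`.

```python
import numpy as np

matrix = np.array([
    [5, 10, 15],
    [20, 25, 30],
])

# Population vs. sample standard deviation
print("std, ddof=0:", np.std(matrix))          # divides by N = 6
print("std, ddof=1:", np.std(matrix, ddof=1))  # divides by N - 1 = 5

# Per-row maximum and the column position where it occurs
row_max = np.max(matrix, axis=1)
row_argmax = np.argmax(matrix, axis=1)
print("Row max:", row_max)        # [15 30]
print("Row argmax:", row_argmax)  # [2 2]

# Median of all six values (average of the middle two, 15 and 20)
print("Median:", np.median(matrix))  # 17.5
```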

### **Boolean Masking and Filtering**
Boolean conditions allow you to create masks and use them to filter or transform arrays.

In [None]:
scores = np.array([60, 75, 85, 90, 45])

# Create a boolean mask for scores >= 70
pass_mask = scores >= 70
print("Pass mask:", pass_mask)

# Filter values using the mask
print("Passing scores:", scores[pass_mask])

# Combine conditions: select scores between 70 and 90 (inclusive)
print("Scores between 70 and 90:", scores[(scores >= 70) & (scores <= 90)])

Pass mask: [False  True  True  True False]
Passing scores: [75 85 90]
Scores between 70 and 90: [75 85 90]


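- `scores >= 70` returns a Boolean array (one True/False per element).
- Indexing with this mask keeps only the elements where it is True.
- Combine conditions with `&` (and), `|` (or), and `~` (not). Wrap each comparison in parentheses, because these operators bind more tightly than `>=` and `<=`; Python's `and`/`or` keywords do not work element-wise on arrays.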
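Here is a sketch of the remaining operators on the same `scores` array, plus two common follow-ups: counting matches by summing a mask (True counts as 1) and collapsing a mask with `np.any()`/`np.all()`.

```python
import numpy as np

scores = np.array([60, 75, 85, 90, 45])

# OR: scores below 50 or at least 90 (original order is preserved)
extremes = scores[(scores < 50) | (scores >= 90)]
print("Extremes:", extremes)  # [90 45]

# NOT: invert the pass mask to get failing scores
failing = scores[~(scores >= 70)]
print("Failing:", failing)  # [60 45]

# True counts as 1, so summing a mask counts matches
n_passed = np.sum(scores >= 70)
print("Number passed:", n_passed)  # 3

# any()/all() collapse a mask to a single True/False
print("Any below 50?", np.any(scores < 50))  # True
print("All above 40?", np.all(scores > 40))  # True
```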

### **Conditional Logic with np.where()**

Use `np.where()` to apply if/else logic across arrays.

In [None]:
# Use np.where to assign labels
labels = np.where(scores >= 70, "Pass", "Fail")
print("Labels for scores:", labels)

Labels for scores: ['Fail' 'Pass' 'Pass' 'Pass' 'Fail']


- `np.where(condition, value_if_true, value_if_false)` is vectorized if-else logic.
- The two value arguments can be scalars (numbers or strings) or arrays that broadcast against the condition.
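A short sketch of three further `np.where` patterns: an array as one branch, nested calls for more than two outcomes (the grade thresholds are illustrative assumptions), and the single-argument form, which returns the positions where the condition holds.

```python
import numpy as np

scores = np.array([60, 75, 85, 90, 45])

# Keep passing scores, replace failing ones with 0 (array and scalar branches)
kept = np.where(scores >= 70, scores, 0)
print("Kept:", kept)  # [ 0 75 85 90  0]

# Nested np.where gives more than two outcomes (thresholds are illustrative)
grades = np.where(scores >= 85, "A", np.where(scores >= 70, "B", "C"))
print("Grades:", grades)  # ['C' 'B' 'A' 'A' 'C']

# With only a condition, np.where returns a tuple of index arrays
pass_idx = np.where(scores >= 70)
print("Passing positions:", pass_idx[0])  # [1 2 3]
```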

### **Handling Missing Values (NaNs)**

NumPy provides special functions to deal with NaN (not-a-number) values in arrays.

In [None]:
# Create an array with NaN values
data_with_nan = np.array([3.0, np.nan, 7.0, np.nan, 5.0])

# Detect NaN positions
print("NaN mask:", np.isnan(data_with_nan))

# Aggregate safely while ignoring NaNs
print("Mean ignoring NaNs:", np.nanmean(data_with_nan))

# Replace NaNs with 0
filled_data = np.nan_to_num(data_with_nan, nan=0.0)
print("Replaced NaNs with 0:", filled_data)

NaN mask: [False  True False  True False]
Mean ignoring NaNs: 5.0
Replaced NaNs with 0: [3. 0. 7. 0. 5.]


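- `np.isnan()` identifies missing values; `==` cannot, because `np.nan == np.nan` is False.
- Regular aggregations such as `np.mean()` return `nan` if any element is NaN; `np.nanmean()`, `np.nansum()`, etc. ignore NaNs instead.
- `np.nan_to_num()` replaces NaNs with the value you specify (and, by default, infinities with very large finite numbers).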
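The sketch below contrasts a regular mean with the NaN-aware functions, counts and drops missing values with the `isnan` mask, and fills NaNs with the mean of the observed values. Mean imputation is just one common choice; whether it suits your data is an assumption you should check.

```python
import numpy as np

data_with_nan = np.array([3.0, np.nan, 7.0, np.nan, 5.0])

# NaN is not equal to anything, including itself, so == cannot find it
print(np.nan == np.nan)  # False

# A regular mean is "poisoned" by any NaN
print("np.mean:", np.mean(data_with_nan))      # nan
print("np.nansum:", np.nansum(data_with_nan))  # 15.0

# Count and drop missing values with the isnan mask
missing = np.isnan(data_with_nan)
print("Missing count:", np.count_nonzero(missing))  # 2
clean = data_with_nan[~missing]
print("Without NaNs:", clean)  # [3. 7. 5.]

# Impute: fill NaNs with the mean of the observed values
imputed = np.where(missing, np.nanmean(data_with_nan), data_with_nan)
print("Imputed:", imputed)  # [3. 5. 7. 5. 5.]
```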

### **Summary: Hour 2 Key Concepts**

- Arrays support fast vectorized arithmetic and logical operations.
- Use aggregation functions to compute statistics across arrays or dimensions.
- Boolean masks are powerful for filtering data conditionally.
- `np.where()` implements element-wise if-else logic.
- Handle missing values carefully using NaN-aware NumPy functions.

# **Hour 3: Reshaping, Combining and Sorting Data**

In this session, you'll learn to:
- Reshape and flatten arrays
- Combine and split arrays
- Use advanced indexing
- Sort arrays and retrieve orderings

### **Reshaping and Flattening Arrays**

You can change the shape of arrays without changing their data using `reshape()`, `flatten()`, and `ravel()`.


In [None]:
import numpy as np

# Create a 1D array of 12 elements
arr = np.arange(12)
print("Original array:", arr)

# Reshape into 3x4 (3 rows, 4 columns)
reshaped = arr.reshape(3, 4)
print("Reshaped (3x4):\n", reshaped)

# Flatten using ravel (view when possible) and flatten (always a copy)
raveled = reshaped.ravel()
flattened = reshaped.flatten()

print("Raveled (1D view):", raveled)
print("Flattened (1D copy):", flattened)

Original array: [ 0  1  2  3  4  5  6  7  8  9 10 11]
Reshaped (3x4):
 [[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]
Raveled (1D view): [ 0  1  2  3  4  5  6  7  8  9 10 11]
Flattened (1D copy): [ 0  1  2  3  4  5  6  7  8  9 10 11]


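- `reshape()` changes the shape but not the data; the new shape must contain the same number of elements, and a view is returned whenever possible.
- `ravel()` returns a view when possible, so changes to it can affect the original; if the array's memory layout does not allow a view, it returns a copy.
- `flatten()` always returns a copy, so changes never affect the original.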
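`np.shares_memory()` lets you check the view/copy distinction directly. This sketch (the name `grid` is illustrative) writes through both results, shows a case where `ravel()` must copy (a transposed array is not stored row by row), and uses `-1` to let `reshape()` infer one dimension.

```python
import numpy as np

grid = np.arange(6).reshape(2, 3)

raveled = grid.ravel()      # a view, because grid is stored contiguously
flattened = grid.flatten()  # always a copy
print(np.shares_memory(grid, raveled))    # True
print(np.shares_memory(grid, flattened))  # False

# Writing through the view changes grid; writing to the copy does not
raveled[0] = 100
flattened[1] = -1
print(grid)
# [[100   1   2]
#  [  3   4   5]]

# The transpose is not row-contiguous, so ravel() has to copy
print(np.shares_memory(grid, grid.T.ravel()))  # False

# -1 asks NumPy to infer that dimension from the element count
print(grid.reshape(3, -1).shape)  # (3, 2)
```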

### **Combining Arrays**

You can combine arrays horizontally or vertically using stacking functions.

In [None]:
a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

# Horizontal stack (along columns)
h_stack = np.hstack((a, b))
print("Horizontal Stack:\n", h_stack)

# Vertical stack (along rows)
v_stack = np.vstack((a, b))
print("Vertical Stack:\n", v_stack)

# Generic concatenate: axis=0 (rows), axis=1 (columns)
concat_axis0 = np.concatenate((a, b), axis=0)
concat_axis1 = np.concatenate((a, b), axis=1)
print("Concatenate axis=0:\n", concat_axis0)
print("Concatenate axis=1:\n", concat_axis1)

Horizontal Stack:
 [[1 2 5 6]
 [3 4 7 8]]
Vertical Stack:
 [[1 2]
 [3 4]
 [5 6]
 [7 8]]
Concatenate axis=0:
 [[1 2]
 [3 4]
 [5 6]
 [7 8]]
Concatenate axis=1:
 [[1 2 5 6]
 [3 4 7 8]]


- `hstack()` appends columns, `vstack()` appends rows.
- `concatenate()` gives more control with axis.
- Arrays must have the same size along every axis except the one you are joining along.
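The shape rule can be seen by joining a 2x2 array with a single extra row (`extra_row` is an illustrative name): along `axis=0` only the column counts must match, while along `axis=1` the row counts must match, and 2 versus 1 fails.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])   # shape (2, 2)
extra_row = np.array([[9, 9]])   # shape (1, 2)

# axis=0: row counts may differ (2 + 1), column counts must match (2 == 2)
stacked = np.concatenate((a, extra_row), axis=0)
print(stacked.shape)  # (3, 2)

# axis=1: row counts must match, but 2 != 1, so NumPy raises ValueError
try:
    np.concatenate((a, extra_row), axis=1)
    mismatch_raised = False
except ValueError as err:
    mismatch_raised = True
    print("ValueError:", err)
```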

### **Splitting Arrays**

Use splitting functions to divide arrays into parts.

In [None]:
# Create a 2D array
arr = np.arange(16).reshape(4, 4)
print("Original Array:\n", arr)

# vsplit splits along rows (axis 0): two 2x4 arrays
split_vert = np.vsplit(arr, 2)
print("Vertical Split (2x4 each):")
for s in split_vert:
    print(s)

# hsplit splits along columns (axis 1): two 4x2 arrays
split_horiz = np.hsplit(arr, 2)
print("Horizontal Split (4x2 each):")
for s in split_horiz:
    print(s)

Original Array:
 [[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]
 [12 13 14 15]]
Vertical Split (2x4 each):
[[0 1 2 3]
 [4 5 6 7]]
[[ 8  9 10 11]
 [12 13 14 15]]
Horizontal Split (4x2 each):
[[ 0  1]
 [ 4  5]
 [ 8  9]
 [12 13]]
[[ 2  3]
 [ 6  7]
 [10 11]
 [14 15]]


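- `vsplit()` splits along rows (axis 0), `hsplit()` along columns (axis 1).
- When you pass a number of sections, it must divide the axis length evenly, otherwise NumPy raises a `ValueError`; use `np.array_split()` to allow uneven splits.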
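A 1D sketch of the options for uneven data: `np.array_split()` accepts a section count that does not divide the length, while `np.split()` also accepts a list of indices to split *at*. An uneven section count passed to `np.split()` raises `ValueError`.

```python
import numpy as np

values = np.arange(10)

# array_split allows uneven pieces: 10 elements into 3 parts -> sizes 4, 3, 3
uneven = np.array_split(values, 3)
print([part.tolist() for part in uneven])
# [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]

# A list of indices splits at those positions instead
at_indices = np.split(values, [2, 5])
print([part.tolist() for part in at_indices])
# [[0, 1], [2, 3, 4], [5, 6, 7, 8, 9]]

# np.split with a section count that does not divide 10 raises ValueError
try:
    np.split(values, 3)
    split_failed = False
except ValueError:
    split_failed = True
print("np.split(values, 3) failed:", split_failed)  # True
```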

### **Advanced Indexing**

Use arrays of indices (fancy indexing) to select or reorder specific rows/columns.

In [None]:
arr = np.array([
    [100, 101, 102],
    [110, 111, 112],
    [120, 121, 122],
    [130, 131, 132]
])

# Select rows 0, 2 and 3
selected_rows = arr[[0, 2, 3]]
print("Selected rows 0, 2, 3:\n", selected_rows)

# Reorder columns
reordered_cols = arr[:, [2, 0, 1]]
print("Reordered columns (2, 0, 1):\n", reordered_cols)

Selected rows 0, 2, 3:
 [[100 101 102]
 [120 121 122]
 [130 131 132]]
Reordered columns (2, 0, 1):
 [[102 100 101]
 [112 110 111]
 [122 120 121]
 [132 130 131]]


- Fancy indexing selects or reorders rows and columns using lists or arrays of integers.
- Unlike basic slicing, which returns a view, fancy indexing always returns a new copy.
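A common surprise: passing one index list per axis *pairs* the lists element by element rather than selecting a sub-grid. The sketch below contrasts that with `np.ix_()`, which builds the grid, and confirms that the result of fancy indexing is a copy.

```python
import numpy as np

arr = np.array([
    [100, 101, 102],
    [110, 111, 112],
    [120, 121, 122],
    [130, 131, 132],
])

# Two index lists are paired element-wise: positions (0, 1) and (2, 2)
pairs = arr[[0, 2], [1, 2]]
print(pairs)  # [101 122]

# np.ix_ builds a grid instead: rows {0, 2} x columns {1, 2}
block = arr[np.ix_([0, 2], [1, 2])]
print(block)
# [[101 102]
#  [121 122]]

# Fancy indexing returns a copy, so edits do not reach arr
picked = arr[[0, 1]]
picked[0, 0] = -1
print(arr[0, 0])  # 100
```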

### **Sorting and Ranking**

Use `np.sort()` to sort values and `np.argsort()` to get the index order.

In [None]:
data = np.array([45, 12, 89, 33])

# Sorted values
sorted_data = np.sort(data)
print("Sorted Data:", sorted_data)

# Indices that would sort the array
sort_order = np.argsort(data)
print("Sort Order Indices:", sort_order)

# Use sort_order to reorder the array manually
print("Data sorted using indices:", data[sort_order])

Sorted Data: [12 33 45 89]
Sort Order Indices: [1 3 0 2]
Data sorted using indices: [12 33 45 89]


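- `np.sort()` returns a sorted copy (the `arr.sort()` method sorts in place instead).
- `np.argsort()` returns the index positions that would sort the array.
- Because `argsort` returns positions, you can apply the same order to other arrays of the same length to sort them in parallel.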
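The sketch below applies one `argsort` order to a parallel array of hypothetical labels (`names`), reverses it for descending order, and uses `argsort` twice to get each value's 0-based rank, which fills in the "ranking" part of this section's title.

```python
import numpy as np

data = np.array([45, 12, 89, 33])
names = np.array(["W", "X", "Y", "Z"])  # hypothetical labels, one per value

order = np.argsort(data)  # [1 3 0 2]
# Apply the same order to a parallel array
print("Names, ascending:", names[order])  # ['X' 'Z' 'W' 'Y']

# Reverse the index order for descending
desc = order[::-1]
print("Names, descending:", names[desc])  # ['Y' 'W' 'Z' 'X']

# argsort of argsort gives each value's 0-based rank
ranks = np.argsort(order)
print("Ranks:", ranks)  # [2 0 3 1]
```

With tied values, passing `kind="stable"` to `np.argsort()` keeps tied elements in their original order.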

### **Summary: Hour 3 Key Concepts**

- Use `reshape()`, `flatten()`, `ravel()` to change array shape
- Combine arrays with `hstack()`, `vstack()`, `concatenate()`
- Split arrays with `hsplit()` and `vsplit()`
- Fancy indexing allows advanced row/column selection
- `sort()` and `argsort()` enable ordering and ranking

# **Hour 4: Deriving Insights & Data Processing Workflows**

In this session, you will learn how to:
- Compute derived metrics (e.g., growth rates, ratios)
- Analyze relationships using correlation and matrix ops
- Sample and randomize data
- Save and load arrays for reuse


### **Derived Metrics**

Perform per-row or per-column calculations like percent change or normalization.

In [5]:
import numpy as np

# Simulated sales data (rows = products, columns = months)
sales = np.array([
    [200, 220, 250, 300],  # Product A
    [150, 160, 170, 190],  # Product B
    [300, 330, 360, 390]   # Product C
])

# Row-wise sum (total sales per product)
row_sum = np.sum(sales, axis=1)
print("Total sales per product:", row_sum)

# Column-wise mean (average sales in each month)
col_mean = np.mean(sales, axis=0)
print("Average monthly sales:", col_mean)

# Percent change across months, as a fraction (0.1 = 10%)
# (current - previous) / previous
pct_change = (sales[:, 1:] - sales[:, :-1]) / sales[:, :-1]
print("Percent change:\n", pct_change)

Total sales per product: [ 970  670 1380]
Average monthly sales: [216.66666667 236.66666667 260.         293.33333333]
Percent change:
 [[0.1        0.13636364 0.2       ]
 [0.06666667 0.0625     0.11764706]
 [0.1        0.09090909 0.08333333]]


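- Use `axis=1` for per-row (per-product) results and `axis=0` for per-column (per-month) results.
- Percent change is useful for spotting trends over time; the values above are fractions, so multiply by 100 for percentages.
- For the subtraction to work, both slices must have the same shape: `sales[:, 1:]` and `sales[:, :-1]` are both 3x3 here.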
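The section intro also mentions normalization. One common variant is per-row min-max scaling; the sketch below assumes that variant (other normalizations exist) and also computes each month's share of a product's total. `keepdims=True` keeps the per-row statistics as shape `(3, 1)` so they broadcast across the columns.

```python
import numpy as np

sales = np.array([
    [200, 220, 250, 300],  # Product A
    [150, 160, 170, 190],  # Product B
    [300, 330, 360, 390],  # Product C
])

# Per-row min and max with shape (3, 1), so they broadcast over columns
row_min = sales.min(axis=1, keepdims=True)
row_max = sales.max(axis=1, keepdims=True)

# Min-max normalization: each product's smallest month -> 0, largest -> 1
normalized = (sales - row_min) / (row_max - row_min)
print(np.round(normalized, 3))

# Share of each product's total that falls in each month (rows sum to 1)
share = sales / sales.sum(axis=1, keepdims=True)
print(np.round(share, 3))
```

Min-max scaling divides by zero if every value in a row is identical, so guard against that case in real data.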

### **Correlation and Matrix Operations**

Analyze relationships between variables using correlation coefficients and matrix math.

In [4]:
# Correlation matrix: variables in columns
variables = np.array([
    [1, 2, 3],
    [2, 4, 6],
    [1, 5, 7]
])

# Compute correlation coefficients
correlation_matrix = np.corrcoef(variables, rowvar=False)
print("Correlation matrix:\n", correlation_matrix)

# Dot product (e.g., sales @ weights for scoring)
weights = np.array([0.2, 0.3, 0.5])
scores = np.dot(sales[:, :3], weights)
print("Weighted score (first 3 months):", scores)

Correlation matrix:
 [[1.         0.18898224 0.2773501 ]
 [0.18898224 1.         0.99587059]
 [0.2773501  0.99587059 1.        ]]


Weighted score (first 3 months): [231. 163. 339.]

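- `np.corrcoef()` returns Pearson correlation coefficients between -1 and 1 (1.0 = perfect positive linear relationship, -1.0 = perfect negative, 0 = no linear relationship); `rowvar=False` tells it that each column is a variable.
- `np.dot()` or the `@` operator computes weighted sums or projections.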
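A self-contained sketch (it redefines `sales` so it does not depend on cell order) showing that `np.dot()` and `@` agree for a 2D array times a 1D weight vector, and how to read the correlation of two 1D arrays from the off-diagonal entry of `np.corrcoef()`. The arrays `x` and `y` are made up to lie on a perfectly decreasing line.

```python
import numpy as np

sales = np.array([
    [200, 220, 250, 300],
    [150, 160, 170, 190],
    [300, 330, 360, 390],
])
weights = np.array([0.2, 0.3, 0.5])

# For a 2D array and a 1D array, np.dot and @ give the same weighted row sums
via_dot = np.dot(sales[:, :3], weights)
via_matmul = sales[:, :3] @ weights
print(via_dot, via_matmul)

# Two 1D arrays: the entry [0, 1] of the 2x2 result is their correlation
x = np.array([1, 2, 3, 4])
y = np.array([8, 6, 4, 2])  # y = 10 - 2x, a perfect negative line
r = np.corrcoef(x, y)[0, 1]
print(round(r, 6))  # -1.0
```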

### **Sampling and Randomization**

Generate synthetic or randomized data for testing or bootstrapping.

In [None]:
# Set seed for reproducibility
np.random.seed(42)

# Random integers
random_ints = np.random.randint(1, 100, size=(3, 4))
print("Random integers:\n", random_ints)

# Normally distributed values (mean=0, std=1)
normal_samples = np.random.normal(loc=0, scale=1, size=5)
print("Normal samples:", normal_samples)

# Random choice from list
options = ['A', 'B', 'C']
choice_sample = np.random.choice(options, size=5, replace=True)
print("Random choice sample:", choice_sample)

Random integers:
 [[52 93 15 72]
 [61 21 83 87]
 [75 75 88 24]]
Normal samples: [-0.4826188   0.16416482  0.23309524  0.11799461  1.46237812]
Random choice sample: ['A' 'A' 'A' 'C' 'C']


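- Use `np.random.seed()` to make the output reproducible across runs.
- `randint(low, high)` excludes `high`, so the integers above fall between 1 and 99; `normal()` draws from a Gaussian distribution. Both are useful for mock data and simulation.
- `choice()` picks random elements from a list or array, with or without replacement.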
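NumPy also offers a `Generator` object via `np.random.default_rng()`, which its documentation recommends for new code. The sketch below mirrors the cell above with that API (its random stream differs from `np.random.seed(42)`, so the numbers will not match) and adds a simple percentile bootstrap for the mean of a made-up sample, one basic bootstrap variant among several.

```python
import numpy as np

# A Generator with its own seed (its stream differs from np.random.seed)
rng = np.random.default_rng(seed=42)

random_ints = rng.integers(1, 100, size=(3, 4))  # high (100) is excluded
normal_samples = rng.normal(loc=0, scale=1, size=5)
choice_sample = rng.choice(["A", "B", "C"], size=5, replace=True)
print(random_ints, normal_samples, choice_sample, sep="\n")

# Bootstrap sketch: resample with replacement to gauge the mean's variability
observed = np.array([12.0, 15.0, 9.0, 20.0, 14.0, 11.0])  # made-up sample
boot_means = np.array([
    rng.choice(observed, size=observed.size, replace=True).mean()
    for _ in range(1000)
])
low, high = np.percentile(boot_means, [2.5, 97.5])
print(f"Approximate 95% interval for the mean: {low:.2f} to {high:.2f}")
```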

### **Saving and Loading Arrays**

Persist your processed arrays to disk and load them later for reuse.

In [3]:
# Save to binary .npy format
np.save("sales_data.npy", sales)

# Load from file
loaded_sales = np.load("sales_data.npy")
print("Loaded sales array:\n", loaded_sales)

# Save/load to/from text format
np.savetxt("sales_summary.txt", sales, fmt="%d")
loaded_txt = np.loadtxt("sales_summary.txt", dtype=int)
print("Loaded text data:\n", loaded_txt)

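Loaded sales array:
 [[200 220 250 300]
 [150 160 170 190]
 [300 330 360 390]]
Loaded text data:
 [[200 220 250 300]
 [150 160 170 190]
 [300 330 360 390]]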

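- `np.save()` and `np.load()` use NumPy's binary `.npy` format, which preserves shape and dtype exactly.
- `np.savetxt()` and `np.loadtxt()` write and read human-readable text; pass `fmt` when saving and `dtype` when loading to control the number format.
- Saving intermediate results lets later steps of a pipeline reload them instead of recomputing them.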
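To keep several related arrays together, `np.savez()` writes them into one `.npz` archive under keyword names. The sketch below also writes a CSV with a header line. File names and the `month_1`...`month_4` column labels are hypothetical, and everything goes into a temporary directory so nothing is left on disk.

```python
import os
import tempfile

import numpy as np

sales = np.array([
    [200, 220, 250, 300],
    [150, 160, 170, 190],
    [300, 330, 360, 390],
])
weights = np.array([0.2, 0.3, 0.5])

with tempfile.TemporaryDirectory() as tmp:
    # Several arrays in one .npz archive, retrieved by keyword name
    npz_path = os.path.join(tmp, "workshop_arrays.npz")
    np.savez(npz_path, sales=sales, weights=weights)
    with np.load(npz_path) as bundle:
        loaded_sales = bundle["sales"]
        loaded_weights = bundle["weights"]

    # CSV with a header; savetxt prefixes it with "# " and loadtxt skips it
    csv_path = os.path.join(tmp, "sales.csv")
    np.savetxt(csv_path, sales, fmt="%d", delimiter=",",
               header="month_1,month_2,month_3,month_4")
    loaded_csv = np.loadtxt(csv_path, delimiter=",", dtype=int)

print(loaded_sales)
print(loaded_weights)
print(loaded_csv)
```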

### **Mini Project**

Choose or simulate a dataset (e.g., daily COVID cases, sales, weather). Do the following:

- Compute row- or column-wise summaries
- Calculate relative or percent changes
- Apply `np.where()` to flag values (e.g., anomalies)
- Compute correlations or weighted scores
- Save the result to disk for reuse

### **Summary: Hour 4 Key Concepts**

- Apply transformations like percent change and weighted scores
- Use `corrcoef()` to find relationships
- Use `dot()` for matrix operations
- Randomize or generate synthetic data for simulation
- Save processed results with `save()` or `savetxt()`