<a href="https://colab.research.google.com/github/MatykoHUN/halozat/blob/main/Numpy_Essentials.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# NumPy Essentials



[NumPy](https://numpy.org/) (Numerical Python) is the cornerstone library for numerical and scientific computing in Python. Its core is the `ndarray` object, a highly efficient, multi-dimensional array that supports a vast range of mathematical operations. Unlike Python lists, NumPy arrays are homogeneous (all elements share the same data type), which allows for significant performance gains through vectorized operations.

Key features of NumPy include:

- **Powerful N-dimensional arrays:** The `ndarray` is a fast and flexible container for large datasets.
- **Broadcasting:** A set of rules that lets element-wise operations (including ufuncs, the universal functions) combine arrays of different but compatible shapes without explicitly replicating data.
- **Mathematical functions:** A comprehensive collection of mathematical functions (ufuncs) for fast operations on arrays.
- **Linear algebra, Fourier transforms, and random number capabilities:** Built-in modules for common scientific tasks.
- **Integration with other libraries:** Seamlessly integrates with other Python libraries for scientific computing like SciPy, Pandas, and Matplotlib.

NumPy's efficiency and extensive functionality make it indispensable for data analysis, machine learning, simulations, and many other scientific and engineering disciplines.
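
To make the contrast with Python lists concrete, here is a minimal sketch that computes the same squares with a list comprehension and with one vectorized array expression. The variable names are illustrative only.

```python
import numpy as np

values = list(range(5))

# Pure-Python approach: an explicit loop over a list
squares_list = [v * v for v in values]

# NumPy approach: one vectorized expression over the whole array
arr = np.array(values)
squares_arr = arr * arr

print(squares_list)   # [0, 1, 4, 9, 16]
print(squares_arr)    # same values, held in an ndarray
print(arr.dtype)      # one integer dtype shared by all elements (platform dependent)
```

Both versions give the same numbers; the array version runs its loop in compiled code, which is where the speed-up on large inputs comes from.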

In [None]:
# Imports & display prefs
import numpy as np
np.set_printoptions(precision=3, suppress=True)
print("NumPy version:", np.__version__)

## 1) Array Creation
Creating arrays is the foundation of numerical computing. NumPy offers constructors from Python data, shape-based initializers, and range/sampling utilities.

### Constructors & Initializers
- **`np.array(obj, dtype=None, copy=True)`** → builds an array from Python lists/tuples (or another array). If `dtype` is omitted, NumPy infers it. By default the data is always copied.  
- **`np.asarray(obj, dtype=None)`** → similar to `array`, but avoids copying when possible: an existing `ndarray` with a matching dtype is returned as-is, while Python lists/tuples still need a new buffer.  
- **`np.zeros(shape, dtype=float)`**, **`np.ones(shape, dtype=None)`**, **`np.full(shape, fill_value)`** → allocate arrays filled with constants.  
- **`np.eye(N, M=None, k=0)`** → identity (or offset diagonal) matrix of shape `(N, M or N)`.
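
The copy behaviour of `np.array` and `np.asarray` is easiest to see when the input is already an array. The sketch below uses illustrative names (`src`, `copied`, `same`) and only checks who shares memory with whom.

```python
import numpy as np

src = np.array([1, 2, 3])

copied = np.array(src)     # np.array copies by default: new buffer
same = np.asarray(src)     # already an ndarray with a matching dtype: no copy

src[0] = 99                # modify the original afterwards

print(copied[0])                       # 1, the copy is unaffected
print(same[0])                         # 99, same data as src
print(same is src)                     # True
print(np.shares_memory(copied, src))   # False
```

Use `np.asarray` in functions that only read their input, and `np.array` (or `.copy()`) when the result must be safe to modify.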

In [None]:
# Constructors and initializers
a = np.array([1, 2, 3], dtype=np.int16)
b = np.asarray([1.0, 2.0, 3.0])
Z = np.zeros((2, 3))
O = np.ones((2, 3), dtype=int)
F = np.full((2, 2), 7)
I = np.eye(3)

print("a:", a, a.dtype, a.shape)
print("b:", b, b.dtype)
print(f"Z: {Z.shape}\n", Z)
print("O int:\n", O)
print("F:\n", F)
print("I:\n", I)

### Ranges & Spacing
- **`np.arange(start, stop, step, dtype=None)`** → half-open range `[start, stop)`. Beware float step rounding.  
- **`np.linspace(start, stop, num=50, endpoint=True)`** → evenly spaced points, inclusive by default (useful for plotting).  
- **`np.geomspace(start, stop, num=50)`** → geometric progression (neither endpoint may be zero; typically used on strictly positive ranges).

In [None]:
r1 = np.arange(0, 10, 2)        # 0,2,4,6,8
r2 = np.linspace(0, 1, 5)       # 0., 0.25, 0.5, 0.75, 1.
r3 = np.geomspace(1, 1000, 4)   # 1, 10, 100, 1000
print("arange:", r1)
print("linspace:", r2)
print("geomspace:", r3)
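
The warning about float steps in `np.arange` can be illustrated as follows. This is a small sketch with arbitrarily chosen endpoints: the length of an `arange` result depends on floating-point rounding, whereas `linspace` takes the number of points explicitly.

```python
import numpy as np

# With a float step, the element count comes from a rounded division,
# so the result may include a value at (or very near) `stop`.
a = np.arange(1.0, 1.3, 0.1)

# linspace is told how many points to produce, so the length is predictable.
pts = np.linspace(1.0, 1.3, 4)

print("arange:  ", a, "length:", len(a))
print("linspace:", pts, "length:", len(pts))

# endpoint=False gives a half-open interval, like arange
half_open = np.linspace(0.0, 1.0, 5, endpoint=False)
print(half_open)   # 0.0, 0.2, 0.4, 0.6, 0.8
```

When the number of samples matters more than the exact step, `linspace` is usually the safer choice.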

### Exercises — Array Creation
1. Create a 1D array of integers from 5 to 10 (inclusive).
2. Create a 5x5 identity matrix.
3. Create a 3x10 array filled with pi (3.14...).

## 2) Data Types (dtypes)
Dtypes control **precision, memory**, and sometimes operation semantics. Converting between types may **copy** data and may lose precision.

### Inspecting & Casting
- **`arr.dtype`** gives the current dtype. Common: `float64`, `float32`, `int64`, `bool`.  
- **`arr.astype(new_dtype, copy=True)`** returns a **copy** with a new dtype. Casting from float to int truncates toward zero.  
- Mixed types in `np.array([...])` may upcast (e.g., ints + floats → float).

In [None]:
x = np.array([1.5, 2.9, -3.2], dtype=np.float64)
print("x:", x, "x dtype:", x.dtype)
xi = x.astype(np.int16)     # truncates towards zero
print("xi:", xi, "xi dtype:", xi.dtype)
xb = x.astype(np.bool_)     # nonzero -> True
print("xb:", xb, "xb dtype:", xb.dtype)

y = np.array([1, 2, 3, 4], dtype=np.int8)
z = y.astype(np.float32)
print("y dtype:", y.dtype, "-> z dtype:", z.dtype)
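
As a rough illustration of the precision/memory trade-off mentioned above, the sketch below compares the footprint of `float64` and `float32` arrays and shows that the two types round the same decimal differently. The array size is arbitrary.

```python
import numpy as np

big = np.ones(1_000_000, dtype=np.float64)
small = big.astype(np.float32)

print(big.itemsize, big.nbytes)       # 8 bytes per element, 8_000_000 in total
print(small.itemsize, small.nbytes)   # 4 bytes per element, 4_000_000 in total

# 0.1 has no exact binary representation; each type stores its own rounding
v64 = np.float64(0.1)
v32 = np.float32(0.1)
print(float(v32) == float(v64))       # False
```

Halving the item size halves the memory, at the cost of fewer significant digits.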

### Upcasting & Type Inference
- When combining arrays of different dtypes, NumPy uses **type promotion** (upcasting) to avoid loss of information.  
- Example: `int` with `float` → result becomes `float`.

In [None]:
a = np.array([1, 2, 3], dtype=np.int16)
b = np.array([0.1, 0.2, 0.3], dtype=np.float32)
c = a + b
print("a dtype:", a.dtype, "| b dtype:", b.dtype, "| c dtype:", c.dtype, "c:", c)
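
The promoted dtype can also be queried without building any arrays. The following sketch uses `np.result_type`, `np.promote_types`, and `np.can_cast`; the dtype pairs are just examples.

```python
import numpy as np

print(np.result_type(np.int16, np.float32))   # float32
print(np.result_type(np.int64, np.float32))   # float64
print(np.promote_types(np.int8, np.uint8))    # int16, wide enough for both ranges

# can_cast answers whether a conversion is allowed under the default "safe" rule
print(np.can_cast(np.int16, np.float32))      # True
print(np.can_cast(np.float64, np.int32))      # False
```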

### Exercises — Data Types
1. Create a float array and cast it to `int16`. Explain which values change and **why**.
2. Given two arrays `int32` and `float32`, add them and confirm the result dtype.

## 3) Views and Reshaping
Reshaping returns a new **view** of the same underlying data, without copying, whenever the memory layout allows it. **Slicing** typically returns views; **fancy indexing** returns copies.

### Views vs Copies
- **Views:** slicing (`arr[a:b, c:d]`), `reshape` (when compatible), `ravel()` (view if contiguous).  
- **Copies:** `flatten()`, `astype(...)`, fancy/integer indexing, and operations forcing reallocation.  
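- To check: `view.base is arr` implies shared memory. Note that `.base` points at the array that owns the data, so a view of a view reports the original owner; `np.shares_memory(a, b)` is the more general test.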

In [None]:
x = np.arange(12)
M = x.reshape(3, 4)     # likely a view
rv = M.ravel()          # view if contiguous
fl = M.flatten()        # guaranteed copy

print("M shares memory with x?", M.base is x)
print("ravel shares memory with M?", np.shares_memory(rv, M))    # True (rv.base is x, the owner, not M)
print("flatten shares memory with M?", np.shares_memory(fl, M))  # False: flatten always copies

### Reshaping & Adding Axes
- **`reshape(new_shape)`** keeps order; `-1` lets NumPy infer one dimension.  
- **`np.newaxis` / `None`** to add singleton dimensions for broadcasting (e.g., `(N,) → (N,1)` or `(1,N)`).

In [None]:
v = np.arange(6)               # shape (6,)
A = v.reshape(2, 3)            # (2,3)
row = v[np.newaxis, :]         # (1,6)
col = v[:, np.newaxis]         # (6,1)
print("A shape:", A.shape, "| row:", row.shape, "| col:", col.shape)
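
Whether `reshape` can return a view depends on the memory layout. The sketch below, using illustrative names, shows `-1` inference and a case where reshaping a transposed array has to copy.

```python
import numpy as np

x = np.arange(12)

B = x.reshape(3, -1)            # -1 is inferred as 4
print(B.shape)                  # (3, 4)

flat_view = B.reshape(-1)       # contiguous data: no copy needed
print(np.shares_memory(flat_view, x))   # True

T = B.T                         # transpose is a view with swapped strides
flat_copy = T.reshape(-1)       # memory is not in this order, so NumPy copies
print(np.shares_memory(flat_copy, x))   # False
print(flat_copy[:4])            # [0 4 8 1]
```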

### Exercises — Views & Reshaping
1. Create a `6x6` array with values `0..35` and extract the **center `4x4`** as a **view**.  
2. Show that modifying a slice changes the original array (prove with `.base`).  

## 4) Indexing and Slicing
Indexing defines how you select or filter elements. **Basic slicing** returns views; **fancy indexing** and **boolean masks** return copies.

### Basic Slices (Views)
- Syntax: `arr[start:stop:step]` per axis.  
- Multidimensional: `arr[r0:r1, c0:c1]`.  
- Useful patterns: take columns/rows, submatrices, strides with steps > 1.

In [None]:
A = np.arange(1, 13).reshape(3, 4)
sub = A[:2, 1:3]      # rows 0..1, cols 1..2 (view)
sub[0, 0] = -999      # modifies A as well
print("A after edit via sub:\n", A)

### Fancy Indexing (Copies) & Boolean Masks
- **Fancy indexing:** `arr[[i0, i1], [j0, j1]]` picks arbitrary positions → **copy**.  
- **Boolean masks:** `mask = (arr % 2 == 0)`; `arr[mask]` filters elements → **copy**.  
- Combine masks with `&`, `|`, `~` (be careful with parentheses).

In [None]:
B = np.arange(1, 13).reshape(3, 4)
print("B:\n", B)
fancy = B[[0, 2], [1, 3]]   # (0,1) and (2,3)
mask  = (B % 2 == 0)
print("mask:\n", mask)
evens = B[mask]
print("fancy:", fancy)
print("evens:", evens)
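
The note on parentheses matters because `&` binds more tightly than comparison operators. Here is a small sketch of combining masks and of assigning through a mask; the threshold `6` is arbitrary.

```python
import numpy as np

B = np.arange(1, 13).reshape(3, 4)

# Each comparison needs its own parentheses before combining with &
mask = (B % 2 == 0) & (B > 6)
selected = B[mask]
print(selected)           # [ 8 10 12]

# ~ inverts a mask
print(B[~mask].size)      # 9

# Reading with a mask gives a copy, but assigning through a mask
# writes into the original array
B[mask] = 0
print(B)
```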

### Exercises — Indexing & Slicing
1. From a `6x6` array, select the **checkerboard** pattern where `(i + j) % 2 == 0`.  
2. Using fancy indexing, extract the **main diagonal** of a square matrix.  
3. Use a boolean mask to set all values greater than the **row mean** to that mean (in-place if possible).

## 5) Element-wise Operations
Vectorized arithmetic and comparisons operate element-by-element and support broadcasting. **Ufuncs** (universal functions) are optimized C loops.

### Arithmetic, Comparisons, Selection
- Operators `+ - * / // % **` are element-wise.  
- Comparisons `< <= > >= == !=` produce boolean arrays.  
- `np.where(cond, x, y)` selects element-wise between `x` and `y` based on `cond`.

In [None]:
x = np.array([0., 1., 2., 3.])
y = np.array([1., 2., 3., 4.])
print("x + y:", x + y)
print("x > y:", x > y)

ratio  = y / (x + 1)     # x contains 0, so shift the denominator to avoid dividing by zero
print("ratio:", ratio)

gt2 = y > 2
print("gt2:", gt2)

sel = np.where(gt2, y, -1)
print("selected:", sel)
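
Broadcasting is mentioned above without an example, so here is a minimal sketch: a column of shape `(3, 1)` combined with a row of shape `(4,)`. The array contents are arbitrary.

```python
import numpy as np

col = np.array([[0], [10], [20]])   # shape (3, 1)
row = np.array([1, 2, 3, 4])        # shape (4,)

# (3, 1) and (4,) broadcast to (3, 4): size-1 axes are stretched virtually
grid = col + row
print(grid.shape)    # (3, 4)
print(grid)

# np.where broadcasts as well: the scalar 20 is paired with every element
capped = np.where(grid > 20, 20, grid)
print(capped)
```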

### Ufuncs & Advanced Parameters
- Common ufuncs: `np.sin`, `np.cos`, `np.exp`, `np.sqrt`, `np.log`.  
- Parameters:  
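  - `where=` boolean mask selecting which elements to compute; the remaining positions of `out` are left untouched (and are uninitialized if no `out` is given).  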
  - `out=` write results into a preallocated buffer (reduces allocations).  
  - broadcasting works naturally across shapes.

In [None]:
t = np.linspace(0, 2*np.pi, 7)
print("t:", t)

sig = np.sin(t) + 0.5*np.cos(2*t)
print("sig:", sig)

mask = sig > 0
buf = np.zeros_like(sig)                    # masked-out positions keep this 0 fill
np.sqrt(np.abs(sig), out=buf, where=mask)   # sqrt only where sig > 0
print("sqrt(abs(sig)) where sig>0, else 0:", buf)
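
A common use of `out=` together with `where=` is a division that skips zero denominators. The sketch below assumes that `0` is an acceptable fill value for the skipped positions.

```python
import numpy as np

num = np.array([1.0, 2.0, 3.0, 4.0])
den = np.array([2.0, 0.0, 4.0, 0.0])

# Initialise the output first: positions where the mask is False are not
# written, so they keep this fill value.
result = np.zeros_like(num)
np.divide(num, den, out=result, where=den != 0)

print(result)    # 0.5, 0.0, 0.75, 0.0
```

Allocating the buffer with `np.empty_like` instead would leave arbitrary values in the skipped positions.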

### Exercises: Element-wise Operations


In [None]:
# Instructions:
# 1. Create a 3x3 NumPy array, A, containing the integers from 1 to 9.
A = np.array([1])  # placeholder: replace with your 3x3 array

# 2. Create a 3x3 NumPy array, B, containing only the number 10 in every position.
B = np.array([1])  # placeholder: replace with your 3x3 array

# 3. Calculate the element-wise sum of A and B, storing the result in a variable named C.
# (Your code for C here)

# 4. Calculate the element-wise product of A and B, storing the result in a variable named D.
# (Your code for D here)

# Print results
print("Array A:\n", A)
print("Array B:\n", B)
# NOTE: The variables C and D must be defined above for the code below to run!
# print("\nElement-wise Sum:\n", C)
# print("\nElement-wise Product:\n", D)

In [None]:
# Instructions:
# 1. Create a 1D NumPy array, X, of size 10, containing the numbers 1 through 10.
X = np.array([1])  # placeholder: replace with your 1D array

# 2. Multiply every element in X by 3.
# (Your code for Y here)

# 3. Apply a conditional cap to the array Y using np.where:
#    If an element's value is greater than 15, set its value to 15; otherwise, keep the original value.
#    Store the result in Z.
# (Your code for Z using np.where here)

# 4. Calculate the square of every element in Z (element-wise operation), storing the result in W.
# (Your code for W here)

# Print results
print("Original Array (X):\n", X)
# NOTE: The variables Y, Z, and W must be defined above for the code below to run!
# print("\nScaled by 3:\n", Y)
# print("\nConditionally Capped (Z):\n", Z)
# print("\nSquared Array:\n", W)

## 6) Aggregations
Reductions summarize data across axes. NaN-aware functions ignore missing values.

### Reductions & Axis
- `sum`, `mean`, `std`, `var`, `min`, `max`, `argmin`, `argmax`.  
- The `axis` argument controls direction; `keepdims=True` preserves dims for broadcasting.  
- `np.nan*` variants (`nanmean`, `nanstd`, ...) ignore NaNs.

In [None]:
C = np.arange(1, 13, dtype=float).reshape(3, 4)
C[1, 2] = np.nan

total = np.nansum(C)
row_mean = np.nanmean(C, axis=1, keepdims=True)
col_max  = np.nanmax(C, axis=0)
argmax_flat = np.nanargmax(C)

print("C:\n", C)
print("total:", total)
print("row_mean:\n", row_mean)
print("col_max:", col_max)
print("argmax_flat:", argmax_flat)
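
To see what `axis` and `keepdims` do to shapes, consider this short sketch on a NaN-free array. It also converts a flat `argmax` result into `(row, col)` coordinates with `np.unravel_index`.

```python
import numpy as np

C = np.arange(1, 13, dtype=float).reshape(3, 4)

print(C.sum(axis=0).shape)                  # (4,)   one value per column
print(C.sum(axis=1).shape)                  # (3,)   one value per row
print(C.sum(axis=1, keepdims=True).shape)   # (3, 1) still 2-D

# keepdims=True lets the result broadcast back against C
centered = C - C.mean(axis=1, keepdims=True)
print(centered.mean(axis=1))                # each row now averages to 0

# Flat index of the maximum, converted to 2-D coordinates
flat = C.argmax()
row_idx, col_idx = np.unravel_index(flat, C.shape)
print(int(row_idx), int(col_idx))           # 2 3
```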

### Percentiles & Quantiles
- `np.percentile(a, q, axis=None)` or `np.quantile(a, q, axis=None)` for distribution summaries.  
- With NaNs: `np.nanpercentile` / `np.nanquantile`.

In [None]:
D = np.random.default_rng(0).normal(0, 1, size=(1000,))
p95 = np.percentile(D, 95)
q = np.quantile(D, [0.25, 0.5, 0.75])
print("p95:", p95)
print("quartiles:", q)
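
The two functions differ only in the scale of `q`, and the NaN-aware variants matter as soon as a value is missing. A small sketch with made-up data:

```python
import numpy as np

data = np.array([[1.0, 2.0, 3.0, 4.0],
                 [10.0, 20.0, 30.0, np.nan]])

# q runs over [0, 100] for percentile and [0, 1] for quantile
p50 = np.percentile(data[0], 50)
q50 = np.quantile(data[0], 0.5)
print(p50, q50, np.median(data[0]))     # all three are 2.5

# A NaN propagates through the plain function ...
plain = np.quantile(data, 0.5, axis=1)
# ... while the nan-aware variant skips it
nan_aware = np.nanquantile(data, 0.5, axis=1)
print(plain)        # second entry is nan
print(nan_aware)    # 2.5 and 20.0
```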

### Exercises — Aggregations

In [None]:
# Instructions:
# 1. You have the following array.
X = np.array([[10, 20, np.nan, 40],
              [1, 5, 2, np.nan],
              [100, 50, 20, 10]])

# 2. Compute the median of each row in X, ensuring you ignore any NaN values.
#    Store the result in a 1D array named 'row_medians'. (Hint: Use a specific np.nan-ignoring function).
# (Your code for row_medians here)

# 3. Print the original array X and the calculated row_medians.
print("Original Array (X):\n", X)
# print("\nRow Medians (Ignoring NaNs):\n", row_medians)

In [None]:
import numpy as np

# Instructions:
# 1. Create a 4x5 NumPy array, M, containing random integers between 0 and 100,
#    using the np.random.randint() function.
M = np.random.randint(0, 101, size=(4, 5))

# 2. Find the **index** (column number) of the **maximum element** for **each row** of M.
#    Store the resulting 1D array of indices in a variable named 'max_indices'.
#    (Hint: Use np.argmax with the correct axis).
# (Your code for max_indices here)
# max_indices = np.argmax(M, axis=1)

# 3. Print the original array M and the resulting max_indices.
# (Your print statements here)
# print("Original Array (M):\n", M)
# print("\nIndex of Max Element per Row:\n", max_indices)

## 7) Sorting and Uniques
Sorting reorders elements; unique functions find distinct values and counts.

### Sorting
- `np.sort(a, axis=-1)` → sorted **copy**.  
- `a.sort(axis=-1)` → **in-place** sort.  
- `np.argsort(a, axis=-1)` → indices that would sort `a`.  
- For partial sorting (top‑k), consider `np.argpartition` for efficiency.

In [None]:
d = np.array([5, 2, 5, 3, 1, 2])
sorted_d = np.sort(d)
idx = np.argsort(d)
top3_idx_unsorted = np.argpartition(d, -3)[-3:]  # positions of top 3 (unsorted)
print("d:", d)
print("sorted:", sorted_d)
print("argsort idx:", idx)
print("top3 idx (unsorted):", top3_idx_unsorted, "values:", d[top3_idx_unsorted])
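
Two patterns that build on `argsort` and `argpartition` are sketched below: descending order and a sorted top-k. The names `scores` and `table` are illustrative.

```python
import numpy as np

scores = np.array([5, 2, 5, 3, 1, 2])

# Descending order: reverse the ascending argsort
order_desc = np.argsort(scores)[::-1]
print(scores[order_desc])         # [5 5 3 2 2 1]

# Sorted top-k: partition first, then sort only the k selected items
k = 3
top_idx = np.argpartition(scores, -k)[-k:]
top_idx = top_idx[np.argsort(scores[top_idx])[::-1]]
print(scores[top_idx])            # [5 5 3]

# Reorder the rows of a table by its first column
table = np.array([[3, 30], [1, 10], [2, 20]])
by_first_col = table[np.argsort(table[:, 0])]
print(by_first_col)
```

Partitioning avoids a full sort, which pays off when `k` is much smaller than the array length.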

### Unique Values & Counts
- `np.unique(a, return_index=False, return_inverse=False, return_counts=False)` returns the sorted unique values; set `return_counts=True` to also get how often each one occurs.  
- For **first-occurrence order preservation**, combine with `np.unique(..., return_index=True)` and sort by indices.

In [None]:
u, counts = np.unique(d, return_counts=True)
vals_in_first_occurrence_order = u[np.argsort(np.unique(d, return_index=True)[1])]
print("unique:", u, "counts:", counts)
print("first-occurrence order:", vals_in_first_occurrence_order)
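
The `return_inverse` option is useful for turning labels into integer codes. A minimal sketch, with an invented list of protocol labels:

```python
import numpy as np

labels = np.array(['TCP', 'UDP', 'TCP', 'ICMP', 'UDP', 'TCP'])

uniq, inverse, counts = np.unique(labels, return_inverse=True, return_counts=True)
print(uniq)       # ['ICMP' 'TCP' 'UDP']
print(inverse)    # [1 2 1 0 2 1], position of each label in uniq
print(counts)     # [1 3 2]

# Indexing uniq with the codes rebuilds the original array
print(np.array_equal(uniq[inverse], labels))   # True
```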

## 8) Random Sampling
Use the modern `numpy.random.Generator` for reproducible randomness.

### Creating a Generator & Core Methods
- Create with **`rng = np.random.default_rng(seed)`** for reproducibility.  
- **`rng.random(shape)`** → uniform `[0, 1)`.  
- **`rng.integers(low, high, size, endpoint=False)`** → integers from the half-open range `[low, high)`; pass `endpoint=True` to include `high`.  
- **`rng.normal(loc, scale, size)`** → Gaussian samples.  
- **`rng.choice(population, size, replace=True, p=None)`** → categorical sampling with optional probabilities.

In [None]:
rng = np.random.default_rng(42)
u = rng.random((2, 3))
ints = rng.integers(10, 20, size=5)
z = rng.normal(0, 1, size=6)
choices = rng.choice(np.array(['TCP','UDP','ICMP']), size=12, p=[0.7, 0.25, 0.05])

print("u:\n", u)
print("ints:", ints)
print("z:", z)
print("choices:", choices)
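
The effect of `endpoint` and `replace` is easy to miss, so here is a short sketch. The seed and sizes are arbitrary, and the printed values depend on them.

```python
import numpy as np

rng = np.random.default_rng(7)

# Default is half-open: values fall in [1, 6), so 6 never appears
half_open = rng.integers(1, 6, size=1000)

# endpoint=True closes the range: [1, 6], like a six-sided die
dice = rng.integers(1, 6, size=1000, endpoint=True)

print(half_open.min(), half_open.max())
print(dice.min(), dice.max())

# replace=False draws distinct items
hand = rng.choice(np.arange(52), size=5, replace=False)
print(hand)
```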

### Reproducibility & State
- Fixing a **seed** ensures the same random sequence across runs (given the same NumPy version).  
- Create **separate generators** for independent streams (e.g., training vs evaluation).  
- Avoid the legacy global API (`np.random.*`) in new code.

In [None]:
rng1 = np.random.default_rng(123)
rng2 = np.random.default_rng(123)
print("Same sequence?", np.allclose(rng1.random(5), rng2.random(5)))
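
One way to get separate generators that are still reproducible from a single seed is `np.random.SeedSequence.spawn`. The sketch below is one possible setup, not the only one; the names `train_rng` and `eval_rng` are illustrative.

```python
import numpy as np

# One root seed, two child seeds derived from it
root = np.random.SeedSequence(123)
train_seed, eval_seed = root.spawn(2)

train_rng = np.random.default_rng(train_seed)
eval_rng = np.random.default_rng(eval_seed)

a = train_rng.random(3)
b = eval_rng.random(3)
print(np.array_equal(a, b))    # False: the two streams differ

# Rebuilding from the same root seed reproduces the first stream
train_seed2, _ = np.random.SeedSequence(123).spawn(2)
a_again = np.random.default_rng(train_seed2).random(3)
print(np.array_equal(a, a_again))   # True
```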

### Exercises — Random Sampling
1. Generate `1000` samples `N(100, 15^2)` and compute their **mean** and **std**; compare to target.  
2. Using `rng.choice`, simulate selecting protocols for **10,000 flows** with probs `[0.6, 0.35, 0.05]` and report counts.  
3. Demonstrate reproducibility by fixing a seed and re-running; show equality with `np.allclose`.