# Creating Arrays from Scratch (Advanced)

This notebook contains **advanced** practice problems **with solutions** on creating NumPy arrays from scratch.

Topics covered:
- `zeros`, `ones`, `full`, `eye`
- `arange`, `linspace` (and common pitfalls)
- dtype control and memory-aware choices
- random arrays with reproducibility (`Generator`)
- building structured arrays without Python loops

Best practices used:
- seeded, reproducible randomness
- explicit dtypes when relevant
- small self-checks (`assert`) to verify correctness


In [1]:
import numpy as np

In [2]:
# Reproducible randomness (best practice)
rng = np.random.default_rng(12345)

def check(name, arr, shape=None, dtype=None):
    """Small helper to print array summary and optionally validate shape/dtype."""
    print(f"{name}: shape={arr.shape}, dtype={arr.dtype}, min={arr.min() if arr.size else 'n/a'}, max={arr.max() if arr.size else 'n/a'}")
    if shape is not None:
        assert arr.shape == shape, f"Expected shape {shape}, got {arr.shape}"
    if dtype is not None:
        assert arr.dtype == np.dtype(dtype), f"Expected dtype {np.dtype(dtype)}, got {arr.dtype}"
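
The effect of seeding can be checked directly. The following is a minimal sketch (the names `rng_a`, `rng_b`, `draw_a`, and `draw_b` are illustrative and not used elsewhere in this notebook): two generators created from the same seed and called in the same way produce identical draws.

```python
import numpy as np

# Two independent generators built from the same seed (illustrative names).
rng_a = np.random.default_rng(12345)
rng_b = np.random.default_rng(12345)

# Same seed and same sequence of calls give the same values.
draw_a = rng_a.integers(1, 7, size=10)
draw_b = rng_b.integers(1, 7, size=10)

same = np.array_equal(draw_a, draw_b)
print("identical draws:", same)
```

Reproducibility depends on the order of calls as well as the seed: re-running a single cell that uses the shared `rng` advances its state, so the output changes unless the generator is re-created first.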

## Problem 1 — Block matrix from scratch (zeros/ones/full)

Create the following **6×6** matrix `A` using only array-creation functions and stacking/concatenation:

- Top-left 3×3 block: all zeros
- Top-right 3×3 block: all ones
- Bottom-left 3×3 block: all twos
- Bottom-right 3×3 block: all threes

**Requirements**:
- Use `dtype=np.uint8`
- Do not fill values by indexing element-by-element


In [3]:
# Solution
tl = np.zeros((3, 3), dtype=np.uint8)
tr = np.ones((3, 3), dtype=np.uint8)
bl = np.full((3, 3), 2, dtype=np.uint8)
br = np.full((3, 3), 3, dtype=np.uint8)

top = np.hstack([tl, tr])
bottom = np.hstack([bl, br])
A = np.vstack([top, bottom])

check("A", A, shape=(6, 6), dtype=np.uint8)
A

A: shape=(6, 6), dtype=uint8, min=0, max=3


array([[0, 0, 0, 1, 1, 1],
       [0, 0, 0, 1, 1, 1],
       [0, 0, 0, 1, 1, 1],
       [2, 2, 2, 3, 3, 3],
       [2, 2, 2, 3, 3, 3],
       [2, 2, 2, 3, 3, 3]], dtype=uint8)
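
As an alternative to chaining `hstack` and `vstack`, the same matrix can be assembled in one call with `np.block`, which takes a nested list laid out like the target matrix. This sketch rebuilds the four blocks so that it runs on its own; `A_block` and `A_stack` are illustrative names.

```python
import numpy as np

tl = np.zeros((3, 3), dtype=np.uint8)
tr = np.ones((3, 3), dtype=np.uint8)
bl = np.full((3, 3), 2, dtype=np.uint8)
br = np.full((3, 3), 3, dtype=np.uint8)

# The nested list mirrors the 2x2 block layout of the result.
A_block = np.block([[tl, tr], [bl, br]])

# Reference result built with hstack/vstack, as in the solution above.
A_stack = np.vstack([np.hstack([tl, tr]), np.hstack([bl, br])])

print(np.array_equal(A_block, A_stack), A_block.dtype)
```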

## Problem 2 — Custom diagonal pattern (eye + scaling)

Create a **10×10** integer matrix `D` such that:
- Main diagonal is `5`
- First superdiagonal (diagonal above main) is `-1`
- First subdiagonal (diagonal below main) is `-1`
- All other entries are `0`

This is a classic tridiagonal matrix.

**Requirements**:
- Use `np.eye` (possibly multiple times)
- Use `dtype=int`
- Avoid explicit Python loops


In [4]:
# Solution
n = 10
D = (
    5 * np.eye(n, dtype=int)
    - 1 * np.eye(n, k=1, dtype=int)
    - 1 * np.eye(n, k=-1, dtype=int)
)

check("D", D, shape=(10, 10), dtype=int)

# Self-checks
assert np.all(np.diag(D) == 5)
assert np.all(np.diag(D, 1) == -1)
assert np.all(np.diag(D, -1) == -1)
assert np.count_nonzero(D) == (n + (n - 1) + (n - 1))

D

D: shape=(10, 10), dtype=int64, min=-1, max=5


array([[ 5, -1,  0,  0,  0,  0,  0,  0,  0,  0],
       [-1,  5, -1,  0,  0,  0,  0,  0,  0,  0],
       [ 0, -1,  5, -1,  0,  0,  0,  0,  0,  0],
       [ 0,  0, -1,  5, -1,  0,  0,  0,  0,  0],
       [ 0,  0,  0, -1,  5, -1,  0,  0,  0,  0],
       [ 0,  0,  0,  0, -1,  5, -1,  0,  0,  0],
       [ 0,  0,  0,  0,  0, -1,  5, -1,  0,  0],
       [ 0,  0,  0,  0,  0,  0, -1,  5, -1,  0],
       [ 0,  0,  0,  0,  0,  0,  0, -1,  5, -1],
       [ 0,  0,  0,  0,  0,  0,  0,  0, -1,  5]])
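
Outside the constraints of this exercise, a tridiagonal matrix can also be built by placing 1D vectors on chosen diagonals with `np.diag`. The sketch below is one possible cross-check, not the required solution; `D_eye` and `D_diag` are illustrative names.

```python
import numpy as np

n = 10

# Construction from the solution: scaled, shifted identity matrices.
D_eye = (
    5 * np.eye(n, dtype=int)
    - np.eye(n, k=1, dtype=int)
    - np.eye(n, k=-1, dtype=int)
)

# Alternative: np.diag places a 1D vector on the k-th diagonal.
# The off-diagonals of an n x n matrix hold n - 1 entries each.
D_diag = (
    np.diag(np.full(n, 5))
    + np.diag(np.full(n - 1, -1), k=1)
    + np.diag(np.full(n - 1, -1), k=-1)
)

print(np.array_equal(D_eye, D_diag))
```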

## Problem 3 — `arange` vs `linspace` correctness

You want values from **0 to 1 inclusive** with **step = 0.1**.

1) Create `x1` using `np.arange`, making sure the endpoint `1.0` is included.
2) Create `x2` using `np.linspace` so that it is guaranteed to include both endpoints.
3) Verify:
- `x2[0] == 0` and `x2[-1] == 1`
- `len(x2) == 11`

Hint: `arange` excludes `stop`, and with a non-integer step the number of elements it returns can be off by one because of floating-point rounding.

In [5]:
# Solution
x1 = np.arange(0.0, 1.0 + 1e-12, 0.1)  # arange excludes stop, so nudge it just past 1.0 to keep the endpoint
x2 = np.linspace(0.0, 1.0, num=11)

check("x1", x1)
check("x2", x2, shape=(11,), dtype=float)

assert x2[0] == 0.0
assert x2[-1] == 1.0
assert len(x2) == 11

x1, x2

x1: shape=(11,), dtype=float64, min=0.0, max=1.0
x2: shape=(11,), dtype=float64, min=0.0, max=1.0


(array([0. , 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1. ]),
 array([0. , 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1. ]))
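
To make the hint in Problem 3 concrete, here is a small sketch of the pitfall and two ways around it. The names `tricky`, `x_int`, and `x_lin` are illustrative. The length of `tricky` is printed rather than assumed, because it depends on floating-point rounding.

```python
import numpy as np

# Pitfall: with a float step, the length is derived from (stop - start) / step,
# which is subject to rounding. On typical IEEE-754 platforms this call may
# return one more element than expected.
tricky = np.arange(1.0, 1.3, 0.1)
print("len(np.arange(1.0, 1.3, 0.1)) =", len(tricky))

# More predictable: build exact integers first, then scale.
x_int = np.arange(11) / 10

# Or state the number of points explicitly.
x_lin = np.linspace(0.0, 1.0, num=11)

print(len(x_int), len(x_lin), np.allclose(x_int, x_lin))
```

Both alternatives fix the element count up front, so the endpoint does not depend on how `stop` happens to round.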

## Problem 4 — Memory-aware initialization (dtype choices)

You need a **400×300** array `img` to store grayscale image pixels in the range **0..255**.

1) Create it initialized to zeros using the **most memory-efficient reasonable dtype**.
2) Show its `nbytes`.
3) Create the same-shaped array as float64 zeros and compare `nbytes`.

**Requirement**: use `np.zeros` and explicit `dtype`.

In [6]:
# Solution
img = np.zeros((400, 300), dtype=np.uint8)
img_f64 = np.zeros((400, 300), dtype=np.float64)

check("img", img, shape=(400, 300), dtype=np.uint8)
check("img_f64", img_f64, shape=(400, 300), dtype=np.float64)

print("img.nbytes:", img.nbytes)
print("img_f64.nbytes:", img_f64.nbytes)
print("float64 is", img_f64.nbytes // img.nbytes, "x larger in this case")

img: shape=(400, 300), dtype=uint8, min=0, max=0
img_f64: shape=(400, 300), dtype=float64, min=0.0, max=0.0
img.nbytes: 120000
img_f64.nbytes: 960000
float64 is 8 x larger in this case
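
The numbers above follow from `nbytes = size * itemsize`. The sketch below checks that relation and also illustrates the trade-off of a small dtype: `uint8` arithmetic wraps modulo 256. The names `a`, `b`, `wrapped`, and `widened` are illustrative.

```python
import numpy as np

img = np.zeros((400, 300), dtype=np.uint8)
img_f64 = np.zeros((400, 300), dtype=np.float64)

# nbytes is the element count times the bytes per element.
print(img.size, img.itemsize, img.nbytes)              # 120000 elements, 1 byte each
print(img_f64.size, img_f64.itemsize, img_f64.nbytes)  # 120000 elements, 8 bytes each

# Trade-off: uint8 array arithmetic wraps around silently.
a = np.array([250], dtype=np.uint8)
b = np.array([10], dtype=np.uint8)
wrapped = a + b                    # 260 does not fit in uint8, so it wraps to 4
widened = a.astype(np.uint16) + b  # widen first to keep 260

print(wrapped, widened)
```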


## Problem 5 — Build a coordinate grid (linspace + broadcasting)

Create 2D coordinate arrays `X` and `Y` that represent a square grid over **[-1, 1] × [-1, 1]** with **101 points per axis**.

Then create `R2 = X**2 + Y**2` (squared radius), which should be shape `(101, 101)`.

**Requirements**:
- Use `np.linspace`
- Use broadcasting (avoid Python loops)
- Do NOT use `np.meshgrid` for this exercise (practice broadcasting)


In [7]:
# Solution
axis = np.linspace(-1.0, 1.0, num=101)
X = axis[None, :]          # shape (1, 101)
Y = axis[:, None]          # shape (101, 1)
R2 = X**2 + Y**2           # broadcasts to (101, 101)

check("axis", axis, shape=(101,), dtype=float)
check("X", X, shape=(1, 101))
check("Y", Y, shape=(101, 1))
check("R2", R2, shape=(101, 101))

# Quick sanity checks
assert np.isclose(R2[50, 50], 0.0)  # center should be ~0
assert np.isclose(R2[0, 0], 2.0)    # corner (-1, -1): (-1)^2 + (-1)^2 = 2

R2[:5, :5]

axis: shape=(101,), dtype=float64, min=-1.0, max=1.0
X: shape=(1, 101), dtype=float64, min=-1.0, max=1.0
Y: shape=(101, 1), dtype=float64, min=-1.0, max=1.0
R2: shape=(101, 101), dtype=float64, min=0.0, max=2.0


array([[2.    , 1.9604, 1.9216, 1.8836, 1.8464],
       [1.9604, 1.9208, 1.882 , 1.844 , 1.8068],
       [1.9216, 1.882 , 1.8432, 1.8052, 1.768 ],
       [1.8836, 1.844 , 1.8052, 1.7672, 1.73  ],
       [1.8464, 1.8068, 1.768 , 1.73  , 1.6928]])
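
For comparison, this sketch confirms that the broadcasting approach gives the same result as `np.meshgrid`, while keeping the coordinate arrays much smaller. `XX`, `YY`, and `R2_mesh` are illustrative names; the exercise itself still asks for broadcasting.

```python
import numpy as np

axis = np.linspace(-1.0, 1.0, num=101)

# Broadcasting version: X and Y are thin views of `axis`.
X = axis[None, :]   # shape (1, 101)
Y = axis[:, None]   # shape (101, 1)
R2 = X**2 + Y**2

# meshgrid version: dense (101, 101) coordinate arrays (default indexing="xy").
XX, YY = np.meshgrid(axis, axis)
R2_mesh = XX**2 + YY**2

print("same result:", np.allclose(R2, R2_mesh))
print("coordinate bytes (broadcast):", X.nbytes + Y.nbytes)
print("coordinate bytes (meshgrid): ", XX.nbytes + YY.nbytes)
```

The result `R2` has the same size either way; the saving is in the intermediate coordinate arrays.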

## Problem 6 — Random but reproducible simulation (Generator)

Simulate rolling **3 fair dice** for **20,000 trials**.

1) Create an integer array `rolls` of shape `(20000, 3)` with values in `{1,2,3,4,5,6}`.
2) Compute an array `sums` of shape `(20000,)` with the sum per trial.
3) Estimate `P(sum >= 15)`.

**Requirements**:
- Use `rng.integers` (not legacy `np.random.randint`)
- Use vectorized operations (no Python loops)
- Keep dtype as small as reasonable (hint: `np.uint8` works here)


In [8]:
# Solution
rolls = rng.integers(1, 7, size=(20000, 3), dtype=np.uint8)
sums = rolls.sum(axis=1, dtype=np.uint16)  # explicit accumulator dtype; the max sum is 18, so uint16 leaves ample headroom
p = (sums >= 15).mean()

check("rolls", rolls, shape=(20000, 3), dtype=np.uint8)
check("sums", sums, shape=(20000,), dtype=np.uint16)
print("Estimated P(sum >= 15):", p)

# Sanity checks
assert rolls.min() >= 1 and rolls.max() <= 6
assert sums.min() >= 3 and sums.max() <= 18

rolls[:5], sums[:5]

rolls: shape=(20000, 3), dtype=uint8, min=1, max=6
sums: shape=(20000,), dtype=uint16, min=3, max=18
Estimated P(sum >= 15): 0.08865


(array([[4, 5, 6],
        [5, 4, 5],
        [2, 2, 6],
        [5, 6, 5],
        [1, 1, 1]], dtype=uint8),
 array([15, 14, 10, 16,  3], dtype=uint16))
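
The estimate can be compared against the exact value by enumerating all 6³ = 216 equally likely outcomes with broadcasting. This is a sketch for checking the simulation, with illustrative names (`faces`, `all_sums`, `exact_p`, `std_err`).

```python
import numpy as np

faces = np.arange(1, 7)

# all_sums[i, j, k] = faces[i] + faces[j] + faces[k], shape (6, 6, 6)
all_sums = faces[:, None, None] + faces[None, :, None] + faces[None, None, :]

favorable = int((all_sums >= 15).sum())
exact_p = favorable / all_sums.size
print(f"exact P(sum >= 15) = {favorable}/{all_sums.size} = {exact_p:.4f}")

# Standard error of a proportion estimated from 20,000 independent trials.
n_trials = 20000
std_err = np.sqrt(exact_p * (1 - exact_p) / n_trials)
print(f"standard error for n={n_trials}: {std_err:.4f}")
```

The exact probability is 20/216 ≈ 0.0926, so the simulated 0.08865 lies within roughly two standard errors of it.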

## Problem 7 — Create a structured pattern without loops (arange + reshape)

Create a **7×7** integer array `B` where numbers increase left-to-right, top-to-bottom:

```
0   1   2  ...  6
7   8   9  ... 13
...             ...
42 43  44 ... 48
```

Then create `C` which is `B` but with every element multiplied by 10.

**Requirements**:
- Use `np.arange` and `reshape`
- No Python loops


In [9]:
# Solution
B = np.arange(49, dtype=int).reshape(7, 7)
C = B * 10

check("B", B, shape=(7, 7), dtype=int)
check("C", C, shape=(7, 7), dtype=int)

assert B[0, 0] == 0 and B[0, -1] == 6
assert B[1, 0] == 7
assert B[-1, -1] == 48
assert np.array_equal(C, B * 10)

B

B: shape=(7, 7), dtype=int64, min=0, max=48
C: shape=(7, 7), dtype=int64, min=0, max=480


array([[ 0,  1,  2,  3,  4,  5,  6],
       [ 7,  8,  9, 10, 11, 12, 13],
       [14, 15, 16, 17, 18, 19, 20],
       [21, 22, 23, 24, 25, 26, 27],
       [28, 29, 30, 31, 32, 33, 34],
       [35, 36, 37, 38, 39, 40, 41],
       [42, 43, 44, 45, 46, 47, 48]])
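
Row-major `reshape` means that `B[i, j] == 7 * i + j`. The sketch below checks this with broadcasting and also shows `-1`, which lets `reshape` infer one dimension. `rows`, `cols`, `B_formula`, and `B_auto` are illustrative names.

```python
import numpy as np

B = np.arange(49).reshape(7, 7)

# Index formula for row-major layout: value = n_cols * row + col.
rows = np.arange(7)[:, None]   # shape (7, 1)
cols = np.arange(7)[None, :]   # shape (1, 7)
B_formula = 7 * rows + cols

# -1 asks reshape to infer that dimension from the total size.
B_auto = np.arange(49).reshape(7, -1)

print(np.array_equal(B, B_formula), B_auto.shape)
```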

## Problem 8 — Safer initialization for later assignment (full with sentinel)

You will later fill a `(12, 8)` table with measurements. For now, you want a sentinel value that clearly indicates "missing".

1) Create `M` filled with `-1` using an integer dtype.
2) Set the first row to `0..7`.
3) Set the first column to `0..11`.

**Requirements**:
- Use `np.full` for initialization
- Use slicing for assignment
- No Python loops


In [10]:
# Solution
M = np.full((12, 8), -1, dtype=int)
M[0, :] = np.arange(8, dtype=int)
M[:, 0] = np.arange(12, dtype=int)

check("M", M, shape=(12, 8), dtype=int)

# Self-checks
assert np.array_equal(M[0, :], np.arange(8))
assert np.array_equal(M[:, 0], np.arange(12))

M

M: shape=(12, 8), dtype=int64, min=-1, max=11


array([[ 0,  1,  2,  3,  4,  5,  6,  7],
       [ 1, -1, -1, -1, -1, -1, -1, -1],
       [ 2, -1, -1, -1, -1, -1, -1, -1],
       [ 3, -1, -1, -1, -1, -1, -1, -1],
       [ 4, -1, -1, -1, -1, -1, -1, -1],
       [ 5, -1, -1, -1, -1, -1, -1, -1],
       [ 6, -1, -1, -1, -1, -1, -1, -1],
       [ 7, -1, -1, -1, -1, -1, -1, -1],
       [ 8, -1, -1, -1, -1, -1, -1, -1],
       [ 9, -1, -1, -1, -1, -1, -1, -1],
       [10, -1, -1, -1, -1, -1, -1, -1],
       [11, -1, -1, -1, -1, -1, -1, -1]])
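
Once a sentinel is in place, a boolean mask shows which cells are still unfilled. This sketch rebuilds `M` so that it runs on its own; `missing` and `n_missing` are illustrative names.

```python
import numpy as np

M = np.full((12, 8), -1, dtype=int)
M[0, :] = np.arange(8)
M[:, 0] = np.arange(12)

# True wherever the sentinel is still present.
missing = (M == -1)
n_missing = int(missing.sum())

# 96 cells in total; the first row (8) and first column (12) share one cell,
# so 19 are filled and 77 remain.
print(n_missing, "of", M.size, "cells still missing")
```

An integer sentinel such as `-1` is only safe when it can never be a valid measurement. For floating-point data, `np.nan` combined with `np.isnan` is a common alternative.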

## Wrap-up

You practiced creating arrays from scratch using:
- deterministic initialization (`zeros`, `ones`, `full`, `eye`)
- range-based construction (`arange`, `linspace`)
- broadcasting for grid construction
- reproducible randomness with `Generator`
- dtype choices for memory and correctness
