# Creating Arrays from Lists (Advanced Practice)

This notebook contains **practice problems with solutions** about creating NumPy arrays from Python lists.

**What you'll practice**:
- dtype inference vs explicit dtype
- safe/unsafe casting
- integer overflow/wrap-around
- ragged (non-rectangular) lists
- shape/size reasoning
- memory footprint (`nbytes`)

Each problem:
- starts with a **task**
- includes a **solution**
- ends with a few **asserts** to self-check


In [1]:
import numpy as np

## Problem 1 — dtype inference and controlled casting

**Task**

You are given a Python list containing a mix of integers and floats:

```python
data = [0, 1, 2, 3.5, 4]
```

1) Create `a_infer` from the list using plain `np.array(data)`.

2) Create `a_i32` that stores values as **int32**, but do it in a way that makes the conversion intent clear.

3) Explain (in code comments) what happens to the `3.5`.


In [2]:
# Solution
data = [0, 1, 2, 3.5, 4]

# 1) dtype inference: the list contains a float, so NumPy upcasts every element to float64
a_infer = np.array(data)

# 2) explicit casting: we can first create the array, then cast to make intent obvious
#    (as opposed to silently truncating by passing dtype=... directly)
a_i32 = np.array(data).astype(np.int32)

# 3) float -> int conversion truncates toward zero (3.5 becomes 3)

assert a_infer.dtype.kind == 'f'
assert a_infer.tolist() == [0.0, 1.0, 2.0, 3.5, 4.0]
assert a_i32.dtype == np.int32
assert a_i32.tolist() == [0, 1, 2, 3, 4]

a_infer, a_i32

(array([0. , 1. , 2. , 3.5, 4. ]), array([0, 1, 2, 3, 4], dtype=int32))
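
The truncation in step 3 applies to negative values too, and it can be made explicit or refused outright. The following is a minimal sketch (the sample values are chosen here for illustration) contrasting the default unsafe cast, rounding before casting, and `casting="safe"`:

```python
import numpy as np

a = np.array([-3.5, -0.5, 0.5, 3.5])

# Default astype uses casting="unsafe": the fractional part is dropped toward zero.
truncated = a.astype(np.int32)

# Rounding first states the intent explicitly (np.rint rounds halves to the nearest even integer).
rounded = np.rint(a).astype(np.int32)

# casting="safe" refuses float -> int because information could be lost.
try:
    a.astype(np.int32, casting="safe")
    safe_cast_allowed = True
except TypeError:
    safe_cast_allowed = False

print(truncated.tolist(), rounded.tolist(), safe_cast_allowed)
```

Requesting `casting="safe"` is a convenient way to turn a silent lossy conversion into an error during development.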

## Problem 2 — detecting overflow risk before choosing a dtype

**Task**

You receive sensor readings as Python integers:

```python
readings = [0, 10, 200, 255, 256, 300]
```

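1) Create `u8_bad` by casting the readings to `np.uint8`, and note what happens to `256` and `300`. (Passing `dtype=np.uint8` directly to `np.array` raises `OverflowError` on NumPy 2.x; older releases wrapped around instead.)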

2) Create `u16_ok = np.array(readings, dtype=np.uint16)`.

3) Write a function `min_unsigned_dtype(values)` that returns the *smallest* unsigned integer dtype among
`np.uint8`, `np.uint16`, `np.uint32`, `np.uint64` that can represent **all** values.

4) Use it to create `u_best`.


In [3]:
# Solution
readings = [0, 10, 200, 255, 256, 300]

# np.array(readings, dtype=np.uint8) raises OverflowError on NumPy 2.x
# ("Python integer 256 out of bounds for uint8"); older releases wrapped around.
# Casting an existing integer array with astype() still wraps modulo 256.
u8_bad = np.array(readings).astype(np.uint8)
u16_ok = np.array(readings, dtype=np.uint16)

def min_unsigned_dtype(values):
    """Return the smallest unsigned integer dtype that can represent all non-negative integers in values."""
    vals = list(values)
    if len(vals) == 0:
        # empty input: choose a reasonable default
        return np.uint8
    if any(v < 0 for v in vals):
        raise ValueError("All values must be non-negative for an unsigned dtype.")
    vmax = max(vals)
    for dt in (np.uint8, np.uint16, np.uint32, np.uint64):
        info = np.iinfo(dt)
        if vmax <= info.max:
            return dt
    # In practice, Python ints can exceed uint64, but we stop here by design
    raise OverflowError("Value exceeds uint64 range.")

best_dt = min_unsigned_dtype(readings)
u_best = np.array(readings, dtype=best_dt)

# Checks
# astype(np.uint8) wraps around: 256 -> 0, 300 -> 44
assert u8_bad.tolist() == [0, 10, 200, 255, 0, 44]
assert u16_ok.tolist() == readings
assert best_dt == np.uint16
assert u_best.dtype == np.uint16
assert u_best.tolist() == readings

u8_bad, u16_ok, u_best

(array([  0,  10, 200, 255,   0,  44], dtype=uint8),
 array([  0,  10, 200, 255, 256, 300], dtype=uint16),
 array([  0,  10, 200, 255, 256, 300], dtype=uint16))
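
Construction from out-of-range Python integers can raise, but `astype` on an existing array wraps silently, so a range check before narrowing is one way to fail fast. This is a minimal sketch that assumes integer input; `checked_astype` is a hypothetical helper, not a NumPy function:

```python
import numpy as np

def checked_astype(arr, dtype):
    """Hypothetical helper: cast only if every value fits in the target integer dtype."""
    arr = np.asarray(arr)
    info = np.iinfo(dtype)
    # Compare as Python ints so the bounds check itself cannot overflow.
    if arr.size and (int(arr.min()) < info.min or int(arr.max()) > info.max):
        raise OverflowError(
            f"values outside [{info.min}, {info.max}] for {np.dtype(dtype).name}"
        )
    return arr.astype(dtype)

wide = np.array([0, 10, 200, 255, 256, 300])

wrapped = wide.astype(np.uint8)  # silent modulo-256 wrap-around

try:
    checked_astype(wide, np.uint8)
    guarded_ok = True
except OverflowError:
    guarded_ok = False

fits = checked_astype(wide, np.uint16)  # every value fits, so the cast goes through
```

The check costs one pass over the data for `min` and `max`, which is usually negligible next to the cost of debugging wrapped values later.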

## Problem 3 — ragged lists: detecting and fixing

**Task**

You are given a ragged list (rows have different lengths):

```python
ragged = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]
```

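1) Try `np.array(ragged)` and observe what happens (NumPy 1.24 and later raise `ValueError`), then pass `dtype=object` explicitly and inspect the resulting dtype and shape.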

2) Write a function `pad_ragged(rows, pad_value=0)` that returns a **rectangular** 2D `ndarray`
by padding shorter rows on the right.

3) Use it to create `rect` and ensure `rect.shape == (3, 4)`.


In [None]:
# Solution
ragged = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]

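# NumPy >= 1.24 raises ValueError for ragged input unless dtype=object is explicit;
# older versions fell back to dtype=object on their own (with a deprecation warning).
arr_try = np.array(ragged, dtype=object)  # 1-D object array holding 3 Python lists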

def pad_ragged(rows, pad_value=0, dtype=None):
    """Pad ragged list-of-lists into a rectangular 2D NumPy array."""
    rows = [list(r) for r in rows]
    if len(rows) == 0:
        return np.empty((0, 0), dtype=dtype if dtype is not None else float)
    max_len = max(len(r) for r in rows)
    padded = [r + [pad_value] * (max_len - len(r)) for r in rows]
    return np.array(padded, dtype=dtype)

rect = pad_ragged(ragged, pad_value=0, dtype=np.int64)

assert arr_try.dtype == object or arr_try.ndim == 1
assert rect.shape == (3, 4)
assert rect.tolist() == [
    [1, 2, 3, 0],
    [4, 5, 0, 0],
    [6, 7, 8, 9],
]

arr_try, rect
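
Padding discards the information about which cells held real data. One possible variant, sketched below, preallocates the output with `np.full` and returns a boolean mask alongside it; `pad_with_mask` is a hypothetical name and not part of the solution above:

```python
import numpy as np

def pad_with_mask(rows, pad_value=0, dtype=np.int64):
    """Hypothetical variant of pad_ragged that also reports which cells are real data."""
    lengths = [len(r) for r in rows]
    width = max(lengths, default=0)
    out = np.full((len(rows), width), pad_value, dtype=dtype)
    mask = np.zeros((len(rows), width), dtype=bool)
    for i, r in enumerate(rows):
        out[i, :len(r)] = r       # copy the real values into the left part of the row
        mask[i, :len(r)] = True   # mark those positions as valid
    return out, mask

padded, valid = pad_with_mask([[1, 2, 3], [4, 5], [6, 7, 8, 9]])
original_values = padded[valid]   # boolean indexing recovers the unpadded values in row order
```

The mask matters when `pad_value` could also occur as a legitimate value, as `0` easily can.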

## Problem 4 — shape/size reasoning from nested lists

**Task**

You are given a nested Python list:

```python
grid = [[i + 10*j for i in range(5)] for j in range(4)]
```

1) Create a 2D array `G` from `grid`.

2) Without hardcoding numbers, compute:
- number of rows
- number of columns
- total number of elements

3) Verify using `G.shape` and `G.size`.


In [4]:
# Solution
grid = [[i + 10*j for i in range(5)] for j in range(4)]
G = np.array(grid)

n_rows = len(grid)
n_cols = len(grid[0]) if n_rows > 0 else 0
n_total = n_rows * n_cols

assert G.shape == (n_rows, n_cols)
assert G.size == n_total
assert G.shape == (4, 5)
assert G[0].tolist() == [0, 1, 2, 3, 4]
assert G[1].tolist() == [10, 11, 12, 13, 14]

(G, (n_rows, n_cols, n_total))

(array([[ 0,  1,  2,  3,  4],
        [10, 11, 12, 13, 14],
        [20, 21, 22, 23, 24],
        [30, 31, 32, 33, 34]]),
 (4, 5, 20))
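
The same reasoning extends to deeper nesting: each level of a rectangular nested list contributes one axis. The sketch below uses a hypothetical helper, `nested_shape`, which assumes the input is rectangular and only inspects the first element at each level:

```python
import numpy as np

def nested_shape(obj):
    """Hypothetical helper: shape of a rectangular nested list, following first elements."""
    shape = []
    while isinstance(obj, list):
        shape.append(len(obj))
        if not obj:
            break
        obj = obj[0]
    return tuple(shape)

# 4 blocks, each with 3 rows of 2 values
cube = [[[k + 10 * i + 100 * j for k in range(2)] for i in range(3)] for j in range(4)]
C = np.array(cube)

predicted = nested_shape(cube)
n_total = 1
for n in predicted:
    n_total *= n  # total element count is the product of the axis lengths
```

`np.shape(cube)` gives the same answer directly, at the cost of building a temporary array.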

## Problem 5 — safe conversion from strings with validation

**Task**

You receive numeric data as strings (possibly with whitespace):

```python
raw = [" 1", "2 ", "003", "-4", "5"]
```

1) Convert to an `int32` array `x`.

2) Write a function `to_int_array(strings, dtype=np.int32)` that:
- strips whitespace
- raises a `ValueError` if any element is not a valid integer literal

3) Demonstrate it with:
```python
bad = ["1", "two", "3"]
```
(You should see an exception.)


In [5]:
# Solution
raw = [" 1", "2 ", "003", "-4", "5"]

def to_int_array(strings, dtype=np.int32):
    cleaned = [s.strip() for s in strings]
    # validate using Python int parsing, then convert to numpy
    vals = []
    for s in cleaned:
        try:
            vals.append(int(s, 10))
        except ValueError as e:  # int() raises ValueError for malformed literals
            raise ValueError(f"Invalid integer value: {s!r}") from e
    return np.array(vals, dtype=dtype)

x = to_int_array(raw, dtype=np.int32)

assert x.dtype == np.int32
assert x.tolist() == [1, 2, 3, -4, 5]

# Demonstration (should raise)
bad = ["1", "two", "3"]
try:
    _ = to_int_array(bad)
    raise AssertionError("Expected ValueError for invalid input, but no error was raised.")
except ValueError as e:
    err_msg = str(e)

x, err_msg

(array([ 1,  2,  3, -4,  5], dtype=int32), "Invalid integer value: 'two'")
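
Python's `int()` is more permissive than "plain digits with an optional sign": for example, it accepts underscore separators such as `"1_000"`. If that is not wanted, a stricter check can run before parsing. This is a sketch only; `to_int_array_strict` and its pattern are assumptions made for illustration:

```python
import re
import numpy as np

# Optional sign followed by ASCII digits only; no underscores, no other characters.
_INT_PATTERN = re.compile(r"[+-]?[0-9]+")

def to_int_array_strict(strings, dtype=np.int32):
    """Hypothetical stricter variant of to_int_array."""
    vals = []
    for s in strings:
        cleaned = s.strip()
        if _INT_PATTERN.fullmatch(cleaned) is None:
            raise ValueError(f"Invalid integer value: {s!r}")
        vals.append(int(cleaned, 10))
    return np.array(vals, dtype=dtype)

strict_ok = to_int_array_strict([" 1", "2 ", "003", "-4", "5"])

try:
    to_int_array_strict(["1_000"])
    underscore_rejected = False
except ValueError:
    underscore_rejected = True
```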

## Problem 6 — memory footprint and choosing a dtype

**Task**

You have a list of small non-negative integers:

```python
vals = list(range(1000))
```

1) Create `a64 = np.array(vals)` (default integer dtype: `int64` on most platforms, which the name `a64` assumes).

2) Create `a16 = np.array(vals, dtype=np.uint16)`.

3) Compare memory usage using `.nbytes` and compute how many bytes are saved.

4) Add an assert that confirms all values are preserved in `a16`.


In [6]:
# Solution
vals = list(range(1000))

a64 = np.array(vals)
a16 = np.array(vals, dtype=np.uint16)

bytes_64 = a64.nbytes
bytes_16 = a16.nbytes
saved = bytes_64 - bytes_16

assert a16.tolist() == vals
assert saved > 0
# Check the source values: a uint16 array can never exceed its own maximum
assert max(vals) <= np.iinfo(np.uint16).max

bytes_64, bytes_16, saved

(8000, 2000, 6000)
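
The figures above follow from `nbytes == size * itemsize`. As a small sketch, the loop below tabulates the footprint of the same 1000 values under a few explicit dtypes (this particular set of dtypes is chosen here for illustration):

```python
import numpy as np

vals = list(range(1000))

footprint = {}
for dt in (np.uint16, np.int32, np.int64, np.float64):
    arr = np.array(vals, dtype=dt)
    # nbytes counts only the data buffer: number of elements times bytes per element
    assert arr.nbytes == arr.size * arr.itemsize
    footprint[arr.dtype.name] = arr.nbytes

print(footprint)
```

Using explicit dtypes keeps the comparison independent of the platform's default integer type.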

## Problem 7 — enforcing rectangular input (fail fast)

**Task**

Write a function `as_matrix(rows, dtype=None)` that:
- accepts a list-of-lists
- checks that all rows have the same length
- raises `ValueError` if not rectangular
- returns a 2D NumPy array otherwise

Test it on:

```python
ok = [[1,2,3],[4,5,6]]
bad = [[1,2,3],[4,5]]
```


In [7]:
# Solution
def as_matrix(rows, dtype=None):
    rows = [list(r) for r in rows]
    if len(rows) == 0:
        return np.empty((0, 0), dtype=dtype if dtype is not None else float)
    n = len(rows[0])
    for idx, r in enumerate(rows):
        if len(r) != n:
            raise ValueError(f"Non-rectangular input: row 0 has length {n}, row {idx} has length {len(r)}")
    return np.array(rows, dtype=dtype)

ok = [[1, 2, 3], [4, 5, 6]]
bad = [[1, 2, 3], [4, 5]]

M = as_matrix(ok, dtype=np.int16)
assert M.shape == (2, 3)
assert M.dtype == np.int16
assert M.tolist() == ok

try:
    _ = as_matrix(bad)
    raise AssertionError("Expected ValueError for ragged input, but no error was raised.")
except ValueError as e:
    bad_msg = str(e)

M, bad_msg

(array([[1, 2, 3],
        [4, 5, 6]], dtype=int16),
 'Non-rectangular input: row 0 has length 3, row 1 has length 2')
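
`as_matrix` stops at the first mismatched row. When cleaning real data it can help to report every offending row in one pass. One possible variant is sketched below; `as_matrix_verbose` is a hypothetical name:

```python
import numpy as np

def as_matrix_verbose(rows, dtype=None):
    """Hypothetical variant of as_matrix that lists every row whose length differs from row 0."""
    rows = [list(r) for r in rows]
    if not rows:
        return np.empty((0, 0), dtype=float if dtype is None else dtype)
    expected = len(rows[0])
    # Map row index -> actual length for every row that does not match row 0.
    offenders = {i: len(r) for i, r in enumerate(rows) if len(r) != expected}
    if offenders:
        raise ValueError(
            f"Expected {expected} columns; mismatched rows (index: length): {offenders}"
        )
    return np.array(rows, dtype=dtype)

try:
    as_matrix_verbose([[1, 2, 3], [4, 5], [6, 7, 8], [9]])
    report = ""
except ValueError as exc:
    report = str(exc)

square = as_matrix_verbose([[1, 2], [3, 4]])
```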

## Quick recap

- NumPy arrays are **homogeneous**: mixed numeric input usually upcasts (e.g., ints + floats -> float).
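- Small integer dtypes can **overflow**: NumPy 2.x raises `OverflowError` for out-of-range Python integers at construction, while `astype` on an existing array still wraps silently.
- Ragged lists do not form a real 2D array: NumPy 1.24 and later raise `ValueError` unless `dtype=object` is explicit, so pad or validate the rows first.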
- Use `.shape`, `.size`, and `.nbytes` to reason about structure and memory.
