# Level 9: Structured & Record Arrays (Advanced)

While standard `ndarray`s require all elements to share one data type, NumPy also provides **structured arrays**: arrays whose elements are C-style structs with named fields, each with its own data type. This lets you store heterogeneous data, much like rows in a database table or a spreadsheet. **Record arrays** (`np.recarray`) are a closely related subclass that adds attribute-style access to the fields.

> ⚠️ **When to use them?** Structured arrays can be useful for interfacing with C code or for storing mixed-type data in a compact way. However, for most data analysis tasks, **Pandas DataFrames are far more powerful, flexible, and user-friendly.** This topic is included for completeness, but in practice, you'll almost always reach for Pandas for this kind of data.

In [1]:
import numpy as np

## 9.1 Structured Arrays

You create a structured array by defining a compound `dtype`. One common way to write it is as a list of tuples, where each tuple specifies a field: `('field_name', 'data_type')`.

### Data Type Codes
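- `'i4'` or `'<i4'`: 4-byte (32-bit) integer; the `<` prefix makes little-endian byte order explicit (without it, the machine's native order is used)
- `'f8'` or `'<f8'`: 8-byte (64-bit) float
- `'U10'`: Unicode string of up to 10 characters (longer values are silently truncated)
- `'S10'`: byte string of up to 10 bytes (stored as `bytes`, not `str`)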
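
To make the type codes above more concrete, the following sketch inspects a compound `dtype` and shows the truncation behaviour of `'U10'`. The names `person_dtype` and `truncated` are illustrative and do not appear elsewhere in this notebook.

```python
import numpy as np

# Same field layout as used in this section (name, age, score).
person_dtype = np.dtype([('name', 'U10'), ('age', 'i4'), ('score', 'f8')])

print(person_dtype.names)     # field names, in declaration order
print(person_dtype.itemsize)  # bytes per record: 10*4 (U10) + 4 (i4) + 8 (f8)

# Each field records its own dtype and its byte offset within the record.
for field_name in person_dtype.names:
    field_dtype, offset = person_dtype.fields[field_name][:2]
    print(field_name, field_dtype, offset)

# 'U10' keeps at most 10 characters; longer strings are silently truncated.
truncated = np.array(['Christopher Robin'], dtype='U10')
print(truncated[0])
```

NumPy stores each Unicode character in 4 bytes, so the `'U10'` field accounts for most of the record size here.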

In [2]:
# Define the dtype for our structured array
my_dtype = [('name', 'U10'), ('age', 'i4'), ('score', 'f8')]

# Create the array
data = [
    ('Alice', 25, 88.5),
    ('Bob', 30, 92.0),
    ('Charlie', 28, 78.5)
]
structured_arr = np.array(data, dtype=my_dtype)

print(structured_arr)

[('Alice', 25, 88.5) ('Bob', 30, 92. ) ('Charlie', 28, 78.5)]


### Accessing Data

In [3]:
# Access a specific element (row)
print("First element:", structured_arr[0])

First element: ('Alice', 25, 88.5)


In [4]:
# Access a specific field (column) by its name
print("\nAll names:", structured_arr['name'])


All names: ['Alice' 'Bob' 'Charlie']


In [5]:
# Access a specific field of a specific element
print("\nAge of the second person:", structured_arr[1]['age'])


Age of the second person: 30
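
Beyond simple lookups, fields can drive filtering and sorting. The sketch below rebuilds the same three records under the illustrative name `people` so that it runs on its own; it is one possible way to combine these operations, not the only one.

```python
import numpy as np

people = np.array(
    [('Alice', 25, 88.5), ('Bob', 30, 92.0), ('Charlie', 28, 78.5)],
    dtype=[('name', 'U10'), ('age', 'i4'), ('score', 'f8')],
)

# Boolean mask built from one field, applied to whole records.
older = people[people['age'] > 26]
print(older['name'])

# Sort records by a field; np.sort returns a sorted copy.
by_score = np.sort(people, order='score')
print(by_score['name'])

# A single field is a view: writing through it changes the original array.
ages = people['age']
ages[0] = 26
print(people[0])
```

Because `people['age']` is a view rather than a copy, the assignment to `ages[0]` is visible in `people` itself.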


## 9.2 Record Arrays

Record arrays (`np.recarray`) are a subclass of `ndarray` that behave like structured arrays with one addition: fields can also be accessed as attributes (`arr.name`), not only by key (`arr['name']`).

In [6]:
# Create a record array from the same data
record_arr = np.rec.array(data, dtype=my_dtype)
print(record_arr)

[('Alice', 25, 88.5) ('Bob', 30, 92. ) ('Charlie', 28, 78.5)]


In [7]:
# Access the 'name' field using attribute access
print("\nAll names:", record_arr.name)


All names: ['Alice' 'Bob' 'Charlie']


In [8]:
# Access a specific element's field
print("\nScore of the third person:", record_arr[2].score)


Score of the third person: 78.5
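
An existing structured array can also be turned into a record array without rebuilding it from the raw data. The sketch below uses `ndarray.view` for this; the names `people` and `people_rec` are illustrative.

```python
import numpy as np

people = np.array(
    [('Alice', 25, 88.5), ('Bob', 30, 92.0), ('Charlie', 28, 78.5)],
    dtype=[('name', 'U10'), ('age', 'i4'), ('score', 'f8')],
)

# Reinterpret the same memory as a record array; no data is copied.
people_rec = people.view(np.recarray)

print(isinstance(people_rec, np.ndarray))    # a recarray is still an ndarray
print(people_rec.age)                        # attribute access to a field
print(people_rec[2].score)                   # attribute access on one record
print(np.shares_memory(people, people_rec))  # both names refer to the same buffer
```

Key-style access (`people_rec['age']`) keeps working on the record array, which is the safer choice if a field name could collide with an existing `ndarray` attribute such as `shape`.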


### Why Pandas is Usually Better
Let's briefly see how you would handle this in Pandas to understand its advantages.

In [9]:
import pandas as pd

df = pd.DataFrame(data, columns=['name', 'age', 'score'])
df

|   | name | age | score |
|---|---|---|---|
| 0 | Alice | 25 | 88.5 |
| 1 | Bob | 30 | 92.0 |
| 2 | Charlie | 28 | 78.5 |


With Pandas, you get:
- A much richer API for data manipulation (`.groupby()`, `.merge()`, etc.).
- Better handling of missing data.
- More flexible indexing.
- Integration with the entire data science ecosystem.

While structured arrays are a neat feature of NumPy, for tabular data it's usually best to treat them as a stepping stone and convert them to a Pandas DataFrame for analysis.
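
As a rough illustration of that hand-off, the sketch below passes a structured array directly to `pd.DataFrame` and converts the result back with `DataFrame.to_records`. The names `people`, `high_scorers`, and `back` are illustrative, and the score threshold of 80 is an arbitrary choice for the example.

```python
import numpy as np
import pandas as pd

people = np.array(
    [('Alice', 25, 88.5), ('Bob', 30, 92.0), ('Charlie', 28, 78.5)],
    dtype=[('name', 'U10'), ('age', 'i4'), ('score', 'f8')],
)

# Field names become column names; each field becomes one column.
df = pd.DataFrame(people)
print(df.dtypes)

# A typical DataFrame operation: filter rows, then pick a column.
high_scorers = df.loc[df['score'] > 80, 'name'].tolist()
print(high_scorers)

# Going back: to_records returns a NumPy record array.
back = df.to_records(index=False)
print(back.dtype.names)
```

The exact dtypes chosen on the round trip (for example, how strings are stored) can differ between Pandas versions, so it is worth printing them rather than assuming they match the original structured `dtype`.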