# Structured Data: NumPy's Structured Arrays

This section demonstrates the use of NumPy's structured arrays and record arrays, which provide efficient storage for compound, heterogeneous data. 

In [3]:
import numpy as np

# Three parallel lists describing the same four people
name = ['Alice', 'Bob', 'Cathy', 'Doug']
age = [25, 45, 37, 19]
weight = [55.0, 85.5, 68.0, 61.5]

In [5]:
# A simple array stores a single data type for every element
x = np.zeros(4, dtype=int)
x

array([0, 0, 0, 0])

In [7]:
# Use a compound data type for structured arrays
data = np.zeros(4, dtype={'names':('name', 'age', 'weight'),
                          'formats':('U10', 'i4', 'f8')})
print(data.dtype)

[('name', '<U10'), ('age', '<i4'), ('weight', '<f8')]


Here `'U10'` translates to "Unicode string of maximum length 10," `'i4'` translates to "4-byte (i.e., 32 bit) integer," and `'f8'` translates to "8-byte (i.e., 64 bit) float." We'll discuss other options for these type codes in the following section.
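To make the type codes concrete, the following minimal sketch inspects how many bytes each field occupies. The names `name_type`, `age_type`, `weight_type`, `person_type`, and `clipped`, and the sample string, are illustrative and not part of the original example.

```python
import numpy as np

# Each type code maps to a fixed number of bytes per field.
name_type = np.dtype('U10')    # 10 Unicode characters, 4 bytes each
age_type = np.dtype('i4')      # 4-byte integer
weight_type = np.dtype('f8')   # 8-byte float
print(name_type.itemsize, age_type.itemsize, weight_type.itemsize)  # 40 4 8

# By default the compound dtype packs its fields back to back.
person_type = np.dtype([('name', 'U10'), ('age', 'i4'), ('weight', 'f8')])
print(person_type.itemsize)  # 52

# "Maximum length 10" is enforced: longer strings are truncated on assignment.
clipped = np.zeros(1, dtype=person_type)
clipped['name'] = 'Bartholomew-Smith'
print(clipped['name'][0])  # Bartholome
```

Because every record has the same fixed size, choosing a string width that is too small loses data silently, so it is worth sizing string fields for the longest value you expect.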

In [8]:
# Now we can fill the array with our lists of values:
data['name'] = name
data['age'] = age
data['weight'] = weight
print(data)

[('Alice', 25, 55. ) ('Bob', 45, 85.5) ('Cathy', 37, 68. )
 ('Doug', 19, 61.5)]


In [10]:
# Get all names
data['name']

array(['Alice', 'Bob', 'Cathy', 'Doug'], dtype='<U10')

In [11]:
# Get first row of data
data[0]

('Alice', 25, 55.)

In [12]:
# Get the name from the last row
data[-1]['name']

'Doug'

In [16]:
# Using boolean masking
data[data['age'] < 30]['name']

array(['Alice', 'Doug'], dtype='<U10')
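The indexing patterns above can be combined with ordinary NumPy operations. The sketch below rebuilds the same four records under the illustrative name `people` so that it runs on its own; the particular conditions and the use of `np.sort` with `order=` are just examples of what is possible.

```python
import numpy as np

# Rebuild the example records so this block is self-contained.
people = np.zeros(4, dtype=[('name', 'U10'), ('age', 'i4'), ('weight', 'f8')])
people['name'] = ['Alice', 'Bob', 'Cathy', 'Doug']
people['age'] = [25, 45, 37, 19]
people['weight'] = [55.0, 85.5, 68.0, 61.5]

# A mask built from one field can select values from another field.
under_30 = people['age'] < 30
mean_weight_under_30 = people['weight'][under_30].mean()
print(mean_weight_under_30)  # 58.25

# Masks from several fields combine with & and |.
selected = people[(people['age'] < 40) & (people['weight'] > 60)]['name']
print(selected)  # Cathy and Doug

# np.sort can order whole records by a named field.
by_age = np.sort(people, order='age')
print(by_age['name'])  # Doug, Alice, Cathy, Bob
```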

## Creating Structured Arrays

Structured array data types can be specified in a number of ways. Earlier, we saw the dictionary method:

In [19]:
# Dictionary of field names and formats, using string type codes
np.dtype({'names':('name', 'age', 'weight'),
          'formats':('U10', 'i4', 'f8')})

dtype([('name', '<U10'), ('age', '<i4'), ('weight', '<f8')])

In [20]:
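# Using Python types or NumPy dtypes instead of string codes.
# Plain int maps to the platform's default integer (it may appear as '<i8'
# on other systems), and np.float32 makes 'weight' a 4-byte float.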
np.dtype({'names':('name', 'age', 'weight'),
          'formats':((np.str_, 10), int, np.float32)})

dtype([('name', '<U10'), ('age', '<i4'), ('weight', '<f4')])

In [21]:
# Using a list of (name, format) tuples; 'S10' is a 10-byte bytes string, not Unicode
np.dtype([('name', 'S10'), ('age', 'i4'), ('weight', 'f8')])

dtype([('name', 'S10'), ('age', '<i4'), ('weight', '<f8')])

In [22]:
# Using a comma-separated string; fields get the default names f0, f1, f2
np.dtype('S10,i4,f8')

dtype([('f0', 'S10'), ('f1', '<i4'), ('f2', '<f8')])
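As a quick check that these specifications describe the same thing, the sketch below compares the resulting dtypes and builds an array directly from a list of tuples. The variable names `dict_spec`, `list_spec`, `string_spec`, and `records` are illustrative only.

```python
import numpy as np

dict_spec = np.dtype({'names': ('name', 'age', 'weight'),
                      'formats': ('U10', 'i4', 'f8')})
list_spec = np.dtype([('name', 'U10'), ('age', 'i4'), ('weight', 'f8')])
string_spec = np.dtype('U10,i4,f8')

# The dictionary and list-of-tuples forms produce the same dtype.
print(dict_spec == list_spec)  # True

# The string form has the same layout but different (default) field names,
# so it does not compare equal to the named versions.
print(string_spec.names)         # ('f0', 'f1', 'f2')
print(dict_spec == string_spec)  # False

# Records can also be supplied up front as a list of tuples.
records = np.array([('Alice', 25, 55.0), ('Bob', 45, 85.5)], dtype=list_spec)
print(records['age'])
```

The string form is convenient for quick experiments, while the named forms are usually easier to read when fields are accessed later by name.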

## More Advanced Compound Types

It is possible to define even more advanced compound types. For example, you can create a type where each element contains an array or matrix of values. Here, we'll create a data type with a `mat` component consisting of a 3×3 floating-point matrix:

In [24]:
tp = np.dtype([('id', 'i8'), ('mat', 'f8', (3, 3))])
X = np.zeros(1, dtype=tp)
print(X[0])
print(X['mat'][0])

(0, [[0., 0., 0.], [0., 0., 0.], [0., 0., 0.]])
[[0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]]
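With more than one record, the `mat` field behaves like a stack of matrices. The following sketch reuses the same `tp` definition; the values assigned and the name `traces` are illustrative assumptions, chosen only to show how the subarray field is indexed.

```python
import numpy as np

tp = np.dtype([('id', 'i8'), ('mat', 'f8', (3, 3))])
X = np.zeros(2, dtype=tp)

# Fill each record's matrix through the field view.
X['id'] = [1, 2]
X['mat'][0] = np.eye(3)
X['mat'][1] = np.arange(9).reshape(3, 3)

# The field view gains the subarray shape as trailing dimensions.
print(X['mat'].shape)  # (2, 3, 3)

# Each record stores one 8-byte id plus nine 8-byte floats.
print(tp.itemsize)  # 80

# Ordinary NumPy functions operate across all records at once.
traces = np.trace(X['mat'], axis1=1, axis2=2)
print(traces)  # trace of each 3x3 matrix: 3.0 and 12.0
```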


## Record Arrays: Structured Arrays with Attribute Access

NumPy also provides the `np.recarray` class, which is almost identical to the structured arrays just described, but with one additional feature: fields can be accessed as attributes rather than as dictionary keys.

In [25]:
data_rec = data.view(np.recarray)
data_rec.age

array([25, 45, 37, 19])
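Since `view` reinterprets existing memory rather than copying it, the record array and the original structured array should stay in sync. The sketch below rebuilds the example data so it runs on its own and checks this; the new age value is arbitrary.

```python
import numpy as np

# Rebuild the example records so this block is self-contained.
data = np.zeros(4, dtype=[('name', 'U10'), ('age', 'i4'), ('weight', 'f8')])
data['name'] = ['Alice', 'Bob', 'Cathy', 'Doug']
data['age'] = [25, 45, 37, 19]
data['weight'] = [55.0, 85.5, 68.0, 61.5]

data_rec = data.view(np.recarray)

# The view shares its buffer with the original array.
print(np.shares_memory(data, data_rec))  # True

# A change made through attribute access is visible in the original.
data_rec.age[0] = 26
print(data['age'][0])  # 26

# Individual records support attribute access as well.
print(data_rec[0].age)  # 26
```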

In [26]:
# Record arrays add overhead to field access, even with the same key syntax
%timeit data['age']
%timeit data_rec['age']
%timeit data_rec.age

108 ns ± 2.9 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
3.06 µs ± 125 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
3.87 µs ± 34 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
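The `%timeit` magic is only available in IPython and Jupyter. As a rough equivalent for a plain Python script, the sketch below uses the standard library's `timeit` module; the loop count and the `statements` and `timings` names are arbitrary choices, and the absolute numbers will differ from those above depending on the machine and NumPy version.

```python
import timeit

import numpy as np

# Rebuild the example records so this block is self-contained.
data = np.zeros(4, dtype=[('name', 'U10'), ('age', 'i4'), ('weight', 'f8')])
data['name'] = ['Alice', 'Bob', 'Cathy', 'Doug']
data['age'] = [25, 45, 37, 19]
data['weight'] = [55.0, 85.5, 68.0, 61.5]
data_rec = data.view(np.recarray)

# The three access patterns being compared.
statements = {
    "data['age']": lambda: data['age'],
    "data_rec['age']": lambda: data_rec['age'],
    "data_rec.age": lambda: data_rec.age,
}

n_calls = 100_000
timings = {}
for label, stmt in statements.items():
    total_seconds = timeit.timeit(stmt, number=n_calls)
    timings[label] = total_seconds / n_calls * 1e9  # nanoseconds per call
    print(f"{label:<16} {timings[label]:10.1f} ns per call")
```

Whether the extra per-access cost matters depends on the workload: it is most noticeable when fields are looked up repeatedly inside a tight loop.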
