While our data can often be well represented by a homogeneous array of values, sometimes this is not the case. This section demonstrates the use of NumPy's structured arrays and record arrays, which provide efficient storage for compound, heterogeneous data.

In [2]:
import numpy as np

Imagine that we have several categories of data on a number of people (say, name, age, and weight), and we'd like to store these values for use in a Python program. It would be possible to store these in three separate arrays:

In [6]:
name = ['Alice', 'Bob', 'Cathy', 'Doug']
age = [25, 45, 37, 19]
weight = [55.0, 85.5, 68.0, 61.5]

This is a bit clumsy, though: nothing here indicates that the three sequences are related. It would be more natural to store all of this data in a single structure, which NumPy supports through structured arrays, that is, arrays with compound data types:

In [8]:
# Use a compound data type for structured arrays
data = np.zeros(4, dtype={'names':('name', 'age', 'weight'),
                          'formats':('U10', 'i4', 'f8')})
print(data.dtype)

[('name', '<U10'), ('age', '<i4'), ('weight', '<f8')]


Here 'U10' translates to "Unicode string of maximum length 10," 'i4' translates to "4-byte (i.e., 32-bit) integer," and 'f8' translates to "8-byte (i.e., 64-bit) float."
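
As a quick check of these format codes, the following sketch inspects the byte sizes that NumPy reports for each field. It assumes NumPy's storage of 4 bytes per Unicode character and the default packed (unaligned) layout; the name `person_dtype` is introduced here purely for illustration.

```python
import numpy as np

# Same compound dtype as above, built on its own
person_dtype = np.dtype({'names': ('name', 'age', 'weight'),
                         'formats': ('U10', 'i4', 'f8')})

# Size in bytes of each field's type
for field in person_dtype.names:
    print(field, person_dtype[field].itemsize)

# Total bytes per record (packed layout, no padding between fields)
print(person_dtype.itemsize)
```

The 'U10' field takes the most space because each of its 10 characters is stored in 4 bytes.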


Now that we've created an empty container array, we can fill it with our lists of values:

In [12]:
data['name'] = name
data['age'] = age
data['weight'] = weight
print(data)

[('Alice', 25, 55. ) ('Bob', 45, 85.5) ('Cathy', 37, 68. )
 ('Doug', 19, 61.5)]


In [18]:
data[0].dtype

dtype([('name', '<U10'), ('age', '<i4'), ('weight', '<f8')])

In [20]:
data['name']

array(['Alice', 'Bob', 'Cathy', 'Doug'], dtype='<U10')

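Using Boolean masking, we can filter on one field and read another. For example, to get the names of the people whose age is under 30: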

In [23]:
data[data['age'] < 30]['name']

array(['Alice', 'Doug'], dtype='<U10')
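
Row indexing, field access, and masks can also be combined. The sketch below rebuilds the same array so that it runs on its own; the variable names `last_name`, `mask`, and `selected` are chosen here for illustration.

```python
import numpy as np

data = np.zeros(4, dtype={'names': ('name', 'age', 'weight'),
                          'formats': ('U10', 'i4', 'f8')})
data['name'] = ['Alice', 'Bob', 'Cathy', 'Doug']
data['age'] = [25, 45, 37, 19]
data['weight'] = [55.0, 85.5, 68.0, 61.5]

# Index a single record, then read one of its fields
last_name = data[-1]['name']

# Combine conditions with & (each comparison needs its own parentheses)
mask = (data['age'] < 40) & (data['weight'] > 60)
selected = data[mask]['name']

print(last_name)
print(selected)
```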

In [27]:
np.dtype([('name', 'S10'), ('age', 'i4'), ('weight', 'f8')])  # a compound dtype can also be specified as a list of tuples

dtype([('name', 'S10'), ('age', '<i4'), ('weight', '<f8')])
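
Note that 'S10' denotes a byte string of length 10, whereas the array above used the Unicode code 'U10'. As a small sketch of the equivalent ways to spell a compound dtype, the following compares the dictionary form, the list-of-tuples form, and a comma-separated string; the variable names are illustrative only.

```python
import numpy as np

# List of (name, format) tuples, here with a Unicode name field
dt_tuples = np.dtype([('name', 'U10'), ('age', 'i4'), ('weight', 'f8')])

# Dictionary form, as used to create the array earlier
dt_dict = np.dtype({'names': ('name', 'age', 'weight'),
                    'formats': ('U10', 'i4', 'f8')})

# Comma-separated string: fields receive default names f0, f1, f2
dt_string = np.dtype('S10,i4,f8')

print(dt_tuples == dt_dict)   # both forms describe the same dtype
print(dt_string.names)
```

The string form is compact, but because it cannot carry field names it is mainly useful when the names do not matter.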

NumPy also provides the np.recarray class, which is almost identical to the structured arrays just described, but with one additional feature: fields can be accessed as attributes rather than as dictionary keys. Recall that we previously accessed the ages by writing:

In [30]:
data['age']

array([25, 45, 37, 19])

If we view our data as a record array instead, we can access the same field as an attribute:

In [32]:
data_rec = data.view(np.recarray)

In [34]:
data_rec.age

array([25, 45, 37, 19])

The downside is that record arrays involve some extra overhead when accessing fields, even when the same dictionary-style syntax is used.
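
To get a rough sense of that overhead, the sketch below times the three ways of reading the age field using the standard-library `timeit` module. The repetition count `n` and the variable names are arbitrary choices for this illustration, and the absolute timings will vary with machine and NumPy version, so no expected numbers are shown.

```python
import timeit
import numpy as np

data = np.zeros(4, dtype={'names': ('name', 'age', 'weight'),
                          'formats': ('U10', 'i4', 'f8')})
data['name'] = ['Alice', 'Bob', 'Cathy', 'Doug']
data['age'] = [25, 45, 37, 19]
data['weight'] = [55.0, 85.5, 68.0, 61.5]

# A view reinterprets the same memory; no data is copied
data_rec = data.view(np.recarray)
print(np.shares_memory(data, data_rec))

n = 10_000  # number of repetitions per measurement (arbitrary)
t_struct = timeit.timeit(lambda: data['age'], number=n)
t_rec_key = timeit.timeit(lambda: data_rec['age'], number=n)
t_rec_attr = timeit.timeit(lambda: data_rec.age, number=n)

print(f"structured array, key access: {t_struct:.4f} s")
print(f"record array, key access:     {t_rec_key:.4f} s")
print(f"record array, attribute:      {t_rec_attr:.4f} s")
```

Whether the more convenient attribute notation is worth the extra cost depends on how often fields are accessed in your application.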