### NumPy Structured Arrays

In [1]:
import numpy as np

Imagine that we have several categories of data on a number of people (say, name, age, and weight), and we'd like to store these values for use in a Python program.
We could store these in three separate lists:

In [4]:
names = ['Alice', 'Bob', 'Cathy', 'Doug']
ages = [25, 45, 37, 19]
weights = [55.0, 85.5, 68.0, 61.5]

This is a bit clumsy, because nothing here tells us that the three lists are related. __It would be more natural if we could use a single structure to store all of this data__. NumPy can handle this through structured arrays, which are arrays with compound data types.

In [5]:
data = np.zeros(4, dtype={'names':('name', 'age', 'weight'),
                          'formats':('U10', 'i4', 'f8')})
data.dtype

dtype([('name', '<U10'), ('age', '<i4'), ('weight', '<f8')])

``'U10'`` translates to "Unicode string of maximum length 10," ``'i4'`` translates to "4-byte (i.e., 32-bit) integer," and ``'f8'`` translates to "8-byte (i.e., 64-bit) float."
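The ``itemsize`` attribute gives a quick way to check these sizes. In the sketch below (an illustrative check, not part of the original notebook), the ``'U10'`` field takes 40 bytes rather than 10, because NumPy stores each Unicode character in 4 bytes:

```python
import numpy as np

dt = np.dtype({'names': ('name', 'age', 'weight'),
               'formats': ('U10', 'i4', 'f8')})

# Size of each field type in bytes
print(np.dtype('U10').itemsize)  # 40: 10 characters x 4 bytes each
print(np.dtype('i4').itemsize)   # 4
print(np.dtype('f8').itemsize)   # 8

# Without alignment, the fields are packed back to back: 40 + 4 + 8
print(dt.itemsize)               # 52
```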

Now that we've created a zero-filled container array, we can fill the array with our lists of values:

In [7]:
data['name'] = names
data['age'] = ages
data['weight'] = weights
data

array([('Alice', 25,  55. ), ('Bob', 45,  85.5), ('Cathy', 37,  68. ),
       ('Doug', 19,  61.5)],
      dtype=[('name', '<U10'), ('age', '<i4'), ('weight', '<f8')])
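Filling field by field is not the only option. As a sketch, the same array can also be built in one step from a list of tuples, one tuple per record; here ``zip`` pairs up the i-th name, age, and weight (the name ``data2`` is just for illustration):

```python
import numpy as np

names = ['Alice', 'Bob', 'Cathy', 'Doug']
ages = [25, 45, 37, 19]
weights = [55.0, 85.5, 68.0, 61.5]

dt = [('name', 'U10'), ('age', 'i4'), ('weight', 'f8')]

# Each tuple becomes one record of the structured array
data2 = np.array(list(zip(names, ages, weights)), dtype=dt)
print(data2['age'].tolist())  # [25, 45, 37, 19]
```

Note that the records must be tuples; a list of lists would not be interpreted as records.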

Now you can refer to values either by index or by name:

In [8]:
data['name']

array(['Alice', 'Bob', 'Cathy', 'Doug'],
      dtype='<U10')

In [9]:
data[0]

('Alice', 25,  55.)

In [10]:
data[-1]['name']

'Doug'
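Row and field indexing can be combined in either order. One detail worth knowing, shown in the sketch below: selecting a single field returns a view into the original array rather than a copy, so writing through that view modifies ``data`` itself.

```python
import numpy as np

data = np.zeros(4, dtype=[('name', 'U10'), ('age', 'i4'), ('weight', 'f8')])
data['name'] = ['Alice', 'Bob', 'Cathy', 'Doug']
data['age'] = [25, 45, 37, 19]

# Row-then-field and field-then-row give the same element
assert data[-1]['name'] == data['name'][-1] == 'Doug'

# A single-field lookup is a view, so writes through it update `data`
ages = data['age']
ages[0] += 1
print(data[0]['age'])  # 26
```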

Boolean masking lets you do more sophisticated operations, such as selecting the names of everyone under 30:

In [11]:
data[data['age'] < 30]['name']

array(['Alice', 'Doug'],
      dtype='<U10')
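Masks can be combined, and whole records can be sorted by a chosen field with the ``order`` argument of ``np.sort``. A short sketch using the same four people:

```python
import numpy as np

data = np.array([('Alice', 25, 55.0), ('Bob', 45, 85.5),
                 ('Cathy', 37, 68.0), ('Doug', 19, 61.5)],
                dtype=[('name', 'U10'), ('age', 'i4'), ('weight', 'f8')])

# Combine conditions with & (the parentheses are required by operator precedence)
mask = (data['age'] > 20) & (data['weight'] < 70)
print(data[mask]['name'])  # ['Alice' 'Cathy']

# Sort entire records by the 'age' field
by_age = np.sort(data, order='age')
print(by_age['name'])      # ['Doug' 'Alice' 'Cathy' 'Bob']
```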

For use cases more complicated than this, consider using the Pandas ``DataFrame`` object instead.

### Creating Structured Arrays

Structured array data types can be specified in several ways. One is the dictionary method:

In [13]:
np.dtype({'names':('name', 'age', 'weight'),
          'formats':('U10', 'i4', 'f8')})

dtype([('name', '<U10'), ('age', '<i4'), ('weight', '<f8')])

Numerical types can be specified using Python types or NumPy ``dtype``s:

In [12]:
np.dtype({'names':('name', 'age', 'weight'),
          'formats':((np.str_, 10), int, np.float32)})

dtype([('name', '<U10'), ('age', '<i8'), ('weight', '<f4')])

A compound type can also be specified as a list of tuples:

In [13]:
np.dtype([('name', 'S10'), ('age', 'i4'), ('weight', 'f8')])

dtype([('name', 'S10'), ('age', '<i4'), ('weight', '<f8')])

If the names of the fields do not matter to you, you can specify the types alone in a comma-separated string:

In [14]:
np.dtype('S10,i4,f8')

dtype([('f0', 'S10'), ('f1', '<i4'), ('f2', '<f8')])
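These specifications are interchangeable. The sketch below checks that the dictionary and list-of-tuples forms produce equal dtypes, and that the comma-separated form differs only in its default field names:

```python
import numpy as np

dict_form = np.dtype({'names': ('name', 'age', 'weight'),
                      'formats': ('U10', 'i4', 'f8')})
list_form = np.dtype([('name', 'U10'), ('age', 'i4'), ('weight', 'f8')])
string_form = np.dtype('U10,i4,f8')

# Dictionary and list-of-tuples forms describe the same layout
print(dict_form == list_form)    # True

# The comma-separated form gets default field names f0, f1, f2
print(string_form.names)         # ('f0', 'f1', 'f2')

# Field names are part of a structured dtype's identity
print(string_form == list_form)  # False
```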

### String format codes, explained

The first (optional) character is ``<`` or ``>``, which means "little-endian" or "big-endian," respectively, and specifies the order in which the bytes of a multi-byte value are stored.

The next character specifies the kind of data: characters, bytes, ints, floating points, and so on (see the table below).
The remaining digits give the size of each element in bytes; for Unicode strings (``'U'``), they give the maximum length in characters instead.

| Character        | Description           | Example                             |
| ---------        | -----------           | -------                             | 
| ``'b'``          | Byte                  | ``np.dtype('b')``                   |
| ``'i'``          | Signed integer        | ``np.dtype('i4') == np.int32``      |
| ``'u'``          | Unsigned integer      | ``np.dtype('u1') == np.uint8``      |
| ``'f'``          | Floating point        | ``np.dtype('f8') == np.float64``    |
| ``'c'``          | Complex floating point| ``np.dtype('c16') == np.complex128``|
| ``'S'``, ``'a'`` | String                | ``np.dtype('S5')``                  |
| ``'U'``          | Unicode string        | ``np.dtype('U') == np.str_``        |
| ``'V'``          | Raw data (void)       | ``np.dtype('V') == np.void``        |
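To see what byte order means concretely, the sketch below serializes the integer 1 with each byte order using ``tobytes()``. It also shows that the size digit counts bytes for numeric and byte-string types but characters for ``'U'``:

```python
import numpy as np

# The same value stored with the two byte orders
little = np.array([1], dtype='<i4').tobytes()
big = np.array([1], dtype='>i4').tobytes()
print(little)  # b'\x01\x00\x00\x00'
print(big)     # b'\x00\x00\x00\x01'

# The size code counts bytes for numeric types and byte strings...
print(np.dtype('c16').itemsize)  # 16
print(np.dtype('S5').itemsize)   # 5
# ...but characters for Unicode strings, at 4 bytes per character
print(np.dtype('U5').itemsize)   # 20
```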

### Advanced Compound Types

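It is possible to define even more advanced compound types, in which a field is itself an array. For example, we can create a data type with a ``mat`` component consisting of a $3\times 3$ floating-point matrix: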

In [15]:
tp = np.dtype([
    ('id', 'i8'), 
    ('mat', 'f8', (3, 3))])

X = np.zeros(1, dtype=tp)
X[0]

(0, [[ 0.,  0.,  0.], [ 0.,  0.,  0.], [ 0.,  0.,  0.]])

In [16]:
X['mat'][0]

array([[ 0.,  0.,  0.],
       [ 0.,  0.,  0.],
       [ 0.,  0.,  0.]])
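When the whole field is selected, the subarray shape is appended to the array's own shape, and the field can be assigned like any other slice. A small sketch with two records:

```python
import numpy as np

tp = np.dtype([('id', 'i8'), ('mat', 'f8', (3, 3))])
X = np.zeros(2, dtype=tp)

# Two records, each holding a 3x3 matrix
print(X['mat'].shape)  # (2, 3, 3)

# Fill fields like ordinary array slices
X['id'] = [10, 11]
X['mat'][1] = np.eye(3)
print(X[1]['mat'].trace())  # 3.0
```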

Why would you use this instead of a simple multidimensional array, or perhaps a Python dictionary?

This NumPy ``dtype`` directly maps onto a C structure definition, so __the buffer containing the array content can be accessed directly within an appropriately written C program.__

If you find yourself writing a Python interface to a legacy C or Fortran library that manipulates structured data, you'll probably find structured arrays quite useful!
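To make the C-structure correspondence concrete, the sketch below inspects the byte layout NumPy uses for ``tp``. Each record is 80 bytes, and ``mat`` starts at byte offset 8. That matches how an illustrative C struct such as ``struct { int64_t id; double mat[3][3]; }`` would typically be laid out (this struct is an assumption for illustration, not from the original). By default NumPy packs fields with no padding; passing ``align=True`` asks it to insert the padding a C compiler would typically add:

```python
import numpy as np

tp = np.dtype([('id', 'i8'), ('mat', 'f8', (3, 3))])

# Byte offset of each field inside one record, similar to offsetof() in C
print(tp.fields['id'][1], tp.fields['mat'][1])  # 0 8
print(tp.itemsize)  # 80 = 8 + 9 * 8

# Packed by default; align=True adds C-style padding after 'flag'
packed = np.dtype([('flag', 'u1'), ('value', 'i4')])
aligned = np.dtype([('flag', 'u1'), ('value', 'i4')], align=True)
print(packed.itemsize, aligned.itemsize)  # 5 8

# The contiguous raw bytes a C function would see behind a pointer
buf = np.zeros(2, dtype=tp).tobytes()
print(len(buf))  # 160
```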

### RecordArrays: Structured Arrays with a Twist

NumPy also provides the ``np.recarray`` class, which is similar to structured arrays with one additional feature: __fields can be accessed as attributes rather than as dictionary keys.__

Recall that we previously accessed the ages by writing:

In [18]:
data['age']

array([25, 45, 37, 19], dtype=int32)

If we view our data as a record array instead, we can access this with slightly fewer keystrokes:

In [19]:
data_rec = data.view(np.recarray)
data_rec.age

array([25, 45, 37, 19], dtype=int32)
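Because ``view`` reinterprets the same memory without copying, changes made through the record array show up in the original structured array. Record arrays can also be built directly from per-field sequences with ``np.rec.fromarrays``. A sketch of both:

```python
import numpy as np

data = np.zeros(4, dtype=[('name', 'U10'), ('age', 'i4'), ('weight', 'f8')])
data['name'] = ['Alice', 'Bob', 'Cathy', 'Doug']
data['age'] = [25, 45, 37, 19]

# view() shares memory with `data`; nothing is copied
data_rec = data.view(np.recarray)
data_rec.age[0] = 30
print(data['age'][0])  # 30

# Build a record array from one sequence per field
rec = np.rec.fromarrays([['Alice', 'Bob'], [25, 45]], names='name,age')
print(rec.name[1], rec.age[1])  # Bob 45
```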

The downside is that record arrays add overhead to field access, even when you use the same indexing syntax:

In [20]:
%timeit data['age']
%timeit data_rec['age']
%timeit data_rec.age

113 ns ± 1.51 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
3.54 µs ± 14.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
4.65 µs ± 18.8 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
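Outside IPython, where the ``%timeit`` magic is unavailable, a comparable measurement can be made with the standard-library ``timeit`` module. This sketch only reproduces the method; the numbers it prints depend on your machine and NumPy version, and it makes no claim about what they will be:

```python
import timeit
import numpy as np

data = np.zeros(4, dtype=[('name', 'U10'), ('age', 'i4'), ('weight', 'f8')])
data_rec = data.view(np.recarray)

n = 10_000
# Total seconds for n repetitions of each access pattern
t_plain = timeit.timeit(lambda: data['age'], number=n)
t_rec_key = timeit.timeit(lambda: data_rec['age'], number=n)
t_rec_attr = timeit.timeit(lambda: data_rec.age, number=n)

for label, t in [('data["age"]', t_plain), ('data_rec["age"]', t_rec_key),
                 ('data_rec.age', t_rec_attr)]:
    print(f'{label:16s} {t / n * 1e9:8.1f} ns per access')

# If a field is read repeatedly in a hot loop, fetch it once up front
ages = data_rec.age
```

Whether the extra cost matters depends on how often fields are accessed: it is paid per lookup, so hoisting the lookup out of a loop, as in the last line, avoids most of it.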
