While often our data can be well **represented** by **a homogeneous array of values**, sometimes this is not the case. This section **demonstrates** the use of **NumPy's** **structured arrays** and **record arrays**, which provide efficient storage for compound, heterogeneous data. 

In [1]:
import numpy as np

Imagine that we have **several categories** of data on a number of people (say, name, age, and weight), and we'd like to **store these values** for use in a Python program. 

In [2]:
name = ['Alice', 'Bob', 'Cathy', 'Doug']
age = [25, 45, 37, 19]
weight = [55.0, 85.5, 68.0, 61.5]

But this is a bit **clumsy**. There's nothing here that tells us that the **three lists are related**; it would be **more natural** if we could use a single structure to store all of this data.

Recall that previously we created **a simple array** using an expression like this:

In [3]:
x = np.zeros(4, dtype=int)

We can similarly create a structured array using a **compound data type specification**:

In [4]:
# Use a compound data type for structured arrays
data = np.zeros(4, dtype={'names':('name', 'age', 'weight'),
                          'formats':('U10', 'i4', 'f8')})
print(data.dtype)

[('name', '<U10'), ('age', '<i4'), ('weight', '<f8')]


Now that we've created **an empty container array** (every field is zero-initialized by np.zeros), we can fill the array with our lists of values:

In [5]:
data['name'] = name
data['age'] = age
data['weight'] = weight
print(data)

[('Alice', 25,  55. ) ('Bob', 45,  85.5) ('Cathy', 37,  68. )
 ('Doug', 19,  61.5)]


As we had hoped, the data is now **arranged together** in one convenient block of memory.
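
To make the "one block of memory" claim concrete, the following sketch inspects the record layout of the same dtype used above. It assumes NumPy's default packed (unaligned) layout for this dtype specification; the variable name offsets is introduced here only for illustration.

```python
import numpy as np

# Rebuild the structured array from the text so this block runs on its own
data = np.zeros(4, dtype={'names': ('name', 'age', 'weight'),
                          'formats': ('U10', 'i4', 'f8')})

# Each record holds 10 four-byte Unicode characters (40 bytes),
# a 4-byte integer, and an 8-byte float
print(data.dtype.itemsize)   # 52 bytes per record
print(data.nbytes)           # 208 bytes for all four records

# dtype.fields maps each field name to a (dtype, byte offset) pair
offsets = {field: data.dtype.fields[field][1] for field in data.dtype.names}
print(offsets)               # {'name': 0, 'age': 40, 'weight': 44}
```

Because every record has the same fixed size, NumPy can locate any field of any row with simple offset arithmetic, which is what keeps access efficient.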

The handy thing with **structured arrays** is that you can now refer to values **either by index** or **by name**:

In [6]:
# Get all names
data['name']

array(['Alice', 'Bob', 'Cathy', 'Doug'],
      dtype='<U10')

In [7]:
# Get first row of data
data[0]

('Alice', 25,  55.)

In [9]:
# Get the name from the last row
data[-1]['name']

'Doug'
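
The same index-or-name access also works for assignment. The sketch below rebuilds the array from the earlier cells and then modifies it; the new values (26, 46, 86.0) and the variable name subset are made up for illustration.

```python
import numpy as np

# Rebuild and fill the structured array from the text
data = np.zeros(4, dtype={'names': ('name', 'age', 'weight'),
                          'formats': ('U10', 'i4', 'f8')})
data['name'] = ['Alice', 'Bob', 'Cathy', 'Doug']
data['age'] = [25, 45, 37, 19]
data['weight'] = [55.0, 85.5, 68.0, 61.5]

# Update one field of one record: select the field, then the row
data['age'][0] = 26

# Replace a whole record by assigning a tuple with one value per field
data[1] = ('Bob', 46, 86.0)

# Select several fields at once by passing a list of names
subset = data[['name', 'age']]
print(subset.dtype.names)    # ('name', 'age')
```

Selecting a field returns a view into the original array, so the assignment in the first step changes data itself rather than a copy.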

Using **Boolean masking**, this even allows you to do some more **sophisticated operations** such as filtering on age:

In [10]:
# Get names where age is under 30
data[data['age'] < 30]['name']

array(['Alice', 'Doug'],
      dtype='<U10')
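
Masks built from different fields can be combined in the usual NumPy way. As a small sketch using the same data as above (the thresholds and the names mask, young, and mean_weight are chosen only for illustration):

```python
import numpy as np

# Rebuild and fill the structured array from the text
data = np.zeros(4, dtype={'names': ('name', 'age', 'weight'),
                          'formats': ('U10', 'i4', 'f8')})
data['name'] = ['Alice', 'Bob', 'Cathy', 'Doug']
data['age'] = [25, 45, 37, 19]
data['weight'] = [55.0, 85.5, 68.0, 61.5]

# Combine conditions on two fields; the parentheses are required
# because & binds more tightly than the comparison operators
mask = (data['age'] < 40) & (data['weight'] > 60)
print(data[mask]['name'])    # ['Cathy' 'Doug']

# Filter whole records first, then aggregate one field of the result
young = data[data['age'] < 30]
mean_weight = young['weight'].mean()
print(mean_weight)           # 58.25
```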

## 1. Creating Structured Arrays

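For clarity, **numerical types** can be **specified** using **Python types** or **NumPy dtypes instead**. Note that Python's int maps to the platform's default integer, so the width shown for age in the output below may differ on your system: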

In [11]:
np.dtype({'names':('name', 'age', 'weight'),
          'formats':((np.str_, 10), int, np.float32)})

dtype([('name', '<U10'), ('age', '<i8'), ('weight', '<f4')])

A **compound type** can also be **specified** as **a list of tuples**:

In [13]:
np.dtype([('name', 'S10'), ('age', 'i4'), ('weight', 'f8')])

dtype([('name', 'S10'), ('age', '<i4'), ('weight', '<f8')])
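
A dtype given as a list of tuples pairs naturally with data given as a list of tuples, so an array can be created and filled in one step. This is a minimal sketch: the names person_dtype and people are hypothetical, and it uses 'U10' rather than 'S10' so that the names are stored as Unicode strings instead of bytes.

```python
import numpy as np

# Compound dtype as a list of (field name, type) tuples
person_dtype = np.dtype([('name', 'U10'), ('age', 'i4'), ('weight', 'f8')])

# Each record must be a tuple (not a list) with one value per field
people = np.array([('Alice', 25, 55.0),
                   ('Bob', 45, 85.5)], dtype=person_dtype)

print(people['name'])   # all names
print(people[1])        # the second record
```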

If **the field names** do not **matter** to you, you can **specify** the **types** alone in a comma-separated string, and NumPy will assign default names:

In [14]:
np.dtype('S10,i4,f8')

dtype([('f0', 'S10'), ('f1', '<i4'), ('f2', '<f8')])
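
As the output shows, NumPy fills in the default field names f0, f1, and f2, and these work like any other field name. A brief sketch (the array name arr and the values assigned are illustrative only):

```python
import numpy as np

# Fields are named automatically when only the types are given
arr = np.zeros(2, dtype='S10,i4,f8')
print(arr.dtype.names)   # ('f0', 'f1', 'f2')

# The default names are used for access and assignment as usual
arr['f1'] = [1, 2]
arr['f2'] = [0.5, 1.5]
print(arr['f1'])
```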