# 02.09 - Structured Data: Structured Arrays

**Structured arrays** and **record arrays** provide efficient storage for compound, heterogeneous data in NumPy.

Although Pandas' <code>DataFrame</code> is usually better suited to this kind of use case, it is still useful to know how structured arrays work.

In [1]:
import numpy as np

In [2]:
name = ['Alice', 'Bob', 'Cathy', 'Doug']
age = [25, 45, 37, 19]
weight = [55.0, 85.5, 68.0, 61.5]

Keeping related data in separate arrays is not ideal: nothing ties each name to its age and weight except its position in the lists. A <code>structured array</code> stores all three fields together in a single array:

In [3]:
# Use a compound data type for structured arrays
data = np.zeros(4, dtype={'names':('name', 'age', 'weight'),
                          'formats':('U10', 'i4', 'f8')})
print(data.dtype)

[('name', '<U10'), ('age', '<i4'), ('weight', '<f8')]


<code>U10</code> = Unicode string of at most 10 characters  
<code>i4</code> = 4-byte (32-bit) signed integer  
<code>f8</code> = 8-byte (64-bit) floating point
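As a quick check of these codes, the field sizes add up to the size of one record. <code>U10</code> stores up to 10 Unicode characters at 4 bytes each, so a record takes 40 + 4 + 8 = 52 bytes. Strings longer than 10 characters are silently truncated on assignment. A small sketch (the 11-character name is a made-up example):

```python
import numpy as np

# Same compound dtype as above
dt = np.dtype({'names': ('name', 'age', 'weight'),
               'formats': ('U10', 'i4', 'f8')})

# Each Unicode character takes 4 bytes, so 'U10' occupies 40 bytes
print(dt['name'].itemsize, dt['age'].itemsize, dt['weight'].itemsize)  # 40 4 8
print(dt.itemsize)  # 52 bytes per record

# A name longer than 10 characters is cut off without any warning
rec = np.zeros(1, dtype=dt)
rec['name'] = 'Bartholomew'  # hypothetical 11-character name
print(rec['name'][0])  # Bartholome
```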

In [4]:
# fill the array with our values
data['name'] = name
data['age'] = age
data['weight'] = weight
print(data)

[('Alice', 25, 55. ) ('Bob', 45, 85.5) ('Cathy', 37, 68. )
 ('Doug', 19, 61.5)]
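Filling the array one field at a time works, but the records can also be built in one step. One alternative, assuming the same three lists, is to zip them into a list of tuples (one tuple per record) and pass that to <code>np.array</code> together with the compound dtype:

```python
import numpy as np

name = ['Alice', 'Bob', 'Cathy', 'Doug']
age = [25, 45, 37, 19]
weight = [55.0, 85.5, 68.0, 61.5]
dt = np.dtype([('name', 'U10'), ('age', 'i4'), ('weight', 'f8')])

# Each tuple becomes one record; records must be tuples, not lists
data2 = np.array(list(zip(name, age, weight)), dtype=dt)
print(data2)
print(data2.dtype == dt)  # True
```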


We can now refer to values either by index or by name:

In [5]:
# Get all names
data['name']

array(['Alice', 'Bob', 'Cathy', 'Doug'], dtype='<U10')

In [6]:
# Get first row of data
data[0]

('Alice', 25, 55.)

In [7]:
# Get the name from the last row
data[-1]['name']

'Doug'
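Both kinds of access return views rather than copies. A field such as <code>data['age']</code> and a single record such as <code>data[0]</code> (an <code>np.void</code> scalar) both point into the original buffer, so writing through them changes <code>data</code> itself. A short sketch, rebuilding the same array so it runs on its own:

```python
import numpy as np

data = np.array([('Alice', 25, 55.0), ('Bob', 45, 85.5),
                 ('Cathy', 37, 68.0), ('Doug', 19, 61.5)],
                dtype=[('name', 'U10'), ('age', 'i4'), ('weight', 'f8')])

# A field is a view: incrementing it updates every record in place
ages = data['age']
ages += 1
print(data['age'])  # [26 46 38 20]

# A single record is also a view into the array
first = data[0]
first['weight'] = 56.0
print(data['weight'][0])  # 56.0
```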

We can also use Boolean masking for more sophisticated filtering:

In [9]:
# Get names where age is under 40
data[data['age'] < 40]['name']

array(['Alice', 'Cathy', 'Doug'], dtype='<U10')
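Masks can be combined with <code>&</code> and <code>|</code> (each comparison needs its own parentheses), a list of field names selects several fields at once, and <code>np.sort</code> accepts an <code>order</code> argument naming the field to sort by. A brief sketch on the same data:

```python
import numpy as np

data = np.array([('Alice', 25, 55.0), ('Bob', 45, 85.5),
                 ('Cathy', 37, 68.0), ('Doug', 19, 61.5)],
                dtype=[('name', 'U10'), ('age', 'i4'), ('weight', 'f8')])

# Age under 40 AND weight over 60
mask = (data['age'] < 40) & (data['weight'] > 60)
print(data[mask]['name'])  # ['Cathy' 'Doug']

# Select two fields at once; the result is a structured array with only those fields
subset = data[['name', 'age']]
print(subset.dtype.names)  # ('name', 'age')

# Sort the records by age, youngest first
by_age = np.sort(data, order='age')
print(by_age['name'])  # ['Doug' 'Alice' 'Cathy' 'Bob']
```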

### Creating Structured Arrays

Structured array data types can be specified in a number of ways. Earlier, we saw the dictionary method:

In [10]:
np.dtype({'names':('name', 'age', 'weight'),
          'formats':('U10', 'i4', 'f8')})

dtype([('name', '<U10'), ('age', '<i4'), ('weight', '<f8')])

We can also specify the formats with Python or NumPy types instead of type-code strings:

In [11]:
np.dtype({'names':('name', 'age', 'weight'),
          'formats':((np.str_, 10), int, np.float32)})

dtype([('name', '<U10'), ('age', '<i4'), ('weight', '<f4')])
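One caution about Python types: <code>int</code> maps to NumPy's default integer, which was 32-bit on the machine that produced the output above but is 64-bit on most 64-bit Linux and macOS systems, where <code>age</code> would show up as <code>&lt;i8</code>. Using sized NumPy types such as <code>np.int32</code> pins the layout. A sketch checking that the type-based and string-based specifications then agree:

```python
import numpy as np

# Explicitly sized types give the same layout on every platform
dt_types = np.dtype({'names': ('name', 'age', 'weight'),
                     'formats': ((np.str_, 10), np.int32, np.float32)})
dt_codes = np.dtype({'names': ('name', 'age', 'weight'),
                     'formats': ('U10', 'i4', 'f4')})
print(dt_types == dt_codes)  # True

# The plain Python int depends on the platform's default integer size
print(np.dtype(int).itemsize)  # typically 8 on 64-bit Linux/macOS
```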

Or we can use a list of <code>(name, format)</code> tuples:

In [12]:
np.dtype([('name', 'S10'), ('age', 'i4'), ('weight', 'f8')])

dtype([('name', 'S10'), ('age', '<i4'), ('weight', '<f8')])
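Note that this last example uses <code>'S10'</code>, a fixed-size byte string, rather than the Unicode <code>'U10'</code> used earlier, so names read back as <code>bytes</code>. If the field names do not matter, the formats can also be given as a single comma-separated string, in which case NumPy names the fields <code>f0</code>, <code>f1</code>, <code>f2</code>:

```python
import numpy as np

# Formats only: NumPy generates the field names f0, f1, f2
dt = np.dtype('U10,i4,f8')
print(dt.names)  # ('f0', 'f1', 'f2')

# 'S10' stores raw bytes, so values come back as bytes, not str
arr = np.zeros(1, dtype=[('name', 'S10'), ('age', 'i4')])
arr['name'] = b'Alice'
print(arr['name'][0])  # b'Alice'
```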

#### Other data types

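| Character | Description | Example |
|-----------|-------------|---------|
| <code>'b'</code> | Byte (8-bit signed integer) | <code>np.dtype('b') == np.int8</code> |
| <code>'i'</code> | Signed integer | <code>np.dtype('i4') == np.int32</code> |
| <code>'u'</code> | Unsigned integer | <code>np.dtype('u1') == np.uint8</code> |
| <code>'f'</code> | Floating point | <code>np.dtype('f8') == np.float64</code> |
| <code>'c'</code> | Complex floating point | <code>np.dtype('c16') == np.complex128</code> |
| <code>'S'</code>, <code>'a'</code> | Byte string (<code>'a'</code> is an older alias) | <code>np.dtype('S5')</code> |
| <code>'U'</code> | Unicode string | <code>np.dtype('U') == np.str_</code> |
| <code>'V'</code> | Raw data (void) | <code>np.dtype('V') == np.void</code> |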
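These correspondences can be checked directly. Each dtype also reports its <code>kind</code> (the type character) and its <code>itemsize</code> in bytes; note that <code>'b'</code> reports kind <code>'i'</code>, since it is simply an 8-bit signed integer. A small sketch:

```python
import numpy as np

# Each code maps to a concrete scalar type
assert np.dtype('b') == np.int8
assert np.dtype('i4') == np.int32
assert np.dtype('u1') == np.uint8
assert np.dtype('f8') == np.float64
assert np.dtype('c16') == np.complex128

# kind is the type character; itemsize is the size in bytes
for code in ['b', 'i4', 'u1', 'f8', 'c16', 'S5', 'U5']:
    dt = np.dtype(code)
    print(code, dt.kind, dt.itemsize)
```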

### RecordArrays: Structured Arrays with a Twist

<code>np.recarray</code> objects are almost identical to <code>structured arrays</code>, with one additional feature: fields can be accessed as attributes rather than as dictionary keys.

In [13]:
data_rec = data.view(np.recarray)
data_rec.age

array([25, 45, 37, 19])
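Because <code>data.view(np.recarray)</code> is a view, the record array shares memory with <code>data</code>, so assigning through an attribute changes the original. A record array can also be built straight from the separate lists with <code>np.rec.fromarrays</code>. A sketch, rebuilding the data so it stands alone:

```python
import numpy as np

name = ['Alice', 'Bob', 'Cathy', 'Doug']
age = [25, 45, 37, 19]
weight = [55.0, 85.5, 68.0, 61.5]
data = np.array(list(zip(name, age, weight)),
                dtype=[('name', 'U10'), ('age', 'i4'), ('weight', 'f8')])

# The view shares memory: writing via an attribute updates data too
data_rec = data.view(np.recarray)
data_rec.weight = data_rec.weight * 2
print(data['weight'])  # [110. 171. 136. 123.]

# Build a recarray directly from the column lists
rec = np.rec.fromarrays([name, age, weight], names='name,age,weight')
print(rec.name)  # ['Alice' 'Bob' 'Cathy' 'Doug']
```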

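However, this convenience has a cost: accessing a field through a record array is much slower, even with the same key syntax (more than an order of magnitude in this run). Whether the attribute syntax is worth the overhead depends on how often your code accesses fields: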

In [14]:
%timeit data['age']
%timeit data_rec['age']
%timeit data_rec.age

264 ns ± 60.7 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
7.42 µs ± 1.09 µs per loop (mean ± std. dev. of 7 runs, 100000 loops each)
9.57 µs ± 701 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
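<code>%timeit</code> is an IPython magic. In a plain Python script, the standard-library <code>timeit</code> module gives a comparable measurement. A sketch (absolute numbers vary from machine to machine, so none are shown here):

```python
import timeit
import numpy as np

data = np.zeros(4, dtype=[('name', 'U10'), ('age', 'i4'), ('weight', 'f8')])
data_rec = data.view(np.recarray)

n = 10_000
results = {}
for label, stmt in [("data['age']", lambda: data['age']),
                    ("data_rec['age']", lambda: data_rec['age']),
                    ("data_rec.age", lambda: data_rec.age)]:
    # Best of 5 repeats, converted to time per call
    best = min(timeit.repeat(stmt, number=n, repeat=5)) / n
    results[label] = best
    print(f"{label:16s} {best * 1e9:8.1f} ns per call")
```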
