In [1]:
import numpy as np

Recall that NumPy arrays store only homogeneous data. For example, the array `arr = np.array([1,2,3,4])` has an integer `dtype`, so every element of `arr` is stored as an integer.

Structured arrays lift this restriction by letting us store different kinds of data in one array. Suppose, for example, that we want to store several categories of data about a group of people: name, age, and weight. We could keep them in three separate lists:

In [2]:
name = ['Alice', 'Bob', 'Cathy', 'Doug']
age = [25, 45, 37, 19]
weight = [55.0, 85.5, 68.0, 61.5]

But this is a bit clumsy, and nothing indicates that the three lists are related. For example, "Alice" is 25 years old and weighs 55.0 kg, but the lists are independent of each other, so that link exists only by position. It would be better to have a single structure that stores and represents all of this data. This is where structured arrays come in.

Remember that we can create a simple array as follows:

`x = np.zeros(4, dtype=int)`

We can similarly create a structured array using a compound data type specification as follows:

In [3]:
data = np.zeros(4, dtype={'names': ('name', 'age', 'weight'),
                          'formats': ('U10', 'i4', 'f8')})
print(data.dtype)

[('name', '<U10'), ('age', '<i4'), ('weight', '<f8')]


Here, `'U10'` means "Unicode string of maximum length 10", `'i4'` means "4-byte (32-bit) integer", and `'f8'` means "8-byte (64-bit) float". The `<` prefix in the printed `dtype` indicates little-endian byte order.
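To make the format codes more concrete, the following sketch inspects the same compound `dtype` field by field. The names `dt`, `rec`, and `truncated` are introduced here only for illustration; the string `'Bartholomew'` is an arbitrary 11-character example.

```python
import numpy as np

# Same compound dtype as in the text above.
dt = np.dtype({'names': ('name', 'age', 'weight'),
               'formats': ('U10', 'i4', 'f8')})

print(dt.names)  # field names, in declaration order
for field in dt.names:
    # Each field has its own dtype and its own size in bytes.
    print(field, dt[field], dt[field].itemsize)
print(dt.itemsize)  # total bytes per record (fields are packed by default)

# Strings longer than the declared length are cut to fit.
rec = np.zeros(1, dtype=dt)
rec['name'] = 'Bartholomew'  # 11 characters, one more than 'U10' allows
truncated = rec['name'][0]
print(truncated)
```

NumPy stores each Unicode character in 4 bytes, so the `'U10'` field takes 40 bytes and a whole record takes 40 + 4 + 8 = 52 bytes. The truncation is silent, so it is worth choosing the string length with some margin.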

Having created an empty container array, we can now populate it with our lists of values:

In [4]:
data['name'] = name
data['age'] = age
data['weight'] = weight
print(data)

[('Alice', 25, 55. ) ('Bob', 45, 85.5) ('Cathy', 37, 68. )
 ('Doug', 19, 61.5)]
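As an alternative to filling an empty container field by field, a structured array can also be built in one step from a list of tuples, one tuple per record. This is a minimal sketch; the names `dt`, `people`, and `people_from_lists` are assumptions made for the example.

```python
import numpy as np

dt = np.dtype([('name', 'U10'), ('age', 'i4'), ('weight', 'f8')])

# Each record must be a tuple (not a list) whose items match the fields.
people = np.array([('Alice', 25, 55.0),
                   ('Bob', 45, 85.5),
                   ('Cathy', 37, 68.0),
                   ('Doug', 19, 61.5)], dtype=dt)

# The same array, built from three parallel lists with zip().
name = ['Alice', 'Bob', 'Cathy', 'Doug']
age = [25, 45, 37, 19]
weight = [55.0, 85.5, 68.0, 61.5]
people_from_lists = np.array(list(zip(name, age, weight)), dtype=dt)

print(people)
print(people_from_lists)
```

Either form avoids the intermediate zero-filled array, which can be convenient when the records already exist row by row.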


In [5]:
data['name']

array(['Alice', 'Bob', 'Cathy', 'Doug'], dtype='<U10')

In [6]:
data[0]

('Alice', 25, 55.)

In [7]:
data[-1]['name']

'Doug'

In [8]:
data[data['age']<30]['name']

array(['Alice', 'Doug'], dtype='<U10')
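The indexing and masking operations above combine naturally with other NumPy routines. The sketch below rebuilds the same four records so that it runs on its own, then sorts by a field, aggregates over a masked subset, and combines two conditions. The variable names (`by_age`, `young`, `heavy_older`, and so on) are illustrative only.

```python
import numpy as np

data = np.array([('Alice', 25, 55.0), ('Bob', 45, 85.5),
                 ('Cathy', 37, 68.0), ('Doug', 19, 61.5)],
                dtype=[('name', 'U10'), ('age', 'i4'), ('weight', 'f8')])

# Sort whole records by the 'age' field.
by_age = np.sort(data, order='age')
names_by_age = by_age['name']
print(names_by_age)

# Aggregate one field over a Boolean-masked subset.
young = data[data['age'] < 30]
mean_weight_young = young['weight'].mean()
print(mean_weight_young)

# Combine conditions with & (each condition needs its own parentheses).
heavy_older = data[(data['age'] > 30) & (data['weight'] > 80)]['name']
print(heavy_older)
```

Because the mask selects whole records, every field of a selected person stays together, which is exactly what the three independent lists could not guarantee.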

# Creating Structured Arrays
Data types for structured arrays can be specified in a number of ways. We saw the dictionary method earlier:

In [9]:
np.dtype({'names':('name','age','weight'),
          'formats':('U10', 'i4', 'f8')})

dtype([('name', '<U10'), ('age', '<i4'), ('weight', '<f8')])

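\- Using NumPy `dtype`s or Python types (Python's `int` maps to NumPy's default integer, which is 64-bit on most platforms, hence the `<i8` in the output)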

In [10]:
np.dtype({'names':('name','age', 'weight'),
          'formats':((np.str_,10),int,np.float32)})

dtype([('name', '<U10'), ('age', '<i8'), ('weight', '<f4')])

\- Using a list of tuples (note that `'S10'` here is a 10-byte byte string, not a Unicode string)

In [11]:
np.dtype([('name', 'S10'),('age','i4'),('weight','f8')])

dtype([('name', 'S10'), ('age', '<i4'), ('weight', '<f8')])
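When field names do not matter, NumPy also accepts a comma-separated string of format codes and assigns default field names. The sketch below shows this, together with a small illustration of how an `'S10'` field differs from `'U10'`. The names `dt_str`, `b`, and `stored` are assumptions for the example.

```python
import numpy as np

# Comma-separated format string: fields get default names f0, f1, f2.
dt_str = np.dtype('U10,i4,f8')
print(dt_str.names)
print(dt_str)

# An 'S10' field stores raw bytes rather than Unicode text.
b = np.zeros(1, dtype=[('name', 'S10')])
b['name'] = 'Alice'        # ASCII text is encoded to bytes on assignment
stored = b['name'][0]
print(stored)              # a bytes value, not a str
```

The shorthand is compact, but named fields are usually easier to read in later indexing code such as `data['age']`.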

# More Advanced Compound Types
Below is an example of a more advanced compound type in which each element contains an array or matrix of values. Here, the `mat` field holds a 3×3 matrix of floats:

In [12]:
tp = np.dtype([('id','i8'),('mat', 'f8', (3,3))])
X = np.zeros(1,dtype=tp)
print(X[0])
print(X['mat'][0])

(0, [[0., 0., 0.], [0., 0., 0.], [0., 0., 0.]])
[[0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]]
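To illustrate how such a matrix-valued field behaves across several records, here is a short sketch with two records. The values written into `mat` and the name `traces` are chosen purely for illustration.

```python
import numpy as np

tp = np.dtype([('id', 'i8'), ('mat', 'f8', (3, 3))])
X = np.zeros(2, dtype=tp)

X['id'] = [1, 2]
# X['mat'] is a view into X, so assigning through it updates X itself.
X['mat'][0] = np.eye(3)
X['mat'][1] = np.arange(9).reshape(3, 3)

# The field stacks the per-record matrices: shape is (records, 3, 3).
print(X['mat'].shape)

# Ordinary NumPy routines work on the stacked field, e.g. one trace per record.
traces = np.trace(X['mat'], axis1=1, axis2=2)
print(traces)
```

Selecting the field yields an ordinary three-dimensional float array, so any vectorized operation on a stack of matrices applies to it directly.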


# RecordArrays: Structured Arrays with a Twist

In [13]:
data['age']

array([25, 45, 37, 19], dtype=int32)

In [14]:
data['name']

array(['Alice', 'Bob', 'Cathy', 'Doug'], dtype='<U10')

In [15]:
data['weight']

array([55. , 85.5, 68. , 61.5])
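The dictionary-style lookups above can also be written as attribute access by viewing the structured array as a record array, `np.recarray`. The following is a minimal sketch that rebuilds the same records so it runs on its own; the name `data_rec` is an assumption for the example.

```python
import numpy as np

data = np.array([('Alice', 25, 55.0), ('Bob', 45, 85.5),
                 ('Cathy', 37, 68.0), ('Doug', 19, 61.5)],
                dtype=[('name', 'U10'), ('age', 'i4'), ('weight', 'f8')])

# View the same memory as a record array; no data is copied.
data_rec = data.view(np.recarray)

print(data_rec.age)       # same values as data['age']
print(data_rec.weight)    # same values as data['weight']
print(data_rec[0].age)    # attribute access also works on a single record

# Dictionary-style access still works on the record array.
print(data_rec['age'])
```

Attribute access saves a few keystrokes, but it involves some extra overhead on each lookup, so the dictionary-style form may be preferable in performance-sensitive loops.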