# Structured Data: NumPy's Structured Arrays

In [1]:
import numpy as np


Imagine that we have several categories of data on a number of people (say, name, age, and weight), and we'd like to store these values for use in a Python program.
It would be possible to store these in three separate arrays:

In [2]:
name = ["Alice", "Bob", "Cathy", "Doug"]
age = [25, 45, 37, 19]
weight = [55.0, 85.5, 68.0, 61.5]

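## Creating a Structured Array

Keeping three separate lists is clumsy, because nothing indicates that they are related. NumPy's structured arrays can store this kind of compound data in a single object. Recall that a simple array can be created with an expression like this: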

In [3]:
x = np.zeros(4, dtype=int)
x


array([0, 0, 0, 0])

We can similarly create a structured array using a compound data type specification:

In [7]:
# Use a compound data type for structured arrays
data = np.zeros(
    4, dtype={"names": ("name", "age", "weight"), "formats": ("U10", "i4", "f8")}
)
print(data.dtype)

[('name', '<U10'), ('age', '<i4'), ('weight', '<f8')]





Here **``'U10'``** translates to "**Unicode string of maximum length 10**," **``'i4'``** translates to "**4-byte (i.e., 32-bit) integer**," and **``'f8'``** translates to "**8-byte (i.e., 64-bit) float**."
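
As a quick check of these format codes, the following sketch inspects the ``itemsize`` (size in bytes) of each field type. It relies on NumPy storing fixed-width Unicode strings with 4 bytes per character, so ``'U10'`` takes 40 bytes; the variable name ``person`` is chosen here for illustration only.

```python
import numpy as np

# Size in bytes of each field type used above
u10 = np.dtype("U10")  # Unicode string, up to 10 characters
i4 = np.dtype("i4")    # 32-bit signed integer
f8 = np.dtype("f8")    # 64-bit float
print(u10.itemsize, i4.itemsize, f8.itemsize)

# Without alignment padding (the default), the compound dtype's
# itemsize is the sum of the sizes of its fields
person = np.dtype(
    {"names": ("name", "age", "weight"), "formats": ("U10", "i4", "f8")}
)
print(person.itemsize)
```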


Now that we've created an empty container array, we can fill the array with our lists of values:

In [8]:
data


array([('', 0, 0.), ('', 0, 0.), ('', 0, 0.), ('', 0, 0.)],
      dtype=[('name', '<U10'), ('age', '<i4'), ('weight', '<f8')])

In [9]:
data["name"] = name
data["age"] = age
data["weight"] = weight
print(data)

[('Alice', 25, 55. ) ('Bob', 45, 85.5) ('Cathy', 37, 68. )
 ('Doug', 19, 61.5)]
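
As an alternative to filling an empty container field by field, a structured array can also be built in one step from a list of tuples. The sketch below is one possible approach, not the only one; the names ``person_dtype`` and ``records`` are illustrative.

```python
import numpy as np

name = ["Alice", "Bob", "Cathy", "Doug"]
age = [25, 45, 37, 19]
weight = [55.0, 85.5, 68.0, 61.5]

person_dtype = np.dtype([("name", "U10"), ("age", "i4"), ("weight", "f8")])

# zip() yields one tuple per person; NumPy treats each tuple as one record
records = np.array(list(zip(name, age, weight)), dtype=person_dtype)
print(records)
```

Each record has to be a tuple rather than a list, since NumPy uses tuples to mark the boundaries of a single structured element.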


One can now refer to values either by index or by name:

In [10]:
# Get all names
data["name"]

array(['Alice', 'Bob', 'Cathy', 'Doug'], dtype='<U10')

In [11]:
# Get first row of data
data[0]


('Alice', 25, 55.)

In [12]:
# Get the name from the last row
data[-1]["name"]

'Doug'

Using **Boolean masking**, we can perform more sophisticated operations, such as filtering on age:

In [13]:
# Get names where age is under 30
data[data["age"] < 30]["name"]

array(['Alice', 'Doug'], dtype='<U10')

In [14]:
data[data["age"] < 30]

array([('Alice', 25, 55. ), ('Doug', 19, 61.5)],
      dtype=[('name', '<U10'), ('age', '<i4'), ('weight', '<f8')])
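
Masks on different fields can also be combined. The following sketch rebuilds the same array so that it runs on its own, then selects the people who are both under 40 and heavier than 60; the thresholds are arbitrary values picked for illustration.

```python
import numpy as np

data = np.zeros(4, dtype=[("name", "U10"), ("age", "i4"), ("weight", "f8")])
data["name"] = ["Alice", "Bob", "Cathy", "Doug"]
data["age"] = [25, 45, 37, 19]
data["weight"] = [55.0, 85.5, 68.0, 61.5]

# Combine conditions with & (and) or | (or);
# each comparison needs its own parentheses
mask = (data["age"] < 40) & (data["weight"] > 60)
selected = data[mask]["name"]
print(selected)
```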

Note that Pandas provides a ``DataFrame`` object, a structure built on NumPy arrays that offers this kind of data manipulation functionality and much more.

## Creating Structured Arrays

Structured array data types can be specified in a number of ways.
Earlier, we saw the dictionary method:

In [15]:
np.dtype({"names": ("name", "age", "weight"), "formats": ("U10", "i4", "f8")})

dtype([('name', '<U10'), ('age', '<i4'), ('weight', '<f8')])

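For clarity, numerical types can be specified using Python types or NumPy ``dtype``s instead (note that Python's ``int`` maps to the platform's default integer size, so the resulting integer code may differ between systems):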

In [16]:
np.dtype(
    {"names": ("name", "age", "weight"), "formats": ((np.str_, 10), int, np.float32)}
)

dtype([('name', '<U10'), ('age', '<i4'), ('weight', '<f4')])

A compound type can also be specified as a list of tuples:

In [17]:
np.dtype([("name", "S10"), ("age", "i4"), ("weight", "f8")])

dtype([('name', 'S10'), ('age', '<i4'), ('weight', '<f8')])

If the names of the fields do not matter to you, you can specify the types alone in a comma-separated string:

In [18]:
np.dtype("S10,i4,f8")

dtype([('f0', 'S10'), ('f1', '<i4'), ('f2', '<f8')])
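
The automatically generated names ``f0``, ``f1``, and ``f2`` work like any other field names. A minimal sketch, using a hypothetical array called ``anon``:

```python
import numpy as np

# Fields created from a comma-separated string get default names
anon = np.zeros(2, dtype="S10,i4,f8")
print(anon.dtype.names)

anon["f1"] = [7, 8]          # assign to the integer field by its default name
anon["f0"] = [b"ab", b"cd"]  # 'S10' holds byte strings rather than str
print(anon)
```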

The shortened string format codes may seem confusing, but they are built on simple principles.
The first (optional) character is ``<`` or ``>``, which means "little endian" or "big endian," respectively, and specifies the ordering convention for significant bytes.
The next character specifies the type of data: characters, bytes, ints, floating points, and so on (see the table below).
The last character or characters represent the size of the object in bytes.

| Character        | Description           | Example                             |
| ---------        | -----------           | -------                             | 
| ``'b'``          | Byte                  | ``np.dtype('b')``                   |
| ``'i'``          | Signed integer        | ``np.dtype('i4') == np.int32``      |
| ``'u'``          | Unsigned integer      | ``np.dtype('u1') == np.uint8``      |
| ``'f'``          | Floating point        | ``np.dtype('f8') == np.float64``    |
| ``'c'``          | Complex floating point| ``np.dtype('c16') == np.complex128``|
| ``'S'``, ``'a'`` | String                | ``np.dtype('S5')``                  |
| ``'U'``          | Unicode string        | ``np.dtype('U') == np.str_``        |
| ``'V'``          | Raw data (void)       | ``np.dtype('V') == np.void``        |
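
To make the byte-order prefix concrete, the following sketch stores the value 1 as a 4-byte integer in both orders and prints the raw bytes. The variable names ``little`` and ``big`` are illustrative.

```python
import numpy as np

# The same value, 1, stored as a 4-byte integer in each byte order
little = np.array([1], dtype="<i4").tobytes()
big = np.array([1], dtype=">i4").tobytes()

print(little)  # least significant byte comes first
print(big)     # most significant byte comes first
```

The two byte strings are mirror images of each other, which is why the prefix matters when reading binary data written on a different platform.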