## Data type (`dtype`)

In contrast to built-in Python containers (like lists), NumPy arrays can only store elements of a single, predetermined type: all elements of an array must have the same type. To see the type of an array's contents, use the `dtype` attribute.

In [None]:
import numpy as np
a = np.array([1., 2])
a.dtype
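The sketch below adds a few more cases: NumPy infers a common type when you mix values, you can request a dtype explicitly, and `astype` converts to another dtype. The variable names are illustrative only, and the exact default integer width depends on the platform.

```python
import numpy as np

ints = np.array([1, 2, 3])                  # integers only -> the platform's default integer dtype
mixed = np.array([1, 2.5])                  # one float upcasts the whole array to float64
small = np.array([1, 2, 3], dtype=np.int8)  # request a dtype explicitly
as_float = small.astype(np.float32)         # astype returns a converted copy

print(ints.dtype, mixed.dtype, small.dtype, as_float.dtype)
print(small.itemsize, as_float.itemsize)    # bytes per element: 1 and 4
```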

## Quiz: Data types

Guess the data type of the following arrays:

```python
a = np.array([[1, 2], 
              [2, 3]])
b = np.array(['a', 'b', 'c'])
c = np.array([1, 2, 'a'])
d = np.array([np.dot, np.array])
e = np.random.randn(5) > 0
f = np.arange(5)
g = np.array([[1], [2, 3]])
```

## Exercise: Overflow

Construct the array 
```python
x = np.array([0, 1, 2, 255], dtype=np.uint8)
```

Can you explain the difference between the results of `x + 1` and `x.astype(int) + 1`?
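As a hint rather than a full answer, `np.iinfo` reports the range of values an integer dtype can represent. Comparing `uint8` with the platform's default integer type is a good starting point.

```python
import numpy as np

# Representable range of each integer type
info8 = np.iinfo(np.uint8)
info_default = np.iinfo(np.int_)  # the platform's default integer type

print(info8.min, info8.max)
print(info_default.min, info_default.max)
```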

## Structured arrays

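A `dtype` can also describe a structure made of several named fields, and those fields may have different types; what must be the same is the structure itself, which every element of the array shares. You can find out more from the examples in the docs: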

In [None]:
np.info(np.dtype)

In [None]:
rec_array = np.array([(0, 'a'), 
                      (1, 'b'),
                      (2, 'c')], dtype=[('a', np.int32), ('b', '|S1')])
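To see how the record layout is described, you can inspect the structured dtype itself. This sketch rebuilds the same `rec_array` so it runs on its own. `names` lists the fields, `fields` maps each name to its type and byte offset, and `itemsize` is the size of one record in bytes (fields are packed by default).

```python
import numpy as np

rec_array = np.array([(0, 'a'), (1, 'b'), (2, 'c')],
                     dtype=[('a', np.int32), ('b', '|S1')])

dt = rec_array.dtype
print(dt.names)        # ('a', 'b')
print(dt.fields['b'])  # (dtype('S1'), 4): field type and byte offset
print(dt.itemsize)     # 5 bytes per record: 4 for int32 + 1 for S1
```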

A structured array is somewhat similar to a table with multiple rows and named columns. You can access the different fields (columns) by name:

In [None]:
rec_array['a']

In [None]:
rec_array['b']

If you index with an integer, you get a single row (a record):

In [None]:
rec_array[0]
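Row and field indexing can be combined. In this sketch (again rebuilding `rec_array` so it is self-contained), selecting a field of a single record gives a scalar, and selecting a field of a slice of rows gives an ordinary array. Note that the `'S1'` field holds bytes rather than `str`.

```python
import numpy as np

rec_array = np.array([(0, 'a'), (1, 'b'), (2, 'c')],
                     dtype=[('a', np.int32), ('b', '|S1')])

row = rec_array[0]          # a single record (a np.void scalar)
print(row['a'], row['b'])   # 'S1' fields hold bytes, not str

# Field-then-row and row-then-field give the same value
assert rec_array['a'][0] == rec_array[0]['a']

print(rec_array[1:]['a'])   # slice the rows, then select a field: [1 2]
```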

### Quiz

Can you predict the `shape` and `strides` of the following array:

```python
A = np.array([('a', 0),
              ('b', 1),
              ('c', 2)], dtype=[('name', '|S1'), 
                                ('value', np.int8)])
```

### Exercise: Structured data types

*This exercise was proposed by Stéfan van der Walt (https://python.g-node.org/python-summerschool-2014/_media/numpy_advanced.tar.bz2)*

Design a data-type for storing the following record:

 - Timestamp in nanoseconds (a 64-bit unsigned integer)
 - Position (x- and y-coordinates, stored as floating point numbers)

Use it to represent the following data:

```
dt = np.dtype(<your code here>)
x = np.array([(100, (0, 0.5)),
              (200, (0, 10.3)),
              (300, (5.5, 15.1))], dtype=dt)
```

Have a look at the ``np.dtype`` docstring if you need help.
After constructing ``x``, try to print all entries of ``x`` whose timestamp
is greater than 100 (hint: something of the form ``y[y > 100]``).

### Exercise: Structured file I/O

*Modified from exercise by Stéfan van der Walt (https://python.g-node.org/python-summerschool-2014/_media/numpy_advanced.tar.bz2)*

Given the ``data.txt`` file with the following content:

In [None]:
%%file data.txt
#rank         lemma (10 letters max)      frequency       dispersion
21             they                        1865844         0.96
42             her                         969591          0.91
49             as                          829018          0.95
7              to                          6332195         0.98
63             take                        670745          0.97
14             you                         3085642         0.92
35             go                          1151045         0.93
56             think                       772787          0.91
28             not                         1638883         0.98

Design a suitable structured data type, then load the data from the text
file using ``np.loadtxt``. Here's a skeleton to start with:

```python
import numpy as np
data = np.loadtxt('data.txt', dtype=...)  # Modify this line
```

Examine the loaded data:
 - Extract the words only
 - Extract the 3rd row
 - Print all words with ``rank < 30``
 - Sort the data by frequency (see ``np.sort`` and its ``order`` argument).
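If you get stuck, here is a minimal sketch of the same workflow on a small, made-up table rather than ``data.txt`` (the column names and values below are hypothetical). It assumes whitespace-separated columns and a header line starting with `#`, which `np.loadtxt` skips by default. `io.StringIO` stands in for a file on disk.

```python
import io
import numpy as np

# Hypothetical toy data (not the contents of data.txt)
text = io.StringIO("""# name   age   score
alice    31    88.5
bob      25    92.0
carol    40    79.25
""")

dt = np.dtype([('name', 'U10'), ('age', np.int32), ('score', np.float64)])
people = np.loadtxt(text, dtype=dt)  # one record per line; '#' lines are comments

names = people['name']                           # a single column
second = people[1]                               # a single row (record)
young = people['name'][people['age'] < 30]       # names filtered by another field
by_score = np.sort(people, order='score')        # sort records by one field
print(by_score['name'])
```

Passing `order` to `np.sort` names the field (or list of fields) to sort by, so the whole records are reordered together rather than each column independently.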

## Further reading

* NumPy documentation, Structured arrays: http://docs.scipy.org/doc/numpy/user/basics.rec.html