## Data type (`dtype`)

In contrast to built-in Python containers (like lists), NumPy arrays can only store elements of a single, predetermined type. To see the type of an array's contents, use the `dtype` attribute. Let's look at two examples:

In [1]:
import numpy as np
a = np.array([1, 2, 3])
a.dtype

dtype('int64')

In [2]:
b = np.array([1., 2., 3.])
b.dtype

dtype('float64')

In the first case the numbers are 64-bit (8-byte) integers and in the second 64-bit floating point (real) numbers. NumPy auto-detects the data type from the input. The default integer type is platform-dependent: on Windows it was 32-bit (`int32`) before NumPy 2.0. Specialised data types allow us to store data more compactly in memory, but most of the time we simply work with floating point numbers.
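To see the memory savings of a specialised type, you can pass `dtype` explicitly and compare the `itemsize` (bytes per element) and `nbytes` (bytes for the whole array) attributes. This is a minimal sketch added for illustration, not a cell from the original notebook:

```python
import numpy as np

# Request the data type explicitly instead of relying on auto-detection
small = np.array([1, 2, 3], dtype=np.int8)     # 1 byte per element
large = np.array([1, 2, 3], dtype=np.float64)  # 8 bytes per element

print(small.dtype, small.itemsize, small.nbytes)  # int8 1 3
print(large.dtype, large.itemsize, large.nbytes)  # float64 8 24

# An existing array can be converted with astype (this returns a new array)
converted = small.astype(np.float32)
print(converted.dtype, converted.nbytes)  # float32 12
```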

All elements of an array must be of the same type. If we construct an array from elements of different types, they will be **cast** to the "most general" type that can represent all of them. For example, an array constructed from a mix of floating point numbers and integers will have a floating point data type:

In [3]:
a = np.array([1., 2])
a.dtype

dtype('float64')
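The same promotion applies to other combinations of numeric types, and `np.result_type` reports the type NumPy would choose for a given set of dtypes. A short sketch, not part of the original notebook:

```python
import numpy as np

mixed_float = np.array([True, 1.5])      # bool and float -> float64
mixed_complex = np.array([1, 2.5, 3j])   # int, float and complex -> complex128
print(mixed_float.dtype, mixed_complex.dtype)  # float64 complex128

# np.result_type applies the same promotion rules directly to dtypes
print(np.result_type(np.int8, np.int16))     # int16
print(np.result_type(np.int16, np.float32))  # float32: float32 holds every int16 exactly
print(np.result_type(np.int32, np.float32))  # float64: float32 cannot hold every int32 exactly
```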

Strings are stored using a fixed-width `dtype` that is just wide enough to hold the longest string:

In [4]:
np.array(['my', 'name', 'is', 'Bartosz'])

array(['my', 'name', 'is', 'Bartosz'], 
      dtype='<U7')

`<` gives the byte order (endianness): `<` is little-endian, `>` is big-endian, and `|` means that byte order does not apply. `U` stands for Unicode (`S` is used for byte strings, such as ASCII text), and `7` is the maximum length in characters (the length of `'Bartosz'`).
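Because the width is fixed when the array is created, assigning a longer string silently truncates it. The sketch below, added for illustration, also shows that a Unicode (`U`) character takes 4 bytes while a byte-string (`S`) character takes 1:

```python
import numpy as np

words = np.array(['my', 'name'])
print(words.dtype)           # <U4 on little-endian machines
print(words.dtype.itemsize)  # 16: NumPy stores 4 bytes per Unicode character

# The width was fixed at creation: longer strings are silently truncated
words[0] = 'Bartosz'
print(words[0])              # Bart

# 'S' arrays hold bytes, one byte per character
raw = np.array([b'my', b'name'])
print(raw.dtype.kind, raw.dtype.itemsize)  # S 4
```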

The most generic type is `object` (represented by the character code `'O'`), which can hold any Python object, even a function:

In [5]:
def f(): pass
a = np.array([f, f])
a.dtype

dtype('O')

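Many NumPy features, such as element-wise functions (`np.abs`, `np.sqrt`, etc.) and reductions (`np.sum`, `np.max`, etc.), work on object arrays only if the stored Python objects support the corresponding operation, and even then they are much slower than on numeric arrays. For an array of functions like the one above, they fail. All types of indexing still work.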
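A small sketch (not from the original notebook) of what "only if the objects support it" means in practice. For object arrays NumPy falls back to calling Python operations on each element: `abs()` and `+` exist for Python ints, whereas `np.sqrt` looks for a `.sqrt()` method that `int` does not have. The exact exception type for the failing call has varied between NumPy versions, so the sketch catches both candidates:

```python
import numpy as np

nums = np.array([-1, 4, 9], dtype=object)  # elements are ordinary Python ints

# abs() and + are defined for Python ints, so these work (element by element, in Python)
print(np.abs(nums))  # [1 4 9]
print(np.sum(nums))  # 12

# np.sqrt calls a .sqrt() method on each element; int has none, so this fails
try:
    np.sqrt(nums)
except (TypeError, AttributeError) as err:
    print('np.sqrt failed:', type(err).__name__)

# Slicing and fancy indexing work exactly as for numeric arrays
print(nums[1:], nums[[0, 2]])  # [4 9] [-1 9]
```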

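The `object` type is most commonly encountered when constructing an array from lists of different lengths. Since NumPy 1.24, such "ragged" input raises a `ValueError` unless you request `dtype=object` explicitly:

In [6]:
np.array([[1], [2, 3]], dtype=object)

array([list([1]), list([2, 3])], dtype=object)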
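One pitfall worth knowing: `dtype=object` only produces an array of lists when the lists have different lengths. With equal-length lists NumPy builds a 2-D array of individual objects instead. A sketch, added for illustration, of both cases and of one way to force a 1-D array of lists:

```python
import numpy as np

# Ragged input: an array of two Python lists
ragged = np.array([[1], [2, 3]], dtype=object)
print(ragged.shape)  # (2,)
print(ragged[1])     # [2, 3], a plain Python list

# Equal-length lists: dtype=object gives a 2-D array of ints, not an array of lists
square = np.array([[1, 2], [3, 4]], dtype=object)
print(square.shape)  # (2, 2)

# To get a 1-D array of equal-length lists, create an empty object array and fill it
lists = np.empty(2, dtype=object)
lists[0] = [1, 2]
lists[1] = [3, 4]
print(lists.shape, lists[0])  # (2,) [1, 2]
```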

### Quiz: Data types {.challenge}

Try to guess the data type of each of the following arrays. Then test your prediction by constructing the arrays and checking their `dtype` attribute.

```
a = np.array([[1, 2], 
               [2, 3]])
b = np.array(['a', 'b', 'c'])
c = np.array([1, 2, 'a'])
d = np.array([np.dot, np.array])
e = np.random.randn(5) > 0
f = np.arange(5)
```

### Exercise: Integer or real number? {.challenge}

Construct the array `x = np.array([0, 1, 2, 255], dtype=np.uint8)` (here, `uint8`
is an unsigned integer stored in a single byte, with values between 0 and 255). Can
you explain the results of `x + 1` and `x / 2`? Also try `x.astype(float) + 1` and `x.astype(float) / 2`.


## Structured arrays

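A `dtype` may also describe a structure made of several named fields. The fields may have different types, but every element of the array has the same structure. The `np.dtype` docstring describes the available options: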

In [7]:
np.info(np.dtype)

 dtype()

dtype(obj, align=False, copy=False)

Create a data type object.

A numpy array is homogeneous, and contains elements described by a
dtype object. A dtype object can be constructed from different
combinations of fundamental numeric types.

Parameters
----------
obj
    Object to be converted to a data type object.
align : bool, optional
    Add padding to the fields to match what a C compiler would output
    for a similar C-struct. Can be ``True`` only if `obj` is a dictionary
    or a comma-separated string. If a struct dtype is being created,
    this also sets a sticky alignment flag ``isalignedstruct``.
copy : bool, optional
    Make a new copy of the data-type object. If ``False``, the result
    may just be a reference to a built-in data-type object.

See also
--------
result_type

Examples
--------
Using array-scalar type:

>>> np.dtype(np.int16)
dtype('int16')

Structured type, one field name 'f1', containing int16:

>>> np.dtype([('f1', np.int16)])
dtype([('f1', '<i2')])
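The `align` parameter described above changes the memory layout of a structured type. Here is a sketch, not from the original notebook, comparing a packed and a C-aligned version of the two-field record used below. It uses the dictionary form, which the docstring lists as accepting `align=True`. The `fields` attribute maps each name to a `(dtype, byte offset)` pair:

```python
import numpy as np

spec = {'names': ['a', 'b'], 'formats': [np.int32, np.float64]}

packed = np.dtype(spec)               # fields stored back to back
aligned = np.dtype(spec, align=True)  # padded like the equivalent C struct

print(packed.names)                       # ('a', 'b')
print(packed.itemsize, aligned.itemsize)  # 12 16
# Byte offset of field 'b': right after the 4-byte int32, or padded to 8 bytes
print(packed.fields['b'][1], aligned.fields['b'][1])  # 4 8
```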

In [8]:
rec_array = np.zeros(5, dtype=[('a', np.int32), ('b', np.float64)])

Note the array is one-dimensional:

In [9]:
rec_array.shape

(5,)

but each element of the array has two fields:

In [10]:
rec_array[0]

(0, 0.0)

We can access each field by its name:

In [11]:
rec_array[0]['a']

0

In [12]:
rec_array['a']

array([0, 0, 0, 0, 0], dtype=int32)
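Selecting a field by name returns a view into the structured array, so fields can also be assigned to, and a whole record can be set from a tuple given in field order. A short sketch, added for illustration:

```python
import numpy as np

rec_array = np.zeros(5, dtype=[('a', np.int32), ('b', np.float64)])

# Assigning through a field name modifies the structured array in place
rec_array['a'] = np.arange(5)
rec_array['b'][1:3] = 2.5

# A whole record can be assigned from a tuple, in field order
rec_array[4] = (40, -1.0)

print(rec_array['a'].tolist())  # [0, 1, 2, 3, 40]
print(rec_array['b'].tolist())  # [0.0, 2.5, 2.5, 0.0, -1.0]
print(rec_array[2]['a'], rec_array[2]['b'])  # 2 2.5
```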

### Quiz: Shape and strides {.challenge}

Can you predict the `shape` and `strides` of the following array:

```
A = np.array([('a', 0),
              ('b', 1),
              ('c', 2)], dtype=[('name', '|S1'), 
                              ('value', np.int8)])
```

### Exercise: Structured data types {.challenge}

This exercise was written by Stéfan van der Walt (https://python.g-node.org/python-summerschool-2014/_media/numpy_advanced.tar.bz2)

Design a data-type for storing the following record:

 - Timestamp in nanoseconds (a 64-bit unsigned integer)
 - Position (x- and y-coordinates, stored as floating point numbers)

Use it to represent the following data:

```
dt = np.dtype(<your code here>)
x = np.array([(100, (0, 0.5)),
              (200, (0, 10.3)),
              (300, (5.5, 15.1))], dtype=dt)
```

Have a look at the ``np.dtype`` docstring if you need help.
After constructing ``x``, try to print all the ``x`` values for which time
is greater than 100 (hint: something of the form ``y[y > 100]``).

### Exercise: Structured file I/O {.challenge}

Modified from an exercise by Stéfan van der Walt (https://python.g-node.org/python-summerschool-2014/_media/numpy_advanced.tar.bz2)

Given the ``data.txt`` file with the following content:

In [13]:
%%file data.txt
#rank         lemma (10 letters max)      frequency       dispersion
21             they                        1865844         0.96
42             her                         969591          0.91
49             as                          829018          0.95
7              to                          6332195         0.98
63             take                        670745          0.97
14             you                         3085642         0.92
35             go                          1151045         0.93
56             think                       772787          0.91
28             not                         1638883         0.98

Overwriting data.txt


Design a suitable structured data type, then load the data from the text
file using ``np.loadtxt``.  Here's a skeleton to start with:

In [14]:
import numpy as np

data = np.loadtxt('data.txt', dtype=...)  # Modify this line

TypeError: data type not understood

Examine the data you loaded, for example:

 - Extract words only
 - Extract the 3rd row
 - Print all words with ``rank < 30``
 - Sort the data according to frequency (see ``np.sort``).

## Further reading

* NumPy Docs, http://docs.scipy.org/doc/numpy/user/basics.rec.html