# Understanding Data Types in Python

Effective data-driven science and computation requires understanding how data is stored and manipulated.
Python handles types differently from a statically typed language such as C: in C, the data type of each variable is explicitly declared, while in Python types are dynamically inferred at run time. This means, for example, that we can assign any kind of data to any variable:
```python
# Python code
x = 4
x = "four"
```
This sort of flexibility is one piece that makes Python convenient and easy to use.
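
As a small illustration of the point above, the built-in ``type()`` function shows that the type belongs to the object a name currently refers to, not to the name itself. The variable names ``first_type`` and ``second_type`` are only for this sketch:

```python
x = 4
first_type = type(x)     # the name x currently refers to an int

x = "four"               # rebinding the same name to a string is allowed
second_type = type(x)    # now x refers to a str

print(first_type.__name__, second_type.__name__)
```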

### A Python Integer Is More Than Just an Integer

The standard Python implementation is written in C.
This means that every Python object is simply a cleverly disguised C structure, which contains not only its value but other information as well. For example, in the definition ``x = 10000``, ``x`` is not just a "raw" integer. It's actually a pointer to a compound C structure:

```C
struct _longobject {
    long ob_refcnt;
    PyTypeObject *ob_type;
    size_t ob_size;
    long ob_digit[1];
};
```
A single Python integer therefore contains four pieces:
- ``ob_refcnt``, a reference count that helps Python silently handle memory allocation and deallocation
- ``ob_type``, which encodes the type of the variable
- ``ob_size``, which specifies the size of the following data members
- ``ob_digit``, which contains the actual integer value that we expect the Python variable to represent.

This means that there is some overhead in storing an integer in Python as compared to an integer in a compiled language like C:

![Integer Memory Layout](cint_vs_pyint.png)

Here ``PyObject_HEAD`` is the part of the structure containing the reference count, type code, and other pieces mentioned before.

* A C integer is essentially a label for a position in memory whose bytes encode an integer value.
* A Python integer is a pointer to a position in memory containing all the Python object information.
* This extra information is what allows Python to be coded so freely and dynamically.
* The extra information comes at a cost, which becomes especially apparent in structures that combine many objects.
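
The overhead can be made visible with a rough measurement. The sketch below assumes the standard CPython implementation; the exact byte counts depend on the platform and Python version, so no specific numbers are claimed here:

```python
import ctypes
import sys

x = 10000

# Size of the whole Python object: reference count, type pointer,
# size field, and the digits that hold the value.
py_size = sys.getsizeof(x)

# Size of a plain C long on this platform, for comparison.
c_size = ctypes.sizeof(ctypes.c_long)

print(f"Python int object: {py_size} bytes")
print(f"C long:            {c_size} bytes")
```

On a typical build, the Python object is several times larger than the raw C value.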

### A Python List Is More Than Just a List

Let's consider now what happens when we use a Python data structure that holds many Python objects.
The standard mutable multi-element container in Python is the list.
We can create a list of integers as follows:

In [None]:
L = list(range(10))
L

In [None]:
type(L[0])

Because of Python's dynamic typing, we can even create heterogeneous lists:

In [None]:
L3 = [True, "2", 3.0, 4]
[type(item) for item in L3]

* To allow these flexible types, each item in the list must contain its own extra information.
* In the special case that all variables are of **the same type**, this information is redundant: it can be much more efficient to store data in a fixed-type array.

![Array Memory Layout](array_vs_list.png)

Fixed-type NumPy-style arrays lack Python flexibility, but are much more efficient for **storing and manipulating data**.
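
A rough way to see the storage difference is to compare the memory used by a list of integers with that of an equivalent fixed-type array. This is a sketch, not a precise accounting: ``sys.getsizeof`` is applied to each element as an estimate (CPython shares some small integer objects), and ``nbytes`` counts only the array's data buffer, not its small fixed header.

```python
import sys
import numpy as np

n = 1000
py_list = list(range(n))
np_array = np.arange(n, dtype=np.int64)

# The list holds n pointers, and each pointer refers to a full int object.
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(i) for i in py_list)

# The array holds n raw 8-byte integers in one contiguous buffer.
array_bytes = np_array.nbytes

print(f"list estimate: {list_bytes} bytes")
print(f"array buffer:  {array_bytes} bytes")
```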

### Fixed-Type Arrays 

Python offers several different options for storing data in efficient, fixed-type data buffers.
The built-in ``array`` module can be used to create dense arrays of a uniform type:

In [None]:
import array
L = list(range(10))
A = array.array('i', L)
A

Here ``'i'`` is a type code indicating the contents are integers.
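
The fixed type is enforced, which a short experiment can show. In this sketch the ``rejected`` flag is only an illustrative name; the element size reported by ``itemsize`` depends on the platform's C ``int``:

```python
import array

A = array.array('i', range(10))
print(A.typecode, A.itemsize)   # type code and bytes per element

rejected = False
try:
    A.append(3.5)               # a float does not fit the 'i' type code
except TypeError as err:
    rejected = True
    print("rejected:", err)

print(len(A))                   # the array is unchanged
```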

### NumPy ``ndarray`` objects 

* provide efficient storage of array-based data
* add efficient *operations* on that data

#### Creating a NumPy array

In [None]:
import numpy as np

* Creating Arrays from Python Lists

In [None]:
np.array([1, 4, 2, 5, 3])

* Unlike Python lists, NumPy arrays hold elements of a single type. If the input types do not match, NumPy upcasts them where possible (here, integers are upcast to floating point):

In [None]:
np.array([3.14, 4, 2, 3])

* To explicitly set the data type of the resulting array, use the ``dtype`` keyword:

In [None]:
np.array([1, 2, 3, 4], dtype='float32')
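
The resulting type of each array can be checked through its ``dtype`` attribute. The following sketch repeats the three calls above with illustrative variable names; the inferred integer type varies by platform, so only its kind is noted:

```python
import numpy as np

inferred_int = np.array([1, 4, 2, 5, 3])       # all integers
inferred_float = np.array([3.14, 4, 2, 3])     # one float forces an upcast
explicit = np.array([1, 2, 3, 4], dtype='float32')

print(inferred_int.dtype)     # a platform-dependent integer type
print(inferred_float.dtype)   # float64
print(explicit.dtype)         # float32
```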

* NumPy arrays can explicitly be multi-dimensional. Here is one way to initialize a multidimensional array from a list of sequences (here, ``range`` objects); the inner sequences are treated as rows of the resulting two-dimensional array:

In [None]:
np.array([range(i, i + 3) for i in [2, 4, 6]])
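
To confirm how the nested input is interpreted, the ``ndim`` and ``shape`` attributes can be inspected. The name ``grid`` is used only for this sketch:

```python
import numpy as np

grid = np.array([range(i, i + 3) for i in [2, 4, 6]])

print(grid.ndim)    # number of dimensions
print(grid.shape)   # (rows, columns)
print(grid[0])      # the first inner sequence became the first row
```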

#### Creating Arrays using NumPy routines  

In [None]:
# Create a length-10 integer array filled with zeros
np.zeros(10, dtype=int)

In [None]:
# Create a 3x5 floating-point array filled with ones
np.ones((3, 5), dtype=float)

In [None]:
# Create a 3x5 array filled with 3.14
np.full((3, 5), 3.14)

In [None]:
# Create an array filled with a linear sequence
# Starting at 0, stopping before 20 (the stop value is excluded), stepping by 2
# (this is similar to the built-in range() function)
np.arange(0, 20, 2)

In [None]:
# Create an array of five values evenly spaced between 0 and 1
np.linspace(0, 1, 5)
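
The two routines treat their end value differently, which is a common source of off-by-one mistakes. A short comparison, with illustrative variable names:

```python
import numpy as np

stepped = np.arange(0, 20, 2)    # fixed step; the stop value is excluded
spaced = np.linspace(0, 1, 5)    # fixed count; the stop value is included

print(stepped[-1], len(stepped))
print(spaced)
```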

In [None]:
# Create a 3x3 array of uniformly distributed
# random values between 0 and 1
np.random.random((3, 3))

In [None]:
# Create a 3x3 array of normally distributed random values
# with mean 0 and standard deviation 1
np.random.normal(0, 1, (3, 3))

In [None]:
# Create a 3x3 array of random integers in the interval [0, 10)
np.random.randint(0, 10, (3, 3))
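
The stated intervals for the random routines can be checked directly. This sketch fixes a seed with ``np.random.seed`` purely as an assumption to make repeated runs comparable; the seed value itself is arbitrary:

```python
import numpy as np

np.random.seed(0)   # arbitrary seed, only so that reruns give the same draws

uniform = np.random.random((3, 3))            # values in [0, 1)
integers = np.random.randint(0, 10, (3, 3))   # values in [0, 10)

print(uniform.min() >= 0, uniform.max() < 1)
print(integers.min() >= 0, integers.max() < 10)
```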

In [None]:
# Create a 3x3 identity matrix
np.eye(3)

In [None]:
# Create an uninitialized array of three values (floating point by default)
# The values will be whatever happens to already exist at that memory location
np.empty(3)
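
As a summary of the fixed-value routines above, the sketch below prints the shape and data type each one produces. Note that routines called without a ``dtype`` argument, including ``np.empty``, default to ``float64``. The variable names are illustrative:

```python
import numpy as np

zeros = np.zeros(10, dtype=int)
ones = np.ones((3, 5), dtype=float)
filled = np.full((3, 5), 3.14)
identity = np.eye(3)
empty = np.empty(3)   # contents are arbitrary; only shape and dtype are defined

arrays = [("zeros", zeros), ("ones", ones), ("filled", filled),
          ("identity", identity), ("empty", empty)]
for name, arr in arrays:
    print(f"{name:8s} shape={arr.shape} dtype={arr.dtype}")
```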

#### NumPy Standard Data Types

| Data type     | Description |
|---------------|-------------|
| ``bool_``     | Boolean (True or False) stored as a byte |
| ``int_``      | Default integer type (same as C ``long``; normally either ``int64`` or ``int32``)| 
| ``intc``      | Identical to C ``int`` (normally ``int32`` or ``int64``)| 
| ``intp``      | Integer used for indexing (same as C ``ssize_t``; normally either ``int32`` or ``int64``)| 
| ``int8``      | Byte (-128 to 127)| 
| ``int16``     | Integer (-32768 to 32767)|
| ``int32``     | Integer (-2147483648 to 2147483647)|
| ``int64``     | Integer (-9223372036854775808 to 9223372036854775807)| 
| ``uint8``     | Unsigned integer (0 to 255)| 
| ``uint16``    | Unsigned integer (0 to 65535)| 
| ``uint32``    | Unsigned integer (0 to 4294967295)| 
| ``uint64``    | Unsigned integer (0 to 18446744073709551615)| 
| ``float_``    | Shorthand for ``float64`` (removed in NumPy 2.0; use ``float64`` instead)| 
| ``float16``   | Half precision float: sign bit, 5 bits exponent, 10 bits mantissa| 
| ``float32``   | Single precision float: sign bit, 8 bits exponent, 23 bits mantissa| 
| ``float64``   | Double precision float: sign bit, 11 bits exponent, 52 bits mantissa| 
| ``complex_``  | Shorthand for ``complex128`` (removed in NumPy 2.0; use ``complex128`` instead)| 
| ``complex64`` | Complex number, represented by two 32-bit floats| 
| ``complex128``| Complex number, represented by two 64-bit floats| 

More advanced type specification is possible, including compound data types; see the [NumPy documentation](http://numpy.org/).
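
The sketch below shows how the types in the table can be specified and inspected, followed by a minimal compound data type. The field names ``name``, ``age``, and ``weight`` are hypothetical and chosen only for illustration:

```python
import numpy as np

# A dtype can be given as a string or as the corresponding NumPy object.
small = np.zeros(4, dtype='int8')
wide = np.zeros(4, dtype=np.int64)
print(small.itemsize, wide.itemsize)   # bytes per element

# np.iinfo reports the representable range of an integer type.
info = np.iinfo(np.int16)
print(info.min, info.max)

# A compound (structured) dtype: each element holds several named fields.
person = np.dtype([('name', 'U10'), ('age', 'i4'), ('weight', 'f8')])
people = np.zeros(2, dtype=person)
people['age'] = [25, 45]
print(people.dtype.names)
print(people['age'])
```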