# 02.01 - Understanding Data Types in Python

For the purpose of our analysis, it is convenient to deal with all data as arrays of numbers.  

For this reason, we will use the **NumPy** (short for _Numerical Python_) package as our tool to efficiently store and manipulate numerical arrays.

In [1]:
import numpy as np
np.__version__

'1.15.4'

In [2]:
np?

In order to understand how Python and NumPy differ in handling data, it is useful to see how variables work in Python.

One key characteristic of Python is that it is dynamically typed, in contrast to statically typed languages such as C or Java, which require each variable to be explicitly declared.  

In [3]:
# C code
int result = 0;
for(int i=0; i<100; i++){
    result += i;
}

SyntaxError: invalid syntax (<ipython-input-3-07e0a7af6193>, line 2)

Since data types are dynamically inferred in Python, we can write the same loop as:

In [None]:
# Python code
result = 0
for i in range(100):
    result += i
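
As a minimal sketch of what dynamic typing permits, the same name can be rebound to objects of different types. The variable names here are illustrative only.

```python
# The same name can refer to an integer first and a string later
x = 4
first_type = type(x).__name__

x = "four"
second_type = type(x).__name__

print(first_type, second_type)
```

A C compiler would reject, or at least warn about, the equivalent assignment of a string to a variable declared as `int`.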

But, under the hood, the standard Python implementation is itself written in C. This means that a Python variable does not simply hold its value: it is a pointer to a compound C structure, which contains several values.  

## Integers

For example, an integer value in Python contains:

* <code>ob_refcnt</code>: a reference count that helps Python silently handle memory allocation and deallocation
* <code>ob_type</code>: encodes the type of the variable
* <code>ob_size</code>: specifies the size of the following data members
* <code>ob_digit</code>: contains the actual integer value that we expect the Python variable to represent

The key difference is:  

1. In **C**, an integer is essentially a label for a position in memory whose bytes encode an integer value;

2. In **Python**, an integer is a pointer to a position in memory containing all the Python object information, including the bytes that contain the integer value.

All this overhead gives Python the flexibility we very much enjoy, but comes at a cost in terms of both memory and execution time. 
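
The memory side of this overhead can be observed directly. The sketch below assumes CPython; the size reported by `sys.getsizeof` varies by version and platform, so no specific number is claimed for it. `struct.calcsize('=q')` is used only to obtain the size of a raw 64-bit integer.

```python
import struct
import sys

# Bytes used by one small Python integer object (CPython-specific)
python_int_size = sys.getsizeof(1)

# Bytes used by one raw 64-bit integer with no object header
raw_int_size = struct.calcsize('=q')

print(python_int_size, raw_int_size)
```

The Python object is larger because it carries the reference count, type pointer, and size field in addition to the digits themselves.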

## Lists

The standard mutable multi-element container in Python is the list. We can create a **list of integers** as follows:

In [None]:
L = list(range(10))
L

In [None]:
type(L[0])

Similarly, we create a **list of strings** as follows:

In [None]:
L2 = [str(c) for c in L]
L2

In [None]:
type(L2[0])

Because of Python's dynamic typing, we can even create heterogeneous lists:

In [None]:
L3 = [True, "2", 3.0, 4]
[type(item) for item in L3]

This flexibility is clearly useful. However, when all values are of the same data type, this leads to redundant information being stored.

NumPy-style arrays avoid this redundancy by storing every element with the same fixed type, which makes them more efficient for storing and manipulating data.
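
A rough way to see the redundancy is to compare the memory used by a list of integers with the memory used by a fixed-type array holding the same values. This is only an estimate under CPython: `sys.getsizeof` does not account for small integers being shared between objects, and the names `list_bytes` and `array_bytes` are illustrative.

```python
import sys
import numpy as np

n = 1000
L = list(range(n))

# A list holds one pointer per element, and each element is a full Python object
list_bytes = sys.getsizeof(L) + sum(sys.getsizeof(item) for item in L)

# A fixed-type array holds only the raw values in one contiguous buffer
arr = np.arange(n, dtype=np.int64)
array_bytes = arr.nbytes  # n elements * 8 bytes each

print(list_bytes, array_bytes)
```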

## Fixed-Type Arrays

Python offers several options for creating dense homogeneous arrays. For example, the built-in <code>array</code> module creates arrays of a uniform type, with the type code <code>'i'</code> indicating the <code>integer</code> data type in this case:

In [None]:
import array
L = list(range(10))
A = array.array('i', L)
A

However, more useful is the <code>ndarray</code> object in NumPy, which handles storage _and_ operations.
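
The difference in operations shows up as soon as arithmetic is attempted. In this small sketch, `A` and `N` hold the same values, but the `*` operator means different things for each container.

```python
import array
import numpy as np

L = list(range(10))
A = array.array('i', L)
N = np.array(L)

# For array.array, * means repetition, as it does for lists
repeated = A * 2

# For an ndarray, * is element-wise arithmetic
doubled = N * 2

print(len(repeated))
print(doubled)
```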

## Arrays from Python lists

First, we can use <code>np.array</code> to create arrays from Python lists:

In [None]:
# integer array:

np.array([1, 4, 2, 5, 3])

We can set the data type explicitly with the <code>dtype</code> keyword:

In [None]:
np.array([1, 2, 3, 4], dtype='float32')
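
Unlike a list, an array holds a single type. As a short sketch, when the input mixes integers and floats and no <code>dtype</code> is given, NumPy upcasts the integers to floating point. The resulting type can be checked through the <code>dtype</code> attribute.

```python
import numpy as np

# Mixed ints and floats: the integers are upcast to floating point
mixed = np.array([3.14, 4, 2, 3])

# An explicit dtype overrides the inferred one
explicit = np.array([1, 2, 3, 4], dtype='float32')

print(mixed.dtype, explicit.dtype)
```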

Multidimensional arrays:

In [None]:
# nested lists result in multi-dimensional arrays
np.array([range(i, i + 3) for i in [2, 4, 6]])
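
The inner lists are treated as the rows of the resulting two-dimensional array. The sketch below inspects the result through the <code>ndim</code> and <code>shape</code> attributes; the name `grid` is illustrative.

```python
import numpy as np

# Each inner range becomes one row of the 2-D array
grid = np.array([range(i, i + 3) for i in [2, 4, 6]])

print(grid.ndim)    # number of dimensions
print(grid.shape)   # size along each dimension
print(grid[1, 2])   # element at row 1, column 2
```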

## Creating Arrays from Scratch

In [None]:
# Create a length-10 integer array filled with zeros
np.zeros(10, dtype=int)

In [None]:
# Create a 3x5 floating-point array filled with ones
np.ones((3, 5), dtype=float)

In [None]:
# Create a 3x5 array filled with 3.14
np.full((3,5), 3.14)

In [None]:
# Create an array filled with a linear sequence
# Starting at 0, ending before 20, stepping by 2
# (this is similar to the built-in range() function)
np.arange(0, 20, 2)

In [None]:
# Create an array of five values evenly spaced between 0 and 0.5
np.linspace(0, 0.5, 5)
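
One difference between the two functions is easy to miss: <code>np.arange</code> excludes the stop value, while <code>np.linspace</code> includes both endpoints by default. A short sketch:

```python
import numpy as np

# arange excludes the stop value, like range()
stepped = np.arange(0, 20, 2)

# linspace includes both endpoints by default
spaced = np.linspace(0, 0.5, 5)

print(stepped[-1], len(stepped))
print(spaced[0], spaced[-1], len(spaced))
```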

In [None]:
# Create a 3x3 array of uniformly distributed
# random values between 0 and 1
np.random.random((3, 3))

In [None]:
# Create a 3x3 array of random integers in the interval [0, 10)
np.random.randint(0, 10, (3, 3))

In [None]:
# Create a 3x3 identity matrix
np.eye(3)

In [None]:
# Create an uninitialized array of three values (floating point by default)
# The values will be whatever happens to already exist at that memory location
np.empty(3)
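
Like the other creation routines, <code>np.empty</code> produces floating-point values unless told otherwise. The sketch below checks the default and shows how to request integers explicitly. The contents are arbitrary, so only the type and shape are inspected.

```python
import numpy as np

# Without a dtype argument, np.empty allocates float64 values
default_empty = np.empty(3)

# Pass dtype explicitly to get an uninitialized integer array
int_empty = np.empty(3, dtype=int)

print(default_empty.dtype, int_empty.dtype)
```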

## NumPy Standard Data Types

NumPy data types can be specified using a string:  

<code>np.zeros(10, dtype='int16')</code>
    
or using the associated NumPy object:

<code>np.zeros(10, dtype=np.int16)</code>
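
As a quick check, both spellings yield arrays with the same data type. The variable names below are illustrative.

```python
import numpy as np

by_string = np.zeros(10, dtype='int16')
by_object = np.zeros(10, dtype=np.int16)

# Both spellings produce the same dtype
same_dtype = by_string.dtype == by_object.dtype

# Each int16 element occupies 2 bytes
bytes_per_element = by_string.itemsize

print(same_dtype, bytes_per_element)
```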

Here is a list of standard NumPy data types:

<code>bool_</code> 	Boolean (True or False) stored as a byte  
<code>int_</code> 	Default integer type (same as C long; normally either int64 or int32)  
<code>intc</code> 	Identical to C int (normally int32 or int64)  
<code>intp</code> 	Integer used for indexing (same as C ssize_t; normally either int32 or int64)  
<code>int8</code> 	Byte (-128 to 127)  
<code>int16</code> 	Integer (-32768 to 32767)  
<code>int32</code> 	Integer (-2147483648 to 2147483647)  
<code>int64</code> 	Integer (-9223372036854775808 to 9223372036854775807)  
<code>uint8</code> 	Unsigned integer (0 to 255)  
<code>uint16</code> 	Unsigned integer (0 to 65535)  
<code>uint32</code> 	Unsigned integer (0 to 4294967295)  
<code>uint64</code> 	Unsigned integer (0 to 18446744073709551615)  
<code>float_</code> 	Shorthand for float64.  
<code>float16</code> 	Half precision float: sign bit, 5 bits exponent, 10 bits mantissa  
<code>float32</code> 	Single precision float: sign bit, 8 bits exponent, 23 bits mantissa  
<code>float64</code> 	Double precision float: sign bit, 11 bits exponent, 52 bits mantissa  
<code>complex_</code> 	Shorthand for complex128.  
<code>complex64</code> 	Complex number, represented by two 32-bit floats  
<code>complex128</code> 	Complex number, represented by two 64-bit floats  
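
The ranges and bit layouts listed above can be queried at runtime with <code>np.iinfo</code> and <code>np.finfo</code>. The following is a small sketch; the dictionary `limits` is only an illustrative container.

```python
import numpy as np

# Collect the minimum and maximum of a few integer types
limits = {}
for int_type in (np.int8, np.int16, np.uint8):
    info = np.iinfo(int_type)
    limits[int_type.__name__] = (info.min, info.max)

# Floating-point types report their total width and mantissa bits
float32_info = np.finfo(np.float32)
float32_bits = float32_info.bits
float32_mantissa_bits = float32_info.nmant

print(limits)
print(float32_bits, float32_mantissa_bits)
```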

More data types are available in the [NumPy Documentation](http://numpy.org/). 