# Data Types in Python and NumPy

Effective data-driven computation requires a solid understanding of how data is stored and manipulated. This notebook outlines and contrasts how arrays of data are handled in Python itself and how NumPy improves on this. We will explore much of this throughout the NumPy module, but for now, let's focus briefly on data types.

Unlike [statically typed languages](https://stackoverflow.com/questions/1517582/what-is-the-difference-between-statically-typed-and-dynamically-typed-languages), Python does not require us to declare the data type of a variable: the type is determined at run time from the value assigned to it. This dynamic typing is one of the reasons Python is slower than compiled, statically typed languages. Let's illustrate this with a simple code comparison with C.

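```c
/* C code */
#include <stdio.h>

int main(void) {
    int sum = 0;               /* the type of sum is fixed at declaration */
    for (int i = 0; i < 100; i++) {
        sum += i;
    }
    printf("%d\n", sum);       /* prints 4950 */
    return 0;
}
```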
The equivalent code in Python is:
```python
# Python
result = 0
for i in range(100):
    result += i
```
Looks neat, doesn't it? This code is easier to read than the `c` version because we don't have to declare the data types we are working with, and a variable can later be assigned a value of a different type whenever required.
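To make "change them easily" concrete, here is a minimal sketch (the variable name `x` is just for illustration): in Python the same name can be bound to values of different types, because the type belongs to the value rather than to the variable.

```python
# The same name can refer to values of different types over time
x = 4
print(type(x))    # <class 'int'>

x = "four"
print(type(x))    # <class 'str'>

# In C, the type of x would be fixed when it is declared (int x = 4;),
# so x could not later hold a string.
```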

In [2]:
mixed_list = ["2", 2, 2., True]
[type(item) for item in mixed_list]

[str, int, float, bool]

So a single Python list can hold elements of several different data types.

In [3]:
import numpy as np

In [7]:
# integer array:
a = np.array([3, 1, 4, 1, 5, 9, 6, 2, 5])
a

array([3, 1, 4, 1, 5, 9, 6, 2, 5])

In [8]:
a.dtype

dtype('int64')

In [9]:
a = np.array([3.14, 1, 5, 9, 6, 2, 5])
a

array([3.14, 1.  , 5.  , 9.  , 6.  , 2.  , 5.  ])

In [10]:
a.dtype

dtype('float64')

In [11]:
a = np.array([3.14, "1", 5, 9, 6, 2, 5])
a; a.dtype

dtype('<U32')

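**What do we see?**
When the input contains values of different types, NumPy does not store a mix of dtypes. Instead, it upcasts every element to a single common dtype that can represent all of them: integers mixed with a float become `float64`, and numbers mixed with a string become a fixed-width Unicode string (`<U32`).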
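A small sketch that checks a few of these upcasting combinations. `np.result_type` is NumPy's function for computing the common dtype directly; the exact width of the string dtype (such as `<U32`) may differ between NumPy versions, so the example only checks that it is a Unicode string (kind `'U'`).

```python
import numpy as np

# ints + a float -> everything becomes float64
print(np.array([1, 2, 3.5]).dtype)          # float64

# bools + ints -> everything becomes the default integer type
print(np.array([True, 2, 3]).dtype)

# numbers + a string -> everything becomes a Unicode string
mixed = np.array([3.14, "1", 5])
print(mixed.dtype.kind)                      # U
print(mixed)                                 # ['3.14' '1' '5']

# np.result_type computes the common dtype without building an array
print(np.result_type(np.int32, np.float32))  # float64
```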

Finally, unlike Python lists, NumPy arrays can explicitly be multi-dimensional; here's one way of initializing a multidimensional array using a list of lists:

In [12]:
# nested lists result in multi-dimensional arrays
np.array([range(i, i + 3) for i in [3, 1, 4]])

array([[3, 4, 5],
       [1, 2, 3],
       [4, 5, 6]])
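To confirm that the result really is two-dimensional, a quick sketch inspecting the array's `ndim` and `shape` attributes and indexing it with a (row, column) pair (the name `m` is just for illustration):

```python
import numpy as np

# Build the same 3x3 array as above from a list of ranges
m = np.array([range(i, i + 3) for i in [3, 1, 4]])

print(m.ndim)    # 2      -> number of dimensions
print(m.shape)   # (3, 3) -> size along each dimension
print(m[1, 2])   # 3      -> row 1, column 2 (zero-based)
```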

## A Speed Comparison of NumPy with Native Python

After reading the above, a question naturally arises: *if NumPy restricts an array to a single dtype, why should we learn it at all?*
The answer is *speed*. Dynamic typing is one of the reasons Python is slower than other languages. Because every element of a NumPy array has the same fixed type, NumPy can store the values in one contiguous block of memory and process them with compiled C loops, which makes it much faster than native Python. Don't believe me? Have a look for yourself.


In [15]:
%%time
a=[]
for i in range(int(1e6)):
    a.append(i)

CPU times: user 235 ms, sys: 18.1 ms, total: 253 ms
Wall time: 249 ms


In [17]:
%%time
a = [i for i in range(int(1e6))]

CPU times: user 112 ms, sys: 38.3 ms, total: 150 ms
Wall time: 147 ms


In [18]:
%%time
a = np.arange(1e6)

CPU times: user 54.2 ms, sys: 593 µs, total: 54.8 ms
Wall time: 51.4 ms


In this run, building a list of one million numbers took about 250 ms with a Python `for` loop and about 150 ms with a list comprehension, but only about 51 ms with `np.arange`: roughly a 5x speedup over the loop (and about 3x over the comprehension). Exact timings vary between machines and runs, but for large numerical workloads it pays to use NumPy arrays instead of Python lists.
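`%%time` measures a single run, so the numbers above are noisy. As a sketch of a somewhat fairer comparison, the standard-library `timeit` module can repeat each statement several times; here we sum one million numbers instead of just creating them. The array is created with `dtype=np.int64` (an assumption made so the sum cannot overflow on platforms whose default integer is 32-bit). The absolute times and the exact speedup depend on your machine and NumPy version.

```python
import timeit
import numpy as np

N = 1_000_000
py_list = list(range(N))
np_arr = np.arange(N, dtype=np.int64)

# Both approaches compute the same total
assert sum(py_list) == int(np_arr.sum())

# Run each statement 10 times per measurement, keep the best of 3 measurements
py_time = min(timeit.repeat(lambda: sum(py_list), number=10, repeat=3))
np_time = min(timeit.repeat(lambda: np_arr.sum(), number=10, repeat=3))

print(f"Python sum: {py_time / 10 * 1e3:.2f} ms per call")
print(f"NumPy sum:  {np_time / 10 * 1e3:.2f} ms per call")
```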

## Creating Arrays from Scratch

We have seen that NumPy arrays hold a single dtype and how this speeds up large computations. The next question is: how do we create arrays with NumPy? This section walks through NumPy's built-in routines for creating arrays from scratch.

In [19]:
# Create a length-10 integer array filled with zeros
np.zeros(10, dtype=int)

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

In [21]:
# Create a 2x3 floating-point array filled with ones
np.ones((2, 3), dtype=float)

array([[1., 1., 1.],
       [1., 1., 1.]])

In [22]:
# Create a 2x3 array filled with value of pi
np.full((2, 3), 3.14)

array([[3.14, 3.14, 3.14],
       [3.14, 3.14, 3.14]])

In [25]:
# Create an array filled with a linear sequence: multiples of 5
# starting at 0 and stopping before 50, stepping by 5
# (this is similar to the built-in range() function)
np.arange(0, 50, 5)

array([ 0,  5, 10, 15, 20, 25, 30, 35, 40, 45])

In [27]:
# Create an array of five values evenly spaced between 0 and 1
np.linspace(0, 1, 5)

array([0.  , 0.25, 0.5 , 0.75, 1.  ])
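One difference between the two routines is easy to trip over: `np.arange` excludes its stop value (like `range()`), while `np.linspace` includes it unless `endpoint=False` is passed. A short sketch:

```python
import numpy as np

# arange excludes the stop value, exactly like range()
steps = np.arange(0, 50, 5)
print(steps.tolist() == list(range(0, 50, 5)))   # True

# linspace includes the stop value by default: 0, 0.25, ..., 1.0
with_end = np.linspace(0, 1, 5)

# endpoint=False drops it, giving 0, 0.2, 0.4, 0.6, 0.8
without_end = np.linspace(0, 1, 5, endpoint=False)
```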

In [28]:
# Create a 3x3 array of uniformly distributed
# random values between 0 and 1
np.random.random((3, 3))

array([[0.89881275, 0.45645897, 0.29092413],
       [0.93800677, 0.90552942, 0.52288151],
       [0.01085913, 0.66908311, 0.46687694]])

In [29]:
# Create a 3x3 array of random integers in the interval [0, 10)
np.random.randint(0, 10, (3, 3))

array([[4, 0, 5],
       [4, 5, 1],
       [5, 9, 1]])
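Because these values are random, re-running the cells above gives different numbers. If reproducible results are needed, one option (assuming NumPy 1.17 or later) is a seeded generator from `np.random.default_rng`, whose `random` and `integers` methods play the same roles as `np.random.random` and `np.random.randint`. The seed `42` is an arbitrary choice.

```python
import numpy as np

# A seeded generator produces the same sequence every time it is created
rng = np.random.default_rng(42)

uniform = rng.random((3, 3))             # floats in [0, 1)
integers = rng.integers(0, 10, (3, 3))   # ints in [0, 10)

# Re-creating the generator with the same seed repeats the results
rng_again = np.random.default_rng(42)
assert np.array_equal(uniform, rng_again.random((3, 3)))
```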

In [30]:
# Create a 3x3 identity matrix
np.eye(3)

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

In [31]:
# Create an uninitialized array of three values (the default dtype is float64)
# The values will be whatever happens to already exist at that memory location
np.empty(3)

array([1., 1., 1.])
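Since `np.empty` skips initialization, the `1.` values above simply happened to be in memory and should not be relied on. A sketch of the safe pattern: use `np.empty` only when every element will be overwritten before it is read (the name `squares` is just for illustration).

```python
import numpy as np

# Allocate without initializing, then overwrite every element
squares = np.empty(5, dtype=int)
for i in range(5):
    squares[i] = i * i

print(squares)   # [ 0  1  4  9 16]
```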

## NumPy Standard Data Types
NumPy arrays contain values of a single type, so it is important to have detailed knowledge of those types and their limitations. Because NumPy is built in C, the types will be familiar to users of C, Fortran, and other related languages.

The standard NumPy data types are listed in the following table. Note that when constructing an array, they can be specified using a string:
```python
np.zeros(10, dtype='int16')
```
Or using the associated NumPy object:
```python
np.zeros(10, dtype=np.int16)
```
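A quick sketch confirming that the two spellings produce the same dtype, and showing how the dtype determines the memory used per element via the `itemsize` and `nbytes` attributes. It also uses `astype` to convert an existing array to another type.

```python
import numpy as np

a = np.zeros(10, dtype='int16')
b = np.zeros(10, dtype=np.int16)

# Both spellings produce the same dtype
print(a.dtype == b.dtype)   # True

# The dtype fixes how many bytes each element occupies
print(a.itemsize)           # 2  (bytes per int16 element)
print(a.nbytes)             # 20 (bytes for 10 elements)

# astype returns a copy converted to another dtype
c = a.astype(np.float32)
print(c.dtype, c.itemsize)  # float32 4
```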

| Data type   | Description                                                                    |
|-------------|--------------------------------------------------------------------------------|
| bool_       | Boolean (True or False) stored as a byte                                       |
| int_        | Default integer type (same as C long; normally either int64 or int32)          |
| intc        | Identical to C int (normally int32 or int64)                                   |
| intp        | Integer used for indexing (same as C ssize_t; normally either int32 or int64)  |
| int8        | Byte (-128 to 127)                                                             |
| int16       | Integer (-32768 to 32767)                                                      |
| int32       | Integer (-2147483648 to 2147483647)                                            |
| int64       | Integer (-9223372036854775808 to 9223372036854775807)                          |
| uint8       | Unsigned integer (0 to 255)                                                    |
| uint16      | Unsigned integer (0 to 65535)                                                  |
| uint32      | Unsigned integer (0 to 4294967295)                                             |
| uint64      | Unsigned integer (0 to 18446744073709551615)                                   |
| float_      | Shorthand for float64 (alias removed in NumPy 2.0; use float64)                |
| float16     | Half precision float: sign bit, 5 bits exponent, 10 bits mantissa              |
| float32     | Single precision float: sign bit, 8 bits exponent, 23 bits mantissa            |
| float64     | Double precision float: sign bit, 11 bits exponent, 52 bits mantissa           |
| complex_    | Shorthand for complex128 (alias removed in NumPy 2.0; use complex128)          |
| complex64   | Complex number, represented by two 32-bit floats                               |
| complex128  | Complex number, represented by two 64-bit floats                               |
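Rather than memorizing the ranges in the table, you can query them with `np.iinfo` (integer types) and `np.finfo` (floating-point types). The sketch below also shows one practical limitation of fixed-size types: integer arrays wrap around silently when a result exceeds the type's range (the name `small` is just for illustration).

```python
import numpy as np

# Integer ranges, matching the table
print(np.iinfo(np.int8).min, np.iinfo(np.int8).max)   # -128 127
print(np.iinfo(np.uint16).max)                        # 65535

# Floating-point layout: exponent bits and mantissa bits
for t in (np.float16, np.float32, np.float64):
    info = np.finfo(t)
    print(t.__name__, info.nexp, info.nmant)
# float16 5 10
# float32 8 23
# float64 11 52

# Overflow in an integer array wraps around without raising an error
small = np.array([127], dtype=np.int8)
print(small + 1)    # [-128]
```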

More advanced type specification is possible, such as specifying big or little endian numbers; for more information, refer to the [NumPy documentation](http://numpy.org/). NumPy also supports compound data types, which will be covered in Structured Data: [NumPy's Structured Arrays.](https://render.githubusercontent.com/view/02.09-Structured-Data-NumPy.ipynb)
