__Author__: Christian Camilo Urcuqui López

__Date__: 2 October 2018

![image](../Utilities/NumPy_logo.png)


NumPy is a useful tool for numerical tasks: it provides mechanisms for storing data and operating on it efficiently as arrays grow larger in size. It is one of the most important foundational packages for numerical computing in Python (many scientific packages build on it).

NumPy provides, among other things:

+ ndarray, an efficient multidimensional array providing fast array-oriented arithmetic operations.
+ Mathematical functions for fast operations on entire arrays of data without having to write loops.
+ Tools for reading/writing array data to disk and working with memory-mapped files.
+ Linear algebra, random number generation, and Fourier transform capabilities.
+ A C API for connecting NumPy with libraries written in C, C++, or FORTRAN.
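
As a rough illustration of the tools listed above, the following sketch touches several of them in a few lines. The variable names and the in-memory buffer (used here instead of a real file on disk) are choices made only for this example.

```python
import io
import numpy as np

# Array-oriented arithmetic: the square root is applied to every element, no loop needed
values = np.array([1.0, 4.0, 9.0])
roots = np.sqrt(values)

# Linear algebra: solve the system A @ x = b
A = np.array([[2.0, 0.0], [0.0, 4.0]])
b = np.array([2.0, 8.0])
x = np.linalg.solve(A, b)

# Random number generation (seeded so that the run is repeatable)
rng = np.random.RandomState(0)
samples = rng.randn(3)

# Fourier transform of a constant signal: all the energy ends up in the first bin
spectrum = np.fft.fft(np.ones(4))

# Writing and reading array data; an in-memory buffer stands in for a file on disk
buffer = io.BytesIO()
np.save(buffer, values)
buffer.seek(0)
restored = np.load(buffer)

print(roots, x, samples.shape, spectrum.real, restored)
```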


The NumPy user guide is available at https://docs.scipy.org/doc/numpy/user/index.html

NumPy is so important for numerical computations in Python because it is designed for efficiency on large arrays of data. There are a number of reasons for this:
+ NumPy internally stores data in a contiguous block of memory, independent of other built-in Python objects. NumPy's library of algorithms written in the C language can operate on this memory without any type checking or other overhead.
+ NumPy operations perform complex computations on entire arrays without the need for Python for loops.
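
The storage difference described above can be inspected directly. This is a minimal sketch, assuming CPython; the byte count for the list is only an approximation obtained with `sys.getsizeof`, and the variable names are illustrative.

```python
import sys
import numpy as np

arr = np.arange(1000, dtype=np.int64)

# The array keeps its values in a single contiguous buffer
print(arr.flags['C_CONTIGUOUS'])   # whether the data is laid out contiguously (C order)
print(arr.itemsize)                # bytes used by each element
print(arr.nbytes)                  # total bytes used by the data buffer

# A Python list stores references to separate int objects,
# so its footprint is the list itself plus every int object
py_list = list(range(1000))
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(x) for x in py_list)
print(list_bytes)
```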


In [1]:
# Import NumPy and check the installed version

import numpy
numpy.__version__

'1.14.3'

In [12]:
# Let's see the performance of numpy, we are going to use a NumPy array of one million integers
# and the equivalent Python list:

import numpy as np

my_arr = np.arange(1000000)

my_list = list(range(1000000))


In [17]:
%time for _ in range(10): my_arr2 = my_arr * 2

Wall time: 20.1 ms


In [16]:
%time for _ in range(10): my_list2 = [x * 2 for x in my_list]

Wall time: 793 ms
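
The `%time` magic used above is only available in IPython/Jupyter. As a sketch of the same comparison in plain Python, the standard `timeit` module can be used instead; the exact timings depend on the machine, so no particular numbers should be expected.

```python
import timeit
import numpy as np

my_arr = np.arange(1000000)
my_list = list(range(1000000))

# Run each operation 10 times, as in the cells above, and report the total time
numpy_seconds = timeit.timeit(lambda: my_arr * 2, number=10)
list_seconds = timeit.timeit(lambda: [x * 2 for x in my_list], number=10)

print("NumPy array: %.4f s" % numpy_seconds)
print("Python list: %.4f s" % list_seconds)
```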


As a recommendation, most people in the data science (SciPy/PyData) world import numpy using np as an alias:

In [2]:
import numpy as np

## The NumPy ndarray: A Multidimensional Array Object

One of the most important features of NumPy is its ndarray, which is a fast, flexible container for large datasets in Python. This kind of object allows us to perform mathematical operations on whole blocks of data using syntax similar to the equivalent operations between scalar elements.

In order to see the NumPy functionality, we are going to start with a small sample of random data.

In [18]:
import numpy as np

# random.randn generates random data (standard normal); in this case a 2x3 matrix
data = np.random.randn(2,3)
data

array([[-1.08899403, -0.39095681, -0.5079292 ],
       [ 0.39371492, -0.59242131, -0.12346692]])

We are going to apply some mathematical operations to multidimensional array objects:

In [19]:
data * 10

array([[-10.88994026,  -3.90956805,  -5.07929203],
       [  3.9371492 ,  -5.92421313,  -1.23466916]])

In [20]:
data + data

array([[-2.17798805, -0.78191361, -1.01585841],
       [ 0.78742984, -1.18484263, -0.24693383]])
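
To make the element-wise behaviour explicit, the sketch below uses a small fixed array (instead of random data, so the values are easy to check) and compares the array expression with the equivalent nested Python loops.

```python
import numpy as np

data = np.array([[1.0, 2.0, 3.0],
                 [4.0, 5.0, 6.0]])

# Array expressions: the operation is applied to every element
scaled = data * 10
doubled = data + data

# The same scaling written with explicit loops over rows and columns
looped = np.empty_like(data)
for i in range(data.shape[0]):
    for j in range(data.shape[1]):
        looped[i, j] = data[i, j] * 10

print(np.array_equal(scaled, looped))
```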

An _ndarray_ is a generic multidimensional container for homogeneous data (all of the elements must be of the same type). Every array has a _shape_, a tuple indicating the size of each dimension, and a _dtype_, an object describing the _data type_ of the array:

In [22]:
data.shape

(2, 3)

In [23]:
data.dtype

dtype('float64')
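
The homogeneity requirement mentioned above has a visible consequence: when the input mixes integers and floats, NumPy picks one type that can hold all the values. The following is a small sketch with illustrative variable names.

```python
import numpy as np

# Integers and a float in the same input: every element is stored as float64
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)

# shape, ndim and size describe the layout of a three-dimensional array
cube = np.zeros((2, 3, 4))
print(cube.shape)   # size of each dimension
print(cube.ndim)    # number of dimensions
print(cube.size)    # total number of elements
```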

## Creating ndarrays

The function _array_ is one of the ways to create an array. It accepts any sequence-like object (including other arrays) and produces a new NumPy array containing the passed data.

In [27]:
data1 = [6, 7.5, 8, 0, 1]

arr1 = np.array(data1)

print(arr1.dtype)

arr1

float64


array([6. , 7.5, 8. , 0. , 1. ])

In [29]:
data2 = [[1,2,3,4],[5,6,7,8]]

arr2 = np.array(data2)
print(arr2.dtype)
arr2

int32


array([[1, 2, 3, 4],
       [5, 6, 7, 8]])

In [34]:
print("shape: "+str(arr2.shape))
print("dimension: "+str(arr2.ndim))

shape: (2, 4)
dimension: 2
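
When no type is given, _np.array_ infers one from the data, and the default integer type can differ between platforms (for instance int32 or int64). As a sketch, the `dtype` argument can be passed to request a specific type explicitly:

```python
import numpy as np

data2 = [[1, 2, 3, 4], [5, 6, 7, 8]]

# Inferred integer type: depends on the platform and NumPy version
arr_default = np.array(data2)

# Explicitly requested types
arr_int64 = np.array(data2, dtype=np.int64)
arr_float = np.array(data2, dtype=np.float64)

# Other sequence-like objects, such as tuples, are accepted as well
from_tuple = np.array((1, 2, 3))

print(arr_default.dtype, arr_int64.dtype, arr_float.dtype, from_tuple.shape)
```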


Besides _np.array_, NumPy offers other functions for creating arrays. For example, _zeros_ and _ones_ create arrays of 0s or 1s, respectively, given a length or a shape. The _empty_ function creates an array without initializing its values, so its contents are arbitrary (the zeros shown in the output of _np.empty_ are not guaranteed).

In [35]:
np.zeros(10) # we are defining the length

array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

In [36]:
np.zeros((3,5)) # we are defining the shape

array([[0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.]])

In [37]:
np.empty((2, 3, 2))

array([[[0., 0.],
        [0., 0.],
        [0., 0.]],

       [[0., 0.],
        [0., 0.],
        [0., 0.]]])

In [38]:
np.arange(1,20)

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17,
       18, 19])

In [41]:
np.eye(5,5) # the identity matrix

array([[1., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0.],
       [0., 0., 1., 0., 0.],
       [0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 1.]])
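
A few related creation functions follow the same pattern as the ones above. This is a brief sketch with illustrative names; note that the array returned by _np.empty_ is filled explicitly before it is used, since its initial contents are arbitrary.

```python
import numpy as np

ones = np.ones((2, 3))            # array of 1s with the given shape
sevens = np.full((2, 2), 7.0)     # array filled with a constant value

# The *_like functions copy the shape and dtype of an existing array
template = np.arange(6).reshape(2, 3)
zeros_like = np.zeros_like(template)

# np.empty only allocates memory, so assign values before reading them
uninitialized = np.empty((2, 3))
uninitialized.fill(0.5)

print(ones, sevens, zeros_like, uninitialized, sep="\n")
```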

## NumPy Data Types

A NumPy dtype describes the kind of data contained in an _ndarray_. Earlier, the _dtype_ attribute told us the type of the elements stored in an array. In some projects it is important to manage data types carefully, because each type occupies a fixed number of bytes in memory; when resources are limited, choosing smaller types is one way to reduce memory use.

The following URL documents the available data types:

https://docs.scipy.org/doc/numpy-1.13.0/user/basics.types.html

In [42]:
arr =  np.array([1, 2, 3, 4, 5])
arr.dtype

dtype('int32')

In [43]:
# we can change the type of the variables 
float_arr =  arr.astype(np.float64)
float_arr.dtype

dtype('float64')
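
The effect of the data type on memory use can be checked with the `nbytes` attribute, and _astype_ can also convert in other directions. The sketch below uses illustrative names; note that casting floats to integers discards the fractional part.

```python
import numpy as np

# Same number of elements, different element sizes
wide = np.zeros(1000, dtype=np.float64)
narrow = wide.astype(np.float32)
print(wide.nbytes, narrow.nbytes)

# Converting floats to integers truncates the decimal part
floats = np.array([3.7, -1.2, 0.5])
truncated = floats.astype(np.int32)
print(truncated)

# Strings that represent numbers can be converted to a numeric type
numeric_strings = np.array(['1.25', '-9.6', '42'])
parsed = numeric_strings.astype(np.float64)
print(parsed)
```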

In [None]:
# we can define 

# References

+ McKinney, W. (2012). Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython. O'Reilly Media, Inc.