## Chapter 4

# NumPy: Arrays and Vectorized Computation

NumPy internally stores data in a contiguous block of memory, independent of
other built-in Python objects.
NumPy’s library of algorithms written in the C language can operate on this memory without any type checking or other overhead.
NumPy arrays also use much less memory than built-in Python sequences.
NumPy operations perform complex computations on entire arrays without the
need for Python for loops.
To give you an idea of the performance difference, consider a NumPy array of one
million integers, and the equivalent Python list:

In [1]:
import numpy as np

In [2]:
my_arr = np.arange(1000000)

In [3]:
my_list = list(range(1000000))

In [4]:
%time for _ in range(10): my_arr2 = my_arr * 2

CPU times: user 15.8 ms, sys: 20.4 ms, total: 36.2 ms
Wall time: 38.7 ms


In [5]:
%time for _ in range(10): my_list2 = [x * 2 for x in my_list]

CPU times: user 601 ms, sys: 112 ms, total: 713 ms
Wall time: 732 ms


NumPy-based algorithms are generally 10 to 100 times faster (or more) than their
pure Python counterparts and use significantly less memory.
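
The memory claim can be checked roughly with a sketch like the following. It assumes CPython, where sys.getsizeof reports the size of each object separately; the names arr_bytes and list_bytes are illustrative only, and the exact numbers will vary by platform and Python version.

```python
import sys
import numpy as np

n = 1_000_000
my_arr = np.arange(n)
my_list = list(range(n))

# Bytes used by the array's data buffer: one fixed-size item per element.
arr_bytes = my_arr.nbytes

# Rough footprint of the list: the list object itself (an array of pointers)
# plus every int object it points to.
list_bytes = sys.getsizeof(my_list) + sum(sys.getsizeof(x) for x in my_list)

print(f"ndarray: {arr_bytes / 1e6:.1f} MB")
print(f"list:    {list_bytes / 1e6:.1f} MB")
```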

# 4.1 The NumPy ndarray: A Multidimensional Array Object

One of the key features of NumPy is its N-dimensional array object, or ndarray,
which is a fast, flexible container for large datasets in Python. Arrays enable you to
perform mathematical operations on whole blocks of data using similar syntax to the
equivalent operations between scalar elements.

In [6]:
data = np.random.randn(2,3)

In [7]:
data

array([[-1.8122404 ,  1.97449462,  1.26997335],
       [ 1.17661161,  0.29365201,  0.79380771]])

In [8]:
data * 10

array([[-18.12240403,  19.74494621,  12.69973346],
       [ 11.76611612,   2.93652011,   7.93807714]])

In [9]:
data + data

array([[-3.62448081,  3.94898924,  2.53994669],
       [ 2.35322322,  0.58730402,  1.58761543]])
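
To make the "similar syntax to scalar operations" point concrete, here is a small sketch comparing array expressions with an explicit Python loop over the same values. The array values and the name looped are made up for illustration.

```python
import numpy as np

data = np.array([[1.5, -0.1, 3.0],
                 [0.0, -3.0, 6.5]])

# Vectorized: one expression applies to every element.
scaled = data * 10
doubled = data + data

# Equivalent element-by-element loop, shown only for comparison.
looped = np.empty_like(data)
for i in range(data.shape[0]):
    for j in range(data.shape[1]):
        looped[i, j] = data[i, j] * 10

print(np.allclose(scaled, looped))
print(np.array_equal(doubled, data * 2))
```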

An ndarray is a generic multidimensional container for homogeneous data; that is, all
of the elements must be the same type. 

Every array has a shape, a tuple indicating the
size of each dimension, and a dtype, an object describing the data type of the array:

In [10]:
data.shape

(2, 3)

In [11]:
data.dtype

dtype('float64')
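
A short sketch of how these attributes relate, and of what "homogeneous" means in practice. The sample values are arbitrary; shape, ndim, size, and dtype are standard ndarray attributes.

```python
import numpy as np

# Mixing ints and floats still yields a single element type: the ints
# are stored as floats so that every element shares one dtype.
mixed = np.array([[1, 2.5, 3],
                  [4, 5, 6]])

print(mixed.dtype)   # float64
print(mixed.shape)   # (2, 3)
print(mixed.ndim)    # 2, the length of the shape tuple
print(mixed.size)    # 6, the product of the shape entries
```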

# a. Creating ndarrays

In [12]:
data1 = [1,2,3.45,6,7,22]

In [13]:
arr1 = np.array(data1)

In [14]:
arr1

array([ 1.  ,  2.  ,  3.45,  6.  ,  7.  , 22.  ])

Nested sequences, like a list of equal-length lists, will be converted into a multidimensional array:

In [15]:
data2 = [[1,2,3,4],[5,6,7,8]] # This is a list of lists

In [16]:
arr2 = np.array(data2)

In [17]:
arr2

array([[1, 2, 3, 4],
       [5, 6, 7, 8]])

Since data2 was a list of lists, the NumPy array arr2 has two dimensions with shape
inferred from the data. We can confirm this by inspecting the ndim and shape
attributes:

In [20]:
arr2.ndim

2

In [21]:
arr2.shape

(2, 4)

Unless explicitly specified, np.array tries to infer a good data
type for the array that it creates. The data type is stored in a special dtype metadata
object; for example, in the previous two examples we have:

In [22]:
arr1.dtype

dtype('float64')

In [24]:
arr2.dtype

dtype('int64')
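
When the inferred type is not the one you want, np.array accepts a dtype argument. A minimal sketch with arbitrary sample values; note that the default integer type is platform dependent (int64 on most Linux and macOS builds).

```python
import numpy as np

ints = np.array([1, 2, 3])                        # inferred integer dtype
floats = np.array([1, 2, 3], dtype=np.float64)    # forced to float64
small = np.array([1, 2, 3], dtype=np.int32)       # forced to 32-bit ints

print(ints.dtype, floats.dtype, small.dtype)
print(small.itemsize)   # 4 bytes per element
```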

In addition to np.array, there are a number of other functions for creating new arrays.

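np.zeros creates an array of zeros with a given length or shape, and np.empty allocates an array without initializing its values. To create a higher-dimensional array, pass a tuple for the shape. The tuple reads naturally from right to left: in (4, 3, 2), each innermost row has 2 elements, each block has 3 rows, and there are 4 blocks.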

In [25]:
np.zeros(10)

array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

In [68]:
np.zeros((3,6))

array([[0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.]])

In [70]:
np.zeros((4,3,2))

array([[[0., 0.],
        [0., 0.],
        [0., 0.]],

       [[0., 0.],
        [0., 0.],
        [0., 0.]],

       [[0., 0.],
        [0., 0.],
        [0., 0.]],

       [[0., 0.],
        [0., 0.],
        [0., 0.]]])
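
One way to read a shape such as (4, 3, 2) is to index into the array and watch the leading dimensions drop away. The sketch below is only an illustration; the name blocks is made up.

```python
import numpy as np

blocks = np.zeros((4, 3, 2))

print(blocks.shape)         # (4, 3, 2): 4 blocks, each with 3 rows of 2 elements
print(blocks[0].shape)      # (3, 2): one block
print(blocks[0, 0].shape)   # (2,): one row inside that block
print(blocks.size)          # 24 elements in total
```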

In [37]:
np.empty((2,3,2))

array([[[1.38578615e-316, 0.00000000e+000],
        [0.00000000e+000, 0.00000000e+000],
        [0.00000000e+000, 0.00000000e+000]],

       [[0.00000000e+000, 0.00000000e+000],
        [0.00000000e+000, 0.00000000e+000],
        [0.00000000e+000, 0.00000000e+000]]])

NOTE: It’s not safe to assume that np.empty will return an array of all zeros. In some cases, it may return uninitialized “garbage” values.
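
A common pattern, sketched here with made-up values, is to use np.empty only as scratch space that is fully overwritten before it is read, and to use np.zeros when the initial contents matter.

```python
import numpy as np

# np.empty allocates without initializing: the contents are arbitrary
# until every element has been assigned.
buf = np.empty((2, 3))
buf[:] = 7.0          # overwrite all elements before reading any of them

# np.zeros guarantees the initial contents.
zeros = np.zeros((2, 3))

print(buf)
print(zeros)
```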

### arange

arange is an array-valued version of the built-in Python range function:

In [32]:
range(5) # the built-in Python range function

range(0, 5)

In [29]:
np.arange(15)

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])
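
Like range, np.arange also accepts start, stop, and step arguments, and unlike range it accepts non-integer steps. A brief sketch with arbitrary values:

```python
import numpy as np

stepped = np.arange(2, 10, 3)
print(stepped.tolist())          # [2, 5, 8]
print(list(range(2, 10, 3)))     # [2, 5, 8]

# Non-integer steps work too; the stop value is excluded.
quarters = np.arange(0, 1, 0.25)
print(quarters.tolist())         # [0.0, 0.25, 0.5, 0.75]
```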

# b. Data Types for ndarrays