## Chapter 4

# NumPy: Arrays and Vectorized Computation

NumPy internally stores data in a contiguous block of memory, independent of
other built-in Python objects.
NumPy’s library of algorithms written in the C language can operate on this memory without any type checking or other overhead.
NumPy arrays also use much less memory than built-in Python sequences.
NumPy operations perform complex computations on entire arrays without the
need for Python for loops.
To give you an idea of the performance difference, consider a NumPy array of one
million integers, and the equivalent Python list:

In [2]:
import numpy as np

In [2]:
my_arr = np.arange(1000000)

In [3]:
my_list = list(range(1000000))

In [4]:
%time for _ in range(10): my_arr2 = my_arr * 2

CPU times: user 15.8 ms, sys: 20.4 ms, total: 36.2 ms
Wall time: 38.7 ms


In [5]:
%time for _ in range(10): my_list2 = [x * 2 for x in my_list]

CPU times: user 601 ms, sys: 112 ms, total: 713 ms
Wall time: 732 ms


NumPy-based algorithms are generally 10 to 100 times faster (or more) than their
pure Python counterparts and use significantly less memory.
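The memory claim can be checked roughly as follows. This is a minimal sketch, not a rigorous measurement: it assumes CPython, fixes the dtype to `int64` so the array size does not depend on the platform, and estimates the list's footprint by adding the size of the list object to the size of every integer it references.

```python
import sys
import numpy as np

n = 1_000_000
my_arr = np.arange(n, dtype=np.int64)  # dtype fixed so the result is platform-independent
my_list = list(range(n))

# Size of the array's data buffer: n elements * 8 bytes each.
arr_bytes = my_arr.nbytes

# Rough size of the list: the list object (an array of pointers)
# plus each int object those pointers refer to.
list_bytes = sys.getsizeof(my_list) + sum(sys.getsizeof(x) for x in my_list)

print(f"ndarray: {arr_bytes / 1e6:.1f} MB")
print(f"list:    {list_bytes / 1e6:.1f} MB (estimate)")
```

The exact list figure varies with the Python version, but it should come out several times larger than the array's buffer.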

# 4.1 The NumPy ndarray: A Multidimensional Array Object

One of the key features of NumPy is its N-dimensional array object, or ndarray,
which is a fast, flexible container for large datasets in Python. Arrays enable you to
perform mathematical operations on whole blocks of data using similar syntax to the
equivalent operations between scalar elements.

In [6]:
data = np.random.randn(2,3)

In [7]:
data

array([[-1.8122404 ,  1.97449462,  1.26997335],
       [ 1.17661161,  0.29365201,  0.79380771]])

In [8]:
data * 10

array([[-18.12240403,  19.74494621,  12.69973346],
       [ 11.76611612,   2.93652011,   7.93807714]])

In [9]:
data + data

array([[-3.62448081,  3.94898924,  2.53994669],
       [ 2.35322322,  0.58730402,  1.58761543]])

An ndarray is a generic multidimensional container for homogeneous data; that is, all
of the elements must be the same type. 

Every array has a shape, a tuple indicating the
size of each dimension, and a dtype, an object describing the data type of the array:

In [10]:
data.shape

(2, 3)

In [11]:
data.dtype

dtype('float64')
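The shape and dtype together determine how much data an array holds. The sketch below uses a zero-filled array of the same shape as `data` (the values do not matter here) to show how `ndim`, `size`, `itemsize`, and `nbytes` follow from them.

```python
import numpy as np

data = np.zeros((2, 3))  # same shape and dtype as the random array above

print(data.shape)     # (2, 3)
print(data.ndim)      # 2  -> the length of the shape tuple
print(data.size)      # 6  -> product of the shape entries
print(data.itemsize)  # 8  -> bytes per float64 element
print(data.nbytes)    # 48 -> size * itemsize
```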

# a. Creating ndarrays

In [12]:
data1 = [1,2,3.45,6,7,22]

In [13]:
arr1 = np.array(data1)

In [14]:
arr1

array([ 1.  ,  2.  ,  3.45,  6.  ,  7.  , 22.  ])

Nested sequences, like a list of equal-length lists, will be converted into a multidimensional array:

In [15]:
data2 = [[1,2,3,4],[5,6,7,8]] # This is a list of lists

In [16]:
arr2 = np.array(data2)

In [17]:
arr2

array([[1, 2, 3, 4],
       [5, 6, 7, 8]])

Since data2 was a list of lists, the NumPy array arr2 has two dimensions with shape
inferred from the data. We can confirm this by inspecting the ndim and shape
attributes:

In [20]:
arr2.ndim

2

In [21]:
arr2.shape

(2, 4)

Unless explicitly specified, np.array tries to infer a good data
type for the array that it creates. The data type is stored in a special dtype metadata
object; for example, in the previous two examples we have:

In [22]:
arr1.dtype

dtype('float64')

In [24]:
arr2.dtype

dtype('int64')
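A short sketch of how the inferred dtype depends on the input values. The variable names are illustrative only. The default integer type is platform-dependent (it is `int64` on most systems), so the example checks only that it is some integer type.

```python
import numpy as np

ints = np.array([1, 2, 3])       # all integers -> an integer dtype
floats = np.array([1.0, 2.0])    # all floats   -> float64
mixed = np.array([1, 2, 3.45])   # int + float  -> upcast to float64
flags = np.array([True, False])  # booleans     -> bool

for name, a in [("ints", ints), ("floats", floats), ("mixed", mixed), ("flags", flags)]:
    print(name, a.dtype)
```

Because an ndarray is homogeneous, a single float in the input is enough to make every element a float, which is what happened with `arr1` above.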

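In addition to np.array, there are a number of other functions for creating new arrays.

To create a higher-dimensional array, pass a tuple for the shape. The tuple is easiest to read from right to left: in (4, 3, 2), the innermost arrays hold 2 elements, each block holds 3 of those, and there are 4 blocks.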

In [25]:
np.zeros(10)

array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

In [68]:
np.zeros((3,6))

array([[0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.]])

In [70]:
np.zeros((4,3,2))

array([[[0., 0.],
        [0., 0.],
        [0., 0.]],

       [[0., 0.],
        [0., 0.],
        [0., 0.]],

       [[0., 0.],
        [0., 0.],
        [0., 0.]],

       [[0., 0.],
        [0., 0.],
        [0., 0.]]])

In [37]:
np.empty((2,3,2))

array([[[1.38578615e-316, 0.00000000e+000],
        [0.00000000e+000, 0.00000000e+000],
        [0.00000000e+000, 0.00000000e+000]],

       [[0.00000000e+000, 0.00000000e+000],
        [0.00000000e+000, 0.00000000e+000],
        [0.00000000e+000, 0.00000000e+000]]])

NOTE: It’s not safe to assume that np.empty will return an array of all zeros. In some cases, it may return uninitialized “garbage” values.
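A few of the other creation functions, shown as a minimal sketch. `np.ones`, `np.full`, and `np.zeros_like` all return fully initialized arrays. With `np.empty`, one safe pattern is to overwrite every element before reading any of them.

```python
import numpy as np

ones = np.ones((2, 3))        # all 1.0, float64 by default
sevens = np.full((2, 3), 7)   # every element set to the given value
like = np.zeros_like(sevens)  # zeros with the shape and dtype of `sevens`

# np.empty only allocates memory; fill it before using the contents.
buf = np.empty((2, 3))
buf.fill(0.0)

print(ones)
print(sevens)
print(like)
print(buf)
```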

### arange
arange is an array-valued version of the built-in Python range function:

In [32]:
range(5) # normal python range function

range(0, 5)

In [29]:
np.arange(15)

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])
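Like `range`, `np.arange` also accepts start, stop, and step arguments, and the stop value is excluded. Unlike `range`, it accepts non-integer steps. A small sketch:

```python
import numpy as np

a = np.arange(5)            # array([0, 1, 2, 3, 4])
b = np.arange(2, 10, 2)     # array([2, 4, 6, 8])
c = np.arange(0, 1, 0.25)   # array([0.  , 0.25, 0.5 , 0.75])

print(a, b, c)
print(list(range(2, 10, 2)))  # [2, 4, 6, 8], the built-in equivalent of b
```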

# b. Data Types for ndarrays

In [2]:
arr1 = np.array([1,2,3],dtype=np.float64)

In [4]:
arr2 = np.array([1,2,3],dtype=np.int32)

In [5]:
arr1

array([1., 2., 3.])

In [6]:
arr2

array([1, 2, 3], dtype=int32)

In [7]:
type(arr1)

numpy.ndarray

In [9]:
arr1.dtype

dtype('float64')

### astype
You can explicitly convert or cast an array from one dtype to another using ndarray’s
astype method:

In [11]:
arr = np.array([1,2,3,4,5])

In [17]:
type(arr)

numpy.ndarray

In [12]:
arr.dtype

dtype('int64')

In [14]:
float_arr = arr.astype(np.float64)

In [16]:
float_arr.dtype

dtype('float64')

In this example, integers were cast to floating point. If I cast some floating-point
numbers to be of integer dtype, the decimal part will be truncated:

In [18]:
arr = np.array([1.2,3.4,45.34343,6.3242])

In [19]:
arr

array([ 1.2    ,  3.4    , 45.34343,  6.3242 ])

In [20]:
arr.astype(np.int32)

array([ 1,  3, 45,  6], dtype=int32)
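Truncation means the fractional part is dropped, so values move toward zero rather than being rounded. The sketch below compares `astype` with flooring and rounding first. The input values are made up for illustration.

```python
import numpy as np

vals = np.array([-1.7, -0.2, 0.9, 2.5])

truncated = vals.astype(np.int32)          # [-1,  0, 0, 2]  fractional part dropped
floored = np.floor(vals).astype(np.int32)  # [-2, -1, 0, 2]  always rounds down
rounded = np.round(vals).astype(np.int32)  # [-2,  0, 1, 2]  nearest, ties to even

print(truncated, floored, rounded)
print(vals)  # unchanged: astype returns a new array
```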

If you have an array of strings representing numbers, you can use astype to convert
them to numeric form:

In [22]:
numeric_strings = np.array(['1.25','-9.6'])

In [25]:
numeric_strings.astype(np.float64)

array([ 1.25, -9.6 ])
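The conversion only works if every string parses as a number. A minimal sketch of what happens otherwise (the arrays `good` and `bad` are made up for illustration): the cast raises a `ValueError` instead of silently producing a partial result.

```python
import numpy as np

good = np.array(['1.25', '-9.6', '42'])
bad = np.array(['1.25', 'abc'])

converted = good.astype(np.float64)
print(converted)

try:
    bad.astype(np.float64)
    failed = False
except ValueError as exc:
    # Raised because 'abc' cannot be parsed as a float.
    failed = True
    print("conversion failed:", exc)
```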

You can also use another array’s dtype attribute:

In [26]:
int_array = np.arange(10)

In [27]:
int_array

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [28]:
int_array.dtype

dtype('int64')

In [29]:
some_float_array = np.array([1.2,334.55,664.4545,4545.4545])

In [31]:
int_array.astype(some_float_array.dtype)

array([0., 1., 2., 3., 4., 5., 6., 7., 8., 9.])

# b. Arithmetic with NumPy Arrays

Arrays are important because they enable you to express batch operations on data without any for loops.
NumPy users call this vectorization.
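To make the idea concrete, here is a small sketch that computes the same result twice: once with an explicit Python loop and once as a single vectorized expression. The formula `x * 2 + 1` is arbitrary and chosen only for illustration.

```python
import numpy as np

values = np.array([1.0, 2.0, 3.0, 4.0])

# Explicit loop: one element at a time, in Python.
looped = np.empty_like(values)
for i in range(len(values)):
    looped[i] = values[i] * 2 + 1

# Vectorized: the same arithmetic applied to the whole array at once.
vectorized = values * 2 + 1

print(looped)
print(vectorized)
```

Both produce the same array. The vectorized form is shorter and runs the loop in compiled code.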


Any arithmetic operation between equal-size arrays applies the operation element-wise:

In [3]:
arr = np.array([[1.,2.,3.],[4.,5.,6.]])

In [4]:
arr

array([[1., 2., 3.],
       [4., 5., 6.]])

In [5]:
arr * arr

array([[ 1.,  4.,  9.],
       [16., 25., 36.]])

In [6]:
arr - arr

array([[0., 0., 0.],
       [0., 0., 0.]])

Arithmetic operations with scalars propagate the scalar argument to each element in
the array:

In [7]:
1 / arr

array([[1.        , 0.5       , 0.33333333],
       [0.25      , 0.2       , 0.16666667]])

In [8]:
arr * 2

array([[ 2.,  4.,  6.],
       [ 8., 10., 12.]])

In [9]:
arr ** 2

array([[ 1.,  4.,  9.],
       [16., 25., 36.]])

In [10]:
arr ** 1/2 # ** binds tighter than /, so this is (arr ** 1) / 2, not a square root

array([[0.5, 1. , 1.5],
       [2. , 2.5, 3. ]])
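The last cell halves the array rather than taking its square root, because `**` has higher precedence than `/`. A minimal sketch of the difference, using an array of perfect squares (not the `arr` from above) so the roots are easy to read:

```python
import numpy as np

squares = np.array([[1., 4., 9.], [16., 25., 36.]])

half = squares ** 1/2     # parsed as (squares ** 1) / 2
root = squares ** (1/2)   # parentheses give the square root
same = np.sqrt(squares)   # equivalent, and more explicit

print(half)
print(root)
print(same)
```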

Comparisons between arrays of the same size yield boolean arrays:

In [13]:
arr2 = np.array([[0.,4.,1.],[7.,2.,12.]])

In [14]:
arr

array([[1., 2., 3.],
       [4., 5., 6.]])

In [15]:
arr2

array([[ 0.,  4.,  1.],
       [ 7.,  2., 12.]])

In [12]:
arr > arr2

array([[ True, False,  True],
       [False,  True, False]])
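The result of a comparison is an ordinary array with dtype `bool`, so it can be summarized like any other array. A short sketch, reusing the same values as `arr` and `arr2`. Comparing against a scalar works the same way as scalar arithmetic.

```python
import numpy as np

arr = np.array([[1., 2., 3.], [4., 5., 6.]])
arr2 = np.array([[0., 4., 1.], [7., 2., 12.]])

mask = arr > arr2
print(mask.dtype)   # bool
print(mask.sum())   # 3 -> True counts as 1
print(mask.any())   # True  -> at least one element is larger
print(mask.all())   # False -> not every element is larger

scalar_mask = arr > 3  # the scalar is compared against each element
print(scalar_mask)
```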

# c. Basic Indexing and Slicing

There are many ways you may want to select
a subset of your data or individual elements.

One-dimensional arrays are simple; on
the surface they act similarly to Python lists:

In [17]:
arr = np.arange(10)

In [18]:
arr

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [28]:
arr[5:8]

array([5, 6, 7])

In [38]:
arr[5:8] = 12 # the value 12 is propagated to the entire slice 5:8 (i.e., indices 5, 6, 7). This is called broadcasting

In [30]:
arr

array([ 0,  1,  2,  3,  4, 12, 12, 12,  8,  9])

In [31]:
arr_slice = arr[5:8]

In [32]:
arr_slice

array([12, 12, 12])

An important first distinction from Python’s built-in lists is that array slices are views on the original array.
This means that the data is not copied, and any modifications to the view will be
reflected in the source array.

In [None]:
arr_slice[1] = 12345 # this will change arr_slice AND arr as well

This is because NumPy has been
designed to work with very large arrays; you could imagine the performance
and memory problems if NumPy insisted on always copying data.

In [35]:
arr_slice

array([   12, 12345,    12])

In [36]:
arr # you can see that changes made to arr_slice are reflected in the original arr as well

array([    0,     1,     2,     3,     4,    12, 12345,    12,     8,
           9])

If you want a copy of a slice of an ndarray instead of a view, you
will need to explicitly copy the array, for example
arr[5:8].copy().
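The difference between a view and a copy can be checked directly. This sketch uses `np.shares_memory` to confirm which arrays share a buffer, and contrasts the behavior with a Python list, whose slices are always copies.

```python
import numpy as np

arr = np.arange(10)
view = arr[5:8]          # a view: shares memory with arr
copy = arr[5:8].copy()   # an independent copy

view[0] = 100   # changes arr[5] too
copy[1] = -1    # leaves arr[6] untouched

print(arr)                           # [  0   1   2   3   4 100   6   7   8   9]
print(np.shares_memory(arr, view))   # True
print(np.shares_memory(arr, copy))   # False

# A Python list slice is a copy, so the original list is unaffected.
py_list = list(range(10))
py_slice = py_list[5:8]
py_slice[0] = 100
print(py_list[5])                    # 5
```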

In [69]:
arr1 = np.array([[1,2,3],[4,5,6],[7,8,9]])

In [71]:
arr1

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [72]:
arr1.ndim

2

In [66]:
a = arr1[2]

In [67]:
a.ndim

1

In [68]:
arr1

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [75]:
arr1[2][1]

8

In [76]:
arr1[2,1]

8

In [77]:
l = [[1,2,3],[4,5,6],[7,8,9]]
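The list `l` holds the same values as `arr1`, which presumably sets up a comparison between the two. As a sketch of that comparison: both support chained indexing, but only the ndarray accepts a comma-separated index. A nested list raises a `TypeError` because list indices must be integers or slices, not tuples.

```python
import numpy as np

l = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
arr1 = np.array(l)

print(l[2][1])      # 8 -> chained indexing works on nested lists
print(arr1[2][1])   # 8 -> and on arrays
print(arr1[2, 1])   # 8 -> comma-separated indexing works only on arrays

try:
    l[2, 1]
    list_accepts_tuple = True
except TypeError as exc:
    list_accepts_tuple = False
    print("list indexing failed:", exc)
```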