## NumPy Basics: Arrays and Vectorized Computation

- NumPy, short for Numerical Python, is the fundamental package required for high-performance scientific computing and data analysis.
- Here are some of the concepts it provides:
    - ndarray, a fast and space-efficient multidimensional array providing vectorized arithmetic operations and sophisticated broadcasting capabilities
    - Standard mathematical functions for fast operations on entire arrays of data without having to write loops
    - Tools for reading / writing array data to disk and working with memory-mapped files
    - Linear algebra, random number generation, and Fourier transform capabilities
    - Tools for integrating code written in C, C++, and Fortran

- Data analysis functionalities include
    - Fast vectorized array operations for data munging and cleaning, subsetting and filtering, transformation, and any other kinds of computations
    - Common array algorithms like sorting, unique, and set operations
    - Efficient descriptive statistics and aggregating/summarizing data
    - Data alignment and relational data manipulations for merging and joining together heterogeneous data sets
    - Expressing conditional logic as array expressions instead of loops with if-elif-else branches
    - Group-wise data manipulations (aggregation, transformation, function application). Much more on this in Chapter 5
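As a small illustration of two items from the list above (conditional logic written as an array expression, and the unique algorithm), here is a minimal sketch. The array name and its values are made up for the example.

```python
import numpy as np

# Hypothetical sample data, chosen only for illustration
values = np.array([3, -1, 4, -1, 5, -9, 2, 6])

# Conditional logic as a single array expression:
# keep non-negative values, replace negative ones with 0
clipped = np.where(values < 0, 0, values)

# Sorted unique values of the original array
distinct = np.unique(values)

print(clipped)
print(distinct)
```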

### The NumPy ndarray: A Multidimensional Array Object

- One of the key features of NumPy is its N-dimensional array object, or ndarray, which is a fast, flexible container for large data sets in Python. 
- Arrays enable you to perform mathematical operations on whole blocks of data using syntax similar to the equivalent operations between scalar elements.
- An ndarray is a generic multidimensional container for homogeneous data; that is, all of the elements must be the same type.
- Every array has a shape, a tuple indicating the size of each dimension, and a dtype, an object describing the data type of the array.

#### Creating ndarrays

- the array function: accepts any sequence-like object (including other arrays) and produces a new NumPy array

In [1]:
import numpy as np

In [2]:
data1 = [6, 7.5, 8, 0, 1]
arr1 = np.array(data1)
arr1

array([6. , 7.5, 8. , 0. , 1. ])

- Nested sequences, like a list of equal-length lists, will be converted into a multidimensional array

In [3]:
m = 3
n = 2 

In [8]:
dataset = [
    
    [0,0],
    [0,0],
    [0,0]
]

In [9]:
dataset[1][1]

0

In [12]:
for i, x in enumerate(dataset):
    for j, y in enumerate(x):
        if y == 1:
            print("Found:", i, j)

In [13]:
data2 = [
    [1, 2.0, 3, 4], 
    [5, 6, 7, 8]
]
arr2 = np.array(data2)
arr2

array([[1., 2., 3., 4.],
       [5., 6., 7., 8.]])

In [16]:
arr2.ndim

2

In [17]:
arr2.shape

(2, 4)

In [18]:
# The data type is stored in a special dtype object; for example, in the above two examples we have:
arr2.dtype

dtype('float64')

- In addition to np.array, there are a number of other functions for creating new arrays.
- The functions, 'zeros' and 'ones' create arrays of 0’s or 1’s, respectively, with a given length or shape.
- 'empty' creates an array without initializing its values to any particular value.
- arange is an array-valued version of the built-in Python range function.

In [19]:
np.zeros(10)

array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

In [20]:
np.zeros((3, 6))

array([[0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.]])

In [21]:
np.zeros((3,2))

array([[0., 0.],
       [0., 0.],
       [0., 0.]])

In [22]:
np.ones((2,5))

array([[1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.]])

In [23]:
# In many cases, 'empty' will return uninitialized garbage values.
arr3 = np.empty((5, 2)) 
arr3[1,1] = 1
arr3

array([[1., 1.],
       [1., 1.],
       [1., 1.],
       [1., 1.],
       [1., 1.]])

In [25]:
np.ones(15).reshape(5,3)

array([[1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.]])

_Table 1. Array creation functions_

|Function|Description|
|--------|:-----------|
|array|Convert input data (list, tuple, array, or other sequence type) to an ndarray either by inferring a dtype or explicitly specifying a dtype. Copies the input data by default.|
|arange|Like the built-in range but returns an ndarray instead of a list.|
|ones, ones_like|Produce an array of all 1’s with the given shape and dtype. ones_like takes another array and produces a ones array of the same shape and dtype.|
|zeros, zeros_like|Like ones and ones_like but producing arrays of 0’s instead|
|empty, empty_like|Create new arrays by allocating new memory but, unlike ones and zeros, do not populate them with any values|
|eye, identity|Create a square N x N identity matrix (1’s on the diagonal and 0’s elsewhere)|
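The cells above cover zeros, ones, and empty. The remaining functions in the table can be sketched as follows; the variable names and the sample template array are illustrative only.

```python
import numpy as np

# arange works like range but returns an ndarray
steps = np.arange(0, 10, 2)          # start, stop (exclusive), step

# The *_like functions take shape and dtype from an existing array
template = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int32)
ones = np.ones_like(template)
zeros = np.zeros_like(template)

# eye and identity both build a square identity matrix
ident = np.eye(3)
same = np.identity(3)

print(steps)
print(ones.shape, ones.dtype)
print(np.array_equal(ident, same))
```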

#### Data Types for ndarrays

- The data type or dtype is a special object containing the information the ndarray needs to interpret a chunk of memory as a particular type of data
- Dtypes are part of what makes NumPy so powerful and flexible. 
- In most cases, they map directly onto an underlying machine representation, which makes it easy 
    - to read and write binary streams of data to disk 
    - to connect to code written in a low-level language like C or Fortran.
- The numerical dtypes are named the same way: a type name, like float or int, followed by a number indicating the number of bits per element.
- A standard double-precision floating point value (what’s used under the hood in Python’s float object) takes up 8 bytes or 64 bits. Thus, this type is known in NumPy as float64.
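The naming rule above (type name plus number of bits) can be checked with the itemsize attribute of a dtype, which reports bytes per element. This is a minimal sketch; the choice of dtypes is arbitrary.

```python
import numpy as np

# itemsize is in bytes, so multiply by 8 to get the bit count in the name
for name in ("int8", "int32", "float32", "float64"):
    dt = np.dtype(name)
    print(name, dt.itemsize, "bytes =", dt.itemsize * 8, "bits")

# An array's total data size is element count times itemsize
arr = np.zeros(10, dtype=np.float64)
total_bytes = arr.nbytes
print(total_bytes)
```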

In [26]:
arr1 = np.array([1, 2, 3], dtype=np.float64)
arr2 = np.array([1, 2, 3], dtype=np.int32)

In [27]:
arr1.dtype

dtype('float64')

In [28]:
arr2.dtype

dtype('int32')

- You can explicitly convert or cast an array from one dtype to another using ndarray’s astype method

In [29]:
arr = np.array([1, 2, 3, 4, 5])
arr.dtype

dtype('int32')

In [30]:
float_arr = arr.astype(np.float64)
float_arr.dtype

dtype('float64')

In [31]:
arr = np.array([3.7, -1.2, -2.6, 0.5, 12.9, 10.1])
arr = arr.astype(np.int32)
arr

array([ 3, -1, -2,  0, 12, 10])

In [32]:
int_array = np.arange(10)
calibers = np.array([.22, .270, .357, .380, .44, .50], dtype=np.float64)
int_array.astype(calibers.dtype)

array([0., 1., 2., 3., 4., 5., 6., 7., 8., 9.])

In [33]:
empty_uint32 = np.empty(8, dtype='u4')
empty_uint32

array([         0, 1075314688,          0, 1075707904,          0,
       1075838976,          0, 1072693248], dtype=uint32)

In [32]:
empty_uint32 = np.empty(8, dtype='U')
empty_uint32

array(['', '', '', '', '', '', '', ''], dtype='<U1')

### Operations between Arrays and Scalars

- Arrays are important because they enable you to express batch operations on data without writing any for loops.
- This is usually called vectorization.
- Any arithmetic operation between equal-size arrays is applied elementwise.

In [34]:
arr =  np.array([[1., 2., 3.], [4., 5., 6.]])
arr

array([[1., 2., 3.],
       [4., 5., 6.]])

In [35]:
arr * arr

array([[ 1.,  4.,  9.],
       [16., 25., 36.]])

In [36]:
arr - arr

array([[0., 0., 0.],
       [0., 0., 0.]])

In [37]:
1 / arr

array([[1.        , 0.5       , 0.33333333],
       [0.25      , 0.2       , 0.16666667]])

In [38]:
arr ** 0.5

array([[1.        , 1.41421356, 1.73205081],
       [2.        , 2.23606798, 2.44948974]])
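To make the "no for loops" point concrete, the sketch below computes the same result twice: once as a vectorized expression with scalars, and once with explicit loops. The expression `arr * 2 + 1` is just an example.

```python
import numpy as np

arr = np.array([[1., 2., 3.], [4., 5., 6.]])

# Vectorized: the scalars 2 and 1 are applied to every element
vectorized = arr * 2 + 1

# Equivalent explicit loops over every element
looped = np.empty_like(arr)
for i in range(arr.shape[0]):
    for j in range(arr.shape[1]):
        looped[i, j] = arr[i, j] * 2 + 1

print(vectorized)
print(np.array_equal(vectorized, looped))
```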

### Basic Indexing and Slicing

In [39]:
arr = np.arange(10)
arr[5:8]

array([5, 6, 7])

In [40]:
arr[3:7] = 10

In [41]:
arr

array([ 0,  1,  2, 10, 10, 10, 10,  7,  8,  9])

- __An important first distinction from lists is that array slices are views on the original array.__
- This means that the data is not copied, and any modifications to the view will be reflected in the source array

In [46]:
arr_slice = arr[5:8]
arr_slice[0] = 5000

In [43]:
arr

array([   0,    1,    2,   10,   10, 5000,   10,    7,    8,    9])

In [47]:
arr_slice[:] = 64

In [48]:
arr

array([ 0,  1,  2, 10, 10, 64, 64, 64,  8,  9])
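If you want an independent copy of a slice instead of a view, you can call copy() on it explicitly. The sketch below contrasts the two and uses np.shares_memory to check which one is tied to the original array; the variable names are illustrative.

```python
import numpy as np

arr = np.arange(10)

view = arr[5:8]                  # a view: shares memory with arr
independent = arr[5:8].copy()    # an explicit copy: owns its own data

independent[:] = -1              # does not touch arr
view[0] = 5000                   # writes through to arr[5]

print(arr)
print(np.shares_memory(arr, view), np.shares_memory(arr, independent))
```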

- With higher-dimensional arrays there are more options. In a two-dimensional array, for example, the elements at each index are no longer scalars but rather one-dimensional arrays:

In [49]:
arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

In [50]:
arr2d[2]

array([7, 8, 9])

In [51]:
arr2d[0][2]

3

In [52]:
arr2d[0, 2]

3

- See Figure 4-1 for an illustration of indexing on a 2D array.

### Indexing with slices

- Like one-dimensional objects such as Python lists, ndarrays can be sliced using the familiar syntax
- Higher-dimensional objects give you more options, as you can slice one or more axes and also mix integer indexes with slices.

In [53]:
arr2d

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [54]:
arr2d[:2]

array([[1, 2, 3],
       [4, 5, 6]])

In [56]:
arr2d[:3, 2:]

array([[3],
       [6],
       [9]])

In [55]:
arr2d[:1, 1:]

array([[2, 3]])

In [59]:
arr2d[:2, 1:]

array([[2, 3],
       [5, 6]])

In [66]:
arr2d[:, :1]

array([[1],
       [4],
       [7]])

In [57]:
arr2d[:, 0]

array([1, 4, 7])
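The last two cells differ only in using a slice (:1) versus an integer (0) on the second axis. A short sketch of what that does to the result's shape, and of assigning to a slice expression:

```python
import numpy as np

arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

col_1d = arr2d[:, 0]     # integer index drops the axis: shape (3,)
col_2d = arr2d[:, :1]    # slice keeps the axis: shape (3, 1)
mixed = arr2d[1, :2]     # row 1, first two columns: shape (2,)

print(col_1d.shape, col_2d.shape, mixed.shape)

# Assigning to a slice expression assigns to the whole selection
arr2d[:2, 1:] = 0
print(arr2d)
```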

### Boolean Indexing

In [58]:
names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])

In [61]:
data = np.random.randn(7, 4)
data

array([[ 0.87238057,  1.71556134, -0.50096764,  0.22647129],
       [ 0.23519999,  0.57860326,  0.30827387, -1.03516674],
       [ 0.74166934, -0.21316138,  1.13753629,  1.62477747],
       [ 0.36052327, -0.05620037,  0.12355862, -0.11961224],
       [ 1.10835379,  0.94549279, -0.41801394, -0.29627884],
       [-0.04026592, -0.43014554,  0.83535848,  0.20211087],
       [-3.2025137 ,  1.62377072,  0.39364843, -0.12504359]])

In [59]:
names == 'Bob'

array([ True, False, False,  True, False, False, False])

In [62]:
data[names == 'Bob']

array([[ 0.87238057,  1.71556134, -0.50096764,  0.22647129],
       [ 0.36052327, -0.05620037,  0.12355862, -0.11961224]])

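- How did this happen? The comparison names == 'Bob' yields a boolean array with one entry per row of data. Indexing with it keeps only the rows whose entry is True (rows 0 and 3 here), so the boolean array must have the same length as the axis it indexes.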

In [75]:
names != 'Bob'

array([False,  True,  True, False,  True,  True,  True])

In [63]:
data[names != 'Bob']


array([[ 0.23519999,  0.57860326,  0.30827387, -1.03516674],
       [ 0.74166934, -0.21316138,  1.13753629,  1.62477747],
       [ 1.10835379,  0.94549279, -0.41801394, -0.29627884],
       [-0.04026592, -0.43014554,  0.83535848,  0.20211087],
       [-3.2025137 ,  1.62377072,  0.39364843, -0.12504359]])

In [64]:
data[~(names == 'Bob')]

array([[ 0.23519999,  0.57860326,  0.30827387, -1.03516674],
       [ 0.74166934, -0.21316138,  1.13753629,  1.62477747],
       [ 1.10835379,  0.94549279, -0.41801394, -0.29627884],
       [-0.04026592, -0.43014554,  0.83535848,  0.20211087],
       [-3.2025137 ,  1.62377072,  0.39364843, -0.12504359]])

In [65]:
mask = (names == 'Bob') | (names == 'Will')
mask

array([ True, False,  True,  True,  True, False, False])

In [67]:
data[mask]

array([[ 0.87238057,  1.71556134, -0.50096764,  0.22647129],
       [ 0.74166934, -0.21316138,  1.13753629,  1.62477747],
       [ 0.36052327, -0.05620037,  0.12355862, -0.11961224],
       [ 1.10835379,  0.94549279, -0.41801394, -0.29627884]])

In [68]:
data[data < 0] = 0
data

array([[0.87238057, 1.71556134, 0.        , 0.22647129],
       [0.23519999, 0.57860326, 0.30827387, 0.        ],
       [0.74166934, 0.        , 1.13753629, 1.62477747],
       [0.36052327, 0.        , 0.12355862, 0.        ],
       [1.10835379, 0.94549279, 0.        , 0.        ],
       [0.        , 0.        , 0.83535848, 0.20211087],
       [0.        , 1.62377072, 0.39364843, 0.        ]])
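A few details of boolean indexing are easier to see with deterministic data. In the sketch below, the data array is a stand-in for the random one above (row i holds the value i); that substitution is an assumption made purely so the results are reproducible.

```python
import numpy as np

names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])
# Deterministic stand-in for the random data: row i is filled with i
data = np.arange(7).repeat(4).reshape(7, 4)

# Combine conditions with & (and) and | (or); each side needs parentheses.
# The Python keywords 'and' and 'or' do not work with boolean arrays.
mask = (names != 'Joe') & (data[:, 0] > 0)

selected = data[mask]       # rows 2, 3 and 4
selected[:] = -1            # boolean selection returns a copy: data is unchanged

# A boolean mask can be combined with a slice on another axis
first_two_cols = data[names == 'Bob', :2]

print(mask)
print(data[mask])
print(first_two_cols)
```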

### Fancy Indexing

- Fancy indexing is a term adopted by NumPy to describe indexing using integer arrays.

In [69]:
arr = np.empty((8, 4))
for i in range(8):
    arr[i] = i
arr

array([[0., 0., 0., 0.],
       [1., 1., 1., 1.],
       [2., 2., 2., 2.],
       [3., 3., 3., 3.],
       [4., 4., 4., 4.],
       [5., 5., 5., 5.],
       [6., 6., 6., 6.],
       [7., 7., 7., 7.]])

In [70]:
arr[[4, 3, 0, 6]]

array([[4., 4., 4., 4.],
       [3., 3., 3., 3.],
       [0., 0., 0., 0.],
       [6., 6., 6., 6.]])

In [71]:
# Another way of creating multidimensional arrays
arr = np.arange(32).reshape((8, 4))
arr

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15],
       [16, 17, 18, 19],
       [20, 21, 22, 23],
       [24, 25, 26, 27],
       [28, 29, 30, 31]])

In [74]:
arr[[1, 5, 7, 2], [0, 3, 1, 2]]

array([ 4, 23, 29, 10])

- Take a moment to understand what just happened: the elements (1, 0), (5, 3), (7, 1), and (2, 2) were selected.
- This differs from what some users might expect, namely the rectangular region formed by selecting a subset of the matrix’s rows and columns. The next two cells show two ways to get that region.

In [75]:
arr[[1, 5, 7, 2]][:, [0, 3, 1, 2]]

array([[ 4,  7,  5,  6],
       [20, 23, 21, 22],
       [28, 31, 29, 30],
       [ 8, 11,  9, 10]])

In [96]:
arr[np.ix_([1, 5, 7, 2], [0, 3, 1, 2])]

array([[ 4,  7,  5,  6],
       [20, 23, 21, 22],
       [28, 31, 29, 30],
       [ 8, 11,  9, 10]])
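The difference between the two behaviors can be spelled out as follows: passing two index lists pairs them up element by element, while np.ix_ combines every selected row with every selected column. The sketch also shows that fancy indexing, unlike slicing, returns a copy. The names rows, cols, and by_hand are illustrative.

```python
import numpy as np

arr = np.arange(32).reshape((8, 4))
rows = [1, 5, 7, 2]
cols = [0, 3, 1, 2]

# Two index lists are paired up element by element
pairwise = arr[rows, cols]
by_hand = np.array([arr[r, c] for r, c in zip(rows, cols)])

# np.ix_ builds an open mesh: every row combined with every column
block = arr[np.ix_(rows, cols)]

# Fancy indexing returns a copy, so this does not modify arr
block[0, 0] = -1

print(np.array_equal(pairwise, by_hand))
print(block.shape)
print(arr[1, 0])
```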

### Transposing Arrays and Swapping Axes

In [76]:
arr = np.arange(15).reshape((3, 5))
arr

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])

In [77]:
arr.T

array([[ 0,  5, 10],
       [ 1,  6, 11],
       [ 2,  7, 12],
       [ 3,  8, 13],
       [ 4,  9, 14]])

- Transposing comes up often in matrix computations, for example when computing the inner matrix product X<sup>T</sup>X using np.dot:


In [78]:
arr = np.random.randn(6, 3)


In [79]:
np.dot(arr.T, arr)

array([[ 2.09729817, -1.30088457, -1.33686785],
       [-1.30088457,  3.19054147,  1.42375315],
       [-1.33686785,  1.42375315,  2.63676598]])
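The section title also mentions swapping axes, which the cells above do not show. Here is a minimal sketch using the ndarray swapaxes method, along with a check that .T is a view and that X<sup>T</sup>X is symmetric. The integer array is used instead of random data so the results are reproducible.

```python
import numpy as np

arr = np.arange(15).reshape((3, 5))

# .T returns a view: no data is copied
print(np.shares_memory(arr, arr.T))

# X^T X with np.dot and with the @ operator give the same result
gram = np.dot(arr.T, arr)
gram_at = arr.T @ arr

# The result is a symmetric 5 x 5 matrix
print(gram.shape, np.array_equal(gram, gram.T))

# swapaxes exchanges two axes of a higher-dimensional array
arr3d = np.arange(16).reshape((2, 2, 4))
swapped = arr3d.swapaxes(1, 2)
print(swapped.shape)
```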

### Universal Functions: Fast Element-wise Array Functions

- A universal function, or ufunc, is a function that performs elementwise operations on data in ndarrays. You can think of them as fast vectorized wrappers for simple functions that take one or more scalar values and produce one or more scalar results.

In [80]:
arr = np.arange(10)
np.sqrt(arr)

array([0.        , 1.        , 1.41421356, 1.73205081, 2.        ,
       2.23606798, 2.44948974, 2.64575131, 2.82842712, 3.        ])

In [81]:
np.exp(arr)

array([1.00000000e+00, 2.71828183e+00, 7.38905610e+00, 2.00855369e+01,
       5.45981500e+01, 1.48413159e+02, 4.03428793e+02, 1.09663316e+03,
       2.98095799e+03, 8.10308393e+03])

- These are referred to as unary ufuncs. Others, such as add or maximum, take 2 arrays (thus, binary ufuncs) and return a single array as the result:

In [82]:
x = np.random.randn(8)
y = np.random.randn(8)

In [84]:
x

array([ 0.35754703, -0.21946983, -0.18316215,  0.24384334, -0.81410439,
       -0.08873741,  0.07096743,  0.26348861])

In [85]:
y

array([-0.05892379,  0.18746471,  0.27756368,  0.07212276, -1.31719518,
       -1.39838978, -0.01249063,  0.90729551])

In [86]:
np.maximum(x, y) # element-wise maximum

array([ 0.35754703,  0.18746471,  0.27756368,  0.24384334, -0.81410439,
       -0.08873741,  0.07096743,  0.90729551])
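Two further points can be sketched with fixed inputs: a binary ufunc such as np.add is the function form of the corresponding operator, and some ufuncs return more than one array. np.modf is one such case, returning the fractional and integral parts separately. The sample values are arbitrary.

```python
import numpy as np

x = np.array([1.5, -2.25, 3.0])
y = np.array([0.5, 4.0, -1.0])

# Binary ufunc: same result as x + y
total = np.add(x, y)

# modf is a unary ufunc that returns two arrays:
# the fractional parts and the integral parts
frac, whole = np.modf(x)

print(total)
print(frac)
print(whole)
```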

_Table 2. Unary ufuncs_

|Function|Description|
|--------|:-----------|
|abs, fabs|Compute the absolute value element-wise for integer, floating point, or complex values. Use fabs as a faster alternative for non-complex-valued data|
|sqrt|Compute the square root of each element. Equivalent to arr ** 0.5|
|square|Compute the square of each element. Equivalent to arr ** 2|
|exp|Compute the exponential e<sup>x</sup> of each element|
|log, log10, log2, log1p|Natural logarithm (base e), log base 10, log base 2, and log(1 + x), respectively|
|ceil|Compute the ceiling of each element, i.e. the smallest integer greater than or equal to each element|
|floor|Compute the floor of each element, i.e. the largest integer less than or equal to each element|
|rint|Round elements to the nearest integer, preserving the dtype|
|isnan|Return boolean array indicating whether each value is NaN (Not a Number)|
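A few of the unary ufuncs from Table 2, applied to a small made-up array. Note that rint rounds halfway cases to the nearest even integer, so 0.5 and -0.5 both round to zero.

```python
import numpy as np

vals = np.array([-1.7, -0.5, 0.5, 2.3])

absolute = np.abs(vals)
ceiling = np.ceil(vals)      # smallest integer >= each element
floored = np.floor(vals)     # largest integer <= each element
rounded = np.rint(vals)      # nearest integer, result stays floating point

with_nan = np.array([1.0, np.nan, 3.0])
nan_mask = np.isnan(with_nan)

print(absolute, ceiling, floored, rounded)
print(nan_mask)
```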

_Table 3. Binary universal functions_

|Function|Description|
|--------|:-----------|
|add|Add corresponding elements in arrays|
|subtract|Subtract elements in second array from first array|
|multiply|Multiply array elements|
|divide, floor_divide|Divide or floor divide (rounding the quotient down to the nearest integer)|
|power|Raise elements in first array to powers indicated in second array|
|maximum, fmax|Element-wise maximum. fmax ignores NaN|
|minimum, fmin|Element-wise minimum. fmin ignores NaN|
|mod|Element-wise modulus (remainder of division)|
|greater, greater_equal, less,<br>less_equal, equal, not_equal|Perform element-wise comparison, yielding a boolean array.<br>Equivalent to infix operators >, >=, <, <=, ==, !=|
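The NaN handling of maximum versus fmax, and the behavior of floor_divide and mod on negative numbers, are easy to get wrong. The sketch below uses small hand-picked arrays (all names are illustrative) to show each case.

```python
import numpy as np

a = np.array([1.0, np.nan, 7.0])
b = np.array([2.0, 5.0, np.nan])

propagated = np.maximum(a, b)   # NaN wherever either input is NaN
ignored = np.fmax(a, b)         # NaN is ignored when the other value is a number

p = np.array([7, -7, 9])
q = np.array([2, 2, 4])
quotient = np.floor_divide(p, q)   # rounds toward negative infinity
remainder = np.mod(p, q)           # result takes the sign of the divisor
is_greater = np.greater(p, q)      # same as p > q

print(propagated, ignored)
print(quotient, remainder, is_greater)
```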