# 1. NumPy Arrays

## We'll cover the following
### Chapter Goals:
* A. Arrays
* B. Copying
* C. Casting
* D. NaN
* E. Infinity
* Time to Code!

## A. Arrays

In [1]:
import numpy as np

arr = np.array([[0, 1, 2], [3, 4, 5]],
               dtype=np.float32)
print(repr(arr))

array([[0., 1., 2.],
       [3., 4., 5.]], dtype=float32)


In [2]:
import numpy as np  # import the NumPy library

# Initializing a NumPy array
arr = np.array([-1, 2, 5], dtype=np.float32)

# Print the representation of the array
print(repr(arr))

array([-1.,  2.,  5.], dtype=float32)


In [3]:
arr = np.array([0, 0.1, 2])
print(repr(arr))

array([0. , 0.1, 2. ])


## B. Copying

In [4]:
a = np.array([0, 1])
b = np.array([9, 8])
c = a
print('Array a: {}'.format(repr(a)))
c[0] = 5
print('Array a: {}'.format(repr(a)))

d = b.copy()
d[0] = 6
print('Array b: {}'.format(repr(b)))
print('Array d: {}'.format(repr(d)))

Array a: array([0, 1])
Array a: array([5, 1])
Array b: array([9, 8])
Array d: array([6, 8])
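
The difference between plain assignment and copying can also be checked directly. The sketch below is an illustration added here, not part of the original lesson; the names `alias` and `independent` are chosen for the example, and `np.shares_memory` is used to test whether two arrays use the same underlying data.

```python
import numpy as np

a = np.array([0, 1])

# Plain assignment binds a second name to the same array object.
alias = a
# copy() allocates a new array with its own data.
independent = a.copy()

alias[0] = 5        # also visible through a
independent[1] = 7  # leaves a untouched

print(alias is a)                        # are both names the same object?
print(np.shares_memory(a, independent))  # do the arrays share data?
print(repr(a))
print(repr(independent))
```

Changes made through `alias` show up in `a`, while changes to `independent` do not.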


## C. Casting


In [5]:
arr = np.array([0, 1, 2])
print(arr.dtype)
arr = arr.astype(np.float32)
print(arr.dtype)

int32
float32


## D. NaN
### Note that np.nan cannot take on an integer type.
### A common usage for np.nan is as a filler value for incomplete data

In [6]:
arr = np.array([np.nan, 1, 2])
print(repr(arr))

arr = np.array([np.nan, 'abc'])
print(repr(arr))

# Uncommenting the next line raises a ValueError, because np.nan cannot be cast to an integer type.
#np.array([np.nan, 1, 2], dtype=np.int32)

np.array([np.nan, 1, 2], dtype=np.float32)

array([nan,  1.,  2.])
array(['nan', 'abc'], dtype='<U32')


array([nan,  1.,  2.], dtype=float32)
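
When np.nan is used as a filler for incomplete data, we usually need to find the filler entries again later. The following is a minimal sketch of one way to do that with `np.isnan`; the array `readings` and the variable names are hypothetical and only serve the illustration.

```python
import numpy as np

# Hypothetical measurements with one missing entry marked as np.nan.
readings = np.array([np.nan, 1.0, 2.0])

# NaN never compares equal to anything, including itself,
# so an == comparison cannot be used to find it.
print(np.nan == np.nan)

# np.isnan returns a boolean mask marking the missing entries.
missing = np.isnan(readings)
print(repr(missing))

# Count the entries that are actually present.
num_present = int((~missing).sum())
print(num_present)
```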

## E. Infinity

### To represent infinity in NumPy, we use the np.inf special value. We can also represent negative infinity with -np.inf.

### The code below shows an example usage of np.inf. Note that np.inf cannot take on an integer type.

In [7]:
print(np.inf > 1000000)

arr = np.array([np.inf, 5])
print(repr(arr))

arr = np.array([-np.inf, 1])
print(repr(arr))

# Uncommenting the next line raises an OverflowError, because np.inf cannot be cast to an integer type.
#np.array([np.inf, 3], dtype=np.int32)
np.array([np.inf, 3], dtype=np.float32)

True
array([inf,  5.])
array([-inf,   1.])


array([inf,  3.], dtype=float32)


## Time to Code!

In [8]:
# The first array we'll create comes straight from a list of integers and np.nan.
# The list contains np.nan as the first element, and the integers from 2 to 5, inclusive, as the next four elements.
# Set arr equal to np.array applied to the specified list.

In [9]:
# CODE HERE

import numpy as np
arr = np.array([np.nan, 2,3,4,5])

print(repr(arr))

array([nan,  2.,  3.,  4.,  5.])


We now want to copy the array so we can change the first element to 10. This way we don't modify the original array.
Set arr2 equal to arr.copy(), then set the first element of arr2 equal to 10.

In [10]:
# CODE HERE
import numpy as np

arr = np.array([np.nan, 2,3,4,5])

arr2 = arr.copy()

arr2[0] = 10

print(repr(arr2))

array([10.,  2.,  3.,  4.,  5.])


The next two arrays will use floating point numbers. The first array will be upcast to floating point numbers,
while we manually cast the second array using np.float32.

For manual casting, we use an array's astype method, which takes the new type as an argument and
returns the cast array.

Set float_arr equal to np.array applied to a list with elements 1, 5.4, and 3, in that order.

Set float_arr2 equal to arr2.astype, with argument np.float32.

In [11]:
float_arr = np.array([1, 5.4, 3])
float_arr2 = arr2.astype(np.float32)

print(repr(float_arr))
print(repr(float_arr2))

array([1. , 5.4, 3. ])
array([10.,  2.,  3.,  4.,  5.], dtype=float32)


The final array will be a multi-dimensional array, specifically a 2-D matrix.
The 2-D matrix will have the integers 1, 2, 3 in its first row, and the integers 4, 5, 6 in its second row.
We'll also manually set its type to np.float32.

Set matrix equal to np.array with a list of lists (representing the specified 2-D matrix)
as the first argument, and np.float32 as the dtype keyword argument.

In [12]:
matrix = np.array([[1, 2, 3],
                   [4, 5, 6]], dtype=np.float32)

print(repr(matrix))

array([[1., 2., 3.],
       [4., 5., 6.]], dtype=float32)


# 2. NumPy Basics

## We'll cover the following

### Chapter Goals:
* A. Ranged data
* B. Reshaping data
* C. Transposing
* D. Zeros and ones
* Time to Code!

## A. Ranged data:

While np.array can be used to create any array, it is equivalent to hardcoding the array's values.
This becomes impractical when the array has hundreds of values. Instead, NumPy provides an option
to create ranged data arrays using np.arange. The function acts very similarly to the range function in Python,
and will always return a 1-D array.

The code below contains example usages of np.arange.

In [13]:
arr = np.arange(5)
print(repr(arr))

arr = np.arange(5.1)
print(repr(arr))

arr = np.arange(-1, 4)
print(repr(arr))

arr = np.arange(-1.5, 4, 2)
print(repr(arr))

array([0, 1, 2, 3, 4])
array([0., 1., 2., 3., 4., 5.])
array([-1,  0,  1,  2,  3])
array([-1.5,  0.5,  2.5])


To specify the number of elements in the returned array, rather than the step size, we can use the np.linspace function.

This function takes in a required first two arguments, for the start and end of the range, respectively.

The end of the range is inclusive for np.linspace, unless the keyword argument endpoint is set to False.

To specify the number of elements, we set the num keyword argument (its default value is 50).

The code below shows example usages of np.linspace. It also takes in the dtype keyword argument for manual casting.

In [14]:
arr = np.linspace(5, 11, num=4)
print(repr(arr))

arr = np.linspace(5, 11, num=4, endpoint=False)
print(repr(arr))

arr = np.linspace(5, 11, num=4, dtype=np.int32)
print(repr(arr))

array([ 5.,  7.,  9., 11.])
array([5. , 6.5, 8. , 9.5])
array([ 5,  7,  9, 11])
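
To see how np.linspace relates to np.arange, it can help to look at the step size that np.linspace chooses. The sketch below is an added illustration; it relies on the `retstep` keyword argument of np.linspace, which the lesson text does not cover, and the variable names are chosen for the example.

```python
import numpy as np

# retstep=True also returns the spacing between consecutive samples.
closed, closed_step = np.linspace(5, 11, num=4, retstep=True)
half_open, half_open_step = np.linspace(5, 11, num=4,
                                        endpoint=False, retstep=True)

print(closed_step)     # spacing when the end value is included
print(half_open_step)  # spacing when the end value is excluded

# With the end value excluded, the result matches np.arange with that step.
same_as_arange = np.allclose(half_open, np.arange(5, 11, half_open_step))
print(same_as_arange)
```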


## B. Reshaping data

The function we use to reshape data in NumPy is np.reshape. It takes in an array and a new shape as required arguments.
The new shape must exactly contain all the elements from the input array. For example, we could reshape an array with
12 elements to (4, 3), but we can't reshape it to (4, 4).

We are allowed to use the special value of -1 in at most one dimension of the new shape.
The dimension with -1 will take on the value necessary to allow the new shape to contain all the elements of the array.
 
The code below shows example usages of np.reshape.

In [15]:
arr = np.arange(8)

reshaped_arr = np.reshape(arr, (2, 4))
print(repr(reshaped_arr))
print('New shape: {}'.format(reshaped_arr.shape))

reshaped_arr = np.reshape(arr, (-1, 2, 2))
print(repr(reshaped_arr))
print('New shape: {}'.format(reshaped_arr.shape))

array([[0, 1, 2, 3],
       [4, 5, 6, 7]])
New shape: (2, 4)
array([[[0, 1],
        [2, 3]],

       [[4, 5],
        [6, 7]]])
New shape: (2, 2, 2)
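
The rule that the new shape must hold exactly the same number of elements can be checked with the 12-element case mentioned earlier. This is a small sketch added for illustration; the variable names `inferred` and `reshape_failed` are not from the lesson.

```python
import numpy as np

arr = np.arange(12)

# -1 asks NumPy to infer the dimension: 12 elements / 3 columns = 4 rows.
inferred = np.reshape(arr, (-1, 3))
print(inferred.shape)

# (4, 4) would need 16 elements, so NumPy rejects it with a ValueError.
try:
    np.reshape(arr, (4, 4))
    reshape_failed = False
except ValueError as err:
    reshape_failed = True
    print('ValueError: {}'.format(err))
```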


## Flattening an array:

Flattening an array reshapes it into a 1-D array. Because we need to flatten data quite often, NumPy arrays provide the flatten method for this.

In [16]:
arr = np.arange(8)
arr = np.reshape(arr, (2, 4))
flattened = arr.flatten()
print(repr(arr))
print('arr shape: {}'.format(arr.shape))
print(repr(flattened))
print('flattened shape: {}'.format(flattened.shape))

array([[0, 1, 2, 3],
       [4, 5, 6, 7]])
arr shape: (2, 4)
array([0, 1, 2, 3, 4, 5, 6, 7])
flattened shape: (8,)


## C. Transposing

When data has its rows and columns interchanged, we can transpose it with the np.transpose function to convert it to the proper format.

The code below shows an example usage of the np.transpose function. The matrix rows become columns after the transpose.

In [17]:
arr = np.arange(8)
arr = np.reshape(arr, (4, 2))
transposed = np.transpose(arr)
print(repr(arr))
print('arr shape: {}'.format(arr.shape))
print(repr(transposed))
print('transposed shape: {}'.format(transposed.shape))

array([[0, 1],
       [2, 3],
       [4, 5],
       [6, 7]])
arr shape: (4, 2)
array([[0, 2, 4, 6],
       [1, 3, 5, 7]])
transposed shape: (2, 4)


np.transpose also takes a keyword argument called axes, which represents the new permutation of the dimensions.

The permutation is a tuple or list of integers with the same length as the number of dimensions in the array.

It tells us how to rearrange the dimensions. For example, if the permutation had 2 at index 1, the old third dimension of the data becomes the new second dimension (dimensions are numbered from 0, so the value 2 refers to the third dimension and index 1 to the second).

In [18]:
arr = np.arange(24)

print(arr)

arr = np.reshape(arr, (3, 4, 2))

print(arr)

transposed = np.transpose(arr, axes=(1, 2, 0))
print('arr shape: {}'.format(arr.shape))
print('transposed shape: {}'.format(transposed.shape))

[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23]
[[[ 0  1]
  [ 2  3]
  [ 4  5]
  [ 6  7]]

 [[ 8  9]
  [10 11]
  [12 13]
  [14 15]]

 [[16 17]
  [18 19]
  [20 21]
  [22 23]]]
arr shape: (3, 4, 2)
transposed shape: (4, 2, 3)


In this example, the old first dimension became the new third dimension,
the old second dimension became the new first dimension, and the old third dimension became 
the new second dimension. The default value for axes is a dimension reversal
(e.g. for 3-D data the default axes value is [2, 1, 0]).
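
One way to convince ourselves of this mapping is to follow a single element through the transpose. The sketch below is an added illustration that reuses the (3, 4, 2) array from the example; the index values `i`, `j`, `k` are arbitrary choices.

```python
import numpy as np

arr = np.reshape(np.arange(24), (3, 4, 2))
transposed = np.transpose(arr, axes=(1, 2, 0))

# New dimension n has the length of old dimension axes[n].
print(transposed.shape)

# An element at old position (i, j, k) moves to new position (j, k, i).
i, j, k = 2, 3, 1
print(arr[i, j, k] == transposed[j, k, i])

# Omitting axes reverses the dimensions, the same as axes=(2, 1, 0).
default_matches = np.array_equal(np.transpose(arr),
                                 np.transpose(arr, axes=(2, 1, 0)))
print(default_matches)
```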

## D. Zeros and ones

Sometimes, we need to create arrays filled solely with 0 or 1. For example, since binary data is labeled with 0 and 1,
we may need to create dummy datasets of strictly one label. For creating these arrays, NumPy provides the functions
np.zeros and np.ones. They both take in the same arguments, which includes just one required argument, the array shape.
The functions also allow for manual casting using the dtype keyword argument.

The code below shows example usages of np.zeros and np.ones.

In [19]:
arr = np.zeros(4)
print(repr(arr))

arr = np.ones((2, 3))
print(repr(arr))

arr = np.ones((2, 3), dtype=np.int32)
print(repr(arr))

array([0., 0., 0., 0.])
array([[1., 1., 1.],
       [1., 1., 1.]])
array([[1, 1, 1],
       [1, 1, 1]])


If we want to create an array of 0's or 1's with the same shape as another array,
we can use np.zeros_like and np.ones_like.

The code below shows example usages of np.zeros_like and np.ones_like.

In [20]:
arr = np.array([[1, 2], [3, 4]])
print(repr(np.zeros_like(arr)))

arr = np.array([[0., 1.], [1.2, 4.]])
print(repr(np.ones_like(arr)))
print(repr(np.ones_like(arr, dtype=np.int32)))

array([[0, 0],
       [0, 0]])
array([[1., 1.],
       [1., 1.]])
array([[1, 1],
       [1, 1]])


## Time to Code!

In [21]:
# CODE HERE

arr = np.arange(12)

reshaped = arr.reshape(2,3,2)

print(repr(arr))

print(repr(reshaped))

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11])
array([[[ 0,  1],
        [ 2,  3],
        [ 4,  5]],

       [[ 6,  7],
        [ 8,  9],
        [10, 11]]])


In [22]:
# Using flatten() Function

In [23]:
reshaped.flatten()

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11])

In [24]:
# Using the transpose() function

In [25]:
# CODE HERE

flattened = reshaped.flatten()

transposed = np.transpose(reshaped, axes=(1, 2, 0))

print(transposed)

[[[ 0  6]
  [ 1  7]]

 [[ 2  8]
  [ 3  9]]

 [[ 4 10]
  [ 5 11]]]


In [26]:
zeros_arr = np.zeros(5)

ones_arr = np.ones_like(transposed)

zeros_like = np.zeros_like(transposed)

print(repr(zeros_arr))
print(repr(ones_arr))
print(repr(zeros_like))

array([0., 0., 0., 0., 0.])
array([[[1, 1],
        [1, 1]],

       [[1, 1],
        [1, 1]],

       [[1, 1],
        [1, 1]]])
array([[[0, 0],
        [0, 0]],

       [[0, 0],
        [0, 0]],

       [[0, 0],
        [0, 0]]])


In [27]:
points = np.linspace(-3.5,1.5, num = 101)
print(repr(points))

array([-3.5 , -3.45, -3.4 , -3.35, -3.3 , -3.25, -3.2 , -3.15, -3.1 ,
       -3.05, -3.  , -2.95, -2.9 , -2.85, -2.8 , -2.75, -2.7 , -2.65,
       -2.6 , -2.55, -2.5 , -2.45, -2.4 , -2.35, -2.3 , -2.25, -2.2 ,
       -2.15, -2.1 , -2.05, -2.  , -1.95, -1.9 , -1.85, -1.8 , -1.75,
       -1.7 , -1.65, -1.6 , -1.55, -1.5 , -1.45, -1.4 , -1.35, -1.3 ,
       -1.25, -1.2 , -1.15, -1.1 , -1.05, -1.  , -0.95, -0.9 , -0.85,
       -0.8 , -0.75, -0.7 , -0.65, -0.6 , -0.55, -0.5 , -0.45, -0.4 ,
       -0.35, -0.3 , -0.25, -0.2 , -0.15, -0.1 , -0.05,  0.  ,  0.05,
        0.1 ,  0.15,  0.2 ,  0.25,  0.3 ,  0.35,  0.4 ,  0.45,  0.5 ,
        0.55,  0.6 ,  0.65,  0.7 ,  0.75,  0.8 ,  0.85,  0.9 ,  0.95,
        1.  ,  1.05,  1.1 ,  1.15,  1.2 ,  1.25,  1.3 ,  1.35,  1.4 ,
        1.45,  1.5 ])


# 3. Math
## Understand how arithmetic and linear algebra work in NumPy.

### We'll cover the following

### Chapter Goals:
* Arithmetic
* Non-linear functions
* Matrix multiplication
* Time to Code!

## A. Arithmetic
One of the main purposes of NumPy is to perform multi-dimensional arithmetic. Using NumPy arrays, we can apply arithmetic to each element with a single operation.

The code below shows multi-dimensional arithmetic with NumPy.

In [28]:
arr = np.array([[1, 2], [3, 4]])
# Add 1 to element values
print(repr(arr + 1))
# Subtract 1.2 from element values
print(repr(arr - 1.2))
# Double element values
print(repr(arr * 2))
# Halve element values
print(repr(arr / 2))
# Integer division (half)
print(repr(arr // 2))
# Square element values
print(repr(arr**2))
# Square root element values
print(repr(arr**0.5))

array([[2, 3],
       [4, 5]])
array([[-0.2,  0.8],
       [ 1.8,  2.8]])
array([[2, 4],
       [6, 8]])
array([[0.5, 1. ],
       [1.5, 2. ]])
array([[0, 1],
       [1, 2]], dtype=int32)
array([[ 1,  4],
       [ 9, 16]])
array([[1.        , 1.41421356],
       [1.73205081, 2.        ]])


Using NumPy arithmetic, we can easily modify large amounts of numeric data with only a few operations. For example, we could convert a dataset of Fahrenheit temperatures to their equivalent Celsius form.

The code below converts Fahrenheit to Celsius in NumPy.

In [29]:
def f2c(temps):
  return (5/9)*(temps-32)

fahrenheits = np.array([32, -4, 14, -40])
celsius = f2c(fahrenheits)
print('Celsius: {}'.format(repr(celsius)))

Celsius: array([  0., -20., -10., -40.])


It is important to note that performing arithmetic on NumPy arrays does not change the original array, and instead produces a new array that is the result of the arithmetic operation.
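
A short check of this behaviour is sketched below. The contrast with augmented assignment (`*=`), which does write back into the array, is an addition that the lesson text does not cover; the variable names are chosen for the example.

```python
import numpy as np

arr = np.array([[1, 2], [3, 4]])

# The expression builds a new array; arr keeps its original values.
doubled = arr * 2
unchanged_after_multiply = np.array_equal(arr, [[1, 2], [3, 4]])
print(unchanged_after_multiply)
print(repr(doubled))

# An augmented assignment, by contrast, writes the result back into arr.
arr *= 2
print(repr(arr))
```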

## B. Non-linear functions
Apart from basic arithmetic operations, NumPy also allows you to use non-linear functions such as exponentials and logarithms.

The function **np.exp** performs a base e exponential on an array, while the function **np.exp2** performs a base 2 exponential. Likewise, **np.log**, **np.log2**, and **np.log10** all perform logarithms on an input array, using base e, base 2, and base 10, respectively.

The code below shows various exponentials and logarithms with NumPy. Note that **np.e** and **np.pi** represent the mathematical constants e and π, respectively.

In [30]:
arr = np.array([[1, 2], [3, 4]])
# e raised to the power of each element
print(repr(np.exp(arr)))
# 2 raised to the power of each element
print(repr(np.exp2(arr)))

arr2 = np.array([[1, 10], [np.e, np.pi]])

print(repr(arr2))

# Natural logarithm
print(repr(np.log(arr2)))
# Base 10 logarithm
print(repr(np.log10(arr2)))

array([[ 2.71828183,  7.3890561 ],
       [20.08553692, 54.59815003]])
array([[ 2.,  4.],
       [ 8., 16.]])
array([[ 1.        , 10.        ],
       [ 2.71828183,  3.14159265]])
array([[0.        , 2.30258509],
       [1.        , 1.14472989]])
array([[0.        , 1.        ],
       [0.43429448, 0.49714987]])


To do a regular power operation with any base, we use **np.power**. The first argument to the function is the base, while the second is the power. If the base or power is an array rather than a single number, the operation is applied to every element in the array.

The code below shows examples of using **np.power**.

In [31]:
arr = np.array([[1, 2], [3, 4]])
# Raise 3 to power of each number in arr
print(repr(np.power(3, arr)))
arr2 = np.array([[10.2, 4], [3, 5]])
# Raise arr2 to power of each number in arr
print(repr(np.power(arr2, arr)))

array([[ 3,  9],
       [27, 81]], dtype=int32)
array([[ 10.2,  16. ],
       [ 27. , 625. ]])


## C. Matrix multiplication
Since NumPy arrays are basically vectors and matrices, it makes sense that there are functions for dot products and matrix multiplication. Specifically, the main function to use is **np.matmul**, which takes two vector/matrix arrays as input and produces a dot product or matrix multiplication.

The code below shows various examples of matrix multiplication. When both inputs are 1-D, the output is the dot product.

Note that the dimensions of the two input matrices must be valid for a matrix multiplication. Specifically, the second dimension of the first matrix must equal the first dimension of the second matrix, otherwise **np.matmul** will result in a **ValueError**.

In [32]:
arr1 = np.array([1, 2, 3])
arr2 = np.array([-3, 0, 10])
print(np.matmul(arr1, arr2))

arr3 = np.array([[1, 2], [3, 4], [5, 6]])
arr4 = np.array([[-1, 0, 1], [3, 2, -4]])
print(repr(np.matmul(arr3, arr4)))
print(repr(np.matmul(arr4, arr3)))

# Uncommenting the next line raises a ValueError, because shapes (3, 2) and (3, 2) are incompatible.
#print(repr(np.matmul(arr3, arr3)))

27
array([[  5,   4,  -7],
       [  9,   8, -13],
       [ 13,  12, -19]])
array([[  4,   4],
       [-11, -10]])
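
The shape rule can be made explicit by printing the shapes involved. The sketch below is an added illustration; it also uses Python's `@` operator, which is shorthand for matrix multiplication on NumPy arrays and is not mentioned in the lesson text.

```python
import numpy as np

arr3 = np.array([[1, 2], [3, 4], [5, 6]])   # shape (3, 2)
arr4 = np.array([[-1, 0, 1], [3, 2, -4]])   # shape (2, 3)

# (3, 2) times (2, 3): the inner dimensions match, so the result is (3, 3).
product = np.matmul(arr3, arr4)
print(product.shape)

# The @ operator gives the same result as np.matmul here.
print(np.array_equal(product, arr3 @ arr4))

# (3, 2) times (3, 2): the inner dimensions 2 and 3 differ, so this fails.
try:
    np.matmul(arr3, arr3)
    mismatch_raised = False
except ValueError:
    mismatch_raised = True
print(mismatch_raised)
```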


## Time to Code!


In [33]:
arr = np.array([[-0.5, 0.8, -0.1],
                [0.0, -1.2, 1.3]])

arr2 = np.array([[1.2, 3.1],
                 [1.2, 0.3],
                 [1.5, 2.2]])

Next we'll apply some arithmetic to arr. Specifically, we'll do multiplication, addition, and squaring.

Set multiplied equal to arr multiplied by np.pi.

Then set added equal to the result of adding arr and multiplied.

Finally, set squared equal to added with each of its elements squared.

In [34]:
multiplied = arr * np.pi
added = arr + multiplied
squared = added**2

print(repr(multiplied))
print(repr(added))
print(repr(squared))

array([[-1.57079633,  2.51327412, -0.31415927],
       [ 0.        , -3.76991118,  4.08407045]])
array([[-2.07079633,  3.31327412, -0.41415927],
       [ 0.        , -4.96991118,  5.38407045]])
array([[ 4.28819743, 10.97778541,  0.1715279 ],
       [ 0.        , 24.70001718, 28.98821461]])


After the arithmetic operations, we'll apply the base e exponential and logarithm to our array matrices.

Set exponential equal to np.exp applied to squared.

Then set logged equal to np.log applied to arr2.

In [35]:
# CODE HERE

exponential = np.exp(squared)

logged = np.log(arr2)

print(repr(exponential))
print(repr(logged))

array([[7.28350596e+01, 5.85587272e+04, 1.18711726e+00],
       [1.00000000e+00, 5.33434578e+10, 3.88527393e+12]])
array([[ 0.18232156,  1.13140211],
       [ 0.18232156, -1.2039728 ],
       [ 0.40546511,  0.78845736]])


Note that exponential has shape (2, 3) and logged has shape (3, 2). So we can perform matrix multiplication both ways.

Set matmul1 equal to np.matmul with first argument logged and second argument exponential. Note that matmul1 will have shape (3, 3).

Then set matmul2 equal to np.matmul with first argument exponential and second argument logged. Note that matmul2 will have shape (2, 2).

In [36]:
matmul1 = np.matmul(logged, exponential)

matmul2 = np.matmul(exponential, logged)

print(repr(matmul1))
print(repr(matmul2))

array([[ 1.44108036e+01,  6.03529115e+10,  4.39580713e+12],
       [ 1.20754286e+01, -6.42240618e+10, -4.67776415e+12],
       [ 3.03205327e+01,  4.20590657e+10,  3.06337283e+12]])
array([[ 1.06902790e+04, -7.04197733e+04],
       [ 1.58506868e+12,  2.99914875e+12]])


# 4. Random
## Generate numbers and arrays from different random distributions.

### We'll cover the following

### Chapter Goals:
* A. Random integers
* B. Utility functions
* C. Distributions
* D. Custom sampling
* Time to Code!

## A. Random integers
Similar to the Python **random** module, NumPy has its own submodule for pseudo-random number generation called **np.random**. It provides all the necessary randomized operations and extends them to multi-dimensional arrays. To generate pseudo-random integers, we use the **np.random.randint** function.

The code below shows example usages of **np.random.randint**.

In [37]:
print(np.random.randint(5))
print(np.random.randint(5))
print(np.random.randint(5, high=6))

random_arr = np.random.randint(-10, high=30,
                               size=(4, 5))
print(repr(random_arr))

2
1
5
array([[ -7,   9,  29,  -6,  29],
       [ -7,  11,  24,  22,   7],
       [ -8,  16,  10,   7,  12],
       [ 23,  -2,  25,   2, -10]])


The np.random.randint function takes in a single required argument, which actually depends on the high keyword argument. If high=None (which is the default value), then the required argument represents the upper (exclusive) end of the range, with the lower end being 0. Specifically, if the required argument is n, then the random integer is chosen uniformly from the range [0, n).

If **high** is not None, then the required argument represents the lower (inclusive) end of the range, while **high** represents the upper (exclusive) end.

The **size** keyword argument specifies the **size of the output array**, where each integer in the array is randomly drawn from the specified range. As a default, np.random.randint returns a single integer.

## B. Utility functions
Some fundamental utility functions from the np.random module are np.random.seed and np.random.shuffle. We use the np.random.seed function to set the random seed, which allows us to control the outputs of the pseudo-random functions. The function takes in a single integer as an argument, representing the random seed.

The code below uses np.random.seed with the same random seed. Note how the outputs of the random functions in each subsequent run are identical when we set the same random seed.

### Using np.random.seed() Function

In [38]:
np.random.seed(1)
print(np.random.randint(10))
random_arr = np.random.randint(3, high=100,
                               size=(2, 2))
print(repr(random_arr))

# New seed
np.random.seed(2)
print(np.random.randint(10))
random_arr = np.random.randint(3, high=100,
                               size=(2, 2))
print(repr(random_arr))

# Original seed
np.random.seed(1)
print(np.random.randint(10))
random_arr = np.random.randint(3, high=100,
                               size=(2, 2))
print(repr(random_arr))

5
array([[15, 75],
       [12, 78]])
8
array([[18, 75],
       [25, 46]])
5
array([[15, 75],
       [12, 78]])
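
The same idea can be wrapped in a small helper so that the comparison is done in code rather than by eye. This is a sketch added for illustration; the function `draw` is hypothetical and simply reseeds the global generator before each draw.

```python
import numpy as np

def draw(seed):
    """Seed the global generator, then draw a 2x2 array of integers in [3, 100)."""
    np.random.seed(seed)
    return np.random.randint(3, high=100, size=(2, 2))

first = draw(1)
second = draw(1)
other = draw(2)

# Same seed, same sequence of pseudo-random numbers.
print(np.array_equal(first, second))
print(repr(other))
```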


### Using np.random.shuffle() function

The np.random.shuffle function allows us to randomly shuffle an array. Note that the shuffling happens in place (i.e. no return value), and shuffling multi-dimensional arrays only shuffles the first dimension.

The code below shows example usages of np.random.shuffle. Note that **only the rows of matrix are shuffled** (i.e. shuffling along first dimension only).

In [39]:
vec = np.array([1, 2, 3, 4, 5])
np.random.shuffle(vec)
print(repr(vec))
np.random.shuffle(vec)
print(repr(vec))

matrix = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])
np.random.shuffle(matrix)
print(repr(matrix))

array([3, 4, 2, 5, 1])
array([5, 3, 4, 2, 1])
array([[4, 5, 6],
       [7, 8, 9],
       [1, 2, 3]])


## C. Distributions
Using **np.random**, we can also draw samples from probability distributions. For example, we can use **np.random.uniform** to draw pseudo-random real numbers from a uniform distribution.

The code below shows usages of np.random.uniform.

In [40]:
print(np.random.uniform())
print(np.random.uniform(low=-1.5, high=2.2))
print(repr(np.random.uniform(size=3)))
print(repr(np.random.uniform(low=-3.4, high=5.9,
                             size=(2, 2))))

0.3132735169322751
0.4408281904196243
array([0.44345289, 0.22957721, 0.53441391])
array([[5.09984683, 0.85200471],
       [0.60549667, 5.33388844]])


The function np.random.uniform actually has no required arguments. The keyword arguments, low and high, represent the inclusive lower end and exclusive upper end from which to draw random samples. Since they have default values of 0.0 and 1.0, respectively, the default outputs of np.random.uniform come from the range [0.0, 1.0).

The size keyword argument is the same as the one for np.random.randint, i.e. it represents the output size of the array.

Another popular distribution we can sample from is the **normal (Gaussian) distribution**. The function we use is **np.random.normal**.

The code below shows usages of np.random.normal.


In [41]:
print(np.random.normal())
print(np.random.normal(loc=1.5, scale=3.5))
print(repr(np.random.normal(loc=-2.4, scale=4.0,
                            size=(2, 2))))

0.7252740646272712
4.772112039383628
array([[ 2.07318791, -2.17754724],
       [-0.89337346, -0.89545991]])


Like np.random.uniform, np.random.normal has no required arguments. The **loc** and **scale** keyword arguments represent the **mean** and **standard deviation**, respectively, of the normal distribution we sample from.
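
To check that loc and scale behave as the mean and standard deviation, we can draw a large sample and compare its statistics. The sketch below is an added illustration; the seed and the sample size are arbitrary choices made so the run is repeatable.

```python
import numpy as np

np.random.seed(0)  # arbitrary seed, chosen only to make the run repeatable

# Draw many samples so the sample statistics settle near loc and scale.
samples = np.random.normal(loc=1.5, scale=3.5, size=100000)

print(samples.mean())  # should be close to loc
print(samples.std())   # should be close to scale
```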

## D. Custom sampling
While NumPy provides built-in distributions to sample from, we can also sample from a custom distribution with the **np.random.choice()** function.

The code below shows example usages of np.random.choice.

In [42]:
colors = ['red', 'blue', 'green']
print(np.random.choice(colors))
print(repr(np.random.choice(colors, size=2)))
print(repr(np.random.choice(colors, size=(2, 2),
                            p=[0.8, 0.19, 0.01])))

green
array(['blue', 'red'], dtype='<U5')
array([['red', 'red'],
       ['blue', 'red']], dtype='<U5')


The required argument for np.random.choice is the custom distribution we sample from. The **p keyword argument denotes the probabilities** given to each element in the input distribution. Note that the list of probabilities for **p must sum to 1**.

In the example, we set p such that 'red' has a probability of 0.8 of being chosen, 'blue' has a probability of 0.19, and 'green' has a probability of 0.01. When p is not set, the probabilities are equal for each element in the distribution (and sum to 1).
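
The effect of p is easier to see over many draws than over a handful. The following sketch, added for illustration, tallies the outcomes with `np.unique`; the seed and the number of draws are arbitrary assumptions.

```python
import numpy as np

np.random.seed(0)  # arbitrary seed for a repeatable run
colors = ['red', 'blue', 'green']
draws = np.random.choice(colors, size=10000, p=[0.8, 0.19, 0.01])

# np.unique with return_counts=True tallies how often each value appears.
values, counts = np.unique(draws, return_counts=True)
frequencies = dict(zip(values.tolist(), (counts / draws.size).tolist()))
print(frequencies)
```

The observed frequencies should land near the probabilities passed in p, with 'red' dominating and 'green' appearing only rarely.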

## Time to Code!

In [43]:
# CODE HERE

random1 = np.random.randint(5)

random_arr = np.random.randint(3, high = 10, size = (3, 5))

print(repr(random1))
print(repr(random_arr))

4
array([[3, 4, 6, 7, 5],
       [7, 3, 8, 6, 4],
       [5, 3, 7, 4, 5]])


In [44]:
random_uniform  = np.random.uniform(low = -2.5, high = 1.5, size = (5))

print(repr(random_uniform))

array([ 0.65711731, -2.08709597, -0.7084259 ,  1.13438201, -1.32554341])


In [45]:
random_norm = np.random.normal(loc = 2.0, scale = 3.5, size = (10, 5))

print(repr(random_norm))

array([[-0.42081263,  0.61136266, -0.40510445, -0.95821975, -0.34936146],
       [ 1.9556739 , -1.91058622,  2.82045494,  7.80930762,  4.59715456],
       [ 1.32857557, -1.10670137, -0.61505403,  7.9235911 ,  2.17782714],
       [-0.22948476,  2.6682042 ,  9.35089298,  2.42055633,  4.16021088],
       [ 3.05059612,  0.76712554, -1.99881369,  0.77730047,  1.26887018],
       [ 4.05318117,  4.93644195,  5.25885728,  2.99955564,  5.09799407],
       [-0.64039279,  6.38503854,  3.79525437,  0.95667508,  3.70981351],
       [ 1.735499  ,  5.96070286,  7.31935886,  9.64951392, -2.88773717],
       [-3.05439832,  0.23436948,  2.56012974,  5.06659122,  3.10472232],
       [-5.07770426,  0.92828596,  4.89791125,  2.80533157,  4.66703913]])


In [46]:
choices = ['a', 'b', 'c', 'd']

choice = np.random.choice(choices, p = [0.5, 0.1, 0.2, 0.2])

print(repr(choice))

'a'


In [47]:
arr = np.arange(1, 6)

print(repr(arr))

np.random.shuffle(arr)

print(repr(arr))

array([1, 2, 3, 4, 5])
array([5, 3, 4, 2, 1])


# 5. Indexing
## Index into NumPy arrays to extract data and array slices.

### We'll cover the following

### Chapter Goals:
* A. Array accessing
* B. Slicing
* C. Argmin and argmax
* Time to Code!

## A. Array accessing
Accessing NumPy arrays is identical to accessing Python lists. For multi-dimensional arrays, it is equivalent to accessing Python lists of lists.

The code below shows example accesses of NumPy arrays.

In [48]:
arr = np.array([1, 2, 3, 4, 5])
print(arr[0])
print(arr[4])

arr = np.array([[6, 3], [0, 2]])
# Subarray
print(repr(arr[0]))

1
5
array([6, 3])


## B. Slicing
NumPy arrays also support slicing. Similar to Python, we use the colon operator (i.e. **arr[:]**) for slicing. We can also use negative indexing to slice in the backwards direction.

The code below shows example slices of a 1-D NumPy array.

In [49]:
arr = np.array([1, 2, 3, 4, 5])
print(repr(arr[:]))
print(repr(arr[1:]))
print(repr(arr[2:4]))
print(repr(arr[:-1]))
print(repr(arr[-2:]))

array([1, 2, 3, 4, 5])
array([2, 3, 4, 5])
array([3, 4])
array([1, 2, 3, 4])
array([4, 5])


For multi-dimensional arrays, we can use a comma to separate slices across each dimension.

The code below shows example slices of a 2-D NumPy array.

In [50]:
arr = np.array([[1, 2, 3],
                [4, 5, 6],
                [7, 8, 9]])
print(repr(arr[:]))
print(repr(arr[1:]))
print(repr(arr[:, -1]))
print(repr(arr[:, 1:]))
print(repr(arr[0:1, 1:]))
print(repr(arr[0, 1:]))

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])
array([[4, 5, 6],
       [7, 8, 9]])
array([3, 6, 9])
array([[2, 3],
       [5, 6],
       [8, 9]])
array([[2, 3]])
array([2, 3])


## C. Argmin and argmax
In addition to accessing and slicing arrays, it is useful to figure out the actual indexes of the minimum and maximum elements. To do this, we use the np.argmin and np.argmax functions.

The code below shows example usages of **np.argmin** and **np.argmax**. Note that the index of element -6 is index 5 in the flattened version of arr.

In [51]:
arr = np.array([[-2, -1, -3],
                [4, 5, -6],
                [-3, 9, 1]])
print(np.argmin(arr[0]))
print(np.argmax(arr[2]))
print(np.argmin(arr))

2
1
5


The np.argmin and np.argmax functions take the same arguments. The required argument is the input array and the axis keyword argument specifies which dimension to apply the operation on.

The code below shows how the axis keyword argument is used for these functions.

In [52]:
arr = np.array([[-2, -1, -3],
                [4, 5, -6],
                [-3, 9, 1]])
print(repr(np.argmin(arr, axis=0)))
print(repr(np.argmin(arr, axis=1)))
print(repr(np.argmax(arr, axis=-1)))

array([2, 0, 1], dtype=int64)
array([2, 2, 0], dtype=int64)
array([1, 1, 1], dtype=int64)


In our example, using **axis=0** meant the function found the index of the minimum row element for each column. When we used **axis=1**, the function found the index of the minimum column element for each row.

Setting axis to **-1** just means we apply the function across the **last dimension**. In this case, **axis=-1** is equivalent to axis=1.
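
When no axis is given, np.argmin and np.argmax return an index into the flattened array. If we need the row and column instead, one option is `np.unravel_index`, which the lesson does not cover; the sketch below is an added illustration that reuses the same array.

```python
import numpy as np

arr = np.array([[-2, -1, -3],
                [4, 5, -6],
                [-3, 9, 1]])

flat_index = np.argmin(arr)  # index into the flattened array
# Convert the flat index back into a (row, column) position.
row, col = np.unravel_index(flat_index, arr.shape)

print(flat_index)
print(row, col)
print(arr[row, col])
```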

## Time to Code!

In [53]:
# to get the 3rd element in the 2nd row of a 2D matrix

def direct_index(data):
    # CODE HERE
    elem = data[1,2]
    # or data[1][2]
    return elem

In [54]:
data = np.array([[8,10,3],
                [2,4,5],
                [4,7,8]])

In [55]:
direct_index(data)

5

The next function, slice_data, will return two slices from the input data.

The first slice will contain all the rows, but will skip the first element in each row. The second slice will contain all the elements of the first three rows except the last two elements.

Set slice1 equal to the specified first slice. Remember that NumPy uses a comma to separate slices along different dimensions.

Set slice2 equal to the specified second slice.

Return a tuple containing slice1 and slice2, in that order.

In [56]:
def slice_data(data):
    # CODE HERE
    slice1 = data[:, 1:]

    slice2 = data[:3, :-2]

    return slice1, slice2

In [57]:
slice_data(data)

(array([[10,  3],
        [ 4,  5],
        [ 7,  8]]),
 array([[8],
        [2],
        [4]]))

The next function, argmin_data, will find minimum indexes in the input data.

We can use np.argmin to find minimum points in the data array. First, we'll find the index of the overall minimum element.

We can also return the indexes of each row's minimum element. This is equivalent to finding the minimum column for each row, which means our operation is done along axis 1.

Set argmin_all equal to np.argmin with data as the only argument.

Set argmin1 equal to np.argmin with data as the first argument and the specified axis keyword argument.

Return a tuple containing argmin_all and argmin1, in that order.

In [58]:
def argmin_data(data):
  # CODE HERE
  argmin_all = np.argmin(data)
  argmin1 = np.argmin(data, axis = 1)

  return argmin_all, argmin1

In [59]:
argmin_data(data)

(3, array([2, 0, 0], dtype=int64))

The final function, argmax_data, will find the index of each row's maximum element in data. Since there are only 2 dimensions in data, we can apply the operation along either axis 1 or -1.

Set argmax_neg1 equal to np.argmax with data as the first argument and -1 as the axis keyword argument. Then return argmax_neg1.



In [60]:
def argmax_data(data):
    # CODE HERE
    argmax_neg1 = np.argmax(data, axis = -1)

    return argmax_neg1

In [61]:
argmax_data(data)

array([1, 2, 2], dtype=int64)

# 6. Filtering
## Filter NumPy data for specific values.

### We'll cover the following

### Chapter Goals:
* A. Filtering data
* B. Filtering in NumPy
* C. Axis-wise filtering
* Time to Code!

## A. Filtering data
Sometimes we have data that contains values we don't want to use. For example, when tracking the best hitters in baseball, we may want to only use the batting average data above .300. In this case, we should filter the overall data for only the values that we want.

The key to filtering data is basic relational operations, e.g. ==, >, etc. In NumPy, we can apply basic relational operations element-wise on arrays.

The code below shows relational operations on NumPy arrays. The ~ operation represents a boolean negation, i.e. it flips each truth value in the array.

In [62]:
arr = np.array([[0, 2, 3],
                [1, 3, -6],
                [-3, -2, 1]])
print(repr(arr == 3))
print(repr(arr > 0))
print(repr(arr != 1))
# Negated from the previous step
print(repr(~(arr != 1)))

array([[False, False,  True],
       [False,  True, False],
       [False, False, False]])
array([[False,  True,  True],
       [ True,  True, False],
       [False, False,  True]])
array([[ True,  True,  True],
       [False,  True,  True],
       [ True,  True, False]])
array([[False, False, False],
       [ True, False, False],
       [False, False,  True]])


Something to note is that **np.nan** can't be located with a relational operation: a comparison such as arr == np.nan evaluates to False everywhere, even where the value is NaN. Instead, we use **np.isnan** to filter for the location of np.nan.

The code below uses **np.isnan** to determine which locations of the array **contain np.nan** values.

In [63]:
arr = np.array([[0, 2, np.nan],
                [1, np.nan, -6],
                [np.nan, -2, 1]])
print(repr(np.isnan(arr)))

array([[False, False,  True],
       [False,  True, False],
       [ True, False, False]])
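
To make the contrast with relational operations concrete, here is a minimal sketch. The array `nan_arr` is a made-up example; the sketch compares it against np.nan directly and then uses np.isnan with a negated mask to keep the valid entries.

```python
import numpy as np

# Hypothetical 1-D array containing two NaN entries
nan_arr = np.array([0, 2, np.nan, 1, np.nan])

# Equality against np.nan is False everywhere, even where the value is NaN
eq_result = nan_arr == np.nan

# np.isnan finds the NaN locations; ~ flips the mask to mark valid entries
nan_mask = np.isnan(nan_arr)
valid_values = nan_arr[np.where(~nan_mask)]

print(repr(eq_result))
print(repr(nan_mask))
print(repr(valid_values))
```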


## B. Filtering in NumPy
The **np.where** function takes in a required first argument, which is a boolean array where True represents the locations of the elements we want to filter for. When the function is applied with only the first argument, it returns a tuple of 1-D arrays.

The tuple will have size equal to the number of dimensions in the data, and each array represents the True indices for the corresponding dimension. Note that the arrays in the tuple will all have the same length, equal to the number of True elements in the input argument.

The code below shows how to use np.where with a single argument.

In [64]:
print(repr(np.where([True, False, True])))

arr = np.array([0, 3, 5, 3, 1])
print(repr(np.where(arr == 3)))

arr = np.array([[0, 2, 3],
                [1, 0, 0],
                [-3, 0, 0]])
x_ind, y_ind = np.where(arr != 0)
print(repr(x_ind)) # x indices of non-zero elements
print(repr(y_ind)) # y indices of non-zero elements
print(repr(arr[x_ind, y_ind]))

(array([0, 2], dtype=int64),)
(array([1, 3], dtype=int64),)
array([0, 0, 1, 2], dtype=int64)
array([1, 2, 0, 0], dtype=int64)
array([ 2,  3,  1, -3])


The interesting thing about **np.where** is that it must be applied with exactly 1 or 3 arguments. When we use 3 arguments, the first argument is still the boolean array. However, the next two arguments represent the True replacement values and the False replacement values, respectively. The output of the function now becomes an array with the same shape as the first argument.

The code below shows how to use **np.where** with **3** arguments.

In [65]:
np_filter = np.array([[True, False], [False, True]])
positives = np.array([[1, 2], [3, 4]])
negatives = np.array([[-2, -5], [-1, -8]])
print(repr(np.where(np_filter, positives, negatives)))

np_filter = positives > 2
print(repr(np.where(np_filter, positives, negatives)))

np_filter = negatives > 0
print(repr(np.where(np_filter, positives, negatives)))

array([[ 1, -5],
       [-1,  4]])
array([[-2, -5],
       [ 3,  4]])
array([[-2, -5],
       [-1, -8]])


Note that our second and third arguments necessarily had the same shape as the first argument. However, if we wanted to use a constant replacement value, e.g. -1, we could incorporate broadcasting. Rather than using an entire array of the same value, we can just use the value itself as an argument.

The code below showcases broadcasting with np.where.

In [66]:
np_filter = np.array([[True, False], [False, True]])
positives = np.array([[1, 2], [3, 4]])
print(repr(np.where(np_filter, positives, -1)))

array([[ 1, -1],
       [-1,  4]])


## C. Axis-wise filtering
If we wanted to filter based on rows or columns of data, we could use the np.any and np.all functions. Both functions take in the same arguments, and return a single boolean or a boolean array. The **required argument for both functions is a boolean array**.

The code below shows usage of **np.any** and **np.all** with a **single argument**.

In [67]:
arr = np.array([[-2, -1, -3],
                [4, 5, -6],
                [3, 9, 1]])
print(repr(arr > 0))
print(np.any(arr > 0))
print(np.all(arr > 0))

array([[False, False, False],
       [ True,  True, False],
       [ True,  True,  True]])
True
False


The np.any function is equivalent to performing a logical OR (||), while the np.all function is equivalent to a logical AND (&&) on the first argument. np.any returns true if even one of the elements in the array meets the condition and np.all returns true only if all the elements meet the condition. When only a single argument is passed in, the function is applied across the entire input array, so the returned value is a single boolean.

However, if we use a multi-dimensional input and specify the axis keyword argument, the returned value will be an array. The axis argument has the same meaning as it did for np.argmin and np.argmax from the previous chapter. Using axis=0 means the function is applied down the rows of each column, returning one boolean per column. Using axis=1 means the function is applied across the columns of each row, returning one boolean per row.

Setting axis to -1 just means we apply the function across the last dimension.

The code below shows examples of using np.any and np.all with the axis keyword argument.

In [68]:
arr = np.array([[-2, -1, -3],
                [4, 5, -6],
                [3, 9, 1]])
print(repr(arr > 0))
print(repr(np.any(arr > 0, axis=0)))
print(repr(np.any(arr > 0, axis=1)))
print(repr(np.all(arr > 0, axis=1)))

array([[False, False, False],
       [ True,  True, False],
       [ True,  True,  True]])
array([ True,  True,  True])
array([False,  True,  True])
array([False, False,  True])


The code below combines np.any with np.where to keep only the rows of arr that contain at least one positive value.

In [69]:
arr = np.array([[-2, -1, -3],
                [4, 5, -6],
                [3, 9, 1]])
has_positive = np.any(arr > 0, axis=1)
print(has_positive)
print(repr(arr[np.where(has_positive)]))

[False  True  True]
array([[ 4,  5, -6],
       [ 3,  9,  1]])
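
The same pattern should work with np.all when we only want rows in which every element meets the condition. The sketch below reuses the chapter's array; the names `all_positive` and `positive_rows` are chosen here for illustration.

```python
import numpy as np

arr = np.array([[-2, -1, -3],
                [4, 5, -6],
                [3, 9, 1]])

# True only for rows in which every element is positive
all_positive = np.all(arr > 0, axis=1)

# np.where turns the boolean vector into row indices used to select rows
positive_rows = arr[np.where(all_positive)]

print(all_positive)
print(repr(positive_rows))
```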


## Time to Code!

In [70]:
def get_positives(data):
    # CODE HERE
    x_ind, y_ind = np.where(data > 0)

    return data[x_ind, y_ind]

In [71]:
data = np.array([[-2, -1, -3],
                [4, 5, -6],
                [3, 9, 1]])

In [72]:
get_positives(data)

array([4, 5, 3, 9, 1])

In [73]:
def replace_zeros(data):
    # CODE HERE
    zeros = np.zeros_like(data)

    zero_replace = np.where(data > 0, data, zeros)

    return zero_replace

In [74]:
replace_zeros(data)

array([[0, 0, 0],
       [4, 5, 0],
       [3, 9, 1]])

### Another solution by using broadcasting.

In [75]:
def replace_zeros(data):
    # CODE HERE
    zero_replace = np.where(data > 0, data, 0)

    return zero_replace

In [76]:
replace_zeros(data)

array([[0, 0, 0],
       [4, 5, 0],
       [3, 9, 1]])

In [77]:
def replace_neg_one(data):
    # CODE HERE
    neg_one_replace = np.where(data > 0, data, -1)

    return neg_one_replace

In [78]:
replace_neg_one(data)

array([[-1, -1, -1],
       [ 4,  5, -1],
       [ 3,  9,  1]])

Our final function, coin_flip_filter, will apply a filter using a boolean array as the condition. We'll first create a boolean coin flip array with the same shape as data.

Then we filter data using bool_coin_flips as the condition. For the False values in bool_coin_flips, we replace the corresponding element in data with a 1.

Set coin_flips equal to np.random.randint with 2 as the first argument and data.shape as the size keyword argument.

Set bool_coin_flips equal to coin_flips, cast as bool (using the array's astype method).

Set one_replace equal to np.where with bool_coin_flips, data, and 1 as the respective arguments.

Return one_replace.

In [79]:
def coin_flip_filter(data):
    # CODE HERE
    coin_flips = np.random.randint(2, size=data.shape)
    bool_coin_flips = coin_flips.astype(bool)
    one_replace = np.where(bool_coin_flips, data, 1)
    return one_replace

In [80]:
data = np.array([[19, 20,  4, 4, 1],
                 [ 9, 2,  2,  2,  2],
                 [ 0, 8, 12,  2, 2],
                 [ 1, 2, 2,  2,  7]])

coin_flip_filter(data)

array([[ 1, 20,  4,  4,  1],
       [ 1,  2,  1,  1,  2],
       [ 1,  8,  1,  2,  2],
       [ 1,  2,  2,  1,  7]])
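
Because the coin flips are random, the output above will generally differ from run to run. One way to get repeatable results, sketched below, is to set the seed with np.random.seed before each call. The `sample` array and the seed value 0 are arbitrary choices made for this illustration.

```python
import numpy as np

def coin_flip_filter(data):
    # Same logic as the exercise solution above
    coin_flips = np.random.randint(2, size=data.shape)
    bool_coin_flips = coin_flips.astype(bool)
    return np.where(bool_coin_flips, data, 1)

# Hypothetical input, chosen only for illustration
sample = np.array([[19, 20, 4],
                   [9, 2, 2]])

np.random.seed(0)              # arbitrary seed value
first_run = coin_flip_filter(sample)
np.random.seed(0)              # resetting the seed repeats the same flips
second_run = coin_flip_filter(sample)

print(np.array_equal(first_run, second_run))
```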

# 7. Statistics
## Learn how to apply statistical metrics to NumPy data.

### We'll cover the following

### Chapter Goals:
* A. Analysis
* B. Statistical metrics
* Time to Code!

## A. Analysis

The code below shows example usages of the min and max functions.
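> axis values:
* 0 - one result per column (the operation runs down the rows)
* 1 - one result per row (the operation runs across the columns)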

In [81]:
arr = np.array([[0, 72, 3],
                [1, 3, -60],
                [-3, -2, 4]])
print(arr.min())
print(arr.max())

print(repr(arr.min(axis=0)))
print(repr(arr.max(axis=-1)))

-60
72
array([ -3,  -2, -60])
array([72,  3,  4])


## B. Statistical metrics
NumPy also provides basic statistical functions such as **np.mean**, **np.var**, and **np.median**, to calculate the mean, variance, and median of the data, respectively.

The code below shows how to obtain basic statistics with NumPy. **Note that np.median applied without axis takes the median of the flattened array**.

In [82]:
arr = np.array([[0, 72, 3],
                [1, 3, -60],
                [-3, -2, 4]])
print(np.mean(arr))
print(np.var(arr))
print(np.median(arr))
print(repr(np.median(arr, axis=-1)))

2.0
977.3333333333334
1.0
array([ 3.,  1., -2.])
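
As a sanity check on what np.var computes, the sketch below reproduces it by hand. With its default arguments, np.var returns the population variance, i.e. the mean of the squared deviations from the mean. The name `manual_var` is introduced here for illustration.

```python
import numpy as np

arr = np.array([[0, 72, 3],
                [1, 3, -60],
                [-3, -2, 4]])

# Population variance: average squared distance from the overall mean
mean = np.mean(arr)
manual_var = np.mean((arr - mean) ** 2)

print(manual_var)
print(np.isclose(manual_var, np.var(arr)))
```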


## Time to Code!

In [83]:
def get_min_max(data):
    # CODE HERE
    overall_min = data.min()
    overall_max = data.max()

    return overall_min, overall_max

In [84]:
data = np.array([[0, 72, 3],
                [1, 3, -60],
                [-3, -2, 4]])

In [85]:
get_min_max(data)

(-60, 72)

In [86]:
def col_min(data):
    # CODE HERE
    min0 = data.min(axis = 0)

    return min0

In [87]:
col_min(data)

array([ -3,  -2, -60])

In [88]:
def basic_stats(data):

    mean = np.mean(data)
    median = np.median(data)
    var = np.var(data)

    return mean, median, var

In [89]:
basic_stats(data)

(2.0, 1.0, 977.3333333333334)

# 8. Aggregation
## Use aggregation techniques to combine NumPy data and arrays.

## We'll cover the following

### Chapter Goals:
* A. Summation
* B. Concatenation
* Time to Code!

## A. Summation
We already know how to calculate the element-wise sum of multiple arrays. To sum the values within a single array, we use the np.sum function.

The function takes in a NumPy array as its required argument, and uses the axis keyword argument. If the axis keyword argument is not specified, np.sum returns the overall sum of the array. The code below shows how to use np.sum.

In [90]:
arr = np.array([[0, 72, 3],
                [1, 3, -60],
                [-3, -2, 4]])
print(np.sum(arr))
print(repr(np.sum(arr, axis=0)))
print(repr(np.sum(arr, axis=1)))

18
array([ -2,  73, -53])
array([ 75, -56,  -1])


In addition to regular sums, NumPy can perform cumulative sums using np.cumsum. Like np.sum, np.cumsum takes in a NumPy array as its required argument and uses the axis keyword argument. If axis is not specified, np.cumsum returns the cumulative sums of the flattened array. For a 2-D NumPy array, setting axis=0 returns an array of cumulative sums down each column, while axis=1 returns an array of cumulative sums across each row.

The code below shows how to use np.cumsum.

In [91]:
arr = np.array([[0, 72, 3],
                [1, 3, -60],
                [-3, -2, 4]])
print(repr(np.cumsum(arr)))
print(repr(np.cumsum(arr, axis=0)))
print(repr(np.cumsum(arr, axis=1)))

array([ 0, 72, 75, 76, 79, 19, 16, 14, 18])
array([[  0,  72,   3],
       [  1,  75, -57],
       [ -2,  73, -53]])
array([[  0,  72,  75],
       [  1,   4, -56],
       [ -3,  -5,  -1]])
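
One way to relate the two functions: the last entry of a cumulative sum should equal the regular sum over the same values. The sketch below checks this on the chapter's array, both for the flattened case and along axis=1.

```python
import numpy as np

arr = np.array([[0, 72, 3],
                [1, 3, -60],
                [-3, -2, 4]])

flat_cumsum = np.cumsum(arr)         # running total over the flattened array
row_cumsum = np.cumsum(arr, axis=1)  # running total within each row

# The final running total matches the overall sum
print(flat_cumsum[-1] == np.sum(arr))
# The last column of the row-wise running totals matches the row sums
print(np.array_equal(row_cumsum[:, -1], np.sum(arr, axis=1)))
```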


## B. Concatenation
An important part of aggregation is combining multiple datasets. In NumPy, this equates to combining multiple arrays into one. The function we use to do this is np.concatenate. Like the summation functions, np.concatenate uses the axis keyword argument. However, the default value for axis is 0 (i.e. dimension 0).

Furthermore, the required argument for np.concatenate is a list of arrays, which the function combines into a single array.

The code below shows how to use np.concatenate, which aggregates arrays by joining them along a specific dimension. For 2-D arrays, not setting the axis argument (defaults to axis=0) concatenates the arrays vertically. When we set axis=1, the arrays are concatenated horizontally.

In [92]:
arr1 = np.array([[0, 72, 3],
                 [1, 3, -60],
                 [-3, -2, 4]])
arr2 = np.array([[-15, 6, 1],
                 [8, 9, -4],
                 [5, -21, 18]])
print(repr(np.concatenate([arr1, arr2])))
print(repr(np.concatenate([arr1, arr2], axis=1)))
print(repr(np.concatenate([arr2, arr1], axis=1)))

array([[  0,  72,   3],
       [  1,   3, -60],
       [ -3,  -2,   4],
       [-15,   6,   1],
       [  8,   9,  -4],
       [  5, -21,  18]])
array([[  0,  72,   3, -15,   6,   1],
       [  1,   3, -60,   8,   9,  -4],
       [ -3,  -2,   4,   5, -21,  18]])
array([[-15,   6,   1,   0,  72,   3],
       [  8,   9,  -4,   1,   3, -60],
       [  5, -21,  18,  -3,  -2,   4]])
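
The example above uses two arrays of the same shape, so both directions work. In general, the arrays must agree on every dimension except the one being joined. The sketch below uses two made-up arrays, `top` and `bottom`, with different row counts to show what happens otherwise.

```python
import numpy as np

# Hypothetical arrays with the same number of columns but different row counts
top = np.array([[1, 2, 3],
                [4, 5, 6]])      # shape (2, 3)
bottom = np.array([[7, 8, 9]])   # shape (1, 3)

# axis=0 (the default) works because the column counts match
stacked = np.concatenate([top, bottom])
print(stacked.shape)

# axis=1 fails because the row counts differ (2 vs. 1)
try:
    np.concatenate([top, bottom], axis=1)
    axis1_failed = False
except ValueError:
    axis1_failed = True
print(axis1_failed)
```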


## Time to Code!
Each coding exercise in this chapter will be to complete a small function that takes in 2-D NumPy matrices as input.

The first function to complete is get_sums, which returns the overall sum and column sums of data.

Set total_sum equal to np.sum applied to data.

Set col_sum equal to np.sum applied to data, with axis set to 0. Return a tuple of total_sum and col_sum, in that order.

In [93]:
def get_sums(data):
    # CODE HERE
    total_sum = np.sum(data)
    col_sum = np.sum(data, axis = 0)

    return total_sum, col_sum

In [94]:
get_sums(data)

(18, array([ -2,  73, -53]))

The next function to complete is get_cumsum, which returns the cumulative sums for each row of data.

Set row_cumsum equal to np.cumsum applied to data with axis set to 1.

Then return row_cumsum.

In [95]:
def get_cumsum(data):
    # CODE HERE
    row_cumsum = np.cumsum(data, axis = 1)

    return row_cumsum

In [96]:
get_cumsum(data)

array([[  0,  72,  75],
       [  1,   4, -56],
       [ -3,  -5,  -1]])

The final function, concat_arrays, takes in two 2-D NumPy arrays as input. It returns the column-wise and row-wise concatenations of the input arrays.

Set col_concat equal to np.concatenate applied to a list of data1, data2, in that order.

Set row_concat equal to np.concatenate applied to a list of data1, data2, in that order. The axis keyword argument should be set to 1.

Return a tuple containing col_concat and row_concat, in that order.

In [97]:
def concat_arrays(data1, data2):
    # CODE HERE
    col_concat = np.concatenate([data1, data2])
    row_concat = np.concatenate([data1, data2], axis = 1)

    return col_concat, row_concat
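
The function above is not called in the notebook, so here is a small usage sketch. The inputs `data1` and `data2` are made-up 2x2 arrays chosen for illustration.

```python
import numpy as np

def concat_arrays(data1, data2):
    # Same logic as the exercise solution above
    col_concat = np.concatenate([data1, data2])
    row_concat = np.concatenate([data1, data2], axis=1)
    return col_concat, row_concat

# Hypothetical inputs, chosen only for illustration
data1 = np.array([[1, 2],
                  [3, 4]])
data2 = np.array([[5, 6],
                  [7, 8]])

col_concat, row_concat = concat_arrays(data1, data2)
print(repr(col_concat))   # data2 placed below data1
print(repr(row_concat))   # data2 placed to the right of data1
```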

# 9. Saving Data
## Learn how to save and load NumPy data.

## We'll cover the following

### Chapter Goals:
* A. Saving
* B. Loading
* Time to Code!

## A. Saving
After performing data manipulation with NumPy, it's a good idea to save the data in a file for future use. To do this, we use the np.save function.

The first argument for the function is the name/path of the file we want to save our data to. The file name/path should have a ".npy" extension. If it does not, then np.save will append the ".npy" extension to it.

The second argument for np.save is the NumPy data we want to save. The function has no return value. Also, ".npy" files are stored in a binary format, so their contents look largely like gibberish when viewed with a text editor.

If np.save is called with the name of a file that already exists, it will overwrite the previous file.

The code below shows examples of saving NumPy data.

In [98]:
arr = np.array([1, 2, 3])
# Saves to 'arr.npy'
np.save('arr.npy', arr)
# Also saves to 'arr.npy'
np.save('arr', arr)
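
The sketch below illustrates the extension and overwrite behavior described above. It writes into a temporary directory (an assumption made here so the example cleans up after itself) and reads the file back with np.load to confirm that the second save replaced the first.

```python
import os
import tempfile

import numpy as np

with tempfile.TemporaryDirectory() as tmp_dir:
    base_path = os.path.join(tmp_dir, 'arr')   # no extension given

    # np.save appends ".npy" to the file name
    np.save(base_path, np.array([1, 2, 3]))
    saved_files = os.listdir(tmp_dir)

    # Saving to the same name again overwrites the earlier contents
    np.save(base_path, np.array([4, 5, 6]))
    files_after_overwrite = os.listdir(tmp_dir)
    reloaded = np.load(base_path + '.npy')

print(saved_files)
print(files_after_overwrite)
print(repr(reloaded))
```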

## B. Loading
After saving our data, we can load it again using np.load. The function's required argument is the file name/path that contains the saved data. It returns the NumPy data exactly as it was saved.

Note that np.load will not append the ".npy" extension to the file name/path if it is not there.

The code below shows how to use np.load to load NumPy data.

In [99]:
arr = np.array([1, 2, 3])
np.save('arr.npy', arr)
load_arr = np.load('arr.npy')
print(repr(load_arr))

# Will result in a FileNotFoundError: If we uncomment line 7 and run again.
#load_arr = np.load('arr')

array([1, 2, 3])
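
To check that the data comes back as it was saved, and to see the missing-extension error without editing the cell above, here is a minimal round-trip sketch. The file name `points.npy`, the temporary directory, and the array `original` are assumptions made for this illustration.

```python
import os
import tempfile

import numpy as np

# Hypothetical float32 data, used to show that dtype and shape survive
original = np.array([[1.5, 2.5],
                     [3.5, 4.5]], dtype=np.float32)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'points.npy')
    np.save(path, original)
    restored = np.load(path)

    # np.load does not append ".npy", so this path does not exist
    try:
        np.load(os.path.join(tmp_dir, 'points'))
        missing_ext_failed = False
    except FileNotFoundError:
        missing_ext_failed = True

print(restored.dtype, restored.shape)
print(np.array_equal(original, restored))
print(missing_ext_failed)
```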


## Time to Code!
The coding exercise in this chapter will require you to complete the save_points function, which will save some randomly generated 2-D points in a file.

You'll generate 100 (x, y) points from a uniform distribution in the range [-2.5, 2.5), then save the points to save_file.

Set points equal to np.random.uniform, with the low and high keyword arguments representing the lower and upper ends of the range. The size keyword argument should be set to (100, 2).

Call np.save with save_file as the first argument and points as the second argument.

In [100]:
def save_points(save_file):
    points = np.random.uniform(low=-2.5, high=2.5, size=(100, 2))
    np.save(save_file, points)
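
The function above is not called in the notebook. A possible usage sketch follows, assuming a temporary directory and the file name `points.npy` (both chosen here for illustration). It loads the saved points back and checks their shape and range.

```python
import os
import tempfile

import numpy as np

def save_points(save_file):
    # Same logic as the exercise solution above
    points = np.random.uniform(low=-2.5, high=2.5, size=(100, 2))
    np.save(save_file, points)

with tempfile.TemporaryDirectory() as tmp_dir:
    save_file = os.path.join(tmp_dir, 'points.npy')
    save_points(save_file)
    loaded_points = np.load(save_file)

print(loaded_points.shape)
# All coordinates should fall inside the requested range
print(loaded_points.min() >= -2.5, loaded_points.max() <= 2.5)
```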

# Quiz

# 1- What does the arange function do?

### A) Returns a 1-D array of numbers based on a given range and interval

B) Returns a random array

C) Squares each value of an array

D) Returns the largest element of an array

# 2- What happens if we try to use np.save on a file without a .npy extension?

A) The function raises an exception

B) The data is saved to the file without the .npy extension

### C) The function automatically adds the .npy extension

D) The function does nothing

# 3- What is the difference between np.sum and np.cumsum?

A) The former is computationally more expensive than the latter

B) Both perform the same thing

C) The former is significantly slower than the latter

### D) The former produces the overall sum while the latter calculates cumulative sums

# 4- Consider the 2-D array, arr. What is the output of np.sum(arr, axis=1)?

A) A 1-D array containing the column sums of arr

### B) A 1-D array containing the row sums of arr

C) A 2-D array containing the cumulative column sums of arr

D) A 1-D array containing the cumulative row sums of arr