# `NumPy`: In-Class Exercises

Author: Amit Deokar, Ph.D.

Created to supplement Chapter 4 (NumPy Basics) of the book "Python for Data Analysis" by Wes McKinney.

## What is NumPy?
- One of the key data packages for scientific computing in Python.
- Contains functionality for multidimensional arrays.
- Contains high-level math functions such as linear algebra operations, Fourier transforms, and pseudorandom number generators.

In [2]:
import numpy as np
np.random.seed(12345)
np.set_printoptions(precision=4, suppress=True)

In [3]:
import matplotlib.pyplot as plt
plt.rc('figure', figsize=(10, 6))
%matplotlib inline

## What is a NumPy array, a.k.a. `ndarray`?
- The fundamental data structure in NumPy, and the basis of the `scikit-learn` Python library used for machine learning.
- `scikit-learn` takes in data in the form of NumPy arrays.
- An `ndarray` has n dimensions, and all elements of the array **must** be of the same type.
- Extremely efficient data structure: complex computations on entire arrays can be done with NumPy without writing Python `for` loops.
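The two properties above (a single element type and loop-free computation) can be seen in a minimal sketch. The variable names `mixed`, `prices`, and `with_tax` are made up for illustration and are not part of the exercises.

```python
import numpy as np

# Mixed int/float input is upcast so that every element shares one dtype.
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)  # float64

# Vectorized arithmetic: the multiplication applies to every element
# without an explicit Python for loop.
prices = np.array([10.0, 20.0, 30.0])
with_tax = prices * 1.1
print(with_tax)
```

The `dtype` attribute is a quick way to check which single type NumPy chose for an array.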

## Creating NumPy arrays

In [17]:
arr2d = np.array([[1, 2, 3], [4, 5, 6]])
arr2d

array([[1, 2, 3],
       [4, 5, 6]])

In [18]:
arr2d.shape

(2, 3)

In [19]:
arr2d.size

6

In [8]:
# Compare an ndarray and a Python list, each holding 10 million numbers
my_arr = np.arange(10000000)
my_list = list(range(10000000))

In [11]:
%time for _ in range(10): my_arr2 = my_arr * 2

CPU times: user 167 ms, sys: 258 ms, total: 425 ms
Wall time: 425 ms


In [12]:
%time for _ in range(10): my_list2 = [x * 2 for x in my_list]

CPU times: user 6.17 s, sys: 2.14 s, total: 8.31 s
Wall time: 8.32 s


## 1. Exercise

(a) Create a 1-dimensional NumPy array with the following list of numbers:
`[11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22]`

(b) Convert it into a 3-dimensional NumPy array structured as follows:
```
array([[[11, 12, 13],
        [14, 15, 16]],

       [[17, 18, 19],
        [20, 21, 22]]])
```

(c) Create the above 3-dimensional NumPy array from scratch without converting.

**Hint**: Refer to [The N-dimensional array](https://docs.scipy.org/doc/numpy-1.14.0/reference/arrays.ndarray.html) documentation.

## 2. Exercise

(a) Change the first element of the 3-dimensional array created earlier to 1.

(b) Create a new 3-dimensional array of shape (2, 3, 2) consisting of all ones.

(c) Create a new 3-dimensional array of shape (3, 3, 2) consisting of all zeros.

(d) Create a new 2-dimensional 3x3 array consisting of all 2's.

(e) Create a new 2-dimensional 3x3 array filled with random values.

(f) Create a new 2-dimensional 3x3 identity matrix.

## NumPy Data Types Examples

## 3. Exercise

(a) Create an array of numbers from 0 to 9 using the `arange` function.

(b) Change the data type of the array to a float data type.

(c) Create an array consisting of the following 10 numbers and convert it to an integer data type.
`[10.1, 9.2, 8.3, 7.4, 6.2, 5.2, 4.7, 3.2, 2.1, 1.0]`

(d) Create an array consisting of the following strings. Then convert them into a numeric array.
`['1.75', '1.25', '3.45', '4.50', '5.25']`

**Hint**: Refer to [`astype`](https://docs.scipy.org/doc/numpy-1.14.0/reference/generated/numpy.ndarray.astype.html), which copies the array by casting it to a specified type.
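As a minimal sketch of the copying behaviour mentioned in the hint, the example below uses small made-up arrays (`ints`, `floats`, `truncated` are illustrative names, not the exercise data). It also shows that casting floats to integers drops the fractional part rather than rounding.

```python
import numpy as np

ints = np.array([1, 2, 3])

# astype returns a new array; the original keeps its dtype and values.
floats = ints.astype(np.float64)
floats[0] = 99.0
print(ints)    # the original array is unchanged
print(floats)

# Float-to-integer casts truncate toward zero instead of rounding.
truncated = np.array([2.9, -2.9]).astype(np.int64)
print(truncated)
```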

## NumPy Arithmetic

## 4. Exercise

(a) Create each of the following arrays with a one-line statement that combines `arange` and `reshape`:

```
array([[ 1,  3,  5],
       [ 7,  9, 11],
       [13, 15, 17]])
```

```
array([[ 2,  4,  6],
       [ 8, 10, 12],
       [14, 16, 18]])
```

(b) Perform element-wise multiplication of the two arrays.

(c) Compute the dot product of the two arrays.

(d) Compute the cross product of the two arrays.

**Hint**: Refer to [Mathematical functions](https://docs.scipy.org/doc/numpy-1.14.0/reference/routines.math.html)
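The three operations are easy to confuse, so here is a minimal sketch on small made-up inputs (`a`, `b`, `u`, `v` are illustrative and differ from the exercise arrays). Note that `np.cross` by default treats the last axis of its inputs as the vectors, so for 2-dimensional inputs it works row by row.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

# Element-wise multiplication: each entry is multiplied by its counterpart.
elementwise = a * b

# Dot product of two 2-d arrays: ordinary matrix multiplication.
matrix_product = np.dot(a, b)

# Cross product of two 3-component vectors.
u = np.array([1, 0, 0])
v = np.array([0, 1, 0])
cross = np.cross(u, v)

print(elementwise)
print(matrix_product)
print(cross)
```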

## Basic Indexing and Slicing - Indexing 1-d arrays

## 5. Exercise

(a) Create the following array using a one-line statement by using `arange`: 

`array([ 1,  3,  5,  7,  9, 11, 13, 15, 17])`
       
(b) Slice the 3rd through 6th elements (inclusive) of the array and assign the slice to a new variable. (**Hint**: slicing 1-d arrays is similar to slicing Python lists.)

(c) Slice the 3rd through 6th elements (inclusive) of the array using negative indices and assign the slice to a new variable. (**Hint**: slicing 1-d arrays is similar to slicing Python lists.)

(d) Assign the value 1 to the last two elements in the array slice created in part (b) using **broadcasting**.

(e) Check the value of the original array created in part (a). Is it the same as before? Why or why not?

(f) Create a new copy (explicitly) of the slice created in part (b). Modify the first value of the new copy to 2. Does the original slice change?

## Basic Indexing and Slicing - Indexing higher dimensional arrays

## 6. Exercise

(a) Create the following 2-dimensional array using a one-line statement using `arange` and `reshape`: 

```
array([[ 1,  3,  5,  7],
       [ 9, 11, 13, 15]])
```

(b) Access the 3rd element from the 1st row of the array using integer indexes.

(c) Change all the elements in the first row to 1's using broadcasting assignment.

(d) Change the last two elements of the first and second rows to 0's by indexing with slices.

## Boolean Indexing

## 7. Exercise

(a) Create the following 2-dimensional array using a one-line statement using `arange` and `reshape`: 

```
array([[ 1,  3,  5,  7],
       [ 9, 11, 13, 15],
       [17, 19, 21, 23],
       [25, 27, 29, 31],
       [33, 35, 37, 39]])
```

(b) Create an array called `cities` of strings as follows:

`array(['Mumbai', 'Shanghai', 'New York', 'London', 'Paris'])`

(c) Create a boolean array from the above array where the city value is either `London` or `Shanghai`, and assign it to a variable called `mask`.

(d) Use the boolean array created in part (c) to subselect rows of the array in part (a).

(e) Set the last column of the original array created in part (a) to 0, but only for the rows selected by the boolean array.

## Fancy Indexing - Indexing with integer arrays

## 8. Exercise

(a) Create the following 2-dimensional array using a one-line statement using `arange` and `reshape`: 

```
array([[ 1,  3,  5,  7],
       [ 9, 11, 13, 15],
       [17, 19, 21, 23],
       [25, 27, 29, 31],
       [33, 35, 37, 39]])
```

(b) Retrieve the elements from the above array at the index locations (4, 1), (2, 3) and (0, 0) using fancy indexing.

## Unary `ufuncs`, Binary Universal Functions, Mathematical Functions, and Linear Algebra

> A **ufunc** or **universal function** is a function that performs element-wise operations on data in ndarrays. 

> Think of them as fast vectorized wrappers for simple functions that take one or more scalar values and produce one or more scalar results.
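To make the definition concrete, the sketch below shows one unary ufunc, one binary ufunc, and one ufunc that returns two arrays. The inputs `x` and `y` are made up for illustration; the functions themselves (`np.abs`, `np.maximum`, `np.modf`) are standard NumPy ufuncs.

```python
import numpy as np

x = np.array([-1.5, 2.25, -3.0])
y = np.array([0.0, 5.0, -4.0])

# Unary ufunc: one input array, applied element by element.
magnitudes = np.abs(x)

# Binary ufunc: two input arrays, combined element by element.
larger = np.maximum(x, y)

# Some ufuncs return more than one array: modf splits each value
# into its fractional and integral parts.
fractional, integral = np.modf(x)

print(magnitudes)
print(larger)
print(fractional, integral)
```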

## 9. Exercise

(a) Create the following 2-dimensional array using a one-line statement using `arange` and `reshape`: 

```
array([[ 1,  3,  5,  7],
       [ 9, 11, 13, 15],
       [17, 19, 21, 23],
       [25, 27, 29, 31],
       [33, 35, 37, 39]])
```

(b) Compute the element-wise square root for the above array.

**Hint**: Refer to [Universal Functions](https://docs.scipy.org/doc/numpy/reference/ufuncs.html)

(c) Create the following 2-dimensional array using a one-line statement using `arange` and `reshape`: 

```
array([[ 2,  4,  6,  8],
       [10, 12, 14, 16],
       [18, 20, 22, 24],
       [26, 28, 30, 32],
       [34, 36, 38, 40]])
```
       
Then compute the element-wise multiplication of this array and the array created in part (a).

**Hint**: Refer to [Universal Functions](https://docs.scipy.org/doc/numpy/reference/ufuncs.html)

(d) Compute the mean of the elements of the array created in part (c).

(e) Compute the mean of each row of the array created in part (c).

**Hint**: Refer to [Mathematical Functions](https://docs.scipy.org/doc/numpy/reference/routines.math.html)

## Pseudo-Random Number Generation - Very fast

### Comparison of NumPy with Python's built-in random module

In [182]:
# Python's built-in random module
from random import normalvariate
N = 1000000
%timeit samples = [normalvariate(0, 1) for _ in range(N)]

1.52 s ± 225 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [183]:
N = 1000000
%timeit np.random.normal(size=N)

60.4 ms ± 6.19 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)


## 10. Exercise

(a) Create a 3x3 array by drawing a random sample from a normal distribution with mean 25 and standard deviation 5. 

(b) Check the mean and standard deviation of the random sample created.

**Hint**: Refer to [Random Sampling](https://docs.scipy.org/doc/numpy/reference/routines.random.html)