_______________
# 01. Introduction to NumPy & Vectorization
_______________
Now that we are familiar with **Python**, it is time to put it to proper use **for scientific computing**. In this lab we will learn how to speed up computations using [`NumPy`](http://www.numpy.org/), a Python package for numeric computation that offers optimized, vectorized data routines.

* __Read the text__ and have a go with the cells and the objects created by them
* __Attempt Exercises at the bottom of the notebook__
_______________
**`Note`** Basic knowledge of `python` is assumed for this part of the course, so make sure you have gone through and understood the material presented earlier. Here are some additional links in case you want to hear it from someone else once again, or in case you need a follow-up after going through this notebook:

- [`Introduction to Jupyter notebooks`](http://bebi103.caltech.edu/2015/tutorials/t0b_intro_to_jupyter_notebooks.html)

- [`Introduction to Python for scientific computing`](http://bebi103.caltech.edu/2015/tutorials/t1a_intro_to_python.html)

- [`Python/Numpy tutorial`](http://cs231n.github.io/python-numpy-tutorial/#python)

_______________
## `Data Science Stack`
Apart from `NumPy`, we will also cover several additional libraries used for data science in Python, so you may as well start getting to know them better:

* [`numpy`](http://www.numpy.org/): scientific computing by using array objects


* [`pandas`](http://pandas.pydata.org/): data structures and data analysis tools


* [`matplotlib`](http://matplotlib.org/): plotting library (similar to MATLAB's plot interface)


* [`scikit-learn`](http://scikit-learn.org/stable/): machine learning library implementing many learning algorithms and useful tools


* [`seaborn`](https://seaborn.github.io/index.html): data visualisation library which works on top of matplotlib


* [`scipy`](https://www.scipy.org/): a library based on NumPy that extends its functionality, in case you need any functions related to linear algebra, differential calculus, and signal processing

______________
## `Morning/Evening Reads`
A list of links worth following and browsing when you are having a cup of coffee in the morning or are trying to make yourself fall asleep (just joking, there's quite a lot of interesting content about AI, ML, DS, etc., both *technical* and not).
- [`Towards Data Science`](https://towardsdatascience.com/) - articles & blog posts on data science (very broadly speaking). Often conceptual and/or ideological, but very often contains nice code starters. 
- [`KDNuggets`](https://www.kdnuggets.com/news/index.html) - an online platform on business analytics, big data, data mining, and data science. I find it more technical than TDS.
- [`Kaggle`](https://www.kaggle.com/) - a platform for machine learning competitions (you may win some prizes here, who knows). A good place if you're really into digging through code and solutions (though don't expect blog-post-quality documentation).

_______________
### `Imports`
As per usual, we start by importing the packages that we will be using later. It's generally a good practice to do so at the top of a file. 

    If you have trouble importing any of these packages, make sure they are properly installed (see the `README` in the root of this repository).

In [1]:
import os
import sys
import numpy as np
import autotime
%load_ext autotime

# ============ Numpy  ============
[Numpy](http://www.numpy.org) is a powerful scientific computing library that provides numerous methods for working with n-dimensional arrays, which you will find highly useful in data science & machine learning. The following `Numpy` introduction is largely based on this [tutorial](http://cs231n.github.io/python-numpy-tutorial/#numpy), though some of the explanations were also borrowed from the [INF1CG](https://www.learn.ed.ac.uk/webapps/blackboard/content/listContent.jsp?course_id=_72370_1&content_id=_4289282_1) course labs (UoE).

## Why Vectorize?
`Vectorization` refers to the process of rewriting an iterative program (a program that has loops) in such a way that no loops remain. Instead of sequentially performing computations, a vectorized program performs subsets of operations at once (for trivial tasks all operations might be applied at once). 

    Vectorization is a very important and useful concept in data science and machine learning, as formulating a problem in vectorized form can lead to extremely large speed improvements.

Before getting into the details of how to perform computations using `NumPy`, let's see *how much faster* a vectorized operation can be. Assume that you have a large amount of data stored in ```long_list``` and you want to calculate the sum of all elements in it:

In [2]:
long_list = list(range(5000000))

time: 148 ms


We have seen previously that it is possible to simply **loop through** a Python data structure in order to get the sum of the list elements. We only have to store the partial sum while we iterate through the list:

In [3]:
partial_sum = 0

for number in [1,2,3,4,5]:
    partial_sum = partial_sum + number 
    #this sort of operation is performed all the time so there exists a shorthand
    #if you want you can also write partial_sum += number

partial_sum

15

time: 7.72 ms


And since this sort of operation might be useful in the future we might want to make it a function:

In [4]:
def my_list_sum(list_of_numbers):
    partial_sum = 0
    for number in list_of_numbers:
        partial_sum += number
    return partial_sum

my_list_sum([1,2,3,4,5])

15

time: 3.15 ms


In [5]:
my_list_sum(long_list)

12499997500000

time: 230 ms


This approach is very simple and easy to understand, and computing it didn't really take *that long*. Still, if we want to compare implementations we need to measure how long each one takes. For that, we can use an IPython [magic command](http://ipython.readthedocs.io/en/stable/interactive/magics.html)

```
%%timeit
```

``timeit`` calculates the average time it takes to execute a Python expression for a number of runs (and takes care of a number of issues that make estimating processing time tedious). We can use the magic command in a cell like so:

In [6]:
%%timeit
my_list_sum(long_list)

307 ms ± 18.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
time: 2.5 s
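
Outside a notebook, a similar measurement can be sketched with the standard-library `timeit` module. The snippet below is an illustration only: `small_list` and the repeat counts are arbitrary choices made here, not values used elsewhere in this lab.

```python
import timeit

def my_list_sum(list_of_numbers):
    partial_sum = 0
    for number in list_of_numbers:
        partial_sum += number
    return partial_sum

small_list = list(range(100_000))  # smaller than long_list so the sketch runs quickly

# repeat() returns one total time (in seconds) per run;
# each run calls the function `number` times
loop_times = timeit.repeat(lambda: my_list_sum(small_list), number=5, repeat=3)
builtin_times = timeit.repeat(lambda: sum(small_list), number=5, repeat=3)

print("loop    :", min(loop_times) / 5, "s per call")
print("built-in:", min(builtin_times) / 5, "s per call")
```

Taking the minimum over the runs is one common convention, since slower runs are usually caused by other activity on the machine.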


So as we've said before, it didn't really take long for this list - on a (rather slow) laptop around 300 [milliseconds](https://en.wikipedia.org/wiki/Millisecond).

Summing is a very common operation and of course there exists a built-in function for summing in Python - we can sum the elements of a list using Python's own ``sum()`` implementation.
Let's check how long it takes us to sum the elements using Python's own implementation:

In [7]:
%%timeit
sum(long_list)

52 ms ± 2.93 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
time: 4.27 s


On the same laptop our self-made loop takes roughly 6 times longer (your results might vary, but the built-in version should be considerably faster).

#### Why are built-ins faster?

When we write a loop ourselves, every iteration is executed by the Python interpreter, which has to step through the corresponding bytecode instructions one at a time. The built-in Python operations (like ```sum, len```) are implemented as optimized, compiled code (C code in the standard CPython interpreter) and thus avoid most of that per-element interpretation overhead.

If there exists a built-in function for what you want to do (and you do not have very good reasons to do otherwise), use the built-in function, as:

- Less typing and thinking means fewer errors in your code.
- Native implementations should give you better performance.
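
To get a feeling for the work the interpreter does for a hand-written loop, the standard-library `dis` module can list the bytecode of a function. This is only an illustrative sketch; the exact instruction names differ between Python versions.

```python
import dis

def my_list_sum(list_of_numbers):
    partial_sum = 0
    for number in list_of_numbers:
        partial_sum += number
    return partial_sum

# Collect the names of the bytecode instructions that make up the function
opnames = [instruction.opname for instruction in dis.get_instructions(my_list_sum)]
print(opnames)
```

The instructions that sit inside the loop are executed once per element, whereas the built-in `sum()` runs its loop in compiled code.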

### `Arrays` in Python
Python lists can contain different *types* of items, but if we know what type all objects in our collection are, it makes sense to explicitly state the type. In such case, the Python interpreter can take advantage of the type declaration, which results in faster computations.

**`Arrays`** are part of the Python Standard Library and provide a collection that is very similar to lists, but specifies the type of contained objects (and thus restricts all contents to be of that type). We can import the array data structure from the array library with:

In [8]:
from array import array

time: 857 µs


To create an array data structure we need to specify the type that *all* items in the array will have (in this case I pick the type double, *'d'*), and, as the second argument, the collection of elements that the array will contain:

In [9]:
pythonArray = array('d', long_list)

time: 268 ms


Notice that we cannot declare arrays with items of different types!

In [10]:
array('d', [1,"this does not work"])

TypeError: must be real number, not str

time: 77.1 ms


Let's see if adding an array instead of a list improves performance:

In [11]:
%%timeit
sum(pythonArray)

49.9 ms ± 4.46 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
time: 4 s


On the same laptop we didn't get any speed-up compared to the native sum operation! Why is that?
The reason is that Python's native ``sum()`` is not specialised for typed arrays: it walks over the array one element at a time, and every element has to be converted back into a regular Python number object before it can be added. The compact, typed storage therefore brings no speed-up here.

As you can see at the [array documentation](https://docs.python.org/3/library/array.html), the number of operations defined on the Python array implementation is very restricted - and all operations that *are implemented* (like ```reverse``` or ```count```) are already heavily optimized for `Python` lists.
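
A small sketch of what ``sum()`` sees when it iterates over a typed array: the values are stored compactly, but each element comes out as an ordinary Python `float` object. The variable names are chosen for this illustration only.

```python
from array import array

typed_numbers = array('d', [1, 2, 3])

# The array stores raw 8-byte doubles ...
print(typed_numbers.typecode, typed_numbers.itemsize)

# ... but iterating over it hands out regular Python float objects
first_element = next(iter(typed_numbers))
print(type(first_element))
```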

This was a bit underwhelming, but let's give it one last chance - in the form of `NumPy`'s implementation of arrays and the ***additional support for vectorized functions***.

In [12]:
numpyArray = np.array(long_list) #we will discuss NumPy array creation in the next section
np.sum(numpyArray) #notice that this is NumPy's sum implementation

12499997500000

time: 476 ms


In [13]:
%%timeit
np.sum(numpyArray)

4.61 ms ± 111 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
time: 3.79 s


As you can see this is *way faster* than our fastest implementation so far, and this difference in speed will only increase with the size of our data (if you don't believe it, try a larger range for `long_list`).

***Why is NumPy so much faster than the Python array?*** As with the Python array, the *NumPy ndarray object is of fixed size* and *all elements have the same data type*. In addition, *NumPy operations are performed as optimized, compiled code* that works directly on the array data structure.

Changing your iterative (loopy) programs to operate on arrays and use vectorized functions in `NumPy` can drastically improve performance, while also requiring less code.
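
As a minimal sketch of such a rewrite, the snippet below computes `v * v + 1` for every element, first with a loop and then as a single array expression. The formula and variable names are arbitrary examples, not part of the lab material.

```python
import numpy as np

values = list(range(1000))

# Loop version: one Python-level operation per element
squares_loop = []
for v in values:
    squares_loop.append(v * v + 1)

# Vectorized version: the same arithmetic expressed on the whole array at once
arr = np.array(values)
squares_vec = arr * arr + 1

print(squares_vec[:5])  # first few results
```

Both versions produce the same numbers; the vectorized one is shorter and leaves the looping to NumPy's compiled routines.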

##  `Arrays` in Numpy
The main NumPy object is the ***N-dimensional array*** [`ndarray`](https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.html), which serves as a container (grid of values) for large arrays of data of the same type. The objects can be used along with a set of provided universal functions [`ufuncs`](https://docs.scipy.org/doc/numpy/reference/ufuncs.html) to perform a variety of scientific computations efficiently.

- The ***number of dimensions*** is the ***rank*** of the array; 
- The ***shape*** of an array is a tuple of integers giving the ***size of the array along each dimension***. 

**`N.B.`** This use of the word `rank` is not the same as the meaning in linear algebra.

Let's start by having a look at how we can create different forms of arrays in NumPy. To start with, we can initialize numpy arrays from nested Python [lists](http://www.tutorialspoint.com/python/python_lists.htm), and access elements using square brackets:

In [14]:
a = np.array([1, 2, 3])  # Creates a rank 1 array (i.e. vector)
a

array([1, 2, 3])

time: 2.04 ms


In [15]:
charArray = np.array(['a', 'b', 'c'])
charArray

array(['a', 'b', 'c'], dtype='<U1')

time: 5.2 ms


In [16]:
floatArray = np.array([1, 2, 3.0])
floatArray

array([1., 2., 3.])

time: 2.98 ms


In [17]:
boolArray = np.array([True, False])
boolArray

array([ True, False])

time: 2.64 ms


For Python arrays we had to declare the type of the contained elements - here we didn't do any of this, but NumPy has set the type by **inferring the optimal data type when you create an array**.

We can access the data type of a NumPy array with:

In [18]:
a.dtype  # Data type of the elements stored in a

dtype('int64')

time: 3.7 ms


In [19]:
charArray.dtype

dtype('<U1')

time: 5.19 ms


In [20]:
floatArray.dtype

dtype('float64')

time: 2.01 ms


In [21]:
boolArray.dtype

dtype('bool')

time: 2.81 ms


Notice that this tells you the **type of the contained data**!  If you ask for the type of object ```boolArray``` we instead get ```numpy.ndarray```.

In [22]:
type(boolArray)  # Type of the object boolArray itself

numpy.ndarray

time: 3.25 ms


We can also include an optional argument to explicitly specify the data type, like so:

In [23]:
a = np.array([1,2,3], dtype='int8')

time: 1.16 ms
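
The choice of data type directly determines how much memory an array needs. A short sketch using the `itemsize` and `nbytes` attributes (the array names are made up for this example):

```python
import numpy as np

small = np.array([1, 2, 3], dtype='int8')
large = np.array([1, 2, 3], dtype='int64')

# itemsize is the number of bytes per element, nbytes the total for the data buffer
print(small.itemsize, small.nbytes)  # 1 byte per element, 3 bytes in total
print(large.itemsize, large.nbytes)  # 8 bytes per element, 24 bytes in total
```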


One of the most basic properties of an array we might be interested in is its *shape*. The ``shape`` attribute (also available as the function ``numpy.shape``) returns the shape of an array as a tuple of integers. Each number in the tuple denotes the length of the corresponding array dimension. Let's see an example:

In [24]:
a.shape  # Prints the number of elements for each dimension

(3,)

time: 2.61 ms


In [25]:
a.ndim   # Prints the number of dimensions of object a (array)

1

time: 2.43 ms


In [26]:
A = np.array([1,2,3,4,5,6,7,8,9,10])
A.shape

(10,)

time: 4.37 ms


In [27]:
A.ndim

1

time: 2.41 ms


The array A has length 10 in the first dimension (rows) and no other dimensions.

In [28]:
B = np.array([[1,2,3],[4,5,6]])
B.shape

(2, 3)

time: 3.99 ms


In [29]:
B.ndim

2

time: 3.57 ms


The array B has length 2 in the first dimension (rows) and length 3 in the second dimension (columns).

If instead, we want to know the total number of elements in the array we use ```size```:

In [30]:
A.size

10

time: 2.2 ms


In [31]:
B.size

6

time: 2.16 ms


<div class="alert alert-info" role="alert">
<h1>Exercises</h1>

<ol>

<li>
Assume you have a large database of user behavior on a video-streaming platform. You want to store different information about the movies in your database in an array. Since your database is very big, the data type in which you store this data will make a big difference in how much space you have to allocate for it on your server. Have a look at the table of different [NumPy types](https://docs.scipy.org/doc/numpy/user/basics.types.html) and think about which data type you would pick for the following data:

<ul style="list-style-type: none;">
   <li><input type="checkbox"> The number of videos watched by each user.</li>
    <li><input type="checkbox"> The average rating of each movie (ratings from 0 to 10).</li>
    <li><input type="checkbox"> The size of each movie (in MB, in GB?).</li>
    <li><input type="checkbox"> If a user has watched a specific movie.</li>
    <li><input type="checkbox"> The title of the movie that the user has watched.</li>
    <li><input type="checkbox"> The first letter of the title.</li>
</ul>

</li>
<li>
Check what happens if you declare the following NumPy arrays and explain how NumPy handles these arrays (inspecting the resulting array and using the table on [NumPy types](https://docs.scipy.org/doc/numpy/user/basics.types.html)).

<ul style="list-style-type: none;">
   <li><input type="checkbox"> numpy.array([-1,2,3],dtype='uint8')</li>
   <li><input type="checkbox"> numpy.array([50,80,250,256])</li>
    <li><input type="checkbox"> numpy.array([50,80,250,256],dtype='float64')</li>
    <li><input type="checkbox"> numpy.array([50,80,250,256],dtype='uint8')</li>
</ul>
</li>
</ol>

</div>

In [32]:
2**8

256

time: 2.75 ms


In [33]:
np.array([-1,2,3],dtype='uint8')

array([255,   2,   3], dtype=uint8)

time: 7.67 ms


In [34]:
np.array([50,80,250,256]).dtype

dtype('int64')

time: 2.48 ms


In [35]:
np.array([50,80,250,256],dtype='float64')

array([ 50.,  80., 250., 256.])

time: 3.34 ms


In [36]:
np.array([50,80,250,256],dtype='uint8')

array([ 50,  80, 250,   0], dtype=uint8)

time: 2.55 ms
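
The wrap-around seen above is arithmetic modulo 2**8, since `uint8` can only hold the values 0 to 255. The sketch below reproduces it with an explicit `astype` cast, which is used here on the assumption that newer NumPy releases may reject out-of-range Python integers when they are passed directly to `np.array` with a small `dtype`.

```python
import numpy as np

int_values = np.array([-1, 50, 250, 256, 257])

# astype performs a C-style cast, so out-of-range values wrap around modulo 2**8
wrapped = int_values.astype(np.uint8)
print(wrapped)

# The same result computed explicitly with the modulo operator
print(int_values % 256)
```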


In [37]:
a

array([1, 2, 3], dtype=int8)

time: 1.9 ms


In [38]:
print(a[0], a[1], a[2], a[-1], a[-2], a[-3])  # Select array elements by index (starts at 0)

1 2 3 3 2 1
time: 1.48 ms


In [39]:
try:
    a[3]  # Will error
except IndexError as e:
    print('{}'.format(e))
except:
    print("Unexpected error:", sys.exc_info()[0])
    raise

index 3 is out of bounds for axis 0 with size 3
time: 1.53 ms


In [40]:
a[0] = 5  # Change an element of the array

time: 454 µs


In [41]:
a

array([5, 2, 3], dtype=int8)

time: 1.95 ms


We can create a multi-dimensional array by passing each row as a sequence-like object (for example a list). For example, to create a 3x3 array:

In [42]:
b = np.array([[1,2,3],[4,5,6],[7,8,9]])
b

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

time: 2.58 ms


In [43]:
b = np.array([[1,2,3],[4,5,6]]) # Create a rank 2 array
b

array([[1, 2, 3],
       [4, 5, 6]])

time: 2.39 ms


### Array `Creation`
In addition to `numpy.array`, there are several other ways for creating these objects:

1. **Using some pre-set matrix types** (generally, the first argument refers to the shape of the resulting array).

In [44]:
# Filled with zeros
np.zeros((2, 2))

array([[0., 0.],
       [0., 0.]])

time: 1.99 ms


In [45]:
# Filled with ones
np.ones((1, 2))

array([[1., 1.]])

time: 1.3 ms


In [46]:
# Filled with N
np.full((2, 2), 7)

array([[7, 7],
       [7, 7]])

time: 3.16 ms


In [47]:
# Identity matrix
np.eye(5)

array([[1., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0.],
       [0., 0., 1., 0., 0.],
       [0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 1.]])

time: 3.31 ms


In [48]:
# Filled with random floats between 0 and 1
np.random.random((2, 2))

array([[0.90604261, 0.17986387],
       [0.08006341, 0.39510405]])

time: 2.33 ms


In [49]:
# Filled with normally distributed random floats defined using mean and std
mu = 2     # mean
sigma = .2 # std
np.random.normal(mu, sigma, (4,1)), np.random.normal(mu, sigma, 10)

(array([[1.85539252],
        [1.96141145],
        [2.02943781],
        [1.64177565]]),
 array([2.02812491, 1.82529831, 1.80908858, 2.17305294, 1.89606748,
        1.94595732, 1.71282718, 2.1109176 , 2.15225037, 1.62470041]))

time: 7.45 ms


2. **From a list**

In [50]:
some_list = [1, 4, 6, 8]
e = np.array(some_list)
e

array([1, 4, 6, 8])

time: 3.38 ms


In [51]:
some_list = [[1, 4, 6, 8], [2, 2, 4, 4]]
f = np.array(some_list, dtype=float)
f

array([[1., 4., 6., 8.],
       [2., 2., 4., 4.]])

time: 3.18 ms


3. **Appending to an existing array**

In [52]:
g = np.array([])
for ii in range(10):
    g = np.append(g, ii)
g

array([0., 1., 2., 3., 4., 5., 6., 7., 8., 9.])

time: 2.39 ms
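
Note that `np.append` does not grow an array in place: it returns a new array every time it is called. A hedged sketch of two common alternatives for building an array inside a loop follows; the names `collected`, `preallocated` and `empty_arr` are chosen for this illustration only.

```python
import numpy as np

n = 10

# Option 1: collect values in a Python list, convert once at the end
collected = []
for ii in range(n):
    collected.append(ii)
from_list = np.array(collected, dtype=float)

# Option 2: allocate the full array up front and fill it in place
preallocated = np.empty(n)
for ii in range(n):
    preallocated[ii] = ii

# np.append never modifies its input: it returns a new array
empty_arr = np.array([])
one_element = np.append(empty_arr, 1.0)
print(empty_arr.size, one_element.size)  # the input is still empty
```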


Be careful with data types, as numpy will do some inference on your behalf...it may not be what you want/intended.

In [53]:
np.append(g, 'hello')

array(['0.0', '1.0', '2.0', '3.0', '4.0', '5.0', '6.0', '7.0', '8.0',
       '9.0', 'hello'], dtype='<U32')

time: 2.48 ms


In [54]:
e

array([1, 4, 6, 8])

time: 2.07 ms


In [55]:
e.dtype

dtype('int64')

time: 1.53 ms


In [56]:
np.append(e, 2.0)

array([1., 4., 6., 8., 2.])

time: 1.83 ms


In [57]:
np.append(e, 2.0).dtype

dtype('float64')

time: 2.63 ms


<div class="alert alert-info" role="alert">
<h1>Exercises</h1>
<ol>
<li>
<ul style="list-style-type: none;">
   <li><input type="checkbox"> Create a (3,3) array of ones.</li>
    <li><input type="checkbox"> Create a (7,2) array of zeros.</li>
    <li><input type="checkbox"> Why can't you create a (3,2) identity matrix?</li>
    <li><input type="checkbox"> Create a 1D-array of the 26 integers [0..25] (check numpy.arange).</li>
    <li><input type="checkbox"> Create a 1D-array of 25 evenly spaced numbers between 0,100 (check numpy.linspace).</li>
    <li><input type="checkbox"> What is the difference between arange and linspace?</li>
    <li><input type="checkbox"> What does numpy.logspace do?</li>
    <li><input type="checkbox"> Create a (0:5,0:5) meshgrid (numpy.mgrid).</li>
    <li><input type="checkbox"> Create a (6,6) array of all zeros apart from the diagonal, which is [0,1,2,3,4,5].</li>
    <li><input type="checkbox"> Create a (3,3) array of random values, uniformly distributed between [0,1] (numpy.random).</li>
    
</ul>
</li>
</ol>
</div>

In [58]:
np.ones((3,3))

array([[1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.]])

time: 4.69 ms


In [59]:
np.zeros((7,2))

array([[0., 0.],
       [0., 0.],
       [0., 0.],
       [0., 0.],
       [0., 0.],
       [0., 0.],
       [0., 0.]])

time: 2.3 ms


In [60]:
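# An identity matrix is square by definition, so a (3,2) one cannot exist;
# slicing a 3x3 identity only yields a rectangular array (here of shape (2,3))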
np.eye((3))[:2]

array([[1., 0., 0.],
       [0., 1., 0.]])

time: 2.74 ms


In [61]:
np.arange(26)

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20, 21, 22, 23, 24, 25])

time: 2.9 ms


In [62]:
# Note: this call spans 1..25; the exercise as stated would be np.linspace(0, 100, num=25)
np.linspace(1,25, num = 25)

array([ 1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10., 11., 12., 13.,
       14., 15., 16., 17., 18., 19., 20., 21., 22., 23., 24., 25.])

time: 2.53 ms


In [63]:
# numpy.logspace(start, stop, num=50, endpoint=True, base=10.0, dtype=None, axis=0)
# Returns numbers spaced evenly on a log scale.

np.logspace(1, 25)

array([1.00000000e+01, 3.08884360e+01, 9.54095476e+01, 2.94705170e+02,
       9.10298178e+02, 2.81176870e+03, 8.68511374e+03, 2.68269580e+04,
       8.28642773e+04, 2.55954792e+05, 7.90604321e+05, 2.44205309e+06,
       7.54312006e+06, 2.32995181e+07, 7.19685673e+07, 2.22299648e+08,
       6.86648845e+08, 2.12095089e+09, 6.55128557e+09, 2.02358965e+10,
       6.25055193e+10, 1.93069773e+11, 5.96362332e+11, 1.84206997e+12,
       5.68986603e+12, 1.75751062e+13, 5.42867544e+13, 1.67683294e+14,
       5.17947468e+14, 1.59985872e+15, 4.94171336e+15, 1.52641797e+16,
       4.71486636e+16, 1.45634848e+17, 4.49843267e+17, 1.38949549e+18,
       4.29193426e+18, 1.32571137e+19, 4.09491506e+19, 1.26485522e+20,
       3.90693994e+20, 1.20679264e+21, 3.72759372e+21, 1.15139540e+22,
       3.55648031e+22, 1.09854114e+23, 3.39322177e+23, 1.04811313e+24,
       3.23745754e+24, 1.00000000e+25])

time: 2.57 ms
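
The relationship between the three range helpers can be summarised in a short sketch (the concrete start, stop and step values are arbitrary examples):

```python
import numpy as np

# arange: fixed step, the stop value is excluded
print(np.arange(0, 10, 2))

# linspace: fixed number of points, the stop value is included by default
print(np.linspace(0, 10, num=5))

# logspace(start, stop) equals base**linspace(start, stop), with base 10 by default
exponents = np.linspace(1, 25, num=50)
print(np.allclose(np.logspace(1, 25), 10.0 ** exponents))
```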


In [68]:
#Create a (0:5,0:5) meshgrid (np.mgrid).
np.mgrid[0:5,0:5]

array([[[0, 0, 0, 0, 0],
        [1, 1, 1, 1, 1],
        [2, 2, 2, 2, 2],
        [3, 3, 3, 3, 3],
        [4, 4, 4, 4, 4]],

       [[0, 1, 2, 3, 4],
        [0, 1, 2, 3, 4],
        [0, 1, 2, 3, 4],
        [0, 1, 2, 3, 4],
        [0, 1, 2, 3, 4]]])

time: 2.29 ms


In [None]:
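def create(N=100):
    # Build an (N, N) array of zeros with 0, 1, ..., N-1 on the diagonal
    a = np.zeros((N, N))
    np.fill_diagonal(a, np.arange(N))  # one possible "range object" for the exercise
    return a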

In [67]:
#Create a (6,6) array of all zeros apart from the diagonal, which is [0,1,...,5].
a = np.zeros((6, 6))
np.fill_diagonal(a, [0,1,2,3,4,5])
a

array([[0., 0., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0., 0.],
       [0., 0., 2., 0., 0., 0.],
       [0., 0., 0., 3., 0., 0.],
       [0., 0., 0., 0., 4., 0.],
       [0., 0., 0., 0., 0., 5.]])

time: 2.59 ms


### Array `Indexing & Slicing`

Slicing is the most common way to index arrays, and works similarly to Python list indexing, but there are also other options, such as integer and boolean array indexing. 

In [69]:
a = np.array([1,2,3,4,5,6,7,8,9,10])
a

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10])

time: 3.16 ms


In [70]:
a[1:3]

array([2, 3])

time: 2.68 ms


In [71]:
a[1:6:2]

array([2, 4, 6])

time: 3.29 ms


In [74]:
a[1::2]

array([ 2,  4,  6,  8, 10])

time: 2.71 ms
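
The slices above all follow the pattern `start:stop:step`, where any of the three parts can be left out and negative numbers count from the end. A few more variations, as a sketch on a separate array `seq` introduced only for this example:

```python
import numpy as np

seq = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

print(seq[:3])      # first three elements
print(seq[-3:])     # last three elements
print(seq[::-1])    # negative step: the whole array reversed
print(seq[8:2:-2])  # from index 8 down to (but excluding) index 2, every second element
```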


Higher-dimensional arrays can be thought of as arrays of lower-dimensional arrays, i.e. providing a single index returns the n-th element along the first dimension (which is itself an array for non-1D arrays).

In [75]:
a = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])
a

array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])

time: 3.33 ms


In [78]:
a[0]

array([1, 2, 3, 4])

time: 2.48 ms


In [81]:
a[1][1]

6

time: 7.12 ms


In [82]:
a[1,1]

6

time: 3.58 ms


<div class="alert alert-warning" role="alert">
<h1>Warning</h1>
Accessing the index directly with `Array[row, column]` is more efficient than the nested access, `Array[row][column]`. In the nested case the intermediate array `Array[row]` is created and only then accessed, whereas `Array[row, column]` does not create this intermediate result.
</div>

In [83]:
a[:,0]

array([1, 5, 9])

time: 1.81 ms


In [84]:
a[:][0]

array([1, 2, 3, 4])

time: 2.55 ms


In [89]:
a

array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])

time: 2.04 ms


In [91]:
b = a[:2, 1:3]
b

array([[2, 3],
       [6, 7]])

time: 2.71 ms


In [93]:
a

array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])

time: 3.46 ms


In [94]:
a.shape

(3, 4)

time: 5.01 ms


In [97]:
a[:,:,np.newaxis].shape

(3, 4, 1)

time: 2.16 ms


In [98]:
a[:,:,np.newaxis]

array([[[ 1],
        [ 2],
        [ 3],
        [ 4]],

       [[ 5],
        [ 6],
        [ 7],
        [ 8]],

       [[ 9],
        [10],
        [11],
        [12]]])

time: 2.76 ms


`Warning` **A slice of an array is a view into the same data, so modifying it will modify the original array**. E.g.:

`b[0, 0]` is the same piece of data as `a[0, 1]`, so modifying `b[0, 0]` will also modify `a[0, 1]`.

<div class="alert alert-warning" role="alert">
<h1>Warning</h1>

Slicing a list creates a new object, but **slicing an array creates a reference to the original (sub-) array** (in NumPy called a [view](https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.view.html)).


This might lead to some confusion, but we can use this to our advantage for modifying arrays efficiently. By selecting a view on our original data and passing it around we can modify the original data by modifying the view (this is beyond this introduction, but if you are curious have a look at the documentation for an [example](https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.view.html)).
</div>
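
Whether two arrays share data can be checked with `np.shares_memory`. The following sketch uses a separate array `grid` (a name made up for this example) so that it does not interfere with the variables used in the cells of this notebook:

```python
import numpy as np

grid = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

grid_view = grid[:2, 1:3]         # basic slicing: a view on grid's data
grid_copy = grid[:2, 1:3].copy()  # explicit copy: independent data

print(np.shares_memory(grid, grid_view))  # the view uses grid's memory
print(np.shares_memory(grid, grid_copy))  # the copy does not

grid_view[0, 0] = 77  # also changes grid[0, 1]
grid_copy[0, 1] = -1  # leaves grid untouched
print(grid[0, 1], grid[0, 2])
```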

In [101]:
a

array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])

time: 2.38 ms


In [102]:
b

array([[2, 3],
       [6, 7]])

time: 2.71 ms


In [99]:
b[0, 0]

2

time: 2.91 ms


In [100]:
a[0, 1]

2

time: 2.71 ms


In [106]:
b[0, 0] = 77
a[0, 1]

77

time: 2.4 ms


In [104]:
c = a.copy()
c

time: 450 µs


In [107]:
c[0, 0] = 100  # modifies only the copy c
a[0, 0] = 101  # modifies only the original a
a[0, 0]

101

time: 1.92 ms


In [109]:
a

array([[101,  77,   3,   4],
       [  5,   6,   7,   8],
       [  9,  10,  11,  12]])

time: 3.52 ms


In [110]:
a[:,[0,2]]

array([[101,   3],
       [  5,   7],
       [  9,  11]])

time: 4.08 ms


In [111]:
x = np.array(
    [[False, False, False,  True],
     [False,  True, False,  True],
     [False,  True, False,  True]])
x

array([[False, False, False,  True],
       [False,  True, False,  True],
       [False,  True, False,  True]])

time: 2.04 ms


In [113]:
a.shape, x.shape

((3, 4), (3, 4))

time: 2.78 ms


In [115]:
a

array([[101,  77,   3,   4],
       [  5,   6,   7,   8],
       [  9,  10,  11,  12]])

time: 2.62 ms


In [116]:
a[x] = 0
a

array([[101,  77,   3,   0],
       [  5,   0,   7,   0],
       [  9,   0,  11,   0]])

time: 2.41 ms


<div class="alert alert-info" role="alert">
<h1>Exercises</h1>

<ol>
<li>
Slicing is a very important concept for array access. Have a look at the documentation on [Basic Slicing and Indexing](https://docs.scipy.org/doc/numpy/reference/arrays.indexing.html) to find out what the following slices do on an array A:

<ul style="list-style-type: none;">
   <li><input type="checkbox">  A is 1D: A[-3:3:-1]</li>
   <li><input type="checkbox">  A is 1D: A[3:]</li>
   <li><input type="checkbox">  A is 2D: A[1:]</li>
   <li><input type="checkbox">  A is 1D: A[:]</li>
   <li><input type="checkbox">  A is 2D: A[:]</li>
   <li><input type="checkbox">  A is 1D: A[::2]</li>
   <li><input type="checkbox">  A is 2D: A[::2]</li>
   <li><input type="checkbox">  A is 2D: A[::2,::2]</li>
    
</ul>
</li>
</ol>
</div>

### Boolean `Indexing`
With boolean indexing we can select a subset of our array based on a logical condition. For example:

In [118]:
np.arange(55)

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
       34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50,
       51, 52, 53, 54])

time: 2.78 ms


In [120]:
A = np.arange(55)
A > 2

array([False, False, False,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True])

time: 3.11 ms


As you can see NumPy applies the logical condition (greater than 2) to each element in the array. This works equally for multidimensional arrays:

In [121]:
A = np.array([np.arange(25),np.arange(25)])
A

array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15,
        16, 17, 18, 19, 20, 21, 22, 23, 24],
       [ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15,
        16, 17, 18, 19, 20, 21, 22, 23, 24]])

time: 2.26 ms


In [122]:
A > 2

array([[False, False, False,  True,  True,  True,  True,  True,  True,
         True,  True,  True,  True,  True,  True,  True,  True,  True,
         True,  True,  True,  True,  True,  True,  True],
       [False, False, False,  True,  True,  True,  True,  True,  True,
         True,  True,  True,  True,  True,  True,  True,  True,  True,
         True,  True,  True,  True,  True,  True,  True]])

time: 3.26 ms


We can use the array of type boolean to index subsets of our array like so:

In [123]:
A = np.arange(50)
A

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
       34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49])

time: 2.33 ms


In [124]:
A>5

array([False, False, False, False, False, False,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True])

time: 2.47 ms


In [125]:
A[A>5]

array([ 6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22,
       23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39,
       40, 41, 42, 43, 44, 45, 46, 47, 48, 49])

time: 2.64 ms


In [127]:
rule1 = A > 5
rule2 = (A % 2 == 0)
# .....

time: 611 µs


In [131]:
A[rule1 & rule2]

time: 889 µs


In [None]:
A[rule1 | rule2]
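
Boolean masks are combined with the elementwise operators `&` (and), `|` (or) and `~` (not) rather than with Python's `and`, `or` and `not`. A small self-contained sketch, using a shorter array `numbers` introduced only for this example:

```python
import numpy as np

numbers = np.arange(20)

greater_than_5 = numbers > 5
is_even = numbers % 2 == 0

print(numbers[greater_than_5 & is_even])  # both conditions hold
print(numbers[greater_than_5 | is_even])  # at least one condition holds
print(numbers[~is_even])                  # negation: the odd numbers

# Inline conditions need parentheses because & binds tighter than the comparisons
print(numbers[(numbers > 5) & (numbers % 2 == 0)])
```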

## Arrays `Computing`

We have seen how we can create arrays in NumPy and how we can access individual elements, or larger element collections from NumPy arrays. To finish this lab, we will have a quick look at the possibilities that NumPy provides us to perform optimized computations on arrays. 

As the central data structure in NumPy is the array, many of the computations on these n-dimensional arrays belong to the field of [Linear Algebra](https://en.wikipedia.org/wiki/Linear_algebra), and NumPy provides implementations for the most common operations, e.g. matrix multiplication, decompositions, determinants, etc.
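
As a brief, non-exhaustive sketch of such linear algebra routines, the snippet below uses the `@` operator and two functions from `np.linalg`. The matrices `P` and `Q` are arbitrary examples chosen for this illustration.

```python
import numpy as np

P = np.array([[1.0, 2.0], [3.0, 4.0]])
Q = np.array([[5.0, 6.0], [7.0, 8.0]])

print(P @ Q)                 # matrix product (same as np.matmul(P, Q))
print(P.T)                   # transpose
print(np.linalg.det(P))      # determinant, approximately -2
print(np.linalg.inv(P) @ P)  # inverse times P gives (approximately) the identity
```

Note that `P @ Q` is the matrix product, which differs from the elementwise product `P * Q`.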

### Scalars

In [134]:
A = np.ones(4)
A

array([1., 1., 1., 1.])

time: 2.1 ms


In [135]:
A = A + 0.5  # the scalar is added to every element
A

array([1.5, 1.5, 1.5, 1.5])

time: 2.58 ms


In [140]:
B = np.ones([2,2])
B - 0.3

array([[0.7, 0.7],
       [0.7, 0.7]])

time: 2.35 ms


Equally, we can also subtract, divide, multiply  or exponentiate scalars:

In [141]:
A - 0.001 #or A - 1e-3 

array([1.499, 1.499, 1.499, 1.499])

time: 2.74 ms


In [143]:
A /3

array([0.5, 0.5, 0.5, 0.5])

time: 2.23 ms


In [144]:
2.5 * A

array([3.75, 3.75, 3.75, 3.75])

time: 2.88 ms


In [147]:
C = np.arange(15)
C

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])

time: 3.33 ms


In [148]:
C * 3

array([ 0,  3,  6,  9, 12, 15, 18, 21, 24, 27, 30, 33, 36, 39, 42])

time: 2.16 ms


In [151]:
3 ** C

array([      1,       3,       9,      27,      81,     243,     729,
          2187,    6561,   19683,   59049,  177147,  531441, 1594323,
       4782969])

time: 2.55 ms


In [152]:
C ** 3

array([   0,    1,    8,   27,   64,  125,  216,  343,  512,  729, 1000,
       1331, 1728, 2197, 2744])

time: 2.94 ms


### Array `Math`
Basic mathematical functions (arithmetic operations) operate **elementwise** on arrays, and are available both as operator overloads and as functions in the numpy module:

In [153]:
x = np.array([[1, 2], [3, 4]], dtype=np.float64)
x

array([[1., 2.],
       [3., 4.]])

time: 3.56 ms


In [154]:
y = np.array([[5, 6], [7, 8]], dtype=np.float64)
y

array([[5., 6.],
       [7., 8.]])

time: 1.95 ms


#### Elementwise sum, equivalent expressions:

In [155]:
x + y

array([[ 6.,  8.],
       [10., 12.]])

time: 5.93 ms


In [156]:
np.add(x, y)

array([[ 6.,  8.],
       [10., 12.]])

time: 3.74 ms


#### Elementwise difference, equivalent expressions:

In [157]:
x - y

array([[-4., -4.],
       [-4., -4.]])

time: 3.66 ms


In [158]:
np.subtract(x, y)

array([[-4., -4.],
       [-4., -4.]])

time: 2.63 ms


#### Elementwise product, equivalent expressions:

In [159]:
x * y

array([[ 5., 12.],
       [21., 32.]])

time: 2.66 ms


In [162]:
np.multiply(x, y)

array([[ 5., 12.],
       [21., 32.]])

time: 3.56 ms


#### Elementwise division, equivalent expressions:

In [163]:
x / y

array([[0.2       , 0.33333333],
       [0.42857143, 0.5       ]])

time: 3.1 ms


In [164]:
np.divide(x, y)

array([[0.2       , 0.33333333],
       [0.42857143, 0.5       ]])

time: 2.53 ms


#### Elementwise square root

In [165]:
np.sqrt(x)

array([[1.        , 1.41421356],
       [1.73205081, 2.        ]])

time: 2.61 ms


In [166]:
x ** (0.5)

array([[1.        , 1.41421356],
       [1.73205081, 2.        ]])

time: 2.23 ms


#### Dot product and matrix multiplication

**`N.B.`** `*` is elementwise multiplication, not matrix multiplication. We instead use the `np.dot` function or `.dot` method to compute the inner products of vectors, to multiply a vector by a matrix, and to multiply matrices. `dot` is available both as a function in the numpy module and as an instance method of array objects:

In [263]:
x = np.array([[1, 2], [3, 4]])
y = np.array([[5, 6], [7, 8]])
v = np.array([9, 10])
w = np.array([11, 12])

time: 1.17 ms


##### Matrix vector product

In [264]:
x.dot(v)  # using x's method

array([29, 67])

time: 1.66 ms


In [265]:
np.dot(x, v)  # using the numpy function

array([29, 67])

time: 2.32 ms


##### Matrix matrix product

In [266]:
x

array([[1, 2],
       [3, 4]])

time: 1.59 ms


In [267]:
y

array([[5, 6],
       [7, 8]])

time: 1.85 ms


In [268]:
x.dot(y)  # using x's method

array([[19, 22],
       [43, 50]])

time: 2.06 ms


In [269]:
np.dot(x, y)  # using the numpy function

array([[19, 22],
       [43, 50]])

time: 2.12 ms
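
The vector `w` is defined above but the inner product of two vectors is never actually computed, so here is a short sketch of that case. The `@` operator shown at the end (available from Python 3.5) is an addition that this lab does not otherwise use; for 1-D and 2-D arrays it gives the same result as `dot`.

```python
import numpy as np

x = np.array([[1, 2], [3, 4]])
v = np.array([9, 10])
w = np.array([11, 12])

# Inner product of two vectors: 9*11 + 10*12
print(v.dot(w))      # 219
print(np.dot(v, w))  # 219

# For contrast, * multiplies element by element and returns a vector
print(v * w)         # [ 99 120]

# The @ operator: same result as x.dot(v) for these arrays
print(x @ v)         # [29 67]
```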


### `Mathematical Functions`

NumPy provides many useful functions for performing computations on arrays; one of these is `sum`:

In [195]:
x = np.array([[1, 2], [3, 4]])
x

array([[1, 2],
       [3, 4]])

time: 3.3 ms


In [196]:
np.sum(x)  # Compute sum of all elements

10

time: 5.92 ms


In [197]:
np.sum(x, axis=0)  # Compute sum of each column - sum *over rows* i.e. dimension 0

array([4, 6])

time: 2.56 ms


In [199]:
np.sum(x, axis=1)  # Compute sum of each row - sum *over columns* i.e. dimension 1

array([3, 7])

time: 3.87 ms


In [201]:
np.sin(x)

array([[ 0.84147098,  0.90929743],
       [ 0.14112001, -0.7568025 ]])

time: 3.43 ms


In [202]:
??np.sin

time: 74.6 ms


For the full list of mathematical functions, check the [documentation](http://docs.scipy.org/doc/numpy/reference/routines.math.html).

Apart from computing mathematical functions using arrays, we frequently need to **reshape** or otherwise manipulate data in arrays. The simplest example of this type of operation is **transposing** a matrix; to transpose a matrix, simply use the `T` attribute of an array object:

In [203]:
np.arange(4)

array([0, 1, 2, 3])

time: 2.27 ms


In [204]:
x = np.arange(4).reshape((2, 2))
x

array([[0, 1],
       [2, 3]])

time: 3.39 ms


In [205]:
x.T

array([[0, 2],
       [1, 3]])

time: 2.32 ms


In [206]:
np.transpose(x) # Equivalent expression

array([[0, 2],
       [1, 3]])

time: 7.89 ms


In [207]:
# Note that taking the transpose of a rank 1 array (a vector) does nothing:
v = np.array([1, 2, 3])
v

array([1, 2, 3])

time: 5.28 ms


In [208]:
v.T

array([1, 2, 3])

time: 2.78 ms


In [209]:
x.reshape((4, 1))

array([[0],
       [1],
       [2],
       [3]])

time: 3.67 ms


In [210]:
x.reshape((4,))

array([0, 1, 2, 3])

time: 2.88 ms


In [212]:
y = np.arange(27).reshape((3, 3, 3))
y

array([[[ 0,  1,  2],
        [ 3,  4,  5],
        [ 6,  7,  8]],

       [[ 9, 10, 11],
        [12, 13, 14],
        [15, 16, 17]],

       [[18, 19, 20],
        [21, 22, 23],
        [24, 25, 26]]])

time: 2.63 ms


In [213]:
y.shape

(3, 3, 3)

time: 3.08 ms


In [214]:
y.reshape((3, -1))

array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8],
       [ 9, 10, 11, 12, 13, 14, 15, 16, 17],
       [18, 19, 20, 21, 22, 23, 24, 25, 26]])

time: 2.54 ms


In [215]:
y.reshape((3, -1)).shape

(3, 9)

time: 2.19 ms
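
The `-1` used in the last two cells tells `reshape` to infer that dimension from the total number of elements. A small sketch of how this behaves, reusing the same `(3, 3, 3)` array:

```python
import numpy as np

y = np.arange(27).reshape((3, 3, 3))  # 27 elements in total

print(y.reshape((3, -1)).shape)  # (3, 9)  -> 27 / 3 = 9 is inferred
print(y.reshape((-1, 3)).shape)  # (9, 3)
print(y.reshape(-1).shape)       # (27,)   -> flatten to a vector

# The inferred size must be a whole number, otherwise reshape raises an error
try:
    y.reshape((4, -1))           # 27 is not divisible by 4
except ValueError as e:
    print("ValueError:", e)
```

Only one dimension may be given as `-1`, since otherwise the shape would be ambiguous.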


### Broadcasting

Broadcasting is a powerful mechanism that allows numpy to work with arrays of different shapes when performing arithmetic operations. 

In [216]:
x = np.arange(12).reshape((4, 3))
x

array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11]])

time: 2.64 ms


In [217]:
x+2

array([[ 2,  3,  4],
       [ 5,  6,  7],
       [ 8,  9, 10],
       [11, 12, 13]])

time: 3.51 ms


Broadcasting is especially useful when we have a smaller and a larger array, and want to use the smaller one multiple times to perform some operation on the larger one. For example, suppose that we want to add a constant vector to each row of a matrix.

In [218]:
v = np.array([1, 0, 1])
v

array([1, 0, 1])

time: 3.11 ms


In [219]:
x + v  # Add v to each row of x using broadcasting

array([[ 1,  1,  3],
       [ 4,  4,  6],
       [ 7,  7,  9],
       [10, 10, 12]])

time: 2.61 ms


`x + v` works even though `x` has shape `(4, 3)` and `v` has shape `(3,)` due to broadcasting; this line works as if v actually had shape `(4, 3)`, where each row was a copy of `v`, and the sum was performed elementwise.
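
To make the "as if each row was a copy of `v`" description concrete, the sketch below builds that stacked copy explicitly with `np.tile` and checks that it gives the same result. The name `vv` is introduced here for illustration only.

```python
import numpy as np

x = np.arange(12).reshape((4, 3))
v = np.array([1, 0, 1])

# Explicitly stack 4 copies of v on top of each other -> shape (4, 3)
vv = np.tile(v, (4, 1))
print(vv.shape)                       # (4, 3)

# Broadcasting gives the same result without materialising the copies
print(np.array_equal(x + v, x + vv))  # True
```

Broadcasting is usually preferable, since it avoids allocating the repeated copies in memory.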

Broadcasting two arrays together follows these rules:

* If the arrays do not have the same rank, prepend the shape of the lower rank array with 1s until both shapes have the same length.
* The two arrays are said to be compatible in a dimension if they have the same size in the dimension, or if one of the arrays has size 1 in that dimension.
* The arrays can be broadcast together if they are compatible in all dimensions.
* After broadcasting, each array behaves as if it had shape equal to the elementwise maximum of shapes of the two input arrays.
* In any dimension where one array had size 1 and the other array had size greater than 1, the first array behaves as if it were copied along that dimension.
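
The rules above can be sketched as a small pure-Python function that predicts the resulting shape. `broadcast_shape` is a hypothetical helper written for this illustration, not a NumPy function; its output is compared against the shape that NumPy itself reports through `np.broadcast`.

```python
import numpy as np

def broadcast_shape(shape_a, shape_b):
    """Hypothetical helper: shape obtained by broadcasting two shapes together."""
    # Rule 1: prepend 1s to the shorter shape until both have the same length
    ndim = max(len(shape_a), len(shape_b))
    a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)

    result = []
    for da, db in zip(a, b):
        # Rules 2 and 3: every dimension must match, or one of the sizes must be 1
        if da != db and da != 1 and db != 1:
            raise ValueError("shapes %s and %s cannot be broadcast" % (shape_a, shape_b))
        # Rule 4: the size that is not 1 determines the output size
        result.append(db if da == 1 else da)
    return tuple(result)

print(broadcast_shape((4, 3), (3,)))    # (4, 3)
print(broadcast_shape((3, 4), (3, 1)))  # (3, 4)

# Cross-check against NumPy's own answer
print(np.broadcast(np.empty((4, 3)), np.empty(3)).shape)  # (4, 3)
```

Calling `broadcast_shape((3, 4), (3,))` raises a `ValueError`, because after prepending a 1 the trailing dimensions are 4 and 3, which are neither equal nor 1.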

So be careful with shapes...

In [220]:
y = x.T
y

array([[ 0,  3,  6,  9],
       [ 1,  4,  7, 10],
       [ 2,  5,  8, 11]])

time: 3.56 ms


In [221]:
import sys  # needed for sys.exc_info() in the fallback handler

try:
    y + v  # Add v to each column of y using broadcasting...?
except ValueError as e:
    print(e)
except:
    print("Unexpected error:", sys.exc_info()[0])
    raise

operands could not be broadcast together with shapes (3,4) (3,) 
time: 1.81 ms


And especially careful with vectors!

In [222]:
try:
    y + v.T  # Add v to each column of y using broadcasting...?
except ValueError as e:
    print(e)
except:
    print("Unexpected error:", sys.exc_info()[0])
    raise

operands could not be broadcast together with shapes (3,4) (3,) 
time: 1.42 ms


In [223]:
y + v.reshape((3, 1))  # Add v to each column of y using broadcasting!

array([[ 1,  4,  7, 10],
       [ 1,  4,  7, 10],
       [ 3,  6,  9, 12]])

time: 2.92 ms


In [224]:
print('x shape:', x.shape)
print('v shape:', v.shape)
print('y shape:', y.shape)

x shape: (4, 3)
v shape: (3,)
y shape: (3, 4)
time: 1.27 ms


### Numpy Function Example
Here's an example of moving average calculation with NumPy using the [`cumsum`](https://numpy.org/devdocs/reference/generated/numpy.cumsum.html) function. You may want to check the speed difference as compared to the methods that we designed previously using basic Python.

In [None]:
a = np.arange(20)
a

In [None]:
np.cumsum(a)

In [None]:
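def moving_average(a, n=3):
    # ret[i] holds the sum a[0] + ... + a[i]
    ret = np.cumsum(a, dtype=float)
    # Subtract the cumulative sum from n steps back: ret[i] becomes the sum of the last n values
    ret[n:] = ret[n:] - ret[:-n]
    # The first full window ends at index n - 1; divide the window sums by n to get means
    return ret[n - 1:] / n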

In [None]:
moving_average(a)

In [None]:
moving_average(a, n=4)
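
Since the cells above have no recorded output, here is a self-contained sketch that checks the `cumsum` trick against two other ways of computing the same thing. `moving_average_loop` is a hypothetical plain-Python reference written for this comparison, and the use of `np.convolve` is an alternative that the lab itself does not cover.

```python
import numpy as np

def moving_average(a, n=3):
    ret = np.cumsum(a, dtype=float)
    ret[n:] = ret[n:] - ret[:-n]
    return ret[n - 1:] / n

def moving_average_loop(a, n=3):
    """Hypothetical plain-Python reference: average each window explicitly."""
    return [sum(a[i:i + n]) / n for i in range(len(a) - n + 1)]

a = np.arange(20)

print(moving_average(a)[:4])  # [1. 2. 3. 4.]

# Same values as the explicit loop
print(np.allclose(moving_average(a), moving_average_loop(a)))  # True

# Same values as convolving with a uniform window of length 4
uniform_window = np.ones(4) / 4
print(np.allclose(moving_average(a, n=4),
                  np.convolve(a, uniform_window, mode='valid')))  # True
```

To compare speed, you could time both functions on a much larger array, for example with the `%timeit` magic in a notebook cell.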

__________
## Numpy Docs
__________
This brief introduction has touched upon most of the important things that you ought to know about NumPy, but, obviously, it is far from complete.

Check out the [NumPy reference](https://docs.scipy.org/doc/numpy-1.13.0/reference/) to find out much more, and complete the short exercises below to test your understanding of basic NumPy functions and objects.

Feel free to use the official [documentation](http://docs.scipy.org/doc/) should you need it, and don't worry if you need to google some of the solutions (after all, that's a big part of any programmer's work).
__________
# `Exercises`
The following short exercises test your understanding of simple numpy functions and objects. Make sure you can complete them and feel free to reference the official [documentation](http://docs.scipy.org/doc/) should you need it. **`N.B.`** You may need to google some solutions.

#### ========== Question 1 ==========
Print your numpy version and configuration.

In [306]:
# Your code goes here
print(np.__version__)
np.show_config()

1.16.5
blas_mkl_info:
  NOT AVAILABLE
blis_info:
  NOT AVAILABLE
openblas_info:
    libraries = ['openblas', 'openblas']
    library_dirs = ['/home/ondes/anaconda3/envs/fastai/lib']
    language = c
    define_macros = [('HAVE_CBLAS', None)]
blas_opt_info:
    libraries = ['openblas', 'openblas']
    library_dirs = ['/home/ondes/anaconda3/envs/fastai/lib']
    language = c
    define_macros = [('HAVE_CBLAS', None)]
lapack_mkl_info:
  NOT AVAILABLE
openblas_lapack_info:
    libraries = ['openblas', 'openblas']
    library_dirs = ['/home/ondes/anaconda3/envs/fastai/lib']
    language = c
    define_macros = [('HAVE_CBLAS', None)]
lapack_opt_info:
    libraries = ['openblas', 'openblas']
    library_dirs = ['/home/ondes/anaconda3/envs/fastai/lib']
    language = c
    define_macros = [('HAVE_CBLAS', None)]
time: 3.83 ms


#### ========== Question 2 ==========
Create a zero vector of size 5.

In [307]:
# Your code goes here
np.zeros(5)

array([0., 0., 0., 0., 0.])

time: 2.37 ms


#### ========== Question 3 ==========
Create a zero vector of size 5 of type integer. Set the third element to 1.

In [308]:
# Your code goes here
a = np.zeros(5, dtype=int)
a[2] = 1
a

array([0, 0, 1, 0, 0])

time: 1.88 ms


#### ========== Question 4 ==========
Create a vector ranging from 0 to 9. 

In [309]:
# Your code goes here
np.arange(10)

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

time: 3.04 ms


#### ========== Question 5 ==========
Create a vector ranging from 10 to 29.

In [310]:
# Your code goes here
np.arange(10, 30)

array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26,
       27, 28, 29])

time: 1.88 ms


#### ========== Question 6 ==========
Create a vector ranging from 0 to 9 and reverse it.

In [311]:
# Your code goes here
np.arange(0, 10)[::-1]

array([9, 8, 7, 6, 5, 4, 3, 2, 1, 0])

time: 1.76 ms


#### ========== Question 7 ==========
Create a 5 x 3 zero matrix.

In [312]:
# Your code goes here
np.zeros((5, 3))

array([[0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.]])

time: 2.43 ms


#### ========== Question 8 ==========
Create this matrix...without copy pasting it ;)
```
array([[0, 3, 6],
       [1, 4, 7],
       [2, 5, 8]])
```

In [313]:
# Your code goes here
a = np.arange(9).reshape(3,3)
a.T

array([[0, 3, 6],
       [1, 4, 7],
       [2, 5, 8]])

time: 2.3 ms


#### ========== Question 9 ==========
Create a 3 X 3 identity matrix.

In [314]:
# Your code goes here
np.eye(3)

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

time: 2.54 ms


#### ========== Question 10 ==========
Create a 2 X 2 X 2 array with random values (drawn from a normal distribution).

In [315]:
# Your code goes here
np.random.randn(2, 2, 2)

array([[[ 0.62453654, -0.44995577],
        [ 0.46381577,  0.37353941]],

       [[-0.16134281, -0.19798543],
        [-1.23071858,  0.11024016]]])

time: 3.71 ms


#### ========== Question 11a ==========
Create a 5 x 4 array with random values and find the minimum and maximum values.

In [316]:
# Your code goes here
a = np.random.randn(5, 4)
print(a)
print("Minimum: ", np.min(a))
print("Maximum: ", np.max(a))

[[ 0.48001545 -0.97897067  0.9623854  -1.49735592]
 [ 1.61417042 -0.52268688  0.15147197  0.37107364]
 [ 1.77385477  0.46822344 -0.34373052 -1.11728658]
 [-0.8483709   0.70547566  0.33813179 -0.6420878 ]
 [ 1.05954427  0.33108887 -0.02776498 -1.41391152]]
Minimum:  -1.497355920657755
Maximum:  1.773854770306508
time: 1.7 ms


#### ========== Question 11b ==========
Return the *index* (i.e. the location within the matrix) of the max or min values

In [317]:
# Your code goes here
idx = a.argmax()    # or...
idx = np.argmax(a)  # ...are both acceptable, but idx indexes the *flattened* array, so a[idx] would fail here

time: 1.23 ms


In [318]:
np.where(a == a.max())  # is also fine

(array([2]), array([0]))

time: 1.83 ms
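
Because `argmax` returns a position in the flattened array, it needs to be converted before it can be used as a (row, column) index. A minimal sketch using `np.unravel_index`; the fixed seed is an assumption added only so the example is reproducible, and the values differ from the array printed in 11a.

```python
import numpy as np

rng = np.random.RandomState(0)  # fixed seed, an assumption for reproducibility
a = rng.randn(5, 4)

idx = np.argmax(a)                         # position in the flattened array
row, col = np.unravel_index(idx, a.shape)  # convert to (row, column)

print(a[row, col] == a.max())  # True
print(a.flat[idx] == a.max())  # True: a.flat accepts the flat index directly
```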


#### ========== Question 12 ==========
Find the mean value of the array in 11.

In [319]:
# Your code goes here
np.mean(a)

0.043163495876740766

time: 3.19 ms


#### ========== Question 13 ==========
Find the row means of the array in 11.

In [320]:
# Your code goes here
np.mean(a, axis=1)

array([-0.25848144,  0.40350729,  0.19526528, -0.11171281, -0.01276084])

time: 2.82 ms


#### ========== Question 14 ==========
Find the column means of the array in 11.

In [321]:
# Your code goes here
np.mean(a, axis=0)

array([ 8.15842802e-01,  6.26085885e-04,  2.16098732e-01, -8.59913636e-01])

time: 2.73 ms


#### ========== Question 15 ==========
Create a list with elements 2.2, 3.5, 0, 4, 0. and convert it into a NumPy array. Find the indices of the non-zero elements.

In [322]:
# Your code goes here
a = [2.2, 3.5, 0, 4, 0.]
a = np.asarray(a)  # or np.array(a)
np.nonzero(a)

(array([0, 1, 3]),)

time: 3.02 ms


#### ========== Question 16 ==========
Create two normally distributed random matrices of shape (5, 4) and (4, 2). Print their matrix product.

In [323]:
# Your code goes here
a = np.random.randn(5, 4)
b = np.random.randn(4, 2)
np.dot(a,b)

array([[ 0.24004063,  0.49994152],
       [-0.39509886,  1.45097764],
       [ 1.56867966,  1.45841682],
       [-0.48285705,  0.0702741 ],
       [ 0.29240205,  3.38413446]])

time: 3.95 ms


#### ========== Question 17 ==========
Create a random matrix of shape (5, 3) and a random vector of size 3. Use broadcasting to add the two arrays.

In [324]:
# Your code goes here
a = np.random.randn(5, 3)
b = np.random.randn(3)
a + b

array([[-1.64527732, -0.64210277,  1.19817481],
       [-2.77051393, -0.11608541, -0.34260038],
       [-1.12175557, -0.17247304,  1.0608133 ],
       [ 0.611254  ,  0.03318393,  1.81007001],
       [-2.29972014, -1.60076785,  1.86937711]])

time: 3.13 ms


_________
# Additional References

NumPy offers many more optimized operations on multidimensional arrays than we covered here, and is a fundamental tool for anyone interested in doing scientific computing in Python.

If you want to deepen your understanding of NumPy, here are some good starting points:

- [The SciPy NumPy tutorial](http://www.scipy-lectures.org/intro/numpy/index.html)
- [100 Exercises with solutions in Numpy](http://www.labri.fr/perso/nrougier/teaching/numpy.100/)

For a visual introduction to Linear Algebra:
- [Essence of Linear Algebra](https://www.youtube.com/watch?v=kjBOesZCoqc)