# Very Important before Starting this lesson
## YouTube Channel - Mohit Sanjay Sharma
### Professor - Federica Bianco
#### Videos to see before starting the following lesson:
- Linear Algebra
- Linear Algebra 2 

## 04 - 00 Introduction to Numpy
`Numpy` (Numerical Python) is an open-source library for performing scientific computation in Python. Numpy lets you work with arrays and matrices in a natural way, unlike lists, where you have to loop through individual elements to perform any numerical operation. 
> This would probably be a good time to refresh your memory on what arrays and matrices are. Here is what you need to know to get started:
> - Arrays are simply a collection of values of the *same type* indexed by integers.
> - Matrices are two-dimensional arrays indexed by rows and columns; arrays with more dimensions simply add further indices.

The methods in numpy are designed with high performance in mind. Numpy arrays are stored more efficiently than equivalent pure-Python data structures such as lists. This especially pays off when you are using really large arrays (i.e. large data sets). A major portion of numpy is written in C, so the computations are faster than the equivalent pure Python code. Numpy is one part of the scientific stack in Python. It used to be part of a larger scientific package called SciPy but was later separated, and scipy now uses numpy for almost all of its major tasks.

Numpy is a huge topic and we will barely scratch the surface in this bootcamp, but it will be enough to get you up to speed for your Master's coursework. 

In this module (and sub-modules) we will be looking at ways of effectively loading, storing, and manipulating in-memory data. We will be dealing with datasets from a wide range of sources such as images, sound clips, text data, etc., but to reach that stage, let's first get a working understanding of `Numpy Arrays`.

Numpy is a third-party module that has been installed for you on CUSP CDF; if you followed the installation instructions for your own machine, you should have it there too. Before we can begin using Numpy, we first have to `import` it in Python. You may remember that in the File IO module we imported the built-in csv module. Similarly, to import the numpy module, you can type

In [1]:
import numpy as np
print(np.__version__)

1.13.1


We use `import numpy as np` so that we won't have to type `numpy` every time we want to use the module; instead we can use `np`.

A gentle reminder to use tab-completion (`<TAB>`) and `?` to explore and access the documentation for anything that you are looking for.

Example:
```ipython
In [1]: np.<TAB>
```
or 
```ipython
In [2]: np.__version__?
```

## 04 - 01 Numpy Array Basics

Numpy's main object is the homogeneous multidimensional array. Numpy's array class is called `ndarray`. It is a table of elements (usually numbers), all of the same type, indexed by a `tuple` of non-negative integers. In numpy, dimensions are called axes. The number of axes is known as the rank. 

Numpy arrays are similar to Python lists, with a few differences: 
 - All the elements in a numpy array must be of the same datatype. 
 - You can't change the size of a numpy array (at least not without making a full copy; we'll see this a little later)
 - Numpy arrays are easy to construct and to manipulate.
 - Numpy arrays support “vectorized” operations like elementwise addition and multiplication without having to run a `for` loop explicitly in python.
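
To make the last point concrete, here is a minimal sketch comparing a plain Python list with a Numpy array. The variable names are only illustrative.

```python
import numpy as np

values = [1, 2, 3, 4]

# Plain Python: we have to visit every element ourselves.
doubled_list = [v * 2 for v in values]

# Numpy: the operation is written once and applied to the whole array.
arr = np.array(values)
doubled_arr = arr * 2

print(doubled_list)   # [2, 4, 6, 8]
print(doubled_arr)    # the array holding 2, 4, 6, 8

# For a list, `*` means repetition, not elementwise multiplication.
repeated = values * 2
print(repeated)       # [1, 2, 3, 4, 1, 2, 3, 4]
```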
 
We'll cover basic array manipulations here:
- *Attributes of arrays*: Determining the size, shape, memory consumption, and data types of arrays
- *Creating arrays*: Different ways of creating the Arrays
- *Indexing of arrays*: Getting and setting the value of individual array elements
- *Slicing of arrays*: Getting and setting smaller subarrays within a larger array
- *Reshaping of arrays*: Changing the shape of a given array
- *Joining and splitting of arrays*: Combining multiple arrays into one, and splitting one array into many

### 04 - 01.01 Attributes of Arrays

In [2]:
from __future__ import print_function
import numpy as np
# Single dimensional Array from a list
arr = np.array([1, 2, 3, 4], dtype=float)
print('Type: ',type(arr))
print('Shape: ',arr.shape)
print('Dimension: ',arr.ndim)
print('Itemsize: ',arr.itemsize)
print('Size: ',arr.size)

Type:  <class 'numpy.ndarray'>
Shape:  (4,)
Dimension:  1
Itemsize:  8
Size:  4


The above is one of the many ways in which a numpy array can be created. `np.array()` in the above case takes two arguments: the `list` to be converted to a numpy array and the datatype (`dtype`) of **every** member of the array. 

There are many different attributes of the `ndarray` class, and by now you should be able to understand how to access those attributes and get help for them (Hint: `<TAB>` completion). 

Let's look at some of the attributes that we printed above.

##### ndarray.ndim
It is the number of axes or dimensions of the array.

##### ndarray.shape
It describes the dimensions of the array. This is a tuple of integers indicating the size of the array along each axis. For a matrix with n rows and m columns, the shape will be (n, m). The length of the shape tuple is therefore the number of axes, `ndim`. For one-dimensional arrays, the tuple has a single element, such as `(4,)` in our case.

##### ndarray.dtype
It is an object describing the type of the elements in the array. Remember that all the elements need to be of the same datatype in a numpy array. Additionally, numpy provides its own types such as int16, int32, float64 and so on.

##### ndarray.itemsize
The size in bytes of each element of the array. For example, an array of elements of type float64 has an itemsize of $\frac{64}{8} = 8$ and one of type float32 has an itemsize of $\frac{32}{8} = 4$.

##### ndarray.data
This is the buffer containing the actual elements of the array. Normally we won't need this attribute, because we access the elements of an array through numpy's indexing facilities.
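
The attributes above can be checked on a small two-dimensional array. This is just a sketch with made-up values; `nbytes` is an extra attribute not discussed above that reports the total memory used by the elements.

```python
import numpy as np

# A small 2-D array with 2 rows and 3 columns (values chosen arbitrarily).
mat = np.array([[1, 2, 3],
                [4, 5, 6]], dtype=np.float32)

print(mat.ndim)      # 2 axes
print(mat.shape)     # (2, 3): 2 rows, 3 columns
print(mat.size)      # 6 elements in total
print(mat.dtype)     # float32
print(mat.itemsize)  # 4 bytes per element (32 bits / 8)
print(mat.nbytes)    # 24 bytes, i.e. size * itemsize
```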

Let's take a look at another example:

In [3]:
# Elements have to be of same datatype
arr = np.array([1, 2.0, "ucsl"])
print("Datatype: ", arr.dtype)

Datatype:  <U32


Since we did not pass the `dtype` parameter, Numpy saw that the list contains mixed types and converted all the elements to type `U`nicode`32`, a Unicode string of up to 32 characters (or `S`tring`32` if you are using Python 2). 

> To know all the datatypes supported by Numpy, you can type
```ipython
In [2]: np.typeDict
```
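and check the output. In newer Numpy releases `np.typeDict` has been removed; `np.sctypeDict` gives the same information there.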

If we had passed the `dtype` as `float`, or any numeric type rather than a `string` or `unicode` type, we would have received a `ValueError`. (Try it!)
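
If you want to try it without stopping your notebook on the error, a small sketch like the following catches the exception and prints its message. The variable `caught` is only there for illustration.

```python
import numpy as np

caught = None
try:
    # "ucsl" cannot be interpreted as a float, so the conversion fails.
    np.array([1, 2.0, "ucsl"], dtype=float)
except ValueError as err:
    caught = err
    print("ValueError:", err)
```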

### 04 - 01.02 Creating Arrays
There are many different ways in which a numpy array can be created. We saw one in the above example. Let's look at some other ways of creating arrays.

In [4]:
arr1 = np.arange(5, dtype=float)
print('arange() with float dtype: \n',arr1)
# Divide the range between start and stop in equal `num` intervals
arr2 = np.linspace(0, 8, num=5)
print('\n linspace(): \n', arr2)
arr3 = np.ones((2, 3), dtype=float)
print ('\n ones(): \n',arr3)
arr4 = np.zeros((2,3), dtype=float)
print ('\n zeros(): \n',arr4)
arr5 = np.empty((2, 4))
print('\n Empty: \n',arr5)  # Your output may be different..
arr6 = np.ones_like(arr1)
print('\n Ones_like(): \n',arr6)
arr7 = np.diag(arr1)
print('\n Diagonal array: \n',arr7)

arange() with float dtype: 
 [ 0.  1.  2.  3.  4.]

 linspace(): 
 [ 0.  2.  4.  6.  8.]

 ones(): 
 [[ 1.  1.  1.]
 [ 1.  1.  1.]]

 zeros(): 
 [[ 0.  0.  0.]
 [ 0.  0.  0.]]

 Empty: 
 [[  0.00000000e+000  -2.68679659e+154   2.26498184e-314   2.24886081e-314]
 [  2.24867949e-314   2.26502100e-314   2.26502037e-314   5.56270716e-309]]

 Ones_like(): 
 [ 1.  1.  1.  1.  1.]

 Diagonal array: 
 [[ 0.  0.  0.  0.  0.]
 [ 0.  1.  0.  0.  0.]
 [ 0.  0.  2.  0.  0.]
 [ 0.  0.  0.  3.  0.]
 [ 0.  0.  0.  0.  4.]]


#### np.arange()

is similar to the built-in `range` function that we used previously. This function, however, returns a numpy array.
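
As a quick sketch of the similarity, `np.arange` accepts the same `start, stop, step` arguments as `range`. One difference worth knowing is that it also accepts non-integer steps.

```python
import numpy as np

from_range = list(range(2, 10, 3))
from_arange = np.arange(2, 10, 3)
print(from_range)    # [2, 5, 8]
print(from_arange)   # the array holding 2, 5, 8

# Unlike range, np.arange also works with a fractional step.
quarters = np.arange(0, 1, 0.25)
print(quarters)      # 0, 0.25, 0.5, 0.75 (the stop value 1 is excluded)
```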

#### np.zeros() and np.ones()

as the names suggest, generate new arrays of the specified dimensions filled with zeros or ones. These are the most commonly used functions to create new arrays.

#### np.empty()

This function creates an array whose initial content is uninitialized: it is whatever happened to be in that memory, so it looks random. If not specified, the data type of the created array is float64.

#### np.ones_like()  , np.zeros_like() and np.empty_like()

These functions create a new array with the same dimensions and type as an existing one, filled with ones, zeros, or uninitialized values respectively.
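
Here is a minimal sketch showing that the shape and the dtype are both taken from the existing array. The name `template` is just a placeholder.

```python
import numpy as np

template = np.arange(6, dtype=np.int32).reshape(2, 3)

zeros = np.zeros_like(template)
ones = np.ones_like(template)
scratch = np.empty_like(template)   # contents are arbitrary until assigned

print(zeros.shape, zeros.dtype)      # (2, 3) int32
print(ones.shape, ones.dtype)        # (2, 3) int32
print(scratch.shape, scratch.dtype)  # (2, 3) int32
```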

#### np.diag()

As the name suggests, this will construct a diagonal array: a 2D array with the given values on the diagonal and zeros everywhere else.
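
A short sketch of `np.diag` follows. It also shows two behaviours not covered above: given a 2D array it extracts the diagonal instead, and the optional `k` argument shifts the diagonal.

```python
import numpy as np

# 1-D input: build a 2-D array with the values on the main diagonal.
d = np.diag([1, 2, 3])
print(d)

# 2-D input: extract the main diagonal back out.
extracted = np.diag(d)
print(extracted)   # the array holding 1, 2, 3

# k=1 places the values one position above the main diagonal.
shifted = np.diag([1, 2], k=1)
print(shifted)
```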

Let's take a look at an example of creating multi-dimensional arrays

In [5]:
arr2d = np.arange(27).reshape(3, 9)
print("2D array: \n{}\n".format(arr2d))
arr3d = np.arange(27).reshape(3,3,3)
print("3D array: \n{}\n".format(arr3d))

2D array: 
[[ 0  1  2  3  4  5  6  7  8]
 [ 9 10 11 12 13 14 15 16 17]
 [18 19 20 21 22 23 24 25 26]]

3D array: 
[[[ 0  1  2]
  [ 3  4  5]
  [ 6  7  8]]

 [[ 9 10 11]
  [12 13 14]
  [15 16 17]]

 [[18 19 20]
  [21 22 23]
  [24 25 26]]]



Numpy displays arrays in a similar way to nested lists, with the following layout:
- the last axis is printed from left to right
- the second-to-last axis is printed from top to bottom
- the rest are also printed from top to bottom, with each slice separated from the next by an empty line

Simply put, one-dimensional arrays are printed as rows, two-dimensional arrays as matrices, and higher-dimensional arrays as lists of matrices.

> We will look at reshaping of arrays later in this module

### 04 - 01.03 Array Indexing
Numpy arrays are indexed in the same way as lists, so accessing the elements of a one-dimensional array is equivalent to accessing elements in a list

In [6]:
arr = np.arange(3, 10)
print(arr[4])

7


You can also use negative indexing like we did for lists

In [7]:
print(arr[-3])

7


Multi-dimensional array items can be accessed using a comma-separated tuple of indices

In [8]:
arr3d = np.arange(27).reshape(3,3,3)
dim, row, col = 2, 1, 0
print("3D array: \n", arr3d, end="\n\n")
print("Element at {dim}, {row}, {col} is: {val}".format(dim=dim, 
                                                        row=row, 
                                                        col=col, 
                                                        val=arr3d[dim, row, col]))

3D array: 
 [[[ 0  1  2]
  [ 3  4  5]
  [ 6  7  8]]

 [[ 9 10 11]
  [12 13 14]
  [15 16 17]]

 [[18 19 20]
  [21 22 23]
  [24 25 26]]]

Element at 2, 1, 0 is: 21


### 04 - 01.04 Array Slicing
Slicing extracts a portion of a sequence by specifying a lower and an upper bound. The lower-bound element is included, but the upper-bound element is not. Just like lists, there is a third parameter, `step`, which is the stride to take between the selected elements. If any of these are unspecified, they default to the values `start=0, stop=size of dimension, step=1` (with a negative step, the slice instead runs from the last element back to the first). These parameters are separated by colons (`:`)

In [9]:
arr = np.linspace(0, 8, num=5)
print("Original Array: \n", arr, end="\n\n")
# let the slicings begin
print("arr[:3]: ", arr[:3])
print("arr[-5:5:2]: ", arr[-5:5:2])
print("arr[::2]: ", arr[::2])
# Reverse the elements
print("arr[::-1]: ", arr[::-1])
# Every other element, going backwards from index 2
print("arr[2::-2]: ", arr[2::-2])

Original Array: 
 [ 0.  2.  4.  6.  8.]

arr[:3]:  [ 0.  2.  4.]
arr[-5:5:2]:  [ 0.  4.  8.]
arr[::2]:  [ 0.  4.  8.]
arr[::-1]:  [ 8.  6.  4.  2.  0.]
arr[2::-2]:  [ 4.  0.]


For multi-dimensional arrays, we specify one slice per axis, in rows, columns order.

In [10]:
# Array of random integers between low and high of fixed size(mxn)
arr = np.random.randint(low=0, high=100, size=(3,4))
print("2D array: \n", arr, end="\n\n")
# first row, three columns
print("first row, three columns: \n", arr[:1, :3], end="\n\n")
# all rows, fourth column (index 3)
print("all rows, fourth column: \n", arr[:, 3], end="\n\n")
# reversing the order of both axes
print("reversing rows and columns together: \n",
     arr[::-1, ::-1], end="\n\n")

2D array: 
 [[64 47 67 61]
 [10 73 21 78]
 [88 11 76 48]]

first row, three columns: 
 [[64 47 67]]

all rows, fourth column: 
 [61 78 48]

reversing rows and columns together: 
 [[48 76 11 88]
 [78 21 73 10]
 [61 67 47 64]]



Slices are views of (references to) the original array in memory. Changing the values in a slice also changes the original array

In [11]:
arr1 = np.arange(5)
# slice arr1
arr2 = arr1[3:5]
print("arr1: \n", arr1, end="\n\n")
print("Sliced array: \n", arr2)
print('\nBefore changing, arr2[0]: \n',arr2[0])
# change value for 0th element of the slice
arr2[0] = 99
print('\nAfter changing arr2[0], arr1: \n',arr1)

arr1: 
 [0 1 2 3 4]

Sliced array: 
 [3 4]

Before changing, arr2[0]: 
 3

After changing arr2[0], arr1: 
 [ 0  1  2 99  4]
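
If you need a slice that you can modify without touching the original, one option is to call `.copy()` on it. The sketch below contrasts the two cases and uses `np.shares_memory` to check whether two arrays share the same data.

```python
import numpy as np

original = np.arange(5)
view = original[3:5]                 # shares memory with `original`
independent = original[3:5].copy()   # owns its own data

independent[0] = -1   # does not touch `original`
view[1] = 99          # writes through to `original`

print(original)       # 0, 1, 2, 3, 99
print(independent)    # -1, 4
print(np.shares_memory(original, view))          # True
print(np.shares_memory(original, independent))   # False
```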


### 04 - 01.05 Reshaping Arrays
We have been using the `reshape` method to view a one-dimensional array as a multi-dimensional array. This nifty method only works if the new shape holds exactly as many elements as the original array, i.e. `size = m x n`
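
The size requirement can be seen in a small sketch. It also uses `-1` as one of the dimensions, which asks Numpy to work out that dimension from the size of the array.

```python
import numpy as np

arr = np.arange(12)

shape_3x4 = arr.reshape(3, 4).shape
print(shape_3x4)    # (3, 4)

# -1 lets Numpy infer the missing dimension: 12 / 2 = 6.
shape_2xn = arr.reshape(2, -1).shape
print(shape_2xn)    # (2, 6)

# 5 x 3 = 15 does not match the 12 elements, so this fails.
caught = None
try:
    arr.reshape(5, 3)
except ValueError as err:
    caught = err
    print("ValueError:", err)
```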

One can also create row and column vectors from a one-dimensional array using `np.newaxis`

In [12]:
arr = np.random.randint(low=0, high=100, size=12)
print("Original Array: \n", arr, end="\n\n")
print("Reshaped to 3 x 4: \n", arr.reshape(3,4), end="\n\n")
print("Row vector : \n", arr[np.newaxis, :], end="\n\n")
print("Column vector : \n", arr[:, np.newaxis], end="\n\n")

Original Array: 
 [61 49  0 14 61  5 36 24  5 69 93 49]

Reshaped to 3 x 4: 
 [[61 49  0 14]
 [61  5 36 24]
 [ 5 69 93 49]]

Row vector : 
 [[61 49  0 14 61  5 36 24  5 69 93 49]]

Column vector : 
 [[61]
 [49]
 [ 0]
 [14]
 [61]
 [ 5]
 [36]
 [24]
 [ 5]
 [69]
 [93]
 [49]]



### 04 - 01.06 Concatenating Arrays
Just like Python lists, you can concatenate arrays using Numpy's `concatenate()`, `hstack()` and `vstack()` functions. 

However, you must remember that, just like with lists, when you combine Numpy arrays an actual copy of both arrays is made. If you created the two arrays separately, they sit at unrelated locations in memory, and there is no way to represent them together as a single `view`. It is always advisable to know the size of the array that you will need beforehand, so that you can start with one big array and have each of the small arrays be a `view` into the big array (you can leverage the power of slicing!)
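
The advice above could look like the following sketch, assuming we know up front that we need room for 10 + 10 values. The names `combined`, `first_half` and `second_half` are only illustrative.

```python
import numpy as np

# Allocate the big array once (assumption: the total size is known).
combined = np.empty(20, dtype=int)

# The two halves are views into `combined`, not separate arrays.
first_half = combined[:10]
second_half = combined[10:]

# Filling the views fills the big array in place.
first_half[:] = np.arange(10)
second_half[:] = np.arange(10, 20)

print(combined)
print(np.shares_memory(combined, first_half))   # True

# concatenate, in contrast, returns a fresh copy of the data.
joined = np.concatenate((first_half, second_half))
print(np.shares_memory(joined, first_half))     # False
```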

In [13]:
# Creating two 1D arrays separately
arr1 = np.arange(10)
arr2 = np.arange(10, 20)
arr3 = np.concatenate((arr1, arr2))
print("Arr1: \n{}".format(arr1), end="\n\n")
print("Arr2: \n{}".format(arr2), end="\n\n")
print("Concatenated Array: \n{}".format(arr3))

Arr1: 
[0 1 2 3 4 5 6 7 8 9]

Arr2: 
[10 11 12 13 14 15 16 17 18 19]

Concatenated Array: 
[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19]


To concatenate two multi-dimensional arrays, it is better to use `hstack()` and `vstack()` for stacking along the horizontal and vertical axis respectively.

In [14]:
arr1 = np.random.randint(1, 10, 8).reshape(2, 4)
arr2 = np.random.randint(90, 100, 8).reshape(2, 4)
# stacking horizontally
hs_arr = np.hstack((arr1, arr2))
# stacking vertically
vs_arr = np.vstack((arr1, arr2))
print("Arr1: \n{}".format(arr1), end="\n\n")
print("Arr2: \n{}".format(arr2), end="\n\n")
print("Horizontally Stacked Array: \n{}".format(hs_arr), end="\n\n")
print("Vertically Stacked Array: \n{}".format(vs_arr), end="\n\n")

Arr1: 
[[2 8 1 9]
 [6 5 5 2]]

Arr2: 
[[90 93 91 96]
 [97 90 96 99]]

Horizontally Stacked Array: 
[[ 2  8  1  9 90 93 91 96]
 [ 6  5  5  2 97 90 96 99]]

Vertically Stacked Array: 
[[ 2  8  1  9]
 [ 6  5  5  2]
 [90 93 91 96]
 [97 90 96 99]]



### 04 - 01.07 Splitting Arrays
Just like concatenating multiple arrays into one, Numpy's `split()`, `hsplit()` and `vsplit()` allow splitting one array into multiple smaller ones.

In [15]:
arr1 = np.arange(20)
np.split(arr1, (2, 8, 10, 14))

[array([0, 1]),
 array([2, 3, 4, 5, 6, 7]),
 array([8, 9]),
 array([10, 11, 12, 13]),
 array([14, 15, 16, 17, 18, 19])]

`np.split()` takes the array that we want to split as its first argument; as its second argument, it takes a list or tuple of the indices at which we want to split the array. Each split point adds one more subarray, i.e. `N` split points lead to `N + 1` subarrays.
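
Besides a list of split points, `np.split()` also accepts a single integer, meaning "this many equal parts". The sketch below shows that form, along with `np.array_split()`, a related function not covered above that tolerates uneven divisions.

```python
import numpy as np

arr = np.arange(12)

# An integer asks for that many equally sized pieces.
parts = np.split(arr, 3)
print(parts)   # three arrays of 4 elements each

# 12 elements cannot be divided into 5 equal pieces.
caught = None
try:
    np.split(arr, 5)
except ValueError as err:
    caught = err
    print("ValueError:", err)

# array_split allows pieces of unequal length.
uneven = np.array_split(arr, 5)
lengths = [len(p) for p in uneven]
print(lengths)   # [3, 3, 2, 2, 2]
```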

Similarly for multi-dimensional arrays, we can use `hsplit()` and `vsplit()`

In [16]:
arr2d = np.random.randint(0, 9, (3,3))
print("Original Array: \n{}".format(arr2d), end="\n\n")
# split along horizontal axis
arr1, arr2 = np.hsplit(arr2d, [2])
print("First Split: \n{}".format(arr1), end="\n\n")
print("Remaining Split: \n{}".format(arr2), end="\n\n")

Original Array: 
[[6 4 5]
 [6 6 7]
 [7 4 4]]

First Split: 
[[6 4]
 [6 6]
 [7 4]]

Remaining Split: 
[[5]
 [7]
 [4]]



## 04 - 02 Ufuncs
Python's default implementation does some operations slowly. This is in part due to the dynamic and interpreted nature of the language. This is the feature that allows `type`s to be flexible, but since the type has to be checked at every operation, sequences of operations cannot be compiled down to efficient machine code as in languages like C.
Let's take a look at a loop-based, Python-native way of computing the sine of every element in an array:



In [17]:
from __future__ import print_function
import numpy as np
def get_sin(arr):
    # Create an empty output array of same size as input
    output = np.empty_like(arr)
    for i in range(len(output)):
        output[i] = np.sin(arr[i])
    return output

In [18]:
input_arr = np.random.uniform(-np.pi, np.pi, 10000000)
%time get_sin(input_arr)

CPU times: user 13.7 s, sys: 67.2 ms, total: 13.8 s
Wall time: 13.8 s


array([ 0.47306933, -0.32729591, -0.40434032, ..., -0.08669627,
       -0.00790195,  0.23159298])

> - IPython adds some commands that further enhance its interactivity. These commands begin with `%` and are known as magic commands.
> - `%time` gives information about the time taken to execute a Python statement.
> - There are many built-in magic commands, and as always, since the magic commands start with `%`, you can simply type `%` in one of the code blocks and press `?` or `Shift + <TAB>` after it to get the docstring.
> - Remember that these magic commands are specific to IPython (and Jupyter notebooks). They cannot be used in plain Python code.

Even though the above implementation is correct and might look optimized to people who are familiar with languages like C and Java, the loop takes a significant amount of time (check the `total` CPU time) and is horribly inefficient for the reasons we mentioned above. 

This is where Numpy's `ufunc`s (universal functions) come to save the day. NumPy provides a convenient interface to this kind of statically typed, compiled routine. This is known as a `vectorized operation`. You accomplish it by simply performing an operation on the array, which is then applied to each element. 

> The vectorized approach is designed to push the `loop` part of the code into the compiled layer that underlies NumPy, leading to much faster execution.
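
Outside IPython, where `%time` is not available, the standard library's `timeit` module can be used for a similar comparison. The following is only a sketch on a small array: the helper `sin_loop` is a made-up name, and the timings you see will depend on your machine.

```python
import math
import timeit

import numpy as np

def sin_loop(values):
    """Compute the sine of each element with an explicit Python loop."""
    out = np.empty_like(values)
    for i in range(len(values)):
        out[i] = math.sin(values[i])
    return out

x = np.linspace(-np.pi, np.pi, 1000)

loop_result = sin_loop(x)
vector_result = np.sin(x)

# Both approaches give the same numbers ...
print(np.allclose(loop_result, vector_result))   # True

# ... but the vectorized call is typically much faster.
t_loop = timeit.timeit(lambda: sin_loop(x), number=10)
t_vec = timeit.timeit(lambda: np.sin(x), number=10)
print("loop:   {:.6f} s".format(t_loop))
print("ufunc:  {:.6f} s".format(t_vec))
```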

Let's take a look at Numpy ufunc based solution for same example

In [19]:
input_arr = np.random.uniform(-np.pi, np.pi, 10000000)
%time np.sin(input_arr)

CPU times: user 146 ms, sys: 103 µs, total: 146 ms
Wall time: 146 ms


array([-0.58934641, -0.98125783, -0.2890509 , ..., -0.84633972,
        0.70741223,  0.08185639])

That's much faster, right?

You can also use these ufuncs on multi-dimensional arrays. 

In [20]:
arr = np.random.randint(1, 100, (3, 4))
# take reciprocal
print("Original Array: \n{}".format(arr), end="\n\n")
print("Reciprocal: \n{}".format(1/arr), end="\n\n")

Original Array: 
[[ 4 49 71 87]
 [78 68 22 63]
 [85 11 38 97]]

Reciprocal: 
[[ 0.25        0.02040816  0.01408451  0.01149425]
 [ 0.01282051  0.01470588  0.04545455  0.01587302]
 [ 0.01176471  0.09090909  0.02631579  0.01030928]]



### 04 - 02.01 Array Mathematics

#### .. 02.01.01 Arithmetic Operations
Python's native operators can be used directly as convenient wrappers for Numpy's ufuncs, which apply the operation to all the elements of the array.

In [22]:
x = np.arange(-5, 5)
print("x      =", x)
print("x + 10  =", x + 10) # wrapper for np.add
print("x - 10  =", x - 10) # wrapper for np.subtract
print("x * 4  =", x * 4)  # wrapper for np.multiply
print("x / 4  =", x / 4)  # wrapper for np.divide
print("x % 4  =", x % 4)  # wrapper for np.mod
print("x // 4 =", x // 4) # wrapper for np.floor_divide
print("x ** 2 =", x ** 2) # wrapper for np.power
print("abs(x) =", abs(x)) # wrapper for np.abs

x      = [-5 -4 -3 -2 -1  0  1  2  3  4]
x + 10  = [ 5  6  7  8  9 10 11 12 13 14]
x - 10  = [-15 -14 -13 -12 -11 -10  -9  -8  -7  -6]
x * 4  = [-20 -16 -12  -8  -4   0   4   8  12  16]
x / 4  = [-1.25 -1.   -0.75 -0.5  -0.25  0.    0.25  0.5   0.75  1.  ]
x % 4  = [3 0 1 2 3 0 1 2 3 0]
x // 4 = [-2 -1 -1 -1 -1  0  0  0  0  1]
x ** 2 = [25 16  9  4  1  0  1  4  9 16]
abs(x) = [5 4 3 2 1 0 1 2 3 4]


> The above operations have been performed on the array of a particular datatype and so the result will have the same datatype as the array that is being operated on. However when you perform any operation on an array that results in a different datatype or on multiple arrays of different datatypes, the type of the resulting array will correspond to the more *precise* one. This is also known as `upcast`ing.

> In the above example, check the output of division (` / `). Can you find the type of that array?
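
The `dtype` attribute is all you need to inspect the result of an operation. Here is a small sketch of upcasting with other operations; the integer dtype itself depends on your platform, so the example only compares it with the input.

```python
import numpy as np

ints = np.arange(-5, 5)

# Integer array times an integer: the dtype is unchanged.
same = (ints * 4).dtype
print(ints.dtype, same)

# Integer array plus a float: the result is upcast to float64.
mixed = (ints + 0.5).dtype
print(mixed)    # float64

# float32 combined with float64: the more precise type wins.
f32 = np.ones(3, dtype=np.float32)
f64 = np.ones(3, dtype=np.float64)
combined = (f32 + f64).dtype
print(combined)  # float64
```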

When standard mathematical operations are used with numpy arrays, they are applied on an element-by-element basis and a new array is created and filled with the result. This means that the arrays should have the same shape (or shapes that can be broadcast together, which we cover in the module on Broadcasting) when any mathematical operation is performed on them.

In [23]:
arr1 = np.array([1., 2., 3., 4.])
arr2 = np.linspace(4, 16, num=4)
print("Array1: \n{}".format(arr1), end="\n\n")
print("Array2: \n{}".format(arr2), end="\n\n")
print("\n Array2 - Array1: \n {}".format(arr2-arr1), end="\n\n")

Array1: 
[ 1.  2.  3.  4.]

Array2: 
[  4.   8.  12.  16.]


 Array2 - Array1: 
 [  3.   6.   9.  12.]



However, if there was a size mismatch, then we would receive a `ValueError`

In [24]:
arr2 = np.linspace(4, 16, num=3)
print("\n Array2 - Array1: \n {}".format(arr2-arr1), end="\n\n")

ValueError: operands could not be broadcast together with shapes (3,) (4,) 

> You might wonder why we did not get a broadcast error when we added a single number to an array. We will look at this in the module on Broadcasting.

#### .. 02.01.02 Trigonometric Functions
Just like arithmetic operations, Numpy provides a bunch of trigonometric `ufuncs`. Let's take a look at some

In [25]:
input_arr = np.random.uniform(-1, 1, 5)
print("Input Array: \n{}".format(input_arr), end="\n\n")
print("sin: \n{}".format(np.sin(input_arr)), end="\n\n")
print("cos: \n{}".format(np.cos(input_arr)), end="\n\n")
print("tan: \n{}".format(np.tan(input_arr)), end="\n\n")
print("arcsin: \n{}".format(np.arcsin(input_arr)), end="\n\n")
print("arccos: \n{}".format(np.arccos(input_arr)), end="\n\n")
print("arctan: \n{}".format(np.arctan(input_arr)), end="\n\n")

Input Array: 
[ 0.52094311 -0.53378574 -0.01271532 -0.09193751 -0.84963032]

sin: 
[ 0.49769837 -0.50879607 -0.01271497 -0.09180805 -0.75103637]

cos: 
[ 0.86735018  0.86088708  0.99991916  0.99577672  0.66026084]

tan: 
[ 0.57381479 -0.59101372 -0.012716   -0.09219742 -1.13748435]

arcsin: 
[ 0.54795545 -0.56307115 -0.01271566 -0.09206752 -1.01528392]

arccos: 
[ 1.02284088  2.13386748  1.58351199  1.66286385  2.58608024]

arctan: 
[ 0.48026138 -0.49030948 -0.01271463 -0.09167978 -0.70427941]



#### .. 02.01.03 Logarithmic Functions
Numpy provides logarithmic ufuncs for different `base`s

In [26]:
input_arr = np.random.randint(1, 7, 5)
print("x        =", input_arr)
print("ln(x)    =", np.log(input_arr))
print("log2(x)  =", np.log2(input_arr))
print("log10(x) =", np.log10(input_arr))

x        = [3 3 2 3 3]
ln(x)    = [ 1.09861229  1.09861229  0.69314718  1.09861229  1.09861229]
log2(x)  = [ 1.5849625  1.5849625  1.         1.5849625  1.5849625]
log10(x) = [ 0.47712125  0.47712125  0.30103     0.47712125  0.47712125]
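
Numpy only ships dedicated ufuncs for bases e, 2 and 10. For any other base, one option is the change-of-base formula $\log_b(x) = \frac{\ln(x)}{\ln(b)}$, sketched here for base 3 with hand-picked powers of 3.

```python
import numpy as np

x = np.array([1, 3, 9, 27])

# log base 3 via the change-of-base formula.
log3 = np.log(x) / np.log(3)
print(log3)   # approximately 0, 1, 2, 3
```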


As the counterpart of logarithms, we also have exponential ufuncs

In [28]:
input_arr = np.random.randint(1, 7, 5)
print("x     =", input_arr)
print("e^x   =", np.exp(input_arr))
print("2^x   =", np.exp2(input_arr))
print("10^x   =", np.power(10, input_arr))

x     = [3 2 4 4 4]
e^x   = [ 20.08553692   7.3890561   54.59815003  54.59815003  54.59815003]
2^x   = [  8.   4.  16.  16.  16.]
10^x   = [ 1000   100 10000 10000 10000]


## 04 - 03 Broadcasting and more Computation
### 04 - 03.01 Broadcasting
Broadcasting is a set of rules for applying binary `ufuncs` (e.g., addition, subtraction, multiplication, etc.) to arrays of different sizes. It is an important feature for leveraging the power of Numpy. 

If you remember from the previous module, `ufunc` operations are performed element by element. Let's take a look at adding a scalar (we did this in the Arithmetic subsection of the previous module)

In [29]:
from __future__ import print_function
import numpy as np
arr1 = np.random.randint(1, 40, 5)
num  = 5
print("Arr1: \n{}".format(arr1), end="\n\n")
print("num : \n{}".format(num), end="\n\n")
print("Sum : \n{}".format(arr1+num), end="\n\n")

Arr1: 
[32  3 35  1 29]

num : 
5

Sum : 
[37  8 40  6 34]



Broadcasting allows these types of binary operations to be performed on arrays of different sizes just as we added a scalar (think of a scalar as a zero-dimensional array) to the array.

We can think of this as an operation that *stretches* or *duplicates* the value `5` into the array `[5, 5, 5, 5, 5]`, and adds it to the array. This duplication does not actually take place during Broadcasting but it is a useful logic to remember when you talk about broadcasting.

Just like adding scalar, we can perform broadcasting on multi-dimensional arrays as well.. however there are rules to be followed for `broadcasting` to work.

- Rule 1: If the two arrays differ in their number of dimensions, the shape of the one with fewer dimensions is *padded* with ones on its leading (left) side.
- Rule 2: If the shape of the two arrays does not match in any dimension, the array with shape equal to 1 in that dimension is stretched to match the other shape.
- Rule 3: If in any dimension the sizes disagree and neither is equal to 1, an error is raised.
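
To see the three rules at work, here is a sketch of a small helper that applies them to two shape tuples. `broadcast_shape` is a hypothetical function written for this lesson, not part of Numpy; the last lines cross-check it against what Numpy itself reports through `np.broadcast`.

```python
import numpy as np

def broadcast_shape(shape_a, shape_b):
    """Hypothetical helper: apply the three broadcasting rules to two shapes."""
    ndim = max(len(shape_a), len(shape_b))
    # Rule 1: pad the shorter shape with ones on its left side.
    a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)
    result = []
    for size_a, size_b in zip(a, b):
        if size_a == size_b or size_b == 1:
            result.append(size_a)
        elif size_a == 1:
            # Rule 2: a dimension of size 1 is stretched to match the other.
            result.append(size_b)
        else:
            # Rule 3: sizes disagree and neither is 1.
            raise ValueError("shapes {} and {} cannot be broadcast".format(
                shape_a, shape_b))
    return tuple(result)

print(broadcast_shape((3, 4), (4,)))     # (3, 4)
print(broadcast_shape((3, 1), (1, 4)))   # (3, 4)

# Cross-check with Numpy's own answer for the first case.
numpy_shape = np.broadcast(np.ones((3, 4)), np.arange(4)).shape
print(numpy_shape)                       # (3, 4)
```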

Lets add two arrays of different sizes

In [30]:
arr1 = np.ones((3, 4))
arr2 = np.arange(4)

Let's match these arrays against our set of rules.

Rule 1: The arrays differ in their number of dimensions, so the shape of arr2 is padded with a one on its left side.
- arr1 is of shape `m1 x n1 = 3 x 4`
- arr2 is of shape `(4,)`, which is padded to `m2 x n2 = 1 x 4`

Rule 2: **Stretch** `m2` or the first dimension of arr2 to match `m1` or the first dimension of arr1. So now,
- arr1 is of shape `m1 x n1 = 3 x 4`
- arr2 is of shape `m2 x n2 = 3 x 4`

Rule 3: Doesn't apply, since the shapes now agree in every dimension.

In [31]:
arr1 + arr2

array([[ 1.,  2.,  3.,  4.],
       [ 1.,  2.,  3.,  4.],
       [ 1.,  2.,  3.,  4.]])

Now lets look at an example where we add arr1 to the transpose of arr1 itself. Lets first print out the transpose and then we shall apply the rules as we did for previous example

In [32]:
print("Arr1: \n{}".format(arr1))
print("arr1.shape: {}".format(arr1.shape), end="\n\n")
print("Arr1 Transpose: \n{}".format(arr1.T))
print("arr1.T.shape: {}".format(arr1.T.shape))

Arr1: 
[[ 1.  1.  1.  1.]
 [ 1.  1.  1.  1.]
 [ 1.  1.  1.  1.]]
arr1.shape: (3, 4)

Arr1 Transpose: 
[[ 1.  1.  1.]
 [ 1.  1.  1.]
 [ 1.  1.  1.]
 [ 1.  1.  1.]]
arr1.T.shape: (4, 3)


Let's apply our rules:

Rule 1: Doesn't apply, since both arrays have two dimensions, so no padding is needed.
- arr1 is of shape   `m1 x n1 = 3 x 4`
- arr1.T is of shape `m2 x n2 = 4 x 3`

Rule 2: The shapes disagree in both dimensions (`m1 = 3` vs `m2 = 4`, and `n1 = 4` vs `n2 = 3`), but none of these sizes is `1`, so nothing can be stretched. The shapes stay as they are:
- arr1 is of shape   `m1 x n1 = 3 x 4`
- arr1.T is of shape `m2 x n2 = 4 x 3`

Rule 3: The sizes disagree in both dimensions and none of them is `1`. This will raise a `ValueError`

In [34]:
arr1 + arr1.T

ValueError: operands could not be broadcast together with shapes (3,4) (4,3) 

So what's important is that, after padding, the sizes in every dimension must either match or one of them must be `1`. Only a dimension of size `1` can be stretched to match the other array. That's how broadcasting works!

Take a look at the example from previous module where we got a ValueError when we tried broadcasting on two arrays of different shape:
```ipython
arr1 = np.array([1., 2., 3., 4.])
arr2 = np.linspace(4, 16, num=3)
arr1 + arr2
```
Can you solve it now?

### 04 - 03.02 Aggregation
When performing analysis on a dataset, the first thing you usually end up doing is computing its summary statistics. Things like the maximum, minimum, mean and variance are the first things you would look at (for the relevant columns). Numpy provides fast aggregation functions for this. Let's take a look at some

#### .. 03.02.01 sum
As the name suggests, this function will return the sum of all the values of an array

In [35]:
arr = np.random.randint(1, 700, 10000)
print("Sum of 1D: {}".format(np.sum(arr)))

Sum of 1D: 3512556


In [36]:
arr2d = np.random.uniform(1, 700, (3, 4))
print("Sum of 2D: {}".format(np.sum(arr2d)))

Sum of 2D: 3367.6994912525442


#### .. 03.02.02 max and min
This will find the maximum and minimum value in an array.

In [37]:
print("Max of 1D arr: {}".format(np.max(arr)))
print("Min of 1D arr: {}".format(np.min(arr)))
print("Max of 2D arr: {}".format(np.max(arr2d)))
print("Min of 2D arr: {}".format(np.min(arr2d)))

Max of 1D arr: 699
Min of 1D arr: 1
Max of 2D arr: 645.9831763574393
Min of 2D arr: 109.20059579276229


We can also get the minimum and maximum along a particular axis
> - axis 0 = down the rows, giving one result per column
> - axis 1 = across the columns, giving one result per row
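
The `axis` argument is easiest to follow on a tiny array whose sums can be checked by hand. The sketch below uses made-up values.

```python
import numpy as np

table = np.array([[1, 2, 3],
                  [4, 5, 6]])

# axis=0 collapses the rows: one result per column.
col_sums = np.sum(table, axis=0)
print(col_sums)   # 5, 7, 9

# axis=1 collapses the columns: one result per row.
row_sums = np.sum(table, axis=1)
print(row_sums)   # 6, 15

col_max = np.max(table, axis=0)
row_max = np.max(table, axis=1)
print(col_max)    # 4, 5, 6
print(row_max)    # 3, 6
```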

In [38]:
print("2D Array: \n{}".format(arr2d), end="\n\n")
print("Max of 2D arr along axis 0: {}".format(np.amax(arr2d, axis=0)))
print("Min of 2D arr along axis 0: {}".format(np.amin(arr2d, axis=0)))

2D Array: 
[[ 645.98317636  385.74125906  258.07934239  267.74809833]
 [ 347.05736495  113.66832238  109.20059579  250.37412055]
 [ 142.99869166  363.72522448  199.15259457  283.97070074]]

Max of 2D arr along axis 0: [ 645.98317636  385.74125906  258.07934239  283.97070074]
Min of 2D arr along axis 0: [ 142.99869166  113.66832238  109.20059579  250.37412055]


#### .. 03.02.03 std
Compute standard deviation along a particular axis

In [39]:
print("2D Array: \n{}".format(arr2d), end="\n\n")
print("Std along axis 0: \n{}".format(np.std(arr2d, axis=0)))
print("Std along axis 1: \n{}".format(np.std(arr2d, axis=1)))

2D Array: 
[[ 645.98317636  385.74125906  258.07934239  267.74809833]
 [ 347.05736495  113.66832238  109.20059579  250.37412055]
 [ 142.99869166  363.72522448  199.15259457  283.97070074]]

Std along axis 0: 
[ 206.55641541  123.3949589    61.21783045   13.71843097]
Std along axis 1: 
[ 156.43897571   99.69714687   83.80990997]
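
To see what `np.std` computes, the sketch below reproduces it by hand on a small made-up sample: the square root of the mean squared deviation from the mean. By default this is the population standard deviation; passing `ddof=1` gives the sample version, which divides by `N - 1` instead of `N`.

```python
import numpy as np

data = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

mean = data.mean()                                # 5.0
manual = np.sqrt(np.mean((data - mean) ** 2))     # sqrt(32 / 8) = 2.0
builtin = np.std(data)

print(manual, builtin)

# Sample standard deviation: divide by N - 1 instead of N.
sample = np.std(data, ddof=1)
print(sample)
```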


We will take a look at more of these in the module on matplotlib, so that as well as printing the output we will also be able to plot the results for a better understanding. If you are interested in more such functions, you can check the official documentation here: [`Numpy Statistics Routines`](https://docs.scipy.org/doc/numpy/reference/routines.statistics.html)