# NumPy

***

## Description: A brief tutorial where you will learn how to create, manipulate and perform mathematical operations with NumPy ndarrays

***

## Overview
- Understanding NumPy arrays
- Making NumPy arrays 
- Array manipulation in NumPy
- Indexing and slicing in NumPy
- Broadcasting
- Matrix operations with NumPy

***

## Pre-requisites
- Basics of Python
- Elementary knowledge of matrix operations

***

## Learning Objective
- Learn to leverage NumPy arrays for tasks of varying complexity
- Perform data analysis using NumPy

## Chapter 1: Arrays in NumPy

***

### Description: This chapter will introduce you to the NumPy library, its benefits and how to use it to create a NumPy array

### 1.1 What are arrays?

***

#### What is NumPy?

NumPy stands for "Numerical Python". It is a Python module that provides fast and efficient array operations on **homogeneous data**. It is the core library for scientific computing in Python, providing a high-performance multidimensional array object and tools for working with arrays.

NumPy is one of the most essential packages in your data science journey, because it equips you with an array data structure that offers several benefits over traditional Python data structures such as lists.


#### NumPy Arrays

The central feature of NumPy is its array class, called the **ndarray**. Arrays are very similar to Python lists, **except that every element of an array must be of the same type** (a list can hold elements of different types), typically a numeric type like `float` or `int`. An array is much like an n-dimensional matrix:

<img src='../images/31.png'>

Arrays make operations on large amounts of numeric data very fast and are generally much more efficient than lists. You can choose to create arrays of *n* dimensions. Part of the efficiency comes from memory layout: a Python list is an array of pointers to Python objects, which on a 32-bit build of CPython costs at least $4$ bytes per pointer plus $16$ bytes for even the smallest Python object ($4$ for the type pointer, $4$ for the reference count, $4$ for the value, with the memory allocator rounding up to $16$). A NumPy array, by contrast, stores uniform values directly: single-precision numbers take $4$ bytes each and double-precision ones $8$ bytes.
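Because every element must share one type, NumPy converts mixed input to a common type when it builds the array. A quick sketch, with values chosen purely for illustration:

```python
import numpy as np

# Integers and a float: everything is upcast to a floating-point dtype
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)        # float64

# A number and a string: everything becomes a Unicode string dtype
texts = np.array([1, "two"])
print(texts.dtype.kind)   # U

# A plain list, by contrast, keeps each element's own Python type
lst = [1, "two", 3.0]
print([type(x).__name__ for x in lst])   # ['int', 'str', 'float']
```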

#### Creating NumPy arrays

The syntax of creating a NumPy array is:

`numpy.array(object, dtype=None, copy=True, order='K', subok=False, ndmin=0)`

Here, the arguments are:

- `object`: an array, any object exposing the array interface, or any (nested) sequence such as a list or tuple
- `dtype`: desired data type of the array; optional, inferred from `object` if omitted
- `copy`: optional; by default (`True`) the object is copied
- `order`: memory layout: `'C'` (row major), `'F'` (column major), `'A'` (any) or `'K'` (keep, the default)
- `subok`: by default (`False`) the returned array is forced to be a base-class array; if `True`, subclasses are passed through
- `ndmin`: minimum number of dimensions of the resulting array

Let's see how you can create a simple array using NumPy, after first importing the package as `np`:

```python
import numpy as np
a = np.array([1,2,3,4])               # creates a 1-dimensional array
b = np.array([[1,2,3,4], [5,6,7,8]])    # creates a 2-dimensional array
print(a)
print('----')
print(b)
```
Its output will be

```python
[1 2 3 4]
----
[[1 2 3 4]
 [5 6 7 8]]

```
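The optional arguments listed above can be tried directly. This sketch exercises `dtype`, `ndmin` and the default `copy=True` behaviour; the arrays are illustrative only.

```python
import numpy as np

# dtype: force integers to be stored as 64-bit floats
f = np.array([1, 2, 3], dtype=np.float64)
print(f)            # [1. 2. 3.]

# ndmin: prepend dimensions of length 1 until there are at least 2
m = np.array([1, 2, 3], ndmin=2)
print(m.shape)      # (1, 3)

# copy=True (the default): the new array owns a separate copy of the data
src = np.array([1, 2, 3])
dup = np.array(src)
dup[0] = 99
print(src)          # [1 2 3]  (the original is unchanged)
```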

#### Advantages of using NumPy

- Free and open source
- Fast reading and writing of elements
- Operations usually run faster and use less memory than equivalent code built on traditional Python data structures such as lists
- Has a lot of built-in functions for linear algebra

### 1.2 Important array features

***

Now that you know how to create a NumPy array, let us look at its most essential features in detail. We will use the following two arrays to illustrate them:
~~~python
a = np.array([1,2,3,4])
b = np.array([[1,2,3,4], [5,6,7,8]])
~~~
The attributes of both the arrays `a` and `b` are discussed below:


- **Shape**: A tuple of the array dimensions, i.e. how many items lie along each dimension. It is available through the `.shape` attribute of the `ndarray` object.

![2](https://storage.googleapis.com/ga-commit-live-stag-uat-data/account/b92/11111111-1111-1111-1111-000000000000/b-405/b44ac493-9ec3-4794-8f9f-37ca39cd1772/file.png)

*Note that the 1-D array `a` has shape of $(4, )$ and not $(4, 1)$*


-  **Dimensions**: It gives the number of dimensions and can be found using the `.ndim` attribute of `ndarray` object. 

![3](https://storage.googleapis.com/ga-commit-live-stag-uat-data/account/b92/11111111-1111-1111-1111-000000000000/b-366/127483a2-efe2-42e0-874f-5e09924b8369/file.png)



- **Size**: The total number of items in the array, which is the product of the elements of the array's `.shape` attribute.


![4](https://storage.googleapis.com/ga-commit-live-stag-uat-data/account/b92/11111111-1111-1111-1111-000000000000/b314/c2a65e46-2be0-4b9f-8447-9aaa4e8a7910/file.png)
 

- **Datatype**: As the name suggests, `.dtype` tells you the type of the data in the array. Since a NumPy array holds homogeneous data only, you get a single `dtype`.


![5](https://storage.googleapis.com/ga-commit-live-stag-uat-data/account/b92/11111111-1111-1111-1111-000000000000/b626/ff1bdaa0-b24a-4e1a-bb7c-e49dad2a7712/file.png)


NumPy supports a much greater variety of numerical types than base Python does, such as `int8`, `int16`, `float16`, `float32`, `bool_` and `complex128`.

- **Itemsize**: The number of bytes occupied by each element of the array, available through the `.itemsize` attribute.

![6](https://storage.googleapis.com/ga-commit-live-stag-uat-data/account/b92/11111111-1111-1111-1111-000000000000/b-760/c8ac814a-0a8a-4fe5-b364-a298a506a00a/file.png)


With the help of these attributes you can get the essential information about a NumPy array as a whole, as well as about its elements.
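The attribute values shown in the images above can also be reproduced in code. This sketch prints them for `a` and `b`; note that the default integer dtype (and therefore `itemsize`) is platform-dependent, typically `int64` with an itemsize of $8$ on 64-bit Linux and macOS.

```python
import numpy as np

a = np.array([1, 2, 3, 4])
b = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])

for name, arr in (("a", a), ("b", b)):
    # shape, ndim, size, dtype and itemsize of each array
    print(name, arr.shape, arr.ndim, arr.size, arr.dtype, arr.itemsize)

# Converting to a smaller type changes the per-element byte count
small = b.astype(np.int8)
print(small.dtype, small.itemsize)   # int8 1
```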

## Check attributes 

In this task you will check the important attributes of a NumPy array.

### Instructions
- Make a NumPy array of the first $10$ natural numbers using the `np.arange(1, 11)` command (you will learn more about such commands later in the course)
- Check its number of dimensions and save it as `dim`
- Reshape the above array into a $(5, 2)$ array by calling `.reshape(5, 2)` on it. `.reshape()` does not change the original array; it returns a new array object, so assign the result to a new variable `reshaped`
- Check out the new dimension and save it as `new_dim`
- Print out `dim` and `new_dim`

```python
import numpy as np

# Code starts here

# initialize NumPy array
array = np.arange(1, 11)

# check dimensions
dim = array.ndim
print(dim)

# reshaped array
reshaped = array.reshape(5, 2)

# check dimensions of the reshaped array
new_dim = reshaped.ndim
print(new_dim)

# Code ends here
```

Output:

```
1
2
```


### Hints
- Initialize the NumPy array using `array = np.arange(1,11)`
- To check dimensions of the above array use `array.ndim`
- Now, to reshape it to a $(5, 2)$ shape, use `reshaped = array.reshape(5,2)`
- To check dimensions for `reshaped` use `reshaped.ndim`
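A detail worth knowing when reshaping: the new shape must contain exactly as many elements as the original, and one dimension may be given as `-1` to let NumPy infer it. A short sketch with our own example values:

```python
import numpy as np

array = np.arange(1, 11)          # 10 elements

# -1 asks NumPy to work out that dimension: 10 / 2 = 5 rows
auto = array.reshape(-1, 2)
print(auto.shape)                 # (5, 2)

# The original array keeps its shape; reshape returns a new array object
print(array.shape)                # (10,)

# A shape whose size differs from 10 is rejected
try:
    array.reshape(3, 3)
except ValueError as err:
    print("ValueError:", err)
```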

## Quiz

1. By which method/attribute can you check the shape of NumPy array? 
    
    a. shape
    
    b. ndim
    
**ANS**: a. shape


2. Mathematically, what does the size attribute of a NumPy ndarray equal?

    a. Product of the elements of shape attribute
    
    b. Length of the NumPy ndarray
    
**ANS**: a. Product of the elements of shape attribute


3. Are all the elements inside a NumPy ndarray of the same data type?

    a. TRUE
    
    b. FALSE
    
**ANS**: a. TRUE


4. Which attribute is used to get the number of bytes in each element of the array?

    a. itemsize
    
    b. size
    
**ANS**: a. itemsize

## Chapter 2: Array creation routines with NumPy

***

### Description: In this chapter you will learn about different ways to create NumPy arrays

### 2.1 From shape or value

***

You already know how to create a NumPy array using the `numpy.array()` command. Now, let's look at some other routines that create a new array from just a size or shape (and, optionally, a fill value and a dtype). We will be using `np` as an alias for `numpy`.

| Command | Description | Example |
| --- | --- | --- |
| `np.empty()` | Creates an uninitialized array of specified shape and dtype | ![7](https://storage.googleapis.com/ga-commit-live-stag-uat-data/account/b92/11111111-1111-1111-1111-000000000000/b-218/80ec8866-348b-446d-b91c-adb5ff5eba2c/file.png) |
| `np.zeros()` | Creates a new array of specified size, filled with zeros | ![8](https://storage.googleapis.com/ga-commit-live-stag-uat-data/account/b92/11111111-1111-1111-1111-000000000000/b-543/9a93ab5e-a87e-4f7c-a519-0de3884ab293/file.png) |
| `np.ones()` | Creates a new array of specified size and type, filled with ones | ![9](https://storage.googleapis.com/ga-commit-live-stag-uat-data/account/b92/11111111-1111-1111-1111-000000000000/b184/9bae5d65-59a6-44c3-9602-3f682f91fbaa/file.png) | 
| `np.full()` | Creates a new array of given shape and type, filled with  a constant value | ![10](https://storage.googleapis.com/ga-commit-live-stag-uat-data/account/b92/11111111-1111-1111-1111-000000000000/b860/24d442b5-fbb7-44af-90f3-81c952428d58/file.png) | 
| `np.eye()` | Creates a 2-D array with ones on the diagonal and zeros elsewhere | ![11](https://storage.googleapis.com/ga-commit-live-stag-uat-data/account/b92/11111111-1111-1111-1111-000000000000/b16/c5b276f8-5015-41dd-a634-fc682cd4b744/file.png) | 
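The routines in the table can be tried with a short script. This is a sketch with arbitrary shapes and fill values; the `np.empty()` result is left unprinted because its contents are whatever happened to be in memory.

```python
import numpy as np

e = np.empty((2, 3))                  # uninitialized: values are arbitrary
z = np.zeros((2, 3))                  # float64 zeros by default
o = np.ones((2, 2), dtype=int)        # integer ones
f = np.full((2, 2), 7)                # every element set to 7
i = np.eye(3)                         # 3x3 identity matrix

print(z)
print(o)
print(f)
print(i)
```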

## Create your own array

In this task you will practice creating new arrays of different types

### Instructions
- Create and print out a $4$x$4$ identity matrix with dtype `float32`

```python
# Code starts here

# identity 4x4 matrix
array = np.eye(4, dtype='float32')

# display
print(array)

# Code ends here
```

Output:

```
[[1. 0. 0. 0.]
 [0. 1. 0. 0.]
 [0. 0. 1. 0.]
 [0. 0. 0. 1.]]
```


### Hints
- Create $4x4$ identity matrix with 'float32' dtype as `array = np.eye(4, dtype='float32')`

### 2.2 From existing data

***

New NumPy arrays can also be created from already existing ones. Let us look at some ways of doing so.


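1. **np.asarray( )**: This command is very similar to the **np.array()** command; the main difference is that it does not make a copy when its input is already an ndarray of the requested dtype. Some examples are: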

- Converting a list to array

![12](https://storage.googleapis.com/ga-commit-live-stag-uat-data/account/b92/11111111-1111-1111-1111-000000000000/b-853/d4094c58-977f-4ea7-ac74-ea607d35fc58/file.png)


- Converting tuples into array

![13](https://storage.googleapis.com/ga-commit-live-stag-uat-data/account/b92/11111111-1111-1111-1111-000000000000/b-294/0a5d5f22-8304-4afe-8d3d-b4aca83e401a/file.png)


2. **np.fromiter( )**: This function creates a NumPy ndarray from an iterable. For example:

![14](https://storage.googleapis.com/ga-commit-live-stag-uat-data/account/b92/11111111-1111-1111-1111-000000000000/b372/efd3a28e-b9cb-4ba9-8ef8-5f8c0863f90a/file.png)



Here we are creating arrays from list `a`, tuple `b`, `range( )` and finally a string `d`. 
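To see the difference between `np.array()` and `np.asarray()` concretely, and to build an array from a generator with `np.fromiter()` (which needs a `dtype`), here is a small sketch with values of our own choosing:

```python
import numpy as np

base = np.array([1, 2, 3])

# asarray returns the input itself when no conversion is needed
same = np.asarray(base)
print(same is base)        # True

# array makes a copy by default
copied = np.array(base)
print(copied is base)      # False

# asarray also accepts lists and tuples
from_tuple = np.asarray((4, 5, 6))

# fromiter consumes any iterable, here a generator of squares
squares = np.fromiter((i * i for i in range(5)), dtype=int)
print(squares)             # [ 0  1  4  9 16]
```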

### 2.3 From numerical ranges

***

In vector mathematics, it is often necessary to generate a set of numbers within some predefined range. You can create them easily with the help of some NumPy functions. Let's look at a few of them.

1. **np.arange( )**: It returns an array containing evenly spaced values within a given range. 

    **Syntax** :`numpy.arange(start, stop, step, dtype)`.

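    Here, numbers are created in the half-open interval $[start, stop)$, i.e. `stop` itself is excluded (for integers with a step of $1$, the last value is $stop - 1$). `start` defaults to $0$ and `step` to $1$, and you can set the step size and data type using the `step` and `dtype` parameters.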


![15](https://storage.googleapis.com/ga-commit-live-stag-uat-data/account/b92/11111111-1111-1111-1111-000000000000/b831/46d42070-c3d9-4d59-b2b2-bf94f156646d/file.png)



In the above example, we first created an array from $1$ to $19$ with step size $1$ and dtype `int32`, while in the second one we created an array from $1$ to $19$ with step size $2$ and dtype `float32`.

2. **np.linspace( )**: It also returns an array within a range, but instead of a step size (as in `.arange()`) you specify how many values you want within that range.

    **Syntax**: `numpy.linspace(start, stop, num, endpoint, retstep, dtype)`

    Here, `start` and `stop` mean the same as in **.arange( )**; the difference lies in `num`, which gives the number of equally spaced values you want. By default `endpoint=True`, so `stop` is included and the values span $[start, stop]$; with `endpoint=False`, `stop` is excluded. Setting `retstep=True` also returns the spacing between consecutive values.


![16](https://storage.googleapis.com/ga-commit-live-stag-uat-data/account/b92/11111111-1111-1111-1111-000000000000/b-221/a6ec92e5-fde4-4c17-b67e-bf96bbb543fd/file.png)



In the above example we have created $100$ evenly spaced numbers in the range $[1, 20]$

3. **np.logspace( )**: This function returns an array containing numbers that are evenly spaced on a log scale. `start` and `stop` are the exponents of the base, which defaults to $10$.

    **Syntax**: `numpy.logspace(start, stop, num, endpoint, base, dtype)`

    Here, the range of values is $[{base}^{start}, {base}^{stop}]$ with `num` being the number of equally spaced values on **log** scale within the range.

![17](https://storage.googleapis.com/ga-commit-live-stag-uat-data/account/b92/11111111-1111-1111-1111-000000000000/b842/8e4d451a-7dec-45ad-ab3a-ae62bccd28f0/file.png)


In the example above, there are $100$ values in the range of $[10^0, 10^2]$. 
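The three range functions can be compared side by side. This sketch uses small, arbitrary arguments so the results are easy to check by hand; `np.allclose()` is used because floating-point results may differ in the last digits.

```python
import numpy as np

# arange: stop is excluded
ints = np.arange(1, 10, 3)
print(ints)                      # [1 4 7]

# linspace: stop is included by default; retstep also returns the spacing
values, step = np.linspace(0, 1, num=5, retstep=True)
print(values, step)              # 0, 0.25, 0.5, 0.75, 1.0 and a step of 0.25

# endpoint=False drops stop, so the spacing becomes (1 - 0) / 5 = 0.2
open_ended = np.linspace(0, 1, num=5, endpoint=False)

# logspace(a, b, num) equals base ** linspace(a, b, num)
powers = np.logspace(0, 3, num=4)          # 1, 10, 100, 1000
base2 = np.logspace(0, 4, num=5, base=2)   # 1, 2, 4, 8, 16
print(np.allclose(powers, 10 ** np.linspace(0, 3, num=4)))   # True
```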

**There are many more methods to create NumPy arrays; we have listed a few that you will use frequently. If you're motivated, go through the official [tutorial](https://docs.scipy.org/doc/numpy/user/quickstart.html)**


## Store marks and IDs as arrays

In this task you are going to store the marks and IDs of three students as NumPy arrays.

### Instructions
- The marks are $[20, 30, 40]$ and IDs are $[0,2,4]$
- Store array for marks in a variable `marks` and the one for IDs in another variable `ids` 
- Display both of them

```python
# import packages
import numpy as np

# Code starts here

# marks array
marks = np.array([20, 30, 40])
# ID array
ids = np.linspace(0,4,3)

# display marks and id array
print(marks)
print(ids)

# Code ends here
```

Output:

```
[20 30 40]
[0. 2. 4.]
```


### Hints
- Store the array for marks as `marks = np.array([20, 30, 40])`
- Store the array for IDs as `ids = np.linspace(0,4,3)`

## Quiz

***

1. What is the output of the following code snippet?
```python
import numpy as np
a = np.arange(9).reshape(3,3)
print(a[1,1])
```

    a. 2
    
    b. 3
    
    c. 4
    
    d. 5
    
**ANS:** c. 4

**Explanation:** The array holds the elements $0$ to $8$ in a $3 \times 3$ layout, and the element in the second row and second column (index `[1, 1]`) is $4$


2. What does the following code snippet display?

```python
import numpy as np
print(np.eye(5))
```

    a. 5x5 Identity matrix
    
    b. 2x2 Identity matrix
    
    c. 3x3 Identity matrix
    
    d. 5x5 matrix consisting of 1s
    
**ANS**: a. 5x5 Identity matrix


3. Can you iterate over a NumPy ndarray using a for loop in the same manner as you do with Python lists?

    a. YES
    
    b. NO
    
**ANS**: a. YES


4. What does np.arange(5) do?

    a. Creates an array from 0-4
    
    b. Creates an array from 1-5
    
**ANS**: a. Creates an array from 0-4

## Chapter 3: Power of NumPy

***

### Description: In this chapter you will learn about the three concepts that give NumPy its power: indexing, vectorization and broadcasting

### 3.1 Indexing and Slicing

***

You now know how to create different types of NumPy arrays and check their features. But how about accessing a particular value or taking a chunk of values from the array itself? In this topic we are going to discuss exactly that. **Like Python lists, index starts at $0$ for arrays as well**. 


### Indexing

Array indexing and slicing work just like Python list indexing and slicing, following the same `array[start:stop:step]` pattern. For multidimensional arrays you give one index or slice per dimension, separated by commas. Let us look at an example to observe this behaviour.


![18](https://storage.googleapis.com/ga-commit-live-stag-uat-data/account/b92/11111111-1111-1111-1111-000000000000/b306/d1929d56-d636-41ee-acde-3206bb1667b2/file.png)
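Slicing an array follows the `start:stop:step` pattern along each dimension. The sketch below (our own example arrays) also shows one important difference from lists: a basic slice of an array is a *view* that shares memory with the original, so writing to it changes the original.

```python
import numpy as np

a = np.arange(10)                 # [0 1 2 ... 9]
print(a[2:7:2])                   # [2 4 6]

m = np.arange(12).reshape(3, 4)
print(m[1])                       # second row: [4 5 6 7]
print(m[:2, 1:3])                 # first two rows, columns 1 and 2

# A slice is a view: modifying it modifies `a` as well
view = a[:3]
view[0] = 100
print(a[0])                       # 100

# Use .copy() when an independent array is needed
independent = a[:3].copy()
independent[0] = -1
print(a[0])                       # still 100
```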


**Integer array indexing**: Integer array indexing allows you to construct arbitrary arrays using the data from another array. Let us understand it from the example in the image below:


![19](https://storage.googleapis.com/ga-commit-live-stag-uat-data/account/b92/11111111-1111-1111-1111-000000000000/b-74/f309aee7-b500-4125-a915-15f058464a2c/file.png)


*Explanation*: The `print` statements in line numbers $4$ and $7$ yield the same result, as do those in lines $10$ and $13$. In the first case, `a[[0, 1, 2], [0, 1, 0]]` selects the values at first row-first column, second row-second column and third row-first column, which is the same as `[a[0, 0], a[1, 1], a[2, 0]]`. Similarly, you should be able to deduce the logic behind the second case.
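The same idea in code, using a small $3 \times 2$ array of our own (not necessarily the one in the image): the index lists pair up element by element, so `[0, 1, 2]` and `[0, 1, 0]` select the positions `(0, 0)`, `(1, 1)` and `(2, 0)`.

```python
import numpy as np

a = np.array([[1, 2], [3, 4], [5, 6]])

# Integer array indexing: row indices and column indices paired up
picked = a[[0, 1, 2], [0, 1, 0]]
print(picked)                               # [1 4 5]

# Equivalent to building the array from individual elements
manual = np.array([a[0, 0], a[1, 1], a[2, 0]])
print(np.array_equal(picked, manual))       # True

# The same index may be used more than once
print(a[[0, 0], [1, 1]])                    # [2 2]
```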

**Boolean indexing**: This type is generally used with comparisons. For example, how would you find out how many numbers in an array are greater than $50$ (say)? This can be done with a simple comparison operator ($>=, >, ==, <, <=$).

A Boolean index array has the same shape as the array to be filtered and contains only `True` and `False` values. You can keep the elements you want using the concept of **masking**. For example, for some array `a` and Boolean condition `condition = a > 2`, `a[condition]` returns an array containing only the numbers in `a` that are greater than $2$.

Let us look at an example below:
![20](https://storage.googleapis.com/ga-commit-live-stag-uat-data/account/b92/11111111-1111-1111-1111-000000000000/b-581/3dd8b315-5a70-407f-a55b-6ee28ba26fd1/file.png)
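A sketch of masking with made-up values, including how to count matches and how to combine conditions. Conditions are combined with `&` (and) or `|` (or), and each comparison needs its own parentheses because these operators bind more tightly than `>` and `<`.

```python
import numpy as np

a = np.array([10, 55, 72, 3, 50])

mask = a > 50
print(mask)              # [False  True  True False False]
print(a[mask])           # [55 72]

# True counts as 1, so summing a mask counts the matching elements
print(mask.sum())        # 2

# Combine conditions element-wise with & and parentheses
print(a[(a >= 10) & (a <= 55)])   # [10 55 50]
```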

## Filter imaginary numbers 

In this task you will be filtering out complex elements from an array.

### Instructions
- Create a $(4,)$ array with values $3$, $4.5$, $4 + 5j$ and $0$. Save it to a variable `array`
- Create a Boolean mask `real` with `np.isreal(array)`. It returns an array of the same shape as `array`, containing `True` where an element is real (its imaginary part is zero) and `False` otherwise
- Apply this mask to `array` using Boolean indexing (explained in the topic), i.e. `array[real]`, and save the result as `real_array`
- Similarly, create a Boolean mask `imag` with `np.iscomplex(array)`, which is `True` where an element has a non-zero imaginary part. Use it to create an array `imag_array` containing only the complex numbers, i.e. `array[imag]`

```python
# Code starts here

# initialize array
array = np.array([3, 4.5, 4+5j, 0])

# boolean filter
real = np.isreal(array)
real_array = array[real]
print(real_array)

# boolean filter
imag = np.iscomplex(array)
imag_array = array[imag]
print(imag_array)

# Code ends here
```

Output:

```
[3. +0.j 4.5+0.j 0. +0.j]
[4.+5.j]
```


### Hints
- To create a $(4,)$ array use `array = np.array([3, 4.5, 4+5j, 0])`
- The Boolean condition to filter out imaginary numbers is given by `real = np.isreal(array)`
- Array containing only real numbers is given by `real_array = array[real]`
- The Boolean condition to filter out real numbers is given by `imag = np.iscomplex(array)`
- Array containing only complex numbers is given by `imag_array = array[imag]`
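Notice in the task output that `real_array` still has a complex dtype (`3.+0.j` and so on): filtering does not change the dtype. If you need plain floats, here is a sketch using the `.real` attribute:

```python
import numpy as np

array = np.array([3, 4.5, 4 + 5j, 0])

real_array = array[np.isreal(array)]
print(real_array.dtype)        # complex128

# .real takes the real part of each element, giving a float array
as_float = real_array.real
print(as_float.dtype)          # float64

# For this array, np.isreal and np.iscomplex are element-wise complements
print(np.isreal(array))        # [ True  True False  True]
```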


### 3.2 Vectorization

***

### What is vectorization?

Vectorization is the ability of NumPy by which we can perform operations on entire arrays rather than on a single element. When looping over an array or any data structure in Python, there’s a lot of overhead involved. Vectorized operations in NumPy delegate the looping internally to highly optimized C and Fortran functions, making for cleaner and faster Python code.
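To make the idea concrete, the sketch below computes the same element-wise result twice: once with an explicit Python loop and once as a single vectorized expression. The data are arbitrary.

```python
import numpy as np

values = np.arange(1, 6)

# Explicit Python loop: one interpreted step per element
looped = np.empty_like(values)
for idx in range(len(values)):
    looped[idx] = values[idx] ** 2 + 1

# Vectorized: the loop runs inside NumPy's compiled code
vectorized = values ** 2 + 1

print(vectorized)                            # [ 2  5 10 17 26]
print(np.array_equal(looped, vectorized))    # True
```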

### Examples

You have already come across such an example in Boolean indexing. 

~~~python
import numpy as np
a = np.array([1,2,3,4,5,6,7])
print(a[a > 2])
~~~

The above code block prints `[3 4 5 6 7]`: the comparison `a > 2` is evaluated for every element at once, producing a Boolean mask that selects the matching elements.

Now let us look at how you can do some elementary vectorized operations like addition, subtraction, multiplication etc. The images below depict the operations and their corresponding output.

1. **Addition**: Two ways to go about it, using either `+` or `np.add()`

![21](https://storage.googleapis.com/ga-commit-live-stag-uat-data/account/b92/11111111-1111-1111-1111-000000000000/b-497/c8d6e3ff-c789-4787-817c-936580071960/file.png)


2. **Subtraction**: Two ways, `-` or `np.subtract()`

![22](https://storage.googleapis.com/ga-commit-live-stag-uat-data/account/b92/11111111-1111-1111-1111-000000000000/b-901/8dff8b55-8b13-436e-a541-7a01f657ccf5/file.png)


3. **Multiplication**: Two ways, `*` or `np.multiply()`

![23](https://storage.googleapis.com/ga-commit-live-stag-uat-data/account/b92/11111111-1111-1111-1111-000000000000/b424/7f3d60e4-3f0d-4e04-a5ad-d8916beb4a06/file.png)


4. **Division**: Two ways, `/` or `np.divide()`

![24](https://storage.googleapis.com/ga-commit-live-stag-uat-data/account/b92/11111111-1111-1111-1111-000000000000/b763/d459e173-78d4-4820-953e-d147248c016d/file.png)


5. **Square root transform**: Use `np.sqrt()`

![25](https://storage.googleapis.com/ga-commit-live-stag-uat-data/account/b92/11111111-1111-1111-1111-000000000000/b-607/2b9805c2-61fa-4fad-919f-e13c3ca7bc70/file.png)


6. **Log transform**: Use `np.log()`

![26](https://storage.googleapis.com/ga-commit-live-stag-uat-data/account/b92/11111111-1111-1111-1111-000000000000/b-290/58ec3527-8852-45d1-8c3e-57429b7860fd/file.png)
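Here are the six operations above in one runnable sketch, on two small arrays of our own choosing:

```python
import numpy as np

a = np.array([1, 4, 9])
b = np.array([1, 2, 3])

print(a + b, np.add(a, b))            # [ 2  6 12] [ 2  6 12]
print(a - b, np.subtract(a, b))       # [0 2 6] [0 2 6]
print(a * b, np.multiply(a, b))       # [ 1  8 27] [ 1  8 27]
print(a / b, np.divide(a, b))         # [1. 2. 3.] [1. 2. 3.]
print(np.sqrt(a))                     # [1. 2. 3.]
print(np.log(a))                      # natural logarithm of each element
```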



### Aggregate operations

Aggregate operations summarize an entire array, or one axis of it. Some commonly used aggregate operations are listed below:

| Command | Description |
| --- | --- |
| `a.sum()` | Sum of all elements |
| `a.min()` | Minimum value of the whole array |
| `a.max(axis=0)` | Maximum along axis 0, i.e. of each column |
| `a.cumsum(axis=1)` | Cumulative sum along axis 1, i.e. within each row |
| `a.mean()` | Mean |
| `np.median(a)` | Median |
| `np.corrcoef(a)` | Correlation coefficient matrix |
| `np.std(a)` | Standard deviation |
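A sketch applying the commands from the table to a small $2 \times 3$ array (values chosen so the results can be checked by hand). `np.corrcoef()` treats each row as one variable, so for two rows it returns a $2 \times 2$ matrix.

```python
import numpy as np

a = np.array([[1, 5, 3],
              [4, 2, 6]])

print(a.sum())            # 21
print(a.min())            # 1
print(a.max(axis=0))      # column maxima: [4 5 6]
print(a.cumsum(axis=1))   # running sums within each row: [[1 6 9] [4 6 12]]
print(a.mean())           # 3.5
print(np.median(a))       # 3.5
print(np.std(a))          # population standard deviation, about 1.708
print(np.corrcoef(a))     # off-diagonal entries are -0.5
```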


### Array comparison

You already saw how to compare arrays element by element. With NumPy you can also compare entire arrays using `np.array_equal()`, which returns `True` only if both arrays have the same shape and the same elements. It is illustrated with examples below:

![27](https://storage.googleapis.com/ga-commit-live-stag-uat-data/account/b92/11111111-1111-1111-1111-000000000000/b258/8a5175f1-0958-49e5-ab58-822c5acb000a/file.png)

![28](https://storage.googleapis.com/ga-commit-live-stag-uat-data/account/b92/11111111-1111-1111-1111-000000000000/b-964/bf46c8ca-694a-4e6c-8dab-552326056ac0/file.png)
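A sketch contrasting the element-wise `==` with whole-array `np.array_equal()`, using arrays of our own:

```python
import numpy as np

x = np.array([1, 2, 3])
y = np.array([1, 2, 3])
z = np.array([1, 2, 4])

print(x == z)                          # element-wise: [ True  True False]
print(np.array_equal(x, y))            # True: same shape, same elements
print(np.array_equal(x, z))            # False: one element differs
print(np.array_equal(x, [[1, 2, 3]]))  # False: shapes (3,) and (1, 3) differ
```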


### Understanding Axes notation

In NumPy, an axis refers to a single dimension of a multidimensional array. Passing `axis` makes a computation run along that dimension, whereas omitting `axis` computes over the entire array.

![29](https://storage.googleapis.com/ga-commit-live-stag-uat-data/account/b92/11111111-1111-1111-1111-000000000000/b184/41947945-8f0c-4073-9796-62bd7439f74c/file.png)


In the image above you can calculate the sum over the rows, the columns and the entire array just by changing the `axis` parameter. Try it on arrays with 3 or more dimensions!
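A sketch of the `axis` parameter on our own example arrays. `axis=0` collapses the rows (one result per column), `axis=1` collapses the columns (one result per row), and for higher-dimensional arrays the chosen axis disappears from the result's shape.

```python
import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6]])

print(m.sum(axis=0))     # [5 7 9]
print(m.sum(axis=1))     # [ 6 15]
print(m.sum())           # 21

t = np.ones((2, 3, 4))
print(t.sum(axis=1).shape)   # (2, 4)
print(t.sum(axis=2).shape)   # (2, 3)
```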

## BUY or SELL?

In this task you will combine vectorization and aggregate methods to solve a BUY-SELL problem. You have the prices at 3 intervals of a day for 2 consecutive days, and you can buy and sell only once. You must buy first and then sell, aiming for the maximum profit.

### Instructions 
- Initialize an array `prices = [[40, 35, 20], [21, 48, 70]]`. Here, $40$ is the market price during the first interval of the first day, $35$ during the second interval, and so on. Similarly, $21$ is the price during the first interval of the second day and $48$ during the second interval
- Flatten the array with the `.flatten()` method. This method will convert your 2-D array into a 1-D array
- Find the minimum over the flattened array using `np.min()`, this will be your buying price
- Also find the index of the buying price by first converting the array into a list using `list(array)` and then using the `.index(buying_price)` method to pick the index of the buying price
- Create a new subset starting from the index (created in the previous step) till the end of the array and find the maximum value using `np.max()` in it. This will be your selling price
- Find the difference between buying and selling price and print out its output

```python
# initialize array for prices
price = [[40,35,20],[21,48,70]]
prices = np.array(price)

# Code starts here

# flatten the array
prices = prices.flatten()

# minimum price
buying_price = np.min(prices)

# index of buying price
index = list(prices).index(buying_price)

# create subset
new_prices = prices[index:]

# selling price
selling_price = np.max(new_prices)

# profit
profit = selling_price - buying_price

# display
print(profit)

# Code ends here
```

Output:

```
50
```


### Hints
- Initialize array `prices=np.array([[40,35,20],[21,48,70]])` containing the prices
- To flatten `prices` use `prices=prices.flatten()`
- To find the minimum buying price, use `buying_price=np.min(prices)`
- To calculate the index of `buying_price` in `prices` use `index = list(prices).index(buying_price)`
- To create the subset of prices after `buying_price` use `new_prices = prices[index:]`
- Calculate the selling price as `selling_price = np.max(new_prices)`
- Calculate the profit as `profit = selling_price - buying_price`
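As an aside, the same steps can be written more compactly. In this sketch `np.argmin()` replaces the `list(...).index(...)` round trip (both return the first position of the minimum). Buying at the overall minimum is the rule this task uses, but it does not always give the best possible profit, so the sketch also checks every possible buying point with `np.minimum.accumulate()`, which gives the lowest price seen up to each position. The second price series is a made-up counterexample.

```python
import numpy as np

prices = np.array([[40, 35, 20], [21, 48, 70]]).flatten()

# Task rule: buy at the lowest price, sell at the highest later price
buy_idx = np.argmin(prices)
profit = prices[buy_idx:].max() - prices[buy_idx]
print(profit)                 # 50

# Best profit over all buy points: price minus the cheapest price so far
cheapest_so_far = np.minimum.accumulate(prices)
best_profit = (prices - cheapest_so_far).max()
print(best_profit)            # 50 for these prices

# A case where the two rules disagree
other = np.array([2, 10, 1, 3])
j = np.argmin(other)
other_task = other[j:].max() - other[j]
other_best = (other - np.minimum.accumulate(other)).max()
print(other_task, other_best)   # 2 8
```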

### 3.3 Broadcasting

***

### What is broadcasting?

Have you wondered how the operation `np.array([1, 2, 3]) + 4` was carried out successfully? It is due to the broadcasting power of NumPy arrays. Let's discuss this property in detail.

In NumPy, arrays do not need to have the same shape to be used together in an operation, as long as the following conditions are satisfied:

- If the arrays do not have the same rank, prepend the shape of the lower rank array with 1s until both shapes have the same length.
- The two arrays are said to be compatible in a dimension if they have the same size in the dimension, or if one of the arrays has size $1$ in that dimension.
- The arrays can be broadcast together if they are compatible in all dimensions.
- After broadcasting, each array behaves as if it had shape equal to the element-wise maximum of shapes of the two input arrays.
- In any dimension where one array had size 1 and the other array had size greater than 1, the first array behaves as if it were copied along that dimension.



### Visual intuition of broadcasting

![30](https://storage.googleapis.com/ga-commit-live-stag-uat-data/account/b92/11111111-1111-1111-1111-000000000000/b833/fc772cca-8c7d-4a38-94d6-c954e1a2db88/file.png)
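The rules above can be checked in a few lines. This sketch (arrays of our own) shows a scalar, a row and a column being broadcast against a $2 \times 3$ array, and an incompatible shape raising an error.

```python
import numpy as np

print(np.array([1, 2, 3]) + 4)        # scalar is stretched: [5 6 7]

m = np.arange(6).reshape(2, 3)        # [[0 1 2] [3 4 5]]

row = np.array([10, 20, 30])          # shape (3,) is treated as (1, 3)
print(m + row)                        # [[10 21 32] [13 24 35]]

col = np.array([[100], [200]])        # shape (2, 1)
print(m + col)                        # [[100 101 102] [203 204 205]]

# (2, 3) and (2,): trailing sizes 3 and 2 differ and neither is 1
try:
    m + np.array([1, 2])
except ValueError as err:
    print("cannot broadcast:", err)
```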

## Normalize a 5x5 random matrix

In this task you will min-max normalize a matrix, i.e. subtract the minimum value and divide by the range.

### Instructions
- Create a random 5x5 matrix with the help of `np.random.random((5,5))` and save it as `Z`
- Calculate the maximum and minimum values of the 5x5 array with the `.max()` and `.min()` methods. Save them as `Zmax` and `Zmin` respectively
- Now, using the power of broadcasting, subtract the minimum value from each element and divide by the range (maximum - minimum) to normalize. Save the normalized array as `Z_norm`
- Print the normalized array

```python
np.random.seed(21)

# Code starts here

# create random 5x5 array
Z = np.random.random((5,5))

# minimum and maximum values
Zmax, Zmin = Z.max(), Z.min()

# normalize
Z_norm = (Z - Zmin)/(Zmax - Zmin)

# display
print(Z_norm)

# Code ends here
```

Output:

```
[[0.02856942 0.28190767 0.73703555 0.         0.19423813]
 [0.03072817 0.29577917 0.67690496 0.3019365  0.59225784]
 [0.05053881 0.89136471 0.1176393  0.16494209 0.49987233]
 [0.88746023 0.77705951 1.         0.77743756 0.38217481]
 [0.40796162 0.72901978 0.26247412 0.87734633 0.93959001]]
```


### Hints
- Create $5$x$5$ random matrix as `Z = np.random.random((5,5))`
- Calculate the maximum and minimum values as `Zmax, Zmin = Z.max(), Z.min()`
- To normalize, first subtract the minimum using `Z - Zmin`, then divide by the range (maximum - minimum) using `Zmax - Zmin`, and save the result as `Z_norm`
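Broadcasting also combines naturally with the `axis` argument. As a variation on this task (not part of its instructions), this sketch normalizes each column separately: `Z.min(axis=0)` has shape `(5,)`, which broadcasts across the rows of the `(5, 5)` matrix. It uses NumPy's newer `np.random.default_rng()` generator, so its numbers differ from the task output.

```python
import numpy as np

rng = np.random.default_rng(21)
Z = rng.random((5, 5))

# Per-column minimum and maximum, each of shape (5,)
col_min = Z.min(axis=0)
col_max = Z.max(axis=0)

# (5, 5) - (5,) broadcasts the column statistics across every row
Z_cols = (Z - col_min) / (col_max - col_min)

# Each column now spans exactly [0, 1]
print(Z_cols.min(axis=0))
print(Z_cols.max(axis=0))
```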

## Quiz

***

1. Which function returns its argument with a modified shape (array) and which one modifies the array itself?

    a. reshape,resize
    
    b. resize,reshape

    c. reshape2,resize

    d. all of the mentioned
    
**ANS:** a. reshape, resize

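**Explanation:** `reshape` returns a new array with the requested shape and leaves the original unchanged, while the method `ndarray.resize()` changes the array in place. (Note that the function `np.resize()`, unlike the method, returns a new array.) Go through them: [NumPy resize](https://docs.scipy.org/doc/numpy-1.14.0/reference/generated/numpy.resize.html) and [NumPy reshape](https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.reshape.html)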


2. If you have a NumPy array, in a variable `bats`, how would you access the data item at the 5th column, 2nd row?

    a. bats[2,5]

    b. bats(1,4)

    c. bats[1,4]

    d. bats[4,1]
    
**ANS:** c. bats[1,4] as indexing starts from $0$


3. The property by which NumPy can perform element-wise operation is called:

    a. Vectorization
    
    b. Broadcasting
    
**ANS**: a. Vectorization


4. What gets printed?

```python
import numpy as np

ary = np.array([1,2,3,5,8])

ary = ary + 1

print (ary[1])
```
    a. 1
    
    b. 2
    
    c. 3
    
    d. 5
    
**ANS**: c. 3
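Quiz question 1 in this section contrasts `reshape` and `resize`. Here is a sketch of the difference on arrays of our own: `reshape` returns a new array object, the function `np.resize()` returns a new array (repeating the data if it needs more elements), and the method `ndarray.resize()` changes the array in place and returns `None`.

```python
import numpy as np

a = np.arange(6)

b = a.reshape(2, 3)           # new array object; a is untouched
print(a.shape, b.shape)       # (6,) (2, 3)

c = np.resize(a, (2, 4))      # function form: new array, data repeated to fill
print(c)                      # [[0 1 2 3] [4 5 0 1]]

d = np.arange(6)
# Method form: modifies d in place and returns None.
# refcheck=False skips NumPy's reference-count safety check, which some
# interactive environments trip over; it is harmless here because d's size
# does not change.
result = d.resize((3, 2), refcheck=False)
print(result, d.shape)        # None (3, 2)
```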

### 3.4 Why NumPy arrays over traditional Python data structures?

***

**Similarities between NumPy arrays and lists**

NumPy arrays and Python lists are similar in many respects; for example, NumPy arrays with more than one dimension can be thought of as nested lists. Other common similarities are:
- Both are used for storing data
- Both are mutable
- Both can be indexed and iterated through
- Both can be sliced


**Advantages of using NumPy**

NumPy data structures perform better in terms of:

- **`Size`**: NumPy data structures take up less space
- **`Performance`**: Vectorized operations on arrays are much faster than equivalent loops over lists
- **`Functionality`**: NumPy has optimized functions, such as linear algebra operations, built in.


**Memory power of NumPy**

- **`For Python lists`**: Let us assume a case where we add only integers to the list. For every new element, the list needs another **eight bytes** for the reference to the new object, and the new integer object itself consumes $28$ bytes (on a 64-bit CPython). The size of a list `lst` holding $n$ such integers is therefore roughly $64 + 8n + 28n$ bytes, where $64$ is the overhead of the list object itself (the exact overhead varies between Python versions).

<img src='../images/list_mem.png'>

- **`For NumPy arrays`**: NumPy takes up less space. An integer (`int64`) array of length **n** needs roughly $96 + 8n$ bytes: a fixed header plus $8$ bytes per element (the header size varies slightly between NumPy versions). So the more numbers you need to store, the bigger the advantage of NumPy arrays.

<img src='../images/numpy_mem.png'>
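You can estimate both figures yourself. This sketch, assuming a 64-bit CPython and NumPy (exact byte counts vary by version), adds up `sys.getsizeof()` for the list and its elements and compares it with the array's `.nbytes`, which counts only the data buffer. CPython caches and shares small integers, so the list total is an upper-bound estimate.

```python
import sys
import numpy as np

n = 1000
lst = list(range(n))
arr = np.arange(n, dtype=np.int64)

# List: the list object (pointer array) plus every integer object it points to
list_bytes = sys.getsizeof(lst) + sum(sys.getsizeof(x) for x in lst)

# Array: 8 bytes per int64 element in a single contiguous buffer
array_bytes = arr.nbytes

print(list_bytes, array_bytes)        # the list needs several times more memory
print(sys.getsizeof(arr))             # buffer plus the fixed ndarray header
```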


**Speed of NumPy vs Python lists**

The code snippet below compares the speed of element-wise addition of two Python lists with element-wise addition of two NumPy arrays.

```python
# import packages
import time
import numpy as np

# initialize variable
num = 10000

# initialize lists
l1, l2 = [i for i in range(num)], [i + 2 for i in range(num)]

# initialize arrays
a1, a2 = np.array(l1), np.array(l2)

# element-wise addition for both lists, timed with a high-resolution clock
start_list = time.perf_counter()
sum_lists = [i + j for i, j in zip(l1, l2)]
stop_list = time.perf_counter()

# display time
print(stop_list - start_list)

# element-wise addition for both arrays
start_array = time.perf_counter()
sum_arrays = a1 + a2
stop_array = time.perf_counter()

# display time
print(stop_array - start_array)
```

When the above code was run, it reported about `0.0021` seconds for the list operation against about `0.00029` seconds for the NumPy arrays (exact timings vary from machine to machine). So NumPy arrays are better suited for this kind of operation.


**A Parting Thought: Don’t Over-Optimize**

When you are working with large data, it's important to optimize code with the help of NumPy. However, there is a subset of cases where avoiding a native Python for-loop isn’t possible. And always remember: **"Premature optimization is the root of all evil."** Programmers may incorrectly predict where in their code a bottleneck will appear, spending hours trying to fully vectorize an operation that would result in a relatively insignificant improvement in runtime.

There’s nothing wrong with for-loops sprinkled here and there. Often, it can be more productive to think instead about optimizing the flow and structure of the entire script at a higher level of abstraction.

## Chapter 4: Practice problems

***

### Description: In this chapter you will practice some more problems using NumPy; the main goal is to give you a more practical understanding of the module.

### 4.1 Find roots of quadratic equation

***

You all know how to find the roots of a quadratic equation $ax^2 + bx + c = 0$. The nature of the roots depends on the **discriminant** $b^2 - 4ac$. The image below shows how the roots behave according to the discriminant.

<img src='../images/roots.jpg' align=centre>

One way to find the roots is to first calculate the discriminant and then solve for the roots. With NumPy's `np.roots()` function you can do it in a single line. This function works for a polynomial of any degree: it takes a single argument, an array of the polynomial's coefficients ordered from the highest power of $x$ down to the constant term. Since you are solving a quadratic equation, this array holds $3$ coefficients.
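For comparison, the two-step approach (discriminant first, then the quadratic formula $x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$) can be sketched as below. `quadratic_roots` is a hypothetical helper written for this illustration; it uses `np.emath.sqrt`, which returns a complex result for a negative discriminant instead of `nan`, so it also covers the complex-roots case.

```python
import numpy as np


def quadratic_roots(a, b, c):
    """Roots of a*x**2 + b*x + c = 0 via the discriminant (assumes a != 0)."""
    discriminant = b**2 - 4 * a * c
    # np.emath.sqrt gives a complex number when the discriminant is negative
    sqrt_d = np.emath.sqrt(discriminant)
    return np.array([(-b + sqrt_d) / (2 * a), (-b - sqrt_d) / (2 * a)])


print(quadratic_roots(1, -3, 2))   # two distinct real roots
print(np.roots([1, -3, 2]))        # the same roots computed by np.roots
print(quadratic_roots(1, 0, 1))    # negative discriminant: complex roots
```

`np.roots` avoids writing this case analysis by hand and extends to polynomials of any degree.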

### Instructions
- The quadratic equation that you will be solving is $x^2 -4x + 4 = 0$
- Make an array `coeff` to store the coefficients (1, -4, 4) of the quadratic equation
- Pass this array `coeff` as the argument to the `np.roots()` function, which will return the roots of the quadratic equation $x^2 -4x + 4 = 0$. Save it as `roots`
- Print out `roots` to see your result

```python
# import packages
import numpy as np

# Code starts here

# co-efficients of x
coeff = np.array([1, -4, 4])

# roots of equation
roots = np.roots(coeff)

# display roots
print(roots)

# Code ends here
```

Output:

```
[2. 2.]
```


### Hints
- Store the coefficients as `coeff = np.array([1, -4, 4])`
- Calculate roots of the equation as `roots = np.roots(coeff)`

### 4.2 Convert from Centigrade to Fahrenheit

***

Now, let's convert temperatures from the Centigrade scale to the Fahrenheit scale. In case you have forgotten the formula: $\frac{C}{5} = \frac{F - 32}{9}$, where $C$ is the temperature on the Centigrade scale and $F$ the temperature on the Fahrenheit scale.
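Solving the given formula for $F$ shows what the conversion has to return:

$$\frac{C}{5} = \frac{F - 32}{9} \;\Longrightarrow\; F - 32 = \frac{9C}{5} \;\Longrightarrow\; F = \frac{9C}{5} + 32$$

For example, $C = 100$ gives $F = 180 + 32 = 212$, the boiling point of water on the Fahrenheit scale.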

You will be converting the temperatures $[0, 10, 25, 32, 80, 99.99]$ from Centigrade to Fahrenheit.

### Instructions
- Store the given temperatures in an array `centigrade_temps`
- Define a function `convert` which takes one argument `C` (Centigrade temperature) and converts it from Centigrade to Fahrenheit
- The conversion formula is given above; the body of `convert` must return the Fahrenheit temperature
- Use `convert` on `centigrade_temps` to convert Centigrade to Fahrenheit scale and store it as `fahrenheit_temps`

```python
import numpy as np

# Code starts here

# centigrade temperatures
centigrade_temps = np.array([0, 10, 25, 32, 80, 99.99])

# function for conversion: F = 9C/5 + 32
def convert(C):
    return 9*C/5 + 32

# display fahrenheit temperatures
fahrenheit_temps = convert(centigrade_temps)
print(fahrenheit_temps)

# Code ends here
```

Output:

```
[ 32.     50.     77.     89.6   176.    211.982]
```


### Hints
- Store centigrade temperatures as `centigrade_temps = np.array([0, 10, 25, 32, 80, 99.99])`
- The function `convert` must be written as
```python
def convert(C):
    return 9*C/5 + 32
```
- Store the fahrenheit temperatures as
```python
fahrenheit_temps = convert(centigrade_temps)
```

### 4.3 Solving system of equations with NumPy

***

One of the more common problems in linear algebra is solving a matrix-vector equation. Here is an example: we seek the vector $x$ that solves the equation $Ax = b$, where $A$ is a known square matrix and $b$ a known vector.

The usual way forward is to first invert the matrix $A$ and then multiply the inverse by $b$ written as a column vector, so that $x = A^{-1}b$. NumPy provides a simple function, `np.linalg.solve()`, to find the vector $x$ directly.
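The inverse-based route and `np.linalg.solve` can be compared directly on the system $Ax = b$ from the exercise below. This is a sketch; in practice `np.linalg.solve` is generally preferred because it solves the system by LU factorization without forming $A^{-1}$ explicitly, which is cheaper and numerically more stable.

```python
import numpy as np

A = np.array([[2, 1, 2], [3, 0, 1], [1, 1, -1]])
b = np.array([-3, 5, -2])

# Route 1: compute the inverse of A explicitly, then multiply it by b
x_inv = np.linalg.inv(A) @ b

# Route 2: let NumPy solve Ax = b directly, without forming the inverse
x_solve = np.linalg.solve(A, b)

print(x_inv)
print(x_solve)
print(np.allclose(x_inv, x_solve))  # both routes agree up to floating-point error
```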

### Instructions
- You have a system of linear equations $Ax = b$ where

$$A = \begin{bmatrix}
    2 & 1 & 2 \\
    3 & 0 & 1 \\
    1 & 1 & -1
\end{bmatrix}$$

$$b = \begin{bmatrix}
        -3 \\
        5 \\
        -2
        \end{bmatrix}$$
- Use `np.linalg.solve()` to solve the system of linear equations. Go through the documentation for [numpy.linalg.solve](https://docs.scipy.org/doc/numpy-1.14.0/reference/generated/numpy.linalg.solve.html) to better understand its behaviour
- Check that your solution is correct with the function `np.allclose(np.dot(A,x), b)` and save it to a variable `check`. Print out `check` to see if it is `True` or else try again!

```python
import numpy as np

# Code starts here

# initialize matrix A and vector b
A = np.array([[2,1,2], [3,0,1], [1,1,-1]])
b = np.array([-3, 5, -2])

# Solve for x
x = np.linalg.solve(A, b)

# Check solution
check = np.allclose(np.dot(A,x), b)
print(check)

# Code ends here
```

Output:

```
True
```


### Hints
- The matrix `A` is given by `A = np.array([[2,1,2], [3,0,1], [1,1,-1]])`
- The vector `b` is given by `b = np.array([-3, 5, -2])`
- Calculate `x` as `x = np.linalg.solve(A, b)`
- The condition for `check` is given by `check = np.allclose(np.dot(A,x), b)`

### 4.4 Finding a non-singular 3x3 matrix

***

Use pen and paper for this task: find a $3 \times 3$ non-singular matrix $A$ satisfying $3A = A^2 + AB$, where $$B = \begin{bmatrix}
2 & 0 & -1 \\
0 & 2 & -1 \\
-1 & 0 & 1
\end{bmatrix}$$


### Instructions
- First assume that $A$ is a non-singular matrix, so that its inverse $A^{-1}$ exists
- Next, factor the right-hand side as $3A = A(A + B)$ and multiply both sides on the left by $A^{-1}$; this reduces the equation to $3I = A + B$, which only needs simple array addition and subtraction
- You should arrive at $A = 3I - B$, where $I$ is the identity matrix
- Initialize array `B` using `np.array()` and `I` using `np.identity()`
- Save resultant array as `A` which is given mathematically by $A = 3I - B$ 
- Display matrix $A$

```python
import numpy as np

# Code starts here

# initialize array B and the identity matrix I
B = np.array([[2,0,-1], [0,2,-1], [-1,0,1]])
I = np.identity(3)

# calculate result
A = 3*I - B
print(A)

# Code ends here
```

Output:

```
[[1. 0. 1.]
 [0. 1. 1.]
 [1. 0. 2.]]
```


### Hints
- The array `B` is given by `B = np.array([[2,0,-1], [0,2,-1], [-1,0,1]])`
- The identity matrix `I` is given by `I = np.identity(3)`
- The array `A` is given by `A = 3*I - B`
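It is worth checking that the computed $A$ really satisfies the original equation and is non-singular. A minimal check is sketched below; note that `A @ A` is the matrix product $A^2$, whereas `A * A` would multiply element-wise, and a non-zero `np.linalg.det(A)` confirms that $A$ is invertible.

```python
import numpy as np

B = np.array([[2, 0, -1], [0, 2, -1], [-1, 0, 1]])
I = np.identity(3)
A = 3 * I - B

# 3A should equal A^2 + AB (matrix products, not element-wise)
satisfies_equation = np.allclose(3 * A, A @ A + A @ B)

# A non-zero determinant means A is non-singular (invertible)
det_A = np.linalg.det(A)

print(satisfies_equation)
print(det_A)
```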

## Concept Level Quiz

1. What gets printed?

```python
import numpy as np

a = np.array([1,2,3,5,8])
b = np.array([0,3,4,2,1])
c = a + b
c = c*a

print (c[2])
```
    a. 21
    
    b. 2
    
    c. 3
    
    d. 10
    
**ANS**: a. 21


2. What gets printed?

```python
import numpy as np

a = np.array([[1,2,3],[0,1,4]])
b = np.zeros((2,3), dtype=np.int16)
c = np.ones((2,3), dtype=np.int16)
d = a + b + c
print(d[1,2])
```
    a. 5
    
    b. 6
    
    c. 10
    
    d. 1
    
**ANS**: a. 5


3. What gets printed?

```python
import numpy as np

a = np.array([1,2,3,4,5])
b = np.arange(0,10,2)
c = a + b
print (c[4])
```
    a. 12
    
    b. 13
    
    c. 14
    
    d. 15
    
**ANS**: b. 13


4. What gets printed?

```python
import numpy as np

a = np.array([[0, 1, 2], [3, 4, 5]])
b = a.sum(axis=1)
print (b)
```
    a. [3 12]
    
    b. [3 2 10]
    
    c. []
    
    d. Error
    
**ANS**: a. [3 12]


5. What gets printed?

```python
import numpy as np

a = np.zeros(6)
b = np.arange(0,10,2)
c = a + b
print (c[4])
```
    a. 0
    
    b. 2
    
    c. 4
    
    d. Error
    
**ANS**: d. Error


6. What is the output?

```python
import numpy as np

a = np.array([[0, 1, 0], [1, 0, 1]])
a += 3
b = a + 3
print (a[1,2] + b[1,2])
```

    a. 11
    
    b. 10
    
    c. 9
    
    d. Error
    
**ANS**: a. 11


7. What gets printed?

```python
import numpy as np

a = np.array([[0, 1, 2], [3, 4, 5]])
b = a.sum(axis=1)
print (b)
```
    
    a. 11
    
    b. [3 12]
    
    c. [3 11]
    
    d. [12]
    
**ANS**: b. [3 12]


8. Do NumPy arrays take up less memory than Python lists?

    a. YES
    
    b. NO
    
**ANS**: a. YES


9. Which of the following operations is not common between Python lists and NumPy arrays?

    a. Indexing
    
    b. Slicing
    
    c. Vectorization
    
    d. Iterating using for loops
    
**ANS**: c. Vectorization


10. Are all the elements in NumPy array homogeneous (of same type) in nature?

    a. YES
    
    b. NO
    
**ANS**: a. YES