<a href="https://colab.research.google.com/github/CC-MNNIT/2018-19-Classes/blob/master/MachineLearning/2019_04_11_ML3_content/Numpy.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

#### Copyright 2019 MNNIT Computer Club.

In [0]:
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Numpy Concepts

**Learning Objectives:**
   + Why NumPy?
   + Learn the basics of the NumPy library, focusing on the following concepts:
      - ndarray
      - Vectorisation
      - BLAS
      - Basic Array Operations
          - Finding the shape of an array
          - Extracting values
          - Extracting multiple values (slicing)
          - Reversing an array (`::-1`)
      - Arithmetic Operations
      - Axis in the context of an array
      - Concatenating two NumPy arrays
      - Matrix Multiplication
      - Broadcasting

# What is NumPy ?

NumPy is a Python library for **numerical computing**, built around a fast multidimensional array type (similar array libraries now exist for some other languages as well).

# Why NumPy exists?

- It adds support for multidimensional array objects, along with a large collection of mathematical functions that operate on them, which Python does not have built in.
- Python does have lists, but they are inefficient for numerical computations.
- NumPy is **extremely efficient at handling numerical data** and the calculations on it, compared to pure Python.

In [0]:
import numpy as np

# now the numpy package can be used via 'np'

In [0]:
# Creating a numpy array

a = [1,2,3]
np.array(a)

## The inefficiencies of Python

1. Lists take up a lot of space because they allow heterogeneity.
2. The default `for` loop is complex because it must handle arbitrary, even bizarre, data types.
3. When we know that all our data is numeric, the loop can be made much faster.
4. In most data science applications the data type is float or int, so why use such a complex implementation for such a simple task?

In [0]:
# list can be heterogeneous (advantage of lists)
this_list = ["MNNIT", "CC", 100.0, ["ML",("March",2019)]]

In [0]:
# Numpy arrays are homogeneous
np.array([1,2,3,4])
np.array(["hello","world","MNNIT","CC"])

# mixed types are not kept as-is: NumPy coerces every entry to a common type
np.array([1,2,"hello"])

# here NumPy treated all the entries as strings

In [0]:
# But check this out: forcing an integer dtype raises a ValueError, because "hello" cannot be converted to an integer
np.array([1,2,"hello"],dtype=np.int32)
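
The coercion described above can be checked by inspecting `dtype`. This is a minimal sketch; the variable names (`mixed`, `ints`, `floats`) are only illustrative, and the exact string width NumPy picks may differ between versions, so only the dtype *kind* is inspected.

```python
import numpy as np

# Mixed numbers and strings are coerced to a common (string) type
mixed = np.array([1, 2, "hello"])
print(mixed.dtype.kind)    # 'U' means a Unicode string dtype

# Integers stay integers; one float promotes the whole array to float
ints = np.array([1, 2, 3])
floats = np.array([1, 2, 3.5])
print(ints.dtype.kind)     # 'i' means a signed integer dtype
print(floats.dtype)        # float64

# Forcing an integer dtype on non-numeric text fails loudly
try:
    np.array([1, 2, "hello"], dtype=np.int32)
    conversion_failed = False
except ValueError as err:
    conversion_failed = True
    print("ValueError:", err)
```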

In [0]:
import random,timeit,sys



---


### Multiplying two lists element-wise

In [0]:
a_list = [random.uniform(0,1) for _ in range(20)]
b_list = [random.uniform(0,1) for _ in range(20)]

In [0]:
%%timeit
c_list = []
for i in range(len(a_list)):
    c_list.append(a_list[i]*b_list[i])




---

### Multiplying two NumPy arrays element-wise

In [0]:
a = np.asarray(a_list)
b = np.asarray(b_list)
c = np.zeros((20,))

In [0]:
%%timeit
for i in range(len(a)):
  c[i] = a[i]*b[i]

In [0]:
print(c)

# Motivation to use Numpy

- Why learn NumPy if it is slower than pure Python (as the timings above suggest)?
- Why is the NumPy version slower?

<center>
  
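**Because the loop still runs in Python, and indexing a NumPy array one element at a time is slower than indexing a list. Creating a NumPy array is also slower than creating a similar-sized Python list, as the next two cells show.**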
  
</center>

In [0]:
%timeit a = [[2,3,5],[3,6,2],[1,3,2]]

In [0]:
%timeit a = np.array([[2,3,5],[3,6,2],[1,3,2]])

## Why use numpy ?

1. NumPy arrays are **densely packed** arrays of a homogeneous type, so you get the benefits of **locality of reference**.
2. Python lists, by contrast, are arrays of pointers to **objects**, even when all of them are of the same type (in our case, numeric data).
3. Also, most NumPy operations are implemented in C, avoiding the general cost of loops in Python, pointer indirection and per-element dynamic type checking. Linear algebra routines are additionally handed to **BLAS**.
4. NumPy arrays take **significantly less memory** than Python lists. NumPy also lets you specify the data type of the contents, which allows further optimisation of the code.

> The speed boost depends on which operations you're performing, but a few orders of magnitude isn't uncommon in number crunching programs.

https://stackoverflow.com/a/8385658/6922149

### What are BLAS ?

**Basic Linear Algebra Sub-programs** : a set of precompiled low-level routines for performing common linear algebra operations such as vector addition, scalar multiplication, dot products, linear combinations, and matrix multiplication.

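In the cell below, the element-wise multiplication is done by NumPy's compiled loops, not by Python code. BLAS routines come into play for linear algebra operations such as `np.dot`.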

In [0]:
%%timeit
# pure Python can't beat this
c = a*b
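
Outside a notebook, the same comparison can be sketched with the standard `timeit` module. The array size (`n`), the repeat count, and the helper names below are arbitrary choices for illustration; actual timings depend on the machine, so none are quoted here.

```python
import random
import timeit

import numpy as np

n = 100_000  # arbitrary size, large enough for vectorisation to matter
x_list = [random.uniform(0, 1) for _ in range(n)]
y_list = [random.uniform(0, 1) for _ in range(n)]
x_arr = np.asarray(x_list)
y_arr = np.asarray(y_list)


def multiply_lists():
    # element-wise product with a pure Python loop
    return [p * q for p, q in zip(x_list, y_list)]


def multiply_arrays():
    # element-wise product in NumPy's compiled code
    return x_arr * y_arr


loop_time = timeit.timeit(multiply_lists, number=20)
vec_time = timeit.timeit(multiply_arrays, number=20)
print(f"pure Python loop: {loop_time:.4f} s for 20 runs")
print(f"vectorised NumPy: {vec_time:.4f} s for 20 runs")

# Both approaches compute the same values
same_result = np.allclose(multiply_lists(), multiply_arrays())
print(same_result)
```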

### Numpy is more space efficient

In [0]:
py_list = [1,2,3,4,5,6]
numpy_arr = np.array([1,2,3,4,5,6])

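sizeof_py_list = sys.getsizeof(1) * len(py_list)           # typically 28 bytes per int object on 64-bit CPython -> 168 (the list's own pointer array is extra)
sizeof_numpy_arr = numpy_arr.itemsize * numpy_arr.size   # typically 8 bytes per int64 element -> 48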
print("size of list = ",sizeof_py_list)
print("Size of numpy array = ",sizeof_numpy_arr)
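
A slightly fuller accounting also counts the list object itself, which stores one pointer per element. The sketch below is an estimate under the assumption of CPython; small integers are shared objects there, so the list figure is an upper bound rather than an exact measurement.

```python
import sys

import numpy as np

values = list(range(1000))
arr = np.array(values, dtype=np.int64)

# list container (pointer array) + every int object it points to
list_bytes = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)

# raw data buffer of the array: 1000 elements * 8 bytes each
array_bytes = arr.nbytes

print("list  (estimate):", list_bytes, "bytes")
print("array (data)    :", array_bytes, "bytes")
```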

### HENCE

1. NumPy is more space efficient than Python lists.
2. NumPy is faster for mathematical operations because the loops run in compiled code (and in BLAS for linear algebra).

## When to use Numpy ?

- NumPy shines when you set up the arrays once and then perform a lot of mathematical calculations on them.
- This is because NumPy speeds up array operations via vectorisation, running them in compiled code and using BLAS for matrix operations.
- It may fail to outperform pure Python *if the arrays are small*, because **the setup cost** can outweigh **the benefit of offloading the calculations to compiled C/Fortran functions (BLAS)**.

<center>
  
  <h2>Setting up arrays once and then computing on them heavily is almost always the case with machine learning solutions</h2>

</center>

# Basics of NumPy

## The ndarray object

The core datatype of numpy.

<center>

An ndarray is a (usually fixed-size) multidimensional **container** of items of the **same data type** and size

![alt text](https://www.tutorialspoint.com/numpy/images/ndarray.jpg)
  
 </center>

In [0]:
help(np.ndarray)
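
The "same data type and size" property shows up directly in a few attributes of the array. A small sketch, where `grid` is just an illustrative name:

```python
import numpy as np

grid = np.zeros((3, 4), dtype=np.float64)

print(grid.dtype)     # float64: every element has this type
print(grid.itemsize)  # 8: bytes used by each element
print(grid.nbytes)    # 96: 3 * 4 elements * 8 bytes
print(grid.strides)   # (32, 8): bytes to step to the next row / next column
```

Because every element has the same size, NumPy can locate any element with simple arithmetic on the strides instead of following pointers.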

# Basic array operations

## Creating Arrays

### From list/tuple

In [0]:
np.array([1,2,3,4])

In [0]:
this_list = [8,90,23,98]
np.asarray(this_list)

In [0]:
this_tuple = (56,23,56,78,90)
np.asarray(this_tuple)

### Equi-spaced numbers

In [0]:
np.arange(10)

In [0]:
np.arange(1,10)

In [0]:
np.arange(1,10,2)

In [0]:
np.arange(start=1,stop=10,step=2.25)
#np.arange(start,stop, step)

# stop is excluded

In [0]:
np.linspace(start=1,stop=10,num=5)
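
The difference between the two functions is easiest to see side by side: `np.arange` takes a step and excludes `stop`, while `np.linspace` takes a number of points and includes `stop` by default. A short sketch reusing the values from the cells above:

```python
import numpy as np

stepped = np.arange(1, 10, 2.25)      # step of 2.25, stop excluded
spaced = np.linspace(1, 10, num=5)    # 5 points, stop included

print(stepped)  # [1.   3.25 5.5  7.75]
print(spaced)   # [ 1.    3.25  5.5   7.75 10.  ]
```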

### Random arrays

In [0]:
# 5x2 array of samples from a uniform distribution over [0, 1)
np.random.rand(5,2)

In [0]:
# 5x2 array of samples from the standard normal distribution
np.random.randn(5,2)

### Standard arrays

In [0]:
np.eye(7)

In [0]:
np.identity(7)

In [0]:
np.zeros(12)

In [0]:
np.zeros((3,4))

In [0]:
np.ones(12)

In [0]:
np.ones((3,4))

## Accessing array elements

**Array indices start at 0**

In [0]:
my_array = np.arange(100,0,-1)

print(my_array)
print("\nArray Shape is ",my_array.shape)

### Single Element

In [0]:
my_array[23]  #This gives us the value of element at index 23

### Multiple Elements (array slicing)

In [0]:
my_array[10:15]
# indices 10,11,12,13,14 (index 15 is excluded), i.e. the values 90 down to 86

In [0]:
my_array[50:]

In [0]:
my_array[:50]
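
A slice can also take a third value, the step. A negative step walks backwards, which gives the `::-1` idiom for reversing an array. A minimal sketch, using a fresh array named `countdown` (an illustrative name) with the same contents as `my_array`:

```python
import numpy as np

countdown = np.arange(100, 0, -1)   # 100, 99, ..., 1

reversed_view = countdown[::-1]     # 1, 2, ..., 100
every_tenth = countdown[::10]       # 100, 90, ..., 10

print(reversed_view[:5])   # [1 2 3 4 5]
print(every_tenth)         # [100  90  80  70  60  50  40  30  20  10]

# Slicing returns a view on the same data, not a copy
print(np.shares_memory(countdown, reversed_view))   # True
```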

### Max Element

In [0]:
# Value 
my_array.max()

In [0]:
# Index
my_array.argmax()

### Min Element

In [0]:
# Value
my_array.min()

In [0]:
# Index
my_array.argmin()

### Slicing on higher dimension arrays

In [0]:
two_d_arr = np.array([[10,20,30], [40,50,60], [70,80,90]])
two_d_arr

In [0]:
two_d_arr[1][2] # The value 60 is at row index 1 and column index 2

In [0]:
two_d_arr[0,1]

In [0]:
two_d_arr[:1, :2]           # This returns [[10, 20]]
two_d_arr[:2, 1:]           # This returns [[20, 30], [50, 60]]
two_d_arr[:2, :2]           # This returns [[10, 20], [40, 50]]

## Concatenating Two Arrays

But first, let's see what an axis (plural: axes) is:
>Every dimension of an array has a corresponding axis.
A 2-dimensional array has two axes:
1. The first runs vertically downwards across rows (axis 0).
2. The second runs horizontally across columns (axis 1).

**Always think of axes as directions**



In [0]:
# a 3x4 array to demonstrate summing along each axis
x = np.arange(12).reshape((3,4))
print(x)

In [0]:
x.sum(axis=0)

In [0]:
x.sum(axis=1)
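
One way to remember the direction: the axis you sum along is the one that disappears from the shape. The sketch below repeats the sums above and adds `keepdims=True`, which keeps the collapsed axis with length 1.

```python
import numpy as np

x = np.arange(12).reshape((3, 4))

col_sums = x.sum(axis=0)   # collapse the rows: one value per column
row_sums = x.sum(axis=1)   # collapse the columns: one value per row
row_sums_2d = x.sum(axis=1, keepdims=True)

print(col_sums)            # [12 15 18 21]
print(row_sums)            # [ 6 22 38]
print(col_sums.shape, row_sums.shape, row_sums_2d.shape)   # (4,) (3,) (3, 1)
```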

In [0]:
# Concatenating a,b

a = np.array([[1, 2],
              [3, 4]])

b = np.array([[5, 6]])

In [0]:
np.concatenate((a, b), axis=0)

In [0]:
np.concatenate((a, b.T), axis=1)

In [0]:
# axis = None will first flatten both the arrays and concat them
np.concatenate((a, b), axis=None)

# see for yourself: np.array_split (splits an array into several sub-arrays)
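
For 2-D arrays, `np.vstack` and `np.hstack` are convenient shorthands for the two `np.concatenate` calls above. A small sketch that checks the equivalence and the resulting shapes:

```python
import numpy as np

a = np.array([[1, 2],
              [3, 4]])
b = np.array([[5, 6]])

stacked_rows = np.vstack((a, b))     # like concatenate(..., axis=0)
stacked_cols = np.hstack((a, b.T))   # like concatenate(..., axis=1)

print(stacked_rows.shape)   # (3, 2)
print(stacked_cols.shape)   # (2, 3)
print(np.array_equal(stacked_rows, np.concatenate((a, b), axis=0)))     # True
print(np.array_equal(stacked_cols, np.concatenate((a, b.T), axis=1)))   # True
```

The shapes must match on every axis except the one being joined, which is why `b` has to be transposed before joining along axis 1.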

## TypeCasting an array

In [0]:
float_array = np.array([1.4,7.2,3.2,6.9])

In [0]:
float_array.astype(np.int64)
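
Note that casting floats to integers with `astype` truncates towards zero; it does not round. If rounding is wanted, one option is to round first with `np.rint` and cast afterwards, as in this sketch:

```python
import numpy as np

float_array = np.array([1.4, 7.2, 3.2, 6.9])

truncated = float_array.astype(np.int64)          # fractional part is dropped
rounded = np.rint(float_array).astype(np.int64)   # round to nearest, then cast
negative = np.array([-1.7]).astype(np.int64)      # truncation goes towards zero

print(truncated)   # [1 7 3 6]
print(rounded)     # [1 7 3 7]
print(negative)    # [-1]
```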

## Array Attributes

In [0]:
my_array.dtype

In [0]:
my_array.shape

In [0]:
my_array.size

In [0]:
my_array.ndim

## Reshaping an Array

In [0]:
new_arr = my_array.reshape(10,10)
# returns an array of shape (10,10) built from my_array's data (a view where possible, not a copy)
# my_array itself keeps its original shape
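
One caveat worth knowing: `reshape` usually returns a view, so the reshaped array and the original share the same data. The sketch below uses small illustrative arrays (`base`, `view`, `independent`) to show the effect and how `.copy()` avoids it.

```python
import numpy as np

base = np.arange(6)
view = base.reshape(2, 3)

print(base.shape, view.shape)         # (6,) (2, 3): base keeps its shape
print(np.shares_memory(base, view))   # True: same underlying data

view[0, 0] = 99                       # writing through the view...
print(base[0])                        # 99: ...changes the original too

independent = base.reshape(2, 3).copy()
independent[0, 0] = -1                # the copy has its own data
print(base[0])                        # still 99
```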

In [0]:
my_array

In [0]:
new_arr.shape

In [0]:
new_arr.reshape(new_arr.shape[0],-1)
# when -1 is supplied for a dimension, NumPy infers its size from the total number of elements and the other dimensions

In [0]:
new_arr.reshape(4,-1)
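
The inferred dimension is simply the total size divided by the product of the other dimensions, so the division has to be exact. A short sketch with an illustrative 100-element array:

```python
import numpy as np

flat = np.arange(100)

auto_cols = flat.reshape(4, -1)    # 100 / 4  = 25 columns
auto_rows = flat.reshape(-1, 20)   # 100 / 20 = 5 rows
print(auto_cols.shape)   # (4, 25)
print(auto_rows.shape)   # (5, 20)

# 100 is not divisible by 3, so no valid size exists for the -1 dimension
try:
    flat.reshape(3, -1)
    reshape_failed = False
except ValueError as err:
    reshape_failed = True
    print("ValueError:", err)
```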

In [0]:
# flattens the array into a one-dimensional array
new_arr.ravel()

## Arithmetic Operations


- NumPy overloads the arithmetic operators (`+`, `-`, `*`, `/`, ...) for arrays.
- Almost all arithmetic operations are done element-wise.

In [0]:
a = np.array([[3,2,1],[4,50,6],[7,8,9]])
b = np.array([[1,1,1],[-1,-1,-1],[10,10,10]])

In [0]:
print("\nMultiplication \n",  a * b    )
print("\nAddition\n",         a + b    )
print("\nSubtraction\n",      a - b    )
print("\nDivision\n",         a / b    )
print("\nLog (natural, base e)\n",    np.log(a))   # np.log is the natural log; use np.log10 for base 10

## Comparison operators on NumPy arrays

- Returns a boolean array of the same shape as the original array
- The comparison is applied **elementwise**

In [0]:
my_array > 20
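
The boolean array is most useful as a mask: indexing with it selects only the elements where the condition holds, and summing it counts them. A minimal sketch on an array with the same contents as `my_array` (named `countdown` here for illustration):

```python
import numpy as np

countdown = np.arange(100, 0, -1)   # 100, 99, ..., 1

mask = countdown > 20
selected = countdown[mask]          # keep only the elements greater than 20

print(mask.dtype)        # bool
print(mask.sum())        # 80: True counts as 1, so this counts the matches
print(selected.min())    # 21
```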

## Statistical operations

In [0]:
# sum of all elements
np.sum(a)

In [0]:
# same as above (returns sum of all elements)
a.sum()

In [0]:
# standard deviation of all elements
np.std(a)

In [0]:
np.mean(a)

In [0]:
np.median(a)

# Matrix Operations

## Matrix Multiplication

In [0]:
# mathematical matrix multiplication
np.dot(a,b)
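
It is easy to confuse `*` with matrix multiplication. For 2-D arrays, `np.dot(a, b)` and the `@` operator compute the matrix product, while `*` stays element-wise. A small sketch with illustrative 2x2 matrices `p` and `q`:

```python
import numpy as np

p = np.array([[1, 2],
              [3, 4]])
q = np.array([[5, 6],
              [7, 8]])

matrix_product = p @ q       # same result as np.dot(p, q) for 2-D arrays
elementwise_product = p * q  # multiplies matching entries only

print(matrix_product)        # [[19 22]
                             #  [43 50]]
print(elementwise_product)   # [[ 5 12]
                             #  [21 32]]
```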

## Matrix Transpose

In [0]:
# Transpose
a.T

## Determinant, Inverse.....

- Lots of other built-in functions are available in the **np.linalg module**

In [0]:
# Determinant
np.linalg.det(a)

In [0]:
# Inverse of a matrix
np.linalg.inv(a)
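
Results from `np.linalg` are floating-point, so they are best checked with a tolerance rather than exact equality. The sketch below reuses the matrix from the arithmetic section (as `m`) and also shows `np.linalg.solve`, which is often preferable to forming an inverse when the goal is to solve a linear system; the right-hand side `rhs` is an arbitrary example.

```python
import numpy as np

m = np.array([[3, 2, 1],
              [4, 50, 6],
              [7, 8, 9]], dtype=float)

det = np.linalg.det(m)
m_inv = np.linalg.inv(m)

print(round(det, 6))   # 900.0
# A matrix times its inverse should be (numerically) the identity
identity_check = np.allclose(m @ m_inv, np.eye(3))
print(identity_check)  # True

# Solve m @ solution = rhs without forming the inverse explicitly
rhs = np.array([1.0, 2.0, 3.0])
solution = np.linalg.solve(m, rhs)
solve_check = np.allclose(m @ solution, rhs)
print(solve_check)     # True
```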

# Broadcasting

- Broadcasting describes how NumPy treats **arrays with different shapes during arithmetic operations.**
- Subject to **certain constraints** (shapes are compared starting from the trailing dimension, and two dimensions are compatible when they are equal or one of them is 1),
    + the **smaller array is “broadcast” across the larger array** so that they have compatible shapes.
- Provides a means of **vectorizing array operations so that looping occurs in C instead of Python.**
- It does this **without making needless copies of data** and usually leads to efficient algorithm implementations.
- There are, however, cases where broadcasting is a bad idea because it leads to inefficient use of memory that slows computation.

In [0]:
a = np.array([[0.0,0.0,0.0],
              [10.0,10.0,10.0],
              [20.0,20.0,20.0],
              [30.0,30.0,30.0]]) 
b = np.array([1.0,2.0,3.0])

In [0]:
# b (shape (3,)) is broadcast across each of the 4 rows of a (shape (4, 3))
a + b
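
The constraints can be seen by looking at shapes: dimensions are compared from the right, and each pair must be equal or contain a 1. The sketch below uses illustrative arrays (`grid`, `row`, `column`, `too_long`) to show two shapes that broadcast and one that does not.

```python
import numpy as np

grid = np.zeros((4, 3))
row = np.array([1.0, 2.0, 3.0])                     # shape (3,)
column = np.array([[0.0], [10.0], [20.0], [30.0]])  # shape (4, 1)

# (4, 3) with (3,): trailing dimensions match, row is reused for every row
print((grid + row).shape)      # (4, 3)

# (4, 1) with (3,): the 1 stretches to 3, the missing dimension stretches to 4
table = column + row
print(table)                   # same values as a + b in the cell above

# (4, 3) with (4,): trailing dimensions are 3 and 4, which are incompatible
too_long = np.array([1.0, 2.0, 3.0, 4.0])
try:
    grid + too_long
    mismatch_raised = False
except ValueError as err:
    mismatch_raised = True
    print("ValueError:", err)
```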

# Additionally, look up the following

1. np.nonzero(), np.count_nonzero()
2. np.where()
3. Creating a NumPy array from a CSV file: np.genfromtxt()
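
As a starting point, here is a brief sketch of each of these functions. The sample values and the in-memory CSV text (read through `io.StringIO` instead of a real file) are made up for illustration.

```python
from io import StringIO

import numpy as np

v = np.array([0, 3, 0, 5, 7])

nonzero_count = np.count_nonzero(v)    # how many entries are non-zero
nonzero_idx = np.nonzero(v)[0]         # indices of the non-zero entries
replaced = np.where(v > 4, v, -1)      # keep v where v > 4, otherwise use -1

print(nonzero_count)   # 3
print(nonzero_idx)     # [1 3 4]
print(replaced)        # [-1 -1 -1  5  7]

# genfromtxt accepts a file name or a file-like object
csv_text = StringIO("1,2,3\n4,5,6")
table = np.genfromtxt(csv_text, delimiter=",")
print(table.shape)     # (2, 3)
```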

---

Authored By [Dipunj Gupta](https://github.com/dipunj) | Report errors/typos as github issues.

---