# NUMPY
- Author:Varun Anand Patkar
- Created On:14 Nov 2020

<a href = "https://colab.research.google.com/drive/1lt1w-wB7gAAROdnw-UZ7qGX_Gw0XAhdu?usp=sharing">Colab Notebook Link</a>

## So why are Lists slow and Numpy fast?
- **Numpy uses fixed types**: Every element of a Numpy array has the same fixed-size type, 64 bits by default for integers on most platforms (you can choose a smaller type). A list instead stores a reference to a separate Python object for each element, and every such object carries 4 pieces of information: the object value (different for each type), the object type, the reference count (number of times it is referenced) and the size of the element. This makes it faster to read and write data in Numpy arrays than in lists.

- **No type checking when iterating through objects**: When iterating through a Numpy array, there is no need to check the type of each element because all elements have the same type. With lists, the type of every element has to be checked.

- **Numpy uses contiguous memory**: A list stores references to objects that can be scattered anywhere in memory, whereas Numpy stores the values themselves in one contiguous block, like arrays in C. So accessing each element is faster in Numpy, as you can just go one step forward, whereas with lists you have to follow each reference to its own memory location. (There are further benefits, like being able to use the SIMD vector processing units in the CPU, but they are not in the scope of today's workshop.)

- **Numpy allows usage of Broadcasting**: Broadcasting means that we can do operations on arrays whose shapes differ, as long as the shapes are compatible. We will get back to this later as this is an important aspect of Numpy.

- **Numpy allows element wise operations**: Numpy allows us to do element by element operations, whereas lists do not. Ex: [1,2,3]+[4,5,6] concatenates the lists into [1,2,3,4,5,6] (and [1,2,3]*[4,5,6] gives an error), but with Numpy arrays the sum is [5,7,9]. This is also an important aspect of Numpy.

- **Numpy supports vectorization**: Numpy allows us to use vectorization, which significantly boosts the performance of array operations. We'll take a look at this later.
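
To make the element wise point above concrete, here is a small sketch comparing `+` on plain lists and on Numpy arrays. The variable names `list_sum` and `array_sum` are made up for this illustration only.

```python
import numpy as np

# Plain Python lists: + joins the two lists end to end.
list_sum = [1, 2, 3] + [4, 5, 6]
print(list_sum)   # [1, 2, 3, 4, 5, 6]

# Numpy arrays: + adds the elements at matching positions.
array_sum = np.array([1, 2, 3]) + np.array([4, 5, 6])
print(array_sum)  # [5 7 9]
```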

**Now, what do I mean by multidimensional data? It is data arranged in an orderly manner along more than one axis. A good analogy is a cupboard: multiple rows and columns that you can store data in and take data from. That is 2 dimensional. 3 dimensional is a library. And so on.**

2D array<br>
<img src="https://drive.google.com/uc?id=1gOc9hY8F_Dvzc8J2YRBPj-XJqzRhZwO8" height=350 width=350></img>

In [None]:
import numpy as np

Before anything else let's take a look at the efficiency of numpy arrays over python lists

First let's look at the time gained

In [None]:
import time 

n = 10000000 #10 million/1 Crore
l1 = list(range(n)) #2 python lists
l2 = list(range(n)) 
arr1 = np.arange(n) #2 numpy 1D arrays with same values as l1 and l2
arr2 = np.arange(n) 

#Calculate time for Lists
tic = time.time() 

sqlist = [(i*j) for i,j in zip(l1, l2)] 

print("Time taken by Lists to perform multiplication:",(time.time()-tic),"seconds") 

#Calculate time for numpy arrays   
tic = time.time() 

sqarray = arr1 * arr2 
 
print("Time taken by NumPy Arrays to perform multiplication:",(time.time()-tic),"seconds")
print("Is sqlist==sqarray? :",np.array_equal(sqlist,sqarray))

**Results may vary by your hardware specs**

Now let's take a look at the space taken side of things

In [None]:
import sys 

S=list(range(1000)) 

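print("Size of one int element of the list in bytes: ",sys.getsizeof(S[-1])) 

# total = the list's own block of references + every int object it refers to
print("Size of the whole list in bytes: ",sys.getsizeof(S)+sum(sys.getsizeof(x) for x in S)) 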

D= np.arange(1000) 

print("Size of each element of the Numpy array in bytes: ",D.itemsize) 

print("Size of the whole Numpy array in bytes: ",D.size*D.itemsize) 

## Basics

- Defining arrays
- Array Functions

In [None]:
a=np.array([1,2,3])
a # the output is shown as array(...); the type of numpy arrays is numpy.ndarray

In [None]:
b=np.array([[1.0,2.0,3.0],[4.0,5.0,6.0],[7.0,8.0,9.0]]) # a 2D array of floats. Numpy prints 2.0 as 2.
b

Now let's take a look at some functions of the arrays to find the no. of dimensions, shape of the array, etc.

In [None]:
print("No. of Dims in a : "+str(a.ndim)+"\nNo. of Dims in b : "+str(b.ndim))

In [None]:
print("Shape of a : "+str(a.shape)+"\nShape of b : "+str(b.shape))

**Now I would like to point something out here.** Look at the shape of a. It gives (3,). This means that a is a 1-D array (a rank 1 vector), which is neither a row nor a column. This can cause subtle errors in your code, so in this workshop we avoid these arrays. Instead use (3,1) arrays, which are column arrays.

Now consider you have this array and you take its transpose. You might expect a (1,3) array, but the transpose of a 1-D array is still (3,). Now you try to use matrix multiplication. By the rules, a (3,1) array multiplied by a (1,3) array will give a (3,3) array. Watch what happens with this one.

In [None]:
c=a.T #taking a transpose
c.shape

In [None]:
np.dot(a,c)#taking matrix multiplication

It gives a number, because for two 1-D arrays np.dot computes the inner product. Instead let's redefine a and try this again

In [None]:
a=np.array([[1],[2],[3]])
a.shape

Now let's apply the same operations:

In [None]:
c=a.T
c.shape

In [None]:
np.dot(a,c)

Hence even later when using the random library of numpy, **don't use vectors. Instead use column arrays.**
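
If you already have a 1-D array, you do not need to retype it with extra brackets. The sketch below shows two standard ways of turning it into a column array; the names `v`, `col` and `col2` are only for this illustration.

```python
import numpy as np

v = np.array([1, 2, 3])          # 1-D array, shape (3,)

col = v.reshape(-1, 1)           # shape (3, 1); -1 lets numpy work out the length
col2 = v[:, np.newaxis]          # same result using np.newaxis

print(col.shape)                 # (3, 1)
print(np.dot(col, col.T).shape)  # (3, 3), as the matrix rules say
```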

Now let's get back to it. To find the data type of an array:

In [None]:
print("Data Type of a : "+str(a.dtype)+"\nData Type of b : "+str(b.dtype))

You can also change the type when defining the array.
Ex:

In [None]:
a=np.array([[1],[2],[3]],dtype='int16')
a.dtype
#float8 does not exist. Common types are int8,int16,int32,int64,float16,float32,float64

In [None]:
# Ex 1
# Try to define ans1 as int8 value with the same values in it [[1],[2],[3]]
ans1=0
#ADD YOUR CODE HERE

#END CODE HERE
!pip install gdown
!gdown https://drive.google.com/uc?id=1poiO99Ebtz0OHQd4qxKa7tNBj7aiPDRo
import grader_numpy as g
print("\n\n")
g.grader1(ans1)

We can also find the size of each item in the array(in bytes)

In [None]:
a.itemsize#a was defined as "int16" above, and 16 bits is 2 bytes as 16/8 is 2

In [None]:
b.itemsize#float 64 bits is 8 bytes as 64/8 is 8

In [None]:
#Ex 2
#Find total size of a using a single property and store it in ans1 and find total size by multiplying length of a*size of each element and save that in ans2
#Use Google to find what property for all 3 operations
ans1=ans2=0
#ADD YOUR CODE HERE

#END CODE HERE
!pip install gdown
!gdown https://drive.google.com/uc?id=1poiO99Ebtz0OHQd4qxKa7tNBj7aiPDRo
import grader_numpy as g
print("\n\n")
g.grader2(ans1,ans2)

## Accessing/Changing some elements in rows, columns
Now let's see how we access elements, rows, columns etc.

In [None]:
d=np.array([[1,2,3,4,5,6,7],[8,9,10,11,12,13,14]])
d.shape

In [None]:
d[1,5]#you can access elements by using d[i,j,k,...] just like normal arrays

In [None]:
d[1,:]#you can access a row by setting the column index as all(:)

In [None]:
d[:,2]#access a column by setting the row index as all(:)

In [None]:
d[1,-2]#we can also use negative indices to reference from the back

You can also cut slices by using start index and end index. Remember that the end index is not inclusive

We can use the array_equal function in numpy to check whether two arrays are equal or not
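
Here is a small sketch of slicing on a separate array `e` (made up for this illustration, so it does not give away the exercise), together with an array_equal check.

```python
import numpy as np

e = np.array([[10, 20, 30, 40, 50],
              [60, 70, 80, 90, 100]])

first_three = e[0, 0:3]   # row 0, columns 0, 1 and 2 (column 3 is not included)
print(first_three)        # [10 20 30]

block = e[:, 1:3]         # every row, columns 1 and 2
print(block)

print(np.array_equal(first_three, [10, 20, 30]))  # True
```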

In [None]:
#Ex 3
#Extract [9,10,11,12] from d and put that in ans1
ans1=0
#ADD CODE HERE

#END CODE HERE
!pip install gdown
!gdown https://drive.google.com/uc?id=1poiO99Ebtz0OHQd4qxKa7tNBj7aiPDRo
import grader_numpy as g
print("\n\n")
g.grader3(ans1)

You can also change the step size of a slice by adding a third value (start:end:step).

In [None]:
d[1,1:-1:2]

We can also assign numbers, rows and columns using this

In [None]:
d[1,:]=[9, 10, 11, 12, 13, 14,15]
d

Let's try with a 3d array

In [None]:
b=np.array([[[1,2],[3,4]],[[5,6],[7,8]]])
b

In [None]:
b[1]

In [None]:
b[1][0]

In [None]:
b[1,0,1]

In [None]:
b[1]=[[9,10],[11,12]]
b

## Different Types of Inbuilt Arrays

Numpy lets us define some commonly used arrays directly, like the all zeros array, the all ones array, etc. Let's take a look at some of them.

As I said earlier, don't pass a single value like 3 as the shape of these arrays, as that gives a 1-D vector and will mess up your code.

First let's look at the zeros function in numpy

In [None]:
a=np.zeros(5) #Don't do this as it becomes a vector and we have discussed this earlier
a.shape

In [None]:
# a=np.zeros(5,1) #also don't do this, as this will give an error. We pass a tuple that is the shape 
#                   into the function so there will be 2 parentheses

In [None]:
a=np.zeros((5,5)) #a correct example of zeros
print(a)

In [None]:
a=np.zeros((5,5,5),dtype="int8")
print(a)

Now, let's take a look at an all 1s matrix using ones function. 

We can also specify the data type in each of these functions, just like when defining an array

In [None]:
a=np.ones((5,5),dtype="int8")
print(a)

If you want any other number, there are 2 ways. You can multiply a ones array by that number or use the np.full function

In [None]:
a=5*np.ones((5,5))
b=np.full((5,5),5)
print("a = "+str(a)+"\nb = "+str(b))
assert np.array_equal(a,b)

There is another function named full_like which takes another array as input and returns a new array of the same shape and data type with all elements set to the same value

In [None]:
a=np.array([[1,2,3,4,5,6,7],[8,9,10,11,12,13,14]])
print("Original array:"+str(a))
a=np.full_like(a,5)
print("\nAfter full_like:"+str(a))

Or you can just use the a.shape property with the full function

In [None]:
a=np.array([[1,2,3,4,5,6,7],[8,9,10,11,12,13,14]])
print("Original array:"+str(a))
a=np.full(a.shape,5)
print("\nAfter full:"+str(a))

In [None]:
np.identity(5)#needs only 1 param as an identity matrix is a square matrix i.e. height=width

In [None]:
#Ex 4
a=np.array([[1,2,3],[4,5,6],[7,8,9]])
ans1=ans2=ans3=0
#Create array of size (3,3) that has 6 in all elements using ones function and put it in ans1
#ADD YOUR CODE HERE

#END CODE HERE
#Create array of size (3,3) that has 6 in all elements using full function(use a) and put it in ans2
#ADD YOUR CODE HERE

#END CODE HERE
#Create array of size (3,3) that has 6 in all elements using full_like function(use a) and put it in ans3
#ADD YOUR CODE HERE

#END CODE HERE
!pip install gdown
!gdown https://drive.google.com/uc?id=1poiO99Ebtz0OHQd4qxKa7tNBj7aiPDRo
import grader_numpy as g
print("\n\n")
g.grader4(ans1,ans2,ans3)

You will need to generate random numbers later on in machine learning, as setting everything to 0 or 1 is not good enough. For that you can use the numpy.random.rand function (generates samples from a uniform distribution in the range [0,1)) and the numpy.random.randn function (generates samples from the standard normal distribution).

This is what they look like<br>
<img src="https://drive.google.com/uc?id=1gwjeWNDoY8JUf8orFJLKvI59tbbehp65">

In [None]:
a=np.random.rand(2,3)
print(a)

In [None]:
a=np.random.randn(2,3)
print(a)

Now the thing to remember is that rand and randn don't accept tuples. So if you want to use a tuple (like the shape of a matrix), you will have to either pass in the dimensions one by one or use the np.random.random_sample function, which takes a tuple and samples from the uniform distribution in [0,1)

In [None]:
a=np.array([[1,2,3,4,5,6,7],[8,9,10,11,12,13,14]])
print("Shape of a = "+str(a.shape))
# b=np.random.randn(a.shape) #Won't work. will give error
b=np.random.randn(a.shape[0],a.shape[1])
print("\nb(with randn)= "+str(b))
b=np.random.random_sample(a.shape)
print("\nb(with random_sample)= "+str(b))

We can also get random integers with the np.random.randint function.

In [None]:
np.random.randint(1,10,size=(3,3)) #gives nos between 1 and 10(10 not inclusive) with array size of 3x3

You can also repeat the array across different axes.

Here is for **axis=0**:

<img src="https://drive.google.com/uc?id=10Kn1Grv9nHTtvsMTIiuRP_gqGHq6I66Q" width=500></img>

Here is for **axis=1**:

<img src="https://drive.google.com/uc?id=1aAYIoO4m9YJTFBT8m4DVkSsV645epY1V" width=500></img>

In [None]:
a=np.array([[1,2,3]])
print("Repeat across X axes : "+str(np.repeat(a,3,axis=0)))
print("\nRepeat across Y axes : "+str(np.repeat(a,3,axis=1)))

## Copying Arrays Problem

In Python, a variable is only a reference (like a pointer in C) to an object, so assigning an array to another name does not copy it. When you want a separate array you have to be careful, or both names will point to the same array. Here is an example to show that:

In [None]:
a=np.array([[1,2,3]])
# b=a #copy a to b(uses pointers and so both point to same value)
b=a.copy() #use copy function to make a copy of a and save to b
b[0,0]=4
print("b = "+str(b))
print("a = "+str(a))
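
To see both cases side by side, here is a sketch with hypothetical names `alias` and `independent`. np.shares_memory reports whether two arrays use the same underlying data.

```python
import numpy as np

a = np.array([[1, 2, 3]])

alias = a                  # no copy: both names refer to the same array
alias[0, 0] = 4
print(a)                   # [[4 2 3]]  (a changed too)

independent = a.copy()     # a real copy with its own data
independent[0, 1] = 99
print(a)                   # [[4 2 3]]  (a is unchanged this time)

print(np.shares_memory(a, alias))        # True
print(np.shares_memory(a, independent))  # False
```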

## Mathematical functions in Numpy

Let's first take a look at element-wise mathematics that you can do with numpy.

In [None]:
#Elementwise
a=np.array([[1,2,3,4]])
a=a+[[2,2,2,2]]
print(a)

This time a was a row vector with 4 elements. Now think if a was a 100x100 matrix or, worse, a 1000x1000 matrix. How would you add 2 to it? 

Yes, you can use the full function for that, but then you have to check the shapes every time you want to do anything with matrices. This takes away simplicity, and Python is all about simplicity. So here is where Broadcasting comes into play.

If you use element wise operations on 2 arrays with different shapes, Numpy compares the shapes axis by axis, starting from the last axis. Two axes are compatible if they have the same size or if one of them has size 1. Numpy then (conceptually) repeats the array along every axis where it has size 1 so that both arrays have equal shape, and then does the operation.

Ex: if a=[[1,2,3,4]] and you add 2 to it, [[1,2,3,4]] has shape (1,4) and the scalar 2 is treated like an array of shape (1,1). As every axis either matches or has size 1, Numpy replicates 2 along the second axis to get [[2,2,2,2]] and adds, resulting in the same output

In [None]:
a=np.array([[1,2,3,4]])
a=a+2
print(a)
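
Broadcasting also works between two arrays, not only between an array and a single number. Here is a small sketch (the arrays `col` and `row` are made up for illustration) that also shows what happens when the shapes are not compatible.

```python
import numpy as np

col = np.array([[0], [10], [20]])   # shape (3, 1)
row = np.array([[1, 2, 3, 4]])      # shape (1, 4)

# Axes of size 1 are stretched, so (3, 1) + (1, 4) gives shape (3, 4).
total = col + row
print(total.shape)   # (3, 4)
print(total)

# (1, 4) and (1, 3): the last axes differ and neither has size 1, so this fails.
try:
    row + np.array([[1, 2, 3]])
except ValueError as err:
    print("Broadcasting failed:", err)
```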

The same is applicable for all other element wise operations like -, *, /, exponentiation (2 asterisks, **), etc.

In [None]:
a=np.array([[1,2,3,4]])
a-=2
print(a)

In [None]:
a=np.array([[1,2,3,4]])
a*=2
print(a)

Now let's take a look at trigonometric functions

In [None]:
a=np.array([[1,2,3,4]])
print("Sin of a = "+str(np.sin(a))+"\nCos of a = "+str(np.cos(a)))

#### Matrix Multiplication

In [None]:
a=np.ones((3,2))
b=np.full((2,3),2)
print("a="+str(a)+"\n\nb="+str(b))
print("\na.b="+str(np.matmul(a,b)))

To find the determinant of a matrix, use np.linalg.det() function

In [None]:
c=np.identity(10)
np.linalg.det(c)# det of any identity mat is 1

In [None]:
# Ex 5
#Create an array a [[1,2,3],[4,5,6],[7,8,9]] and calculate its determinant and put it in ans1
ans1=-1
#ADD YOUR CODE HERE

#END CODE HERE
!pip install gdown
!gdown https://drive.google.com/uc?id=1poiO99Ebtz0OHQd4qxKa7tNBj7aiPDRo
import grader_numpy as g
print("\n\n")
g.grader5(ans1)

In [None]:
a=np.array([[1,2,3]])
a.T

## Statistics with Numpy

- Min
- Max
- Avg


In [None]:
data=np.array([[1,2],[3,4],[5,6]])
print(data)

In [None]:
np.min(data)

In [None]:
np.max(data)

In [None]:
np.min(data,axis=0)

In [None]:
np.min(data,axis=1)

In [None]:
print("Sum : "+str(np.sum(data))+"\nSum(along axis 0) : "+str(np.sum(data,axis=0))+"\nSum(along axis 1) : "+str(np.sum(data,axis=1)))
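
The average from the list above works the same way as min, max and sum. A minimal sketch using np.mean on the same data (the result names are only for illustration):

```python
import numpy as np

data = np.array([[1, 2], [3, 4], [5, 6]])

overall_mean = np.mean(data)           # mean of all six values
column_means = np.mean(data, axis=0)   # one mean per column
row_means = np.mean(data, axis=1)      # one mean per row

print(overall_mean)   # 3.5
print(column_means)   # [3. 4.]
print(row_means)      # [1.5 3.5 5.5]
```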

## Reorganizing Arrays

1.Reshape: You can use reshape to reshape an array as long as the product of all the new dimensions is equal to the size (number of elements) of that array


In [None]:
a=np.array([[1,2],[3,4],[5,6],[7,8]])
print(a.shape)
a=a.reshape((8,1))
print("\n a(after reshape) = "+str(a))
a=a.reshape((1,8))
print("\n a(after reshape) = "+str(a))

# a=a.reshape((4,4)) #can't do this as 4*4!=4*2
# print("\n a(after reshape) = "+str(a))
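
As a side note, reshape can work out one dimension for you if you pass -1 for it. A small sketch on a separate array `m` (made up for this illustration):

```python
import numpy as np

m = np.arange(12)                  # 12 elements: 0, 1, ..., 11

print(m.reshape((3, 4)).shape)     # (3, 4) because 3*4 == 12
print(m.reshape((3, -1)).shape)    # (3, 4); -1 asks numpy to infer the missing dimension

try:
    m.reshape((5, 3))              # 5*3 == 15, which is not 12
except ValueError as err:
    print("Reshape failed:", err)
```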

We can also reshape it into 3 dimensions as long as the product of all the dimensions is equal to the size of the array

In [None]:
assert 2*2*2==a.size
#Ex 6
#Try to predict the (2,2,2) reshaped array without coding and answer in chat.Then try it by coding and save it in ans1
#ADD YOUR CODE HERE

#END CODE HERE
!pip install gdown
!gdown https://drive.google.com/uc?id=1poiO99Ebtz0OHQd4qxKa7tNBj7aiPDRo
import grader_numpy as g
print("\n\n")
g.grader6(ans1)

2.vstack: To stack any number of matrices vertically, you can use np.vstack(). It takes a list of arrays and stacks them up vertically

In [None]:
a=[1,2,3,4]
b=[5,6,7,8]
np.vstack([a,b,a,b])

3.hstack: Same as vstack but stacks them horizontally

In [None]:
np.hstack([a,b,a,b])
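
With 2D inputs the difference between the two is easier to see in the shapes. A quick sketch with two made-up (2,2) arrays `top` and `bottom`:

```python
import numpy as np

top = np.array([[1, 2], [3, 4]])      # shape (2, 2)
bottom = np.array([[5, 6], [7, 8]])   # shape (2, 2)

v = np.vstack([top, bottom])   # rows are added: shape (4, 2)
h = np.hstack([top, bottom])   # columns are added: shape (2, 4)
print(v.shape, h.shape)        # (4, 2) (2, 4)
```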

## Vectorization

Vectorization is used to speed up Python code by replacing explicit loops with operations on whole arrays, which can greatly reduce the running time. Various operations can be performed on vectors this way: the dot product, also known as the scalar product because it produces a single number; the outer product, which for two vectors of length n gives an n x n matrix; and element wise multiplication, which multiplies the elements at the same index and leaves the dimensions unchanged.

In [None]:
import time
n=1000000
a=np.random.randn(n)
b=np.random.randn(n)

tic=time.time()
c1=0
for i in range(n):
    c1+=a[i]*b[i]
toc=time.time()
print("Time taken using basic for loops : "+str(1000*(toc-tic))+"ms")

tic=time.time()
c2=np.dot(a,b)
toc=time.time()
print("Time taken using Vectorized version : "+str(1000*(toc-tic))+"ms")

assert np.isclose(c1,c2) #compare floats with a tolerance instead of truncating to int
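
The three vector operations described at the start of this section can be sketched on two tiny vectors (`u` and `w` are made up for illustration):

```python
import numpy as np

u = np.array([1, 2, 3])
w = np.array([4, 5, 6])

print(np.dot(u, w))     # 32  (1*4 + 2*5 + 3*6), a single number
print(np.outer(u, w))   # 3x3 matrix where entry [i, j] is u[i] * w[j]
print(u * w)            # [ 4 10 18], same shape as the inputs
```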

Vectorization is just using the built-in functions in numpy instead of for loops. You should avoid for loops wherever you can, as they take more time and are less efficient than the vectorized versions. 

The difference is that these functions operate directly on whole arrays, in compiled code, instead of element by element in Python. That is the magic of Numpy and why we should embrace it and use it in our projects

**Go over to <a href="https://zzi.sh">zzi.sh</a> and write the code that I give you**