# NUMPY

Numpy is a multidimensional array library for Python. You can use it to store all sorts of arrays, from 1D arrays (a simple sequence of numbers) to arrays with any number of dimensions.<br>
Python already has a great built-in type called **lists**, which can hold nested, multidimensional data with relative ease. So why use Numpy instead of lists?
<table>
    <tr>
        <th><center><h1>Lists</h1></center></th>
        <th><center><h1>Numpy</h1></center></th>
    </tr>
    <tr>
        <th><img src="https://drive.google.com/uc?id=1OvF5kk1RPWo0NqB6XR_3YTVJ8Xvxe0g2" height=500 width=500 alt="Slow image"></th>
        <th><img src="https://drive.google.com/uc?id=13dHeRTUMYcpKSCLult0vl0m62s1Odh1_" height=500 width=500 alt="Fast image"></th>
    </tr>
</table>
<br><br>
<h3>So why are Lists slow and Numpy fast?</h3>

* **Numpy uses fixed types**: Every element of a Numpy array has the same fixed-size type, for example a 32-bit integer (the default integer size depends on your platform, and you can choose a smaller one). A list, on the other hand, keeps a pointer to a full Python object for each element, and every such object carries four things: the object value (different for each type), the object type, a reference count (the number of times it is referenced) and the object size. This makes it faster to read and write data in Numpy than in lists.

* **No type checking when iterating through objects**: When we iterate through a Numpy array, there is no need to check the type of each element, as they are all the same type. In a list, the type of each element has to be checked.

* **Numpy uses contiguous memory**: A list stores pointers to objects scattered around memory, whereas Numpy stores its values in one contiguous block, like arrays in C. Accessing the next element in Numpy is therefore just one step forward in memory, while a list has to follow a pointer to a separate memory location for each element. (There are further benefits, such as being able to use the SIMD vector processing units in the CPU, but they are outside the scope of today's workshop.)

* **Numpy allows usage of Broadcasting**: Numpy supports broadcasting, which means that we can do operations on arrays with different (but compatible) shapes. We will get back to this later, as it is an important aspect of Numpy.

* **Numpy allows element wise operations**: Numpy lets us do element-by-element operations, whereas lists do not. Ex: [1,2,3]+[4,5,6] joins the two lists into [1,2,3,4,5,6] (and [1,2,3]*[4,5,6] gives an error), but with Numpy arrays the same addition gives [5,7,9]. This is also an important aspect of Numpy.

* **Numpy supports vectorization**: Numpy allows us to use vectorization, which significantly boosts the performance of array operations. We'll take a look at this later.
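
To make the element-wise point concrete, here is a minimal sketch (assuming Numpy is already installed and imported as `np`). With plain lists `+` joins the two lists, while with arrays it adds matching elements:

```python
import numpy as np

list_sum = [1, 2, 3] + [4, 5, 6]                       # lists: + concatenates
array_sum = np.array([1, 2, 3]) + np.array([4, 5, 6])  # arrays: + adds element by element

print(list_sum)    # [1, 2, 3, 4, 5, 6]
print(array_sum)   # [5 7 9]
```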

## Applications of Numpy

Numpy is used wherever you need multidimensional arrays in Python. For example:

- As a replacement for MATLAB
- For plotting with matplotlib
- In the backend of libraries like pandas, Connect4, etc.
- In Machine Learning
- And of course all across Data Science

## Installing Numpy

Open your command prompt (you should already have Python and pip installed) and type in

**pip install numpy** 

and it will install the numpy library for you.

## Importing Numpy into your project

Now that we've installed Numpy, we have to import it into the project to work with it. The standard convention is to import numpy as np. It shortens every call (numpy.array becomes np.array), it is the same in everyone's code, and it improves the readability of your code.

**For those who don't know, use Shift+Enter or the run button on the top to run a cell**

In [1]:
import numpy as np

Before anything else let's take a look at the efficiency of numpy arrays over python lists

First let's look at the time gained

In [2]:
import time 

n = 10000000 
l1 = list(range(n)) # 2 Python lists (range alone is not a list)
l2 = list(range(n)) 
arr1 = np.arange(n) # 2 Numpy arrays
arr2 = np.arange(n) 

tic = time.time() 

flist = [(i*j) for i,j in zip(l1, l2)] 

print("Time taken by Lists to perform multiplication:",  
      (time.time() - tic), 
      "seconds") 
   
tic = time.time() 

farray = arr1 * arr2 
 
print("Time taken by NumPy Arrays to perform multiplication:", 
      (time.time() - tic), 
      "seconds")

Time taken by Lists to perform multiplication: 1.2580628395080566 seconds
Time taken by NumPy Arrays to perform multiplication: 0.014640331268310547 seconds


Now let's take a look at the space taken side of things

In [3]:
import sys 

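S= range(1000) 

# Note: sys.getsizeof(S) measures the range object itself, so it is only a rough
# stand-in for the per-element cost. On 64-bit CPython a small int object typically
# takes about 28 bytes, plus 8 bytes for the pointer a list keeps to it.
print("Size of each element of list in bytes: ",sys.getsizeof(S)) 

print("Size of the whole list in bytes: ",sys.getsizeof(S)*len(S)) 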

D= np.arange(1000) 

print("Size of each element of the Numpy array in bytes: ",D.itemsize) 

print("Size of the whole Numpy array in bytes: ",D.size*D.itemsize) 

Size of each element of list in bytes:  48
Size of the whole list in bytes:  48000
Size of each element of the Numpy array in bytes:  4
Size of the whole Numpy array in bytes:  4000
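
For a closer estimate, the sketch below builds a real list and counts both the list container and the int objects it points to. This is still an approximation (CPython shares small int objects between lists), and the exact numbers depend on your Python version and platform, so no output is shown here:

```python
import sys
import numpy as np

py_list = list(range(1000))
arr = np.arange(1000)

# A list stores pointers to separate int objects, so count the
# list container plus every int object it refers to.
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(x) for x in py_list)

# A Numpy array stores the raw values back to back.
array_bytes = arr.nbytes

print("List (container + int objects):", list_bytes, "bytes")
print("Numpy array data:", array_bytes, "bytes")
```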


## Basics

- Defining arrays
- Array Functions

In [4]:
a=np.array([1,2,3])
a # the output shows that a is a Numpy array

array([1, 2, 3])

In [5]:
b=np.array([[1.0,2.0,3.0],[4.0,5.0,6.0],[7.0,8.0,9.0]]) # a 2D array of floats. Numpy displays 2.0 as 2.
b

array([[1., 2., 3.],
       [4., 5., 6.],
       [7., 8., 9.]])

Now let's take a look at some attributes of arrays that give the no. of dimensions, the shape of the array, etc.

In [6]:
print("No. of Dims in a : "+str(a.ndim)+"\nNo. of Dims in b : "+str(b.ndim))

No. of Dims in a : 1
No. of Dims in b : 2


In [7]:
print("Shape of a : "+str(a.shape)+"\nShape of b : "+str(b.shape))

Shape of a : (3,)
Shape of b : (3, 3)


Now I would like to point something out here. Look at the shape of a. It gives (3,). This means that a is a rank-1 array: it is neither a row vector nor a column vector. This can cause subtle bugs in your code, so for linear algebra it is safer to avoid these arrays and use a (3,1) array instead, which is a column array.

Now suppose you have this array and you take its transpose. You might expect a (1,3) array, but you still get a (3,) array. Now you try matrix multiplication. By the rules, a (3,1) array multiplied by a (1,3) array gives a (3,3) array. Watch what happens with this one.

In [8]:
c=a.T #taking a transpose
c.shape

(3,)

In [9]:
np.dot(a,c)#attempting matrix multiplication

14

It gives a single number (the dot product) instead of a (3,3) matrix. Let's redefine a and try this again

In [10]:
a=np.array([[1],[2],[3]])
a.shape

(3, 1)

Now let's apply the same operations:

In [11]:
c=a.T
c.shape

(1, 3)

In [12]:
np.dot(a,c)

array([[1, 2, 3],
       [2, 4, 6],
       [3, 6, 9]])

Hence, even later when using the random module of numpy, don't use rank-1 vectors. Use column arrays instead.
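
If you already have a rank-1 array, you don't need to retype it. A small sketch of two common ways to turn it into a column or row array (the variable names here are just for illustration):

```python
import numpy as np

v = np.array([1, 2, 3])    # rank-1 array, shape (3,)
col = v.reshape(-1, 1)     # column array, shape (3, 1)
row = v[np.newaxis, :]     # row array, shape (1, 3)

print(v.shape, col.shape, row.shape)   # (3,) (3, 1) (1, 3)
print(np.dot(col, row).shape)          # (3, 3)
```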

Now let's get back to it. To find the data type of an array:

In [13]:
print("Data Type of a : "+str(a.dtype)+"\nData Type of b : "+str(b.dtype))

Data Type of a : int32
Data Type of b : float64


You can also set the type when defining the array.
Ex:

In [14]:
a=np.array([[1],[2],[3]],dtype='int16')
a.dtype

dtype('int16')

In [15]:
a=np.array([[1],[2],[3]],dtype='int8')
a.dtype

dtype('int8')

We can also find the size of each item in the array(in bytes)

In [16]:
a.itemsize#8 bits from "int8" is 1 byte

1

In [17]:
b.itemsize#float 64 bits is 8 bytes as 64/8 is 8

8

We can also find the total size of the array using a.nbytes. Let's check if it's correct

In [18]:
assert a.nbytes==a.size*a.itemsize #no error 

## Accessing/Changing elements, rows and columns
Now let's see how we access elements, rows, columns etc.

In [19]:
d=np.array([[1,2,3,4,5,6,7],[8,9,10,11,12,13,14]])
d.shape

(2, 7)

In [20]:
d[1,5]#you can access elements by using d[i,j,k,...] just like normal arrays

13

In [21]:
d[1,:]#you can access a row by using : (all) for the column index

array([ 8,  9, 10, 11, 12, 13, 14])

In [22]:
d[:,2]#access a column by using : (all) for the row index

array([ 3, 10])

In [23]:
d[1,-2]#we can also use negative indices to reference from the back

13

You can also take slices by giving a start index and an end index. Remember that the end index is not inclusive.

We can use the array_equal function in numpy to check if both are equal or not

In [24]:
assert np.array_equal(list(range(9,13)),d[1,1:5])

You can also change the step size between the indices.

In [25]:
d[1,1:-1:2]

array([ 9, 11, 13])

We can also assign numbers, rows and columns this way

In [26]:
d[1,:]=[9, 10, 11, 12, 13, 14,15]
d

array([[ 1,  2,  3,  4,  5,  6,  7],
       [ 9, 10, 11, 12, 13, 14, 15]])

Let's try with a 3d array

In [27]:
b=np.array([[[1,2],[3,4]],[[5,6],[7,8]]])
b

array([[[1, 2],
        [3, 4]],

       [[5, 6],
        [7, 8]]])

In [28]:
b[1]

array([[5, 6],
       [7, 8]])

In [29]:
b[1][0]

array([5, 6])

In [30]:
b[1,0,1]

6

In [31]:
b[1]=[[9,10],[11,12]]
b

array([[[ 1,  2],
        [ 3,  4]],

       [[ 9, 10],
        [11, 12]]])

## Different Types of Inbuilt Arrays

Numpy lets us create some commonly used arrays directly, like the all-zeros array, the all-ones array, etc. Let's take a look at some of them.

As I said earlier, don't pass a single number like 3 as the shape of these arrays, as that gives a rank-1 array that can mess up your code.

First let's look at the zeros function in numpy

In [32]:
a=np.zeros(5) #Don't do this as it becomes a vector and we have discussed this earlier
a.shape

(5,)

In [33]:
# a=np.zeros(5,1) #also don't do this, as it will give an error. The shape is passed into the function
#                   as a single tuple, so there will be 2 pairs of parentheses: np.zeros((5,1))

In [34]:
a=np.zeros((5,5)) #correct: the shape is passed as a tuple
print(a)

[[0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]]


In [35]:
a=np.zeros((5,5,5),dtype="int8")
print(a)

[[[0 0 0 0 0]
  [0 0 0 0 0]
  [0 0 0 0 0]
  [0 0 0 0 0]
  [0 0 0 0 0]]

 [[0 0 0 0 0]
  [0 0 0 0 0]
  [0 0 0 0 0]
  [0 0 0 0 0]
  [0 0 0 0 0]]

 [[0 0 0 0 0]
  [0 0 0 0 0]
  [0 0 0 0 0]
  [0 0 0 0 0]
  [0 0 0 0 0]]

 [[0 0 0 0 0]
  [0 0 0 0 0]
  [0 0 0 0 0]
  [0 0 0 0 0]
  [0 0 0 0 0]]

 [[0 0 0 0 0]
  [0 0 0 0 0]
  [0 0 0 0 0]
  [0 0 0 0 0]
  [0 0 0 0 0]]]


Now, let's take a look at an all 1s matrix using the ones function. 

We can also specify the datatype in each of these functions, just like when defining an array

In [36]:
a=np.ones((5,5),dtype="int8")
print(a)

[[1 1 1 1 1]
 [1 1 1 1 1]
 [1 1 1 1 1]
 [1 1 1 1 1]
 [1 1 1 1 1]]


If you want any other number, there are 2 ways: multiply a ones array by that number, or use the np.full function

In [37]:
a=5*np.ones((5,5))
b=np.full((5,5),5)
print("a = "+str(a)+"\nb = "+str(b))
assert np.array_equal(a,b)

a = [[5. 5. 5. 5. 5.]
 [5. 5. 5. 5. 5.]
 [5. 5. 5. 5. 5.]
 [5. 5. 5. 5. 5.]
 [5. 5. 5. 5. 5.]]
b = [[5 5 5 5 5]
 [5 5 5 5 5]
 [5 5 5 5 5]
 [5 5 5 5 5]
 [5 5 5 5 5]]


There is another function named full_like, which takes another array as input and returns a new array of the same shape and type with every element set to the same value

In [38]:
a=np.array([[1,2,3,4,5,6,7],[8,9,10,11,12,13,14]])
print("Original array:"+str(a))
a=np.full_like(a,5)
print("\nAfter full_like:"+str(a))

Original array:[[ 1  2  3  4  5  6  7]
 [ 8  9 10 11 12 13 14]]

After full_like:[[5 5 5 5 5 5 5]
 [5 5 5 5 5 5 5]]


Or you can just use the a.shape property with the full function

In [39]:
a=np.array([[1,2,3,4,5,6,7],[8,9,10,11,12,13,14]])
print("Original array:"+str(a))
a=np.full(a.shape,5)
print("\nAfter full:"+str(a))

Original array:[[ 1  2  3  4  5  6  7]
 [ 8  9 10 11 12 13 14]]

After full:[[5 5 5 5 5 5 5]
 [5 5 5 5 5 5 5]]


You will need to generate random numbers later on in machine learning, as setting everything to 0 or 1 is not good enough. For that you can use the numpy.random.rand function (generates samples from a uniform distribution in the range [0,1)) and the numpy.random.randn function (generates samples from the standard normal distribution).

This is how they look:
<img src="randvsrandn.png">

In [40]:
a=np.random.rand(2,3)
print(a)

[[0.17398503 0.88824055 0.76628661]
 [0.38719241 0.49153998 0.99837073]]


In [41]:
a=np.random.randn(2,3)
print(a)

[[-1.06330097 -0.73158693 -1.46685323]
 [-0.22090452 -1.28561888 -0.54978944]]


Now the thing to remember is that these random functions don't take tuples. So if you want to use a tuple (like the shape of a matrix), you will have to either pass in the dimensions manually or use the np.random.random_sample function

In [42]:
a=np.array([[1,2,3,4,5,6,7],[8,9,10,11,12,13,14]])
print("Shape of a = "+str(a.shape))
# b=np.random.randn(a.shape) #Won't work. will give error
b=np.random.randn(a.shape[0],a.shape[1])
print("\nb(with randn)= "+str(b))
b=np.random.random_sample(a.shape)
print("\nb(with random_sample)= "+str(b))

Shape of a = (2, 7)

b(with randn)= [[-0.4417382   0.19353968 -0.73147385  1.5844313   0.78626688 -1.08971302
   1.77074155]
 [-1.11608753  0.41533084  0.97514621  1.81895245  0.36978065 -0.22701105
   1.37679962]]

b(with random_sample)= [[0.70425723 0.75975535 0.50529044 0.75689667 0.25463554 0.22810717
  0.02583974]
 [0.34646878 0.09064522 0.6052767  0.02320914 0.94409759 0.72301302
  0.38662015]]
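
Another option, sketched below, is to unpack the shape tuple with `*`, which works for arrays of any number of dimensions without indexing each axis by hand:

```python
import numpy as np

a = np.array([[1, 2, 3, 4, 5, 6, 7], [8, 9, 10, 11, 12, 13, 14]])

# The * unpacks the tuple (2, 7) into the two separate arguments 2 and 7.
b = np.random.randn(*a.shape)
c = np.random.rand(*a.shape)

print(b.shape, c.shape)   # (2, 7) (2, 7)
```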


We can also get random integers with the np.random.randint function.

In [43]:
np.random.randint(1,10,size=(3,3)) #gives nos between 1 and 10(10 not inclusive) with array size of 3x3

array([[9, 9, 9],
       [2, 9, 4],
       [7, 2, 8]])

You can also create an identity matrix, which is a matrix of 0s with 1s on the diagonal

In [44]:
np.identity(5)#needs only 1 param as the identity matrix is a square matrix i.e. height=width

array([[1., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0.],
       [0., 0., 1., 0., 0.],
       [0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 1.]])

You can also repeat the array across different axes.

In [45]:
a=np.array([[1,2,3]])
print("Repeat across X axes : "+str(np.repeat(a,3,axis=0)))
print("\nRepeat across Y axes : "+str(np.repeat(a,3,axis=1)))

Repeat across X axes : [[1 2 3]
 [1 2 3]
 [1 2 3]]

Repeat across Y axes : [[1 1 1 2 2 2 3 3 3]]


## Copying Arrays Problem

In Python, a variable is just a reference to an object, so assigning an array to another name does not copy it. When you want a separate copy of an array you have to be careful, or both names will point to the same array. Here is an example to show that:

In [46]:
a=np.array([[1,2,3]])
# b=a #this does not copy: both names would refer to the same array
b=a.copy() #use copy function to make a copy of a and save to b
b[0,0]=4
print("b = "+str(b))
print("a = "+str(a))

b = [[4 2 3]]
a = [[1 2 3]]
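
To see the problem itself, here is a short sketch of what happens with plain assignment, next to the `copy` version. `np.shares_memory` reports whether two arrays use the same underlying data; the names `alias` and `independent` are only illustrative:

```python
import numpy as np

a = np.array([[1, 2, 3]])
alias = a            # no copy: both names refer to the same array
alias[0, 0] = 4      # this also changes a

print(a)                           # [[4 2 3]]
print(np.shares_memory(a, alias))  # True

independent = a.copy()   # a real copy with its own data
independent[0, 1] = 99   # a is not affected

print(a)                                 # [[4 2 3]]
print(np.shares_memory(a, independent))  # False
```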


## Mathematical functions in Numpy

Let's first take a look at element-wise mathematics that you can do with numpy.

In [47]:
a=np.array([[1,2,3,4]])
a=a+[[2,2,2,2]]
print(a)

[[3 4 5 6]]


This time a had only 4 elements. Now imagine a was a 100x100 matrix or, worse, a 1000x1000 matrix. How would you add 2 to it?

Yes, you could use the full function, but then you would have to check the shapes every time you want to do anything with matrices. This takes away simplicity, and Python is all about simplicity. This is where Broadcasting comes into play.

If you apply an element-wise operation to 2 arrays with different shapes, Numpy compares the shapes axis by axis, starting from the last one. Two axes are compatible if they are equal or if one of them is 1. Numpy then (virtually) repeats the smaller array along its axes of length 1 so that both have the same shape, and does the operation.

Ex: if a=[[1,2,3,4]] and you add 2 to it: [[1,2,3,4]] has shape (1,4) and the scalar 2 is treated as having shape (1,1). The first axes match and the second axis of 2 has length 1, so Numpy replicates it along that axis, turning 2 into [[2,2,2,2]], and adds. The result is the same as before.

In [48]:
a=np.array([[1,2,3,4]])
a=a+2
print(a)

[[3 4 5 6]]
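
Broadcasting is not limited to scalars. The sketch below adds a column array to a row array and also shows a pair of shapes that cannot be broadcast (the example values are made up for illustration):

```python
import numpy as np

col = np.array([[0], [10], [20]])   # shape (3, 1)
row = np.array([[1, 2, 3, 4]])      # shape (1, 4)

# Axes of length 1 are stretched to match, so the result has shape (3, 4).
total = col + row
print(total)
# [[ 1  2  3  4]
#  [11 12 13 14]
#  [21 22 23 24]]

# Shapes (1, 4) and (1, 3) cannot be broadcast: 4 != 3 and neither is 1.
try:
    row + np.array([[1, 2, 3]])
except ValueError as err:
    print("Broadcast failed:", err)
```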


The same applies to all other element-wise operations like -, *, /, exponentiation (2 asterisks, **), etc.

In [49]:
a=np.array([[1,2,3,4]])
a-=2
print(a)

[[-1  0  1  2]]


In [50]:
a=np.array([[1,2,3,4]])
a*=2
print(a)

[[2 4 6 8]]


Now let's take a look at trigonometric functions

In [51]:
a=np.array([[1,2,3,4]])
print("Sin of a = "+str(np.sin(a))+"\nCos of a = "+str(np.cos(a)))

Sin of a = [[ 0.84147098  0.90929743  0.14112001 -0.7568025 ]]
Cos of a = [[ 0.54030231 -0.41614684 -0.9899925  -0.65364362]]


#### Matrix Multiplication
Now let's take a look at some of the linear algebra stuff

In [52]:
a=np.ones((3,2))
b=np.full((2,3),2)
print("a="+str(a)+"\n\nb="+str(b))
print("\na.b="+str(np.matmul(a,b)))

a=[[1. 1.]
 [1. 1.]
 [1. 1.]]

b=[[2 2 2]
 [2 2 2]]

a.b=[[4. 4. 4.]
 [4. 4. 4.]
 [4. 4. 4.]]
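
Python also has the `@` operator for matrix multiplication, which Numpy arrays support. A small sketch showing that it gives the same result as `np.matmul` for these 2D arrays:

```python
import numpy as np

a = np.ones((3, 2))
b = np.full((2, 3), 2)

product = a @ b    # same result as np.matmul(a, b)
print(product.shape)                              # (3, 3)
print(np.array_equal(product, np.matmul(a, b)))   # True
```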


To find the determinant of a matrix, use the np.linalg.det() function

In [53]:
c=np.identity(10)
np.linalg.det(c)# the determinant of any identity matrix is 1

1.0

To find the transpose of a matrix, use mat.T

In [54]:
a=np.array([[1,2,3]])
a.T

array([[1],
       [2],
       [3]])

For more advanced matrix operations like the trace of a matrix, eigenvalues, etc., visit this link for the documentation of the numpy linear algebra library<br>
<a href="https://docs.scipy.org/doc/numpy/reference/routines.linalg.html">Numpy Linear Algebra Docs Link</a>
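
As a quick taste of those functions, here is a minimal sketch using a small diagonal matrix chosen so the results are easy to check by hand:

```python
import numpy as np

m = np.array([[2.0, 0.0],
              [0.0, 3.0]])

print(np.trace(m))    # 5.0 (sum of the diagonal)

eigenvalues, eigenvectors = np.linalg.eig(m)
print(np.sort(eigenvalues))   # [2. 3.]

inverse = np.linalg.inv(m)
print(np.allclose(m @ inverse, np.identity(2)))   # True
```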

## Statistics with Numpy

- Min
- Max
- Avg


In [55]:
data=np.array([[1,2],[3,4],[5,6]])
print(data)

[[1 2]
 [3 4]
 [5 6]]


In [56]:
np.min(data)

1

In [57]:
np.max(data)

6

We can also use the axis argument to decide along which axis to find the min/max/etc.

In [58]:
np.min(data,axis=0)

array([1, 2])

In [59]:
np.min(data,axis=1)

array([1, 3, 5])

In [60]:
print("Sum : "+str(np.sum(data))+"\nSum(along axis 0) : "+str(np.sum(data,axis=0))+"\nSum(along axis 1) : "+str(np.sum(data,axis=1)))

Sum : 21
Sum(along axis 0) : [ 9 12]
Sum(along axis 1) : [ 3  7 11]
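
The average works the same way. A short sketch with `np.mean` on the same `data` array, with and without the axis argument:

```python
import numpy as np

data = np.array([[1, 2], [3, 4], [5, 6]])

print(np.mean(data))           # 3.5
print(np.mean(data, axis=0))   # [3. 4.]
print(np.mean(data, axis=1))   # [1.5 3.5 5.5]
```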


## Reorganizing Arrays

1.Reshape: You can use reshape to reshape an array as long as the product of all the new dimensions equals the size of the array

In [61]:
a=np.array([[1,2],[3,4],[5,6],[7,8]])
print(a.shape)
a=a.reshape((8,1))
print("\n a(after reshape) = "+str(a))
a=a.reshape((1,8))
print("\n a(after reshape) = "+str(a))

# a=a.reshape((4,4)) #can't do this as 4*4!=4*2
# print("\n a(after reshape) = "+str(a))

(4, 2)

 a(after reshape) = [[1]
 [2]
 [3]
 [4]
 [5]
 [6]
 [7]
 [8]]

 a(after reshape) = [[1 2 3 4 5 6 7 8]]


We can also reshape it into 3 dimensions, as long as the product of all the dimensions still equals the size

In [62]:
assert 2*2*2==a.size
a=a.reshape((2,2,2))
print("a = "+str(a))

a = [[[1 2]
  [3 4]]

 [[5 6]
  [7 8]]]
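
If you don't want to work out one of the dimensions yourself, you can pass -1 for it and Numpy infers the value from the size. A minimal sketch:

```python
import numpy as np

a = np.arange(1, 9)          # 8 elements: 1..8
b = a.reshape((2, -1))       # -1 is inferred as 4
c = a.reshape((-1, 2, 2))    # -1 is inferred as 2

print(b.shape)   # (2, 4)
print(c.shape)   # (2, 2, 2)
```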


2.vstack: To stack any number of matrices vertically, you can use np.vstack(). It takes a list of arrays and stacks them vertically

In [63]:
a=[1,2,3,4]
b=[5,6,7,8]
np.vstack([a,b,a,b])

array([[1, 2, 3, 4],
       [5, 6, 7, 8],
       [1, 2, 3, 4],
       [5, 6, 7, 8]])

3.hstack: Same as vstack but stacks them horizontally

In [64]:
np.hstack([a,b,a,b])

array([1, 2, 3, 4, 5, 6, 7, 8, 1, 2, 3, 4, 5, 6, 7, 8])
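
With 2D inputs the difference between the two is easier to see. A small sketch, using two 2x2 arrays made up for illustration:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])   # shape (2, 2)
b = np.array([[5, 6], [7, 8]])   # shape (2, 2)

print(np.vstack([a, b]).shape)   # (4, 2): rows are added
print(np.hstack([a, b]))         # shape (2, 4): columns are added
# [[1 2 5 6]
#  [3 4 7 8]]
```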

## Miscellaneous Functions

#### Load data from a file

If you don't want to use pandas, etc. you can use the genfromtxt function in numpy to make arrays from data

In [65]:
data=np.genfromtxt('data.txt',delimiter=",")
data

array([[  1.,  13.,  21.,  11., 196.,  75.,   4.,   3.,  34.,   6.,   7.,
          8.,   0.,   1.,   2.,   3.,   4.,   5.],
       [  3.,  42.,  12.,  33., 766.,  75.,   4.,  55.,   6.,   4.,   3.,
          4.,   5.,   6.,   7.,   0.,  11.,  12.],
       [  1.,  22.,  33.,  11., 999.,  11.,   2.,   1.,  78.,   0.,   1.,
          2.,   9.,   8.,   7.,   1.,  76.,  88.]])

In [66]:
data=data.astype('int32')#convert the data to integers instead
data

array([[  1,  13,  21,  11, 196,  75,   4,   3,  34,   6,   7,   8,   0,
          1,   2,   3,   4,   5],
       [  3,  42,  12,  33, 766,  75,   4,  55,   6,   4,   3,   4,   5,
          6,   7,   0,  11,  12],
       [  1,  22,  33,  11, 999,  11,   2,   1,  78,   0,   1,   2,   9,
          8,   7,   1,  76,  88]])
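
If you want to try this without a `data.txt` file on disk, the sketch below feeds `genfromtxt` an in-memory text buffer instead. The contents of `text` are made-up sample values, not the workshop's actual file:

```python
import io
import numpy as np

# Hypothetical file contents, used here in place of data.txt.
text = "1,13,21\n3,42,12\n1,22,33"

data = np.genfromtxt(io.StringIO(text), delimiter=",")
print(data.dtype, data.shape)   # float64 (3, 3)

# The dtype argument reads the values as integers directly.
data_int = np.genfromtxt(io.StringIO(text), delimiter=",", dtype="int32")
print(data_int.dtype)           # int32
```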

## Boolean Masking

Suppose you have an array and you want to turn all the elements that are greater than or equal to some value into 0. For that you will need a boolean mask of the array. Let's see how to do that

In [67]:
mask=data<20 #gives a boolean mask that is True where the value is less than 20 and False otherwise
mask

array([[ True,  True, False,  True, False, False,  True,  True, False,
         True,  True,  True,  True,  True,  True,  True,  True,  True],
       [ True, False,  True, False, False, False,  True, False,  True,
         True,  True,  True,  True,  True,  True,  True,  True,  True],
       [ True, False, False,  True, False,  True,  True,  True, False,
         True,  True,  True,  True,  True,  True,  True, False, False]])

You can now pass this mask into the index and you get only the values that are true.

In [68]:
data[mask]

array([ 1, 13, 11,  4,  3,  6,  7,  8,  0,  1,  2,  3,  4,  5,  3, 12,  4,
        6,  4,  3,  4,  5,  6,  7,  0, 11, 12,  1, 11, 11,  2,  1,  0,  1,
        2,  9,  8,  7,  1])

To apply this mask, just convert the boolean mask to integers and then multiply it with the data element-wise

In [69]:
mask=mask.astype('int32')
mask

array([[1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1],
       [1, 0, 1, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
       [1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0]])

In [70]:
data_masked=data*mask
data_masked

array([[ 1, 13,  0, 11,  0,  0,  4,  3,  0,  6,  7,  8,  0,  1,  2,  3,
         4,  5],
       [ 3,  0, 12,  0,  0,  0,  4,  0,  6,  4,  3,  4,  5,  6,  7,  0,
        11, 12],
       [ 1,  0,  0, 11,  0, 11,  2,  1,  0,  0,  1,  2,  9,  8,  7,  1,
         0,  0]])
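
The same result can be reached without converting the mask to integers. The sketch below shows two alternatives on a small made-up array: `np.where`, and assigning through a boolean mask:

```python
import numpy as np

data = np.array([[1, 13, 21], [42, 12, 33]])

# np.where keeps values where the condition is True and uses 0 elsewhere.
masked = np.where(data < 20, data, 0)
print(masked)
# [[ 1 13  0]
#  [ 0 12  0]]

# Assigning through a boolean mask changes the array in place,
# so work on a copy to keep the original data.
in_place = data.copy()
in_place[in_place >= 20] = 0
print(np.array_equal(masked, in_place))   # True
```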

You can also combine conditions on the data with bitwise operators like & (and), | (or) and ~ (not)

In [71]:
((data>20)&(data<100))# a mask of data that is >20 and <100

array([[False, False,  True, False, False,  True, False, False,  True,
        False, False, False, False, False, False, False, False, False],
       [False,  True, False,  True, False,  True, False,  True, False,
        False, False, False, False, False, False, False, False, False],
       [False,  True,  True, False, False, False, False, False,  True,
        False, False, False, False, False, False, False,  True,  True]])

## Advanced Indexing

Did you know that you can index a numpy array with lists? If not, let's take a look at that

In [72]:
a=np.array([1,2,3,4,5,6,7,8,9])
a[[1,2,5,8]]

array([2, 3, 6, 9])

To check whether any element along an axis satisfies a condition, you can use the function np.any

In [73]:
np.any(data>50,axis=0)

array([False, False, False, False,  True,  True, False,  True,  True,
       False, False, False, False, False, False, False,  True,  True])

In [74]:
np.any(data>50,axis=1)

array([ True,  True,  True])
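
The result of `np.any` is itself a boolean array, so it can be used as an index. The sketch below (on a small made-up array) keeps only the columns that contain a value above 50, and also shows list indexing on a 2D array:

```python
import numpy as np

data = np.array([[1, 60, 3], [4, 5, 6], [70, 8, 9]])

has_large = np.any(data > 50, axis=0)   # one True/False per column
print(has_large)                        # [ True  True False]
print(data[:, has_large])               # only columns containing a value > 50
# [[ 1 60]
#  [ 4  5]
#  [70  8]]

# Two index lists pick the elements at (0, 1) and (2, 0).
print(data[[0, 2], [1, 0]])             # [60 70]
```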

## Vectorization

Vectorization speeds up Python code by replacing explicit loops with operations on whole arrays, which can cut the running time considerably. Common vectorized operations on vectors include the dot product (also known as the scalar product, as it produces a single number), the outer product (which gives a matrix of dimension length x length for two vectors of the same length), and element-wise multiplication (which multiplies the elements at the same index, so the shape stays unchanged).

In [75]:
import time
n=1000000
a=np.random.randn(n)
b=np.random.randn(n)

tic=time.time()
c1=0
for i in range(n):
    c1+=a[i]*b[i]
toc=time.time()
print("Time taken using basic for loops : "+str(1000*(toc-tic))+"ms")

tic=time.time()
c2=np.dot(a,b)
toc=time.time()
print("Time taken using Vectorized version : "+str(1000*(toc-tic))+"ms")

assert np.isclose(c1,c2) #compare with a tolerance, as the two sums round differently

Time taken using basic for loops : 711.5030288696289ms
Time taken using Vectorized version : 0.9763240814208984ms


Vectorization just means using the built-in functions in numpy instead of for loops. You should avoid for loops wherever you can, as they take more time and are less efficient than the vectorized versions. 

The difference is that these functions operate on whole arrays at once in compiled code, instead of element by element in Python. That is the magic of Numpy, and why we should embrace it and use it in our projects.
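
The three vector operations mentioned in this section can each be written in one line. A minimal sketch on two short vectors, small enough to check by hand:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

print(np.dot(a, b))     # 32, a single number
print(np.outer(a, b))   # 3x3 matrix
# [[ 4  5  6]
#  [ 8 10 12]
#  [12 15 18]]
print(a * b)            # [ 4 10 18], same shape as the inputs
```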