
#Numpy
###NumPy, which stands for Numerical Python, is a library consisting of multidimensional array objects and a collection of routines for processing those arrays. Using NumPy, mathematical and logical operations on arrays can be performed. 

#Operations using NumPy
###Mathematical and logical operations on arrays.

###Fourier transforms and routines for shape manipulation.

###Operations related to linear algebra. NumPy has in-built functions for linear algebra and random number generation.

#NumPy – A Replacement for MATLAB
###NumPy is often used along with packages like SciPy (Scientific Python) and Matplotlib (a plotting library). This combination is widely used as a replacement for MATLAB, a popular platform for technical computing.


###Numpy is the most basic yet powerful package for working with data in python.

###If you are going to work on data analysis or machine learning projects, then having a solid understanding of numpy is nearly mandatory.

###This is because other packages for data analysis (like pandas) are built on top of numpy, and the scikit-learn package, which is used to build machine learning applications, works heavily with numpy as well.

###So what does numpy provide?

###At the core, numpy provides the excellent ndarray objects, short for n-dimensional arrays.

###In an ‘ndarray’ object, aka ‘array’, you can store multiple items of the same data type. It is the facilities around the array object that make numpy so convenient for performing math and data manipulations.

###You might wonder, ‘I can store numbers and other objects in a python list itself and do all sorts of computations and manipulations through list comprehensions, for-loops etc. What do I need a numpy array for?’

###Well, there are very significant advantages of using numpy arrays over lists.

In [4]:
#First numpy array
# Create a 1d array from a list
import numpy as np
list1 = [0,1,2,3,4]
arr1d = np.array(list1)

# Print the array and its type
print(type(arr1d))
arr1d

#> class 'numpy.ndarray'
#> array([0, 1, 2, 3, 4])

<class 'numpy.ndarray'>


array([0, 1, 2, 3, 4])

###The key difference between an array and a list is, arrays are designed to handle vectorized operations while a python list is not.

###That means, if you apply a function it is performed on every item in the array, rather than on the whole array object.

###Let’s suppose you want to add the number 2 to every item in the list. The intuitive way to do it is something like this:

In [0]:
#list1 + 2  # raises TypeError: can only concatenate list (not "int") to list

In [6]:
print(arr1d+2)

[2 3 4 5 6]
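
For comparison, here is a minimal sketch of what the same addition looks like on a plain list, next to the vectorized array version. The names `list_plus_2` and `arr_plus_2` are illustrative and do not appear elsewhere in this notebook.

```python
import numpy as np

list1 = [0, 1, 2, 3, 4]
arr1d = np.array(list1)

# A plain list needs an explicit loop (here, a list comprehension)
list_plus_2 = [item + 2 for item in list1]

# An array applies the operation to every item at once
arr_plus_2 = arr1d + 2

print(list_plus_2)   # [2, 3, 4, 5, 6]
print(arr_plus_2)    # [2 3 4 5 6]
```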


#Another characteristic is that, once a numpy array is created, you cannot increase its size. To do so, you will have to create a new array. But such a behavior of extending the size is natural in a list.

#Nevertheless, there are so many more advantages. Let’s find out.

#So, that’s about 1d array. You can also pass a list of lists to create a matrix like a 2d array.

In [7]:
# Create a 2d array from a list of lists
list2 = [[0,1,2], [3,4,5], [6,7,8]]
arr2d = np.array(list2)
arr2d

array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

#You may also specify the datatype by setting the dtype argument. Some of the most commonly used numpy dtypes are: 'float', 'int', 'bool', 'str' and 'object'.

#To control the memory allocations you may choose to use one of ‘float32’, ‘float64’, ‘int8’, ‘int16’ or ‘int32’.
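
As a rough illustration of how the dtype affects memory use, the sketch below stores the same nine values with three different dtypes and prints the bytes per item (`itemsize`) and the total bytes of the data buffer (`nbytes`). The variable names are made up for this example.

```python
import numpy as np

values = [0, 1, 2, 3, 4, 5, 6, 7, 8]

arr_f64 = np.array(values, dtype='float64')
arr_f32 = np.array(values, dtype='float32')
arr_i8 = np.array(values, dtype='int8')

# itemsize is bytes per item; nbytes is the total for the data buffer
for arr in (arr_f64, arr_f32, arr_i8):
    print(arr.dtype, arr.itemsize, arr.nbytes)
```

Smaller dtypes save memory but hold a narrower range of values. For example, 'int8' can only store integers from -128 to 127.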

In [8]:
# Create a float 2d array
list2 = [[0,1,2], [3,4,5], [6,7,8]]
arr2d_f = np.array(list2, dtype='float')
arr2d_f


array([[0., 1., 2.],
       [3., 4., 5.],
       [6., 7., 8.]])

#The decimal point after each number is indicative of the float datatype. You can also convert it to a different datatype using the astype method.

In [9]:
# Convert to 'int' datatype
arr2d_f.astype('int')



array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

In [10]:
# Convert to int then to str datatype
arr2d_f.astype('int').astype('str')

array([['0', '1', '2'],
       ['3', '4', '5'],
       ['6', '7', '8']], dtype='<U21')

#All items in a numpy array must be of the same data type, unlike lists. This is another significant difference.

#However, if you are uncertain about what datatype your array will hold or if you want to hold characters and numbers in the same array, you can set the dtype as 'object'.



In [11]:
#Example bool
# Create a boolean array
arr2d_b = np.array([1, 0, 10], dtype='bool')
arr2d_b


array([ True, False,  True])

In [12]:
#Example object type

# Create an object array to hold numbers as well as strings
arr1d_obj = np.array([1, 'a', 2.3], dtype='object')
arr1d_obj

array([1, 'a', 2.3], dtype=object)

#Finally, you can always convert an array back to a python list using tolist().



In [13]:
# Convert an array back to a list
arr1d_obj.tolist()

[1, 'a', 2.3]

#To summarise, the main differences with python lists are:

#Arrays support vectorised operations, while lists don’t.
#Once an array is created, you cannot change its size. You will have to create a new array or overwrite the existing one.
#Every array has one and only one dtype. All items in it should be of that dtype.
#An equivalent numpy array occupies much less space than a python list of lists.
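
The fixed-size point can be sketched as follows: a list grows in place with append(), while np.append leaves the original array untouched and returns a new, larger one. The names `lst`, `arr` and `arr_bigger` are illustrative only.

```python
import numpy as np

arr = np.array([0, 1, 2])
lst = [0, 1, 2]

# A list grows in place
lst.append(3)

# np.append does not modify arr; it returns a new, larger array
arr_bigger = np.append(arr, 3)

print(lst)          # [0, 1, 2, 3]
print(arr)          # [0 1 2]
print(arr_bigger)   # [0 1 2 3]
```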

#Inspecting the size and shape of a numpy array

In [14]:
# Create a 2d array with 3 rows and 4 columns
list2 = [[1, 2, 3, 4],[3, 4, 5, 6], [5, 6, 7, 8]]
arr2 = np.array(list2, dtype='float')
arr2

# shape
print('Shape: ', arr2.shape)

# dtype
print('Datatype: ', arr2.dtype)

# size
print('Size: ', arr2.size)

# ndim
print('Num Dimensions: ', arr2.ndim)

Shape:  (3, 4)
Datatype:  float64
Size:  12
Num Dimensions:  2


In [15]:
#You can extract specific portions of an array using indexing starting with 0, similar to how you would with python lists.

#But unlike lists, numpy arrays can optionally accept as many parameters in the square brackets as there are dimensions.
list2 = [[1, 2, 3, 4],[3, 4, 5, 6], [5, 6, 7, 8]]
arr2 = np.array(list2, dtype='float')

arr2[:2, :2]


array([[1., 2.],
       [3., 4.]])

In [0]:
#list2[:2, :2]  # raises TypeError: list indices must be integers or slices, not tuple

In [17]:
#Additionally, numpy arrays support boolean indexing.

#A boolean index array is of the same shape as the array-to-be-filtered and it contains only True and False values. The values corresponding to True positions are retained in the output.
list2 = [[1, 2, 3, 4],[3, 4, 5, 6], [5, 6, 7, 8]]
arr2 = np.array(list2, dtype='float')
print(arr2)
b=arr2>3
print(arr2[b])


[[1. 2. 3. 4.]
 [3. 4. 5. 6.]
 [5. 6. 7. 8.]]
[4. 4. 5. 6. 5. 6. 7. 8.]


#Reversing an array works the same way as with lists, but you need to reverse along every axis (dimension) if you want a complete reversal.

In [18]:
print(arr2[::-1])

[[5. 6. 7. 8.]
 [3. 4. 5. 6.]
 [1. 2. 3. 4.]]


In [19]:
print(arr2[::-1,::-1])

[[8. 7. 6. 5.]
 [6. 5. 4. 3.]
 [4. 3. 2. 1.]]


#How to represent missing values and infinity?
# At times there are missing values in the dataset, and we can't just leave them blank


In [20]:
#Missing values can be represented using the np.nan object, while np.inf represents infinity. Let’s place some in arr2.

list2 = [[1, 2, 3, 4],[3, 4, 5, 6], [5, 6, 7, 8]]
arr2 = np.array(list2, dtype='float')

# Insert a nan and an inf
arr2[1,1] = np.nan  # not a number
arr2[1,2] = np.inf  # infinite
arr2

array([[ 1.,  2.,  3.,  4.],
       [ 3., nan, inf,  6.],
       [ 5.,  6.,  7.,  8.]])
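
Once nan and inf values are in an array, a common next step is to locate and replace them. The sketch below uses np.isnan and np.isinf to build a boolean mask. The name `missing_mask` and the replacement value -1 are arbitrary choices for illustration.

```python
import numpy as np

arr2 = np.array([[1, 2, 3, 4], [3, 4, 5, 6], [5, 6, 7, 8]], dtype='float')
arr2[1, 1] = np.nan
arr2[1, 2] = np.inf

# Boolean mask that is True wherever the value is nan or inf
missing_mask = np.isnan(arr2) | np.isinf(arr2)

# Replace the flagged positions with a placeholder value (-1 is an arbitrary choice)
arr2[missing_mask] = -1
print(arr2)
```

np.isnan is needed here because nan does not compare equal to anything, including itself, so a test like arr2 == np.nan never matches.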

#How to compute mean, min, max on the ndarray?


In [21]:

list2 = [[1, 2, 3, 4],[3, 4, 5, 6], [5, 6, 7, 8]]
arr2 = np.array(list2, dtype='float')
# mean, max and min
print("Mean value is: ", arr2.mean())
print("Max value is: ", arr2.max())
print("Min value is: ", arr2.min())

Mean value is:  4.5
Max value is:  8.0
Min value is:  1.0


#However, if you want to compute the minimum values row wise or column wise, pass the axis argument, for example with np.amin (arr2.min(axis=...) works the same way).

In [22]:
# Row wise and column wise min
print("Column wise minimum: ", np.amin(arr2, axis=0))
print("Row wise minimum: ", np.amin(arr2, axis=1))

Column wise minimum:  [1. 2. 3. 4.]
Row wise minimum:  [1. 3. 5.]


#How to create a new array from an existing array?
#If you just assign a portion of an array to another array, the new array you just created actually refers to the parent array in memory.

#That means, if you make any changes to the new array, it will reflect in the parent array as well.

#So to avoid disturbing the parent array, you need to make a copy of it using copy(). All numpy arrays come with the copy() method.

In [23]:
list2 = [[1, 2, 3, 4],[3, 4, 5, 6], [5, 6, 7, 8]]
arr2 = np.array(list2, dtype='float')

arr2a = arr2[:2,:2]  
print(arr2a)
arr2a[:1, :1] = 100  # 100 will reflect in arr2
arr2

[[1. 2.]
 [3. 4.]]


array([[100.,   2.,   3.,   4.],
       [  3.,   4.,   5.,   6.],
       [  5.,   6.,   7.,   8.]])

In [24]:
arr2b = arr2[:2, :2].copy()
arr2b[:1, :1] = 101  # 101 will not reflect in arr2
arr2

array([[100.,   2.,   3.,   4.],
       [  3.,   4.,   5.,   6.],
       [  5.,   6.,   7.,   8.]])
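
If you are unsure whether a new array still refers to its parent, np.shares_memory offers a quick check. This is a small sketch with made-up names (`view_part`, `copy_part`).

```python
import numpy as np

arr2 = np.array([[1, 2, 3, 4], [3, 4, 5, 6], [5, 6, 7, 8]], dtype='float')

view_part = arr2[:2, :2]          # slice: refers to arr2's memory
copy_part = arr2[:2, :2].copy()   # independent copy

print(np.shares_memory(arr2, view_part))   # True
print(np.shares_memory(arr2, copy_part))   # False
print(view_part.base is arr2)              # True
```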

In [25]:
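s="Syntax"
t="Wrong"

l='n'
a='N'

# 'n' occurs in "Syntax"
print( l in s)

# (s or t) evaluates to s, the first truthy operand
print( l in (s or t))

# (s and t) evaluates to t, the last operand when all are truthy
print( a in (s and t))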

True
True
False


#Reshaping and Flattening Multidimensional arrays

Reshaping changes the arrangement of items so that the shape of the array changes while the total number of items stays the same.

Flattening, however, converts a multi-dimensional array to a flat 1d array, and not to any other shape.

First, let’s reshape the arr2 array from 3×4 to 4×3 shape.

In [26]:
list2 = [[1, 2, 3, 4],[3, 4, 5, 6], [5, 6, 7, 8]]
arr2 = np.array(list2, dtype='float')
print(arr2)
print(arr2.shape)

[[1. 2. 3. 4.]
 [3. 4. 5. 6.]
 [5. 6. 7. 8.]]
(3, 4)


In [27]:
arr2.reshape(4,3)

array([[1., 2., 3.],
       [4., 3., 4.],
       [5., 6., 5.],
       [6., 7., 8.]])

In [0]:
#arr2.reshape(2,2)  # raises ValueError: cannot reshape array of size 12 into shape (2,2)

In [29]:
arr2.reshape(2,6)

array([[1., 2., 3., 4., 3., 4.],
       [5., 6., 5., 6., 7., 8.]])
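
The only requirement reshape places on the new shape is that the total number of items matches. As a small sketch, you can pass -1 for one dimension and let NumPy infer it. The names `r1`, `r2` and `r3` are illustrative.

```python
import numpy as np

arr2 = np.array([[1, 2, 3, 4], [3, 4, 5, 6], [5, 6, 7, 8]], dtype='float')

# -1 asks NumPy to infer that dimension from the total number of items (12)
r1 = arr2.reshape(6, -1)
r2 = arr2.reshape(-1, 3)

# The number of dimensions may change too, as long as the item count matches
r3 = arr2.reshape(2, 2, 3)

print(r1.shape)   # (6, 2)
print(r2.shape)   # (4, 3)
print(r3.shape)   # (2, 2, 3)
```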

#What is the difference between flatten() and ravel()?
There are 2 popular ways to flatten an array: the flatten() method and the ravel() method.

The difference between ravel and flatten is that the new array created using ravel is actually a reference to the parent array (a view, whenever NumPy can provide one). So, any changes to the new array will affect the parent as well. In return, ravel is memory efficient since it does not create a copy.

In [30]:
list2 = [[1, 2, 3, 4],[3, 4, 5, 6], [5, 6, 7, 8]]
arr2 = np.array(list2, dtype='int')
arr2.flatten() 

array([1, 2, 3, 4, 3, 4, 5, 6, 5, 6, 7, 8])

In [31]:
b1=arr2.flatten()
b1[0]=3
print(b1)
print(arr2)

[3 2 3 4 3 4 5 6 5 6 7 8]
[[1 2 3 4]
 [3 4 5 6]
 [5 6 7 8]]


In [32]:
b2=arr2.ravel()  #ravel affects the original array
b2[0]=3
print(b2)
print(arr2)

[3 2 3 4 3 4 5 6 5 6 7 8]
[[3 2 3 4]
 [3 4 5 6]
 [5 6 7 8]]
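
One caveat worth checking for yourself: ravel() only refers to the parent when the parent's memory layout allows it. The sketch below uses np.shares_memory to compare flatten(), ravel() on a contiguous array, and ravel() on a transposed array. The variable names are made up for this example.

```python
import numpy as np

arr2 = np.array([[1, 2, 3, 4], [3, 4, 5, 6], [5, 6, 7, 8]], dtype='int')

flat_copy = arr2.flatten()     # always a copy
flat_view = arr2.ravel()       # a view here, because arr2 is contiguous
flat_from_T = arr2.T.ravel()   # the transpose is not contiguous, so this is a copy

print(np.shares_memory(arr2, flat_copy))     # False
print(np.shares_memory(arr2, flat_view))     # True
print(np.shares_memory(arr2, flat_from_T))   # False
```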


#How to create sequences, repetitions and random numbers using numpy?

In [33]:
# Lower limit is 0 by default
print(np.arange(5))  

# 0 to 9
print(np.arange(0, 10))  

# 0 to 9 with step of 2
print(np.arange(0, 10, 2))  

# 10 to 1, decreasing order
print(np.arange(10, 0, -1))



[0 1 2 3 4]
[0 1 2 3 4 5 6 7 8 9]
[0 2 4 6 8]
[10  9  8  7  6  5  4  3  2  1]


You can set the starting and end positions using np.arange. But if you are focussed on the number of items in the array you will have to manually calculate the appropriate step value.

Say you want to create an array of exactly 10 numbers between 1 and 50. The np.linspace function does this for you.

In [34]:
np.linspace(start=1, stop=50, num=10, dtype=int)

#Because the dtype is int, the fractional parts are dropped and the numbers aren't equally spaced. We can use float too.

array([ 1,  6, 11, 17, 22, 28, 33, 39, 44, 50])

In [35]:
np.linspace(start=1, stop=50, num=10, dtype=float)

array([ 1.        ,  6.44444444, 11.88888889, 17.33333333, 22.77777778,
       28.22222222, 33.66666667, 39.11111111, 44.55555556, 50.        ])
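
To see the step value that linspace works out for you, pass retstep=True. The sketch below also rebuilds the same sequence manually from the start value and that step. The names `values`, `step` and `manual` are illustrative.

```python
import numpy as np

# retstep=True also returns the spacing that linspace computed
values, step = np.linspace(start=1, stop=50, num=10, retstep=True)
print(step)   # (50 - 1) / (10 - 1)

# The same sequence built manually from start and step
manual = 1 + step * np.arange(10)
print(np.allclose(values, manual))
```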

#The np.zeros and np.ones functions let you create arrays of desired shape where all the items are either 0’s or 1’s.

In [36]:
np.zeros([2,2])

array([[0., 0.],
       [0., 0.]])

In [37]:
np.ones([2,3])

array([[1., 1., 1.],
       [1., 1., 1.]])

#How to create repeating sequences?
np.tile will repeat a whole list or array n times. Whereas, np.repeat repeats each item n times.

In [38]:
a = [1,2,3] 

# Repeat whole of 'a' two times
print('Tile:   ', np.tile(a, 2))

# Repeat each element of 'a' two times
print('Repeat: ', np.repeat(a, 2))

Tile:    [1 2 3 1 2 3]
Repeat:  [1 1 2 2 3 3]


#How to generate random numbers?
The random module provides nice functions to generate random numbers (and also statistical distributions) of any given shape.

#But why random numbers?

Randomness is a big part of machine learning.

Randomness is used as a tool or a feature in preparing data and in learning algorithms that map input data to output data in order to make predictions.

#Randomness in Machine Learning
There are many sources of randomness in machine learning.

Randomness is used as a tool to help the learning algorithms be more robust and ultimately result in better predictions and more accurate models.

##Randomness in Data

There is always a random element to the sample of data that we use in ML.

The data may have mistakes or errors.

More deeply, the data contains noise that can obscure the crystal-clear relationship between the inputs and the outputs.

##Randomness in Evaluation

We work with only a small sample of the data. Therefore, we harness randomness when evaluating a model, such as using k-fold cross-validation to fit and evaluate the model on different subsets of the available dataset.

We do this to see how the model works on average rather than on a specific set of data.
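
As a rough sketch of the idea behind k-fold cross-validation, the snippet below shuffles the row indices of a hypothetical 10-sample dataset and splits them into 5 folds, each of which takes one turn as the test set. The sizes, the seed and all names are assumptions made for illustration. In practice a library utility would typically handle this.

```python
import numpy as np

n_samples = 10   # hypothetical dataset size
k = 5            # hypothetical number of folds

rng = np.random.RandomState(0)           # seeded so the split is repeatable
shuffled = rng.permutation(n_samples)    # random order of the row indices
folds = np.array_split(shuffled, k)      # k roughly equal groups

for i, test_idx in enumerate(folds):
    # Every fold takes one turn as the test set; the rest is for training
    train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
    print(f"fold {i}: train={np.sort(train_idx)} test={np.sort(test_idx)}")
```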

##Randomness in Algorithms
Machine learning algorithms use randomness when learning from a sample of data.

This is a feature, where the randomness allows the algorithm to achieve a better performing mapping of the data than if randomness was not used. Randomness is a feature, which allows an algorithm to attempt to avoid overfitting the small training set and generalize to the broader problem.

Algorithms that use randomness are often called stochastic algorithms rather than random algorithms. This is because although randomness is used, the resulting model is limited to a narrower range, i.e. the randomness is limited.

Some clear examples of randomness used in machine learning algorithms include:

The shuffling of training data prior to each training epoch in stochastic gradient descent.
The random subset of input features chosen for split points in a random forest algorithm.
The random initial weights in an artificial neural network.

#Pseudorandom Number Generators
The source of randomness that we inject into our programs and algorithms is a mathematical trick called a pseudorandom number generator.

We do not need true randomness in machine learning. Instead we can use pseudorandomness. Pseudorandomness is a sample of numbers that look close to random, but were generated using a deterministic process.

Shuffling data and initializing coefficients with random values use pseudorandom number generators. These little programs are often a function that you can call that will return a random number. Called again, they will return a new random number. Wrapper functions are often also available and allow you to get your randomness as an integer, floating point, within a specific distribution, within a specific range, and so on.

The numbers are generated in a sequence. The sequence is deterministic and is seeded with an initial number. If you do not explicitly seed the pseudorandom number generator, then it may use the current system time in seconds or milliseconds as the seed.

The value of the seed does not matter. Choose anything you wish. What does matter is that the same seeding of the process will result in the same sequence of random numbers.
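
To make the idea of a seeded, deterministic sequence concrete, here is a toy linear congruential generator. It is only an illustration of the principle and is not the algorithm NumPy uses. The function name `lcg` and the constants are assumptions chosen for the example.

```python
def lcg(seed, n, a=1664525, c=1013904223, m=2**32):
    """Return n pseudorandom floats in [0, 1) from a linear congruential generator."""
    state = seed
    out = []
    for _ in range(n):
        state = (a * state + c) % m   # deterministic update of the state
        out.append(state / m)         # scale to [0, 1)
    return out

run1 = lcg(seed=42, n=3)
run2 = lcg(seed=42, n=3)
run3 = lcg(seed=7, n=3)

print(run1 == run2)   # True: same seed, same sequence
print(run1 == run3)   # False: different seed, different sequence
```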



In [0]:
#Syntax - np.random.rand(d0, d1, ...), one argument per dimension
#either use this 
from numpy.random import rand   
print(rand(2,2))

#or
print(np.random.rand(2,2))


In [41]:
# Random numbers between [0,1) of shape 2,2
print(np.random.rand(2,2))

# Normal distribution with mean=0 and variance=1 of shape 2,2
print(np.random.randn(2,2))

# Random integers between [0, 10) of shape 2,2
print(np.random.randint(0, 10, size=[2,2]))

# One random number between [0,1)
print(np.random.random())

# Random numbers between [0,1) of shape 2,2
print(np.random.random(size=[2,2]))

# Pick 10 items from a given list, with equal probability
print(np.random.choice(['a', 'e', 'i', 'o', 'u'], size=10))  

# Pick 10 items from a given list with a predefined probability 'p'
print(np.random.choice(['a', 'e', 'i', 'o', 'u'], size=10, p=[0.3, .1, 0.1, 0.4, 0.1]))  # picks more o's

[[0.2315273  0.00553812]
 [0.04391763 0.40865781]]
[[ 0.98213739  0.14718366]
 [-1.5411265  -1.82583622]]
[[2 4]
 [8 9]]
0.3325283474714408
[[0.9323251  0.52738211]
 [0.06254249 0.00900818]]
['a' 'e' 'e' 'a' 'i' 'e' 'e' 'a' 'u' 'e']
['o' 'a' 'o' 'u' 'o' 'o' 'u' 'i' 'o' 'o']


Now, every time you run any of the above functions, you get a different set of random numbers.

If you want to repeat the same set of random numbers every time, you need to set the seed or the random state. The seed can be any value. The only requirement is you must set the seed to the same value every time you want to generate the same set of random numbers.
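
A minimal sketch of both options: np.random.seed resets the global generator, while an np.random.RandomState object carries its own seed. The seed value 100 and the variable names are arbitrary.

```python
import numpy as np

np.random.seed(100)
first = np.random.rand(3)

np.random.seed(100)        # reset to the same seed
second = np.random.rand(3)

print(np.array_equal(first, second))   # True

# A RandomState object keeps its own seed, separate from the global one
rng = np.random.RandomState(100)
third = rng.rand(3)
print(np.array_equal(first, third))    # True
```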

#How to get the unique items and the counts?
The np.unique method can be used to get the unique items. If you want the repetition counts of each item, set the return_counts parameter to True.

In [42]:
# Create random integers of size 10 between [0,10)
np.random.seed(100)
arr_rand = np.random.randint(0, 10, size=10)
print(arr_rand)


[8 8 3 7 7 0 4 2 5 2]


In [43]:
# Get the unique items and their counts
uniqs, counts = np.unique(arr_rand, return_counts=True)
print("Unique items : ", uniqs)
print("Counts       : ", counts)


Unique items :  [0 2 3 4 5 7 8]
Counts       :  [1 2 1 1 1 2 2]
