# NumPy
What is NumPy? <br>
A Python library for working with large, multidimensional arrays. <br>


In [4]:
import numpy as np

In [5]:
#Code along with us
#x=np.array([9,23,32,47,53])
#print(x, type(x))

Bro, we already have lists and tuples, right? Why NumPy? <br>
Let's compare.

In [6]:
N = 100000

Let's create a list with N entries containing the integers from 0 to N-1 and multiply it element-wise with itself, i.e. square each entry.<br> The `%%time` cell magic reports how long the cell takes to run.

In [7]:
%%time
list_ = list(range(N))
for i in range(N):
    list_[i] = list_[i] * list_[i]

CPU times: user 10.7 ms, sys: 860 µs, total: 11.6 ms
Wall time: 11.2 ms


Now, let's replicate this process using NumPy. Observe how much simpler the syntax is and how much lower the run time is.

In [8]:
%%time
arr = np.arange(N)
arr = arr * arr

CPU times: user 414 µs, sys: 557 µs, total: 971 µs
Wall time: 524 µs
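
`%%time` is an IPython/Jupyter magic, so it won't work in a plain Python script. Below is a minimal sketch of the same comparison using the standard-library `timeit` module. The function names `square_list` and `square_numpy` are illustrative, and the timings you get will depend on your machine.

```python
import timeit

import numpy as np

N = 100_000

def square_list(n):
    # Pure-Python version: loop over every element of a list
    values = list(range(n))
    for i in range(n):
        values[i] = values[i] * values[i]
    return values

def square_numpy(n):
    # Vectorised version: one element-wise multiply over the whole array.
    # int64 is set explicitly because squares up to ~1e10 overflow int32,
    # which is the default integer type on some platforms.
    arr = np.arange(n, dtype=np.int64)
    return arr * arr

# Both approaches should produce exactly the same numbers
assert square_list(N) == square_numpy(N).tolist()

# Time 3 batches of 10 calls each and keep the fastest batch
t_list = min(timeit.repeat(lambda: square_list(N), number=10, repeat=3))
t_numpy = min(timeit.repeat(lambda: square_numpy(N), number=10, repeat=3))
print(f"list:  {t_list:.4f} s for 10 runs")
print(f"numpy: {t_numpy:.4f} s for 10 runs")
```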


## Creating NumPy arrays

In [9]:
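# Creating an array with the integers 0 to 11
arr = np.arange(12)
# Data type of the entries (the default integer type is platform-dependent; here int64)
print(arr.dtype)
# Number of dimensions (axes)
print(arr.ndim)
# Shape: length along each axis
print(arr.shape)
# Size: total number of elements
print(arr.size)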

int64
1
(12,)
12


Let's create a two-dimensional array.

In [10]:
# Code along with us
arr = np.random.randn(2,3)
arr

array([[ 0.51273553, -0.43412329,  1.24507346],
       [ 0.38858017, -0.88240967, -0.25646585]])

## Understanding dimension and shape of an array
Try making a three-dimensional array by hand. How about a four-dimensional one? Getting complicated? <br>
No worries: NumPy has functions that create basic arrays of any required shape.

In [11]:
# Create a 2x3x4 array with ones
np.ones((2,3,4))

array([[[1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.]],

       [[1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.]]])

In [12]:
# Create a 10x3 array with zeros
#Fill the code
arr2 = np.zeros((10,3))
arr2

array([[0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.]])

Let's first define a **function** that prints the key attributes of an array.

In [13]:
# Function code
def array_info(arr):
    print("dtype is",arr.dtype)
    print("shape is",arr.shape)
    print("Dimension is",arr.ndim)
    print("Size is",arr.size,"\n")  # "\n" adds a blank line before the next call's output


In [14]:
# Calling the function
array_info(arr2)

dtype is float64
shape is (10, 3)
Dimension is 2
Size is 30 



Observe the difference between the following arrays, which all contain six ones:

In [15]:
a = np.ones((6))
b = np.ones((6,1))
c = np.ones((2,3))
array_info(a)
array_info(b)
array_info(c)

dtype is float64
shape is (6,)
Dimension is 1
Size is 6 

dtype is float64
shape is (6, 1)
Dimension is 2
Size is 6 

dtype is float64
shape is (2, 3)
Dimension is 2
Size is 6 



Try creating arrays of different shapes and play around to get a feel for them.
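One detail from the cell above: `np.ones((6))` is the same as `np.ones(6)`, because `(6)` is just the integer 6 in Python; a one-element tuple needs a trailing comma, `(6,)`. The short sketch below contrasts the 1-D shape with an explicit column `(6, 1)` and an explicit row `(1, 6)`.

```python
import numpy as np

print(type((6)), type((6,)))   # <class 'int'> <class 'tuple'>

a = np.ones((6))        # same as np.ones(6): a 1-D array
col = np.ones((6, 1))   # 2-D: 6 rows, 1 column
row = np.ones((1, 6))   # 2-D: 1 row, 6 columns

for name, x in [("a", a), ("col", col), ("row", row)]:
    print(name, x.shape, x.ndim)
# a (6,) 1
# col (6, 1) 2
# row (1, 6) 2
```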

## Reshaping and Flattening

In [16]:
a.reshape(3,2)

array([[1., 1.],
       [1., 1.],
       [1., 1.]])

Try converting `c` (the 2x3 array from above) to a one-dimensional array without changing its elements, using `reshape()`.

In [17]:
# Code with us
c.reshape((6))

array([1., 1., 1., 1., 1., 1.])
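For flattening, NumPy also offers `ravel()` and `flatten()`, and passing `-1` to `reshape` lets NumPy infer that dimension. The difference is in memory: `flatten()` always returns a copy, while `ravel()` and `reshape()` return a view when they can, so writing to the result may change the original. The sketch uses `np.arange` instead of ones so the changes are visible.

```python
import numpy as np

c = np.arange(6).reshape(2, 3)   # [[0 1 2]
                                 #  [3 4 5]]

flat_copy = c.flatten()          # always a new copy
flat_view = c.ravel()            # a view here, since c is contiguous in memory
flat_inferred = c.reshape(-1)    # -1: let NumPy work out the length (6)

print(flat_copy, flat_view, flat_inferred)  # [0 1 2 3 4 5] printed three times

flat_copy[0] = 100   # does not touch c
flat_view[1] = 200   # writes through to c
print(c)
# [[  0 200   2]
#  [  3   4   5]]
```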

## Random normal and random uniform in NumPy
Ever heard of the normal and uniform distributions? <br>
*   The normal (Gaussian) distribution is a continuous probability distribution whose density has a bell-shaped curve centred on its mean.<br> 
*   In a uniform distribution over an interval, all values in that interval are equally likely to be drawn. <br>
*   These probability distributions are extremely useful across a broad spectrum of engineering disciplines. We will look at them in more detail later in the session.

In [18]:
# Creating 2x3 array whose elements are randomly sampled from a normal distribution with mean =0 and std = 1
np.random.randn(2,3)

array([[ 1.53667914, -0.5438378 ,  1.40583565],
       [-0.03787313,  0.37305652, -1.22114828]])

In [19]:
# Creating 2x3 array whose elements are randomly sampled from a uniform distribution of [0,1)
np.random.rand(2,3)

array([[0.31447763, 0.92131061, 0.21933918],
       [0.50535635, 0.46860722, 0.31327651]])
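NumPy now treats `np.random.randn` and `np.random.rand` as legacy functions and recommends a `Generator` from `np.random.default_rng()` for new code. A sketch, assuming a fixed seed of 42 purely for reproducibility: with enough samples, the sample mean and standard deviation should land close to the distribution's parameters (0 and 1 for the standard normal; 0.5 and $1/\sqrt{12} \approx 0.2887$ for uniform on $[0, 1)$).

```python
import numpy as np

rng = np.random.default_rng(42)  # seeded so results are repeatable

normal = rng.standard_normal((2, 3))  # similar to np.random.randn(2, 3)
uniform = rng.random((2, 3))          # similar to np.random.rand(2, 3)
print(normal)
print(uniform)

# With many samples, the sample statistics approach the true parameters
big_normal = rng.standard_normal(1_000_000)
big_uniform = rng.random(1_000_000)
print(big_normal.mean(), big_normal.std())    # close to 0 and 1
print(big_uniform.mean(), big_uniform.std())  # close to 0.5 and 0.2887
```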

`np.random.randint` samples integers only. Try creating an array with it.

In [20]:
np.random.randint?

[0;31mDocstring:[0m
randint(low, high=None, size=None, dtype=int)

Return random integers from `low` (inclusive) to `high` (exclusive).

Return random integers from the "discrete uniform" distribution of
the specified dtype in the "half-open" interval [`low`, `high`). If
`high` is None (the default), then results are from [0, `low`).

.. note::
    New code should use the ``integers`` method of a ``default_rng()``
    instance instead; please see the :ref:`random-quick-start`.

Parameters
----------
low : int or array-like of ints
    Lowest (signed) integers to be drawn from the distribution (unless
    ``high=None``, in which case this parameter is one above the
    *highest* such integer).
high : int or array-like of ints, optional
    If provided, one above the largest (signed) integer to be drawn
    from the distribution (see above for behavior if ``high=None``).
    If array-like, must contain integer values
size : int or tuple of ints, optional
    Output shape.  If the given
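
As the docstring says, `randint(low, high, size)` draws from the half-open interval `[low, high)`, and new code is pointed at the `integers` method of a `default_rng()` generator. A sketch of both, using an arbitrary seed of 0 for the generator:

```python
import numpy as np

# Legacy API: integers from 0 (inclusive) to 10 (exclusive), shape 2x3
a = np.random.randint(0, 10, size=(2, 3))

# Recommended API: Generator.integers uses the same [low, high) convention by default
rng = np.random.default_rng(0)
b = rng.integers(0, 10, size=(2, 3))

print(a)
print(b)

# Only low given: values come from [0, low), here 0, 1 or 2
c = np.random.randint(3, size=5)
print(c)
```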

## Other data types in NumPy arrays

In [21]:
# Create an array with Boolean entries
arr_bool = np.array([True, False, True, True])
print(arr_bool, type(arr_bool))


[ True False  True  True] <class 'numpy.ndarray'>


In [22]:
# Create an array with str type elements
arr_str = np.array(['1.4','3.14','6.314'])


Can alphabetic characters be elements of a NumPy array?
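A quick way to find out: NumPy arrays can hold strings of any characters, letters included, stored with a fixed-width Unicode dtype such as `<U5` (strings of up to 5 characters). Numeric strings like those in `arr_str` can also be converted to numbers with `astype`. A minimal sketch:

```python
import numpy as np

arr_str = np.array(['1.4', '3.14', '6.314'])
print(arr_str.dtype)            # <U5: Unicode strings of up to 5 characters

letters = np.array(['a', 'b', 'numpy'])
print(letters, letters.dtype)   # ['a' 'b' 'numpy'] <U5

# Numeric strings can be converted to floats and used in arithmetic
arr_float = arr_str.astype(float)
print(arr_float * 2)            # doubles each value
```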

## NumPy Operations

In [23]:
arr1 = np.random.randint(0,10,(2,3))
arr2 = np.random.randint(0,10,(2,3))
print(arr1)
print(arr2)

[[5 2 4]
 [8 6 3]]
[[6 4 3]
 [4 1 3]]


In [24]:
# Addition
print(arr1+arr2)
# Subtraction
print(arr1-arr2)
# Multiplication
print(arr1*arr2)    #Notice that it multiplies element-wise
# Division (Element-wise)
print(arr1/arr2)

[[11  6  7]
 [12  7  6]]
[[-1 -2  1]
 [ 4  5  0]]
[[30  8 12]
 [32  6  9]]
[[0.83333333 0.5        1.33333333]
 [2.         6.         1.        ]]
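
Since `*` is element-wise, a true matrix product needs `np.dot` or the `@` operator, and the inner dimensions must match. `arr1` and `arr2` are both 2x3, so one of them has to be transposed. A sketch using the values printed above:

```python
import numpy as np

arr1 = np.array([[5, 2, 4],
                 [8, 6, 3]])
arr2 = np.array([[6, 4, 3],
                 [4, 1, 3]])

# (2x3) @ (3x2) -> 2x2 matrix product
prod = arr1 @ arr2.T
print(prod)
# [[50 34]
#  [81 47]]

# np.dot gives the same result for 2-D arrays
assert (prod == np.dot(arr1, arr2.T)).all()
```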


Let's look at element-wise division in more detail, starting with division by zero.

In [25]:
arr = np.zeros((2,3))
arr_inv = 1/arr



In [26]:
print(arr_inv)

[[inf inf inf]
 [inf inf inf]]


We got a **`RuntimeWarning`** (divide by zero) rather than an error: NumPy still returns a result, setting each entry to `inf`.

In [27]:
np.isinf(arr_inv)

array([[ True,  True,  True],
       [ True,  True,  True]])
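If division by zero is expected, the warning can be silenced locally with `np.errstate`, or avoided altogether with `np.divide` and its `where` argument, which only divides where the condition holds. A sketch; filling the skipped entries with 0 is an arbitrary choice made here for illustration:

```python
import numpy as np

arr = np.array([[0.0, 2.0, 4.0],
                [1.0, 0.0, 5.0]])

# Option 1: suppress the divide-by-zero warning inside this block only
with np.errstate(divide='ignore'):
    inv = 1 / arr
print(np.isinf(inv))  # True where arr was 0

# Option 2: only divide where arr != 0; other entries keep the value from `out`
safe_inv = np.divide(1, arr, out=np.zeros_like(arr), where=arr != 0)
print(safe_inv)

# np.isfinite is the complementary check: False for inf and nan
print(np.isfinite(inv).all())  # False
```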

With lists, we saw that we had to iterate through every element to update each value. NumPy makes our life easy: an arithmetic operation with a scalar is applied to every element at once.

In [28]:
print(2*arr1 + 2)

[[12  6 10]
 [18 14  8]]


To learn more about how this works, look up **NumPy broadcasting**.
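In brief, broadcasting lets NumPy combine arrays of different shapes by comparing shapes from the last axis backwards: two dimensions are compatible if they are equal or one of them is 1, and size-1 (or missing) dimensions are stretched to match. The scalars in `2*arr1 + 2` are the simplest case. A small sketch with a row and a column:

```python
import numpy as np

arr1 = np.array([[5, 2, 4],
                 [8, 6, 3]])        # shape (2, 3)

row = np.array([10, 20, 30])        # shape (3,)   -> added to every row
col = np.array([[100], [200]])      # shape (2, 1) -> added to every column

print(arr1 + row)
# [[15 22 34]
#  [18 26 33]]
print(arr1 + col)
# [[105 102 104]
#  [208 206 203]]

# Shapes (2, 3) and (2,) are NOT compatible: trailing dims 3 and 2 differ
try:
    arr1 + np.array([1, 2])
except ValueError as err:
    print("ValueError:", err)
```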

Some important mathematical operations: functions such as `np.sin` and `np.exp` also act element-wise. Try running the code below.

In [29]:
np.sin(arr1)

array([[-0.95892427,  0.90929743, -0.7568025 ],
       [ 0.98935825, -0.2794155 ,  0.14112001]])

In [30]:
np.exp(arr1)

array([[ 148.4131591 ,    7.3890561 ,   54.59815003],
       [2980.95798704,  403.42879349,   20.08553692]])

## Statistical operations

In [31]:
# Initialise a 3x4 NumPy array with random integers in [0, 50)
arr = np.random.randint(0,50,(3,4))

In [32]:
np.amin(arr)

4

In [33]:
np.amax(arr)

49

In [34]:
np.mean(arr)

23.833333333333332

In [35]:
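# axis=0 reduces down each column (4 results), axis=1 across each row (3 results)
print(np.mean(arr, axis=0), np.mean(arr, axis=1))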

Look up the NumPy functions for finding the median, variance, standard deviation and percentiles.
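For reference, the functions are `np.median`, `np.var`, `np.std` and `np.percentile`; like `np.mean`, they all accept an `axis` argument. A sketch on a small fixed array, so the results are easy to check by hand:

```python
import numpy as np

arr = np.array([[1, 2, 3, 4],
                [5, 6, 7, 8],
                [9, 10, 11, 12]])

print(np.median(arr))            # 6.5
print(np.var(arr))               # population variance (ddof=0 by default)
print(np.std(arr))               # square root of the variance
print(np.percentile(arr, 25))    # 25th percentile, linear interpolation by default

# With axis=0, the median is taken down each column
print(np.median(arr, axis=0))    # [5. 6. 7. 8.]
```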

## Exercise Problem 1
Write a program to multiply two matrices of size $(100, 100)$ using two methods: (a) `np.dot(mat_1, mat_2)` and (b) for-loops. Compare the execution times of the two approaches. Check out the documentation of `np.dot` if it is not familiar to you.

In [36]:
# Initialise the two matrices
mat_1 = np.random.rand(100,100)
mat_2 = np.random.rand(100,100)
# Initialise the output matrix with zeros
mat_out = np.zeros((100,100))


In [37]:
## Using the definition of matrix multiplication


In [38]:
## Using np.dot function


## Exercise Problem 2
Create two vectors $y$ and $\hat{y}$ with the **same** dimensions, where $\hat{y}$ consists of random numbers in $[0, 1)$ and $y$ contains 0s and 1s, for example $y = [0, 1, 1, 0, 1, 0, 0, 1, ..., 1]$. Compute the given expression: $$O = -\frac{1}{n}\sum_{i=1}^{n}[y_i\log_2(\hat{y_i}) + (1-y_i)\log_2(1-\hat{y_i})]$$
where $n = 100$ is the total number of elements in $y$ and $\hat{y}$.

In [39]:
# Given n = 100
# Create a 1D array y of size 100 with randomly selected 0s and 1s
# Create a 1D array y_hat of size 100 with numbers selected uniformly at random from [0, 1)
# Find the logarithmic loss (also known as binary cross-entropy)
n = 100
y = np.random.randint(0, 2, size=n)
y_hat = np.random.uniform(size=n)


The expression $O = -\frac{1}{n}\sum_{i=1}^{n}[y_i\log_2(\hat{y_i}) + (1-y_i)\log_2(1-\hat{y_i})]$, which you have computed, is the **binary cross-entropy** loss used in machine learning for classification tasks. It measures how well the predicted probabilities $\hat{y}$ match the true labels $y$: a large $O$ means the model is performing poorly, and a small $O$ means it is performing well.
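A vectorised sketch of the computation, with no explicit loop. One caveat: $\log_2(0)$ is $-\infty$, so if any $\hat{y}_i$ is exactly 0 or 1 the loss becomes infinite or `nan`. Clipping $\hat{y}$ slightly away from 0 and 1 with a small `eps` (an assumption here, set to `1e-12`) is a common safeguard. The function name `cross_entropy_log2` is hypothetical.

```python
import numpy as np

def cross_entropy_log2(y, y_hat, eps=1e-12):
    # Clip predictions into [eps, 1 - eps] so log2 never sees exactly 0
    y_hat = np.clip(y_hat, eps, 1 - eps)
    terms = y * np.log2(y_hat) + (1 - y) * np.log2(1 - y_hat)
    # -(1/n) * sum(...) is the same as the negative mean
    return -np.mean(terms)

n = 100
y = np.random.randint(0, 2, size=n)   # random 0s and 1s
y_hat = np.random.uniform(size=n)     # random values in [0, 1)
print(cross_entropy_log2(y, y_hat))

# Hand check: predicting 0.5 costs log2(2) = 1 bit per sample
print(cross_entropy_log2(np.array([1, 0]), np.array([0.5, 0.5])))  # 1.0
```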