# NumPy tutorial

NumPy makes numerical computation in Python fast by delegating the heavy lifting to compiled C code. Its core object is the `ndarray`, a homogeneous array: every element must have the same data type. This constraint is a large part of what makes NumPy operations efficient.

In [2]:
import numpy as np

vector = np.array([1, 2, 3, 4])
print("Vector: {}".format(vector))
# Every array will have a shape. That is, its dimensions
print("Shape: {}".format(vector.shape))
# Print number of dimensions
print("Dim: {}".format(vector.ndim))
print("Data type: {}".format(vector.dtype))

Vector: [1 2 3 4]
Shape: (4,)
Dim: 1
Data type: int64
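
To make the single-data-type rule concrete, here is a minimal sketch (the names `mixed` and `ints` are only illustrative). It shows that mixing integers and floats produces one common type, and that a float written into an integer array loses its fractional part.

```python
import numpy as np

# Mixing ints and floats: NumPy picks one dtype that can hold every value
mixed = np.array([1, 2, 3.5])
print(mixed.dtype)   # float64

# Assigning a float into an integer array drops the fractional part
ints = np.array([1, 2, 3])
ints[0] = 9.7
print(ints)          # [9 2 3]
```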


NumPy describes the shape of a 3D array as `(depth, rows, columns)`. Therefore, a 3D array with a depth of 2, 3 rows, and 2 columns has the shape `(2, 3, 2)`.

In [3]:
v = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
v = v.reshape((2, 3, 2))  # reshape is preferred over assigning to v.shape
print(v)

[[[ 1  2]
  [ 3  4]
  [ 5  6]]

 [[ 7  8]
  [ 9 10]
  [11 12]]]


In [4]:
v = np.zeros((2, 3, 2))
print(v)

[[[0. 0.]
  [0. 0.]
  [0. 0.]]

 [[0. 0.]
  [0. 0.]
  [0. 0.]]]


## arange

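The [`arange` function](https://numpy.org/doc/stable/reference/generated/numpy.arange.html) is similar to Python's `range` function, but returns an array. If the data type is not specified, it is inferred from the arguments: integer arguments give an integer array, while floating-point arguments give `np.float64`.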
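
As a rough illustration of how `arange` mirrors `range`, the sketch below uses the stop-only form, the start/stop/step form, and float arguments. The variable names are arbitrary, and the exact integer type depends on the platform.

```python
import numpy as np

ints = np.arange(5)                 # stop only: 0, 1, 2, 3, 4
stepped = np.arange(2, 11, 3)       # start, stop (excluded), step: 2, 5, 8
floats = np.arange(0.0, 1.0, 0.25)  # float arguments give a float array

print(ints, ints.dtype)
print(stepped)                      # [2 5 8]
print(floats, floats.dtype)         # [0.   0.25 0.5  0.75] float64
```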

**Exercise**: create an array with the integers from 0 to 10 (excluded), then from 10 to 20 (excluded)

In [5]:
# a = 
print(a)

[0 1 2 3 4 5 6 7 8 9]


In [6]:
# a = 
print(a)

[10 11 12 13 14 15 16 17 18 19]


## zeros, zeros_like

[`zeros(dim)`](https://numpy.org/doc/stable/reference/generated/numpy.zeros.html) returns an array of shape `dim` filled with zeros. For more than one dimension, `dim` must be a `tuple`.

[`zeros_like(array)`](https://numpy.org/doc/stable/reference/generated/numpy.zeros_like.html) returns an array with the same shape and data type as `array`, filled with zeros.
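
A small sketch of both functions, using a made-up `template` array; note that `zeros_like` copies the data type of its input as well as its shape.

```python
import numpy as np

grid = np.zeros((2, 3))            # shape given as a tuple
print(grid.shape, grid.dtype)      # (2, 3) float64

template = np.array([[1, 2], [3, 4]])
blank = np.zeros_like(template)    # same shape and dtype as template
print(blank.shape, blank.dtype)
```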

**Exercise**: create a (4,4) array of zeros, and then an array with the same shape as `array_example` but with zeros everywhere


In [7]:
print("Zeros")
# a = 
print("A: {}".format(a))

array_example = np.array([[3., 4.], [9, 0.]])
# b = 
print("B: {}".format(b))

Zeros
A: [[0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]]
B: [[0. 0.]
 [0. 0.]]


`ones` and `ones_like` work the same way as `zeros` and `zeros_like`, but fill the array with ones. Similarly, `empty` and `empty_like` create arrays without initializing them (which is faster), so the elements hold arbitrary leftover values.
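
The sketch below, with illustrative names, shows the usual pattern for `empty`: allocate first, then overwrite every element before reading anything back.

```python
import numpy as np

filled = np.ones((2, 2))
print(filled)

# empty only allocates memory: the contents are whatever was there before,
# so always write to the array before reading from it
scratch = np.empty((2, 2))
scratch[:] = 7.0
print(scratch)
```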

**Exercise**: repeat the previous exercise with these functions


In [10]:
print("\nOnes")

# a = 
print("A: {}".format(a))

array_example = np.array([[3., 4.], [9, 0.]])
# b = 
print("B: {}".format(b))

print("\nEmpty")

# c = 
print("C: {}".format(c))

array_example = np.array([[3., 4.], [9, 0.]])
# d = 
print("D: {}".format(d))


Ones
A: [[1. 1. 1. 1.]
 [1. 1. 1. 1.]
 [1. 1. 1. 1.]
 [1. 1. 1. 1.]]
B: [[1. 1.]
 [1. 1.]]

Empty
C: [[1. 1. 1. 1.]
 [1. 1. 1. 1.]
 [1. 1. 1. 1.]
 [1. 1. 1. 1.]]
D: [[3. 4.]
 [9. 0.]]


## astype

The [`astype` method](https://numpy.org/doc/stable/reference/generated/numpy.ndarray.astype.html) converts an array from one data type to another. Note that by default `astype` __creates a new copy of the input array (even if the data type is the same)__.
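
The following sketch (the array `prices` is invented for illustration) shows two consequences: converting floats to integers truncates toward zero rather than rounding, and the result never shares memory with the input.

```python
import numpy as np

prices = np.array([1.9, -1.9, 3.0])
whole = prices.astype(int)     # fractional part is dropped (truncation toward zero)
print(whole)                   # [ 1 -1  3]

same = prices.astype(np.float64)   # same dtype, but still a new array
same[0] = 100.0
print(prices[0])               # 1.9, the original is untouched
```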

**Exercise**: convert the array below to the `int` type.

In [12]:
a = np.array([1, 2, 3, 4.5, 6.7])
print("A: {}, dtype: {}".format(a, a.dtype))
# b = 
print("B: {}, dtype: {}".format(b, b.dtype))

A: [1.  2.  3.  4.5 6.7], dtype: float64
B: [1 2 3 4 6], dtype: int64


## Vectorization and vector-scalar operations

Explicit `for` loops over array elements are both error-prone and slow. Replacing them with NumPy operations that act on whole arrays at once is a technique known as vectorization.
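
As a minimal sketch of the idea, the same computation (squaring each element) is written below once with a loop and once as a single array expression. The names are illustrative only.

```python
import numpy as np

values = np.arange(1, 6)   # [1 2 3 4 5]

# Loop version: one Python-level operation per element
squares_loop = []
for v in values:
    squares_loop.append(v * v)

# Vectorized version: a single expression, executed in compiled code
squares_vec = values * values

print(squares_vec)   # [ 1  4  9 16 25]
```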

#### Operations on same-sized arrays are applied element-wise.

Basic arithmetic operators such as `+`, `-`, and `*` work directly on NumPy arrays.

**Exercise**: compute the sum, difference, and product of the two arrays below

In [40]:
a = np.array([[1, 2, 3], [4, 5, 6]])
b = np.array([[4, 5, 6], [1, 2, 3]])

# Addition
# c = 
print(c)

# Subtraction
# d = 
print(d)

# Multiplication
# e = 
print(e)

[[5 7 9]
 [5 7 9]]
[[-3 -3 -3]
 [ 3  3  3]]
[[ 4 10 18]
 [ 4 10 18]]


Combining a scalar with an array applies the operation between the scalar and every element of the array.
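
For instance, a unit conversion can be written as one expression. This is only a sketch with made-up temperature values:

```python
import numpy as np

temps_celsius = np.array([[0.0, 25.0], [37.0, 100.0]])

# The scalars 9, 5 and 32 are applied to every element of the array
temps_fahrenheit = temps_celsius * 9 / 5 + 32
print(temps_fahrenheit)
```

The result has the same shape as the input array.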

**Exercise**: compute the sum, difference, product, and quotient of the scalar and the array below

In [13]:
a = 3
b = np.array([[1, 2, 3], [4, 5, 6]])

# c = 
print(c)

# d = 
print(d)

# e = 
print(e)

# f = 
print(f)

[[4 5 6]
 [7 8 9]]
[[ 2  1  0]
 [-1 -2 -3]]
[[ 3  6  9]
 [12 15 18]]
[[3.   1.5  1.  ]
 [0.75 0.6  0.5 ]]


## Slicing 

In NumPy, [slicing](https://numpy.org/doc/stable/user/basics.indexing.html#slicing-and-striding) refers to extracting a portion of an array by specifying a range of indices. It allows you to create a new view of the array without copying data, facilitating efficient access and manipulation of specific elements or subarrays. Slicing in NumPy involves using the colon (`:`) operator to define the start, stop, and step parameters for the desired selection.

You can slice by following the syntax:
```
array[start_index:end_index] 
```
For an n-dimensional array, give one slice per axis, separated by commas:
```
array[start_index:end_index, start_index:end_index] 
```

Slicing NumPy arrays is similar to slicing Python lists. The main difference is that a NumPy slice __is not a copy but a view of the original array. Hence, any operation on the slice is reflected in the original array.__
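
The difference can be seen side by side in this short sketch (variable names are illustrative):

```python
import numpy as np

numbers_list = [0, 1, 2, 3, 4]
list_slice = numbers_list[1:3]    # a list slice is a new list
list_slice[0] = 99
print(numbers_list)               # [0, 1, 2, 3, 4]

numbers_array = np.array([0, 1, 2, 3, 4])
array_slice = numbers_array[1:3]  # an array slice is a view
array_slice[0] = 99
print(numbers_array)              # [ 0 99  2  3  4]
```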

**Exercise**: slice the array below and assign the value 5 to indexes 10 to 14.

In [14]:
a = np.arange(20)
print(a)
### 
print(a)

[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19]
[ 0  1  2  3  4  5  6  7  8  9  5  5  5  5  5 15 16 17 18 19]


To avoid modifying the original array, you can use `copy()`

**Exercise**: use the `copy()` method to create a separate array

In [6]:
a = np.arange(20)
print(a)
# b = 
b[:] = 5
# the values in the original array don't change
print(a)

[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19]
[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19]


A bare `:` selects the entire axis. An integer index removes that axis from the result, while a slice keeps it:
```
arr2d[:, 0]     # returns an array of shape (3,)
arr2d[:, :1]    # returns an array of shape (3, 1)
```

In [23]:
arr2d = np.array([[1, 2, 3], 
                  [4, 5, 6], 
                  [7, 8, 9]])
arr2d[:, 0]

array([1, 4, 7])
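
To check the two shapes directly, a sketch reusing the same `arr2d` (the result names are arbitrary):

```python
import numpy as np

arr2d = np.array([[1, 2, 3],
                  [4, 5, 6],
                  [7, 8, 9]])

first_col_flat = arr2d[:, 0]    # integer index: the column axis disappears
first_col_2d = arr2d[:, :1]     # slice: the column axis is kept with length 1

print(first_col_flat.shape)     # (3,)
print(first_col_2d.shape)       # (3, 1)
```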

## Boolean indexing

[Boolean indexing](https://numpy.org/doc/stable/user/basics.indexing.html#boolean-array-indexing) in NumPy involves using boolean arrays to select elements based on a specified condition. This method creates a new array containing only the elements that satisfy the given boolean condition. It provides a powerful and concise way to filter and manipulate data within NumPy arrays.

You can use boolean indexing to filter an array, or to check whether any entry has a specific value.
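
A minimal numeric sketch of both uses, with invented `scores`:

```python
import numpy as np

scores = np.array([12, 7, 19, 3, 15])
passed = scores >= 10          # boolean mask, one entry per element
print(passed)                  # [ True False  True False  True]
print(scores[passed])          # [12 19 15]
print(passed.any())            # True
```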

**Exercise**: create a boolean array (we call it a mask) that can be used for indexing.

In [16]:
a = np.array(["Mayur", "is", "an", "awesome", "coder"])
mask = (a == "Mayur")
print(mask) # Returns boolean array

[ True False False False False]


**Exercise**: obtain an array from the array `a` that has all values except "Mayur"

In [17]:
# lists entry where value != "Mayur" 
#

array(['is', 'an', 'awesome', 'coder'], dtype='<U7')

__Use `|` for "or" and `&` for "and" when combining masks, and wrap each comparison in parentheses. Python's `and` and `or` keywords do not work element-wise on NumPy arrays.__
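
As an illustration with made-up numeric data, two conditions can be combined like this:

```python
import numpy as np

ages = np.array([15, 22, 37, 64, 81])

# Parentheses are needed: & and | bind more tightly than comparisons
working_age = (ages >= 18) & (ages < 65)
young_or_old = (ages < 18) | (ages >= 65)

print(ages[working_age])    # [22 37 64]
print(ages[young_or_old])   # [15 81]
```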

**Exercise**: create boolean masks on `a` that combine the conditions `== "Mayur"` and `== "coder"`, first with "or", then with "and"

array([ True, False, False, False,  True])

array([False, False, False, False, False])

## Transposing

You can obtain the [transpose](https://numpy.org/doc/stable/reference/generated/numpy.transpose.html) of a matrix with `matrix.T`, where `matrix` is the name of your array.
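
A quick sketch on a small array `m` (an arbitrary example) shows that rows and columns swap places:

```python
import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6]])
print(m.shape)                # (2, 3)
print(m.T.shape)              # (3, 2)
print(m.T[2, 0] == m[0, 2])   # True
```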

**Exercise**: compute the transpose of the matrix below

In [20]:
a = np.array([[1, 1, 1], [2, 2, 2], [3, 3, 3], [4, 4, 4]])


[[1 2 3 4]
 [1 2 3 4]
 [1 2 3 4]]


## Matrix multiplication

In NumPy, matrix multiplication can be performed using the [`np.dot()`](https://numpy.org/doc/stable/reference/generated/numpy.dot.html) function or the `@` operator. The shapes must be compatible: the number of columns of the first matrix must equal the number of rows of the second.
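
The shape requirement can be sketched with two arrays of ones (names and sizes chosen only for illustration): the product works in one order and raises a `ValueError` in the other.

```python
import numpy as np

left = np.ones((2, 3))    # 2 rows, 3 columns
right = np.ones((3, 4))   # 3 rows, 4 columns

product = left @ right    # inner dimensions match (3 and 3)
print(product.shape)      # (2, 4)
print(np.array_equal(product, np.dot(left, right)))   # True

try:
    right @ left          # inner dimensions are 4 and 2: incompatible
except ValueError as err:
    print("Incompatible shapes:", err)
```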

**Exercise**: compute the matrix product of the two arrays below

In [None]:
a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

# Matrix multiplication using np.dot() or @
# result_dot = 

## Universal Functions
NumPy has a variety of functions that can be applied to scalars as well as arrays. Some examples are `sqrt`, `exp`, `log`, `log10`, `sin`, `cos`, and `arcsin`.

In [21]:
a = 20
b = np.random.rand(2, 2)
print(np.exp(a))
print(np.exp(b))

485165195.4097903
[[1.20490523 1.76000102]
 [1.01048349 2.70105749]]
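
The same pattern holds for the other functions listed above. Here is a brief sketch with `cos` and `log10`, using deterministic inputs chosen for illustration:

```python
import numpy as np

angles = np.array([0.0, np.pi / 2, np.pi])
print(np.cos(angles))          # applied to each element
print(np.cos(0.0))             # also works on a plain scalar

powers = np.array([1.0, 10.0, 100.0])
print(np.log10(powers))        # [0. 1. 2.]
```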


**Exercise**: create other examples with `sqrt`, `log`, and `sin`

## meshgrid

One of the most useful functions is [meshgrid](https://numpy.org/doc/stable/reference/generated/numpy.meshgrid.html). It is often used to visualize the decision boundaries of a classifier: train the classifier, create a meshgrid covering every pixel of the plot, and classify each pixel. Coloring each pixel according to its predicted class makes the boundaries clearly visible.

Using meshgrid requires three steps. 
1. Create xs (1D array)
2. Create ys (1D array)
3. Create the meshgrid (two 2D arrays), which holds the coordinates of every pixel in the graph.
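
The three steps above can be sketched on a tiny grid so the result is easy to inspect. The "classifier" here is a stand-in rule invented for the example, not a trained model.

```python
import numpy as np

xs = np.linspace(0, 2, 3)   # step 1: [0. 1. 2.]
ys = np.linspace(0, 1, 2)   # step 2: [0. 1.]
xx, yy = np.meshgrid(xs, ys)  # step 3: two 2D coordinate arrays

print(xx.shape, yy.shape)   # (2, 3) (2, 3)

# Stand-in for a classifier: label each grid point by which side of the
# line y = x - 0.5 it falls on (purely illustrative)
labels = (yy > xx - 0.5).astype(int)
print(labels)
```

Each entry of `labels` corresponds to one `(xx, yy)` coordinate pair, which is what a plotting function would color.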

In [23]:
xs = np.linspace(1, 10, 100)
ys = np.linspace(1, 10, 100)
xx, yy = np.meshgrid(xs, ys)
# plot with xx and yy
print(xx, yy)

[[ 1.          1.09090909  1.18181818 ...  9.81818182  9.90909091
  10.        ]
 [ 1.          1.09090909  1.18181818 ...  9.81818182  9.90909091
  10.        ]
 [ 1.          1.09090909  1.18181818 ...  9.81818182  9.90909091
  10.        ]
 ...
 [ 1.          1.09090909  1.18181818 ...  9.81818182  9.90909091
  10.        ]
 [ 1.          1.09090909  1.18181818 ...  9.81818182  9.90909091
  10.        ]
 [ 1.          1.09090909  1.18181818 ...  9.81818182  9.90909091
  10.        ]] [[ 1.          1.          1.         ...  1.          1.
   1.        ]
 [ 1.09090909  1.09090909  1.09090909 ...  1.09090909  1.09090909
   1.09090909]
 [ 1.18181818  1.18181818  1.18181818 ...  1.18181818  1.18181818
   1.18181818]
 ...
 [ 9.81818182  9.81818182  9.81818182 ...  9.81818182  9.81818182
   9.81818182]
 [ 9.90909091  9.90909091  9.90909091 ...  9.90909091  9.90909091
   9.90909091]
 [10.         10.         10.         ... 10.         10.
  10.        ]]


## where

[`np.where`](https://numpy.org/doc/stable/reference/generated/numpy.where.html) is a NumPy function for conditional selection. Called with a condition only, it returns the indices where the condition is true. Called with three arguments, `np.where(condition, x, y)`, it applies the following to every element:
```
if condition: 
    use x
else:
    use y
```
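
Both forms can be sketched as follows. The arrays are illustrative, and the condition is computed from `x` rather than given explicitly.

```python
import numpy as np

x = np.array([0, -1, 2, 3, -4, -5])
y = np.array([9, 3, 4, 11, 2, 3])

# Three arguments: take from x where the condition holds, else from y
non_negative = np.where(x >= 0, x, y)
print(non_negative)          # [0 3 2 3 2 3]

# One argument: return the indices where the condition holds
(indices,) = np.where(x < 0)
print(indices)               # [1 4 5]
```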

In [24]:
x = [0, -1, 2, 3, -4, -5]
y = [9, 3, 4, 11, 2, 3]
condition = [True, False, True, True, False, True]
np.where(condition, x, y)

array([ 0,  3,  2,  3,  2, -5])

**Exercise**: create a condition `x < y` and apply it to the previous `x` and `y` (convert them to arrays first so that the comparison is element-wise)

## mean, sum, std
NumPy provides a variety of statistical functions. By default they reduce over the whole array, but you can also specify the axis along which to reduce.
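
A small sketch of the `axis` argument on a fixed array (`table` is an invented example): `axis=0` collapses the rows and gives one value per column, while `axis=1` gives one value per row.

```python
import numpy as np

table = np.array([[1.0, 2.0, 3.0],
                  [4.0, 5.0, 6.0]])

print(table.sum())          # 21.0, over all elements
print(table.sum(axis=0))    # [5. 7. 9.], one value per column
print(table.mean(axis=1))   # [2. 5.], one value per row
print(table.std())          # spread of all six values
```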

**Exercise**: Use the [np.mean](https://numpy.org/doc/stable/reference/generated/numpy.mean.html), [np.sum](https://numpy.org/doc/stable/reference/generated/numpy.sum.html) and [np.std](https://numpy.org/doc/stable/reference/generated/numpy.std.html) functions on the array below 

In [78]:
a = np.random.rand(3, 3)
###

[[ 0.47420622  0.43247366  0.93400638]
 [ 0.75673826  0.02606293  0.96672388]
 [ 0.00133784  0.89357209  0.08779815]]
0.508102157536
0.508102157536
0.376361414067
