# Vectorized operations

In this workshop you are going to learn about one of the most fundamental building blocks of data science: vector operations. \
For this workshop you will need NumPy installed.

In [None]:
%pip install numpy

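You will also need matplotlib. The image cells further down import `PIL` (the Pillow package), so install that the same way if it is missing.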

In [None]:
%pip install matplotlib

In [None]:
import numpy as np
import matplotlib.pyplot as plt

## What is a vector? (Tensors)
Vectors are the most fundamental data structure in data science. \
A vector is an ordered list of numbers. A matrix stacks vectors into two dimensions, and a tensor generalizes the same idea to any number of dimensions. \
In this workshop we will use the NumPy library, where all of these are represented by the same array type. \
NumPy is a library for scientific computing and one of the most widely used libraries in data science.

### Multi-dimensionality of vectors
Vectors usually expand into multiple dimensions, and depending on the scenario they can be understood differently. \
Images are usually represented as an array with 3 dimensions (height, width, color); NumPy puts the row axis, i.e. the height, first. \
Text is usually represented as a matrix with 2 dimensions (words, sentences). \
Alternatively, a sentence can be interpreted as a single point in a space with a huge number of dimensions, where each word is a dimension.
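
To make the dimensions concrete, here is a minimal sketch with placeholder arrays. The names `image` and `text` and all of the sizes are hypothetical, chosen only for illustration; the text array assumes each word is stored as an integer id.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration.
image = np.zeros((480, 640, 3), dtype=np.uint8)  # height x width x color channels
text = np.zeros((4, 12), dtype=np.int64)         # 4 sentences, 12 word ids each

print(image.ndim, image.shape)  # 3 (480, 640, 3)
print(text.ndim, text.shape)    # 2 (4, 12)
```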


## What are vectorized operations

You can think of vectorized operations as very fast array operations: one expression is applied to whole arrays at once instead of looping over the elements in Python. This lets you manipulate arrays without the performance overhead of default lists. \
Here is an example:

In [None]:
def vec_add(a,b):
    return a + b # vector addition

array_1 = np.array([1,2,3,4,5])
array_2 = np.array([6,7,8,9,10])

print(vec_add(array_1, array_2))

### Why is it so "fast"?
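NumPy is implemented in C, so the loop over the elements runs in compiled code rather than in the Python interpreter, with very little overhead per element. For many operations it also takes advantage of hardware features such as SIMD instructions, which process several elements at once.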
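
A rough way to see the difference yourself is to time the same addition done both ways. This is only a sketch: the helper names `loop_add` and `vector_add` and the array size are made up for the example, and the actual timings depend on your machine.

```python
import timeit
import numpy as np

n = 100_000
list_a, list_b = list(range(n)), list(range(n))
arr_a, arr_b = np.arange(n), np.arange(n)

def loop_add():
    # element-by-element addition in pure Python
    return [x + y for x, y in zip(list_a, list_b)]

def vector_add():
    # the same addition as one vectorized NumPy expression
    return arr_a + arr_b

# Both approaches produce the same numbers
assert loop_add() == vector_add().tolist()

loop_time = timeit.timeit(loop_add, number=10)
vector_time = timeit.timeit(vector_add, number=10)
print(f"Python loop: {loop_time:.4f}s, NumPy: {vector_time:.4f}s")
```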

But there is (as always) a catch! \
NumPy arrays have a fixed size: you can change the values they hold in place, but you cannot grow or shrink them. Operations such as `np.append` build a whole new array and copy the data over, which is expensive, so avoid calling them in a loop.
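
A small sketch of what happens in place and what allocates a new array. The variable names are hypothetical.

```python
import numpy as np

values = np.array([1, 2, 3])
same_buffer = values             # a second name for the same array

values[0] = 10                   # element assignment happens in place
values += 1                      # in-place arithmetic reuses the buffer too
print(same_buffer)               # [11  3  4]

grown = np.append(values, 99)    # changing the size builds a new array
print(np.shares_memory(values, grown))  # False
print(values.shape, grown.shape)        # (3,) (4,)
```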

### Characteristics of Numpy arrays

A freshly created NumPy array is stored as one single contiguous block of memory. \
To give an array its shape (multiple dimensions), strides are used: for each dimension they say how many bytes to step to reach the next element along it. \
This metadata is available as attributes of the array:

In [None]:
print(f"Strides: {array_1.strides}")
print(f"Shape: {array_1.shape}")
print(f"Data Type: {array_1.dtype}")

# Fix the dtype so the strides printed below are the same on every platform
multi_dim_array = np.array([[1,2,3,4,5],[6,7,8,9,10]], dtype=np.int32)
print(multi_dim_array)

print(f"Strides: {multi_dim_array.strides}")
print(f"Shape: {multi_dim_array.shape}")
print(f"Data Type: {multi_dim_array.dtype}")



The strides tell NumPy how far to move in memory to reach the next element along each dimension. \
For a 32-bit integer array of this shape you see (20, 4): 4 is the size of one element (32 bits = 4 bytes) and 20 is the size of one full row (5 elements * 4 bytes = 20). With a 64-bit integer dtype, the default on many platforms, both numbers double to (40, 8). \
So with 32-bit elements, to get from row 1 to row 2 you add 20 bytes to the memory location of the first element of the first row. \
Changing the strides:

In [None]:
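as_strided = np.lib.stride_tricks.as_strided
# Reverse the array's own strides rather than hard-coding (4,20), which only fits 32-bit data
as_strided(multi_dim_array, (5,2), multi_dim_array.strides[::-1])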

After changing the strides you can see that the array is transposed. \
This did not actually change the data, it just changed the way we look at the "memory". \
This type of manipulation is why NumPy can change the layout of an array (transpose, flip, reshape) extremely fast: no data is copied.
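
The stride arithmetic can be checked by hand. The sketch below uses a small hypothetical array `grid` with an explicit 32-bit dtype, computes the byte offset of one element from the strides, and confirms that a transpose is only a view with the strides swapped.

```python
import numpy as np

grid = np.arange(10, dtype=np.int32).reshape(2, 5)  # strides (20, 4)
row, col = 1, 3

# Byte offset of grid[row, col] from the start of the buffer
offset = row * grid.strides[0] + col * grid.strides[1]
print(grid.strides, offset)  # (20, 4) 32

# Reading the flat buffer at that offset gives the same element
flat = np.frombuffer(grid.tobytes(), dtype=np.int32)
print(flat[offset // grid.itemsize], grid[row, col])  # 8 8

# The transpose is a view: same memory, strides swapped
print(grid.T.strides, np.shares_memory(grid, grid.T))  # (4, 20) True
```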

### What operations are available
NumPy offers a huge number of the operations and transforms used in linear algebra, from very basic to advanced ones. \
The basic arithmetic operators work element by element:

In [None]:
array_1 = np.array([1,2,3,4,5])
array_2 = np.array([6,7,8,9,10])
# addition
print(array_1 + array_2)

# subtraction
print(array_1 - array_2)

# multiplication
print(array_1 * array_2)

# division
print(array_1 / array_2)
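
Beyond element-wise arithmetic, a few of the linear algebra operations look like this. The vectors and the matrix are made-up values; `np.dot`, `np.linalg.norm` and the `@` operator are standard NumPy.

```python
import numpy as np

v = np.array([3.0, 4.0])
w = np.array([1.0, 2.0])
m = np.array([[2.0, 0.0],
              [0.0, 2.0]])

print(np.dot(v, w))       # dot product: 3*1 + 4*2 = 11.0
print(np.linalg.norm(v))  # Euclidean length: 5.0
print(m @ v)              # matrix-vector product: [6. 8.]
```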


### Mutation operations
You can use operations like flip, transpose, and swap axes. These mostly just rearrange the strides and return a new view of the same data, which is why they are so cheap.

In [None]:
from PIL import Image
meme = Image.open("meme.jpeg")
meme = np.array(meme)
plt.imshow(meme)

Reshaping is another powerful operation that allows you to change the shape of the array. \
Where possible it does not touch the underlying data, it just changes the way we look at it, as mentioned above. \
To flatten an image, for example, while preserving the channels, you can use the reshape operation.

In [None]:
reshaped_meme = meme.copy()
reshaped_meme = np.reshape(reshaped_meme, (meme.shape[0]*meme.shape[1], meme.shape[2]))
print(reshaped_meme.shape)
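
Since reshaping is only a different way of looking at the same data, writing through the reshaped array changes the original as well. A minimal sketch on a tiny stand-in for an image (the name `pixels` and its size are hypothetical); passing `-1` asks NumPy to work out that dimension itself.

```python
import numpy as np

pixels = np.arange(24).reshape(2, 4, 3)  # tiny stand-in for an image (H, W, C)
flat_pixels = pixels.reshape(-1, 3)      # -1 lets NumPy infer H * W

print(flat_pixels.shape)                      # (8, 3)
print(np.shares_memory(pixels, flat_pixels))  # True: a view, no copy

flat_pixels[0, 0] = 100                  # writing through the view...
print(pixels[0, 0, 0])                   # ...changes the original: 100
```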

In [None]:
meme_flip = np.swapaxes(meme,0,1)
plt.imshow(meme_flip)

In [None]:
meme_transpose = np.transpose(meme, axes=(1,0,2))
plt.imshow(meme_transpose)

In [None]:
mirrored_meme = meme.copy()
mirrored_meme = np.flip(mirrored_meme, axis=1)
plt.imshow(mirrored_meme)

In [None]:
meme_color_swap = meme.copy()
meme_color_swap[:,:,:] = meme[:,:,::-1] # "walking" back on the color axis

plt.imshow(meme_color_swap)

In [None]:
upside_down_meme = meme.copy()
upside_down_meme = np.flip(upside_down_meme, axis=0)
plt.imshow(upside_down_meme)

### Indexing of numpy arrays
Indexing NumPy arrays is similar to indexing default lists, but far more flexible. \
Standard indexing:

In [None]:
print(meme.shape)
print(meme[0,0,0])


NumPy arrays also support slicing, which is great for extracting a region of an array.

In [None]:
meme_shape = meme.shape
# A quarter of the height and width: cutting this much off every side keeps the central half
meme_quarter_shape = (meme_shape[0]//4, meme_shape[1]//4, meme_shape[2])
center_meme = meme[meme_quarter_shape[0]:-meme_quarter_shape[0], meme_quarter_shape[1]:-meme_quarter_shape[1], :]
plt.imshow(center_meme)
print(center_meme.shape)

You can also specify a step by using the third element of the slice (`start:stop:step`):

In [None]:
squished_meme = meme.copy()
squished_meme = squished_meme[::2,:, :]
plt.imshow(squished_meme)

In [None]:
other_way_squished_meme = meme.copy()
other_way_squished_meme = other_way_squished_meme[:, ::2, :] 
plt.imshow(other_way_squished_meme)
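
The same `start:stop:step` slicing is easier to follow on small arrays. The arrays below are hypothetical examples.

```python
import numpy as np

digits = np.arange(10)
print(digits[2:8:2])  # [2 4 6]
print(digits[::3])    # [0 3 6 9]
print(digits[::-1])   # [9 8 7 6 5 4 3 2 1 0]

grid = np.arange(16).reshape(4, 4)
corner_steps = grid[::2, 1::2]  # rows 0 and 2, columns 1 and 3
print(corner_steps)
```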

Einsum is another very useful operation for reordering (and, in some frameworks, reshaping) the axes of an array.

In [None]:
meme_einsum = np.einsum("ijk->jik", meme)
plt.imshow(meme_einsum)
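
To check what an einsum string does, you can compare it against the equivalent explicit call. In this sketch (array names are hypothetical) `"ijk->jik"` matches a transpose of the first two axes, and `"ij,jk->ik"` matches a matrix multiplication.

```python
import numpy as np

cube = np.arange(24).reshape(2, 3, 4)
swapped = np.einsum("ijk->jik", cube)
print(swapped.shape)                                           # (3, 2, 4)
print(np.array_equal(swapped, np.transpose(cube, (1, 0, 2))))  # True

a = np.arange(6).reshape(2, 3)
b = np.arange(12).reshape(3, 4)
product = np.einsum("ij,jk->ik", a, b)
print(np.array_equal(product, a @ b))                          # True
```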


### Broadcasting
Most of the operations mentioned above require the arrays to have compatible shapes. Element-wise operations expect equal shapes, and you can't matrix multiply (m x n) with (m x n), only (m x n) with (n x s). \
This is why broadcasting exists. \
For element-wise operations, broadcasting lets a smaller array behave as if it were repeated along its missing or size-1 dimensions, without actually copying the data. \
It is mostly handled by NumPy automatically, but sometimes you have to add the extra dimension yourself.

In [None]:
# auto broadcasting
array_1 = np.array([1,2,3,4,5])

print(array_1 + 1) # the scalar 1 is broadcast to the shape of array_1 and added to each element

In [None]:
# manual broadcasting

array_1 = np.arange(0,50,1, dtype=np.int32).reshape(10,5)
print(f"Original array: \n{array_1}",)
mult_array = np.array([1,2,3,4,5])
print(f"Array to multiply with: \n{mult_array}")
result = array_1 * mult_array[None,:]
print(f"Resulting array with broadcasting: \n{result}")


Notice that each column is multiplied by the matching element of our second array. \
It is essentially the same as doing this:

In [None]:
array_2 = np.arange(0,50,1, dtype=np.int32).reshape(10,5)
print(f"Original array: \n{array_2}")
mult_array_2 = np.array([[1,2,3,4,5] for i in range(10)])  # the row repeated by hand, 10 times
print(f"Array to multiply with: \n{mult_array_2}")
result_2 = array_2 * mult_array_2
print(f"Resulting array without broadcasting: \n{result_2}")

In [None]:
# comparing the arrays together
print(f"Are arrays equal: {np.array_equal(result, result_2)}")
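
The shape rule behind this can be tried out directly: shapes are compared from the last dimension backwards, and two sizes are compatible when they are equal or one of them is 1. The sketch below uses hypothetical arrays and assumes NumPy 1.20 or newer for `np.broadcast_shapes`.

```python
import numpy as np

rows = np.arange(3).reshape(3, 1)  # shape (3, 1)
cols = np.arange(4).reshape(1, 4)  # shape (1, 4)

# The shape the two arrays are stretched to
print(np.broadcast_shapes(rows.shape, cols.shape))  # (3, 4)
table = rows * 10 + cols
print(table)

try:
    np.ones((3, 4)) + np.ones(3)   # trailing sizes 4 and 3 do not match
except ValueError as err:
    print(f"Incompatible shapes: {err}")

# Adding a size-1 axis by hand makes the shapes (3, 4) and (3, 1) compatible
per_row = np.ones((3, 4)) + np.ones(3)[:, None]
print(per_row.shape)               # (3, 4)
```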

## Masking
Masking is another technique in NumPy: it lets you use a boolean array to "enable" and "disable" elements or whole dimensions.

In [None]:
to_be_masked = np.arange(0,50,1, dtype=np.int32).reshape(10,5)
print(f"Original array: \n{to_be_masked}",)
mask = np.array([True, False, True, False, True])
print(f"Mask: \n{mask}")
masked_array = to_be_masked[:,mask]
print(f"Masked array: \n{masked_array}")
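
A mask does not have to be written out by hand; a comparison on an array produces one. A minimal sketch with made-up values that selects the negative entries and then overwrites them:

```python
import numpy as np

readings = np.array([4, -1, 7, -3, 2])
is_negative = readings < 0       # a comparison yields a boolean mask
print(is_negative)               # [False  True False  True False]
print(readings[is_negative])     # [-1 -3]

cleaned = readings.copy()
cleaned[is_negative] = 0         # a mask also works on the left-hand side
print(cleaned)                   # [4 0 7 0 2]
```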


# End of workshop, continue for exercises

Note: these exercises aren't mandatory if you already know the material; otherwise it is very strongly recommended to go through them. It will also tell me how determined you are to work. \
Note: there are operations in NumPy for basically everything. If you are looking for something, I recommend googling it and looking at the documentation.

In [None]:
def generate_chess():
    board = np.zeros((8,8,3), dtype=np.int32)
    board[::2, ::2, :] = 255
    board[1::2, 1::2, :] = 255
    return board

board = generate_chess()

plt.imshow(board)
plt.show()

Select only the black cells of the board in the cell below, using slicing, filtering, and boolean indexing

In [None]:
# your code here

Rearrange the chess board into a 1D vector of cells, keeping the color channels (name it one_board; the expected shape is (64,3))

In [None]:
# your code here

In [None]:

assert one_board.shape == (64,3)
assert np.all(one_board[0] == np.array([255,255,255]))

Transpose the chess board (Note: It probably won't look any different)

In [None]:
# your code here

Paint the bottom-right corner pink (255,0,255) (name the result board_pk)

In [None]:
# your code here

In [None]:
plt.imshow(board_pk)

In [None]:
assert np.all(board_pk[-1,-1,:] == np.array([255,0,255])) 

Expand the chess board to 4D (B,H,W,C) by adding a batch dimension of size 1 (name it board_4D)

In [None]:
# Your code here

In [None]:
assert board_4D.shape == (1,8,8,3)

In some deep learning frameworks a channel-first orientation is used (e.g. (3,256,256) instead of (256,256,3)). \
Using a NumPy function, change the orientation of the chess board to channel first (name it board_channel_first)

In [None]:
# Your code here

In [None]:
assert board_channel_first.shape == (3,8,8)

### This is the end of the lab
If you got this far, congratulations! If you have any questions, ask me.