<div>
    <center><img src="https://i.imgur.com/mqPVRT5.png"></center>
    </div>

<center><h1>Introduction 📝</h1></center>

> 🎯Goal: To build a model that predicts which items are the same products
> 
> As a shopaholic 🛍️, I admit getting the best deals for products is a very rewarding experience. Scanning through multiple shopping websites to get the perfect deal and keeping an eye on upcoming sales is one manual way to go about it.
> 
> Retail companies often recommend products in a way that sways customers toward a similar product that is priced lower. Product matching 📋📋 is one of these strategies, wherein a company offers products at rates that are competitive with the same product sold by another retailer. 
> 
> These matches can be performed automatically with the help of machine learning, and that is the goal of this competition. We have been provided with data from **Shopee**, the leading e-commerce platform in Southeast Asia and Taiwan. 

<center><h1>Diving into the Data 🤿 </h1></center>

> **train/test.csv** - Each row contains the data for a single posting. 
> 
> ℹ️Multiple postings might have the exact same image ID, but with different titles or vice versa.
> 
> - posting_id : the ID code for the posting
> - image : the image id/md5sum
> - image_phash : a perceptual hash of the image
> - title : the product description for the posting
> - label_group : ID code for all postings that map to the same product. Not provided for the test set
> - matches - **Space delimited** list of all posting IDs that match a particular posting. 
> 
> 📌Posts always self-match. 
> 
> 📌**Group sizes were capped at 50**, so we need not predict more than 50 matches for a posting.

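For the training set, the expected `matches` string can be derived from `label_group`. A minimal sketch, assuming rows are plain `(posting_id, label_group)` pairs; the helper name `build_matches` and the sample IDs are made up for illustration:

```python
from collections import defaultdict

def build_matches(rows):
    """Map each posting_id to a space-delimited string of all posting IDs in its label group."""
    groups = defaultdict(list)
    for posting_id, label_group in rows:
        groups[label_group].append(posting_id)
    # every posting is a member of its own group, so self-matches come for free
    return {pid: " ".join(groups[lg]) for pid, lg in rows}

# hypothetical sample rows, not taken from train.csv
sample_rows = [("train_1", 10), ("train_2", 10), ("train_3", 20)]
matches = build_matches(sample_rows)
print(matches["train_1"])
print(matches["train_3"])
```

A posting that is alone in its label group ends up matching only itself.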
<h1><center>Evaluation metric: <b>F1-score 🧪</b> </center></h1>

> The evaluation metric for this competition is F1-Score or F-Score.
> 
> <center><img src="https://www.gstatic.com/education/formulas2/355397047/en/f1_score.svg"></center>
> 
>  F1 is the harmonic mean of precision and recall, so it is high only when both are high.
>  <center><img src="https://wikimedia.org/api/rest_v1/media/math/render/svg/d37e557b5bfc8de22afa8aad1c187a357ac81bdb"></center>
>  <center><img src="https://miro.medium.com/max/560/1*AEV3TE67ahMn3NVpU0ov4g.png" height=10></center>
>  
>  where-
>  - TP = True Positive
>  - FP = False Positive
>  - TN = True Negative
>  - FN = False Negative

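To make the formula concrete, here is a minimal sketch of F1 for a single posting, assuming the true and predicted matches are given as collections of posting IDs. The helper name `f1_for_posting` and the sample IDs are illustrative. Note that TN does not appear in precision, recall, or F1.

```python
def f1_for_posting(true_matches, predicted_matches):
    """F1 between the true and predicted match sets of one posting."""
    true_set, pred_set = set(true_matches), set(predicted_matches)
    tp = len(true_set & pred_set)   # predicted and correct
    fp = len(pred_set - true_set)   # predicted but wrong
    fn = len(true_set - pred_set)   # correct but missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# two of three predictions are right, and one true match is missed
score = f1_for_posting(["a", "b", "c"], ["a", "b", "d"])
print(round(score, 4))
```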
In [None]:
from IPython.display import display, HTML, Javascript

def nb():
    # load the custom CSS and inject it into the notebook
    with open("../input/intermediate-notebooks-data/custom-orange.css", "r") as f:
        styles = f.read()
    return HTML("<style>" + styles + "</style>")
nb()

<center><h1>Import Libraries 📚</h1></center>

In [None]:
import os
import numpy as np 
import pandas as pd 
import cv2
import matplotlib.pyplot as plt
import seaborn as sns
import cuml, cudf, cupy
import nltk
import tensorflow as tf
import wandb

from pandas import DataFrame
from sklearn.feature_extraction.text import CountVectorizer as CV
from nltk.corpus import stopwords
from cuml.feature_extraction.text import CountVectorizer
from cuml.neighbors import NearestNeighbors
from colorama import Fore, Back, Style
from wordcloud import WordCloud,STOPWORDS
from tensorflow.keras.applications import ResNet101
from PIL import Image

nltk.download('stopwords')

# colored output
y_ = Fore.YELLOW
r_ = Fore.RED
g_ = Fore.GREEN
b_ = Fore.BLUE
m_ = Fore.MAGENTA

<center><img src="https://camo.githubusercontent.com/dd842f7b0be57140e68b2ab9cb007992acd131c48284eaf6b1aca758bfea358b/68747470733a2f2f692e696d6775722e636f6d2f52557469567a482e706e67"></center>

I will be integrating ```W&B``` for ```visualizations``` and ```logging artifacts```!

[Shopee Project on W&B Dashboard](https://wandb.ai/ruchi798/shopee?workspace=user-ruchi798) 🏋️‍♀️

* To get an API key, first create an account on the W&B website.
* Next, store the key as a Kaggle secret so that it is not exposed in the notebook 🤫

In [None]:
from kaggle_secrets import UserSecretsClient
user_secrets = UserSecretsClient()
api_key = user_secrets.get_secret("api_key")

os.environ["WANDB_SILENT"] = "true"

In [None]:
! wandb login $api_key

<center><h1>Reading csv files 📖</h1></center>

In [None]:
train_df = pd.read_csv("../input/shopee-product-matching/train.csv")
test_df = pd.read_csv("../input/shopee-product-matching/test.csv")

In [None]:
train_df.head()

In [None]:
test_df

<center><h1>Getting image paths from the directory 🛣️</h1></center>

In [None]:
# specifying directory paths

train_jpg_directory = '../input/shopee-product-matching/train_images/'
test_jpg_directory = '../input/shopee-product-matching/test_images/'

In [None]:
# function to get image paths from train and test directory

def getImagePaths(path):
    image_names = []
    for dirname, _, filenames in os.walk(path):
        for filename in filenames:
            fullpath = os.path.join(dirname, filename)
            image_names.append(fullpath)
    return image_names

In [None]:
train_images_path = getImagePaths(train_jpg_directory)
test_images_path = getImagePaths(test_jpg_directory)

Number of images in each directory

In [None]:
print(f"{y_}Number of train images: {g_} {len(train_images_path)}\n")
print(f"{y_}Number of test images: {g_} {len(test_images_path)}\n")

Checking if images in each directory have the same shape

In [None]:
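def getShape(images_paths):
    # compare every image against the shape of the first one
    # (this reads each file, so it can be slow on large directories)
    shape = cv2.imread(images_paths[0]).shape
    for image_path in images_paths:
        image_shape = cv2.imread(image_path).shape
        if image_shape != shape:
            return "Different image shape"
    # reached only if no image differed from the first
    return "Same image shape " + str(shape)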

In [None]:
getShape(train_images_path)

In [None]:
getShape(test_images_path)

<center><h1>Displaying images 📷 </h1></center>

In [None]:
# function to display multiple images

def display_multiple_img(images_paths, rows, cols,title):
    
    figure, ax = plt.subplots(nrows=rows,ncols=cols,figsize=(16,8))
    plt.suptitle(title, fontsize=20)
    for ind,image_path in enumerate(images_paths):
        image = cv2.imread(image_path)
        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB) 
        try:
            ax.ravel()[ind].imshow(image)
            ax.ravel()[ind].set_axis_off()
        except IndexError:
            # more images than grid cells: skip the extras
            continue
    plt.tight_layout()
    plt.show()

In [None]:
display_multiple_img(train_images_path[0:25], 5, 5,"Train images")

In [None]:
display_multiple_img(test_images_path, 1, 3,"Test images")

<center><h1>Colour Histograms 🎨</h1></center>

In [None]:
def styling():
    # hide the frame and ticks of the current subplot
    for spine in plt.gca().spines.values():
        spine.set_visible(False)
    plt.xticks([])
    plt.yticks([])

In [None]:
def hist(image_path):
    plt.figure(figsize=(16, 3))
    
    img = cv2.imread(image_path)
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB) 
    
    plt.subplot(1, 5, 1)
    plt.imshow(img)
    styling()
    
    custom_colors = ["#ef233c", "#76da71", "#2667ff","#aea3b0"]
    labels = ['Red Channel', 'Green Channel', 'Blue Channel','Total']
    
    for i in range(1,4):
        plt.subplot(1, 5, i+1)
        plt.hist(img[:, :, i-1].reshape(-1),bins=64,color=custom_colors[i-1],alpha = 0.6)
        plt.xlabel(labels[i-1],fontsize=10)
        styling()
        
    plt.subplot(1, 5, 5)
    plt.hist(img.reshape(-1),bins=128,color=custom_colors[3],alpha = 0.6)
    plt.xlabel(labels[3],fontsize=10)
    styling()
    plt.show()

In [None]:
def display_hist(images_paths):
        for ind,image_path in enumerate(images_paths):
            if (ind<6):
                hist(image_path)

In [None]:
display_hist(train_images_path[5:10])

In [None]:
display_hist(test_images_path)

**Visualizing and querying the dataset** with W&B 🏋️‍♀️

[Documentation](https://docs.wandb.ai/datasets-and-predictions)

In [None]:
# initializing the run
run = wandb.init(project="shopee",
                 job_type="upload",
                 config={
                     "num_examples" : 8
                 })

# creating an artifact 
artifact = wandb.Artifact(name="histograms", type="raw_data")

# setting up a WandB Table object to hold the dataset
columns=["id", "raw image", "red channel","green channel","blue channel","label"]

table = wandb.Table(
    columns=columns
)

# filling up the table
images_train = train_images_path[5:10]
images_test = test_images_path

all_images = images_train + images_test
labels = ["train","train","train","train","train","test","test","test"]

for ndx in range(wandb.config.num_examples):
    img_file = all_images[ndx]
    # file name without directory or extension
    train_id = os.path.splitext(os.path.basename(img_file))[0]
    
    img = cv2.imread(img_file)
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB) 

  # raw image
    raw_img = wandb.Image(img_file)
    
  # plotting histograms 
    wb_color = ["#ef233c","#76da71","#2667ff"]
    def wb_hist(i):
        plt.figure(figsize=(16, 10))
        plt.hist(img[:, :, i-1].reshape(-1),bins=64,color=wb_color[i-1],alpha = 0.6)
        return wandb.Image(plt)
    red = wb_hist(1)
    green = wb_hist(2)
    blue = wb_hist(3)
    
    # adding an artifact file
    artifact.add_file(img_file, os.path.join("images", train_id + "_train_id.png"))

  # adding a row to the table
    row = [train_id, raw_img, red,green,blue,labels[ndx]]
    table.add_data(*row)
    
# adding the table to the artifact
artifact.add(table, "raw_examples")
    
# logging the artifact
run.log_artifact(artifact)

run.finish()

This is a snapshot of the table I just created and added to an artifact.

![](https://i.imgur.com/oF7CloS.png)

**Different versions** of the artifacts can be stored in W&B.

**Comparison of any two artifact versions** in the table is possible. 

Here I'm comparing the ```highlighted versions``` in the left sidebar, i.e., ```v1``` with ```v2``` in a split panel view ⬇️

We can see values from both artifact versions in a single table. 

![](https://i.imgur.com/9AVDPjQ.png)

We can even specify filters on any column to **limit the visible rows down to only rows that match**. 

Here I've filtered the table to see only the ```test images```.

![](https://i.imgur.com/8eTOBbX.png)

We have 11014 unique label groups for products.

In [None]:
train_df['label_group'].nunique()

In [None]:
train_labels_count = train_df['label_group'].value_counts()

# getting count for most frequent and least frequent label groups
most_freq = train_labels_count[train_labels_count == train_labels_count.max()]
less_freq = train_labels_count[train_labels_count == train_labels_count.min()]

# getting most frequent and least frequent label groups
m_label = np.unique(train_df['label_group'][train_df['label_group'].isin(most_freq.index)].values)
l_label = np.unique(train_df['label_group'][train_df['label_group'].isin(less_freq.index)].values)

print(f"{m_} Most frequent label group: ", m_label)
print(f"{y_} Least frequent label group: ", l_label)
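The same most/least frequent lookup can be illustrated without pandas. A small sketch on made-up label groups, using `collections.Counter` in place of `value_counts()`; the helper name `extreme_groups` is an assumption for the example:

```python
from collections import Counter

def extreme_groups(label_groups):
    """Return (most, least): sorted group IDs tied for the highest / lowest count."""
    counts = Counter(label_groups)
    largest, smallest = max(counts.values()), min(counts.values())
    most = sorted(g for g, c in counts.items() if c == largest)
    least = sorted(g for g, c in counts.items() if c == smallest)
    return most, least

# hypothetical label groups: 7 and 9 appear three times, 3 and 5 twice
sample = [7, 7, 7, 3, 3, 9, 9, 9, 5, 5]
most, least = extreme_groups(sample)
print(most, least)
```

Several groups can tie for the maximum or minimum count, which is why both results are lists.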

**Logging a dictionary of custom objects** 🏋️‍♀️

In [None]:
run = wandb.init(project='shopee', name='count')

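# mw and lw are the sizes of the largest and smallest label groups (not group IDs)
mw = train_labels_count.max()
lw = train_labels_count.min()
# uw is the number of distinct label groups
uw = train_df['label_group'].nunique()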

wandb.log({'Unique label groups': uw, 
           'Most frequent label groups': mw, 
           'Least frequent label groups': lw})

run.finish()

In [None]:
def path(group, m):
    # returns image paths for a label group (m='l') or a title (m='t')
    PATH = "../input/shopee-product-matching/train_images/"

    if m == 'l':
        z = train_df['image'][train_df['label_group'] == group].values
    elif m == 't':
        z = train_df['image'][train_df['title'] == group].values
    else:
        raise ValueError("m must be 'l' (label group) or 't' (title)")

    return [os.path.join(PATH, filename) for filename in z]

<center><h1>Most frequent label groups 📈</h1></center>

In [None]:
lg = 159351600
display_multiple_img(path(lg,'l'), 3, 3,lg)

In [None]:
lg = 994676122
display_multiple_img(path(lg,'l'), 3, 3,lg)

In [None]:
lg = 562358068
display_multiple_img(path(lg,'l'), 3, 3,lg)

<center><h1>Least frequent label groups 📉</h1></center>

In [None]:
lg = 297977
display_multiple_img(path(lg,'l'), 1, 2,lg)

In [None]:
lg = 887886
display_multiple_img(path(lg,'l'), 1, 2,lg)

In [None]:
lg = 4293276364
display_multiple_img(path(lg,'l'), 1, 2,lg)

In [None]:
train_df.shape

The number of rows in the training dataframe is larger than the number of unique titles, so some postings share the same title.

In [None]:
train_df['title'].nunique()

In [None]:
t = train_df['title'].value_counts().sort_values(ascending=False).reset_index()
t.columns = ['title','count']
t

<center><h1>Images with the same title 🦉🦉</h1></center>

In [None]:
img_title = "Koko syubbanul muslimin koko azzahir koko baju"
display_multiple_img(path(img_title,'t'), 3, 3,img_title)

In [None]:
img_title = "Baju Koko Pria Gus Azmi Syubbanul Muslimin Kombinasi Hadroh Azzahir Hilw HO187 KEMEJA KOKO PRIA BAJU"
display_multiple_img(path(img_title,'t'), 4, 2, img_title)

In [None]:
img_title = "Monde Boromon Cookies 1 tahun+ 120gr"
display_multiple_img(path(img_title,'t'), 2, 3, img_title)

> **Observations from EDA**📝:
> 
> * Visually similar images in different label groups
> * Same images with different titles
> * Same titles have different images

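The last two observations can be checked directly from the CSV columns. A sketch, assuming each row is an `(image_phash, title)` pair and treating an identical perceptual hash as "the same image"; the helper name `one_to_many` and the sample values are invented:

```python
from collections import defaultdict

def one_to_many(pairs):
    """Return the keys from (key, value) pairs that map to more than one distinct value."""
    seen = defaultdict(set)
    for key, value in pairs:
        seen[key].add(value)
    return sorted(k for k, values in seen.items() if len(values) > 1)

# hypothetical (image_phash, title) rows
rows = [
    ("aa11", "red shirt"),
    ("aa11", "shirt red cotton"),
    ("bb22", "blue mug"),
    ("cc33", "blue mug"),
]
same_image_different_titles = one_to_many(rows)
same_title_different_images = one_to_many((title, phash) for phash, title in rows)
print(same_image_different_titles)
print(same_title_different_images)
```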
<center><h1>Wordcloud of image titles ☁️</h1></center>

**Logging an image** of the wordcloud of image titles🏋️‍♀️

In [None]:
# color function for the wordcloud
def color_wc(word=None,font_size=None,position=None, orientation=None,font_path=None, random_state=None):
    h = int(360.0 * 21.0 / 255.0)
    s = int(100.0 * 255.0 / 255.0)
    l = int(100.0 * float(random_state.randint(80, 120)) / 255.0)
    return "hsl({}, {}%, {}%)".format(h, s, l)


run = wandb.init(project='shopee', job_type='image-visualization',name='wordCloud')

fig = plt.gcf()
fig.set_size_inches(16, 8)

wc = WordCloud(stopwords=STOPWORDS,background_color="white", contour_width=2, contour_color='orange',width=1500, height=750,color_func=color_wc,max_words=150, max_font_size=256,random_state=42)
wc.generate(' '.join(train_df['title']))
fig = plt.imshow(wc, interpolation="bilinear")
fig = plt.axis('off')

wandb.log({"wordcloud": [wandb.Image(plt, caption="Wordcloud")]})
run.finish()

run

<center><h1>Unigrams, bigrams and trigrams 🔢 </h1></center>

In [None]:
def get_top_n_words(corpus, n=None):
    vec = CV().fit(corpus)
    bag_of_words = vec.transform(corpus)
    sum_words = bag_of_words.sum(axis=0) 
    words_freq = [(word, sum_words[0, idx]) for word, idx in vec.vocabulary_.items()]
    words_freq =sorted(words_freq, key = lambda x: x[1], reverse=True)
    return words_freq[:n]

def get_top_n_bigram(corpus, n=None):
    vec = CV(ngram_range=(2, 2)).fit(corpus)
    bag_of_words = vec.transform(corpus)
    sum_words = bag_of_words.sum(axis=0) 
    words_freq = [(word, sum_words[0, idx]) for word, idx in vec.vocabulary_.items()]
    words_freq =sorted(words_freq, key = lambda x: x[1], reverse=True)
    return words_freq[:n]


def get_top_n_trigram(corpus, n=None):
    vec = CV(ngram_range=(3, 3)).fit(corpus)
    bag_of_words = vec.transform(corpus)
    sum_words = bag_of_words.sum(axis=0) 
    words_freq = [(word, sum_words[0, idx]) for word, idx in vec.vocabulary_.items()]
    words_freq =sorted(words_freq, key = lambda x: x[1], reverse=True)
    return words_freq[:n]
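All three helpers do the same thing for a different n: count every run of n consecutive tokens and keep the most frequent ones. A simplified sketch with `collections.Counter`, assuming lowercase whitespace tokenization (scikit-learn's `CountVectorizer` uses its own token pattern, so counts on real titles may differ slightly); `top_n_ngrams` is an illustrative name:

```python
from collections import Counter

def top_n_ngrams(corpus, n_gram=1, top=None):
    """Count n-grams over all texts and return the `top` most frequent as (ngram, count)."""
    counts = Counter()
    for text in corpus:
        tokens = text.lower().split()
        # slide a window of n_gram tokens across the title
        for i in range(len(tokens) - n_gram + 1):
            counts[" ".join(tokens[i:i + n_gram])] += 1
    return counts.most_common(top)

# made-up titles for illustration
titles = ["baju koko pria", "baju koko anak", "kemeja koko pria"]
print(top_n_ngrams(titles, n_gram=1, top=3))
print(top_n_ngrams(titles, n_gram=2, top=3))
```

Passing `n_gram` as a parameter would also let a single function replace the three near-identical ones above.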

In [None]:
def plot_bt(x,w,p):
    common_words = x(train_df['title'], 20)
    common_words_df = DataFrame (common_words,columns=['word','freq'])

    plt.figure(figsize=(16, 10))
    sns.barplot(x='freq', y='word', data=common_words_df,palette=p)
    plt.title("Top 20 "+ w , fontsize=16)
    plt.xlabel("Frequency", fontsize=14)
    plt.yticks(fontsize=13)
    plt.xticks(rotation=45, fontsize=13)
    plt.ylabel("");
    return common_words_df

In [None]:
common_words = get_top_n_words(train_df['title'], 20)
common_words_df1 = DataFrame(common_words,columns=['word','freq'])
plt.figure(figsize=(16, 8))
ax = sns.barplot(x='freq', y='word', data=common_words_df1,palette='Oranges')

plt.title("Top 20 unigrams", fontsize=16)
plt.xlabel("Frequency", fontsize=14)
plt.yticks(fontsize=13)
plt.xticks(rotation=45, fontsize=13)
plt.ylabel("");

common_words_df2 = plot_bt(get_top_n_bigram,"bigrams",'BuGn')
common_words_df3 = plot_bt(get_top_n_trigram,"trigrams",'RdPu')

**Logging custom bar charts** for unigrams, bigrams and trigrams🏋️‍♀️

In [None]:
def plot_wb(df, name, title): 
    run = wandb.init(project='shopee', job_type='image-visualization',name=name)

    labels = df.sort_values('freq', ascending=False).word
    values = df.sort_values('freq', ascending= False).freq
    dt = [[label, val] for (label, val) in zip(labels, values)]
    table = wandb.Table(data=dt, columns = ["Word", "Frequency"])
    wandb.log({name : wandb.plot.bar(table, "Word", "Frequency",title=title)})

    run.finish()
    
plot_wb(common_words_df1, "unigrams","Top 20 unigrams")
plot_wb(common_words_df2, "bigrams","Top 20 bigrams")
plot_wb(common_words_df3, "trigrams","Top 20 trigrams")

<center><h1>Plugging in RAPIDS 🏃‍♀️ </h1></center>
<center><img src="https://i.imgur.com/qWulN0F.jpg" height=40></center>

In [None]:
train_df_c = cudf.from_pandas(train_df)

In [None]:
train_df_c


<center><h3>Pre-processing title ✂️</h3></center>

In [None]:
STOPWORDS = nltk.corpus.stopwords.words('english')

punctuation = [ '!', '"', '#', '$', '%', '&', '(', ')', '*', '+', '-', '.', '/',  '\\', ':', ';', '<', '=', '>',
           '?', '@', '[', ']', '^', '_', '`', '{', '|', '}', '\t','\n',"'",",",'~' , '—']

def text_preprocessing(input_text, filters=None, stopwords=STOPWORDS):
    # filter punctuation (no-op when filters is None)
    translation_table = {ord(char): ord(' ') for char in (filters or [])}
    input_text = input_text.str.translate(translation_table)
    
    #convert to lower case
    input_text = input_text.str.lower()
        
    # remove stopwords 
    stopwords_gpu = cudf.Series(stopwords)
    input_text =  input_text.str.replace_tokens(stopwords_gpu, ' ')
        
    # normalize spaces
    input_text = input_text.str.normalize_spaces( )
    
    # strip leading and trailing spaces
    input_text = input_text.str.strip(' ')
    
    return input_text

def preprocess_df(df, col, **kwargs):
    df[col] = text_preprocessing(df[col], **kwargs)
    return  df

# %time must be on the same line as the statement it times
%time df = preprocess_df(train_df_c, 'title', filters=punctuation)

train_df_c.head(5)
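For readers without a GPU, the same steps can be sketched on a single Python string. This is an illustrative CPU version, not the cuDF code path; `preprocess_title`, the short filter list, and the tiny stopword set are assumptions made for the example:

```python
def preprocess_title(text, filters, stopwords):
    """Replace filtered characters with spaces, lowercase, drop stopwords, collapse spaces."""
    table = {ord(ch): " " for ch in filters}
    text = text.translate(table).lower()
    # split() with no arguments also normalizes repeated and leading/trailing spaces
    tokens = [tok for tok in text.split() if tok not in stopwords]
    return " ".join(tokens)

sample_filters = ["!", ",", "-", "+"]
sample_stopwords = {"for", "the"}
cleaned = preprocess_title("Baju Koko, for THE Kids!", sample_filters, sample_stopwords)
print(cleaned)
```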

In [None]:
train_df_c.to_csv("title_preprocessed_dataset.csv")

**Logging** the preprocessed title dataset as **an artifact**🏋️‍♀️

In [None]:
# run = wandb.init(project='shopee', name='title_preprocessed')

# artifact = wandb.Artifact('title_preprocessed_dataset', type='dataset')

# add a file to the artifact's contents
# artifact.add_file("title_preprocessed_dataset.csv")

# save the artifact version to W&B and mark it as the output of this run
# run.log_artifact(artifact)

# run.finish()

<center><h3>CountVectorizer for Feature Extraction 📐</h3></center>

<img src="https://i.imgur.com/1bEOBR1.png">

[Documentation](https://docs.rapids.ai/api/cuml/nightly/api.html#cuml.feature_extraction.text) 📖

In [None]:
vec = CountVectorizer(stop_words='english', binary=True)
%time X = vec.fit_transform(train_df_c.title).toarray()
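With `binary=True`, each title becomes a 0/1 vector marking which vocabulary words it contains, regardless of how often they repeat. A pure-Python sketch of that idea; whitespace tokenization and the absence of stop-word removal are simplifying assumptions, and `binary_bag_of_words` is an illustrative name:

```python
def binary_bag_of_words(corpus):
    """Return (vocab, vectors) where vectors[i][j] is 1 if vocab[j] occurs in corpus[i]."""
    vocab = sorted({tok for text in corpus for tok in text.lower().split()})
    index = {tok: i for i, tok in enumerate(vocab)}
    vectors = []
    for text in corpus:
        row = [0] * len(vocab)
        for tok in text.lower().split():
            row[index[tok]] = 1  # presence only; repeats do not increase the value
        vectors.append(row)
    return vocab, vectors

# "koko" appears twice in the first title but is still encoded as 1
vocab, vectors = binary_bag_of_words(["koko koko baju", "baju anak"])
print(vocab)
print(vectors)
```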

<center><h3>Titles with similar text 🦉🦉</h3></center>

In [None]:
n = 50
knn = NearestNeighbors(n_neighbors=n)
knn.fit(X)
distances, indices = knn.kneighbors(X)
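Conceptually, the neighbour search computes the distance from each row to every row and keeps the `n` closest. A brute-force sketch on small made-up vectors, assuming Euclidean distance; cuML's GPU implementation is far faster, and this only illustrates the shape and meaning of `distances` and `indices`:

```python
import math

def brute_force_knn(vectors, n_neighbors):
    """Return (distances, indices) of the n_neighbors closest rows for every row."""
    distances, indices = [], []
    for query in vectors:
        # distance to every row, including the query itself; ties are broken by row index
        scored = sorted(
            (math.dist(query, other), j) for j, other in enumerate(vectors)
        )[:n_neighbors]
        distances.append([d for d, _ in scored])
        indices.append([j for _, j in scored])
    return distances, indices

# rows 0 and 2 are identical, so they are at distance 0 from each other
vectors = [[0, 1, 1], [1, 1, 0], [0, 1, 1], [1, 0, 0]]
distances, indices = brute_force_knn(vectors, n_neighbors=2)
print(indices)
```

Each row finds itself (or an identical row) at distance 0, which is why the distance plots in the next cell start at zero.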

In [None]:
for k in range(5):
    plt.figure(figsize=(20,3))
    plt.plot(np.arange(50),cupy.asnumpy(distances[k,]),'o-',color='#f48c06')
    plt.title('Text Distance From Train Row %i to Other Train Rows'%k,fontsize=15, fontweight='bold',horizontalalignment='center',fontfamily='serif')
    plt.ylabel('Distance to Train Row %i'%k,fontsize=13, fontweight='bold',fontfamily='serif')
    plt.xlabel('Index Sorted by Distance to Train Row %i'%k,fontsize=13, fontweight='bold',fontfamily='serif')
    plt.show()
    
    print( train_df_c.loc[cupy.asnumpy(indices[k,:10]),['title','label_group']] )

<center><h3>Similar Images🦉🦉</h3></center>

In [None]:
class DataGenerator(tf.keras.utils.Sequence):
    'Generates data for Keras'
    def __init__(self, df, img_size=256, batch_size=32, path=train_jpg_directory): 
        self.df = df
        self.img_size = img_size
        self.batch_size = batch_size
        self.path = path
        self.indexes = np.arange(len(self.df))
        
    def __len__(self):
        'Denotes the number of batches per epoch'
        ct = len(self.df) // self.batch_size
        ct += int(((len(self.df)) % self.batch_size)!=0)
        return ct

    def __getitem__(self, index):
        'Generate one batch of data'
        indexes = self.indexes[index*self.batch_size:(index+1)*self.batch_size]
        X = self.__data_generation(indexes)
        return X
            
    def __data_generation(self, indexes):
        'Generates data containing batch_size samples' 
        X = np.zeros((len(indexes),self.img_size,self.img_size,3),dtype='float32')
        df = self.df.iloc[indexes]
        for i,(index,row) in enumerate(df.iterrows()):
            img = cv2.imread(self.path+row.image)
            X[i,] = cv2.resize(img,(self.img_size,self.img_size))
        return X
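The batching arithmetic in `__len__` and `__getitem__` can be checked in isolation. A small sketch with illustrative function names: the number of batches is the ceiling of `len(df) / batch_size`, and the last batch may be shorter than the others.

```python
import math

def num_batches(n_rows, batch_size):
    """Same arithmetic as __len__: full batches plus one partial batch if rows remain."""
    return n_rows // batch_size + int(n_rows % batch_size != 0)

def batch_indexes(n_rows, batch_size, index):
    """Same slicing as __getitem__: row positions that belong to batch `index`."""
    return list(range(n_rows))[index * batch_size:(index + 1) * batch_size]

n_rows, batch_size = 10, 4
assert num_batches(n_rows, batch_size) == math.ceil(n_rows / batch_size)
for b in range(num_batches(n_rows, batch_size)):
    print(b, batch_indexes(n_rows, batch_size, b))
```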

In [None]:
model = ResNet101(weights='imagenet',include_top=False, pooling='avg', input_shape=None)
train_gen = DataGenerator(train_df, batch_size=128)

In [None]:
# ie = model.predict(train_gen,verbose=1)
# np.save("image_embedding_val.npy", ie)

**Logging** the image embeddings as **an artifact**🏋️‍♀️

This saves time, since the embeddings need not be recomputed with the pretrained model on every run🥳

In [None]:
# run = wandb.init(project='shopee', name='image_embedding_val')

# artifact = wandb.Artifact(name='image_embedding_val', type='dataset')

# Add a file to the artifact's contents
# artifact.add_file("image_embedding_val.npy")

# Save the artifact version to W&B and mark it as the output of this run
# run.log_artifact(artifact)

# run.finish()

A snapshot of the newly created artifacts ⬇️

![](https://i.imgur.com/sn9xTWx.png)

Since I have already logged the image embeddings artifact, I can directly use it in this manner ⬇️

In [None]:
run = wandb.init()

# query W&B for an artifact and mark it as input to this run
artifact = run.use_artifact('ruchi798/shopee/image_embedding_val:v0', type='dataset')

# download the artifact's contents
artifact_dir = artifact.download()

In [None]:
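# named embeddings_path so the path() helper defined earlier is not overwritten
embeddings_path = os.path.join(artifact_dir, "image_embedding_val.npy")
img_embeddings = np.load(embeddings_path)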

In [None]:
n = 50
knn = NearestNeighbors(n_neighbors=n)
knn.fit(img_embeddings)
distances, indices = knn.kneighbors(img_embeddings)

In [None]:
ROWS=2
COLS=4
for c in range(75,85):
    print("Cluster ",c)  
    t = train_df.loc[cupy.asnumpy(indices[c,:8])]   
    for k in range(ROWS):
        plt.figure(figsize=(20,5))
        for j in range(COLS):
            row = COLS*k + j
            name = t.iloc[row,1]
            img = cv2.imread(train_jpg_directory+name)
            
            #converting from BGR to RGB
            img = img[:, :, ::-1]
            
            plt.subplot(1,COLS,j+1)
            plt.axis('off')
            plt.imshow(img)
        plt.show()
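The neighbour arrays could also be turned into the space-delimited `matches` strings described in the data section. One possible approach, not taken from this notebook, is to keep the neighbours whose distance falls below a threshold. The function name, the threshold value, and the sample data below are purely hypothetical:

```python
def matches_from_neighbours(posting_ids, distances, indices, threshold, max_matches=50):
    """Build one space-delimited match string per posting from KNN output."""
    results = []
    for row, (dists, idxs) in enumerate(zip(distances, indices)):
        matched = [posting_ids[j] for d, j in zip(dists, idxs) if d < threshold]
        # posts always self-match, so make sure the posting itself is included
        if posting_ids[row] not in matched:
            matched.insert(0, posting_ids[row])
        # group sizes were capped at 50, so no more than 50 matches are needed
        results.append(" ".join(matched[:max_matches]))
    return results

# made-up KNN output for three postings
ids = ["p0", "p1", "p2"]
sample_distances = [[0.0, 0.2, 0.9], [0.0, 0.2, 1.1], [0.0, 0.9, 1.1]]
sample_indices = [[0, 1, 2], [1, 0, 2], [2, 0, 1]]
predicted = matches_from_neighbours(ids, sample_distances, sample_indices, threshold=0.5)
print(predicted)
```

A suitable threshold would have to be tuned against the F1 score on the training labels.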

Here's a snapshot of my [project](https://wandb.ai/ruchi798/shopee?workspace=user-ruchi798) ⬇️

![](https://i.imgur.com/PYEnRRo.png)

References 📜
- [RAPIDS cuML TfidfVectorizer and KNN](https://www.kaggle.com/cdeotte/rapids-cuml-tfidfvectorizer-and-knn)
- [A very detailed explanation for data generation](https://stanford.edu/~shervine/blog/keras-how-to-generate-data-on-the-fly)

Inspiration 💡
- [Custom Jupyter Notebook Theme with plain CSS](https://medium.com/@formigone/my-first-custom-theme-for-jupyter-notebook-a9c1e69efdfe) 🎨

Illustrations tools ⚡
- [Canva](https://www.canva.com/en_gb/) 🖌️

<img src="https://i.imgur.com/pl3FhXV.png">