In [None]:
%%HTML
<link rel="stylesheet" type="text/css" href="../css/custom.css">

# Hackathon time!


![footer_logo](../images/logo.png)

## Goal

- Put our Deep Learning skills to work
- Master our Keras skills
- Have fun

## Datasets

- Fashion MNIST: clothing classification
- Alternative: Food recipes classification


In [None]:
import inspect
import os

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
import random

import sys
sys.path.insert(0, '../')


%matplotlib inline

In [None]:
plt.rcParams["figure.figsize"] = 15, 6

---
## Suggested workflow

First get a baseline model working:

- Preprocess the data (and perform a train/test split if necessary).
- Define a simple Neural Network model.
- Compile it.
- Train and monitor performance.
- Evaluate performance.
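
The five steps above could look roughly like the sketch below. It is only an illustration, not the reference answer: it uses random arrays shaped like Fashion MNIST images as a stand-in for real data (so the resulting accuracy is meaningless), and the layer sizes, number of epochs and batch size are arbitrary assumptions.

```python
import numpy as np
from tensorflow import keras

# Assumption: random stand-in data with the same shape as Fashion MNIST
# (28x28 grayscale images, 10 classes). Replace with the real dataset.
rng = np.random.default_rng(0)
X = rng.integers(0, 256, size=(256, 28, 28), dtype=np.uint8)
y = rng.integers(0, 10, size=256)

# 1. Preprocess: scale pixel values to [0, 1] and hold out a test set.
X = X.astype("float32") / 255.0
X_tr, X_te = X[:200], X[200:]
y_tr, y_te = y[:200], y[200:]

# 2. Define a simple fully connected network.
model = keras.Sequential([
    keras.Input(shape=(28, 28)),
    keras.layers.Flatten(),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(10, activation="softmax"),
])

# 3. Compile: integer labels, so use the sparse cross-entropy loss.
model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)

# 4. Train, monitoring performance on a validation split.
history = model.fit(
    X_tr, y_tr, validation_split=0.2, epochs=2, batch_size=32, verbose=0
)

# 5. Evaluate on the held-out data.
test_loss, test_acc = model.evaluate(X_te, y_te, verbose=0)
```

`history.history` holds the per-epoch training and validation metrics, which is a convenient starting point for checking whether the model is overfitting.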

Afterwards, analyse your results and see if you can improve the model:

- When does the model fail? Study the mispredicted cases.
- Does performance improve with a deeper model?
- Is the model overfitting?
- How do the hyperparameter settings affect performance?
- Does data augmentation help?
- Use TensorBoard to compare models.
- Could transfer learning improve performance?
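
One possible way to study the mispredicted cases is to look first at the errors the model was most confident about. The helper below is a hypothetical sketch (`most_confident_mistakes` is not part of Keras), and the toy probabilities stand in for the output of `model.predict` on a validation set.

```python
import numpy as np


def most_confident_mistakes(probs, y_true, top_k=5):
    """Return indices of misclassified samples, most confident errors first."""
    y_pred = probs.argmax(axis=1)
    wrong = np.flatnonzero(y_pred != y_true)
    # Probability the model assigned to its (wrong) predicted class.
    confidence = probs[wrong, y_pred[wrong]]
    order = np.argsort(-confidence)
    return wrong[order][:top_k]


# Toy stand-in for model.predict(X_val): 4 samples, 3 classes (made-up values).
toy_probs = np.array([
    [0.70, 0.20, 0.10],  # predicted class 0
    [0.10, 0.80, 0.10],  # predicted class 1
    [0.30, 0.30, 0.40],  # predicted class 2
    [0.90, 0.05, 0.05],  # predicted class 0
])
toy_labels = np.array([0, 2, 1, 1])

mistakes = most_confident_mistakes(toy_probs, toy_labels)
```

The returned indices can be used to plot the corresponding images with `imshow` and inspect what the failures have in common.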

### Have fun!

---
### Dataset 1: Fashion MNIST: clothing image classification

This famous dataset contains 28x28 grayscale images of 10 types of clothing items, with 7,000 pictures for each (6,000 for training and 1,000 for testing):

| Label | Description |
| --- | --- |
| 0 | T-shirt/top |
| 1 | Trouser |
| 2 | Pullover |
| 3 | Dress |
| 4 | Coat |
| 5 | Sandal |
| 6 | Shirt |
| 7 | Sneaker |
| 8 | Bag |
| 9 | Ankle boot |

Let's train a classifier that can distinguish them!

**Bonus questions**: 

- Which clothing items are easiest to identify?
- Which are the hardest to tell from each other? Is it what you expected?
- Use ImageDataGenerator for image augmentation
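
A confusion matrix is one way to approach the first two bonus questions: the diagonal shows how often each class is recognised, and the largest off-diagonal entry points at the pair of items that is mixed up most often. The sketch below uses plain NumPy; the helper names and the toy labels are assumptions for illustration only, and in practice `y_pred` would come from `model.predict(...).argmax(axis=1)`.

```python
import numpy as np

CLASS_NAMES = [
    "T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
    "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot",
]


def confusion_matrix(y_true, y_pred, n_classes):
    """Count matrix with true labels as rows and predicted labels as columns."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)
    return cm


def per_class_recall(cm):
    """Fraction of each true class that was predicted correctly."""
    return np.diag(cm) / np.maximum(cm.sum(axis=1), 1)


def most_confused_pair(cm):
    """Return (true class, predicted class, count) of the most frequent error."""
    off_diag = cm.copy()
    np.fill_diagonal(off_diag, 0)
    true_idx, pred_idx = np.unravel_index(off_diag.argmax(), off_diag.shape)
    return int(true_idx), int(pred_idx), int(off_diag[true_idx, pred_idx])


# Made-up labels, only to show how the helpers are called.
toy_true = np.array([0, 0, 6, 6, 6, 1, 1])
toy_pred = np.array([0, 6, 0, 0, 6, 1, 1])

cm = confusion_matrix(toy_true, toy_pred, n_classes=len(CLASS_NAMES))
recall = per_class_recall(cm)
t, p, count = most_confused_pair(cm)
print(f"{CLASS_NAMES[t]} predicted as {CLASS_NAMES[p]}: {count} times")
```

Passing `cm` to `sns.heatmap` gives a quick visual overview of the same information.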

In [None]:
from tensorflow.keras.datasets import fashion_mnist

(X_train, y_train), (X_test, y_test) = fashion_mnist.load_data()

X_train.shape, y_train.shape

In [None]:
from sklearn.model_selection import train_test_split
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, train_size=0.8)
X_train.shape, y_train.shape, X_val.shape, y_val.shape

In [None]:
# Show six randomly chosen training images side by side.
for ax in plt.subplots(1, 6)[1]:
    ax.imshow(random.choice(X_train), cmap="gray")

Continue from here...

In [None]:
# %load ../answers/fashion_mnist_preprocessing.py

In [None]:
# %load ../answers/fashion_mnist_model.py

In [None]:
# %load ../answers/fashion_mnist_compiling.py

In [None]:
# %load ../answers/fashion_mnist_imagedatagenerator.py

In [None]:
# %load ../answers/fashion_mnist_training.py

In [None]:
# %load ../answers/fashion_mnist_evaluating.py

---
### Dataset 2: Food Recipes classification

This dataset contains thousands of food recipes from [Kaggle](https://www.kaggle.com/hugodarwood/epirecipes). Each row describes one recipe: its type and its ingredients.

**Our goal** is to use this information to classify whether a recipe is vegetarian or not.

**Important tip**: you may want to get rid of very infrequent features and missing values before starting with NNs!
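
As a rough sketch of that tip, the snippet below drops rows with missing values and then removes binary feature columns that are set in only a handful of recipes. The toy table, its column names, the `drop_rare_features` helper and the `min_count` threshold are all hypothetical; pick a threshold that makes sense for the real data.

```python
import pandas as pd


def drop_rare_features(df, feature_cols, min_count=2):
    """Drop binary feature columns that are set in fewer than min_count rows."""
    counts = df[feature_cols].sum(axis=0)
    rare = counts[counts < min_count].index
    return df.drop(columns=rare)


# Toy stand-in for the recipes table (made-up values).
toy = pd.DataFrame({
    "title": ["a", "b", "c", "d"],
    "vegetarian": [1, 0, 1, 0],
    "tomato": [1, 1, 0, 1],
    "saffron": [0, 0, 1, 0],
    "calories": [100.0, None, 250.0, 300.0],
})

# Remove rows with missing values first, then the infrequent features.
clean = toy.dropna()
clean = drop_rare_features(clean, ["tomato", "saffron"], min_count=2)
```

Here the row with a missing `calories` value is dropped, and `saffron` is removed because it appears in a single remaining recipe.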

In [None]:
# TODO: fill in your path to the data here.
food_data = pd.read_csv(...)

print(food_data.shape)
food_data.head()

In [None]:
# Separate the features from the target.
X = food_data.drop(columns=['title','vegetarian','vegan','salad','side'])
y = food_data['vegetarian']

In [None]:
# %load ../answers/dataset1_preprocess.py

Continue from here...