## Yum or Yuck Butterfly Mimics 2022 – Baseline Model

**Author:** [Keith Pinson](https://github.com/keithpinson)<br>
**Date created:** 2022/06/11<br>
**Version:** 1.0.0001<br>
**Description:** A simple transfer learning model to establish a baseline score for a Kaggle Community Competition.<br>
**Platform:** Kaggle Packages including TensorFlow 2.6.3 with GPU<br>
<br>

![Butterfly Classification Diagram](DocResources/ButterflyClassificationTransferLearning-854.png)

We will use the ResNet50 convolutional neural network with weights pre-trained on the ImageNet dataset. Its final classification layers will be replaced with our own dense layers to make the butterfly classifier.
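The idea can be sketched as follows. This is a minimal illustration, not the notebook's actual model: the function name `build_classifier`, the global average pooling layer, and the hidden layer size of 256 are assumptions made for the example, and input preprocessing is left out.

```python
import tensorflow as tf

NUM_CLASSES = 6  # black, monarch, pipevine, spicebush, tiger, viceroy


def build_classifier(num_classes=NUM_CLASSES, weights="imagenet"):
    """Frozen ResNet50 base plus a small dense head (head layout is an assumption)."""
    base = tf.keras.applications.ResNet50(
        weights=weights, include_top=False, input_shape=(224, 224, 3)
    )
    base.trainable = False  # keep the pre-trained convolutional weights fixed

    inputs = tf.keras.Input(shape=(224, 224, 3))
    x = base(inputs, training=False)  # run BatchNorm layers in inference mode
    x = tf.keras.layers.GlobalAveragePooling2D()(x)
    x = tf.keras.layers.Dense(256, activation="relu")(x)  # hypothetical hidden size
    outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(x)
    return tf.keras.Model(inputs, outputs)
```

Because the base is frozen, only the two dense layers are updated during training, which is what keeps a baseline like this cheap to fit.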

In [1]:
import datetime

print("executed",datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S"),"local time")

executed 2022-06-18 09:05:07 local time


---
Using TensorFlow, we will build a butterfly image classifier that can identify 6 remarkable North American butterflies.

![Images of Black, Monarch, Pipevine, Spicebush, Tiger and Viceroy butterflies from the dataset](DocResources/the-butterflies.png)

<br>
This notebook will:

 - Load ResNet50 with the pre-trained Imagenet weights
 - Load the Butterfly Mimics dataset
 - Train our own classifier layers on top of the pre-trained features
 - Predict the butterflies in the test data
 - Show a sample of the results
 - Quantify the accuracy of the results



---
## Set Hyperparameters
---

In [45]:
# Hyper-parameters
BATCH_SIZE = 32
LEARNING_RATE = 0.0003
NUMBER_OF_EPOCHS = 50

# Other constants
IMAGE_SIZE_H = IMAGE_SIZE_V = 224
CHANNELS = 3
MODEL_NAME = 'yoymimics'

SEED = 43

##
---
## Set Environment
---


In [46]:
import os
import platform
import random

import tensorflow as tf

# Using TensorFlow's implementation of the NumPy API
import tensorflow.experimental.numpy as np
np.experimental_enable_numpy_behavior()

import pandas as pd
from sklearn import metrics
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
import cv2

# import tensorflow_datasets as tfds
# from keras.preprocessing.image import ImageDataGenerator,load_img
from keras.utils.np_utils import to_categorical
from keras.utils.vis_utils import plot_model
from keras.models import Sequential
from keras.layers import Conv2D, \
     MaxPooling2D,Dropout,Flatten,Dense,Activation,BatchNormalization

# from tensorflow_datasets.core.registered import DatasetNotFoundError

import keras.utils.vis_utils
import tensorflow.python.ops.gen_math_ops


os.environ['PYTHONHASHSEED'] = str(SEED)
# Call the seed functions; assigning to them would replace them without seeding
random.seed(SEED)
np.random.seed(SEED)
tf.random.set_seed(SEED)

print(f"Tensorflow {tf.__version__}", "with GPU support" if len(tf.config.list_physical_devices('GPU')) > 0 else "for CPU only")

Tensorflow 2.6.3 with GPU support
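As a quick sanity check on reproducibility, the sketch below uses only Python's standard `random` module to show that seeding with the same value yields the same sequence. The helper `sample` is a hypothetical name introduced for this illustration; it is not part of the notebook.

```python
import random

SEED = 43  # same value as the notebook's SEED constant


def sample(seed, n=5):
    """Return n pseudo-random floats drawn after seeding Python's generator."""
    random.seed(seed)  # the seed function has to be called to take effect
    return [random.random() for _ in range(n)]


first = sample(SEED)
second = sample(SEED)
print(first == second)  # True: same seed, same sequence
```

The same principle applies to the NumPy and TensorFlow generators, although GPU operations may still introduce small run-to-run differences.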


##
---
## Load Pre-trained Weights
---

In [47]:
from keras.applications.resnet import ResNet50

resnet50 = ResNet50(weights='imagenet', include_top=False, input_shape=(IMAGE_SIZE_H, IMAGE_SIZE_V, CHANNELS))

##
---
## Load Dataset
---

Our purpose here is to set a baseline score for classifying the images from the Butterfly Mimics 2022 Dataset.

The dataset consists of JPG images, each showing an individual butterfly. In the training data, each image comes with a label that identifies the butterfly's class as black, monarch, pipevine, spicebush, tiger, or viceroy.

![Tiger Swallowtail Butterfly](DocResources/tiger_female_dark_form_vyaa1ee082.jpg)

```python
X = "vyaa1ee082.jpg"  # Features tensor (the image)
y = "tiger"        # Target vector
```

We will split the dataset into training and validation data. To avoid leaking any test data, we will generate our baseline scores from these splits and not from the `'image_holdouts'` provided for the competition.

The following variables will be helpful to work with the data:

 - xy_index `{'X': 0, 'y': 1}`
 - supervised_keys `['image', 'label']`
 - class_to_string `{0: 'black', 1: 'monarch', 2: 'pipevine', 3: 'spicebush', 4: 'tiger', 5: 'viceroy'}`
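A minimal sketch of how the `class_to_string` mapping might be used to move between label text, class index, and one-hot vector. The inverse mapping `string_to_class` and the helpers `one_hot` and `decode` are hypothetical names introduced for this illustration.

```python
class_to_string = {0: "black", 1: "monarch", 2: "pipevine",
                   3: "spicebush", 4: "tiger", 5: "viceroy"}

# Inverse mapping, from the label text in the training data to a class index
string_to_class = {name: index for index, name in class_to_string.items()}


def one_hot(label):
    """Encode a label string as a one-hot list with one slot per class."""
    vector = [0] * len(class_to_string)
    vector[string_to_class[label]] = 1
    return vector


def decode(scores):
    """Map a vector of per-class scores back to the most likely label."""
    best = max(range(len(scores)), key=scores.__getitem__)
    return class_to_string[best]


print(one_hot("tiger"))                        # [0, 0, 0, 0, 1, 0]
print(decode([0.1, 0.1, 0.1, 0.1, 0.5, 0.1]))  # tiger
```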

We will split the training data into a number of folds for cross-validation:

 - training_folds
 - validation_folds
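One way such folds could be produced is sketched below in plain Python. This is an assumption-laden illustration, not the notebook's loader: the function name `stratified_folds`, the choice of 5 folds, and the stratified round-robin strategy are all hypothetical, and the folds hold sample indices in place of images.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=5, seed=43):
    """Split sample indices into k folds with a similar class mix in each."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class's samples round-robin across the folds
        for position, index in enumerate(indices):
            folds[position % k].append(index)

    training_folds, validation_folds = [], []
    for i in range(k):
        validation_folds.append(sorted(folds[i]))
        training_folds.append(sorted(
            index for j in range(k) if j != i for index in folds[j]
        ))
    return training_folds, validation_folds


labels = ["tiger", "monarch", "viceroy"] * 10
training_folds, validation_folds = stratified_folds(labels, k=5)
print(len(training_folds[0]), len(validation_folds[0]))  # 24 6
```

Each sample appears in exactly one validation fold, so every image is used for validation once and for training in the remaining rounds.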

Finally, we will load the test data:

 - test_data



In [None]:
def set_data_variables():

    xy_index = {'X': 0, 'y': 1}

    supervised_keys = ['image', 'label']

    class_to_str = {0: 'black', 1: 'monarch', 2: 'pipevine', 3: 'spicebush', 4: 'tiger', 5: 'viceroy'}

    return xy_index, supervised_keys, class_to_str


# set_data_variables() returns three values; the channel count comes from CHANNELS
xy_index, sup_keys, class2str = set_data_variables()
num_channels = CHANNELS

In [None]:
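def load_butterfly_training_folds():
    # Placeholder: the loader is not implemented yet.
    # Return two empty lists so the unpacking below does not fail.
    return [], []

train_folds, val_folds = load_butterfly_training_folds()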


In [None]:
def load_butterfly_test():
    pass

test_data = load_butterfly_test()


### <u>The train/validate splits</u>

