# Experimental Template
This notebook is a template for experiments. The one step it does not include is data cleaning.
<br />
<br />
I have placed `FIXME` comments next to the parts of the code you will need to change.
<br />
<br />
If you want to know more about how the code works, see `helper_functions.py`.

## Imports and Setup

In [1]:
from helper_functions import pd, os, shutil
from helper_functions import convert_samples_to_binary, get_column_data_types, print_library_versions, add_id_column

In [2]:
print_library_versions()

pandas version:           1.4.1
matplotlib version:       3.5.1
numpy version:            1.18.5
bitstring version:        3.1.9
joblib version:           1.1.0
PIL version:              8.2.0


## Data Loading and Cleaning

In [3]:
# FIXME - update the filename to point to your dataset
filename = '/mnt/sda1/iris.csv'
df = pd.read_csv(filename)

In [4]:
df.head()

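|   | Id | SepalLengthCm | SepalWidthCm | PetalLengthCm | PetalWidthCm | Species |
|---|----|---------------|--------------|---------------|--------------|---------|
| 0 | 1 | 5.1 | 3.5 | 1.4 | 0.2 | Iris-setosa |
| 1 | 2 | 4.9 | 3.0 | 1.4 | 0.2 | Iris-setosa |
| 2 | 3 | 4.7 | 3.2 | 1.3 | 0.2 | Iris-setosa |
| 3 | 4 | 4.6 | 3.1 | 1.5 | 0.2 | Iris-setosa |
| 4 | 5 | 5.0 | 3.6 | 1.4 | 0.2 | Iris-setosa |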


In [5]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 6 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   Id             150 non-null    int64  
 1   SepalLengthCm  150 non-null    float64
 2   SepalWidthCm   150 non-null    float64
 3   PetalLengthCm  150 non-null    float64
 4   PetalWidthCm   150 non-null    float64
 5   Species        150 non-null    object 
dtypes: float64(4), int64(1), object(1)
memory usage: 7.2+ KB


# NOTE:
The following code cells are specific to my dataset (Iris). This is where you will need to perform your own data cleaning.
<br />
<br />
This is likely to be the most difficult and time-consuming part of the process.
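
Cleaning is dataset-specific, so the template leaves it to you. As a rough starting point, the sketch below shows a few common steps in plain pandas: dropping duplicate rows, dropping rows with a missing target, filling missing numeric values with the column median, and one-hot encoding any remaining text feature columns. The function name `basic_clean` and every step in it are illustrative assumptions, not part of `helper_functions.py`. The aim is to end up with only float, int, or bool feature columns, since those are the three types the image-generation step distinguishes.

```python
import pandas as pd


def basic_clean(df: pd.DataFrame, target: str) -> pd.DataFrame:
    """Hypothetical cleaning pass; adapt or replace it for your own dataset."""
    # Drop exact duplicate rows, then rows with no label (they cannot be trained on)
    df = df.drop_duplicates().dropna(subset=[target]).copy()
    features = df.columns.drop(target)

    # Fill missing numeric feature values with each column's median
    numeric = df[features].select_dtypes(include="number").columns
    if len(numeric):
        df[numeric] = df[numeric].fillna(df[numeric].median())

    # One-hot encode any remaining text (object) feature columns
    text = df[features].select_dtypes(include="object").columns
    if len(text):
        df = pd.get_dummies(df, columns=list(text))

    return df.reset_index(drop=True)
```

Note that `drop_duplicates` compares whole rows, so a column with a unique value per row (such as `Id`) will stop it from catching duplicated measurements.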

In [6]:
# FIXME - change 'Species' to the name of the target (y) column in your dataset
# This function overwrites the 'Id' column with an identifier for each sample,
# which is later used to name the generated images
df = add_id_column(df, 'Species')
df

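|     | Id | SepalLengthCm | SepalWidthCm | PetalLengthCm | PetalWidthCm | Species |
|-----|----|---------------|--------------|---------------|--------------|---------|
| 0 | 1-setosa | 5.1 | 3.5 | 1.4 | 0.2 | Iris-setosa |
| 1 | 2-setosa | 4.9 | 3.0 | 1.4 | 0.2 | Iris-setosa |
| 2 | 3-setosa | 4.7 | 3.2 | 1.3 | 0.2 | Iris-setosa |
| 3 | 4-setosa | 4.6 | 3.1 | 1.5 | 0.2 | Iris-setosa |
| 4 | 5-setosa | 5.0 | 3.6 | 1.4 | 0.2 | Iris-setosa |
| ... | ... | ... | ... | ... | ... | ... |
| 145 | 46-virginica | 6.7 | 3.0 | 5.2 | 2.3 | Iris-virginica |
| 146 | 47-virginica | 6.3 | 2.5 | 5.0 | 1.9 | Iris-virginica |
| 147 | 48-virginica | 6.5 | 3.0 | 5.2 | 2.0 | Iris-virginica |
| 148 | 49-virginica | 6.2 | 3.4 | 5.4 | 2.3 | Iris-virginica |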
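
Judging from the output above, the new `Id` values combine a running number within each class with the part of the label after the `-` (row 145 is the 46th virginica sample, hence `46-virginica`). A minimal pandas sketch that reproduces this pattern is shown below; `make_ids` is a hypothetical name, and the real `add_id_column` in `helper_functions.py` may work differently.

```python
import pandas as pd


def make_ids(df: pd.DataFrame, target: str, id_col: str = "Id") -> pd.DataFrame:
    """Hypothetical re-creation of the '<n>-<class>' ID pattern seen above."""
    df = df.copy()  # leave the caller's DataFrame untouched
    # Short class name: 'Iris-setosa' -> 'setosa' (assumes a '-' in every label)
    short = df[target].astype(str).str.split("-").str[1]
    # Running count within each class, starting at 1
    number = df.groupby(target).cumcount() + 1
    df[id_col] = number.astype(str) + "-" + short
    return df
```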


In [7]:
# FIXME - Perform any data cleaning or preprocessing steps here

In [8]:
# FIXME - get the correct values in x and Y for your dataset
x = df.drop(['Species', 'Id'], axis=1)
Y = df['Species']
print(f"x {x.shape}")
print(f"Y {Y.shape}")
print(f"This value should be True: {x.shape[0] == Y.shape[0]}")

x (150, 4)
Y (150,)
This value should be True: True


## Image Generation

In [9]:
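# FIXME - set image_directory to the folder where the images should be saved,
# and change any of the other constants below if needed
image_directory = "/mnt/sda1/image-results-iris"
os.makedirs(image_directory, exist_ok=True)  # create the folder if it does not exist
feature_types = get_column_data_types(x)
# Encoding settings passed to convert_samples_to_binary (see helper_functions.py)
precision = 64
one = 128
zero = 0
n_jobs = -1  # number of parallel jobs; in joblib, -1 means use all CPU cores
# For feature_types, 0 = float, 1 = int, 2 = bool
print(feature_types)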

[0, 0, 0, 0]
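
The type codes above follow the convention 0 = float, 1 = int, 2 = bool. A minimal sketch of how such codes could be derived with pandas' dtype checks is shown below; `column_type_codes` is a hypothetical stand-in for `get_column_data_types`, and raising an error on any other dtype is an assumption about what should happen to uncleaned columns.

```python
import pandas as pd
from pandas.api.types import is_bool_dtype, is_float_dtype, is_integer_dtype


def column_type_codes(x: pd.DataFrame) -> list:
    """Hypothetical version of get_column_data_types: 0 = float, 1 = int, 2 = bool."""
    codes = []
    for name in x.columns:
        dtype = x[name].dtype
        # Check bool first so boolean columns are never treated as numbers
        if is_bool_dtype(dtype):
            codes.append(2)
        elif is_integer_dtype(dtype):
            codes.append(1)
        elif is_float_dtype(dtype):
            codes.append(0)
        else:
            raise ValueError(f"Column {name!r} has unsupported dtype {dtype}; clean it first")
    return codes
```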


In [10]:
convert_samples_to_binary(x, df["Id"], image_directory, precision, one, zero, n_jobs, feature_types)
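
`convert_samples_to_binary` turns every sample into an image; its exact layout lives in `helper_functions.py`. The sketch below only illustrates the general idea under stated assumptions: each float feature is written out as its 64-bit IEEE 754 bit pattern (matching `precision = 64`), each bit becomes a pixel set to `one` (128) or `zero` (0), and each feature fills one row of the image. It uses the standard `struct` module instead of `bitstring`, and the row-per-feature layout and the function names are hypothetical.

```python
import struct

import numpy as np


def float_to_pixels(value: float, one: int = 128, zero: int = 0) -> np.ndarray:
    """Map the 64 bits of an IEEE 754 double to pixel intensities."""
    # Pack as a big-endian double and reread as an unsigned 64-bit int,
    # so the sign bit comes first in the bit string
    (as_int,) = struct.unpack(">Q", struct.pack(">d", value))
    bits = format(as_int, "064b")
    return np.array([one if b == "1" else zero for b in bits], dtype=np.uint8)


def sample_to_image(sample, one: int = 128, zero: int = 0) -> np.ndarray:
    """Stack one 64-pixel row per feature (assumed layout)."""
    return np.stack([float_to_pixels(float(v), one, zero) for v in sample])
```

A grayscale array like this could then be written to disk with Pillow, for example `Image.fromarray(sample_to_image(row)).save(path)`.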

## Place Folders into the Correct Categories

In [11]:
dirs = Y.unique().tolist()
new_dir = image_directory + '/data/'
# Create data/Train/<class> and data/Validation/<class> for every class.
# FIXME - the class folder name is the part of the label after the '-'
# (e.g. 'Iris-setosa' -> 'setosa'); adjust this if your labels look different
for label in dirs:
    class_name = str(label).split('-')[1]
    os.makedirs(new_dir + 'Train/' + class_name, exist_ok=True)
    os.makedirs(new_dir + 'Validation/' + class_name, exist_ok=True)

### Place pictures into the correct folder

In [12]:
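total_images = 0
# Class names are the part of each label after the '-' (e.g. 'Iris-setosa' -> 'setosa')
type_counts = {str(value).split("-")[1]: 0 for value in dirs}
for file in os.listdir(image_directory):
    src = f"{image_directory}/{file}"
    if not os.path.isfile(src):
        continue  # skip folders such as data/
    # The parsing below expects image names of the form '<id>-<class>.<ext>'
    parts = file.split("-")
    if len(parts) < 2:
        continue
    label = parts[1].split(".")[0]
    if label not in type_counts:
        continue  # not one of the known classes
    type_counts[label] += 1
    shutil.move(src, f"{new_dir}Train/{label}/{file}")
    total_images += 1
print(total_images)
print(type_counts)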

150
{'setosa': 50, 'versicolor': 50, 'virginica': 50}
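
Before moving anything, it can help to check how many images each class would receive. The dry-run sketch below counts file names with `collections.Counter`, assuming the same `<id>-<class>.<ext>` naming that the cell above relies on; `count_by_class` is a hypothetical helper, and calling it as `count_by_class(os.listdir(image_directory))` before the move should report the same totals.

```python
from collections import Counter


def count_by_class(filenames):
    """Count files per class, using the text between the first '-' and the next '.'."""
    counts = Counter()
    for name in filenames:
        parts = name.split("-")
        if len(parts) < 2:
            continue  # not a generated image (e.g. the 'data' folder)
        counts[parts[1].split(".")[0]] += 1
    return counts
```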


### Move 20% of each class into the `Validation` folder (used as the held-out test set)

In [13]:
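import random

random.seed(42)  # fix the seed so the split is reproducible (any fixed value works)
for label in os.listdir(new_dir + "Train/"):
    images_to_move = int(type_counts[label] * 0.2)
    # Pick images_to_move distinct images at random from this class.
    # Note: re-running this cell moves another 20% out of Train/.
    chosen = random.sample(os.listdir(f"{new_dir}Train/{label}"), images_to_move)
    for image in chosen:
        shutil.move(f"{new_dir}Train/{label}/{image}", f"{new_dir}Validation/{label}/{image}")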

## Train ResNet50V2

### Once experiments are running, view the live updates on TensorBoard
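Run `tensorboard --logdir=[your log directory]` in a terminal. In the run below, the logs were written to `/mnt/sda1/image-results-iris/results/tb_logs`, i.e. `{image_directory}/results/tb_logs`.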
<br />
<br />
Then open a browser and go to `http://localhost:6006`, TensorBoard's default address.
<br />
<br />
![TensorBoard](./tensorboard.png)

In [1]:
from resnet import print_dl_versions, train_resnet_model_k_fold, evaluate_on_test_data

2023-03-02 20:17:00.507577: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.1


In [2]:
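# FIXME - update the values below to match your dataset
# If the kernel was restarted before this section (the cell counter starts again at 1),
# redefine the paths from the earlier cells
image_directory = "/mnt/sda1/image-results-iris"
new_dir = image_directory + '/data/'
img_size = 64
target_size = (img_size, img_size)
num_classes = 3  # number of distinct labels in Y
batch_size = 32
num_folds = 3
number_of_epochs = 50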

In [4]:
best_model = train_resnet_model_k_fold(num_classes, img_size, f"{new_dir}Train/", number_of_epochs, f"{image_directory}/results/", num_folds, batch_size)

2023-03-02 20:17:16.808451: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2023-03-02 20:17:16.809210: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
2023-03-02 20:17:16.843275: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-03-02 20:17:16.844050: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties: 
pciBusID: 0000:01:00.0 name: NVIDIA GeForce GTX 1080 computeCapability: 6.1
coreClock: 1.7715GHz coreCount: 20 deviceMemorySize: 7.92GiB deviceMemoryBandwidth: 298.32GiB/s
2023-03-02 20:17:16.844076: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.1
2023-03-02 20:17:16.845430: I tensorflow/stream_executor/platform/d

Fold:  0
Found 96 images belonging to 3 classes.
Found 24 images belonging to 3 classes.


2023-03-02 20:17:18.238329: I tensorflow/core/profiler/lib/profiler_session.cc:136] Profiler session initializing.
2023-03-02 20:17:18.238352: I tensorflow/core/profiler/lib/profiler_session.cc:155] Profiler session started.
2023-03-02 20:17:18.238371: I tensorflow/core/profiler/internal/gpu/cupti_tracer.cc:1365] Profiler found 1 GPUs
2023-03-02 20:17:18.238918: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcupti.so.10.1
2023-03-02 20:17:18.308612: I tensorflow/core/profiler/lib/profiler_session.cc:172] Profiler session tear down.
2023-03-02 20:17:18.308701: I tensorflow/core/profiler/internal/gpu/cupti_tracer.cc:1487] CUPTI activity buffer flushed
2023-03-02 20:17:18.457576: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
2023-03-02 20:17:18.474127: I tensorflow/core/platform/profile_utils/cpu_utils.cc:112] CPU Frequency: 4200000000 Hz


Epoch 1/50


2023-03-02 20:17:22.473107: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.10
2023-03-02 20:17:22.616649: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.7
2023-03-02 20:17:23.218283: W tensorflow/stream_executor/gpu/asm_compiler.cc:63] Running ptxas --version returned 256
2023-03-02 20:17:23.274308: W tensorflow/stream_executor/gpu/redzone_allocator.cc:314] Internal: ptxas exited with non-zero error code 256, output: 
Relying on driver to perform ptx compilation. 
Modify $PATH to customize ptxas location.
This message will be only logged once.




2023-03-02 20:18:02.821074: I tensorflow/core/profiler/lib/profiler_session.cc:136] Profiler session initializing.
2023-03-02 20:18:02.821111: I tensorflow/core/profiler/lib/profiler_session.cc:155] Profiler session started.
2023-03-02 20:18:02.880520: I tensorflow/core/profiler/lib/profiler_session.cc:71] Profiler session collecting data.




2023-03-02 20:18:02.882052: I tensorflow/core/profiler/internal/gpu/cupti_tracer.cc:1487] CUPTI activity buffer flushed
2023-03-02 20:18:02.889705: I tensorflow/core/profiler/internal/gpu/cupti_collector.cc:228]  GpuTracer has collected 1740 callback api events and 1688 activity events. 
2023-03-02 20:18:02.919719: I tensorflow/core/profiler/lib/profiler_session.cc:172] Profiler session tear down.
2023-03-02 20:18:02.949296: I tensorflow/core/profiler/rpc/client/save_profile.cc:137] Creating directory: /mnt/sda1/image-results-iris/results/tb_logs/train/plugins/profile/2023_03_02_20_18_02
2023-03-02 20:18:02.971672: I tensorflow/core/profiler/rpc/client/save_profile.cc:143] Dumped gzipped tool data for trace.json.gz to /mnt/sda1/image-results-iris/results/tb_logs/train/plugins/profile/2023_03_02_20_18_02/drake-pc.trace.json.gz
2023-03-02 20:18:03.022471: I tensorflow/core/profiler/rpc/client/save_profile.cc:137] Creating directory: /mnt/sda1/image-results-iris/results/tb_logs/train/plug


Epoch 00001: val_acc improved from -inf to 0.33333, saving model to /mnt/sda1/image-results-iris/results/fold0-resnet50v2-saved-model-01-val_acc-0.33.hdf5
Epoch 2/50

Epoch 00002: val_acc did not improve from 0.33333
Epoch 3/50

Epoch 00003: val_acc improved from 0.33333 to 0.54167, saving model to /mnt/sda1/image-results-iris/results/fold0-resnet50v2-saved-model-03-val_acc-0.54.hdf5
Epoch 4/50

Epoch 00004: val_acc improved from 0.54167 to 0.70833, saving model to /mnt/sda1/image-results-iris/results/fold0-resnet50v2-saved-model-04-val_acc-0.71.hdf5
Epoch 5/50

Epoch 00005: val_acc did not improve from 0.70833
Epoch 6/50

Epoch 00006: val_acc did not improve from 0.70833
Epoch 7/50

Epoch 00007: val_acc did not improve from 0.70833
Epoch 8/50

Epoch 00008: val_acc did not improve from 0.70833
Epoch 9/50

Epoch 00009: val_acc did not improve from 0.70833
Epoch 10/50

Epoch 00010: val_acc did not improve from 0.70833
Epoch 11/50

Epoch 00011: val_acc did not improve from 0.70833
Epoch 

2023-03-02 20:18:27.257311: I tensorflow/core/profiler/lib/profiler_session.cc:136] Profiler session initializing.
2023-03-02 20:18:27.257343: I tensorflow/core/profiler/lib/profiler_session.cc:155] Profiler session started.
2023-03-02 20:18:27.257461: I tensorflow/core/profiler/lib/profiler_session.cc:172] Profiler session tear down.
2023-03-02 20:18:27.257526: I tensorflow/core/profiler/internal/gpu/cupti_tracer.cc:1487] CUPTI activity buffer flushed


Epoch 1/50

2023-03-02 20:18:31.225214: I tensorflow/core/profiler/lib/profiler_session.cc:136] Profiler session initializing.
2023-03-02 20:18:31.225239: I tensorflow/core/profiler/lib/profiler_session.cc:155] Profiler session started.




2023-03-02 20:18:31.613852: I tensorflow/core/profiler/lib/profiler_session.cc:71] Profiler session collecting data.
2023-03-02 20:18:31.618143: I tensorflow/core/profiler/internal/gpu/cupti_tracer.cc:1487] CUPTI activity buffer flushed
2023-03-02 20:18:31.628963: I tensorflow/core/profiler/internal/gpu/cupti_collector.cc:228]  GpuTracer has collected 1740 callback api events and 1688 activity events. 
2023-03-02 20:18:31.667309: I tensorflow/core/profiler/lib/profiler_session.cc:172] Profiler session tear down.
2023-03-02 20:18:31.701172: I tensorflow/core/profiler/rpc/client/save_profile.cc:137] Creating directory: /mnt/sda1/image-results-iris/results/tb_logs/train/plugins/profile/2023_03_02_20_18_31
2023-03-02 20:18:31.723676: I tensorflow/core/profiler/rpc/client/save_profile.cc:143] Dumped gzipped tool data for trace.json.gz to /mnt/sda1/image-results-iris/results/tb_logs/train/plugins/profile/2023_03_02_20_18_31/drake-pc.trace.json.gz
2023-03-02 20:18:31.788824: I tensorflow/core


Epoch 00001: val_acc improved from -inf to 0.33333, saving model to /mnt/sda1/image-results-iris/results/fold1-resnet50v2-saved-model-01-val_acc-0.33.hdf5
Epoch 2/50

Epoch 00002: val_acc improved from 0.33333 to 0.45833, saving model to /mnt/sda1/image-results-iris/results/fold1-resnet50v2-saved-model-02-val_acc-0.46.hdf5
Epoch 3/50

Epoch 00003: val_acc improved from 0.45833 to 0.54167, saving model to /mnt/sda1/image-results-iris/results/fold1-resnet50v2-saved-model-03-val_acc-0.54.hdf5
Epoch 4/50

Epoch 00004: val_acc improved from 0.54167 to 0.62500, saving model to /mnt/sda1/image-results-iris/results/fold1-resnet50v2-saved-model-04-val_acc-0.62.hdf5
Epoch 5/50

Epoch 00005: val_acc did not improve from 0.62500
Epoch 6/50

Epoch 00006: val_acc did not improve from 0.62500
Epoch 7/50

Epoch 00007: val_acc did not improve from 0.62500
Epoch 8/50

Epoch 00008: val_acc did not improve from 0.62500
Epoch 9/50

Epoch 00009: val_acc improved from 0.62500 to 0.70833, saving model to /mn

2023-03-02 20:18:39.686874: I tensorflow/core/profiler/lib/profiler_session.cc:136] Profiler session initializing.
2023-03-02 20:18:39.686900: I tensorflow/core/profiler/lib/profiler_session.cc:155] Profiler session started.
2023-03-02 20:18:39.687003: I tensorflow/core/profiler/lib/profiler_session.cc:172] Profiler session tear down.
2023-03-02 20:18:39.687066: I tensorflow/core/profiler/internal/gpu/cupti_tracer.cc:1487] CUPTI activity buffer flushed


Epoch 1/50

2023-03-02 20:18:43.747555: I tensorflow/core/profiler/lib/profiler_session.cc:136] Profiler session initializing.
2023-03-02 20:18:43.747582: I tensorflow/core/profiler/lib/profiler_session.cc:155] Profiler session started.




2023-03-02 20:18:44.139056: I tensorflow/core/profiler/lib/profiler_session.cc:71] Profiler session collecting data.
2023-03-02 20:18:44.141756: I tensorflow/core/profiler/internal/gpu/cupti_tracer.cc:1487] CUPTI activity buffer flushed
2023-03-02 20:18:44.154864: I tensorflow/core/profiler/internal/gpu/cupti_collector.cc:228]  GpuTracer has collected 1740 callback api events and 1688 activity events. 
2023-03-02 20:18:44.189887: I tensorflow/core/profiler/lib/profiler_session.cc:172] Profiler session tear down.
2023-03-02 20:18:44.221805: I tensorflow/core/profiler/rpc/client/save_profile.cc:137] Creating directory: /mnt/sda1/image-results-iris/results/tb_logs/train/plugins/profile/2023_03_02_20_18_44
2023-03-02 20:18:44.245767: I tensorflow/core/profiler/rpc/client/save_profile.cc:143] Dumped gzipped tool data for trace.json.gz to /mnt/sda1/image-results-iris/results/tb_logs/train/plugins/profile/2023_03_02_20_18_44/drake-pc.trace.json.gz
2023-03-02 20:18:44.314967: I tensorflow/core


Epoch 00001: val_acc improved from -inf to 0.33333, saving model to /mnt/sda1/image-results-iris/results/fold2-resnet50v2-saved-model-01-val_acc-0.33.hdf5
Epoch 2/50

Epoch 00002: val_acc improved from 0.33333 to 0.50000, saving model to /mnt/sda1/image-results-iris/results/fold2-resnet50v2-saved-model-02-val_acc-0.50.hdf5
Epoch 3/50

Epoch 00003: val_acc did not improve from 0.50000
Epoch 4/50

Epoch 00004: val_acc did not improve from 0.50000
Epoch 5/50

Epoch 00005: val_acc did not improve from 0.50000
Epoch 6/50

Epoch 00006: val_acc did not improve from 0.50000
Epoch 7/50

Epoch 00007: val_acc did not improve from 0.50000
Epoch 8/50

Epoch 00008: val_acc improved from 0.50000 to 0.54167, saving model to /mnt/sda1/image-results-iris/results/fold2-resnet50v2-saved-model-08-val_acc-0.54.hdf5
Epoch 9/50

Epoch 00009: val_acc did not improve from 0.54167
Epoch 10/50

Epoch 00010: val_acc improved from 0.54167 to 0.66667, saving model to /mnt/sda1/image-results-iris/results/fold2-resne

## Evaluate Performance on Test Data

In [5]:
evaluate_on_test_data(f"{image_directory}/results/best_model.h5", f"{new_dir}Validation/", img_size, batch_size)

Found 54 images belonging to 3 classes.
[0.5954808592796326, 0.8703703880310059, 0.8693181872367859, 0.935442328453064, 7.0, 7.0, 101.0, 47.0, 0.10876356065273285, 0.08314419537782669]
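
The evaluation prints an unlabeled list. In tf.keras, `model.evaluate` returns values in the order given by `model.metrics_names`, but which metrics `evaluate_on_test_data` compiles in is not shown here, so the mapping below is an assumption: that the four whole numbers are false positives, false negatives, true negatives, and true positives counted over the one-hot outputs (54 samples × 3 classes = 162 = 7 + 7 + 101 + 47). Under that assumption, precision and recall both come out as 47/54 ≈ 0.8704, which matches the second value in the list. The `summarize` function is a hypothetical helper.

```python
def summarize(fp: float, fn: float, tn: float, tp: float) -> dict:
    """Derive common metrics from confusion counts (assumed FP, FN, TN, TP order)."""
    return {
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
        # Accuracy over every (sample, class) cell of the one-hot output matrix
        "binary_accuracy": (tp + tn) / (tp + tn + fp + fn),
    }


results = [0.5954808592796326, 0.8703703880310059, 0.8693181872367859,
           0.935442328453064, 7.0, 7.0, 101.0, 47.0,
           0.10876356065273285, 0.08314419537782669]
print(summarize(*results[4:8]))
```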
