## **Environment Setup**

1. Mount our Google Drive directory.
2. Set the current working directory of our Python runtime.

IF NOT RUNNING IN GOOGLE COLAB, CHANGE THIS!

In [None]:
from google.colab import drive
drive.mount('/content/drive', force_remount=True)

import os
# Change the path below to the project folder in your own Drive.
# Alternative project locations:
# os.chdir('/content/drive/MyDrive/Ye/Projects/DairyBCS/BodyWeight')
# os.chdir('/content/drive/MyDrive/Students/Ye/Projects/DairyBCS/BodyWeight')
os.chdir('/content/drive/MyDrive/School/CS5824/FINALPROJECT/')
!pwd

Mounted at /content/drive
/content/drive/MyDrive/School/CS5824/FINALPROJECT
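Outside Google Colab, the cell above fails because `google.colab` cannot be imported. Below is a minimal sketch of one way to make the setup portable. It uses a hypothetical helper, `resolve_project_dir`, and a hypothetical local folder, `~/FINALPROJECT`. Drive is mounted only when the Colab module imports; otherwise the helper falls back to the local path.

```python
import os

def resolve_project_dir(colab_dir, local_dir):
    """Return the directory to chdir into, mounting Google Drive only inside Colab."""
    try:
        from google.colab import drive  # only importable inside a Colab runtime
    except ImportError:
        return local_dir                # running locally: use a normal filesystem path
    drive.mount('/content/drive', force_remount=True)
    return colab_dir

project_dir = resolve_project_dir(
    colab_dir='/content/drive/MyDrive/School/CS5824/FINALPROJECT/',
    local_dir=os.path.expanduser('~/FINALPROJECT'),  # hypothetical local location
)
print(project_dir)
# os.chdir(project_dir)  # then change into it, as the cell above does
```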


## **Necessary Imports**

In [None]:
# OpenCV (cv2_imshow is Colab's replacement for cv2.imshow)
from google.colab.patches import cv2_imshow
import cv2

import numpy as np
from glob import glob  # read image paths
import pandas as pd
from pathlib import Path
import os.path

# scikit-learn, TensorFlow and Keras
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score
import tensorflow as tf

## **Parse Drive Directory for Cow Data**

REQUIREMENTS:
1. The data_dir variable names the directory that contains each of the day folders (D1, D2, ..., DN), relative to the working directory set above.
2. Each day folder contains depth, CSV, and RGB subfolders.

data_dir
> D1
>> depth\
>> CSV\
>> RGB

Each of the depth, CSV, and RGB folders in turn contains folders named (COWID)AM or (COWID)PM, which store the corresponding data.

For example, a day folder should be laid out like so:

D1
>depth
>>2343AM
>>>__123412.png

>>2343PM
>>>__123412.png
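You can check the layout requirements above before running the parser. The sketch below is a hypothetical helper (`check_layout` is not part of the notebook). It assumes the day, modality, and (COWID)AM/(COWID)PM naming described above, and it returns a list of problems instead of raising. An empty list means the layout looks as expected. The parsing cell tolerates missing modality folders (D1 has no RGB or CSV folder, for instance); this sketch only reports them.

```python
import re
import tempfile
from pathlib import Path

DAY_PATTERN = re.compile(r'D\d+')               # day folders: D1, D2, ..., DN
SESSION_PATTERN = re.compile(r'\d+(AM|PM)')     # cow folders: (COWID)AM or (COWID)PM
MODALITIES = ('depth', 'CSV', 'RGB')

def check_layout(data_dir):
    """Return a list of layout problems found under data_dir (empty if none)."""
    problems = []
    days = sorted(p for p in Path(data_dir).iterdir()
                  if p.is_dir() and DAY_PATTERN.fullmatch(p.name))
    for day in days:
        for modality in MODALITIES:
            modality_dir = day / modality
            if not modality_dir.is_dir():
                problems.append(f'{day.name}/{modality} is missing')
                continue
            for cow_dir in sorted(modality_dir.iterdir()):
                if not SESSION_PATTERN.fullmatch(cow_dir.name):
                    problems.append(f'{day.name}/{modality}/{cow_dir.name} '
                                    'is not named (COWID)AM or (COWID)PM')
    return problems

# Usage on a throwaway directory tree:
with tempfile.TemporaryDirectory() as tmp:
    for sub in ('D1/depth/2343AM', 'D1/depth/2343PM', 'D1/CSV/2343AM', 'D1/RGB/cow2343'):
        Path(tmp, sub).mkdir(parents=True)
    print(check_layout(tmp))
    # → ['D1/RGB/cow2343 is not named (COWID)AM or (COWID)PM']
```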

In [None]:
# Store paths to all cow images
import re
day_regex = r'D\d+$'                # Raw string so '\d' is not read as a string escape
data_dir = '/COW_DATA/'
root_dir = os.getcwd() + data_dir   # CHANGE THIS TO MATCH YOUR FOLDER STRUCTURE
                                    # SHOULD BE WHERE ALL THE DX FOLDERS ARE LOCATED
day_dirs = list(filter(lambda day: re.search(day_regex, day), os.listdir(root_dir)))
print('Inside of: ', root_dir)
print('Got information for days: ', ', '.join(day_dirs))
depth_dir = '/depth/'
csv_dir = '/CSV/'
rgb_dir = '/RGB/'

# np.str_ replaces np.unicode_, which NumPy 2.0 removed
cow_depth_paths = np.array([], dtype=np.str_)   # Paths to all cow depth images
cow_csv_paths = np.array([], dtype=np.str_)     # Paths to all cow CSV files
cow_rgb_paths = np.array([], dtype=np.str_)     # Paths to all cow RGB images

for day in day_dirs:
  print('Beginning parse of ', day)
  temp_depth_dir = root_dir + day + depth_dir
  temp_csv_dir = root_dir + day + csv_dir
  temp_rgb_dir = root_dir + day + rgb_dir

  try:
    print('Attempting to parse depth in ', day)
    for cow_dir in os.listdir(temp_depth_dir): # PARSE depth SUBDIRECTORY (/DX/depth/)
      cowid = temp_depth_dir + cow_dir
      for image in os.listdir(cowid):
        path_from_cwd = '.' + data_dir + day + depth_dir + cow_dir + '/'
        cow_depth_paths = np.append(cow_depth_paths, path_from_cwd + image)
  except FileNotFoundError:
    print(day, '/depth was not found. Skipping to the next.')

  try:
    print('Attempting to parse RGB in ', day)
    for cow_dir in os.listdir(temp_rgb_dir): # PARSE RGB SUBDIRECTORY (/DX/RGB/)
      cowid = temp_rgb_dir + cow_dir
      for image in os.listdir(cowid):
        path_from_cwd = '.' + data_dir + day + rgb_dir + cow_dir + '/'
        cow_rgb_paths = np.append(cow_rgb_paths, path_from_cwd + image)
  except FileNotFoundError:
    print(day, '/RGB was not found. Skipping to the next.')

  try:
    print('Attempting to parse CSV in ', day)
    for cow_dir in os.listdir(temp_csv_dir): # PARSE CSV SUBDIRECTORY (/DX/CSV/)
      cowid = temp_csv_dir + cow_dir
      for csv_file in os.listdir(cowid):
        path_from_cwd = '.' + data_dir + day + csv_dir + cow_dir + '/'
        cow_csv_paths = np.append(cow_csv_paths, path_from_cwd + csv_file)
  except FileNotFoundError:
    print(day, '/CSV was not found. Skipping to the next.')

  print('Finished parsing: ', day)

Inside of:  /content/drive/MyDrive/School/CS5824/FINALPROJECT/COW_DATA/
Got information for days:  D5, D2, D1, D4, D3
Beginning parse of  D5
Attempting to parse depth in  D5
Attempting to parse RGB in  D5
Attempting to parse CSV in  D5
Finished parsing:  D5
Beginning parse of  D2
Attempting to parse depth in  D2
Attempting to parse RGB in  D2
Attempting to parse CSV in  D2
Finished parsing:  D2
Beginning parse of  D1
Attempting to parse depth in  D1
Attempting to parse RGB in  D1
D1 /RGB was not found. Skipping to the next.
Attempting to parse CSV in  D1
D1 /CSV was not found. Skipping to the next.
Finished parsing:  D1
Beginning parse of  D4
Attempting to parse depth in  D4
Attempting to parse RGB in  D4
Attempting to parse CSV in  D4
Finished parsing:  D4
Beginning parse of  D3
Attempting to parse depth in  D3
Attempting to parse RGB in  D3
Attempting to parse CSV in  D3
D3 /CSV was not found. Skipping to the next.
Finished parsing:  D3


### **Cow Path Information**

The paths to all found files are now stored in cow_depth_paths, cow_rgb_paths, and cow_csv_paths, respectively.

Each entry in these arrays is a path relative to the working directory, of the form:

./COW_DATA/D5/depth/5687PM/_Depth_61321.png

(NOTE THAT THIS PATH IS UNIQUE TO MY OWN ENVIRONMENT. IT MAY LOOK DIFFERENT FOR YOU.)
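Later cells recover the day and cow folder from these paths by splitting on `'/'` and indexing from the end (`[-4]` and `[-2]`). The sketch below applies the same idea in a hypothetical `parse_cow_path` helper. It assumes every path ends in `D<day>/<modality>/<COWID><AM|PM>/<file>`. The dictionary keys, such as `am_pm`, are naming choices made for this illustration.

```python
import re
from pathlib import PurePosixPath

def parse_cow_path(path):
    """Split a path like './COW_DATA/D5/depth/5687PM/_Depth_61321.png' into labelled parts."""
    parts = PurePosixPath(path).parts          # PurePosixPath drops the leading './'
    day, modality, cow_dir, filename = parts[-4:]
    match = re.fullmatch(r'(\d+)(AM|PM)', cow_dir)
    if match is None:
        raise ValueError(f'unexpected cow folder name: {cow_dir!r}')
    return {
        'day': int(day[1:]),          # 'D5' -> 5
        'modality': modality,         # 'depth', 'CSV' or 'RGB'
        'cow_id': match.group(1),     # '5687'
        'am_pm': match.group(2),      # 'AM' or 'PM'
        'cow_dir': cow_dir,           # '5687PM', the column name used to look up weights
        'file': filename,
    }

print(parse_cow_path('./COW_DATA/D5/depth/5687PM/_Depth_61321.png'))
```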

In [None]:
print(cow_depth_paths[0])

./COW_DATA/D5/depth/4973PM/_Depth_60899.png


## **Cow Weight Information Fetching**

Next, read the recorded weights from the CSV file.\
The CSV file (BodyWeight_cleaned.csv) should be located at the top level of the data_dir directory, on the same level as the D1, D2, ..., DN folders.

First, read the CSV file into a DataFrame.\
Then match each depth image with the weight recorded for that cow on that day.
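The lookup `bw_df[bw_df.DAY == day_str][id]` in the next cell implies a particular CSV layout: a `DAY` column plus one column per (COWID)AM/(COWID)PM folder name. The CSV itself is not shown, so that layout is an assumption here. The sketch below uses only the standard library, a hypothetical miniature of the file (its values are copied from the labelled table in this notebook), and two hypothetical helpers, `load_weights` and `label_paths`. Keying a dictionary by (day, folder name) turns a missing weight into a simple membership check instead of an exception. Blank cells are skipped explicitly because `pd.read_csv` reads an empty cell as NaN, and the pandas lookup would return that NaN without raising `KeyError`.

```python
import csv
import io

# Hypothetical miniature of BodyWeight_cleaned.csv (assumed layout: a DAY column plus
# one column per (COWID)AM/PM folder name).
TOY_CSV = (
    'DAY,5327PM,4973AM,5687PM\n'
    '1,1879.81,1701.97,\n'
    '5,,,1408.75\n'
)

def load_weights(csv_text):
    """Map (day string, cow folder name) -> weight, skipping blank cells."""
    weights = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        day = row.pop('DAY').strip()
        for cow_dir, value in row.items():
            if value and value.strip():        # blank cell: no weight recorded
                weights[(day, cow_dir)] = float(value)
    return weights

def label_paths(paths, weights):
    """Pair each image path with its weight; drop paths that have no recorded weight."""
    labelled = []
    for path in paths:
        parts = path.split('/')
        key = (parts[-4][1:], parts[-2])       # ('5', '5687PM') for .../D5/depth/5687PM/...
        if key in weights:
            labelled.append((path, weights[key]))
    return labelled

weights = load_weights(TOY_CSV)
print(label_paths(['./COW_DATA/D1/depth/5327PM/_Depth_2079.png',
                   './COW_DATA/D5/depth/5327PM/_Depth_99999.png',   # hypothetical, no weight
                   './COW_DATA/D5/depth/5687PM/_Depth_61321.png'], weights))
```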

In [None]:
bw_csv_path = os.getcwd() + data_dir + 'BodyWeight_cleaned.csv'
bw_df = pd.read_csv(bw_csv_path) # Read weight csv into a pandas dataframe

# Get the largest 'day' number out of all found days (e.g. 'D5' -> 5)
largest_day = max(int(day[1:]) for day in day_dirs)
# One list per day from D1 to D<largest_day>; index 0 holds D1
daily_labelled_depth_images = [[] for _ in range(largest_day)]

for i in range(len(cow_depth_paths)):
  split_depth_path = cow_depth_paths[i].split('/')
  day_str = split_depth_path[-4][1:]      # Day number as a string ('D5' -> '5'), used to query the weight dataframe
  day_idx = int(day_str) - 1              # Convert to a zero-based array index
  cow_id = split_depth_path[-2]           # The cow folder name, of the form XXXXAM or XXXXPM

  try:
    weight = bw_df[bw_df.DAY == day_str][cow_id].values[0] # grab the weight for given cow
  except KeyError:                        # No weight column for this cow: skip the image
    continue
  daily_labelled_depth_images[day_idx].append([cow_depth_paths[i], weight])

daily_labelled_depth_images = np.array(daily_labelled_depth_images, dtype=object)

## **Format Of daily_labelled_depth_images**

1st Dimension\
The array daily_labelled_depth_images contains one entry per day, from D1 up to the largest day found (D10, for example); entry i holds the images for day D(i+1). Days in this range whose folders were not found, or for which no weights were matched, appear as empty lists.

2nd Dimension\
The second dimension of daily_labelled_depth_images contains lists of length two which contain\
[PATH_TO_DEPTH_IMAGE, ASSOCIATED WEIGHT]

We also create a flattened version in which every day's entries are combined into a single list, removing the first dimension of daily_labelled_depth_images.

This list is then converted into a DataFrame with the first column labelled 'FilePath' and the second column labelled 'Weights'.
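The flattening step is sketched below with `itertools.chain.from_iterable`, applied to a hypothetical miniature of `daily_labelled_depth_images` (its paths and weights are copied from the table in this notebook). `images_for_day` is a hypothetical helper showing the zero-based indexing. Empty days contribute nothing to the flattened list. The resulting pairs are what `pd.DataFrame(rows, columns=['FilePath', 'Weights'])` would turn into the table.

```python
from itertools import chain

# Hypothetical miniature of daily_labelled_depth_images; index i holds day D(i+1).
daily = [
    [['./COW_DATA/D1/depth/5327PM/_Depth_2079.png', 1879.81],
     ['./COW_DATA/D1/depth/4973AM/_Depth_1362.png', 1701.97]],
    [], [], [],   # D2-D4: left empty in this toy example
    [['./COW_DATA/D5/depth/5687PM/_Depth_61321.png', 1408.75]],
]

def images_for_day(daily, day_number):
    """Return the [path, weight] pairs for folder D<day_number>."""
    return daily[day_number - 1]

# Drop the per-day dimension and enforce the (str, float) column types.
rows = [(str(path), float(weight)) for path, weight in chain.from_iterable(daily)]
print(len(rows), rows[-1])
```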

In [None]:
labelled_depth_images = []
for day in daily_labelled_depth_images:
  labelled_depth_images += day
labelled_depth_images = np.array(labelled_depth_images, dtype=object)

col_vals = ['FilePath', 'Weights']
labelled_depth_images = pd.DataFrame(data=labelled_depth_images, columns=col_vals)
labelled_depth_images['FilePath'] = labelled_depth_images['FilePath'].astype(str)
labelled_depth_images['Weights'] = labelled_depth_images['Weights'].astype(float)
labelled_depth_images

|     | FilePath | Weights |
|-----|----------|---------|
| 0 | ./COW_DATA/D1/depth/5327PM/_Depth_2079.png | 1879.81 |
| 1 | ./COW_DATA/D1/depth/5327PM/_Depth_2111.png | 1879.81 |
| 2 | ./COW_DATA/D1/depth/5327PM/_Depth_2089.png | 1879.81 |
| 3 | ./COW_DATA/D1/depth/5327PM/_Depth_2172.png | 1879.81 |
| 4 | ./COW_DATA/D1/depth/4973AM/_Depth_1362.png | 1701.97 |
| ... | ... | ... |
| 192 | ./COW_DATA/D5/depth/5687PM/_Depth_61321.png | 1408.75 |
| 193 | ./COW_DATA/D5/depth/5687PM/_Depth_61322.png | 1408.75 |
| 194 | ./COW_DATA/D5/depth/5687PM/_Depth_61329.png | 1408.75 |
| 195 | ./COW_DATA/D5/depth/5687PM/_Depth_61276.png | 1408.75 |


## **Create Train / Validation / Test Split, Load Images**

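Now prepare the images for the CNN. First, split the labelled images into training (70%) and test (30%) sets with train_test_split. Then create ImageDataGenerator objects that multiply each depth image's pixel values by 1/255 (mapping 8-bit values into [0, 1]). The training generator also holds out 20% of the training set as a validation set.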
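The resulting subset sizes can be worked out ahead of time. The sketch below uses a hypothetical `split_sizes` helper and rests on two assumptions: scikit-learn floors `train_size * n` and gives the remainder to the test set, and Keras slices the first `int(0.2 * n_train)` rows of `train_df` off as the validation subset. Under these assumptions the effective proportions are 56% training, 14% validation, and 30% test. A hypothetical 197 labelled images would give 110/27/60, matching the counts reported below. Actual counts can be lower, because `flow_from_dataframe` drops filenames it cannot find (`validate_filenames=True` by default).

```python
import math

def split_sizes(n_images, train_size=0.7, validation_split=0.2):
    """Predict (train, validation, test) sizes for this notebook's two-stage split.

    Assumptions: train_test_split floors train_size * n and gives the rest to the test set;
    flow_from_dataframe(subset='validation') takes the first int(validation_split * n_train)
    rows of train_df and subset='training' takes the remainder.
    """
    n_train_full = math.floor(train_size * n_images)
    n_test = n_images - n_train_full
    n_val = int(validation_split * n_train_full)
    return n_train_full - n_val, n_val, n_test

print(split_sizes(197))  # hypothetical dataset size → (110, 27, 60)
```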

In [None]:
train_df, test_df = train_test_split(
    labelled_depth_images,
    train_size=0.7,
    shuffle=True,
    random_state=1
)

In [None]:
train_generator = tf.keras.preprocessing.image.ImageDataGenerator(
    rescale=1./255,         # scale 8-bit pixel values into [0, 1]
    validation_split=0.2    # hold out 20% of train_df for validation
)
test_generator = tf.keras.preprocessing.image.ImageDataGenerator(
    rescale=1./255
)

In [None]:
train_images = train_generator.flow_from_dataframe(
    dataframe=train_df,
    x_col='FilePath',
    y_col='Weights',
    target_size=(120,120),
    color_mode='rgb',
    class_mode='raw',
    batch_size=10,
    shuffle=True,
    seed=42,
    subset='training',
)

val_images = train_generator.flow_from_dataframe(
    dataframe=train_df,
    x_col='FilePath',
    y_col='Weights',
    target_size=(120,120),
    color_mode='rgb',
    class_mode='raw',
    batch_size=10,
    shuffle=True,
    seed=42,
    subset='validation',
)

test_images = test_generator.flow_from_dataframe(
    dataframe=test_df,
    x_col='FilePath',
    y_col='Weights',
    target_size=(120,120),
    color_mode='rgb',
    class_mode='raw',
    batch_size=10,
    shuffle=False,
)

Found 110 validated image filenames.
Found 27 validated image filenames.
Found 60 validated image filenames.


## **CNN Design**

The code below outlines the architecture of the CNN we use.

The first Conv2D layer uses a large initial kernel size (30x30) because the only feature in the input image is the cow itself, which covers a significant portion of the frame. In theory, a kernel that spans a large part of the cow, spread across only a few filters (8), lets the network better capture the relationship between the area the cow occupies and its weight.

The remaining layers are standard and have no special modifications.
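The layer shapes follow from the Keras defaults this cell relies on. `Conv2D` uses stride 1 and `padding='valid'`; `AveragePooling2D` uses `pool_size=(2, 2)` and `padding='valid'`. The sketch below does that bookkeeping by hand (the helper names are hypothetical), and its totals should agree with what `cnn.summary()` prints. It shows that the 30x30 layer alone holds 21,608 of the model's 26,121 trainable parameters. It also shows that `GlobalAveragePooling2D` reduces the final 21x21x32 feature maps to a 32-element vector before the dense layers.

```python
def conv2d_valid(size, kernel):
    """Spatial size after a Conv2D with stride 1 and padding='valid'."""
    return size - kernel + 1

def avg_pool_2x2(size):
    """Spatial size after AveragePooling2D() with pool_size=(2, 2), padding='valid'."""
    return (size - 2) // 2 + 1

def conv2d_params(kernel, in_channels, filters):
    return kernel * kernel * in_channels * filters + filters  # weights + biases

def dense_params(n_in, n_out):
    return n_in * n_out + n_out  # weights + biases

shapes = []
size, channels = 120, 3

params_conv1 = conv2d_params(30, channels, 8)
size, channels = conv2d_valid(size, 30), 8
shapes.append(('conv1', size, channels))   # 91 x 91 x 8
size = avg_pool_2x2(size)
shapes.append(('pool1', size, channels))   # 45 x 45 x 8

params_conv2 = conv2d_params(3, channels, 32)
size, channels = conv2d_valid(size, 3), 32
shapes.append(('conv2', size, channels))   # 43 x 43 x 32
size = avg_pool_2x2(size)
shapes.append(('pool2', size, channels))   # 21 x 21 x 32

# GlobalAveragePooling2D averages each 21 x 21 map, leaving a 32-element vector.
params_dense = dense_params(channels, 64) + dense_params(64, 1)

total_params = params_conv1 + params_conv2 + params_dense
print(shapes)
print(params_conv1, total_params)  # → 21608 26121
```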

In [None]:
inputs = tf.keras.Input(shape=(120,120,3))
x = tf.keras.layers.Conv2D(filters=8, kernel_size=(30,30), activation='relu')(inputs)
x = tf.keras.layers.AveragePooling2D()(x)
x = tf.keras.layers.Conv2D(filters=32, kernel_size=(3,3), activation='relu')(x)
x = tf.keras.layers.AveragePooling2D()(x)
x = tf.keras.layers.GlobalAveragePooling2D()(x)
x = tf.keras.layers.Dense(64, activation='relu')(x)
outputs = tf.keras.layers.Dense(1, activation='linear')(x)

## **CNN Training**

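For training, we use the Adam optimizer and mean squared error (MSE) loss.

We also use early stopping. If the validation loss fails to improve for 5 consecutive epochs (patience=5), training stops and the weights from the best epoch are restored. If the early stopping criterion is never met, training runs for the full 100 epochs.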
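The sketch below is a simplified, hypothetical re-implementation of the stopping rule, not the Keras source. It assumes the `EarlyStopping` defaults of `mode='min'` for a loss and `min_delta=0`, so an epoch counts as an improvement only if its validation loss is strictly lower than the best so far. Under this rule, a log that ends at Epoch 74/100 suggests that the lowest validation loss came at epoch 69. Those are the weights that `restore_best_weights=True` puts back.

```python
def early_stopping(val_losses, patience=5):
    """Return (stop_epoch, best_epoch), both 1-indexed, for a sequence of validation losses."""
    best_loss, best_epoch, wait = float('inf'), 0, 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:                 # strict improvement resets the counter
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch     # stop; best_epoch's weights get restored
    return len(val_losses), best_epoch       # ran out of epochs without triggering

# Hypothetical validation losses: best at epoch 3, then five epochs without improvement.
print(early_stopping([10.0, 8.0, 7.0, 7.5, 7.2, 7.1, 7.3, 7.05]))  # → (8, 3)
```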

In [None]:
cnn = tf.keras.Model(inputs = inputs, outputs = outputs)

cnn.compile(
    optimizer='adam',
    loss='mse'
)
history = cnn.fit(
    train_images,
    validation_data=val_images,
    epochs=100,
    callbacks=[
        tf.keras.callbacks.EarlyStopping(
            monitor='val_loss',
            patience=5,
            restore_best_weights=True
        )
    ]
)

Epoch 1/100
...
Epoch 74/100


## **Making Predictions on Test Set**

Here we run the test images through the trained CNN to get predicted weights.

We also perform a small spot check: select one random image from each day, look up its measured weight, and predict its weight. These images are drawn from the full labelled set, so they may include images the model was trained on.

These selected values are scored in the next block, alongside the full test set.
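The cell below draws with the global `random` module and no seed, so a rerun can select different images. The sketch below shows a reproducible alternative: a hypothetical `pick_one_per_day` helper with a private `random.Random(seed)` generator. Its optional `allowed_paths` argument can restrict the draw to held-out images, for example `set(test_df['FilePath'])`. That restriction is a suggested variation, not what the notebook does.

```python
import random

def pick_one_per_day(daily, seed=0, allowed_paths=None):
    """Pick one [path, weight] pair from each non-empty day, reproducibly.

    allowed_paths (optional) restricts the choice, e.g. to images that are only in the test set.
    """
    rng = random.Random(seed)   # private, seeded generator: same seed -> same picks
    picks = []
    for day in daily:
        candidates = [pair for pair in day
                      if allowed_paths is None or pair[0] in allowed_paths]
        if candidates:
            picks.append(rng.choice(candidates))
    return picks

# Hypothetical miniature of daily_labelled_depth_images (index i holds day D(i+1)).
daily = [
    [['./COW_DATA/D1/depth/5327PM/_Depth_2079.png', 1879.81],
     ['./COW_DATA/D1/depth/4973AM/_Depth_1362.png', 1701.97]],
    [], [], [],
    [['./COW_DATA/D5/depth/5687PM/_Depth_61321.png', 1408.75]],
]
print(pick_one_per_day(daily, seed=42))
print(pick_one_per_day(daily, allowed_paths={'./COW_DATA/D5/depth/5687PM/_Depth_61321.png'}))
```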

In [None]:
import random
predicted_weights = np.squeeze(cnn.predict(test_images))
true_weights = test_images.labels

# Additionally, we also want to select a single cow image from each day to report our
# final model metrics on.

# Each entry in the selected_cows array corresponds to one index in the daily_labelled_depth_images
# array where the randomly grabbed entry resides.
selected_cows = []
selected_true_weights = []
for day in daily_labelled_depth_images:
  if (len(day) > 0):
    rand_idx = random.randint(0, len(day) - 1)
    rand_cow = day[rand_idx]
    print(rand_cow)
    selected_cows.append(rand_cow)
    label_idx = np.where(labelled_depth_images['FilePath'] == rand_cow[0])[0][0]
    print(label_idx)
    selected_true_weights.append(labelled_depth_images['Weights'][label_idx])
  else:
    selected_cows.append(['no_cow', 0.0])  # Placeholder; flow_from_dataframe ignores it as an invalid filename

# Create a dataframe from the found filepaths
col_vals = ['FilePath', 'Weights']
selected_cows = pd.DataFrame(data=selected_cows, columns=col_vals)
selected_cows['FilePath'] = selected_cows['FilePath'].astype(str)
selected_cows['Weights'] = selected_cows['Weights'].astype(float)
# Load the selected images through the test generator (rescaling only, no shuffling)
selected_images = test_generator.flow_from_dataframe(
    dataframe=selected_cows,
    x_col='FilePath',
    y_col='Weights',
    target_size=(120,120),
    color_mode='rgb',
    class_mode='raw',
    shuffle=False
)

selected_predicted_weights = np.squeeze(cnn.predict(selected_images))

['./COW_DATA/D1/depth/4973AM/_Depth_1359.png', 1701.97]
6
['./COW_DATA/D4/depth/5488AM/_Depth_12782.png', 2052.5]
98
['./COW_DATA/D5/depth/5687PM/_Depth_61321.png', 1408.75]
192
Found 3 validated image filenames.




## **Model Evaluation / Metric Determination**

To quantify the quality of the final model, we compute the RMSE and R^2 scores.

The RMSE (root-mean-squared error) measures the typical size of the difference between predicted and measured weights, in the same units as the weights. Smaller is better.

The R^2 score (coefficient of determination) measures the fraction of the variance in the measured weights that the model's predictions explain. A score of 1 means perfect predictions, 0 means the model does no better than always predicting the mean measured weight, and the score can be negative for a model that does worse than that. Larger is better. The selected cows include only one image per day, so their R^2 is computed from very few points and should be read with caution.
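Both metrics can be written out directly: $\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_i (y_i - \hat{y}_i)^2}$ and $R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}$, where $\bar{y}$ is the mean measured weight. The model is compiled with only the MSE loss, so `cnn.evaluate(...)` returns that MSE, and `np.sqrt` of it is the RMSE. The dependency-free sketch below uses hypothetical function names. For a single output, it should agree with `sklearn.metrics.r2_score` whenever the measured weights are not all equal.

```python
import math

def rmse(y_true, y_pred):
    """Root-mean-squared error, in the same units as the weights."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot   # undefined (division by zero) if all true weights are equal

# Hypothetical weights: always predicting the mean gives R^2 = 0 but a nonzero RMSE.
y_true = [1700.0, 1900.0, 2100.0]
print(rmse(y_true, [1900.0] * 3), r2(y_true, [1900.0] * 3))
print(r2([1.0, 2.0, 3.0], [3.0, 2.0, 1.0]))  # → -3.0, worse than predicting the mean
```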

In [None]:
print('METRICS ON ENTIRE DATASET:')
rmse = np.sqrt(cnn.evaluate(test_images, verbose=0))
print("Test RMSE:\t{:.5f}".format(rmse))
r2 = r2_score(true_weights,predicted_weights)
print("Test R^2 Score:\t{:.5f}".format(r2))
print('--------------------------')
print('METRICS ON SELECTED COWS:')
cv_rmse = np.sqrt(cnn.evaluate(selected_images, verbose=0))
print("CV RMSE:\t{:.5f}".format(cv_rmse))
cv_r2 = r2_score(selected_true_weights,selected_predicted_weights)
print("CV R^2 Score:\t{:.5f}".format(cv_r2))
print('--------------------------')

METRICS ON ENTIRE DATASET:
Test RMSE:	260.65550
Test R^2 Score:	0.17656
--------------------------
METRICS ON SELECTED COWS:
CV RMSE:	260.65550
CV R^2 Score:	0.17656
--------------------------
