# *Init*

In [45]:
#@title Imports

import numpy as np
import scipy as sp
import matplotlib.pyplot as plt
import pandas as pd
import cv2
import os
import tensorflow as tf
from tensorflow.keras import layers
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Dense, Flatten
from tensorflow.keras.preprocessing import image
from sklearn.model_selection import train_test_split

In [46]:
# #@title Mount Data
# # Mount data drive

# from google.colab import drive
# drive.mount('/content/drive/', force_remount=True)
# %cd /content/drive/MyDrive/CPSC4300-ADS-Project

# **Unhealthy Tree Detection in Segmented Drone Footage via Machine Learning**
**Clemson University | Fall 2023**<br>
**Authors:** Scott Logan, Lisa Umatoni, Mostafa Saberian, Ian McCall, Neil Kuehn


NOTES:

Look into adapting a pretrained model, such as VGG or Resnet, and retraining only the last few layers.

# **Project Goals!**

# To Do
- Finish Data Cleaning Methods
- Implement CNNs 2,3
- Try implementing 4
- Look into adapting a pretrained model, such as VGG or Resnet, and retraining only the last few layers.

# **Data Summary**

The provided data set is a total of 83 images of trees, with 45 healthy trees and 28 sick tree images given. The main unit of analysis for determining whether a tree is sick or not is color. Healthy trees are greener and darker, whereas sick trees are yellowed and lighter. 

Very light areas, such as bare tree branches, are not counted as sick.

# **Data Cleaning Strategies**

In a photograph, there is plenty of noise that may distract a Machine Learning Model from the data that is being represented. As such, it may be beneficial to perform data cleaning steps before training or predicting using the model.


We have decided to implement the following data cleaning steps, and test the model's performance using various combinations of these steps:

**Greyscale:** Normalize each pixel into a single grey value

**Isolate Hue:** Isolate the Red channel of each pixel
> Tree health is largely defined by yellowing, which in an RGB value is defined by an increase in the Red value. As such, we may be interested primarily in the Red channel, and may increase model accuracy by isolating or at least exaggerating the Red channel of images during processing.

**Omit Values Beyond Range of Interest:** Remove information likely to confuse the model
> Areas of images with red values that are too high are likely to be unrelated to tree data, so they should be omitted


In [47]:
#@title Greyscale
def greyscale(input_img):
    
    output_img = cv2.cvtColor(input_img, cv2.COLOR_BGR2GRAY)
    return output_img

In [48]:
#@title Normalize Saturation and Value:
def normalize_saturation_value(input_img):
    # Convert the image from BGR to HSV color space
    hsv_image = image.rgb_to_hsv(input_img)
    #hsv_image = cv2.cvtColor(input_img, cv2.COLOR_BGR2HSV)

    # Split the HSV image into separate channels
    h, s, v = cv2.split(hsv_image)

    # Normalize the saturation and value channels
    s = np.uint8(np.clip((s * 1.2), 0, 255))
    v = np.uint8(np.clip((v * 1.2), 0, 255))

    # Merge the normalized channels back into an HSV image
    hsv_image = cv2.merge([h, s, v])

    # Convert the HSV image back to BGR color space
    output_image = cv2.cvtColor(hsv_image, cv2.COLOR_HSV2BGR)

    return output_image

In [6]:
#@title Isolate Hue:
def isolate_hue(input_img):
    
    # Exaggerate the red channel by decreasing blue and green by 80%
    output_img = input_img[:, :, :2] * 0.2

    return output_img

In [7]:
#@title Omit Unwanted Pixel Data Beyond Range of Interest

def omit_unwanted_ranges(input_img):

    output_img = input_img
    return output_img

In [52]:
#@title Normalize Pixels

def normalize_pixels(input_img):
    img_arr = image.img_to_array(input_img)
    img_arr / 255.0

    output_img = image.array_to_img(img_arr)
    return output_img

In [57]:
#@title Resize Images

def resize_img(input_img, targetwidth, targetheight):
    output_img = tf.image.resize(input_img, [targetwidth, targetheight])
    return output_img

In [58]:
#@title Cleaning Function

def clean_data(input_img, GS=False, NSV=False, IH=False, OU=False, NP=True, RI=True, resize_W=2000, resize_H=1125):
    clean_img = input_img
    if GS:
        clean_img = greyscale(clean_img)
    if NSV:
        clean_img = normalize_saturation_value(clean_img)
    if IH:
        clean_img = isolate_hue(clean_img)
    if OU:
        clean_img = omit_unwanted_ranges(clean_img)
    if NP:
        clean_img = normalize_pixels(clean_img)
    if RI:
        clean_img = resize_img(clean_img, 2000, 1125)

    output_img = clean_img
    return output_img

# **Selected Model:** Convolutional Neural Network

We considered a number of different models for this project, choosing CNN as our initial model selection:
- **-> Convolutional Neural Network (CNN):** Suitable for classifying photos by visible features, which we plan to use by training the CNN to detect color patterns typical of sick trees.
- **Classification Model:** Suitable for binary response values, which may be useful to classify healthy vs. sick.
- **Clustering Model:** May be useful to detect multiple instances of sick trees within an image, using elbow method to determine number of sick tree instances.


# Model Architecture

A few different CNN implementations will be tested, primarily as an exploration of how CNNs work.

- Implementation 1: Basic Brute-Force Approach
 - Training Data:
   - Cleaning Steps: Resize Images, Normalize Pixels
 - Two Convolutional Layers: 
 - Two Pooling Layers: 
 - Two Dense Layers:
   - 128-unit Relu layer: Learn non-linear transformations of features to capture complex relationships between features.
   - 1-unit Sigmoid layer: Filters output into a single binary value: 0 for healthy (lacks sick features), 1 for sick (contains sick features)
 - 

- Model 2: Cleaned Approaches


- Model 3: Noisy Approach


- Model 4: Repurpose trailed model

# **Model Training**

In order to train and test our model, we need to construct a feature matrix on which to train our data. For this CNN model, the input data needs to come as an array of tuples defined as (id, feature). Images of sick regions of trees will be paired with an ID of 1.

In [59]:
# Define Data Cleaning step for Model
default_width, default_height = 400, 225
def clean_data(input_img, resize_width=default_width, resize_height=default_height):
    clean_img = input_img
    
    # Apply Data Cleaning Functions
    clean_img = normalize_pixels(clean_img)
    clean_img = resize_img(clean_img, resize_width, resize_height)
    
    output_img = clean_img
    return output_img

In [60]:
# Read Images

tree_imgs = []
healthy_tree_imgs = []
sick_tree_imgs = []
sick_tree_features = []

# Read files and perform data cleaning steps
dir_name = "data/healthy"
for file in os.listdir(dir_name):
    img_path = os.path.join(dir_name, file)
    img = image.load_img(img_path, target_size=(224, 224))
    # img = cv2.imread(os.path.join(dir_name, file))
    clean_img = clean_data(img)
    tree_imgs.append(clean_img)
    healthy_tree_imgs.append(clean_img)

dir_name = "data/sick"
for file in os.listdir(dir_name):
    img_path = os.path.join(dir_name, file)
    img = image.load_img(img_path, target_size=(224, 224))
    clean_img = clean_data(img)
    tree_imgs.append(clean_img)
    sick_tree_imgs.append(clean_img)

# dir_name = "data/sick_features"    
# for file in os.listdir(dir_name):
#     img = cv2.imread(os.path.join(dir_name, file))
#     clean_img = clean_data(img, 100, 100)
#     sick_tree_features.append(clean_img)




In [61]:
# Create feature matrix and split into training and testing data
feature_mx = np.array([(0, img) for img in healthy_tree_imgs] + [(1, img) for img in sick_tree_imgs])

train_data, test_data = train_test_split(feature_mx)
X_train = train_data[0]
y_train = test_data[1]

X_test = test_data[0]
y_test = test_data[1]

  feature_mx = np.array([(0, img) for img in healthy_tree_imgs] + [(1, img) for img in sick_tree_imgs])


In [62]:
# Define the CNN model
model = Sequential()

model.add(Conv2D(32, (3, 3), activation='relu', padding='same', input_shape=(default_width, default_height, 3)))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Conv2D(64, (3, 3), activation='relu', padding='same'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dense(1, activation='sigmoid'))

# Compile the model
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Train the model
model.fit(X_train, y_train, batch_size=32, epochs=10, validation_data=(X_test, y_test))

# Evaluate the model
loss, accuracy = model.evaluate(X_test, y_test)
print(f"Accuracy: {accuracy}")

ValueError: Failed to convert a NumPy array to a Tensor (Unsupported object type int).

In [37]:
y_train = pd.DataFrame(test_data, columns=['label', 'img'])


Unnamed: 0,label,img
0,1,"[[[29, 73, 58], [39, 84, 68], [34, 81, 65], [4..."
1,0,"[[[24, 41, 32], [21, 37, 30], [21, 37, 30], [2..."
2,1,"[[[51, 46, 47], [36, 31, 32], [40, 38, 38], [5..."
3,1,"[[[53, 73, 74], [29, 49, 50], [89, 110, 112], ..."
4,1,"[[[110, 153, 158], [91, 133, 133], [93, 135, 1..."
5,0,"[[[59, 68, 78], [160, 166, 177], [149, 155, 16..."
6,0,"[[[38, 52, 58], [39, 53, 59], [79, 94, 97], [7..."
7,0,"[[[32, 53, 50], [35, 56, 53], [36, 59, 55], [4..."
8,0,"[[[58, 89, 86], [62, 91, 88], [63, 90, 87], [4..."
9,0,"[[[25, 34, 31], [15, 24, 21], [17, 26, 23], [2..."


In [39]:
y_train.describe()

Unnamed: 0,label,img
count,15,15
unique,2,15
top,0,"[[[29, 73, 58], [39, 84, 68], [34, 81, 65], [4..."
freq,9,1


# **Results and Discussion**