# Feature Reduction Notebook

## Objectives 

The model already performs at 99% accuracy. This hypothesis tests whether reducing input complexity by converting images to grayscale improves computational efficiency without dropping below the client's performance requirement of >97% accuracy.


* Answer Business Requirement 2:
    * The client is interested in knowing whether a cherry leaf has powdery mildew.

## Inputs
* inputs/cherry_leaves/cherry-leaves/train
* inputs/cherry_leaves/cherry-leaves/test
* inputs/cherry_leaves/cherry-leaves/validation
   

## Outputs 
* Image distribution plot for the train, validation and test sets
* Image augmentation and grayscale conversion
* Class indices to convert prediction inference into labels
* Machine learning model creation and training
* Save model
* Learning curve plots of model performance
* Model evaluation saved to a pickle file
* Prediction on a random image file
    
## Additional Comments | Insights | Conclusions
### Objective
Investigate whether converting images to grayscale improves computational efficiency while keeping accuracy above the client's threshold.

### Key Takeaways
* Reduction in computational cost:
    * Grayscale images reduced memory usage and training time.
* Accuracy impact:
    * Accuracy remained comparable to the color model. However, it was not significantly better, suggesting that color information contributes little to the classification.
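The memory saving noted above follows directly from the input shape. A minimal sketch of the arithmetic, assuming the 256x256 inputs and batch size of 20 used in this notebook and float32 pixel values (the Keras default dtype); it ignores framework overhead, so treat the figures as a lower bound on input memory rather than a measurement.

```python
# Bytes per pixel value, assuming float32 tensors (the Keras default floatx)
BYTES_PER_VALUE = 4

def input_values(height, width, channels):
    """Number of pixel values fed to the network for one image."""
    return height * width * channels

def batch_bytes(height, width, channels, batch_size):
    """Approximate memory for one input batch, ignoring framework overhead."""
    return input_values(height, width, channels) * batch_size * BYTES_PER_VALUE

rgb = input_values(256, 256, 3)
gray = input_values(256, 256, 1)
print(f"Values per image: RGB={rgb}, grayscale={gray}, ratio={rgb // gray}x")

for name, channels in (("RGB", 3), ("grayscale", 1)):
    mib = batch_bytes(256, 256, channels, batch_size=20) / 2**20
    print(f"{name} batch of 20: {mib:.1f} MiB")
```

Only the first `Conv2D` layer sees the channel dimension, so the saving is concentrated in data loading and that first layer rather than spread across the whole network.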


## Import Libraries

In [None]:
import datetime
import os
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib.image import imread
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Activation, Dropout, Flatten, Dense, Conv2D, MaxPooling2D
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.callbacks import TensorBoard
import joblib

## Set Working Directory

In [None]:
current_dir = os.getcwd()
current_dir

In [None]:
os.chdir('/workspace/NEW-CHERRY-LEAVES')
print("You set a new current directory")

In [None]:
work_dir = os.getcwd()
work_dir

## Set Input Directories

In [12]:
my_data_dir = 'inputs/cherry_leaves/cherry-leaves'
test_path = my_data_dir + '/test'
train_path = my_data_dir + '/train'
val_path = my_data_dir + '/validation'

## Set Output Directory

In [None]:
version = 'v2'
file_path = f'outputs/{version}'

if 'outputs' in os.listdir(work_dir) and version in os.listdir(work_dir + '/outputs'):
  print('This version already exists; create a new version to avoid overwriting its outputs.')
else:
  os.makedirs(name=file_path)

## Log Directory

In [None]:
log_dir = f"outputs/{version}/logs/fit/" + datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
tensorboard_callback = TensorBoard(log_dir=log_dir, histogram_freq=1, profile_batch='500,520')

### Set labels

In [None]:
# Sort to match the alphanumeric class order used by flow_from_directory
labels = sorted(os.listdir(train_path))

print(
    f"Project Labels: {labels}"
    )

### Set Greyscale image shape

In [None]:
image_shape = (256, 256, 1)  # Grayscale shape
joblib.dump(value=image_shape, filename=f"{file_path}/image_shape.pkl")

### Data Distribution Visualisation 

In [None]:
# Count the images in each label folder of each split
rows = []
for folder in ['train', 'validation', 'test']:
    for label in labels:
        n_images = len(os.listdir(os.path.join(my_data_dir, folder, label)))
        rows.append({'Set': folder, 'Label': label, 'Frequency': n_images})
        print(f"* {folder} - {label}: {n_images} images")

# Build the DataFrame once instead of repeatedly concatenating onto an empty frame
df_freq = pd.DataFrame(rows)
print("\n")

# Bar plot of the label distribution in each split
sns.set_style("whitegrid")
plt.figure(figsize=(8, 5))
sns.barplot(data=df_freq, x='Set', y='Frequency', hue='Label')
plt.savefig(f'{file_path}/labels_distribution.png', bbox_inches='tight', dpi=150)
plt.show()

## Image Data Augmentation

### Augmentation and greyscale loading

In [None]:
batch_size = 20
augmented_image_data = ImageDataGenerator(rotation_range=20,
                                          width_shift_range=0.10,
                                          height_shift_range=0.10,
                                          shear_range=0.1,
                                          zoom_range=0.1,
                                          horizontal_flip=True,
                                          vertical_flip=True,
                                          fill_mode='nearest',
                                          rescale=1./255)

train_set = augmented_image_data.flow_from_directory(train_path,
                                                      target_size=image_shape[:2],
                                                      color_mode='grayscale',
                                                      batch_size=batch_size,
                                                      class_mode='binary',
                                                      shuffle=True)

validation_set = ImageDataGenerator(rescale=1./255).flow_from_directory(val_path,
                                                                        target_size=image_shape[:2],
                                                                        color_mode='grayscale',
                                                                        batch_size=batch_size,
                                                                        class_mode='binary',
                                                                        shuffle=False)

test_set = ImageDataGenerator(rescale=1./255).flow_from_directory(test_path,
                                                                  target_size=image_shape[:2],
                                                                  color_mode='grayscale',
                                                                  batch_size=batch_size,
                                                                  class_mode='binary',
                                                                  shuffle=False)
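With `color_mode='grayscale'`, the conversion happens when Keras loads each file, not during augmentation. To our understanding, the Keras image loader calls Pillow's `Image.convert('L')`, which Pillow documents as the ITU-R 601-2 luma transform L = R * 299/1000 + G * 587/1000 + B * 114/1000. The sketch below reproduces that weighting in plain Python on a hypothetical 2x2 RGB image, followed by the `rescale=1./255` step. It illustrates the arithmetic only; it is not the Keras code path, and Pillow's integer rounding can differ slightly.

```python
# ITU-R 601-2 luma weights, as documented for Pillow's convert('L')
LUMA_WEIGHTS = (0.299, 0.587, 0.114)

def to_grayscale(rgb_image):
    """Convert a nested list of (R, G, B) tuples (0-255) to one luma value per pixel."""
    return [
        [sum(w * c for w, c in zip(LUMA_WEIGHTS, pixel)) for pixel in row]
        for row in rgb_image
    ]

def rescale(image, factor=1.0 / 255):
    """Mirror ImageDataGenerator(rescale=1./255): multiply every value by factor."""
    return [[value * factor for value in row] for row in image]

# Hypothetical 2x2 image: red, green / blue, white
rgb_image = [[(255, 0, 0), (0, 255, 0)],
             [(0, 0, 255), (255, 255, 255)]]

gray = to_grayscale(rgb_image)
scaled = rescale(gray)
for row in scaled:
    print([round(v, 3) for v in row])
```

The green channel carries the largest weight, so three color channels collapse into a single intensity map that the first convolution then reads.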

In [None]:
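for _ in range(3):
    img, label = next(train_set)
    print(img.shape)   # (batch_size, 256, 256, 1), e.g. (20, 256, 256, 1)
    # Drop the single channel axis: imshow cannot draw (H, W, 1) arrays
    plt.imshow(img[0].squeeze(), cmap='gray')
    plt.title(f"Label: {label[0]}")
    plt.show()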

### Save class indices

In [None]:
joblib.dump(value=train_set.class_indices, filename=f"{file_path}/class_indices.pkl")
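`flow_from_directory` assigns class indices in alphanumeric order of the subdirectory names, and with `class_mode='binary'` the single sigmoid output is the probability of the class mapped to index 1. A minimal sketch of turning a prediction back into a label with the saved dictionary; the folder names `healthy` and `powdery_mildew` and the 0.5 threshold are assumptions for illustration, not values read from this notebook.

```python
def invert_class_indices(class_indices):
    """Turn {'label': index} (as stored in class_indices.pkl) into {index: 'label'}."""
    return {index: label for label, index in class_indices.items()}

def label_from_probability(probability, class_indices, threshold=0.5):
    """Map a sigmoid output to a label name plus the model's confidence in that label."""
    index_to_label = invert_class_indices(class_indices)
    predicted_index = 1 if probability >= threshold else 0
    confidence = probability if predicted_index == 1 else 1 - probability
    return index_to_label[predicted_index], confidence

# Hypothetical class indices: alphanumeric folder order puts 'healthy' at 0
class_indices = {'healthy': 0, 'powdery_mildew': 1}

print(label_from_probability(0.92, class_indices))
print(label_from_probability(0.10, class_indices))
```

In the notebook, the dictionary would come from `joblib.load(f"{file_path}/class_indices.pkl")` and the probability from `model.predict(...)` on a single preprocessed image.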

## Model Creation

### Define Model

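Define the model as a stack of layers. The repeated Conv2D and MaxPooling2D blocks progressively extract more complex, higher-level features from the input images; a dense layer with dropout and a single sigmoid unit then produce the binary prediction.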

In [16]:
def create_tf_model_grayscale():
    model = Sequential()
    model.add(Conv2D(filters=32, kernel_size=(3, 3), input_shape=image_shape, activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Conv2D(filters=64, kernel_size=(3, 3), activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Conv2D(filters=64, kernel_size=(3, 3), activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Flatten())
    model.add(Dense(128, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

### Initialise Model

In [None]:
model = create_tf_model_grayscale()
model.summary()
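Only the first `Conv2D` layer depends on the number of input channels, so grayscale input shrinks the weight count very little. A minimal sketch that recomputes the trainable parameters of the layer stack above by hand (3x3 'valid' convolutions, 2x2 max pooling) for one and three channels. The totals should agree with what `model.summary()` reports; treat this as a cross-check, not a replacement for it.

```python
def conv_params(kernel, in_ch, filters):
    """Weights (kernel * kernel * in_ch per filter) plus one bias per filter."""
    return kernel * kernel * in_ch * filters + filters

def dense_params(in_units, out_units):
    """Fully connected weights plus one bias per output unit."""
    return in_units * out_units + out_units

def count_params(height, width, channels):
    """Replicates the layer stack of create_tf_model_grayscale for a given input shape."""
    total = 0
    h, w, ch = height, width, channels
    for filters in (32, 64, 64):
        total += conv_params(3, ch, filters)
        h, w, ch = h - 2, w - 2, filters   # 3x3 convolution, no padding
        h, w = h // 2, w // 2              # 2x2 max pooling
    total += dense_params(h * w * ch, 128)  # Flatten -> Dense(128)
    total += dense_params(128, 1)           # Dense(1, sigmoid)
    return total

gray_total = count_params(256, 256, 1)
rgb_total = count_params(256, 256, 3)
print(f"Grayscale: {gray_total:,}  RGB: {rgb_total:,}  difference: {rgb_total - gray_total:,}")
```

The 128-unit `Dense` layer after `Flatten` holds over 99% of the weights (7,372,928 of 7,428,801 for grayscale input), so most of the model's size comes from the classifier head rather than from the input channels.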

### Set Early Stopping

In [18]:
from tensorflow.keras.callbacks import EarlyStopping
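# restore_best_weights=True rolls the model back to the epoch with the lowest val_loss
early_stop = EarlyStopping(monitor='val_loss', patience=3, restore_best_weights=True)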
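With `monitor='val_loss'` and `patience=3`, the callback stops training once `val_loss` has failed to improve for three consecutive epochs. A simplified sketch of that rule in plain Python, run on a hypothetical loss curve; it ignores `min_delta` (which defaults to 0) and other callback options, so it approximates rather than reproduces the Keras implementation.

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return (stop_epoch, best_epoch), both 0-based, under a simplified patience rule."""
    best_loss = float('inf')
    best_epoch = 0
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:          # strict improvement (min_delta = 0)
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:      # `patience` epochs in a row without improvement
                return epoch, best_epoch
    return len(val_losses) - 1, best_epoch

# Hypothetical validation losses per epoch
val_losses = [0.40, 0.25, 0.18, 0.20, 0.19, 0.21, 0.15]
stop_epoch, best_epoch = early_stopping_epoch(val_losses, patience=3)
print(f"Stops after epoch {stop_epoch}; best val_loss was at epoch {best_epoch}")
```

Training halts before the later, lower loss is ever seen. Unless `restore_best_weights=True` is set, Keras also keeps the weights from the last epoch trained rather than those from `best_epoch`.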

### Train Model

In [None]:
import math

steps_per_epoch = math.ceil(train_set.samples / batch_size)

model = create_tf_model_grayscale()
model.fit(
    train_set,
    epochs=25,
    steps_per_epoch=steps_per_epoch,
    validation_data=validation_set,
    callbacks=[early_stop, tensorboard_callback],  # TensorBoard writes logs to log_dir
    verbose=1
)
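`steps_per_epoch = math.ceil(samples / batch_size)` makes one epoch cover every training image once, with a partial final batch when the count does not divide evenly; this matches the length Keras infers from the directory iterator when `steps_per_epoch` is omitted. A small sketch of the arithmetic; the sample count of 1,510 is hypothetical, since the actual `train_set.samples` value is not shown here.

```python
import math

def epoch_batches(samples, batch_size):
    """Return (steps_per_epoch, size_of_last_batch) for one pass over the data."""
    steps = math.ceil(samples / batch_size)
    last_batch = samples - (steps - 1) * batch_size
    return steps, last_batch

# Hypothetical training-set size; in the notebook this is train_set.samples
steps, last = epoch_batches(samples=1510, batch_size=20)
print(f"{steps} steps per epoch, final batch holds {last} images")
```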

### Save Model

In [None]:
model.save(f'{file_path}/cherry_leaves_model_grayscale.h5')
print(f"TensorBoard logs saved to: {log_dir}")

### Plot Training Results

In [None]:
losses = pd.DataFrame(model.history.history)

sns.set_style("whitegrid")
losses[['loss','val_loss']].plot(style='.-')
plt.title("Loss")
plt.savefig(f'{file_path}/model_training_losses.png', bbox_inches='tight', dpi=150)
plt.show()

print("\n")
losses[['accuracy','val_accuracy']].plot(style='.-')
plt.title("Accuracy")
plt.savefig(f'{file_path}/model_training_acc.png', bbox_inches='tight', dpi=150)
plt.show()
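The curves are easier to compare against the >97% requirement when the key numbers are pulled out of the history dictionary. A minimal sketch, assuming a dict shaped like `model.history.history` (lists keyed by `loss`, `val_loss`, `accuracy`, `val_accuracy`); the helper name and the values below are invented for illustration, not results from this notebook.

```python
def summarise_history(history, target_accuracy=0.97):
    """Report the epoch with the lowest val_loss and whether it meets the accuracy target."""
    val_losses = history['val_loss']
    best_epoch = min(range(len(val_losses)), key=val_losses.__getitem__)
    best_val_acc = history['val_accuracy'][best_epoch]
    # Train minus validation accuracy at that epoch (a large positive gap suggests overfitting)
    gap = history['accuracy'][best_epoch] - best_val_acc
    return {
        'best_epoch': best_epoch + 1,          # 1-based, as Keras prints epochs
        'val_accuracy': best_val_acc,
        'train_val_gap': gap,
        'meets_target': best_val_acc > target_accuracy,
    }

# Invented example history, shaped like model.history.history
history = {
    'loss': [0.45, 0.20, 0.12, 0.10],
    'val_loss': [0.30, 0.15, 0.09, 0.11],
    'accuracy': [0.80, 0.93, 0.96, 0.97],
    'val_accuracy': [0.88, 0.95, 0.98, 0.97],
}
print(summarise_history(history))
```

A negative gap is plausible here because augmentation is applied to the training set only, which makes training batches harder than validation batches.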

### Load and Evaluate Model

In [None]:
from tensorflow.keras.models import load_model
model = load_model(f'{file_path}/cherry_leaves_model_grayscale.h5')

In [None]:
test_loss, test_accuracy = model.evaluate(test_set, verbose=1)

print(f"Test Accuracy: {test_accuracy * 100:.2f}%")

In [None]:
# The validation set also drove early stopping, so this score is not a fully held-out estimate
evaluation = model.evaluate(validation_set, verbose=1)

In [None]:
joblib.dump(value=evaluation,
            filename=f"{file_path}/evaluation.pkl")

### Conclusion:
Grayscale conversion reduces training time without compromising accuracy. However, it does not improve accuracy, which makes it an optimization strategy rather than a necessity.

In [36]:
!cat gitignore

cat: gitignore: No such file or directory


In [37]:
!git status

On branch main
Your branch is up to date with 'origin/main'.

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
	[31mmodified:   app_pages/multipage.py[m
	[31mmodified:   jupyter_notebooks/refined_FeatureReduction.ipynb[m

no changes added to commit (use "git add" and/or "git commit -a")


In [38]:
!git add .

In [34]:
!git commit -am "edit comments on multipage.py"

[main 254d6db]  reduce file size
 4 files changed, 129 insertions(+), 943 deletions(-)


In [35]:
!git push

Enumerating objects: 13, done.
Counting objects: 100% (13/13), done.
Delta compression using up to 32 threads
Compressing objects: 100% (7/7), done.
Writing objects: 100% (7/7), 8.66 KiB | 8.66 MiB/s, done.
Total 7 (delta 3), reused 0 (delta 0), pack-reused 0 (from 0)
remote: Resolving deltas: 100% (3/3), completed with 3 local objects.
To https://github.com/Katherine-Holland/NEW-CHERRY-LEAVES.git
   c01fd72..254d6db  main -> main
