# Machine Learning

Andrew Esch, Evan Lee, Collin Stratton

Dr. Isac Artzi

CST-425

4/17/2022

# Introduction
The purpose of this assignment is to synthesize the complex aspects of machine learning. Specifically, this project builds an image classification model that predicts the type of food shown in an image, and then passes in an image of a dog to see which food the model thinks the dog is. It is a fun and unusual way to showcase the skills learned in this course.

# Project Description
## Machine Learning Problem
Image recognition is useful in many applications. Our idea is to figure out which food a given dog would be. This may not be a breakthrough project by itself, but the technique at its core is broadly useful. Training the model proved to take a very long time, so we used CUDA cores to speed up the image model built with TensorFlow and cv2, and we looked to improve the code we found that creates the model. Machine learning models should use the available hardware to the fullest; companies that do not use CUDA in their code could be losing significant time and money on their models.

## Problem Sketch
See the images in the folder.


# Code
The outline of the code is as follows: First, the appropriate libraries are imported and the images are loaded. Next, the images are labeled and categorized for preprocessing. Then, the images are preprocessed and the model is trained. Finally, the model is used to predict what type of food an image of a dog is.

## Setup

In [None]:
# import libraries
from sklearn.model_selection import train_test_split
import tensorflow as tf
import os.path
import numpy as np
import pandas as pd

from pathlib import Path

import requests
from PIL import Image
from io import BytesIO
import cv2

## Get Images and Label Data
### About the Dataset
Our dataset is from Kaggle (link at the bottom) and covers 101 different foods, from which we use 300 pictures per food item. Using this in-depth color image dataset, we are able to build an accurate model for these foods that gives the best fit for any dog we enter into the model. Beyond the resizing and pixel scaling applied in the Keras preprocessing step, our data does not need additional normalization or standardization before it is used in the modeling process. There is no need to handle outliers because every picture in a folder shows the correct food. Some foods are photographed at an angle or presented differently, but this variety helps our model instead of hurting it.

### Explanation of Code
The first step is downloading the full dataset from Kaggle and running it locally on our device. We decided to perform this step locally instead of on GitHub because the files would be too big. Then the file paths of the images are collected, each label is taken from the name of the folder that contains the image, and both are put in a pandas DataFrame.
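
As a minimal sketch of how a label can be read from a path, assuming each image sits in a folder named after its food. The paths below are hypothetical, and `label_from_path` is an illustrative helper rather than part of the notebook.

```python
from pathlib import PurePosixPath

def label_from_path(path):
    """Return the name of the folder that directly contains the image."""
    return PurePosixPath(path).parent.name

# Hypothetical paths laid out as images/<label>/<file>.jpg
example_paths = [
    "images/apple_pie/0001.jpg",
    "images/apple_pie/0002.jpg",
    "images/sushi/0001.jpg",
]
example_labels = [label_from_path(p) for p in example_paths]
print(example_labels)
```

This gives the same result as nesting two `os.path.split` calls: the first call strips the file name, and the second keeps only the last folder name.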

In [None]:
# import images
# andrew_dir = Path('C:/Users/Drew/OneDrive/Pictures/ml-food/images')
evan_dir = Path('C:/GCU Academics/Junior Year/Second Semester/CST-425/Food Dataset/FoodFiles/images')
# collin_dir = Path('/Users/collinstratton/Documents/archive/images')

image_dir = Path(evan_dir)
images_paths = list(image_dir.glob(r'**/*.jpg'))
labels = list(map(lambda x: os.path.split(os.path.split(x)[0])[1], images_paths))

images_paths = pd.Series(images_paths, name='Image_Path').astype(str)
labels = pd.Series(labels, name='Label')

images = pd.concat([images_paths, labels], axis=1)
print("First chunk done")

In [None]:
# split images into categories
category_samples = []
for category in images['Label'].unique():
    category_slice = images.query('Label == @category')
    category_samples.append(category_slice.sample(300, random_state=1))

# group images into tables and check the number of images by category
images_samples = pd.concat(category_samples, axis=0)
images_samples['Label'].value_counts()
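
One thing to watch in the sampling step: `DataFrame.sample(300)` raises a `ValueError` if a category holds fewer than 300 rows. The sketch below shows one possible guard on a toy DataFrame; `sample_per_label` and the toy data are hypothetical, and the column names simply mirror the ones used above.

```python
import pandas as pd

def sample_per_label(df, n, seed=1):
    """Sample up to n rows from every label, never more than the label has."""
    parts = []
    for _, group in df.groupby("Label"):
        parts.append(group.sample(min(n, len(group)), random_state=seed))
    return pd.concat(parts, axis=0)

# Toy data: 4 pizza images and 3 sushi images
toy = pd.DataFrame({
    "Image_Path": [f"img_{i}.jpg" for i in range(7)],
    "Label": ["pizza"] * 4 + ["sushi"] * 3,
})

balanced = sample_per_label(toy, n=3)
counts = balanced["Label"].value_counts().to_dict()
print(counts)
```

Asking for more rows than a category has (for example `n=10` here) returns every row of that category instead of failing.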

## Preprocess Images
### Building the Model
We use sklearn to create training and testing image datasets with a 70/30 split. We then use Keras to perform preprocessing on the images in both the training and testing sets, holding out 20% of the training set for validation. We decided to use 10 epochs, or 10 passes over the training data, which nets around a 90% accuracy for food items. On CPU alone it takes over an hour to run, but CUDA significantly decreases that time.
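
The split sizes can be worked out by hand. The sketch below assumes all 101 categories contribute exactly 300 images and uses integer arithmetic, so the exact counts reported by sklearn and Keras may differ slightly through rounding or unreadable files.

```python
n_classes = 101
images_per_class = 300
total = n_classes * images_per_class       # all sampled images

train_pool = total * 70 // 100             # 70% kept for training + validation
test_count = total - train_pool            # remaining 30% for testing
val_count = train_pool * 20 // 100         # 20% of the training pool for validation
fit_count = train_pool - val_count         # images the model actually fits on

batch_size = 32
steps_per_epoch = -(-fit_count // batch_size)  # ceiling division

print(total, train_pool, test_count, val_count, fit_count, steps_per_epoch)
```

Under these assumptions the model fits on roughly 17,000 images per epoch, which is part of why training is slow on a CPU.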

In [None]:
# Testing and training
train_data, test_data = train_test_split(images_samples, train_size=0.7, shuffle=True, random_state=1)

train_generator = tf.keras.preprocessing.image.ImageDataGenerator(
    preprocessing_function=tf.keras.applications.mobilenet_v2.preprocess_input,
    validation_split=0.2
)

test_generator = tf.keras.preprocessing.image.ImageDataGenerator(
    preprocessing_function=tf.keras.applications.mobilenet_v2.preprocess_input
)

train_images = train_generator.flow_from_dataframe(
    dataframe=train_data,
    x_col='Image_Path',
    y_col='Label',
    target_size=(224, 224),
    color_mode='rgb',
    class_mode='categorical',
    batch_size=32,
    shuffle=True,
    seed=42,
    subset='training'
)

val_images = train_generator.flow_from_dataframe(
    dataframe=train_data,
    x_col='Image_Path',
    y_col='Label',
    target_size=(224, 224),
    color_mode='rgb',
    class_mode='categorical',
    batch_size=32,
    shuffle=True,
    seed=42,
    subset='validation'
)

test_images = test_generator.flow_from_dataframe(
    dataframe=test_data,
    x_col='Image_Path',
    y_col='Label',
    target_size=(224, 224),
    color_mode='rgb',
    class_mode='categorical',
    batch_size=32,
    shuffle=False
)

pretrained_model = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3),
    include_top=False,
    weights='imagenet',
    pooling='avg',
)

# freeze the weights of the network to maintain transfer learning
pretrained_model.trainable = False

inputs = pretrained_model.input

# create two layers of our own
x = tf.keras.layers.Dense(128, activation='relu')(pretrained_model.output)
x = tf.keras.layers.Dense(128, activation='relu')(x)

# create output layer with the 101 classes
output_layer = tf.keras.layers.Dense(101, activation='softmax')(x)

# unite the original model with the new layers
model = tf.keras.Model(inputs, output_layer)

model.summary()
print("chunk done")
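
Because the pretrained network is frozen, only the three added dense layers are trained. As a rough check of how many weights that involves, the sketch below counts their parameters, assuming MobileNetV2 with average pooling and the default width outputs a 1280-value feature vector; `dense_params` is a helper written for this illustration.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

feature_dim = 1280  # assumed MobileNetV2 feature size with pooling='avg'
head = [(feature_dim, 128), (128, 128), (128, 101)]

per_layer = [dense_params(n_in, n_out) for n_in, n_out in head]
trainable_total = sum(per_layer)
print(per_layer, trainable_total)
```

The total should match the trainable parameter count reported in the model summary if the assumed feature size holds.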

In [None]:
# Training
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

history = model.fit(
    train_images,
    validation_data=val_images,
    epochs=10,
    callbacks=[
        tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=3, restore_best_weights=True)
    ]
)

# Training Results - Prediction Accuracy
results = model.evaluate(test_images, verbose=1)
print("Test accuracy: {:.2f}%".format(results[1] * 100))
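
The `EarlyStopping` callback stops training once the validation loss has failed to improve for `patience` epochs in a row, and `restore_best_weights=True` rolls the model back to its best epoch. The sketch below is a simplified pure-Python imitation of that rule, not the Keras implementation; the loss values are made up for illustration.

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return (epoch training stops at, epoch with the best loss), 1-indexed."""
    best_loss = float("inf")
    best_epoch = None
    wait = 0  # consecutive epochs without improvement
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return len(val_losses), best_epoch

# Hypothetical validation losses for up to 7 epochs
losses = [2.1, 1.6, 1.4, 1.45, 1.5, 1.43, 1.2]
stopped_at, best_at = early_stopping_epoch(losses, patience=3)
print(stopped_at, best_at)
```

In this made-up run, training would stop before the last epoch is ever reached, so a low loss that only appears later is never seen. A larger `patience` trades longer training for a smaller chance of stopping too early.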


## Image Prediction

In [None]:
# find what food a dog image looks like
# 'C:/GCU Academics/Junior Year/Second Semester/CST-425/CLC Shared GitHub Repository/Cloned Repository/MachineLearning/Daisy Pic.png'
#'/Users/collinstratton/Documents/archive/dog_images/17DOGS-mobileMasterAt3x-v2.jpg'
img = Image.open('/Users/collinstratton/Documents/archive/dog_images/17DOGS-mobileMasterAt3x-v2.jpg')
img.show()

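# force 3 RGB channels (a PNG may carry an alpha channel)
img = np.array(img.convert('RGB')).astype(np.float32)

# match the training input size and pixel scaling
img = cv2.resize(img, (224, 224))
img = tf.keras.applications.mobilenet_v2.preprocess_input(img)
predict = model.predict(img.reshape(-1, 224, 224, 3))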
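
An image passed to the model for prediction should be scaled the same way as the training images. The training generators use MobileNetV2's `preprocess_input`, which maps pixel values from [0, 255] to [-1, 1]. The NumPy sketch below imitates that scaling for comparison with a plain division by 255; `scale_like_mobilenet_v2` is a hypothetical stand-in, not the Keras function itself.

```python
import numpy as np

def scale_like_mobilenet_v2(pixels):
    """Map pixel values from [0, 255] to [-1, 1]."""
    return pixels.astype(np.float32) / 127.5 - 1.0

pixels = np.array([0.0, 127.5, 255.0])

scaled = scale_like_mobilenet_v2(pixels)   # range used during training
divided = pixels / 255.0                   # range [0, 1]
print(scaled, divided)
```

Dividing by 255 produces a different range than the one the model saw during training, which can shift its predictions.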

In [None]:
# output the result
key_list = list(test_images.class_indices.keys())
value_list = list(test_images.class_indices.values())

position = value_list.index(np.argmax(predict))
print(key_list[position])
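
The lookup above can also be written by inverting the `class_indices` dictionary, which makes it easy to list the runner-up foods as well. The sketch below uses a made-up three-class mapping and a made-up softmax output in place of the real generator and model.

```python
import numpy as np

# Hypothetical class_indices in the label -> index form Keras generators expose
class_indices = {"apple_pie": 0, "pizza": 1, "sushi": 2}
index_to_label = {idx: label for label, idx in class_indices.items()}

# Hypothetical softmax output for a single image (batch of 1)
probs = np.array([[0.10, 0.65, 0.25]])

best_label = index_to_label[int(np.argmax(probs[0]))]

# Indices sorted from most to least likely, keeping the top two
top_two = [index_to_label[int(i)] for i in np.argsort(probs[0])[::-1][:2]]
print(best_label, top_two)
```

Looking at the top few labels and their probabilities gives a better sense of how sure the model is about which food a dog is.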

# Conclusion
This project was a fun way to implement machine learning in a unique way to showcase the skills learned in this course. The code is very simple and the results are very interesting. Even though the model was trained to predict what type of food is in an image, it is versatile enough to predict what type of food a dog is. The model got up to 80% accuracy on the food set, so we can only assume it is similarly accurate in predicting what type of food a dog is.

# Resources
https://www.kaggle.com/datasets/kmader/food41

https://www.kaggle.com/code/sergiogarridomerino/clasificaci-n-de-comida

https://images.cv
