# Assessment 1: BugFinder
### Intro
This notebook is the record of completion for Assessment 1: Machine Learning.
### Scenario
Develop a model for use with a hand-held hyperspectral camera system that identifies harmful pests on containers and vessels entering the country, so that those pests cannot establish themselves and destroy native wildlife. As a proof of concept, this project uses images from a standard camera.

In [5]:
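# Setup - imports
from fastai.vision.widgets import *
from fastai.vision.all import *

# fastai quick-start example (cats vs dogs), not yet the pest model
path = untar_data(URLs.PETS)/'images'

# In the Oxford-IIIT Pet dataset, cat filenames start with an uppercase letter
def is_cat(x): return x[0].isupper()
dls = ImageDataLoaders.from_name_func(
    path, get_image_files(path), valid_pct=0.2, seed=42,
    label_func=is_cat, item_tfms=Resize(224))

# Fine-tune a pretrained ResNet-34 for one epoch and report the error rate
learn = vision_learner(dls, resnet34, metrics=error_rate)
learn.fine_tune(1)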

Downloading: "https://download.pytorch.org/models/resnet34-b627a593.pth" to C:\Users\captoj/.cache\torch\hub\checkpoints\resnet34-b627a593.pth
100%|██████████| 83.3M/83.3M [00:00<00:00, 96.1MB/s]


epoch,train_loss,valid_loss,error_rate,time
0,0.168688,0.026197,0.012179,00:46


epoch,train_loss,valid_loss,error_rate,time
0,0.063771,0.018406,0.00406,00:44


In [None]:
uploader = widgets.FileUpload()
uploader

## Dataset Preprocessing & Organisation
#### Locating Dataset
I used Kaggle to research datasets containing images of a variety of different insects, and located two potentially suitable, pre-labelled data sets:
- https://www.kaggle.com/datasets/shameinew/insect-images-with-scientific-names
- https://www.kaggle.com/datasets/rtlmhjbn/ip02-dataset

After examining these large datasets, I chose a subset of the first dataset as a proof of concept, focusing on the pests of highest concern. The following code creates one folder per pest category, ready to hold the images from that subset.

In [4]:
insect_types = ['khapra beetle', 'fire ant', 'spongy moth', 'giant snail', 'honey bee']
path = Path('Insects')
# Create one folder per category; safe to re-run if some folders already exist
for o in insect_types:
    (path/o).mkdir(parents=True, exist_ok=True)

#### Preprocessing Dataset
The following code takes the pre-labelled dataset, labels each image by its parent folder, holds out 20% of the images for validation, and crops and resizes every image to a consistent 224x224 pixels.

In [3]:
# Label images by parent folder and hold out 20% for validation
datablock = DataBlock(
    blocks=(ImageBlock, CategoryBlock),
    get_items=get_image_files,
    splitter=RandomSplitter(valid_pct=0.2, seed=42),
    get_y=parent_label,
    item_tfms=RandomResizedCrop(224, min_scale=0.5))
# Add batch-level augmentation on top of the per-item random crop
datablock = datablock.new(item_tfms=RandomResizedCrop(224, min_scale=0.5), batch_tfms=aug_transforms())
dls = datablock.dataloaders(path)


#### Dataset Organisation
The following code will split the dataset into a training folder, validation folder and testing folder.

In [None]:
# code goes here
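
One way to do this split is sketched below using only the Python standard library. It assumes the images sit in one folder per label under `Insects`, as created earlier. The function name `split_dataset`, the destination folder, and the 70/20/10 proportions are illustrative assumptions, not part of the assessment brief.

```python
import random
import shutil
from pathlib import Path

# Assumption: the dataset only contains these image types
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}

def split_dataset(source, dest, valid_pct=0.2, test_pct=0.1, seed=42):
    """Copy source/<label>/* into dest/{train,valid,test}/<label>/.

    Returns the number of files copied into each split.
    """
    source, dest = Path(source), Path(dest)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    counts = {"train": 0, "valid": 0, "test": 0}
    for label_dir in sorted(p for p in source.iterdir() if p.is_dir()):
        files = sorted(f for f in label_dir.iterdir()
                       if f.suffix.lower() in IMAGE_SUFFIXES)
        rng.shuffle(files)
        n_valid = int(len(files) * valid_pct)
        n_test = int(len(files) * test_pct)
        # Split per label so every class appears in every split
        splits = {
            "valid": files[:n_valid],
            "test": files[n_valid:n_valid + n_test],
            "train": files[n_valid + n_test:],
        }
        for split, members in splits.items():
            target = dest / split / label_dir.name
            target.mkdir(parents=True, exist_ok=True)
            for f in members:
                shutil.copy2(f, target / f.name)
            counts[split] += len(members)
    return counts
```

A call such as `split_dataset('Insects', 'Insects_split')` would leave the original folder untouched and write the three splits to a new folder (the name `Insects_split` is hypothetical). Splitting within each label keeps rare classes from disappearing from the validation or test sets.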

## Creating an ML Model

1. Use the fastai library to create an image classification model.
2. Choose an appropriate deep learning architecture (e.g., CNN) for the model.
3. Train the model using the training dataset, considering hyperparameter tuning.
4. Monitor training progress and adjust if necessary.
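
The four steps above could be sketched roughly as follows. This is a minimal outline, not the final model: the function name `train_pest_classifier`, the choice of ResNet-34, the four epochs, and the early-stopping patience are all assumptions chosen for illustration. It expects a `DataLoaders` object such as the `dls` built in the preprocessing step.

```python
def train_pest_classifier(dls, epochs=4):
    """Fine-tune a pretrained ResNet-34 on `dls` and return the learner."""
    # Imported inside the function so it can be defined before fastai is loaded
    from fastai.vision.all import (
        EarlyStoppingCallback, accuracy, error_rate, resnet34, vision_learner,
    )

    # Steps 1-2: a CNN image classifier built on a pretrained architecture
    learn = vision_learner(dls, resnet34, metrics=[error_rate, accuracy])
    # Steps 3-4: train, and stop early if validation loss stalls for two epochs
    learn.fine_tune(
        epochs,
        cbs=EarlyStoppingCallback(monitor="valid_loss", patience=2),
    )
    return learn
```

Calling `learn = train_pest_classifier(dls)` prints the per-epoch training loss, validation loss and metrics, which is the main way to monitor progress. The number of epochs and the learning rate are the first hyperparameters worth adjusting if the validation loss stops improving.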

## Model Scoring

1. Use the trained model to predict pest species in a given set of images from the validation dataset.
2. Evaluate the model's performance using appropriate metrics (e.g., accuracy, precision, recall).
3. Visualise the model's predictions and actual labels.
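
To make the metrics in step 2 concrete, the sketch below computes accuracy and per-class precision and recall from two plain lists of labels. The function name `per_class_metrics` and the example labels are made up for illustration; in practice the lists would come from the model's predictions on the validation set.

```python
from collections import Counter

def per_class_metrics(actual, predicted):
    """Return overall accuracy and per-class precision/recall."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    pred_count, actual_count = Counter(predicted), Counter(actual)
    # Count correct predictions per label
    true_pos = Counter(a for a, p in zip(actual, predicted) if a == p)
    report = {}
    for label in sorted(set(actual) | set(predicted)):
        # precision: of everything predicted as `label`, how much was right
        precision = true_pos[label] / pred_count[label] if pred_count[label] else 0.0
        # recall: of everything that really is `label`, how much was found
        recall = true_pos[label] / actual_count[label] if actual_count[label] else 0.0
        report[label] = {"precision": precision, "recall": recall}
    accuracy = sum(true_pos.values()) / len(actual) if actual else 0.0
    return accuracy, report

# Hypothetical labels, for illustration only
actual = ["fire ant", "fire ant", "honey bee", "honey bee"]
predicted = ["fire ant", "honey bee", "honey bee", "honey bee"]
accuracy, report = per_class_metrics(actual, predicted)
print(accuracy)                      # 0.75
print(report["fire ant"]["recall"])  # 0.5
```

For a pest detector, recall on the harmful species is arguably the number to watch, since a missed pest costs more than a false alarm.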

## Validation and Test Datasets

1. Create a validation dataset that was not used during training to assess the model's generalisation ability.
2. Ensure that the validation dataset contains images with varying conditions and perspectives.
3. Additionally, prepare a separate test dataset for final model evaluation.
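
One practical check for step 1 is to confirm that no held-out image also appears in the training folder. The sketch below compares files by SHA-256 hash; the function names and the `train`/`valid`/`test` folder layout are assumptions. It only detects byte-identical copies, so a resized or re-encoded duplicate would slip through.

```python
import hashlib
from pathlib import Path

def file_digest(path):
    """Return the SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()

def find_leaks(train_dir, *held_out_dirs):
    """List held-out files whose bytes also appear under train_dir."""
    train_hashes = {file_digest(f)
                    for f in Path(train_dir).rglob("*") if f.is_file()}
    leaks = []
    for held_out in held_out_dirs:
        for f in sorted(Path(held_out).rglob("*")):
            if f.is_file() and file_digest(f) in train_hashes:
                leaks.append(f)
    return leaks
```

An empty result from something like `find_leaks('Insects_split/train', 'Insects_split/valid', 'Insects_split/test')` (hypothetical paths) gives some confidence that the validation and test scores reflect unseen images.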

## ML Evaluations

1. Apply the trained model to the test dataset to evaluate its performance on previously unseen data.
2. Analyse the model's predictions, misclassifications, and potential areas of improvement.
3. Summarise the assessment of the model's capabilities and limitations.
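
For the misclassification analysis in step 2, a simple starting point is to rank which pairs of classes the model confuses most often. The sketch below does this for two plain label lists; the function name `most_confused` and the example labels are illustrative assumptions. fastai's `ClassificationInterpretation` offers a similar report for a trained learner.

```python
from collections import Counter

def most_confused(actual, predicted, min_count=1):
    """Return (actual, predicted, count) for wrong predictions, most frequent first."""
    errors = Counter((a, p) for a, p in zip(actual, predicted) if a != p)
    # Sort by descending count, then alphabetically for a stable order
    ranked = sorted(errors.items(), key=lambda item: (-item[1], item[0]))
    return [(a, p, n) for (a, p), n in ranked if n >= min_count]

# Hypothetical test-set labels, for illustration only
actual = ["khapra beetle", "khapra beetle", "fire ant", "honey bee", "fire ant"]
predicted = ["fire ant", "fire ant", "fire ant", "fire ant", "honey bee"]
for a, p, n in most_confused(actual, predicted):
    print(f"{a} predicted as {p}: {n}")
```

Pairs near the top of this list point to where more training images, or more varied ones, are likely to help most.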