# Tomato Disease Diagnosis

In [1]:
from fastai.vision.all import *

In [2]:
Path.BASE_PATH = path = Path("/notebooks/tomatodiagnosis/data")

## Exploratory Data Analysis

In [3]:
path.ls()

(#12) [Path('Tomato_Yellow_Leaf_Curl_Virus'),Path('healthy'),Path('Late_blight'),Path('Septoria_leaf_spot'),Path('Leaf_Mold'),Path('Spider_mites'),Path('Tomato_mosaic_virus'),Path('models'),Path('.ipynb_checkpoints'),Path('Early_blight')...]

Of the 12 entries in the data folder, `models` and `.ipynb_checkpoints` are not image classes, which leaves 10 classes in our dataset: 9 are tomato diseases, and the 10th labels healthy tomato leaves.

In [None]:
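import os

# Directories listed above that do not hold class images
non_class_dirs = {"models", ".ipynb_checkpoints"}

subdir_counts = {}
for root, dirs, files in os.walk(path):
    # Don't descend into the non-class directories
    dirs[:] = [d for d in dirs if d not in non_class_dirs]
    num_files = len(files)
    # os.walk yields str paths, so convert before comparing with the Path object
    if Path(root) != path and num_files > 0:
        subdir = os.path.basename(root)
        subdir_counts[subdir] = num_files

for subdir, count in subdir_counts.items():
    print(f"{subdir}: {count} files")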

In [None]:
df = pd.DataFrame.from_dict(subdir_counts, orient="index", columns=["File Count"])
df.plot(kind="bar")

plt.title("Image Counts by class")
plt.xlabel("Disease")
plt.ylabel("Image Count")

plt.show()

The count plot above shows how the images are distributed across the classes (i.e. the diseases). Some classes are clearly under-represented, which implies that the model will struggle to learn how to predict them.

More data could be collected for the under-represented classes to match the well-represented ones. Another option is to try resampling techniques such as oversampling or undersampling.
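To make the oversampling idea concrete, here is a minimal sketch of random oversampling using only the standard library. The helper `oversample` and the toy file names are hypothetical, not part of this notebook or of fastai; the sketch duplicates randomly chosen minority-class items until every class matches the largest one.

```python
import random
from collections import Counter, defaultdict

def oversample(items, labels, seed=42):
    """Randomly duplicate minority-class items until all classes match the largest."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for item, label in zip(items, labels):
        by_label[label].append(item)

    # Every class is padded up to the size of the largest class
    target = max(len(group) for group in by_label.values())

    out_items, out_labels = [], []
    for label, group in by_label.items():
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for item in group + extra:
            out_items.append(item)
            out_labels.append(label)
    return out_items, out_labels

# Toy, made-up file names; these are not the real dataset counts
items = ["a1.jpg", "a2.jpg", "a3.jpg", "a4.jpg", "b1.jpg"]
labels = ["healthy"] * 4 + ["Leaf_Mold"]

new_items, new_labels = oversample(items, labels)
print(Counter(new_labels))
```

If something like this were used, it should be applied to the training split only, after the validation images have been set aside, so that duplicated images cannot appear on both sides of the split.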

In [None]:
files = get_image_files(path)

In [None]:
img = PILImage.create(files[0])
print(img.size)
img.to_thumb(128)

In [None]:
from fastcore.parallel import *

def f(o): return PILImage.create(o).size
sizes = parallel(f, files, n_workers=8)
pd.Series(sizes).value_counts()

There are 18,160 images in total, all of them 256 x 256 pixels. This is a small image size, so our model might struggle when predicting on new images that are larger.
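One possible way to handle larger images at prediction time is to scale the shorter side down to 256 pixels and then crop the central square, so that new images match the training data. The sketch below only computes the geometry; `resize_and_crop_box` is a hypothetical helper, and the 4000 x 3000 input is a made-up example rather than an image from this dataset.

```python
def resize_and_crop_box(width, height, target=256):
    """Return (resized_size, crop_box) mapping an image to target x target.

    The shorter side is scaled to `target`, then the central square is cropped.
    The crop box follows the (left, upper, right, lower) convention.
    """
    scale = target / min(width, height)
    new_w, new_h = round(width * scale), round(height * scale)
    left = (new_w - target) // 2
    top = (new_h - target) // 2
    return (new_w, new_h), (left, top, left + target, top + target)

# Made-up example: a 4000 x 3000 photo
resized, box = resize_and_crop_box(4000, 3000)
print(resized, box)
```

The two tuples are in the form expected by Pillow's `Image.resize` and `Image.crop`, so they could be passed to those methods directly.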

In [None]:
dls = ImageDataLoaders.from_folder(path, valid_pct=0.2, seed=42,
    batch_tfms=aug_transforms(size=128, min_scale=0.75))

dls.show_batch(max_n=6)
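The `valid_pct=0.2, seed=42` arguments ask for a reproducible random split with 20% of the images held out for validation. The sketch below illustrates the idea with the standard library only; `random_split` is a hypothetical stand-in, not fastai's implementation, and it assumes all 18,160 images are picked up by the data loaders.

```python
import random

def random_split(n_items, valid_pct=0.2, seed=42):
    """Shuffle indices with a fixed seed and hold out the first valid_pct share."""
    idxs = list(range(n_items))
    random.Random(seed).shuffle(idxs)
    cut = int(valid_pct * n_items)
    return idxs[cut:], idxs[:cut]  # (train indices, validation indices)

# Assumption: every one of the 18,160 images ends up in the split
train_idxs, valid_idxs = random_split(18_160)
print(len(train_idxs), len(valid_idxs))
```

Fixing the seed means the same images land in the validation set on every run, which keeps error rates comparable between experiments.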

In [None]:
learn = vision_learner(dls, 'resnet26d', metrics=error_rate, path='.').to_fp16()
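The `error_rate` metric passed to the learner is the fraction of validation images that are classified incorrectly, i.e. 1 minus accuracy. As a plain-Python illustration (`error_rate_sketch` and the toy labels below are hypothetical, not fastai code or results from this notebook):

```python
def error_rate_sketch(preds, targets):
    """Fraction of predictions that differ from the true labels (1 - accuracy)."""
    wrong = sum(p != t for p, t in zip(preds, targets))
    return wrong / len(targets)

# Toy labels: one of the four predictions is wrong
preds = ["healthy", "Late_blight", "Leaf_Mold", "healthy"]
targets = ["healthy", "Late_blight", "Leaf_Mold", "Early_blight"]

print(error_rate_sketch(preds, targets))
```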

In [None]:
learn.lr_find(suggest_funcs=(valley, slide))

In [None]:
learn.fine_tune(3, 0.01)

In [None]:
0.01187269376196545