In [8]:
#NB: Kaggle requires phone verification to use the internet or a GPU. If you haven't done that yet, this cell will fail.
#    This code is only here to check that your internet is enabled. It doesn't do anything else.
#    Here's a help thread on getting your phone number verified: https://www.kaggle.com/product-feedback/135367

import socket
try:
    socket.setdefaulttimeout(1)
    # Try to open a TCP connection to a public DNS server (1.1.1.1, port 53)
    socket.socket(socket.AF_INET, socket.SOCK_STREAM).connect(('1.1.1.1', 53))
except socket.error as ex:
    raise Exception("STOP: No internet. Click '>|' in top right and set 'Internet' switch to on") from ex

In [9]:
# It's a good idea to ensure you're running the latest version of any libraries you need.
# `!pip install -Uqq <libraries>` upgrades to the latest version of <libraries>
# NB: You can safely ignore any warnings or errors pip spits out about running as root or incompatibilities
import os
iskaggle = os.environ.get('KAGGLE_KERNEL_RUN_TYPE', '')

if iskaggle:
    !pip install -Uqq fastai duckduckgo_search



The basic steps we'll take are:

1. Use DuckDuckGo to search for photos of acne vulgaris.
1. Do the same for other skin conditions, such as eczema, acrochordons (skin tags), verrucas and herpes zoster.
1. Fine-tune a pretrained neural network to recognise these groups.
1. Try running the model on a new picture and see if it works.

## Step 1: Download images of skin diseases

In [10]:
from duckduckgo_search import ddg_images
from fastcore.all import *

def search_images(term, max_images=200):
    print(f"Searching for '{term}'")
    return L(ddg_images(term, max_results=max_images)).itemgot('image')
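`L(...).itemgot('image')` pulls the `'image'` field out of every search result. A minimal plain-Python sketch of the same step, assuming (as the code above does) that each result is a dict with an `'image'` key holding the image URL; the sample results and the `get_image_urls` helper are hypothetical, for illustration only:

```python
# Hypothetical search results, shaped like the dicts search_images expects
results = [
    {"title": "Example 1", "image": "https://example.com/a.jpg"},
    {"title": "Example 2", "image": "https://example.com/b.jpg"},
]

def get_image_urls(results):
    """Plain-Python equivalent of L(results).itemgot('image')."""
    return [r["image"] for r in results]

urls = get_image_urls(results)
print(urls)
```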

Let's start by searching for a single acne vulgaris photo to see what kind of result we get. First, we get its URL from a search:

In [11]:
#NB: `search_images` depends on duckduckgo.com, which doesn't always return correct responses.
#    If you get a JSON error, just try running it again (it may take a couple of tries).
urls = search_images('acne vulgaris photo', max_images=1)
urls[0]

...and then download a URL and take a look at it:

In [12]:
from fastdownload import download_url
dest = 'C0024056-Acne_vulgaris.jpg'
download_url(urls[0], dest, show_progress=False)

from fastai.vision.all import *
im = Image.open(dest)
im.to_thumb(256,256)

Our searches seem to give reasonable results, so let's grab a batch of photos for each skin condition and save each group to its own folder. For each condition we run two searches, "<condition> photo" and "<condition> closeup photo", to get a bit more variety:

In [88]:
searches = 'acne vulgaris','Eczema','acrochordons','verruca','Herpes Zoster','Urticaria','sunburn','Diaper Rash','Rosacea','Tinea Pedis','Basal Cell Carcinoma','Genital warts'
path = Path('whichskindis')
from time import sleep

for o in searches:
    dest = (path/o)
    dest.mkdir(exist_ok=True, parents=True)
    download_images(dest, urls=search_images(f'{o} photo'))
    sleep(10)  # Pause between searches to avoid over-loading server
    download_images(dest, urls=search_images(f'{o} closeup photo'))
    sleep(10)
   
    resize_images(path/o, max_size=400, dest=path/o)

## Step 2: Train our model

Some photos might not download correctly, which could cause model training to fail, so we'll remove them:

In [138]:
failed = verify_images(get_image_files(path))
failed.map(Path.unlink)
len(failed)
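fastai's `verify_images` tries to open each file as an image and reports the ones that fail. As a rough stdlib-only illustration of the same "find broken files, then unlink them" pattern, the sketch below swaps in a much weaker check: it only looks at each file's leading bytes (its format signature) for a few common formats. `looks_like_image` and `find_bad_files` are hypothetical helpers, not fastai functions:

```python
import tempfile
from pathlib import Path

# Leading bytes ("magic numbers") of a few common image formats
SIGNATURES = (
    b"\xff\xd8\xff",        # JPEG
    b"\x89PNG\r\n\x1a\n",   # PNG
    b"GIF87a", b"GIF89a",   # GIF
)

def looks_like_image(path):
    """Cheap check: does the file start with a known image signature?"""
    with open(path, "rb") as f:
        head = f.read(8)
    return head.startswith(SIGNATURES)

def find_bad_files(folder):
    """Return files under `folder` that do not look like images."""
    return [p for p in Path(folder).rglob("*") if p.is_file() and not looks_like_image(p)]

with tempfile.TemporaryDirectory() as d:
    (Path(d) / "good.jpg").write_bytes(b"\xff\xd8\xff\xe0" + b"\x00" * 16)
    (Path(d) / "bad.jpg").write_bytes(b"<html>Not an image</html>")  # e.g. an error page saved as .jpg
    failed = find_bad_files(d)
    for p in failed:
        p.unlink()  # same idea as failed.map(Path.unlink)
    remaining = sorted(p.name for p in Path(d).iterdir())

print(remaining)  # ['good.jpg']
```

A signature check can't catch truncated downloads, which is why actually decoding the image, as `verify_images` does, is the more reliable test.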

To train a model, we'll need `DataLoaders`, which is an object that contains a *training set* (the images used to create a model) and a *validation set* (the images used to check the accuracy of a model -- not used during training). In `fastai` we can create that easily using a `DataBlock`, and view sample images from it:

In [139]:
dls = DataBlock(
    blocks=(ImageBlock, CategoryBlock), 
    get_items=get_image_files, 
    splitter=RandomSplitter(valid_pct=0.2, seed=42),
    get_y=parent_label,
    item_tfms=[Resize(192, method='squish')]
).dataloaders(path, bs=32)

dls.show_batch(max_n=16)

Here's what each of the `DataBlock` parameters means:

    blocks=(ImageBlock, CategoryBlock),

The inputs to our model are images, and the outputs are categories (in this case, the names of the skin conditions).

    get_items=get_image_files, 

To find all the inputs to our model, run the `get_image_files` function (which returns a list of all image files in a path).
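Conceptually, this step walks the folder tree and keeps every file that looks like an image. A minimal stdlib sketch, assuming a small hand-picked set of extensions (fastai builds its own list from `mimetypes`, so it accepts more formats); `list_image_files` is a hypothetical helper:

```python
import os
import tempfile
from pathlib import Path

# Hand-picked extensions: an assumption, not fastai's actual list
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".bmp", ".webp"}

def list_image_files(path):
    """Recursively collect files whose extension looks like an image."""
    found = []
    for root, _dirs, files in os.walk(path):
        for name in files:
            if Path(name).suffix.lower() in IMAGE_EXTENSIONS:
                found.append(Path(root) / name)
    return sorted(found)

with tempfile.TemporaryDirectory() as d:
    for rel in ["sunburn/1.jpg", "Eczema/2.PNG", "Eczema/notes.txt"]:
        p = Path(d) / rel
        p.parent.mkdir(parents=True, exist_ok=True)
        p.write_bytes(b"")
    names = [f.relative_to(d).as_posix() for f in list_image_files(d)]

print(names)  # the .txt file is skipped
```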

    splitter=RandomSplitter(valid_pct=0.2, seed=42),

Split the data into training and validation sets randomly, using 20% of the data for the validation set.
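The idea behind a seeded random split can be sketched in a few lines: shuffle the item indices with a fixed seed, then hold out the first 20% for validation. This is a minimal stdlib sketch of the idea, not fastai's code: `RandomSplitter` uses PyTorch's random number generator, so the actual indices it picks will differ. `random_split` is a hypothetical helper:

```python
import random

def random_split(n_items, valid_pct=0.2, seed=42):
    """Shuffle item indices with a fixed seed, then hold out the first valid_pct of them."""
    idxs = list(range(n_items))
    random.Random(seed).shuffle(idxs)  # seeded, so every run produces the same split
    cut = int(valid_pct * n_items)
    return idxs[cut:], idxs[:cut]      # (training indices, validation indices)

train, valid = random_split(100)
print(len(train), len(valid))  # 80 20
```

Fixing the seed matters: if the split changed on every run, images used for training in one run could end up in the validation set in the next, making the error rates hard to compare.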

    get_y=parent_label,

The labels (`y` values) are the name of the `parent` of each file, i.e. the name of the folder it's in, which will be one of the skin conditions such as *Eczema* or *sunburn*.
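In plain Python, "the name of the parent folder" is just `Path(...).parent.name`. A minimal sketch with a hypothetical file path laid out the way the download loop above organises images:

```python
from pathlib import Path

def label_from_folder(path):
    """Label = name of the folder that contains the file."""
    return Path(path).parent.name

print(label_from_folder("whichskindis/Herpes Zoster/0001.jpg"))  # Herpes Zoster
```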

    item_tfms=[Resize(192, method='squish')]

Before training, resize each image to 192x192 pixels by "squishing" it (as opposed to cropping it).
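The difference between squishing and cropping is easiest to see with some arithmetic. For a hypothetical 400x300 photo, squishing scales the two axes by different factors (distorting the shape but keeping the whole picture), while cropping scales both axes by the same factor and then cuts off the excess. The two helpers below are hypothetical and only compute the numbers; they don't touch any images:

```python
def squish_scale(w, h, size=192):
    """Squish: scale each axis independently to size x size (aspect ratio changes)."""
    return size / w, size / h

def crop_resize(w, h, size=192):
    """Crop: scale so the shorter side equals size, then crop the longer side down to size."""
    scale = size / min(w, h)
    new_w, new_h = round(w * scale), round(h * scale)
    cropped = (new_w - size, new_h - size)  # pixels cut off along each axis
    return scale, (new_w, new_h), cropped

print(squish_scale(400, 300))  # width shrinks more than height
print(crop_resize(400, 300))   # 256x192 before cropping, so 64 px of width is cut away
```

For skin photos, squishing keeps lesions near the edges of the frame in view, at the cost of slightly distorting their shape.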

Now we're ready to train our model. `resnet18` is one of the smallest and fastest widely used computer vision models, so fine-tuning it is quick: it's feasible even on a CPU, and much faster on a GPU.

`fastai` comes with a helpful `fine_tune()` method, which automatically applies best practices for fine-tuning a pretrained model, so we'll use that.
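As far as we understand fastai's implementation (check the `fine_tune` documentation for your installed version, since the defaults used here are assumptions), `fine_tune(epochs)` runs two phases: first it freezes the pretrained body and trains only the newly added head for one epoch, then it halves the base learning rate, unfreezes the whole network and trains for `epochs` more epochs with discriminative learning rates (earlier layers get smaller rates than the head). The hypothetical `fine_tune_plan` below just describes that plan as data; it does not call fastai or train anything:

```python
def fine_tune_plan(epochs, base_lr=2e-3, freeze_epochs=1, lr_mult=100):
    """Describe the two phases of a fastai-style fine_tune (assumed defaults; no training here)."""
    phases = []
    # Phase 1: pretrained body frozen, only the new head is trained
    phases.append({"phase": "frozen", "epochs": freeze_epochs, "head_lr": base_lr})
    # Phase 2: halve the base rate, unfreeze everything, and give earlier layers smaller rates
    base_lr /= 2
    phases.append({"phase": "unfrozen", "epochs": epochs,
                   "lr_range": (base_lr / lr_mult, base_lr)})  # (earliest layers, head)
    return phases

for phase in fine_tune_plan(3):
    print(phase)
```

Training the head first lets the randomly initialised final layer settle before its gradients start adjusting the pretrained weights.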

In [140]:
learn = vision_learner(dls, resnet18, metrics=error_rate)
learn.fine_tune(3)

## Step 3: Use our model 

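Let's see what our model makes of a downloaded photo. We'll grab one of the results for "sunburn photo". Note that the training set was built from the same search, so this image may already be one the model was trained on; a photo from a different source would be a fairer test.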

In [156]:
urls = search_images('sunburn photo', max_images=20) 
urls[19]

In [157]:
dest = 'sunburn-symptoms-and-treatment.jpg'
download_url(urls[19], dest, show_progress=False)

from fastai.vision.all import *
im = Image.open(dest)
im.to_thumb(256,256)

In [158]:
the_dis, idx, probs = learn.predict(PILImage.create('sunburn-symptoms-and-treatment.jpg'))
print(f"This is: {the_dis}.")
# `probs` follows the order of `learn.dls.vocab` (the class names), so look each
# name up there instead of hard-coding indices, which break if the classes change
for name, prob in zip(learn.dls.vocab, probs):
    print(f"Probability it's {name}: {float(prob):.4f}")
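`learn.predict` returns probabilities in the order of `learn.dls.vocab`, not in the order the search terms were listed, so the index for a given class has to be looked up rather than guessed from the `searches` tuple. A minimal sketch, assuming the data folder contains only the 12 classes searched above and that the vocab is the folder names sorted with Python's default (case-sensitive) ordering, which is what we believe fastai's `CategoryBlock` does by default; print `learn.dls.vocab` to confirm. `top_k` is a hypothetical helper and the probabilities are made up:

```python
# The 12 search terms used above, in the order they were searched
searches = ['acne vulgaris', 'Eczema', 'acrochordons', 'verruca', 'Herpes Zoster', 'Urticaria',
            'sunburn', 'Diaper Rash', 'Rosacea', 'Tinea Pedis', 'Basal Cell Carcinoma', 'Genital warts']

# Assumption: vocab = folder names sorted case-sensitively (uppercase names come first)
vocab = sorted(searches)
print(vocab.index('sunburn'))  # 10, not its position (6) in `searches`

def top_k(vocab, probs, k=3):
    """Pair each class name with its probability and return the k most likely."""
    return sorted(zip(vocab, probs), key=lambda pair: pair[1], reverse=True)[:k]

# Made-up probabilities, purely for illustration
fake_probs = [0.01] * len(vocab)
fake_probs[vocab.index('sunburn')] = 0.89
print(top_k(vocab, fake_probs, k=2))
```

Printing only the few most likely classes is usually more readable than a line per class once there are a dozen categories.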

Good job, resnet18. :)

So, as you see, in the space of a few years, creating computer vision classification models has gone from "so hard it's a joke" to "trivially easy and free"!
