<a href="https://www.kaggle.com/code/robtheoceanographer/fastclouds?scriptVersionId=95230916" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

# FastClouds
This is my project for the first week of the fast.ai course v5. It is not meant to be a serious project; it is just for my own learning.

I have not updated this notebook to include the steps from lesson 2 on data cleaning and data augmentation.

### The Problem
Ground-based observations are a key part of weather forecasting. Most observations are taken by automated systems, but a few routine observations are still made manually by a human observer. One of these is cloud type classification.

This manual observation is currently done at major airports around Australia, where one or more highly knowledgeable, accredited aerodrome weather observers are stationed to take manual weather observations on a fixed schedule throughout each day. However, having such specialized observers at all airports all of the time is neither cost-effective nor realistically feasible, especially for remote locations (e.g. uninhabited islands or infrequently used aerodromes). As a result, many of these remote or small sites miss out on observations and may receive lower-quality situational awareness and forecasts.


### The Solution
Using deep learning and image classification to classify cloud types from photographs seemed to me a very plausible solution to this problem. So, after lesson 1 of the fast.ai course v5, I decided to try exactly that, using the image classifier (`vision_learner`) example Jeremy Howard provided as my starting point.

Like the original notebook, [Is it a bird? Creating a model from your own data](https://www.kaggle.com/code/jhoward/is-it-a-bird-creating-a-model-from-your-own-data), this model uses transfer learning from a pretrained ResNet, but instead of birds vs. forests it classifies three broad categories of cloud. These classes were chosen following Luke Howard's "Essay on the Modifications of Clouds" (1803) (see [NWS JetStream](https://www.weather.gov/jetstream/corefour)).

To create the dataset, DuckDuckGo image search was queried for the terms `cirrus clouds`, `cumulus clouds` and `stratus clouds`.

In [None]:
# Uncomment to delete the downloaded dataset and start from scratch
# import shutil
# shutil.rmtree('./cloud_types/')

In [None]:
!pip install --user -qq torch==1.9.0 torchvision==0.10.0 torchaudio==0.9.0 torchtext==0.10.0
!pip install --user -qq fastai
!pip install --user -qq timm

In [None]:
from fastcore.all import *
import re        # regex used by search_duckduckgo
import shutil    # file moves in the data-cleaning step
import time
from fastdownload import download_url
from fastai.vision.all import *
from fastai.vision.widgets import *
import pathlib
import timm

In [None]:
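def search_duckduckgo(term, max_images=200):
    """
    Search DuckDuckGo images for `term` and return up to `max_images` image URLs
    """
    url = 'https://duckduckgo.com/'
    # The search page contains a 'vqd' token that the image-results endpoint requires
    res = urlread(url, data={'q':term})
    searchObj = re.search(r'vqd=([\d-]+)\&', res)
    if searchObj is None:
        raise ValueError(f'Could not find the vqd token for {term!r}')
    requestUrl = url + 'i.js'
    params = dict(l='us-en', o='json', q=term, vqd=searchObj.group(1), f=',,,', p='1', v7exp='a')
    urls, data = set(),{'next':1}
    # Request result pages until we have enough URLs or there is no 'next' page
    while len(urls) < max_images and 'next' in data:
        data = urljson(requestUrl, data = params)
        urls.update(L(data['results']).itemgot('image'))
        if 'next' in data:   # the last page has no 'next' key
            requestUrl = url + data['next']
        time.sleep(0.2)      # pause briefly between requests
    return L(urls)[:max_images]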
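The paging logic in `search_duckduckgo` is easier to see (and test) without the network. Below is a minimal offline sketch under some assumptions: `collect_paged_urls` and `fake_pages` are hypothetical names made up for this illustration, each page is assumed to be a dict with a `results` list and an optional `next` key (which is what the loop above expects), and `fetch_page` stands in for the `urljson` call. Unlike the real function, it keeps URLs in first-seen order instead of in a set.

```python
from typing import Callable, Optional

def collect_paged_urls(fetch_page: Callable[[str], dict], first_page: str,
                       max_images: int = 200) -> list:
    """Collect unique image URLs from a paginated search API (offline sketch).

    fetch_page(page_id) must return a dict with a 'results' list of
    {'image': url} items and, if more results exist, a 'next' page id.
    """
    urls, seen = [], set()
    page: Optional[str] = first_page
    # Same stopping rule as search_duckduckgo: enough URLs, or no 'next' page
    while page is not None and len(urls) < max_images:
        data = fetch_page(page)
        for result in data.get('results', []):
            if result['image'] not in seen:      # de-duplicate across pages
                seen.add(result['image'])
                urls.append(result['image'])
        page = data.get('next')                  # None on the last page
    return urls[:max_images]

# Two fake result pages; 'b.jpg' appears on both
fake_pages = {
    'p1': {'results': [{'image': 'a.jpg'}, {'image': 'b.jpg'}], 'next': 'p2'},
    'p2': {'results': [{'image': 'b.jpg'}, {'image': 'c.jpg'}]},
}
print(collect_paged_urls(fake_pages.__getitem__, 'p1', max_images=10))
# ['a.jpg', 'b.jpg', 'c.jpg']
```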

## Let's pull a sample image to check that our search and download code works

In [None]:
urls = search_duckduckgo('cirrus clouds', max_images = 1)
print(urls[0])
dest = 'cirrus.jpg'
download_url(urls[0], dest, show_progress=False)
im = Image.open(dest)
im.to_thumb(256,256)

## Download our dataset
Here we download 50 images per search term into a folder named after each cloud type (these folder names become our labels), shrinking any image whose longest side exceeds 400 pixels.

In [None]:
searches = 'cirrus', 'stratus', 'cumulus'
path = Path('cloud_types')

for o in searches:
    dest = (path/o)
    dest.mkdir(exist_ok=True, parents=True)
    download_images(dest, urls=search_duckduckgo(f'{o} clouds', max_images=50))
    resize_images(path/o, max_size=400, dest=path/o)
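`resize_images(..., max_size=400)` shrinks large downloads before training. As I understand fastai's behaviour, an image whose longest side exceeds `max_size` is scaled down so that side equals `max_size`, keeping the aspect ratio, while smaller images keep their original size. The hypothetical `fit_within` function below sketches only that size calculation (no image I/O), using integer arithmetic.

```python
def fit_within(width: int, height: int, max_size: int = 400) -> tuple:
    """Return the (width, height) an image would be resized to so that its
    longest side is at most max_size, preserving the aspect ratio."""
    longest = max(width, height)
    if longest <= max_size:
        return width, height                  # small enough already: unchanged
    # Integer arithmetic avoids float rounding surprises; rounds down
    return width * max_size // longest, height * max_size // longest

print(fit_within(1200, 800))   # (400, 266)
print(fit_within(500, 1000))   # (200, 400)
print(fit_within(300, 200))    # (300, 200)
```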

In [None]:
# Remove any failed images
failed = verify_images(get_image_files(path))
failed.map(Path.unlink)
len(failed)

## Set up and train a model

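Here we create a fastai `DataBlock` with settings largely the same as Jeremy's original setup: each image is labelled with the name of its parent folder (`parent_label`), a random 40% of the images (with a fixed seed) is held out for validation, and every image is squished to 192×192 pixels.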

In [None]:
dls = DataBlock(
    blocks=(ImageBlock, CategoryBlock),
    get_items=get_image_files,
    splitter=RandomSplitter(valid_pct = 0.4, seed = 42),
    get_y = parent_label,
    item_tfms=[Resize(192, method='squish')]
).dataloaders(path)
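For intuition, here is a plain-Python sketch of the labelling and splitting part of this `DataBlock`: collect the image files, label each one with its parent folder name (what `parent_label` does), and hold out a seeded random 40% for validation. `label_and_split` and `IMAGE_EXTS` are hypothetical names for this illustration. fastai's `RandomSplitter` shuffles with PyTorch rather than Python's `random` module, so the files that land in each split will differ even with the same seed; as far as I can tell the split sizes are computed the same way (`int(valid_pct * n)` validation items).

```python
import random
import tempfile
from pathlib import Path

IMAGE_EXTS = {'.jpg', '.jpeg', '.png', '.gif', '.webp'}   # assumed subset of image types

def label_and_split(root, valid_pct=0.4, seed=42):
    """Sketch of get_image_files + parent_label + a seeded random split."""
    files = sorted(p for p in Path(root).rglob('*') if p.suffix.lower() in IMAGE_EXTS)
    labels = [p.parent.name for p in files]      # label = name of the containing folder
    idxs = list(range(len(files)))
    random.Random(seed).shuffle(idxs)            # same seed -> same split every run
    cut = int(valid_pct * len(idxs))             # number of validation items
    return files, labels, idxs[cut:], idxs[:cut] # (files, labels, train, valid)

# Try it on a throwaway folder tree shaped like cloud_types/
with tempfile.TemporaryDirectory() as tmp:
    for cls in ('cirrus', 'stratus', 'cumulus'):
        (Path(tmp)/cls).mkdir()
        for i in range(5):
            (Path(tmp)/cls/f'{i}.jpg').touch()   # empty placeholder "images"
    files, labels, train, valid = label_and_split(tmp)
    classes = sorted(set(labels))

print(len(train), len(valid), classes)   # 9 6 ['cirrus', 'cumulus', 'stratus']
```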

### Here is an example batch of images

In [None]:
dls.show_batch(max_n=6)

### Train the model

In [None]:
learn = vision_learner(dls, resnet18, metrics=[error_rate, accuracy])
# Other architectures to try: resnet34, resnet50, resnet101, resnet152,
# densenet121, densenet161, densenet169, densenet201
learn.fine_tune(10)

## Data Cleaning

The results look reasonable given how little we changed from the example image classifier, but a few small adjustments may improve the model's performance.

In [None]:
interp = ClassificationInterpretation.from_learner(learn)
interp.plot_confusion_matrix()

It looks like our model is confusing stratus for cirrus and cumulus for stratus, but does OK on cirrus. This is most likely a "garbage in, garbage out" problem: the scraped images are of variable quality and often poorly labelled, so we will spend a little time reviewing our input data now.
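A confusion matrix just counts (actual, predicted) pairs; the `error_rate` metric is the fraction of items off the diagonal and `accuracy` is one minus that. The sketch below computes both from plain label lists. The labels are made-up toy data chosen to mimic the kind of mix-ups described above, not output from this model, and `confusion_counts` and `error_fraction` are hypothetical names (chosen so they don't shadow fastai's `error_rate` if run in this notebook). Rows are actual classes and columns are predicted classes, which I believe matches the Actual/Predicted axes of `plot_confusion_matrix`.

```python
from collections import Counter

def confusion_counts(actual, predicted, classes):
    """Rows = actual class, columns = predicted class."""
    pairs = Counter(zip(actual, predicted))
    return [[pairs[(a, p)] for p in classes] for a in classes]

def error_fraction(actual, predicted):
    """Fraction of items whose prediction differs from the label."""
    return sum(a != p for a, p in zip(actual, predicted)) / len(actual)

classes = ['cirrus', 'cumulus', 'stratus']
# Toy labels for illustration only (not results from this notebook)
actual    = ['cirrus', 'cirrus', 'cumulus', 'cumulus', 'stratus', 'stratus']
predicted = ['cirrus', 'cirrus', 'stratus', 'cumulus', 'cirrus',  'stratus']

cm = confusion_counts(actual, predicted, classes)
for name, row in zip(classes, cm):
    print(f'{name:>8} {row}')
err = error_fraction(actual, predicted)
print(f'error rate {err:.3f}, accuracy {1 - err:.3f}')
```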

In [None]:
interp.plot_top_losses(8, nrows=2)

It looks like there is some work to be done cleaning up the dataset.
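`plot_top_losses` shows the validation images with the highest loss. For a single-label classifier like this one, fastai's default loss is cross-entropy, so each image's loss is -log of the probability the model gave its *labelled* class: a confident prediction of a different class scores highest, which is why mislabelled images tend to show up there. The sketch below ranks toy predictions that way; `rank_by_loss`, `toy_probs` and `toy_targets` are hypothetical names and values for illustration only.

```python
import math

def rank_by_loss(probs, targets, k=3):
    """Return the k (index, loss) pairs with the highest cross-entropy loss.

    probs[i] holds the predicted probability of each class for item i and
    targets[i] is the index of its labelled class; loss = -log(p_label).
    """
    losses = [-math.log(p[t]) for p, t in zip(probs, targets)]
    order = sorted(range(len(losses)), key=losses.__getitem__, reverse=True)
    return [(i, losses[i]) for i in order[:k]]

# Toy probabilities over [cirrus, cumulus, stratus]; illustration only
toy_probs = [
    [0.90, 0.05, 0.05],   # labelled cirrus, confidently correct: low loss
    [0.10, 0.10, 0.80],   # labelled cirrus, model is sure it is stratus: high loss
    [0.30, 0.40, 0.30],   # labelled cumulus, correct but unsure: medium loss
]
toy_targets = [0, 0, 1]
for idx, loss in rank_by_loss(toy_probs, toy_targets, k=2):
    print(idx, round(loss, 3))
```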

In [None]:
cleaner = ImageClassifierCleaner(learn)
cleaner

In [None]:
import shutil
# Delete images marked <Delete>; move relabelled ones, adding '_' to the stem to reduce name clashes
for idx in cleaner.delete(): cleaner.fns[idx].unlink()
for idx, cloud in cleaner.change():
    fn = pathlib.Path(cleaner.fns[idx])
    shutil.move(str(fn), str(path/cloud/f'{fn.stem}_{fn.suffix}'))
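The relabelling loop appends `_` to the file stem so a moved image is less likely to overwrite a same-named file in its new class folder, but a second collision is still possible. A hypothetical helper, `move_to_class`, that keeps appending `_` until the name is free might look like this (a sketch, not part of fastai's `ImageClassifierCleaner` API):

```python
import shutil
import tempfile
from pathlib import Path

def move_to_class(fn, root, new_class):
    """Move image `fn` into root/new_class, appending '_' to the file stem
    until the destination name is unused, so nothing is overwritten."""
    fn = Path(fn)
    dest_dir = Path(root) / new_class
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / f'{fn.stem}_{fn.suffix}'     # same naming as the cleaner loop
    while dest.exists():
        dest = dest_dir / f'{dest.stem}_{fn.suffix}'
    shutil.move(str(fn), str(dest))
    return dest

# Demo: the plain name and the '_' name are both already taken in stratus/
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for cls in ('cirrus', 'stratus'):
        (root/cls).mkdir()
        (root/cls/'00000001.jpg').touch()
    (root/'stratus'/'00000001_.jpg').touch()
    moved = move_to_class(root/'cirrus'/'00000001.jpg', root, 'stratus')
    moved_name = moved.name
    stratus_files = sorted(p.name for p in (root/'stratus').iterdir())
    cirrus_files = sorted(p.name for p in (root/'cirrus').iterdir())

print(moved_name)     # 00000001__.jpg
print(stratus_files)
```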

In [None]:
dls = DataBlock(
    blocks=(ImageBlock, CategoryBlock),
    get_items=get_image_files,
    splitter=RandomSplitter(valid_pct = 0.4, seed = 42),
    get_y = parent_label,
    item_tfms=[Resize(192, method='squish')]
).dataloaders(path)

learn = vision_learner(dls, resnet18, metrics=[error_rate, accuracy])
learn.fine_tune(10)

## Data augmentation

In [None]:
dls = DataBlock(
    blocks=(ImageBlock, CategoryBlock),
    get_items=get_image_files,
    splitter=RandomSplitter(valid_pct = 0.4, seed = 42),
    get_y = parent_label,
    item_tfms=[Resize(192, method='squish')],
    batch_tfms=aug_transforms()
).dataloaders(path)

dls.show_batch(max_n=6)

In [None]:
learn = vision_learner(dls, resnet18, metrics=[error_rate, accuracy])
learn.fine_tune(25)


## Use the model on our example image

In [None]:
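def predict_cloud(image_file):
    # learn.predict returns the decoded label, the class index and the class probabilities
    cloud_type, pred_idx, probs = learn.predict(PILImage.create(image_file))
    print(f'This is a: {cloud_type}')
    print(f'Probability it is: {probs[int(pred_idx)]:.4f}')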

#### Use the model on the sample cirrus cloud image downloaded earlier

In [None]:
predict_cloud('cirrus.jpg')

## Conclusion
While this is a very basic demo and only about 80% accurate, it shows promise for automating some of the cloud type observing in remote regions of Australia where there is currently limited or no human observer coverage, so the idea may merit further work.