## What's your drawing style?


This notebook tries to guess your drawing style by comparing it with the paintings of 50 famous artists.  
You can find the original dataset here:
[https://www.kaggle.com/datasets/ikarus777/best-artworks-of-all-time](https://www.kaggle.com/datasets/ikarus777/best-artworks-of-all-time).

The web-app developed as homework is here: [https://huggingface.co/spaces/nerusskyhigh/drawingstyle](https://huggingface.co/spaces/nerusskyhigh/drawingstyle).

In [1]:
# Make sure we've got the latest version of fastai:
!pip install -Uqq fastai

# Check data
!ls -al ../input/best-artworks-of-all-time/resized/resized | tail -10

As I don't want to tamper with the original data, I create a local copy in the folder **./data**.

In [2]:
!mkdir -p data
!cp -n ../input/best-artworks-of-all-time/resized/resized/* data/

The original dataset consists of more than 8000 images that are **not** uniformly distributed among the artists. For example, it contains 877 paintings by Vincent van Gogh but only 24 by Jackson Pollock. To both save computing resources and avoid biasing the model, I'll use at most about 100 paintings per artist.

In [3]:
# Enter the folder --> list all files --> grep the ones whose index has three digits (>= 100)
#     --> run "rm" on each of them --> go back to the original folder
!cd data && ls | grep -P "_\d\d\d\.jpg" | xargs -d"\n" rm && cd ..
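
The same cut-off can also be written in Python, which makes the threshold explicit. This is only a minimal sketch, not the method used in this notebook: `split_by_index` and its `limit` parameter are hypothetical names, and the code assumes every file name ends in `_<number>.jpg`.

```python
import re

def split_by_index(file_names, limit=100):
    """Split file names into (kept, dropped) based on their trailing index.

    Hypothetical helper: assumes names look like 'Artist_Name_<number>.jpg'.
    """
    kept, dropped = [], []
    for name in file_names:
        match = re.search(r"_(\d+)\.jpg$", name)
        # Drop only the files whose numeric suffix reaches the limit.
        if match and int(match.group(1)) >= limit:
            dropped.append(name)
        else:
            kept.append(name)
    return kept, dropped


sample = ["Vincent_van_Gogh_99.jpg", "Vincent_van_Gogh_100.jpg", "Jackson_Pollock_24.jpg"]
kept, dropped = split_by_index(sample)
print(kept)
print(dropped)
```

To apply it to the local copy, one could pass `os.listdir('data')` to the function and call `os.remove` on each dropped file.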

The **best-artworks-of-all-time/resized/resized** folder distinguishes classes by file name. The following function returns the class (the artist) from a file's name.

In [4]:
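import string

def getClassName(fileName):
    """Return the artist's name encoded in an image file name."""
    fileName = fileName[:-4]  # remove the '.jpg' extension
    names = fileName.split('_')

    # Dürer is special-cased so the label is always spelled "Albrecht Dürer"
    if names[0] == "Albrecht":
        return "Albrecht Dürer"

    # Keep only the parts starting with an ASCII letter, dropping the numeric index
    return ' '.join(name for name in names if name[0] in string.ascii_letters)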
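
Since the label comes straight from the file name, the same rule can be used to count how many images each artist has before training. The sketch below is an illustration under assumptions: `label_from_file_name` is a hypothetical, compact restatement of the rule above (with an extra guard against empty name parts), `count_per_artist` is a hypothetical helper, and the sample names are made up for the example.

```python
import string
from collections import Counter

def label_from_file_name(file_name):
    """Hypothetical restatement of the labelling rule used in this notebook."""
    parts = file_name[:-4].split("_")  # drop '.jpg', then split on underscores
    if parts[0] == "Albrecht":
        return "Albrecht Dürer"
    # Keep only the parts that start with a letter, i.e. skip the numeric index.
    return " ".join(p for p in parts if p and p[0] in string.ascii_letters)


def count_per_artist(file_names):
    """Return a Counter mapping each artist label to its number of files."""
    return Counter(label_from_file_name(name) for name in file_names)


sample = ["Vincent_van_Gogh_1.jpg", "Vincent_van_Gogh_2.jpg", "Leonardo_da_Vinci_121.jpg"]
for artist, n in count_per_artist(sample).most_common():
    print(f"{artist}: {n}")
```

Running it on the contents of **./data** would show whether the classes are still unbalanced after the cut.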

In [5]:
from fastai.vision.all import *

dls = ImageDataLoaders.from_name_func('.',
    get_image_files('data'), valid_pct=0.2, #seed=42,
    label_func=getClassName,
    item_tfms=RandomResizedCrop(256, min_scale=0.2),
    batch_tfms=aug_transforms())

dls.show_batch(max_n=8, nrows=2)
dls.show_batch(max_n=4, nrows=1, unique=True)
dls.vocab

In [6]:
learn = vision_learner(dls, resnet18, metrics=error_rate)
learn.fine_tune(4)

## Analysis

I think this model shows the importance of checking that the input data is what you expect. Take the Gioconda (Leonardo_da_Vinci_121.jpg) as an example. Although it was painted by Leonardo da Vinci, the network fails to recognize it. That's because most of the Leonardo da Vinci images in the dataset are drawings, not paintings. The network probably learned the association "Leonardo <--> drawings", so when it sees the Gioconda, which is a painting, it assigns it to a different artist.

![Screenshot (32).png](https://huggingface.co/spaces/nerusskyhigh/drawingstyle/resolve/main/Screenshot%20(32).png)

A similar problem occurs with Picasso. On the other hand, Van Gogh, the artist with the largest number of paintings available, doesn't seem to suffer from it.

In [7]:
!cp ../input/best-artworks-of-all-time/resized/resized/Leonardo_da_Vinci_138.jpg Leonardo_da_Vinci_138.jpg
!cp ../input/best-artworks-of-all-time/resized/resized/Leonardo_da_Vinci_121.jpg Leonardo_da_Vinci_121.jpg
!cp ../input/best-artworks-of-all-time/resized/resized/Pablo_Picasso_164.jpg Pablo_Picasso_164.jpg
!cp ../input/best-artworks-of-all-time/resized/resized/Vincent_van_Gogh_154.jpg Vincent_van_Gogh_154.jpg

artist, _, probs = learn.predict(PILImage.create('Leonardo_da_Vinci_138.jpg'))
print(f"Your style resembles {artist}\'s one.\n\n")
Image.open('Leonardo_da_Vinci_138.jpg').to_thumb(256,256)

In [8]:
artist, _, probs = learn.predict(PILImage.create('Leonardo_da_Vinci_121.jpg'))
print(f"Your style resembles {artist}\'s one.\n\n")
Image.open('Leonardo_da_Vinci_121.jpg').to_thumb(256,256)

In [9]:
artist, _, probs = learn.predict(PILImage.create('Vincent_van_Gogh_154.jpg'))
print(f"Your style resembles {artist}\'s one.\n\n")
Image.open('Vincent_van_Gogh_154.jpg').to_thumb(256,256)

In [10]:
artist, _, probs = learn.predict(PILImage.create('Pablo_Picasso_164.jpg'))
print(f"Your style resembles {artist}\'s one.\n\n")
Image.open('Pablo_Picasso_164.jpg').to_thumb(256,256)
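
The prediction cells above only print the winning artist and leave `probs` unused. Looking at the runners-up can show how close a wrong guess was. The following is a minimal sketch: `top_k` is a hypothetical helper, and the example values are illustrative only, not output from the trained model.

```python
def top_k(probs, vocab, k=3):
    """Return the k (label, probability) pairs with the highest probability."""
    pairs = [(label, float(p)) for label, p in zip(vocab, probs)]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)[:k]


# Illustrative values only, not output from the trained model.
example_vocab = ["Leonardo da Vinci", "Pablo Picasso", "Vincent van Gogh"]
example_probs = [0.2, 0.1, 0.7]
for label, p in top_k(example_probs, example_vocab, k=2):
    print(f"{label}: {p:.2f}")
```

In this notebook it could be called as `top_k(probs, dls.vocab)`, assuming `dls.vocab` lists the classes in the same order as `probs`.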

Now let's analyse the model in more detail.

### Confusion Matrix


In [11]:
interp = ClassificationInterpretation.from_learner(learn)
interp.plot_confusion_matrix(figsize=(10, 10), dpi=300)
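
With 50 classes the plot can be hard to read, so it may help to list only the off-diagonal cells with the highest counts. fastai offers `interp.most_confused(min_val=...)` for this. The sketch below shows the idea in plain Python on a small made-up matrix. `most_confused_pairs` is a hypothetical helper, the counts are illustrative, and rows are assumed to be the actual class and columns the predicted one.

```python
def most_confused_pairs(matrix, labels, min_count=1):
    """List (actual, predicted, count) for off-diagonal cells, largest first.

    Assumes matrix[i][j] counts items of class i predicted as class j.
    """
    pairs = [
        (labels[i], labels[j], int(count))
        for i, row in enumerate(matrix)
        for j, count in enumerate(row)
        if i != j and count >= min_count
    ]
    return sorted(pairs, key=lambda pair: pair[2], reverse=True)


# Illustrative counts only, not results from this notebook.
example_labels = ["Leonardo da Vinci", "Pablo Picasso", "Vincent van Gogh"]
example_matrix = [
    [10, 3, 1],
    [2, 12, 0],
    [0, 1, 18],
]
for actual, predicted, count in most_confused_pairs(example_matrix, example_labels, min_count=2):
    print(f"{actual} -> {predicted}: {count}")
```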

### Top losses

In [12]:
# Each image is titled with: prediction / actual / loss / probability
interp.plot_top_losses(9, nrows=3, figsize=(15, 10))

### Final export

In [13]:
# Export the model to use it on https://huggingface.co/spaces/nerusskyhigh/drawingstyle
learn.export('drawingstyle.pkl')