<a href="https://colab.research.google.com/github/AdamCorbinFAUPhD/CIRCLe-experiments/blob/main/CIRCLe_with_isic2018_with_skin_transformer.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Intro

This notebook modifies the implementation of CIRCLe from the paper [CIRCLe: Color Invariant Representation Learning for Unbiased Classification of Skin Lesions](https://arxiv.org/pdf/2208.13528.pdf).

The authors' GitHub repository is https://github.com/arezou-pakzad/CIRCLe

The paper uses the Fitzpatrick17k dataset, which is available at https://github.com/mattgroh/fitzpatrick17k

For this set of experiments we will use the ISIC 2017 dataset from https://github.com/manideep2510/melanoma_segmentation.git

# TODO list

1. [X] Download 2018 dataset
1. [X] Analyze dataset to get Fitzpatrick info
1. [X] Save off Fitzpatrick info data so we don't have to do it every time
1. [X] Load cached Fitzpatrick data
1. [X] Create masks using https://github.com/DebeshJha/2020-CBMS-DoubleU-Net because Task 3 for 2018 doesn't have masks. The trick was to use the higher-end GPU and RAM (12/29/2022)
1. [X] Create PyTorch dataloader for the ISIC 2018 dataset, including loading masks, images, diagnosis, and Fitzpatrick type for training (12/30/2022); needed to create a custom split function
1. [X] Create dataloaders for test and validation (12/30/2022)
1. [X] Added Jupyter notebook download code to the GitHub repo (1/1/2023)
1. [X] Plug the dataloader into the CIRCLe main file (1/1/2023)
1. [X] Figure out how to apply the same transform to image and mask from the dataloader (1/2/2023)
1. [X] Use the new dataloader to train the model (1/2/2023)
1. [X] Use new transformer for CIRCLe model (1/3/2023)
1. [ ] Test using different base models
1. [ ] Test whether adding dropout helps with overfitting
1. [ ] Add more metrics such as precision and recall
1. [ ] Add fairness metrics
1. [ ] Add confusion matrix
1. [ ] Add sensitivity and specificity
1. [ ] Add metrics for each class
1. [ ] (optional) Go back and download and use larger datasets
1. [ ] (optional) Run Fitzpatrick on larger datasets (currently using the test set from ISIC 2018 Task 3)
1. [ ] The dataloaders need a split stratified by skin type, rather than the current "training, validation, and test" split given by https://challenge.isic-archive.com/data/#2018. 12/30/2022: I think this is done, BUT we might consider a k-fold approach, which adds another layer of complexity to the dataloaders
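
The k-fold idea in the last item could be sketched roughly as follows. This is a standard-library-only illustration, not the notebook's implementation: `stratified_kfold` and its arguments are hypothetical names, and it assumes each sample carries one Fitzpatrick skin type label. Samples are shuffled within each skin type and dealt round-robin into folds, so every fold keeps roughly the same skin-type mix.

```python
import random
from collections import defaultdict


def stratified_kfold(skin_types, k=5, seed=1):
    """Yield (train_idx, val_idx) pairs, stratified by skin type.

    skin_types: one label per sample (hypothetical input, e.g. values 1-6).
    """
    rng = random.Random(seed)

    # Group sample indices by their skin type label.
    by_type = defaultdict(list)
    for idx, label in enumerate(skin_types):
        by_type[label].append(idx)

    # Shuffle inside each group, then deal indices round-robin into k folds.
    # The fold pointer carries over between groups to keep fold sizes even.
    folds = [[] for _ in range(k)]
    next_fold = 0
    for label in sorted(by_type):
        indices = by_type[label]
        rng.shuffle(indices)
        for idx in indices:
            folds[next_fold].append(idx)
            next_fold = (next_fold + 1) % k

    # Each fold serves once as validation; the rest form the training set.
    for i in range(k):
        val_idx = sorted(folds[i])
        train_idx = sorted(
            idx for j, fold in enumerate(folds) if j != i for idx in fold
        )
        yield train_idx, val_idx
```

Each yielded index list could then be wrapped in a dataset subset, at the cost of training the model `k` times.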

# Set up the environment

In [1]:
!python --version
DATASET_USED = "ISIC_2018"  # ISIC_2017_ORIG, ISIC_2018

Python 3.8.16


## Installs & imports

In [2]:
import zipfile
import torch

from enum import Enum
import io
import math
from pathlib import Path
import random

import numpy as np
import seaborn as sns, matplotlib.pyplot as plt


import skimage
from skimage import color, util

import PIL
from PIL import ImageStat
from PIL import Image
from PIL import ImageOps

import cv2

import pandas as pd

import glob
from sklearn.preprocessing import LabelEncoder
from tensorflow.keras.utils import CustomObjectScope
from tensorflow.keras.models import load_model

import tensorflow as tf
from tensorflow.keras import backend as K
from tensorflow.keras.losses import binary_crossentropy
from tensorflow.keras.layers import *
from tensorflow.keras.models import Model
from tensorflow.keras.applications import *
from tensorflow.keras.callbacks import *
from tensorflow.keras.optimizers import Adam, Nadam
from tensorflow.keras.metrics import *

from sklearn.model_selection import train_test_split

from torchvision import transforms

## Download latest code

In [3]:
!git clone https://github.com/acorbin3/CIRCLe.git

fatal: destination path 'CIRCLe' already exists and is not an empty directory.


In [4]:
%cd ./CIRCLe

/content/CIRCLe


In [5]:
!git checkout -- models/circle.py

In [6]:
!git pull

Already up to date.


In [7]:
!pip3 install -r ./requirements.txt

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


# IF ERROR, RESTART RUNTIME due to derm-ita lib
This is because derm-ita requires newer libraries than the Google Colab defaults (as of 12/24/2022).
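
One way to check whether a restart picked up the newer libraries is to print the installed versions. The sketch below is only an illustration: `installed_version` is a hypothetical helper, and the package names in the loop are examples, since the notebook does not say which libraries conflict.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# Example package names only; adjust them to whatever requirements.txt pins.
for name in ("numpy", "scikit-image", "pandas"):
    print(f"{name}: {installed_version(name)}")
```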

# Train CIRCLe model 

In [17]:
%mkdir ./saved
%mkdir ./saved/model

In [53]:
!git checkout -- ./models/circle.py

In [59]:
!git checkout -- main.py

In [60]:
!git pull

remote: Enumerating objects: 11, done.
remote: Counting objects: 100% (11/11), done.
remote: Compressing objects: 100% (3/3), done.
remote: Total 7 (delta 4), reused 7 (delta 4), pack-reused 0
Unpacking objects:  85% (6/7)

In [None]:
!python main.py --use_reg_loss True --base mobilenetv3l --dataset isic2018

Flags:
	alpha: 0.1
	base: mobilenetv3l
	batch_size: 32
	data_dir: ../data/fitz17k/images/all/
	dataset: isic2018
	epochs: 100
	gan_path: saved/stargan/
	hidden_dim: 256
	lr: 0.001
	model: circle
	model_save_dir: saved/model/
	num_classes: 7
	seed: 1
	use_reg_loss: True
	weight_decay: 0.001
isic2018 images already downloaded
isic 2018 masks already downladed
Donloading isic 2018 ground truth classification data
Creating dataframe
	 Looking for cached dataframe
		 organize_data/isic_2018/saved_data_2022_12_27_isic_2018.csv
Creating dataframe. Complete!
Splitting up the dataset into train,test, validation datasets
fizpatrick_skin_type: 1 8001
	 train 6400
	 test 800
	 val 801
fizpatrick_skin_type: 2 1049
	 train 839
	 test 105
	 val 105
fizpatrick_skin_type: 3 513
	 train 410
	 test 51
	 val 52
fizpatrick_skin_type: 4 182
	 train 145
	 test 18
	 val 19
fizpatrick_skin_type: 5 107
	 train 85
	 test 11
	 val 11
fizpatrick_skin_type: 6 163
	 train 130
	 test 16
	 val 17
total_train: 8009 79.
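
The per-skin-type counts in the log above are consistent with an 80/20 split followed by a 50/50 split of the holdout, with each holdout size rounded up. A minimal sketch under that assumption is shown below. `split_counts` and `split_by_skin_type` are hypothetical names, and the ratios are inferred from the log, not taken from the repository code.

```python
import math
import random


def split_counts(n, holdout_fraction=0.2):
    """Return (train, test, val) sizes for n samples.

    The holdout is ceil(holdout_fraction * n). It is then halved, with the
    rounded-up half going to validation. Ratios are inferred from the log.
    """
    holdout = math.ceil(holdout_fraction * n)
    val = math.ceil(0.5 * holdout)
    return n - holdout, holdout - val, val


def split_by_skin_type(indices_by_type, seed=1):
    """Split each skin type's sample indices into train/test/val lists."""
    rng = random.Random(seed)
    splits = {"train": [], "test": [], "val": []}
    for skin_type in sorted(indices_by_type):
        indices = list(indices_by_type[skin_type])
        rng.shuffle(indices)
        n_train, n_test, _ = split_counts(len(indices))
        splits["train"] += indices[:n_train]
        splits["test"] += indices[n_train:n_train + n_test]
        splits["val"] += indices[n_train + n_test:]
    return splits


# Group sizes copied from the log above.
group_sizes = {1: 8001, 2: 1049, 3: 513, 4: 182, 5: 107, 6: 163}
for skin_type, n in group_sizes.items():
    print(skin_type, split_counts(n))
```

Splitting inside each skin type keeps the rarer types (4 to 6) represented in all three subsets, although their test and validation sets stay very small.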

In [37]:
%cp ./saved/model/epoch97_acc_0.762.ckpt /content/drive/MyDrive/Corbin_Adam_PhD_Workspace/corbin_papers/dissertation_proposal/model_checkpoints/CIRCLE/mobilenetv3l/

In [None]:
!wget https://isic2018task3masks.s3.amazonaws.com/isic_2018_mask_results1_2022_12_29.zip

--2023-01-01 09:12:25--  https://isic2018task3masks.s3.amazonaws.com/isic_2018_mask_results1_2022_12_29.zip
Resolving isic2018task3masks.s3.amazonaws.com (isic2018task3masks.s3.amazonaws.com)... 52.216.136.196, 52.216.187.35, 52.217.38.228, ...
Connecting to isic2018task3masks.s3.amazonaws.com (isic2018task3masks.s3.amazonaws.com)|52.216.136.196|:443... connected.
HTTP request sent, awaiting response... 403 Forbidden
2023-01-01 09:12:25 ERROR 403: Forbidden.



In [None]:
!unzip temp -d ISIC_2018 > /dev/null

  End-of-central-directory signature not found.  Either this file is not
  a zipfile, or it constitutes one disk of a multi-part archive.  In the
  latter case the central directory and zipfile comment will be found on
  the last disk(s) of this archive.
unzip:  cannot find zipfile directory in one of temp or
        temp.zip, and cannot find temp.ZIP, period.
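
The `unzip` error above follows from the failed download (403 Forbidden): there is no valid archive to open. A small guard like the one below could skip extraction in that case. It is a sketch using the standard `zipfile` module, and `safe_extract` is a hypothetical helper name, not part of the repository.

```python
import zipfile
from pathlib import Path


def safe_extract(archive_path, target_dir):
    """Extract archive_path into target_dir only if it is a real zip file.

    Returns True on success and False if the file is missing or not a zip,
    for example when a failed download left an error page or nothing at all.
    """
    archive = Path(archive_path)
    if not archive.is_file() or not zipfile.is_zipfile(archive):
        return False
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(target_dir)
    return True
```

For example, `safe_extract("temp.zip", "ISIC_2018")` would return `False` here and leave the directory untouched.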


In [None]:
print(torch.cuda.device_count())

1


In [None]:
%mkdir /content/drive/MyDrive/Corbin_Adam_PhD_Workspace/corbin_papers/dissertation_proposal/model_checkpoints

In [None]:
%cp ./saved/model/*.ckpt /content/drive/MyDrive/Corbin_Adam_PhD_Workspace/corbin_papers/dissertation_proposal/model_checkpoints
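
Instead of copying every checkpoint, the best one could be picked from the file names. The sketch below assumes all checkpoints follow the `epoch<N>_acc_<value>.ckpt` naming seen in `epoch97_acc_0.762.ckpt`. `best_checkpoint` is a hypothetical helper, not part of the repository.

```python
import re
from pathlib import Path

# Matches names such as "epoch97_acc_0.762.ckpt".
CKPT_PATTERN = re.compile(r"epoch(\d+)_acc_(\d+(?:\.\d+)?)\.ckpt$")


def best_checkpoint(paths):
    """Return the path with the highest accuracy in its name, or None.

    Ties are broken in favor of the later epoch. Names that do not match
    the pattern are ignored.
    """
    scored = []
    for path in paths:
        match = CKPT_PATTERN.search(Path(path).name)
        if match:
            epoch, acc = int(match.group(1)), float(match.group(2))
            scored.append((acc, epoch, str(path)))
    return max(scored)[2] if scored else None
```

A call such as `best_checkpoint(Path("saved/model").glob("*.ckpt"))` would then give a single file to copy to Drive.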