<a href="https://colab.research.google.com/github/AdamCorbinFAUPhD/CIRCLe-experiments/blob/main/vgg16/2023_01_04/CIRCLe_with_isic2018_with_skin_transformer_vgg16_v1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Intro

This notebook modifies the implementation of CIRCLe from the paper [CIRCLe: Color Invariant Representation Learning for Unbiased Classification of Skin Lesions](https://arxiv.org/pdf/2208.13528.pdf).

The authors' GitHub repo is: https://github.com/arezou-pakzad/CIRCLe

The paper uses the Fitzpatrick17k dataset, which can be obtained here: https://github.com/mattgroh/fitzpatrick17k

For this set of experiments we will use the ISIC 2017 dataset from: https://github.com/manideep2510/melanoma_segmentation.git

# TODO list

1. [X] Download 2018 dataset
1. [X] Analyze dataset to get Fitzpatrick info.
1. [X] Cache the Fitzpatrick info so we don't have to recompute it every time
1. [X] Load cached Fitzpatrick data
1. [X] Create masks using https://github.com/DebeshJha/2020-CBMS-DoubleU-Net because Task 3 for 2018 doesn't have masks. The trick was to get the higher-end GPU and RAM (12/29/2022)
1. [X] Create PyTorch dataloader for the ISIC 2018 dataset, including loading masks, images, diagnosis, and Fitzpatrick type for training; needed to create a custom split function (12/30/2022)
1. [X] Create dataloaders for test and validation (12/30/2022)
1. [X] Added Jupyter notebook download code into the GitHub repo (1/1/2023)
1. [X] Plug the dataloader into the CIRCLe main file (1/1/2023)
1. [X] Figure out how to apply the same transform to image and mask in the dataloader (1/2/2023)
1. [X] Use the new dataloader to train the model (1/2/2023)
1. [X] Use new transformer for CIRCLe model (1/3/2023)
1. [ ] test using different base models
1. [ ] test that adding dropout might help with overfitting
1. [ ] Add more metrics such as precision and recall
1. [ ] add fairness metrics
1. [ ] add confusion matrix
1. [ ] add sensitivity and specificity
1. [ ] add metrics for each class
1. [ ] (optional) Go back and download and use larger datasets
1. [ ] (optional) Run Fitzpatrick on larger datasets (currently using the test set from ISIC 2018 Task 3)
1. [ ] The dataloaders need a split stratified by skin type, different from the current "training, validation, and test" split given at https://challenge.isic-archive.com/data/#2018. 12/30/2022: I think this is done, BUT we might consider a k-fold approach, which adds another layer of complexity to the dataloaders
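
For the k-fold idea in the last item, a minimal sketch of a stratified split is given here. It is not the custom split function already in the repo. It assumes each sample is a dict with the hypothetical keys `diagnosis` and `fitzpatrick` (the real dataframe columns may be named differently), and it uses only the standard library; `sklearn.model_selection.StratifiedKFold` would be the usual alternative.

```python
import random
from collections import defaultdict


def stratified_k_folds(records, k=5, seed=1):
    """Split record indices into k folds, stratified on (diagnosis, skin type).

    `records` is a list of dicts with the hypothetical keys "diagnosis"
    and "fitzpatrick"; adapt the key names to the real dataframe.
    """
    rng = random.Random(seed)

    # Group record indices by their joint stratum.
    strata = defaultdict(list)
    for idx, rec in enumerate(records):
        strata[(rec["diagnosis"], rec["fitzpatrick"])].append(idx)

    folds = [[] for _ in range(k)]
    offset = 0
    for key in sorted(strata):
        members = strata[key]
        rng.shuffle(members)
        # Deal members round-robin; carrying the offset across strata
        # keeps the overall fold sizes balanced.
        for i, idx in enumerate(members):
            folds[(offset + i) % k].append(idx)
        offset += len(members)
    return folds


def train_val_pairs(folds):
    """Yield (train_indices, val_indices), using each fold once for validation."""
    for i, val in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, val
```

Each index list could then back a `torch.utils.data.Subset`, so the dataset class itself would not need to change between folds.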

# Set up the environment

In [1]:
!python --version
DATASET_USED = "ISIC_2018"  # ISIC_2017_ORIG, ISIC_2018

Python 3.8.16


## Installs & imports

In [2]:
import zipfile
import torch

from enum import Enum
import io
import math
from pathlib import Path
import random

import numpy as np
import seaborn as sns, matplotlib.pyplot as plt


import skimage
from skimage import color, util

import PIL
from PIL import ImageStat
from PIL import Image
from PIL import ImageOps

import cv2

import pandas as pd

import glob
from sklearn.preprocessing import LabelEncoder
from tensorflow.keras.utils import CustomObjectScope
from tensorflow.keras.models import load_model

import tensorflow as tf
from tensorflow.keras import backend as K
from tensorflow.keras.losses import binary_crossentropy
from tensorflow.keras.layers import *
from tensorflow.keras.models import Model
from tensorflow.keras.applications import *
from tensorflow.keras.callbacks import *
from tensorflow.keras.optimizers import Adam, Nadam
from tensorflow.keras.metrics import *

from sklearn.model_selection import train_test_split

from torchvision import transforms

## Download latest code

In [3]:
!git clone https://github.com/acorbin3/CIRCLe.git

fatal: destination path 'CIRCLe' already exists and is not an empty directory.


In [4]:
%cd ./CIRCLe

/content/CIRCLe


In [5]:
!git checkout -- models/circle.py

In [6]:
!git pull

Already up to date.


In [7]:
!pip3 install -r ./requirements.txt

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting torch==1.12.1
  Using cached torch-1.12.1-cp38-cp38-manylinux1_x86_64.whl (776.3 MB)
Collecting torchvision==0.13.1
  Using cached torchvision-0.13.1-cp38-cp38-manylinux1_x86_64.whl (19.1 MB)
Installing collected packages: torch, torchvision
  Attempting uninstall: torch
    Found existing installation: torch 1.9.0+cu111
    Uninstalling torch-1.9.0+cu111:
      Successfully uninstalled torch-1.9.0+cu111
  Attempting uninstall: torchvision
    Found existing installation: torchvision 0.10.0+cu111
    Uninstalling torchvision-0.10.0+cu111:
      Successfully uninstalled torchvision-0.10.0+cu111
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
torchtext 0.14.0 requires torch==1.13.0, but you have torch 1.12.1 which is incompatible.
torchaudio 0.9.0 re

**The next block of code is needed if you get this error:**

A100-SXM4-40GB with CUDA capability sm_80 is not compatible with the current PyTorch installation. The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70.

In [8]:
#!pip3 install torch==1.9.0+cu111 torchvision==0.10.0+cu111 torchaudio==0.9.0 -f https://download.pytorch.org/whl/torch_stable.html
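
To tell in advance whether the reinstall is needed, the GPU's compute capability can be compared with the architectures the installed PyTorch build was compiled for. The sketch below is a simplified check: the helper names are hypothetical, and an exact tag match ignores PTX forward compatibility, so treat a `False` result as a hint, not a verdict.

```python
def capability_to_arch(capability):
    """Convert a (major, minor) compute capability into an 'sm_XY' tag."""
    major, minor = capability
    return f"sm_{major}{minor}"


def is_gpu_supported(capability, arch_list):
    """Return True if the build's architecture list contains the GPU's tag.

    Simplification: only an exact tag match counts as supported.
    """
    return capability_to_arch(capability) in arch_list


def report_current_gpu():
    """Print the check for the current runtime, if torch and a GPU are present."""
    try:
        import torch
    except ImportError:
        print("torch is not installed")
        return None
    if not torch.cuda.is_available():
        print("no CUDA device visible")
        return None
    capability = torch.cuda.get_device_capability(0)
    arch_list = torch.cuda.get_arch_list()
    supported = is_gpu_supported(capability, arch_list)
    print(f"{capability_to_arch(capability)} in {arch_list}: {supported}")
    return supported
```

With the values from the error message above, `is_gpu_supported((8, 0), ["sm_37", "sm_50", "sm_60", "sm_70"])` returns `False`. Calling `report_current_gpu()` in a Colab cell runs the same check against the live runtime.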

# IF ERROR, RESTART RUNTIME due to derm-ita lib
This is because derm-ita uses newer libraries than the Google Colab defaults (as of 12/24/2022).

# Train CIRCLe model 

In [9]:
%mkdir ./saved
%mkdir ./saved/model

mkdir: cannot create directory ‘./saved’: File exists
mkdir: cannot create directory ‘./saved/model’: File exists
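
The "File exists" messages above are harmless on a re-run. As a sketch of one way to avoid them, the directories can be created idempotently from Python; the helper name `ensure_dir` is hypothetical and not part of the CIRCLe repo.

```python
from pathlib import Path


def ensure_dir(path):
    """Create `path` (and any missing parents) without failing if it exists."""
    directory = Path(path)
    directory.mkdir(parents=True, exist_ok=True)
    return directory
```

Calling `ensure_dir("./saved/model")` creates both `./saved` and `./saved/model` in one step and is safe to repeat.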


In [10]:
!git pull

Already up to date.


In [11]:
!python main.py --use_reg_loss True --base vgg16 --dataset isic2018

Flags:
	alpha: 0.1
	base: vgg16
	batch_size: 32
	data_dir: ../data/fitz17k/images/all/
	dataset: isic2018
	epochs: 100
	gan_path: saved/stargan/
	hidden_dim: 256
	lr: 0.001
	model: circle
	model_save_dir: saved/model/
	num_classes: 7
	seed: 1
	use_reg_loss: True
	weight_decay: 0.001
Downloading isic2018 images
tcmalloc: large alloc 2771738624 bytes == 0xcfb54000 @  0x7fd4897c01e7 0x4d30a0 0x5dede2 0x6758aa 0x4f750a 0x4997a2 0x4f700d 0x4d4aa9 0x55e029 0x55cd91 0x5d8941 0x4fe318 0x5d8416 0x55f797 0x55cd91 0x5d8941 0x4fe318 0x5d8416 0x55f797 0x55cd91 0x5d8941 0x5d8416 0x55f797 0x55cd91 0x5d8941 0x4997c7 0x55cd91 0x5d8941 0x4990ca 0x5d8868 0x4990ca
Downloading isic2018 images. Complete!
Downloading isic2018 masks
Downloading isic2018 masks. Complete!
Resizing masks
Resizing masks. Complete!
Donloading isic 2018 ground truth classification data
Creating dataframe
	 Looking for cached dataframe
		 organize_data/isic_2018/saved_data_2022_12_27_isic_2018.csv
Creating dataframe. Complete!
Split
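
The command above passes `--use_reg_loss True` as a string. As an illustration of how such a flag could be parsed safely, here is a hypothetical sketch using `argparse`; it is not the actual `main.py` from the repo, and the defaults are simply copied from the flag listing printed above. The point of `str2bool` is that `type=bool` would not work here, because `bool("False")` is `True` in Python.

```python
import argparse


def str2bool(value):
    """Parse 'True'/'False'-style strings into real booleans."""
    if isinstance(value, bool):
        return value
    lowered = value.strip().lower()
    if lowered in {"true", "t", "yes", "y", "1"}:
        return True
    if lowered in {"false", "f", "no", "n", "0"}:
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def build_parser():
    """Hypothetical parser mirroring a subset of the printed flags."""
    parser = argparse.ArgumentParser(description="CIRCLe-style flags (sketch)")
    parser.add_argument("--use_reg_loss", type=str2bool, default=False)
    parser.add_argument("--base", type=str, default="vgg16")
    parser.add_argument("--dataset", type=str, default="isic2018")
    parser.add_argument("--alpha", type=float, default=0.1)
    parser.add_argument("--batch_size", type=int, default=32)
    parser.add_argument("--epochs", type=int, default=100)
    parser.add_argument("--hidden_dim", type=int, default=256)
    parser.add_argument("--lr", type=float, default=0.001)
    parser.add_argument("--num_classes", type=int, default=7)
    parser.add_argument("--seed", type=int, default=1)
    parser.add_argument("--weight_decay", type=float, default=0.001)
    return parser


def format_flags(args):
    """Render parsed flags in the same 'Flags:' layout as the log above."""
    lines = ["Flags:"]
    for name, value in sorted(vars(args).items()):
        lines.append(f"\t{name}: {value}")
    return "\n".join(lines)
```

For example, `build_parser().parse_args(["--use_reg_loss", "True", "--base", "vgg16", "--dataset", "isic2018"])` yields a namespace whose `use_reg_loss` is the boolean `True`, and `format_flags` prints it in the layout seen in the training log.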

In [16]:
#%mkdir /content/drive/MyDrive/Corbin_Adam_PhD_Workspace/corbin_papers/dissertation_proposal/model_checkpoints

mkdir: cannot create directory ‘/content/drive/MyDrive/Corbin_Adam_PhD_Workspace/corbin_papers/dissertation_proposal/model_checkpoints’: No such file or directory


In [12]:
#%cp ./saved/model/epoch97_acc_0.762.ckpt /content/drive/MyDrive/Corbin_Adam_PhD_Workspace/corbin_papers/dissertation_proposal/model_checkpoints/CIRCLE/mobilenetv3l/

cp: cannot stat './saved/model/epoch97_acc_0.762.ckpt': No such file or directory
