# How to:

Download images from ISIC using the API in three steps.

### First Step:

Import the cascid module's automated download functions.

In [1]:
# Import isic downloading tools
from cascid.datasets.isic import database, fetcher


### Second Step:

Ask the fetcher to gather metadata for however many images you want from each class. The 'fetch_from_isic' function returns a list of objects containing the metadata. This list is then passed to the 'save_metadata' function, which interprets these objects and saves them as a dataframe to a .csv file. This operation is repeatable and never overwrites previously collected data. You can therefore run the cell below multiple times without fear of losing metadata files or losing access to already downloaded images.
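Here is a minimal sketch of how a non-overwriting save like this can work. It assumes the metadata lives in a flat CSV keyed by `isic_id`, and the column names are taken from the dataframe shown in this notebook. The helper name `append_new_metadata` and its logic are hypothetical, not cascid's actual implementation. Rows already on disk are never rewritten, and only records with unseen IDs are appended.

```python
import csv
import os

# Column names as they appear in the metadata dataframe of this notebook.
FIELDS = ["isic_id", "sex", "diagnostic", "age_approx", "image_url", "img_id"]

def append_new_metadata(records, csv_path):
    """Hypothetical helper: append records whose isic_id is not yet in csv_path.

    Existing rows are left untouched (the file is only ever opened in append mode),
    so running it repeatedly is safe. Returns the number of rows actually added.
    """
    exists = os.path.exists(csv_path)
    seen = set()
    if exists:
        with open(csv_path, newline="") as f:
            seen = {row["isic_id"] for row in csv.DictReader(f)}

    new_rows = []
    for rec in records:
        if rec["isic_id"] not in seen:
            seen.add(rec["isic_id"])  # also drops duplicates within this batch
            new_rows.append(rec)

    write_header = not exists or os.path.getsize(csv_path) == 0
    with open(csv_path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS, extrasaction="ignore")
        if write_header:
            writer.writeheader()
        writer.writerows(new_rows)
    return len(new_rows)
```

Keying on `isic_id` is what makes reruns harmless. A second run that returns overlapping images adds only the ones it has not seen before.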

In [2]:
images = fetcher.fetch_from_isic(
    n_samples=200, # Number of samples for each label in 'diagnosis_list' 
    diagnosis_list=[
        "melanoma",
        "nevus",
        '"basal cell carcinoma"',
        '"seborrheic keratosis"',
        '"actinic keratosis"',
        '"squamous cell carcinoma"'
    ]
)
fetcher.save_metadata(image_list=images)

Fetching 200 images from ISIC dataset for each of ['melanoma', 'nevus', '"basal cell carcinoma"', '"seborrheic keratosis"', '"actinic keratosis"', '"squamous cell carcinoma"'] diagnosis
Done!
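In the cell above, multi-word diagnoses are wrapped in an extra pair of double quotes, while single-word ones are not. Presumably this keeps each phrase together when it is sent to the ISIC search. That reading is an assumption, not something stated in this notebook. If you build the label list programmatically, a small hypothetical helper (`quote_label` is not part of cascid) can apply the same convention:

```python
def quote_label(label):
    """Wrap a multi-word diagnosis in double quotes, as done by hand in the cell above.

    Hypothetical helper: single words and already-quoted phrases are returned unchanged.
    """
    label = label.strip()
    already_quoted = label.startswith('"') and label.endswith('"')
    if " " in label and not already_quoted:
        return f'"{label}"'
    return label

labels = [
    "melanoma",
    "nevus",
    "basal cell carcinoma",
    "seborrheic keratosis",
    "actinic keratosis",
    "squamous cell carcinoma",
]
# Produces the same list that is passed as diagnosis_list to fetcher.fetch_from_isic.
diagnosis_list = [quote_label(name) for name in labels]
```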


The 'database.get_df' function automatically reads the CSV file written by 'save_metadata'.

In [3]:
df = database.get_df()
df

| Unnamed: 0 | isic_id | sex | diagnostic | age_approx | image_url | img_id |
|---|---|---|---|---|---|---|
| 0 | ISIC_1162337 | male | MEL | 45 | https://content.isic-archive.com/a82fc918-76c7... | ISIC_1162337.jpg |
| 1 | ISIC_3909039 | male | MEL | 50 | https://content.isic-archive.com/62d65089-0855... | ISIC_3909039.jpg |
| 2 | ISIC_6695831 | female | MEL | 65 | https://content.isic-archive.com/a213f3c1-95e1... | ISIC_6695831.jpg |
| 3 | ISIC_2141237 | female | MEL | 85 | https://content.isic-archive.com/7d9c8155-ab69... | ISIC_2141237.jpg |
| 4 | ISIC_8252406 | male | MEL | 85 | https://content.isic-archive.com/f7205ec8-20c7... | ISIC_8252406.jpg |
| ... | ... | ... | ... | ... | ... | ... |
| 12067 | ISIC_0064985 | male | SCC | 60 | https://content.isic-archive.com/a62e48a3-e6c9... | ISIC_0064985.jpg |
| 12068 | ISIC_0064977 | male | SCC | 80 | https://content.isic-archive.com/7db06c7e-7236... | ISIC_0064977.jpg |
| 12069 | ISIC_0064878 | female | SCC | 60 | https://content.isic-archive.com/9bc8ec2e-12e1... | ISIC_0064878.jpg |
| 12070 | ISIC_0064760 | female | SCC | 70 | https://content.isic-archive.com/9dec418e-59d5... | ISIC_0064760.jpg |


### Third Step:

So far, we have only gathered the metadata for the images. None of the images themselves have been downloaded yet.
Pass the dataframe read above to the 'update_all_files' function. This function detects images missing from the storage directory and downloads them automatically based on the available metadata. The download is bound by network I/O, so it runs on multiple threads, which gives a large speedup. It can still be rather slow, since some images are very large. A particularly slow network can also cause timeout errors, but simply rerunning the cell resumes progress from where it left off and continues downloading.
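Here is a minimal, standard-library-only sketch of the same idea: detect missing files, download them concurrently, and let a rerun resume. It is not the actual code of `update_all_files`. The helper names, the thread count, the timeout, and the use of plain dictionaries with `img_id`/`image_url`/`isic_id` keys (mirroring the metadata columns) are all assumptions. Threads help here because each worker spends most of its time waiting on the network, and Python releases the GIL during that wait.

```python
import os
from concurrent.futures import ThreadPoolExecutor, as_completed
from urllib.request import urlopen

def find_missing(rows, image_dir):
    """Rows whose image file is not in image_dir yet; this is what lets a rerun resume."""
    return [r for r in rows if not os.path.exists(os.path.join(image_dir, r["img_id"]))]

def download_one(row, image_dir, timeout=30):
    """Fetch a single image (hypothetical helper).

    The bytes go to a .part file that is renamed only when complete, so an interrupted
    download never leaves a truncated file that find_missing would treat as done.
    """
    path = os.path.join(image_dir, row["img_id"])
    tmp = path + ".part"
    with urlopen(row["image_url"], timeout=timeout) as resp, open(tmp, "wb") as f:
        f.write(resp.read())
    os.replace(tmp, path)

def download_missing(rows, image_dir, download=download_one, max_workers=8):
    """Download every missing image on a thread pool; return the isic_ids that failed."""
    os.makedirs(image_dir, exist_ok=True)
    failed = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(download, r, image_dir): r for r in find_missing(rows, image_dir)}
        for fut in as_completed(futures):
            try:
                fut.result()
            except Exception:  # e.g. a timeout: record it, a later rerun will retry
                failed.append(futures[fut]["isic_id"])
    return failed
```

Failures are collected rather than raised, so one slow image does not abort the whole batch. Calling `download_missing` again only touches the files that are still absent.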

In [4]:
database.update_all_files(df)

Beginning image downloads...
Done


And that's it! You now have as many images from the ISIC dataset as you asked for, within the limits of what ISIC contains. You may notice that you asked for 'x' images of each diagnosis but only got 'y'. ISIC is not endless, and some diagnoses have more images than others. Once a diagnosis runs out of images, no more are downloaded for it. The program still tries to reach the goal for each diagnosis separately, though, so if you ask for 5000 and there are 5000, you should end up close to 5000. Some images lack basic metadata, such as age or sex, and are not downloaded at all, which occasionally brings the total down by a few images.
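To see how far each diagnosis fell short, you can count the usable rows per diagnostic code. This is a minimal sketch, not a cascid function. It assumes that "basic metadata" means a non-empty `sex` and `age_approx` (column names from the dataframe above). The codes `MEL` and `SCC` appear in the output above; pass whatever codes your own metadata contains.

```python
import csv
from collections import Counter

# Assumption: these are the fields treated as "basic metadata".
REQUIRED_FIELDS = ("sex", "age_approx")

def load_rows(csv_path):
    """Read a metadata CSV into a list of dicts (path is whatever file you saved)."""
    with open(csv_path, newline="") as f:
        return list(csv.DictReader(f))

def has_basic_metadata(row):
    return all(row.get(field) not in (None, "") for field in REQUIRED_FIELDS)

def count_per_diagnosis(rows, expected_codes):
    """Usable rows per diagnostic code; a code with no rows at all reports 0."""
    counts = Counter(r["diagnostic"] for r in rows if has_basic_metadata(r))
    return {code: counts[code] for code in expected_codes}

def shortfall(counts, n_requested):
    """How many images each diagnosis is short of the requested number."""
    return {code: max(0, n_requested - n) for code, n in counts.items()}
```

Listing the expected codes explicitly matters. A diagnosis that returned nothing would otherwise be silently absent from the counts, instead of showing up with a shortfall equal to the full request.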

### Preprocessing

Cascid has some preprocessing built in. To use it, you can examine the specifics of each dataset, or simply call 'preprocess_dataset' from 'cascid.datasets.pipeline.preprocessing' to preprocess all your images.

In [2]:
from cascid.datasets.pipeline.preprocessing import preprocess_dataset
preprocess_dataset('isic', 'all', image_shape=(512,512)) # Apply all available forms of preprocessing to isic, and save images in 512x512 RGB resolution.

Applying hairless preprocessing to isic dataset, this may take a few minutes, but caching is done automatically, so the next time it should be much faster.
Beginning transformations, this may take a while...
Finished transformations after 0h00min0.33s
Applying hairless_quantized preprocessing to isic dataset, this may take a few minutes, but caching is done automatically, so the next time it should be much faster.
Beginning transformations, this may take a while...
Finished transformations after 0h00min0.35s
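The log lines above mention automatic caching. The sub-second timings suggest the transformed images were already cached by an earlier run, although the log does not say so explicitly. The sketch below shows file-based caching of one transformation step. It assumes results are stored per transformation name under a cache directory. The function name, directory layout, and bytes-in/bytes-out `transform` are hypothetical, and cascid's real cache may work differently.

```python
import os

def cached_transform(src_path, cache_dir, transform_name, transform):
    """Return the path of the transformed file, running `transform` only on a cache miss.

    Hypothetical sketch: the cache key is the transformation name plus the source file
    name, and `transform` maps raw bytes to raw bytes to keep the example dependency-free.
    """
    out_dir = os.path.join(cache_dir, transform_name)
    out_path = os.path.join(out_dir, os.path.basename(src_path))
    if os.path.exists(out_path):  # cache hit: skip the expensive work entirely
        return out_path
    os.makedirs(out_dir, exist_ok=True)
    with open(src_path, "rb") as f:
        data = transform(f.read())
    tmp = out_path + ".part"  # write-then-rename so a crash never leaves a partial result
    with open(tmp, "wb") as f:
        f.write(data)
    os.replace(tmp, out_path)
    return out_path
```

A real cache key should also include parameters such as `image_shape`, for example by naming the directory `hairless_512x512`. Otherwise, changing the resolution would silently return stale images.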
