# Metadata Loading & Analysis

Before we start looking at the photos, let's gather a few datasets and evaluate the demographic metadata they provide. For this project, we'll focus on age, gender, and the location of the skin lesion.
We'll use the following datasets:
- [BCN 20000](https://www.nature.com/articles/s41597-024-03387-w)
- [HAM10000](https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/DBW86T)
- [ISIC 2024](https://challenge2024.isic-archive.com/)
- [Hospital Italiano de Buenos Aires Skin Lesions](https://www.nature.com/articles/s41597-023-02630-0)

To download the ISIC datasets, we'll use the ISIC command-line interface (`isic-cli`), which we first install with pip:


In [None]:
!pip install isic-cli

# Dataset Downloads

The International Skin Imaging Collaboration (ISIC) Archive is a large public repository of skin lesion images and their metadata, which makes it a natural source for this project. Let's start by listing the collections available in the archive.

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import os

In [None]:
!isic collection list

In [None]:
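# Create the BCN metadata folder, then download the collection's metadata CSV with isic-cli.
# The same two steps (metadata, then images) are repeated for each collection below.
os.makedirs(r'E:\Capstone Skin Cancer Project\Datasets\BCN\MetaData', exist_ok=True)
BCN_id = 249
!isic metadata download -c {BCN_id} -o "E:\Capstone Skin Cancer Project\Datasets\BCN\MetaData\BCN_Metadata.csv"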

In [None]:
# Create the BCN image folder and download every image in the collection into it
os.makedirs(r'E:\Capstone Skin Cancer Project\Datasets\BCN\Image', exist_ok=True)
!isic image download --collections {BCN_id} "E:\Capstone Skin Cancer Project\Datasets\BCN\Image"

In [None]:
os.makedirs(r'E:\Capstone Skin Cancer Project\Datasets\HAM\MetaData', exist_ok=True)
HAM_id = 212
!isic metadata download -c {HAM_id} -o "E:\Capstone Skin Cancer Project\Datasets\HAM\MetaData\HAM_Metadata.csv"

In [None]:
os.makedirs(r'E:\Capstone Skin Cancer Project\Datasets\HAM\Image', exist_ok=True)
!isic image download --collections {HAM_id} "E:\Capstone Skin Cancer Project\Datasets\HAM\Image"

In [None]:
os.makedirs(r'E:\Capstone Skin Cancer Project\Datasets\BuenosAires\MetaData', exist_ok=True)
BA_id = 390
!isic metadata download -c {BA_id} -o "E:\Capstone Skin Cancer Project\Datasets\BuenosAires\MetaData\BA_Metadata.csv"

In [None]:
os.makedirs(r'E:\Capstone Skin Cancer Project\Datasets\BuenosAires\Image', exist_ok=True)
!isic image download --collections {BA_id} "E:\Capstone Skin Cancer Project\Datasets\BuenosAires\Image"

In [None]:
os.makedirs(r'E:\Capstone Skin Cancer Project\Datasets\Braff\MetaData', exist_ok=True)
Braff_id = 410
!isic metadata download -c {Braff_id} -o "E:\Capstone Skin Cancer Project\Datasets\Braff\MetaData\Braff_Metadata.csv"

In [None]:
os.makedirs(r'E:\Capstone Skin Cancer Project\Datasets\Braff\Image', exist_ok=True)
!isic image download --collections {Braff_id} "E:\Capstone Skin Cancer Project\Datasets\Braff\Image"

In [None]:
os.makedirs(r'E:\Capstone Skin Cancer Project\Datasets\Melo\MetaData', exist_ok=True)
melo_id = 294
!isic metadata download -c {melo_id} -o "E:\Capstone Skin Cancer Project\Datasets\Melo\MetaData\Melo_Metadata.csv"

In [None]:
os.makedirs(r'E:\Capstone Skin Cancer Project\Datasets\Melo\Image', exist_ok=True)
!isic image download --collections {melo_id} "E:\Capstone Skin Cancer Project\Datasets\Melo\Image"
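The five download pairs above all follow the same pattern: create a `MetaData` and an `Image` folder, fetch the metadata CSV with `isic metadata download -c <id> -o <file>`, then fetch the images with `isic image download --collections <id> <folder>`. If you'd rather drive them from one place, here is a minimal sketch that builds the same commands from a single table and runs them with `subprocess`. It reuses only the flags, folder names, and collection IDs from the cells above; `DATASETS`, `isic_commands`, and `download_all` are hypothetical helper names, and it assumes the `isic` executable is on your `PATH`. With `dry_run=True` (the default) it only returns the commands, without creating folders or downloading anything.

```python
import subprocess
from pathlib import Path

# folder name -> (metadata file prefix, ISIC collection ID), copied from the cells above
DATASETS = {
    "BCN": ("BCN", 249),
    "HAM": ("HAM", 212),
    "BuenosAires": ("BA", 390),
    "Braff": ("Braff", 410),
    "Melo": ("Melo", 294),
}


def isic_commands(base_dir: Path, folder: str, prefix: str, collection_id: int):
    """Return the metadata/image folders and the two isic-cli argument lists for one collection."""
    meta_dir = base_dir / folder / "MetaData"
    image_dir = base_dir / folder / "Image"
    metadata_cmd = [
        "isic", "metadata", "download",
        "-c", str(collection_id),
        "-o", str(meta_dir / f"{prefix}_Metadata.csv"),
    ]
    image_cmd = [
        "isic", "image", "download",
        "--collections", str(collection_id),
        str(image_dir),
    ]
    return meta_dir, image_dir, metadata_cmd, image_cmd


def download_all(base_dir: Path, dry_run: bool = True) -> list[list[str]]:
    """Build (and, unless dry_run, execute) the download commands for every collection."""
    commands = []
    for folder, (prefix, collection_id) in DATASETS.items():
        meta_dir, image_dir, metadata_cmd, image_cmd = isic_commands(
            base_dir, folder, prefix, collection_id
        )
        if not dry_run:
            # Equivalent to the os.makedirs(..., exist_ok=True) calls above
            meta_dir.mkdir(parents=True, exist_ok=True)
            image_dir.mkdir(parents=True, exist_ok=True)
            # check=True raises on the first failed download instead of continuing silently
            subprocess.run(metadata_cmd, check=True)
            subprocess.run(image_cmd, check=True)
        commands.extend([metadata_cmd, image_cmd])
    return commands
```

On the original machine this would be called as `download_all(Path(r"E:\Capstone Skin Cancer Project\Datasets"), dry_run=False)`. Passing argument lists to `subprocess.run` avoids shell quoting problems with the spaces in the path, and keeping the collection IDs in one dictionary makes it easier to add or drop a dataset later.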

Let's take a look at the data columns we currently have, then clean up the data so that we keep only the fields we need to check for any correlation between these data points and cancer.