# SkinMonitor


## 1. Dataset
The PROVe-AI dataset is a valuable resource for researchers focused on the development and validation of artificial intelligence algorithms for skin cancer diagnosis. It originates from a prospective, observational clinical validation study designed to assess the diagnostic accuracy of an AI algorithm, referred to as ADAE, in predicting melanoma from dermoscopy images of skin lesions.

Key aspects of the PROVe-AI dataset include:
- **Number of Images**: It contains 603 dermoscopy images.
- **Number of Patients**: The dataset covers 435 patients.
- **Number of Lesions**: Each of the 603 images corresponds to a unique lesion, all of which underwent biopsy.
- **Public Accessibility**: The dataset is publicly available, ensuring broad access for research purposes.
- **Licenses**: All images are released under the CC-0 license, allowing for unrestricted use in both academic and commercial projects.
- **DOI**: The dataset can be referenced using its Digital Object Identifier (DOI): [https://doi.org/10.34970/576276](https://doi.org/10.34970/576276).
- **URI**: [https://api.isic-archive.com/collections/218/?page=1](https://api.isic-archive.com/collections/218/?page=1)

Because every lesion in the study was biopsied, each diagnosis rests on a biopsy result, which gives a reliable ground truth for training and testing melanoma-detection models. The collection is also marked as locked: its contents are fixed, so every user works with the same set of images. This does not restrict access, since the dataset is public.

```python
import pandas as pd

# Load the metadata file exported alongside the images
metadata = pd.read_csv('.venv/dataset/metadata.csv')
print(metadata.columns)
metadata.head()
```

```text
Index(['isic_id', 'attribution', 'copyright_license', 'acquisition_day',
       'age_approx', 'anatom_site_general', 'benign_malignant',
       'clin_size_long_diam_mm', 'concomitant_biopsy', 'dermoscopic_type',
       'diagnosis', 'diagnosis_confirm_type', 'family_hx_mm',
       'fitzpatrick_skin_type', 'image_type', 'lesion_id', 'mel_class',
       'mel_thick_mm', 'mel_ulcer', 'melanocytic', 'nevus_type', 'patient_id',
       'personal_hx_mm', 'sex'],
      dtype='object')
```


| | isic_id | attribution | copyright_license | acquisition_day | age_approx | anatom_site_general | benign_malignant | clin_size_long_diam_mm | concomitant_biopsy | dermoscopic_type | ... | image_type | lesion_id | mel_class | mel_thick_mm | mel_ulcer | melanocytic | nevus_type | patient_id | personal_hx_mm | sex |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ISIC_0080539 | Memorial Sloan Kettering Cancer Center | CC-0 | 1 | 50 | head/neck | malignant | 5.4 | True | contact polarized | ... | dermoscopic | IL_6342582 | melanoma in situ | | | True | | IP_5440286 | True | female |
| 1 | ISIC_0098024 | Memorial Sloan Kettering Cancer Center | CC-0 | 1 | 65 | upper extremity | benign | 4.9 | True | contact polarized | ... | dermoscopic | IL_2556082 | | | | False | | IP_0680784 | False | male |
| 2 | ISIC_0131983 | Memorial Sloan Kettering Cancer Center | CC-0 | 1 | 85 | posterior torso | benign | 4.0 | True | contact polarized | ... | dermoscopic | IL_5424222 | | | | True | | IP_7498505 | False | male |
| 3 | ISIC_0134155 | Memorial Sloan Kettering Cancer Center | CC-0 | 1 | 35 | head/neck | benign | 7.3 | True | contact polarized | ... | dermoscopic | IL_2907315 | | | | True | | IP_1924080 | False | male |
| 4 | ISIC_0155910 | Memorial Sloan Kettering Cancer Center | CC-0 | 1 | 75 | head/neck | benign | 2.7 | True | contact polarized | ... | dermoscopic | IL_2102236 | | | | True | | IP_3792900 | True | male |
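
The image, patient, and lesion counts quoted in the dataset summary can be checked directly against the metadata. The sketch below is one way to do that; the helper name `summarize_dataset` and the three-row demo frame are hypothetical and exist only for illustration, while the column names come from the metadata shown above.

```python
import pandas as pd


def summarize_dataset(metadata: pd.DataFrame) -> dict:
    """Count distinct images, patients, and lesions in ISIC-style metadata."""
    return {
        "images": metadata["isic_id"].nunique(),
        "patients": metadata["patient_id"].nunique(),
        "lesions": metadata["lesion_id"].nunique(),
        # True when no lesion_id appears on more than one row
        "one_image_per_lesion": bool(metadata["lesion_id"].is_unique),
    }


# Hypothetical demo rows (made-up identifiers, not taken from the dataset)
demo = pd.DataFrame({
    "isic_id": ["ISIC_A", "ISIC_B", "ISIC_C"],
    "patient_id": ["IP_1", "IP_1", "IP_2"],
    "lesion_id": ["IL_1", "IL_2", "IL_3"],
})
summary = summarize_dataset(demo)
print(summary)
```

Calling `summarize_dataset(metadata)` on the full file should report 603 images, 435 patients, and 603 lesions if the figures in the summary hold. Because there are fewer patients than lesions, some patients contribute more than one lesion, which matters if the data is later split into training and test sets.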


### 1.1 Data Preprocessing

The metadata has 24 columns, but only some of them are relevant to predicting whether a lesion is malignant. The columns below are kept for modeling. Each one serves as an identifier, a prediction target, or a clinical feature:

1. **isic_id**: This unique identifier for each image is crucial for tracking and managing data, ensuring that analysis can be accurately correlated with specific images.

2. **age_approx**: Age is a significant risk factor for many types of skin conditions, including cancer. Older individuals may have a higher risk of certain types of skin lesions, making age an important variable in predictive modeling.

3. **anatom_site_general**: The location of the skin lesion can influence its diagnosis since some types of lesions are more common in specific areas of the body. Including this information can improve the specificity of the model predictions.

4. **benign_malignant**: This is a critical classification target for models designed to distinguish between benign and malignant lesions. This attribute directly corresponds to the primary outcome of many diagnostic models in dermatology.

5. **diagnosis**: Provides the detailed clinical diagnosis, which is useful for models that classify lesions more finely than benign versus malignant. Because the diagnosis largely determines the benign/malignant label, it is better treated as an alternative target than as an input feature when predicting `benign_malignant`.

6. **melanocytic**: Indicates whether the lesion is melanocytic, that is, whether it arises from melanocytes. Melanoma is a melanocytic lesion, so this flag helps the model separate melanocytic conditions from other lesion types.

7. **sex**: Sex may influence the prevalence and type of skin lesions through biological, genetic, or behavioral differences. Including it lets the model account for those differences.

8. **fitzpatrick_skin_type**: The Fitzpatrick scale classifies skin by how it responds to ultraviolet light (its tendency to burn or tan), which affects the likelihood of sun damage and skin cancer. It matters for models that need to account for risk differences across skin types.
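
Before settling on these columns, it can help to see how complete each one is. The sketch below reports the dtype, the number of missing values, and the number of distinct values per column. The helper `column_report` and the demo frame are hypothetical; the missing values in the demo were introduced deliberately and do not reflect the real data.

```python
import pandas as pd


def column_report(df: pd.DataFrame, columns: list) -> pd.DataFrame:
    """Return dtype, missing-value count, and distinct-value count per column."""
    subset = df[columns]
    return pd.DataFrame({
        "dtype": subset.dtypes.astype(str),
        "n_missing": subset.isna().sum(),
        "n_unique": subset.nunique(),  # NaN is not counted as a value
    })


# Hypothetical demo frame with gaps added on purpose
demo = pd.DataFrame({
    "age_approx": [50, 65, None],
    "sex": ["female", "male", "male"],
    "fitzpatrick_skin_type": ["III", None, "II"],
})
report = column_report(demo, ["age_approx", "sex", "fitzpatrick_skin_type"])
print(report)
```

Run against the real `metadata` frame with the eight selected column names, the same report shows which features would need imputation or row filtering.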

```python
# Select the columns used for modeling (.copy() avoids modifying a view of metadata later)
model_data = metadata[['isic_id', 'age_approx', 'anatom_site_general', 'benign_malignant',
                       'diagnosis', 'melanocytic', 'sex', 'fitzpatrick_skin_type']].copy()
print(model_data.head())
```

```text
        isic_id  age_approx anatom_site_general benign_malignant  \
0  ISIC_0080539          50           head/neck        malignant
1  ISIC_0098024          65     upper extremity           benign
2  ISIC_0131983          85     posterior torso           benign
3  ISIC_0134155          35           head/neck           benign
4  ISIC_0155910          75           head/neck           benign

             diagnosis  melanocytic     sex fitzpatrick_skin_type
0             melanoma         True  female                   III
1  lichenoid keratosis        False    male                   III
2                nevus         True    male                    II
3          lentigo NOS         True    male                   III
4          lentigo NOS         True    male                    II
```
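
Most models need numeric inputs, so the selected columns have to be encoded before training. The following is a minimal sketch of one possible encoding, not the project's confirmed pipeline. The function name `encode_model_data`, the ordinal mapping `FITZPATRICK_ORDER`, and the decision to leave `diagnosis` out of the features are all assumptions. The demo rows reuse the first three rows of the output above.

```python
import pandas as pd

# Assumed ordinal encoding of the Fitzpatrick scale (not taken from the source)
FITZPATRICK_ORDER = {"I": 1, "II": 2, "III": 3, "IV": 4, "V": 5, "VI": 6}


def encode_model_data(model_data: pd.DataFrame):
    """Separate identifier, target, and features, and encode features numerically."""
    ids = model_data["isic_id"]
    # Binary target: 1 = malignant, 0 = benign; any other label becomes NaN
    y = model_data["benign_malignant"].map({"benign": 0, "malignant": 1})
    numeric = pd.DataFrame({
        "age_approx": model_data["age_approx"],
        "melanocytic": model_data["melanocytic"].astype(float),
        "fitzpatrick": model_data["fitzpatrick_skin_type"].map(FITZPATRICK_ORDER),
    })
    # One-hot encode the unordered categorical columns
    one_hot = pd.get_dummies(model_data[["anatom_site_general", "sex"]], dtype=int)
    # 'diagnosis' is deliberately excluded: it would leak the target
    X = pd.concat([numeric, one_hot], axis=1)
    return ids, X, y


# First three rows from the model_data output shown above
demo = pd.DataFrame({
    "isic_id": ["ISIC_0080539", "ISIC_0098024", "ISIC_0131983"],
    "age_approx": [50, 65, 85],
    "anatom_site_general": ["head/neck", "upper extremity", "posterior torso"],
    "benign_malignant": ["malignant", "benign", "benign"],
    "diagnosis": ["melanoma", "lichenoid keratosis", "nevus"],
    "melanocytic": [True, False, True],
    "sex": ["female", "male", "male"],
    "fitzpatrick_skin_type": ["III", "III", "II"],
})
ids, X, y = encode_model_data(demo)
print(X.columns.tolist())
print(y.tolist())
```

Keeping `isic_id` separate from `X` keeps the identifier out of the model inputs while still allowing each prediction to be traced back to its image.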
