# <font color=Red> Skin Cancer Classification</font>

### Project Introduction

This project aims to develop an automated system for detecting and classifying skin cancer from medical images using deep learning techniques.  
By leveraging a labeled dataset of dermatoscopic images available on Kaggle, the goal is to accurately distinguish between benign and malignant skin lesions.  
Early and reliable detection of skin cancer can significantly improve patient outcomes and support dermatologists in making faster, more accurate diagnoses.

The trained model has been deployed as an interactive web application using Streamlit and is publicly accessible on Hugging Face Spaces.  
You can try the demo here: [Skin Cancer Classification on Hugging Face](https://huggingface.co/spaces/HarunDemircioglu11/Skin_Cancer_Classification)


<img src='https://www.froedtert.com/sites/default/files/styles/entity_embed_small/public/image/2024-04/types-of-skin-cancer.jpg?itok=H9wrb4c-'>

- The `cv2` library (OpenCV) is imported to handle image reading and processing tasks.
- The `pandas` library is imported as `pd` for structured data manipulation and DataFrame operations.
- The `os` library is imported to interact with the operating system, especially for reading image files from local directories.

In [2]:
import cv2  # image reading and processing
import pandas as pd
import os  # access the image files on the local file system


- Two class labels are defined: `'Cancer'` for images showing skin cancer and `'Non_Cancer'` for healthy skin images.
- The variable `img_path` specifies the directory where all skin image data is stored.


In [3]:
labels=['Cancer','Non_Cancer']  # the two class labels
img_path='Skin_Data/'

- An empty list called `img_list` is created to store the file paths of the images to be read.
- Another empty list called `label_list` is created to store the corresponding label for each image.
- The outer loop iterates through each class label (e.g., 'Cancer' and 'Non_Cancer').
- For each label, the inner loop visits the relevant folder and iterates through all image files inside that folder.
- The full file path of each image is appended to `img_list`.
- The corresponding class label is appended to `label_list`, ensuring each image is matched with its correct label.

In [4]:
img_list=[]    # file paths of the images to be read
label_list=[]  # label matching each path in img_list
for label in labels:  # visit each class folder in turn
    for img_file in os.listdir(img_path+label):  # every file inside Skin_Data/<label>
        img_list.append(img_path+label+'/'+img_file)
        label_list.append(label)
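
The loop above picks up every entry in each class folder, including hidden or non-image files. A minimal sketch of a stricter variant follows, assuming the same `Skin_Data/<label>/` layout. The helper name `collect_images` and the extension set are illustrative assumptions and are not part of the original notebook.

```python
from pathlib import Path

# Hypothetical set of accepted extensions; adjust to the actual dataset.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}

def collect_images(root, labels):
    """Return parallel lists of image paths and labels, skipping non-image files."""
    paths, targets = [], []
    for label in labels:
        # sorted() makes the order reproducible across operating systems
        for entry in sorted(Path(root, label).iterdir()):
            if entry.is_file() and entry.suffix.lower() in IMAGE_EXTENSIONS:
                paths.append(str(entry))
                targets.append(label)
    return paths, targets
```

It could be called as `img_list, label_list = collect_images('Skin_Data', labels)`.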

- A DataFrame named `df` is created with two columns:
    - `'img'`: contains the file paths of all images,
    - `'label'`: contains the corresponding class label for each image.
- This DataFrame organizes the image data and labels in a structured format, making it suitable for further processing and model training.

In [5]:
df=pd.DataFrame({'img':img_list,'label':label_list})


In [6]:
df.tail()

|     | img | label |
| --- | --- | --- |
| 283 | Skin_Data/Non_Cancer/953-1.JPG | Non_Cancer |
| 284 | Skin_Data/Non_Cancer/954-3.JPG | Non_Cancer |
| 285 | Skin_Data/Non_Cancer/955.JPG | Non_Cancer |
| 286 | Skin_Data/Non_Cancer/984.JPG | Non_Cancer |
| 287 | Skin_Data/Non_Cancer/986-1.JPG | Non_Cancer |


- A dictionary named `d` is created to map each class label to a numerical value:
    - `'Cancer'` is mapped to `1` (representing skin cancer),
    - `'Non_Cancer'` is mapped to `0` (representing healthy skin).
- This mapping is used to convert categorical class labels into numerical labels for use in machine learning models.

In [7]:
import matplotlib.pyplot as plt

In [8]:
d={'Cancer':1,'Non_Cancer':0}


- A new column called `'encode_label'` is added to the DataFrame.
- The values in this column are generated by mapping each class label in the `'label'` column to its corresponding numerical value using the dictionary `d`.
- For example, `'Cancer'` becomes `1` and `'Non_Cancer'` becomes `0`.
- This encoded column can be used directly as the target variable for machine learning models.

In [9]:
df['encode_label']=df['label'].map(d)

In [10]:
df.tail()

|     | img | label | encode_label |
| --- | --- | --- | --- |
| 283 | Skin_Data/Non_Cancer/953-1.JPG | Non_Cancer | 0 |
| 284 | Skin_Data/Non_Cancer/954-3.JPG | Non_Cancer | 0 |
| 285 | Skin_Data/Non_Cancer/955.JPG | Non_Cancer | 0 |
| 286 | Skin_Data/Non_Cancer/984.JPG | Non_Cancer | 0 |
| 287 | Skin_Data/Non_Cancer/986-1.JPG | Non_Cancer | 0 |
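
`Series.map` with a dictionary returns `NaN` for any value missing from the dictionary, so a misspelled folder name would silently produce an invalid target. Below is a small sketch of the encoding step with a guard, using a toy DataFrame whose rows are made up for illustration.

```python
import pandas as pd

d = {'Cancer': 1, 'Non_Cancer': 0}

# Toy rows for illustration only; the real df is built from Skin_Data/.
toy = pd.DataFrame({'label': ['Cancer', 'Non_Cancer', 'Cancer']})
toy['encode_label'] = toy['label'].map(d)

# map() yields NaN for labels missing from d, so fail early if any appear.
assert toy['encode_label'].notna().all(), "unmapped label found"
toy['encode_label'] = toy['encode_label'].astype(int)

encoded = toy['encode_label'].tolist()  # [1, 0, 1]
```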


- An empty list `x` is created to store the processed images.
- For each image file path in the `'img'` column:
    - The image is read from disk using OpenCV (`cv2.imread`).
    - The image is resized to 170x170 pixels to ensure uniform input size for the model.
    - The pixel values are normalized to the range [0, 1] by dividing by 255.0, which helps improve model performance.
    - The processed image is converted to a NumPy array.
    - The resulting array is appended to the list `x`.
- After this loop, `x` contains all images as normalized, resized NumPy arrays, ready for model training.


In [11]:
import numpy as np

In [12]:
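x = []
for path in df['img']:  # separate name so the img_path directory variable is not overwritten
    img = cv2.imread(path)  # BGR uint8 array, or None if the file cannot be read
    if img is None:
        raise FileNotFoundError(f'Could not read image: {path}')
    img = cv2.resize(img, (170, 170))  # resize to a fixed 170x170 input
    img = img / 255.0  # normalize: scale pixel values to the range [0, 1]
    img = np.array(img)  # cv2 already returns an ndarray, so this only makes a copy
    x.append(img)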
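
Dividing a `uint8` image by `255.0` produces a `float64` array. With 288 images (the index in the `df.tail()` output ends at 287) of 170x170x3 values each, that comes to roughly 200 MB, and casting to `float32` halves it. The sketch below shows the scaling step on a synthetic array that stands in for the output of `cv2.resize`.

```python
import numpy as np

# Synthetic stand-in for a resized BGR image as returned by cv2.resize.
img = np.random.randint(0, 256, size=(170, 170, 3), dtype=np.uint8)

scaled64 = img / 255.0                      # float64, values in [0, 1]
scaled32 = img.astype(np.float32) / 255.0   # float32, values in [0, 1]

n_images = 288  # row count implied by the DataFrame index shown earlier
bytes64 = n_images * scaled64.nbytes  # 199_756_800 bytes
bytes32 = n_images * scaled32.nbytes  # 99_878_400 bytes
```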


- The list of processed images `x` is converted into a NumPy array for efficient computation and compatibility with deep learning frameworks.
- The target variable `y` is set as the `'encode_label'` column from the DataFrame, containing the numerical class labels (0 for Non_Cancer, 1 for Cancer).
- This prepares the data for input into a machine learning or deep learning model.


In [13]:
x=np.array(x)

In [14]:
y=df['encode_label']

- The dataset is split into training and test sets using `train_test_split`:
    - `x_train` and `y_train` are used to train the model,
    - `x_test` and `y_test` are used to evaluate the model's performance.
    - The `test_size=20` parameter means that 20 samples are reserved for testing.
    - `random_state=42` ensures reproducibility of the split.

- Deep learning model components are imported from TensorFlow Keras:
    - `Sequential` is used to build a model layer by layer.
    - `Input`, `Conv2D`, `MaxPooling2D`, `Flatten`, and `Dense` are used to build the convolutional neural network. `Reshape`, `Dropout`, and `BatchNormalization` are imported as well but are not used in this version of the model.


In [15]:
from sklearn.model_selection import train_test_split

In [16]:
x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=20,random_state=42)
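
Because `test_size` is an integer here, it is an absolute count rather than a fraction. Assuming 288 images in total (from the DataFrame index shown earlier) and Keras's default `batch_size` of 32, the split sizes and the number of steps per epoch work out as follows. The result agrees with the `9/9` step counter in the training log.

```python
import math

n_total = 288        # rows in df (index runs 0..287)
n_test = 20          # test_size=20 is an absolute number of samples
n_train = n_total - n_test

batch_size = 32      # Keras default when model.fit gets no batch_size
steps_per_epoch = math.ceil(n_train / batch_size)

print(n_train, steps_per_epoch)  # 268 9
```

For a small and possibly imbalanced dataset, passing `stratify=y` to `train_test_split` keeps the class proportions similar in both subsets.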

In [17]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Reshape, Conv2D, MaxPooling2D, Flatten, Dense, Dropout, BatchNormalization

- A sequential deep learning model is defined for binary image classification.
- The input layer expects images with the shape (170, 170, 3) — height, width, and color channels.
- The first convolutional layer has 32 filters with a 3x3 kernel size and uses the ReLU activation function.
- A max pooling layer with a 2x2 window reduces the spatial dimensions.
- The second convolutional layer has 64 filters, again with a 3x3 kernel and ReLU activation.
- Another max pooling layer is applied.
- The output from the convolutional layers is flattened into a one-dimensional vector.
- A dense (fully connected) layer with 128 units is added. No activation is specified, so this layer is linear.
- The final dense layer has 2 output units with softmax activation, suitable for distinguishing between two classes (Cancer and Non_Cancer).
- The model is compiled with the Adam optimizer, sparse categorical crossentropy loss (for integer class labels), and accuracy as the evaluation metric.


In [18]:
model=Sequential()
model.add(Input(shape=(170,170,3)))
model.add(Conv2D(32,kernel_size=(3,3),activation='relu'))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Conv2D(64,kernel_size=(3,3),activation='relu'))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Flatten())
model.add(Dense(128))
model.add(Dense(2, activation='softmax'))  # two output classes, so softmax is used
model.compile(optimizer='adam',loss='sparse_categorical_crossentropy',metrics=['accuracy'])
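
The layer shapes and parameter counts can be checked by hand. The sketch below assumes the Keras defaults that apply to the cell above (`padding='valid'` and stride 1 for the convolutions, pooling stride equal to the pool size). `model.summary()` should report the same numbers.

```python
def conv_out(size, kernel=3):
    """Output size of a 'valid' convolution with stride 1."""
    return size - kernel + 1

def pool_out(size, pool=2):
    """Output size of max pooling with stride equal to the pool size."""
    return size // pool

s = 170
s = pool_out(conv_out(s))   # 170 -> 168 -> 84
s = pool_out(conv_out(s))   # 84 -> 82 -> 41
flat = s * s * 64           # 41 * 41 * 64 = 107584

params = {
    'conv1': 3 * 3 * 3 * 32 + 32,    # 896
    'conv2': 3 * 3 * 32 * 64 + 64,   # 18496
    'dense1': flat * 128 + 128,      # 13770880
    'dense2': 128 * 2 + 2,           # 258
}
total = sum(params.values())         # 13790530
```

Nearly all of the parameters sit in the first dense layer, which is worth keeping in mind when training on fewer than 300 images.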


- The model is trained using the training data (`x_train`, `y_train`) for 15 epochs.
- During training, the test set (`x_test`, `y_test`) is also passed as validation data, so the model is evaluated on it after each epoch.
- The `verbose=1` parameter ensures that the training progress and results for each epoch are displayed.
- The training history, including accuracy and loss values for both the training and validation sets, is stored in the `history` object for later analysis or visualization.


In [19]:
history=model.fit(x_train,y_train,validation_data=(x_test,y_test), epochs=15, verbose=1)

Epoch 1/15
9/9 ━━━━━━━━━━━━━━━━━━━━ 25s 2s/step - accuracy: 0.6922 - loss: 6.2866 - val_accuracy: 0.5500 - val_loss: 1.0503
Epoch 2/15
9/9 ━━━━━━━━━━━━━━━━━━━━ 25s 2s/step - accuracy: 0.6687 - loss: 0.6414 - val_accuracy: 0.5500 - val_loss: 0.8604
Epoch 3/15
9/9 ━━━━━━━━━━━━━━━━━━━━ 33s 2s/step - accuracy: 0.7729 - loss: 0.5513 - val_accuracy: 0.9000 - val_loss: 0.5004
Epoch 4/15
9/9 ━━━━━━━━━━━━━━━━━━━━ 11s 1s/step - accuracy: 0.7384 - loss: 0.5502 - val_accuracy: 0.6500 - val_loss: 0.5050
Epoch 5/15
9/9 ━━━━━━━━━━━━━━━━━━━━ 11s 1s/step - accuracy: 0.8082 - loss: 0.4117 - val_accuracy: 0.9000 - val_loss: 0.3590
Epoch 6/15
9/9 ━━━━━━━━━━━━━━━━━━━━ 10s 1s/step - accuracy: 0.8558 - loss: 0.3745 - val_accuracy: 0.9500 - val_loss: 0.3565
Epoch 7/15
9/9 ━━━━━━━━━━━━━━━━━━━━

In [20]:
model.save('my_cnn_model.h5')
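
The `history` object returned by `model.fit` exposes a `history.history` dictionary of per-epoch lists. As a sketch, the values below are copied from the six epochs that appear in full in the training log (the log is cut off during epoch 7), and the snippet picks the epoch with the lowest validation loss.

```python
# Values transcribed from the epochs that appear in full in the training log.
logged = {
    'val_accuracy': [0.5500, 0.5500, 0.9000, 0.6500, 0.9000, 0.9500],
    'val_loss':     [1.0503, 0.8604, 0.5004, 0.5050, 0.3590, 0.3565],
}

# Epochs are 1-based in the Keras log, list indices are 0-based.
best_idx = min(range(len(logged['val_loss'])), key=logged['val_loss'].__getitem__)
best_epoch = best_idx + 1
print(best_epoch, logged['val_accuracy'][best_idx])  # 6 0.95
```

With a real training run, `history.history` can be used in place of `logged`.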



## <font color=Red> Project Summary: Skin Cancer Detection with Deep Learning</font>

In this project, we developed a deep learning model to automatically classify skin lesion images as either cancerous or non-cancerous. The workflow included several key steps:

- **Data Preparation:**  
  Images were organized into two classes: 'Cancer' and 'Non_Cancer'. Each image was assigned a label and the data was structured using a DataFrame for efficient processing.

- **Preprocessing:**  
  All images were resized to 170x170 pixels and normalized to have pixel values between 0 and 1. Class labels were encoded as 0 (Non_Cancer) and 1 (Cancer) for compatibility with machine learning models.

- **Train-Test Split:**  
  The dataset was split into training and test sets to objectively evaluate model performance.

- **Model Architecture:**  
  A convolutional neural network (CNN) was built using the Keras Sequential API. The model consisted of convolutional, pooling, flatten, and dense layers, culminating in a softmax output for binary classification.

- **Model Training:**  
  The model was trained for 15 epochs. Training and validation performance were monitored at each step.

- **Results:**  
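  The final model is reported to reach **98% accuracy** on the test set. The training log shown earlier is cut off during epoch 7 and peaks at 95% validation accuracy in epoch 6. With only 20 held-out images, each one accounts for 5 percentage points, so the figure is best read as an approximate indicator of how well the model separates cancerous from non-cancerous lesions.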

- **Deployment:**  
  The trained model was deployed as an interactive Streamlit web application on Hugging Face Spaces.  
  You can access and test the live demo here: [Skin Cancer Classification on Hugging Face](https://huggingface.co/spaces/HarunDemircioglu11/Skin_Cancer_Classification)
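
Accuracy alone can hide missed cancers, so for the Results item above it helps to look at sensitivity and specificity as well. Below is a minimal sketch with made-up labels and predictions (not the project's actual outputs), treating `1` (Cancer) as the positive class. The helper name `binary_metrics` is illustrative.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity (recall) and specificity for 0/1 labels."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return {
        'accuracy': (tp + tn) / len(y_true),
        'sensitivity': tp / (tp + fn) if tp + fn else float('nan'),
        'specificity': tn / (tn + fp) if tn + fp else float('nan'),
    }

# Hypothetical example: 20 test images, one cancer case missed.
y_true = [1] * 8 + [0] * 12
y_pred = [1] * 7 + [0] * 13
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

With the trained model, the predicted classes would come from `model.predict(x_test).argmax(axis=1)`.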

**Conclusion:**  
This project highlights the potential of deep learning for medical image analysis. By accurately detecting skin cancer from images, the developed model can support dermatologists in early diagnosis, leading to better patient outcomes.
