<a href="https://colab.research.google.com/github/Srividhyak2011/Demo-Datascienceproject/blob/main/M5_MP1_NB_CT_Medical_Image_Classification_using_CNN.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Applied Data Science and Machine Intelligence
## A program by IIT Madras and TalentSprint
### Mini Project: Cancer Detection in CT Scan Images using CNN



## Un-Graded

## Learning Objectives

At the end of the experiment, you will be able to:

* Load and visualise the images

* Extract features of images and reshape them

* Implement a CNN using Keras

## Introduction

Effective classification of medical images plays an essential role in aiding clinical care and treatment. For example, X-ray analysis is the best approach for diagnosing pneumonia, which causes about 50,000 deaths per year in the US, but classifying pneumonia from chest X-rays requires professional radiologists, who are a rare and expensive resource in some regions.

Traditional machine learning methods, such as support vector machines (SVMs), have long been used in medical image classification. However, these methods have several disadvantages: their performance is far from the practical standard, and their development has been slow in recent years. In addition, feature extraction and selection are time-consuming and vary from one object to another. Deep neural networks (DNNs), especially convolutional neural networks (CNNs), are widely used in image classification tasks and have achieved significant performance since 2012. Some research on medical image classification with CNNs has achieved performance rivaling that of human experts. For example, CheXNet, a CNN with 121 layers trained on a dataset of more than 100,000 frontal-view chest X-rays (ChestX-ray14), performed better than the average of four radiologists.

Medical images are hard to collect, because collecting and labeling medical data runs into both data privacy concerns and the need for time-consuming expert explanations. Of the two general directions for addressing this, one is to collect more data, for example through crowdsourcing or by mining existing clinical reports.

Many CNN-based deep neural networks have been developed and have achieved significant results in the ImageNet Challenge, the most significant image classification and segmentation challenge in the image analysis field. As a result, CNN-based deep neural networks are widely used in medical classification tasks. A CNN is an excellent feature extractor, so using it to classify medical images avoids complicated and expensive feature engineering. For example, one study presented a customized CNN with shallow convolutional layers to classify image patches of lung disease.


## Dataset

#### CT images from cancer imaging archive with contrast and patient age
The dataset is designed to allow different methods to be tested for examining the trends in CT image data associated with using contrast and patient age. The basic idea is to identify image textures, statistical patterns and features that correlate strongly with these traits, and possibly to build simple tools for automatically classifying these images when they have been misclassified (or for finding outliers which could be suspicious cases, bad measurements, or poorly calibrated machines).

#### Data

The data are a tiny subset of images from the cancer imaging archive. They consist of the middle slice of all CT images taken where valid age, modality, and contrast tags could be found. This results in 475 series from 69 different patients.

TCIA Archive Link - https://wiki.cancerimagingarchive.net/display/Public/TCGA-LUAD

## Problem Statement

Build and improve a CNN model for classifying medical images, aiming for a high-accuracy final model.

In [None]:
#@title Download the data
!gdown "1-C-0X_a2uqVjoUfiZ2RYLIxO4qAxPqdM"
!unzip -qq "CT Medical images.zip"
!rm -rf "full_archive.npz"
!rm -rf "overview.csv"

### Import Required packages

In [None]:
!pip install pydicom

In [None]:
import os
import cv2
import numpy as np
import pydicom as dicom
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split

# Keras libraries
import keras
from keras.utils import np_utils
from keras import regularizers, optimizers, metrics
from keras.models import Sequential
from keras.preprocessing.image import ImageDataGenerator
from keras.layers import Dense, Activation, Flatten, Dropout, BatchNormalization
from keras.layers import Conv2D, MaxPooling2D


### **Exercise 1**

### Loading Images

#### Loading images using pydicom library

* Use pydicom library to load images
* Print the image dimensions (in pixels)

   Hint:  [Pydicom library documentation](https://pydicom.github.io/)

In [None]:
# YOUR CODE HERE

### **Exercise 2**
### Visualisation of images

#### Visualise one image each for file names ending with contrast 0 and contrast 1
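
One way to pick a file for each contrast value is to parse the flag out of the file name. The sketch below assumes names of the form `..._CONTRAST_<0 or 1>_CT.dcm`; that pattern, the helper names, and the example file names are assumptions for illustration and should be checked against the actual files.

```python
import re

# Assumed naming pattern: "..._CONTRAST_<0 or 1>_CT.dcm".
CONTRAST_PATTERN = re.compile(r"_CONTRAST_([01])_CT\.dcm$")


def contrast_label(filename):
    """Return 0 or 1 parsed from the file name, or None if the pattern is absent."""
    match = CONTRAST_PATTERN.search(filename)
    return int(match.group(1)) if match else None


def first_file_per_label(filenames):
    """Pick the first file name seen for each contrast label."""
    picked = {}
    for name in filenames:
        label = contrast_label(name)
        if label is not None and label not in picked:
            picked[label] = name
    return picked


# Hypothetical file names used only to illustrate the helpers.
examples = [
    "ID_0000_AGE_0060_CONTRAST_1_CT.dcm",
    "ID_0001_AGE_0069_CONTRAST_1_CT.dcm",
    "ID_0050_AGE_0074_CONTRAST_0_CT.dcm",
]
print(first_file_per_label(examples))
```

Each selected file can then be read with `dicom.dcmread(path).pixel_array` and displayed with `plt.imshow(image, cmap="gray")`.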



In [None]:
# YOUR CODE HERE

### **Exercise 3**
### Feature Extraction - Extract the following features

#### 1. Pixel spacing
#### 2. Slice thickness
#### 3. aspect_ratio1 = $\frac{Pixel\_spacing[1]}{Pixel\_spacing[0]}$

#### 4. aspect_ratio2 = $\frac{Pixel\_spacing[1]}{Slice\_thickness}$

#### 5. aspect_ratio3 = $\frac{Slice\_thickness}{Pixel\_spacing[0]}$

Refer to the definitions of the above attributes [here](https://dicom.innolitics.com/ciods/rt-dose/image-plane)
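
The three ratios follow directly from the formulas listed for this exercise. A minimal sketch is shown here, with a hypothetical helper `aspect_ratios` and made-up input values that are not taken from the dataset:

```python
def aspect_ratios(pixel_spacing, slice_thickness):
    """Compute the three aspect ratios defined in this exercise.

    pixel_spacing is a pair [row_spacing, column_spacing] in mm and
    slice_thickness is a single value in mm.
    """
    row_spacing = float(pixel_spacing[0])
    col_spacing = float(pixel_spacing[1])
    thickness = float(slice_thickness)
    return {
        "aspect_ratio1": col_spacing / row_spacing,
        "aspect_ratio2": col_spacing / thickness,
        "aspect_ratio3": thickness / row_spacing,
    }


# Made-up values for illustration only.
features = aspect_ratios(pixel_spacing=[0.5, 0.5], slice_thickness=2.5)
print(features)
```

With pydicom, the inputs would typically come from `ds.PixelSpacing` and `ds.SliceThickness` on a loaded dataset `ds`.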


In [None]:
# YOUR CODE HERE

### **Exercise 4**
### Prepare 3D volume data

* Prepare 3D volume data from the images

* Reshape the 3D volume data as a stack of images
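
The two steps above can be sketched with NumPy. The example uses synthetic arrays in place of real pixel data and assumes every slice has the same height and width (otherwise the slices would first need resizing, for example with `cv2.resize`). The helper names are hypothetical.

```python
import numpy as np


def build_volume(slices):
    """Stack equally sized 2D slices into one (num_slices, height, width) array."""
    return np.stack([np.asarray(s, dtype=np.float32) for s in slices], axis=0)


def as_image_stack(volume):
    """Add a trailing channel axis, giving (num_slices, height, width, 1)."""
    num_slices, height, width = volume.shape
    return volume.reshape(num_slices, height, width, 1)


# Synthetic stand-ins for the pixel arrays of five 64x64 slices.
rng = np.random.default_rng(0)
slices = [rng.integers(0, 4096, size=(64, 64)) for _ in range(5)]

volume = build_volume(slices)
images = as_image_stack(volume)
print("Volume shape:", volume.shape)
print("Image stack shape:", images.shape)
```

The trailing axis of size 1 is the single grayscale channel that Keras convolution layers expect in channels-last layout.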

In [None]:
# YOUR CODE HERE

### **Exercise 5**
### CNN for image classification

* #### Building CNN

* #### Use Image Data Generator as image input while fitting the model

* #### Train it (Use 20 epochs for limited compute power)



* Define the Keras model and initialize the layers
  - Ensure the input layer is specified with the correct image size as input. This can be specified when creating the first layer with the input_shape argument.
* Specify the number of filters, kernel size, pool size and activation function
  - The filters, kernel_size and activation arguments of the Conv2D layer can be used
  - The pool_size argument of MaxPooling2D can be used to set the pool size
* Compile the model
  - Specify the loss function (used to evaluate a set of weights), the optimizer (used to search through different weights for the network) and any optional metrics to collect and report during training.
* Fit and evaluate the model
  - Fit the data by specifying the number of epochs and evaluate the model
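
The build and compile steps could look roughly like the sketch below. It is one possible small architecture, not a reference solution: the input size of 128x128 with one channel, the layer sizes, and the binary target (such as the contrast flag) are all assumptions to be adapted to the prepared data.

```python
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout


def build_cnn(input_shape=(128, 128, 1)):
    """Build and compile a small CNN for an assumed binary classification task."""
    model = Sequential([
        # First layer declares the image size via input_shape.
        Conv2D(filters=16, kernel_size=(3, 3), activation="relu",
               input_shape=input_shape),
        MaxPooling2D(pool_size=(2, 2)),
        Conv2D(filters=32, kernel_size=(3, 3), activation="relu"),
        MaxPooling2D(pool_size=(2, 2)),
        Flatten(),
        Dense(64, activation="relu"),
        Dropout(0.5),
        # Single sigmoid unit for a two-class target.
        Dense(1, activation="sigmoid"),
    ])
    model.compile(loss="binary_crossentropy", optimizer="adam",
                  metrics=["accuracy"])
    return model


model = build_cnn()
model.summary()
```

Training would then pass batches from an `ImageDataGenerator` to `model.fit` with `epochs=20`, for instance through `datagen.flow(X_train, y_train, batch_size=16)`, where `datagen`, `X_train`, `y_train` and the batch size are placeholder names and values.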

In [None]:
# Step 1 - Build the architecture
# YOUR CODE HERE

In [None]:
# Step 2 - Compile the model
# YOUR CODE HERE

In [None]:
# Step 3 - Train the model
# YOUR CODE HERE

### **Exercise 6**
### Prediction and Evaluation Metrics

* Evaluate the trained model on test set
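
Beyond the loss and accuracy reported by `model.evaluate`, the predicted probabilities can be turned into a confusion matrix. The sketch below assumes a binary classifier with a sigmoid output and a 0.5 decision threshold; the helper `binary_metrics` and the example values are hypothetical.

```python
import numpy as np


def binary_metrics(y_true, y_prob, threshold=0.5):
    """Return accuracy and a [[tn, fp], [fn, tp]] confusion matrix."""
    y_true = np.asarray(y_true).astype(int).ravel()
    # Probabilities at or above the threshold are predicted as class 1.
    y_pred = (np.asarray(y_prob).ravel() >= threshold).astype(int)

    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))

    accuracy = (tp + tn) / len(y_true)
    return {"accuracy": accuracy, "confusion_matrix": [[tn, fp], [fn, tp]]}


# Made-up labels and probabilities for illustration only.
y_true = [0, 1, 1, 0, 1]
y_prob = [0.2, 0.9, 0.4, 0.6, 0.7]
print(binary_metrics(y_true, y_prob))
```

In the notebook, `y_prob` would come from `model.predict` on the test images and `y_true` from the held-out test labels.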


In [None]:
# YOUR CODE HERE

In [None]:
# YOUR CODE HERE