# Advanced Certification in AIML
## A Program by IIIT-H and TalentSprint

## Learning Objectives

At the end of the experiment, you will be able to:

* Apply PCA on the features
* Train an MLP classifier on the PCA-transformed data and use it to classify images

In [None]:
#@title Experiment Walkthrough Video
from IPython.display import HTML

HTML("""<video width="854" height="480" controls>
  <source src="https://cdn.iiith.talentsprint.com/aiml/Experiment_related_data/MedMNIST.mp4" type="video/mp4">
</video>
""")

## Dataset

### History

The dataset was developed in 2017 by Arturo Polanco Lozano and is also known as the MedNIST dataset for radiology and medical imaging. The images were gathered from several sources: TCIA, the RSNA Bone Age Challenge, and the NIH Chest X-ray dataset.

### Description

The dataset contains 58954 medical images belonging to 6 classes:

* ChestCT (10000 images) - computed tomography of the chest
* BreastMRI (8954 images) - MRI of the breast
* CXR (10000 images) - chest X-ray
* Hand (10000 images) - hand X-ray
* HeadCT (10000 images) - computed tomography of the head
* AbdomenCT (10000 images) - computed tomography of the abdominal cavity

Each image is 64×64 pixels. The training set has 41259 images and the testing set has 17695 images.

In [None]:
! wget http://cdn.iiith.talentsprint.com/aiml/Experiment_related_data/med_mnist.zip
! unzip -qq med_mnist.zip

### Importing Required Packages

In [None]:
import pandas as pd
# Skimage provides easy-to-use functions for reading, displaying, and saving images
# First, import the io module of skimage (skimage.io) so we can read and write images
from skimage.io import imread
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score

### Load the train data

In [None]:
%cd med_mnist

In [None]:
df = pd.read_csv('med_mnist_train.csv')

In [None]:
df.head()

### Read and Pre-process the image data

In [None]:
# Below is the function to read and pre-process the images
def read_data(file):
    
    labels = []
    features = []
    # Storing the 'category' and 'image' data to a list
    category_list = list(file['category'])  # category_list contains the class names of the data
    image_list = list(file['image'])        # image_list contains the list of images for each category

    for cat, img in zip(category_list, image_list):
        labels.append(cat)  # Append the label of each category to labels list

        # Using skimage.io.imread() function to read the images 
        image = imread(img)

        # Flatten each 64x64 single-channel image into a 1-D vector of 4096 pixel values
        feature = image.reshape(64*64)
        features.append(feature)  
    
    return features, labels
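
The flattening step in `read_data` can be checked in isolation. The following is a minimal sketch that uses a randomly generated array as a stand-in for one image (`demo_image`, `flat` and `restored` are hypothetical names, not part of the experiment). It shows that a 64×64 image becomes a vector of 4096 values and that the original shape can be recovered, which is what the visualization step relies on.

```python
import numpy as np

# Hypothetical stand-in for one grayscale image read by imread()
rng = np.random.default_rng(0)
demo_image = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)

# Flatten to a 1-D feature vector, as read_data() does
flat = demo_image.reshape(64 * 64)

# Reshape back to 2-D, as done before plotting
restored = flat.reshape(64, 64)

print(flat.shape, restored.shape)
```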

In [None]:
# Call the 'read_data' function by passing the dataframe
features, labels = read_data(df)

In [None]:
# Print the length of features and labels
len(features), len(labels)

### Visualizing the Images 

In [None]:
plt.figure(figsize=(8,8))
i = 0
rows, cols = 2, 3
label = []
for X,y in zip(features, labels):
    # Plot only the first image found for each unique label
    if y not in label:
        plt.subplot(rows, cols, i+1)
        plt.imshow(X.reshape(64,64), cmap="gray")
        plt.title('class: {}'.format(str(y)))
        label.append(y)
        i += 1
        if i == rows*cols:
            break

### Split the data into train and test sets

In [None]:
X_train, X_test, Y_train, Y_test = train_test_split(features, labels, test_size=0.25, random_state=1) 

In [None]:
len(X_train), len(X_test), len(Y_train), len(Y_test)

### Apply PCA on the train data with different components

In [None]:
# Create an object for PCA 
# Fit and transform the train data using PCA
pca = PCA(n_components=2).fit(X_train)
X_train_pca = pca.transform(X_train)
X_test_pca = pca.transform(X_test)
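
To compare different numbers of components, one option is to look at how much variance each setting retains. The sketch below is only an illustration under stated assumptions: it uses random synthetic data (`demo_X`) as a stand-in for the flattened images, and the component counts 2, 10 and 30 are arbitrary choices. On the real data, `X_train` would take the place of `demo_X`.

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical stand-in for flattened images: 200 samples, 64 features
rng = np.random.default_rng(0)
demo_X = rng.normal(size=(200, 64))

explained = {}
for n in (2, 10, 30):
    demo_pca = PCA(n_components=n).fit(demo_X)
    # Fraction of the total variance kept by the first n components
    explained[n] = demo_pca.explained_variance_ratio_.sum()
    print(n, round(explained[n], 3))
```

A larger number of components always retains at least as much variance, at the cost of a higher-dimensional input for the classifier.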

### Train the model using MLP Classifier



#### ReLU Activation Function

A Rectified Linear Unit (ReLU) outputs 0 if the input is less than 0. If the input is greater than or equal to 0, the output is equal to the input; that is, f(x) = max(0, x). ReLU is often described as loosely resembling the way biological neurons work: inactive below a threshold and increasingly active above it.

![alt text](https://cdn.iiith.talentsprint.com/aiml/Experiment_related_data/RELU.png)
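
The definition above can be sketched in a few lines of NumPy. This is an illustration only (the `relu` helper and `sample` values are hypothetical); `MLPClassifier` applies its own internal implementation when `activation='relu'` is set.

```python
import numpy as np

def relu(x):
    """Element-wise ReLU: 0 for negative inputs, the input itself otherwise."""
    return np.maximum(0, x)

sample = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])
relu_out = relu(sample)
print(relu_out)
```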





In [None]:
# Create an object for MLPClassifier
clf = MLPClassifier(activation='relu')

In [None]:
# Fit the data to the model       
clf.fit(X_train_pca, Y_train)

# Get the predictions on the test data
pred = clf.predict(X_test_pca)

In [None]:
# Calculate the accuracy
accuracy_score(Y_test, pred)

### Test the model 



In [None]:
df_test = pd.read_csv('med_mnist_test.csv')

In [None]:
df_test.head()

In [None]:
test_features, test_labels = read_data(df_test)

In [None]:
len(test_features), len(test_labels)

#### Apply PCA on Test data

In [None]:
# Transform the test features using the PCA already fitted on the train data
x_test_pca = pca.transform(test_features)
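
Reusing the PCA fitted on the train data, rather than fitting a new one on the test data, keeps both sets in the same projected space. One way to make this harder to get wrong is scikit-learn's `make_pipeline`, which fits PCA only on the data passed to `fit`. The sketch below is not the method used in this experiment: it assumes synthetic data from `make_classification` instead of the images, and `n_components=5`, `max_iter=500` and all `demo_*` names are arbitrary, hypothetical choices.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline

# Hypothetical synthetic data standing in for the image features
demo_X, demo_y = make_classification(
    n_samples=300, n_features=20, n_informative=5, n_classes=3, random_state=1
)
demo_X_train, demo_X_test, demo_y_train, demo_y_test = train_test_split(
    demo_X, demo_y, test_size=0.25, random_state=1
)

# PCA is fitted on the train split only; the same projection is reused at predict time
demo_model = make_pipeline(
    PCA(n_components=5),
    MLPClassifier(activation='relu', max_iter=500, random_state=1),
)
demo_model.fit(demo_X_train, demo_y_train)

demo_score = demo_model.score(demo_X_test, demo_y_test)
print(demo_score)
```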

#### Test the Model with PCA features

In [None]:
# Get the predictions on the test data
test_pred = clf.predict(x_test_pca)

In [None]:
# Calculate the accuracy
accuracy_score(test_labels, test_pred)