## 1) What is Gleason score?
The most common scale used to evaluate the grade of prostate cancer cells is the Gleason score. It is the sum of two numbers, one for each of the two most prevalent growth patterns in the sample (written as primary + secondary, for example 3 + 4), and can range from 2 (nonaggressive cancer) to 10 (very aggressive cancer), though the lower part of the range isn't used as often.
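
As a small illustration of how the two numbers combine, the sketch below parses a score written in the `'3+4'` style used by the `gleason_score` column later in this notebook and returns the total. The function name `gleason_total` is a hypothetical helper, not part of the dataset or any library.

```python
def gleason_total(score: str) -> int:
    """Return the sum of the primary and secondary Gleason patterns.

    `score` is a string such as '3+4' (primary pattern 3, secondary pattern 4).
    """
    primary, secondary = (int(part) for part in score.split("+"))
    return primary + secondary


print(gleason_total("3+4"))  # 7
print(gleason_total("4+3"))  # 7
print(gleason_total("5+5"))  # 10
```

Note that `'3+4'` and `'4+3'` give the same total, which is why the ISUP rule below treats them separately.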

## 2) What is ISUP grade?
According to current guidelines by the International Society of Urological Pathology (ISUP), the Gleason scores are summarized into an ISUP grade on a scale from 1 to 5 according to the following rule:

Gleason score 6 = ISUP grade 1   
Gleason score 7 (3 + 4) = ISUP grade 2    
Gleason score 7 (4 + 3) = ISUP grade 3    
Gleason score 8 = ISUP grade 4    
Gleason score 9-10 = ISUP grade 5    
If there is no cancer in the sample, we use the label ISUP grade 0 in this competition.
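
The rule above can be sketched as a small function. This is only an illustration of the listed mapping: the name `gleason_to_isup` is hypothetical, `(0, 0)` is assumed to stand for a sample without cancer, and any combination the rule does not list raises an error rather than guessing a grade.

```python
def gleason_to_isup(primary: int, secondary: int) -> int:
    """Map a Gleason pattern pair to an ISUP grade, following the rule listed above."""
    total = primary + secondary
    if (primary, secondary) == (0, 0):
        return 0  # assumed encoding for "no cancer in the sample"
    if total == 6:
        return 1
    if (primary, secondary) == (3, 4):
        return 2
    if (primary, secondary) == (4, 3):
        return 3
    if total == 8:
        return 4
    if total in (9, 10):
        return 5
    # Combinations not covered by the rule are rejected rather than guessed.
    raise ValueError(f"No ISUP grade listed for Gleason {primary}+{secondary}")


print(gleason_to_isup(3, 4))  # 2
print(gleason_to_isup(4, 3))  # 3
print(gleason_to_isup(4, 5))  # 5
```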

![](https://storage.googleapis.com/kaggle-media/competitions/PANDA/Screen%20Shot%202020-04-08%20at%202.03.53%20PM.png)

### **What is the .tiff format and why is it used?**

* Tagged Image File Format (TIFF) is a variable-resolution bitmapped image format developed by Aldus in 1986. TIFF is very common for transporting color or gray-scale images into page layout applications, but is less suited to delivering web content.

### **Reasons for Usage:**
* TIFF files are large and of very high quality. Baseline TIFF images are highly portable; most graphics, desktop publishing, and word processing applications understand them.

* The TIFF specification is readily extensible, though this comes at the price of some of its portability. Many applications incorporate their own extensions, but a number of application-independent extensions are recognized by most programs.

### **Import Useful Libraries.**

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns
from PIL import Image
import openslide
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

### **Base Folder Path of Dataset.**

In [None]:
BASE_FOLDER = "/kaggle/input/prostate-cancer-grade-assessment/"
!ls {BASE_FOLDER}

### **Read all CSV files & set image folders of Dataset.**

In [None]:
IMG_FOLDER = BASE_FOLDER + 'train_images/'
MASK_FOLDER = BASE_FOLDER + 'train_label_masks/'
train = pd.read_csv(BASE_FOLDER+"train.csv")
test = pd.read_csv(BASE_FOLDER+"test.csv")
sub = pd.read_csv(BASE_FOLDER+"sample_submission.csv")

### **Information of Train Dataset**

In [None]:
train.info()

### **First Five Rows of Training Dataset.**

In [None]:
train.head()

### **Test Dataset**

In [None]:
test.head()

### **Submission Dataset.**

In [None]:
sub.head()

### **Different Data Providers for PANDA Challenge.**

In [None]:
train['data_provider'].value_counts()

### **Function for count Plot.**

In [None]:
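def plot_count(data, order=None):
    """Draw a count plot of `data` and label each bar with its share in percent."""
    fig = plt.figure(figsize=(10,6))
    # Pass the Series by keyword (x=...) so the call works across seaborn versions.
    ax = sns.countplot(x=data, order=order)
    for p in ax.patches:
        # Place the percentage label just above the centre of each bar.
        ax.annotate('{:.2f}%'.format(p.get_height()*100/len(data)), (p.get_x() + p.get_width()/2, p.get_height()+20), ha='center', va='bottom')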

### **Count Plot for Data Providers.**

In [None]:
plot_count(train['data_provider'])
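
The percentages annotated on the bars can also be obtained as a table. The sketch below uses a toy Series with made-up provider labels in place of `train['data_provider']`, so the numbers are purely illustrative.

```python
import pandas as pd

# Toy stand-in for train['data_provider']; labels and counts are made up.
providers = pd.Series(
    ["provider_a", "provider_b", "provider_b", "provider_a", "provider_b"]
)

# normalize=True returns fractions instead of raw counts.
percentages = (providers.value_counts(normalize=True) * 100).round(2)
print(percentages)
```

On the real data, the same expression applied to `train['data_provider']` should match the labels drawn by `plot_count`.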

### **Number of counts for different ISUP Grade.**

In [None]:
train['isup_grade'].value_counts()

### **Count Plot for ISUP Grade.**

In [None]:
plot_count(train['isup_grade'])

![](https://www.pcf.org/wp-content/uploads/2020/10/Gleason_745x510-676x373.jpg)

### **Number of counts for different Gleason Score.**

In [None]:
train['gleason_score'].value_counts()

### **Count Plot for Gleason Score.**

In [None]:
gleason_order = ['negative', '0+0', '3+3', '3+4', '4+3', '3+5', '4+4', '5+3', '4+5', '5+4', '5+5']
plot_count(train['gleason_score'], order=gleason_order)

### **Checking which data provider uses the 'negative' Gleason score.**

In [None]:
train[train.gleason_score == 'negative']['data_provider'].value_counts()

### **Checking which data provider uses the '0+0' Gleason score.**

In [None]:
train[train.gleason_score == '0+0']['data_provider'].value_counts()

### **Changing the negative Gleason Score to '0+0'.**

In [None]:
train['gleason_score'] = train['gleason_score'].apply(lambda x : '0+0' if x == 'negative' else x)

### **Function for count plot with hue.**

In [None]:
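def plot_count_with_hue(data, hue, order=None):
    """Draw a count plot of `data` split by `hue`, labelling each bar with its share in percent."""
    fig = plt.figure(figsize=(16,6))
    ax = sns.countplot(x=data, hue=hue, order=order)
    for p in ax.patches:
        height = p.get_height()
        # Skip empty bars (combinations with no rows), which would otherwise be labelled 0% or nan%.
        if not height > 0:
            continue
        ax.annotate('{:.2f}%'.format(height*100/len(data)), (p.get_x() + p.get_width()/2, height+20), ha='center', va='bottom')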

### **Count Plot for Gleason Score by ISUP Grade.**

In [None]:
gleason_order = ['0+0', '3+3', '3+4', '4+3', '3+5', '4+4', '5+3', '4+5', '5+4', '5+5']
plot_count_with_hue(train['gleason_score'], train['isup_grade'], order=gleason_order)

### **Mislabelled rows for Gleason Score '4+3'.**

In [None]:
train[train.gleason_score == '4+3']['isup_grade'].value_counts()

### **Checking which row is mislabelled.**

In [None]:
train[(train.gleason_score == '4+3') & (train.isup_grade == 2)]
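
The same kind of check can be run for every Gleason score at once by comparing each row against the ISUP rule from the introduction. The sketch below uses a toy DataFrame with made-up rows; `EXPECTED_ISUP` is a hypothetical lookup table built from that rule, with `'negative'` and `'0+0'` both assumed to mean grade 0.

```python
import pandas as pd

# Hypothetical lookup built from the ISUP rule in the introduction.
EXPECTED_ISUP = {
    "negative": 0, "0+0": 0,
    "3+3": 1, "3+4": 2, "4+3": 3,
    "3+5": 4, "4+4": 4, "5+3": 4,
    "4+5": 5, "5+4": 5, "5+5": 5,
}

# Toy rows standing in for `train`; row "c" is deliberately inconsistent.
toy = pd.DataFrame({
    "image_id": ["a", "b", "c", "d"],
    "gleason_score": ["3+4", "4+3", "4+3", "negative"],
    "isup_grade": [2, 3, 2, 0],
})

# Rows whose recorded grade differs from the grade implied by the Gleason score.
# Scores missing from the lookup map to NaN and are therefore flagged as well.
expected = toy["gleason_score"].map(EXPECTED_ISUP)
mismatches = toy[expected != toy["isup_grade"]]
print(mismatches)
```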

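### **Drop the mislabelled row from the training dataset.**

In [None]:
# Select the row by condition instead of a hard-coded index, so the cell can be re-run safely.
train.drop(train[(train.gleason_score == '4+3') & (train.isup_grade == 2)].index, inplace=True)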

### **Plot the Count Plot for Gleason Score again.**

In [None]:
plot_count_with_hue(train['gleason_score'], train['isup_grade'], order=gleason_order)

### **Count plot for ISUP Grade for different data providers.**

In [None]:
plot_count_with_hue(train['isup_grade'], train['data_provider'])

### **Count plot for Gleason score for different data providers.**

In [None]:
plot_count_with_hue(train['gleason_score'], train['data_provider'], order=gleason_order)

In [None]:
# Number of unique image IDs in the training set.
train['image_id'].nunique()

### **Open Train Image using OpenSlide.**

In [None]:
img = openslide.OpenSlide(IMG_FOLDER + train.loc[0, 'image_id'] + '.tiff')

In [None]:
# The number of levels in the slide. Levels are numbered from 0 (highest resolution) 
# to level_count - 1 (lowest resolution).
img.level_count

In [None]:
# A (width, height) tuple for level 0 of the slide.
img.dimensions

In [None]:
# A list of (width, height) tuples, one for each level of the slide. level_dimensions[k]
# are the dimensions of level k.
img.level_dimensions

In [None]:
img.level_downsamples
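
`level_downsamples` gives, for each level, the factor by which that level is shrunk relative to level 0. A rough way to see where the numbers come from is to divide the level-0 width by each level's width. In the sketch below only the level-1 size (6912 x 7360) is taken from this notebook; the level-0 and level-2 sizes are assumed values chosen for illustration.

```python
# (width, height) per level. Level 1 matches the size used in this notebook;
# levels 0 and 2 are assumed values for illustration only.
level_dimensions = [(27648, 29440), (6912, 7360), (1728, 1840)]

base_width, _ = level_dimensions[0]

# Downsample factor of each level relative to level 0, estimated from the widths.
downsamples = [base_width / width for width, _ in level_dimensions]
print(downsamples)  # [1.0, 4.0, 16.0]
```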

### **Plot the Train Image.**

In [None]:
# location, level, size
img.read_region((0,0), 1, (6912, 7360))

### **Plot the top-left quarter of the previous Image.**

In [None]:
img.read_region((0,0), 1, (6912//2, 7360//2))
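
In `read_region`, the location is given in level-0 pixels while the size is given in pixels of the requested level, so reading any quarter other than the top-left one needs the offset scaled by the level's downsample factor. The helper below is a hypothetical sketch of that arithmetic; the downsample factor of 4.0 for level 1 is an assumption and should be read from `img.level_downsamples` in practice.

```python
def quadrant_region(level_size, downsample, row, col):
    """Return (location, size) for one quadrant of a pyramid level.

    level_size: (width, height) of the chosen level, in that level's pixels.
    downsample: factor between level 0 and the chosen level.
    row, col:   0 or 1, selecting top/bottom and left/right.
    """
    width, height = level_size
    size = (width // 2, height // 2)
    # Offsets are computed in level pixels, then scaled to level-0 pixels,
    # because read_region expects the location in level-0 coordinates.
    location = (int(col * size[0] * downsample), int(row * size[1] * downsample))
    return location, size


location, size = quadrant_region((6912, 7360), 4.0, row=1, col=1)
print(location, size)  # (13824, 14720) (3456, 3680)

# Hypothetical use with the slide opened earlier:
# img.read_region(location, 1, size)
```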

In [None]:
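# Collect the dimensions and downsample factor of every pyramid level for each slide.
# Column prefixes 1-3 correspond to OpenSlide levels 0-2.
img_size = pd.DataFrame(columns=['image_id', '1_width', '1_height', '2_width', '2_height', '3_width', '3_height', 'level1', 'level2', 'level3'])
n_images = len(train['image_id'])
for i, image_id in enumerate(train['image_id'], start=1):
    slide = openslide.OpenSlide(IMG_FOLDER + image_id + '.tiff')

    data = [image_id]
    for width, height in slide.level_dimensions:
        data.extend([width, height])
    data.extend(slide.level_downsamples)
    slide.close()

    # Append one row per slide; this assumes every slide has exactly three levels.
    img_size.loc[len(img_size)] = data

    # Report progress every 1000 slides.
    if i % 1000 == 0:
        print("Done ", i, '/', n_images)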

In [None]:
img_size.head()

In [None]:
img_size.describe()

In [None]:
img_size.info()

In [None]:
img_size[['1_width', '1_height', '2_width', '2_height', '3_width', '3_height']] = img_size[['1_width', '1_height', '2_width', '2_height', '3_width', '3_height']].astype('int')
img_size.info()

In [None]:
img_size.describe()

In [None]:
plt.figure(figsize=(12,6))
ax = plt.gca()
sns.kdeplot(img_size['1_width'],fill=True ,ax=ax, color='#83acf7', label='Width')
sns.kdeplot(img_size['1_height'], fill=True,ax=ax, color='#f7e68f', label='Height')
ax.legend()
plt.xlabel("Width/Height")

In [None]:
# Pass x and y by keyword; recent seaborn versions do not accept them positionally.
sns.scatterplot(x=img_size['1_width'], y=img_size['1_height'])

In [None]:
image_indexes = [
    '07a7ef0ba3bb0d6564a73f4f3e1c2293',
    '037504061b9fba71ef6e24c48c6df44d',
    '035b1edd3d1aeeffc77ce5d248a01a53',
    '059cbf902c5e42972587c8d17d49efed',
    '06a0cbd8fd6320ef1aa6f19342af2e68',
    '06eda4a6faca84e84a781fee2d5f47e1',
    '0a4b7a7499ed55c71033cefb0765e93d',
    '0838c82917cd9af681df249264d2769c',
    '046b35ae95374bfb48cdca8d7c83233f',
    '074c3e01525681a275a42282cd21cbde',
    '05abe25c883d508ecc15b6e857e59f32',
    '05f4e9415af9fdabc19109c980daf5ad',
    '060121a06476ef401d8a21d6567dee6d',
    '068b0e3be4c35ea983f77accf8351cc8',
    '08f055372c7b8a7e1df97c6586542ac8'
]

fig, ax = plt.subplots(5, 3, figsize=(20, 20))

for i, image_idx in enumerate(image_indexes):
    slide = openslide.OpenSlide(IMG_FOLDER + image_idx + '.tiff')
    # Read a 256x256 patch at level 0; (1780, 1950) is its top-left corner in level-0 pixels.
    patch = slide.read_region((1780, 1950), 0, (256, 256))
    slide.close()
    ax[i//3, i%3].imshow(patch)
    ax[i//3, i%3].axis('off')
plt.show()
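
The loop above reads the same 256 x 256 window at (1780, 1950) from every slide. Since slide dimensions vary, it may be worth checking that the window lies inside each slide's level-0 size before reading it. The helper below is a hypothetical sketch of that check, and the slide sizes in the example are illustrative rather than taken from the dataset.

```python
def window_fits(slide_size, location, size):
    """Return True if a window lies fully inside a slide.

    slide_size: (width, height) of level 0.
    location:   (x, y) top-left corner of the window in level-0 pixels.
    size:       (width, height) of the window in level-0 pixels.
    """
    width, height = slide_size
    x, y = location
    win_w, win_h = size
    return x >= 0 and y >= 0 and x + win_w <= width and y + win_h <= height


print(window_fits((27648, 29440), (1780, 1950), (256, 256)))  # True
print(window_fits((1800, 1900), (1780, 1950), (256, 256)))    # False
```

With a real slide, `slide.dimensions` would supply the `slide_size` argument.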