In this exercise, you'll receive a dataframe that contains ground truth labels for 128 chest x-rays that can have any of the following disease labels:

- Pneumonia
- Atelectasis
- Effusion
- Infiltration
- Pneumothorax
- Cardiomegaly
- Mass
- Nodule

The final column in the dataframe is a classification algorithm's assessment of whether or not there is pneumonia in the image. This algorithm's clinical intended use is for the detection of pneumonia on chest x-rays. In this exercise, you will assess the algorithm's performance specifically in the presence of the other diseases, and determine if there are any diseases that significantly impact the algorithm's performance and that should be listed as limitations of the algorithm.



In [None]:
# This block imports essential libraries for data analysis and machine learning metrics.
# - pandas: Used for data manipulation and analysis, especially with tabular data (DataFrames).
# - numpy: Provides support for large, multi-dimensional arrays and matrices, along with mathematical functions.
# - sklearn.metrics: Contains functions to compute various machine learning model evaluation metrics, such as confusion_matrix.

import pandas as pd
import numpy as np
import sklearn.metrics

Read in labels and performance data:

In [3]:
data = pd.read_csv('labels_and_performance.csv')
data.head()

Unnamed: 0,Pneumonia,Atelectasis,Effusion,Pneumothorax,Infiltration,Cardiomegaly,Mass,Nodule,algorithm_output
0,1,1,1,0,0,0,0,0,1
1,1,1,0,0,0,1,0,0,1
2,1,0,1,0,0,0,0,0,1
3,1,1,1,0,0,1,0,0,1
4,0,1,0,0,0,0,0,0,0


First, look at the overall performance of the algorithm for the detection of pneumonia:

In [None]:
# Compute the confusion matrix for overall pneumonia detection with scikit-learn.
# sklearn.metrics.confusion_matrix(y_true, y_pred, labels=...) compares the ground
# truth pneumonia labels (y_true) with the algorithm's output (y_pred); labels=[0, 1]
# fixes the class order as negative, positive.
# .ravel() flattens the resulting 2x2 matrix into a 1D array: [tn, fp, fn, tp],
# i.e. true negatives, false positives, false negatives, and true positives.
# These counts are used to assess the overall performance of the algorithm.

tn, fp, fn, tp = sklearn.metrics.confusion_matrix(
    data.Pneumonia.values,
    data.algorithm_output.values,
    labels=[0, 1],
).ravel()

In [5]:
sens = tp/(tp+fn)
sens

np.float64(0.8166666666666667)

In [6]:
spec = tn/(tn+fp)
spec

np.float64(0.8235294117647058)
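The two ratios above follow directly from the confusion-matrix counts: sensitivity is the fraction of pneumonia-positive images the algorithm flags, and specificity is the fraction of pneumonia-negative images it leaves unflagged. As a minimal sketch on made-up labels (the helper name `sens_spec` and the toy lists are illustrative assumptions, not part of the exercise data), the same quantities can be computed without scikit-learn:

```python
def sens_spec(y_true, y_pred):
    """Return (sensitivity, specificity) for binary 0/1 labels.

    Hypothetical helper for illustration; not part of the exercise.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives

    # A ratio is undefined when its class is absent, so return NaN in that case.
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec


# Toy labels (made up): 4 positives with 1 missed, 4 negatives with 1 false alarm.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0]
toy_sens, toy_spec = sens_spec(y_true, y_pred)
print(toy_sens, toy_spec)
```

The NaN guard is a design choice for this sketch; it matters once the data is split into subgroups, where one of the two classes can be empty.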

Now, look at the algorithm's performance in the presence of the other diseases: 

In [None]:
# Assess the algorithm's performance in the presence of other diseases.
#
# For each disease (Atelectasis, Effusion, Pneumothorax, Infiltration, Cardiomegaly,
# Mass, Nodule), keep only the rows where that disease is present, then use
# sklearn.metrics.confusion_matrix to compare the ground truth pneumonia labels
# (`Pneumonia`) with the algorithm's predictions (`algorithm_output`).
# The confusion matrix gives the counts of true negatives (tn), false positives (fp),
# false negatives (fn), and true positives (tp). Sensitivity (recall) and specificity
# are computed for each disease and printed, showing how each comorbidity affects
# the algorithm's performance.

for disease in ['Atelectasis', 'Effusion', 'Pneumothorax', 'Infiltration',
                'Cardiomegaly', 'Mass', 'Nodule']:

    # Restrict to the images in which this disease is present
    subset = data[data[disease] == 1]

    tn, fp, fn, tp = sklearn.metrics.confusion_matrix(
        subset.Pneumonia.values,
        subset.algorithm_output.values,
        labels=[0, 1],
    ).ravel()
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)

    print(disease)
    print(f'Sensitivity: {sens}')
    print(f'Specificity: {spec}')
    print()

Atelectasis
Sensitivity: 0.782608695652174
Specificity: 0.8333333333333334

Effusion
Sensitivity: 0.6521739130434783
Specificity: 0.8571428571428571

Pneumothorax
Sensitivity: 0.8571428571428571
Specificity: 0.6666666666666666

Infiltration
Sensitivity: 0.3888888888888889
Specificity: 0.0

Cardiomegaly
Sensitivity: 0.8888888888888888
Specificity: 1.0

Mass
Sensitivity: 0.9285714285714286
Specificity: 0.8666666666666667

Nodule
Sensitivity: 1.0
Specificity: 0.5384615384615384
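
Each subgroup contains only the images in which that disease is present, so some of the ratios above may rest on a small number of cases. The following is a sketch, assuming a dataframe with the same column layout as `data`, that reports the number of pneumonia-positive and pneumonia-negative images next to each ratio. The helper `subgroup_report` and the toy frame `toy` are hypothetical names with made-up values, used here only for illustration:

```python
import pandas as pd


def subgroup_report(df, diseases, truth_col="Pneumonia", pred_col="algorithm_output"):
    """Per-disease sensitivity/specificity with the case counts behind them.

    Hypothetical helper for illustration; not part of the exercise.
    """
    rows = []
    for disease in diseases:
        sub = df[df[disease] == 1]  # images where this disease is present
        truth, pred = sub[truth_col], sub[pred_col]
        tp = int(((truth == 1) & (pred == 1)).sum())
        fn = int(((truth == 1) & (pred == 0)).sum())
        tn = int(((truth == 0) & (pred == 0)).sum())
        fp = int(((truth == 0) & (pred == 1)).sum())
        n_pos, n_neg = tp + fn, tn + fp
        rows.append({
            "disease": disease,
            "n_pos": n_pos,  # pneumonia-positive images in the subgroup
            "n_neg": n_neg,  # pneumonia-negative images in the subgroup
            "sensitivity": tp / n_pos if n_pos else float("nan"),
            "specificity": tn / n_neg if n_neg else float("nan"),
        })
    return pd.DataFrame(rows).set_index("disease")


# Toy data (made up), not the exercise's labels_and_performance.csv
toy = pd.DataFrame({
    "Pneumonia":        [1, 1, 0, 0, 1, 0],
    "Effusion":         [1, 1, 1, 0, 0, 0],
    "Mass":             [0, 0, 0, 1, 1, 1],
    "algorithm_output": [1, 0, 0, 1, 1, 0],
})
report = subgroup_report(toy, ["Effusion", "Mass"])
print(report)
```

On the exercise data, the call would be `subgroup_report(data, [...])` with the seven disease columns; a ratio backed by only a handful of images is better read as a flag for further testing than as a precise estimate.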



### Statement on algorithmic limitations:

The results above indicate that the presence of infiltration in a chest x-ray is a limitation of this algorithm: in images with infiltration, sensitivity drops to 0.39 and specificity to 0.0, compared with 0.82 and 0.82 overall, so the algorithm performs very poorly at detecting pneumonia in this setting. The presence of nodules and pneumothorax mainly reduces specificity (0.54 and 0.67, respectively) and may increase the number of false positive pneumonia classifications, while the presence of effusion mainly reduces sensitivity (0.65) and may reduce the ability to detect pneumonia.