#      👨‍⚕️ _OSIC Pulmonary Fibrosis Progression_ 👩‍⚕️ 

![](https://medicaldialogues.in/h-upload/2020/05/18/128958-idiopathic-pulmonary-fibrosis.jpg)

# <font color='red'>1. Introduction</font> 👨🏻‍💻
## _1.1 What is Pulmonary Fibrosis ?_

Pulmonary fibrosis is a lung disease that occurs when lung tissue becomes damaged and scarred. This thickened, stiff tissue makes it more difficult for your lungs to work properly. As pulmonary fibrosis worsens, you become progressively more short of breath.

## _1.2 About OSIC :_ 
***Open Source Imaging Consortium*** (OSIC) is a not-for-profit, co-operative effort between academia, industry and philanthropy. The group enables rapid advances in the fight against Idiopathic Pulmonary Fibrosis (IPF), fibrosing interstitial lung diseases (ILDs), and other respiratory diseases, including emphysematous conditions.

## _1.3 Competition Objective :_

In this competition, you’ll predict a patient’s severity of decline in lung function based on a CT scan of their lungs. You’ll determine lung function based on output from a spirometer, which measures the volume of air inhaled and exhaled. The challenge is to use machine learning techniques to make a prediction with the image, metadata, and baseline FVC as input.

## _1.4 Evaluation Metric :_
**Laplace Log Likelihood (modified version)**: useful to evaluate a model's confidence in its decisions. Accordingly, the metric is designed to reflect both the accuracy and certainty of each prediction.


The error is thresholded at 1000 ml to avoid large errors adversely penalizing results, while the confidence values are clipped at 70 ml to reflect the approximate measurement uncertainty in FVC. 
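As a rough illustration, the metric can be sketched in a few lines of NumPy. This assumes the definition given on the competition's evaluation page (clip the confidence at 70 ml, cap the absolute error at 1000 ml, then average the per-prediction scores); the function name `laplace_log_likelihood` is my own and not part of any library.

```python
import numpy as np

def laplace_log_likelihood(fvc_true, fvc_pred, sigma):
    """Modified Laplace log likelihood, averaged over all predictions."""
    fvc_true = np.asarray(fvc_true, dtype=float)
    fvc_pred = np.asarray(fvc_pred, dtype=float)
    sigma = np.asarray(sigma, dtype=float)

    # Confidence values below 70 ml are raised to 70 ml
    sigma_clipped = np.maximum(sigma, 70.0)
    # Absolute errors above 1000 ml are capped at 1000 ml
    delta = np.minimum(np.abs(fvc_true - fvc_pred), 1000.0)

    scores = -np.sqrt(2.0) * delta / sigma_clipped - np.log(np.sqrt(2.0) * sigma_clipped)
    return float(np.mean(scores))

# A perfect prediction with the smallest allowed confidence gives the best possible score
best = laplace_log_likelihood([2500], [2500], [70])
```

Even the best possible score (roughly -4.6) is negative, and reporting a confidence below 70 ml brings no extra gain.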

<font color='red'>Note : The metric values will be negative and higher is better.</font>

## _1.5 Importing relevant packages_ 📦

In [None]:
import os
import re
import gc
import copy
from glob import glob

import cv2
import pydicom  # for DICOM images
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from PIL import Image
from IPython.display import display_html
from scipy.stats import pearsonr

import plotly
import plotly.express as px
import plotly.graph_objects as go
import plotly.figure_factory as ff
from plotly.subplots import make_subplots
from plotly.offline import init_notebook_mode, iplot
init_notebook_mode(connected=True)

# Segmentation
import scipy.ndimage
from mpl_toolkits.mplot3d.art3d import Poly3DCollection
from skimage import morphology
from skimage import measure
from skimage.transform import resize
from sklearn.cluster import KMeans

import warnings
warnings.filterwarnings("ignore")

# Set Color Palettes for the notebook
custom_colors = ['#74a09e','#86c1b2','#98e2c6','#f3c969','#f2a553', '#d96548', '#c14953']

## _1.6 Importing the train and test set ..._ 🧪

In [None]:
train = pd.read_csv('../input/osic-pulmonary-fibrosis-progression/train.csv')
train.head()

In [None]:
test = pd.read_csv('../input/osic-pulmonary-fibrosis-progression/test.csv')
test.head()

#### **We see that there are 7 features in both the train and test sets. Let's look at what each feature means:**
- **Patient** - a unique Id for each patient (also the name of the patient's DICOM folder)
- **Weeks** - the relative number of weeks pre/post the baseline CT (may be negative)
- **FVC** - the recorded lung capacity in ml (Forced vital capacity)
- **Percent** - a computed field which approximates the patient's FVC as a percent of the typical FVC for a person of similar characteristics
- **Age** - Age of person
- **Sex** - Sex of person (Male/Female)
- **SmokingStatus** - Whether the patient is a smoker/non-smoker/ex-smoker
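Since `Weeks` is measured relative to the baseline CT and the competition uses a baseline FVC as input, it can help to attach each patient's first measurement to all of their records. The sketch below shows one common way to do that; the toy frame and the column names `BaseWeek`, `BaseFVC` and `WeeksFromBase` are made up for illustration and are not part of the dataset.

```python
import pandas as pd

# Toy records with made-up patient IDs and values (not taken from train.csv)
toy = pd.DataFrame({
    "Patient": ["A", "A", "A", "B", "B"],
    "Weeks":   [-4, 5, 7, 0, 12],
    "FVC":     [2300, 2250, 2100, 3100, 3050],
})

# The earliest visit of each patient serves as the baseline measurement
first_idx = toy.groupby("Patient")["Weeks"].idxmin()
baseline = (toy.loc[first_idx, ["Patient", "Weeks", "FVC"]]
               .rename(columns={"Weeks": "BaseWeek", "FVC": "BaseFVC"}))

# Attach the baseline to every record and measure time from it
toy = toy.merge(baseline, on="Patient", how="left")
toy["WeeksFromBase"] = toy["Weeks"] - toy["BaseWeek"]
```

Treating the earliest visit as the baseline is an assumption of this sketch; other choices (for example the visit closest to week 0) are equally possible.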

### _1.6.1 Dimensions of our dataset_

In [None]:
print("Shape of training set : {}".format(train.shape))
print("Shape of testing set : {}".format(test.shape))

# _<font color='red'>2. Exploratory Data Analysis (EDA)</font>_ 📊

In [None]:
train.info()

The above block tells us that there are:
- 1 float64 feature.
- 3 int64 features.
- 3 object type features.

**It also tells us that there is no missing data in the train set, as the non-null count of every feature equals the number of rows in the dataset.** 

### But just to make sure, let's check if there is any missing data...

In [None]:
print("*** Train set ***")
print(train.isnull().sum())
print("--------------")
print("*** Test set ***")
print(test.isnull().sum())

**Thus, there is no missing data in either the train or the test set.**

### _2.1 Descriptive Statistics_

In [None]:
train.describe()

### _2.2 No. of unique patients, Age and Smoking Status_

In [None]:
print("Q. How many patients are present in train set ?")
print("A.",train['Patient'].nunique())
print("------------")
print("Q. How many unique ages are present in train set ?")
print("A.",train['Age'].nunique())
print("------------")
print("Q. How many smoking statuses are present in train set ?")
print("A.",train['SmokingStatus'].nunique())

### _2.3 How 'FVC', 'Percent' and 'Age' are distributed_

In [None]:
fig, axes = plt.subplots(1, 3, figsize=(17,5))
fig.suptitle('Distribution of different features')

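# sns.distplot is deprecated; histplot with kde=True draws the same histogram + density curve
sns.histplot(train['FVC'], kde=True, stat='density', color='blue', ax=axes[0])
sns.histplot(train['Percent'], kde=True, stat='density', color='orange', ax=axes[1])
sns.histplot(train['Age'], kde=True, stat='density', color='green', ax=axes[2])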

plt.show()

**It is good to see that all 3 features are roughly normally distributed, with a little skewness.**
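The amount of skewness can also be checked numerically instead of by eye. The snippet below is a small sketch using `pandas.Series.skew()` on made-up numbers; the values are not taken from the train set.

```python
import pandas as pd

# Made-up values with one large outlier, which produces a long right tail
values = pd.Series([2000, 2100, 2200, 2300, 2400, 6000])
skewness = values.skew()   # > 0: right tail, < 0: left tail, close to 0: symmetric

# A perfectly symmetric sample has zero skewness
symmetric = pd.Series([1, 2, 3, 4, 5]).skew()
```

On the real data the same check would be `train[['FVC', 'Percent', 'Age']].skew()`.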

### _2.4 Let's now have a look at the count and percentage of each - 'Sex' and 'SmokingStatus'_

In [None]:
# Count of unique entities in both 'Sex' and 'SmokingStatus'

# Look up counts by label rather than by position, so the output
# does not depend on the sort order of value_counts()
sex_count = train['Sex'].value_counts()
print('*** Sex Count ***\n')
print("No. of records for males in train set : {}".format(sex_count['Male']))
print("No. of records for females in train set : {}".format(sex_count['Female']))

smoker_count = train['SmokingStatus'].value_counts()
print("\n*** Smoker's Count ***\n")
print("No. of records for ex-smokers in train set : {}".format(smoker_count['Ex-smoker']))
print("No. of records for non-smokers in train set : {}".format(smoker_count['Never smoked']))
print("No. of records for current smokers in train set : {}".format(smoker_count['Currently smokes']))

In [None]:
fig, axes = plt.subplots(1, 2, figsize=(15,5))
fig.suptitle("Count Plots")
sns.countplot(x='Sex', data=train, ax=axes[0])
sns.countplot(x='SmokingStatus', data=train, ax=axes[1])
plt.show()

In [None]:
# Percentage of unique entities in both 'Sex' and 'SmokingStatus'

sex_count = train['Sex'].value_counts()
print('*** Sex Percentage ***\n')
print("Percentage of males in train set : {:.2f}%".format((sex_count['Male'] / sex_count.sum()) * 100))
print("Percentage of females in train set : {:.2f}%".format((sex_count['Female'] / sex_count.sum()) * 100))

smoker_count = train['SmokingStatus'].value_counts()
print("\n*** Smoker's Percentage ***\n")
print("Percentage of ex-smokers in train set : {:.2f}%".format((smoker_count['Ex-smoker'] / smoker_count.sum()) * 100))
print("Percentage of non-smokers in train set : {:.2f}%".format((smoker_count['Never smoked'] / smoker_count.sum()) * 100))
print("Percentage of current smokers in train set : {:.2f}%".format((smoker_count['Currently smokes'] / smoker_count.sum()) * 100))

In [None]:
labels1 = ['Male', 'Female']
values1 = [sex_count['Male'], sex_count['Female']]
labels2 = ['Ex-smokers', 'Never Smoked', 'Current Smokers']
values2 = [smoker_count['Ex-smoker'], smoker_count['Never smoked'], smoker_count['Currently smokes']]

fig1 = go.Figure(data=[go.Pie(labels=labels1, values=values1)])
fig1.update_layout(
    title={
        'text': "Percentage of Males and Females",
        'y':0.9,
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'})

fig2 = go.Figure(data=[go.Pie(labels=labels2, values=values2)])
fig2.update_layout(
    title={
        'text': "Percentage of smokers, non-smokers and ex-smokers",
        'y':0.9,
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'})

fig1.show()
fig2.show()

## _2.5 It's time now to get a bit deeper into the feature 'FVC' (Forced Vital Capacity) as it is the main feature of this competition_

<font color='blue'>Question 1.</font> <font color='red'>So What is FVC ?</font><br><br>
<font color='blue'>Answer 1.</font> <font color='red'> Forced vital capacity (FVC) is the amount of air that can be forcibly exhaled from your lungs after taking the deepest breath possible, as measured by spirometry. This test may help distinguish obstructive lung diseases, such as asthma and COPD, from restrictive lung diseases, such as pulmonary fibrosis and sarcoidosis.</font>

<font color='red'>FVC can also help doctors assess the progression of lung disease and evaluate the effectiveness of treatment. An abnormal FVC value may be chronic, but sometimes the problem is reversible and the FVC can be corrected.</font><br><br>

<font color='blue'>Question 2.</font> <font color='red'>What is the normal FVC range in males and females ?</font><br><br>
<font color='blue'>Answer 2.</font> <font color='red'>  Normal values in healthy males aged 20-60 range from 3500 to 4500 ml, and normal values for females aged 20-60 range from 2500 to 3500 ml.</font>


### _2.5.1 First, we will have a look at the regplot of 'FVC' just to get a fair idea of how FVC trends over time for first 6 patients_

In [None]:
print(train.Patient.unique()[:6])

In [None]:
patient1_df = train[train['Patient']=='ID00007637202177411956430']
patient2_df = train[train['Patient']=='ID00009637202177434476278']
patient3_df = train[train['Patient']=='ID00010637202177584971671']
patient4_df = train[train['Patient']=='ID00011637202177653955184']
patient5_df = train[train['Patient']=='ID00012637202177665765362']
patient6_df = train[train['Patient']=='ID00014637202177757139317']

In [None]:
fig, axes = plt.subplots(3, 2, figsize = (15,12))
fig.suptitle("Trends of FVC over time for 6 different patients")


# Pass column names as keywords; recent seaborn versions no longer accept x and y positionally
sns.regplot(x='Weeks', y='FVC', 
            data=patient1_df, ax=axes[0,0]).set_title("Patient 1")
sns.regplot(x='Weeks', y='FVC', 
            data=patient2_df, ax=axes[0,1]).set_title("Patient 2")
sns.regplot(x='Weeks', y='FVC', 
            data=patient3_df, ax=axes[1,0]).set_title("Patient 3")
sns.regplot(x='Weeks', y='FVC', 
            data=patient4_df, ax=axes[1,1]).set_title("Patient 4")
sns.regplot(x='Weeks', y='FVC', 
            data=patient5_df, ax=axes[2,0]).set_title("Patient 5")
sns.regplot(x='Weeks', y='FVC', 
            data=patient6_df, ax=axes[2,1]).set_title("Patient 6")

plt.show()

### It is clear from the plots that the general trend of FVC is downwards, i.e. it decreases over time. This is *bad*: FVC is the amount of air that can be forcibly exhaled after taking the deepest breath possible, so a falling FVC indicates a deteriorating condition of the lungs and, ultimately, of the patient.
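One way to put a number on this downward trend is to fit a straight line to each patient's measurements and look at its slope (ml of FVC per week). The sketch below does this with `np.polyfit`; the helper name `fvc_slope_per_patient` and the toy values are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

def fvc_slope_per_patient(df):
    """Least-squares slope of FVC against Weeks (ml per week) for each patient."""
    slopes = {}
    for patient, grp in df.groupby("Patient"):
        # np.polyfit with deg=1 returns [slope, intercept]
        slope, _intercept = np.polyfit(grp["Weeks"], grp["FVC"], deg=1)
        slopes[patient] = slope
    return pd.Series(slopes, name="FVC_slope")

# Made-up example: patient A loses 10 ml per week, patient B stays constant
toy = pd.DataFrame({
    "Patient": ["A"] * 3 + ["B"] * 3,
    "Weeks":   [0, 10, 20, 0, 10, 20],
    "FVC":     [3000, 2900, 2800, 2500, 2500, 2500],
})
slopes = fvc_slope_per_patient(toy)
```

Applied to the real data, `fvc_slope_per_patient(train)` would return one slope per patient, with negative values indicating decline. Each patient needs at least two distinct weeks for the fit to be meaningful.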

### _<font color='gray'> Let's now have a look at relation of 'FVC' with other features as well...</font>_

### _2.5.2 First we will analyze 'FVC' and 'SmokingStatus'_

In [None]:
# Segregate train set according to Smoking status into 3 dataframes
ex_smoker_df = train[train['SmokingStatus'] == 'Ex-smoker']
non_smoker_df = train[train['SmokingStatus'] == 'Never smoked']
current_smoker_df = train[train['SmokingStatus'] == 'Currently smokes']

In [None]:
import plotly.express as px
fig = px.histogram(train, x="FVC",
                   title='How FVC is distributed among each Smoker type',
                   opacity=0.7,
                   color='SmokingStatus'
                  )
fig.show()

Most records in the train set belong to ex-smokers, so in this dataset the disease appears far more often among ex-smokers than in the other two categories.

**Let's have a look at the KDE plots to get a clearer picture**

In [None]:
fig_dims = (17, 8)
fig, ax = plt.subplots(figsize=fig_dims)
x1 = ex_smoker_df['FVC']
x2 = non_smoker_df['FVC']
x3 = current_smoker_df['FVC']
# `shade` is deprecated in recent seaborn versions; `fill` is its replacement
sns.kdeplot(x1, label="Ex-Smoker", fill=True, ax=ax)
sns.kdeplot(x2, label="Non-Smoker", fill=True, ax=ax)
sns.kdeplot(x3, label="Current Smoker", fill=True, ax=ax)
plt.legend();

The curves for all three smoking categories are roughly bell-shaped, each with some skewness.

- For patients who currently smoke, the density peaks at an FVC of approx. 3000 ml. This is a bit concerning, as a healthy male should have an FVC of 3500 - 4500 ml.
- Patients who never smoked have a much flatter curve, indicating that their FVC varies over a wide range, from about 1000 to 4000 ml.
- For ex-smokers, most values fall in a range of approx. 2000-3500 ml. This is a concern, as these patients have already damaged their lungs by smoking and their FVC is also low. The right skew of the curve shows that a few patients have much higher FVC values (~5500-7000 ml).
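The ranges read off the KDE curves can be cross-checked with a per-group summary table. The following is a minimal sketch on made-up numbers; only the column names `SmokingStatus` and `FVC` and the three category labels come from the dataset.

```python
import pandas as pd

# Made-up FVC values for the three smoking categories
toy = pd.DataFrame({
    "SmokingStatus": ["Ex-smoker", "Ex-smoker", "Never smoked",
                      "Never smoked", "Currently smokes"],
    "FVC": [2000, 3000, 1500, 3500, 3000],
})

# One row per category with record count, median and range of FVC
summary = toy.groupby("SmokingStatus")["FVC"].agg(["count", "median", "min", "max"])
```

Replacing `toy` with `train` gives the corresponding table for the real data.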

### _2.5.3 Let's see if we can find anything between 'FVC' and 'Sex'_

In [None]:
male_df = train[train['Sex'] == 'Male']
female_df = train[train['Sex'] == 'Female']

In [None]:
fig, axes = plt.subplots(1, 2, figsize=(15,5))
fig.suptitle("'FVC' v/s 'Sex'")
sns.swarmplot(x="Sex", y="FVC", data=train, ax=axes[0])
sns.violinplot(x="Sex", y="FVC", data=train, ax=axes[1])
plt.show()

### FVC values are higher in males than in females. This is expected, because the normal FVC range for females is lower than for males.

###  _2.5.4 What about 'FVC' and 'Age'???_

In [None]:
fig, axes = plt.subplots(1, 2, figsize=(16,8))
fig.suptitle("'FVC' v/s 'Age'")
sns.scatterplot(x='Age', y='FVC', hue='Sex', data=train, ax=axes[0])
sns.scatterplot(x='Age', y='FVC', hue='SmokingStatus', data=train, ax=axes[1])
plt.show()

### _2.5.5 How many males and females are there across different ages ?_

In [None]:
import plotly.express as px
fig = px.histogram(train, x="Age",
                   title="How many males and females are there across different ages",
                   color='Sex'
                  )
fig.show()

It is clear from the plot that male patients outnumber female patients across all age groups in this dataset. The raw counts show the same thing: only 325 records belong to female patients, against 1224 records for male patients.
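Note that these numbers count records, and each patient contributes several records (one per visit). To count patients instead, the rows can be reduced to one per patient first. A small sketch with made-up IDs:

```python
import pandas as pd

# Made-up records: patient A has three visits, B has two, C has one
toy = pd.DataFrame({
    "Patient": ["A", "A", "A", "B", "B", "C"],
    "Sex":     ["Male", "Male", "Male", "Female", "Female", "Male"],
})

# Every row counts once
records_per_sex = toy["Sex"].value_counts()

# Every patient counts once
patients_per_sex = toy.drop_duplicates("Patient")["Sex"].value_counts()
```

On the real data, `train.drop_duplicates('Patient')['Sex'].value_counts()` gives the number of male and female patients rather than records.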

### _2.5.6 'Age' and 'Smoking' -_

In [None]:
import plotly.express as px
fig = px.histogram(train, x="Age",
                   title="Relationship b/w 'Age' and 'SmokingStatus'",
                   color='SmokingStatus'
                  )
fig.show()

### Across all ages, most of the patients diagnosed with this disease were ex-smokers.

### _2.5.7 'Sex' and 'SmokingStatus'_

In [None]:
import plotly.express as px
fig = px.histogram(train, x="SmokingStatus",
                   title="Count of males and females in each smoking category",
                   color='Sex'
                  )
fig.show()

### Let's wrap up the EDA part here and move on to DICOM visualization and Analysis part

# _<font color='red'> 3. DICOM Viz. + Analysis</font>_ 📸

#### **Disclaimer :** *A major part of this section is taken from [Andrada Olteanu's Notebook](https://www.kaggle.com/andradaolteanu/pulmonary-fibrosis-competition-eda-dicom-prep). Do check out her [notebook](https://www.kaggle.com/andradaolteanu/pulmonary-fibrosis-competition-eda-dicom-prep).*

In [None]:
# Create base directory for Train .dcm files
director = "../input/osic-pulmonary-fibrosis-progression/train"

# Create path column with the path to each patient's CT
train["Path"] = director + "/" + train["Patient"]

# Create variable that shows how many CT scans each patient has
# (count once per unique folder, then map the result back onto every row;
# this avoids chained assignment, which pandas warns about)
ct_counts = {path: len(os.listdir(path)) for path in train["Path"].unique()}
train["CT_number"] = train["Path"].map(ct_counts)

## _3.1 Number of CT scans per patient_
Huge imbalance in the number of CT scans: half of the patients have fewer than 100 images registered.

In [None]:
print("Minimum number of CT scans: {}".format(train["CT_number"].min()), "\n" +
      "Maximum number of CT scans: {:,}".format(train["CT_number"].max()))

# Scans per Patient
data = train.groupby(by="Patient")["CT_number"].first().reset_index(drop=False)
# Sort by number of CT scans
data = data.sort_values(['CT_number']).reset_index(drop=True)

# Median is computed from the data; the marker sits at the middle bar
median_ct = data["CT_number"].median()
median_pos = (len(data) - 1) / 2

# Plot
plt.figure(figsize = (16, 6))
p = sns.barplot(x="Patient", y="CT_number", data=data, color=custom_colors[5])
plt.axvline(x=median_pos, color=custom_colors[2], linestyle='--', lw=3)

plt.title("Number of CT Scans per Patient", fontsize = 17)
plt.xlabel('Patient', fontsize=14)
plt.ylabel('Number of CT scans', fontsize=14)

plt.text(median_pos + 1, 850, "Median={:.0f}".format(median_ct), fontsize=13)

p.axes.get_xaxis().set_visible(False);

## _3.2 Visualize the DICOM Info and Image_

`pydicom.dcmread()` reads a `.dcm` file and returns its metadata together with the pixel data.

In [None]:
class bcolors:
    OKBLUE = '\033[96m'
    OKGREEN = '\033[92m'

In [None]:
path = "../input/osic-pulmonary-fibrosis-progression/train/ID00007637202177411956430/19.dcm"
dataset = pydicom.dcmread(path)

print(bcolors.OKBLUE + "Patient id.......:", dataset.PatientID, "\n" +
      "Modality.........:", dataset.Modality, "\n" +
      "Rows.............:", dataset.Rows, "\n" +
      "Columns..........:", dataset.Columns)

plt.figure(figsize = (7, 7))
plt.imshow(dataset.pixel_array, cmap="gray")
plt.axis('off');

## _3.3 An inhale for the Patient_
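You can see how the lung cross-section expands image by image. The slices are taken at successive positions along the body rather than at successive moments in time.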
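The cell that follows orders the files by the number in their name before plotting. This matters because a plain string sort would place `10.dcm` before `2.dcm`. A short sketch of the difference (the helper name `slice_number` is made up for illustration):

```python
import re

def slice_number(filename):
    """Extract the digits from a name such as '19.dcm' and return them as an int."""
    return int(re.sub(r"\D", "", filename))

names = ["10.dcm", "2.dcm", "1.dcm", "19.dcm"]

lexicographic = sorted(names)               # ['1.dcm', '10.dcm', '19.dcm', '2.dcm']
numeric = sorted(names, key=slice_number)   # ['1.dcm', '2.dcm', '10.dcm', '19.dcm']
```

The raw string `r"\D"` also avoids the invalid-escape warning that newer Python versions raise for `'\D'`.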

In [None]:
patient_dir = "../input/osic-pulmonary-fibrosis-progression/train/ID00007637202177411956430"
datasets = []

# First order the files numerically (1.dcm, 2.dcm, ..., 10.dcm)
files = sorted(os.listdir(patient_dir), key=lambda f: int(re.sub(r'\D', '', f)))

# Read in the Dataset
for dcm in files:
    path = patient_dir + "/" + dcm
    datasets.append(pydicom.dcmread(path))

# Plot the images
fig=plt.figure(figsize=(16, 6))
columns = 10
rows = 3

for i in range(1, columns*rows +1):
    img = datasets[i-1].pixel_array
    fig.add_subplot(rows, columns, i)
    plt.imshow(img, cmap="gray")
    plt.title(i, fontsize = 9)
    plt.axis('off');

## _3.4 GIF from Images_ 🌔🌕🌖
Patients have varying numbers of CT scans: the more scans per patient, the more information we have about, well ... their lungs. When a patient has a low number of scans (~ 12), only a rough "inhale" is observed, whereas with 80+ scans the details are much finer.

In [None]:
from PIL import Image
from IPython.display import Image as show_gif
import matplotlib.image  # provides matplotlib.image.imsave

In [None]:
def create_gif(number_of_CT = 87):
    """Selects one patient with the given number of CT scans (fixed seed, so the
    choice is reproducible), creates a GIF from their scans and returns its path."""
    
    # Select one of the patients
    # patient = "ID00007637202177411956430"
    patient = train[train["CT_number"] == number_of_CT].sample(random_state=1)["Patient"].values[0]
    
    # === READ IN .dcm FILES ===
    patient_dir = "../input/osic-pulmonary-fibrosis-progression/train/" + patient

    # Order the files numerically (1.dcm, 2.dcm, ..., 10.dcm)
    files = sorted(os.listdir(patient_dir), key=lambda f: int(re.sub(r'\D', '', f)))

    # Read in the dataset from the patient path
    datasets = [pydicom.dcmread(patient_dir + "/" + dcm) for dcm in files]
        
    # === SAVE AS .png ===
    # Create directory to save the png files (relative to the working directory)
    png_dir = f"png_{patient}"
    os.makedirs(png_dir, exist_ok=True)

    # Save images to PNG
    for i, ds in enumerate(datasets):
        matplotlib.image.imsave(f"{png_dir}/img_{i}.png", ds.pixel_array)
        
    # === CREATE GIF ===
    # Order the PNG files numerically (again), reading from the same directory
    files = sorted(os.listdir(png_dir), key=lambda f: int(re.sub(r'\D', '', f)))

    # Create the frames
    frames = [Image.open(f"{png_dir}/{file}") for file in files]

    # Save into a GIF file that loops forever
    gif_path = f"gif_{patient}.gif"
    frames[0].save(gif_path, format='GIF',
                   append_images=frames[1:],
                   save_all=True,
                   duration=200, loop=0)
    return gif_path

### _3.4.1 Create and compare GIFs_

In [None]:
create_gif(number_of_CT=12)
# create_gif(number_of_CT=30)
# create_gif(number_of_CT=87)

# print("First file len:", len(os.listdir("../working/png_ID00165637202237320314458")), "\n" +
#       "Second file len:", len(os.listdir("../working/png_ID00199637202248141386743")), "\n" +
#       "Third file len:", len(os.listdir("../working/png_ID00340637202287399835821")))

### _3.4.2 12 CT Scans GIF_

In [None]:
show_gif(filename="./gif_ID00165637202237320314458.gif", format='png', width=400, height=400)

## _3.5 DICOM Lung Mask_
- Segmentation is a preprocessing step.
- Its purpose is to automatically detect the boundaries of a volume of interest (in our case, the lungs).
- Drawback: the mask may cut away important regions (such as lesions), so check that it does not.
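The core idea behind the mask is to split pixel intensities into a dark group (air and lung) and a bright group (soft tissue and bone), and to threshold halfway between the two group means. The notebook uses scikit-learn's `KMeans` for this; the sketch below swaps in a tiny hand-written two-means loop in NumPy, purely for illustration on a made-up 3x3 "image". The name `two_means_threshold` is hypothetical.

```python
import numpy as np

def two_means_threshold(values, n_iter=20):
    """Midpoint between the means of the dark and the bright pixel group."""
    values = np.asarray(values, dtype=float).ravel()
    lo, hi = values.min(), values.max()
    for _ in range(n_iter):
        threshold = (lo + hi) / 2.0
        low_group = values[values < threshold]
        high_group = values[values >= threshold]
        if low_group.size == 0 or high_group.size == 0:
            break  # all values fall on one side, nothing left to split
        lo, hi = low_group.mean(), high_group.mean()
    return (lo + hi) / 2.0

# Made-up intensities: dark pixels on the left, bright pixels on the right
toy = np.array([[0.10, 0.20, 0.90],
                [0.15, 0.85, 0.95],
                [0.10, 0.90, 1.00]])

threshold = two_means_threshold(toy)
mask = np.where(toy < threshold, 1.0, 0.0)   # 1 marks the darker pixels
```

In the real function this thresholded image is then cleaned up with erosion and dilation before connected regions are selected.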

In [None]:
# https://www.raddq.com/dicom-processing-segmentation-visualization-in-python/

def make_lungmask(img, display=False):
    """Standardize a CT slice, estimate a binary lung mask and return mask * image."""
    row_size= img.shape[0]
    col_size = img.shape[1]
    
    mean = np.mean(img)
    std = np.std(img)
    img = img-mean
    img = img/std
    
    # Find the average pixel value near the lungs
    # to renormalize washed out images
    # (rows index the first axis, columns the second)
    middle = img[int(row_size/5):int(row_size/5*4),int(col_size/5):int(col_size/5*4)] 
    mean = np.mean(middle)  
    img_max = np.max(img)  # named img_max / img_min to avoid shadowing the built-ins
    img_min = np.min(img)
    
    # To improve threshold finding, I'm moving the 
    # underflow and overflow on the pixel spectrum
    img[img==img_max]=mean
    img[img==img_min]=mean
    
    # Using Kmeans to separate foreground (soft tissue / bone) and background (lung/air)
    
    # n_init is set explicitly so the behaviour does not depend on the scikit-learn version
    kmeans = KMeans(n_clusters=2, n_init=10).fit(np.reshape(middle,[np.prod(middle.shape),1]))
    centers = sorted(kmeans.cluster_centers_.flatten())
    threshold = np.mean(centers)
    thresh_img = np.where(img<threshold,1.0,0.0)  # threshold the image

    # First erode away the finer elements, then dilate to include some of the pixels surrounding the lung.  
    # We don't want to accidentally clip the lung.

    eroded = morphology.erosion(thresh_img,np.ones([3,3]))
    dilation = morphology.dilation(eroded,np.ones([8,8]))

    labels = measure.label(dilation) # Different labels are displayed in different colors
    label_vals = np.unique(labels)
    regions = measure.regionprops(labels)
    good_labels = []
    for prop in regions:
        B = prop.bbox  # (min_row, min_col, max_row, max_col)
        if B[2]-B[0]<row_size/10*9 and B[3]-B[1]<col_size/10*9 and B[0]>row_size/5 and B[2]<row_size/5*4:
            good_labels.append(prop.label)
    mask = np.zeros([row_size,col_size],dtype=np.int8)


    #  After just the lungs are left, we do another large dilation
    #  in order to fill in and out the lung mask 
    
    for N in good_labels:
        mask = mask + np.where(labels==N,1,0)
    mask = morphology.dilation(mask,np.ones([10,10])) # one last dilation

    if (display):
        fig, ax = plt.subplots(3, 2, figsize=[12, 12])
        ax[0, 0].set_title("Original")
        ax[0, 0].imshow(img, cmap='gray')
        ax[0, 0].axis('off')
        ax[0, 1].set_title("Threshold")
        ax[0, 1].imshow(thresh_img, cmap='gray')
        ax[0, 1].axis('off')
        ax[1, 0].set_title("After Erosion and Dilation")
        ax[1, 0].imshow(dilation, cmap='gray')
        ax[1, 0].axis('off')
        ax[1, 1].set_title("Color Labels")
        ax[1, 1].imshow(labels)
        ax[1, 1].axis('off')
        ax[2, 0].set_title("Final Mask")
        ax[2, 0].imshow(mask, cmap='gray')
        ax[2, 0].axis('off')
        ax[2, 1].set_title("Apply Mask on Original")
        ax[2, 1].imshow(mask*img, cmap='gray')
        ax[2, 1].axis('off')
        
        plt.show()
    return mask*img

### _3.5.1 How does the mask work?_

In [None]:
# Select a sample
path = "../input/osic-pulmonary-fibrosis-progression/train/ID00007637202177411956430/19.dcm"
dataset = pydicom.dcmread(path)
img = dataset.pixel_array

# Masked image
mask_img = make_lungmask(img, display=True)

# _Next Up : Baseline model. Stay Tuned..._

# _<font color='red'>4. References</font>_
- https://www.kaggle.com/andradaolteanu/pulmonary-fibrosis-competition-eda-dicom-prep
- https://medium.com/@hengloose/a-comprehensive-starter-guide-to-visualizing-and-analyzing-dicom-images-in-python-7a8430fcb7ed
