# Lung CT Feature Extraction Using Python and qtim_features

Welcome! This tutorial will walk you through the process of extracting a few statistical, morphological, and textural features from pre-segmented lungs in a large dataset of Lung CT scans. We will be using a Python package developed by the Quantitative Tumor Imaging Lab at MGH called "qtim_tools".

We will first test the feature extraction toolkit on some digitally-generated "phantom" images, where we know what values to expect. We will then extract some features from some actual Lung CT data, to show what these feature values look like in practice.

## Importing qtim_tools

Our first step is to import the qtim_tools package. The first line uses the pip package installer to install a version of qtim_tools locally. The second line makes that package available in your local environment for the rest of this tutorial.

In [None]:
!pip install qtim_tools

import qtim_tools.qtim_features as qtim_features

## Calculating Morphology (Size) Features On Phantom Data

Some of the simplest features to calculate are size and shape features. These include volume, surface area, and other derived properties. We're going to load a few sample datasets and extract morphology features from them to get a sense of how these features vary.

We are first going to load an extremely basic dataset: a series of white squares of different sizes.

<img src="./Example_Images/Intensity_Phantom.png">

We are then going to use the qtim_features package to extract some simple morphology features.

In [None]:
size_squares_filepath = qtim_features.phantoms.get_phantom_filepath('size_square')

qtim_features.generate_feature_list_batch(size_squares_filepath, features=['morphology'], outfile='size_square_phantom.csv', labels=True)

You will see a spreadsheet file titled "size_square_phantom.csv" in your current directory. You can change the outfile parameter to specify a different file destination. The file lists a few size and morphology features. Larger squares (e.g. "Size_9_Phantom") should have a greater volume than smaller squares (e.g. "Size_0_Phantom"). You might also notice other trends, such as the surface-area-to-volume ratio slowly decreasing as the phantom squares get larger.
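
The falling surface-area-to-volume ratio can be checked by hand. The sketch below is not part of qtim_tools: it assumes each phantom behaves like an idealized solid cube with side length measured in voxels, and the function name `cube_surface_to_volume` is made up for illustration. The values in your spreadsheet may differ, depending on the actual phantom geometry and on how the package estimates surface area.

```python
def cube_surface_to_volume(side_voxels, voxel_size=1.0):
    """Surface-area-to-volume ratio of an idealized solid cube.

    Assumption: the phantom is a perfect cube. For a cube of side s,
    surface area is 6 * s**2 and volume is s**3, so the ratio is 6 / s.
    """
    side = side_voxels * voxel_size
    surface_area = 6 * side ** 2
    volume = side ** 3
    return surface_area / volume


# Ratios for cubes of increasing size: the ratio shrinks as the cube grows.
sides = list(range(1, 11))
ratios = [cube_surface_to_volume(s) for s in sides]

for side, ratio in zip(sides, ratios):
    print(f"side={side:2d} voxels  surface/volume={ratio:.3f}")
```

Because surface area grows with the square of the side length while volume grows with its cube, the ratio falls off as 6 / s, quickly at first and then more slowly.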

## Calculating Morphology (Shape) Features On Phantom MRI Data

That was just a quick check to make sure your package is working, and to show some basic dynamics of morphology feature changes over progressively larger volumes. We're now going to look at some sample brain MRI data with differently-shaped labels to see how these shape and size features change in practice.

<img src="./Example_Images/Shape_MRI.png">

We'll use the same code as before, but this time we'll load a different phantom.

In [None]:
shape_mri_filepath = qtim_features.phantoms.get_phantom_filepath('shape_mri')

qtim_features.generate_feature_list_batch(shape_mri_filepath, features=['morphology'], outfile='shape_mri_phantom.csv', labels=True)

[FILL IN ANALYSIS OF DIFFERENT SHAPES]

## Calculating Intensity Features On Phantom Data

It is also possible to calculate intensity features within regions of interest (ROIs). Intensity features are summary statistics generated from the voxel intensities within an ROI. Some simple examples include mean intensity, intensity skew, and intensity range within a given ROI.

We're going to load a phantom with different patterns of black, white, and grey to see how intensity statistics can change, or stay the same, under different imaging conditions. To see how intensity statistics can change in real-world data, try re-loading the "shape_mri" phantom from the previous example.

<img src="./Example_Images/Intensity_Phantom_4.png">

In [None]:
intensity_squares_filepath = qtim_features.phantoms.get_phantom_filepath('intensity_square')

qtim_features.generate_feature_list_batch(intensity_squares_filepath, features=['statistics'], outfile='intensity_square_phantom.csv', labels=True)

# Intensity statistics on MRI data..
# shape_mri_filepath = qtim_features.phantoms.get_phantom_filepath('shape_mri')
# qtim_features.generate_feature_list_batch(shape_mri_filepath, features=['morphology','statistics'], outfile='shape_mri_phantom.csv', labels=True)

Images that look wildly different under visual inspection can often have similar mean intensities or intensity ranges. Standard deviation, kurtosis, and skew add statistical information that can distinguish between these tough cases.
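
To make this concrete, here is a small NumPy sketch, independent of qtim_tools, with two made-up sets of voxel intensities that share the same mean and range but differ sharply in standard deviation and kurtosis. The helper `intensity_summary` and both arrays are illustrative assumptions, and the formulas are the usual population moments, which may not match the exact definitions qtim_features uses.

```python
import numpy as np


def intensity_summary(values):
    """Mean, range, standard deviation, skew, and excess kurtosis of an ROI."""
    values = np.asarray(values, dtype=float)
    mean = values.mean()
    std = values.std()  # population standard deviation (ddof=0)
    centered = values - mean
    if std > 0:
        skew = np.mean(centered ** 3) / std ** 3
        kurtosis = np.mean(centered ** 4) / std ** 4 - 3.0  # excess kurtosis
    else:
        skew, kurtosis = 0.0, 0.0
    return {
        "mean": mean,
        "range": values.max() - values.min(),
        "std": std,
        "skew": skew,
        "kurtosis": kurtosis,
    }


# ROI A: half black, half white voxels.
roi_a = np.array([0.0] * 50 + [255.0] * 50)
# ROI B: almost entirely mid-grey, with one black and one white voxel.
roi_b = np.array([0.0] + [127.5] * 98 + [255.0])

stats_a = intensity_summary(roi_a)
stats_b = intensity_summary(roi_b)

for name in ("mean", "range", "std", "skew", "kurtosis"):
    print(f"{name:9s} A={stats_a[name]:8.2f}  B={stats_b[name]:8.2f}")
```

Both ROIs report the same mean and range, yet ROI A has a far larger standard deviation and ROI B a far larger kurtosis, which is exactly the kind of separation the higher-order statistics provide.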

## Grey-Level Co-Occurrence Matrix (GLCM) Texture Features

Grey-Level Co-Occurrence Matrices (GLCMs) let us calculate simple texture measures defined by the differences in intensity from one voxel to the next. We will be calculating 2-D GLCMs on each slice of a particular region of interest, and then aggregating those slices into one summary GLCM. From there, we can extract texture features such as "Contrast", "Dissimilarity", "Homogeneity", and "Correlation". These features are derived from matrix calculations on the GLCM extracted from a given ROI.

Depending on the distance and angle at which a GLCM is calculated, the extracted features can be quite different. A GLCM calculated from the difference between voxels right next to each other (distance: 1) yields features that are very sensitive to fine-grained, heavily-textured regions of interest. A GLCM calculated from the difference between intensities several voxels apart (distance: 5-10) yields features sensitive to thicker, heavily-edged images.

<img src="./Example_Images/GLCM_Distance.png">

Different angles can also result in different features. A GLCM can be calculated based on the intensity difference between voxels located on top of and below each other (angle: 90 degrees), ending up with texture features very sensitive to horizontally-oriented bars of intensity. Similarly, a focus on voxels located to the right and left of each other (angle: 0 degrees) will be sensitive to vertically-oriented texture, but not horizontal texture. Other non-cardinal angles (e.g. angle: 45 degrees) can be used to detect other orientations of texture.

<img src="./Example_Images/GLCM_Angle.png">
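
The effect of distance and angle can be seen in a tiny example. The code below is a minimal NumPy sketch of a single-offset 2-D GLCM, not the implementation used by qtim_tools. The function `glcm_2d`, its angle-to-offset mapping, and the striped test image are all assumptions made for illustration.

```python
import numpy as np


def glcm_2d(image, distance, angle_degrees, levels):
    """Symmetric, normalized GLCM of one 2-D slice for a single offset.

    Assumed convention: angle 0 pairs each voxel with its right-hand
    neighbor, angle 90 pairs it with the voxel above.
    """
    offsets = {0: (0, 1), 45: (-1, 1), 90: (-1, 0), 135: (-1, -1)}
    d_row, d_col = offsets[angle_degrees]
    d_row, d_col = d_row * distance, d_col * distance

    rows, cols = image.shape
    glcm = np.zeros((levels, levels), dtype=float)
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + d_row, c + d_col
            if 0 <= r2 < rows and 0 <= c2 < cols:
                glcm[image[r, c], image[r2, c2]] += 1

    glcm = glcm + glcm.T  # count each pair in both directions
    total = glcm.sum()
    return glcm / total if total > 0 else glcm


def off_diagonal_mass(glcm):
    """Fraction of voxel pairs whose two grey levels differ."""
    return float(glcm.sum() - np.trace(glcm))


# 8x8 image of vertical stripes, one voxel wide (columns alternate 0, 1).
vertical_stripes = np.tile(np.array([0, 1]), (8, 4))

left_right = glcm_2d(vertical_stripes, distance=1, angle_degrees=0, levels=2)
up_down = glcm_2d(vertical_stripes, distance=1, angle_degrees=90, levels=2)
two_apart = glcm_2d(vertical_stripes, distance=2, angle_degrees=0, levels=2)

print("angle 0,  distance 1:", off_diagonal_mass(left_right))
print("angle 90, distance 1:", off_diagonal_mass(up_down))
print("angle 0,  distance 2:", off_diagonal_mass(two_apart))
```

With vertical stripes, every left-right pair at distance 1 differs in intensity, while up-down pairs never do. At distance 2 the left-right pairs land on matching stripes again, so the texture disappears from the GLCM entirely.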

Without getting into the specifics of the equations used to extract features from GLCMs, different features reflect different visual qualities of a region of interest. For example, "Contrast" is particularly sensitive to stark differences between bright and dark intensities (e.g. at a tumor border), whereas "Dissimilarity" better reflects heterogeneity in voxel intensity across an entire region of interest. Other features attempt to represent other visual qualities; you can learn more at this link: http://www.fp.ucalgary.ca/mhallbey/texture_calculations.htm
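
For readers who do want a glimpse of the equations, the sketch below computes a few of these features from a normalized GLCM using their commonly cited definitions. It is an illustration in plain NumPy, and qtim_features may differ in details such as normalization. The helper `glcm_features` and the three hand-written matrices are hypothetical.

```python
import numpy as np


def glcm_features(glcm):
    """Contrast, dissimilarity, homogeneity, ASM, and energy of a GLCM."""
    glcm = np.asarray(glcm, dtype=float)
    glcm = glcm / glcm.sum()  # normalize so entries sum to 1
    i, j = np.indices(glcm.shape)
    diff = i - j  # grey-level difference for each matrix cell
    asm = float(np.sum(glcm ** 2))
    return {
        "contrast": float(np.sum(glcm * diff ** 2)),
        "dissimilarity": float(np.sum(glcm * np.abs(diff))),
        "homogeneity": float(np.sum(glcm / (1.0 + diff ** 2))),
        "ASM": asm,
        "energy": float(np.sqrt(asm)),
    }


# Three hand-written GLCMs over grey levels 0, 1, 2.
# Neighbors always share the same grey level.
identical = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]])
# Neighbors always differ by exactly one grey level.
gradual = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
# Neighbors always jump between the darkest and brightest level.
sharp = np.array([[0, 0, 1], [0, 0, 0], [1, 0, 0]])

results = {
    "identical": glcm_features(identical),
    "gradual": glcm_features(gradual),
    "sharp": glcm_features(sharp),
}

for name, feats in results.items():
    print(name, {key: round(value, 3) for key, value in feats.items()})
```

Doubling the grey-level jump from one to two doubles dissimilarity (1 to 2) but quadruples contrast (1 to 4), because contrast weights each pair by the squared difference. That squared weighting is why contrast responds so strongly to stark edges.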




## Calculating Grey-Level Co-Occurrence Matrix (GLCM) Texture Features on Phantom Data

We will now use the qtim_features package to calculate simple GLCM features on the texture phantoms pictured above. 

We will calculate GLCMs in 4 directions (0, 45, 90, 135 degrees) and 5 distances (1,2,3,4,5 voxels apart) to extract 6 features each (Contrast, Dissimilarity, Homogeneity, ASM, Energy, Correlation) for a total of 4x5x6 = 120 features. There are 18 different phantoms to extract texture from. They are oriented vertically, horizontally, and in a grid-like pattern, and have stripes at distances 0 (no stripes), 1, 2, 3, 4, and 5.
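
The count of 120 follows from taking every combination of angle, distance, and feature. The short sketch below enumerates them. The label format is hypothetical and is not the column naming that qtim_features writes to its output file.

```python
from itertools import product

angles = [0, 45, 90, 135]        # degrees
distances = [1, 2, 3, 4, 5]      # voxels apart
features = ["Contrast", "Dissimilarity", "Homogeneity",
            "ASM", "Energy", "Correlation"]

# Hypothetical label format, used only to make the combinations visible.
feature_labels = [
    f"GLCM_{feature}_angle{angle}_distance{distance}"
    for angle, distance, feature in product(angles, distances, features)
]

print(len(feature_labels))
print(feature_labels[0])
print(feature_labels[-1])
```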

We'll use the same code as before to generate our features. You can also calculate texture from the sample brain MRI data to get a sense of how texture plays out in real-world images.

In [None]:
glcm_squares_filepath = qtim_features.phantoms.get_phantom_filepath('glcm_square')

qtim_features.generate_feature_list_batch(glcm_squares_filepath, features=['GLCM'], outfile='GLCM_square_phantom.csv', labels=True)

# GLCM and intensity statistics on MRI data..
# shape_mri_filepath = qtim_features.phantoms.get_phantom_filepath('shape_mri')
# qtim_features.generate_feature_list_batch(shape_mri_filepath, features=['GLCM','statistics','morphology'], outfile='GLCM_mri_phantom.csv', labels=True)