A universal framework for extracting features from digital H&E images using multiple pretrained CNN models. Extracting features from multiple CNN models captures a wider range of functionally relevant features.
The core of this tool is built in Python 3.8 with a TensorFlow backend and the Keras functional API, while the downstream analysis is implemented in R.
The best way to get giExtract along with all its dependencies is to install the release with the Python package installer (pip).
pip install giExtract
This will add two command line scripts:
Script | Context | Usage |
---|---|---|
giCube | Create image patches | giCube -h |
giExtract | Extract features from patches | giExtract -h |
Utility functions can be imported in the usual Python way, for example: from giExtract.util import generator
The main input is the path to the H&E slide images (.jpg or .png) to load and split into patches, specified by -p.
All other arguments are optional and have reasonable defaults. Use giCube -h to show the options and their default settings.
Image patches from the H&E slides, saved in a "cubes" directory at the path provided as input.
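
To make the idea of cutting a slide image into patches concrete, here is a minimal sketch of non-overlapping square tiling. It is not giCube's implementation: the function name `patch_boxes`, the patch size and the choice to drop partial patches at the right and bottom edges are all assumptions made for illustration.

```python
def patch_boxes(width, height, size):
    """Return (left, upper, right, lower) boxes for non-overlapping square patches.

    Assumption: partial patches at the right and bottom edges are dropped.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    boxes = []
    # Walk the image row by row, stepping one full patch at a time.
    for upper in range(0, height - size + 1, size):
        for left in range(0, width - size + 1, size):
            boxes.append((left, upper, left + size, upper + size))
    return boxes


# Hypothetical 1000 x 600 pixel image cut into 256 x 256 patches.
boxes = patch_boxes(width=1000, height=600, size=256)
print(len(boxes), boxes[0], boxes[-1])
```

Each box uses the (left, upper, right, lower) convention that Pillow's `Image.crop` accepts, so the boxes could be passed to it directly when Pillow is available.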
The two main inputs are the path to the H&E cubes generated by giCube (.jpg), specified by -p, and the path to the meta file (.csv) used to flow the patches during feature extraction, specified by -c. This meta (context) file must have a column with file names matching the patches in that path.
All other arguments are optional and have reasonable defaults. Use giExtract -h to see the options and their default settings.
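
Because the meta file must list file names that match the patches on disk, it can help to check this before starting a long extraction run. The sketch below is one way to do that with the standard library only. The function name `missing_patches` and the default column name `Name` are assumptions; pass whichever column holds the file names in your meta file.

```python
import csv
from pathlib import Path


def missing_patches(meta_csv, patch_dir, column="Name"):
    """Return file names listed in the meta file that are absent from patch_dir.

    Assumption: `column` is the meta-file column holding patch file names.
    """
    # Collect the names of all regular files in the patch directory.
    on_disk = {p.name for p in Path(patch_dir).iterdir() if p.is_file()}
    with open(meta_csv, newline="") as handle:
        reader = csv.DictReader(handle)
        if reader.fieldnames is None or column not in reader.fieldnames:
            raise KeyError(f"column {column!r} not found in {meta_csv}")
        return [row[column] for row in reader if row[column] not in on_disk]
```

An empty list means every listed patch was found; any returned names point to rows that would have no matching image.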
A table of features extracted by the different CNN models, with patches as rows and features as columns. The columns in the output file are named to indicate the CNN each feature came from, for example "inception_46".
Name | feature 1 | feature 2 | feature 3 |
---|---|---|---|
patch 1 | 0.2 | 0.1 | 0.6 |
patch 2 | 5.2 | 0.14 | 0.6 |
patch 3 | 0.6 | 0.1 | 0.7 |
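
Since each column name encodes its CNN of origin, the features can be grouped per model by splitting the name at the last underscore. The following is a small sketch under two assumptions not confirmed by the tool itself: that the identifier column is called `Name`, as in the table above, and that every feature column follows the `<cnn>_<index>` pattern of "inception_46". The function name `columns_by_cnn` is hypothetical.

```python
import csv
from collections import defaultdict


def columns_by_cnn(feature_csv, id_column="Name"):
    """Group feature column names by the CNN prefix before the last underscore."""
    with open(feature_csv, newline="") as handle:
        header = next(csv.reader(handle))  # only the header row is needed
    groups = defaultdict(list)
    for name in header:
        if name == id_column:
            continue
        # "inception_46" -> prefix "inception"; names without "_" keep their full name.
        prefix, _, _index = name.rpartition("_")
        groups[prefix or name].append(name)
    return dict(groups)
```

The returned dictionary maps each CNN prefix to its list of columns, which makes it easy to select the features of one model at a time.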
An R script for analysing the output of giExtract and identifying differential features (see Manuscript) is included under the R/ directory, with a README file on usage. The giFeature.R script requires two mandatory inputs:
- Path to a csv file with meta information (must have exactly three columns: Name, slide and Group).
- Path to a csv file with the CNN features to analyse (must be an output of giExtract).

Details about the optional arguments and the requirements for R and the tidyverse package are given in the README file.
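
The three-column requirement on the meta file is easy to verify before handing it to the R script. The check below is a sketch only: the function name `check_meta_header` is hypothetical, and it assumes that column order does not matter, which the source does not state.

```python
import csv

REQUIRED = ["Name", "slide", "Group"]


def check_meta_header(meta_csv):
    """Raise ValueError unless the header holds exactly Name, slide and Group."""
    with open(meta_csv, newline="") as handle:
        header = next(csv.reader(handle), [])  # empty file -> empty header
    # Assumption: order is irrelevant, so compare the sorted column names.
    if sorted(header) != sorted(REQUIRED):
        raise ValueError(f"expected columns {REQUIRED}, found {header}")
    return header
```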
To reproduce the analysis reported in the manuscript, run the run.sh script inside the manuscript folder.
This assumes giExtract has been installed via pip as stated above and that R is installed on your system.
The run.sh script performs the three core analyses: 1) patch generation, 2) feature extraction and 3) differential feature analysis.
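
For readers who prefer to drive the first two steps from Python rather than from a shell script, the sketch below chains giCube and giExtract using only the -p and -c options described earlier. It does not reproduce the contents of run.sh. It assumes the patches land in a "cubes" directory directly inside the slide directory, and it leaves out the giFeature.R step because its command line is documented in the R README rather than here. The function names are hypothetical.

```python
import subprocess
from pathlib import Path


def pipeline_commands(slide_dir, meta_csv):
    """Build the giCube and giExtract command lines for one slide directory."""
    cubes_dir = Path(slide_dir) / "cubes"  # assumed location of the giCube output
    return [
        ["giCube", "-p", str(slide_dir)],
        ["giExtract", "-p", str(cubes_dir), "-c", str(meta_csv)],
    ]


def run_pipeline(slide_dir, meta_csv):
    """Run each step in order, stopping at the first failing command."""
    for command in pipeline_commands(slide_dir, meta_csv):
        subprocess.run(command, check=True)
```

Calling `run_pipeline` requires both command line scripts to be on the PATH, which is the case after installing giExtract with pip.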
To generate the plots and automatically extract images, run the code in downstream.R.
Example datasets are provided inside manuscript/data. These illustrate what to expect for the input and output files. Note that only a subset of the data is provided, due to size limits and access control. The full dataset used for our computational histology subtype inference analysis can be requested from the corresponding authors.
git clone https://github.com/caanene1/giExtract