#### Example Application of Texture Analysis Pipeline

The following example uses the Kather dataset, a tiled, color-normalized, MSI/MSS-labelled dataset derived from the TCGA-CRC and TCGA-STAD datasets. The tiles have already been separated into folders according to their MSI or MSS label.

The Kather dataset can be found at DOI: 10.5281/zenodo.2530834
or at
https://zenodo.org/records/2530835

In [None]:
import os  # needed for the os.path.join calls in later cells
from main import *

In [None]:
MSI_dir = r'\path\to\MSI\tiles\from\Kather\dataset'
MSS_dir = r'\path\to\MSS\tiles\from\Kather\dataset'

To begin, we create a series of subdirectories so that the work can be split across processes. This step uses multiple CPU cores, so keep the number of available cores in mind when setting the "n_jobs" argument of "init_subdirs". The default is "n_jobs=1". However, if "n_jobs > 1" is used, the "feature_extraction" function must be called with proper multiprocessing formatting.

Please look in "FeatureExtraction.py" for proper formatting of multiprocessing utilization.
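The multiprocessing formatting mentioned above most likely refers to Python's standard requirement that code starting worker processes sits under an `if __name__ == "__main__":` guard, which is needed on platforms that start workers in a fresh interpreter (for example Windows). The sketch below uses only the standard library; `process_subset`, `run_parallel`, and the subset names are hypothetical placeholders, not functions from this repository, and "FeatureExtraction.py" remains the reference for the actual usage.

```python
from multiprocessing import Pool


def process_subset(subset_name):
    """Hypothetical stand-in for per-subset work (not the repository's function)."""
    return f"{subset_name}: done"


def run_parallel(subset_names, n_jobs=2):
    """Process each subset in a separate worker process."""
    with Pool(processes=n_jobs) as pool:
        return pool.map(process_subset, subset_names)


if __name__ == "__main__":
    # The guard keeps worker processes from re-running this block when
    # they import the module (required with the "spawn" start method).
    print(run_parallel(["subset_0", "subset_1", "subset_2"], n_jobs=2))
```

In a notebook on Windows, worker functions generally need to live in an importable ".py" file rather than in a cell, which is one reason to call the pipeline from a script when "n_jobs > 1".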

The subdirectories will be created in the same folder as the dataset folders.
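As a rough illustration of why the tiles are split up, the sketch below samples a fixed number of file names and deals them into `n_jobs` groups of nearly equal size, one per worker. `split_into_subsets` and its `seed` argument are hypothetical and only mirror the `sample_size` and `n_jobs` parameters named in this notebook; how `init_subdirs` actually samples tiles and names its subdirectories is not shown here.

```python
import random


def split_into_subsets(file_names, sample_size, n_jobs=1, seed=0):
    """Sample up to `sample_size` names and split them into `n_jobs` groups."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    k = min(sample_size, len(file_names))
    sampled = rng.sample(list(file_names), k)
    # Deal names round-robin so group sizes differ by at most one.
    return [sampled[i::n_jobs] for i in range(n_jobs)]


tiles = [f"tile_{i}.png" for i in range(12)]
subsets = split_into_subsets(tiles, sample_size=10, n_jobs=3)
print([len(s) for s in subsets])  # [4, 3, 3]
```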

The texture analysis results will be saved in a "texture_analysis_results" folder, located in the same folder as the code.

In [None]:
MSI_subsets_dir, MSS_subsets_dir, MSI_validation_dir, MSS_validation_dir, save_path = init_subdirs(MSI_dir, MSS_dir, sample_size=5000, validation=True)

Grey level co-occurrence matrices are created for each image, and their texture features are extracted here.
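For readers unfamiliar with grey level co-occurrence matrices (GLCMs), the following is a minimal pure-Python sketch of the idea: count how often pairs of grey levels occur at a fixed pixel offset, normalize the counts, and summarize the matrix with a few statistics. The offset (one pixel to the right) and the three features shown (contrast, homogeneity, energy) are assumptions chosen for illustration; the offsets and features used by `feature_extraction` are not stated in this notebook. `glcm` and `texture_features` are hypothetical helper names.

```python
def glcm(image, levels, dx=1, dy=0):
    """Normalized grey level co-occurrence matrix for one pixel offset."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
    total = sum(map(sum, counts))
    return [[n / total for n in row] for row in counts]


def texture_features(p):
    """Contrast, homogeneity, and energy of a normalized GLCM."""
    cells = [(i, j, v) for i, row in enumerate(p) for j, v in enumerate(row)]
    contrast = sum(v * (i - j) ** 2 for i, j, v in cells)
    homogeneity = sum(v / (1 + (i - j) ** 2) for i, j, v in cells)
    energy = sum(v * v for _, _, v in cells) ** 0.5
    return {"contrast": contrast, "homogeneity": homogeneity, "energy": energy}


# Toy 4x4 image with 4 grey levels (0-3).
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
features = texture_features(glcm(image, levels=4))
print(features)
```

For real tiles, an optimized library implementation (for example `graycomatrix` and `graycoprops` in scikit-image) would normally replace these loops.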

In [None]:
feature_extraction(MSI_subsets_dir, os.path.join(save_path, 'MSI'))
feature_extraction(MSS_subsets_dir, os.path.join(save_path, 'MSS'))

Machine learning is performed with a Random Forest classifier using parameter tuning. It applies the MSI and MSS labels, then returns an ROC curve with the mean AUROC across the validation sets, along with its confidence interval.
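To make the reported metric concrete, the sketch below computes an AUROC for each validation subset and then summarizes the values as a mean with a 95% confidence interval. It covers only this evaluation summary, not the Random Forest training or parameter tuning. The normal-approximation interval is an assumption made for illustration; how `plot_predict` computes its confidence interval is not stated here. `auroc`, `mean_with_ci`, and the toy scores are hypothetical.

```python
from statistics import mean, stdev


def auroc(labels, scores):
    """AUROC as the chance that a positive outscores a negative (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mean_with_ci(values, z=1.96):
    """Mean and normal-approximation 95% confidence interval of the mean."""
    m = mean(values)
    half_width = z * stdev(values) / len(values) ** 0.5
    return m, (m - half_width, m + half_width)


# Toy validation subsets: (true labels, predicted MSI probabilities).
validation_sets = [
    ([1, 1, 0, 0], [0.9, 0.6, 0.4, 0.2]),
    ([1, 0, 1, 0], [0.8, 0.7, 0.3, 0.1]),
    ([1, 1, 0, 0], [0.7, 0.4, 0.5, 0.3]),
]
aurocs = [auroc(y, s) for y, s in validation_sets]
mean_auc, ci = mean_with_ci(aurocs)
print(aurocs, round(mean_auc, 3), ci)
```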

In [None]:
X_train, X_test, y_train, y_test = preprocess_data(os.path.join(save_path, 'MSI'), os.path.join(save_path, 'MSS'))
plot_predict(X_train, X_test, y_train, y_test, os.path.join(save_path, 'MSI_validation_subsets'), os.path.join(save_path, 'MSS_validation_subsets'), title='Prediction of MSI in TCGA-CRC')


For a streamlined version of the entire pipeline, "main" can be used, which allows a greater number of CPUs to be utilized.

In [None]:
main(MSI_dir, MSS_dir, sample_size=5000, n_jobs=5, cohort_name='TCGA-CRC', validation=True, subsets=True)