

This notebook performs:

    - Statistical analysis using:

        1 - Mann-Whitney U test: gives p values
        2 - ROC curve analysis: gives AUC (area under the curve)
        3 - Feature reduction: among features that differ only in 'power', keeps the one with the best AUC
        
    - An optional collective analysis, which evaluates MRI data from different planes of the same sequence together.
    
----------


## CONTENTS

-    [1. importing necessary libraries](#1)   
         1.1 importing public libraries:
               -- if this cell gives an error, check the libraries imported in it; 
               you may need to install them               
         1.2 importing user-defined libraries 
               -- if this cell gives an error, check that the ipynb file (Jupyter notebook)
               named 'project_helper_functions_classes' 
               is in your working directory
                             
-    [2. Preparation of input information](#2) 
         -- check and set ONLY PARTS: "2.1, 2.2, 2.3"     
         2.1 Specify folder paths
               -- check this cell to make sure the folders are defined correctly         
         2.2 specify features
               -- in this part, you will choose:
                   1 - the p and AUC limits for accepting a statistical result as meaningful
                   2 - whether to make the collective analysis 
                       (evaluating together the subfolders that belong to the same sequence)
                   3 - for the collective analysis, the sequences
                        and their subfolders                         
         2.3  specify csv paths for saving stat results
                -- specify whether to save the p and AUC values with their features as csv files
                -- specify the csv paths for saving the statistical results
         Note:  do not change the following part (2.4)
         2.4 preparing file paths for processing and folders for saving MRC results 
                
-    [3. Statistical Analysis](#3)    
          3.1 analysis for subfolders separately
            - make Mann-Whitney U Test, ROC Curve Analysis
            - Reducing feature numbers:
                For features that are identical except for their power, 
                select the one feature (one power) that has the maximum AUC score        
            - Save results to csv files
         3.2 analysis for MRI sequences collectively
             -- same calculations as 3.1, on the combined MRC data



<a name='1'></a>
# 1. importing necessary libraries

### 1.1 importing public libraries

In [1]:
import pandas as pd
import os
from tqdm import tqdm

### 1.2 importing user-defined libraries 

In [2]:
from ipynb.fs.full.project_helper_functions_classes import *

<a name='2'></a>
# 2. Preparation of input information

### 2.1 Specify folder paths

In [3]:
data_root_folder = 'data'
output_root_folder = 'output'      
MRC_results_folder = 'MRC_results'
case_info_excel_file = 'MRI_informations.xlsx'
stat_results_folder = 'MRC_stat_results'  # folder name for saving statistic results, will be created in output folder

### 2.2 specify features

In [4]:
# specify the limits for accepting AUC and p values as meaningful
lim_auc = 0.70           ## AUC values greater than lim_auc are accepted as meaningful
lim_p = 0.01             ## p values less than lim_p are accepted as meaningful

# True: make collective analysis, False: don't make
make_collective_analysis = True

# specify the MRI sequences with their subfolders for the collective analysis (used only if make_collective_analysis = True)
sequences = {'t1': ['t1_tra', 't1_sag'], 't2' : ['t2_cor', 't2_tra']}
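
As a rough illustration of how the two limits above can act together, the sketch below keeps only the rows whose AUC is above `lim_auc` and whose p value is below `lim_p`. The function name `filter_meaningful_sketch` and the example values are made up for this illustration. In the notebook the limits are passed to `take_stats`, whose implementation is not shown here, so requiring both conditions at once is an assumption.

```python
import pandas as pd

def filter_meaningful_sketch(stats, lim_auc=0.70, lim_p=0.01):
    """Keep rows with AUC_value > lim_auc and p_value < lim_p (hypothetical helper)."""
    mask = (stats['AUC_value'] > lim_auc) & (stats['p_value'] < lim_p)
    return stats[mask].reset_index(drop=True)

# made-up values, only for illustration
example_stats = pd.DataFrame({
    'power': [1, 2, 3],
    'AUC_value': [0.65, 0.82, 0.91],
    'p_value': [0.200, 0.004, 0.030],
})

meaningful_example = filter_meaningful_sketch(example_stats)
print(meaningful_example)
```

In this example only the row with power 2 passes both limits: the first row fails the AUC limit and the third row fails the p limit.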

### 2.3 specify csv paths and features for saving stat results

In [5]:
# specify the paths for saving the features selected by the statistical analysis
selected_features_csv = 'selected_features.csv'
selected_collective_features_csv = 'selected_features_coll.csv' # if you want to make collective analysis

# specify if you would like to sort features based on stat results
sort_columns = True

# specify if you would like to save stat results of the single subfolder analysis (set False if you don't, True if you do):
take_p_csv = True
take_auc_csv = True
take_meaningful_csv = True

## csv paths for single subfolder analysis
p_csv = 'p_values.csv'
auc_csv = 'auc_values.csv'
meaningful_stat_csv = 'meaningful_stats.csv'


# specify if you would like to save stat results of the collective analysis (set False if you don't, True if you do):
take_collective_p_csv = True
take_collective_auc_csv = True
take_collective_meaningful_csv = True

## csv paths for collective analysis (if : make_collective_analysis = True)
p_collective_csv = 'p_values_collective.csv'
auc_collective_csv = 'auc_values_collective.csv'
meaningful_collective_stat_csv = 'meaningful_collective_stats.csv'


### 2.4 preparing file paths for processing and folders for saving MRC results 

In [6]:


## specifying paths
stripped_path =  os.path.abspath(os.path.join(data_root_folder, 'stripped'))
MRC_path = os.path.abspath(os.path.join(output_root_folder, MRC_results_folder))
case_info_path = os.path.abspath(os.path.join(output_root_folder, case_info_excel_file))

## create folder for saving statistic results
stat_result_path = create_folder(os.path.join(output_root_folder,stat_results_folder))

## specify main folders lists
situations = take_folder_list(stripped_path)

## specify subfolders lists for each situation (control and patients)
subfolders = {}
for situation in situations:
    subfolders[situation] = take_folder_list(os.path.join(stripped_path, situation))
    
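## pick the control and patient folders by the first letter of their names ('c' and 'p');
## an IndexError here means that no folder name in stripped_path starts with that letter
control_id = [situation for situation in situations if 'c' == situation[0]][0]
patient_id = [situation for situation in situations if 'p' == situation[0]][0]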
    
## specify mutual subfolders for situations
mutual_subfolders = take_mutual_members(subfolders)
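
The helpers `take_folder_list` and `take_mutual_members` come from the helper notebook and are not shown here. The sketch below is one plausible way to get the same kind of result, written under that assumption: it lists the folders of each situation and keeps only the subfolders present in all of them. The function names and the folder names in the throwaway tree are hypothetical.

```python
import os
import tempfile

def list_subfolders_sketch(path):
    """Return the sorted names of the directories directly inside `path`."""
    return sorted(name for name in os.listdir(path)
                  if os.path.isdir(os.path.join(path, name)))

def mutual_members_sketch(members_by_key):
    """Return the sorted members that appear in every list of the dict."""
    member_sets = [set(members) for members in members_by_key.values()]
    return sorted(set.intersection(*member_sets)) if member_sets else []

# build a throwaway folder tree to try the helpers on (names are made up)
with tempfile.TemporaryDirectory() as root:
    layout = {'controls': ['t1_tra', 't2_cor'],
              'patients': ['t1_tra', 't2_cor', 't2_tra']}
    for situation, planes in layout.items():
        for plane in planes:
            os.makedirs(os.path.join(root, situation, plane))

    example_situations = list_subfolders_sketch(root)
    example_subfolders = {s: list_subfolders_sketch(os.path.join(root, s))
                          for s in example_situations}
    example_mutual = mutual_members_sketch(example_subfolders)

print(example_situations)
print(example_mutual)
```

Here `t2_tra` exists only for the patients, so it is left out of the mutual list and would not be analysed.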


<a name='3'></a>
# 3. Statistical Analysis
     - Mann-Whitney U Test (p value)
     - ROC Curve Analysis (AUC)
     - Power analysis (feature reduction)
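
For readers who want to see what the first two steps compute, here is a minimal sketch for a single feature. It assumes SciPy is installed and uses `scipy.stats.mannwhitneyu` for the p value; the AUC is computed directly as the fraction of (patient, control) pairs in which the patient value is larger, with ties counted as half. The function name `p_and_auc_sketch` and the numbers are made up, and the notebook's own `analyse_MRC_results` class may compute these values differently.

```python
import numpy as np
from scipy.stats import mannwhitneyu

def p_and_auc_sketch(control_values, patient_values):
    """Return (p value, AUC) for one feature measured in two groups."""
    controls = np.asarray(control_values, dtype=float)
    patients = np.asarray(patient_values, dtype=float)

    # two-sided Mann-Whitney U test between the two groups
    p_value = mannwhitneyu(controls, patients, alternative='two-sided').pvalue

    # AUC: probability that a random patient value exceeds a random
    # control value, with ties counted as half
    diff = patients[:, None] - controls[None, :]
    auc = ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size
    return p_value, auc

# made-up measurements, only for illustration
example_p, example_auc = p_and_auc_sketch([1, 2, 3, 4], [3, 4, 5, 6])
print(example_p, example_auc)
```

With this definition an AUC near 0.5 means the feature does not separate the groups, and an AUC below 0.5 means the feature tends to be lower in patients than in controls.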

### 3.1 analysis for subfolders separately

In [7]:
## specify the column names that identify the values in the result lists
column = ['MRI_type', 'shape', 'primary_rate', 'secondary_rate', 'step', 'power']

all_p_results, all_AUC_results, meaningful_results = [], [], []
all_selected_features = []

for subfolder in tqdm(mutual_subfolders) :
    
    controls_csv_files_paths = take_spesific_files_paths(os.path.join(MRC_path, control_id, subfolder), '.csv')    
    patients_csv_files_paths = take_spesific_files_paths(os.path.join(MRC_path, patient_id, subfolder), '.csv')
    
    ## calculation of p value (Mann-Whitney U) and AUC (ROC Curve analysis)    
    cs = analyse_MRC_results(controls_csv_files_paths, patients_csv_files_paths)   
    p_results, AUC_results, meaningful_features = cs.take_stats(folder_type = subfolder, 
                                                                auc_limit = lim_auc, p_limit = lim_p)
    
    selected_features = cs.power_analysis()
    all_selected_features.extend(selected_features)
 
    ## store results 
    if take_p_csv:
        all_p_results.extend(p_results)
    if take_auc_csv :
        all_AUC_results.extend(AUC_results)
    if take_meaningful_csv :
        meaningful_results.extend(meaningful_features)



## save results to csv files
save_stat_to_csv(all_selected_features, labels = column  + ['AUC_value', 'p_value'], 
                     path = os.path.join(stat_result_path, selected_features_csv), 
                     sort_columns = sort_columns,
                     ascending_order = False, sorting_columns = ['AUC_value', 'p_value'])       
if take_p_csv: 
    save_stat_to_csv(all_p_results, labels = column + ['p_value'], 
                     path = os.path.join(stat_result_path, p_csv), sort_columns = sort_columns,
                     ascending_order = True, sorting_columns = ['p_value'])
if take_auc_csv :
    save_stat_to_csv(all_AUC_results, labels = column + ['AUC_value'], 
                     path = os.path.join(stat_result_path, auc_csv), sort_columns = sort_columns,
                     ascending_order = False, sorting_columns = ['AUC_value'])
if take_meaningful_csv :
    save_stat_to_csv(meaningful_results, labels = column + ['AUC_value', 'p_value'], 
                 path = os.path.join(stat_result_path, meaningful_stat_csv), sort_columns = sort_columns,
                 ascending_order = False, sorting_columns = ['AUC_value', 'p_value'])

del all_p_results, all_AUC_results, meaningful_results, all_selected_features # the result lists are not needed anymore 
print('statistical calculations for each single MRI subfolder have finished')

100%|█████████████████████████████████████████████| 5/5 [01:10<00:00, 14.02s/it]


statistical calculations for each single MRI subfolder have finished
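
The feature reduction step (`power_analysis`) is described in the contents as: among features that are identical except for their power, keep the one with the maximum AUC. A minimal pandas sketch of that rule is given below. It is an illustration only, since the actual method lives in the helper notebook; the function name `select_best_power_sketch` and all example values (including the shape name) are made up.

```python
import pandas as pd

def select_best_power_sketch(stats):
    """For each group of features that differ only in 'power', keep the row with the highest AUC."""
    group_columns = ['MRI_type', 'shape', 'primary_rate', 'secondary_rate', 'step']
    # index label of the best-AUC row inside every group
    best_rows = stats.groupby(group_columns)['AUC_value'].idxmax()
    return stats.loc[best_rows].reset_index(drop=True)

# made-up values, only for illustration
example_features = pd.DataFrame({
    'MRI_type': ['t1_tra'] * 3 + ['t2_cor'] * 2,
    'shape': ['circle'] * 5,
    'primary_rate': [1.0] * 5,
    'secondary_rate': [0.5] * 5,
    'step': [2] * 5,
    'power': [1, 2, 3, 1, 2],
    'AUC_value': [0.71, 0.88, 0.80, 0.93, 0.75],
    'p_value': [0.009, 0.001, 0.004, 0.001, 0.008],
})

selected_example = select_best_power_sketch(example_features)
print(selected_example[['MRI_type', 'power', 'AUC_value']])
```

The five example rows collapse to two: power 2 for `t1_tra` and power 1 for `t2_cor`.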


### 3.2 analysis for MRI sequences collectively

In [None]:
## empty the sequences dict if the user doesn't want the collective analysis
if not make_collective_analysis:
    sequences = {}
    
## specify the column names that identify the values in the result lists
column = ['sequence', 'shape', 'primary_rate', 'secondary_rate', 'step', 'power']

all_p_results, all_AUC_results, meaningful_results = [], [], []
all_selected_features = []

for key in tqdm(sequences.keys()):
    
    controls_csv_files_paths, patients_csv_files_paths = {}, {}
    
    for subfolder in sequences[key] : 
        
        controls_csv_files_paths[subfolder] = take_spesific_files_paths(os.path.join(MRC_path, control_id, subfolder),'.csv')        
        patients_csv_files_paths[subfolder] = take_spesific_files_paths(os.path.join(MRC_path, patient_id, subfolder), '.csv')
    
    ## calculation of p value (Mann-Whitney U) and AUC (ROC Curve analysis) 
    cs = analyse_MRC_results(collective_evaluation=make_collective_analysis)      
    cs.activate_collective_evaluation(controls_csv_files_paths, patients_csv_files_paths, 
                            sequence_folders = sequences[key], caseid_info_file=case_info_path, 
                                      control_id = control_id, patient_id = patient_id)    
    p_results, AUC_results, meaningful_features = cs.take_stats(folder_type = key, 
                                                                auc_limit = lim_auc, p_limit = lim_p)        
    selected_features = cs.power_analysis()
    all_selected_features.extend(selected_features)    
    ## store results
    if take_collective_p_csv:
        all_p_results.extend(p_results)
    if take_collective_auc_csv:
        all_AUC_results.extend(AUC_results)
    if take_collective_meaningful_csv:    
        meaningful_results.extend(meaningful_features)
    

## save results to csv files
if make_collective_analysis:    
    save_stat_to_csv(all_selected_features, labels = column  + ['AUC_value', 'p_value'], 
                         path = os.path.join(stat_result_path, selected_collective_features_csv),
                         sort_columns = sort_columns,
                         ascending_order = False, sorting_columns = ['AUC_value', 'p_value'])        
    if take_collective_p_csv:
        save_stat_to_csv(all_p_results, labels = column + ['p_value'], 
                         path = os.path.join(stat_result_path, p_collective_csv), sort_columns = sort_columns,
                         ascending_order = True, sorting_columns = ['p_value'])    
    if take_collective_auc_csv:
        save_stat_to_csv(all_AUC_results, labels = column + ['AUC_value'], 
                         path = os.path.join(stat_result_path, auc_collective_csv), sort_columns = sort_columns,
                         ascending_order = False, sorting_columns = ['AUC_value'])
    if take_collective_meaningful_csv:
        save_stat_to_csv(meaningful_results, labels = column + ['AUC_value', 'p_value'], 
                         path = os.path.join(stat_result_path, meaningful_collective_stat_csv), sort_columns = sort_columns,
                         ascending_order = False, sorting_columns = ['AUC_value', 'p_value'])
print('statistical calculations for the collective MRI sequences have finished')

 50%|██████████████████████▌                      | 1/2 [00:14<00:14, 14.33s/it]
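
Both analysis cells hand their result lists to `save_stat_to_csv`, which is defined in the helper notebook and not shown here. Judging only from the arguments used above (`labels`, `path`, `sort_columns`, `ascending_order`, `sorting_columns`), a function of this kind could look roughly like the sketch below. The name `save_rows_to_csv_sketch`, the file name and the example rows are hypothetical.

```python
import os
import tempfile
import pandas as pd

def save_rows_to_csv_sketch(rows, labels, path, sort_columns=True,
                            ascending_order=False, sorting_columns=None):
    """Write a list of rows to a csv file, optionally sorted by the given columns."""
    table = pd.DataFrame(rows, columns=labels)
    if sort_columns and sorting_columns:
        table = table.sort_values(by=sorting_columns, ascending=ascending_order)
    table.to_csv(path, index=False)
    return table

# made-up rows, only for illustration
example_rows = [['t1', 1, 0.74, 0.008],
                ['t2', 2, 0.91, 0.001],
                ['t1', 3, 0.83, 0.003]]
example_labels = ['sequence', 'power', 'AUC_value', 'p_value']

with tempfile.TemporaryDirectory() as folder:
    example_path = os.path.join(folder, 'example_stats.csv')
    save_rows_to_csv_sketch(example_rows, example_labels, example_path,
                            ascending_order=False,
                            sorting_columns=['AUC_value', 'p_value'])
    reloaded_example = pd.read_csv(example_path)

print(reloaded_example)
```

With `ascending_order = False` the highest AUC comes first, which matches how the AUC-based files are sorted in the cells above; the p value file uses `ascending_order = True` so that the smallest p values come first.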