# Extract image features using pretrained MedicalNet models

This notebook explains how to extract features from image datasets for dataset analysis with the Silhouette score or the Fréchet inception distance (FID).

### Install MedicalNet

This code relies on the MedicalNet GitHub repository. MedicalNet must be installed before its models can be used to extract image features. Installation instructions are available in the repository at https://github.com/Tencent/MedicalNet.

### Adapt MedicalNet for feature extraction

To extract image features for dataset evaluation, use the 'extract_features.py' script instead of MedicalNet's test.py script. Add this file to the MedicalNet-master main folder and adapt it to your dataset.

### Create txt files of data paths

To obtain image features for the two classes separately, create one txt file per class that lists the path of every data sample in that class. The MedicalNet model reads these files during feature extraction.

To make these txt files, use the scripts below. 

If you run into memory issues during processing, the feature extraction can be split into smaller sets using the large_dataset=True option in the 'extract_features.py' script. To use this option, split your dataset txt file into multiple files named "dataset_0_A.txt", "dataset_0_B.txt", etc., so that only part of the dataset is processed at a time.
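The splitting step can be sketched as follows. This is a minimal example, not part of MedicalNet or 'extract_features.py': the helper name `split_path_file` and its `chunk_size` parameter are hypothetical, and the letter suffixes simply follow the naming described above. Check how 'extract_features.py' locates the split files before relying on it.

```python
import string
from pathlib import Path


def split_path_file(txt_path, chunk_size):
    """Split a txt file of image paths into <stem>_A.txt, <stem>_B.txt, ...

    Hypothetical helper: the suffix scheme follows the naming in the text above.
    """
    txt_path = Path(txt_path)
    # One image path per line; skip blank lines
    paths = [line for line in txt_path.read_text().splitlines() if line.strip()]
    chunks = [paths[i:i + chunk_size] for i in range(0, len(paths), chunk_size)]
    if len(chunks) > len(string.ascii_uppercase):
        raise ValueError('chunk_size too small: more than 26 chunks would be needed')

    out_files = []
    for letter, chunk in zip(string.ascii_uppercase, chunks):
        out_file = txt_path.with_name(f'{txt_path.stem}_{letter}{txt_path.suffix}')
        out_file.write_text('\n'.join(chunk))
        out_files.append(out_file)
    return out_files
```

For example, `split_path_file('dataset_0.txt', chunk_size=100)` would write "dataset_0_A.txt", "dataset_0_B.txt", and so on next to the original file, each holding at most 100 paths.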

### Extract features and reduce dimensionality using global average pooling (GAP) and PCA

Use the 'extract_features.py' script to extract features with MedicalNet and then reduce their dimensionality using the GAP and PCA functions. This creates a feature vector for each sample. The feature vectors are saved to the save path hardcoded in the extract_features.py file.
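To make the reduction step concrete, here is a minimal NumPy sketch of global average pooling followed by PCA. It is an illustration under assumptions, not the code in 'extract_features.py': it assumes the feature maps have shape (N, C, D, H, W), computes PCA through an SVD of the centered data, and uses hypothetical function names and a random array in place of real MedicalNet output.

```python
import numpy as np


def global_average_pool(feature_maps):
    """Average each channel over its spatial axes: (N, C, D, H, W) -> (N, C)."""
    return feature_maps.mean(axis=(2, 3, 4))


def pca_reduce(features, n_components):
    """Project (N, C) features onto their first n_components principal components."""
    centered = features - features.mean(axis=0, keepdims=True)
    # Rows of vt are the principal directions, ordered by decreasing variance
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


# Random stand-in for MedicalNet feature maps: 10 samples, 32 channels
rng = np.random.default_rng(0)
feature_maps = rng.normal(size=(10, 32, 4, 8, 8))
pooled = global_average_pool(feature_maps)    # shape (10, 32)
reduced = pca_reduce(pooled, n_components=5)  # shape (10, 5)
```

Note that `n_components` cannot exceed the smaller of the number of samples and the number of channels, so PCA has to be fitted on enough samples to support the target dimensionality.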

To extract features this way, run the following command in the terminal for your dataset:

```
python /extract_features.py --img_list '/Dataset_ovarian/Preprocessed_data_FAST' --input_H 256 --input_W 256 --resume_path '/pretrain/resnet_50_23dataset.pth' --gpu_id 0
```


In [None]:
import pandas as pd
import os

In [None]:
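# Paths for project
# Each block below reassigns data_p and preprocessed_p, so only the last
# uncommented block takes effect. Comment out the datasets you are not processing.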

# Ovarian paths
data_p = './Dataset_ovarian'
preprocessed_p = './Dataset_ovarian/Preprocessed_data_RADIOMICS'

# Pancreas paths
data_p = './Pancreas_cropped'
preprocessed_p = './Pancreas_cropped/Preprocessed_data_RADIOMICS'


# LIDC paths 
data_p = './NIFTI-LIDC'
preprocessed_p = './NIFTI-LIDC/Preprocessed_data_RADIOMICS'

# Liver dataset paths
data_p = './Liver_LITS17'
preprocessed_p = './Liver_LITS17/Preprocessed_data_RADIOMICS'

# FractureMNIST3D
data_p = './MedMNIST/FractureMNIST3D'
preprocessed_p = './MedMNIST/FractureMNIST3D/Preprocessed_data_RADIOMICS'
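Because every block in the cell above reassigns the same two variables, only the last dataset is actually used. One way to make the choice explicit is a lookup keyed by dataset name. The sketch below is an assumption about how this could be organised, not part of the original notebook: `DATASET_ROOTS`, `get_paths`, and the dictionary keys are hypothetical names, while the directory names are copied from the cell above.

```python
import os

# Dataset roots copied from the cell above; the dictionary keys are illustrative names
DATASET_ROOTS = {
    'ovarian': './Dataset_ovarian',
    'pancreas': './Pancreas_cropped',
    'lidc': './NIFTI-LIDC',
    'liver': './Liver_LITS17',
    'fracture': './MedMNIST/FractureMNIST3D',
}


def get_paths(dataset_name):
    """Return (data_p, preprocessed_p) for one dataset."""
    data_p = DATASET_ROOTS[dataset_name]
    preprocessed_p = os.path.join(data_p, 'Preprocessed_data_RADIOMICS')
    return data_p, preprocessed_p


# Same selection as the last block of the cell above
data_p, preprocessed_p = get_paths('fracture')
```

Switching datasets then only requires changing the name passed to `get_paths`.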

In [None]:

dataset_csv_path = os.path.join(preprocessed_p, 'preprocessed_data.csv')
# Output txt files that will list the image paths of each class
save_path_txt_0 = os.path.join(preprocessed_p, 'dataset_0.txt')
save_path_txt_1 = os.path.join(preprocessed_p, 'dataset_1.txt')

In [None]:
# Make txt files for dataset

df = pd.read_csv(dataset_csv_path, usecols=['image_path', 'label'] )
im_paths_0 = []
im_paths_1 = []
for index, row in df.iterrows():
    image_path = row['image_path']
    label = row['label']
    if str(label) == '0':
        im_paths_0.append(image_path)
    
    elif str(label) == '1':
        im_paths_1.append(image_path)
    else:
        print(f'Label not recognized! Label: {label}, type: {type(label)}')
        continue

print('class 0', len(im_paths_0), im_paths_0)
print('class 1', len(im_paths_1), im_paths_1)


with open(save_path_txt_0, "w") as outfile:
    outfile.write("\n".join(im_paths_0))

print('saved class 0 paths to', save_path_txt_0)

with open(save_path_txt_1, "w") as outfile:
    outfile.write("\n".join(im_paths_1))

print('saved class 1 paths to', save_path_txt_1)
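
Before starting feature extraction, it can help to confirm that every path written to the txt files exists on disk. The helper below is a small sketch for that check. `find_missing_paths` and its `root` parameter are hypothetical names, and it assumes the listed paths resolve from the current working directory unless a different `root` is passed.

```python
import os


def find_missing_paths(txt_file, root=''):
    """Return the paths listed in txt_file (one per line) that do not exist on disk."""
    with open(txt_file) as infile:
        paths = [line.strip() for line in infile if line.strip()]
    return [p for p in paths if not os.path.exists(os.path.join(root, p))]
```

Calling `find_missing_paths(save_path_txt_0)` and `find_missing_paths(save_path_txt_1)` should each return an empty list when all images are in place.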