The goal of this notebook is to use the method `extract_vision_features` and the CSV data sources to generate vision embeddings.

Note: this notebook only uses a sample of 10 patients, which keeps data processing and feature extraction fast for our tests.

If `torchxrayvision` is not installed, install it using `!pip install torchxrayvision`.

In [None]:
import os
from os import listdir

import cv2
import numpy as np
import pandas as pd
import skimage.io  # explicit submodule import so that skimage.io.imread is available
import torch
import torch.nn.functional as F
import torchxrayvision as xrv

# Move to the parent directory so that the `src` package can be imported
os.chdir('../')

from src.data import constants
from src.utils import extract_vision_features

Start by reading the sample images and the sample patients:

In [None]:
# Folder containing the sample images:
image_path_folder = constants.image_path_folder

# Read sample patients
df_10_dicoms = pd.read_csv(constants.df_10_dicoms)
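
Each DICOM ID later has to be matched to an image file, so it can help to index the image folder once up front. The sketch below is one possible approach and not part of the original notebook: `index_images_by_stem` is a hypothetical helper, and it assumes each image file is named after its DICOM ID plus an extension such as `.jpg`.

```python
import os


def index_images_by_stem(folder):
    """Map each file name without its extension to the file's full path."""
    index = {}
    for root, _dirs, files in os.walk(folder):
        for name in files:
            stem, _ext = os.path.splitext(name)
            index[stem] = os.path.join(root, name)
    return index
```

With such an index, `index_images_by_stem(image_path_folder).get(img_id)` would return the path for a given DICOM ID, or `None` when no file matches.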

We start by creating empty dataframes to store the vision embeddings and their concatenation.

Two types of embeddings are generated in this notebook:
- vision dense embeddings
- vision predictions embeddings

For further details, please refer to: https://github.com/mlmed/torchxrayvision/blob/0eafebf36a3f5f30302dff0faaacef5e52243e87/scripts/process_image.py
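
To make the two embedding types concrete, the sketch below lays out stand-in arrays as one-row dataframes using the `vp_` and `vd_` column prefixes that appear in the notebook output. The 1024 dense values match the `vd_1023` column at the end of that output; the 18 prediction values, the helper name `to_row`, and the random stand-in data are assumptions for illustration, not the actual implementation of `extract_vision_features`.

```python
import numpy as np
import pandas as pd


def to_row(values, prefix):
    """Turn a 1-D array into a one-row dataframe with columns prefix_0, prefix_1, ..."""
    values = np.asarray(values).reshape(-1)
    columns = [f"{prefix}_{i}" for i in range(values.size)]
    return pd.DataFrame([values], columns=columns)


rng = np.random.default_rng(0)
predictions = rng.random(18)  # stand-in for per-pathology scores (count is assumed)
dense = rng.random(1024)      # stand-in for pooled dense-layer features

# Place both embedding types side by side in a single row
row = pd.concat([to_row(predictions, "vp"), to_row(dense, "vd")], axis=1)
print(row.shape)  # (1, 1042)
```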

In [None]:
# Create empty dataframes to store the vision embeddings and their concatenation:
df_vision_dense_embeddings_fusion = pd.DataFrame()
df_vision_predictions_embeddings_fusion = pd.DataFrame()
vision_embeddings = pd.DataFrame()


# Iterate over the sample dicom_ids and process the matching image with torchxrayvision:
for img_id in df_10_dicoms[df_10_dicoms.dicom_id.isin(constants.sample_images)].dicom_id:

    for root, dirs, files in os.walk(image_path_folder):

        for name in files:

            # Compare against the file name without its extension (.jpg)
            if img_id == os.path.splitext(name)[0]:

                # Read the image and extract the features once per image:
                img = skimage.io.imread(os.path.join(root, name))
                features = extract_vision_features(img)

                # Accumulate both embedding types. DataFrame.append was removed in
                # pandas 2.0, so pd.concat is used instead. This assumes that
                # features[0] (predictions) and features[1] (dense) are one-row dataframes.
                df_vision_predictions_embeddings_fusion = pd.concat([df_vision_predictions_embeddings_fusion, features[0]])
                df_vision_dense_embeddings_fusion = pd.concat([df_vision_dense_embeddings_fusion, features[1]])


# Combine both embeddings in one dataframe:
vision_embeddings = pd.concat([df_vision_predictions_embeddings_fusion, df_vision_dense_embeddings_fusion], axis=1)
# The ID columns are inserted by position, so they must follow the same order as the rows above
vision_embeddings.insert(0, "subject_id", list(df_10_dicoms["subject_id"].unique()))
vision_embeddings.insert(1, "img_id", list(constants.sample_images))

# Display the extracted vision_embeddings:
vision_embeddings


Unnamed: 0,subject_id,img_id,vp_0,vp_1,vp_2,vp_3,vp_4,vp_5,vp_6,vp_7,...,vd_1014,vd_1015,vd_1016,vd_1017,vd_1018,vd_1019,vd_1020,vd_1021,vd_1022,vd_1023
0,10004235,074987b9-26c19a32-5d80ebab-28a2fb1c-6191b91f,0.877535,0.780643,0.5,0.502818,0.849441,0.5,0.5,0.873926,...,0.0,1.158451,0.022998,0.763152,0.001713,0.00034,0.436066,0.003181,0.030982,0.000345
0,10004720,53a0e91c-79580b39-f184232b-f105311f-eb2e51d2,0.858921,0.752889,0.5,0.515931,0.74674,0.5,0.5,0.881374,...,0.0,1.244473,0.002509,0.864674,0.004327,0.007722,0.515401,0.0,0.000955,0.0
0,10019003,60f2347b-99b4129d-95de2c7b-ee5cb73c-806efa60,0.792742,0.66952,0.5,0.251875,0.808638,0.5,0.5,0.808827,...,0.0,0.033331,0.0,0.032763,0.053149,0.002751,0.012662,0.013467,0.0,0.096322
0,10020852,e89d7fd0-52d0afc7-097fc4dc-3b7342d3-14b97733,0.741867,0.581389,0.5,0.335037,0.663654,0.5,0.5,0.653123,...,0.0,0.498774,0.007872,0.375536,0.00239,0.018924,0.208382,0.010424,0.02061,0.006576
0,10023708,2026c1e8-873a009a-a6c9549d-a2e8e77f-5266ac77,0.562414,0.364766,0.5,0.045881,0.35119,0.5,0.5,0.298945,...,0.0,0.106657,0.0,0.108097,0.033833,0.000182,0.034597,0.001043,0.0,0.070947
0,10031358,1dfc725a-fb67044b-37c88c4e-e4a80288-18a92be0,0.619143,0.295226,0.5,0.138097,0.60031,0.5,0.5,0.387957,...,0.0,0.026729,0.012108,0.022146,0.03488,0.030794,0.010998,0.018219,0.022194,0.045254
0,10035631,0f33d2cc-cba96c64-8d40983e-4b2a2264-6ff6d3a5,0.667334,0.717353,0.5,0.452934,0.623374,0.5,0.5,0.791604,...,0.0,0.826656,0.001952,0.692491,0.0,0.0,0.357309,0.065947,0.0,0.0
0,10046166,abea5eb9-b7c32823-3a14c5ca-77868030-69c83139,0.671572,0.313246,0.5,0.112746,0.504731,0.5,0.5,0.33443,...,0.0,0.125224,0.0,0.130445,0.030246,0.002892,0.062674,0.010752,0.0,0.056176
0,10047172,43d968ea-b9b838af-5e4a8bef-c5a4808b-04aa4e2c,0.779744,0.599946,0.5,0.231051,0.684293,0.5,0.5,0.674557,...,0.0,0.485786,0.003176,0.566971,0.030178,0.0,0.235767,0.137142,0.0,0.03749
0,10051990,457bdad8-c45edc64-452fde8e-f5adda5c-f386693e,0.481756,0.171399,0.5,0.075162,0.56219,0.5,0.5,0.192893,...,0.0,0.00901,0.0,0.000527,0.011546,0.003044,0.0,0.023434,0.004143,0.017548
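
The `subject_id` and `img_id` columns are inserted by position, which is only correct when the embedding rows, `constants.sample_images`, and the unique subject IDs all share the same order. A more defensive variant looks each subject up by its DICOM ID. The sketch below uses made-up IDs and assumes that `df_10_dicoms` holds one row per DICOM with `dicom_id` and `subject_id` columns.

```python
import pandas as pd

# Made-up stand-ins for df_10_dicoms and constants.sample_images
df_dicoms = pd.DataFrame({
    "subject_id": [101, 102, 103],
    "dicom_id": ["img-a", "img-b", "img-c"],
})
sample_images = ["img-c", "img-a"]

# Series indexed by dicom_id, so the lookup does not depend on row order
subject_by_dicom = df_dicoms.set_index("dicom_id")["subject_id"]

ids = pd.DataFrame({"img_id": sample_images})
ids["subject_id"] = ids["img_id"].map(subject_by_dicom)
print(ids)
```

Here `img-c` maps to subject 103 and `img-a` to subject 101, regardless of the order in which the images are listed.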


Export to a CSV file for later use:

In [None]:
vision_embeddings.to_csv("/cvs/fusion_vision.csv", index=True)
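
When the file is read back, the index written by `index=True` shows up as an extra `Unnamed: 0` column unless it is declared as the index. A small round trip on made-up data, using an in-memory buffer instead of the real path, illustrates the difference:

```python
import io

import pandas as pd

df = pd.DataFrame({"subject_id": [101, 102], "vp_0": [0.87, 0.85]}, index=[0, 0])

buffer = io.StringIO()
df.to_csv(buffer, index=True)

# Default read: the saved index becomes a column named "Unnamed: 0"
buffer.seek(0)
plain = pd.read_csv(buffer)

# index_col=0 restores the saved index instead
buffer.seek(0)
restored = pd.read_csv(buffer, index_col=0)

print(list(plain.columns))     # ['Unnamed: 0', 'subject_id', 'vp_0']
print(list(restored.columns))  # ['subject_id', 'vp_0']
```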