The goal of this notebook is to use the `extract_vision_features` method and the CSV data sources to generate vision embeddings.

Note: in this notebook, we only use a sample of 10 patients to make the data processing and extraction easier for our tests.

If `torchxrayvision` is not installed, install it using `!pip install torchxrayvision`.
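
As a minimal sketch, the availability of the package can be checked from inside the notebook before running `pip`. The helper name `is_installed` is hypothetical and not part of this repository; it only relies on the standard library.

```python
import importlib.util


def is_installed(package_name: str) -> bool:
    """Return True if the top-level package can be found on the import path."""
    return importlib.util.find_spec(package_name) is not None


# Print a hint instead of failing later on the import statement
if not is_installed("torchxrayvision"):
    print("torchxrayvision is missing; run: pip install torchxrayvision")
```

This only tells whether the package can be located; it does not verify its version or that the model weights can be downloaded.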

In [1]:
import os
from os import listdir

import cv2
import numpy as np
import pandas as pd
import skimage.io  # import the io submodule explicitly for skimage.io.imread
import torch
import torch.nn.functional as F
import torchxrayvision as xrv

# Move to the parent directory so that the src package can be imported
os.chdir('../')

from src.data import constants
from src.utils import extract_vision_features

Start by setting the image folder path and reading the sample patients.

In [3]:
# Folder containing the images:
image_path_folder = constants.image_path_folder

# Read sample patients
df_10_dicoms = pd.read_csv(constants.df_10_dicoms)


We start by creating empty dataframes to store the vision embeddings and their concatenation.

Two types of embeddings are generated in this notebook:
- vision dense embeddings
- vision predictions embeddings

For further details, please refer to: https://github.com/mlmed/torchxrayvision/blob/0eafebf36a3f5f30302dff0faaacef5e52243e87/scripts/process_image.py

In [56]:
# Create empty dataframes to store the vision embeddings and their concatenation:
df_vision_dense_embeddings_fusion = pd.DataFrame()
df_vision_predictions_embeddings_fusion = pd.DataFrame()
vision_embeddings = pd.DataFrame()


# Iterate through the sample file, read the dicom_id of each image and process
# the matching image file using torchxrayvision:
for img_id in df_10_dicoms.dicom_id:

    for root, dirs, files in os.walk(image_path_folder):

        for name in files:

            # Compare against the file name without its extension (e.g. .jpg)
            if img_id == os.path.splitext(name)[0]:

                # Image loading and feature extraction (the model runs once per image):
                img = skimage.io.imread(os.path.join(root, name))
                features = extract_vision_features(img)

                # Concatenate both embedding types. DataFrame.append was removed
                # in pandas 2.0, so pd.concat is used instead:
                df_vision_predictions_embeddings_fusion = pd.concat([df_vision_predictions_embeddings_fusion, features[0]])
                df_vision_dense_embeddings_fusion = pd.concat([df_vision_dense_embeddings_fusion, features[1]])


# Combine both embeddings in one dataframe:
vision_embeddings = pd.concat([df_vision_predictions_embeddings_fusion, df_vision_dense_embeddings_fusion], axis=1)
vision_embeddings.insert(0, "subject_id", df_10_dicoms["subject_id"].tolist())
vision_embeddings.insert(1, "img_id", df_10_dicoms["dicom_id"].tolist())

# Display the extracted vision_embeddings:
vision_embeddings


Unnamed: 0,subject_id,img_id,vp_0,vp_1,vp_2,vp_3,vp_4,vp_5,vp_6,vp_7,...,vd_1014,vd_1015,vd_1016,vd_1017,vd_1018,vd_1019,vd_1020,vd_1021,vd_1022,vd_1023
0,10004235,3813b9b6-88d998b4-941e767b-601ba7c1-98f61102,0.761746,0.597587,0.5,0.194408,0.799667,0.5,0.5,0.795229,...,0.0,0.018440,0.036784,0.013982,0.056655,0.061920,0.014460,0.002953,0.054192,0.011427
0,10004235,5b05c3da-4f8f9c06-7b8c4faf-4c12d978-6cb22b83,0.822613,0.696896,0.5,0.352192,0.836629,0.5,0.5,0.833249,...,0.0,0.177328,0.001983,0.053524,0.037886,0.003546,0.020812,0.000000,0.009739,0.021416
0,10004235,d71a4931-5c0832b8-ae60fd56-1e3658d3-a392959a,0.850423,0.858258,0.5,0.325056,0.880797,0.5,0.5,0.926864,...,0.0,0.048385,0.024567,0.004871,0.062106,0.017659,0.002974,0.000000,0.019351,0.023966
0,10004235,074987b9-26c19a32-5d80ebab-28a2fb1c-6191b91f,0.877535,0.780643,0.5,0.502818,0.849441,0.5,0.5,0.873926,...,0.0,1.158451,0.022998,0.763152,0.001713,0.000340,0.436066,0.003181,0.030982,0.000345
0,10004235,2af702f0-3c2b86f3-82e2112b-7f449f3d-dde1c122,0.780741,0.553683,0.5,0.219153,0.557776,0.5,0.5,0.645230,...,0.0,0.145989,0.000000,0.097559,0.029508,0.000000,0.062154,0.000000,0.002565,0.135253
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
0,10047172,8c1e9dbe-d9226ffc-664a4b1b-f977d9dd-28c2e221,0.766402,0.623138,0.5,0.177948,0.572700,0.5,0.5,0.727186,...,0.0,0.276326,0.003103,0.390451,0.024190,0.000000,0.136018,0.132001,0.000000,0.009101
0,10051990,c3be88b4-a181c57a-713c47dc-224eed4e-ca2f9f0a,0.808708,0.553835,0.5,0.228712,0.748569,0.5,0.5,0.627512,...,0.0,0.759421,0.060288,0.916903,0.002542,0.032707,0.397692,0.192993,0.018223,0.000000
0,10051990,33d58ec3-6149bb3c-259cbce7-ca794841-9831e36d,0.692923,0.629066,0.5,0.401522,0.553469,0.5,0.5,0.677541,...,0.0,0.878277,0.009138,0.878999,0.001073,0.021765,0.407009,0.110055,0.004193,0.000000
0,10051990,c246abe6-fe4c5191-d914aaab-1eba4d2e-28970fc9,0.615369,0.584915,0.5,0.186577,0.448551,0.5,0.5,0.609550,...,0.0,0.584403,0.001957,0.702810,0.000000,0.008711,0.266601,0.191926,0.004454,0.000000
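
The extraction loop walks the image folder once for every `dicom_id`. For larger samples, one possible alternative is to walk the folder a single time and build a lookup from file name (without extension) to full path. The sketch below assumes that each image file is named after its `dicom_id`; the helper `index_images_by_stem` is hypothetical and not part of this repository, and the demo uses a temporary folder rather than the real data.

```python
import os
import tempfile


def index_images_by_stem(folder):
    """Map each file name without its extension to its full path."""
    index = {}
    for root, _dirs, files in os.walk(folder):
        for name in files:
            stem, _ext = os.path.splitext(name)
            index[stem] = os.path.join(root, name)
    return index


# Small self-contained demo on a temporary folder with one nested file
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "p10"))
    demo_file = os.path.join(tmp, "p10", "abc-123.jpg")
    open(demo_file, "w").close()

    demo_index = index_images_by_stem(tmp)
    demo_found = demo_index.get("abc-123") == demo_file
```

In the notebook, this could be used as `paths = index_images_by_stem(image_path_folder)` followed by `paths.get(img_id)` inside the loop over `df_10_dicoms.dicom_id`; a `None` result would indicate a `dicom_id` with no matching file.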
