# Curating datasets

**Author(s):** Michelle Iskandar [@michellepi](https://github.com/michellepi) and Miguel Xochicale [@mxochicale](https://github.com/mxochicale)   
**Contributors:** 


## Introduction

This notebook prototypes a demographic analysis of the participants in the FETAL_PLANES_DB dataset, selecting fetal brain images by plane, ultrasound machine and operator.

### Running the notebook
Go to the repository path: `cd $HOME/repositories/budai4medtech/midl2023/data`   
Open the repository in PyCharm and, in the terminal, type:
```
git checkout master # or your working branch
git pull # to bring a local branch up-to-date with its remote version
```
Launch the notebook server:
```
conda activate febusisVE
jupyter notebook --browser=firefox
```
This opens the notebook in your web browser.


### Logbook
* August 2022: adds notebook
* December 2022: cleans up notebook


### References
* FETAL_PLANES_DB: Common maternal-fetal ultrasound images.    
The final dataset comprises over 12,400 images from 1,792 patients.
https://zenodo.org/record/3904280
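
The dataset description above can be checked against the CSV: the number of rows gives the image count, and the distinct values of `Patient_num` give the patient count. A minimal sketch, assuming the semicolon-separated layout loaded later in this notebook; the helper `summarise_dataset` is hypothetical, and the inline sample reuses a few rows from the table output rather than the real file.

```python
import io

import pandas as pd

# A few rows copied from the table output in this notebook, in the same
# semicolon-separated layout as FETAL_PLANES_DB_data.csv.
SAMPLE_CSV = (
    "Image_name;Patient_num;Plane;Brain_plane;Operator;US_Machine\n"
    "Patient00001_Plane1_1_of_15;1;Other;Not A Brain;Other;Aloka\n"
    "Patient00001_Plane1_2_of_15;1;Other;Not A Brain;Other;Aloka\n"
    "Patient01791_Plane5_1_of_1;1791;Fetal femur;Not A Brain;Other;Voluson S10\n"
    "Patient01792_Plane2_1_of_1;1792;Fetal abdomen;Not A Brain;Other;Voluson E6\n"
    "Patient01792_Plane3_1_of_1;1792;Fetal brain;Trans-thalamic;Other;Voluson E6\n"
)


def summarise_dataset(df: pd.DataFrame) -> tuple[int, int]:
    """Hypothetical helper: (number of images, number of distinct patients)."""
    n_images = len(df)  # one row per image
    n_patients = int(df["Patient_num"].nunique())
    return n_images, n_patients


df_sample = pd.read_csv(io.StringIO(SAMPLE_CSV), sep=";")
print(summarise_dataset(df_sample))  # (5, 3)
```

On the full table, `summarise_dataset(df_datasets)` would give the totals to compare with the figures quoted from Zenodo.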





# 2. Jupyter Notebook
## 2.1 Setting imports and dataset paths

In [1]:
```python
import os
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

HOME_PATH = os.path.expanduser('~')
USERNAME = os.path.split(HOME_PATH)[1]

REPOSITORY_PATH = 'repositories/budai4medtech/midl2023'
FULL_REPO_PATH = os.path.join(HOME_PATH, REPOSITORY_PATH)
# The trailing '' keeps a final separator, so later cells can append a filename
FULL_DATA_REPO_PATH = os.path.join(FULL_REPO_PATH, 'data', '')
CSV_FILENAME_CSV = 'FETAL_PLANES_DB_data.csv'

# Print paths
print(f'FULL_DATA_REPO_PATH: {FULL_DATA_REPO_PATH}')
```


FULL_DATA_REPO_PATH: /home/mxochicale/repositories/budai4medtech/midl2023/data/
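
The cell above builds the CSV location by string concatenation, which relies on `FULL_DATA_REPO_PATH` ending with a separator. A sketch of the same paths with `pathlib`, assuming the repository lives under the home directory as above; `full_data_repo_path` and `csv_path` are illustrative names, not part of the original notebook.

```python
from pathlib import Path

# Same relative locations as in the cell above.
REPOSITORY_PATH = "repositories/budai4medtech/midl2023"
CSV_FILENAME_CSV = "FETAL_PLANES_DB_data.csv"

# Path.home() plays the role of os.path.expanduser("~"), and the "/"
# operator inserts separators, so no trailing "/" bookkeeping is needed.
full_data_repo_path = Path.home() / REPOSITORY_PATH / "data"
csv_path = full_data_repo_path / CSV_FILENAME_CSV

print(f"FULL_DATA_REPO_PATH: {full_data_repo_path}")
# Report a missing file early instead of failing later inside pd.read_csv.
if not csv_path.is_file():
    print(f"CSV not found at {csv_path}; check REPOSITORY_PATH")
```

`pd.read_csv` accepts a `Path` object directly, so `csv_path` could replace the concatenated string.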


## 2.2 Loading and filtering dataframe tables


In [2]:
```python
# pd.read_csv already returns a DataFrame; the CSV is semicolon-separated
df_datasets = pd.read_csv(FULL_DATA_REPO_PATH + CSV_FILENAME_CSV, sep=";")
# To read only the header and first row instead:
# df_semi = pd.read_csv(FULL_DATA_REPO_PATH + CSV_FILENAME_CSV, nrows=1, sep=";")

## Uncomment to select only the fetal brain images
#df_datasets_fb_Trans_thalamic = df_datasets[(df_datasets["Plane"] == 'Fetal brain')]

df_datasets
```

| | Image_name | Patient_num | Plane | Brain_plane | Operator | US_Machine | Train |
|---|---|---|---|---|---|---|---|
| 0 | Patient00001_Plane1_1_of_15 | 1 | Other | Not A Brain | Other | Aloka | 1 |
| 1 | Patient00001_Plane1_2_of_15 | 1 | Other | Not A Brain | Other | Aloka | 1 |
| 2 | Patient00001_Plane1_3_of_15 | 1 | Other | Not A Brain | Other | Aloka | 1 |
| 3 | Patient00001_Plane1_4_of_15 | 1 | Other | Not A Brain | Other | Aloka | 1 |
| 4 | Patient00001_Plane1_5_of_15 | 1 | Other | Not A Brain | Other | Aloka | 1 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 12395 | Patient01791_Plane5_1_of_1 | 1791 | Fetal femur | Not A Brain | Other | Voluson S10 | 0 |
| 12396 | Patient01792_Plane2_1_of_1 | 1792 | Fetal abdomen | Not A Brain | Other | Voluson E6 | 0 |
| 12397 | Patient01792_Plane3_1_of_1 | 1792 | Fetal brain | Trans-thalamic | Other | Voluson E6 | 0 |
| 12398 | Patient01792_Plane5_1_of_1 | 1792 | Fetal femur | Not A Brain | Other | Voluson E6 | 0 |
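
A later cell refers to the split column as `df_datasets["Train "]`, with a trailing space, which suggests the CSV header carries stray whitespace that the table display hides. A minimal sketch of reading a semicolon-separated file and normalising the header, using an in-memory sample built from rows shown above; the trailing space in the sample header is an assumption about the real file.

```python
import io

import pandas as pd

# Header written with an explicit trailing space after "Train", assumed to
# mirror the header of the real FETAL_PLANES_DB_data.csv.
SAMPLE_CSV = (
    "Image_name;Patient_num;Plane;Brain_plane;Operator;US_Machine;Train \n"
    "Patient00001_Plane1_1_of_15;1;Other;Not A Brain;Other;Aloka;1\n"
    "Patient01792_Plane3_1_of_1;1792;Fetal brain;Trans-thalamic;Other;Voluson E6;0\n"
)

# read_csv already returns a DataFrame, so no pd.DataFrame(...) wrapper is needed.
df = pd.read_csv(io.StringIO(SAMPLE_CSV), sep=";")
print(list(df.columns))  # raw header names, whitespace included

# Strip surrounding whitespace so the column can be indexed as df["Train"].
df.columns = df.columns.str.strip()
print(df["Train"].tolist())  # [1, 0]
```

If you strip the headers of `df_datasets` this way, index the split column as `df_datasets["Train"]` afterwards.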


In [3]:
```python
## Uncomment to display the distinct values of each column

# print(df_datasets["Plane"].unique())
##   Fetal brain, Fetal thorax, Maternal cervix, Fetal femur, Fetal abdomen, Other

# print(df_datasets["US_Machine"].unique())
##   Aloka, Voluson E6, Other, Voluson S10

# print(df_datasets["Operator"].unique())
##   Other, Op. 1, Op. 2, Op. 3

# print(df_datasets["Patient_num"].unique())
##   1, ..., 1792

# print(df_datasets["Train "].unique())
##   0, 1
```
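
The commented `print` calls above list the categories of each column. `value_counts()` returns the distinct values together with their frequencies in one call. A sketch on a small in-memory sample copied from the table output; the counts it prints describe only the sample, not the full dataset.

```python
import io

import pandas as pd

# Rows copied from the notebook's table outputs (illustrative subset only).
SAMPLE_CSV = (
    "Image_name;Patient_num;Plane;Brain_plane;Operator;US_Machine\n"
    "Patient00001_Plane1_1_of_15;1;Other;Not A Brain;Other;Aloka\n"
    "Patient01791_Plane5_1_of_1;1791;Fetal femur;Not A Brain;Other;Voluson S10\n"
    "Patient01792_Plane2_1_of_1;1792;Fetal abdomen;Not A Brain;Other;Voluson E6\n"
    "Patient01792_Plane3_1_of_1;1792;Fetal brain;Trans-thalamic;Other;Voluson E6\n"
    "Patient00669_Plane3_1_of_1;669;Fetal brain;Trans-thalamic;Op. 1;Voluson E6\n"
)
df = pd.read_csv(io.StringIO(SAMPLE_CSV), sep=";")

# One call per categorical column: distinct values plus how often each occurs.
for column in ["Plane", "US_Machine", "Operator"]:
    print(df[column].value_counts(), end="\n\n")

# Patient IDs are better summarised by their range and distinct count.
print(df["Patient_num"].min(), df["Patient_num"].max(), df["Patient_num"].nunique())
```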

# 3. Filtering and creating columns
### 3.1 Fetal brain: Trans-thalamic plane

In [4]:
```python
df_datasets_fb_Trans_thalamic = df_datasets[
    (df_datasets["Plane"] == 'Fetal brain')
    & (df_datasets["Brain_plane"] == 'Trans-thalamic')   # 1638 rows × 7 columns
#   & (df_datasets["US_Machine"] == 'Voluson S10')       # 123 rows × 7 columns
    & (df_datasets["US_Machine"] == 'Voluson E6')        # 1072 rows × 7 columns
#   & (df_datasets["Operator"] == 'Op. 1')
]
df_datasets_fb_Trans_thalamic
```


| | Image_name | Patient_num | Plane | Brain_plane | Operator | US_Machine | Train |
|---|---|---|---|---|---|---|---|
| 1316 | Patient00216_Plane3_1_of_5 | 216 | Fetal brain | Trans-thalamic | Other | Voluson E6 | 1 |
| 1318 | Patient00216_Plane3_3_of_5 | 216 | Fetal brain | Trans-thalamic | Other | Voluson E6 | 1 |
| 1320 | Patient00216_Plane3_5_of_5 | 216 | Fetal brain | Trans-thalamic | Other | Voluson E6 | 1 |
| 1788 | Patient00644_Plane3_3_of_3 | 644 | Fetal brain | Trans-thalamic | Other | Voluson E6 | 0 |
| 1848 | Patient00669_Plane3_1_of_1 | 669 | Fetal brain | Trans-thalamic | Op. 1 | Voluson E6 | 1 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 12275 | Patient01774_Plane3_1_of_2 | 1774 | Fetal brain | Trans-thalamic | Op. 1 | Voluson E6 | 0 |
| 12276 | Patient01774_Plane3_2_of_2 | 1774 | Fetal brain | Trans-thalamic | Op. 1 | Voluson E6 | 0 |
| 12329 | Patient01781_Plane3_1_of_2 | 1781 | Fetal brain | Trans-thalamic | Op. 1 | Voluson E6 | 0 |
| 12330 | Patient01781_Plane3_2_of_2 | 1781 | Fetal brain | Trans-thalamic | Op. 1 | Voluson E6 | 0 |
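
Each filter cell repeats the same plane condition and toggles the machine and operator conditions by commenting lines in and out. A hypothetical helper, `select_brain_plane`, sketches how those optional conditions could be passed as arguments instead; column names and values follow the table above, and the sample rows are copied from it.

```python
import io

import pandas as pd

SAMPLE_CSV = (
    "Image_name;Patient_num;Plane;Brain_plane;Operator;US_Machine\n"
    "Patient00216_Plane3_1_of_5;216;Fetal brain;Trans-thalamic;Other;Voluson E6\n"
    "Patient00216_Plane3_2_of_5;216;Fetal brain;Trans-ventricular;Other;Voluson E6\n"
    "Patient00669_Plane3_1_of_1;669;Fetal brain;Trans-thalamic;Op. 1;Voluson E6\n"
    "Patient01791_Plane5_1_of_1;1791;Fetal femur;Not A Brain;Other;Voluson S10\n"
)
df_datasets = pd.read_csv(io.StringIO(SAMPLE_CSV), sep=";")


def select_brain_plane(df, brain_plane, us_machine=None, operator=None):
    """Hypothetical helper: rows of one fetal brain plane, optionally
    restricted to one ultrasound machine and/or one operator."""
    mask = (df["Plane"] == "Fetal brain") & (df["Brain_plane"] == brain_plane)
    if us_machine is not None:
        mask &= df["US_Machine"] == us_machine
    if operator is not None:
        mask &= df["Operator"] == operator
    return df[mask]


tt_e6 = select_brain_plane(df_datasets, "Trans-thalamic", us_machine="Voluson E6")
tt_e6_op1 = select_brain_plane(df_datasets, "Trans-thalamic", "Voluson E6", "Op. 1")
print(len(tt_e6), len(tt_e6_op1))  # 2 1
```

Keeping the optional filters as keyword arguments means every combination recorded in the comments can be reproduced without editing the cell.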


In [5]:
```python
#print(df_datasets_fb_Trans_thalamic["Image_name"].to_string())
```

### 3.2 Fetal brain: Trans-cerebellum plane

In [6]:
```python
df_datasets_fb_Trans_cerebellum = df_datasets[
    (df_datasets["Plane"] == 'Fetal brain')
    & (df_datasets["Brain_plane"] == 'Trans-cerebellum')  # 714 rows × 7 columns
#   & (df_datasets["US_Machine"] == 'Voluson S10')        # 68 rows × 7 columns
    & (df_datasets["US_Machine"] == 'Voluson E6')         # 492 rows × 7 columns
#   & (df_datasets["Operator"] == 'Op. 1')                # with Voluson E6: 182 rows × 7 columns
]
df_datasets_fb_Trans_cerebellum
```


| | Image_name | Patient_num | Plane | Brain_plane | Operator | US_Machine | Train |
|---|---|---|---|---|---|---|---|
| 1319 | Patient00216_Plane3_4_of_5 | 216 | Fetal brain | Trans-cerebellum | Other | Voluson E6 | 1 |
| 1787 | Patient00644_Plane3_2_of_3 | 644 | Fetal brain | Trans-cerebellum | Other | Voluson E6 | 0 |
| 1864 | Patient00675_Plane3_1_of_1 | 675 | Fetal brain | Trans-cerebellum | Op. 1 | Voluson E6 | 1 |
| 1895 | Patient00687_Plane3_1_of_1 | 687 | Fetal brain | Trans-cerebellum | Op. 1 | Voluson E6 | 1 |
| 1904 | Patient00690_Plane3_3_of_4 | 690 | Fetal brain | Trans-cerebellum | Op. 1 | Voluson E6 | 1 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 12078 | Patient01737_Plane3_1_of_1 | 1737 | Fetal brain | Trans-cerebellum | Op. 1 | Voluson E6 | 0 |
| 12117 | Patient01745_Plane3_2_of_2 | 1745 | Fetal brain | Trans-cerebellum | Op. 1 | Voluson E6 | 0 |
| 12122 | Patient01747_Plane3_1_of_1 | 1747 | Fetal brain | Trans-cerebellum | Op. 1 | Voluson E6 | 0 |
| 12267 | Patient01773_Plane3_2_of_2 | 1773 | Fetal brain | Trans-cerebellum | Op. 1 | Voluson E6 | 0 |
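
The row counts in the filter comments (for example 714 trans-cerebellum rows, 492 of them from the Voluson E6) were recorded by editing and re-running the cells. A sketch of collecting such counts at once with `pd.crosstab`, on a small sample of rows copied from the tables above; with the full `df_datasets` the same call should reproduce the per-machine counts noted in the comments.

```python
import io

import pandas as pd

SAMPLE_CSV = (
    "Image_name;Patient_num;Plane;Brain_plane;Operator;US_Machine\n"
    "Patient00216_Plane3_1_of_5;216;Fetal brain;Trans-thalamic;Other;Voluson E6\n"
    "Patient00216_Plane3_4_of_5;216;Fetal brain;Trans-cerebellum;Other;Voluson E6\n"
    "Patient00644_Plane3_2_of_3;644;Fetal brain;Trans-cerebellum;Other;Voluson E6\n"
    "Patient00675_Plane3_1_of_1;675;Fetal brain;Trans-cerebellum;Op. 1;Voluson E6\n"
    "Patient01791_Plane5_1_of_1;1791;Fetal femur;Not A Brain;Other;Voluson S10\n"
)
df_datasets = pd.read_csv(io.StringIO(SAMPLE_CSV), sep=";")

# Keep fetal brain images only, then count images per (brain plane, machine).
brain = df_datasets[df_datasets["Plane"] == "Fetal brain"]
# margins=True adds an "All" row and column holding the totals.
counts = pd.crosstab(brain["Brain_plane"], brain["US_Machine"], margins=True)
print(counts)
```

Passing `[brain["US_Machine"], brain["Operator"]]` as the second argument would add the operator split as a second column level.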


In [7]:
```python
#print(df_datasets_fb_Trans_cerebellum["Image_name"].to_string())
```

### 3.3 Fetal brain: Trans-ventricular plane

In [8]:
```python
df_datasets_fb_Trans_ventricular = df_datasets[
    (df_datasets["Plane"] == 'Fetal brain')
    & (df_datasets["Brain_plane"] == 'Trans-ventricular')  # 597 rows × 7 columns
#   & (df_datasets["US_Machine"] == 'Voluson S10')         # 59 rows × 7 columns
    & (df_datasets["US_Machine"] == 'Voluson E6')          # 408 rows × 7 columns
#   & (df_datasets["Operator"] == 'Op. 1')                 # with Voluson E6: 168 rows × 7 columns
]

df_datasets_fb_Trans_ventricular
```

| | Image_name | Patient_num | Plane | Brain_plane | Operator | US_Machine | Train |
|---|---|---|---|---|---|---|---|
| 1317 | Patient00216_Plane3_2_of_5 | 216 | Fetal brain | Trans-ventricular | Other | Voluson E6 | 1 |
| 1786 | Patient00644_Plane3_1_of_3 | 644 | Fetal brain | Trans-ventricular | Other | Voluson E6 | 0 |
| 1897 | Patient00688_Plane3_1_of_1 | 688 | Fetal brain | Trans-ventricular | Op. 1 | Voluson E6 | 1 |
| 1963 | Patient00699_Plane3_3_of_4 | 699 | Fetal brain | Trans-ventricular | Op. 1 | Voluson E6 | 1 |
| 1969 | Patient00700_Plane3_2_of_2 | 700 | Fetal brain | Trans-ventricular | Op. 1 | Voluson E6 | 1 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 12116 | Patient01745_Plane3_1_of_2 | 1745 | Fetal brain | Trans-ventricular | Op. 1 | Voluson E6 | 0 |
| 12176 | Patient01758_Plane3_2_of_2 | 1758 | Fetal brain | Trans-ventricular | Op. 1 | Voluson E6 | 0 |
| 12252 | Patient01771_Plane3_2_of_4 | 1771 | Fetal brain | Trans-ventricular | Op. 1 | Voluson E6 | 0 |
| 12258 | Patient01772_Plane3_1_of_1 | 1772 | Fetal brain | Trans-ventricular | Op. 1 | Voluson E6 | 0 |


In [9]:
```python
#print(df_datasets_fb_Trans_ventricular["Image_name"].to_string())
```
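
The commented `print(...["Image_name"].to_string())` cells suggest that the selected image names are the curated output. A sketch that writes one text file of image names per brain plane, assuming a Voluson E6 selection as in the cells above; the output folder and the `<plane>_images.txt` naming are hypothetical, and a temporary directory stands in for a real destination.

```python
import io
import tempfile
from pathlib import Path

import pandas as pd

SAMPLE_CSV = (
    "Image_name;Patient_num;Plane;Brain_plane;Operator;US_Machine\n"
    "Patient00216_Plane3_1_of_5;216;Fetal brain;Trans-thalamic;Other;Voluson E6\n"
    "Patient00216_Plane3_2_of_5;216;Fetal brain;Trans-ventricular;Other;Voluson E6\n"
    "Patient00216_Plane3_4_of_5;216;Fetal brain;Trans-cerebellum;Other;Voluson E6\n"
    "Patient00644_Plane3_1_of_3;644;Fetal brain;Trans-ventricular;Other;Voluson E6\n"
)
df_datasets = pd.read_csv(io.StringIO(SAMPLE_CSV), sep=";")
brain_e6 = df_datasets[
    (df_datasets["Plane"] == "Fetal brain") & (df_datasets["US_Machine"] == "Voluson E6")
]

# Hypothetical output location; point this at a real folder in practice.
out_dir = Path(tempfile.mkdtemp())
written = {}
for brain_plane, subset in brain_e6.groupby("Brain_plane"):
    # e.g. "Trans-thalamic" -> "Trans-thalamic_images.txt", one name per line
    path = out_dir / f"{brain_plane}_images.txt"
    path.write_text("\n".join(subset["Image_name"]) + "\n")
    written[brain_plane] = len(subset)

print(written)
```

Grouping by `Brain_plane` writes the three plane subsets in one pass instead of three near-identical cells.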