# **Notebook to create a CSV file with information about the scan dataset**
<hr>

The input data are FDG-PET/CT and radiotherapy planning CT images from 298 head and neck cancer patients.

The data comes from four different institutions:
 - __* (HGJ)  Hôpital Général Juif, Montréal - Discovery ST, GE Healthcare *__ 
             - for some patients the contours were drawn the same way as at HMR; 
             for the others they were drawn directly on the CT of the FDG-PET/CT scan
 - __* (CHUS) Centre Hospitalier Universitaire de Sherbrooke, Sherbrooke - Guardian Body, Philips Medical Systems *__
              - for some patients the contours were drawn the same way as at HMR; 
             for the others they were drawn directly on the CT of the FDG-PET/CT scan
 - __* (HMR)  Hôpital Maisonneuve-Rosemont, Montréal - Discovery STE, GE Healthcare  *__
             - the radiotherapy contours were drawn on a different CT scan dedicated to treatment planning 
             and then propagated to the FDG-PET/CT scan reference frame 
             using deformable registration with the software MIM
 - __* (CHUM) Centre Hospitalier de l’Université de Montréal, Montréal - Discovery ST, GE Healthcare  *__
             - all patients received their FDG-PET/CT scan dedicated to the head and neck area 
             right before their planning CT scan, in the same position with the immobilisation device. 
             Contours defining the gross tumour volume (GTV) and lymph nodes were 
             drawn by an expert radiation oncologist on the planning CT scan
<br><br>

In [1]:
import nibabel as nib
import numpy as np
import os
from glob import glob
import fnmatch
import pandas as pd
import pydicom
import csv
import types
import os.path

In [2]:
def create_csv_dataset(dataPath):
    scansPath = glob(dataPath+'*/*/*/*.dcm', recursive=True) # list with the path of every DICOM file under the directory
    print("Total of %d DICOM images." %len(scansPath)) # prints the number of DICOM files found
    t=0
    scans=[]
    #scans_dose = [] 
    #mod = []
    for file in scansPath: 
        
        ds = pydicom.filereader.dcmread(file)
        modality = ds.Modality
        
            
        #path, filename = os.path.split(file) #print(path) # directory only #print(filename) # file name only
        path=file[len(dataPath):] # strip the dataPath prefix that was passed as a parameter
        lf=path.split("/") # split the relative path into a list of subdirectories (assumes "/" separators)
        #['HN-HGJ-040', '08-27-1885-HN-04353', '3-2.5mm-20274', '000068.dcm']
        t+=1
        if t < 10:
            print(lf) # print the first 9 records
        id_sep=lf[0].split("-") # split the patient ID to get the institution name
        
        lf.insert(1, id_sep[1]) 
        lf.append(modality)
        
                     
        
        if 'PixelData' in ds:
            rows = int(ds.Rows)
            cols = int(ds.Columns)
            size = len(ds.PixelData)
            
        else:
            rows = 0
            cols = 0
            size = 0

            
                    
        lf.append(rows)
        lf.append(cols)    
        #lf.append(depth)
        lf.append(size)
        #lf.append(numberOfFrames)
        scans.append(lf)
        
    # create the DataFrame
    df = pd.DataFrame(scans, columns = ['patient', 'instituition','path1', 'path2','dcm','Modality','Rows','Columns','Size- bytes']) 
    #df.insert(loc=5, column='Modality', value=mod)
    #df_final = df.loc[(df['Modality'] == 'RTPLAN') & (df['Modality'] == 'RTSTRUCT') & (df['Modality'] == 'REG')]
    #df_1= df.drop(df_final.index)
    # it is preferable to keep all the information in a DataFrame and load the images into a NumPy array later
    
    # write the DataFrame to a CSV file so that it can be edited at any time
    df.to_csv ('complete_Head-Neck_dataset.csv')
    #df_1.to_csv ('semi_Head-Neck_dataset.csv')
    #print("number of scans in the list:", len(scans)) # prints the number of scans
    #print("number of dose scans in the list:", len(scans_dose)) # prints the number of scans containing the word dose
    return df
    #return df_1


# directory containing the labelled exams
dataPath = '../../data/tcia/Head-Neck-PET-CT/'
df = create_csv_dataset(dataPath)
df
#df_1

# 123271 files in total

Total of 123271 DICOM images.
['HN-HGJ-040', '08-27-1885-HN-04353', '4-ARIA RadOnc Structure Sets-21343', '000000.dcm']
['HN-HGJ-040', '08-27-1885-HN-04353', '3-2.5mm-20274', '000068.dcm']
['HN-HGJ-040', '08-27-1885-HN-04353', '3-2.5mm-20274', '000099.dcm']
['HN-HGJ-040', '08-27-1885-HN-04353', '3-2.5mm-20274', '000126.dcm']
['HN-HGJ-040', '08-27-1885-HN-04353', '3-2.5mm-20274', '000041.dcm']
['HN-HGJ-040', '08-27-1885-HN-04353', '3-2.5mm-20274', '000127.dcm']
['HN-HGJ-040', '08-27-1885-HN-04353', '3-2.5mm-20274', '000110.dcm']
['HN-HGJ-040', '08-27-1885-HN-04353', '3-2.5mm-20274', '000124.dcm']
['HN-HGJ-040', '08-27-1885-HN-04353', '3-2.5mm-20274', '000141.dcm']


Unnamed: 0,patient,instituition,path1,path2,dcm,Modality,Rows,Columns,Size- bytes
0,HN-HGJ-040,HGJ,08-27-1885-HN-04353,4-ARIA RadOnc Structure Sets-21343,000000.dcm,RTSTRUCT,0,0,0
1,HN-HGJ-040,HGJ,08-27-1885-HN-04353,3-2.5mm-20274,000068.dcm,CT,512,512,524288
2,HN-HGJ-040,HGJ,08-27-1885-HN-04353,3-2.5mm-20274,000099.dcm,CT,512,512,524288
3,HN-HGJ-040,HGJ,08-27-1885-HN-04353,3-2.5mm-20274,000126.dcm,CT,512,512,524288
4,HN-HGJ-040,HGJ,08-27-1885-HN-04353,3-2.5mm-20274,000041.dcm,CT,512,512,524288
5,HN-HGJ-040,HGJ,08-27-1885-HN-04353,3-2.5mm-20274,000127.dcm,CT,512,512,524288
6,HN-HGJ-040,HGJ,08-27-1885-HN-04353,3-2.5mm-20274,000110.dcm,CT,512,512,524288
7,HN-HGJ-040,HGJ,08-27-1885-HN-04353,3-2.5mm-20274,000124.dcm,CT,512,512,524288
8,HN-HGJ-040,HGJ,08-27-1885-HN-04353,3-2.5mm-20274,000141.dcm,CT,512,512,524288
9,HN-HGJ-040,HGJ,08-27-1885-HN-04353,3-2.5mm-20274,000086.dcm,CT,512,512,524288
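
The row layout above comes from splitting each relative path on `/` and taking the middle token of the patient ID as the institution. A minimal sketch of that parsing step in isolation, assuming POSIX-style separators and exactly four path components; the helper name `parse_dicom_path` is hypothetical and not part of the notebook:

```python
from pathlib import PurePosixPath


def parse_dicom_path(relative_path):
    """Split 'patient/study/series/file.dcm' into labelled fields.

    Hypothetical helper: the key names mirror the notebook's column names
    (including its 'instituition' spelling) so the result could feed the same DataFrame.
    """
    parts = PurePosixPath(relative_path).parts
    if len(parts) != 4:
        raise ValueError(
            f"expected 4 path components, got {len(parts)}: {relative_path!r}"
        )
    patient, study, series, filename = parts
    # Patient IDs look like 'HN-HGJ-040'; the middle token names the institution.
    institution = patient.split("-")[1]
    return {
        "patient": patient,
        "instituition": institution,
        "path1": study,
        "path2": series,
        "dcm": filename,
    }


record = parse_dicom_path(
    "HN-HGJ-040/08-27-1885-HN-04353/3-2.5mm-20274/000068.dcm"
)
print(record)
```

Failing loudly on an unexpected number of components is a design choice of this sketch; the notebook's own loop would instead produce a row of the wrong length.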


In [16]:
dataset = pd.read_csv('complete_Head-Neck_dataset.csv')

dataset['Modality'].value_counts()

CT          75569
PT          45839
RTSTRUCT      831
RTDOSE        497
RTPLAN        329
REG           206
Name: Modality, dtype: int64

In [17]:
array=['CT','PT','RTSTRUCT']

dataset.loc[dataset['Modality'].isin(array)]

Unnamed: 0.1,Unnamed: 0,patient,instituition,path1,path2,dcm,Modality,Rows,Columns,Size- bytes
0,0,HN-HGJ-040,HGJ,08-27-1885-HN-04353,4-ARIA RadOnc Structure Sets-21343,000000.dcm,RTSTRUCT,0,0,0
1,1,HN-HGJ-040,HGJ,08-27-1885-HN-04353,3-2.5mm-20274,000068.dcm,CT,512,512,524288
2,2,HN-HGJ-040,HGJ,08-27-1885-HN-04353,3-2.5mm-20274,000099.dcm,CT,512,512,524288
3,3,HN-HGJ-040,HGJ,08-27-1885-HN-04353,3-2.5mm-20274,000126.dcm,CT,512,512,524288
4,4,HN-HGJ-040,HGJ,08-27-1885-HN-04353,3-2.5mm-20274,000041.dcm,CT,512,512,524288
5,5,HN-HGJ-040,HGJ,08-27-1885-HN-04353,3-2.5mm-20274,000127.dcm,CT,512,512,524288
6,6,HN-HGJ-040,HGJ,08-27-1885-HN-04353,3-2.5mm-20274,000110.dcm,CT,512,512,524288
7,7,HN-HGJ-040,HGJ,08-27-1885-HN-04353,3-2.5mm-20274,000124.dcm,CT,512,512,524288
8,8,HN-HGJ-040,HGJ,08-27-1885-HN-04353,3-2.5mm-20274,000141.dcm,CT,512,512,524288
9,9,HN-HGJ-040,HGJ,08-27-1885-HN-04353,3-2.5mm-20274,000086.dcm,CT,512,512,524288


In [5]:
# all the folders that contain a CT exam
# missing condition: and that also contain PET
# I want every value of the path1 column that has BOTH the PET and CT modalities


#dataset.loc[dataset['path1'].where(dataset.query(path1= True and (Modality=='CT' or Modality=='PT'))]
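
One possible way to express the condition described in the comments above (every `path1` folder that contains both CT and PT) is to group by folder and test the set of modalities. This is a sketch on a small toy frame, not the notebook's actual `dataset`; the folder names and the variable names `toy`, `required`, and `studies_with_pet_ct` are illustrative assumptions:

```python
import pandas as pd

# Toy stand-in for the notebook's `dataset`; the values are illustrative only.
toy = pd.DataFrame({
    "path1": ["study-A", "study-A", "study-A", "study-B", "study-B", "study-C"],
    "Modality": ["CT", "PT", "RTSTRUCT", "CT", "REG", "PT"],
})

required = {"CT", "PT"}

# For each folder, check whether its modalities include every required one.
has_both = (
    toy.groupby("path1")["Modality"]
    .agg(lambda m: required.issubset(set(m)))
    .astype(bool)
)

# Keep only the folder names for which the check passed.
studies_with_pet_ct = has_both[has_both].index.tolist()
print(studies_with_pet_ct)  # ['study-A']

# Rows of the original frame that belong to those folders.
rows_with_pet_ct = toy[toy["path1"].isin(studies_with_pet_ct)]
```

Applied to the real data, `toy` would be replaced by `dataset`; whether `path1` alone identifies a study uniquely across patients is an assumption worth checking first.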

In [6]:
pd.Series(dataset.Modality.values,index=dataset.path1)

path1
08-27-1885-HN-04353               RTSTRUCT
08-27-1885-HN-04353                     CT
08-27-1885-HN-04353                     CT
08-27-1885-HN-04353                     CT
08-27-1885-HN-04353                     CT
08-27-1885-HN-04353                     CT
08-27-1885-HN-04353                     CT
08-27-1885-HN-04353                     CT
08-27-1885-HN-04353                     CT
08-27-1885-HN-04353                     CT
08-27-1885-HN-04353                     CT
08-27-1885-HN-04353                     CT
08-27-1885-HN-04353                     CT
08-27-1885-HN-04353                     CT
08-27-1885-HN-04353                     CT
08-27-1885-HN-04353                     CT
08-27-1885-HN-04353                     CT
08-27-1885-HN-04353                     CT
08-27-1885-HN-04353                     CT
08-27-1885-HN-04353                     CT
08-27-1885-HN-04353                     CT
08-27-1885-HN-04353                     CT
08-27-1885-HN-04353                     CT
08-27

In [7]:
# Build a dictionary keyed by path1, which removes the repeated values
# loc for each element gives its rows
# Select the ones that have both modalities

# to get key value using pandas
#path_dict = dict(zip(dataset.path1, dataset.Modality))
#print(path_dict)


# converting to dict 
#data_dict = dataset.to_dict() 
  
# printing datatype of first keys value in dict 
#print(type(data_dict['path1'])) 

# display 
#data_dict

#pd.Series(dataset.Modality.values,index=dataset.path1).to_dict()

#len(pd.Series(dataset.Modality.values,index=dataset.path1).to_dict())

#dataset.set_index('path1').T.to_dict('list')


#dataset.groupby('path1').filter(items = [dataset['Modality']]).drop_duplicates(subset=['Modality'], keep="first")


dataset.drop_duplicates(subset=['path1', 'Modality'], keep="first").drop(columns=['Unnamed: 0','path2','dcm','Rows','Columns','Size- bytes']).to_csv('paths.csv')


In [36]:
a = pd.read_csv('paths.csv')

In [37]:
a.drop(columns=['Unnamed: 0'])

Unnamed: 0,patient,instituition,path1,Modality
0,HN-HGJ-040,HGJ,08-27-1885-HN-04353,RTSTRUCT
1,HN-HGJ-040,HGJ,08-27-1885-HN-04353,CT
2,HN-HGJ-040,HGJ,08-27-1885-HN-04353,RTPLAN
3,HN-HGJ-040,HGJ,08-27-1885-HN-04353,RTDOSE
4,HN-HGJ-040,HGJ,08-27-1885-HN-04353,REG
5,HN-HGJ-040,HGJ,08-27-1885-PET HEAD AND NECK-48050,RTSTRUCT
6,HN-HGJ-040,HGJ,08-27-1885-PET HEAD AND NECK-48050,CT
7,HN-HGJ-040,HGJ,08-27-1885-PET HEAD AND NECK-48050,PT
8,HN-HGJ-040,HGJ,08-27-1885-PET HEAD AND NECK-48050,RTDOSE
9,HN-CHUS-091,CHUS,08-27-1885-CA ORL FDG TEP POS TRAIT-29027,RTSTRUCT


In [38]:
d = {}
for i in a['path1'].unique():
    d[i] = {a['Modality'][j] for j in a[a['path1']==i].index}

In [39]:
print (d)

{'08-27-1885-HN-04353': {'REG', 'RTDOSE', 'CT', 'RTPLAN', 'RTSTRUCT'}, '08-27-1885-PET HEAD AND NECK-48050': {'RTSTRUCT', 'RTDOSE', 'PT', 'CT'}, '08-27-1885-CA ORL FDG TEP POS TRAIT-29027': {'RTDOSE', 'PT', 'CT', 'RTPLAN', 'RTSTRUCT'}, '08-27-1885-TomoTherapy Patient Disease-41326': {'REG', 'RTDOSE', 'CT', 'RTPLAN', 'RTSTRUCT'}, '08-27-1885-PANC. avec C.A. SPHRE ORL  -TP-85645': {'CT', 'RTDOSE', 'PT', 'RTSTRUCT'}, '08-27-1885-CARCINOME FDG TEP-75326': {'RTDOSE', 'PT', 'CT', 'RTPLAN', 'RTSTRUCT'}, '08-27-1885-TEP PANC. SPHERE ORL-20668': {'RTSTRUCT', 'RTDOSE', 'PT', 'CT'}, '08-27-1885-COU-53414': {'REG', 'RTDOSE', 'CT', 'RTPLAN', 'RTSTRUCT'}, '08-27-1885-PET HEAD NECK-34933': {'RTSTRUCT', 'PT', 'CT'}, '08-27-1885-HN CORVUS  CONTRAST-79004': {'RTSTRUCT', 'REG', 'CT'}, '08-27-1885-TEP AVEC SYNCHRONISTIO-30759': {'RTSTRUCT', 'RTDOSE', 'PT', 'CT'}, '08-27-1885-LARYNXCOU-39491': {'REG', 'RTDOSE', 'CT', 'RTPLAN', 'RTSTRUCT'}, '08-27-1885-CA ORL FDG TEP-89483': {'RTDOSE', 'PT', 'CT', 'RTPLAN',
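
The dictionary of modality sets printed above is built with an explicit loop over the unique `path1` values. A hedged alternative is to let `groupby` collect the sets; the sketch below uses a toy frame, and the names `toy` and `d_alt` are illustrative:

```python
import pandas as pd

# Toy stand-in for the frame read from 'paths.csv'; the values are illustrative.
toy = pd.DataFrame({
    "path1": ["study-A", "study-A", "study-B", "study-B", "study-B"],
    "Modality": ["CT", "PT", "CT", "REG", "RTSTRUCT"],
})

# sort=False keeps the folders in order of first appearance, like .unique() does.
modalities = toy.groupby("path1", sort=False)["Modality"].apply(set)

# Same shape as the notebook's `d`: folder name -> set of modalities.
d_alt = modalities.to_dict()
print(d_alt)
```

On a frame with many folders this avoids re-filtering the whole table once per folder, which is what the loop does.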

In [40]:
pd.DataFrame(list(d.items()), columns=['path1', 'Modalities'])

Unnamed: 0,path1,Modalities
0,08-27-1885-HN-04353,"{REG, RTDOSE, CT, RTPLAN, RTSTRUCT}"
1,08-27-1885-PET HEAD AND NECK-48050,"{RTSTRUCT, RTDOSE, PT, CT}"
2,08-27-1885-CA ORL FDG TEP POS TRAIT-29027,"{RTDOSE, PT, CT, RTPLAN, RTSTRUCT}"
3,08-27-1885-TomoTherapy Patient Disease-41326,"{REG, RTDOSE, CT, RTPLAN, RTSTRUCT}"
4,08-27-1885-PANC. avec C.A. SPHRE ORL -TP-85645,"{CT, RTDOSE, PT, RTSTRUCT}"
5,08-27-1885-CARCINOME FDG TEP-75326,"{RTDOSE, PT, CT, RTPLAN, RTSTRUCT}"
6,08-27-1885-TEP PANC. SPHERE ORL-20668,"{RTSTRUCT, RTDOSE, PT, CT}"
7,08-27-1885-COU-53414,"{REG, RTDOSE, CT, RTPLAN, RTSTRUCT}"
8,08-27-1885-PET HEAD NECK-34933,"{RTSTRUCT, PT, CT}"
9,08-27-1885-HN CORVUS CONTRAST-79004,"{RTSTRUCT, REG, CT}"


In [41]:
# Create a new CSV with only the paths, i.e. without listing every DICOM file
def create_novo_csv(dataPath):
    scansPath = glob(dataPath+'*/*', recursive=True) # list with the path of every study folder under the directory
    scans=[]
    

    for file in scansPath:
            
        #path, filename = os.path.split(file) #print(path) # directory only #print(filename) # file name only
        path=file[len(dataPath):] # strip the dataPath prefix that was passed as a parameter
        lf=path.split("/") # split the relative path into a list of subdirectories (assumes "/" separators)
        #['HN-HGJ-040', '08-27-1885-HN-04353']

        id_sep=lf[0].split("-") # split the patient ID to get the institution name
        
        lf.insert(1, id_sep[1]) 
                 
        scans.append(lf)
        
    # create the DataFrame
    df1 = pd.DataFrame(scans, columns = ['patient', 'instituition','path1']) 
    # it is preferable to keep all the information in a DataFrame and load the images into a NumPy array later
    
    # write the DataFrame to a CSV file so that it can be edited at any time
    df1.to_csv ('dataset_paths.csv')
  
    return df1



# directory containing the labelled exams
dataPath = '../../data/tcia/Head-Neck-PET-CT/'
df1 = create_novo_csv(dataPath)
df1



Unnamed: 0,patient,instituition,path1
0,HN-HGJ-040,HGJ,08-27-1885-HN-04353
1,HN-HGJ-040,HGJ,08-27-1885-PET HEAD AND NECK-48050
2,HN-CHUS-091,CHUS,08-27-1885-CA ORL FDG TEP POS TRAIT-29027
3,HN-CHUM-026,CHUM,08-27-1885-TomoTherapy Patient Disease-41326
4,HN-CHUM-026,CHUM,08-27-1885-PANC. avec C.A. SPHRE ORL -TP-85645
5,HN-CHUS-043,CHUS,08-27-1885-CARCINOME FDG TEP-75326
6,HN-HMR-040,HMR,08-27-1885-TEP PANC. SPHERE ORL-20668
7,HN-HMR-040,HMR,08-27-1885-COU-53414
8,HN-HGJ-056,HGJ,08-27-1885-PET HEAD NECK-34933
9,HN-HGJ-056,HGJ,08-27-1885-HN CORVUS CONTRAST-79004


In [42]:
t = pd.read_csv('dataset_paths.csv')

In [43]:
df2 = pd.DataFrame(list(d.items()), columns=['path1', 'Modalities'])

In [44]:
pd.concat([t, df2], axis=1)

Unnamed: 0.1,Unnamed: 0,patient,instituition,path1,path1.1,Modalities
0,0,HN-HGJ-040,HGJ,08-27-1885-HN-04353,08-27-1885-HN-04353,"{REG, RTDOSE, CT, RTPLAN, RTSTRUCT}"
1,1,HN-HGJ-040,HGJ,08-27-1885-PET HEAD AND NECK-48050,08-27-1885-PET HEAD AND NECK-48050,"{RTSTRUCT, RTDOSE, PT, CT}"
2,2,HN-CHUS-091,CHUS,08-27-1885-CA ORL FDG TEP POS TRAIT-29027,08-27-1885-CA ORL FDG TEP POS TRAIT-29027,"{RTDOSE, PT, CT, RTPLAN, RTSTRUCT}"
3,3,HN-CHUM-026,CHUM,08-27-1885-TomoTherapy Patient Disease-41326,08-27-1885-TomoTherapy Patient Disease-41326,"{REG, RTDOSE, CT, RTPLAN, RTSTRUCT}"
4,4,HN-CHUM-026,CHUM,08-27-1885-PANC. avec C.A. SPHRE ORL -TP-85645,08-27-1885-PANC. avec C.A. SPHRE ORL -TP-85645,"{CT, RTDOSE, PT, RTSTRUCT}"
5,5,HN-CHUS-043,CHUS,08-27-1885-CARCINOME FDG TEP-75326,08-27-1885-CARCINOME FDG TEP-75326,"{RTDOSE, PT, CT, RTPLAN, RTSTRUCT}"
6,6,HN-HMR-040,HMR,08-27-1885-TEP PANC. SPHERE ORL-20668,08-27-1885-TEP PANC. SPHERE ORL-20668,"{RTSTRUCT, RTDOSE, PT, CT}"
7,7,HN-HMR-040,HMR,08-27-1885-COU-53414,08-27-1885-COU-53414,"{REG, RTDOSE, CT, RTPLAN, RTSTRUCT}"
8,8,HN-HGJ-056,HGJ,08-27-1885-PET HEAD NECK-34933,08-27-1885-PET HEAD NECK-34933,"{RTSTRUCT, PT, CT}"
9,9,HN-HGJ-056,HGJ,08-27-1885-HN CORVUS CONTRAST-79004,08-27-1885-HN CORVUS CONTRAST-79004,"{RTSTRUCT, REG, CT}"


In [45]:
s = pd.concat([t, df2], axis=1)

In [46]:
s.drop(columns=['Unnamed: 0'])

Unnamed: 0,patient,instituition,path1,path1.1,Modalities
0,HN-HGJ-040,HGJ,08-27-1885-HN-04353,08-27-1885-HN-04353,"{REG, RTDOSE, CT, RTPLAN, RTSTRUCT}"
1,HN-HGJ-040,HGJ,08-27-1885-PET HEAD AND NECK-48050,08-27-1885-PET HEAD AND NECK-48050,"{RTSTRUCT, RTDOSE, PT, CT}"
2,HN-CHUS-091,CHUS,08-27-1885-CA ORL FDG TEP POS TRAIT-29027,08-27-1885-CA ORL FDG TEP POS TRAIT-29027,"{RTDOSE, PT, CT, RTPLAN, RTSTRUCT}"
3,HN-CHUM-026,CHUM,08-27-1885-TomoTherapy Patient Disease-41326,08-27-1885-TomoTherapy Patient Disease-41326,"{REG, RTDOSE, CT, RTPLAN, RTSTRUCT}"
4,HN-CHUM-026,CHUM,08-27-1885-PANC. avec C.A. SPHRE ORL -TP-85645,08-27-1885-PANC. avec C.A. SPHRE ORL -TP-85645,"{CT, RTDOSE, PT, RTSTRUCT}"
5,HN-CHUS-043,CHUS,08-27-1885-CARCINOME FDG TEP-75326,08-27-1885-CARCINOME FDG TEP-75326,"{RTDOSE, PT, CT, RTPLAN, RTSTRUCT}"
6,HN-HMR-040,HMR,08-27-1885-TEP PANC. SPHERE ORL-20668,08-27-1885-TEP PANC. SPHERE ORL-20668,"{RTSTRUCT, RTDOSE, PT, CT}"
7,HN-HMR-040,HMR,08-27-1885-COU-53414,08-27-1885-COU-53414,"{REG, RTDOSE, CT, RTPLAN, RTSTRUCT}"
8,HN-HGJ-056,HGJ,08-27-1885-PET HEAD NECK-34933,08-27-1885-PET HEAD NECK-34933,"{RTSTRUCT, PT, CT}"
9,HN-HGJ-056,HGJ,08-27-1885-HN CORVUS CONTRAST-79004,08-27-1885-HN CORVUS CONTRAST-79004,"{RTSTRUCT, REG, CT}"


In [53]:
result = s.loc[:,~s.columns.duplicated()]
result

Unnamed: 0.1,Unnamed: 0,patient,instituition,path1,Modalities
0,0,HN-HGJ-040,HGJ,08-27-1885-HN-04353,"{REG, RTDOSE, CT, RTPLAN, RTSTRUCT}"
1,1,HN-HGJ-040,HGJ,08-27-1885-PET HEAD AND NECK-48050,"{RTSTRUCT, RTDOSE, PT, CT}"
2,2,HN-CHUS-091,CHUS,08-27-1885-CA ORL FDG TEP POS TRAIT-29027,"{RTDOSE, PT, CT, RTPLAN, RTSTRUCT}"
3,3,HN-CHUM-026,CHUM,08-27-1885-TomoTherapy Patient Disease-41326,"{REG, RTDOSE, CT, RTPLAN, RTSTRUCT}"
4,4,HN-CHUM-026,CHUM,08-27-1885-PANC. avec C.A. SPHRE ORL -TP-85645,"{CT, RTDOSE, PT, RTSTRUCT}"
5,5,HN-CHUS-043,CHUS,08-27-1885-CARCINOME FDG TEP-75326,"{RTDOSE, PT, CT, RTPLAN, RTSTRUCT}"
6,6,HN-HMR-040,HMR,08-27-1885-TEP PANC. SPHERE ORL-20668,"{RTSTRUCT, RTDOSE, PT, CT}"
7,7,HN-HMR-040,HMR,08-27-1885-COU-53414,"{REG, RTDOSE, CT, RTPLAN, RTSTRUCT}"
8,8,HN-HGJ-056,HGJ,08-27-1885-PET HEAD NECK-34933,"{RTSTRUCT, PT, CT}"
9,9,HN-HGJ-056,HGJ,08-27-1885-HN CORVUS CONTRAST-79004,"{RTSTRUCT, REG, CT}"
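
Concatenating the two frames side by side with `axis=1` pairs rows by position, so it only gives the right result when both frames list `path1` in the same order. A key-based merge does not depend on row order. The sketch below assumes `path1` is unique in both frames (an assumption about the data, not something the notebook verifies) and uses toy frames with illustrative names:

```python
import pandas as pd

# Toy stand-ins: `paths` plays the role of `t`, `mods` the role of `df2`.
paths = pd.DataFrame({
    "patient": ["HN-X-001", "HN-X-001", "HN-Y-002"],
    "path1": ["study-B", "study-A", "study-C"],
})
mods = pd.DataFrame({
    "path1": ["study-A", "study-B", "study-C"],  # deliberately in a different order
    "Modalities": [{"CT", "PT"}, {"CT", "REG"}, {"PT"}],
})

# Join on the shared key; validate raises if path1 is duplicated on either side.
merged = paths.merge(mods, on="path1", how="left", validate="one_to_one")
print(merged)
```

The merge also yields a single `path1` column, so no duplicated-column cleanup is needed afterwards.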


In [54]:
result.drop(columns=['Unnamed: 0'])

Unnamed: 0,patient,instituition,path1,Modalities
0,HN-HGJ-040,HGJ,08-27-1885-HN-04353,"{REG, RTDOSE, CT, RTPLAN, RTSTRUCT}"
1,HN-HGJ-040,HGJ,08-27-1885-PET HEAD AND NECK-48050,"{RTSTRUCT, RTDOSE, PT, CT}"
2,HN-CHUS-091,CHUS,08-27-1885-CA ORL FDG TEP POS TRAIT-29027,"{RTDOSE, PT, CT, RTPLAN, RTSTRUCT}"
3,HN-CHUM-026,CHUM,08-27-1885-TomoTherapy Patient Disease-41326,"{REG, RTDOSE, CT, RTPLAN, RTSTRUCT}"
4,HN-CHUM-026,CHUM,08-27-1885-PANC. avec C.A. SPHRE ORL -TP-85645,"{CT, RTDOSE, PT, RTSTRUCT}"
5,HN-CHUS-043,CHUS,08-27-1885-CARCINOME FDG TEP-75326,"{RTDOSE, PT, CT, RTPLAN, RTSTRUCT}"
6,HN-HMR-040,HMR,08-27-1885-TEP PANC. SPHERE ORL-20668,"{RTSTRUCT, RTDOSE, PT, CT}"
7,HN-HMR-040,HMR,08-27-1885-COU-53414,"{REG, RTDOSE, CT, RTPLAN, RTSTRUCT}"
8,HN-HGJ-056,HGJ,08-27-1885-PET HEAD NECK-34933,"{RTSTRUCT, PT, CT}"
9,HN-HGJ-056,HGJ,08-27-1885-HN CORVUS CONTRAST-79004,"{RTSTRUCT, REG, CT}"


In [55]:
result.to_csv('final.csv')

In [8]:
rd = pd.read_csv('final.csv')

In [11]:
ordem = rd.sort_values(['patient'])

In [13]:
ordem.to_csv('final2.csv')