# Generate the dataset structure and labels required by ICAFusion

```text
First of all, define here the path to your own copy of the dataset, relative to the location of this ipynb.
Example:
ADL-Project/
    kaist-cvpr15/
        annotations...
    kaist_change_structure.ipynb

In this layout, dataset_path = "kaist-cvpr15"
```

In [13]:
from pathlib import Path

def set_dataset_path(path_str):
    """
    Set the dataset path and return a Path object used by the rest of the notebook.
    """
    dataset_path = Path(path_str).expanduser().resolve()
    print(f"Percorso dataset impostato a:\n{dataset_path}")
    return dataset_path

In [14]:
dataset_path = set_dataset_path('kaist-cvpr15')  # Change this path to match your setup

# Karolina /scratch/project/eu-25-19/kaist-cvpr15

Percorso dataset impostato a:
/home/honey/ADL-Project/kaist-cvpr15
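
Before running the rest of the notebook it can help to confirm that the chosen folder really looks like the dataset. The sketch below is only an illustration: `missing_subdirs` and `EXPECTED_SUBDIRS` are hypothetical names, and the two folder names are simply the ones later cells read from.

```python
from pathlib import Path

# Hypothetical helper: these are the sub-folders that later cells read from.
EXPECTED_SUBDIRS = ("images", "annotations-xml-new")

def missing_subdirs(dataset_root, expected=EXPECTED_SUBDIRS):
    """Return the expected sub-folders that are absent under dataset_root."""
    root = Path(dataset_root)
    return [name for name in expected if not (root / name).is_dir()]
```

Calling `missing_subdirs(dataset_path)` should return an empty list when the path is set correctly.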


#### From XML annotations to YOLO-format annotations (.txt)
```text
- YOLO format, some details:
    Each image has a .txt file with the same name as the image, containing one line per object.
    Each line has the following structure:

    <class_id> <x_center> <y_center> <width> <height>

    - class_id → integer index of the class (0, 1, 2, …).
    - x_center, y_center → coordinates of the bounding-box center
    - width, height → size of the bounding box

    The values (x, y, w, h) are normalized by the image width and height, so they always lie between 0 and 1.
```
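
As a worked example of the format, the sketch below converts one pixel box into a YOLO line. It assumes the box is given as left/top corner plus width and height, as in the KAIST XML files, and a 640×512 image; `to_yolo_line` is a hypothetical helper and performs no clipping.

```python
def to_yolo_line(class_id, x, y, w, h, img_w, img_h):
    """Convert a pixel box (corner x, y plus width, height) to a YOLO label line."""
    cx = (x + w / 2.0) / img_w   # normalised x of the box centre
    cy = (y + h / 2.0) / img_h   # normalised y of the box centre
    return f"{class_id} {cx:.8f} {cy:.8f} {w / img_w:.8f} {h / img_h:.8f}"

line = to_yolo_line(0, 194, 213, 20, 42, img_w=640, img_h=512)
print(line)  # 0 0.31875000 0.45703125 0.03125000 0.08203125
```

The numbers are those of the first object in sample set00/V000/I01216, which is inspected later in this notebook.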

### Identifying the classes in the XML

```text
As a first step, we identify every unique textual label in our XML annotations and count how often each one occurs. We scan only the samples in the training list, so that the test set stays untouched and the data used for SSL pretraining is not included.
```

--> target list "Kaist_txt_lists/Training_split_25_forObjDet.txt"
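
The counting idea can be sketched on a single in-memory annotation. The XML string below is made up for illustration, and `count_labels` is a hypothetical helper rather than the function used in this notebook.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Made-up annotation, used only to illustrate the counting step.
SAMPLE_XML = """
<annotation>
  <object><name>person</name></object>
  <object><name>Person </name></object>
  <object><name>cyclist</name></object>
</annotation>
""".strip()

def count_labels(xml_text):
    """Count the stripped, lower-cased <name> of every <object>."""
    root = ET.fromstring(xml_text)
    names = ((obj.findtext("name") or "").strip().lower()
             for obj in root.findall("object"))
    return Counter(name for name in names if name)

counts = count_labels(SAMPLE_XML)
print(dict(counts))  # {'person': 2, 'cyclist': 1}
```

Normalising the name before counting keeps variants such as "Person " from appearing as separate classes.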

In [15]:
from pathlib import Path
import xml.etree.ElementTree as ET
from collections import defaultdict

def count_objects_from_list(annotation_file_name, list_path):
    """
    Count the instances of each class, only for the samples in the target list.
    - annotation_file_name: name of the folder (under dataset_path) holding the XML annotations (e.g. "annotations-xml-new")
    - list_path: path of the file listing the samples to scan (e.g. "Kaist_txt_lists/Training_split_25_forObjDet.txt")
    """
    base_dir = dataset_path / annotation_file_name
    list_path = Path(list_path)


    # Read all relative paths from the list
    with open(list_path, "r") as f:
        relative_paths = [line.strip() for line in f if line.strip()]

    # Find the corresponding XML files
    xml_files = []
    for rel_path in relative_paths:
        xml_file = base_dir / rel_path
        if xml_file.suffix.lower() != ".xml":
            xml_file = xml_file.with_suffix(".xml")
        if xml_file.exists():
            xml_files.append(xml_file)
        else:
            print(f"File non trovato: {xml_file}")

    class_names = set()
    class_counts = defaultdict(int)

    print(f"Trovati {len(xml_files)} file XML dalla lista, inizio analisi...")

    for xml_path in xml_files:
        try:
            root = ET.parse(xml_path).getroot()
            for obj in root.findall("object"):
                name = (obj.findtext("name") or "").strip().lower()
                if name:
                    class_names.add(name)
                    class_counts[name] += 1
        except Exception as e:
            print(f"Errore nel file {xml_path}: {e}")

    print("Classi uniche trovate:")
    for cls in sorted(class_names):
        print(f"- {cls}")

    print("\nConteggio oggetti per classe:")
    for cls in sorted(class_counts):
        print(f"{cls}: {class_counts[cls]}")

# Example usage:
# count_objects_from_list("annotations-xml-new", "my_samples.txt")

Running the function defined above.

N.B. The path to the list is assumed to be the same for everyone, since the list is tracked by Git.

In [16]:
count_objects_from_list("annotations-xml-new", "Kaist_txt_lists/Training_split_25_forObjDet.txt")

Trovati 26835 file XML dalla lista, inizio analisi...
Classi uniche trovate:
- cyclist
- people
- person
- person?

Conteggio oggetti per classe:
cyclist: 2494
people: 2805
person: 23280
person?: 720


In [17]:
# quick check on the sanitized annotations
count_objects_from_list("annotations-xml-new-sanitized", "Kaist_txt_lists/Training_split_25_forObjDet.txt")

Trovati 26835 file XML dalla lista, inizio analisi...
Classi uniche trovate:
- cyclist
- people
- person
- person?

Conteggio oggetti per classe:
cyclist: 2494
people: 2805
person: 23280
person?: 720


### Function to convert the XML annotations to YOLO

In [21]:
from pathlib import Path
import xml.etree.ElementTree as ET
import os

# Dictionary mapping the textual classes to numeric YOLO IDs
CLASS_MAP = {
    "person": 0,   # textual classes found in the KAIST XML files, each mapped to a numeric ID
    "people": 0,   # for now all bounding boxes are merged into a single class
    "cyclist": 0,
    "person?": 0
}


def xml_dir_to_yolo_txt(xml_path, out_path):
    """
    Convert a single XML annotation file to a YOLO label file.
    - xml_path: input XML file (Path)
    - out_path: output TXT file in YOLO format (Path)

    Missing parent folders of out_path are created, so the caller can mirror the input folder structure.
    """
    
    try:
        # Parse the XML file and get its root
        root = ET.parse(xml_path).getroot()

        # Read the image size
        W = float(root.findtext("size/width"))
        H = float(root.findtext("size/height"))
        if W <= 0 or H <= 0:
            raise ValueError(f"Dimensioni non valide: W={W}, H={H}")

        lines = []
        for obj in root.findall("object"):
            # Class name, stripped and lower-cased
            name = (obj.findtext("name") or "").strip().lower()
            if name not in CLASS_MAP:
                # Skip unmapped classes (there should be none)
                continue
            cls_id = CLASS_MAP[name]

            # Bounding box in pixels: corner (x, y) plus width w and height h
            b = obj.find("bndbox")
            x = float(b.findtext("x"))
            y = float(b.findtext("y"))
            w = float(b.findtext("w"))
            h = float(b.findtext("h"))

            # Clipping: compute the corners and keep them inside the image
            x1 = max(0.0, x)
            y1 = max(0.0, y)
            x2 = min(W, x + w)
            y2 = min(H, y + h)

            bw = x2 - x1
            bh = y2 - y1
            if bw <= 0 or bh <= 0:
                # Degenerate box, or box entirely outside the image
                continue

            # Compute the center and normalize
            cx = (x1 + x2) / 2.0 / W
            cy = (y1 + y2) / 2.0 / H
            bw_n = bw / W
            bh_n = bh / H

            # Numerical clamp to [0, 1] as a safety net
            cx = min(max(cx, 0.0), 1.0)
            cy = min(max(cy, 0.0), 1.0)
            bw_n = min(max(bw_n, 0.0), 1.0)
            bh_n = min(max(bh_n, 0.0), 1.0)

            # Build the YOLO-format line for the TXT file
            lines.append(f"{int(cls_id)} {cx:.8f} {cy:.8f} {bw_n:.8f} {bh_n:.8f}")

        # Write the output, creating any missing parent folders
        out_path.parent.mkdir(parents=True, exist_ok=True)
        with open(out_path, "w", encoding="utf-8") as f:
            f.write("\n".join(lines))

    except Exception as e:
        # Warn if parsing or conversion fails
        print(f"[WARN] Errore su '{xml_path}': {e}")

### Defining the input and output paths

In [22]:
from pathlib import Path
import os

# Reminder: dataset_path was set earlier in the notebook

sanitized = False  # True to use the sanitized version, False otherwise

if not sanitized:
    xml_dir = dataset_path / "annotations-xml-new"
    txt_dir = dataset_path / "annotations-txt-new"
else:
    xml_dir = dataset_path / "annotations-xml-new-sanitized"
    txt_dir = dataset_path / "annotations-txt-new-sanitized"

os.makedirs(txt_dir, exist_ok=True)    # Create the output folder if it does not exist

In [23]:
from pathlib import Path

# xml_dir and txt_dir are already defined as Path objects in the previous cell


for set_dir in sorted(xml_dir.iterdir()):
    if set_dir.is_dir():
        print(f"Converting set: {set_dir.name}...")
        for video_dir in sorted(set_dir.iterdir()):
            if video_dir.is_dir():
                for xml_file in sorted(video_dir.iterdir()):
                    if xml_file.is_file() and xml_file.suffix.lower() == ".xml":
                        out_path = txt_dir / set_dir.name / video_dir.name / xml_file.with_suffix('.txt').name
                        xml_dir_to_yolo_txt(xml_file, out_path)
                

Converting set: set00...
Converting set: set01...
Converting set: set02...
Converting set: set03...
Converting set: set04...
Converting set: set05...
Converting set: set06...
Converting set: set07...
Converting set: set08...
Converting set: set09...
Converting set: set10...
Converting set: set11...


#### Conformity checks

1) number of objects == number of lines in the txt

In [27]:
from pathlib import Path

sample = "set00/V000/I01216"

file_xml_path = xml_dir / (sample + ".xml")
file_txt_path = txt_dir / (sample + ".txt")


# Check that both files exist
assert file_xml_path.is_file(), f"File XML non trovato: {file_xml_path}"
assert file_txt_path.is_file(), f"File TXT non trovato: {file_txt_path}"

# Print the contents
with open(file_xml_path, 'r', encoding='utf-8') as f:
    lines = f.readlines()
    xml_content = "".join(lines)   # a second f.read() would return "" because readlines() already consumed the file
    print("------ XML ORIGINALE ------")
    for line in lines[15:]:    # skip the first 15 header lines
        print(line, end='')

with open(file_txt_path, 'r', encoding='utf-8') as f:
    txt_content = f.read()
    print("\n------ TXT GENERATO ------")
    print(txt_content)


------ XML ORIGINALE ------
  <object>
    <name>person</name>
    <bndbox>
      <x>194</x>
      <y>213</y>
      <w>20</w>
      <h>42</h>
    </bndbox>
    <pose>unknown</pose>
    <truncated>0</truncated>
    <difficult>0</difficult>
    <occlusion>0</occlusion>
  </object>
  <object>
    <name>person</name>
    <bndbox>
      <x>209</x>
      <y>215</y>
      <w>21</w>
      <h>43</h>
    </bndbox>
    <pose>unknown</pose>
    <truncated>0</truncated>
    <difficult>0</difficult>
    <occlusion>0</occlusion>
  </object>
</annotation>

------ TXT GENERATO ------
0 0.31875000 0.45703125 0.03125000 0.08203125
0 0.34296875 0.46191406 0.03281250 0.08398438
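
The same comparison can be done programmatically instead of by eye. This is a minimal sketch on in-memory strings; `count_xml_objects` and `count_label_lines` are hypothetical helpers, and the two inputs are made up. Note that the counts can legitimately differ when the converter skips an unmapped class or a degenerate box.

```python
import xml.etree.ElementTree as ET

def count_xml_objects(xml_text):
    """Number of <object> elements in one annotation."""
    return len(ET.fromstring(xml_text).findall("object"))

def count_label_lines(txt_text):
    """Number of non-empty lines in one YOLO label file."""
    return sum(1 for line in txt_text.splitlines() if line.strip())

# Made-up two-object annotation and a matching two-line label file.
xml_text = "<annotation><object/><object/></annotation>"
txt_text = (
    "0 0.31875000 0.45703125 0.03125000 0.08203125\n"
    "0 0.34296875 0.46191406 0.03281250 0.08398438\n"
)
print(count_xml_objects(xml_text) == count_label_lines(txt_text))  # True
```

With real files, the text can be read with `Path.read_text(encoding="utf-8")` and passed to the two helpers.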


2) If I repeat the transformations on the single XML example extracted above, does the result match what was computed earlier and is now stored in the txt?

In [39]:
import xml.etree.ElementTree as ET

# Parsing XML
root = ET.parse(file_xml_path).getroot()
W = float(root.findtext("size/width"))
H = float(root.findtext("size/height"))

# Extract the boxes from the XML
boxes = []
for obj in root.findall("object"):
    name = (obj.findtext("name") or "").strip().lower()
    b = obj.find("bndbox")
    x = float(b.findtext("x"))
    y = float(b.findtext("y"))
    w = float(b.findtext("w"))
    h = float(b.findtext("h"))
    # Same formulas as the conversion function
    x1 = max(0.0, x)
    y1 = max(0.0, y)
    x2 = min(W, x + w)
    y2 = min(H, y + h)
    bw = x2 - x1
    bh = y2 - y1
    if bw <= 0 or bh <= 0:
        continue
    cx = (x1 + x2)/2.0/W
    cy = (y1 + y2)/2.0/H
    bw_n = bw/W
    bh_n = bh/H
    cx = min(max(cx, 0.0), 1.0)
    cy = min(max(cy, 0.0), 1.0)
    bw_n = min(max(bw_n, 0.0), 1.0)
    bh_n = min(max(bh_n, 0.0), 1.0)
    boxes.append((name, cx, cy, bw_n, bh_n))

# Read the TXT produced by the converter
with open(file_txt_path, 'r', encoding='utf-8') as f:
    txt_lines = [line.strip() for line in f if line.strip()]

print("\nCONFRONTO BBOX (XML → calcolo Python → TXT):\n")

for i, (name, cx, cy, bw_n, bh_n) in enumerate(boxes):
    txt = txt_lines[i] if i < len(txt_lines) else "<NESSUNA RIGA>"
    print(f"BBox {i}: \nCalcolati: cx={cx:.8f} cy={cy:.8f} bw={bw_n:.8f} bh={bh_n:.8f}")
    vals = txt.split()
    print("dal txt:", " ".join(vals[1:]), "\n")
    



CONFRONTO BBOX (XML → calcolo Python → TXT):

BBox 0: 
Calcolati: cx=0.31875000 cy=0.45703125 bw=0.03125000 bh=0.08203125
dal txt: 0.31875000 0.45703125 0.03125000 0.08203125 

BBox 1: 
Calcolati: cx=0.34296875 cy=0.46191406 bw=0.03281250 bh=0.08398438
dal txt: 0.34296875 0.46191406 0.03281250 0.08398438 
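
A complementary check is to go the other way and map a YOLO line back to pixels. The sketch below assumes a 640×512 image and uses a hypothetical helper, `yolo_line_to_pixel_box`; because the TXT stores 8 decimals, the result is rounded to 3 decimals.

```python
def yolo_line_to_pixel_box(line, img_w, img_h):
    """Turn 'cls cx cy w h' (normalised) back into (cls, x, y, w, h) in pixels."""
    cls_id, cx, cy, bw, bh = line.split()
    w = float(bw) * img_w
    h = float(bh) * img_h
    x = float(cx) * img_w - w / 2.0   # left edge
    y = float(cy) * img_h - h / 2.0   # top edge
    return int(cls_id), round(x, 3), round(y, 3), round(w, 3), round(h, 3)

box = yolo_line_to_pixel_box("0 0.31875000 0.45703125 0.03125000 0.08203125", 640, 512)
print(box)  # (0, 194.0, 213.0, 20.0, 42.0)
```

The recovered values match the `<x>`, `<y>`, `<w>`, `<h>` of the first object in the XML printed earlier.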



In [15]:
sample = "set06/V000/I00259"
# sample = "set06_V000_I00339"

file_xml_path = os.path.join(xml_dir, sample.replace("_", "/") + '.xml')
file_txt_path = os.path.join(txt_dir, sample.replace("_", "/") + '.txt')
file_txt_example_from_icafusion = os.path.join("/scratch/project/eu-25-19/KAIST/labels/test/", sample.replace("/", "_") +".txt")

print(file_xml_path)
print(file_txt_path)

# check that both files exist:
assert os.path.isfile(file_xml_path), f"File XML non trovato: {file_xml_path}"
assert os.path.isfile(file_txt_path), f"File TXT non trovato: {file_txt_path}"

# print the contents of the files:
with open(file_xml_path, 'r', encoding='utf-8') as f:
    xml_content = f.read()
    print("Contenuto XML:")
    print(xml_content)
with open(file_txt_path, 'r', encoding='utf-8') as f:
    txt_content = f.read()
    print("Contenuto TXT:")
    print(txt_content)
with open(file_txt_example_from_icafusion, 'r', encoding='utf-8') as f:
    txt_content = f.read()
    print("Contenuto TXT from icafusion:")
    print(txt_content)

/scratch/project/eu-25-19/kaist-cvpr15/annotations-xml-new/set06/V000/I00259.xml
/scratch/project/eu-25-19/kaist-cvpr15/annotations-txt-new/set06/V000/I00259.txt
Contenuto XML:
<annotation>
  <folder>KAIST Multispectral Ped Benchmark</folder>
  <filename>set06/V000/I00259</filename>
  <source>
    <database>KAIST pedestrian</database>
    <annotation>KAIST pedestrian</annotation>
    <image>KAIST pedestrian</image>
    <url>https://soonminhwang.github.io/rgbt-ped-detection/</url>
  </source>
  <size>
    <width>640</width>
    <height>512</height>
    <depth>4</depth>
  </size>
  <segmented>0</segmented>
  <object>
    <name>person</name>
    <bndbox>
      <x>460</x>
      <y>218</y>
      <w>20</w>
      <h>36</h>
    </bndbox>
    <pose>unknown</pose>
    <truncated>0</truncated>
    <difficult>0</difficult>
    <occlusion>0</occlusion>
  </object>
  <object>
    <name>person?</name>
    <bndbox>
      <x>72</x>
      <y>202</y>
      <w>33</w>
      <h>51</h>
    </bndbox>
    <pos

Perfect, everything is consistent!

### Dataset structure for ICAFusion
```text
ICAFusion expects the KAIST dataset to be split into the following folders:
visible/
    train/    All visible images, with no split into sets or videos
    test/
infrared/
    train/    All the corresponding infrared images, with no split into sets or videos
    test/
labels/
    train/    All the corresponding YOLO annotations we have just created
    test/

Once the set/video folders are removed, each image is identified by a new name of the form setXX_VYYY_IZZZZZ
```
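
The renaming rule can be sketched as a pair of small functions. Both names are hypothetical, and the inverse assumes that none of the three parts contains an underscore, which holds for names like set06/V000/I00259.

```python
def flatten_sample_name(sample):
    """'set06/V000/I00259' -> 'set06_V000_I00259'."""
    set_dir, video_dir, frame = sample.split("/")
    return f"{set_dir}_{video_dir}_{frame}"

def unflatten_sample_name(flat_name):
    """'set06_V000_I00259' -> 'set06/V000/I00259'."""
    set_dir, video_dir, frame = flat_name.split("_")
    return f"{set_dir}/{video_dir}/{frame}"

print(flatten_sample_name("set06/V000/I00259"))  # set06_V000_I00259
```

Both functions raise `ValueError` on a name that does not have exactly three parts, which makes malformed list entries easy to spot.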

In [44]:
for mode in ["train", "test"]:
    (dataset_path / "visible" / mode).mkdir(parents=True, exist_ok=True)
    (dataset_path / "infrared" / mode).mkdir(parents=True, exist_ok=True)
    (dataset_path / "labels" / mode).mkdir(parents=True, exist_ok=True)



### Filling the dataset folders
```text
This script reads two lists of samples (one for training and one for test),
each containing the relative paths of the samples (e.g. 'set06/V000/I00001'), and copies:
    - the visible images into visible/train and visible/test
    - the infrared images (the "lwir" folder of the original dataset) into infrared/train and infrared/test
    - the YOLO labels in .txt format into labels/train and labels/test

The name of each copied file is normalized to:
    <set>_<video>_<image>.jpg (for visible and infrared)
    <set>_<video>_<image>.txt (for labels)
```

In [43]:
from pathlib import Path
import shutil

def copy_all_samples_from_lists(train_list_path, test_list_path):
    """
    Copy the images (visible, infrared) and the YOLO labels for the samples in the train/test lists.
    """
    datasets = {
        'train': train_list_path,
        'test': test_list_path
    }

    for subset, list_path in datasets.items():
        print(f"\n === PROCESSING {subset.upper()} SET ===")
        with open(list_path, "r") as f:
            samples = [line.strip() for line in f if line.strip()]

        for sample in samples:
            tokens = sample.split("/")
            if len(tokens) != 3:
                print(f"Formato sample non valido: {sample}")
                continue

            set_dir, v_dir, file_base = tokens
            # Image file format: Ixxxxx.jpg
            img_name = file_base + ".jpg"
            vis_path = dataset_path / "images" / set_dir / v_dir / "visible" / img_name
            lwir_path = dataset_path / "images" / set_dir / v_dir / "lwir" / img_name

            dest_vis = dataset_path / "visible" / subset / f"{set_dir}_{v_dir}_{img_name}"
            dest_lwir = dataset_path / "infrared" / subset / f"{set_dir}_{v_dir}_{img_name}"

            # Copy the visible image
            if vis_path.is_file():
                if not dest_vis.is_file():
                    shutil.copy2(vis_path, dest_vis)
                else:
                    print(f"File visibile già esistente (skip): {dest_vis}")
            else:
                print(f"File visibile mancante: {vis_path}")

            # Copy the infrared image
            if lwir_path.is_file():
                if not dest_lwir.is_file():
                    shutil.copy2(lwir_path, dest_lwir)
                else:
                    print(f"File lwir già esistente (skip): {dest_lwir}")
            else:
                print(f"File lwir mancante: {lwir_path}")

            # ===== LABELS =====
            orig_label = dataset_path / "annotations-txt-new" / set_dir / v_dir / (file_base + ".txt")
            dest_label = dataset_path / "labels" / subset / f"{set_dir}_{v_dir}_{file_base}.txt"
            if orig_label.is_file():
                if not dest_label.is_file():
                    shutil.copy2(orig_label, dest_label)
                else:
                    print(f"Label già esistente (skip): {dest_label}")
            else:
                print(f"Label mancante: {orig_label}")

# === EXAMPLE USAGE ===
# copy_all_samples_from_lists("train_list.txt", "test_list.txt")

N.B. The paths to the lists are assumed to be the same for everyone, since the lists are tracked by Git.

In [45]:
copy_all_samples_from_lists("Kaist_txt_lists/Training_split_25_forObjDet.txt", "Kaist_txt_lists/Test_split_50.txt")


 === PROCESSING TRAIN SET ===

 === PROCESSING TEST SET ===


### Conformity checks

```text
We verify:
1- How many files are in the folders
        * visible/train
        * infrared/train
        * labels/train
    ! Consistent if equal to the number of samples in the training list used
2- All triplets of files sharing the same base name (extension excluded)
    ! Consistent if the number of triplets found equals the number of samples in the training list used
3- Any "orphan" files, i.e. files present in only one or two of the folders.
    ! Consistent if there are none
```

In [47]:
from pathlib import Path


visible_dir = dataset_path / "visible" / "train"
infrared_dir = dataset_path / "infrared" / "train"
labels_dir = dataset_path / "labels" / "train"

# Collect all file names (without extension)
visible_files = {f.stem for f in visible_dir.iterdir() if f.is_file()}
infrared_files = {f.stem for f in infrared_dir.iterdir() if f.is_file()}
labels_files = {f.stem for f in labels_dir.iterdir() if f.is_file()}

print(f"File in visible/train:  {len(visible_files)}")
print(f"File in infrared/train: {len(infrared_files)}")
print(f"File in labels/train:   {len(labels_files)}")

# Intersection: only names present in all three folders
triplette = visible_files & infrared_files & labels_files

print(f"\nNumero di triplette complete (stesso nome presente in tutte e tre le cartelle): {len(triplette)}")

# Names missing from at least one of the three folders
solo_vis = visible_files - triplette
solo_ir = infrared_files - triplette
solo_lbl = labels_files - triplette

print(f"\nSolo visible, non triplette: {len(solo_vis)}")
print(f"Solo infrared, non triplette: {len(solo_ir)}")
print(f"Solo labels, non triplette: {len(solo_lbl)}")



File in visible/train:  26835
File in infrared/train: 26835
File in labels/train:   26835

Numero di triplette complete (stesso nome presente in tutte e tre le cartelle): 26835

Solo visible, non triplette: 0
Solo infrared, non triplette: 0
Solo labels, non triplette: 0
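
The counts above are compared with the sample list by eye. A minimal sketch of doing that comparison in code follows; `check_triplets` is a hypothetical helper, and it assumes the list entries look like 'set03/V000/I00000' and the copied files use the flattened name with underscores.

```python
from pathlib import Path

def check_triplets(root, subset, expected_samples):
    """Compare the stems found in visible/infrared/labels with the expected ones."""
    root = Path(root)
    stems = {
        name: {p.stem for p in (root / name / subset).iterdir() if p.is_file()}
        for name in ("visible", "infrared", "labels")
    }
    complete = stems["visible"] & stems["infrared"] & stems["labels"]
    seen = stems["visible"] | stems["infrared"] | stems["labels"]
    expected = {s.replace("/", "_") for s in expected_samples}
    return {
        "missing": sorted(expected - complete),     # listed, but not a full triplet
        "unexpected": sorted(complete - expected),  # full triplet, but not listed
        "orphans": sorted(seen - complete),         # present in only one or two folders
    }
```

All three lists in the returned dictionary should be empty when the folders agree with the list.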


We repeat the checks on the test-set samples as well.

In [48]:
from pathlib import Path


visible_dir = dataset_path / "visible" / "test"
infrared_dir = dataset_path / "infrared" / "test"
labels_dir = dataset_path / "labels" / "test"

# Collect all file names (without extension)
visible_files = {f.stem for f in visible_dir.iterdir() if f.is_file()}
infrared_files = {f.stem for f in infrared_dir.iterdir() if f.is_file()}
labels_files = {f.stem for f in labels_dir.iterdir() if f.is_file()}

print(f"File in visible/test:  {len(visible_files)}")
print(f"File in infrared/test: {len(infrared_files)}")
print(f"File in labels/test:   {len(labels_files)}")

# Intersection: only names present in all three folders
triplette = visible_files & infrared_files & labels_files

print(f"\nNumero di triplette complete (stesso nome presente in tutte e tre le cartelle): {len(triplette)}")

# Names missing from at least one of the three folders
solo_vis = visible_files - triplette
solo_ir = infrared_files - triplette
solo_lbl = labels_files - triplette

print(f"\nSolo visible, non triplette: {len(solo_vis)}")
print(f"Solo infrared, non triplette: {len(solo_ir)}")
print(f"Solo labels, non triplette: {len(solo_lbl)}")


File in visible/test:  45140
File in infrared/test: 45140
File in labels/test:   45140

Numero di triplette complete (stesso nome presente in tutte e tre le cartelle): 45140

Solo visible, non triplette: 0
Solo infrared, non triplette: 0
Solo labels, non triplette: 0


All checks passed.

# END

# Copying the files from the original dataset into the "structure" required by ICAFusion

In [20]:
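import os
import shutil

# NOTE: the cells below read the dataset root from KAIST_FOLDER (a string);
# in this notebook its only definition sits in a commented-out cell further down.

test_sets_dir = ["set06", "set07", "set08", "set09", "set10", "set11"]
train_objsdet_sets_dir = ["set03", "set04", "set05"]
pretrain_sets_dir = ["set00", "set01", "set02"]  # not needed by ICAFusion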

# Copy the images:

In [None]:
for set_dir in sorted(os.listdir(KAIST_FOLDER + '/images')):
    if set_dir in test_sets_dir:
        subset = 'test'
    elif set_dir in train_objsdet_sets_dir:
        subset = 'train'
    elif set_dir in pretrain_sets_dir:
        # not needed for ICAFusion
        continue
    else:
        print(f"Set non riconosciuto (skip): {set_dir}")
        continue

    print(f"Processing set: {set_dir} as {subset}...")
    for v_dir in sorted(os.listdir(os.path.join(KAIST_FOLDER, 'images', set_dir))):
        print(f" Processing subset: {v_dir}..")
        print(f"  Prossessin visible images...")
        for file in sorted(os.listdir(os.path.join(KAIST_FOLDER, 'images', set_dir, v_dir, "visible"))):
            file_path = os.path.join(KAIST_FOLDER, 'images', set_dir, v_dir, "visible", file)
            if os.path.isfile(file_path):
                dest_path = os.path.join(KAIST_FOLDER, 'visible', subset, f"{set_dir}_{v_dir}_{file}")
                if not os.path.isfile(dest_path):
                    shutil.copy2(file_path, dest_path)
                else:
                    print(f"File già esistente (skip): {dest_path}")

        print(f"  Prossessin infrared images...")
        for file in sorted(os.listdir(os.path.join(KAIST_FOLDER, 'images', set_dir, v_dir, "lwir"))):
            file_path = os.path.join(KAIST_FOLDER, 'images', set_dir, v_dir, "lwir", file)
            if os.path.isfile(file_path):
                dest_path = os.path.join(KAIST_FOLDER, 'infrared', subset, f"{set_dir}_{v_dir}_{file}")
                if not os.path.isfile(dest_path):
                    shutil.copy2(file_path, dest_path)
                else:
                    print(f"File già esistente (skip): {dest_path}")

        
    

Processing set: set11 as train...
 Processing subset: V000..
  Prossessin visible images...
File già esistente (skip): /scratch/project/eu-25-19/kaist-cvpr15/visible/train/set03_V000_I00000.jpg
File già esistente (skip): /scratch/project/eu-25-19/kaist-cvpr15/visible/train/set03_V000_I00001.jpg
File già esistente (skip): /scratch/project/eu-25-19/kaist-cvpr15/visible/train/set03_V000_I00002.jpg
File già esistente (skip): /scratch/project/eu-25-19/kaist-cvpr15/visible/train/set03_V000_I00003.jpg
File già esistente (skip): /scratch/project/eu-25-19/kaist-cvpr15/visible/train/set03_V000_I00004.jpg
File già esistente (skip): /scratch/project/eu-25-19/kaist-cvpr15/visible/train/set03_V000_I00005.jpg
File già esistente (skip): /scratch/project/eu-25-19/kaist-cvpr15/visible/train/set03_V000_I00006.jpg
File già esistente (skip): /scratch/project/eu-25-19/kaist-cvpr15/visible/train/set03_V000_I00007.jpg
File già esistente (skip): /scratch/project/eu-25-19/kaist-cvpr15/visible/train/set03_V000_I

In [23]:
print(f"Prossessin label files...")

for set_dir in sorted(os.listdir(KAIST_FOLDER + '/annotations-txt-new')):
    if set_dir in test_sets_dir:
        subset = 'test'
    elif set_dir in train_objsdet_sets_dir:
        subset = 'train'
    elif set_dir in pretrain_sets_dir:
        # not needed for ICAFusion
        continue
    else:
        print(f"Set non riconosciuto (skip): {set_dir}")
        continue

    
    print(f" Processing set: {set_dir} as {subset}...")
    for v_dir in sorted(os.listdir(os.path.join(KAIST_FOLDER, 'annotations-txt-new', set_dir))):
        print(f"  Processing subset: {v_dir}..")
        for file in sorted(os.listdir(os.path.join(KAIST_FOLDER, 'annotations-txt-new', set_dir, v_dir,))):
            file_path = os.path.join(KAIST_FOLDER, 'annotations-txt-new', set_dir, v_dir, file)
            if os.path.isfile(file_path):
                dest_path = os.path.join(KAIST_FOLDER, 'labels', subset, f"{set_dir}_{v_dir}_{file}")
                if not os.path.isfile(dest_path):
                    shutil.copy2(file_path, dest_path)
                else:
                    print(f"File già esistente (skip): {dest_path}")

Prossessin label files...
 Processing set: set03 as train...
  Processing subset: V000..
  Processing subset: V001..
 Processing set: set04 as train...
  Processing subset: V000..
  Processing subset: V001..
 Processing set: set05 as train...
  Processing subset: V000..
 Processing set: set06 as test...
  Processing subset: V000..
  Processing subset: V001..
  Processing subset: V002..
  Processing subset: V003..
  Processing subset: V004..
 Processing set: set07 as test...
  Processing subset: V000..
  Processing subset: V001..
  Processing subset: V002..
 Processing set: set08 as test...
  Processing subset: V000..
  Processing subset: V001..
  Processing subset: V002..
 Processing set: set09 as test...
  Processing subset: V000..
 Processing set: set10 as test...
  Processing subset: V000..
  Processing subset: V001..
 Processing set: set11 as test...
  Processing subset: V000..
  Processing subset: V001..


# Generating all the labels for ICAFusion and saving them in labels-for-icafusion (deprecated)

In [None]:
# !pwd
# 
# !cd /scratch/project/eu-25-19/kaist-cvpr15/
# 
# KAIST_FOLDER = '/scratch/project/eu-25-19/kaist-cvpr15/'
# os.makedirs(f'{KAIST_FOLDER}/labels-for-icafusion', exist_ok=True)

/mnt/proj3/eu-25-19/davide_secco/ADL-Project


In [None]:
# for dir in sorted(os.listdir(f'{KAIST_FOLDER}/annotations-xml-new')):
#     print(f"Convertig set: {dir}...")
#     for subdir in sorted(os.listdir(f'{KAIST_FOLDER}/annotations-xml-new/'+dir)):
#         print(f" Convertig subset: {subdir}..")
#         for file in sorted(os.listdir(os.path.join(f'{KAIST_FOLDER}/annotations-xml-new', dir, subdir))):
#             file_path = os.path.join(f'{KAIST_FOLDER}/annotations-xml-new', dir, subdir, file)
#             if os.path.isfile(file_path):
#                 xml_dir_to_yolo_txt(Path(file_path), Path(f'{KAIST_FOLDER}/labels-for-icafusion/'+dir+subdir+file.replace('.xml', '.txt')))
                

/mnt/proj3/eu-25-19/davide_secco/ADL-Project
Convertig set: set00...
 Convertig subset: V000..
 Convertig subset: V001..
 Convertig subset: V002..
 Convertig subset: V003..
 Convertig subset: V004..
 Convertig subset: V005..
 Convertig subset: V006..
 Convertig subset: V007..
 Convertig subset: V008..
Convertig set: set01...
 Convertig subset: V000..
 Convertig subset: V001..
 Convertig subset: V002..
 Convertig subset: V003..
 Convertig subset: V004..
 Convertig subset: V005..
Convertig set: set02...
 Convertig subset: V000..
 Convertig subset: V001..
 Convertig subset: V002..
 Convertig subset: V003..
 Convertig subset: V004..
Convertig set: set03...
 Convertig subset: V000..
 Convertig subset: V001..
Convertig set: set04...
 Convertig subset: V000..
 Convertig subset: V001..
Convertig set: set05...
 Convertig subset: V000..
Convertig set: set06...
 Convertig subset: V000..
 Convertig subset: V001..
 Convertig subset: V002..
 Convertig subset: V003..
 Convertig subset: V004..
Convert

# Move the test labels to a dedicated folder

In [4]:
!pwd

os.chdir(f'{KAIST_FOLDER}')

!ls -l

os.makedirs(f'{KAIST_FOLDER}/labels-test-icafusion', exist_ok=True)

!mv 'labels-for-icafusion'/set06* labels-test-icafusion/ 
!mv 'labels-for-icafusion'/set07* labels-test-icafusion/
!mv 'labels-for-icafusion'/set08* labels-test-icafusion/
!mv 'labels-for-icafusion'/set09* labels-test-icafusion/
!mv 'labels-for-icafusion'/set10* labels-test-icafusion/
!mv 'labels-for-icafusion'/set11* labels-test-icafusion/


/mnt/proj3/eu-25-19/davide_secco/ADL-Project


total 10588
drwxrws---+ 14 it4i-seccod eu-25-19    4096 Sep  5 16:07 annotations-xml-new
drwxrws---+ 14 it4i-seccod eu-25-19    4096 Sep  5 15:29 annotations-xml-new-sanitized
drwxrws---+  3 it4i-seccod eu-25-19    4096 Sep  5 16:14 davide_secco
drwxrws---+ 14 it4i-seccod eu-25-19    4096 Sep  5 16:03 images
drwxrws---+  3 it4i-seccod eu-25-19    4096 Sep  5 15:29 imageSets
drwxrws---+  2 it4i-seccod eu-25-19  151552 Sep  5 15:26 kaist-dets
-rw-rw----+  1 it4i-seccod eu-25-19  776816 Sep  5 15:26 kaist-dets.zip
drwxrws---+  2 it4i-seccod eu-25-19 6684672 Sep  5 17:34 labels-for-icafusion
drwxrws---+  2 it4i-seccod eu-25-19 3186688 Sep  5 17:34 labels-test-icafusion
mv: cannot stat 'labels-for-icafusion/set06*': No such file or directory
mv: cannot stat 'labels-for-icafusion/set07*': No such file or directory
mv: cannot stat 'labels-for-icafusion/set08*': No such file or directory
mv: cannot stat 'labels-for-icafusion/set09*': No such file or directory
mv: cannot stat 'labels-for-icafus

In [7]:
!pwd

!ls images



/scratch/project/eu-25-19/kaist-cvpr15


set00  set02  set04  set06  set08  set10
set01  set03  set05  set07  set09  set11
