## Preprocess the Sim2Real dataset

1. Download the dataset from the [Sim2Real-Fire GitHub repository](https://github.com/TJU-IDVLab/Sim2Real-Fire).
2. Extract all files from the dataset into your `Dataset` folder.
3. Run the preprocessing function as shown below.

In [None]:
import sys
import os

# Add the local "code" directory to the import path
module_path = os.path.join(os.path.abspath("."), "code")
if module_path not in sys.path:
    sys.path.append(module_path)

from dataset import preprocess_sim2real_dataset

In [6]:
preprocess_sim2real_dataset("Dataset/")

Converting JPG scenarios to NPY...
Converting JPG scenarios to NPY for Dataset/
Converting JPG scenarios to NPY for Dataset/0004_01191


100%|██████████| 1191/1191 [1:38:19<00:00,  4.95s/it]   


Converting JPG scenarios to NPY for Dataset/.DS_Store
Converting JPG scenarios to NPY for Dataset/0003_01715


100%|██████████| 1716/1716 [09:33<00:00,  2.99it/s] 


Converting JPG scenarios to NPY for Dataset/0005_00725


100%|██████████| 726/726 [03:03<00:00,  3.96it/s]


Computing burn maps...
Computing burn map for Dataset/0004_01191/scenarii/


100%|██████████| 1191/1191 [00:39<00:00, 30.04it/s]


Computing burn map for Dataset/0003_01715/scenarii/


100%|██████████| 1715/1715 [00:41<00:00, 41.55it/s]


Computing burn map for Dataset/0005_00725/scenarii/


100%|██████████| 725/725 [00:12<00:00, 57.20it/s]
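
The log above includes a line for `Dataset/.DS_Store`, which suggests that plain files in the dataset folder are visited along with the scenario directories. The following is a minimal sketch for listing only the subdirectories before preprocessing. `list_scenario_dirs` is a hypothetical helper, not part of the project's `dataset` module.

```python
import os


def list_scenario_dirs(dataset_dir):
    """Return sorted paths of the immediate subdirectories of dataset_dir.

    Hypothetical helper for illustration only.
    Plain files such as `.DS_Store` are skipped.
    """
    with os.scandir(dataset_dir) as entries:
        return sorted(entry.path for entry in entries if entry.is_dir())
```

Calling `list_scenario_dirs("Dataset/")` before preprocessing gives a quick check that only the expected scenario folders are present. Deleting stray files such as `.DS_Store` from the folder would likely have the same effect.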


In [31]:

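def listdir_limited(input_dir, max_n_scenarii=None):
    """Yield paths of regular files in input_dir, stopping after max_n_scenarii files."""
    count = 0
    with os.scandir(input_dir) as it:
        for entry in it:
            # Stop scanning as soon as the limit is reached
            if max_n_scenarii is not None and count >= max_n_scenarii:
                break
            if entry.is_file():
                # entry.path joins input_dir and the file name correctly,
                # with or without a trailing slash on input_dir
                yield entry.path
                count += 1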
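
The limit can also be expressed with `itertools.islice`, which stops pulling entries from the scan once enough files have been yielded. This is a sketch of an alternative, not the function used by the preprocessing code. `iter_files_limited` is a hypothetical name.

```python
import os
from itertools import islice


def iter_files_limited(input_dir, max_n_scenarii=None):
    """Yield paths of regular files in input_dir, at most max_n_scenarii of them.

    Hypothetical variant of listdir_limited; None means no limit.
    """
    with os.scandir(input_dir) as it:
        files = (entry.path for entry in it if entry.is_file())
        # islice stops consuming the generator once the limit is reached
        yield from islice(files, max_n_scenarii)
```

Because the generator is lazy, `list(iter_files_limited(path, 10))` stops reading directory entries once ten files have been found.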

In [32]:
def scan_all_files_listdir(input_dir):
    """Count every entry (files and directories) returned by os.listdir."""
    if not input_dir.endswith('/'):
        input_dir += '/'
    count = 0
    for _ in os.listdir(input_dir):
        count += 1
    return count

def scan_all_files_listdir_limited(input_dir):
    """Count regular files only, using listdir_limited without a limit."""
    if not input_dir.endswith('/'):
        input_dir += '/'
    count = 0
    for _ in listdir_limited(input_dir):
        count += 1
    return count




In [33]:
# Add timing comparison
import timeit

# Define the test directory
test_dir = "Dataset/0001/Satellite_Images_Mask/"  # Adjust this path as needed

# Time the first function
time_listdir = timeit.timeit(
    lambda: scan_all_files_listdir(test_dir),
    number=10000
)

# Time the second function
time_listdir_limited = timeit.timeit(
    lambda: scan_all_files_listdir_limited(test_dir),
    number=10000
)

print(f"scan_all_files_listdir: {time_listdir:.4f} seconds")
print(f"scan_all_files_listdir_limited: {time_listdir_limited:.4f} seconds")
print(f"Difference: {abs(time_listdir - time_listdir_limited):.4f} seconds")
print(f"scan_all_files_listdir is {time_listdir_limited/time_listdir:.4f} times faster than scan_all_files_listdir_limited")


scan_all_files_listdir: 8.6998 seconds
scan_all_files_listdir_limited: 8.7920 seconds
Difference: 0.0922 seconds
scan_all_files_listdir is 1.0106 times faster than scan_all_files_listdir_limited
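
A difference of about 1% between two single `timeit` runs may be within measurement noise. One common way to get a more stable comparison is to repeat the measurement with `timeit.repeat` and keep the minimum. The sketch below assumes that approach. The names `count_listdir`, `count_scandir`, and `best_time` are hypothetical and not part of the notebook.

```python
import os
import timeit


def count_listdir(path):
    """Count directory entries with os.listdir."""
    return len(os.listdir(path))


def count_scandir(path):
    """Count directory entries with os.scandir."""
    with os.scandir(path) as it:
        return sum(1 for _ in it)


def best_time(func, path, repeat=5, number=100):
    """Return the fastest of `repeat` runs, each calling func(path) `number` times."""
    timings = timeit.repeat(lambda: func(path), repeat=repeat, number=number)
    return min(timings)
```

For example, `best_time(count_listdir, test_dir)` and `best_time(count_scandir, test_dir)` can be compared directly. The minimum is usually the most representative value, since slower runs tend to reflect interference from other processes.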
