*Note: Place this notebook in the STHN directory.*

# TGM (Satellite to Thermal)

In [None]:
%pip install PyYAML transformers timm h5py torch torchvision faiss-cpu einops opencv-python

## Image

[my_eval_pix2pix.py](./global_pipeline/my_eval_pix2pix.py)
- [Input Image](./global_pipeline/my_eval_pix2pix.py:69)
- [Output Image](./global_pipeline/my_eval_pix2pix.py:94)


In [None]:
!python3 ./global_pipeline/my_eval_pix2pix.py --resume="js_models/TGM_nocontrast/best_model.pth" --dataset_name=none --datasets_folder ./maps --G_net unet --GAN_upsample bilinear --GAN_resize 1024 1024

## Folder

[my_TGM_folder2folder.py](./global_pipeline/my_TGM_folder2folder.py)
- [Input Folder](./global_pipeline/my_TGM_folder2folder.py:296)
- [Output Folder](./global_pipeline/my_TGM_folder2folder.py:297)

In [None]:
!py ./global_pipeline/my_TGM_folder2folder.py --resume="./maps/models/TGM_nocontrast/best_model.pth" --dataset_name=none --datasets_folder ./maps --G_net unet --GAN_upsample bilinear --GAN_resize 1024 1024

# SHN (Matching)

In [None]:
%pip install kornia scikit-image wandb openpyxl

[my_myevaluate.py](./local_pipeline/my_myevaluate.py)
- [Input Image](./local_pipeline/my_myevaluate.py:118)
- [Output Image](./local_pipeline/my_myevaluate.py:119)
- [Output Excel](./local_pipeline/my_myevaluate.py:168)

## One-Stage

In [None]:
import torch

# Release cached GPU memory and reset peak statistics before evaluation.
if torch.cuda.is_available():
    torch.cuda.empty_cache()
    torch.cuda.reset_peak_memory_stats()
    print("cuda mem allocated:", torch.cuda.memory_allocated() // (1024**2), "MB")
    print("cuda mem reserved:", torch.cuda.memory_reserved() // (1024**2), "MB")

In [None]:
!py -3.13 ./local_pipeline/my_myevaluate.py --dataset_name none --eval_model js_models/1536_one_stage/STHN.pth --val_positive_dist_threshold 512 --lev0 --database_size 1536 --corr_level 4 --test

## Two-Stages

In [None]:
!python3 ./local_pipeline/my_myevaluate.py --dataset_name none --eval_model js_models/1536_two_stages/STHN.pth --val_positive_dist_threshold 512 --lev0 --database_size 1536 --corr_level 4 --test

2026-01-04 03:16:40.004945976 [W:onnxruntime:Default, device_discovery.cc:164 DiscoverDevicesForPlatform] GPU device discovery failed: device_discovery.cc:89 ReadFileContents Failed to open file: "/sys/class/drm/card1/device/vendor"
  with autocast(enabled=self.args.mixed_precision):
torch.Size([1, 256, 64, 64]) torch.Size([1, 256, 64, 64])
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
  with autocast(enabled=self.args.mixed_precision):
✅ Done for image 0
torch.Size([1, 256, 64, 64]) torch.Size([1, 256, 64, 64])
✅ Done for image 1
torch.Size([1, 256, 64, 64]) torch.Size([1, 256, 64, 64])
✅ Done for image 2
torch.Size([1, 256, 64, 64]) torch.Size([1, 256, 64, 64])
✅ Done for image 3
torch.Size([1, 256, 64, 64]) torch.Size([1, 256, 64, 64])
✅ Done for image 4
torch.Size([1, 256, 64, 64]) torch.Size([1, 256, 64, 64])
✅ Done for image 5

📊 Average processing time per image: 0.5516 sec
📁 Saved all corner points to four_point_1_mul6.xlsx


# TensorRT Conversion (STHN .pth → ONNX → TensorRT)
- `pycuda` is not used; engines are built with `trtexec`.
- TensorRT `.engine` files are GPU/driver-specific: build them on the target machine (e.g., Jetson) with the same TensorRT version.
- One-stage exports 1 ONNX/engine (coarse). Two-stage exports 2 ONNX/engines (coarse + fine). The crop/combine logic between stages remains in Python.
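As a quick illustration of the file layout the bullets above imply, the sketch below lists the ONNX files each export mode would be expected to produce. The helper name `expected_onnx_files` is hypothetical. The file names `sthn_coarse.onnx` and `sthn_fine.onnx` and the `--stage` values `coarse` and `both` are taken from the commands in this notebook, but the mapping between them is an assumption about what `tools.export_sthn_onnx` writes.

```python
from pathlib import PurePosixPath

# Assumed mapping from the export --stage value to the ONNX files
# that the engine build cells in this notebook consume.
STAGE_FILES = {
    "coarse": ["sthn_coarse.onnx"],
    "both": ["sthn_coarse.onnx", "sthn_fine.onnx"],
}

def expected_onnx_files(out_dir, stage):
    """Return the ONNX paths expected under out_dir for a given stage (hypothetical helper)."""
    if stage not in STAGE_FILES:
        raise ValueError(f"unknown stage: {stage!r}")
    return [str(PurePosixPath(out_dir) / name) for name in STAGE_FILES[stage]]

print(expected_onnx_files("trt/one_stage", "coarse"))
print(expected_onnx_files("trt/two_stages", "both"))
```

Checking that these files exist before starting an engine build can catch a failed export early.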

In [None]:
%pip install onnx

## Export ONNX
These commands export the weights from your `.pth` into ONNX files.

In [None]:
!python3 -m tools.export_sthn_onnx --pth "js_models/1536_one_stage/STHN.pth" --out_dir "trt/one_stage" --stage coarse --resize_width 256 --corr_level 4 --iters 6 --database_size 1536

In [None]:
!py -3.13 -m tools.export_sthn_onnx --pth "js_models/1536_two_stages/STHN.pth" --out_dir "trt/two_stages" --stage both --resize_width 256 --corr_level 4 --iters 6

## Build TensorRT engines (requires TensorRT + `trtexec`)
- Run these on the target GPU machine (e.g., Jetson).
- If `trtexec` is not on your PATH, pass `--trtexec /full/path/to/trtexec`.
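To decide whether `--trtexec` needs to be passed at all, a small check like the one below may help. It is only a sketch: `find_trtexec` is a hypothetical helper, and the fallback path is simply the one used in the build cells of this notebook, so it may not match your installation.

```python
import shutil
from pathlib import Path

# Fallback location used in the build cells of this notebook (an assumption
# for any other machine).
DEFAULT_TRTEXEC = "/usr/src/tensorrt/bin/trtexec"

def find_trtexec(fallback=DEFAULT_TRTEXEC):
    """Return a path to trtexec, or None if it cannot be found."""
    on_path = shutil.which("trtexec")  # search PATH first
    if on_path:
        return on_path
    if Path(fallback).is_file():  # then try the fallback location
        return fallback
    return None

path = find_trtexec()
print(path if path else "trtexec not found; pass --trtexec explicitly")
```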

### One-stage (coarse)

In [None]:
!python3 -m tools.build_tensorrt_engine --onnx "trt/one_stage/sthn_coarse.onnx" --engine "trt/one_stage/sthn_coarse_fp16.engine" --fp16 --trtexec /usr/src/tensorrt/bin/trtexec

### Two-stage (coarse + fine)

In [None]:
!python3 -m tools.build_tensorrt_engine --onnx "trt/two_stages/sthn_coarse.onnx" --engine "trt/two_stages/sthn_coarse_fp16.engine" --fp16 --trtexec /usr/src/tensorrt/bin/trtexec

In [None]:
!python3 -m tools.build_tensorrt_engine --onnx "trt/two_stages/sthn_fine.onnx" --engine "trt/two_stages/sthn_fine_fp16.engine" --fp16 --trtexec /usr/src/tensorrt/bin/trtexec

In [None]:
# # One-stage (coarse)
# !py -3.13 tools/build_tensorrt_engine.py --onnx trt/one_stage/sthn_coarse.onnx --engine trt/one_stage/sthn_coarse_fp16.engine --fp16 --shapes "min=image1:1x3x256x256,image2:1x3x256x256;opt=image1:1x3x256x256,image2:1x3x256x256;max=image1:1x3x256x256,image2:1x3x256x256"

# # Two-stage (coarse + fine)
# !py -3.13 tools/build_tensorrt_engine.py --onnx trt/two_stages/sthn_coarse.onnx --engine trt/two_stages/sthn_coarse_fp16.engine --fp16 --shapes "min=image1:1x3x256x256,image2:1x3x256x256;opt=image1:1x3x256x256,image2:1x3x256x256;max=image1:1x3x256x256,image2:1x3x256x256"
# !py -3.13 tools/build_tensorrt_engine.py --onnx trt/two_stages/sthn_fine.onnx --engine trt/two_stages/sthn_fine_fp16.engine --fp16 --shapes "min=image1_crop:1x3x256x256,image2:1x3x256x256;opt=image1_crop:1x3x256x256,image2:1x3x256x256;max=image1_crop:1x3x256x256,image2:1x3x256x256"
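The `--shapes` strings in the commented-out commands above are long and easy to mistype. The sketch below shows one way to generate them. The helper names `shape_spec` and `fixed_shapes_arg` are hypothetical, and the string format is inferred only from those commands, not from the implementation of `tools/build_tensorrt_engine.py`.

```python
def shape_spec(inputs):
    """Format {name: dims} as 'name:1x3x256x256,...'."""
    return ",".join(
        f"{name}:{'x'.join(str(d) for d in dims)}" for name, dims in inputs.items()
    )

def fixed_shapes_arg(inputs):
    """Use identical shapes for min, opt and max, i.e. a static-shape profile."""
    spec = shape_spec(inputs)
    return ";".join(f"{key}={spec}" for key in ("min", "opt", "max"))

# Input names as they appear in the commented-out commands.
coarse = {"image1": (1, 3, 256, 256), "image2": (1, 3, 256, 256)}
fine = {"image1_crop": (1, 3, 256, 256), "image2": (1, 3, 256, 256)}

print(fixed_shapes_arg(coarse))
print(fixed_shapes_arg(fine))
```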

# TensorRT Inference (.engine)
These cells run inference on the TensorRT engines directly (no `pycuda`).
- One-stage: pass the coarse engine in `--eval_model`
- Two-stage: pass both engines in `--eval_model` and `--eval_model_fine` and add `--two_stages`
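The difference between the two modes comes down to a few flags. The sketch below assembles the argument list for either case. `trt_eval_args` is a hypothetical helper, and the remaining flag values are copied from the commands in this notebook rather than from the script's argument parser.

```python
def trt_eval_args(coarse_engine, fine_engine=None):
    """Build the argument list for my_myevaluate_trt.py (hypothetical helper)."""
    args = ["--dataset_name", "none"]
    if fine_engine is not None:
        args.append("--two_stages")  # two-stage mode needs this switch
    args += ["--eval_model", coarse_engine]
    if fine_engine is not None:
        args += ["--eval_model_fine", fine_engine]
    # Remaining values as used in this notebook's evaluation commands.
    args += [
        "--val_positive_dist_threshold", "512",
        "--lev0",
        "--database_size", "1536",
        "--corr_level", "4",
        "--test",
    ]
    return args

print(" ".join(trt_eval_args("trt/one_stage/sthn_coarse.engine")))
print(" ".join(trt_eval_args("trt/two_stages/sthn_coarse.engine",
                             "trt/two_stages/sthn_fine.engine")))
```

The resulting list could be passed to `subprocess.run` together with the interpreter and the script path.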

In [None]:
# One-stage TensorRT inference (coarse engine only)
!python3 ./local_pipeline/my_myevaluate_trt.py --dataset_name none --eval_model trt/one_stage/sthn_coarse.engine --val_positive_dist_threshold 512 --lev0 --database_size 1536 --corr_level 4 --test

2026-01-02 11:53:46.974136665 [W:onnxruntime:Default, device_discovery.cc:164 DiscoverDevicesForPlatform] GPU device discovery failed: device_discovery.cc:89 ReadFileContents Failed to open file: "/sys/class/drm/card1/device/vendor"
NvMapMemAllocInternalTagged: 1075072515 error 12
NvMapMemHandleAlloc: error 0
NvMapMemAllocInternalTagged: 1075072515 error 12
NvMapMemHandleAlloc: error 0
NvMapMemAllocInternalTagged: 1075072515 error 12
NvMapMemHandleAlloc: error 0
[01/02/2026-11:53:47] [TRT] [E] [defaultAllocator.cpp::allocate::31] Error Code 1: Cuda Runtime (out of memory)
[01/02/2026-11:53:47] [TRT] [E] [executionContext.cpp::ExecutionContext::565] Error Code 2: OutOfMemory (Requested size was 71320576 bytes.)
Traceback (most recent call last):
  File "/home/rpl/Desktop/RPL/Map-Matching/STHN-JetsonONX8/./local_pipeline/my_myevaluate_trt.py", line 107, in <module>
    test(args)
  File "/home/rpl/Desktop/RPL/Map-Matching/STHN-JetsonONX8/./local_pipeline/my_myevaluate_trt.py", 

In [1]:
# Two-stage TensorRT inference (coarse + fine engines)
!python3 ./local_pipeline/my_myevaluate_trt.py --dataset_name none --two_stages --eval_model trt/two_stages/sthn_coarse.engine --eval_model_fine trt/two_stages/sthn_fine.engine --val_positive_dist_threshold 512 --lev0 --database_size 1536 --corr_level 4 --test

(648.3709716796875, 732.7875366210938)
tensor([[[439.5803, 512.3243],
         [847.2935, 519.3723],
         [453.0194, 950.8202],
         [853.5908, 948.6334]]])
✅ Done for image 0
(834.8838500976562, 753.2387084960938)
tensor([[[ 489.9189,  414.2565],
         [1162.2380,  405.4817],
         [ 521.0660, 1089.0088],
         [1166.3123, 1104.2079]]])
✅ Done for image 1
(875.2380981445312, 712.0355224609375)
tensor([[[799.8723, 634.5186],
         [948.0734, 637.6548],
         [805.1346, 788.9434],
         [947.8721, 787.0253]]])
✅ Done for image 2
(721.9774169921875, 722.1555786132812)
tensor([[[ 386.0450,  380.0518],
         [1044.4659,  397.2370],
         [ 412.5507, 1058.5854],
         [1044.8479, 1052.7480]]])
✅ Done for image 3
(825.1072998046875, 750.33447265625)
tensor([[[ 633.2064,  554.5209],
         [1009.7736,  559.8675],
         [ 645.2020,  945.6371],
         [1012.2469,  941.3124]]])
✅ Done for image 4

📊 Average processing time per image: 0.3184
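
In the output above, the tuple printed before each tensor appears to be the mean of the four corner points that follow it. This is an observation from the printed numbers, not something confirmed from the script. The check below reproduces it for image 0; `centroid` is a hypothetical helper.

```python
# Corner points printed for image 0 in the output above, as (x, y) pairs.
corners = [
    (439.5803, 512.3243),
    (847.2935, 519.3723),
    (453.0194, 950.8202),
    (853.5908, 948.6334),
]

def centroid(points):
    """Mean of the x and y coordinates of a list of points."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)

cx, cy = centroid(corners)
# Compare with the tuple printed for image 0.
print(abs(cx - 648.3709716796875) < 1e-3, abs(cy - 732.7875366210938) < 1e-3)
```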