### Quick Run Block

In [5]:
# import torch
# from pytorch_lightning.callbacks import ModelCheckpoint

# # Path to the problematic checkpoint
# checkpoint_path = r"c:\Users\colin\OneDrive\Desktop\Jersey-Number-Recognition\data\pre_trained_models\reid\dukemtmcreid_resnet50_256_128_epoch_120.ckpt"

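# # Since PyTorch 2.6, torch.load defaults to weights_only=True, which rejects pickled ModelCheckpoint objects.
# # Allow-listing only matters for weights_only=True loads; it has no effect on the call below.
# torch.serialization.add_safe_globals([ModelCheckpoint])

# # Load the checkpoint with full unpickling (weights_only=False); only do this for a trusted file
# checkpoint = torch.load(checkpoint_path, map_location="cpu", weights_only=False)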

# # Print out keys to inspect conflicting ones
# print("Checkpoint keys:", checkpoint.keys())

# # Remove ModelCheckpoint states (if they exist)
# keys_to_remove = [key for key in checkpoint.keys() if "ModelCheckpoint" in key]
# for key in keys_to_remove:
#     del checkpoint[key]

# # Save the cleaned checkpoint
# fixed_checkpoint_path = checkpoint_path.replace(".ckpt", "_fixed.ckpt")
# torch.save(checkpoint, fixed_checkpoint_path)

# print(f"Fixed checkpoint saved to: {fixed_checkpoint_path}")

## Code

In [6]:
import sys
from pathlib import Path
import os

sys.path.append(str(Path.cwd().parent.parent))
print(str(Path.cwd().parent.parent))
print("Current working directory: ", os.getcwd())

from ModelDevelopment.CentralPipeline import CentralPipeline
from ModelDevelopment.ImageBatchPipeline import ImageBatchPipeline
from DataProcessing.DataPreProcessing import DataPaths

c:\Users\colin\OneDrive\Desktop\Jersey-Number-Recognition
Current working directory:  c:\Users\colin\OneDrive\Desktop\Jersey-Number-Recognition\ModelDevelopment\experiments


## Multithreaded Studies Research Findings
- ThreadPoolExecutor works best for the CentralPipeline pre-processing step (i.e. everything before run_pose)
- Processing one batch of 32 tracklets took 251 seconds => (251*32)/3600 ≈ 2.23 hours ETA @ 200 images per batch cap, num_threads = 6x3 = 18
- Note on the GPU_SEMAPHORE constant inside ImageFeatureTransformPipeline: it is always capped at 1, so only 1 batch of images is offloaded to the GPU at a time
- Because this gate is capped at 1, the image_batch_size param of CentralPipeline is the only control over how much data is offloaded to the GPU
- Since caching is implemented, you can try setting the gate to 2 for even faster results; if the process crashes, keep restarting with use_cache=True
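
The gating idea in the notes above can be sketched as follows. This is a minimal illustration and not the project's implementation: `GPU_GATE`, `preprocess`, `gpu_step` and `process` are hypothetical names, and `time.sleep` stands in for real CPU and GPU work. It shows a thread pool whose CPU-side work overlaps freely while a semaphore of 1 admits a single batch into the GPU step at a time.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for the GPU_SEMAPHORE constant; a value of 1 admits one batch at a time.
GPU_GATE = threading.Semaphore(1)

_lock = threading.Lock()
_active = 0
peak_active = 0  # highest number of batches observed inside the gate at once


def preprocess(batch_id: int) -> int:
    """CPU-side work that can overlap freely across threads."""
    time.sleep(0.01)
    return batch_id


def gpu_step(batch_id: int) -> int:
    """Placeholder for GPU work; the gate limits how many batches run here together."""
    global _active, peak_active
    with GPU_GATE:
        with _lock:
            _active += 1
            peak_active = max(peak_active, _active)
        time.sleep(0.01)  # pretend to run the model
        with _lock:
            _active -= 1
    return batch_id * 2


def process(batch_id: int) -> int:
    return gpu_step(preprocess(batch_id))


with ThreadPoolExecutor(max_workers=6) as pool:
    results = list(pool.map(process, range(8)))

print(results, "peak batches inside the gate:", peak_active)
```

Raising the semaphore value to 2 in this sketch would let two batches enter `gpu_step` together, which mirrors the trade-off described in the last bullet.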

In [7]:
# NOTE: tracklets_to_process_override takes a list of tracklet IDs to process, e.g. tracklets_to_process_override=["8", "9", "10"]
# NOTE: Delete the entire processed_data/{challenge,test,train} directory before running this to avoid issues with stale data.
# Also restart your kernel before every new run, because path problems sometimes occur otherwise.
pipeline = CentralPipeline(
  tracklets_to_process_override=["177"],
  #num_tracklets=1210,
  #num_images_per_tracklet=1,
  input_data_path=DataPaths.TEST_DATA_DIR.value,
  output_processed_data_path=DataPaths.PROCESSED_DATA_OUTPUT_DIR_TEST.value,
  common_processed_data_dir=DataPaths.COMMON_PROCESSED_OUTPUT_DATA_TEST.value,
  gt_data_path=DataPaths.TEST_DATA_GT.value,
  single_image_pipeline=False,
  display_transformed_image_sample=False, # NOTE: Keep False. The code is parallelized, so images can no longer be displayed: with True, the first image shows and then the code breaks.
  num_image_samples=1,
  use_cache=True, # Set to False if you encounter data inconsistencies.
  suppress_logging=False,
  
  # --- PARALLELIZATION PARAMS --- These settings are optimal for an NVIDIA RTX 3070 Ti Laptop GPU.
  num_workers=3,            # CRITICAL optimisation param. Adjust accordingly.
  tracklet_batch_size=32,   # CRITICAL optimisation param. Adjust accordingly. 
  image_batch_size=50,      # CRITICAL optimisation param. Adjust accordingly. 
  num_threads_multiplier=2 # CRITICAL optimisation param. Adjust accordingly. Interpretation: num_threads = num_workers * num_threads_multiplier
  )

2025-03-23 16:03:28 [INFO] DataPreProcessing initialized. Universe of available data paths:
2025-03-23 16:03:28 [INFO] ROOT_DATA_DIR: c:\Users\colin\OneDrive\Desktop\Jersey-Number-Recognition\data\SoccerNet\jersey-2023\extracted
2025-03-23 16:03:28 [INFO] TEST_DATA_GT: c:\Users\colin\OneDrive\Desktop\Jersey-Number-Recognition\data\SoccerNet\jersey-2023\extracted\test\test_gt.json
2025-03-23 16:03:28 [INFO] TRAIN_DATA_GT: c:\Users\colin\OneDrive\Desktop\Jersey-Number-Recognition\data\SoccerNet\jersey-2023\extracted\train\train_gt.json
2025-03-23 16:03:28 [INFO] TEST_DATA_DIR: c:\Users\colin\OneDrive\Desktop\Jersey-Number-Recognition\data\SoccerNet\jersey-2023\extracted\test\images
2025-03-23 16:03:28 [INFO] TRAIN_DATA_DIR: c:\Users\colin\OneDrive\Desktop\Jersey-Number-Recognition\data\SoccerNet\jersey-2023\extracted\train\images
2025-03-23 16:03:28 [INFO] CHALLENGE_DATA_DIR: c:\Users\colin\OneDrive\Desktop\Jersey-Number-Recognition\data\SoccerNet\jersey-2023\extracted\challenge\images
2
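
As a quick sanity check on the parallelization parameters, the sketch below computes the thread count from the relation stated in the cell comment (num_threads = num_workers * num_threads_multiplier) and the number of tracklet batches. The helper names are hypothetical, and rounding the batch count up with `ceil` is an assumption about how a partial final batch is handled.

```python
import math


def thread_count(num_workers: int, num_threads_multiplier: int) -> int:
    """Threads created, per the comment on num_threads_multiplier."""
    return num_workers * num_threads_multiplier


def tracklet_batches(num_tracklets: int, tracklet_batch_size: int) -> int:
    """Batches needed, assuming a partial final batch still counts as one."""
    return math.ceil(num_tracklets / tracklet_batch_size)


print(thread_count(3, 2))       # 6
print(tracklet_batches(1, 32))  # 1
```

With num_workers=3 and num_threads_multiplier=2 this gives 6 threads, and a single overridden tracklet fits in one batch of 32.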

In [8]:
pipeline.run_soccernet(
  run_soccer_ball_filter=True,
  generate_features=True,
  run_filter=True,
  run_legible=True,
  run_legible_eval=True,
  run_pose=True,
  run_crops=True,
  run_str=True,
  run_combine=True,
  run_eval=True)

2025-03-23 16:03:29 [INFO] Running the SoccerNet pipeline.
2025-03-23 16:03:29 [INFO] Tracklet override applied. Using provided tracklets: 177


  0%|          | 0/1 [00:00<?, ?it/s]

2025-03-23 16:03:29 [INFO] Tracklet batch size: 32
2025-03-23 16:03:29 [INFO] Image batch size: 50
2025-03-23 16:03:29 [INFO] Number of workers: 3
2025-03-23 16:03:29 [INFO] Number of threads created: 6
2025-03-23 16:03:29 [INFO] Cache file missing. Preprocessing cannot be skipped
2025-03-23 16:03:29 [INFO] Using double parallelization: multiprocessing + CUDA batch processing.


Processing tracklets (CUDA + CPU):   0%|          | 0/1 [00:00<?, ?it/s]

2025-03-23 16:03:34 [INFO] Creating placeholder data files for Soccer Ball Filter.
2025-03-23 16:03:34 [INFO] Determine soccer balls in image(s) using pre-trained model.
2025-03-23 16:03:34 [INFO] Found 1 balls, Ball list: ['177']


Processing Batch Tracklets (0-32):   0%|          | 0/1 [00:00<?, ?it/s]

Lightning automatically upgraded your loaded checkpoint from v1.1.4 to v2.5.0.post0. To apply the upgrade to your files permanently, run `python -m pytorch_lightning.utilities.upgrade_checkpoint c:\Users\colin\OneDrive\Desktop\Jersey-Number-Recognition\data\pre_trained_models\reid\dukemtmcreid_resnet50_256_128_epoch_120.ckpt`


2025-03-23 16:03:35 [INFO] Saved features for tracklet with shape (1, 2048) to c:\Users\colin\OneDrive\Desktop\Jersey-Number-Recognition\data\SoccerNet\jersey-2023\processed_data\test\177\features.npy
2025-03-23 16:03:35 [INFO] Identifying and removing outliers by calling gaussian_outliers.py on feature file
2025-03-23 16:03:40 [INFO] 
2025-03-23 16:03:40 [ERROR] 
2025-03-23 16:03:40 [INFO] Done removing outliers
2025-03-23 16:03:40 [INFO] Running model chain on preprocessed image(s).
2025-03-23 16:03:40 [INFO] Classifying legibility of image(s) using pre-trained model.
2025-03-23 16:03:40 [INFO] This tracklet is a soccer ball track. Marking it as not legibile.
2025-03-23 16:03:45 [INFO] Saving legible_tracklets to: c:\Users\colin\OneDrive\Desktop\Jersey-Number-Recognition\data\SoccerNet\jersey-2023\processed_data\test\177\legible_results.json
2025-03-23 16:03:45 [INFO] Saved legible_tracklets to: c:\Users\colin\OneDrive\Desktop\Jersey-Number-Recognition\data\SoccerNet\jersey-2023\proc

Evaluating legibility:   0%|          | 0/1 [00:00<?, ?it/s]

2025-03-23 16:03:45 [INFO] Correct 1 out of 1. Accuracy 100.0%.
2025-03-23 16:03:45 [INFO] TP=0, TN=1, FP=0, FN=0
2025-03-23 16:03:45 [INFO] Precision=0, Recall=0
2025-03-23 16:03:45 [INFO] F1=0
2025-03-23 16:03:45 [INFO] Generating json for pose
2025-03-23 16:03:45 [INFO] Aggregating legible & illegible results (cache not used or only one file is missing).
2025-03-23 16:03:45 [INFO] Saved global legible results to: c:\Users\colin\OneDrive\Desktop\Jersey-Number-Recognition\data\SoccerNet\jersey-2023\processed_data\test\common_data\legible_results.json
2025-03-23 16:03:45 [INFO] Saved global illegible results to: c:\Users\colin\OneDrive\Desktop\Jersey-Number-Recognition\data\SoccerNet\jersey-2023\processed_data\test\common_data\illegible_results.json
2025-03-23 16:03:45 [INFO] Done generating json for pose
2025-03-23 16:03:45 [INFO] Detecting pose
2025-03-23 16:03:47 [ERROR] Error running pose estimation: Command '['conda', 'run', '-n', 'vitpose', 'python', 'c:\\Users\\colin\\OneDrive\\

FileNotFoundError: [Errno 2] No such file or directory: 'c:\\Users\\colin\\OneDrive\\Desktop\\Jersey-Number-Recognition\\data\\SoccerNet\\jersey-2023\\processed_data\\test\\common_data\\pose_results.json'
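
The FileNotFoundError appears to follow from the failed pose estimation command: pose_results.json was likely never written, yet a later step tried to read it. One way to surface the underlying failure earlier is sketched below. `run_and_require_output` is a hypothetical helper and not part of the pipeline, and the demo command uses the current Python interpreter rather than the real `conda run` invocation.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_and_require_output(cmd, expected_output):
    """Run cmd; raise with the captured stderr if it fails or leaves no output file."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    if proc.returncode != 0:
        raise RuntimeError(
            f"Command failed with exit code {proc.returncode}:\n{proc.stderr}"
        )
    output_path = Path(expected_output)
    if not output_path.is_file():
        raise RuntimeError(f"Command succeeded but did not create {output_path}")
    return output_path


# Demo: a stand-in command that writes an empty JSON object to the expected file.
demo_dir = Path(tempfile.mkdtemp())
demo_output = demo_dir / "pose_results.json"
demo_cmd = [sys.executable, "-c", f"open({str(demo_output)!r}, 'w').write('{{}}')"]
print(run_and_require_output(demo_cmd, demo_output).name)
```

Failing at the subprocess boundary with the captured stderr keeps the real cause of the error visible, instead of a later missing-file error.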