Dear Author,
Thank you for your excellent work. I noticed that you compared your method with many transformer-based models and applied a multi-scale training strategy with scales {0.75, 1.0, 1.25}. However, changing the input image size changes the number of patches, which mismatches the learned positional embeddings in transformer-based models. Could you please share how you addressed this problem?
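In case it helps frame my question: my current understanding is that a common workaround is to bicubically interpolate the learned positional embeddings to the new patch grid at each scale (similar to what libraries such as timm do). A minimal sketch of that idea, assuming a standard ViT layout with a square patch grid and one class token (the function name and shapes here are my own illustration, not your code):

```python
import torch
import torch.nn.functional as F

def resize_pos_embed(pos_embed, new_hw, num_prefix_tokens=1):
    """Interpolate ViT positional embeddings to a new patch-grid size.

    pos_embed: tensor of shape (1, num_prefix_tokens + H*W, dim),
               where the original grid is assumed square (H == W).
    new_hw:    target grid size (new_H, new_W).
    Returns a tensor of shape (1, num_prefix_tokens + new_H*new_W, dim).
    """
    # Split off prefix tokens (e.g. the class token); they are not spatial.
    prefix = pos_embed[:, :num_prefix_tokens]
    grid = pos_embed[:, num_prefix_tokens:]
    dim = grid.shape[-1]
    old = int(grid.shape[1] ** 0.5)  # side length of the original square grid

    # Reshape to (N, C, H, W) so F.interpolate can resample spatially.
    grid = grid.reshape(1, old, old, dim).permute(0, 3, 1, 2)
    grid = F.interpolate(grid, size=new_hw, mode="bicubic", align_corners=False)

    # Flatten back to the token sequence and reattach the prefix tokens.
    grid = grid.permute(0, 2, 3, 1).reshape(1, new_hw[0] * new_hw[1], dim)
    return torch.cat([prefix, grid], dim=1)
```

For example, resizing embeddings trained on a 14x14 grid to an 18x18 grid would turn a sequence of 1 + 196 tokens into 1 + 324 tokens. Is this roughly what you did, or did you use a different scheme (e.g. retraining per scale, or padding/cropping to keep the grid fixed)?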
Alternatively, would it be possible to share your complete project code, including the implementations of the compared models and the training pipeline? I would be very grateful.
Best regards,
Liu.