I see in the code that num_channels=3 is expected. How easy would it be to extend this to e.g. 4 or more channels? Is this on the roadmap? Thanks in advance
Activity
isaacrob-roboflow commented on Mar 28, 2025
can you say a bit more about your use case? we pretrained on RGB images so the model doesn't really off the shelf know how to work with more than 3 channels. not that it can't finetune with those, but there's less benefit from the pretrain.
what model, if any, are you using right now?
robmarkcole commented on Mar 28, 2025
Use case is aerial imagery, think drones with RGB+NIR. It would be sufficient to pad the weights of the RED channel to the new channels.
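To make the red-channel padding idea concrete, here is a minimal PyTorch sketch. It assumes the model's first layer is a plain `nn.Conv2d` with 3 input channels ordered R, G, B and `groups=1`. The helper name `pad_with_red` is hypothetical and is not part of RF-DETR or any library mentioned in this thread.

```python
import torch
import torch.nn as nn


def pad_with_red(rgb_conv: nn.Conv2d, in_channels: int) -> nn.Conv2d:
    """Hypothetical helper: build a wider first conv whose extra input
    channels start from a copy of the pretrained red-channel filters."""
    if rgb_conv.in_channels != 3 or rgb_conv.groups != 1:
        raise ValueError("expected an ungrouped 3-channel (RGB) convolution")
    if in_channels < 3:
        raise ValueError("in_channels must be at least 3")

    weight = rgb_conv.weight.detach()        # (out, 3, kH, kW)
    red = weight[:, :1]                      # (out, 1, kH, kW), assumes index 0 is red
    extra = red.repeat(1, in_channels - 3, 1, 1)
    new_weight = torch.cat([weight, extra], dim=1)

    new_conv = nn.Conv2d(
        in_channels,
        rgb_conv.out_channels,
        kernel_size=rgb_conv.kernel_size,
        stride=rgb_conv.stride,
        padding=rgb_conv.padding,
        dilation=rgb_conv.dilation,
        bias=rgb_conv.bias is not None,
        padding_mode=rgb_conv.padding_mode,
    )
    with torch.no_grad():
        new_conv.weight.copy_(new_weight)
        if rgb_conv.bias is not None:
            new_conv.bias.copy_(rgb_conv.bias)
    return new_conv
```

For RGB+NIR input this would be called as `pad_with_red(first_conv, 4)`. The original three filters are left untouched, so the NIR channel adds to the pre-activation rather than replacing anything, and fine-tuning is still needed.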
isaacrob-roboflow commented on Mar 31, 2025
It would be useful if you had a reference model that does what you're interested to point us to! :)
robmarkcole commented on Mar 31, 2025
Is that to have a reference for fine tuning? I'm working with proprietary datasets that aren't public. I could train on a public dataset but of course would need the 4 channel support to do so. Thanks!
isaacrob-roboflow commented on Mar 31, 2025
I'm just interested in seeing how other model developers handle this. Specifically, I want to know if any public models are pretrained on RGB images and transferred to RGB+ images. I understand that it's doable and I have a few ideas as to how to get it done, just hoping for prior art, so if you're using some public model already for your training, that would be super helpful :)
robmarkcole commented on Apr 1, 2025
@isaacrob-roboflow yes it happens all the time in remote sensing - starting to see more use of domain specific weights, but it is still typically imagenet!
isaacrob-roboflow commented on Apr 1, 2025
can you link me to an open source repo that you would consider a good example of adapting imagenet pretrained weights to RGB+ images?
robmarkcole commented on Apr 1, 2025
TorchGeo supports in_channels>3. Example training at https://torchgeo.readthedocs.io/en/latest/tutorials/pretrained_weights.html
isaacrob-roboflow commented on Apr 1, 2025
awesome thank you. I'm not seeing anything in their repo about taking a model trained with RGB on imagenet and transferring it to something with more than 3 channels, looks like all the pretrained models that support more than 3 channels are pretrained with those channels originally? feel free to let me know if I'm not seeing something
for example, for resnet50 pretrained on Sentinel-2, they have SENTINEL2_MI_MS_SATLAS which has 9 channels and
ResNet50_Weights.SENTINEL2_MI_RGB_SATLAS which has 3. I don't see anything about using the RGB weights for a model with more than 3 input channels
robmarkcole commented on Apr 1, 2025
Sure to clarify it is valid to have
robmarkcole commented on Apr 21, 2025
For reference
https://github.com/ultralytics/ultralytics/releases/tag/v8.3.112
adamjstewart commented on Apr 23, 2025
TorchGeo and SMP maintainer here. TorchGeo has ~100 models pre-trained on multispectral imagery (in_channels = ~10), hyperspectral imagery (in_channels = ~250), and synthetic aperture radar (in_channels = 1 or 2). Timm directly supports repeating 3-channel weights for 3+ channel inputs, see here for how this works. For our use cases, we would be happy if the model supported in_channels != 3, even if the pre-trained weights didn't. We have plenty of 1M+ image multispectral and SAR datasets we could use to pretrain a model and redistribute weights for it.
SkalskiP commented on Apr 23, 2025
Sounds good to me. @isaacrob-roboflow — what do you think about introducing a similar in_channels parameter, defaulting to 3? When needed, we could simply replicate the pretrained weights across additional channels, as suggested by @adamjstewart. Naturally, we’d assume that anyone using this parameter knows what they’re doing — either planning to train a model on multi-channel inputs or already having weights trained that way.
@adamjstewart — would you be open to sharing a dataset we could use to test the pipeline? Alternatively, would you be willing to help us put together a PR adding this functionality?
adamjstewart commented on Apr 23, 2025
Every single dataset in this table where the last column is not RGB: https://torchgeo.readthedocs.io/en/latest/api/datasets.html#non-geospatial-datasets
Tentatively yes, but not any time soon. @robmarkcole or @isaaccorley may be better equipped to do this.
isaaccorley commented on Apr 23, 2025
As @adamjstewart mentioned, timm's approach of repeating the first layer pretrained weights if you have more than 3 channels makes the most sense and this seems to work well in practice instead of randomly initializing a new first layer.
For example, if you have pretrained RGB weights but a user has a new 6 channel input you would repeat the first layer weights like RGBRGB.
I'll take a look today at opening a PR.
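The RGBRGB repetition can be sketched in a few lines of PyTorch. This is an illustration of the idea only, not timm's code and not the contents of any PR. `tile_rgb_weights` is a hypothetical name, and the sketch assumes the pretrained first-layer weight has shape `(out_channels, 3, kH, kW)`. It does no rescaling of the weights.

```python
import math

import torch


def tile_rgb_weights(weight: torch.Tensor, in_channels: int) -> torch.Tensor:
    """Hypothetical helper: cycle pretrained RGB filters across in_channels
    inputs (R, G, B, R, G, B, ...), truncating the last cycle if needed."""
    if weight.dim() != 4 or weight.shape[1] != 3:
        raise ValueError("expected a weight of shape (out, 3, kH, kW)")
    if in_channels < 1:
        raise ValueError("in_channels must be positive")
    repeats = math.ceil(in_channels / 3)
    # Tensor.repeat tiles along dim 1, giving channel order 0,1,2,0,1,2,...
    tiled = weight.repeat(1, repeats, 1, 1)
    return tiled[:, :in_channels].clone()
```

With 6 input channels, channel 3 starts from the red filters, channel 4 from green, and channel 5 from blue. With 4 channels only the red filters are reused.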
SkalskiP commented on Apr 23, 2025
@isaaccorley that would be a massive help in unblocking @robmarkcole as well as others who'd like to use RF-DETR this way. can't wait to see that PR!
isaaccorley commented on Apr 24, 2025
@SkalskiP @adamjstewart @robmarkcole PR here #180
robmarkcole commented on May 14, 2025
Noting this requirement ultralytics/ultralytics#20646
isaacrob-roboflow commented on May 14, 2025
@isaaccorley you'll probably have to divide all weights by 2 if you go from RGB to RGBRGB
isaaccorley commented on May 14, 2025
@isaacrob-roboflow timm's adapt function already does this for you.
isaacrob-roboflow commented on May 15, 2025
cool! then if that approach gets ported to this codebase, please make sure the division happens here too :)
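A quick numerical check shows why the division matters. The sketch below uses random stand-in weights (no real pretrained model) and assumes the 6-channel input is simply the RGB image stacked twice. The function name `scaling_check` is made up for illustration.

```python
import torch
import torch.nn.functional as F


def scaling_check(seed: int = 0):
    """Compare first-layer outputs for RGB vs. duplicated RGBRGB input."""
    torch.manual_seed(seed)
    w_rgb = torch.randn(8, 3, 3, 3)        # stand-in for pretrained filters
    x_rgb = torch.randn(1, 3, 16, 16)

    w_six = w_rgb.repeat(1, 2, 1, 1)       # R, G, B, R, G, B
    x_six = x_rgb.repeat(1, 2, 1, 1)       # same image stacked twice

    y_ref = F.conv2d(x_rgb, w_rgb)
    y_unscaled = F.conv2d(x_six, w_six)              # sums every term twice
    y_scaled = F.conv2d(x_six, w_six * (3 / 6))      # i.e. divide by 2
    return y_ref, y_unscaled, y_scaled


y_ref, y_unscaled, y_scaled = scaling_check()
print(torch.allclose(y_unscaled, 2 * y_ref, atol=1e-4))
print(torch.allclose(y_scaled, y_ref, atol=1e-4))
```

Without the rescale, the first-layer output doubles, so every later layer sees activations outside the range it was pretrained on. Multiplying by `3 / in_channels` generalizes the divide-by-2 case to other channel counts, under the same assumption that the extra channels look roughly like the RGB ones.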