Hi,
Thank you for your work on MambaVision! I have been trying to reproduce the reported results for MambaVision-T and have run into some problems.
Here is the summary:
- I ran 16 training runs, using up to 4 GPUs. To match the global batch size, I used a larger per-GPU batch size.
- I experimented with different seeds, but even with the same seed, I observed fluctuations of ~0.1-0.2 percentage points in accuracy.
- My highest achieved accuracy is 82.21%, whereas the reported result is 82.3%.
- When I validate the provided model checkpoint with validate.sh, I get an accuracy of 82.244%, which rounds to 82.2% rather than the reported 82.3%.
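For reference, here is a minimal sketch of the arithmetic behind the batch-size and rounding points above. The global batch size of 1024 is a placeholder assumption rather than a value stated in this issue, and the helper names are hypothetical, not part of the MambaVision code.

```python
from decimal import Decimal, ROUND_HALF_UP


def per_gpu_batch_size(global_batch_size: int, num_gpus: int, grad_accum_steps: int = 1) -> int:
    """Return the per-GPU batch size that keeps the global batch size unchanged."""
    divisor = num_gpus * grad_accum_steps
    if global_batch_size % divisor != 0:
        raise ValueError(
            f"global batch size {global_batch_size} is not divisible by "
            f"{num_gpus} GPUs x {grad_accum_steps} accumulation steps"
        )
    return global_batch_size // divisor


def round_accuracy(acc_percent: str, places: int = 1) -> Decimal:
    """Round an accuracy (given as a decimal string) half-up to `places` decimals."""
    quantum = Decimal(10) ** -places  # Decimal('0.1') when places == 1
    return Decimal(acc_percent).quantize(quantum, rounding=ROUND_HALF_UP)


# Assumed global batch size of 1024 (placeholder), trained on 4 GPUs.
print(per_gpu_batch_size(1024, 4))   # 256

# Accuracies quoted in this issue.
print(round_accuracy("82.244"))      # 82.2 (provided checkpoint)
print(round_accuracy("82.21"))       # 82.2 (best reproduced run)
```

Both numbers round to 82.2% at one decimal place, so rounding alone would not account for the gap to 82.3%.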
My environment:
- Python 3.10.12
- torch==2.5.1
- timm==1.0.14
- einops==0.8.0
- transformers==4.48.1
- causal-conv1d @ file:///causal-conv1d (using the newest commit as of this post: 82867a9)
- mamba-ssm @ file:///mamba (using the newest commit as of this post: 0cce0fa)
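In case it helps others compare setups, the pinned versions above can be checked with a short script along these lines. This is only a sketch: `check_versions` is a hypothetical helper, and it covers just the PyPI packages, since causal-conv1d and mamba-ssm were built from local checkouts and are identified by commit rather than by version.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed in this issue (PyPI packages only).
EXPECTED = {
    "torch": "2.5.1",
    "timm": "1.0.14",
    "einops": "0.8.0",
    "transformers": "4.48.1",
}


def check_versions(expected: dict) -> dict:
    """Return {package: (expected, installed)} for every package that differs.

    `installed` is None when the package is not installed at all.
    """
    mismatches = {}
    for name, wanted in expected.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        # Local builds may carry a suffix such as "+cu121", so compare the public part only.
        public = installed.split("+")[0] if installed is not None else None
        if public != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, installed) in check_versions(EXPECTED).items():
        print(f"{name}: expected {wanted}, found {installed}")
```

An empty result means the installed PyPI packages match the list above; it says nothing about CUDA, driver, or GPU differences.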
Could you clarify if there are any additional details regarding the training setup or hyperparameters that might explain these discrepancies? Also, was any additional post-processing or averaging applied to obtain the reported accuracy?
Alihjt