
Reproducibility Issues with Reported Accuracy #57

Description

@miralys1

Hi,

Thank you for your work on MambaVision! I have been trying to reproduce the reported results, specifically for MambaVision-T, and I ran into some problems.

Here is the summary:

  • I ran 16 training runs on up to 4 GPUs, increasing the per-GPU batch size so that the global batch size matched the reference setup.
  • I experimented with different seeds, but even with the same seed I observed run-to-run fluctuations of ~0.1-0.2 percentage points in accuracy (my seeding setup is sketched after this list).
  • My best achieved accuracy is 82.21%, whereas the reported result is 82.3%.
  • When validating the provided model checkpoint with validate.sh, I get an accuracy of 82.244%, which rounds to 82.2%, not 82.3%.
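
For reference, this is how I seed each run. It is a minimal sketch of my setup; even with these settings, some fused CUDA kernels (possibly including the Mamba scan ops) may remain non-deterministic, which could explain the residual ~0.1-0.2 pp spread:

```python
import random

import numpy as np
import torch


def seed_everything(seed: int = 42) -> None:
    """Seed Python, NumPy, and torch RNGs, and prefer deterministic kernels."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)  # also seeds all CUDA devices
    torch.backends.cudnn.benchmark = False
    torch.backends.cudnn.deterministic = True
    # warn_only=True because some fused CUDA ops have no deterministic
    # implementation; fully deterministic cuBLAS additionally needs the
    # CUBLAS_WORKSPACE_CONFIG=:4096:8 environment variable.
    torch.use_deterministic_algorithms(True, warn_only=True)


seed_everything(42)
```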

My environment:

  • Python 3.10.12
  • torch==2.5.1
  • timm==1.0.14
  • einops==0.8.0
  • transformers==4.48.1
  • causal-conv1d @ file:///causal-conv1d (using the newest commit as of this post: 82867a9)
  • mamba-ssm @ file:///mamba (using the newest commit as of this post: 0cce0fa)
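
For completeness, this is the snippet I use to record these versions (a minimal sketch; the /causal-conv1d and /mamba paths are the source checkouts referenced above):

```python
import subprocess

import einops
import timm
import torch
import transformers

print("torch:", torch.__version__, "| CUDA:", torch.version.cuda)
print("timm:", timm.__version__)
print("einops:", einops.__version__)
print("transformers:", transformers.__version__)

# causal-conv1d and mamba-ssm are installed from source checkouts,
# so report the git commit of each instead of a version pin.
for repo in ("/causal-conv1d", "/mamba"):
    sha = subprocess.check_output(
        ["git", "-C", repo, "rev-parse", "--short", "HEAD"], text=True
    ).strip()
    print(f"{repo} @ {sha}")
```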

Could you clarify whether there are any additional details about the training setup or hyperparameters that might explain these discrepancies? Also, was any post-processing or averaging (e.g., EMA or checkpoint averaging) applied to obtain the reported accuracy? A sketch of what I mean follows.
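
To be concrete about the kind of averaging I mean, here is a minimal sketch of plain checkpoint averaging; the file names and the "state_dict" key are assumptions for illustration, not taken from this repo:

```python
import torch

# Hypothetical checkpoint files from the tail of training; adjust to your output.
paths = ["checkpoint-297.pth", "checkpoint-298.pth", "checkpoint-299.pth"]

avg = None
for p in paths:
    sd = torch.load(p, map_location="cpu")["state_dict"]  # key is an assumption
    if avg is None:
        avg = {k: v.float().clone() for k, v in sd.items()}
    else:
        for k in avg:
            avg[k] += sd[k].float()

# Average the summed weights and save them back under the same (assumed) key.
avg = {k: v / len(paths) for k, v in avg.items()}
torch.save({"state_dict": avg}, "checkpoint-averaged.pth")
```

Similarly, if an EMA of the weights was kept during training (e.g., timm's ModelEmaV2) and the EMA weights were evaluated rather than the raw ones, that might also account for a ~0.1 pp difference.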
