@AnFreTh AnFreTh commented Sep 28, 2024

New Model Architectures:

  • MambAttn Class: Introduced a new model class, MambAttn, that alternates between Mamba blocks and attention layers, providing a flexible architecture for various deep learning tasks; a minimal sketch of the alternating pattern follows this list. (mambular/arch_utils/mambattn_arch.py)
  • ConvRNN Class: Added the ConvRNN class, which combines convolutional layers with RNN layers, supports multiple RNN types (RNN, LSTM, GRU), and offers optional residual connections; see the second sketch after this list. (mambular/arch_utils/rnn_utils.py)
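
A minimal sketch of the alternating pattern, assuming a standard pre-norm residual layout. This is not the mambular implementation: MambaBlock below is a simplified stand-in for a real selective state-space block, and all names, defaults, and the even/odd alternation rule are assumptions.

```python
import torch
import torch.nn as nn


class MambaBlock(nn.Module):
    """Simplified stand-in for a real Mamba (selective state-space) block."""

    def __init__(self, d_model: int):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.mixer = nn.Linear(d_model, d_model)  # placeholder for the SSM mixer

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.mixer(self.norm(x))  # pre-norm residual


class MambAttnSketch(nn.Module):
    """Alternates Mamba-style blocks with self-attention layers."""

    def __init__(self, d_model: int = 64, n_layers: int = 4, n_heads: int = 4):
        super().__init__()
        self.layers = nn.ModuleList(
            MambaBlock(d_model)
            if i % 2 == 0
            else nn.MultiheadAttention(d_model, n_heads, batch_first=True)
            for i in range(n_layers)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for layer in self.layers:
            if isinstance(layer, nn.MultiheadAttention):
                attn_out, _ = layer(x, x, x)
                x = x + attn_out  # residual around attention
            else:
                x = layer(x)
        return x
```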

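A similarly hedged sketch of the conv-then-RNN combination: the RNN type is selected by name and the residual connection is optional. The interface and defaults are assumptions, not the actual rnn_utils.py code.

```python
import torch
import torch.nn as nn

_RNN_TYPES = {"RNN": nn.RNN, "LSTM": nn.LSTM, "GRU": nn.GRU}


class ConvRNNSketch(nn.Module):
    def __init__(self, d_model=64, rnn_type="LSTM", residual=True, kernel_size=3):
        super().__init__()
        # Length-preserving 1D convolution over the sequence dimension.
        self.conv = nn.Conv1d(d_model, d_model, kernel_size, padding=kernel_size // 2)
        self.rnn = _RNN_TYPES[rnn_type](d_model, d_model, batch_first=True)
        self.residual = residual

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model); Conv1d expects channels second.
        h = self.conv(x.transpose(1, 2)).transpose(1, 2)
        out, _ = self.rnn(h)  # works for RNN, LSTM, and GRU alike
        return x + out if self.residual else out
```
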
Integration and Configuration:

  • MambAttention Model: Implemented the MambAttention model, which builds on the MambAttn architecture and supports several normalization techniques and pooling methods; a hedged sketch of the pooling step follows this list. (mambular/base_models/mambattn.py)
  • Model Registration: Registered the MambAttn model in the __init__.py of base_models so it is accessible within the module; see the registration snippet below. (mambular/base_models/__init__.py)
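
How the pooling step of such a model might be dispatched, as a hedged sketch; the method names ("avg", "max", "cls") are assumptions, not necessarily mambular's exact options.

```python
import torch


def pool_sequence(x: torch.Tensor, pooling: str = "avg") -> torch.Tensor:
    """Reduce a (batch, seq_len, d_model) sequence to (batch, d_model)."""
    if pooling == "avg":
        return x.mean(dim=1)
    if pooling == "max":
        return x.max(dim=1).values
    if pooling == "cls":
        return x[:, 0]  # treat the first token as a [CLS]-style summary
    raise ValueError(f"unknown pooling method: {pooling}")
```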

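For the registration itself, the usual pattern in a package's __init__.py looks like this (illustrative only; the exact import path and __all__ contents are assumptions):

```python
# mambular/base_models/__init__.py (sketch)
from .mambattn import MambAttention

__all__ = ["MambAttention"]
```
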
Optimization Enhancements:

  • Early Pruning and Optimizer Configuration: Enhanced lightning_wrapper.py to include early pruning based on validation loss and dynamic optimizer configuration, allowing for more flexible and efficient training. Also added automatic Bayesian HPO for all models, with a config-mapper for automatic hyperparameter-range detection; two hedged sketches follow this list.
    (mambular/base_models/lightning_wrapper.py)
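
A hedged sketch of the two wrapper features named above: pruning a run early when validation loss stops improving, and resolving the optimizer from configuration at runtime. The attribute names, patience logic, and loss function are assumptions, not mambular's lightning_wrapper.py.

```python
import pytorch_lightning as pl
import torch
import torch.nn.functional as F


class WrapperSketch(pl.LightningModule):
    def __init__(self, model, lr=1e-3, optimizer_name="AdamW", patience=5):
        super().__init__()
        self.model = model
        self.lr = lr
        self.optimizer_name = optimizer_name
        self.patience = patience
        self.best_val = float("inf")
        self.stale_epochs = 0

    def training_step(self, batch, batch_idx):
        x, y = batch
        loss = F.mse_loss(self.model(x), y)
        self.log("train_loss", loss)
        return loss

    def validation_step(self, batch, batch_idx):
        x, y = batch
        self.log("val_loss", F.mse_loss(self.model(x), y), prog_bar=True)

    def on_validation_epoch_end(self):
        # Early pruning: stop the trainer once val loss stagnates.
        val_loss = self.trainer.callback_metrics.get("val_loss")
        if val_loss is None:
            return
        if float(val_loss) < self.best_val:
            self.best_val = float(val_loss)
            self.stale_epochs = 0
        else:
            self.stale_epochs += 1
            if self.stale_epochs >= self.patience:
                self.trainer.should_stop = True

    def configure_optimizers(self):
        # Dynamic optimizer configuration: resolve the class by name.
        opt_cls = getattr(torch.optim, self.optimizer_name)
        return opt_cls(self.model.parameters(), lr=self.lr)
```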

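And a sketch of what the config-mapper for Bayesian HPO could look like, assuming an Optuna-style search; the range table, field names, and tuple encoding are illustrative, not the actual mapping.

```python
import optuna

# Hypothetical mapping from config fields to search ranges, as a config-mapper
# might derive them from a model's config class.
HPARAM_RANGES = {
    "lr": ("float", 1e-5, 1e-2, True),  # (kind, low, high, log-scale)
    "d_model": ("int", 32, 256),  # (kind, low, high)
    "pooling": ("categorical", ["avg", "max", "cls"]),
}


def suggest_hparams(trial: optuna.Trial) -> dict:
    """Turn the range table into concrete suggestions for one trial."""
    params = {}
    for name, spec in HPARAM_RANGES.items():
        if spec[0] == "float":
            params[name] = trial.suggest_float(name, spec[1], spec[2], log=spec[3])
        elif spec[0] == "int":
            params[name] = trial.suggest_int(name, spec[1], spec[2])
        else:
            params[name] = trial.suggest_categorical(name, spec[1])
    return params
```
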
@AnFreTh AnFreTh merged commit ed5a0f3 into develop Sep 28, 2024
@AnFreTh AnFreTh deleted the attn branch November 5, 2024 14:58