
Need Forwarding with state. #22

Open
lxianl455 opened this issue Jun 14, 2024 · 5 comments

Comments

@lxianl455

When training, it runs without state:

    def forward(self, idx: torch.Tensor) -> torch.Tensor:
        x = self.token_embedding(idx)
        x = self.emb_dropout(x)
        x = self.xlstm_block_stack(x)
        logits = self.lm_head(x)
        return logits

Can you give a "forward with state" version?

    def forward(self, idx: torch.Tensor, state) -> torch.Tensor:
        x = self.token_embedding(idx)
        x = self.emb_dropout(x)
        x = self.xlstm_block_stack(x, state)
        logits = self.lm_head(x)
        return logits
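
For reference, a minimal sketch of what such a stateful forward could look like as a method on the model above, assuming `xlstm_block_stack` exposes a `step(x, state)` method that returns `(output, new_state)`. Whether that method exists with this exact signature depends on the installed xlstm version, so treat the names below as assumptions and see the `step()` discussion further down this thread:

```python
import torch

def forward_with_state(self, idx: torch.Tensor, state=None):
    # idx: (batch, seq_len) token ids; state: per-block recurrent state or None.
    x = self.token_embedding(idx)
    x = self.emb_dropout(x)
    outputs = []
    for t in range(x.shape[1]):
        # Assumption: step() consumes one time step of shape (batch, 1, dim)
        # and returns (output, new_state). Verify against your xlstm version.
        y, state = self.xlstm_block_stack.step(x[:, t : t + 1, :], state)
        outputs.append(y)
    x = torch.cat(outputs, dim=1)
    logits = self.lm_head(x)
    return logits, state
```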

@sieusaoml

sieusaoml commented Jun 16, 2024


https://github.com/sieusaoml/xLSTM-custom-block
This is a custom xLSTM block of mine.

@lxianl455
Author

Yes, I want to do something similar. But in the code, is it only sLSTM that can be initialized with the previous hidden state? Can't mLSTM be initialized with the previous state?

@hiimbach

The step() method and the forward() method of mLSTMLayer use different types of conv1d forward, so I think if you want to use a hidden state, you need to call step() token by token instead of forwarding all tokens at the same time.
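
As a usage illustration of that token-by-token approach, here is a hedged sketch of an autoregressive decode loop that carries the state between calls. `model` mirrors the class from the first comment, and `xlstm_block_stack.step(x, state) -> (output, new_state)` is an assumption that should be checked against the installed library:

```python
import torch

@torch.no_grad()
def generate(model, prompt_ids: torch.Tensor, max_new_tokens: int) -> torch.Tensor:
    # Call model.eval() first so emb_dropout is disabled.
    state = None
    tokens = prompt_ids

    # Warm up the recurrent state on the prompt, one token at a time.
    for t in range(prompt_ids.shape[1]):
        x = model.emb_dropout(model.token_embedding(prompt_ids[:, t : t + 1]))
        y, state = model.xlstm_block_stack.step(x, state)  # assumed signature

    for _ in range(max_new_tokens):
        logits = model.lm_head(y)                                 # (batch, 1, vocab)
        next_token = logits[:, -1].argmax(dim=-1, keepdim=True)   # greedy pick
        tokens = torch.cat([tokens, next_token], dim=1)
        x = model.emb_dropout(model.token_embedding(next_token))
        y, state = model.xlstm_block_stack.step(x, state)
    return tokens
```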

@lxianl455
Author

lxianl455 commented Jun 16, 2024

Yes, I am not asking to forward all the tokens at the same time. In fact, my original model was an LSTM, which processes each token in a loop. I just want to replace this LSTM with xLSTM. But it seems that step() is meant for inference, right? Can it backpropagate normally during training? Will the in-place operations lead to backpropagation errors?
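
One hedged way to answer that empirically: run a short sequence through a stateful forward like the sketch earlier in this thread, take a loss, and call backward() with anomaly detection enabled, which makes PyTorch point at any in-place modification of a tensor the backward pass still needs. `forward_with_state`, `model`, and the step() signature are the assumptions from that sketch, not an official API:

```python
import torch
import torch.nn.functional as F

vocab_size = 1000  # set to the model's actual vocabulary size

with torch.autograd.set_detect_anomaly(True):
    idx = torch.randint(0, vocab_size, (2, 8))      # (batch=2, seq_len=8)
    targets = torch.randint(0, vocab_size, (2, 8))

    # forward_with_state is the sketch from earlier in the thread (assumed API).
    logits, state = forward_with_state(model, idx, state=None)
    loss = F.cross_entropy(logits.reshape(-1, logits.shape[-1]), targets.reshape(-1))
    loss.backward()  # raises a RuntimeError here if an in-place op broke the graph

# Parameters whose grad is still None did not receive gradients via step().
print([name for name, p in model.named_parameters() if p.grad is None])
```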

@sieusaoml

mLSTMLayer can be used with a previous hidden state, but in my test with context_length=1, backpropagating the gradient raises an error.
