Hi,
I'm looking for an implementation of a transformer decoder with cross-attention to the encoder in DeepSpeed.
I found pull request #933, which looks like a candidate for this purpose.
Would #933 also be enough to support the transformer decoder?
Best regards,
Hwidong