First of all, thank you for your work. The paper reports that R_res contributes the most to the performance improvement, followed by R_down. Suppose we want to keep the model structure unchanged, with no extra online operations. Since R_down.T can be fused into W_down, can R_down likewise be fused into W_gate and W_up? Thank you very much.
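To make the question concrete, here is a minimal NumPy sketch of the fusion being asked about. It is not taken from the paper or its code: it assumes a SwiGLU-style MLP of the form `y = W_down @ (silu(W_gate @ x) * (W_up @ x))`, a column-vector convention, toy dimensions, and a random orthogonal matrix standing in for R_down. All variable names are hypothetical.

```python
import numpy as np

def silu(z):
    # SiLU / swish activation, applied elementwise
    return z / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
d_model, d_ff = 8, 16  # toy sizes, chosen only for illustration

# Assumed SwiGLU MLP weights (column-vector convention: y = W @ x)
W_gate = rng.standard_normal((d_ff, d_model))
W_up = rng.standard_normal((d_ff, d_model))
W_down = rng.standard_normal((d_model, d_ff))
x = rng.standard_normal(d_model)

# Random orthogonal matrix as a stand-in for R_down (Q factor of a QR decomposition)
R_down, _ = np.linalg.qr(rng.standard_normal((d_ff, d_ff)))

# Reference output of the unrotated MLP
h = silu(W_gate @ x) * (W_up @ x)
y_ref = W_down @ h

# Case 1: R_down.T fused into W_down, R_down applied online to the activation h
W_down_fused = W_down @ R_down.T
y_online = W_down_fused @ (R_down @ h)

# Case 2: the fusion asked about, R_down folded into W_gate and W_up instead
h_fused = silu((R_down @ W_gate) @ x) * ((R_down @ W_up) @ x)
y_fused = W_down_fused @ h_fused

err_online = np.max(np.abs(y_online - y_ref))
err_fused = np.max(np.abs(y_fused - y_ref))
print("online rotation error:", err_online)
print("fused into W_gate/W_up error:", err_fused)
```

In this toy setup, the first case reproduces the reference output up to floating-point error, while the second does not. That is consistent with the elementwise SiLU and the elementwise product not commuting with a general rotation, although the paper's exact layout and conventions may differ from what is assumed here.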