Thanks for the excellent work!
In the paper, my understanding is that the LoRA or Adapter modules are tied within each attention layer. I am wondering why the implementation seems to always share the first layer's parameters with the other layers, which instead ties them across the attention layers.
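To make sure I'm reading the code correctly, here is a minimal PyTorch sketch of what I mean by "tied across layers": a single pair of LoRA matrices is created once and then passed by reference into every layer, so all layers update the same parameters. The `LoRALinear` class and all names here are my own illustration, not the repository's actual code.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer with a low-rank update: W x + B A x (illustrative)."""
    def __init__(self, base: nn.Linear, lora_A: nn.Parameter, lora_B: nn.Parameter):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # only the LoRA matrices are trainable
        self.lora_A = lora_A  # shape (r, in_features)
        self.lora_B = lora_B  # shape (out_features, r)

    def forward(self, x):
        return self.base(x) + x @ self.lora_A.T @ self.lora_B.T

d, r, num_layers = 16, 4, 3

# One shared pair of LoRA matrices, created once (B is zero-initialized
# so the update starts as a no-op, as is standard for LoRA) ...
shared_A = nn.Parameter(torch.randn(r, d) * 0.01)
shared_B = nn.Parameter(torch.zeros(d, r))

# ... and reused by every layer: each layer holds a reference to the SAME tensors,
# which is what "sharing the first layer's parameters" amounts to.
layers = nn.ModuleList(
    LoRALinear(nn.Linear(d, d), shared_A, shared_B) for _ in range(num_layers)
)

# All layers point at one parameter object, so gradients accumulate into it.
assert layers[0].lora_A is layers[num_layers - 1].lora_A
```

By contrast, tying only *within* an attention layer would mean constructing a fresh `(lora_A, lora_B)` pair inside each layer and sharing it only among that layer's projections.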