Hi @ibeltagy,
I was planning to use Longformer as a backbone architecture for a domain other than NLP, training it from scratch on a different type of data. I am using the Hugging Face version of the model, which appears to have been created by you. However, I was wondering whether there is any concrete benefit to using this repository's implementation instead of the HF one.
The only relevant information I could find is in the HF documentation:
The self-attention module :obj:`LongformerSelfAttention` implemented here supports the combination of local and global attention but it lacks support for autoregressive attention and dilated attention. Autoregressive and dilated attention are more relevant for autoregressive language modeling than finetuning on downstream tasks. Future release will add support for autoregressive attention, but the support for dilated attention requires a custom CUDA kernel to be memory and compute efficient.
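For reference, below is a minimal sketch (not from the issue) of how the HF version can be trained from scratch on a non-text domain, assuming the standard `transformers` API with `LongformerConfig` and `LongformerModel`; the configuration values and sequence length are hypothetical placeholders. It illustrates the local + global attention mentioned in the documentation quote via the `global_attention_mask` argument.

```python
import torch
from transformers import LongformerConfig, LongformerModel

# Hypothetical configuration for a non-NLP token vocabulary; all sizes are
# illustrative, not recommendations.
config = LongformerConfig(
    vocab_size=1024,
    hidden_size=256,
    num_hidden_layers=4,
    num_attention_heads=4,
    attention_window=128,          # sliding local-attention window size
    max_position_embeddings=4098,  # sequence length + 2 (RoBERTa-style position offset)
)
model = LongformerModel(config)    # randomly initialized, no pretrained weights

input_ids = torch.randint(0, config.vocab_size, (1, 4096))
attention_mask = torch.ones_like(input_ids)

# Mark a few positions (e.g. a [CLS]-like token) as globally attending;
# all other positions use only the local sliding-window attention.
global_attention_mask = torch.zeros_like(input_ids)
global_attention_mask[:, 0] = 1

outputs = model(
    input_ids=input_ids,
    attention_mask=attention_mask,
    global_attention_mask=global_attention_mask,
)
print(outputs.last_hidden_state.shape)  # torch.Size([1, 4096, 256])
```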