Hi @ibeltagy,
I was planning to use Longformer as a backbone architecture for a domain other than NLP, training it from scratch on a different type of data. I am using the Hugging Face version of the model, which appears to have been created by you. However, I was wondering whether there is any concrete benefit to using this repository's implementation instead of the HF one.
The only relevant information I could find is in the HF documentation:
The self-attention module :obj:`LongformerSelfAttention` implemented here supports the combination of local and global attention but it lacks support for autoregressive attention and dilated attention. Autoregressive and dilated attention are more relevant for autoregressive language modeling than finetuning on downstream tasks. Future release will add support for autoregressive attention, but the support for dilated attention requires a custom CUDA kernel to be memory and compute efficient.
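For reference, below is a minimal sketch (not from the issue) of how the HF version can be trained from scratch on a non-text domain, assuming the standard `transformers` API with `LongformerConfig` and `LongformerModel`; the configuration values and sequence length are hypothetical placeholders. It illustrates the local + global attention mentioned in the documentation quote via the `global_attention_mask` argument.

```python
import torch
from transformers import LongformerConfig, LongformerModel

# Hypothetical configuration for a non-NLP token vocabulary; all sizes are
# illustrative, not recommendations.
config = LongformerConfig(
    vocab_size=1024,
    hidden_size=256,
    num_hidden_layers=4,
    num_attention_heads=4,
    attention_window=128,          # sliding local-attention window size
    max_position_embeddings=4098,  # sequence length + 2 (RoBERTa-style position offset)
)
model = LongformerModel(config)    # randomly initialized, no pretrained weights

input_ids = torch.randint(0, config.vocab_size, (1, 4096))
attention_mask = torch.ones_like(input_ids)

# Mark a few positions (e.g. a [CLS]-like token) as globally attending;
# all other positions use only the local sliding-window attention.
global_attention_mask = torch.zeros_like(input_ids)
global_attention_mask[:, 0] = 1

outputs = model(
    input_ids=input_ids,
    attention_mask=attention_mask,
    global_attention_mask=global_attention_mask,
)
print(outputs.last_hidden_state.shape)  # torch.Size([1, 4096, 256])
```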