Difference between this codebase and Huggingface? #210

Open
aleSuglia opened this issue Aug 24, 2021 · 0 comments

Comments

aleSuglia commented Aug 24, 2021

Hi @ibeltagy,

I am planning to use Longformer as the backbone architecture for a domain other than NLP, training it from scratch on a different type of data. I am currently using the Hugging Face version of the model, which appears to have been created by you. However, I was wondering whether there is any concrete benefit to using this codebase instead of the HF one?

The only relevant information I could find about this is in the HF documentation:

    The self-attention module :obj:`LongformerSelfAttention` implemented here supports the combination of local and
    global attention but it lacks support for autoregressive attention and dilated attention. Autoregressive and
    dilated attention are more relevant for autoregressive language modeling than finetuning on downstream tasks.
    Future release will add support for autoregressive attention, but the support for dilated attention requires a
    custom CUDA kernel to be memory and compute efficient.
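
For context, here is a minimal sketch of the setup the question refers to: initializing the Hugging Face Longformer from scratch (random weights) and running a forward pass with the combination of local and global attention mentioned in the docstring above. The config values are purely illustrative, not a recommendation for any particular domain.

```python
# Minimal sketch (illustrative config values): a randomly initialized
# Hugging Face Longformer with local sliding-window attention by default
# and global attention enabled on selected tokens.
import torch
from transformers import LongformerConfig, LongformerModel

# Small illustrative config; real vocab size and dimensions depend on the data.
config = LongformerConfig(
    vocab_size=1000,
    hidden_size=256,
    num_hidden_layers=4,
    num_attention_heads=4,
    intermediate_size=1024,
    attention_window=64,           # size of the local (sliding-window) attention
    max_position_embeddings=1026,  # Longformer reserves two extra positions
)
model = LongformerModel(config)  # random weights, i.e. training "from scratch"

input_ids = torch.randint(0, config.vocab_size, (1, 512))
attention_mask = torch.ones_like(input_ids)

# 0 = local attention (default), 1 = global attention.
# Here only the first token attends globally, as in a CLS-style setup.
global_attention_mask = torch.zeros_like(input_ids)
global_attention_mask[:, 0] = 1

outputs = model(
    input_ids=input_ids,
    attention_mask=attention_mask,
    global_attention_mask=global_attention_mask,
)
print(outputs.last_hidden_state.shape)  # torch.Size([1, 512, 256])
```

As the quoted docstring notes, this HF module covers local + global attention only; autoregressive and dilated attention are not available there.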