Adding support for other huggingface models #6

Closed
agoel00 opened this issue Oct 12, 2021 · 2 comments

Comments

@agoel00
Contributor

agoel00 commented Oct 12, 2021

In the example listed here, is it possible to use other models from the huggingface models hub to generate lexical substitutes? I am happy to contribute more models to the repo once I understand the pipeline of adding new models.

Also, which approach from the paper does this example correspond to? Is it the XLNet+embs approach listed in bold in this table from the paper?
[Screenshot: results table from the paper]

@nvanva
Contributor

nvanva commented Oct 14, 2021

If you want to use the models compared in the paper, take a look at the list of configs at https://github.com/Samsung/LexSubGen/tree/main/configs/subst_generators/lexsub and specify one of them instead of "xlnet_embs.jsonnet" at the beginning of the notebook.
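As a small illustration of swapping the config name, here is a sketch that builds the repo-relative path of a config from its name. The directory comes from the URL above; the helper `lexsub_config_path` is a hypothetical name introduced here and is not part of LexSubGen. How the notebook actually loads the file is not shown.

```python
from pathlib import PurePosixPath

# Directory taken from the URL above, relative to the repository root.
LEXSUB_CONFIG_DIR = PurePosixPath("configs/subst_generators/lexsub")


def lexsub_config_path(config_name: str) -> str:
    """Hypothetical helper: return the repo-relative path of a config file."""
    if not config_name.endswith(".jsonnet"):
        config_name += ".jsonnet"
    return str(LEXSUB_CONFIG_DIR / config_name)


# The notebook default and the simple BERT config mentioned in this thread.
print(lexsub_config_path("xlnet_embs"))
print(lexsub_config_path("bert.jsonnet"))
```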

If you want to use huggingface models not tested in our paper, this may not be as trivial as it seems, because the approach to generating substitutes will likely need to be modified to some degree for each model individually. At the least, different pre-processing and post-processing may be required, and the optimal hyperparameters will likely be significantly different, as we saw in our experiments when the architecturally very similar BERT and RoBERTa were compared.

Technically, you can start by copy-pasting one of the existing configs (this simple one, for instance: https://github.com/Samsung/LexSubGen/blob/main/configs/subst_generators/lexsub/bert.jsonnet) and then replace the pipeline steps with your new steps. For each step you can either use an existing config or create a new one. If you want a new model, you will also need to write a new probability estimator (start by copy-pasting this one: https://github.com/Samsung/LexSubGen/blob/main/lexsubgen/prob_estimators/bert_estimator.py) and the corresponding config (https://github.com/Samsung/LexSubGen/blob/main/configs/prob_estimators/lexsub/bert.jsonnet).
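To make the division of labour concrete, here is a minimal, self-contained sketch of the three pieces mentioned above: model-specific pre-processing, a probability estimator, and post-processing. It is a toy under stated assumptions, not LexSubGen's actual API: the names `ToyProbEstimator`, `preprocess`, `get_log_probs` and `top_k_substitutes` are hypothetical, and the scoring is a placeholder where a real estimator would call the language model.

```python
import math
from typing import Dict, List


class ToyProbEstimator:
    """Hypothetical stand-in for a model-specific probability estimator."""

    def __init__(self, vocab: List[str], mask_token: str = "[MASK]"):
        self.vocab = vocab
        self.mask_token = mask_token  # differs between models, e.g. BERT vs. RoBERTa

    def preprocess(self, tokens: List[str], target_id: int) -> List[str]:
        # Model-specific pre-processing: here the target is simply masked.
        return tokens[:target_id] + [self.mask_token] + tokens[target_id + 1:]

    def _score(self, masked_tokens: List[str], target: str) -> Dict[str, float]:
        # Placeholder scoring by shared characters with the target.
        # A real estimator would run the language model on masked_tokens instead.
        return {w: float(len(set(w) & set(target))) for w in self.vocab}

    def get_log_probs(self, tokens: List[str], target_id: int) -> Dict[str, float]:
        scores = self._score(self.preprocess(tokens, target_id), tokens[target_id])
        log_z = math.log(sum(math.exp(s) for s in scores.values()))
        return {w: s - log_z for w, s in scores.items()}  # log-softmax over the vocabulary


def top_k_substitutes(log_probs: Dict[str, float], target: str, k: int) -> List[str]:
    # Post-processing: drop the target word itself, then keep the k best candidates.
    candidates = {w: lp for w, lp in log_probs.items() if w != target}
    return sorted(candidates, key=candidates.get, reverse=True)[:k]


tokens = ["the", "bright", "lamp"]
estimator = ToyProbEstimator(vocab=["bright", "light", "dark", "right"])
log_probs = estimator.get_log_probs(tokens, target_id=1)
print(top_k_substitutes(log_probs, target="bright", k=2))
```

In this sketch, supporting a new model would mean changing `preprocess` and `_score`, while the post-processing and its hyperparameters (here just `k`) would need retuning separately.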

I think the notebook uses our best model, XLNet+embs, since the config "xlnet_embs.jsonnet" is used.

@agoel00
Contributor Author

agoel00 commented Nov 27, 2021

This sounds super helpful. I will try to incorporate more models and get back with the results! Thanks a lot.

@agoel00 agoel00 closed this as completed Nov 27, 2021