
Progress bar for sBert #1436

Closed
Robdboer opened this issue May 11, 2023 · 11 comments · Fixed by #1439
@Robdboer
Contributor

During a conversation with @jteijema we concluded that it would be a good idea to include a progress bar for the sBert feature extractor.

@PeterLombaers
Member

This is already implemented in sentence_transformers, so it's just a matter of adjusting the SBert feature extractor in asreview to use that implementation:

https://www.sbert.net/docs/package_reference/SentenceTransformer.html#sentence_transformers.SentenceTransformer.encode
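The `encode` method linked above accepts a `show_progress_bar` flag, so the change may be as small as forwarding that flag. A minimal sketch with a stub standing in for the real `SentenceTransformer` (the 384 embedding dimension is arbitrary):

```python
import numpy as np

class StubSentenceTransformer:
    """Stand-in for sentence_transformers.SentenceTransformer in this sketch."""
    def encode(self, texts, show_progress_bar=False):
        # The real encode() renders a tqdm progress bar when the flag is True.
        return np.zeros((len(texts), 384))

def transform(model, texts):
    # The only change needed in the extractor: forward the flag to encode().
    return np.array(model.encode(texts, show_progress_bar=True))

X = transform(StubSentenceTransformer(), ["a", "b", "c"])
print(X.shape)  # (3, 384)
```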

@jteijema
Member

I think sbert can be cleaner in more ways than one. I'd be willing to pick this up if @Robdboer does not want to.

@J535D165
Member

I'm interested in a PR, as I'm planning to merge PR #1427. It would help to have an indicator for long-running feature extractors; otherwise it looks like nothing is happening.

@Robdboer
Contributor Author

I will start looking into adding a progress bar today!

@rohitgarud
Contributor

rohitgarud commented May 16, 2023

I have one more suggestion for updating SBERT: we need to add a way to use models other than pre-trained SBERT models. I would like to work on it, either in a different branch or in the same one as the progress bar.

@Robdboer
Contributor Author

@rohitgarud I was also thinking of creating a general sentence-transformer feature extractor, where one could choose a pretrained model out of a list at the start of a simulation. What are your thoughts on this?

@PeterLombaers
Member

I would suggest that any of these changes would go in a different pull request than the one currently open for the progress bar.

@rohitgarud What models do you have in mind that you want to load via sentence_transformers but are not pretrained sentence transformer models? There are of course a lot of models on huggingface, for example, but sentence_transformers is already able to use models from there that are compatible with the package.

@Robdboer Do you mean in the ASReview LAB frontend or via the CLI? Via the CLI you can already choose any pretrained model, though it is a bit of a hassle to get it to work. The SBert feature extractor accepts a transformer_model argument to load a pretrained model.
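A toy sketch of the plumbing described above; the class name and default model are hypothetical, only the `transformer_model` argument name comes from this thread:

```python
class SBertFeatureExtractor:
    """Hypothetical stand-in for asreview's SBert feature extractor."""
    def __init__(self, transformer_model="all-mpnet-base-v2"):
        # Any pretrained sentence-transformers model name can be passed here,
        # e.g. a multilingual model instead of the English default.
        self.transformer_model = transformer_model

extractor = SBertFeatureExtractor(
    transformer_model="paraphrase-multilingual-MiniLM-L12-v2"
)
print(extractor.transformer_model)  # paraphrase-multilingual-MiniLM-L12-v2
```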

@Robdboer
Contributor Author

@PeterLombaers Wow, I did not realize that. I see it in the code now. Since I will be testing different multilingual feature extractors for my thesis, this could be a very useful feature. If I find any improvements I will create a new pull request. Thanks!

@rohitgarud
Contributor

@PeterLombaers There are a few models, such as SciBERT, BioBERT, or Longformer, that do not have the pooling layer at the end to generate sentence embeddings, since they are not SBERT models.
I will create a PR for this; please review and let me know if it is a good feature to add to SBERT.
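For context, the pooling layer these models lack is typically just a masked mean over token embeddings (sentence_transformers can compose it onto a plain transformer via `models.Transformer` plus `models.Pooling`). A minimal numpy sketch of mean pooling:

```python
import numpy as np

def mean_pooling(token_embeddings, attention_mask):
    # token_embeddings: (num_tokens, dim); attention_mask: (num_tokens,) of 0/1.
    # Average only over real tokens, ignoring padding positions.
    mask = attention_mask[:, None].astype(float)
    return (token_embeddings * mask).sum(axis=0) / mask.sum()

emb = np.array([[1.0, 2.0], [3.0, 4.0], [0.0, 0.0]])  # last row is padding
mask = np.array([1, 1, 0])
print(mean_pooling(emb, mask))  # [2. 3.]
```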

@jteijema
Member

@rohitgarud there's a very basic reusable sbert model I've been working on, on my GitHub. It's not currently usable for applications other than mine, though.

Moreover, sbert currently casts the embeddings to an array. I don't think this is needed anymore; we could just return model.encode(texts).

def transform(self, texts):
    _check_st()

    model = SentenceTransformer(self.transformer_model)
    X = np.array(model.encode(texts))
    return X

@rohitgarud
Contributor

@jteijema The reusable sbert model is indeed very helpful, and I think something like this could be added to Makita itself: if the feature extractor parameters are unchanged, reuse the previously computed features for the next simulation on the same dataset.
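One way such feature caching could look (a sketch, not Makita's implementation; the key derivation and file layout are assumptions):

```python
import hashlib
import json
import os
import pickle
import tempfile

CACHE_DIR = tempfile.mkdtemp()

def cache_key(extractor_name, params, dataset_id):
    # Identical extractor + parameters + dataset -> identical key.
    payload = json.dumps([extractor_name, params, dataset_id], sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()

def get_or_compute(extractor_name, params, dataset_id, compute):
    path = os.path.join(CACHE_DIR, cache_key(extractor_name, params, dataset_id) + ".pkl")
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)  # cache hit: skip the expensive extraction
    features = compute()
    with open(path, "wb") as f:
        pickle.dump(features, f)
    return features

calls = []
compute = lambda: calls.append(1) or [[0.1, 0.2]]
a = get_or_compute("sbert", {"transformer_model": "x"}, "data.csv", compute)
b = get_or_compute("sbert", {"transformer_model": "x"}, "data.csv", compute)
print(len(calls))  # 1  (second call hit the cache)
```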

@J535D165 J535D165 linked a pull request May 17, 2023 that will close this issue
5 participants