Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make sklearn embedding backend auto-select more cautious #1984

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

freddyheppell
Copy link

Fixes #1980

  • Checks which module the error ocurred in before selecting an sklearn backend by default. If it was sentence_transformers, select skelarn as it's probably a minimal install. If any other module, re-raise as this is abnormal.
  • Logs an INFO message when the sklearn backend is selected, if verbose is enabled
  • Updates the comments on select_backend as I think they were quite out of date

I ended up instantiating a new logger for this as it felt better to pass verbose than the whole logger.

@freddyheppell freddyheppell changed the title Make sklearn embedding backend more cautious Make sklearn embedding backend auto-select more cautious May 10, 2024
Copy link
Owner

@MaartenGr MaartenGr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR! I had a few questions/suggestions but other than that it looks good. There is currently an issue with the pipeline as a result of the scikit-learn v1.5 upgrade. To get this pipeline working for you, I would advise implementing the same fix as #2008. That way, we can run the pipeline and see if everything works as intended.

Comment on lines +143 to +144
if verbose:
logger.info("Automatically selecting lightweight scikit-learn embedding backend as sentence-transformers appears to not be installed.")
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The if statement is not needed since you already set the logging level right?

from sklearn.pipeline import make_pipeline
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline as ScikitPipeline

logger = MyLogger("WARNING")
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might create duplicates of the logger but I am not entirely sure. I believe this might be related: #1894 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Warn when automatically choosing SklearnEmbedder backend
2 participants