Make sklearn embedding backend auto-select more cautious #1984

freddyheppell · 2024-05-10T12:08:59Z

Fixes #1980

Checks which module the error occurred in before selecting an sklearn backend by default. If it was sentence_transformers, select sklearn, as it's probably a minimal install. If it was any other module, re-raise, as this is abnormal.
Logs an INFO message when the sklearn backend is selected, if verbose is enabled
Updates the comments on select_backend as I think they were quite out of date
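
For illustration, the check described above could look roughly like the sketch below. This is not the code from this PR: `select_default_backend`, `load_sentence_transformers` and `make_sklearn_backend` are hypothetical names, and the sketch assumes the failure surfaces as a `ModuleNotFoundError` whose `name` attribute identifies the missing module.

```python
import logging

logger = logging.getLogger("backend_selection_sketch")


def select_default_backend(load_sentence_transformers, make_sklearn_backend, verbose=False):
    """Hypothetical helper: fall back to sklearn only if sentence_transformers is missing.

    Both arguments are zero-argument callables (assumptions for this sketch):
    one tries to build the sentence-transformers backend, the other builds
    the lightweight scikit-learn fallback.
    """
    try:
        return load_sentence_transformers()
    except ModuleNotFoundError as err:
        # ImportError.name holds the name of the module that could not be found.
        if err.name == "sentence_transformers":
            if verbose:
                logger.info(
                    "Automatically selecting lightweight scikit-learn embedding "
                    "backend as sentence-transformers appears to not be installed."
                )
            return make_sklearn_backend()
        # Any other missing module is unexpected, so surface the original error.
        raise
```

Passing the loaders in as callables keeps the sketch independent of which packages are installed. A fuller version might also need to handle a missing submodule of sentence_transformers, where `err.name` would carry a dotted name.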

I ended up instantiating a new logger for this as it felt better to pass verbose than the whole logger.

MaartenGr

Thanks for the PR! I had a few questions/suggestions but other than that it looks good. There is currently an issue with the pipeline as a result of the scikit-learn v1.5 upgrade. To get this pipeline working for you, I would advise implementing the same fix as #2008. That way, we can run the pipeline and see if everything works as intended.

MaartenGr · 2024-05-23T13:28:26Z

bertopic/backend/_utils.py

+            if verbose:
+                logger.info("Automatically selecting lightweight scikit-learn embedding backend as sentence-transformers appears to not be installed.")


The if statement is not needed since you already set the logging level, right?
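
To illustrate the point, here is a small sketch using only the standard library `logging` module rather than BERTopic's `MyLogger`, whose behaviour is not shown in this thread. It assumes the level is derived from `verbose` before logging; `announce_fallback` and `ListHandler` are hypothetical names.

```python
import logging


class ListHandler(logging.Handler):
    """Collects emitted messages in a list so the effect of the level is visible."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


logger = logging.getLogger("level_gating_sketch")
logger.propagate = False  # keep the sketch isolated from the root logger
handler = ListHandler()
logger.addHandler(handler)


def announce_fallback(verbose):
    # The level alone decides whether the INFO message is emitted,
    # so no explicit `if verbose:` guard is needed around the call.
    logger.setLevel(logging.INFO if verbose else logging.WARNING)
    logger.info("Selecting scikit-learn embedding backend.")


announce_fallback(verbose=False)  # dropped: INFO is below WARNING
announce_fallback(verbose=True)   # recorded: the level is now INFO
```

After both calls, `handler.messages` holds a single entry, the one logged with `verbose=True`.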

MaartenGr · 2024-05-23T13:29:21Z

bertopic/backend/_utils.py

 from sklearn.pipeline import make_pipeline
 from sklearn.decomposition import TruncatedSVD
 from sklearn.feature_extraction.text import TfidfVectorizer
 from sklearn.pipeline import Pipeline as ScikitPipeline

+logger = MyLogger("WARNING")


This might create duplicates of the logger but I am not entirely sure. I believe this might be related: #1894 (comment)
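
One way such duplication can happen, sketched with the standard library only. This assumes the concern is about handlers being attached more than once to the same named logger, which is not confirmed here; `build_logger` and `RecordingHandler` are hypothetical names.

```python
import logging

emitted = []


class RecordingHandler(logging.Handler):
    """Appends every handled message to the shared `emitted` list."""

    def emit(self, record):
        emitted.append(record.getMessage())


def build_logger(name, guard):
    # logging.getLogger returns the same object for the same name,
    # so handlers added on every call accumulate on that one logger.
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    if not guard or not logger.handlers:
        logger.addHandler(RecordingHandler())
    return logger


# Without a guard, building the logger twice attaches two handlers.
build_logger("dup_sketch_naive", guard=False)
naive = build_logger("dup_sketch_naive", guard=False)
naive.info("hello")
naive_count = len(emitted)

# With a guard, the second call reuses the existing handler.
emitted.clear()
build_logger("dup_sketch_guarded", guard=True)
guarded = build_logger("dup_sketch_guarded", guard=True)
guarded.info("hello")
guarded_count = len(emitted)
```

In the unguarded case a single `info` call is recorded twice, while the guarded logger records it once.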

freddyheppell added 2 commits May 10, 2024 13:03

check name of module error before selecting sklearn backend

82cfeea

rearrange imports

57d4a9f

freddyheppell changed the title ~~Make sklearn embedding backend more cautious~~ Make sklearn embedding backend auto-select more cautious May 10, 2024

MaartenGr reviewed May 23, 2024
