Utilizing clustering #64

Closed
rawsh opened this issue Jan 20, 2024 · 2 comments

rawsh commented Jan 20, 2024

Hi, sorry if this is an ignorant question, but would it be possible to use the calculated centroids for NLP tasks such as summarization? With dense embeddings, it's possible to cluster the dataset and use the documents closest to each cluster's centroid as a representation of that cluster for summarization. Would it be possible to do something similar with the centroids calculated by RAGatouille?
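For reference, the dense-embedding approach described above could look roughly like the following. This is a minimal sketch, not anything RAGatouille provides: it assumes one embedding vector per document, uses a hand-rolled k-means in NumPy, and the names `kmeans`, `representatives`, and `doc_embeddings` are hypothetical, with synthetic data standing in for real embeddings.

```python
import numpy as np

def kmeans(X, k, n_iter=20, seed=0):
    """Plain Lloyd's k-means on the rows of X; returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    # Initialise centroids from k distinct data points
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Squared distance from every point to every centroid
        d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members):  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    # Final assignment against the updated centroids
    d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return centroids, d.argmin(axis=1)

def representatives(X, centroids, labels):
    """Index of the document closest to each centroid, within its own cluster."""
    reps = []
    for j, c in enumerate(centroids):
        idx = np.flatnonzero(labels == j)
        if len(idx) == 0:
            continue  # skip empty clusters
        dist = ((X[idx] - c) ** 2).sum(axis=1)
        reps.append(int(idx[dist.argmin()]))
    return reps

# Synthetic stand-in for document embeddings (assumption): 60 docs, 8 dims
rng = np.random.default_rng(42)
doc_embeddings = np.vstack(
    [rng.normal(loc=m, scale=0.1, size=(20, 8)) for m in (0.0, 5.0, 10.0)]
)

centroids, labels = kmeans(doc_embeddings, k=3)
reps = representatives(doc_embeddings, centroids, labels)
print(reps)  # indices of documents that would be passed to a summarizer
```

The documents at the `reps` indices are the ones that would be handed to a summarizer as stand-ins for their clusters.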

okhat (Collaborator) commented Jan 20, 2024

Amazing question. Maybe the answer can become yes with some exploration!

But right now, it’s tricky. The centroids are for tokens, not whole documents.
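To illustrate why token-level centroids do not map directly onto documents, here is a small sketch. It does not use RAGatouille's or ColBERT's actual API: the centroids and token embeddings are random placeholders, and the shapes (4 centroids, 6 tokens, 16 dimensions) are arbitrary assumptions chosen for readability.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins: 4 token-level centroids and one 6-token document,
# all in a 16-dimensional space and L2-normalised
centroids = rng.normal(size=(4, 16))
centroids /= np.linalg.norm(centroids, axis=1, keepdims=True)
doc_tokens = rng.normal(size=(6, 16))
doc_tokens /= np.linalg.norm(doc_tokens, axis=1, keepdims=True)

# Each token is assigned to its most similar centroid (cosine similarity)
token_to_centroid = (doc_tokens @ centroids.T).argmax(axis=1)

# One centroid ID per token, so the document maps to a bag of centroid IDs
centroid_ids, counts = np.unique(token_to_centroid, return_counts=True)
print(dict(zip(centroid_ids.tolist(), counts.tolist())))
```

Because every token gets its own assignment, a single document is usually spread across several centroids, so there is no single "nearest document" to a centroid in the sense used with one-vector-per-document embeddings.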

rawsh (Author) commented Jan 22, 2024

@okhat Thank you very much for the insight! Will try to look into this at some point. Closing for now.

rawsh closed this as completed Jan 22, 2024