-
-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
clustering: User defined distances and keyers #6612
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That looks really great!
The next step would be to register the keyer and distance so that they can be used in the compute-clusters HTTP command. This registration is done in the controller.js
file (main/webapp/modules/core/MOD-INF/controller.js
).
One problem that you will encounter is how to pass the clustering expression to the keyer. You can have a look at how it's currently done for the ngram clusterer for instance: it's quite ad-hoc and not extensible. It's worth thinking about ways to improve that, or keep using the same strategy if we think it's still the best approach.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was it intentional that this branch was created in the organization's repository? I think it would be preferable to do the development on a private fork of the repo and submit the PR from there, but I don't know if @wetneb told you to do it this way for some reason.
It would also be useful to reference the issue that this addresses so that the PR and the issue are linked together.
@tfmorris no, it wasn't intentional. Sorry for that mistake. I will work on my private fork in the future PRs. |
The failure is likely due to the fact that I have just merged #6574, which means that you'll need to explicitly register GREL in your test (see that PR for examples) |
Thank you so much. You've saved me time. |
I thought about the idea of passing the clustering expression to the keyer like how it's currently done for the n-gram clusterer but this will need adapting to:
So I have decided to only check if there is an expression in the request. If yes, then pass it to the constructors of the Keyer or Distance. The changes are only made in I also tested passing expressions in the request manually and it works fine. Should we have a unit test for it ? |
I think the approach you used to configure the keyer / distance is fine. Our existing architecture does not make it possible to register configurable keyers or distances, so some workaround like you used is necessary. It would be good to clean this up in the future, but I would say this can be part of a separate refactoring, especially since it will impact other clustering methods (such as ngrams). |
I'm changing the title of this pull request to make its topic clearer (there is nothing in it that indicates that it's about clustering so far). |
The first pr in the GSoC project #4301
Changes in this pull request:
UserDefinedKeyer
class + unit tests for itUserDefinedDistance
class + unit tests for it