Extending Kazu #20

Open
raylite opened this issue Feb 2, 2024 · 8 comments

Comments

@raylite

raylite commented Feb 2, 2024

I am new to Kazu and quite fascinated by it, but I want to find out whether Kazu is flexible enough that a developer can bring an additional (or custom) ontology/knowledge base for a certain entity type, in addition to what's already in use, or even disable what's built into Kazu in order to use a different one?

@EFord36 EFord36 closed this as completed Feb 2, 2024
@EFord36 EFord36 reopened this Feb 2, 2024
@EFord36
Collaborator

EFord36 commented Feb 2, 2024

oops - just hit the 'comment and close issue button' by accident midway through writing a reply, sorry! real reply pending

@EFord36
Collaborator

EFord36 commented Feb 2, 2024

Yes, Kazu is very flexible - the downside is that it's so flexible, we haven't yet done a great job of documenting all that flexibility.

In order to bring an additional (custom) ontology/kb, you would need to build your own model pack. Doing this is something we have had on our backlog to document for a while, but don't have anything good yet unfortunately.

One note is that we are currently in the process of releasing a new version of Kazu, 2.0. This doesn't change much for a user of the default model pack, but it changes some of the details of providing config for 'Curating' knowledge bases, e.g. to filter out bad synonyms for NER. How urgently do you need this? If I wait until the new version is out sometime next week to give you a proper guide, would that be OK for you, or would you rather have something to get you started sooner, even if it means some rework if you want to upgrade to 2.0 later?

@EFord36
Collaborator

EFord36 commented Feb 2, 2024

Disabling some of the existing ontologies on its own can be done in a way that should be considerably simpler, with the downside that Kazu's string matching facilities will still have the disabled ontology 'baked in' (which takes up memory, but shouldn't affect compute much) unless the model pack is rebuilt. Is this something you're interested in, or are you mainly interested in adding additional ontologies, and therefore building a custom model pack?

@raylite
Author

raylite commented Feb 2, 2024

> Yes, Kazu is very flexible - the downside is that it's so flexible, we haven't yet done a great job of documenting all that flexibility.
>
> In order to bring an additional (custom) ontology/kb, you would need to build your own model pack. Doing this is something we have had on our backlog to document for a while, but don't have anything good yet unfortunately.
>
> One note is that we are currently in the process of releasing a new version of Kazu, 2.0. This doesn't change much for a user of the default model pack, but it changes some of the details of providing config for 'Curating' knowledge bases, e.g. to filter out bad synonyms for NER. How urgently do you need this? If I wait until the new version is out sometime next week to give you a proper guide, would that be OK for you, or would you rather have something to get you started sooner, even if it means some rework if you want to upgrade to 2.0 later?

Yes, I can wait until the new model is out, so that I am working with the latest version once and for all. Next week is fine for me. I work day-to-day in this domain and have built a similar tool for my org. I see common approaches, themes and packages like ahocorasick, but Kazu appears more mature and more robust in fuzzy matching, particularly when terms overlap. So I am thinking, why reinvent the wheel if I can build on and extend Kazu for my local needs?

@raylite
Author

raylite commented Feb 2, 2024

> Disabling some of the existing ontologies on its own can be done in a way that should be considerably simpler, with the downside that Kazu's string matching facilities will still have the disabled ontology 'baked in' (which takes up memory, but shouldn't affect compute much) unless the model pack is rebuilt. Is this something you're interested in, or are you mainly interested in adding additional ontologies, and therefore building a custom model pack?

At some point I may need to do both, but I will give priority to adding a custom KB.

@EFord36
Collaborator

EFord36 commented Feb 2, 2024

Sounds good - in which case I think waiting for the new model pack and release is best.

Incidentally, Kazu actually uses pyahocorasick "under the hood" for its exact string matching in the MemoryEfficientStringMatchingStep, and builds additional functionality on top of it.
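For background: pyahocorasick implements the Aho-Corasick algorithm, which finds every occurrence of a whole set of patterns (including overlapping ones such as "breast" inside "breast cancer") in a single pass over the text. The snippet below is a minimal pure-Python sketch of that algorithm for illustration only. It is not Kazu's or pyahocorasick's code, and the function names `build_automaton` and `find_all` are hypothetical.

```python
from collections import deque


def build_automaton(patterns):
    """Build goto, failure and output tables (hypothetical helper, not pyahocorasick's API)."""
    goto = [{}]    # goto[state][char] -> next state; state 0 is the root
    fail = [0]     # failure link for each state
    output = [[]]  # patterns recognised when reaching each state
    for pattern in patterns:
        state = 0
        for char in pattern:
            if char not in goto[state]:
                goto.append({})
                fail.append(0)
                output.append([])
                goto[state][char] = len(goto) - 1
            state = goto[state][char]
        output[state].append(pattern)
    # Breadth-first pass: a state's failure link points to the longest
    # proper suffix of its path that is also a path in the trie.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for char, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and char not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(char, 0)
            # Inherit matches that end at the failure state (overlapping terms).
            output[nxt] = output[nxt] + output[fail[nxt]]
    return goto, fail, output


def find_all(text, automaton):
    """Return (start, end_inclusive, pattern) for every match, in one pass."""
    goto, fail, output = automaton
    state = 0
    matches = []
    for i, char in enumerate(text):
        while state and char not in goto[state]:
            state = fail[state]
        state = goto[state].get(char, 0)
        for pattern in output[state]:
            matches.append((i - len(pattern) + 1, i, pattern))
    return matches


automaton = build_automaton(["breast", "breast cancer", "cancer"])
print(find_all("breast cancer risk", automaton))
# [(0, 5, 'breast'), (0, 12, 'breast cancer'), (7, 12, 'cancer')]
```

All three overlapping terms are reported, so deciding which match to keep is left to whatever logic sits on top of the matcher.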

@EFord36
Collaborator

EFord36 commented Feb 9, 2024

To keep you in the loop, it's taken me a little longer this week to progress the next release, but we're making good progress, should be sometime next week. Sorry for the delay!

@raylite
Author

raylite commented Feb 9, 2024 via email
