Add Linker Ensemble #53

Merged
marmg merged 25 commits into main from feature/ensembling on Mar 30, 2023

marmg (Collaborator) commented Mar 24, 2023

| Status | Type | ⚠️ Core Change | Issue |
|--------|------|----------------|-------|
| Ready | Feature | No | Link |

Problem

Add a linker ensemble so that several linkers, and several descriptions of the same entity, can be combined to improve performance.

Solution

This PR implements LinkerEnsemble, which takes as input the list of linkers to use, the strategy (one of: max, count) and the threshold (used when deciding which entities to keep).

LinkerEnsemble groups the entities by name, creates combinations of them, runs each linker on each combination, and finally merges the results.
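
As a rough illustration of the grouping step, the sketch below groups entities by name and builds every combination that takes one description per name. It is standalone Python and does not use zshot. The function name `entity_combinations`, the plain (name, description) tuples, and the reading of "combinations" as a Cartesian product over descriptions are assumptions made for illustration, not the implementation added in this PR.

```python
from collections import defaultdict
from itertools import product


def entity_combinations(entities):
    """Group (name, description) pairs by name, then return every
    combination that picks exactly one description per name.

    Hypothetical helper: this is one reading of the PR description,
    not zshot's actual code.
    """
    by_name = defaultdict(list)
    for name, description in entities:
        by_name[name].append(description)

    names = list(by_name)  # insertion order is preserved
    return [
        dict(zip(names, chosen))
        for chosen in product(*(by_name[n] for n in names))
    ]


entities = [
    ("fruits", "The sweet and fleshy product of a tree or other plant."),
    ("fruits", "Names of fruits such as banana, oranges"),
    ("vitamin", "A nutrient that the body needs in small amounts to function and stay healthy"),
    ("vitamin", "Vitamins are substances that our bodies need to develop and function normally"),
]

combos = entity_combinations(entities)
for combo in combos:
    print(combo)
```

With two descriptions for each of two names this yields four combinations. Under this reading, each linker would be run once per combination before the results are merged.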

Example:

import spacy
from zshot import PipelineConfig
from zshot.linker import LinkerSMXM, LinkerTARS
from zshot.linker.linker_ensemble import LinkerEnsemble
from zshot.utils.data_models import Entity
from zshot import displacy

nlp = spacy.blank("en")

config = PipelineConfig(
    entities=[
        Entity(name="fruits", description="The sweet and fleshy product of a tree or other plant."),
        Entity(name="fruits", description="Names of fruits such as banana, oranges"),
        Entity(name="vitamin", description="A nutrient that the body needs in small amounts to function "
                                           "and stay healthy"),
        Entity(name="vitamin", description="Vitamins are substances that our bodies need to develop and "
                                           "function normally")
    ],
    linker=LinkerEnsemble(
        linkers=[
            LinkerSMXM(),
            LinkerTARS(),
        ],
        threshold=0.25
    )
)

nlp.add_pipe("zshot", config=config, last=True)
# annotate a piece of text
doc = nlp('Apple or oranges have a lot of vitamin C.')

# Visualize the result
displacy.render(doc, style='ent')
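
The Solution names two strategies, max and count, and a threshold, but does not spell out how they combine predictions. One plausible reading is sketched below as standalone Python that does not use zshot: max keeps the label with the highest single score, while count keeps the most-voted label and scores it by its share of the votes. The helper `ensemble_span`, the (label, score) vote format, and the use of the vote fraction as the count score are all hypothetical.

```python
from collections import Counter


def ensemble_span(votes, strategy, threshold):
    """Combine (label, score) votes for a single span.

    Hypothetical sketch, not zshot's implementation.
    Returns (label, score), or None if the score is below threshold.
    """
    if not votes:
        return None

    if strategy == "max":
        # Keep the single most confident prediction.
        label, score = max(votes, key=lambda v: v[1])
    elif strategy == "count":
        # Keep the most frequent label; score it by its share of votes.
        counts = Counter(label for label, _ in votes)
        label, n_votes = counts.most_common(1)[0]
        score = n_votes / len(votes)
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")

    return (label, score) if score >= threshold else None


# Three voters (e.g. linker/description pairs) disagree on one span.
votes = [("fruits", 0.9), ("vitamin", 0.4), ("vitamin", 0.3)]

by_max = ensemble_span(votes, "max", threshold=0.25)
by_count = ensemble_span(votes, "count", threshold=0.25)
print(by_max)
print(by_count)
```

Here the two strategies disagree: max follows the single confident "fruits" vote, while count follows the two weaker "vitamin" votes. That difference is the kind of trade-off the strategy parameter would control under this reading.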

marmg self-assigned this Mar 24, 2023
marmg force-pushed the feature/ensembling branch 2 times, most recently from c1c14fc to c3a9b50 on March 24, 2023 13:59
codecov bot commented Mar 24, 2023

Codecov Report

Patch coverage: 92.21% and project coverage change: -0.53 ⚠️

Comparison is base (0fc473c) 93.04% compared to head (55e151c) 92.51%.

❗ Current head 55e151c differs from pull request most recent head 1745b5b. Consider uploading reports for the commit 1745b5b to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##             main      #53      +/-   ##
==========================================
- Coverage   93.04%   92.51%   -0.53%     
==========================================
  Files          67       73       +6     
  Lines        2832     3047     +215     
==========================================
+ Hits         2635     2819     +184     
- Misses        197      228      +31     
| Impacted Files | Coverage Δ | |
|---|---|---|
| zshot/linker/linker_regen/utils.py | 60.52% <ø> (-17.53%) | ⬇️ |
| zshot/linker/linker_ensemble/utils.py | 63.33% <63.33%> (ø) | |
| zshot/linker/linker_ensemble/linker_ensemble.py | 87.50% <87.50%> (ø) | |
| zshot/utils/ensembler.py | 98.33% <98.33%> (ø) | |
| zshot/linker/__init__.py | 100.00% <100.00%> (ø) | |
| zshot/linker/linker_ensemble/__init__.py | 100.00% <100.00%> (ø) | |
| zshot/linker/linker_tars.py | 97.87% <100.00%> (+4.25%) | ⬆️ |
| zshot/tests/linker/test_ensemble_linker.py | 100.00% <100.00%> (ø) | |
| zshot/tests/linker/test_linker.py | 96.92% <100.00%> (ø) | |
| zshot/tests/linker/test_regen_linker.py | 91.78% <100.00%> (-8.22%) | ⬇️ |
... and 5 more


* ✨ Add regen wikification

* 🐛 Fix Wikification

* 🐛 Reduce tests complexity

* 🐛 Reduce test resources

* 🐛 Fix test

* ➖ Remove test file

* ✏️ Remove too expensive test

Signed-off-by: Marcos Martinez <Marcos.Martinez.Galindo@ibm.com>
GabrielePicco and others added 22 commits March 30, 2023 14:32
…ed evaluation documentation. (#51)

… up tests

marmg merged commit e5e0c7f into main on Mar 30, 2023
marmg deleted the feature/ensembling branch on March 30, 2023 14:18