Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CU-86947ja9y dill old weights #411

Merged
merged 7 commits into from
Apr 9, 2024
Merged

Conversation

mart-r
Copy link
Collaborator

@mart-r mart-r commented Apr 4, 2024

Related to #410

Tried on local (MacOS) with python3.10 - works fine
Tried on a fresh Ubuntu with python3.10 - works fine
Tried on Ubuntu with python3.11 - verified the issue

Checked the depdenency versions on Ubuntu - they were near identical (only differences being async-timeout==4.0.3 and exceptiongroup==1.2.0 additions in 3.10).

It seems to have to do with something with pickling the partial function. Perhaps the performance enhancements in 3.11 broke something?
I've verified that the config.linking.weighted_average_function value is a functools.partial and it has a keyword argument factor=0.0004 in python3.11.

So what this PR does:

  • Adds a test that failed in python 3.11 (and only in 3.11) for the previous state
    • Includes a known-broken (otherwise blank) CDB in old format (which includes the config)
    • You can check out the PR on my fork for details:
  • Identifies and fixes the issue when setting value to config.linking.weighted_average_function
    • During loading, a default config is originally created
    • And then the data off disk is merged
    • So this means that the issue doesn't arise in the constructor / __init__
  • If a method that cannot be called is found, attempts to fix it
    • Though is only able to do this if functools.partial with args and/ir kwargs to medcat.config.weighted_average is used
    • Logs that there was a change made
  • If the method does not work but is not a partial of medcat.config.weighted_average the fix fails
    • This is logged as well and a workaround is suggested

This fix is only really able to remedy the default medcat.config.weighted_average method based values. Otherwise it's not straight forward to figure out what the underlying method may be.

With that said, all this PR does is apply a patch for the issue at hand. I still don't fully understand why this happens. Though I did verify that this seems to be a broader issue of saving using dill to dump functools.partial methods in older python (up to and including 3.10) and subsequently loading them in newer python (3.11+). I also verified that the issue does not seem to exist if you do the same within the same python version (be it 3.10 or 3.11).

@tomolopolis
Copy link
Member

…default weighted average function (to avoid circular imports)
Copy link
Member

@tomolopolis tomolopolis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@mart-r mart-r merged commit 9c69aa9 into master Apr 9, 2024
5 checks passed
@mart-r mart-r deleted the CU-86947ja9y-dill-old-weights branch August 12, 2024 12:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants