Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

UDF.get_method() used to choose implementation in UDFPartRunner._run_tile #1509

Merged
merged 22 commits into from Sep 14, 2023

Conversation

matbryan52
Copy link
Member

@matbryan52 matbryan52 commented Sep 11, 2023

Closes #1508

  • Shifts the string identifiers "frame", "tile", "partition" onto libertem.common.udf.UDFMethodEnum
  • Default UDF.get_method() performs the runtime protocol check previously in UDFPartRunner._run_tile and returns an enum member that UDFPartRunner matches against
  • Additional checking added to verify UDF.get_method() return is valid (both is an enum member, and actually implements the desired Protocol)
  • Additional test cases for invalid UDF.get_method() implementations

Notes:

  • Returning an enum member from get_method is technically a breaking change, because UDFMethodEnum.FRAME == 'frame' is False, if we want to be completely backwards compatible we would need to use a StrEnum (either from the PyPi package or as a backport from 3.11). I think the API is/was not public though so no real requirement to do this!
  • I had to add explicit assert isinstance(udf, UDFFrameMixin) inside _run_tile for each method in order to make mypy happy, this is just additional runtime overhead!
  • The current implementation (nor the previous implementation) doesn't store the processing mode that the UDF wants to use, which adds a little bit of overhead, but also more importantly doesn't guarantee the processing mode can't change in the time between tiling scheme negotiation and actually processing tiles. On the one hand this would be a 'strange' thing for a UDF to do, but could be a hard to track down bug.
  • I haven't added any documentation yet, would have to decide if this should be public facing?

Contributor Checklist:

Reviewer Checklist:

  • /azp run libertem.libertem-data passed
  • No import of GPL code from MIT code

@codecov
Copy link

codecov bot commented Sep 11, 2023

Codecov Report

Patch coverage: 100.00% and project coverage change: -0.37% ⚠️

Comparison is base (d839573) 67.65% compared to head (aaa5eca) 67.28%.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1509      +/-   ##
==========================================
- Coverage   67.65%   67.28%   -0.37%     
==========================================
  Files         317      317              
  Lines       26233    26254      +21     
  Branches     2993     2998       +5     
==========================================
- Hits        17748    17666      -82     
- Misses       7940     8040     +100     
- Partials      545      548       +3     
Files Changed Coverage Δ
src/libertem/common/udf.py 100.00% <100.00%> (ø)
src/libertem/io/dataset/base/tiling_scheme.py 95.07% <100.00%> (+0.54%) ⬆️
src/libertem/udf/base.py 93.67% <100.00%> (+0.20%) ⬆️

... and 19 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@matbryan52
Copy link
Member Author

/azp run libertem.libertem-data

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@LiberTEM LiberTEM deleted a comment from azure-pipelines bot Sep 11, 2023
Copy link
Member

@sk1p sk1p left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the API is/was not public though so no real requirement to do this!

Indeed, I'm fine with changing the type, no need for StrEnum.

I had to add explicit assert isinstance(udf, UDFFrameMixin) inside _run_tile for each method in order to make mypy happy, this is just additional runtime overhead!

See the inline comment - the assert is the safest way, but as this is very close to the inner loop, where overheads do matter quite a bit, maybe a cast is justified here?

I haven't added any documentation yet, would have to decide if this should be public facing?

Indeed. If this is needed for #1304 or similar UDFs, it has to become a public interface - otherwise we have a two-class system (like Apple with some of their system APIs, which can't be used by third-party apps...) and it becomes hard to reason what is actually part of the public API.

I guess this needs a small paragraph in the advanced UDF docs, and maybe some changes in the current "precedence" docs?

src/libertem/io/dataset/base/tiling_scheme.py Outdated Show resolved Hide resolved
src/libertem/udf/base.py Outdated Show resolved Hide resolved
src/libertem/udf/base.py Outdated Show resolved Hide resolved
src/libertem/udf/base.py Outdated Show resolved Hide resolved
src/libertem/common/udf.py Outdated Show resolved Hide resolved
@sk1p sk1p added the enhancement New feature or request label Sep 12, 2023
@matbryan52
Copy link
Member Author

/azp run libertem.libertem-data

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

tests/udf/test_simple_udf.py Show resolved Hide resolved
docs/source/reference/udf.rst Show resolved Hide resolved
@matbryan52
Copy link
Member Author

/azp run libertem.libertem-data

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@sk1p
Copy link
Member

sk1p commented Sep 14, 2023

/azp run libertem.libertem-data

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@sk1p
Copy link
Member

sk1p commented Sep 14, 2023

CI failure is unrelated, I'm re-triggering failed jobs now

Copy link
Member

@sk1p sk1p left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! CI is still failing, somewhere in pyxempyFAInumexpr. As far as I can see, a new numexpr release should come soon (they are aware of the breakage), so I'm inclined to just go ahead and merge now.

@sk1p sk1p merged commit f954d2e into LiberTEM:master Sep 14, 2023
28 of 32 checks passed
@matbryan52
Copy link
Member Author

Looks good! CI is still failing, somewhere in pyxempyFAInumexpr. As far as I can see, a new numexpr release should come soon (they are aware of the breakage), so I'm inclined to just go ahead and merge now.

I did also see some logs like "No space left on device" during tests on failed CI runs ?

Thanks for all the reviews and feedback!

@sk1p
Copy link
Member

sk1p commented Sep 14, 2023

Looks good! CI is still failing, somewhere in pyxempyFAInumexpr. As far as I can see, a new numexpr release should come soon (they are aware of the breakage), so I'm inclined to just go ahead and merge now.

I did also see some logs like "No space left on device" during tests on failed CI runs ?

Yeah, that's a curious issue with our ci setup - after some time, /dev/shm in some containers runs out of space - didn't manage to find time to track down the reason for that, and re-creating the containers is a quick workaround...

Thanks for all the reviews and feedback!

You're welcome!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

UDFs with multiple implementations / UDF.get_method() discrepancy
2 participants