PIMO #1726
Conversation
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

from __future__ import annotations
Can we avoid this? Last I checked
__future__.annotations
does not behave well with jsonargparse
I thought it was necessary for annotations like range: tuple[float, float], is it not?
So (I think) I re-figured out why I used this.
In the other file, the class AUPIMOResult has annotations using the class itself (e.g. from_pimoresult(...) -> AUPIMOResult:), and apparently the linter didn't like it, but adding __future__.annotations solves it.
Now, I don't know if this is a good justification. What do you prefer to do about this?
Our minimum python version is 3.10. I think we can safely omit __future__.annotations
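For context, a minimal sketch of the alternative on Python 3.10+: PEP 585/604 syntax like tuple[float, float] works natively, and a self-referencing return type can be written as a string literal instead of importing __future__.annotations. The class and method below are illustrative, not the actual AUPIMOResult API:

```python
from dataclasses import dataclass


@dataclass
class AUPIMOResult:
    auc: float
    fpr_bounds: tuple[float, float]  # works natively on Python 3.10+, no __future__ needed

    @classmethod
    def from_auc(cls, auc: float) -> "AUPIMOResult":
        # The quoted name defers evaluation, so the self-reference is fine.
        return cls(auc=auc, fpr_bounds=(1e-5, 1e-4))
```

The string-literal form also sidesteps the jsonargparse concern raised above, since no module-wide annotation behavior changes.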
@@ -0,0 +1,119 @@
"""Binary classification matrix curve (NUMBA implementation of low level functions).
What's the convention behind naming files starting with _?
It's similar to the _-prefixed attributes inside a module: a function _func in module.py is supposed to be used within the module scope, right? Similarly, _validate is private to per_image/, so it can be used across other modules inside the per_image subpackage, but not outside of it.
raise ValueError(msg)


def file_path(file_path: str | Path, must_exist: bool, extension: str | None, pathlib_ok: bool) -> None:
Maybe we can move this method to data.utils.
I am trying to avoid moving things out of per_image for the sake of synchronizing the standalone repo more easily. Could we eventually put this in a list of "refactors for later"?
Yeah, I understand that it is already a large PR, but it might be a good idea to sync with src/anomalib/data/utils/path.py. We already have utils that perform similar checks.
raise ValueError(msg)


def file_paths(file_paths: list[str | Path], must_exist: bool, extension: str | None, pathlib_ok: bool) -> None:
Same here, and maybe we can rename it to validate_path_are_files. A bit verbose, but it improves readability IMO.
The idea of the _validate module is to have short names, because the "validate" part is common to all the methods; the functions are always called as _validate.file_paths(), which makes the "validate" show up. Does that sound logical?
Yeah, I get what you mean, but I feel a lot of the methods defined here might be useful across the repo as well, like same_shape and is_tensor. But it might be too much work and too many changed files. We can address it in another PR, but it would be nice to add a TODO at the top of the file and create an issue for it so that we don't forget to revisit it later.
@dataclass
class BinclfAlgorithm:
What's the advantage of using this over enums?
I just wanted to avoid enums for the sake of simplicity, yet I wanted to keep these constants together for consistency.
Should I switch to an enum, or is this OK?
At the risk of sounding dogmatic, I am in favour of switching to enums. My mental model is that dataclasses are for storing data and enums are for switching, and from what I see, this one is used for switching. Also, it is easy to validate an enum value by just wrapping the passed value with the enum type.
Take line 196 for example:
algorithm: str = BinclfAlgorithm.NUMBA
I understand the objective behind this, but I would rather go with:
class BinclfAlgorithm(Enum):
    ...

...
    algorithm: BinclfAlgorithm | str = BinclfAlgorithm.NUMBA,
) -> ndarray:
    algorithm = BinclfAlgorithm(algorithm)
This has the same desired effect as the validate call.
    "The lower bound of the shared FPR integration range is not exactly achieved. "
    f"Expected {fpr_lower_bound} but got {fpr_lower_bound_defacto}, which is not within {rtol=}."
)
warnings.warn(msg, RuntimeWarning, stacklevel=1)
What's the advantage of using both logger.warning and warnings.warn? What happens if we want to pipe the warnings to a file and keep the console empty? A file handler can be passed to the logging module in such a scenario.
TBH, I didn't know what the right policy was here.
I put both because I was sure you'd find it and tell me what to do. So should I use the logger?
On a side note: warnings.warn allows one to turn this into an exception with warnings.filterwarnings("error"); maybe that could be interesting?
For context: this warning is due to having too few points to integrate the AUC curve, which may result in imprecise AUC values.
Let's stick to logger.warning
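The file-piping scenario mentioned above can be sketched like this; the logger name and file name are illustrative, not anomalib's actual logging setup:

```python
import logging

# Hypothetical logger name for the per-image metrics module.
logger = logging.getLogger("per_image_metrics_demo")
logger.setLevel(logging.WARNING)
logger.addHandler(logging.FileHandler("metrics_warnings.log", mode="w"))
logger.propagate = False  # nothing reaches the root logger, so the console stays empty

logger.warning("AUC integration used too few points; the value may be imprecise.")
```

With logging, the destination is a deployment decision (handlers), whereas warnings.warn is a library-level signal; sticking to one mechanism avoids double-reporting.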
normalization_factor = aupimo_normalizing_factor(fpr_bounds)
aucs = (aucs / normalization_factor).clip(0, 1)

return threshs, shared_fpr, per_image_tprs, image_classes, aucs, num_points_integral
Do you think we should capture these in a dataclass?
There is a dataclass for the torch interface but not for the numpy interface; I couldn't figure out a nice, maintainable way to make the two versions.
Perhaps we can discuss this in a call? I'll explain better, and maybe you'll know a good solution.
Let's leave it as is for now. We are planning on refactoring both the inferencers.
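For whenever this is revisited, a hedged sketch of what a numpy-side container could look like; the class name and fields mirror the returned tuple above but are purely illustrative:

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class PIMOResultArrays:
    """Hypothetical container for the numpy-interface return values (name is illustrative)."""

    threshs: np.ndarray
    shared_fpr: np.ndarray
    per_image_tprs: np.ndarray
    image_classes: np.ndarray
    aucs: np.ndarray
    num_points_integral: int


result = PIMOResultArrays(
    threshs=np.linspace(0.0, 1.0, 5),
    shared_fpr=np.linspace(1.0, 0.0, 5),
    per_image_tprs=np.ones((2, 5)),
    image_classes=np.array([0, 1]),
    aucs=np.array([0.9, 0.8]),
    num_points_integral=5,
)
```

A plain dataclass like this keeps the numpy interface torch-free while giving named access instead of tuple indexing.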
Another unresolved issue from the previous PR [about "ATTENTION..." in docstrings].
How can I check that?
Signed-off-by: jpcbertoldo <24547377+jpcbertoldo@users.noreply.github.com>
# Original Code
# Copyright (c) 2024 @jpcbertoldo
# https://github.com/jpcbertoldo/aupimo
# SPDX-License-Identifier: MIT
#
# Modified
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
This is a good idea, but we might need to double-check the GSoC guidelines with regard to code ownership.
There is some [metadata] stuff showing up; I don't know why. Apparently Sphinx doesn't like dataclasses?
The tree structure is also messed up a bit. It might be an idea to split each metric into a separate section.
https://anomalib--1726.org.readthedocs.build/en/1726/markdown/guides/reference/metrics/index.html is not quite working as expected.
A lot has changed since this PR was submitted, but I've finally gotten around to reviewing it. This is a huge PR with a lot of effort behind it. However, I have some concerns. I've gone over it once, but I think I'll require a few more passes for a more thorough review. Meanwhile, we can start the discussions on the current open points.
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

from __future__ import annotations
Our minimum python version is 3.10. I think we can safely omit __future__.annotations
# Original Code
# Copyright (c) 2024 @jpcbertoldo
# https://github.com/jpcbertoldo/aupimo
# SPDX-License-Identifier: MIT
Are we allowed to release MIT-licensed work under the Apache license?
@@ -16,3 +16,4 @@ torch>=2,<2.2.0  # rkde export fails even with ONNX 17 (latest) with torch 2.2.0.
torchmetrics==0.10.3
rich-argparse
open-clip-torch>=2.23.0
numba>=0.58.1
We now only use pyproject.toml, but this comment might be relevant there as well. Maybe we need to make this requirement part of the optional dependencies. There is an open issue requesting a smaller core size.
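One hedged sketch of what that could look like in pyproject.toml; the extra name "perf" is illustrative, and whether the pure-python fallback is acceptable as the default would need checking:

```toml
# Hypothetical: move numba from the core requirements to an optional extra.
[project.optional-dependencies]
perf = [
    "numba>=0.58.1",
]
```

Users would then opt in with pip install anomalib[perf], and the code would fall back to the non-numba implementation when the import fails.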
def test_compare_models_pairwise_wilcoxon(scores_per_model: dict, alternative: str, higher_is_better: bool) -> None:
    """Test `compare_models_pairwise_wilcoxon`."""
    from anomalib.metrics.per_image import AUPIMOResult, compare_models_pairwise_wilcoxon
What's the rationale behind the import statement here?
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

from __future__ import annotations
Let's remove __future__ across the code. It is not relevant anymore.
Numpy version docstring
=======================

{docstring}
Is this intentional?
class BinclfAlgorithm:
    """Algorithm to use."""

    PYTHON: ClassVar[str] = "python"
Should we call it python or numpy?
@numba.jit(nopython=True)
def binclf_one_curve_numba(scores: ndarray, gts: ndarray, threshs: ndarray) -> ndarray:
Maybe I missed something, but the only difference between this and the non-numba method is the decorator. This way we might be able to reduce the file count. Maybe you can use a generic decorator that applies numba's jit when it is available:
from anomalib.utils.exceptions import try_import
import numpy as np

def numba_accelerate(func):
    if try_import("numba"):
        from numba import jit
        return jit(nopython=True)(func)
    return func

@numba_accelerate
def add(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    return a + b
This is a quick sketch I tried; feel free to polish it.
    "The lower bound of the shared FPR integration range is not exactly achieved. "
    f"Expected {fpr_lower_bound} but got {fpr_lower_bound_defacto}, which is not within {rtol=}."
)
warnings.warn(msg, RuntimeWarning, stacklevel=1)
Let's stick to logger.warning
normalization_factor = aupimo_normalizing_factor(fpr_bounds)
aucs = (aucs / normalization_factor).clip(0, 1)

return threshs, shared_fpr, per_image_tprs, image_classes, aucs, num_points_integral
Let's leave it as is for now. We are planning on refactoring both the inferencers.
Description
Replaces #1557, which replaces the PRs from https://gist.github.com/jpcbertoldo/12553b7eaa97cfbf3e55bfd7d1cafe88 .
Implements refactors from https://github.com/jpcbertoldo/anomalib/blob/metrics/refactors/src/anomalib/utils/metrics/perimg/.refactors .
arxiv: https://arxiv.org/abs/2401.01984
medium post: https://medium.com/p/c653ac30e802
GSoC deliverable: https://gist.github.com/jpcbertoldo/12553b7eaa97cfbf3e55bfd7d1cafe88
Closes #1728
Changes
Select what type of change your PR is:
Checklist
Before you submit your pull request, please make sure you have completed the following steps:
For more information about code review checklists, see the Code Review Checklist.