PIMO #1726

Open: wants to merge 37 commits into main

Conversation

@jpcbertoldo (Contributor) commented Feb 9, 2024

πŸ“ Description

Replaces #1557, which in turn replaced the PRs listed in https://gist.github.com/jpcbertoldo/12553b7eaa97cfbf3e55bfd7d1cafe88 .

Implements the refactors from https://github.com/jpcbertoldo/anomalib/blob/metrics/refactors/src/anomalib/utils/metrics/perimg/.refactors .

arXiv: https://arxiv.org/abs/2401.01984
Medium post: https://medium.com/p/c653ac30e802
GSoC deliverable: https://gist.github.com/jpcbertoldo/12553b7eaa97cfbf3e55bfd7d1cafe88

Closes #1728

✨ Changes

Select what type of change your PR is:

  • 🐞 Bug fix (non-breaking change which fixes an issue)
  • πŸ”¨ Refactor (non-breaking change which refactors the code base)
  • πŸš€ New feature (non-breaking change which adds functionality)
  • πŸ’₯ Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • πŸ“š Documentation update
  • πŸ”’ Security update

βœ… Checklist

Before you submit your pull request, please make sure you have completed the following steps:

  • πŸ“‹ I have summarized my changes in the CHANGELOG and followed the guidelines for my type of change (skip for minor changes, documentation updates, and test enhancements).
  • πŸ“š I have made the necessary updates to the documentation (if applicable).
  • πŸ§ͺ I have written tests that support my changes and prove that my fix is effective or my feature works (if applicable).

For more information about code review checklists, see the Code Review Checklist.

# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

from __future__ import annotations
jpcbertoldo (Contributor Author), replying to @ashwinvaidya17:

> Can we avoid this? Last I checked __future__.annotations does not behave well with jsonargparse

I thought it was necessary for annotations like range: tuple[float, float], is it not?

jpcbertoldo (Contributor Author):

So, I think I re-figured out why I used this.
In the other file, the class AUPIMOResult has annotations that use the class itself (e.g. from_pimoresult(...) -> AUPIMOResult:) and the linter didn't like it, but adding __future__.annotations solves it.
Now, I don't know if this is a good justification. What do you prefer to do about this?

Collaborator:
Our minimum python version is 3.10. I think we can safely omit __future__.annotations
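For reference, a minimal sketch of how this can be written on Python >= 3.10 without the __future__ import (the field names here are illustrative): builtin generics such as tuple[float, float] and X | Y unions already evaluate at runtime, and the self-reference can be a quoted forward reference.

from dataclasses import dataclass


@dataclass
class AUPIMOResult:
    fpr_bounds: tuple[float, float]
    aucs: list[float] | None = None

    @classmethod
    def from_pimoresult(cls, pimoresult: object) -> "AUPIMOResult":
        # The quoted return annotation avoids the NameError that a bare `AUPIMOResult`
        # would raise at definition time, with no need for `from __future__ import annotations`.
        ...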

@@ -0,0 +1,119 @@
"""Binary classification matrix curve (NUMBA implementation of low level functions).
jpcbertoldo (Contributor Author), replying to @ashwinvaidya17:

> What's the convention behind naming files starting with _?

It's similar to the _ attributes inside a module.

A function _func in module.py is supposed to be used within the module scope, right?

Similarly, _validate is private to per_image/, so it can be used across other modules inside the per_image subpackage but not outside of it.

raise ValueError(msg)


def file_path(file_path: str | Path, must_exist: bool, extension: str | None, pathlib_ok: bool) -> None:
jpcbertoldo (Contributor Author), replying to @ashwinvaidya17:

> Maybe we can move this method in data.utils.

I am trying to avoid putting things outside of per_image for the sake of synchronizing the standalone repo more easily.
Could we eventually put this in a list of "refactors for later"?

Collaborator:

Yeah, I understand that it is already a large PR, but it might be a good idea to sync with src/anomalib/data/utils/path.py. We already have utils that perform similar checks.

raise ValueError(msg)


def file_paths(file_paths: list[str | Path], must_exist: bool, extension: str | None, pathlib_ok: bool) -> None:
jpcbertoldo (Contributor Author), replying to @ashwinvaidya17:

> same here and maybe we can rename it to validate_path_are_files. A bit verbose but improves readability imo.

The idea of the _validate module is to have short names because the "validate" part is common to all the functions; they are always called as _validate.file_paths() so the "validate" shows up at the call site.

Sounds logical?
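For illustration, a minimal sketch of this call style as it would appear inside the subpackage (the wrapper function and the argument values are only illustrative; the validator signature follows the diff above):

# e.g. inside a module of anomalib/metrics/per_image/
from pathlib import Path

from . import _validate  # private to the per_image subpackage


def load_score_files(paths: list[str | Path]) -> None:
    # the module prefix keeps the "validate" intent visible despite the short function name
    _validate.file_paths(paths, must_exist=True, extension=".pt", pathlib_ok=True)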

Collaborator:

Yeah, I get what you mean, but I feel a lot of the methods defined here might be useful across the repo as well, like same_shape and is_tensor. It might be too much work and too many changed files, so we can address it in another PR, but it would be nice to then add a TODO at the top of the file and create an issue for it so that we don't forget to revisit it later.



@dataclass
class BinclfAlgorithm:
jpcbertoldo (Contributor Author), replying to @ashwinvaidya17:

> What's the advantage of using this over enums?

I just wanted to avoid enums for the sake of simplicity, yet I wanted to group these constants together for consistency.
Should I switch to enum or is that OK?

Collaborator:

At the risk of sounding dogmatic, I am in favour of switching to enums. The mental model in my head is that dataclasses are for storing data and enums are for switching, and from what I see, this is used for switching. Also, it is easy to validate enums by just wrapping the passed value with the enum type.

Take line 196 for example:

algorithm: str = BinclfAlgorithm.NUMBA

I understand the objective behind this, but I would rather go with

class BinclfAlgorithm(Enum):
    NUMBA = "numba"
    ...

def binclf_curves(  # illustrative name for the function being discussed
    algorithm: BinclfAlgorithm | str = BinclfAlgorithm.NUMBA,
) -> ndarray:
    algorithm = BinclfAlgorithm(algorithm)  # converts and validates in one step

This has the same desired effect as validate.
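To make the validation behaviour concrete, a self-contained sketch (member names and values are assumed from the snippets above; "cython" is just an example of an invalid value):

from enum import Enum


class BinclfAlgorithm(Enum):
    NUMBA = "numba"
    PYTHON = "python"


# Wrapping a value with the enum type converts and validates it in one step.
print(BinclfAlgorithm("numba"))                 # BinclfAlgorithm.NUMBA
print(BinclfAlgorithm(BinclfAlgorithm.PYTHON))  # members pass through unchanged

try:
    BinclfAlgorithm("cython")
except ValueError as err:
    print(err)  # 'cython' is not a valid BinclfAlgorithm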

"The lower bound of the shared FPR integration range is not exactly achieved. "
f"Expected {fpr_lower_bound} but got {fpr_lower_bound_defacto}, which is not within {rtol=}."
)
warnings.warn(msg, RuntimeWarning, stacklevel=1)
jpcbertoldo (Contributor Author), replying to @ashwinvaidya17:

> What's the advantage of using both logger.warning and warnings.warn? What happens if we want to pipe the warnings to a file and keep the console empty? A file handler can be passed to the logging module in such a scenario.

TBH, I didn't know what the right policy was here 😳
I put both because I was sure you'd find it and tell me what to do :P hehe

So, should I use the logger?

On a side note: warnings.warn allows one to turn this into an exception with warnings.filterwarnings("error"); maybe that could be interesting?

For context: this warning is due to having too few points to integrate the AUC curve, which may result in imprecise AUC values.

Collaborator:
Let's stick to logger.warning
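For reference, a minimal sketch of the agreed pattern with a module-level logger; the placeholder values stand in for the quantities computed in the code shown above, and the message text is the one from the diff:

import logging

logger = logging.getLogger(__name__)

# placeholders for the values computed before the warning
fpr_lower_bound, fpr_lower_bound_defacto, rtol = 1e-5, 1.2e-5, 1e-8

msg = (
    "The lower bound of the shared FPR integration range is not exactly achieved. "
    f"Expected {fpr_lower_bound} but got {fpr_lower_bound_defacto}, which is not within {rtol=}."
)
logger.warning(msg)  # instead of warnings.warn(msg, RuntimeWarning, stacklevel=1)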

normalization_factor = aupimo_normalizing_factor(fpr_bounds)
aucs = (aucs / normalization_factor).clip(0, 1)

return threshs, shared_fpr, per_image_tprs, image_classes, aucs, num_points_integral
jpcbertoldo (Contributor Author), replying to @ashwinvaidya17:

> Do you think we should capture these in a dataclass?

There is a dataclass for the torch interface but not for the numpy interface.
I couldn't figure out a nice, maintainable way to make the two versions.
Perhaps we can discuss this in a call? I'll explain better and maybe you'll know a good solution.

Collaborator:
Let's leave it as is for now. We are planning on refactoring both the inferencers.

@jpcbertoldo (Contributor Author):

Another unresolved issue from the previous PR [about "ATTENTION..." in docstrings]:

> @ashwinvaidya17: Same here. We need to consider how these docstrings will be rendered in sphinx.

How can I check that?

@samet-akcay (Contributor) commented Feb 9, 2024:

> how can i check that?

The documentation is built here based on your changes.

Comment on lines +6 to +13
# Original Code
# Copyright (c) 2024 @jpcbertoldo
# https://github.com/jpcbertoldo/aupimo
# SPDX-License-Identifier: MIT
#
# Modified
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
Contributor:

This is a good idea, but we might need to double-check the GSoC guidelines in regards to code ownership.

@jpcbertoldo (Contributor Author):

> The documentation is built here based on your changes

There is some [metadata] stuff showing up, I don't know why.

Apparently sphinx doesn't like dataclasses?
https://anomalib--1726.org.readthedocs.build/en/1726/markdown/guides/reference/metrics/index.html#id5
This field is in the class AUPIMOResult, but it's showing as if it were a function or whatever at the root (?), while AUPIMOResult doesn't show at all.

@samet-akcay (Contributor):

Tree structure is also messed up a bit. It might be an idea to split each metric into a separate section.

@jpcbertoldo (Contributor Author):

https://anomalib--1726.org.readthedocs.build/en/1726/markdown/guides/reference/metrics/index.html

It's not quite working as expected:

  1. I expected per_image to show as a submenu in metrics; how could I do that?

  2. It seems not to like dataclasses; there are attributes of PIMOResult and AUPIMOResult showing up as if they were functions (?) and the classes themselves don't show.

@samet-akcay added this to the v1.1.0 milestone Feb 29, 2024
@samet-akcay added the Feature label and removed the Dependencies and Tests labels Mar 25, 2024
@samet-akcay modified the milestones: v1.1.0, v1.2.0 May 14, 2024
@ashwinvaidya17 (Collaborator) left a review:

A lot has changed since this PR was submitted, but I've finally gotten around to reviewing it. This is a huge PR with a lot of effort behind it. However, I have some concerns. I've gone over it once, but I think I'll need a few more passes for a more thorough review. Meanwhile, we can start the discussions on the current open points.

# Original Code
# Copyright (c) 2024 @jpcbertoldo
# https://github.com/jpcbertoldo/aupimo
# SPDX-License-Identifier: MIT
Collaborator:
Are we allowed to release MIT licensed work under Apache license?

@@ -16,3 +16,4 @@ torch>=2,<2.2.0 # rkde export fails even with ONNX 17 (latest) with torch 2.2.0.
torchmetrics==0.10.3
rich-argparse
open-clip-torch>=2.23.0
numba>=0.58.1
Collaborator:

We now only use pyproject.toml, but this comment might be relevant there as well. Maybe we need to make this requirement part of the optional dependencies. There is an open issue requesting a smaller core size.


def test_compare_models_pairwise_wilcoxon(scores_per_model: dict, alternative: str, higher_is_better: bool) -> None:
"""Test `compare_models_pairwise_wilcoxon`."""
from anomalib.metrics.per_image import AUPIMOResult, compare_models_pairwise_wilcoxon
Collaborator:
What's the rationale behind the import statement here?

# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

from __future__ import annotations
Collaborator:

Let's remove __future__ across the code. It is not relevant anymore.

Numpy version docstring
=======================

{docstring}
Collaborator:
Is this intentional?

class BinclfAlgorithm:
"""Algorithm to use."""

PYTHON: ClassVar[str] = "python"
Collaborator:
Should we call it python or numpy?



@numba.jit(nopython=True)
def binclf_one_curve_numba(scores: ndarray, gts: ndarray, threshs: ndarray) -> ndarray:
Collaborator:

Maybe I missed something, but the only difference between this and the non-numba method is the decorator. This way we might be able to reduce the file count. Maybe you can use a generic decorator that applies numba's jit when it is available:

import numpy as np

from anomalib.utils.exceptions import try_import


def numba_accelerate(func):
    """Apply numba's JIT compilation only when numba is importable."""
    if try_import("numba"):
        from numba import jit

        return jit(nopython=True)(func)
    # numba is not installed: return the plain-Python function unchanged
    return func


@numba_accelerate
def add(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    return a + b

This is something quick I tried. Feel free to polish it.

"The lower bound of the shared FPR integration range is not exactly achieved. "
f"Expected {fpr_lower_bound} but got {fpr_lower_bound_defacto}, which is not within {rtol=}."
)
warnings.warn(msg, RuntimeWarning, stacklevel=1)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's stick to logger.warning

normalization_factor = aupimo_normalizing_factor(fpr_bounds)
aucs = (aucs / normalization_factor).clip(0, 1)

return threshs, shared_fpr, per_image_tprs, image_classes, aucs, num_points_integral
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's leave it as is for now. We are planning on refactoring both the inferencers.

Successfully merging this pull request may close these issues: Add PIMO metric to anomalib (#1728).

3 participants