Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FDataGrid.restrict option with_bounds #561

Merged
merged 13 commits into from
Aug 17, 2023
Merged

Conversation

ego-thales
Copy link
Contributor

@ego-thales ego-thales commented Aug 11, 2023

Implemented proposed enhancement from #560.

A few notes to keep in mind when (or before) merging:

  • with_bounds defaults to False,
  • With high dim_domain, domain's boundary potentially contains a lot of points, which might end up pretty slow. That said, I guess it has to happen when manipulating a functional object on a huge domain size,
  • No 1D optimization was made. I don't know the general spirit of your implementation, but if you generally treat 1D differently with lighter/faster code, then this should also be done here.

Waiting for your feedback!

Cheers :)
Élie

@ego-thales
Copy link
Contributor Author

I now notice that this option potentially modifies the result of every FDataGrid.integrate call with a domain argument. This probably explains the failed tests.

One easy solution would be to set default value to False.
The other option is to go through every failed test and check whether or not the difference can indeed be due to the more precise calculus.

In turn, it might be necessary to change some doc example, I'm not sure about it.

In any case, I will wait for your feedback and insights regarding this, bcause you have a much deeper view of the whole package.
:)

@vnmabus
Copy link
Member

vnmabus commented Aug 11, 2023

The "proper" way to do it would be:

  • Decide if we want that the user can include the bounds as new points. As far as I see it, the main advantage is increased accuracy in the evaluation, while the main issue is that we add points that were not directly measured. Both behaviors may be surprising.
  • Note that there is also a third option (keep ALL points, just change the domain_range attribute) which could in principle be implemented. I think that currently the FDataGrid constructor disallows having grid points outside the domain range, but this invariant is never used AFAIK, so it could be removed. This approach keeps the original accuracy and does not add "synthetic" points, but it is probably wasteful, as most points are no longer needed.
  • Thus, we need to discuss which options could be selected by the user, and if a bool is enough for that choice or we need an enum or a Literal parameter.
  • We would probably want to define this method in a way that can be extended in the future to other FData subclasses such as FDataBasis and the future FDataIrregular class. For the former these options make no sense, but it make sense for the latter, so it would be better to take that into account from the beginning.
  • Once we have decided the possible choices, we have to pick a default one, looking at the pros and cons of each.
  • If the default option is different from the current one, we ideally should keep the old default with a deprecation warning for a while (requesting the user to explicitly choose an option to remove the warning). If we see that this method has no users outside the package, we can skip this. We could in principle do breaking changes as we are still in 0.x version, but it is not very polite.
  • We have to check the internal uses of this method to choose the more appropriate option for each.

@eliegoudout
Copy link
Contributor

Hi,
Just a quick answer from my personal phone regarding other possible ways to do this.

I will argue against keeping outside points for 2 reasons:

  • It will make interpolation vs. extrapolation blurry. For example, how to you extrapolate at x = -.09, with domain [0, 1] and grid points -.1, .1, .3? If you use -.1 value, then you're much closer to interpolation than extrapolation, and if you don't, you lose precision.
  • The general idea of "restricting" IS, imo, forgetting about outside information. Imagine wanting to test out an extrapolation method, then it is important that restriction doesn't know about outside values.

I agree that using Tru as default value might not be polite. I thus think that if we end up deciding that it would be the best option, then the way to go surely is warning users during the next release (or more), before applying the change.

Is there a way to know more about FDataIrregular?

I think a good way to choose the defaul value will be to test the accuracy gains on plausible cases, as well as execution time lost.

@vnmabus
Copy link
Member

vnmabus commented Aug 15, 2023

* It will make interpolation vs. extrapolation blurry. For example, how to you extrapolate at `x = -.09`, with domain `[0, 1]` and grid points `-.1, .1, .3`? If you use `-.1` value, then you're much closer to interpolation than extrapolation, and if you don't, you lose precision.

That is true, although it is more a semantic issue. By default, the extrapolation strategy of FDataGrid delegates to interpolation. If you change it to something else, it would be applied to the points outside the domain, as usual.

* The general idea of "restricting" **IS**, imo, forgetting about outside information. Imagine wanting to test out an extrapolation method, then it is important that restriction doesn't know about outside values.

Extrapolation would be applied for the points outside the domain, in spite of having values measured for them. The default extrapolation, of course, uses these values.

Is there a way to know more about FDataIrregular?

Yes, it is a class to represent functions measured at different points (not a fixed grid). It was first proposed in #498 and a tentative implementation is in #536.

I think a good way to choose the defaul value will be to test the accuracy gains on plausible cases, as well as execution time lost.

I want to discuss also the issue with the other people involved in the project, as they may provide more insights. They are currently on vacation but we will meet again in September.

skfda/representation/grid.py Outdated Show resolved Hide resolved
skfda/representation/grid.py Outdated Show resolved Hide resolved
skfda/tests/test_grid.py Outdated Show resolved Hide resolved
@codecov
Copy link

codecov bot commented Aug 17, 2023

Codecov Report

Patch coverage: 100.00% and project coverage change: +0.02% 🎉

Comparison is base (6a066f7) 86.02% compared to head (df19f39) 86.05%.

❗ Current head df19f39 differs from pull request most recent head e1dfe7d. Consider uploading reports for the commit e1dfe7d to get more accurate results

Additional details and impacted files
@@             Coverage Diff             @@
##           develop     #561      +/-   ##
===========================================
+ Coverage    86.02%   86.05%   +0.02%     
===========================================
  Files          147      147              
  Lines        11660    11678      +18     
===========================================
+ Hits         10031    10049      +18     
  Misses        1629     1629              
Files Changed Coverage Δ
skfda/representation/grid.py 86.50% <100.00%> (+0.25%) ⬆️
skfda/tests/test_grid.py 99.36% <100.00%> (+0.04%) ⬆️

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

skfda/tests/test_grid.py Outdated Show resolved Hide resolved
skfda/tests/test_grid.py Outdated Show resolved Hide resolved
skfda/tests/test_grid.py Outdated Show resolved Hide resolved
skfda/tests/test_grid.py Outdated Show resolved Hide resolved
skfda/representation/grid.py Outdated Show resolved Hide resolved
dim_points = grid_points[dim]
left = [a] * (a < dim_points[0])
right = [b] * (b > dim_points[-1])
grid_points[dim] = np.concatenate((left, dim_points, right))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Mypy error in this line seems to be numpy/numpy#20901. It surprises me, as I think that it should be fixed in the most recent NumPy versions.

vnmabus
vnmabus previously approved these changes Aug 17, 2023
Copy link
Member

@vnmabus vnmabus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just one silly suggestion by the style checker. The rest LGTM.

skfda/tests/test_grid.py Outdated Show resolved Hide resolved
Copy link
Member

@vnmabus vnmabus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM now.

@vnmabus vnmabus merged commit 02d26bc into GAA-UAM:develop Aug 17, 2023
8 of 9 checks passed
@vnmabus
Copy link
Member

vnmabus commented Aug 17, 2023

@all-contributors please add @ego-thales for his code contributions and ideas.

@allcontributors
Copy link
Contributor

@vnmabus

I've put up a pull request to add @ego-thales! 🎉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants