Speed up the calculation of the penalization matrix for FDataGrid #519

Ddelval · 2023-03-01T21:29:28Z

Taking advantage of the sparsity of the penalization matrix, it is possible to improve the time complexity of its calculation significantly

From this point on, memory usage might start to be a problem. Thus, it does not make sense to optimize the running time of the algoritm much more.

skfda/misc/operators/_linear_differential_operator.py

codecov · 2023-03-01T21:34:19Z

Codecov Report

Patch coverage: 93.25% and project coverage change: +0.01 🎉

Comparison is base (0154d69) 85.75% compared to head (f69db8f) 85.77%.

Additional details and impacted files

@@             Coverage Diff             @@
##           develop     #519      +/-   ##
===========================================
+ Coverage    85.75%   85.77%   +0.01%     
===========================================
  Files          144      145       +1     
  Lines        11428    11500      +72     
===========================================
+ Hits          9800     9864      +64     
- Misses        1628     1636       +8

Impacted Files	Coverage Δ
skfda/representation/basis/_grid_basis.py	`80.00% <80.00%> (ø)`
...da/misc/operators/_linear_differential_operator.py	`95.74% <96.07%> (-0.18%)`	⬇️
skfda/preprocessing/dim_reduction/_fpca.py	`89.84% <100.00%> (-0.08%)`	⬇️
skfda/representation/basis/_custom_basis.py	`86.25% <100.00%> (ø)`
skfda/tests/test_linear_differential_operator.py	`98.50% <100.00%> (+0.39%)`	⬆️

... and 2 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

Otherwise, mypy is not able to infer a type for it. I think that is due to the mutability introduced by having it as a paramter in a function call.

skfda/misc/operators/_linear_differential_operator.py

With these changes, a FDataBasis using a GridBasis cannot be evaluated. This functionality is removed since, when evaluation is required, an FDataGrid should be used

skfda/misc/operators/_linear_differential_operator.py

skfda/tests/test_fpca_regression.py

skfda/misc/operators/_linear_differential_operator.py

…alization

skfda/misc/operators/_linear_differential_operator.py

skfda/tests/test_linear_differential_operator.py

skfda/misc/operators/_linear_differential_operator.py

skfda/tests/test_linear_differential_operator.py

skfda/representation/basis/_grid_basis.py

skfda/misc/operators/_linear_differential_operator.py

skfda/representation/basis/_custom_basis.py

vnmabus · 2023-06-23T14:29:11Z

skfda/representation/basis/_grid_basis.py

+        return f"{type(self).__name__}(grid_points={self.grid_points}) "
+
+    def __hash__(self) -> int:
+        return hash((super(), f"{self.grid_points}"))


Is grid_points hashable?

@vnmabus,
By itself no. But converted to a string it is no longer mutable so I would hope so. I might be missing something but that's why it's in an f-string.

Uh-uh, that is a bit fishy. I would convert it to tuples instead or something like that (see what BSplineBasis does with the knots).

In the long term, I think that the hash for the bases should not depend on the attributes, as the class is mutable.

I would consider that changing the grid_points results in a different grid basis. That is why I included it in the hash. I think it defines the identity of the basis.
I have changed it to use a tuple instead of convertin to string

That is true. However, we only need that objects that compare equal have the same hash. The opposite, although desirable for performance reasons, is not required. Note that our only reason for having hashable bases in the first case is because they are an attribute for the FDataBasis ExtensionDType, and pandas requires that those are hashable.

I think that in the long term we should just hash the class type itself or something like that.

I could change the grid_points definition to be a property. That would make it inmutable and then the hash would make more sense.
I understand that hashing the class type itself is enough for the current needs. However, providing hashable objets with good performance is not a bad thing. Someone might want to store functional data in a dictionary or a set.

For dictionaries, only the keys need to be hashable. Normally, mutable objects such as lists or arrays are not made hashable on purpose.

Yeah, you are right. I was also thinking of sets, and I got them mixed up.
To me, GridBasis does seem like it should be an immutable object. And it is fully characterized by the discretization points. With that said, since we do not have a need to implement effective hash methods, I think it is not unreasonable just to use the basis type as a hash. But then the method would be basically returning a constant.

Personally, I'd rather change the grid_points to a property, making the object immutable, and keep the points in the hash. But I think the other option also makes sense. In any case, let me know what you think because I want to get to closing this PR rather sooner than later. The PLS functionality depends on this, and I'd like to have a clean PLS PR at this point (with these commits integrated into develop).

I think I will accept this PR in its current state. If we change the hash behaviour for basis objects, that should be done in another PR.

For me, at the beginning I though of basis objects as immutable too. However, they were several disadvantages with that approach, and I am reconsidering it now. First, Python does not have first-class support for custom immutable objects, apart from frozen dataclasses. Second, and more important, the current approach makes difficult or even impossible the choice of basis parameters in a BasisSmoother transformer using hyperparameter selection tools such as GridSearchCV. For example, for choosing the number of elements in the basis, one has to create different Basis objects with the appropriate parameter set. Choosing several basis parameters at the same time is too complex for average users.

Forgetting the idea of immutable basis, we can make Basis inherit from BaseEstimator, thus allowing it to play well with the hyperparameter selection tools in scikit-learn. I think we should investigate this approach further.

Ddelval added 4 commits March 1, 2023 22:24

Usable up to 2000 points

c21d72a

Regularization usable up to 10 000 points.

258027d

From this point on, memory usage might start to be a problem. Thus, it does not make sense to optimize the running time of the algoritm much more.

Improve regularization optimization

fa9e86d

Complete documentation

d106fbb

github-actions bot reviewed Mar 1, 2023

View reviewed changes

Assign the fdata before using it.

5d6ccda

Otherwise, mypy is not able to infer a type for it. I think that is due to the mutability introduced by having it as a paramter in a function call.

Ddelval requested a review from vnmabus March 1, 2023 22:33

vnmabus requested changes Mar 5, 2023

View reviewed changes

Ddelval added 2 commits March 7, 2023 01:23

Simplify custombasis_penalty_matrix_optimized

bb05968

Improve fdatagrid penalty matrix calculation

6d60b05

github-actions bot reviewed Mar 7, 2023

View reviewed changes

skfda/misc/operators/_linear_differential_operator.py Outdated Show resolved Hide resolved

github-actions bot reviewed Mar 7, 2023

View reviewed changes

Ddelval added 5 commits March 7, 2023 01:40

Remove weird imports introduced by autocomplete

d714160

Avoid circular import

3a950c6

Barebones implementation of GridBasis

5a19294

Use gridBasis in fpca regularization

350bce4

Complete grid_basis implementation

138d678

Ddelval marked this pull request as draft March 22, 2023 16:05

Ddelval added 2 commits April 16, 2023 01:53

Simplify GridBasis implementation

6b08ed7

With these changes, a FDataBasis using a GridBasis cannot be evaluated. This functionality is removed since, when evaluation is required, an FDataGrid should be used

Extract fdatagrid specialization to function

8f39f18

github-actions bot reviewed Apr 16, 2023

View reviewed changes

skfda/misc/operators/_linear_differential_operator.py Show resolved Hide resolved

github-actions bot reviewed Apr 16, 2023

View reviewed changes

Style fixes

aba68ec

github-actions bot reviewed Apr 16, 2023

View reviewed changes

skfda/misc/operators/_linear_differential_operator.py Outdated Show resolved Hide resolved

skfda/misc/operators/_linear_differential_operator.py Show resolved Hide resolved

Ddelval added 5 commits April 16, 2023 02:10

More style changes

ed68f68

Merge remote-tracking branch 'origin/develop' into feature/speedUpPen…

9efe090

…alization

Fix unregistered function

3bfab32

Add _optimized_operator_evaluation_in_grid example

f8d4554

Add a test for the GridBasis penalty matrix

c1389d9

Sort imports

188e7bd

github-actions bot reviewed Apr 16, 2023

View reviewed changes

Style fixes

93bd28f

github-actions bot reviewed Apr 16, 2023

View reviewed changes

skfda/misc/operators/_linear_differential_operator.py Show resolved Hide resolved

skfda/misc/operators/_linear_differential_operator.py Show resolved Hide resolved

github-actions bot reviewed Apr 16, 2023

View reviewed changes

skfda/tests/test_linear_differential_operator.py Outdated Show resolved Hide resolved

github-actions bot reviewed Apr 16, 2023

View reviewed changes

skfda/tests/test_linear_differential_operator.py Outdated Show resolved Hide resolved

Ddelval added 3 commits April 16, 2023 11:57

Remove trailing whitespace

979e8d2

Fix docstring spacing

5d0a9e0

Remove left-over parameter

7bdc97d

github-actions bot reviewed Apr 16, 2023

View reviewed changes

skfda/representation/basis/_grid_basis.py Outdated Show resolved Hide resolved

Remove left-over import

4b46b1c

Ddelval marked this pull request as ready for review April 16, 2023 10:46

Ddelval requested a review from vnmabus April 16, 2023 10:46

vnmabus requested changes Jun 16, 2023

View reviewed changes

Ddelval added 5 commits June 21, 2023 22:45

Use constant weights in linear_differential_operator

73a9c79

Fix typo

7b480c4

Tidy grid basis implementation

1ec886e

Change grid basis hash calculation

ff23777

Add missing empty line between methods

56fe4de

github-actions bot reviewed Jun 21, 2023

View reviewed changes

skfda/representation/basis/_custom_basis.py Show resolved Hide resolved

Merge branch 'develop' into feature/speedUpPenalization

cd92f80

Ddelval requested a review from vnmabus June 22, 2023 06:24

vnmabus requested changes Jun 23, 2023

View reviewed changes

Ddelval added 2 commits June 23, 2023 17:37

Merge branch 'develop' into feature/speedUpPenalization

6757f1b

Use tuples in the hash calculation

f69db8f

vnmabus approved these changes Jun 24, 2023

View reviewed changes

vnmabus merged commit c40e372 into develop Jun 24, 2023
16 of 17 checks passed

vnmabus deleted the feature/speedUpPenalization branch August 18, 2023 08:19

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Speed up the calculation of the penalization matrix for FDataGrid #519

Speed up the calculation of the penalization matrix for FDataGrid #519

Ddelval commented Mar 1, 2023

codecov bot commented Mar 1, 2023 •

edited

Loading

vnmabus Jun 23, 2023

Ddelval Jun 23, 2023

vnmabus Jun 23, 2023

Ddelval Jun 23, 2023

vnmabus Jun 23, 2023

Ddelval Jun 23, 2023

vnmabus Jun 23, 2023

Ddelval Jun 23, 2023

vnmabus Jun 24, 2023

Speed up the calculation of the penalization matrix for FDataGrid #519

Speed up the calculation of the penalization matrix for FDataGrid #519

Conversation

Ddelval commented Mar 1, 2023

codecov bot commented Mar 1, 2023 • edited Loading

Codecov Report

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

codecov bot commented Mar 1, 2023 •

edited

Loading