Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New definitions for fit_curve and predict_curve #420

Merged
merged 6 commits into from
Apr 29, 2023
Merged

Conversation

m-mohr
Copy link
Member

@m-mohr m-mohr commented Mar 14, 2023

Make fit_curve usable in apply_dimension, scrap gap filling for predict_curve, clarify no data handling, etc.

@m-mohr m-mohr added this to the 2.0.0 milestone Mar 14, 2023
@m-mohr m-mohr marked this pull request as ready for review March 14, 2023 11:55
@LukeWeidenwalker
Copy link
Contributor

@ValentinaHutter you know this better than I do! :)

@soxofaan
Copy link
Member

Small side question:

throws an InvalidValues exception if invalid values are encountered

Does "encountered" here mean just in the input data, or also during the fitting procedure?

@ValentinaHutter
Copy link

I agree that finite numbers should be valid.

Small side question:

throws an InvalidValues exception if invalid values are encountered

Does "encountered" here mean just in the input data, or also during the fitting procedure?

In our implementation we actually do not consider invalid values in the fitting procedure.... At least, we explicitely exclude NaN values that naturally occur in the datasets, there. When an infinite number is in the dataset, an exception is raised and the fitting cannot be started. Let me know if we should change this :)

@m-mohr
Copy link
Member Author

m-mohr commented Mar 17, 2023

It was meant to cater for invalid values in the input data, but not sure whether and how we should cater for other cases?

Copy link
Member

@soxofaan soxofaan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just a suggestion

proposals/fit_curve.json Outdated Show resolved Hide resolved
@clausmichele
Copy link
Member

This process must be able to handle invalid data internally and should not return an error if some invalid points are present in the time series.

@m-mohr m-mohr self-assigned this Mar 29, 2023
@m-mohr
Copy link
Member Author

m-mohr commented Mar 29, 2023

@clausmichele So you are saying we should remove the InvalidValues exception and all mentions of it and instead say that invalid values must be dealt with smoothly? Is this possible in all cases?

@clausmichele
Copy link
Member

Indeed. I can't think about all the possible scenarios, but the one for which we created this process, fitting an harmonic curve to each pixel of a raster-cube, must handle NaNs occurring due to cloud masked or or invalid pixels along the time series.
Should we make this part explicit? Adding a default ignore_nodata = true ? I don't know if there could be a case where the user wants to be warned if there are invalid values in the raster-cube/time series.

@soxofaan
Copy link
Member

Doesn't it actually all depend on how the callback provided in the function parameter works?

"name": "function",
"description": "The model function. It must take the parameters to fit as array through the first argument and the independent variable `x` as the second argument.\n\nIt is recommended to store the model function as a user-defined process on the back-end to be able to re-use the model function with the computed optimal values for the parameters afterwards.",
"schema": {
"type": "object",
"subtype": "process-graph",
"parameters": [

If that function properly supports handling invalid data, then the whole fit_curve call will work. If it doesn't, then fir_curve will fail when given data that contains invalid values.

So this seems like to user's responsibility to make it work with or without invalid data support?

Or, would it be a feature of fit_curve to skip parts of the raster data cube with invalid values and only provide function the parts where the x input is fully valid?

@clausmichele
Copy link
Member

@soxofaan yes exactly! So fit_curve shouldn't check the input data, but it's the fitting function that takes care of it.

@m-mohr
Copy link
Member Author

m-mohr commented Mar 30, 2023

I cherry-picked the single word change from this PR which seemed okay. The issues that came up now are somewhat related, but not strictly related to the intial PR. So having the first action done, we can now discuss additional changes here independantly.

About removing the InvalidValues error, what happens if a data cube containing strings is passed? The fitting function is defined for numbers only according to the schemas. I think with the wordsmithing updates from @soxofaan the process description makes sense to me, but please correct me if you think otherwise.

@m-mohr m-mohr linked an issue Mar 30, 2023 that may be closed by this pull request
@m-mohr
Copy link
Member Author

m-mohr commented Mar 31, 2023

Rough example from today's discussions:

labels = dimension_labels(datacube)
seconds = array_apply(labels, fn = to_seconds)
cube2 = apply_dimension(datacube, fn, "t", seconds)

function fit:
params = fit_curve(context, [1,1,1], fn);

predictions = predict_curve(cube2)

@m-mohr m-mohr changed the title Clarify invalid values in fit_curve New definitions for fit_curve and predict_curve Mar 31, 2023
@m-mohr m-mohr removed their assignment Mar 31, 2023
@m-mohr m-mohr requested a review from mkadunc March 31, 2023 14:26
@m-mohr m-mohr requested a review from soxofaan April 18, 2023 13:09
@m-mohr m-mohr merged commit bab65c2 into draft Apr 29, 2023
@m-mohr m-mohr deleted the fix-fit-curve branch May 4, 2023 16:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

fit_curve return schema
6 participants