Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

load_ml_model id: overlap between batch job id and file path? #384

Closed
soxofaan opened this issue Sep 14, 2022 · 7 comments · Fixed by #413
Closed

load_ml_model id: overlap between batch job id and file path? #384

soxofaan opened this issue Sep 14, 2022 · 7 comments · Fixed by #413
Labels
help wanted Extra attention is needed ML must-have
Milestone

Comments

@soxofaan
Copy link
Member

"title": "Batch Job ID",
"description": "Loading a model by batch job ID is possible only if a single model has been saved by the job. Otherwise, you have to load a specific model from a batch job by URL.",
"type": "string",
"subtype": "job-id",
"pattern": "^[\\w\\-\\.~]+$"
},
{
"title": "User-uploaded File",
"type": "string",
"subtype": "file-path",
"pattern": "^[^\r\n\\:'\"]+$"

Batch job id has regex "^[\\w\\-\\.~]+$" and user-uploaded file has regex "^[^\r\n\\:'\"]+$".
Both regexes will match on simple things like foo-bar, so a back-end will not be able to determine if given id is intended as job id or a user uploaded file? Or is a back-end supposed to try the available options and pick the first match?

@m-mohr
Copy link
Member

m-mohr commented Sep 14, 2022

True, that's not ideal. Don't have a good idea yet how to resolve it though...

@m-mohr
Copy link
Member

m-mohr commented Nov 28, 2022

The only thing I could think of right now is to require e.g. a "./" at the beginning of file paths. This doesn't feel ideal though.
Alternatively, back-ends would need to be given a priority list, I guess... Any opinions or ideas?

@m-mohr m-mohr added this to the 1.3.0 milestone Nov 28, 2022
@m-mohr m-mohr added the help wanted Extra attention is needed label Nov 28, 2022
@soxofaan
Copy link
Member Author

another option is requiring some kind of scheme prefix when there is risk of confusion, e.g. workspace://mymodel.foo, which extends easily to other storage solutions or "namespaces" (s3://..., https://..., ... or even job:... and job://otherbackend/...)

@m-mohr
Copy link
Member

m-mohr commented Nov 29, 2022

Yeah, I thought about that too, but I found that even more confusing/unexpected. 🤔

@m-mohr
Copy link
Member

m-mohr commented Jan 31, 2023

Maybe define two processes? Alternatively, I'd say we say that it tries the batch job ID first and otherwise loads the file (because you can change file names but not batch job IDs.)

@m-mohr m-mohr added the ML label Jan 31, 2023
@m-mohr m-mohr modified the milestones: 1.3.0, 2.0.0 Feb 1, 2023
m-mohr added a commit that referenced this issue Mar 8, 2023
@m-mohr m-mohr linked a pull request Mar 8, 2023 that will close this issue
@m-mohr
Copy link
Member

m-mohr commented Mar 8, 2023

I thought a bit more about this and I think we can simply remove the batch job id subtype. A batch job ID can also just be provided via the uri subtype. Either you provide the canonical (i.e. "public") link and then it's just like any external data. Or you can simply provide a URI such as https://example.com/api/v1.0/jobs/12345/results and then you can easily detect the job id from it. The clients can help with it by allowing Job objects as input for example.

I created PR #413 for it.

@m-mohr m-mohr modified the milestones: 2.0.0, 2.1.0 Mar 10, 2023
m-mohr added a commit that referenced this issue Mar 31, 2023
* Remove job-id subtype #384
* load_result -> load_stac
@m-mohr
Copy link
Member

m-mohr commented Sep 30, 2023

Solved by #413, I think.

@m-mohr m-mohr closed this as completed Sep 30, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted Extra attention is needed ML must-have
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants