Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PEcAn API: Workflow status, download input/output files, JSON workflow submission #2674

Merged
merged 31 commits into from Aug 12, 2020

Conversation

tezansahu
Copy link
Contributor

@tezansahu tezansahu commented Aug 3, 2020

  • Created API to obtain status of a workflow
  • Created API endpoint to download input & output files for a specified run
  • Created API endpoint to download files for a specified workflow
  • Created API endpoint to get information about inputs based on model_id & site_id
  • Created API endpoint to search for formats based on string matching for format_name & mimetype
  • Created API endpoint to retrieve details of a format using its ID
  • Added feature to submit workflow as a JSON
  • Added functionality to allow or restrict access to the API via authentication based on the AUTH_REQ environment variable (See issue Use environment variables in the API for Authentication Requirement & Visibility of Details queried from betyDB #2639)

Review Time Estimate

  • Immediately
  • Within one week
  • When possible

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • My change requires a change to the documentation.
  • I have updated the CHANGELOG.md.
  • I have updated the documentation accordingly.
  • I have read the CONTRIBUTING document.
  • I have added tests to cover my changes.
  • All new and existing tests passed.

@tezansahu tezansahu self-assigned this Aug 3, 2020
@tezansahu tezansahu added the GSOC label Aug 3, 2020
apps/api/R/inputs.R Show resolved Hide resolved
apps/api/R/inputs.R Show resolved Hide resolved
apps/api/R/inputs.R Outdated Show resolved Hide resolved
apps/api/R/submit.workflow.R Outdated Show resolved Hide resolved
apps/api/pecanapi-spec.yml Show resolved Hide resolved
@tezansahu
Copy link
Contributor Author

@mdietze @robkooper I have made all the suggested modifications. Requesting you to review it once & let me know your views.

@tezansahu tezansahu requested a review from mdietze August 9, 2020 06:09
apps/api/R/inputs.R Show resolved Hide resolved
filepath <- paste0(input$file_path, "/", input$file_name)

if(! file.exists(filepath)){
res$status <- 404
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be possible to return different error messages for the file not existing vs nrow(input)==0? Having these two cases return the same error doesn't tell the user what happened.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Umm...that's a bit inconvenient I guess because the content-type is specified to be application/octet-stream as a decorator over the function, and changing that here may not be the easiest thing to do as of now. I think we can keep this on a hold for now & meanwhile I will see if something could be done.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. Not the end of the world. Just didn't know if there was a way to send back a message in addition to the raw status code. Definitely not worth holding up this PR for.


* __Inputs:__
* [`GET /api/inputs/`](#get-apiinputs): Search for inputs needed for a PEcAn workflow based on `model_id` & `site_id`
* [`GET /api/inputs/{input_id}`](#get-apiinputsinput_id) *: Download the desired input file
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you test that this works correctly when an input_id points to multiple files (e.g. a folder full of met files, one for each year)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Umm...Currently I implemented it in a way assuming that the input_id points to a file & not a folder. Also, the current VM that I am working on doesn't seem to have folders containing multiple met files. So would be difficult to test. Can you suggest some way?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@robkooper I'm not sure what's on the machine @tezansahu is testing the API on, but could you point him to an example CF met Input record where the files are annual.

@tezansahu if the API system only allows one file to be returned at a time, then I think you'd want to have an optional way to get file by name or by order in a sequence. For the latter I'm thinking something similar to how the search functions will return the first N queries, then you query again with an offset to get the next N queries. But that options doesn't seem great to me because you wouldn't really know which file you're getting so you wouldn't know how to name them. So I guess the simplest place to start (e.g. to get this PR in) is to just give a check on the number of files in an Input and if it's >1 and the file isn't requested by name then we need to throw an error.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have addressed this in the latest commits

* __Workflows:__
* [`GET /api/workflows/`](#get-apiworkflows): Retrieve a list of PEcAn workflows
* [`POST /api/workflows/`](#post-apiworkflows): Submit a new PEcAn workflow
* [`GET /api/workflows/{id}`](#get-apiworkflowsid): Obtain the details of a particular PEcAn workflow by supplying its ID
* [`GET /api/workflows/{id}/status`](#get-apiworkflowsidstatus): Obtain the status of a particular PEcAn workflow
* [`GET /api/workflows/{id}/file/{filename}`](#get-apiworkflowsidfilefilename): Download the desired file from a PEcAn workflow
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this work with files in subdirectories, or just files in the workflow directory itself? Also, does /api/workflows/{id} return the list of files so that the user knows which to request?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, /api/workflows/{id} returns the list of files so that the user knows which to request. As of now, it fetches files in the workflow directory only....coz the ones in the workflow/runs/ are fetched by the /api/runs/{run_id}/input/{filename} & those in the workflow/out/ are fetched by the /api/runs/{run_id}/input/{filename}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't want to hold up this PR further, but we should also find a way to fetch files in the PFT directory, as a lot of sensitivity analysis results get squirreled away in there. Likewise, the PDA and SDA analyses often create additional subdirectories when run.

"http://localhost:8000/api/inputs/99000000003",
httr::authenticate("carya", "illinois")
)
writeBin(res$content, "test.2002.nc")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When we use the Input query like this, how are we supposed to know what the filename should be? If not provided in an example here, you should definitely make sure there are helper functions in the pecan package that does both the GET and the query for the filenames and the write. @robkooper beyond this PR, but I think it would be great if there were then core PEcAn functions that would not just use the API to grab files, but then stick them into the local DB as new dbfiles associated with the correct Input record. Would really like to get to the point where PEcAn nodes default to grabbing files from each other before attempting to download and reprocess the raw Inputs from their original sources (e.g. I could grab the met I need for a run from Shawn or Ankur if they've already run that site)

@mdietze mdietze merged commit f3a47c2 into PecanProject:develop Aug 12, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants