
metadata.json has changed significantly #102

Open
biodavidjm opened this issue Nov 16, 2020 · 4 comments

@biodavidjm

I have noticed that the latest Caper version 1.4.2 no longer constantly writes the metadata.json file, which I guess is a good thing, because the server would crash when too many files had to be written into the metadata.json file. So this is great: a gigantic job that always crashed with the previous Caper version now completes.

But I still need the metadata.json file. The way I found to generate it is to run this command:

caper metadata e8c2155f-ee2c-4eac-8aa9-a32cdbbd4de0 > metadata.json

But to my surprise, there are many changes in the structure of the file, for example:

  1. When the job is re-run and most of the previous (failed) runs are call-cached, many of the paths printed in the metadata.json file no longer work, i.e. the job/bucket ID is not updated (a quick check is sketched after this list). For example, if the metadata shows:
gs://proteomics-pipeline/results/proteomics_msgfplus/87b2ae48-889a-4ece-a83e-2f0e77122392/call-msconvert_mzrefiner/shard-0/stdout

in reality that stdout is not in that bucket folder, but in the previous job's folder:

gs://proteomics-pipeline/results/proteomics_msgfplus/e8c2155f-ee2c-4eac-8aa9-a32cdbbd4de0/call-msconvert_mzrefiner/shard-0/stdout
  2. The JSON key "commandLine" has disappeared and is not available anymore!! Is there any other way to find the command that was run?
  3. Could the metadata.json be written to the original bucket where all the output data is located, instead of to the local VM folder where the caper metadata command is run?
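
A quick way to spot the stale paths from point 1 (just a sketch, assuming jq and gsutil are installed; it only checks the stdout entries): list every stdout path in metadata.json and flag the ones that no longer exist.

jq -r '.. | .stdout? // empty' metadata.json | sort -u | while read -r p; do
  gsutil -q stat "$p" || echo "MISSING: $p"
done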

Thanks a lot for this great tool!

@leepc12
Contributor

leepc12 commented Nov 16, 2020

What Cromwell version are you using? Please check cromwell= in your conf file ~/.caper/default.conf. If you don't have it there, then you are using the default Cromwell 52 (Caper v1.4.2).

I think those changes in metadata.json are due to the change of Cromwell versions. Old Caper used an old Cromwell.
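
If you want to pin a newer one, it would look something like this in ~/.caper/default.conf (the version number here is only an example; take the URLs from Cromwell's releases page):

cromwell=https://github.com/broadinstitute/cromwell/releases/download/54/cromwell-54.jar
womtool=https://github.com/broadinstitute/cromwell/releases/download/54/womtool-54.jar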

  1. I think it's a known bug of Cromwell. Upgrade Cromwell to the latest and see if it's fixed. I have also observed that call-cached tasks sometimes have wrong file paths written in metadata.json (paths that should not even exist because of call-caching).

  2. I actually don't know about the key commandLine. You can look into the script file to get the actual command lines for a task (see the example after the commands below).

  3. You can use gsutil cp:

$ caper metadata WORKFLOW_ID > metadata.json
$ WORKFLOW_ROOT=$(jq -r .workflowRoot metadata.json)
$ gsutil cp metadata.json "${WORKFLOW_ROOT%/}/"
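
For the commandLine question: on the GCS backend, the generated script file usually sits next to each task's stdout, so (using the path from your example, just as an illustration) something like this should print the command that was actually run:

$ gsutil cat gs://proteomics-pipeline/results/proteomics_msgfplus/e8c2155f-ee2c-4eac-8aa9-a32cdbbd4de0/call-msconvert_mzrefiner/shard-0/script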

@biodavidjm
Author

Hi @leepc12

  1. I am using the latest version available when Caper is installed, i.e. cromwell-52.jar. What version should I use? This version of Cromwell was working perfectly fine with Caper 1.4.1.
  2. It was available in this part of the metadata.json file:
        "pipeline.masic": [
            {
                "preemptible": false,
                "executionStatus": "Done",
                "stdout": "gs://whatever-pipeline/results/pipeline/85b88fd2-9669-4fd0-b605-7a2cd23591b6/call-masic/shard-0/stdout",
                "backendStatus": "Success",
                "compressedDockerSize": 254658935,
                "commandLine": "echo \"STEP 0: Ready to run MASIC\"\n\nmono /app/masic/MASIC_Console.exe \\\n/I:/cromwell_root/proteomics-pipeline/test/raw/global/MoTrPAC_Pilot_TMT_W_S1_01_12Oct17_Elm_AQ-17-09-02.raw \\\n/P:/cromwell_root/proteomics-pipeline/parameters/TMT10_LTQ-FT_10ppm_ReporterTol0.003Da_2014-08-06.xml \\\n/O:output_masic",
                "shardIndex": 0,

That commandLine key-value pair was very useful, and it is now gone.

  3. Yes, I know that ;-) But it would be great if Caper took care of putting the metadata.json file there, just as it did before 1.4.2. For example, write the metadata.json file to the working directory once the job is "done" or "failed", without the need to call the command and create the file locally.

Again, thank you very much for the great tool

@leepc12
Contributor

leepc12 commented Nov 16, 2020

  1. Try with the latest one. Find the URLs for cromwell and womtool on Cromwell's GitHub releases page and define them in the conf file (cromwell=http://.../cromwell-VER.jar and womtool=http://.../womtool-VER.jar).

  2. I actually don't know; Caper just wraps Cromwell's REST API to retrieve metadata.

  3. Maybe I can add some parameter like caper metadata WORKFLOW_ID --write-on-workflow-root so that it writes to a file on the bucket instead of printing out to STDOUT. A rough workaround for now is sketched below.
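
Not a Caper feature, just a workaround sketch (it assumes jq and gsutil are installed; WORKFLOW_ID is a placeholder):

# poll until the workflow finishes, then copy metadata.json into its workflowRoot
while true; do
  caper metadata WORKFLOW_ID > metadata.json
  STATUS=$(jq -r .status metadata.json)
  if [ "$STATUS" = "Succeeded" ] || [ "$STATUS" = "Failed" ] || [ "$STATUS" = "Aborted" ]; then
    WORKFLOW_ROOT=$(jq -r .workflowRoot metadata.json)
    gsutil cp metadata.json "${WORKFLOW_ROOT%/}/"
    break
  fi
  sleep 300  # check again in 5 minutes
done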

@biodavidjm
Author

> Maybe I can add some parameter like caper metadata WORKFLOW_ID --write-on-workflow-root so that it writes to a file on the bucket instead of printing out to STDOUT.

Adding that parameter would be extremely helpful! Thanks a lot!
