Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MegaQC doesn't handle data when module was run multiple times #2203

Open
4 tasks done
fgvieira opened this issue Nov 30, 2023 · 14 comments · May be fixed by #2206
Open
4 tasks done

MegaQC doesn't handle data when module was run multiple times #2203

fgvieira opened this issue Nov 30, 2023 · 14 comments · May be fixed by #2206

Comments

@fgvieira
Copy link
Contributor

fgvieira commented Nov 30, 2023

Description of bug

I am trying to plot the FastQC module multiple times (like here) but, even though all looks good in the html report, only the last data is stored in the json. That is, in the example above, only the FastQC (trimmed) data is stored in the json; is this intended?

Would it be possible to save both? Maybe by renaming fastqc_sequence_length_distribution_plot to fastqc_trimmed-sequence_length_distribution_plot and fastqc_raw-sequence_length_distribution_plot (like in the rest of the report)?

I am trying to load MultiQC reports into MegaQC and, as it is now, only the last one is stored in the database.

Before submitting

  • I have read the troubleshooting documentation.
  • I am using the latest release of MultiQC.
  • I have included a full MultiQC log, not truncated.
  • I have attached an input file (.zip if necessary) that triggers the error.
@vladsavelyev vladsavelyev added the bug: core Bug in the main MultiQC code label Dec 1, 2023
@vladsavelyev
Copy link
Member

Thank you @fgvieira for reporting an issue - we definitely should save both, sounds like a bug. I'll take a look now

@vladsavelyev
Copy link
Member

vladsavelyev commented Dec 1, 2023

I tried to reproduce the issue by running on that exact example of fastqc data:

cd test_data/data/modules/fastqc/issue_843
multiqc .

I can see data for both sections in multiqc_data/multiqc_data.json. E.g. there are FastQC (raw) and FastQC (trimmed) versions of each metrics, and there are data for both plots in your example:

grep fastqc_sequence_length_distribution_plot multiqc_data/multiqc_data.json
        "fastqc_sequence_length_distribution_plot": {
                "id": "fastqc_sequence_length_distribution_plot",
        "fastqc_sequence_length_distribution_plot-1": {
                "id": "fastqc_sequence_length_distribution_plot-1",

The second one has a suffix appended in the end. Agree that it makes sense to use the anchor from the config for better readability, instead of a -1 numeric suffix.

@fgvieira
Copy link
Contributor Author

fgvieira commented Dec 1, 2023

Which version are you using?
Because it is not on the JSON file multiqc_data.json

@vladsavelyev
Copy link
Member

vladsavelyev commented Dec 1, 2023

I tried both 1.18 (af792aa) (the latest stable release installed with pip install multiqc==1.18) as well as master (1.19.dev0 (af792aa).

I wonder if it has to do with the input data? I can see e.g. both fastqc_per_base_sequence_quality_plot and fastqc_per_base_sequence_quality_plot-1 in your JSON. And sequence_length_distribution-1 might be missing because the sequence_length_distribution data was missing in the FastQC output for the FastQC (raw) samples? Wonder if you run MultiQC in verbose mode with -v, it will print sequence_length_distribution not found in FastQC reports into the stderr?

If you could attach the FastQC reports are you running on, along with the multiqc_config.yaml, that would be really helpful for me to reproduce the issue!

@vladsavelyev vladsavelyev added the waiting: example data Needs example data before we can proceed label Dec 1, 2023
@fgvieira
Copy link
Contributor Author

fgvieira commented Dec 1, 2023

I did not see anything out of the ordinary, but here is some example data:

fastqc_raw.zip
fastqc_trim.zip
multiqc_config.zip

And yes, it would be nice to have them named by the anchor. 😄

PS - had to zip them separately due to GitHub size limitations.

@vladsavelyev
Copy link
Member

vladsavelyev commented Dec 1, 2023

If you look at the HTML report, that plot is not built for the "raw" inputs as I guess the logic was that it doesn't make sense when all reads are of the same length:

Screenshot 2023-12-01 at 23 29 31

One can argue it still makes sense to save the data into the JSON. I don't have a strong opinion on that, would leave it as it is, but for readability I would indeed use anchors to make it exactly clear which of the groups that only plot belongs to.

@vladsavelyev
Copy link
Member

And yes, it would be nice to have them named by the anchor. 😄

I started this PR that uses anchors as suffixes instead of numbers: #2206
Will add that feature for other modules and then merge :)

@vladsavelyev vladsavelyev added core: back end and removed bug: core Bug in the main MultiQC code waiting: example data Needs example data before we can proceed labels Dec 1, 2023
@fgvieira
Copy link
Contributor Author

fgvieira commented Dec 1, 2023

Even though all reads are the same length, there is still data, so I'd expect it to be on the json. But not a big issues, if the other plots are there. 😄
However, these -1 plots do not show up on MegaQC; do you think it could be because of the name? Do you think by using the anchor as prefix it will show up?

@vladsavelyev
Copy link
Member

Yeah, agree that it makes sense to still save the data into JSON. I'll take a look!

Regarding MegaQC, good question - need to look into it.

@vladsavelyev vladsavelyev changed the title Data missing from JSON, when module run multiple times. Data not ends up in MegaQC when module was run multiple times Dec 1, 2023
@vladsavelyev
Copy link
Member

Wonder if you could create an issue in MegaQC repo?

@vladsavelyev vladsavelyev changed the title Data not ends up in MegaQC when module was run multiple times MegaQC doesn't handle data when module was run multiple times Dec 1, 2023
@fgvieira
Copy link
Contributor Author

fgvieira commented Dec 1, 2023

Sure, but it would be nice to test with these latest changes first, no?
Are you planning to merge the PR and release a new version soon?

@vladsavelyev
Copy link
Member

In a few weeks perhaps, yes. But I doubt that draft PR changes anything with regards to MegaQC

@fgvieira
Copy link
Contributor Author

fgvieira commented Dec 2, 2023

I was looking at the MegaQC code, and it seems that the JSON upload to the DB is actually made in MultiQC under multiqc/utils/megaqc.py. Could the problem be there?

@vladsavelyev
Copy link
Member

But multiqc/utils/megaqc.py that's exactly the code that makes multiqc_data/multiqc_data.json, and the *-1 plot data is present there (except for the seq len dist in your case, as pre-trimmed reads are all of the same length).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants