Adding Element Biosciences AVITI Bases2fastq support to multiqc #1990
Example MultiQC reports (cc @YuheCheng62): Cloudbreak-DVT-ecoli-wgs-2x150, Cloudbreak-DVT-human-wgs-2x150, Cloudbreak-DVT-human-rnaseq-2x75
PR into MultiQC_TestData for supporting test data. cc @YuheCheng62
Very fast first-pass review to check some of the common gotchas that come up in PRs. A few tweaks to change here, I'll come back for a more thorough review soon. (Haven't even tried running it yet).
ok I still have 4 minutes before my next meeting so tried generating a report very quickly with the test data. Speed notes:
self.groupDict = dict()
self.groupLookupDict = dict()
We should stick to the lower_case_with_underscores naming convention for variables and functions.
(and also note to self - this is something that can be checked automatically with linters)
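To illustrate the naming point, here is a minimal sketch (not part of the PR) of how the camelCase names in the diff would map to lower_case_with_underscores. The helper `camel_to_snake` is hypothetical and exists only for this illustration; in practice a linter naming check (for example flake8 with the pep8-naming plugin) would flag these names rather than rewrite them.

```python
import re


def camel_to_snake(name: str) -> str:
    """Convert a camelCase identifier to lower_case_with_underscores.

    Hypothetical helper for illustration only; not part of MultiQC.
    """
    # Insert an underscore before each uppercase letter that directly
    # follows a lowercase letter or digit, then lowercase everything.
    return re.sub(r"(?<=[a-z0-9])([A-Z])", r"_\1", name).lower()


# Names taken from the diff snippets in this review.
for old in ("groupDict", "groupLookupDict", "plotContent", "runStats"):
    print(f"{old} -> {camel_to_snake(old)}")
```

Names that already follow the convention, such as `s_name`, pass through unchanged.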
plotContent = dict()
for s_name in runData.keys():
    runStats = dict()
    runStats.update({"#Polonies": runData[s_name]["NumPolonies"]})
#Polonies should be rounded the same way as read or base counts in the general stats table. See how it's done in the bcl2fastq module as a reference.
Same for other metrics. Please check whether similar metrics are already reported in other modules, and try to use the same naming style, value formatting, and color scheme. E.g. bcl2fastq has Q30, yield, and mean base quality. FastQC can be a reference for other metrics and plots.
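A rough sketch of what the rounding request could look like for the polonies column. This is an assumption-laden illustration, not the bcl2fastq module's actual code: the three constants below are local stand-ins for MultiQC's configurable read-count settings, the `polonies_header` and `format_count` helpers are hypothetical, and the `"Blues"` scale is just a placeholder choice.

```python
# Local stand-ins (assumptions) for MultiQC's configurable read-count
# settings, so that this sketch runs without MultiQC installed.
READ_COUNT_MULTIPLIER = 1e-6
READ_COUNT_PREFIX = "M"
READ_COUNT_DESC = "millions"


def polonies_header() -> dict:
    """Build a general-stats style header for the polonies column.

    Hypothetical helper: the keys mirror the header dict shown in the
    diff, plus a 'modify' callable that scales the raw count.
    """
    return {
        "title": f"{READ_COUNT_PREFIX} Polonies",
        "description": f"Total number of polonies for the run ({READ_COUNT_DESC})",
        "min": 0,
        "scale": "Blues",  # placeholder colour scale
        "shared_key": "read_count",
        "modify": lambda x: x * READ_COUNT_MULTIPLIER,
    }


def format_count(raw: int, header: dict) -> str:
    """Apply the header's scaling and round to one decimal place."""
    return f"{header['modify'](raw):.1f}"


# Example with a made-up raw count, for illustration only.
print(polonies_header()["title"], format_count(123_456_789, polonies_header()))
```

The point of the shared multiplier and prefix is that the polonies column then scales the same way as read counts from other modules in the same report.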
    return html, plotName, anchor, description, helptext, plotContent


def plot_per_cycle_N_content(sampleData, groupLookupDict, colorDict):
Can the FastQC code for that plot be adapted here?
I tried to open the first example that you shared (
Overall, the module is a great addition, but I would ask you to adjust the code style (in particular, Python's standard is lower_case_with_underscores naming for variables and functions) and to look at similar modules like fastqc and bcl2fastq to reuse and adapt the existing code for similar plots and metrics.
A few change suggestions, and make sure to update the branch from master. Otherwise, good with me!
Co-authored-by: Vlad Savelyev <vladislav.sav@gmail.com>
Awesome stuff!
"format": "{d}",
"description": "The (total) number of polonies calculated for the run",
"min": 0,
"scale": "RdYlGn",
Just repeating some code comments that might be lost now :)
#Polonies should be rounded the same way as read or base counts in the general stats table. See how it's done in the bcl2fastq module as a reference.
Same for other metrics. Could you check whether similar metrics are already reported in other modules, and try to use the same naming style, value formatting, and color scheme? E.g. bcl2fastq has Q30, yield, and mean base quality. FastQC can be a reference for other metrics and plots.
    return html, plot_name, anchor, description, helptext, plot_content


def plot_per_cycle_N_content(sample_data, group_lookup_dict, color_dict):
Would be great to adapt the FastQC code for that plot
Minor changes requested - see the code comments above!
ok, the remaining changes are very minor. I just tried to fix the merge conflict but couldn't push it back: because the fork is under Elembio (and not a user account), GitHub is a bit more strict about allowing us to push into the PR directly. To keep the momentum, I've created a new branch that we can work in and an associated PR: #2044. Hopefully we can do these final minor changes and get this merged soon 😄
Locking the conversation here so that we remember to move over to #2044.
Actually I think I will just close this PR for clarity.
CHANGELOG.md has been updated
--lint flag) - linting in progress.
docs/README.md is updated with link to below
docs/modulename.md is created
self.add_section)
cc @YuheCheng62 @ajaltomare