Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feat] Deprecate nextclade json parser, add github actions, and update task names #2

Merged
merged 37 commits into from
Jul 1, 2024

Conversation

danpolanco
Copy link
Collaborator

@danpolanco danpolanco commented Jun 2, 2024

This PR:

Aim, context, and functionality 🎯

We are removing the nextclade json parser in favor of including the Nextclade results in the summary file. We are also improving WDL task names and variables.

Workflow Changes ✅

Upstream effects

None.

Input changes

None.

Output changes

  • nextclade_json_parser.py has been removed and the the output file it produced as well.
  • summary.py now includes the columns needed that were originally output by nextclade_json_parser.py.

Downstream effects

None yet as we haven't included RSV results in production datasets or Horizon.

Testing 🛠️

None at the moment.

Developer Checklist 👷‍♀️

  • Prior to development, issues were discussed with the bioinformatics team members and approved
  • Code has been refactored to sufficiently address the issues this pull request closes
  • Testing was performed and the results from testing match the expected results
  • All code changes match our style guide
  • README has been updated to reflect changes
  • Workflow diagrams in READMEs have been updated to reflect changes

Reviewer Checklist 🔍

  • Met with developer to review all changes and testing performed
  • Code refactoring sufficiently address the issue(s) that this pull request closes
  • All code meets style guide criteria
  • Confirm testing was sufficient (e.g. correct dataset was used for testing, results match the expected results). If not, the developer will perform additional testing which should be documented in the testing section above.
  • New version release number has been decided upon (if applicable)

@danpolanco danpolanco force-pushed the feat/deprecate_nextclade_json_parser/dlp branch from 23e7e4d to 48f0d4a Compare June 7, 2024 18:27
@danpolanco danpolanco changed the title Feat/deprecate nextclade json parser/dlp [Feat] deprecate nextclade json parser Jun 7, 2024
def make_wgs_horizon_output(results_df: pd.DataFrame, project_name: str) -> None:
"""Make wgs horizon report."""
results_df["report_to_epi"] = ""
results_df["Run_Date"] = str(date.today())
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this supposed to be "Run_Date" instead of "run_date"?

ref_genome_length = len(ref_genome.seq)

Ns = 0
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added initialization to prevent mypy type errors. Also, good practice to declare it in the outer scope here since we use it outside the following if statements.

num_non_ambiguous_bases = np.NaN
coverage = np.NaN
aligned_bases = np.NaN # type: ignore
Ns = np.NaN # type: ignore
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure how to handle the type issues where we assign some_variable = np.NaN. I get an error from mypy error: Incompatible types in assignment (expression has type "float", variable has type "int").

@danpolanco
Copy link
Collaborator Author

This pull request now also partially addresses #3 where mypy complains about type issues.

@danpolanco danpolanco changed the title [Feat] deprecate nextclade json parser [Feat] Deprecate nextclade json parser, add github actions, and update task names Jun 7, 2024
@@ -17,156 +18,39 @@ workflow RSV_illumina_pe_summary {
File concat_seq_results_py
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be concatenate_seq_results_py or maybe should we shorten it everwhere?

@danpolanco danpolanco removed the request for review from molly-hetheringtonrauth June 12, 2024 21:31
@danpolanco danpolanco marked this pull request as ready for review June 14, 2024 19:25
@danpolanco
Copy link
Collaborator Author

@molly-hetheringtonrauth , I tested this in Terra and it works and results match what we expect. I don't think we need a individual meeting about these changes and can just merge them in if that works for you. I'll be making more changes after this pull request to add versioning and fix any Horizon issues anyway.

@danpolanco danpolanco merged commit b022c16 into main Jul 1, 2024
1 check passed
@danpolanco danpolanco deleted the feat/deprecate_nextclade_json_parser/dlp branch July 1, 2024 16:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
1 participant