|
1 | | -# 5. SARS-CoV-2 analysis {.unnumbered} |
| 1 | +# 5. Nextclade CLI {.unnumbered} |
2 | 2 |
|
3 | | -If you are dealing with SARS-Cov-2 data, then you can run the [pangolin software](https://github.com/cov-lineages/pangolin) to submit your SARS-CoV-2 genome sequences which then are compared with other genome sequences and assigned the most likely lineage. |
| 3 | +You can use the command-line version of [Nextclade](https://docs.nextstrain.org/projects/nextclade/en/stable/) to identify differences between query sequences and a reference sequence and to assign query sequences to clades. |
| 4 | + |
| 5 | +Nextclade can use official and community datasets, which are maintained at [github.com/nextstrain/nextclade_data](https://github.com/nextstrain/nextclade_data). You can also create your own dataset and use it with Nextclade. For more information on how to create your own dataset, visit [https://docs.nextstrain.org/projects/nextclade/en/stable/user/datasets.html](https://docs.nextstrain.org/projects/nextclade/en/stable/user/datasets.html). |
| 6 | + |
| 7 | +## 5.1 How to use nextclade {.unnumbered} |
| 8 | + |
| 9 | +List all official and community datasets: |
| 10 | +``` |
| 11 | +nextclade dataset list |
| 12 | +``` |
| 13 | + |
| 14 | +Download an official or community dataset: |
| 15 | +``` |
| 16 | +nextclade dataset get --name '{dataset}' --output-dir '{output}' |
| 17 | +``` |
| 18 | + |
| 19 | +- `{dataset}` is the name of a dataset. |
| 20 | +- `{output}` is the location where the dataset will be downloaded. |
| 21 | + |
| 22 | +Run nextclade: |
| 23 | +``` |
| 24 | +nextclade run \ |
| 25 | +--input-dataset {dataset} \ |
| 26 | +--output-all={output}/ \ |
| 27 | +{sequences} |
| 28 | +``` |
| 29 | + |
| 30 | +- `{dataset}` is the path to either a downloaded official/community dataset or a custom-made dataset. |
| 31 | +- `{output}` is the folder where all the output files will be stored. |
| 32 | +- `{sequences}` is the FASTA file that contains all of your consensus sequences. |
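
The download and run steps above can be chained in a small helper. The following is a minimal sketch, not part of the workflow itself: the function name `run_nextclade` and all paths in the example call are hypothetical, and it assumes `nextclade` is available on your `PATH`.

```bash
# Hypothetical wrapper: download a dataset once, then run Nextclade on a FASTA file.
# Usage: run_nextclade <dataset_name> <dataset_dir> <output_dir> <sequences.fasta>
run_nextclade() {
    local dataset_name="$1" dataset_dir="$2" output_dir="$3" sequences="$4"

    # Download the dataset only if the target folder does not exist yet.
    if [ ! -d "$dataset_dir" ]; then
        nextclade dataset get --name "$dataset_name" --output-dir "$dataset_dir" || return 1
    fi

    # Analyse the sequences and write all output files to the output folder.
    nextclade run \
        --input-dataset "$dataset_dir" \
        --output-all="$output_dir/" \
        "$sequences"
}

# Example call (all values are placeholders, pick a name from `nextclade dataset list`):
# run_nextclade '{dataset}' datasets/my_dataset results/nextclade consensus.fasta
```

Skipping the download when the dataset folder already exists keeps repeated runs on the same dataset version, which may or may not be what you want; delete the folder to fetch a fresh copy.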
| 33 | + |
| 34 | +::: callout-important |
| 35 | +When using Nextclade, make sure that the reference sequence of the dataset is exactly the same as the reference you used to generate the consensus sequences in chapter 3. |
| 36 | +::: |
| 37 | + |
| 38 | +## 5.2 Custom nextclade visualisation {.unnumbered} |
| 39 | + |
| 40 | +If your Nextclade dataset contains a GFF3 annotation file for the reference sequence, then you can use the [viz_nextclade_cli.R](https://github.com/LucvZon/nanopore-amplicon-analysis-manual/tree/main/scripts) script to visualize the amino acid mutations per genetic feature. |
4 | 41 |
|
5 | 42 | Execute the following: |
6 | | -```bash |
7 | | -pangolin {input} --outfile {output} |
| 43 | +``` |
| 44 | +Rscript viz_nextclade_cli.R \ |
| 45 | +--nextclade-input-dir {input_dir} \ |
| 46 | +--json-file {input.json} \ |
| 47 | +--plotly-output-dir {plotly_output_dir} \ |
| 48 | +--ggplotly-output-dir {ggplotly_output_dir} |
8 | 49 | ``` |
9 | 50 |
|
10 | | -- `{input}` is your aggregated consensus fasta file from step X.X. |
11 | | -- `{output}` is a .csv file that contains taxon name and lineage assigned per fasta sequence. Read more about the output format: [https://cov-lineages.org/resources/pangolin/output.html](https://cov-lineages.org/resources/pangolin/output.html) |
| 51 | +- `{input_dir}` is the output folder from the nextclade run (step 5.1). |
| 52 | +- `{input.json}` is the nextclade.json file that should be present in the output folder from the nextclade run (step 5.1). |
| 53 | +- `{plotly_output_dir}` is the folder where the HTML plots made with plotly will be stored. |
| 54 | +- `{ggplotly_output_dir}` is the folder where the HTML plots made with ggplotly will be stored. |
12 | 55 |
|
| 56 | +A plot is generated for each genetic feature of the reference sequence. The script currently outputs both a plotly and a ggplotly version; use whichever looks best to you. |
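
Because the script depends on the nextclade.json file from step 5.1, it can help to check for that file before plotting. The following is a minimal sketch under stated assumptions: the function name `viz_nextclade` is hypothetical, `--json-file` is assumed to accept the full path to nextclade.json, and the script is assumed to be in the current working directory.

```bash
# Hypothetical wrapper around the viz_nextclade_cli.R command.
# Usage: viz_nextclade <nextclade_output_dir> <plotly_output_dir> <ggplotly_output_dir>
viz_nextclade() {
    local input_dir="$1" plotly_dir="$2" ggplotly_dir="$3"
    # Assumption: --json-file takes the full path to the JSON file.
    local json_file="$input_dir/nextclade.json"

    # Stop early if the Nextclade run did not produce a non-empty JSON file.
    if [ ! -s "$json_file" ]; then
        echo "Missing or empty file: $json_file" >&2
        return 1
    fi

    # Create the plot folders in case the script does not create them itself.
    mkdir -p "$plotly_dir" "$ggplotly_dir"

    Rscript viz_nextclade_cli.R \
        --nextclade-input-dir "$input_dir" \
        --json-file "$json_file" \
        --plotly-output-dir "$plotly_dir" \
        --ggplotly-output-dir "$ggplotly_dir"
}
```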
13 | 57 |
|
14 | | -## To be added... |
| 58 | +## 5.3 Pangolin (redundant) {.unnumbered} |
15 | 59 |
|
16 | | -Here are some of the snakemake rules that are currently excluded: |
| 60 | +If you are dealing with SARS-CoV-2 data, you can run the [pangolin software](https://github.com/cov-lineages/pangolin) on your SARS-CoV-2 genome sequences. Pangolin compares them with other genome sequences and assigns each one the most likely lineage. |
17 | 61 |
|
18 | | -- create_depth_file |
19 | | -- create_vcf |
20 | | -- annotate_vcf |
21 | | -- filter_vcf |
22 | | -- create_filtered_vcf_tables |
| 62 | +As Nextclade already performs the pangolin classification step for you, running pangolin in addition to Nextclade has become redundant. However, if you still want to run it manually, execute the following: |
| 63 | +```bash |
| 64 | +pangolin {input} --outfile {output} |
| 65 | +``` |
23 | 66 |
|
24 | | -These rules are exclusively for analysis of SARS-Cov-2 data and will be implemented into the container workflow in the near future. |
| 67 | +- `{input}` is your aggregated consensus fasta file from step X.X. |
| 68 | +- `{output}` is a .csv file that contains taxon name and lineage assigned per fasta sequence. Read more about the output format: [https://cov-lineages.org/resources/pangolin/output.html](https://cov-lineages.org/resources/pangolin/output.html) |
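
To get a quick overview of the assigned lineages, the output file can be summarised on the command line. The following is a minimal sketch: the function name `count_lineages` is hypothetical, and it assumes the .csv has a header row with a `lineage` column and that no field before that column contains a comma.

```bash
# Hypothetical helper: count how many sequences were assigned to each lineage.
# Usage: count_lineages <pangolin_output.csv>
count_lineages() {
    awk -F',' '
        # Header row: find the position of the "lineage" column.
        NR == 1 { for (i = 1; i <= NF; i++) if ($i == "lineage") col = i; next }
        # Data rows: tally the lineage value (skipped if no such column exists).
        col     { counts[$col]++ }
        END     { for (l in counts) print counts[l] "\t" l }
    ' "$1" | sort -k1,1nr -k2,2   # most frequent lineage first
}
```

Each output line holds a count and a lineage name, separated by a tab.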
25 | 69 |
|
26 | 70 | ::: {.callout-note} |
27 | 71 | You can now move to the final chapter to automate all of the steps we’ve previously discussed. |
|