---
output: rmarkdown::html_document
title: "How to run seqsender with Compose"
vignette: >
  %\VignetteIndexEntry{How to run seqsender with Compose}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---
```{r, include=FALSE, echo=FALSE, message=FALSE, warning=FALSE}
# R libraries
library(yaml) # for yaml file
# Read in the DESCRIPTION file
description <- yaml::read_yaml("../DESCRIPTION")
# Define variables
program <- description$Package
# Define github repo
github_repo <- description$URL
# Define github pages URL
github_pages_url <- description$GITHUB_PAGES
```
<script type="text/javascript">
function ToggleOperation(id) {
  var x = document.getElementById(id+"-block");
  if (x.style.display === "none") {
    x.style.display = "block";
  } else {
    x.style.display = "none";
  }
}
</script>
**SOFTWARE REQUIREMENTS:**
- Linux (64-bit) or Mac OS X (64-bit)
- Git version 2.25.1 or later
- Docker version 20.10.14 or later
- Docker Compose version 2.21 or later
- Standard utilities: curl, tar, unzip
**ADDITIONAL REQUIREMENTS:**
See [PRE-REQUISITES](`r github_pages_url`/index.html#prerequisites) and [REQUIREMENT FILES](`r github_pages_url`/index.html#requirement-files) before proceeding to the next steps.
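The version requirements above can be checked from a terminal before going further. The following is a minimal sketch and not part of the software itself: the helper names `version_ge` and `missing_tools` are hypothetical, and the comparison assumes a `sort` that supports version sorting with `-V`.

```bash
# Hypothetical helper: succeeds when version $1 is greater than or equal to version $2.
# Assumes `sort -V` (version sort) is available on this system.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical helper: print every tool from the argument list that is not on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Example usage (uncomment to run):
# missing_tools git docker curl tar unzip
# version_ge "$(git --version | awk '{print $3}')" 2.25.1 && echo "git OK"
```

If `missing_tools` prints nothing, all listed tools were found. The same `version_ge` check can be applied to the Docker and Docker Compose versions once their version strings are extracted.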
### (1) Clone ``r program`` repo to your $HOME directory
``` bash
cd $HOME
git clone `r github_repo`.git
```
### (2) Navigate to ``r program`` folder where `docker-compose.yml` is stored and edit that file to link the data inputs to run ``r program``
``` bash
cd `r program`
```
Here is a quick look at the `docker-compose.yaml` file:
```yaml
version: "3.9"

x-data-volumes:
  &data-volume
  type: bind
  source: $HOME
  target: /data

services:
  seqsender:
    container_name: seqsender
    image: cdcgov/seqsender-dev:latest
    restart: always
    volumes:
      - *data-volume
    command: tail -f /dev/null
```
_**NOTE:** `source` is the storage location on your local machine, which is mapped to the `/data` directory inside the container. Here, the local `$HOME` directory is mounted at `/data`._
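To make the mapping concrete, the sketch below translates a path under the mounted `source` directory into the path the container sees under `/data`. The function name `to_container_path` is hypothetical and only illustrates the bind mount described above; the defaults assume `source` is `$HOME` and `target` is `/data`, as in the compose file shown.

```bash
# Hypothetical helper: map a host path under the bind-mount source
# to the matching path under the container target (/data by default).
to_container_path() {
  host_path=$1
  mount_source=${2:-$HOME}
  mount_target=${3:-/data}
  case "$host_path" in
    "$mount_source")   echo "$mount_target" ;;
    "$mount_source"/*) echo "$mount_target${host_path#"$mount_source"}" ;;
    *) echo "error: $host_path is not under $mount_source" >&2; return 1 ;;
  esac
}

# Example usage (uncomment to run):
# to_container_path "$HOME/flu-test-submission"
```

With the default mount, a folder at `$HOME/flu-test-submission` on the host is reachable as `/data/flu-test-submission` inside the container. Paths outside the mounted directory are not visible to the container at all, which is why the helper returns an error for them.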
### (3) Start up the ``r program`` container
```bash
docker-compose up -d
```
**`-d`**: run the container in detached mode <br>
For more information about the docker-compose syntax, see <a href="https://docs.docker.com/engine/reference/commandline/compose_up/" target="_blank">docker-compose up reference</a>
### (4) Check if the container is running
``` bash
docker container ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
b37b6b19c4e8 `r program`:latest "/bin/bash" 5 hours ago Up 5 hours `r program`
```
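For scripts, reading the table by eye is inconvenient. As a sketch, the same check can be done with `docker container inspect`, which reports whether a container is running. The function name `container_is_running` is hypothetical; the container name `seqsender` is taken from `container_name` in the compose file.

```bash
# Hypothetical helper: succeed only when the named container exists and is running.
# `docker container inspect` fails for unknown containers, so errors are silenced
# and treated as "not running".
container_is_running() {
  [ "$(docker container inspect --format '{{.State.Running}}' "$1" 2>/dev/null)" = "true" ]
}

# Example usage (uncomment to run):
# container_is_running seqsender && echo "seqsender is up" || echo "seqsender is not running"
```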
### (5) See a list of commands in ``r program`` container
``` bash
docker exec -it `r program` bash `r program`-kickoff --help
```
**`-t`**: allocate a pseudo-tty <br> **`-i`**: keep STDIN open even if not attached <br>
**`-h`**, **`--help`**: show help messages and exit
``` bash
usage: `r program`.py [-h]
{prep,submit,check_submission_status,template,version} ...
Automate the process of batch uploading consensus sequences and metadata to
databases of your choices
positional arguments:
{prep,submit,check_submission_status,template,version}
optional arguments:
-h, --help show this help message and exit
```
Rather than jumping straight into a `production` submission, we can use GISAID's and NCBI's **“TEST-SERVER”** to upload a `test` submission first. That way, submitters can familiarize themselves with the submission process before making a real submission.
**Note:** Duplicate test submissions will result in an error. Please create new sequence names each time you plan to run test submissions to avoid this issue.
### <a onclick="ToggleOperation('projects')"><i class="fas fa-play" role="presentation" aria-label="play icon"></i> Submit a `test` submission with a pre-processed dataset</a>
<div id="projects-block" style="display: none;">
Here we will go over the steps of preparing and batch uploading meta- and sequence-data to GISAID and NCBI databases using a pre-processed dataset provided with the software.
The `template` command outputs example metadata and config files that you can use as a basis for your own submission before uploading a real one. For more help on the command, run
```bash
docker exec -it seqsender bash seqsender-kickoff template --help
```
```bash
usage: seqsender.py template [-h] [--biosample] [--sra] [--genbank] [--gisaid]
--organism {FLU,COV} --submission_dir
SUBMISSION_DIR --submission_name SUBMISSION_NAME
Return a set of files (e.g., config file, metadata file, fasta files, etc.)
that are needed to make a submission
optional arguments:
-h, --help show this help message and exit
--biosample, -b Submit to BioSample. (default: )
--sra, -s Submit to SRA. (default: )
--genbank, -n Submit to Genbank. (default: )
--gisaid, -g Submit to GISAID. (default: )
--organism {FLU,COV} Type of organism data (default: FLU)
--submission_dir SUBMISSION_DIR
Directory to where all required files (such as
metadata, fasta, etc.) are stored (default: None)
--submission_name SUBMISSION_NAME
Name of the submission (default: None)
```
<br>
#### 1. Download the pre-processed meta- and sequence-data
```bash
docker exec -it seqsender bash seqsender-kickoff template \
--organism FLU \
-bsng \
--submission_dir /data \
--submission_name flu-test-submission
```
- **`--organism`** specifies the type of data to download. Currently, **Influenza A Virus** (FLU) and **SARS-COV-2** (COV) are the only two options. Datasets for other organisms will be added in future updates or on request. <br>
- **`-bsng`** is a combination flag of databases: **Biosample** *(`-b` or `--biosample`)*, **SRA** *(`-s` or `--sra`)*, **Genbank** *(`-n` or `--genbank`)*, and **GISAID** *(`-g` or `--gisaid`)*. This combination flag tells ``r program`` to generate unified meta- and sequence-data in one file so that we can batch upload to all databases simultaneously. <br>
- **`--submission_dir`** is the directory where all submission histories are stored (e.g. `/data`, which maps to our `$HOME` directory). <br>
- **`--submission_name`** is the submission folder inside the `--submission_dir` directory. It contains all the files needed to make a submission (such as *config.yaml*, *metadata.csv*, *sequence.fasta*, *raw reads*, etc.).
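The combined `-bsng` flag can also be spelled out. As a sketch, the wrapper below runs the same `template` command with the long option names listed in the help output. The function name `make_template` is hypothetical, and the example assumes the long and short forms behave identically, as the help text suggests.

```bash
# Hypothetical wrapper around the documented `template` command, spelling out
# each database flag instead of using the combined -bsng form.
#   $1: organism (FLU or COV)
#   $2: submission name (folder created under /data)
make_template() {
  docker exec -it seqsender bash seqsender-kickoff template \
    --organism "$1" \
    --biosample --sra --genbank --gisaid \
    --submission_dir /data \
    --submission_name "$2"
}

# Example usage (uncomment to run):
# make_template FLU flu-test-submission
```

Dropping one of the long flags would restrict the template to the remaining databases, for example omitting `--gisaid` for an NCBI-only submission.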
A quick look at the output files:
![](images/submission_dir.png)
Here is the standard output of the command.
```bash
Generating submission template
Files are stored at: /data/flu-test-submission
Total runtime (HRS:MIN:SECS): 0:00:00.115140
```
#### 2. Set up the config file -- `config.yaml`
After the template is downloaded in step 1, you can find `config.yaml` in your local `$HOME/flu-test-submission` directory. The `config.yaml` file provides a brief description of the submission and contains the user credentials that allow ``r program`` to authenticate with each database before uploading a submission.
Open that file with a text editor of your choice and fill in the appropriate information about your submission.
![](images/config_file.png)
:::{style="padding: 10px; border: 1px solid blue !important;"}
<i class="fas fa-triangle-exclamation" role="presentation" aria-label="triangle-exclamation icon"></i> **NOTE:** <br>
- To submit to NCBI only, remove the **GISAID Submission (b)** section from the config file. Conversely, to submit to GISAID only, remove the **NCBI Submission (a)** section. <br>
- **Submission_Position** determines the order in which the databases receive the submission. For instance, if GISAID is set as `1`, ``r program`` will submit to GISAID first; then, after all samples are assigned a GISAID accession number, ``r program`` will proceed to submit to NCBI. This order of submission ensures samples are linked correctly between the two databases after submission. <br>
- **Username** and **Password** under the **NCBI Submission (a)** section are the credentials used to authenticate with the **NCBI FTP Server** (not to be confused with an individual NCBI account). See [PRE-REQUISITES](`r github_pages_url`/articles/index.html#prerequisites) for more details.
:::
:::{style="padding: 10px; border: 1px solid blue !important;"}
<i class="fas fa-triangle-exclamation" role="presentation" aria-label="triangle-exclamation icon"></i> **ADDITIONAL REQUIREMENTS:** <br>
- If **SRA** is in your list of submitting databases, the raw reads for all samples must be provided and stored in a subfolder called `raw_reads` inside your submission directory of choice.
- If **GISAID** is in your list of submitting databases, download the CLI package associated with your organism of interest (e.g., <a href="images/fluCLI_download.png" target="_blank">**Influenza A Virus** (FLU)</a> or <a href="images/covCLI_download.png" target="_blank">**SARS-COV-2** (COV)</a>) from the GISAID platform and store it in a subfolder called `gisaid_cli` inside your submission directory of choice.
Here is where to store the downloaded **GISAID CLI** package:
![](images/gisaid_cli_dir.png)
_**Important:** Make sure your CLI binary is executable. To grant executable permissions, run_
```bash
chmod a+x <your_gisaid_cli_binary>
```
:::
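The additional requirements above can be verified with a short pre-flight check. The following is only a sketch: the function name `check_submission_extras` is hypothetical, it assumes both SRA and GISAID are among the target databases, it assumes `raw_reads` and `gisaid_cli` sit directly inside the submission folder, and it treats every regular file in `gisaid_cli` as something that should be executable, which may be stricter than needed.

```bash
# Hypothetical pre-flight check for a submission folder.
# Prints one line per problem found and returns non-zero if there is any.
check_submission_extras() {
  dir=$1
  status=0
  if [ ! -d "$dir/raw_reads" ]; then
    echo "missing folder: $dir/raw_reads (needed for SRA)"
    status=1
  fi
  if [ ! -d "$dir/gisaid_cli" ]; then
    echo "missing folder: $dir/gisaid_cli (needed for GISAID)"
    status=1
  else
    # Flag regular files in gisaid_cli that lack execute permission.
    for f in "$dir"/gisaid_cli/*; do
      if [ -f "$f" ] && [ ! -x "$f" ]; then
        echo "not executable: $f (fix with chmod a+x)"
        status=1
      fi
    done
  fi
  return $status
}

# Example usage (uncomment to run):
# check_submission_extras "$HOME/flu-test-submission" && echo "ready to submit"
```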
<br>
#### 3. Upload a test submission
```bash
docker exec -it seqsender bash seqsender-kickoff submit \
--organism FLU \
-bsng \
--submission_dir /data \
--submission_name flu-test-submission \
--config_file config.yaml \
--metadata_file metadata.csv \
--fasta_file sequence.fasta \
--test
```
- **`--organism`** specifies the type of data to upload. Currently, **Influenza A Virus** (FLU) and **SARS-COV-2** (COV) are the only two options. <br>
- **`-bsng`** is a combination flag of databases: **Biosample** *(`-b` or `--biosample`)*, **SRA** *(`-s` or `--sra`)*, **Genbank** *(`-n` or `--genbank`)*, and **GISAID** *(`-g` or `--gisaid`)*. This combination flag tells ``r program`` to prep and submit to each given database. See `docker exec -it seqsender bash seqsender-kickoff submit --help` for more details. <br>
- **`--submission_dir`** is the directory where all submission histories are stored (e.g. `/data`, which maps to our `$HOME` directory). <br>
- **`--submission_name`** is the submission folder inside the `--submission_dir` directory. It contains all the files needed to make a submission (such as *config.yaml*, *metadata.csv*, *sequence.fasta*, *raw reads*, etc.). <br>
- **`--config_file`** is the config file inside the `--submission_name` directory. <br>
- **`--metadata_file`** is the metadata file inside the `--submission_name` directory. <br>
- **`--fasta_file`** is the fasta file inside the `--submission_name` directory. <br>
- **`--test`** is used to submit to the **“TEST-SERVER ONLY”**. For a `production` submission, remove this flag.
A quick look at the standard output.
```bash
Creating submission files for BIOSAMPLE
Files are stored at: /data/flu-test-submission/submission_files/BIOSAMPLE
Creating submission files for SRA
Files are stored at: /data/flu-test-submission/submission_files/SRA
Creating submission files for GENBANK
Files are stored at: /data/flu-test-submission/submission_files/GENBANK
Creating submission files for GISAID
Files are stored at: /data/flu-test-submission/submission_files/GISAID
Uploading submission files to NCBI-BIOSAMPLE
Performing a 'Test' submission
If this is not a 'Test' submission, interrupts submission immediately.
Connecting to NCBI FTP Server
Submission name: flu-test-submission
Submitting 'flu-test-submission'
Uploading submission files to NCBI-SRA
Performing a 'Test' submission
If this is not a 'Test' submission, interrupts submission immediately.
Connecting to NCBI FTP Server
Submission name: flu-test-submission
Submitting 'flu-test-submission'
Uploading submission files to GISAID-FLU
Performing a 'Test' submission with Client-Id: TEST-EA76875B00C3
If this is not a 'Test' submission, interrupts submission immediately.
Submission attempt: 1
Uploading successfully
Status report is stored at: /data/flu-test-submission/submission_report_status.csv
Log file is stored at: /data/flu-test-submission/submission_files/GISAID/gisaid_upload_log_attempt_1.txt
```
#### 4. Check the status of a submission
After a submission is uploaded, you can periodically check its status.
```bash
docker exec -it seqsender bash seqsender-kickoff check_submission_status \
--organism FLU \
--submission_dir /data \
--submission_name flu-test-submission \
--test
```
- **`--organism`** specifies the type of data. Currently, **Influenza A Virus** (FLU) and **SARS-COV-2** (COV) are the only two options. <br>
- **`--submission_dir`** is the directory where you store all of the submission histories. <br>
- **`--submission_name`** is the submission folder inside the `--submission_dir` directory. It contains all the files needed to make a submission (such as *config.yaml*, *metadata.csv*, *sequence.fasta*, *raw reads*, etc.). <br>
- **`--test`** is used for **“TEST-SERVER ONLY”** submissions. For a `production` submission, remove this flag.
Here is a quick look at the standard output:
```bash
Checking submission status for:
Submission name: flu-test-submission
Submission organism: FLU
Submission type: Test
Submission database: GISAID
Submission status: processed-ok
Submission database: BIOSAMPLE
Pulling down report.xml
Submission status: submitted
Submission database: SRA
Pulling down report.xml
Submission status: submitted
Submission database: GENBANK
Submission status: ---
Total runtime (HRS:MIN:SECS): 0:00:08.213955
```
Here is how the overall submission status is determined from the statuses of its individual actions:
> 1. If at least one action has **Processed-error**, submission status is **Processed-error** <br>
> 2. Otherwise if at least one action has **Processing** state, the whole submission is **Processing** <br>
> 3. Otherwise, if at least one action has **Queued** state, the whole submission is **Queued** <br>
> 4. Otherwise, if at least one action has **Deleted** state, the whole submission is **Deleted** <br>
> 5. If all actions have **Processed-ok**, submission status is **Processed-ok** <br>
> 6. Otherwise submission status is **Submitted**
<br>
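The precedence rules above can be sketched as a small function. This is an illustration only, not the tool's own implementation: the function name `overall_status` is hypothetical, and it expects the status strings exactly as written in the list (the tool's output may use lowercase, in which case the inputs would need to be normalized first).

```bash
# Hypothetical helper: derive the overall submission status from the
# statuses of its individual actions, following the precedence rules above.
overall_status() {
  if [ "$#" -eq 0 ]; then
    echo "error: at least one action status is required" >&2
    return 1
  fi
  all_ok=1
  has_error=0; has_processing=0; has_queued=0; has_deleted=0
  for s in "$@"; do
    case "$s" in
      Processed-error) has_error=1 ;;
      Processing)      has_processing=1 ;;
      Queued)          has_queued=1 ;;
      Deleted)         has_deleted=1 ;;
    esac
    # Any status other than Processed-ok rules out rule 5.
    [ "$s" = "Processed-ok" ] || all_ok=0
  done
  if   [ "$has_error" -eq 1 ];      then echo "Processed-error"
  elif [ "$has_processing" -eq 1 ]; then echo "Processing"
  elif [ "$has_queued" -eq 1 ];     then echo "Queued"
  elif [ "$has_deleted" -eq 1 ];    then echo "Deleted"
  elif [ "$all_ok" -eq 1 ];         then echo "Processed-ok"
  else                                   echo "Submitted"
  fi
}

# Example usage (uncomment to run):
# overall_status Processed-ok Queued Processing
```

In the example call, one action is still processing and another is queued, so rule 2 applies before rule 3 and the overall status is **Processing**.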
</div>
### <a onclick="ToggleOperation('chemicals')"><i class="fas fa-play" role="presentation" aria-label="play icon"></i> Submit a `test` submission with your own dataset</a>
<div id="chemicals-block" style="display: none;">
Before you can perform a `test` submission with your own dataset, make sure you have the required files (such as **config.yaml**, **metadata.csv**, **sequence.fasta**, **raw reads**, etc.) already prepared and stored in the submission directory of your choice.
<br>
#### 1. Assemble your meta- and sequence-data
(a) To prep for FLU submissions, select one of the databases below for more details
> <a href="`r github_pages_url`/articles/biosample_submission.html" target="_blank">BioSample</a> <br>
> <a href="`r github_pages_url`/articles/sra_submission.html" target="_blank">SRA</a> <br>
> <a href="`r github_pages_url`/articles/genbank_submission.html" target="_blank">Genbank</a> <br>
> <a href="`r github_pages_url`/articles/gisaid_flu_required_fields.html" target="_blank">GISAID</a> <br>
> <a href="`r github_pages_url`/articles/multiple_databases_submission.html" target="_blank">Multiple databases</a>
(b) To prep for COV submissions, select one of the databases below for more details
> <a href="`r github_pages_url`/articles/biosample_submission.html" target="_blank">BioSample</a> <br>
> <a href="`r github_pages_url`/articles/sra_submission.html" target="_blank">SRA</a> <br>
> <a href="`r github_pages_url`/articles/genbank_submission.html" target="_blank">Genbank</a> <br>
> <a href="`r github_pages_url`/articles/gisaid_cov_submission.html" target="_blank">GISAID</a> <br>
> <a href="`r github_pages_url`/articles/multiple_databases_submission.html" target="_blank">Multiple databases</a>
After you have finished prepping for your databases of choice in `(a)` or `(b)`, create a submission folder and store all your metadata and sequence files there.
Here is a quick look at the folder structure
![](images/submission_dir.png)
Finally, make sure additional requirements below are met before you can proceed to the next steps.
:::{style="padding: 10px; border: 1px solid blue !important;"}
- If **SRA** is in your list of submitting databases, the raw reads for all samples must be provided and stored in a subfolder called `raw_reads` inside your submission directory of choice.
- If **GISAID** is in your list of submitting databases, download the CLI package associated with your organism of interest (e.g., <a href="images/fluCLI_download.png" target="_blank">**Influenza A Virus** (FLU)</a> or <a href="images/covCLI_download.png" target="_blank">**SARS-COV-2** (COV)</a>) from the GISAID platform and store it in a subfolder called `gisaid_cli` inside your submission directory of choice.
Here is an example of where to place the **GISAID CLI** package.
![](images/gisaid_cli_dir.png)
_**Important:** Make sure your CLI binary is executable. To grant executable permissions, run_
```bash
chmod a+x <your_gisaid_cli_binary>
```
:::
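When starting from your own data, the folder layout can be created in one step. The sketch below is hypothetical (the function name `init_submission_dir` is not part of the software) and assumes the `raw_reads` and `gisaid_cli` subfolders belong directly inside the submission folder.

```bash
# Hypothetical helper: create a submission folder with the two optional
# subfolders described above (raw_reads for SRA, gisaid_cli for GISAID).
init_submission_dir() {
  mkdir -p "$1/raw_reads" "$1/gisaid_cli"
}

# Example usage (uncomment to run):
# init_submission_dir "$HOME/flu-test-submission"
```

After creating the folders, copy your config, metadata, and FASTA files into the submission folder, the raw reads into `raw_reads`, and the GISAID CLI binary into `gisaid_cli`.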
<br>
#### 2. Upload a test submission
After all files from step 1 are prepared, we can go ahead and upload the submission.
```bash
docker exec -it seqsender bash seqsender-kickoff submit \
--organism FLU \
-bsng \
--submission_dir /data \
--submission_name flu-test-submission \
--config_file config.yaml \
--metadata_file metadata.csv \
--fasta_file sequence.fasta \
--test
```
- **`--organism`** specifies the type of data to upload. Currently, **Influenza A Virus** (FLU) and **SARS-COV-2** (COV) are the only two options. <br>
- **`-bsng`** is a combination flag of databases: **Biosample** *(`-b` or `--biosample`)*, **SRA** *(`-s` or `--sra`)*, **Genbank** *(`-n` or `--genbank`)*, and **GISAID** *(`-g` or `--gisaid`)*. This combination flag tells ``r program`` to prep and submit to each given database. See `docker exec -it seqsender bash seqsender-kickoff submit --help` for more details. <br>
- **`--submission_dir`** is the directory where all submission histories are stored (e.g. `/data`, which maps to our `$HOME` directory). <br>
- **`--submission_name`** is the submission folder inside the `--submission_dir` directory. It contains all the files needed to make a submission (such as *config.yaml*, *metadata.csv*, *sequence.fasta*, *raw reads*, etc.). <br>
- **`--config_file`** is the config file inside the `--submission_name` directory. <br>
- **`--metadata_file`** is the metadata file inside the `--submission_name` directory. <br>
- **`--fasta_file`** is the fasta file inside the `--submission_name` directory. <br>
- **`--test`** is used to submit to the **“TEST-SERVER ONLY”**. For a `production` submission, remove this flag.
A quick look at the standard output.
```bash
Creating submission files for BIOSAMPLE
Files are stored at: /data/flu-test-submission/submission_files/BIOSAMPLE
Creating submission files for SRA
Files are stored at: /data/flu-test-submission/submission_files/SRA
Creating submission files for GENBANK
Files are stored at: /data/flu-test-submission/submission_files/GENBANK
Creating submission files for GISAID
Files are stored at: /data/flu-test-submission/submission_files/GISAID
Uploading submission files to NCBI-BIOSAMPLE
Performing a 'Test' submission
If this is not a 'Test' submission, interrupts submission immediately.
Connecting to NCBI FTP Server
Submission name: flu-test-submission
Submitting 'flu-test-submission'
Uploading submission files to NCBI-SRA
Performing a 'Test' submission
If this is not a 'Test' submission, interrupts submission immediately.
Connecting to NCBI FTP Server
Submission name: flu-test-submission
Submitting 'flu-test-submission'
Uploading submission files to GISAID-FLU
Performing a 'Test' submission with Client-Id: TEST-EA76875B00C3
If this is not a 'Test' submission, interrupts submission immediately.
Submission attempt: 1
Uploading successfully
Status report is stored at: /data/flu-test-submission/submission_report_status.csv
Log file is stored at: /data/flu-test-submission/submission_files/GISAID/gisaid_upload_log_attempt_1.txt
```
#### 3. Check the status of a submission
After a submission is uploaded, you can periodically check its status.
```bash
docker exec -it seqsender bash seqsender-kickoff check_submission_status \
--organism FLU \
--submission_dir /data \
--submission_name flu-test-submission \
--test
```
- **`--organism`** specifies the type of data. Currently, **Influenza A Virus** (FLU) and **SARS-COV-2** (COV) are the only two options. <br>
- **`--submission_dir`** is the directory where all submission histories are stored (e.g. `/data`, which maps to our `$HOME` directory). <br>
- **`--submission_name`** is the submission folder inside the `--submission_dir` directory. It contains all the files needed to make a submission (such as *config.yaml*, *metadata.csv*, *sequence.fasta*, *raw reads*, etc.). <br>
- **`--test`** is used for **“TEST-SERVER ONLY”** submissions. For a `production` submission, remove this flag.
Here is a quick look at the standard output:
```bash
Checking submission status for:
Submission name: flu-test-submission
Submission organism: FLU
Submission type: Test
Submission database: GISAID
Submission status: processed-ok
Submission database: BIOSAMPLE
Pulling down report.xml
Submission status: submitted
Submission database: SRA
Pulling down report.xml
Submission status: submitted
Submission database: GENBANK
Submission status: ---
Total runtime (HRS:MIN:SECS): 0:00:08.213955
```
Here is how the overall submission status is determined from the statuses of its individual actions:
> 1. If at least one action has **Processed-error**, submission status is **Processed-error** <br>
> 2. Otherwise if at least one action has **Processing** state, the whole submission is **Processing** <br>
> 3. Otherwise, if at least one action has **Queued** state, the whole submission is **Queued** <br>
> 4. Otherwise, if at least one action has **Deleted** state, the whole submission is **Deleted** <br>
> 5. If all actions have **Processed-ok**, submission status is **Processed-ok** <br>
> 6. Otherwise submission status is **Submitted**
<br>
</div>
<br><br><br>
Any questions or issues? Please report them on our <a href="`r github_repo`/issues" target="_blank">GitHub issue tracker</a>.
<br>