Creating demo
A demo explaining the usage of mutacc is now available.
adrosenbaum committed Jun 24, 2019
1 parent 49d632d commit 01ce673
Showing 17 changed files with 132 additions and 14 deletions.
40 changes: 40 additions & 0 deletions demo/case.yaml
@@ -0,0 +1,40 @@
#Minimalistic example of case yaml file:

#case:
#The only required field for case is 'case_id'.
case:
    case_id: 'demo_trio'

#samples
#A list of samples. For each of the samples a 'sample_id' must be given, together
#with pedigree information ('mother', 'father'). A list of fastq files and a bam file
#must be given. Each case can be filled with an arbitrary amount of meta data
#for each sample
samples:
    - sample_id: 'child'
      analysis_type: wgs
      sex: male
      phenotype: affected
      mother: 'mother'
      father: 'father'
      bam_file: 'demo/child.bam'

    - sample_id: 'father'
      analysis_type: wgs
      sex: male
      phenotype: unaffected
      mother: '0'
      father: '0'
      bam_file: 'demo/father.bam'

    - sample_id: 'mother'
      analysis_type: wgs
      sex: female
      phenotype: unaffected
      mother: '0'
      father: '0'
      bam_file: 'demo/mother.bam'

#variant:
#'variant_id', 'chromosome', 'position', 'alt', 'ref' must be given.
variants: 'demo/variant1.vcf.gz'
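
The constraints described in the comments above can be sketched as a small check. This is only an illustration: it assumes the YAML has already been loaded into a plain dictionary, the helper name `check_case` is hypothetical, and the rules are read off the demo file rather than taken from MutAcc's own validation code.

```python
def check_case(case):
    """Return a list of problems found in a case dictionary (hypothetical helper)."""
    problems = []
    if not case.get("case", {}).get("case_id"):
        problems.append("case.case_id is missing")

    samples = case.get("samples", [])
    sample_ids = {s.get("sample_id") for s in samples}
    for sample in samples:
        sid = sample.get("sample_id")
        if not sid:
            problems.append("a sample has no sample_id")
        if not sample.get("bam_file"):
            problems.append(f"{sid}: bam_file is missing")
        # '0' marks an unknown parent, as for the father and mother in the demo trio.
        for parent in ("mother", "father"):
            ref = sample.get(parent, "0")
            if ref != "0" and ref not in sample_ids:
                problems.append(f"{sid}: {parent} '{ref}' is not a sample in this case")

    if not case.get("variants"):
        problems.append("variants is missing")
    return problems


demo_case = {
    "case": {"case_id": "demo_trio"},
    "samples": [
        {"sample_id": "child", "mother": "mother", "father": "father",
         "bam_file": "demo/child.bam"},
        {"sample_id": "father", "mother": "0", "father": "0",
         "bam_file": "demo/father.bam"},
        {"sample_id": "mother", "mother": "0", "father": "0",
         "bam_file": "demo/mother.bam"},
    ],
    "variants": "demo/variant1.vcf.gz",
}

print(check_case(demo_case))  # prints []
```

A check like this would flag, for example, a child whose `mother` field names a sample that is not listed in the case.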
Binary file added demo/child.bam
Binary file not shown.
Binary file added demo/child.bam.bai
Binary file not shown.
Binary file added demo/child_R1.fastq.gz
Binary file not shown.
Binary file added demo/child_R2.fastq.gz
Binary file not shown.
Binary file added demo/father.bam
Binary file not shown.
Binary file added demo/father.bam.bai
Binary file not shown.
Binary file added demo/father_R1.fastq.gz
Binary file not shown.
Binary file added demo/father_R2.fastq.gz
Binary file not shown.
Binary file added demo/mother.bam
Binary file not shown.
Binary file added demo/mother.bam.bai
Binary file not shown.
Binary file added demo/mother_R1.fastq.gz
Binary file not shown.
Binary file added demo/mother_R2.fastq.gz
Binary file not shown.
Binary file added demo/variant1.vcf.gz
Binary file not shown.
Binary file added demo/variant1.vcf.gz.tbi
Binary file not shown.
70 changes: 70 additions & 0 deletions docs/demo.md
@@ -0,0 +1,70 @@
# MutAcc Demo

To give an intuition for how MutAcc works, this demo provides a quick walkthrough of the main features.

Files that can be used in this demo are found in the demo/ folder in the root of this repository. These are simulated fastq and bam files that represent a father/mother/child trio. The child has a de novo heterozygous SNV, 7:117235031 G->A. This demo shows how this family can be uploaded into the mutacc database.

In this demo, the flag ```-d/--demo``` is given directly after the main command, i.e. ```mutacc --demo [subcommands] [options]```. With this flag, no configuration file is necessary. MutAcc creates its root directory in the user's home directory, ```~/mutacc_root_demo/```, which can be removed once the demo is finished. For this demo to work, a mongodb process must be running on host 'localhost', port 27017. If mutacc is installed in a conda environment (which is recommended!), activate the environment before starting this demo.
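
Before starting, it can help to confirm that something is actually listening on the expected port. The following is a minimal sketch using only the Python standard library. The function name `mongo_port_is_open` is made up for this example, and the check only proves that a TCP port accepts connections, not that the process behind it is mongodb.

```python
import socket


def mongo_port_is_open(host="localhost", port=27017, timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


if __name__ == "__main__":
    if mongo_port_is_open():
        print("A process is listening on localhost:27017")
    else:
        print("Nothing is listening on localhost:27017; start mongod first")
```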

## Extract reads from case

Before this family can be uploaded into the database, the reads from this case must be extracted. This is done with the following command:

```terminal
mutacc --demo extract --case demo/case.yaml --padding 100
```

This will extract all reads spanning the variant position, with an additional padding
of 100 bp on both sides. The variant reads will be placed in ```~/mutacc_root_demo/reads/demo_trio/<sample>/<date>/``` as fastq.gz files. There will also be a new JSON-formatted file, ```~/mutacc_root_demo/imports/demo_trio_import_mutacc.json```, that can now be uploaded into the database.
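
The padding arithmetic for the demo variant can be sketched as follows. This only illustrates the interval logic. How MutAcc actually selects reads (for example, how it handles mates) is not described here, and the helper names are hypothetical.

```python
def padded_region(position, padding):
    """Return the inclusive (start, end) window around a variant position."""
    return max(1, position - padding), position + padding


def read_overlaps(read_start, read_end, region):
    """True if the inclusive read interval shares at least one base with region."""
    start, end = region
    return read_start <= end and read_end >= start


# The demo SNV at 7:117235031 with --padding 100.
region = padded_region(117235031, 100)
print(region)  # (117234931, 117235131)
print(read_overlaps(117234800, 117234950, region))  # True
print(read_overlaps(117235200, 117235350, region))  # False
```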

## Upload case to the database

The upload is done with one simple command:

```terminal
mutacc --demo db import ~/mutacc_root_demo/imports/demo_trio_import_mutacc.json
```

The information in the .json file is now imported into the database.

## Query the database

Congrats! There is now one case and one variant in the database. These can be queried
with the command

```terminal
mutacc --demo db export --case-query '{}' --member child --proband --sample-name child
```

Let's go through the options given in this example.

--case-query '{}': This option takes a JSON-formatted string. When making queries,
mutacc uses the [mongodb](https://docs.mongodb.com/manual/) query language, where all
queries are specified as JSON objects. Giving the string '{}', which is an empty JSON object,
will return all cases in the database. The user can also choose to query on variants with
the --variant-query option.

--member child: This specifies that we are interested in child samples. Other valid values
for this option are 'father', 'mother', and 'affected'.

--proband: This flag specifies that the exported sample will be the proband. This comes in handy when there are several cases in the database and some of them are not father/mother/child trios. With this flag activated, a sample will be returned by the query even if it is not of type 'child' (single-sample cases do not have any sample marked as child).

--sample-name child: This allows the user to specify a name for the exported sample, in this case 'child'.
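
To illustrate why '{}' returns every case, here is a toy model of exact-match filtering. It is a sketch, not the mongodb implementation: it supports only equality on top-level keys, and the `case_id` field in the example documents is an assumption about the stored schema, borrowed from demo/case.yaml.

```python
import json


def matches(document, query):
    """Exact-match subset of mongodb filter semantics: every key must be equal."""
    return all(document.get(key) == value for key, value in query.items())


# Hypothetical case documents; the real schema stored by mutacc may differ.
cases = [{"case_id": "demo_trio"}, {"case_id": "other_case"}]

match_all = json.loads("{}")  # an empty filter places no constraints
by_id = {"case_id": "demo_trio"}

print([c["case_id"] for c in cases if matches(c, match_all)])  # ['demo_trio', 'other_case']
print([c["case_id"] for c in cases if matches(c, by_id)])      # ['demo_trio']

# json.dumps produces a string suitable for passing on the command line.
print(json.dumps(by_id))  # {"case_id": "demo_trio"}
```

Building the query string with `json.dumps` avoids quoting mistakes that are easy to make when writing JSON by hand in a shell.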

First look in ```~/mutacc_root_demo/variants/child_variants.vcf```. Here a vcf has been created with all variants that are found in the query.

Another file ```~/mutacc_root_demo/queries/child_query_mutacc.json``` has also been created.
This file will be used to create our synthetic samples in the next section.

## Create datasets

To create a synthetic dataset, the user must choose a background that will be enriched
with the reads in our database. Let's try to enrich the fastq files of the father with the reads of his child!

```terminal
mutacc --demo synthesize -b demo/father.bam -f demo/father_R1.fastq.gz demo/father_R2.fastq.gz --query ~/mutacc_root_demo/queries/child_query_mutacc.json
```

This will create two fastq files in ```~/mutacc_root_demo/datasets/```. These files now have the reads from the child spanning the variant region, and the reads from the father elsewhere.
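
Conceptually, the result can be pictured with the toy model below. It assumes reads are plain dictionaries with a chromosome and an inclusive interval, and that background reads overlapping the padded region are simply replaced by the case reads. The actual MutAcc implementation works on bam and fastq files, and its details may differ.

```python
def enrich(background, case_reads, region):
    """Swap background reads overlapping region for the case reads (toy model)."""
    chrom, start, end = region
    kept = [
        read for read in background
        if not (read["chrom"] == chrom and read["start"] <= end and read["end"] >= start)
    ]
    return kept + case_reads


# Padded region around the demo variant (7:117235031, padding 100).
region = ("7", 117234931, 117235131)
father = [
    {"name": "f1", "chrom": "7", "start": 117235000, "end": 117235150},
    {"name": "f2", "chrom": "7", "start": 120000000, "end": 120000150},
    {"name": "f3", "chrom": "1", "start": 117235000, "end": 117235150},
]
child = [{"name": "c1", "chrom": "7", "start": 117234990, "end": 117235140}]

print([read["name"] for read in enrich(father, child, region)])  # ['f2', 'f3', 'c1']
```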

Why is this useful? Imagine that the database contains a large number of clinical cases with known clinical variants. MutAcc can, by following the same workflow as above, create synthetic datasets by enriching well-known genomic data with real clinical variants. These datasets can be used to validate bioinformatics pipelines that are used in clinical settings.
36 changes: 22 additions & 14 deletions mutacc/cli/root.py
@@ -26,9 +26,10 @@
 @click.option('--loglevel', default='INFO', type=click.Choice(LOG_LEVELS))
 @click.option('-c', '--config-file', type=click.Path(exists=True))
 @click.option('-r', '--root-dir', type=click.Path(exists=True))
+@click.option('-d', '--demo', is_flag=True)
 @click.version_option(__version__)
 @click.pass_context
-def cli(context, loglevel, config_file, root_dir):
+def cli(context, loglevel, config_file, root_dir, demo):

     coloredlogs.install(level = loglevel)

@@ -41,20 +42,27 @@ def cli(context, loglevel, config_file, root_dir):
         with open(config_file, 'r') as in_handle:
             cli_config = yaml.load(in_handle, Loader=yaml.FullLoader)

-
     mutacc_config = {}
-    mutacc_config['host'] = cli_config.get('host') or 'localhost'
-    mutacc_config['port'] = cli_config.get('port') or 27017
-    mutacc_config['username'] = cli_config.get('username')
-    mutacc_config['password'] = cli_config.get('password')
-    mutacc_config['db_name'] = cli_config.get('database') or 'mutacc'
-
-    #Check the root_dir and add to mutacc_config
-    root_dir = cli_config.get('root_dir') or root_dir
-    if not root_dir:
-
-        LOG.warning('Please provide a root directory, through option --root-dir or in config_file')
-        context.abort()
+    if demo:
+        mutacc_config['host'] = 'localhost'
+        mutacc_config['port'] = 27017
+        mutacc_config['db_name'] = 'mutacc-demo'
+        root_dir = make_dir('~/mutacc_root_demo')
+
+    else:
+        mutacc_config['host'] = cli_config.get('host') or 'localhost'
+        mutacc_config['port'] = cli_config.get('port') or 27017
+        mutacc_config['username'] = cli_config.get('username')
+        mutacc_config['password'] = cli_config.get('password')
+        mutacc_config['db_name'] = cli_config.get('database') or 'mutacc'
+
+        #Check the root_dir and add to mutacc_config
+
+        root_dir = cli_config.get('root_dir') or root_dir
+        if not root_dir:
+
+            LOG.warning('Please provide a root directory, through option --root-dir or in config_file')
+            context.abort()

     mutacc_config['root_dir'] = parse_path(root_dir, file_type = 'dir')
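
For review purposes, the new branch can be restated as a pure function that is easy to exercise without click or a database. This is a sketch only: `resolve_config` is a hypothetical name, no directories are created, and `os.path.expanduser` stands in for whatever `make_dir` and `parse_path` do with the `~` prefix, which this hunk does not show.

```python
import os

DEMO_ROOT = "~/mutacc_root_demo"  # path taken from the hunk above


def resolve_config(cli_config, root_dir=None, demo=False):
    """Pure-function mirror of the new branch; creates no directories."""
    config = {}
    if demo:
        # Demo mode ignores cli_config entirely, including any credentials.
        config["host"] = "localhost"
        config["port"] = 27017
        config["db_name"] = "mutacc-demo"
        root_dir = os.path.expanduser(DEMO_ROOT)
    else:
        # `x or default` also replaces falsy values such as 0 or ''.
        config["host"] = cli_config.get("host") or "localhost"
        config["port"] = cli_config.get("port") or 27017
        config["username"] = cli_config.get("username")
        config["password"] = cli_config.get("password")
        config["db_name"] = cli_config.get("database") or "mutacc"
        # The config file's root_dir takes precedence over --root-dir.
        root_dir = cli_config.get("root_dir") or root_dir
        if not root_dir:
            raise ValueError("no root directory given")
    config["root_dir"] = root_dir
    return config


print(resolve_config({"host": "db.example.org"}, demo=True)["host"])  # localhost
print(resolve_config({"port": 0}, root_dir="/tmp")["port"])           # 27017
print(sorted(resolve_config({}, demo=True)))  # ['db_name', 'host', 'port', 'root_dir']
```

Two behaviours stand out when written this way: with `--demo`, any config file that is passed is silently ignored, and the `username` and `password` keys are never set in demo mode, so downstream code that indexes them directly would need to tolerate their absence.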
