Multiqc #564

Closed
wants to merge 34 commits into from
4b6e821
Merge pull request #530 from Clinical-Genomics/release-v3.3.0
robinandeer Aug 16, 2017
3160026
Adds function to check if mongod is running
Aug 17, 2017
89da0ee
Updates documentation and fix small issue
Aug 17, 2017
9251aea
fix issue when starting scout without chanjo report installed
robinandeer Aug 17, 2017
2141b86
update how to handle updating a gene panel using a CSV file
robinandeer Aug 17, 2017
bb8c38a
update version
robinandeer Aug 18, 2017
64bce2c
Updates docs with loading of institute and user
Aug 18, 2017
5fb7a13
Updates load commands, fix problem when analysis date is missing in l…
Aug 18, 2017
5569295
Updates documentation and adds docstring to parse_ped
Aug 18, 2017
2a4932d
Adds documentation on intsitutes and users
Aug 18, 2017
f86f251
Fixes so that scout can display variants without gene information
Aug 18, 2017
532628d
Merge pull request #569 from Clinical-Genomics/update-docs-setup
Aug 18, 2017
8b15863
Adds admin guide on annotations
Aug 21, 2017
f547754
Updates genes view to display information for both builds
Aug 21, 2017
cd7650b
Adds document on genes to docs
Aug 22, 2017
9729f8d
Updates test for new gene functionality
Aug 22, 2017
74d17e0
Merge pull request #572 from Clinical-Genomics/view-38-genes
Aug 22, 2017
5218b60
add closing form tag to variant template, fix #573
robinandeer Aug 24, 2017
073b9ad
Fixes problem when unmarking causative
Aug 24, 2017
9309701
Merge pull request #578 from Clinical-Genomics/test-mark-causative
Aug 24, 2017
4f90000
Honor CLI log level when starting app, unless in DEBUG mode.
dnil Aug 29, 2017
8a0421f
Fix CLINSIG filter
dnil Aug 29, 2017
6ee2396
Merge pull request #580 from dnil/clinsig_clinical_filter
Aug 30, 2017
4a20567
order ACMG categories to work in < Python 3.6, fix #574
robinandeer Aug 30, 2017
209a803
allow users to toggle selected acmg classification, fix #574
robinandeer Aug 30, 2017
1917e2a
allow users to filter by hgnc id, fix #579
robinandeer Aug 30, 2017
3974094
update title on inheritance/penetrace table, fix #568
robinandeer Aug 30, 2017
92dbd20
handle matching OMIM inheritance, fix #566
robinandeer Aug 30, 2017
4ccfdc4
add pedhep to cohort tags, fix #558
robinandeer Aug 30, 2017
80fe1e5
store info about previous analyses on case
robinandeer Aug 30, 2017
bb72daa
list previous analysis with links to delivery reports, fix #550
robinandeer Aug 30, 2017
7397c8b
add endpoint for displaying multiqc report for a case
robinandeer Jul 20, 2017
344139a
add support in config for adding multiqc when loading/updating a case
robinandeer Jul 20, 2017
cd51d67
update artwork
robinandeer Jul 25, 2017
Binary file modified artwork/logo.sketch
Binary file not shown.
2 changes: 1 addition & 1 deletion docs/README.md
@@ -13,4 +13,4 @@

<p align="center"><img height="400" width="800" src="img/v3.jpg"></p>

**New entry in the blog**: [What's new in 3.1?](blog/new-3.1.md)
**New entry in the blog**: [What's new in 3.3?](blog/new-3.3.md)
4 changes: 4 additions & 0 deletions docs/SUMMARY.md
@@ -13,6 +13,8 @@
* [Getting Started](user-guide/getting-started.md)
* [General Usage](user-guide/using-scout.md)
* [Pages](user-guide/pages.md)
* [Institutes](user-guide/institutes.md)
* [Users](user-guide/users.md)
* [Cases](user-guide/cases.md)
* [Variants](user-guide/variants.md)
* [Gene Panels](user-guide/panels.md)
@@ -21,6 +23,8 @@

* [Admin Guide](admin-guide/README.md)
* [Loading](admin-guide/loading.md)
* [Load config](admin-guide/load-config.md)
* [Annotations](admin-guide/annotations.md)

* [Blog](blog/README.md)
* [What's new in 3.0?](blog/new-3.0.md)
140 changes: 140 additions & 0 deletions docs/admin-guide/annotations.md
@@ -0,0 +1,140 @@
# Annotations

Scout is primarily a visualisation tool with some additional functionality. One could imagine that in the future some or all annotations could be performed by Scout. For now, Scout looks for a set of known keys when a VCF is uploaded and extracts the information stored under them. [VEP][vep] is currently the supported tool for functional and regional annotations; [SnpEff][snpeff] will be added in the near future. For the other types of annotations, Scout looks for certain keys in the INFO field of the VCF and expects each value to be of a specific type. This means that there is no dependency on any specific annotation tool besides VEP; just make sure that the keys and values are correct according to the specification below.

### Rank score

One of the hard problems when dealing with whole genome data is the huge number of variants generated in every analysis. Scout was developed for rare variant analysis, which means that only a small number of variants are actually interesting to look at. We do not want to store all variants from each case in a database that should be able to handle thousands of cases. To solve this problem we work with rank scores: each variant is scored according to a scoring schema, and we then select which variants to upload, and sort them, based on their rank score. In this way users can start by looking at the variants that look potentially most harmful from a bioinformatic perspective. We use the tool [genmod][genmod] to (among other things) score the variants, but as long as the `INFO` field of the VCF has a `RankScore` key with a float as value, it is handled by Scout.
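The idea of keeping only the highest ranked variants can be sketched in a few lines of Python. This is not Scout's loading code, only an illustration that assumes `RankScore` holds a plain float as described above; the helper names `parse_info`, `rank_score` and `top_variants` are made up for this example, and treating the threshold as inclusive is an assumption as well.

```python
from typing import Dict, List, Optional


def parse_info(info: str) -> Dict[str, str]:
    """Split a VCF INFO string such as 'RankScore=12.0;CADD=23.1' into a dict."""
    fields = {}
    for entry in info.split(";"):
        if not entry:
            continue
        key, _, value = entry.partition("=")
        fields[key] = value  # flag entries without '=' get an empty string
    return fields


def rank_score(info: str) -> Optional[float]:
    """Return the RankScore as a float, or None when the key is absent."""
    raw = parse_info(info).get("RankScore")
    return float(raw) if raw else None


def top_variants(info_fields: List[str], threshold: float) -> List[str]:
    """Keep INFO strings scoring at or above threshold, highest score first."""
    scored = [(rank_score(info), info) for info in info_fields]
    # Variants without a RankScore are dropped (an assumption of this sketch).
    kept = [(score, info) for score, info in scored
            if score is not None and score >= threshold]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [info for _, info in kept]
```

Calling `top_variants(info_fields, 0)` on the INFO columns of a VCF would then give the variants worth loading, most severe first.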

## Annotation keys and tool suggestions

This section lists all annotation keys together with suggestions of tools that can be used to produce them.

### Frequencies

#### 1000G ####

The frequency from the [1000G][1000g] population database.

- Key: `1000G`
- Value: `Float`
- Tools: [VEP][vep], [SnpEff][snpeff], [genmod][genmod], [vcfanno][vcfanno]

#### 1000G_MAX_AF ####

The maximum allele frequency of all populations in the [1000G][1000g] population database.

- Key: `1000G_MAX_AF`
- Value: `Float`
- Tools: custom made, we have modified the 1000G file and use [genmod][genmod]

#### ExAC ####

The frequency from the [ExAC][exac] population database.

- Key: `EXACAF`
- Value: `Float`
- Tools: [VEP][vep], [SnpEff][snpeff], [genmod][genmod], [vcfanno][vcfanno]


#### ExAC_MAX_AF ####

The maximum allele frequency of all populations in the [ExAC][exac] population database.

- Key: `EXAC_MAX_AF`
- Value: `Float`
- Tools: custom made, we have modified the exac file and use [genmod][genmod]

### Severity ###

#### CADD score ####

The Combined Annotation Dependent Depletion ([CADD][cadd]) score. A prediction of the deleteriousness of a variant.

- Key: `CADD` or `cadd` in VEP `CSQ` field
- Value: `Float`
- Tools: [VEP][vep], [SnpEff][snpeff], [genmod][genmod], [vcfanno][vcfanno]

#### SIFT ####

The [SIFT][sift] prediction for how a variation affects the protein.

- Key: `CSQ`-`SIFT`
- Value: `String`
- Tools: [VEP][vep]

#### PolyPhen ####

The [PolyPhen][polyphen] prediction for how a variation affects the protein.

- Key: `CSQ`-`PolyPhen`
- Value: `String`
- Tools: [VEP][vep]


#### Rank score ####

The combined rank score for a variant

- Key: `RankScore`
- Value: `Float`
- Tools: [genmod][genmod]


### Conservation ###

#### Gerp ####

The Genomic Evolutionary Rate Profiling ([GERP][gerp]) conservation string. An estimate of how conserved this position is.

- Key: `GERP++_RS_prediction_term`
- Value: `String`
- Tools: [SnpSift][snpsift]

#### phastCons ####

The [PHASTcons][phastcons] conservation string.

- Key: `phastCons100way_vertebrate_prediction_term`
- Value: `String`
- Tools: [SnpSift][snpsift]

#### phylop ####

The [phylop][phylop] 100 way predicted conservation string.

- Key: `phyloP100way_vertebrate_prediction_term`
- Value: `String`
- Tools: [SnpSift][snpsift]

### Inheritance ###

#### Genetic models ####
Which genetic models the variant follows in the particular family.

- Key: `GeneticModels`
- Value: list of `String`
- Tools: [genmod][genmod]

#### Autosomal Recessive Compounds ####
Which other variants this variant forms an autosomal recessive compound with.

- Key: `Compounds`
- Value: list of `String`
- Tools: [genmod][genmod]
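
As a rough illustration of the key and value types listed above, the following Python sketch converts an INFO string into typed values. It is not Scout's parser: the function name `parse_annotations` is hypothetical, the comma used to split list values is an assumption (the spec above only says "list of `String`"), and the `CSQ` based keys (SIFT, PolyPhen, `cadd`) are left out because they live inside the VEP annotation.

```python
from typing import Any, Dict

# Expected value type per INFO key, following the key list above.
FLOAT_KEYS = {"1000G", "1000G_MAX_AF", "EXACAF", "EXAC_MAX_AF", "CADD", "RankScore"}
STRING_KEYS = {
    "GERP++_RS_prediction_term",
    "phastCons100way_vertebrate_prediction_term",
    "phyloP100way_vertebrate_prediction_term",
}
LIST_KEYS = {"GeneticModels", "Compounds"}


def parse_annotations(info: str, list_sep: str = ",") -> Dict[str, Any]:
    """Extract the known annotation keys from a VCF INFO string.

    list_sep is an assumed separator for list values, not taken from the spec.
    """
    annotations: Dict[str, Any] = {}
    for entry in info.split(";"):
        key, sep, value = entry.partition("=")
        if not sep:
            continue  # skip flags and empty entries
        if key in FLOAT_KEYS:
            annotations[key] = float(value)
        elif key in STRING_KEYS:
            annotations[key] = value
        elif key in LIST_KEYS:
            annotations[key] = value.split(list_sep)
        # Any other key is ignored, since only known keys are of interest here.
    return annotations
```

Keys that are not part of the list, such as `DP` in a typical VCF, are simply skipped, which mirrors the statement that only known keys are extracted.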


[vep]: http://www.ensembl.org/info/docs/tools/vep/index.html
[snpeff]: http://snpeff.sourceforge.net/about.html
[genmod]: https://github.com/moonso/genmod
[vcfanno]: https://github.com/brentp/vcfanno
[snpsift]: http://snpeff.sourceforge.net/SnpSift.html

[1000g]: http://www.1000genomes.org/
[exac]: http://exac.broadinstitute.org
[cadd]: http://cadd.gs.washington.edu
[gerp]: http://mendel.stanford.edu/SidowLab/downloads/gerp/index.html
[phastcons]: http://compgen.cshl.edu/phast/
[phylop]: http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg19&g=cons100way
[sift]: http://sift.jcvi.org
[polyphen]: http://genetics.bwh.harvard.edu/pph2/dokuwiki/
99 changes: 99 additions & 0 deletions docs/admin-guide/load-config.md
@@ -0,0 +1,99 @@
# The load config

Scout can store a lot of information about a case and the samples it includes. It is cumbersome to specify too many parameters on the command line, so there is an option to provide this information in a YAML formatted config file.
Here we can give Scout some meta information about the analysis, how it was performed, information about the family, samples etc.

The basic structure of a load config looks like:


```yaml
owner: str(mandatory)

family: str(mandatory)
samples:
  - analysis_type: str(optional), [wgs,wes]
    sample_id: str(mandatory)
    capture_kit: str(optional)
    father: str(mandatory)
    mother: str(mandatory)
    sample_name: str(mandatory)
    phenotype: str(mandatory), [affected, unaffected, unknown]
    sex: str(mandatory), [male, female, unknown]
    expected_coverage: int(mandatory)

vcf_snv: str(optional)
vcf_sv: str(optional)
vcf_cancer: str(optional)
vcf_snv_research: str(optional)
vcf_sv_research: str(optional)
vcf_cancer_research: str(optional)

madeline: str(optional)

peddy_ped: str(optional)
peddy_ped_check: str(optional)
peddy_sex_check: str(optional)

default_gene_panels: list[str](optional)
gene_panels: list[str](optional)

# meta data
rank_model_version: float(optional)
rank_score_threshold: float(optional)
analysis_date: datetime(optional)
human_genome_build: str(optional)
```

Let's go through each field:

- **owner** each case has to have an owner, which refers to an existing institute in the scout instance
- **family** each case has to have a family id
- **samples** list of samples included in the case
  - *analysis_type* specifies the analysis type for the sample
  - *sample_id* identifier for a sample
  - *capture_kit* for exome specifies the capture kit
  - *father* sample id for father or 0
  - *mother* sample id for mother or 0
  - *phenotype* specifies the affection status of the sample in human readable format
  - *sex* specifies the sex of the sample in human readable format
  - *expected_coverage* the level of expected coverage
- **vcf_snv** path to snv vcf file
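- **vcf_sv** path to sv vcf file
- **vcf_snv_research** path to vcf file with all variants
- **vcf_sv_research** path to sv vcf file with all variants
- **vcf_cancer** path to cancer vcf file
- **vcf_cancer_research** path to cancer vcf file with all variants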
- **madeline** path to a madeline pedigree file in xml format
- **peddy_ped** path to a [peddy](https://github.com/brentp/peddy) ped file with an analysis of the pedigree based on variant information
- **peddy_ped_check** path to a [peddy](https://github.com/brentp/peddy) ped check file
- **peddy_sex_check** path to a [peddy](https://github.com/brentp/peddy) sex check file
- **default_gene_panels** list of default gene panels. Variants from the genes in the gene panels specified will be shown when opening the case in scout
- **gene_panels** list of gene panels. This will specify what panels the case has been run with
- **rank_model_version** which rank model was used when scoring the variants
- **rank_score_threshold** only include variants with a rank score above this threshold
- **analysis_date** time for analysis in datetime format. Defaults to time of uploading
- **human_genome_build** what genome version was used.
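
To make the mandatory and optional fields more concrete, here is a small Python sketch that checks an already parsed config (a plain `dict`, for example the result of loading the YAML file) against the spec above. It is only an illustration and not how Scout validates configs; `check_load_config` and the message formats are invented for this example.

```python
from typing import Any, Dict, List

# Keys marked as mandatory in the spec above.
MANDATORY_CASE_KEYS = ("owner", "family")
MANDATORY_SAMPLE_KEYS = (
    "sample_id", "father", "mother", "sample_name",
    "phenotype", "sex", "expected_coverage",
)
# Keys with a fixed set of allowed values in the spec above.
ALLOWED_VALUES = {
    "analysis_type": {"wgs", "wes"},
    "phenotype": {"affected", "unaffected", "unknown"},
    "sex": {"male", "female", "unknown"},
}


def check_load_config(config: Dict[str, Any]) -> List[str]:
    """Return a list of human readable problems; an empty list means OK."""
    problems = []
    for key in MANDATORY_CASE_KEYS:
        if key not in config:
            problems.append("missing case key: " + key)
    for index, sample in enumerate(config.get("samples", [])):
        for key in MANDATORY_SAMPLE_KEYS:
            if key not in sample:
                problems.append("sample %d: missing key: %s" % (index, key))
        for key, allowed in ALLOWED_VALUES.items():
            if key in sample and sample[key] not in allowed:
                problems.append("sample %d: invalid %s: %r" % (index, key, sample[key]))
    return problems
```

Returning all problems at once, rather than failing on the first one, makes it easier to fix a config file in a single pass.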

### Minimal config

Here is an example of a minimal load config:

```yaml
---

owner: cust004

family: '1'
samples:
  - analysis_type: wes
    sample_id: NA12878
    capture_kit: Agilent_SureSelectCRE.V1
    father: 0
    mother: 0
    sample_name: NA12878
    phenotype: affected
    sex: male
    expected_coverage: 30

vcf_snv: scout/demo/643594.clinical.vcf.gz
```
49 changes: 14 additions & 35 deletions docs/admin-guide/loading.md
@@ -1,45 +1,24 @@
#Loading Scout

When loading a case into scout it is possible to use either a config file or to specify parameters on the command line.
## Institute

## Scout Load Config
To load an institute into scout use the command `scout load institute`. As mentioned in the user guide, an [institute](../user-guide/institutes.md) has to have a unique internal id, which is specified on the command line with `-i/--internal-id`. A display name can also be given with `-d/--display-name`. If no display name is chosen it defaults to the internal id.
Note that internal id is unique.

The loading config is a `.yaml` file and can include all the necessary information to scout. Command line options will overload information in the config file.
## User

The config file has the following specification:
To load a user into scout use the command `scout load user`. A user has to:

```yaml
owner: str(mandatory)
- belong to an *institute*
- have a *name*
- have an *email address*

family: str(mandatory)
samples:
- analysis_type: str(optional), [wgs,wes]
sample_id: str(mandatory)
capture_kit: str(optional)
father: str(mandatory)
mother: str(mandatory)
sample_name: str(mandatory)
phenotype: str(mandatory), [affected, unaffected, unknown]
sex: str(mandatory), [male, female, unknown]
expected_coverage: int(mandatory)

vcf_snv: str(optional)
vcf_sv: str(optional)
vcf_cancer: str(optional)
vcf_snv_research: str(optional)
vcf_sv_research: str(optional)
vcf_cancer_research: str(optional)

madeline: str(optional)
default_gene_panels: list[str](optional)
gene_panels: list[str](optional)
## Case
When loading a case into scout it is possible to use either a config file or to specify parameters on the command line.

# meta data
rank_model_version: float(optional)
rank_score_threshold: float(optional)
analysis_date: datetime(optional)
human_genome_build: str(optional)
```
### Scout Load Config

The loading config is a `.yaml` file and can include all the necessary information to scout. Command line options will override information in the config file. For a complete spec of the config file see [load config](load-config.md).
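
The precedence between the config file and the command line can be pictured with a short Python sketch. This is an assumption about the behaviour described above, not Scout's actual implementation; `merge_options` is a hypothetical helper, and representing an option that was not given on the command line as `None` is also an assumption.

```python
from typing import Any, Dict


def merge_options(config: Dict[str, Any], cli_options: Dict[str, Any]) -> Dict[str, Any]:
    """Combine config file values with command line options.

    Values given on the command line win; options left unset (None)
    fall back to whatever the config file provides.
    """
    merged = dict(config)  # copy so the parsed config is left untouched
    for key, value in cli_options.items():
        if value is not None:
            merged[key] = value
    return merged
```

With a config containing `owner: cust000` and a command line that sets the owner to `cust004`, the merged result would use `cust004` while keeping every other config value.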

An example file, (this file is located in `scout/demo/643594.config.yaml`):

@@ -95,7 +74,7 @@ human_genome_build: 37

```

## Load case from CLI without config
### Load case from CLI without config

Cases can be loaded without config file, in that case the user needs to specify a ped file and optionally one or several VCF files. An example could look like

6 changes: 6 additions & 0 deletions docs/user-guide/genes.md
@@ -0,0 +1,6 @@
# Genes and transcripts

Scout stores information about genes and transcripts. The information is collected from a couple of resources, which can be updated manually if desired. The definition of which genes exist and their correct names is collected from [HGNC][hgnc]. Unfortunately HGNC only maintains a distribution for GRCh38, and at this time (mid 2017) many resources lack support for build 38, so many investigators still use build 37. We therefore use two files, one for each build, with information about coordinates and transcripts from Ensembl. Together these files make up the definition of genes used in scout.


[hgnc]: http://www.genenames.org
2 changes: 1 addition & 1 deletion docs/user-guide/getting-started.md
@@ -3,7 +3,7 @@ Scout is a web-based visualizer for VCF-files. It helps to manage multiple patie


## Institutes, Cases, Variants
Scout has a few levels of abstraction to deal with the data it presents. *Institutes* contain multiple cases and group users into teams. Cases are a unit that is analysed together, usually the same as a family or a tumor/normal sample - they all share a subset of called variants. Variants are individual genotype calls across a single case.
Scout has a few levels of abstraction to deal with the data it presents. [*Institutes*](institutes.md) contain multiple [cases](cases.md) and group [users](users.md) into teams. Cases are a unit that is analysed together, usually the same as a family or a tumor/normal sample - they all share a subset of called variants. Variants are individual genotype calls across a single case.

> [insert screenshot here]

3 changes: 3 additions & 0 deletions docs/user-guide/institutes.md
@@ -0,0 +1,3 @@
# Institutes

Scout was made as a centralized tool where multiple users from different customers can work against the same instance. Institutes are a way to keep sensitive information separated between users. Each [case](cases.md) has to have an institute as owner. A [user](users.md) belongs to an institute and is thereby restricted to seeing only the cases owned by that institute. One instance of scout can have one or many institutes. Each institute can own multiple cases and have multiple users attached. Each institute has to have a unique identifier, `institute_id`.
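
The access rule can be illustrated with a minimal Python sketch. The data layout is hypothetical: cases are plain dicts with an `owner` key (the same name the load config uses), users are dicts with an `institute` key, and `visible_cases` is a name invented for this example rather than a Scout function.

```python
from typing import Any, Dict, List


def visible_cases(user: Dict[str, Any], cases: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return only the cases owned by the institute the user belongs to."""
    return [case for case in cases if case["owner"] == user["institute"]]
```

A user attached to `cust000` would get back only the cases whose owner is `cust000`, regardless of how many other institutes share the same instance.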
3 changes: 3 additions & 0 deletions docs/user-guide/users.md
@@ -0,0 +1,3 @@
# Users

A user represents an individual with access to all [cases](cases.md) that belong to the same [institute](institutes.md) as the user. From the main menu in scout one can access a *users* page that displays all existing users in the scout instance and ranks them based on how many actions they have performed.