Clinical-Genomics · robinandeer · Aug 16, 2017 · Aug 17, 2017 · Aug 17, 2017 · Aug 17, 2017
diff --git a/artwork/logo.sketch b/artwork/logo.sketch
diff --git a/docs/README.md b/docs/README.md
@@ -13,4 +13,4 @@
 
 <p align="center"><img height="400" width="800" src="img/v3.jpg"></p>
 
-**New entry in the blog**: [What's new in 3.1?](blog/new-3.1.md)
+**New entry in the blog**: [What's new in 3.3?](blog/new-3.3.md)
diff --git a/docs/SUMMARY.md b/docs/SUMMARY.md
@@ -13,6 +13,8 @@
 	* [Getting Started](user-guide/getting-started.md)
 	* [General Usage](user-guide/using-scout.md)
 	* [Pages](user-guide/pages.md)
+	* [Institutes](user-guide/institutes.md)
+	* [Users](user-guide/users.md)
 	* [Cases](user-guide/cases.md)
 	* [Variants](user-guide/variants.md)
 	* [Gene Panels](user-guide/panels.md)
@@ -21,6 +23,8 @@
 
 * [Admin Guide](admin-guide/README.md)
 	* [Loading](admin-guide/loading.md)
+	* [Load config](admin-guide/load-config.md)
+	* [Annotations](admin-guide/annotations.md)
 
 * [Blog](blog/README.md)
 	* [What's new in 3.0?](blog/new-3.0.md)

diff --git a/docs/admin-guide/annotations.md b/docs/admin-guide/annotations.md
@@ -0,0 +1,140 @@
+# Annotations
+
+Scout is primarily a visualisation tool with some additional functionality. One could imagine that in the future some or all annotations could be performed by Scout. For now, Scout looks for a set of known keys when a VCF is uploaded and extracts the information stored under them. [VEP][vep] is currently the tool supported for functional and regional annotations; [SnpEff][snpeff] will be added in the near future. For the other types of annotations, Scout looks for certain keys in the INFO field of the VCF and expects each value to be of a specific type. This means there is no dependency on any specific annotation tool besides VEP; just make sure that the keys and values are correct according to the specification below.
+
+### Rank score
+
+One of the hard problems when dealing with whole genome data is the huge number of variants generated in every analysis. Scout was developed for rare variant analysis, which means that only a small number of variants are actually interesting to look at. We do not want to store all variants from each case in a database that should be able to handle thousands of cases. To solve this problem we work with rank scores: each variant is scored according to a scoring schema, and we then select which variants to upload, and sort them, based on their rank score. In this way users can start by looking at the variants that look potentially most dangerous from a bioinformatic perspective. We use the tool [genmod][genmod] to (among other things) score the variants, but as long as there is a `RankScore` key in the `INFO` field of the VCF with a float as value, it is handled by Scout.
+
+## Annotation keys and tool suggestions
+
+In this section all the different annotation keys and suggestions of tools that can be used to annotate them are listed.
+
+### Frequencies 
+
+#### 1000G ####
+
+The frequency from the [1000G][1000g] population database.
+
+- Key: `1000G`
+- Value: `Float`
+- Tools: [VEP][vep], [SnpEff][snpeff], [genmod][genmod], [vcfanno][vcfanno]
+
+#### 1000G_MAX_AF ####
+
+The maximum allele frequency of all populations in the [1000G][1000g] population database.
+
+- Key: `1000G_MAX_AF`
+- Value: `Float`
+- Tools: custom made, we have modified the 1000G file and use [genmod][genmod]
+
+#### ExAC ####
+
+The frequency from the [ExAC][exac] population database.
+
+- Key: `EXACAF`
+- Value: `Float`
+- Tools: [VEP][vep], [SnpEff][snpeff], [genmod][genmod], [vcfanno][vcfanno] 
+
+
+#### ExAC_MAX_AF ####
+
+The maximum allele frequency of all populations in the [ExAC][exac] population database.
+
+- Key: `EXAC_MAX_AF`
+- Value: `Float`
+- Tools: custom made, we have modified the ExAC file and use [genmod][genmod]
+
+### Severity ###
+
+#### CADD score ####
+
+The Combined Annotation Dependent Depletion ([CADD][cadd]) score. A prediction of the deleteriousness of a variant.
+
+- Key: `CADD` or `cadd` in VEP `CSQ` field
+- Value: `Float`
+- Tools: [VEP][vep], [SnpEff][snpeff], [genmod][genmod], [vcfanno][vcfanno] 
+
+#### SIFT ####
+
+The [SIFT][sift] prediction for how a variation affects the protein.
+
+- Key: `CSQ`-`SIFT`
+- Value: `String`
+- Tools: [VEP][vep]
+
+#### PolyPhen ####
+
+The [PolyPhen][polyphen] prediction for how a variation affects the protein.
+
+- Key: `CSQ`-`PolyPhen`
+- Value: `String`
+- Tools: [VEP][vep]
+
+
+#### Rank score ####
+
+The combined rank score for a variant.
+
+- Key: `RankScore`
+- Value: `Float`
+- Tools: [genmod][genmod]
+
+
+### Conservation ###
+
+#### Gerp ####
+
+The Genomic Evolutionary Rate Profiling ([GERP][gerp]) conservation string. An estimation of how conserved this position is.
+
+- Key: `GERP++_RS_prediction_term`
+- Value: `String`
+- Tools: [SnpSift][snpsift]
+
+#### phastCons ####
+
+The [PHASTcons][phastcons] conservation string.
+
+- Key: `phastCons100way_vertebrate_prediction_term`
+- Value: `String`
+- Tools: [SnpSift][snpsift]
+
+#### phylop ####
+
+The [phylop][phylop] 100 way predicted conservation string.
+
+- Key: `phyloP100way_vertebrate_prediction_term`
+- Value: `String`
+- Tools: [SnpSift][snpsift]
+
+### Inheritance ###
+
+#### Genetic models ####
+Which genetic models the variant follows in the particular family.
+
+- Key: `GeneticModels`
+- Value: list of `String`
+- Tools: [genmod][genmod]
+
+#### Autosomal Recessive Compounds ####
+Which other variants this variant forms an autosomal recessive compound with.
+
+- Key: `Compounds`
+- Value: list of `String`
+- Tools: [genmod][genmod]
+
+
+[vep]: http://www.ensembl.org/info/docs/tools/vep/index.html
+[snpeff]: http://snpeff.sourceforge.net/about.html
+[genmod]: https://github.com/moonso/genmod
+[vcfanno]: https://github.com/brentp/vcfanno
+[snpsift]: http://snpeff.sourceforge.net/SnpSift.html
+
+[1000g]: http://www.1000genomes.org/
+[exac]: http://exac.broadinstitute.org
+[cadd]: http://cadd.gs.washington.edu
+[gerp]: http://mendel.stanford.edu/SidowLab/downloads/gerp/index.html
+[phastcons]: http://compgen.cshl.edu/phast/
+[phylop]: http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg19&g=cons100way
+[sift]: http://sift.jcvi.org
+[polyphen]: http://genetics.bwh.harvard.edu/pph2/dokuwiki/
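Not part of the patch: a minimal sketch of the key/type contract that `annotations.md` describes, in plain Python. It splits a VCF INFO string and converts the documented float keys. The names `FLOAT_KEYS`, `parse_info` and `extract_float_annotations` are hypothetical, the example INFO string is made up, and this is not Scout's actual parser. It assumes each value is a plain float, as the document states.

```python
# Keys that annotations.md documents as carrying a float value.
FLOAT_KEYS = ("1000G", "1000G_MAX_AF", "EXACAF", "EXAC_MAX_AF", "CADD", "RankScore")


def parse_info(info):
    """Split a VCF INFO string into a dict; flag entries without '=' map to True."""
    fields = {}
    for entry in info.split(";"):
        if not entry:
            continue
        key, sep, value = entry.partition("=")
        fields[key] = value if sep else True
    return fields


def extract_float_annotations(info):
    """Return the documented float annotations found in an INFO string."""
    raw = parse_info(info)
    parsed = {}
    for key in FLOAT_KEYS:
        value = raw.get(key)
        if value is None or value is True:
            continue  # key absent, or present only as a flag
        try:
            parsed[key] = float(value)
        except ValueError:
            # Value does not match the expected type; skip it in this sketch.
            continue
    return parsed


# Made-up INFO string for illustration only.
example_info = "1000G=0.001;EXACAF=0.0005;CADD=23.4;RankScore=12;DB"
annotations = extract_float_annotations(example_info)
```

Keys that are missing or whose values cannot be read as floats are simply left out of the result, which mirrors the idea that any annotation tool may be used as long as key and value type match the specification.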
diff --git a/docs/admin-guide/load-config.md b/docs/admin-guide/load-config.md
@@ -0,0 +1,99 @@
+# The load config
+
+Scout can store a lot of information about a case and the samples it includes. It is cumbersome to specify too many parameters on the command line, so there is an option to give this information in a YAML-formatted config file.
+Here we can give scout some meta information about the analysis, how it was performed, information about the family, the samples, etc.
+
+The basic structure of a load config looks like:
+
+
+```yaml
+owner: str(mandatory)
+
+family: str(mandatory)
+samples:
+  - analysis_type: str(optional), [wgs,wes]
+    sample_id: str(mandatory)
+    capture_kit: str(optional)
+    father: str(mandatory)
+    mother: str(mandatory)
+    sample_name: str(mandatory)
+    phenotype: str(mandatory), [affected, unaffected, unknown]
+    sex: str(mandatory), [male, female, unknown]
+    expected_coverage: int(mandatory)
+
+vcf_snv: str(optional)
+vcf_sv: str(optional)
+vcf_cancer: str(optional)
+vcf_snv_research: str(optional)
+vcf_sv_research: str(optional)
+vcf_cancer_research: str(optional)
+
+madeline: str(optional)
+
+peddy_ped: str(optional)
+peddy_ped_check: str(optional)
+peddy_sex_check: str(optional)
+
+default_gene_panels: list[str](optional)
+gene_panels: list[str](optional)
+
+# meta data
+rank_model_version: float(optional)
+rank_score_threshold: float(optional)
+analysis_date: datetime(optional)
+human_genome_build: str(optional)
+```
+
+Let's go through each field:
+
+- **owner** each case has to have an owner; this refers to an existing institute in the scout instance
+- **family** each case has to have a family id
+- **samples** list of samples included in the case
+	- *analysis_type* specifies the analysis type for the sample
+	- *sample_id* identifier for a sample
+	- *capture_kit* for exome specifies the capture kit
+	- *father* sample id for father or 0
+	- *mother* sample id for mother or 0
+	- *phenotype* specifies the affection status of the sample in human readable format
+	- *sex* specifies the sex of the sample in human readable format
+	- *expected_coverage* the level of expected coverage
+- **vcf_snv** path to snv vcf file
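+- **vcf_sv** path to sv vcf file
+- **vcf_snv_research** path to vcf file with all variants
+- **vcf_sv_research** path to sv vcf file with all variants
+- **vcf_cancer** path to cancer vcf file
+- **vcf_cancer_research** path to cancer vcf file with all variants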
+- **madeline** path to a madeline pedigree file in xml format
+- **peddy_ped** path to a [peddy](https://github.com/brentp/peddy) ped file with an analysis of the pedigree based on variant information
+- **peddy_ped_check** path to a [peddy](https://github.com/brentp/peddy) ped check file
+- **peddy_sex_check** path to a [peddy](https://github.com/brentp/peddy) sex check file
+- **default_gene_panels** list of default gene panels. Variants from the genes in the gene panels specified will be shown when opening the case in scout
+- **gene_panels** list of gene panels. This will specify what panels the case has been run with
+- **rank_model_version** which rank model was used when scoring the variants
+- **rank_score_threshold** only include variants with a rank score above this threshold
+- **analysis_date** time for analysis in datetime format. Defaults to time of uploading
+- **human_genome_build** what genome version was used.
+
+### Minimal config
+
+Here is an example of a minimal load config:
+
+```yaml
+---
+
+owner: cust004
+
+family: '1'
+samples:
+  - analysis_type: wes
+    sample_id: NA12878
+    capture_kit: Agilent_SureSelectCRE.V1
+    father: 0
+    mother: 0
+    sample_name: NA12878
+    phenotype: affected
+    sex: male
+    expected_coverage: 30
+
+vcf_snv: scout/demo/643594.clinical.vcf.gz
+```
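Not part of the patch: a minimal sketch of how the mandatory fields and allowed choices listed in `load-config.md` could be checked, in plain Python. The function `check_load_config` and the constant names are hypothetical and this is not how Scout itself validates a config. The dict below mirrors the minimal config from the document as it would look after YAML parsing, which is an assumption about the parsed types (for example `father: 0` becoming an integer).

```python
# Mandatory keys and allowed choices, taken from the spec in load-config.md.
MANDATORY_CASE_KEYS = ("owner", "family")
MANDATORY_SAMPLE_KEYS = (
    "sample_id", "father", "mother", "sample_name",
    "phenotype", "sex", "expected_coverage",
)
CHOICES = {
    "analysis_type": {"wgs", "wes"},
    "phenotype": {"affected", "unaffected", "unknown"},
    "sex": {"male", "female", "unknown"},
}


def check_load_config(config):
    """Return a list of human readable problems; an empty list means no problem found."""
    problems = []
    for key in MANDATORY_CASE_KEYS:
        if key not in config:
            problems.append("missing case key: %s" % key)
    for index, sample in enumerate(config.get("samples", [])):
        for key in MANDATORY_SAMPLE_KEYS:
            if key not in sample:
                problems.append("sample %d: missing key: %s" % (index, key))
        for key, allowed in CHOICES.items():
            if key in sample and sample[key] not in allowed:
                problems.append("sample %d: invalid %s: %r" % (index, key, sample[key]))
    return problems


# The minimal config from the document, written as the dict a YAML parser would produce.
minimal_config = {
    "owner": "cust004",
    "family": "1",
    "samples": [{
        "analysis_type": "wes",
        "sample_id": "NA12878",
        "capture_kit": "Agilent_SureSelectCRE.V1",
        "father": 0,
        "mother": 0,
        "sample_name": "NA12878",
        "phenotype": "affected",
        "sex": "male",
        "expected_coverage": 30,
    }],
    "vcf_snv": "scout/demo/643594.clinical.vcf.gz",
}

problems = check_load_config(minimal_config)
```

Collecting all problems in a list, instead of raising on the first one, lets an admin fix a config in a single pass.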
diff --git a/docs/admin-guide/loading.md b/docs/admin-guide/loading.md
@@ -1,45 +1,24 @@
 #Loading Scout
 
-When loading a case into scout it is possible to use either a config file or to specify parameters on the command line.
+## Institute
 
-## Scout Load Config
+To load an institute into scout use the command `scout load institute`. As mentioned in the user guide, an [institute](../user-guide/institutes.md) has to have a unique internal id, which is specified on the command line with `-i/--internal-id`. A display name can also be given with `-d/--display-name`. If no display name is chosen it defaults to the internal id.
+Note that the internal id must be unique.
 
-The loading config is a `.yaml` file and can include all the necessary information to scout. Command line options will overload information in the config file.
+## User
 
-The config file has the following specification:
+To load a user into scout use the command `scout load user`. A user has to: 
 
-```yaml
-owner: str(mandatory)
+- belong to an *institute*
+- have a *name*
+- have an *email address*
 
-family: str(mandatory)
-samples:
-  - analysis_type: str(optional), [wgs,wes]
-    sample_id: str(mandatory)
-    capture_kit: str(optional)
-    father: str(mandatory)
-    mother: str(mandatory)
-    sample_name: str(mandatory)
-    phenotype: str(mandatory), [affected, unaffected, unknown]
-    sex: str(mandatory), [male, female, unknown]
-    expected_coverage: int(mandatory)
-
-vcf_snv: str(optional)
-vcf_sv: str(optional)
-vcf_cancer: str(optional)
-vcf_snv_research: str(optional)
-vcf_sv_research: str(optional)
-vcf_cancer_research: str(optional)
-
-madeline: str(optional)
-default_gene_panels: list[str](optional)
-gene_panels: list[str](optional)
+## Case
+When loading a case into scout it is possible to use either a config file or to specify parameters on the command line.
 
-# meta data
-rank_model_version: float(optional)
-rank_score_threshold: float(optional)
-analysis_date: datetime(optional)
-human_genome_build: str(optional)
-```
+### Scout Load Config
+
+The loading config is a `.yaml` file and can include all the information scout needs. Command line options will override information in the config file. For a complete spec of the config file see [load config](load-config.md).
 
 An example file, (this file is located in `scout/demo/643594.config.yaml`):
 
@@ -95,7 +74,7 @@ human_genome_build: 37
 
 ```
 
-## Load case from CLI without config
+### Load case from CLI without config
 
 Cases can be loaded without config file, in that case the user needs to specify a ped file and optionally one or several VCF files. An example could look like
 

diff --git a/docs/user-guide/genes.md b/docs/user-guide/genes.md
@@ -0,0 +1,6 @@
+# Genes and transcripts
+
+Scout stores information about genes and transcripts. The information is collected from a couple of resources, which can be updated manually if desired. The definition of which genes exist, and their correct names, is collected from [HGNC][hgnc]. Unfortunately HGNC only maintains a distribution for GRCh38; at this time (mid 2017) many resources lack support for build 38, so many investigators still use build 37. We therefore use two files, one for each build, with information about coordinates and transcripts from Ensembl. These files together make up the definition of the genes that are used in scout.
+
+
+[hgnc]: http://www.genenames.org
diff --git a/docs/user-guide/getting-started.md b/docs/user-guide/getting-started.md
@@ -3,7 +3,7 @@ Scout is a web-based visualizer for VCF-files. It helps to manage multiple patie
 
 
 ## Institutes, Cases, Variants
-Scout has a few levels of abstraction to deal with the data it presents. *Institutes* contain multiple cases and group users into teams. Cases are a unit that is analysed together, usually the same as a family or a tumor/normal sample - they all share a subset of called variants. Variants are individual genotype calls across a single case.
+Scout has a few levels of abstraction to deal with the data it presents. [*Institutes*](institutes.md) contain multiple [cases](cases.md) and group [users](users.md) into teams. Cases are a unit that is analysed together, usually the same as a family or a tumor/normal sample - they all share a subset of called variants. Variants are individual genotype calls across a single case.
 
 > [insert screenshot here]
 

diff --git a/docs/user-guide/institutes.md b/docs/user-guide/institutes.md
@@ -0,0 +1,3 @@
+# Institutes
+
+Scout was made as a centralized tool where multiple users from different customers could work against the same instance. Institutes are a way to keep sensitive information separated between users. Each [case](cases.md) has to have an institute as owner. A [user](users.md) belongs to an institute and is in that way restricted to seeing only the cases owned by that institute. One instance of scout can have one or many institutes. Each institute can be the owner of multiple cases and have multiple users attached. Each institute has to have a unique identifier, `institute_id`.
diff --git a/docs/user-guide/users.md b/docs/user-guide/users.md
@@ -0,0 +1,3 @@
+# Users
+
+A user represents an individual with access to all [cases](cases.md) that belong to the same [institute](institutes.md) as the user. From the main menu in scout one can access a *users* page that displays all existing users in the scout instance and ranks them based on how many actions they have performed.