
Expose additional preparation parameters to top level config and generalize. Allows prep of haploid calls without conversion to diploid
commit 83f940fabf5fc1993dbd989bdbb99a303ec5af77 1 parent 69f88a9
Brad Chapman authored
2  README.md
@@ -94,6 +94,8 @@ provide example starting points and details on available options are below:
  not coordinate sorted within chromosomes. (boolean; default false).
  prep-sv-genotype: Normalize structural variant genotypes to a single
  ref call (boolean; default false).
+ prep-allele-count: Number of alleles to convert calls to during
+ prep work (default 2)
  preclean: Remove problematic characters from input VCFs
  (boolean; default false).
  remove-refcalls: Remove reference, non-variant calls.
1  config/single-process.yaml
@@ -9,5 +9,6 @@ experiments:
  calls:
  - name: gatk
  file: test/data/freebayes-calls-indels.vcf
+ prep-sv-genotype: true
  prep: true
  normalize: true
3  src/bcbio/variation/combine.clj
@@ -181,8 +181,7 @@
  prep-file (if (true? (:prep call))
  (prep-vcf sample-file (:ref exp) (:sample exp) :out-dir out-dir
  :out-fname out-fname :orig-ref-file (:ref call)
- :sort-pos (get call :prep-sort-pos false)
- :sv-genotype (get call :prep-sv-genotype false))
+ :config call)
  sample-file)
  hap-file (if (true? (:make-haploid call))
  (diploid-calls-to-haploid prep-file (:ref exp) :out-dir out-dir)
22 src/bcbio/variation/normalize.clj
@@ -91,7 +91,7 @@

  (defn- fix-vc
  "Build a new variant context with updated sample name and normalized alleles.
- Based on :allele-count in the configuration updates haploid allele calls. This
+ Based on :prep-allele-count in the configuration updates haploid allele calls. This
  normalizes the representation in Mitochondrial and Y chromosomes which are
  haploid but are often represented as diploid with a single call."
  [sample config orig]
@@ -101,9 +101,9 @@
  [(Genotype/modifyName g sample)])
  (.getGenotypes vc)))
  (normalize-allele-calls [g]
- {:pre [(contains? #{1 (:allele-count config)} (count (.getAlleles g)))]}
- (if (= (count (.getAlleles g)) (:allele-count config)) g
- (Genotype/modifyAlleles g (repeat (:allele-count config)
+ {:pre [(contains? #{1 (:prep-allele-count config)} (count (.getAlleles g)))]}
+ (if (= (count (.getAlleles g)) (:prep-allele-count config)) g
+ (Genotype/modifyAlleles g (repeat (:prep-allele-count config)
  (first (.getAlleles g))))))]
  (-> orig
  (assoc :vc
@@ -148,7 +148,7 @@
  0 [(Genotype. sample [alt-allele])]
  1 [(maybe-fix-vc (first gs) alt-allele)]
  (map :genotype gs)))]
- (if (:sv-genotype config)
+ (if (:prep-sv-genotype config)
  (let [new-gs (ref-vc-genotype (:genotypes orig)
  (first (:alt-alleles orig)))]
  (-> orig
@@ -164,7 +164,7 @@
  [rdr vcf-decoder sample config]
  (->> rdr
  line-seq
- (#(if (:sort-pos config) (sort-by-position %) %))
+ (#(if (:prep-sort-pos config) (sort-by-position %) %))
  (remove nochange-alt?)
  (map vcf-decoder)
  (map (partial normalize-sv-genotype config sample))
@@ -190,7 +190,7 @@
  (assoc xs 0 new))]
  (let [parts (string/split line #"\t")
  cur-chrom (first (vals
- (chr-name-remap (:org config) ref-info [(first parts)])))]
+ (chr-name-remap (:prep-org config) ref-info [(first parts)])))]
  {:chrom cur-chrom
  :line (->> parts
  (fix-chrom cur-chrom)
@@ -258,9 +258,11 @@
  Assumes by position sorting of variants in the input VCF. Chromosomes do
  not require a specific order, but positions internal to a chromosome do.
  Currently configured for human preparation."
- [in-vcf-file ref-file sample & {:keys [out-dir out-fname sort-pos sv-genotype]
- :or {sort-pos false}}]
- (let [config {:org :GRCh37 :allele-count 2 :sort-pos sort-pos :sv-genotype sv-genotype}
+ [in-vcf-file ref-file sample & {:keys [out-dir out-fname config]
+ :or {config {}}}]
+ (let [config (merge-with #(or %1 %2) config
+ {:prep-org :GRCh37 :prep-allele-count 2
+ :prep-sort-pos false :prep-sv-genotype false})
  base-name (if (nil? out-fname) (itx/remove-zip-ext in-vcf-file) out-fname)
  out-file (itx/add-file-part base-name "prep" out-dir)]
  (when (itx/needs-run? out-file)
2  test/bcbio/variation/test/compare.clj
@@ -170,7 +170,7 @@
  (facts "Check for multiple samples in a VCF file"
  (multiple-samples? vcf) => false)
  (facts "Normalize variant representation of chromosomes, order, genotypes and samples."
- (prep-vcf vcf ref "Test1" :sort-pos true) => out-vcf)
+ (prep-vcf vcf ref "Test1" :config {:prep-sort-pos true}) => out-vcf)
  (facts "Pre-cleaning of problematic VCF input files"
  (clean-problem-vcf prevcf) => out-prevcf)))

2  test/bcbio/variation/test/utils.clj
@@ -17,5 +17,5 @@
  ?form)))

  (facts "Add Complete Genomics metrics to VCF file."
- (let [ready-vcf (prep-vcf cg-vcf ref "NA12939" :sort-pos true)]
+ (let [ready-vcf (prep-vcf cg-vcf ref "NA12939" :config {:prep-sort-pos true})]
  (add-cgmetrics ready-vcf cg-var ref) => out-cg-var))
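
A rough sketch of how the new `:config` handling appears to behave, for readers who want to try it at a REPL. It is not code from the repository. Plain maps and vectors stand in for the call configuration and the GATK `Genotype` objects, and `prep-defaults`, `resolve-prep-config` and `normalize-alleles` are hypothetical names introduced only for this illustration. The `merge-with` expression and the default values are copied from the `prep-vcf` change above. The precondition uses explicit comparisons instead of the commit's set literal, on the assumption that a set literal holding two equal runtime values (as with a target count of 1) is rejected as a duplicate key.

```clojure
;; Illustrative sketch only: plain data stands in for the real call
;; configuration and for GATK Genotype objects.

(def prep-defaults
  "Default preparation settings, copied from the prep-vcf change in this commit."
  {:prep-org :GRCh37
   :prep-allele-count 2
   :prep-sort-pos false
   :prep-sv-genotype false})

(defn resolve-prep-config
  "Hypothetical helper: fill in missing (or nil/false) prep settings from the
  defaults, using the same merge-with expression as the commit."
  [config]
  (merge-with #(or %1 %2) config prep-defaults))

(defn normalize-alleles
  "Hypothetical stand-in for normalize-allele-calls. A call with a single
  allele is expanded by repeating that allele :prep-allele-count times; a call
  that already has the target number of alleles is returned unchanged."
  [config alleles]
  {:pre [(or (= 1 (count alleles))
             (= (:prep-allele-count config) (count alleles)))]}
  (if (= (count alleles) (:prep-allele-count config))
    alleles
    (vec (repeat (:prep-allele-count config) (first alleles)))))

;; Default settings: a haploid call is converted to diploid.
(normalize-alleles (resolve-prep-config {}) ["A"])
;; => ["A" "A"]

;; With :prep-allele-count 1 the haploid call is left as it is.
(normalize-alleles (resolve-prep-config {:prep-allele-count 1}) ["A"])
;; => ["A"]
```

Because the merge uses `or`, a value of `false` or `nil` in the supplied config falls back to the default for that key. That makes no difference with the defaults shown here, but it would matter if a default were ever changed to `true`.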
