
Stagedencoderrb #4

Closed
wants to merge 16 commits

Conversation

catoverdrive

No description provided.

cseed and others added 13 commits October 19, 2017 15:34
Implicitly verify biallelic.

Implicitly verify biallelic on non-split VDSes on the Python side for
methods that require it through the @require_biallelic decorator.

The Scala interface now `require`s biallelic expectations. Other
requirements should be written in this form, too (like TSampleString).
This is based on the decision not to support a user-level Scala
interface that mirrors the Python one; we've already moved away from
that with, say, history.
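
As a minimal sketch of that convention (hypothetical types and names, not Hail's actual API), the Scala side can state the biallelic expectation as a plain `require`, which the Python `@require_biallelic` decorator verifies implicitly for non-split datasets:

```scala
// Hypothetical sketch, not Hail's real classes: express the biallelic
// requirement as a `require` on the Scala side.
case class Variant(alleles: IndexedSeq[String]) {
  def isBiallelic: Boolean = alleles.length == 2
}

case class VariantDataset(variants: Seq[Variant], wasSplit: Boolean) {
  def requireBiallelic(method: String): Unit =
    require(wasSplit || variants.forall(_.isBiallelic),
      s"$method: expected biallelic variants; split multiallelics first")
}
```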

Fixed docs failure.

Rebased RegionValue/OrderedRDD2 work.

Untested.

Fixed serialization bug.

UnsafeSuite passes.

Fixed tests.

Lots of cleanup.

OrderedRDD2 works with list of partitionKey, key.

Cleanup.

Removed BroadcastTypeTree.
Directly build OrderedRDD2 in LoadVCF.
More consistent use of Type RegionValue accessors.

Improved OrderedRDD2 range bounds sampler.

Added support for ordrdd2 legacy VDSes.

Ported drop samples, variants to rdd2.

Ported filter genotypes to rdd2.

Added persist2, unpersist2 to OrderedRDD2.

Fixed a bunch of bugs related to persist.

wip.

Ported aggregators, splitMulti, annotateAlleles to rdd2.
Some tests failing.
splitMulti, etc. assume minrep doesn't change ordering.
splitMulti doesn't handle multiple variants at the same locus.  (Breaks SplitSuite.test.)
Variants that might need to move should get shuffled.
Aggregators need to copy, would probably fail over samples now.  (Add test.)
Restricted generated variants to make things pass.
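
The aggregator caveat above is the usual reused-buffer problem; here is a toy sketch (not Hail's aggregator or RegionValue API) of why values must be copied out before the producer reuses its scratch buffer:

```scala
// Toy illustration: the producer writes each element into one reused buffer,
// so a consumer that keeps references without copying sees stale data.
object AggregatorCopySketch {
  def main(args: Array[String]): Unit = {
    val scratch = new Array[Byte](1)
    val produced = Iterator[Byte](1, 2, 3).map { b => scratch(0) = b; scratch }

    // Wrong: `produced.toList` would keep three references to the same buffer,
    // and every element would end up reading as 3.

    // Right: copy each value out of the scratch buffer before moving on.
    val copied = produced.map(_.clone()).toList
    assert(copied.map(_(0).toInt) == List(1, 2, 3))
  }
}
```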

wip.

minrepped ignored in filter_alleles.
aggregators still need to copy.
ordrdd2 legacy VDS needs to be checked in.

Added testLegacy files.

filter_alleles ported to rdd2.

Doesn't use HTSGenotypeView.
Also a bit of a shitshow.

Make aggregators copy when necessary.

Fixed python tests.

Ported VariantQC to rdd2.

Cleanup.

Prefer Hail ArrayBuilder.

Fixed annotate_alleles bug.

Relaxed split_multi minrepped=True condition.

wip

wip

Tests pass.

Set block size 128K.

Fixed persist heisenbug.

Ready to go, size comparable with vcf.bgz.

Fixed doc.

Added left_aligned option to minrep.

Renamed minrepped => left_aligned.

Added uniroot.

Cleaned up rebase.  Tests passing.

Fixed tests.
Use in VariantQC.  Needs some cleanup.
fix whitespace

more whitespace fixes

fix && and ||

cseed pushed a commit that referenced this pull request May 7, 2018
* # This is a combination of 22 commits.
# This is the 1st commit message:

apply resettable context

forgot to fix one use of AutoCloseable

fix

add setup iterator

more sensible method ordering

make TrivialContext Resettable

a few more missing resettablecontexts

address comments

apply resettable context

forgot to fix one use of AutoCloseable

fix

add setup iterator

more sensible method ordering

remove rogue element type type

make TrivialContext Resettable

wip

wip

wip

wip

use safe row in join suite

pull over hailcontext

remove Region.clear(newEnd)

add selectRegionValue

# This is the commit message #2:

convert relational.scala
;

# This is the commit message #3:

scope the extract aggregators constfb call

# This is the commit message #4:

scope interpret

# This is the commit message #5:

typeAfterSelect used by selectRegionValue

# This is the commit message #6:

load matrix

# This is the commit message #7:

imports

# This is the commit message #8:

loadbgen converted

# This is the commit message #9:

convert loadplink

# This is the commit message #10:

convert loadgdb

# This is the commit message #11:

convert loadvcf

# This is the commit message #12:

convert blockmatrix

# This is the commit message #13:

convert filterintervals

# This is the commit message #14:

convert ibd

# This is the commit message #15:

convert a few methods

# This is the commit message #16:

convert split multi

# This is the commit message #17:

convert VEP

# This is the commit message #18:

formatting fix

# This is the commit message #19:

add partitionBy and values

# This is the commit message #20:

fix bug in localkeysort

# This is the commit message #21:

fixup HailContext.readRowsPartition use

# This is the commit message #22:

port balding nichols model

* apply resettable context

forgot to fix one use of AutoCloseable

fix

add setup iterator

more sensible method ordering

make TrivialContext Resettable

a few more missing resettablecontexts

address comments

apply resettable context

forgot to fix one use of AutoCloseable

fix

add setup iterator

more sensible method ordering

remove rogue element type type

make TrivialContext Resettable

wip

wip

wip

wip

use safe row in join suite

pull over hailcontext

remove Region.clear(newEnd)

add selectRegionValue

convert relational.scala
;

scope the extract aggregators constfb call

scope interpret

typeAfterSelect used by selectRegionValue

load matrix

imports

loadbgen converted

convert loadplink

convert loadgdb

convert loadvcf

convert blockmatrix

convert filterintervals

convert ibd

convert a few methods

convert split multi

convert VEP

formatting fix

add partitionBy and values

fix bug in localkeysort

fixup HailContext.readRowsPartition use

port balding nichols model

port over table.scala

couple fixes

convert matrix table

remove unnecessary use of rdd

variety of fixups

wip

add a clear

* Remove direct Region allocation from FilterColsIR

When regions are off-heap, we can allow the globals to live
in a separate, longer-lived Region that is not cleared until
the whole partition is finished. For now, we pay the
memory cost.
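
A toy sketch of that allocation pattern (simplified names; not the actual Region or FilterColsIR code): per-row scratch is cleared every iteration, while the globals live in a longer-lived region that is only freed when the partition finishes.

```scala
// Toy model only; Hail's real Region is an off-heap allocator, this is not it.
import scala.collection.mutable.ArrayBuffer

class ToyRegion {
  private val blocks = ArrayBuffer.empty[Array[Byte]]
  def allocate(n: Int): Array[Byte] = { val b = new Array[Byte](n); blocks += b; b }
  def clear(): Unit = blocks.clear()
}

object FilterColsSketch {
  def processPartition(rows: Iterator[Array[Byte]],
      keep: (Array[Byte], Array[Byte]) => Boolean): Int = {
    val globalsRegion = new ToyRegion        // lives for the whole partition
    val globals = globalsRegion.allocate(64) // globals copied in once
    val rowRegion = new ToyRegion            // cleared once per row
    var kept = 0
    rows.foreach { row =>
      rowRegion.clear()
      val scratch = rowRegion.allocate(row.length)
      Array.copy(row, 0, scratch, 0, row.length)
      if (keep(globals, scratch)) kept += 1  // globals remain valid across rows
    }
    globalsRegion.clear()                    // the memory cost paid until here
    kept
  }
}
```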

* Use RVDContext in MatrixRead zip

This Region will get cleared by consumers.

I introduced the zip primitive, which is a safer way to
zip two RVDs because it does not rely on the user correctly
clearing the regions used by the left- and right-hand sides
of the zip.
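
A toy model of the idea (illustrative names, not the real RVD or RVDContext API): the zip owns one shared context and clears it exactly once per element, so neither input has to remember to clear the other's region.

```scala
// Purely illustrative stand-ins for the context-owned scratch region and zip.
class ToyContext {
  var clears: Int = 0
  def clear(): Unit = clears += 1
}

object ZipSketch {
  // The zip, not the caller, clears the shared context once per element.
  def zipWithContext[A, B, C](ctx: ToyContext, left: Iterator[A], right: Iterator[B])
      (f: (ToyContext, A, B) => C): Iterator[C] =
    left.zip(right).map { case (a, b) =>
      ctx.clear()
      f(ctx, a, b)
    }

  def main(args: Array[String]): Unit = {
    val ctx = new ToyContext
    val zipped = zipWithContext(ctx, Iterator(1, 2, 3), Iterator("a", "b", "c")) {
      (_, n, s) => s * n
    }.toList
    assert(zipped == List("a", "bb", "ccc") && ctx.clears == 3)
  }
}
```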

* Control the Regions in LoadGDB

I do not fully understand how LoadGDB works, but a simple
solution for this use case is to serialize to arrays of bytes
and parallelize those.

I realize there is a proliferation of `coerce` methods. I plan
to trim this down once RDD and ContextRDD no longer coexist.
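
A sketch of that workaround (hypothetical row encoding; not LoadGDB's actual code): encode each record to a plain byte array on the driver, parallelize the byte arrays, and decode inside each task, so no Region bookkeeping crosses the driver/executor boundary.

```scala
// Assumes a live SparkContext; the Int-array "row" is a stand-in for whatever
// LoadGDB actually reads.
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, DataInputStream, DataOutputStream}
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

object LoadGdbSketch {
  def encode(fields: Array[Int]): Array[Byte] = {
    val baos = new ByteArrayOutputStream()
    val out = new DataOutputStream(baos)
    out.writeInt(fields.length)
    fields.foreach(out.writeInt)
    out.flush()
    baos.toByteArray
  }

  def decode(bytes: Array[Byte]): Array[Int] = {
    val in = new DataInputStream(new ByteArrayInputStream(bytes))
    Array.fill(in.readInt())(in.readInt())
  }

  // Byte arrays ship through Spark without any region bookkeeping; each task
  // decodes into whatever representation it needs.
  def parallelizeRows(sc: SparkContext, rows: Seq[Array[Int]]): RDD[Array[Int]] =
    sc.parallelize(rows.map(encode)).map(decode)
}
```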

* wip

* unify RVD.run

* reset in write

* fixes

* use context region when allocating

* also read RVDs using RVDContext

* formatting

* address comments

* remove unused val

* abstract over boundary

* little fixes

* whoops forgot to clear before persisting

This fixes LDPrune: if you don't clear the region, things go wrong.
Not sure what causes that bug. Maybe it's something about encoders?

* serialize for shuffles, region.scoped in matrixmapglobals, fix joins

* clear more!

* wip

* wip

* rework GeneralRDD to ease ContextRDD transition

* formatting

* final fixes

* formatting

* merge failures

* more bad merge stuff

* formatting

* remove unnecessary stuff

* remove fixme

* boom!

* variety of merge mistakes

* fix destabilize bug

* add missing newline

* remember to clear the producer region in localkeysort

* switch def to val

* cleanup filteralleles and exporbidbimfam

* fix clearing and serialization issue

* fix BitPackedVectorView

Previously it always assumed the variant struct started at offset
zero, which is not always true.
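
A minimal illustration of that bug class (a hypothetical view, not the real BitPackedVectorView): a view over a shared buffer has to index fields relative to the offset it was pointed at, not offset zero.

```scala
object OffsetViewSketch {
  class ToyStructView(buf: Array[Byte]) {
    private var offset: Int = 0
    def set(newOffset: Int): Unit = offset = newOffset
    def firstField: Byte = buf(offset) // correct: relative to the struct's start
    // The buggy version effectively read buf(0) regardless of the offset.
  }

  def main(args: Array[String]): Unit = {
    val buf = Array[Byte](10, 20, 30, 40)
    val view = new ToyStructView(buf)
    view.set(2)
    assert(view.firstField == 30)      // the offset-zero assumption would read 10
  }
}
```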

* address comments, remove a comment

* remove direct use of Region

* oops

* werrrks, mebbe

* needs cleanup

* fix filter intervals

* fixes

* fixes

* fix filterintervals

* remove unnecessary copy in TableJoin

* and finally fix the last test

* re-use existing CodecSpec definition

* remove unnecessary boundaries

* use RVD abstraction when possible

* formatting

* bugfix: RegionValue must know its region

* remove unnecessary val and comment

* remove unused methods

* eliminate unused constructors

* undo debug change

* formatting

* remove unused imports

* fix bug in tablejoin

* fix RichRDDSuite test

If you have no data, then you have no partitions, not 1 partition.
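
That reasoning in test form (assumes a live SparkContext `sc`; not the actual RichRDDSuite assertion): an RDD built from no data reports zero partitions, not one.

```scala
import org.apache.spark.SparkContext

object EmptyPartitionsSketch {
  def assertNoPartitionsWhenEmpty(sc: SparkContext): Unit =
    assert(sc.emptyRDD[Int].getNumPartitions == 0)
}
```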