[ADAM-1883] Python and R caching #1885
Test PASSed. |
Resolves #1883 |
This is tested and works like so:
|
I'm not sure I'm comfortable reviewing Python stuff yet. Is it reasonable to expose |
Test FAILed. Build result: FAILURE. Checked out revision 1c40800 (origin/pr/1885/merge). ADAM-prb configurations 2.7.3,2.11,2.2.1,centos; 2.6.2,2.10,2.2.1,centos; 2.7.3,2.10,2.2.1,centos; and 2.6.2,2.11,2.2.1,centos all completed with result FAILURE. |
|
re 2 I'm asking is the |
I agree @heuermh, but this is currently the only way to access a pyspark rdd from our API. The other rdd option is ._jvmRdd, but this is not a pyspark RDD, just an inaccessible Java Member. |
Test PASSed. |
I see, thanks |
@heuermh I updated the count example to exclude .rdd so it isn't so confusing |
Thanks for doing this @akmorrow13! Two things:
To wit:
|
@@ -137,6 +137,26 @@ trait GenomicRDD[T, U <: GenomicRDD[T, U]] extends Logging { | |||
*/ | |||
def replaceSequences(newSequences: SequenceDictionary): U | |||
|
|||
/** | |||
* Caches underlying RDD in memory. This interface is used to |
fnothaft
Jan 25, 2018
Member
Drop comment about R/Python, this method is useful independent of that.
|
||
/** | ||
* Unpersists underlying RDD from memory. This interface is used to | ||
* access caching functionality from the python and R APIs. |
fnothaft
Jan 25, 2018
Member
Drop comment about R/Python, this method is useful independent of that.
* @return type of RDD that was cached | ||
*/ | ||
def cache() = { | ||
rdd.cache() |
heuermh
Jan 25, 2018
Member
Asking out of ignorance: does caching make any sense for GenomicDataset.dataset?
fnothaft
Jan 25, 2018
Member
Ah, yes... These should probably be overridden by the DatasetBound...RDD classes.
fnothaft
Feb 3, 2018
Member
Hi @akmorrow13! Sorry for the slow reply here. Yes, that is what I want, but the simpler way to do this is to have RDDBoundGenomicRDD and DatasetBoundGenomicRDD traits that both implement the cache, persist, and unpersist methods. Then RDDBoundAlignmentRecordRDD would extend RDDBoundGenomicRDD, etc. We should do this for saveAsParquet as well.
Also, my preference is that unpersist returns Unit (no return value) and that cache/persist return either U or Unit.
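The structure suggested above can be sketched in Python as a rough analogue of the Scala traits. This is only an illustration: the mixin and class names, the `FakeBacking` stand-in, and the `"default"` marker are hypothetical and are not ADAM or Spark code. It follows the stated preference that unpersist returns nothing while cache and persist return the wrapper.

```python
class FakeBacking:
    """Hypothetical stand-in for a Spark RDD or Dataset; it only records a storage level."""

    def __init__(self):
        self.storage_level = None

    def cache(self):
        # "default" is a placeholder, not a real Spark storage level name.
        self.storage_level = "default"

    def persist(self, storage_level):
        self.storage_level = storage_level

    def unpersist(self):
        self.storage_level = None


class RDDBoundMixin:
    """Caching operations for a wrapper whose current representation is an RDD."""

    def cache(self):
        self.rdd.cache()
        return self

    def persist(self, storage_level):
        self.rdd.persist(storage_level)
        return self

    def unpersist(self):
        # Deliberately returns None, mirroring a Unit return type.
        self.rdd.unpersist()


class DatasetBoundMixin:
    """Caching operations for a wrapper whose current representation is a Dataset."""

    def cache(self):
        self.dataset.cache()
        return self

    def persist(self, storage_level):
        self.dataset.persist(storage_level)
        return self

    def unpersist(self):
        self.dataset.unpersist()


class RDDBoundReads(RDDBoundMixin):
    """Hypothetical concrete wrapper, playing the role of RDDBoundAlignmentRecordRDD."""

    def __init__(self, rdd):
        self.rdd = rdd


class DatasetBoundReads(DatasetBoundMixin):
    """Hypothetical concrete wrapper backed by a Dataset."""

    def __init__(self, dataset):
        self.dataset = dataset
```

Each concrete wrapper picks up all three methods from the mixin that matches its physical representation, so the caching logic is written once per representation rather than once per record type.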
I added a persist function, but I have absolutely no idea how to handle passing parameters from Python to Scala. I tried following the example for stringency, like this:
but that didn't seem to work |
Test PASSed. |
Test PASSed. |
def cache(self): | ||
self._jvmRdd.cache() | ||
|
||
def persist(self, sl): |
fnothaft
Jan 25, 2018
Member
sl should be of type StorageLevel
- Use the Java StorageLevels.create method
- Voila:
self._jvmRdd.persist(self.sc._jvm.org.apache.spark.api.java.StorageLevels.create(sl.useDisk, sl.useMemory, sl.useOffHeap, sl.deserialized, sl.replication))
Py4j lines Python types up with Java types, so you need to line Python bool up with Java boolean, not scala.Boolean, Python int up with Java int, not scala.Int, etc.
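A minimal sketch of the argument preparation described above, assuming a storage level object with the same attribute names as pyspark's StorageLevel. `StorageLevelLike` and `storage_level_args` are hypothetical names introduced here so the snippet runs without Spark; they are not part of ADAM or pyspark.

```python
from collections import namedtuple

# Hypothetical stand-in for pyspark.StorageLevel, so the sketch runs without Spark.
StorageLevelLike = namedtuple(
    "StorageLevelLike",
    ["useDisk", "useMemory", "useOffHeap", "deserialized", "replication"])


def storage_level_args(sl):
    """Return the fields of sl as plain Python bool/int values.

    The order matches the Java factory method
    StorageLevels.create(useDisk, useMemory, useOffHeap, deserialized, replication).
    """
    return (bool(sl.useDisk),
            bool(sl.useMemory),
            bool(sl.useOffHeap),
            bool(sl.deserialized),
            int(sl.replication))


args = storage_level_args(StorageLevelLike(True, True, False, False, 1))
```

With a live gateway, these values would be passed positionally to `self.sc._jvm.org.apache.spark.api.java.StorageLevels.create(...)`. Keeping them as plain `bool` and `int` lets Py4J match them to the Java `boolean` and `int` parameters.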
Test PASSed. |
Test FAILed. Build result: FAILURE. Checked out revision 52c468a (origin/pr/1885/merge). ADAM-prb configurations 2.7.3,2.11,2.2.1,centos and 2.6.2,2.11,2.2.1,centos completed with result FAILURE; 2.6.2,2.10,2.2.1,centos and 2.7.3,2.10,2.2.1,centos completed with result SUCCESS. |
Test FAILed. Build result: FAILURE. Checked out revision 2a5c7d4 (origin/pr/1885/merge). ADAM-prb configurations 2.7.3,2.11,2.2.1,centos; 2.6.2,2.10,2.2.1,centos; 2.7.3,2.10,2.2.1,centos; and 2.6.2,2.11,2.2.1,centos all completed with result FAILURE. |
Jenkins, retest this please. |
Test FAILed. Build result: FAILURE. Checked out revision 2a5c7d4 (origin/pr/1885/merge). ADAM-prb configurations 2.7.3,2.11,2.2.1,centos; 2.6.2,2.10,2.2.1,centos; 2.7.3,2.10,2.2.1,centos; and 2.6.2,2.11,2.2.1,centos all completed with result FAILURE. |
Test FAILed. Build result: FAILURE. Checked out revision 01bd9f7 (origin/pr/1885/merge). ADAM-prb configurations 2.7.3,2.11,2.2.1,centos; 2.6.2,2.10,2.2.1,centos; 2.7.3,2.10,2.2.1,centos; and 2.6.2,2.11,2.2.1,centos all completed with result FAILURE. |
Test FAILed. Build result: FAILURE. Checked out revision b38a1c5 (origin/pr/1885/merge). ADAM-prb configurations 2.7.3,2.11,2.2.1,centos; 2.6.2,2.10,2.2.1,centos; 2.7.3,2.10,2.2.1,centos; and 2.6.2,2.11,2.2.1,centos all completed with result FAILURE. |
Jenkins, retest this please. |
Just a couple of stylistic nits. Looks good otherwise; thanks @akmorrow13! |
JavaSaveArgs, | ||
SAMHeaderWriter | ||
} | ||
import org.bdgenomics.adam.rdd.{ DatasetBoundGenomicDataset, AvroGenomicRDD, JavaSaveArgs, SAMHeaderWriter } |
fnothaft
Feb 7, 2018
Member
Nit: Please split import across multiple lines.
@@ -30,7 +30,7 @@ import org.bdgenomics.adam.models.{ | |||
SequenceDictionary | |||
} | |||
import org.bdgenomics.adam.rdd.ADAMContext._ | |||
import org.bdgenomics.adam.rdd.{ AvroRecordGroupGenomicRDD, JavaSaveArgs } | |||
import org.bdgenomics.adam.rdd.{ DatasetBoundGenomicDataset, AvroRecordGroupGenomicRDD, JavaSaveArgs } |
fnothaft
Feb 7, 2018
Member
Please split long import line across multiple lines.
AvroGenomicRDD, | ||
VCFHeaderUtils | ||
} | ||
import org.bdgenomics.adam.rdd.{ DatasetBoundGenomicDataset, AvroGenomicRDD, VCFHeaderUtils } |
fnothaft
Feb 7, 2018
Member
Nit: please split long import across multiple lines.
:param sl new StorageLevel | ||
""" | ||
|
||
return self._replaceRdd(self._jvmRdd.persist(self.sc._jvm.org.apache.spark.api.java.StorageLevels.create(sl.useDisk, \ |
fnothaft
Feb 7, 2018
Member
This line is really long. Can you split it out as:
jsl = self.sc._jvm.org.apache.spark.api.java.StorageLevels.create(sl.useDisk,
                                                                  sl.useMemory,
                                                                  sl.useOffHeap,
                                                                  sl.deserialized,
                                                                  sl.replication)
return self._replaceRdd(self._jvmRdd.persist(jsl))
Indented parameters should align with the opening delimiter.
@@ -201,3 +202,33 @@ def test_filterByOverlappingRegions(self): | |||
|
|||
filtered = reads.filterByOverlappingRegions(querys) | |||
self.assertEquals(filtered.toDF().count(), 2) | |||
|
|||
def test_caching(self): |
fnothaft
Feb 7, 2018
Member
Nit: 2 lines of space before each new function definition.
863b906
to
0fdbdf7
Test PASSed. |
Test PASSed. |
Test PASSed. |
Looks good, some minor doc fixes, and a question about |
} | ||
|
||
/** | ||
* Persists underlying RDD in memory. |
heuermh
Feb 7, 2018
Member
Persists underlying RDD in memory. → Persists underlying RDD in memory or disk.
} | ||
|
||
/** | ||
* Unpersists underlying RDD from memory. |
heuermh
Feb 7, 2018
Member
Unpersists underlying RDD from memory. → Unpersists underlying RDD from memory or disk.
/** | ||
* A trait describing a GenomicDataset that also supports the Spark SQL APIs. | ||
*/ | ||
trait DatasetBoundGenomicDataset[T, U <: Product, V <: GenomicDataset[T, U, V]] extends GenomicDataset[T, U, V] { |
heuermh
Feb 7, 2018
Member
@fnothaft I think you may have suggested this, but doesn't GenomicDataset already imply the RDD has been bound to a dataset? Couldn't these be moved there?
I'm not sure what GenomicDataset is for; it isn't used everywhere I would have expected.
https://github.com/bigdatagenomics/adam/search?utf8=%E2%9C%93&q=GenomicDataset&type=
fnothaft
Feb 7, 2018
Member
No, GenomicDataset is any GenomicRDD where the underlying genomic data can be viewed as either an RDD or a Dataset, while the base GenomicRDD trait only provides RDDs. This distinction will go away soon.
fnothaft
Feb 7, 2018
Member
ParquetUnbound/RDDBound/DatasetBound indicates how the data is currently physically represented. ParquetUnbound means that the data hasn't been materialized into memory yet and is represented using a Parquet file, while RDDBound/DatasetBound means that the current (as of the last transformation) representation is an RDD/Dataset.
Since cache impacts the physical materialization of the data, we want the new DatasetBoundGenomicDataset abstract class to override cache, and not the GenomicDataset abstract class.
} | ||
|
||
/** | ||
* Persists underlying dataset in memory. |
heuermh
Feb 7, 2018
Member
Persists underlying dataset in memory. → Persists underlying dataset in memory or disk.
* Persists underlying dataset in memory. | ||
* | ||
* @param sl new StorageLevel | ||
* @return Unit |
} | ||
|
||
/** | ||
* Unpersists underlying dataset from memory. |
heuermh
Feb 7, 2018
Member
Unpersists underlying dataset from memory. → Unpersists underlying dataset from memory or disk.
/** | ||
* Unpersists underlying dataset from memory. | ||
* | ||
* @return Unit |
fnothaft
Feb 7, 2018
Member
unpersisted, not uncached
|
||
def persist(self, sl): | ||
""" | ||
Persists underlying RDD in memory. |
heuermh
Feb 7, 2018
Member
Persists underlying RDD in memory. → Persists underlying RDD in memory or disk.
|
||
def unpersist(self): | ||
""" | ||
Unpersists underlying RDD from memory. |
heuermh
Feb 7, 2018
Member
Unpersists underlying RDD from memory. → Unpersists underlying RDD from memory or disk.
Test FAILed. Build result: FAILURE. Checked out revision c71ccf4 (origin/pr/1885/merge). ADAM-prb configurations 2.6.2,2.10,2.2.1,centos; 2.6.2,2.11,2.2.1,centos; 2.7.3,2.10,2.2.1,centos; and 2.7.3,2.11,2.2.1,centos all completed with result FAILURE. |
Jenkins, retest this please. |
Jenkins, retest this please. |
Test FAILed. Build result: FAILURE. Checked out revision c71ccf4 (origin/pr/1885/merge). ADAM-prb configurations 2.6.2,2.10,2.2.1,centos; 2.6.2,2.11,2.2.1,centos; 2.7.3,2.10,2.2.1,centos; and 2.7.3,2.11,2.2.1,centos all completed with result FAILURE. |
:return: Returns a new, persisted RDD. | ||
""" | ||
|
||
jsl = self.sc._jvm.org.apache.spark.api.java.StorageLevels.create(sl.useDisk, \ |
fnothaft
Feb 9, 2018
Member
Do not use \ to break lines, and please indent to the opening delimiter. I.e., every sl should be at the same character position on the line.
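The style requested above relies on Python's implicit line continuation inside parentheses. A small sketch, where `create` is a hypothetical stand-in for the long StorageLevels.create call and not a Spark function:

```python
def create(use_disk, use_memory, use_off_heap, deserialized, replication):
    """Hypothetical stand-in for a long, multi-argument call."""
    return (use_disk, use_memory, use_off_heap, deserialized, replication)


# The continuation lines sit inside the parentheses, so no backslash is needed,
# and each argument starts in the column just after the opening "(".
level = create(True,
               True,
               False,
               False,
               1)
```

Because the line breaks fall inside an open parenthesis, the parser treats the five lines as one expression, and the aligned arguments are easy to compare at a glance.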
|
||
return self._replaceRdd(self._jvmRdd.persist(jsl)) | ||
|
||
def unpersist(self): |
fnothaft
Feb 9, 2018
Member
Two spaces above.
From looking at Jenkins, needs a |
Test FAILed. Build result: FAILURE. Checked out revision 89131e6 (origin/pr/1885/merge). ADAM-prb configurations 2.6.2,2.10,2.2.1,centos; 2.6.2,2.11,2.2.1,centos; 2.7.3,2.10,2.2.1,centos; and 2.7.3,2.11,2.2.1,centos all completed with result FAILURE. |
Jenkins, retest this please. |
Test PASSed. |
I think this is all set @fnothaft let me know if there is anything else that is needed |
Test PASSed. |
Thanks @akmorrow13! Merged manually as 6051321. |