Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

embed type metadata #15

Merged
merged 9 commits into from
Jan 30, 2020
Merged

embed type metadata #15

merged 9 commits into from
Jan 30, 2020

Conversation

nlhepler
Copy link
Contributor

stuffs type_name data into the shard metadata section.

We probably don't want to do this, as type_name is explicitly not guaranteed to be stable across rustc versions, from its docs:

This is intended for diagnostic use. The exact contents and format of the string are not specified, other than being a best-effort description of the type. For example, type_name::<Option>() could return the "Option" or "std::option::Optionstd::string::String", but not "foobar". In addition, the output may change between versions of the compiler.

The type name should not be considered a unique identifier of a type; multiple types may share the same type name.

The current implementation uses the same infrastructure as compiler diagnostics and debuginfo, but this is not guaranteed.

@nlhepler
Copy link
Contributor Author

nlhepler commented Oct 31, 2019

If you want to not look at off-target things, the first commit contains the substantial changes.

@nlhepler
Copy link
Contributor Author

nlhepler commented Oct 31, 2019

The reason I'm leaving this as WIP is we probably need a new trait:

trait TypeNamed {
  fn type_name() -> &'static str;
}

and use this to implement this functionality. However that's a pretty big breaking change.

@coveralls
Copy link

coveralls commented Oct 31, 2019

Coverage Status

Coverage remained the same at 93.948% when pulling f3180e0 on lh/type-metadata into aa92f23 on master.

Copy link
Contributor

@pmarks pmarks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm down with the first commit as a practical near-term solution. It's annoying that they're not stabilizing the string from type_name(), but I bet it won't change often. If it does we could implement a smarter comparison to preserve backward compatibility. I really don't want to force a new trait impl on users - I don't think the payoff is big enough.

src/lib.rs Outdated Show resolved Hide resolved
@sjackman
Copy link

We probably don't want to do this, as type_name is explicitly not guaranteed to be stable across rustc versions

All executables of the Cell Ranger pipeline are expected to be compiled with the same version of Rust, so I wouldn't expect this to be a problem.

@sreenathkrishnan
Copy link
Collaborator

All executables of the Cell Ranger pipeline are expected to be compiled with the same version of Rust, so I wouldn't expect this to be a problem.

Yes, within a pipeline run it should be okay. However, we could have a unit test that consumes a shard file and that test could break when we upgrade the toolchain.

@sjackman
Copy link

All executables of the Cell Ranger pipeline are expected to be compiled with the same version of Rust, so I wouldn't expect this to be a problem

Yes, within a pipeline run it should be okay. However, we could have a unit test that consumes a shard file and that test could break when we upgrade the toolchain.

Probably best to update any shard artifacts between releases. That makes ensuring the results are identical between releases more challenging though.

@nlhepler
Copy link
Contributor Author

There you go, @sreenathkrishnan

@nlhepler nlhepler changed the title [WIP] type metadata embed type metadata Oct 31, 2019
@sjackman
Copy link

sjackman commented Nov 1, 2019

I'm a big fan of more sanity checking in the deserializer. Thanks, Lance! To test it, I tried changing the sort order of only one of the matrix writer and reader (without changing the other) to induce an error.

When I change the writer sort order from BarcodeThenFeatureOrder to FeatureThenBarcodeOrder I get the error message:

Error: Io(Custom { kind: UnexpectedEof, error: "failed to fill whole buffer" })

When I change the reader sort order from BarcodeThenFeatureOrder to FeatureThenBarcodeOrder I get the error message:

memory allocation of 4846226421456044032 bytes failed

Change the writer sort order from BarcodeThenFeatureOrder to FeatureThenBarcodeOrder.

--- a/lib/rust/cr_lib/src/align_and_count.rs
+++ b/lib/rust/cr_lib/src/align_and_count.rs
@@ -256,7 +256,7 @@ impl MartianStage for AlignAndCountStage {
         let bam_header = rover.make_path("bam_header");
 
         // Count data ordered by barcode.
-        let mut bc_counts: ShardWriter<FeatureBarcodeCount, crate::BarcodeThenFeatureOrder> =
+        let mut bc_counts: ShardWriter<FeatureBarcodeCount, crate::FeatureThenBarcodeOrder> =
             ShardWriter::new(
                 &counts_bc_order_shard,
                 SEND_BUFFER_SZ,

Change the reader sort order from BarcodeThenFeatureOrder to FeatureThenBarcodeOrder.

--- a/lib/rust/cr_lib/src/count_matrix.rs
+++ b/lib/rust/cr_lib/src/count_matrix.rs
@@ -127,7 +127,7 @@ pub fn write_matrix_market(
     feature_ref: &FeatureReference,
     barcode_whitelist: &str,
 ) -> Result<(), Error> {
-    let reader: ShardReader<FeatureBarcodeCount, crate::BarcodeThenFeatureOrder> =
+    let reader: ShardReader<FeatureBarcodeCount, crate::FeatureThenBarcodeOrder> =
         ShardReader::open_set(&chunks)?;
     let barcode_to_index: FxHashMap<fastq_10x::sseq::SSeq, u32> =
         read_barcode_whitelist(barcode_whitelist)?;

I modified the top-level Cargo.toml to include this PR and checked the Cargo.lock to verify that it was effective:

--- a/lib/rust/Cargo.toml
+++ b/lib/rust/Cargo.toml
@@ -23,3 +23,5 @@ members = [
 [profile.release]
 debug = true
 
+[patch.crates-io]
+shardio = { git = "https://github.com/10XGenomics/rust-shardio", rev = "bdb2fdcae99895b108de1735eb04367367cb7cd4" }

@nlhepler
Copy link
Contributor Author

nlhepler commented Nov 1, 2019

@sjackman Could you try with the latest f3180e0 ?

@pmarks pmarks merged commit f3180e0 into master Jan 30, 2020
@sjackman sjackman deleted the lh/type-metadata branch January 30, 2020 18:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants