implementation for RNA #6

satriobio · 2023-12-20T15:48:32Z

Add a working implementation of rnar9 with the correct orientation (3'-5')
Add noise calculation (normal distribution)
Rename the config field pore_type to model_type and make simulation.rs generic
Change the kmer model format to accommodate a three-column model (kmer, lvl_mean, and lvl_stdv)
Rename models for clarity
Add a transcript TOML config, FASTA sequence, and its weight
Add a config option for the sampling rate
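
To illustrate the orientation point above: RNA is read 3' to 5', so a transcript stored 5' to 3' in FASTA has to be flipped before signal levels are looked up. The following is only a minimal sketch under that assumption. The function name and the boolean flag are hypothetical and are not taken from the PR.

```rust
/// Hypothetical helper (not from the PR): return the sequence in the order
/// in which it would be read. RNA is reversed (no complementing), DNA is
/// returned unchanged.
pub fn orient_for_simulation(seq: &str, is_rna: bool) -> String {
    if is_rna {
        // Reverse only, so a 5'->3' transcript becomes 3'->5'.
        seq.chars().rev().collect()
    } else {
        seq.to_string()
    }
}
```

For example, orient_for_simulation("ACGU", true) yields "UGCA", while the DNA path leaves the input as it is.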

Adoni5

In essence this is correct. My main issue is with adding a new column to the R10 model so we are compatible with RNA - this adds 33% extra to the file and requires manual editing. I've suggested a solution.

Not sure about the addition of taking Pod5 files in, as they will crash when we try to read them.

There's some other changes I suspect will break Barcoding.

I would like to make the models more flexible, so have two enums for R9/R10 and RNA/DNA.

And finally I would like to always add noise: Laplace for DNA and, I guess, Normal for RNA.
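
As a rough illustration of the two noise families mentioned above, here is a std-only sketch that turns uniform draws in (0, 1) into Laplace or Normal samples. This is not the code in the PR (a later commit pulls in a crate for the Laplace distribution). The function names are hypothetical, and the uniform draws are passed in explicitly so that the example has no dependencies.

```rust
use std::f64::consts::PI;

/// Laplace(mu, b) sample via the inverse CDF, given a uniform draw u in (0, 1).
/// Hypothetical helper, not the PR's implementation.
pub fn laplace_from_uniform(mu: f64, b: f64, u: f64) -> f64 {
    let centred = u - 0.5;
    mu - b * centred.signum() * (1.0 - 2.0 * centred.abs()).ln()
}

/// Normal(mean, stdv) sample via the Box-Muller transform, given two
/// independent uniform draws u1 in (0, 1] and u2 in [0, 1).
/// Hypothetical helper, not the PR's implementation.
pub fn normal_from_uniforms(mean: f64, stdv: f64, u1: f64, u2: f64) -> f64 {
    let z = (-2.0 * u1.ln()).sqrt() * (2.0 * PI * u2).cos();
    mean + stdv * z
}
```

In a simulator the uniform draws would come from the seeded RNG, and the Normal path would presumably use the per-kmer lvl_stdv as stdv.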

Adoni5 · 2023-12-20T21:37:49Z

src/impl_services/data.rs

@@ -909,37 +949,52 @@ fn process_samples_from_config(
        // only a path to a single file has been passed
        } else {
            debug!("{:#?}", sample);
+            if sample.input_genome.is_pod5() | sample.input_genome.is_pod5() {


What data are we storing as pod5?

Adoni5 · 2023-12-20T21:56:04Z

src/main.rs

-    R10,
-    /// R9 pore
-    R9,
+pub enum Model {


Actually we should split this into a model enum and a pore enum:

pub enum PoreType { R10, R9 }
pub enum Model { DNA, RNA }
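
A minimal sketch of how the two suggested enums could be read from config strings. The FromStr impls, the accepted spellings, and the error type are assumptions made for illustration. Only the enum names and variants come from the comment above.

```rust
use std::str::FromStr;

/// Pore chemistry, as suggested in the review comment.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PoreType {
    R10,
    R9,
}

/// Molecule type, as suggested in the review comment.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Model {
    DNA,
    RNA,
}

// Assumption: config values are matched case-insensitively.
impl FromStr for PoreType {
    type Err = String;
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s.to_ascii_uppercase().as_str() {
            "R10" => Ok(PoreType::R10),
            "R9" => Ok(PoreType::R9),
            other => Err(format!("unknown pore type: {}", other)),
        }
    }
}

impl FromStr for Model {
    type Err = String;
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s.to_ascii_uppercase().as_str() {
            "DNA" => Ok(Model::DNA),
            "RNA" => Ok(Model::RNA),
            other => Err(format!("unknown model: {}", other)),
        }
    }
}
```

Keeping the two axes separate means a match on pore type alone does not need to change when a new nucleotide type is added.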

Adoni5 · 2023-12-20T21:56:30Z

src/main.rs

@@ -78,7 +81,8 @@ struct Config {
    random_seed: Option<u64>,
    target_yield: f64,
    working_pore_percent: Option<usize>,
-    pore_type: Option<String>,
+    simulation_type: Option<String>,


Add comments for the new fields.

In fact, simulation_type is never used?

Adoni5 · 2023-12-20T21:59:06Z

src/main.rs

@@ -179,6 +184,7 @@ struct Parameters {
    device_id: String,
    position: String,
    break_read_ms: Option<u64>,
+    sampling: Option<u64>


This should be samples_per_base; sampling is an unclear name.

Nope, this should be sample_rate. samples_per_base would be confused with the samples_per_base in data.

Adoni5 · 2023-12-20T22:01:21Z

src/simulation.rs

-        None
+        // if return None will cause panic.
+        // Example very short siRNA without line breaks in genecode transcript.
+        Some(buf)


Just always return buf rather than branching. I see that this would be necessary on a perfect sequence.

Adoni5 · 2023-12-20T22:10:19Z

src/impl_services/data.rs

-        PoreType::R10 => "_R10",
-        PoreType::R9 => "",
+    let r10_suffix = match model_type {
+        Model::DNAR10 => "_DNAR10",


Did you test this? I suspect it breaks the barcodes for R9
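
One way to guard against this kind of regression is to key the suffix on the pore type only and pin the old behaviour in a test. The sketch below keeps the mapping from the removed lines (R10 gives "_R10", R9 gives the empty string). The helper barcode_array_name and the barcode name used with it are hypothetical, and this is not necessarily how the PR resolved the issue.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PoreType {
    R10,
    R9,
}

/// Suffix mapping taken from the removed lines in the diff above.
pub fn barcode_suffix(pore: PoreType) -> &'static str {
    match pore {
        PoreType::R10 => "_R10",
        PoreType::R9 => "",
    }
}

/// Hypothetical helper: build the lookup name for a barcode squiggle array.
pub fn barcode_array_name(barcode: &str, pore: PoreType) -> String {
    format!("{}{}", barcode, barcode_suffix(pore))
}
```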

Adoni5 · 2023-12-20T22:10:49Z

src/impl_services/data.rs

@@ -768,6 +771,9 @@ fn stop_sending_read(
 pub trait FileExtension {
    fn has_extension<S: AsRef<str>>(&self, extensions: &[S]) -> bool;
    fn is_fasta(&self) -> bool;
+    fn is_npy(&self) -> bool;
+    fn is_pod5(&self) -> bool;


Why do we need an is_pod5 function?
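
For context, a trait with this shape could be implemented for Path roughly as follows. The trait signature is the one in the diff. The impl body, the case-insensitive comparison, and the extension lists are assumptions and may differ from the real code. Whether is_pod5 should stay at all is the open question here.

```rust
use std::path::Path;

pub trait FileExtension {
    fn has_extension<S: AsRef<str>>(&self, extensions: &[S]) -> bool;
    fn is_fasta(&self) -> bool;
    fn is_npy(&self) -> bool;
    fn is_pod5(&self) -> bool;
}

impl FileExtension for Path {
    /// True if the path's extension matches any candidate, ignoring ASCII case.
    fn has_extension<S: AsRef<str>>(&self, extensions: &[S]) -> bool {
        match self.extension().and_then(|ext| ext.to_str()) {
            Some(ext) => extensions
                .iter()
                .any(|candidate| candidate.as_ref().eq_ignore_ascii_case(ext)),
            None => false,
        }
    }

    // Assumed extension lists, for illustration only.
    fn is_fasta(&self) -> bool {
        self.has_extension(&["fa", "fasta", "fna"])
    }

    fn is_npy(&self) -> bool {
        self.has_extension(&["npy"])
    }

    fn is_pod5(&self) -> bool {
        self.has_extension(&["pod5"])
    }
}
```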

Adoni5 · 2023-12-20T22:11:15Z

src/impl_services/data.rs

+    };    
+
+    // Select Model to Simulate
+    let model = match config.check_model_type() {


Good solution

Adoni5 · 2023-12-20T22:11:40Z

src/impl_services/data.rs

+                simulation::parse_kmers(&kmer_string).expect("Failed to parse R10 kmers");
+            Some(kmer_hashmap)
+        }
+        Model::RNAR9 => {


We can do the different parsing here for the kmer models R9 vs. R10
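
A sketch of what parsing both layouts in one place could look like: two columns (kmer, level mean) or three columns (kmer, lvl_mean, lvl_stdv). The whitespace separator, the optional header line starting with "kmer", and all type and function names are assumptions. This is not the signature of simulation::parse_kmers.

```rust
use std::collections::HashMap;

/// Signal level for one kmer; stdv is present only in the three-column layout.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct KmerLevel {
    pub mean: f64,
    pub stdv: Option<f64>,
}

/// Hypothetical parser accepting either two or three whitespace-separated columns.
pub fn parse_kmer_model(text: &str) -> Result<HashMap<String, KmerLevel>, String> {
    let mut table = HashMap::new();
    for (idx, line) in text.lines().enumerate() {
        let fields: Vec<&str> = line.split_whitespace().collect();
        // Skip blank lines and an assumed header line.
        if fields.is_empty() || fields[0] == "kmer" {
            continue;
        }
        let (mean, stdv) = match fields.as_slice() {
            [_, mean] => (mean, None),
            [_, mean, stdv] => (mean, Some(stdv)),
            _ => {
                return Err(format!(
                    "line {}: expected 2 or 3 columns, found {}",
                    idx + 1,
                    fields.len()
                ))
            }
        };
        let mean: f64 = mean
            .parse()
            .map_err(|e| format!("line {}: bad mean: {}", idx + 1, e))?;
        let stdv = match stdv {
            Some(s) => Some(
                s.parse::<f64>()
                    .map_err(|e| format!("line {}: bad stdv: {}", idx + 1, e))?,
            ),
            None => None,
        };
        table.insert(fields[0].to_string(), KmerLevel { mean, stdv });
    }
    Ok(table)
}
```

With an optional stdv, the two-column R10 file would not need an extra column added by hand.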

Adoni5 · 2023-12-20T22:12:38Z

src/impl_services/data.rs

                );
            }
+            else {
+                debug!("Sorry unsupported format!");


Rather than debug!, use warn! and exit.

satriobio and others added 9 commits November 17, 2023 15:06

RNA model implementation

633b5ee

flip sequence array for RNA simulation

264090a

create smaller test transcripts

8f65615

update RNA model, implement noise

2f24adb

fix rna signal simulation

1770662

Add sampling config

ad91edf

Model renaming, for clarity

bf735ae

Add transcript example

ce48f4f

Merge branch 'main' into dev

0db775d

Adoni5 requested changes Dec 20, 2023

Adoni5 reviewed Dec 20, 2023

Adoni5 added 13 commits December 21, 2023 11:59

Add R9 prefix to barcode squiggle arrays

b3a0d55

Add probability crate for laplace dist

e7e04ec

Swicth to Nucloeitde and PoreType enums

538ab90

Rather than model::<nucleotide><poretype>

Parse kmers dependent on 5mer or 9mer

89a520c

Expectes std_level column for 5mer

Switch to Pore Type and Nucleotide type enums

9f69a2f

Add laplace noise for r10

CORRECT R10 model

f3d3547

Cargo clippy suggestions

4fbb444

Add new fields to examples

6747cb6

Read correct kmers

5d5c5b4

Fix example config tomls

4229f0c

update examples in README.md

3077362

Fix docker things

4e969e5

Fix typos

a4c6084

Adoni5 approved these changes Dec 22, 2023

Adoni5 merged commit 6d89391 into LooseLab:main Dec 22, 2023
