-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
implementation for RNA #6
Conversation
satriobio
commented
Dec 20, 2023
- add working implementation of rnar9 with correct orientation (3'-5')
- add noise calculation (normal distribution)
- change config pore_type to model_type and making simulation.rs generic
- change kmer model format to accomodate three column model (kmer, lvl_mean, and lvl_stdv)
- renaming model for clarity
- add transcript toml config, fasta sequence, and its weight
- add config for sampling rate
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In essence this is correct. My main issue is with adding a new column to the R10 model so we are compatible with RNA - this adds 33% extra to the file and requires manual editing. I;ve suggested a solution.
Not sure about the addition of taking Pod5 files in, as they will crash when we try to read them.
There's some other changes I suspect will break Barcoding.
I would like to make the models more flexible, so have two enums for R9/R10 and RNA/DNA.
And finally i would like to always add Noise, laplace for DNA and I guess Normal for RNA.
src/impl_services/data.rs
Outdated
@@ -909,37 +949,52 @@ fn process_samples_from_config( | |||
// only a path to a single file has been passed | |||
} else { | |||
debug!("{:#?}", sample); | |||
if sample.input_genome.is_pod5() | sample.input_genome.is_pod5() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What data are we stroring as pod5?
src/main.rs
Outdated
R10, | ||
/// R9 pore | ||
R9, | ||
pub enum Model { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually we should split this:into a model and pore enum
pub enum PoreType {
R10,
R9
}
pub enum Model {
DNA,
RNA
}
src/main.rs
Outdated
@@ -78,7 +81,8 @@ struct Config { | |||
random_seed: Option<u64>, | |||
target_yield: f64, | |||
working_pore_percent: Option<usize>, | |||
pore_type: Option<String>, | |||
simulation_type: Option<String>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add Comments for new fields
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In fact simulation_type is never used?
src/main.rs
Outdated
@@ -179,6 +184,7 @@ struct Parameters { | |||
device_id: String, | |||
position: String, | |||
break_read_ms: Option<u64>, | |||
sampling: Option<u64> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should be samples_per_base, sampling is an unclear name
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nope - this should be sample_rate
- this is confusing wiith the samples_per_base
in data
src/simulation.rs
Outdated
None | ||
// if return None will cause panic. | ||
// Example very short siRNA without line breaks in genecode transcript. | ||
Some(buf) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just always return buf rather than branching - I see that this would be necessary on a perfect sequence
src/impl_services/data.rs
Outdated
PoreType::R10 => "_R10", | ||
PoreType::R9 => "", | ||
let r10_suffix = match model_type { | ||
Model::DNAR10 => "_DNAR10", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you test this? I suspect it breaks the barcodes for R9
src/impl_services/data.rs
Outdated
@@ -768,6 +771,9 @@ fn stop_sending_read( | |||
pub trait FileExtension { | |||
fn has_extension<S: AsRef<str>>(&self, extensions: &[S]) -> bool; | |||
fn is_fasta(&self) -> bool; | |||
fn is_npy(&self) -> bool; | |||
fn is_pod5(&self) -> bool; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need an is Pod5 function?
src/impl_services/data.rs
Outdated
}; | ||
|
||
// Select Model to Simulate | ||
let model = match config.check_model_type() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good solution
src/impl_services/data.rs
Outdated
simulation::parse_kmers(&kmer_string).expect("Failed to parse R10 kmers"); | ||
Some(kmer_hashmap) | ||
} | ||
Model::RNAR9 => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can do the different parsing here for the kmer models R9 vs. R10
); | ||
} | ||
else { | ||
debug!("Sorry unsupported format!"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than debug!, warn! and exit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In essence this is correct. My main issue is with adding a new column to the R10 model so we are compatible with RNA - this adds 33% extra to the file and requires manual editing. I;ve suggested a solution.
Not sure about the addition of taking Pod5 files in, as they will crash when we try to read them.
There's some other changes I suspect will break Barcoding.
I would like to make the models more flexible, so have two enums for R9/R10 and RNA/DNA.
And finally i would like to always add Noise, laplace for DNA and I guess Normal for RNA.
Rather than model::<nucleotide><poretype>
Expectes std_level column for 5mer
Add laplace noise for r10