Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

implementation for RNA #6

Merged
merged 22 commits into from
Dec 22, 2023
Merged

implementation for RNA #6

merged 22 commits into from
Dec 22, 2023

Conversation

satriobio
Copy link
Contributor

  • add working implementation of rnar9 with correct orientation (3'-5')
  • add noise calculation (normal distribution)
  • change config pore_type to model_type and making simulation.rs generic
  • change kmer model format to accomodate three column model (kmer, lvl_mean, and lvl_stdv)
  • renaming model for clarity
  • add transcript toml config, fasta sequence, and its weight
  • add config for sampling rate

Copy link
Contributor

@Adoni5 Adoni5 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In essence this is correct. My main issue is with adding a new column to the R10 model so we are compatible with RNA - this adds 33% extra to the file and requires manual editing. I;ve suggested a solution.

Not sure about the addition of taking Pod5 files in, as they will crash when we try to read them.

There's some other changes I suspect will break Barcoding.

I would like to make the models more flexible, so have two enums for R9/R10 and RNA/DNA.

And finally i would like to always add Noise, laplace for DNA and I guess Normal for RNA.

@@ -909,37 +949,52 @@ fn process_samples_from_config(
// only a path to a single file has been passed
} else {
debug!("{:#?}", sample);
if sample.input_genome.is_pod5() | sample.input_genome.is_pod5() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What data are we stroring as pod5?

src/main.rs Outdated
R10,
/// R9 pore
R9,
pub enum Model {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually we should split this:into a model and pore enum

pub enum PoreType {
    R10,
    R9
}
pub enum Model {
    DNA,
    RNA
}

src/main.rs Outdated
@@ -78,7 +81,8 @@ struct Config {
random_seed: Option<u64>,
target_yield: f64,
working_pore_percent: Option<usize>,
pore_type: Option<String>,
simulation_type: Option<String>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add Comments for new fields

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In fact simulation_type is never used?

src/main.rs Outdated
@@ -179,6 +184,7 @@ struct Parameters {
device_id: String,
position: String,
break_read_ms: Option<u64>,
sampling: Option<u64>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be samples_per_base, sampling is an unclear name

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nope - this should be sample_rate - this is confusing wiith the samples_per_base in data

None
// if return None will cause panic.
// Example very short siRNA without line breaks in genecode transcript.
Some(buf)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just always return buf rather than branching - I see that this would be necessary on a perfect sequence

PoreType::R10 => "_R10",
PoreType::R9 => "",
let r10_suffix = match model_type {
Model::DNAR10 => "_DNAR10",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you test this? I suspect it breaks the barcodes for R9

@@ -768,6 +771,9 @@ fn stop_sending_read(
pub trait FileExtension {
fn has_extension<S: AsRef<str>>(&self, extensions: &[S]) -> bool;
fn is_fasta(&self) -> bool;
fn is_npy(&self) -> bool;
fn is_pod5(&self) -> bool;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need an is Pod5 function?

};

// Select Model to Simulate
let model = match config.check_model_type() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good solution

simulation::parse_kmers(&kmer_string).expect("Failed to parse R10 kmers");
Some(kmer_hashmap)
}
Model::RNAR9 => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can do the different parsing here for the kmer models R9 vs. R10

);
}
else {
debug!("Sorry unsupported format!");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than debug!, warn! and exit

Copy link
Contributor

@Adoni5 Adoni5 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In essence this is correct. My main issue is with adding a new column to the R10 model so we are compatible with RNA - this adds 33% extra to the file and requires manual editing. I;ve suggested a solution.

Not sure about the addition of taking Pod5 files in, as they will crash when we try to read them.

There's some other changes I suspect will break Barcoding.

I would like to make the models more flexible, so have two enums for R9/R10 and RNA/DNA.

And finally i would like to always add Noise, laplace for DNA and I guess Normal for RNA.

@Adoni5 Adoni5 merged commit 6d89391 into LooseLab:main Dec 22, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants