improve efficiency seqmetrics instability index #52

Description

@LCrossman

Is your feature request related to a problem? Please describe.

The instability index function in seqmetrics (src/metrics.rs) is likely inefficient.

This is a function to calculate how unstable a protein is predicted to be when purified and in a test tube. Unstable proteins are less likely to be useful for downstream applications such as use of enzymes in biotechnology.

It is calculated according to Guruprasad K, Reddy BV, Pandit MW (1990). "Correlation between stability of a protein and its dipeptide composition: a novel approach for predicting in vivo stability of a protein from its primary sequence". Protein Eng. 4 (2): 155–61, where they take the weighted sum of dipeptide occurrences that are more frequently found in unstable proteins compared to stable ones.
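For reference, the index as it is usually written from that paper is shown here. The symbols are notation chosen for this note, not names from the seqmetrics code: $L$ is the sequence length, $x_i$ the residue at position $i$, and $\mathrm{DIWV}$ the dipeptide instability weight value. Whether the seqmetrics function applies the same $10/L$ normalisation is not stated in this issue.

$$
II = \frac{10}{L} \sum_{i=1}^{L-1} \mathrm{DIWV}(x_i x_{i+1})
$$

By the usual convention, a value below 40 is read as predicting a stable protein and a value of 40 or above as predicting an unstable one.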

The weights are loaded by the load_instability function, just above the instability index function in seqmetrics. The values it loads are in the file dipeptide_stability_values.csv in the seqmetrics crate folder, and the query file is K12_ribo.gbk in the same folder.

    for window in chars.windows(2) {
        // Allocates a new String for every adjacent residue pair.
        let pair = format!("{}{}", window[0], window[1]);
        if let Some(val) = weights.get(&pair) {
            total += val;
        }
    }

This allocates a String for every pair across all the proteins (potentially thousands). Using a tuple or array as the key instead of a String should be more efficient.

Describe the solution you'd like
We should be able to improve efficiency by avoiding String here.
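One possible direction, as a minimal sketch rather than the crate's actual code: key the weights by a two-byte array and walk the sequence as bytes, so the inner loop does no allocation. The function names `to_pair_weights` and `dipeptide_weight_sum` are hypothetical, and the sketch assumes that the weights are `f64`, that the existing map is keyed by two-character ASCII strings, and that sequences are ASCII in the same case as the keys.

```rust
use std::collections::HashMap;

/// Hypothetical helper: convert String-keyed weights (e.g. "WW" -> 1.0)
/// into a map keyed by byte pairs. Keys that are not exactly two bytes
/// long are skipped. This runs once, not once per protein.
fn to_pair_weights(weights: &HashMap<String, f64>) -> HashMap<[u8; 2], f64> {
    weights
        .iter()
        .filter_map(|(k, &v)| match k.as_bytes() {
            &[a, b] => Some(([a, b], v)),
            _ => None,
        })
        .collect()
}

/// Hypothetical helper: sum the weights of all adjacent residue pairs.
/// The key is a stack-allocated [u8; 2], so no String is built per pair.
/// Pairs missing from the map contribute nothing, as in the original loop.
fn dipeptide_weight_sum(seq: &str, weights: &HashMap<[u8; 2], f64>) -> f64 {
    seq.as_bytes()
        .windows(2)
        .filter_map(|w| weights.get(&[w[0], w[1]]))
        .sum()
}
```

The conversion could also happen directly in load_instability so the String-keyed map is never built. A dense 2D array indexed by residue would avoid hashing altogether, at the cost of mapping each residue to an index first.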

Additional context
A test is present in the seqmetrics crate:

    use tokio::io::BufReader;
    #[cfg(test)]
    #[allow(dead_code)]
    #[allow(unused_mut)]
    #[allow(unused_variables)]
    #[allow(unused_assignments)]
    #[tokio::test]
    pub async fn instability_test() -> Result<(), anyhow::Error> {
        let file_gbk = File::open("K12_ribo.gbk")?;
        let reader = Reader::new(file_gbk);
        let mut records = reader.records();
        let weights = load_instability("dipeptide_stability_values.csv").await?;
        loop {
            match records.next() {
                Some(Ok(record)) => {
                    for (k, _v) in &record.cds.attributes {
                        match record.seq_features.get_sequence_faa(&k) {
                            Some(value) => {
                                let seq_faa = value.to_string();
                                let result = instability_index(seq_faa, &weights).await;
                                println!(
                                    "instability index for {} {} is {}",
                                    &record.id, &k, &result
                                );
                            }
                            _ => (),
                        };
                    }
                }
                Some(Err(e)) => {
                    println!("theres an error {:?}", e);
                }
                None => {
                    println!("finished iteration");
                    break;
                }
            }
        }
        return Ok(());
    }
