This repository has been archived by the owner on Mar 17, 2023. It is now read-only.

change translation frame type description #58

Merged (5 commits) on Apr 24, 2020

Conversation

pranathivemuri
Contributor

No description provided.

@pranathivemuri
Contributor Author

addressing the issue - #59

@snafees

snafees commented Apr 24, 2020

I understand this is the desired JSON output: 'classification_value_counts': { 'All translations shorter than peptide k-mer size + 1': 1, 'All translation frames have stop codons': 3, 'Coding': 5, 'Non-coding': 11, 'Low complexity nucleotide': 0, 'Read length was shorter than 3 * peptide k-mer size': 2, 'Low complexity peptide in dayhoff6 alphabet': 1}. But are we ultimately trying to tell the user, e.g., "all translations shorter than peptide k-mer size + 1 = 1", "all translation frames that have stop codons = 3", "number of coding reads = 5", etc.? If so, maybe the JSON output could be written slightly differently so that it is easier for a new user to make sense of.
Maybe that is the goal of another PR at another time, but I just wanted to check!

@pranathivemuri
Contributor Author

= is not a valid key-value separator in JSON

image

@pranathivemuri
Contributor Author

image

@pranathivemuri
Contributor Author

Also, dictionaries are conventionally written as key: value.

@snafees

snafees commented Apr 24, 2020

Right, I recall that now. I guess my issue is not so much with : vs. =. It has more to do with our phrasing. But that is minor! No big deal right now.
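
For reference, here is a minimal sketch of how the counts from this thread serialize with Python's standard `json` module, next to a plain-text rendering that a new user might find easier to read. The numbers are copied from the example above; the variable names and the "read(s)" wording are illustrative assumptions, not taken from the PR.

```python
import json

# Values copied from the example in this thread; the variable name is illustrative.
classification_value_counts = {
    "All translations shorter than peptide k-mer size + 1": 1,
    "All translation frames have stop codons": 3,
    "Coding": 5,
    "Non-coding": 11,
    "Low complexity nucleotide": 0,
    "Read length was shorter than 3 * peptide k-mer size": 2,
    "Low complexity peptide in dayhoff6 alphabet": 1,
}

# JSON separates each key from its value with a colon, never with "=".
serialized = json.dumps(
    {"classification_value_counts": classification_value_counts}, indent=2
)
round_tripped = json.loads(serialized)

# A hypothetical human-readable summary built from the same dictionary.
summary_lines = [
    f"{label}: {count} read(s)"
    for label, count in classification_value_counts.items()
]
print("\n".join(summary_lines))
```

Keeping the JSON as a plain key-to-count mapping and doing any rephrasing in a separate rendering step leaves the file machine-readable.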

Contributor

@olgabot olgabot left a comment

Thanks for doing this. Had a few comments/suggestions

    counts[
        'Read length was shorter than 3 * peptide k-mer size'] += 1
elif len(unique_categories) == 1:
    counts[unique_categories[0]] += 1
Contributor

How does the index 0 work here?

Contributor Author

If there is only one unique category for the read, index 0 selects that single category and we increment its count.
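
A small sketch of why index 0 is safe in that branch, under stated assumptions: the per-read categories below are made up, `collections.Counter` stands in for whatever `counts` is in the PR, and `dict.fromkeys` is a pure-Python substitute for pandas' `.unique()`.

```python
from collections import Counter

# Hypothetical categories for one read across its translation frames.
categories_for_read = ["Non-coding", "Non-coding", "Non-coding"]

# De-duplicate while keeping order (a stand-in for pandas' .unique()).
unique_categories = list(dict.fromkeys(categories_for_read))

counts = Counter()
if len(unique_categories) == 1:
    # Exactly one distinct category exists, so index 0 is that category.
    counts[unique_categories[0]] += 1

print(counts)
```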

read_id_category = coding_scores.filter(["read_id", "category"])
read_ids = coding_scores.read_id.unique()

for read_id in read_ids:
Contributor

For this I'd use pandas groupby to do:

for read_id, df in read_id_category.groupby('read_id'):
    categories_for_read_id = df

Contributor Author

thanks, changed it
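
A rough sketch of the suggested `groupby` loop on a toy table. The rows and the `jaccard` column are made up for illustration; only `coding_scores`, `read_id`, `category`, and `read_id_category` appear in the diff above.

```python
import pandas as pd

# Toy stand-in for the real coding_scores table; the "jaccard" column is hypothetical.
coding_scores = pd.DataFrame(
    [
        ["read1", "Coding", 0.9],
        ["read1", "Non-coding", 0.1],
        ["read2", "Non-coding", 0.2],
    ],
    columns=["read_id", "category", "jaccard"],
)

read_id_category = coding_scores.filter(["read_id", "category"])

# groupby yields one (key, sub-DataFrame) pair per read_id, so there is no
# need to loop over unique read ids and re-filter the table each time.
categories_by_read = {}
for read_id, df in read_id_category.groupby("read_id"):
    categories_by_read[read_id] = df["category"].unique().tolist()

print(categories_by_read)
```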

@@ -95,29 +95,29 @@ def test_get_n_per_coding_classification(
peptide_bloom_filter_path,
alphabet, peptide_ksize, jaccard_threshold)
data = [
['read1', 'All translations shorter than peptide k-mer size + 1'],
['read2', 'All translation frames have stop codons'],
['read1', 'Translation is shorter than peptide k-mer size + 1'],
Contributor

This data should get updated to have multiple results per read, so that we can test that e.g. if any result is coding, it is called coding
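
One possible shape for such test data, as a sketch only. The rows are invented, `classify_read` is a hypothetical helper rather than a function from this repository, and the "Unresolved (placeholder)" fallback for mixed non-coding categories is an assumption, since the precedence among those categories is not stated in this thread.

```python
from collections import Counter, defaultdict

# Invented rows with several results per read.
data = [
    ["read1", "All translations shorter than peptide k-mer size + 1"],
    ["read1", "Coding"],
    ["read2", "All translation frames have stop codons"],
    ["read2", "All translation frames have stop codons"],
    ["read3", "Non-coding"],
    ["read3", "Coding"],
    ["read3", "Non-coding"],
]


def classify_read(categories):
    """Hypothetical rule: any 'Coding' result makes the whole read 'Coding'."""
    unique = set(categories)
    if "Coding" in unique:
        return "Coding"
    if len(unique) == 1:
        return next(iter(unique))
    # Placeholder only: the real precedence among other categories is not given here.
    return "Unresolved (placeholder)"


categories_by_read = defaultdict(list)
for read_id, category in data:
    categories_by_read[read_id].append(category)

counts = Counter(classify_read(cats) for cats in categories_by_read.values())
print(counts)
```

With this data, read1 and read3 each contain a coding result and are counted once as coding, and read2 is counted once under its single category.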


3 participants