Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

added sequence coverage map to gui results page #320

Merged
merged 8 commits into from
Sep 13, 2021

Conversation

mgleeming
Copy link
Contributor

Hi alphapept team,

Congratulations on a great project! I've been using alphapept over the last few days and I think this project has the rare combination of ease-of-use through the GUI and flexibility through well-documented scripts.

This could be a very useful project for the MS community and I'd like to support it if I can!

I saw your note welcoming contributions. I will admit that, despite some years as a python programmer, this is my first time contributing to the work of others. So, as a 'hello, world' offering, I added a simple sequence coverage map to the results page of the GUI.

Mostly, I wrote this to wrap my head around some of the code but maybe someone will find it useful. Please do feel free to use it or not as you like. If you've got comments on style or anything else, please let me know.

As extensions of this, I'm thinking it wouldn't be too difficult to add PSM, MS1 precursor and chromatogram viewers into the gui as well. Not sure what direction you want to take the project but I'd be happy to work on these among other things if there's interest.

Thanks again and all the best!
Michael

AP

Copy link
Member

@straussmaximilian straussmaximilian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Michael,

This is really awesome! I like your contribution a lot, and please more of this 👍👍👍. I made some comments on the code. My main suggestion would be if you could move the coverage calculations to the notebook and write a test for it? Then we would have the functional code within the continuous integration. If there are problems with the nbdev directly writing to the fasta.py should also be fine.

In terms of additional functions, we recently discovered the discussions function in Github https://github.com/MannLabs/alphapept/discussions and thought that this could be excellent to track what is interesting and what the community wants / and who is working on what.. At this point, we do prioritize on the algorithmic side, so any GUI integrations are very welcome and feel free to add what you find useful. (But also feel free to look at the algorithmic stuff in case.)

From the checks, there seems to be a bug with the CLA assistant, which I am going to fix in the meantime.

Best,

Max


with st.beta_expander("Sequence coverage map"):

protein_id = st.text_input(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you use st.selectbox with options being all unique proteins, you will get a searchable text field that is filled with all measured proteins. This could help with the lookup.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea - I'll do that!

for index, row in target_protein_peptide_matches.iterrows():
peptide = row['naked_sequence']

if peptide == '': # Found cases where naked_sequence was blank for all - Bug?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, this is potentially a bug.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed this when matching was activated. Without MBR, the naked sequences were correctly populated. I haven't followed it any further than that yet though. I'll confirm this with a separate dataset and open a new issue ticket if the result stands...


formatted_sequence = ''
counter = 0
for residue in residues:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Beautiful formatting, looks really nice in streamlit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks :)

matches = [m.start() for m in re.finditer('(?=%s)' %peptide, target_sequence)]
for m in matches:
for index in range(m, m+len(peptide)):
assert residues[index]['res'] == target_sequence[index]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we get rid of the assert statement here as this is not a test function?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, sorry, slipped through!

@mgleeming
Copy link
Contributor Author

Hi Max!

Thanks for your kind words and for the helpful comments! I'll make these changes tomorrow and update.

That sounds like a great plan! I'm happy to focus on the GUI and build that out some more. I'll start a thread on the discussions page you mentioned about it so that we don't duplicate efforts with anyone else.

Looking forward to working with you and the team!
Michael

PS: I'm quite a few timezones away in Australia (Melbourne) so we'll probably be trading night-time emails :)

@mgleeming
Copy link
Contributor Author

Hi Max,

I’ve incorporated your comments as above. To expand the GUI, I’d anticipate writing a bunch of miscellaneous functions to prepare data and, to avoid cluttering your existing files, I’ve added a new notebook ‘14_display.ipynb’ that can be a home for these. If you’re all focusing on the algorithmic side, this might help us to stay out of each other’s way.

Also, were you rebuilding the docs as separate commits? It looked that way from the ‘DOCUMENT’ tags on the commit messages. No problem – just wasn’t sure if I should do it or not. … I’ve left it for now.

All the best,
Michael

Copy link
Member

@straussmaximilian straussmaximilian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks fantastic.

@@ -265,6 +265,7 @@
"remove_mods": "13_export.ipynb",
"ap_to_mq_sequence": "13_export.ipynb",
"prepare_ap_results": "13_export.ipynb",
"calculate_sequence_coverage": "14_display.ipynb",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea with the notebook and glad that the nbdev worked out.


def calculate_sequence_coverage(target_sequence:str, peptide_list:list)->(int, int, float, list):
"""
Calculate the percentage of a target protein covered by a list of peptides.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for sticking to the docstring convention.

@straussmaximilian straussmaximilian changed the base branch from master to develop September 10, 2021 22:22
@straussmaximilian straussmaximilian changed the base branch from develop to master September 10, 2021 22:32
@straussmaximilian straussmaximilian changed the base branch from master to develop September 10, 2021 22:32
@straussmaximilian
Copy link
Member

Hi,
This looks all very good, and I can see that you mastered the nbdev structure. Great!

I changed the base from master to develop as we typically run a complete performance check before merging to master. This will now go into v0.3.29 and we can all new things to the changelog.md.

Could you also update the branch to the latest develop? I hope this should fix the CLA assistant check. This was quite buggy now but should in principle ask for consent that we are allowed to merge your code contributions within the PR.

Some minor notes:

You can see that some commits do match your profile avatar; it seems that you committed w/o setting the config for some commits.

Screenshot 2021-09-11 at 00 17 20

You can see this by typing git shortlog -sne.
Screenshot 2021-09-11 at 00 18 27

You could try to change this on your branch if you want but is typically cumbersome (https://help.github.com/en/github/using-git/changing-author-info).

We had some discussions today and will probably populate the projects/discussions page more in the coming days to make it a bit more transparent on what is being worked on.

As for the docs: Right now, the practice is to rebuild them in develop before they are merged to master. It is possible to test them beforehand locally but this also requires some fiddling.

Best,

Max

P.S.: Night emails are great. 😊

@mgleeming
Copy link
Contributor Author

Thanks Max,

Yep! The nbdev worked great! I'd never used it before but it seems like a great way to make notebooks more useful for development work rather than just data exploration!

I've merged the latest develop branch so hopefully that fixes the CLA problem - we'll see what happens :) Thanks for pointing out the config avatar thing. I'm working from 2 different machines which were configured a bit differently. I'll try to update them to match.

Thanks for that - I'll leave rebuilding the docs out of my commits and also start up some discussion threads for some GUI enhancements as well.

Have a nice weekend
Michael

@straussmaximilian straussmaximilian merged commit 3b00a7c into MannLabs:develop Sep 13, 2021
@straussmaximilian
Copy link
Member

Hi Michael, the CLA assistant did not work for some reason.
As this is now causing more troubles than actual benefit at this point I will probably remove the action and decided to merge nonetheless.

Anyhow: Congratulations to the first contribution. I think this is awesome!

@mgleeming
Copy link
Contributor Author

Thanks Max! Happy to sign the CLA by some other means if it helps.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants