191 validate cross links based on armstrong error tolerance#212
…sslinkers in the uploaded data
…trong-error-tolerance
…ion is working after merge
…laceholder function
…ere checked with validation method
Elena-kal
left a comment
Seems fine overall. The test worked as you described. However, since your code fetches the AlphaFold data itself, I don't think one needs to add the step to upload it. I think in the future this should not be included in the code, but I am fine with leaving it as it is for now if you open an issue for it as soon as this is merged.
# right now we always return the central C atom
# later we might want to return the reactive atom of the amino acid residue of the specific amino acid kind
# as soon as we change this, we will need to change the test test_validate_with_angstrom_deviation
return "CA"
I was just wondering: How do we distinguish between the same kinds of atoms at different positions? Say we have another C atom, how do we know which one is the central?
For each amino acid there is exactly one atom called "CA" (for C alpha). This is the central C atom. The other C atoms are named differently.
Okay, I see. What about other atoms like hydrogen? There are probably several of them in one amino acid, right? Do they also have definite naming? No need to change anything here, but I'd be interested to know. Maybe we can talk about it on Tuesday.
I'm unsure, I would talk to Chris about this on Tuesday.
jorisfu
left a comment
Just a little thing but looks very good
… for distance calculation
Description
fixes #191

There is a new step that validates the intra-crosslinks of a single protein. The user enters an allowed deviation from the crosslinker's length. If the distance between the connected amino acids in the AlphaFold structure is greater than the crosslinker length plus the allowed deviation, the crosslink is marked as invalid. The user can apply a lower bound to the allowed deviation as well as an upper bound.
Right now, the distance calculation is based on the central Cα atom ("CA"); later on we might want to change this to the residue atom where the crosslinker actually binds.
The output is a simple bar plot, and we add two columns to the crosslinking df: the distance in AlphaFold, and true or false depending on whether the crosslink matches the AlphaFold data.
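The check described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in cross_linking_validation.py: the function names, the dictionary keys `alphafold_distance` and `valid`, the reading of the lower bound as "length minus lower deviation", and the linker length of 10.0 Å are all hypothetical placeholders. Plain dictionaries stand in for the crosslinking df to keep the example self-contained.

```python
import math


def ca_distance(coord_a, coord_b):
    """Euclidean distance (in angstroms) between two CA atom coordinates."""
    return math.dist(coord_a, coord_b)


def is_within_tolerance(distance, linker_length, lower_dev, upper_dev):
    """Assumed rule: valid if length - lower_dev <= distance <= length + upper_dev."""
    return linker_length - lower_dev <= distance <= linker_length + upper_dev


def annotate_crosslinks(crosslinks, ca_coords, linker_lengths, lower_dev, upper_dev):
    """Return copies of the crosslink rows with a distance and a validity flag added."""
    annotated = []
    for link in crosslinks:
        distance = ca_distance(ca_coords[link["pos1"]], ca_coords[link["pos2"]])
        valid = is_within_tolerance(
            distance, linker_lengths[link["crosslinker"]], lower_dev, upper_dev
        )
        # Hypothetical column names; the real df columns may be named differently.
        annotated.append({**link, "alphafold_distance": distance, "valid": valid})
    return annotated


# Made-up CA coordinates keyed by residue position (not real AlphaFold data).
ca_coords = {1: (0.0, 0.0, 0.0), 2: (3.0, 4.0, 0.0), 3: (30.0, 0.0, 0.0)}
crosslinks = [
    {"pos1": 1, "pos2": 2, "crosslinker": "DSSO"},
    {"pos1": 1, "pos2": 3, "crosslinker": "DSSO"},
]
# Placeholder length, chosen for easy arithmetic rather than chemical accuracy.
linker_lengths = {"DSSO": 10.0}

result = annotate_crosslinks(
    crosslinks, ca_coords, linker_lengths, lower_dev=6.0, upper_dev=5.0
)
for row in result:
    print(row["pos1"], row["pos2"], row["alphafold_distance"], row["valid"])
```

With these made-up inputs, the first crosslink (distance 5.0 Å) falls inside the assumed window of 4.0 to 15.0 Å, and the second (30.0 Å) does not.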
Changes
All the functions for the actual validation are in backend/protzilla/data_analysis/cross_linking_validation.py
The form is defined at the end of backend/protzilla/methods/data_analysis.py
Simple tests are in backend/tests/protzilla/data_analysis/test_crosslinking_validation.py
Testing
Check that in the crosslinking validation step three fields show up for each crosslinker that appears in the crosslinking data (for p26splussubstrate_XL.xlsx only DSSO should show up).
PR checklist
Development
Mergeability
black, pnpm format, and checked with pnpm lint
Code review