191 validate cross links based on armstrong error tolerance #212

Merged
AnnaPolensky merged 22 commits into crosslinking from 191-validate-cross-links-based-on-armstrong-error-tolerance on Feb 2, 2026

@AnnaPolensky AnnaPolensky commented Jan 29, 2026

Description

fixes #191
This adds a new step that validates the intra-protein crosslinks of one protein. The user enters an allowed deviation from the crosslinker's length. If the distance between the connected amino acids in the AlphaFold structure is greater than the crosslinker length plus the allowed deviation, the crosslink is marked as invalid. Besides this upper bound, the user can also apply a lower bound to the allowed deviation.
Right now, the distance calculation is based on the central C-alpha (CA) atom; later on we might want to change this to the residue atom where the crosslinker actually binds.
The output is a simple bar plot, and two columns are added to the crosslinking df: the distance in AlphaFold, and True or False depending on whether the crosslink matches the AlphaFold data.
[Image]
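
As a minimal sketch of the check described above (not the code from this PR): the function and parameter names are hypothetical, and the lower-bound handling, which rejects a crosslink whose distance falls below crosslinker length minus the lower deviation, is an assumption about how that bound is meant to work.

```python
from typing import Optional


def crosslink_matches_structure(
    distance: float,
    crosslinker_length: float,
    upper_deviation: Optional[float] = None,
    lower_deviation: Optional[float] = None,
) -> bool:
    """Return True if the measured distance is compatible with the crosslinker.

    All values are in angstrom. A bound set to None is not applied.
    """
    # Upper bound: the residues must not be further apart than the
    # crosslinker length plus the allowed deviation.
    if upper_deviation is not None and distance > crosslinker_length + upper_deviation:
        return False
    # Lower bound (assumed semantics): the residues must not be closer
    # than the crosslinker length minus the allowed deviation.
    if lower_deviation is not None and distance < crosslinker_length - lower_deviation:
        return False
    return True


# DSSO length taken from the testing notes; the distances are made up.
dsso_length = 10.2
for measured in (7.0, 11.0, 11.5):
    ok = crosslink_matches_structure(measured, dsso_length, upper_deviation=1.0)
    print(f"{measured:5.1f} A -> matches structure: {ok}")
```

Keeping both bounds optional mirrors the description, where the user may apply only an upper bound, only a lower bound, or both.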

Changes

All the functions for the actual validation are in backend/protzilla/data_analysis/cross_linking_validation.py
The form is defined at the end of backend/protzilla/methods/data_analysis.py
Simple tests are in backend/tests/protzilla/data_analysis/test_crosslinking_validation.py

Testing

  1. Add Crosslinking Data Import and import p26splussubstrate_XL.xlsx
  2. Add Alphafold Load Step and load O43242
  3. Add Crosslinking Validation (this is a data analysis step) and select O43242. DSSO should have a length of 10.2. Test different values for the bounds: e.g., with only an upper bound of 1, exactly one crosslink should match the Alphafold data; with an upper bound of 2, two crosslinks should match.

Check that in the crosslinking validation step three fields show up for each crosslinker that appears in the crosslinking data (for p26splussubstrate_XL.xlsx only DSSO should show up).
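
To make the "three fields per crosslinker" check concrete, here is a rough sketch of how such fields could be derived from the crosslinkers present in the data. Everything in it is hypothetical: the function name, the field names, and the assumption that the three fields are the crosslinker length, an upper deviation, and a lower deviation. The actual form in backend/protzilla/methods/data_analysis.py may be built differently.

```python
from typing import Iterable, List


def validation_field_names(crosslinkers: Iterable[str]) -> List[str]:
    """Build three (hypothetical) form field names per distinct crosslinker."""
    names: List[str] = []
    # sorted(set(...)) gives one stable entry per crosslinker kind.
    for crosslinker in sorted(set(crosslinkers)):
        names.append(f"{crosslinker}_length")
        names.append(f"{crosslinker}_upper_deviation")
        names.append(f"{crosslinker}_lower_deviation")
    return names


# A crosslinker column that only contains DSSO, as in the test file.
print(validation_field_names(["DSSO", "DSSO", "DSSO"]))
```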

PR checklist

Development

  • If necessary, I have updated the documentation (README, docstrings, etc.)
  • If necessary, I have created / updated tests.

Mergeability

  • main-branch has been merged into local branch to resolve conflicts
  • The tests and linter have passed AFTER local merge
  • The backend code has been formatted with black
  • The frontend code has been formatted with pnpm format and checked with pnpm lint

Code review

  • I have self-reviewed my code.
  • At least one other developer reviewed and approved the changes

@AnnaPolensky AnnaPolensky self-assigned this Jan 29, 2026

github-actions Bot commented Jan 29, 2026

Coverage report

Columns: File, Statements, Missing, Coverage, Coverage (new stmts), Lines missing

  backend/main
    views.py (lines missing: 48-59)
  backend/protzilla
    form.py
    networking.py (lines missing: 27-32)
    runner.py
    stepfactory.py
    steps.py
  backend/protzilla/data_analysis
    crosslinking_validation.py (lines missing: 55, 120, 210-211, 262-271)
    differential_expression_anova.py
    differential_expression_kruskal_wallis.py
    differential_expression_linear_model.py
    differential_expression_mann_whitney.py
    differential_expression_t_test.py
  backend/protzilla/data_integration
    di_plots.py
  backend/protzilla/importing
    alphafold_protein_structure_load.py (lines missing: 68, 80, 124-125, 133-137, 141-142, 161-162, 173-176, 180, 213-214, 219, 223)
    crosslinking_import.py (lines missing: 105-123, 160-163, 231-254, 277-287, 307-354, 375-421, 472, 474, 477-479, 567-602, 665, 683-691)
    import_utils.py
    ms_data_import.py
  backend/protzilla/methods
    data_analysis.py (lines missing: 2478, 2521-2538)
    data_integration.py
    importing.py (lines missing: 427, 453)
  backend/protzilla/utilities
    utilities.py
Project Total

This report was generated by python-coverage-comment-action

@Elena-kal Elena-kal left a comment

Seems fine overall. The test worked as you described. However, since your code fetches the AlphaFold data itself, I think one does not need to add the separate step that loads it. In the future this fetching should not be part of the code, but I am fine with leaving it as it is for now if you open an issue for it as soon as this is merged.

Comment thread backend/protzilla/methods/data_analysis.py Outdated
Comment thread backend/protzilla/methods/data_analysis.py Outdated
Comment thread backend/protzilla/methods/data_analysis.py Outdated
Comment thread backend/protzilla/data_analysis/crosslinking_validation.py
# right now we always return the central C atom
# later we might want to return the reactive atom of the amino acid residue of the specific amino acid kind
# as soon as we change this, we will need to change the test test_validate_with_angstrom_deviation
return "CA"
Collaborator

I was just wondering: how do we distinguish between atoms of the same kind at different positions? Say we have another C atom, how do we know which one is the central one?

Collaborator Author

For each amino acid there is exactly one atom called "CA" (for C alpha). This is the central C atom. The other C atoms are named differently.
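
A small illustration of why the name lookup is unambiguous, assuming each residue is available as a mapping from atom name to coordinates (a made-up representation with made-up coordinates, not the data structure used in this PR). Since atom names are unique within a residue, "CA" selects exactly one atom, and the distance between two residues is the Euclidean distance between their CA atoms.

```python
import math
from typing import Dict, Tuple

Coordinates = Tuple[float, float, float]
Residue = Dict[str, Coordinates]

# Two toy lysine residues (coordinates in angstrom, invented for the example).
# Every carbon has its own name: CA, C, CB, CG, CD, CE.
lysine_a: Residue = {
    "N": (0.0, 0.0, 0.0),
    "CA": (1.5, 0.0, 0.0),
    "C": (2.0, 1.4, 0.0),
    "CB": (2.1, -0.8, 1.2),
    "NZ": (4.9, -2.6, 3.8),
}
lysine_b: Residue = {
    "N": (0.0, 4.0, 3.0),
    "CA": (1.5, 4.0, 3.0),
    "C": (2.0, 5.4, 3.0),
    "CB": (2.1, 3.2, 4.2),
    "NZ": (4.9, 1.4, 6.8),
}


def ca_distance(first: Residue, second: Residue) -> float:
    """Euclidean distance between the C-alpha atoms of two residues."""
    return math.dist(first["CA"], second["CA"])


print(ca_distance(lysine_a, lysine_b))
```

Switching to the reactive atom later (for lysine that would be the side-chain nitrogen NZ) would only change which name is looked up.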

Collaborator

Okay, I see. What about other atoms like hydrogen? There are probably several of them in one amino acid, right? Do they also have definite naming? No need to change anything here, but I'd be interested to know. Maybe we can talk about it on Tuesday.

Collaborator Author

I'm unsure, I would talk to Chris about this on Tuesday.

Comment thread backend/protzilla/data_analysis/cross_linking_validation.py Outdated
Comment thread backend/protzilla/data_analysis/cross_linking_validation.py Outdated
Comment thread backend/protzilla/data_analysis/cross_linking_validation.py Outdated
Comment thread backend/protzilla/data_analysis/cross_linking_validation.py Outdated
Comment thread backend/protzilla/data_analysis/cross_linking_validation.py Outdated
@jorisfu jorisfu left a comment

Just a little thing but looks very good

Comment thread backend/protzilla/data_analysis/crosslinking_validation.py Outdated
Comment thread backend/protzilla/data_analysis/crosslinking_validation.py Outdated
@AnnaPolensky AnnaPolensky merged commit e43e3a0 into crosslinking Feb 2, 2026
2 checks passed
@AnnaPolensky AnnaPolensky deleted the 191-validate-cross-links-based-on-armstrong-error-tolerance branch February 2, 2026 14:36
@Elena-kal Elena-kal mentioned this pull request Apr 21, 2026
8 tasks

3 participants