How to deal with multiple seqIds (aptamers) per protein (uniprotID)? #125

Closed
adder opened this issue May 26, 2024 · 2 comments
adder commented May 26, 2024

Hey,

Maybe this is a more general question about the SomaLogic platform, but I encountered it while analyzing the data with SomaDataIO.
I noticed that some UniProt IDs correspond to multiple SeqIds. Does this mean that one protein can be represented by multiple aptamers (at least for some proteins)?
If this is indeed the case, where could I find more information on this?
Why are there proteins with multiple aptamers?
Is it wise to combine these values so that I have one measurement per protein (e.g., by averaging the RFUs per protein from the same sample)?
Or am I missing something?

Thanks for this nice package!

adder added the "question" (Further information is requested) label May 26, 2024
@wschwarzmann

Hello @adder, I am from the SomaLogic Global Scientific Engagement team and can answer your questions! Each SeqId is a unique numeric identifier that has a 1:1 relationship with a SOMAmer. Protein target names and UniProt IDs are also included in SomaScan .adat files to help identify each protein. Because SOMAmers can be designed against specific proteoforms and regions of proteins, a UniProt ID does not always uniquely distinguish the protein target. When a SOMAmer targets a known proteoform, the Target Full Name in the .adat file is a better distinguisher. If the Target Full Name is also duplicated, the amino acid region and RefSeq ID of the protein construct can be found at https://menu.somalogic.com after logging in, either by clicking on an individual protein or by clicking the "Export" button to download the full list as an Excel file.
Since SOMAmers with duplicated UniProt IDs can represent different features of the protein, collapsing their values is not the best approach. If both SOMAmers reflect the total concentration of the protein under the study conditions, collapsing them would not be entirely detrimental; but if they measure different isoforms or regions of the protein, collapsing them may lose significant results. We recommend treating each analyte as an independent measurement. If you have any more questions, feel free to email techsupport@somalogic.com. The Global Scientific Engagement team is available to respond via email or video call.
Thank you for your questions and for using our R package!
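One way to act on this advice is to first list which UniProt IDs are shared by several SeqIds, then check whether Target Full Name separates them. The Python sketch below does that over a small table of analyte annotations. The record layout (`SeqId`, `UniProt` and `TargetFullName` keys, with exactly one UniProt ID per record) and every value in it are hypothetical placeholders standing in for the .adat column metadata. The helper names are illustrative too and are not part of SomaDataIO.

```python
from collections import defaultdict

# Hypothetical analyte annotations standing in for .adat column metadata.
# SeqIds, UniProt IDs and names are placeholders, not real SomaScan content.
# Assumption: each record carries exactly one UniProt ID.
analytes = [
    {"SeqId": "seq-A", "UniProt": "P00001", "TargetFullName": "Protein X"},
    {"SeqId": "seq-B", "UniProt": "P00001", "TargetFullName": "Protein X, isoform 2"},
    {"SeqId": "seq-C", "UniProt": "P00001", "TargetFullName": "Protein X, isoform 2"},
    {"SeqId": "seq-D", "UniProt": "P00002", "TargetFullName": "Protein Y"},
]


def seqids_by_uniprot(records):
    """Map each UniProt ID to the SeqIds (SOMAmers) annotated with it."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec["UniProt"]].append(rec["SeqId"])
    return dict(groups)


def shared_uniprot_report(records):
    """For UniProt IDs shared by several SeqIds, group those SeqIds by Target Full Name.

    A Target Full Name that still maps to more than one SeqId is not
    disambiguated by name either and would need the amino acid region or
    RefSeq ID from the SomaScan menu.
    """
    report = {}
    for uniprot, seqids in seqids_by_uniprot(records).items():
        if len(seqids) < 2:
            continue  # only one SOMAmer for this UniProt ID: nothing to resolve
        by_name = defaultdict(list)
        for rec in records:
            if rec["UniProt"] == uniprot:
                by_name[rec["TargetFullName"]].append(rec["SeqId"])
        report[uniprot] = dict(by_name)
    return report


if __name__ == "__main__":
    for uniprot, names in shared_uniprot_report(analytes).items():
        print(uniprot)
        for name, seqids in names.items():
            flag = "  (still ambiguous)" if len(seqids) > 1 else ""
            print(f"  {name}: {seqids}{flag}")
```

With the placeholder table, only `P00001` is reported. Its `seq-B`/`seq-C` pair is flagged because their Target Full Name is identical. The sketch only flags shared IDs and never averages RFUs, which keeps each SeqId as an independent measurement, as recommended above.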

adder commented May 30, 2024

Thanks for your detailed answer!

amanda-hi pinned this issue Jun 14, 2024