Coefficient of variation filtering for transitions #35

Closed
yaaminiv opened this issue Oct 27, 2017 · 17 comments

@yaaminiv
Collaborator

For each pair of technical replicates, I removed any transitions that had a coefficient of variation (CV) greater than a certain threshold (from Steven's suggestion in #18). I used three different thresholds: CV > 20, CV > 15, and CV > 10. I reexamined my technical replication, my NMDS for samples/eelgrass, and my ANOSIMs. I also made box plots for each transition, first grouped by site alone and then by both site and eelgrass condition. All of my notebooks can be found here:

CV > 20 filtering results
CV > 15 and CV > 10 filtering results
Boxplots
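
For readers following along, the filtering step described above can be sketched roughly as follows. This is a minimal illustration, not the notebook code: the data layout (a dict keyed by sample and transition), the sample and transition names, the peak areas, and the choice to average the two replicates after filtering are all assumptions.

```python
from statistics import mean, stdev

def cv_percent(values):
    """Coefficient of variation (sample SD / mean) as a percentage."""
    m = mean(values)
    if m == 0:
        return float("nan")  # CV is undefined when the mean is zero
    return stdev(values) / m * 100.0

def filter_by_cv(replicate_pairs, threshold):
    """Keep entries whose replicate CV is at or below the threshold.

    replicate_pairs maps (sample, transition) -> (rep1_area, rep2_area).
    Returns the mean area for each entry that passes (an assumed choice).
    """
    kept = {}
    for key, areas in replicate_pairs.items():
        # NaN never satisfies <=, so undefined CVs are dropped as well.
        if cv_percent(areas) <= threshold:
            kept[key] = mean(areas)
    return kept

# Hypothetical peak areas for two technical replicates per sample.
pairs = {
    ("S1", "y4"): (100.0, 110.0),  # CV about 6.7%
    ("S1", "y6"): (100.0, 125.0),  # CV about 15.7%
    ("S2", "y4"): (100.0, 150.0),  # CV about 28.3%
}

for threshold in (20, 15, 10):
    print(threshold, sorted(filter_by_cv(pairs, threshold)))
```

Because the filter is applied per sample, a transition can survive in one sample and be dropped in another, which is where the uneven dataset sizes come from.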

As I mention in my box plot notebook entry, my biggest concern now is the fact that we selected individual transition data to remove for each technical replicate, as opposed to removing a transition completely from the analysis. So some transitions have a much larger dataset than others. How can I account for these irregularities moving forward? I'm assuming my next step is to also examine the boxplots and quantify how similar/different my median/mean values are (e.g. t-tests or ANOVAs).
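
One way to see how uneven the datasets have become is to count how many samples still have data for each transition, and optionally keep only transitions that passed in every sample. The sketch below is just one possible approach under assumed inputs (hypothetical sample and transition names); it is not something the thread settled on.

```python
from collections import Counter

def observations_per_transition(kept_keys):
    """Count how many samples still have data for each transition."""
    return Counter(transition for _sample, transition in kept_keys)

def fully_covered(kept_keys, n_samples):
    """Transitions that passed the CV filter in every sample."""
    counts = observations_per_transition(kept_keys)
    return sorted(t for t, n in counts.items() if n == n_samples)

# Hypothetical (sample, transition) keys left after CV filtering.
kept = [
    ("S1", "y4"), ("S2", "y4"), ("S3", "y4"),
    ("S1", "y6"), ("S3", "y6"),
    ("S2", "y7"),
]

counts = observations_per_transition(kept)
balanced = fully_covered(kept, n_samples=3)
print(dict(counts))
print(balanced)
```

Dropping a transition entirely whenever it fails in any sample gives a balanced dataset at the cost of discarding more data; the counts at least make that trade-off visible before choosing.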

Thoughts/suggestions?

@sr320
Member

sr320 commented Oct 27, 2017 via email

@yaaminiv
Collaborator Author

yaaminiv commented Oct 27, 2017

For Heat Shock Protein, just sites (WITH TITLES).

CV >20 Filtering:

choyp_hs12a 25 33 m 60352 apttlllepdgk y4
choyp_hs12a 25 33 m 60352 apttlllepdgk y6
choyp_hs12a 25 33 m 60352 apttlllepdgk y7
choyp_hs12a 25 33 m 60352 giaeaisssk y4
choyp_hs12a 25 33 m 60352 giaeaisssk y5
choyp_hs12a 25 33 m 60352 giaeaisssk y6

CV >15 Filtering:

choyp_hs12a 25 33 m 60352 apttlllepdgk y4
choyp_hs12a 25 33 m 60352 apttlllepdgk y6
choyp_hs12a 25 33 m 60352 apttlllepdgk y7
choyp_hs12a 25 33 m 60352 giaeaisssk y4
choyp_hs12a 25 33 m 60352 giaeaisssk y5
choyp_hs12a 25 33 m 60352 giaeaisssk y6

CV >10 Filtering:

choyp_hs12a 25 33 m 60352 apttlllepdgk y4
choyp_hs12a 25 33 m 60352 apttlllepdgk y6
choyp_hs12a 25 33 m 60352 apttlllepdgk y7
choyp_hs12a 25 33 m 60352 giaeaisssk y4
choyp_hs12a 25 33 m 60352 giaeaisssk y5
choyp_hs12a 25 33 m 60352 giaeaisssk y6

@yaaminiv
Collaborator Author

yaaminiv commented Oct 27, 2017

ALL FIXED

Hang on, just realized that none of my box plots have titles. Give me a few minutes and I can paste the same graphs but with titles that indicate what protein/peptide/transition you're looking at

@sr320
Member

sr320 commented Oct 27, 2017 via email

@sr320
Member

sr320 commented Oct 27, 2017 via email

@yaaminiv
Collaborator Author

only 1

@sr320
Member

sr320 commented Oct 27, 2017

Are you using zeros for values when you have no data?

@yaaminiv
Collaborator Author

yes

@emmats

emmats commented Oct 27, 2017

I'm still concerned about the replication. If the exact same sample is injected in the same mass spectrometer two different times, the results (especially for SRM!) should look the same. With the kind of variance you have, I worry that 1) sample names/locations got mixed up, 2) there is something very wrong with the list of transitions and they are completely unreliable, or 3) the column switch resulted in poor replication. I believe you have ruled out #3 and #1. Right now you are trying to get to the bottom of #2. If it really is a problem with the oyster transitions themselves, then I would expect excellent replication among the PRTC peptides despite poor replication among the oyster peptides. We do not see that. That makes me think there is some kind of technical issue that we have not figured out.
I'm not sure how conclusions can be drawn from the data if the replication is so poor that we don't know the "true" proteomic profile for any one sample. Is it out of the question to re-run these and do technical triplicates?
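
The PRTC-versus-oyster comparison above could be checked with something like the sketch below, which summarizes replicate CV separately for each peptide group. The group labels and peak areas are made up for illustration, and the input layout is an assumption.

```python
from math import sqrt
from statistics import median

def pair_cv_percent(a, b):
    """CV (%) of two replicate values; the sample SD of a pair is |a - b| / sqrt(2)."""
    m = (a + b) / 2.0
    if m == 0:
        return None  # CV is undefined when both replicates are zero
    return abs(a - b) / sqrt(2) / m * 100.0

def median_cv_by_group(rows):
    """rows: iterable of (group, rep1_area, rep2_area). Returns {group: median CV %}."""
    by_group = {}
    for group, a, b in rows:
        cv = pair_cv_percent(a, b)
        if cv is not None:
            by_group.setdefault(group, []).append(cv)
    return {group: median(cvs) for group, cvs in by_group.items()}

# Hypothetical replicate peak areas.
rows = [
    ("PRTC", 1000.0, 1020.0),
    ("PRTC", 500.0, 510.0),
    ("oyster", 100.0, 160.0),
    ("oyster", 80.0, 50.0),
]

summary = median_cv_by_group(rows)
for group, cv in summary.items():
    print(f"{group}: median replicate CV = {cv:.1f}%")
```

If the PRTC group shows a low median CV while the oyster group does not, that points at the oyster transitions; if both are high, a technical issue with the runs is the more likely explanation.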

@emmats

emmats commented Oct 27, 2017

Shoot, the #s should not have been linked to previous issues. That was a mistake!

@sr320
Member

sr320 commented Oct 27, 2017 via email

@sr320
Member

sr320 commented Oct 27, 2017 via email

@emmats

emmats commented Oct 27, 2017

I can join by Hangout assuming my energy stays up. I had surgery on Tuesday but I'm feeling OK today.

@yaaminiv
Collaborator Author

@sr320 the yes was to your comment. I had to replace NAs with zeros for my NMDS, so let me look at my dataset with NAs and get back to you?

@sr320
Member

sr320 commented Oct 27, 2017 via email

@yaaminiv
Collaborator Author

I actually can't make an NMDS with NA values, so I'm unsure how to correct that?
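
One possible workaround, sketched here as an assumption rather than the approach the thread settled on, is to compute the dissimilarity matrix with pairwise deletion: for each pair of samples, use only the transitions observed in both. The example uses Bray-Curtis dissimilarity with `None` standing in for NA; the values are hypothetical.

```python
import math

def bray_curtis_pairwise(x, y):
    """Bray-Curtis dissimilarity using only positions observed in both samples.

    Missing values are represented as None. Returns NaN when the two
    samples share no usable positions.
    """
    shared = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    denom = sum(a + b for a, b in shared)
    if not shared or denom == 0:
        return math.nan
    return sum(abs(a - b) for a, b in shared) / denom

def zero_fill(values):
    """Replace missing values with zeros (the approach being questioned above)."""
    return [0.0 if v is None else v for v in values]

# Hypothetical peak areas for four transitions in two samples.
s1 = [10.0, None, 30.0, 5.0]
s2 = [12.0, 8.0, None, 5.0]

with_deletion = bray_curtis_pairwise(s1, s2)
with_zeros = bray_curtis_pairwise(zero_fill(s1), zero_fill(s2))
print(with_deletion, with_zeros)
```

In this toy case the zero-filled version makes the two samples look far more different than the shared transitions suggest, because each missing value is treated as a true absence. A matrix built this way could then be passed to an ordination routine that accepts precomputed dissimilarities.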

@sr320
Member

sr320 commented Oct 27, 2017 via email

@sr320 sr320 closed this as completed Oct 28, 2017