
Sample size and including or excluding sources #379

Open
Mint204 opened this issue May 14, 2024 · 4 comments

Comments


Mint204 commented May 14, 2024

I had a question about sample size and when to include a source.

I have a model I'm building with hair from about 40 animals, cut into 4 sections (seasons). So there's a random effect for individual and 1 fixed effect with 4 levels. I have 8 possible source groups for this one carnivore.

I'm worried about overparameterizing the model, but from what I can see, it doesn't make sense to combine any of the sources any further. I know Bayesian models can handle smaller sample sizes than frequentist stats, but how do I know if I've asked too much of the model? Is there a good rule of thumb?

Also, at what point is it acceptable to leave a possible source out? I know the models expect to have "every" source, but what are the limits? For example, is it acceptable to leave a source out if it is only found a few odd times during necropsies, or if it is only found 1% of the time?

[Image: Sources online example]
@naalipalo

I am by no means an expert, but I've been told you shouldn't have more than 5 prey items. Also, prey items that are "rare", or as you indicated found <1% of the time, wouldn't be helpful. At one point the MixSIAR instructions actually say something about not showing prey items that make up <1%. I have been told a conservative approach is not to bother with anything <5% of the diet.

You have to figure in all the error associated with every step you've taken so far to get to the analysis part. Are your C and N measured from prey items that were collected in the same year and same location as your consumer? Is the machine calibrated correctly to read the values? Is there contamination at some point in your sampling? And, very importantly, are your TDF values absolutely correct? This process/analysis is not precise. Again, I'm not an expert, but I have been advised along these lines. Whether you take the 5% or the 1% cutoff is up to you, but mixing models cannot handle more than 5 prey items.

There are some formal-ish ways to assess your mixing space. Look at Smith et al. 2013, "To fit or not to fit: evaluating stable isotope mixing models using simulated mixing polygons". It defines your mixing space, and then you censor out the consumers that do not fit. Or you can do a PERMANOVA between each pair of sources to confirm that they are statistically distinct, or a KNN where it can make groups for you.

From your graph it doesn't look like sources 8, 3, and 4 are having a huge impact on your consumers. You also have a lot of overlap among several of your prey species, and you will have a really hard time trying to distinguish between those prey items. A KNN would help you, and would likely tell you that you'd have to group prey 1, 2, 7, or something like that. You should look at some of the papers on best practices for setting up mixing models; there are at least 2 good ones out there.
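The mixing-polygon idea mentioned above can be sketched geometrically: a consumer can only be fully explained by a set of sources if its isotope values fall inside the polygon spanned by the TDF-corrected source means. Here is a minimal, hedged Python illustration of that geometric intuition (all isotope values are made up, and this is not the Smith et al. 2013 code, which additionally simulates polygons across the uncertainty in source means and TDFs):

```python
# Hedged sketch of the mixing-polygon intuition (Smith et al. 2013):
# a consumer can only be fully explained by the sources if it lies
# inside the polygon of TDF-corrected source means.
# All numbers below are hypothetical, for illustration only.

def in_convex_polygon(point, vertices):
    """Return True if point lies inside a convex polygon whose
    vertices are listed in counter-clockwise order."""
    px, py = point
    sign = None
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        # Cross product tells us which side of edge i the point is on.
        cross = (x2 - x1) * (py - y1) - (y2 - y1) * (px - x1)
        if cross != 0:
            if sign is None:
                sign = cross > 0
            elif (cross > 0) != sign:
                return False  # point is on the wrong side of this edge
    return True

# Hypothetical TDF-corrected source means (d13C, d15N), in CCW order.
source_polygon = [(-20.0, 8.0), (-17.0, 6.0), (-15.0, 10.0), (-18.0, 12.0)]

print(in_convex_polygon((-18.0, 9.0), source_polygon))   # True: inside
print(in_convex_polygon((-25.0, 9.0), source_polygon))   # False: outside
```

A consumer that falls outside the polygon (like the second point) cannot be reproduced by any mixture of those sources, which is the signal Smith et al. use to flag problem consumers or missing sources.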

@AndrewLJackson
Collaborator

There are a lot of questions here, but I will try to answer them or direct you to answers. Most of these are addressed in our paper Best practices for use of stable isotope mixing models in food-web studies, and I would strongly encourage you to read it carefully and follow the references for more information where required.

Omitting a source: this may have almost zero effect or it might have a large effect even if it's <5% of the diet. See Point 6 in that "Best practices" paper.

Combining sources: there is no settled advice here, and even the authors of MixSIAR sometimes disagree among themselves. But in general, a priori aggregation of samples beyond what is sensible to the user is to be avoided in favour of a posteriori aggregation. See point 7. For some additional context: we do of course often choose to aggregate individual source samples into groups by species, but one could easily split a species into two groups for males/females, or by location or season. Similarly, one could aggregate species into functional groups that make sense, e.g. "green algae" or "zooplankton". The choice is yours. Personally, I would not recommend using clustering models or (PER)MANOVAs to guide a priori aggregation. Instead, make the decision based on biological/environmental reasoning, and then perform a posteriori grouping that is in line with your hypotheses - see point 1 of "Best practices", which often gets overlooked, much to my frustration.
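To make the a priori vs. a posteriori distinction concrete, here is a small Python sketch with made-up posterior draws (not real model output): a posteriori aggregation combines sources after fitting by summing their proportion draws within each posterior sample, which keeps the joint uncertainty between the combined sources intact.

```python
# Made-up posterior draws, for illustration only. Each row is one
# posterior sample of diet proportions for 3 sources.
posterior_draws = [
    [0.10, 0.30, 0.60],
    [0.15, 0.25, 0.60],
    [0.05, 0.40, 0.55],
]

# A posteriori aggregation: combine sources 1 and 2 (say, two
# hard-to-separate prey species) by summing WITHIN each draw.
combined = [[d[0] + d[1], d[2]] for d in posterior_draws]

n = len(combined)
mean_combined = [sum(row[i] for row in combined) / n for i in range(2)]
print([round(m, 3) for m in mean_combined])  # [0.417, 0.583]
```

Because the sum is taken within each draw, the (typically negative) posterior correlation between hard-to-separate sources carries through to the combined estimate, which is exactly what pooling the raw samples before fitting would throw away.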

How many sources: you can fit as many sources as you like, but whether you will be able to make sense of the output will depend on your question, the geometry of the sources, and which, if any, you choose to combine a posteriori. See also Statistical basis and outputs of stable isotope mixing models: Comment on Fry (2013).

How many source samples: see point 3 in "Best practices"

@AndrewLJackson
Collaborator


In direct reply to your original question: it sounds to me like you are being very sensible, and I would just keep going! Rather than omit a source that is unlikely to be a major component of the diet, you could use an informative prior instead of the usual vague prior. This way, the prior knowledge you have that it is likely rare would be reflected in the model fitting process. Chiaradia et al. illustrate this.

Specifying priors in MixSIAR is achieved at run time via something like run_model(..., alpha.prior = c(1, 3, 3, 3, 3)) which for 5 sources would down-weight the 1st source relative to the other 4. There is a nice animation on the wiki page Dirichlet Distribution to help see how you might pick alpha values.
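To build intuition for what those alpha values imply, note that the prior mean proportion for source i under a Dirichlet prior is alpha_i / sum(alpha). A stdlib-only Python sketch (MixSIAR itself is R; this is purely an illustration of the Dirichlet prior, not MixSIAR code):

```python
# Illustration only (not MixSIAR code): what alpha.prior = c(1, 3, 3, 3, 3)
# implies for the Dirichlet prior over 5 source proportions.
import random

random.seed(1)
alpha = [1.0, 3.0, 3.0, 3.0, 3.0]  # down-weights the 1st source

# Prior mean proportion for source i is alpha_i / sum(alpha).
total = sum(alpha)
prior_mean = [a / total for a in alpha]
print([round(p, 3) for p in prior_mean])  # [0.077, 0.231, 0.231, 0.231, 0.231]

def dirichlet_sample(alpha):
    """One Dirichlet draw via the standard gamma construction."""
    gammas = [random.gammavariate(a, 1.0) for a in alpha]
    s = sum(gammas)
    return [g / s for g in gammas]

# Empirical means over many draws approach the prior means above.
draws = [dirichlet_sample(alpha) for _ in range(20000)]
emp_mean = [sum(d[i] for d in draws) / len(draws) for i in range(len(alpha))]
print([round(m, 2) for m in emp_mean])
```

So c(1, 3, 3, 3, 3) says the rare source is expected at roughly 8% of the diet a priori rather than an equal 20%, while still leaving plenty of prior spread for the data to move it.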

@Mint204
Author

Mint204 commented May 15, 2024

Thank you! These comments are all very helpful. I will have to look further into those papers and reread the best practices paper.
