
JOSE Review - 221: Text Clarity #634

Closed
lebebr01 opened this issue Oct 3, 2023 · 3 comments
lebebr01 commented Oct 3, 2023

Overall, I enjoyed the JOSE paper. It was well-written and offered great insights.

For the review checklist element, "Instructional Design," a few small points in the JOSE paper are overstated. I would like to hear your thoughts on these.

  1. pg 2, lines 43 - 46: There is a discussion of the mean and standard deviation not being robust, which is great. I was surprised to see the second part regarding these statistics assuming a Normal distribution. I agree and understand your point, but this is overstated for those learning statistics.
  2. pg 7, lines 244 - 247: I liked this example and appreciate the idea of thinking about research context for extreme values/outliers. Would it be worth adding/framing this idea into statistical terms and being very explicit about what you mean by context?

openjournals/jose-reviews#221

@rempsyc rempsyc self-assigned this Oct 3, 2023
rempsyc commented Oct 3, 2023

Thanks for the review @lebebr01!

> pg 2, lines 43 - 46: There is a discussion of the mean and standard deviation not being robust, which is great. I was surprised to see the second part regarding these statistics assuming a Normal distribution. I agree and understand your point, but this is overstated for those learning statistics.

To be clear, in the sentence "they assume normally distributed data", the "they" refers to the methods based on the mean and SD, not the mean and SD themselves. Would you be OK if we rephrased that sentence to make explicit that it refers to the methods?
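For readers following along, here is a minimal, hypothetical sketch (not from the paper, with made-up data) of why detection methods built on the mean and SD implicitly assume well-behaved, roughly normal data, while a median/MAD-based rule does not:

```python
import statistics

def z_score_outliers(data, threshold=3.0):
    """Flag values more than `threshold` SDs from the mean.
    The mean and SD are themselves inflated by the very extreme
    values being sought, so extreme values can mask each other."""
    mean = statistics.mean(data)
    sd = statistics.stdev(data)
    return [x for x in data if abs(x - mean) / sd > threshold]

def mad_outliers(data, threshold=3.0):
    """Robust alternative: flag values far from the median, in units
    of the median absolute deviation (scaled by 1.4826 so it matches
    the SD under normality)."""
    med = statistics.median(data)
    mad = statistics.median([abs(x - med) for x in data]) * 1.4826
    return [x for x in data if abs(x - med) / mad > threshold]

# A tight sample plus two extreme values that mask each other:
sample = [10, 11, 10, 12, 11, 10, 11, 12, 100, 100]
print(z_score_outliers(sample))  # [] -- the mean/SD rule misses both
print(mad_outliers(sample))      # [100, 100] -- the robust rule flags them
```

The two extreme values drag the mean to about 29 and the SD to about 38, so neither reaches a z of 3; the median and MAD stay anchored in the bulk of the data.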

> pg 7, lines 244 - 247: I liked this example and appreciate the idea of thinking about research context for extreme values/outliers. Would it be worth adding/framing this idea into statistical terms and being very explicit about what you mean by context?

For context, the example is:

> For example, if we are studying the effects of X on Y among teenagers and we have one observation from a 20-year-old, this observation might not be a statistical outlier, but it is an outlier in the context of our research, and should be discarded to allow for valid inferences.

Here, I think the point of this example is that it is an undetected error outlier: it is perhaps not detected by statistical outlier detection methods, but it still does not belong to the theoretical or empirical distribution of interest (i.e., teenagers). So the takeaway from this paragraph is that we should not blindly rely on statistical outlier detection methods, and should do our due diligence to investigate error outliers that the statistical methods miss. I will try to clarify this paragraph, but I am not sure I can reframe it in statistical terms, since here we are in a way zooming out of the statistical perspective; I can, however, mention the distribution-of-interest point.

rempsyc added a commit that referenced this issue Oct 3, 2023
rempsyc commented Oct 3, 2023

Here is the revised paragraph for point 2 (updated on the JOSE branch):

> We should also keep in mind that there might be error outliers that are not detected by statistical tools, but should nonetheless be found and removed. For example, if we are studying the effects of X on Y among teenagers and we have one observation from a 20-year-old, this observation might not be a statistical outlier, but it is an outlier in the context of our research, and should be discarded. We could call these observations undetected error outliers, in the sense that although they do not statistically stand out, they do not belong to the theoretical or empirical distribution of interest (e.g., teenagers). In this way, we should not blindly rely on statistical outlier detection methods; doing our due diligence to investigate undetected error outliers relative to our specific research question is also essential for valid inferences.
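As a side illustration (not part of the paper, with made-up ages), a small Python sketch of this scenario: a z-score rule does not flag the 20-year-old, but a check against the study's inclusion criteria does:

```python
import statistics

# Hypothetical ages from a study of teenagers (13-19), with one
# erroneous observation from a 20-year-old.
ages = [13, 14, 15, 16, 17, 18, 19, 15, 16, 17, 20]

# Statistical check: the 20-year-old sits well within 3 SDs of the mean.
mean, sd = statistics.mean(ages), statistics.stdev(ages)
statistical_outliers = [a for a in ages if abs(a - mean) / sd > 3]
print(statistical_outliers)  # [] -- not a statistical outlier

# Context check against the inclusion criteria (13-19) catches it.
context_outliers = [a for a in ages if not 13 <= a <= 19]
print(context_outliers)  # [20] -- an undetected error outlier
```

The 20-year-old's z-score is only about 1.7, so no purely statistical rule at a conventional threshold would remove it; only knowledge of the distribution of interest does.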


lebebr01 commented Oct 4, 2023

Thanks! Looks great. The added piece about being explicit about the method's assumptions is helpful; I had misread/misunderstood that.

@rempsyc rempsyc closed this as completed Oct 4, 2023