Getting incorrect results when using stby and descr with weights #196

Open
CaraghS opened this issue May 23, 2024 · 1 comment

CaraghS commented May 23, 2024

I am using stby in summarytools to calculate weighted descriptive statistics by group. However, I get a different answer than when I first filter by the grouping variable and then apply summarytools' descr function. See below: mydf is my unfiltered data frame, and score is a 0-10 variable whose mean I want.

library(dplyr)
library(summarytools)

## when I filter first and split my df
filtered_male <- mydf %>% filter(gender == 1)
with(filtered_male, stby(score, gender, descr, weights = weight))
Weighted Descriptive Statistics
score by gender
Data Frame: filtered_male
Weights: weight
N: 838

                       1

       Mean         6.86
    Std.Dev         2.93
        Min         0.00
     Median         8.00
        Max        10.00
        MAD         2.97
         CV         0.43
    N.Valid   1509584.07
  Pct.Valid        99.70

##when I don't split my df
with(mydf, stby(score, gender, descr, weights = weight, simplify = TRUE))
Weighted Descriptive Statistics
score by gender
Data Frame: mydf
Weights: weight
N: 838

                       1            2

       Mean         7.01         6.79
    Std.Dev         2.81         3.02
        Min         0.00         0.00
     Median         8.00         8.00
        Max        10.00        10.00
        MAD         2.97         2.97
         CV         0.40         0.45
    N.Valid   1715494.12   1379339.65
  Pct.Valid        56.05        45.07

Any ideas on why this is happening, or how I can fix it to get the correct weighted mean? (I've checked the answers manually, and the mean where I filter first is correct.) Also, this doesn't seem to be an issue when I don't use weights.
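
For reference, a manual check of this kind might look like the sketch below. It uses only base R (`split` and `weighted.mean`) and does not call summarytools. The data frame `toy` and its values are hypothetical and only stand in for mydf, which is not shown here.

```r
# Hypothetical toy data standing in for mydf (values are made up)
toy <- data.frame(
  score  = c(2, 8, 10, 4, 6, 9),
  gender = c(1, 1, 1, 2, 2, 2),
  weight = c(1, 2, 1, 3, 1, 1)
)

# Weighted mean of score within each gender group, without filtering first
by_group <- sapply(
  split(toy, toy$gender),
  function(d) weighted.mean(d$score, d$weight, na.rm = TRUE)
)

# Same statistic for gender == 1 after filtering first
males <- toy[toy$gender == 1, ]
filtered_first <- weighted.mean(males$score, males$weight, na.rm = TRUE)

print(by_group)
print(filtered_first)
```

Computed this way, the grouped value for gender 1 and the filter-first value should be identical, so any difference between the two summarytools calls would not come from the data itself.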

CaraghS commented May 23, 2024

Can I also add: the weighted median reported appears to be incorrect. It differs from the value calculated using other R packages.
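
As a point of comparison, here is a minimal base R sketch of one common weighted median definition: the smallest value at which the cumulative weight reaches half of the total weight. The helper name `weighted_median` is hypothetical, and this is not necessarily the definition used by summarytools or by any other package. Implementations can differ in how they interpolate or break ties, which may account for some discrepancies.

```r
# Hypothetical helper: lower weighted median
# (smallest x whose cumulative weight is at least half the total weight)
weighted_median <- function(x, w) {
  keep <- !is.na(x) & !is.na(w)   # drop pairs with a missing value or weight
  x <- x[keep]
  w <- w[keep]
  ord <- order(x)                 # sort values, carrying weights along
  x <- x[ord]
  w <- w[ord]
  cum_w <- cumsum(w)
  x[which(cum_w >= sum(w) / 2)[1]]
}

# Made-up example values
print(weighted_median(c(2, 8, 10), c(1, 2, 1)))
```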
