Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve report_participants (fixes #247) #260

Merged
merged 6 commits into from
Jul 31, 2022
Merged

Improve report_participants (fixes #247) #260

merged 6 commits into from
Jul 31, 2022

Conversation

rempsyc
Copy link
Sponsor Member

@rempsyc rempsyc commented Jul 1, 2022

This PR fixes #247 by adding gender choices and correcting some typos. I also took this opportunity to add a parameter for country since earlier in my own analyses I wished it had this feature. There is a convenience threshold parameter to control how many countries should be printed, based on their % representation.

Reprex:

data <- data.frame(
  "Gender" = c("F", "F", "F", "Male", "M", "M", "Female", "F", "F", "F", "F"),
  "Country" = c("USA", NA, "Canada", "Canada", "India", "Germany",
                "USA", "USA", "USA", "USA", "Canada"))
report::report_participants(data)
#> [1] "11 participants (Gender: 72.7% women, 27.3% men, 0.00% non-binary; 
#> Country: 45.45% USA, 27.27% Canada, 27.27% other)."

# Control presentation treshold
report::report_participants(data, threshold = 5)
#> [1] "11 participants (Gender: 72.7% women, 27.3% men, 0.00% non-binary; 
#> Country: 45.45% USA, 27.27% Canada, 9.09% Germany, 9.09% India, 9.09% NA)."

Created on 2022-07-01 by the reprex package (v2.0.1)

As a side note, I observed an existing behaviour (in the current version, not this PR) that I wonder if it's expected: it classes NAs as non-binary, reprex:

data <- data.frame(
  "Gender" = c("W", NA, NA, NA, NA, NA, NA, NA, NA, NA, NA))
report::report_participants(data)
#> [1] "11 participants (Gender: 9.1% women, 0.0% men, 90.91% non-binary)."

Created on 2022-07-01 by the reprex package (v2.0.1)

I've seen something in the code that seemed related "% missing", but I'm not sure it's working as intended (or it's serving a different purpose).

Finally, I was also thinking that reusing the template for country, one could allow user-defined categorical (or even numeric) categories. Would that be something worth implementing?

@bwiernik
Copy link
Contributor

bwiernik commented Jul 2, 2022

Thanks! We should default to not classify NA as non-binary, but maybe include an argument to do so

@rempsyc
Copy link
Sponsor Member Author

rempsyc commented Jul 4, 2022

Ok so I've fixed a row error, NA % missing treatment, and failing tests. I also updated the examples, added new tests, and consolidated the relevant section of the code to make it shorter.

After looking more carefully at that section, it seems there indeed is a mechanism to classify NAs as "missing". I was able to identify the source of the problem and fix it so that it works correctly. Reprex:

data <- data.frame(
  "Sex" = c("F", "Intersex", NA, NA, NA, NA, NA, NA, "", NA, "NA"),
  "Gender" = c("W", "Queer", NA, NA, NA, NA, NA, NA, "", NA, "NA"))
report::report_participants(data)
#> [1] "11 participants (Sex: 9.1% females, 0.0% males, 9.1% other, 81.82% missing; 
#> Gender: 9.1% women, 0.0% men, 9.09% non-binary, 81.82% missing)"

Created on 2022-07-04 by the reprex package (v2.0.1)

I also started adding an argument gender_NA to allow users to define NAs as they wish (e.g., as non-binary) but now that I have fixed the existing system, I wonder if that is necessary as I imagine in most cases it is better to be more precise and report % missing (as is current behaviour). Furthermore, users could manually change NAs to the proper category if needed, before using the function (and I now feel like that would be the best place for this to happen).

Some tests were failing, so I corrected them, but also found the following notes:

TO DO : discuss if this is the correct approach to report in case of missing values

David Feinberg Comments

There was no reporting of missing values for Age, but there was for Sex,
So I reported missing values for Age, Sex, and Gender.
I didn't touch education. There are no tests for education.
I believe these shouldn't be NA's in tests for Sex and Gender,
because NA means not available
but empty strings are available... they exist...
try:

is.na("")
[1] FALSE
I suggest testing for empty strings as I put below.
NA works for age because that's supposed to be a number. Not sure if it's
going to work in all cases though...

David Feinberg Comments

However, it seems this discussion has been abandoned (for now). Personally, I feel inclined to add NAs in tests, so I did. What seems to be the actual issue is that empty strings are treated as “other” instead of like NA or “NA”. Not sure if this is best practice, but to address this concern, I have added a function at the beginning converting all empty strings to NA so they can be processed correctly later on (e.g., so empty strings are considered missing instead of categories “other”). While there, I also added support for strings “NA” in addition to true NA (just in case).

@DominiqueMakowski
Copy link
Member

Thanks @rempsyc for looking into that!! feel free to add yourself as cbt in the DESC file ☺️

I also started adding an argument gender_NA to allow users to define NAs as they wish (e.g., as non-binary)

I agree with you, we should drop this argument, because it adds a whole new argument whereas NA can trivially be inputed beforehand explicitly as one desires. So I'd say we just label NA as missing and if people want to replace NA by something more specific they should do it beforehand (+ nonbinary and missing values do coexist so it's weird to assume that they could be equivalent)

I agree with having all edgecases covered in the tests too :)

@codecov-commenter
Copy link

codecov-commenter commented Jul 5, 2022

Codecov Report

Merging #260 (917f6d0) into main (2dfb8bd) will increase coverage by 2.27%.
The diff coverage is 91.86%.

❗ Current head 917f6d0 differs from pull request most recent head 2b4474d. Consider uploading reports for the commit 2b4474d to get more accurate results

@@            Coverage Diff             @@
##             main     #260      +/-   ##
==========================================
+ Coverage   62.55%   64.83%   +2.27%     
==========================================
  Files          42       42              
  Lines        2804     2869      +65     
==========================================
+ Hits         1754     1860     +106     
+ Misses       1050     1009      -41     
Impacted Files Coverage Δ
R/report_participants.R 85.65% <91.86%> (+20.59%) ⬆️
R/report_info.R 66.66% <0.00%> (-10.42%) ⬇️
R/report_text.R 94.44% <0.00%> (ø)
R/report_performance.R 86.17% <0.00%> (+0.11%) ⬆️
R/report.numeric.R 97.29% <0.00%> (+2.02%) ⬆️
R/report.factor.R 85.48% <0.00%> (+22.58%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a818eef...2b4474d. Read the comment docs.

@rempsyc
Copy link
Sponsor Member Author

rempsyc commented Jul 5, 2022

In addition to country, a commonly requested demographic is race information, so I added an argument for that. [Sidenote: in French we usually prefer the term "ethnic origin" but in English "race" seems to be the go-to word everyone uses... that said we can change the argument name if needed]. Reprex:

# Race/ethnicity
data <- data.frame(
  "Race" = c("Black", NA, "White", "Asian", "Black", "Arab", "Black",
             "White", "Asian", "Southeast Asian", "Mixed"))
report_participants(data)
#> [1] "11 participants (Race: 27.27% Black, 18.18% Asian, 18.18% White, 36.36% other)"

# Race/ethnicity, control presentation treshold
report_participants(data, threshold = 5)
#> [1] "11 participants (Race: 27.27% Black, 18.18% Asian, 18.18% White, 9.09% Arab, 
#> 9.09% missing, 9.09% Mixed, 9.09% Southeast Asian)"

Created on 2022-07-04 by the reprex package (v2.0.1)

It wasn't much work since the code I added for countries allows for an unlimited number of category values, so it's easy to adapt to any other categorical demographic... So if we have requests for more categories I could easily accommodate them too.

@DominiqueMakowski I looked at the DESCRIPTION file but I could not find a contributors section (only authors). However, if my PR is merged, I will appear as contributor anyway, right? Did you mean update the NEWS.md file to refer to my username and #260? If so, then I just did so and added a note about the change for upcoming v. 0.5.2!

@DominiqueMakowski
Copy link
Member

in French we usually prefer the term "ethnic origin" but in English "race"

We could go with "ethnicity", in my (French) ears it sounds better than "race" but I don't know much about that

I looked at the DESCRIPTION file but I could not find a contributors section

Can add yourself like this person (ctb is the role, there is no separate section):

report/DESCRIPTION

Lines 31 to 35 in 7dda461

person(given = "Rudolf",
family = "Siegel",
role = "ctb",
email = "mutlusun@users.noreply.github.com",
comment = c(ORCID = "0000-0002-6021-804X")))

@strengejacke
Copy link
Member

in French we usually prefer the term "ethnic origin" but in English "race"

We could go with "ethnicity", in my (French) ears it sounds better than "race" but I don't know much about that

@bwiernik ?

@bwiernik
Copy link
Contributor

bwiernik commented Jul 5, 2022

This is something we would need arguments for. It is absolutely "race" in the US or, eg, South Africa, because it doesn't refer to membership in specific ethnic groups but rather to coarse classification into these broader categories (largely based on skin color)--the basis of racism, rather than ethnic prejudice or oppression

@rempsyc
Copy link
Sponsor Member Author

rempsyc commented Jul 5, 2022

Race is certainly a sensitive topic, and understandably, we want to use the right language. @bwiernik thanks for confirming this, let's stick with that then.

@DominiqueMakowski got it, thanks! It is done now. It's an honour to be featured as a contributor there! :P

On that note, I think the PR is ready for final review before merge.

@IndrajeetPatil IndrajeetPatil merged commit 44c223e into easystats:main Jul 31, 2022
@IndrajeetPatil
Copy link
Member

Thanks for your great contribution, @rempsyc! 😊

Hoping to see more of it 😉

@rempsyc rempsyc deleted the report_participants branch October 16, 2022 21:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

report_participants: allow "Female" and "Male" for variable "gender"
6 participants