Fixed calculating column summaries with a missing-proportion option #8419

Merged
merged 2 commits into IDEMSInternational:master on Jul 11, 2023

lilyclements
Contributor

@lilyclements lilyclements commented Jul 4, 2023

The CIMH group found during their workshop that using na_type=c("'prop'") in R-Instat triggers an error. This is because the parameter that goes with na_type (na_max_prop) was not being passed in alongside na_type. This PR fixes the issue for percentiles.
I am not sure whether this issue occurs with other summaries. I tried others, such as summary_mean, and there is no issue there.
Nor does the issue occur for me with the other parameters that depend on na_type. I have tested these and their values are read in.
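
To illustrate the kind of bug described above, here is a minimal sketch in plain R. Every function name in it (toy_na_check, toy_p33_broken, toy_p33_fixed) is hypothetical and made up for illustration; it is not R-Instat's code, and the real na_check and percentile functions may be structured differently. It only shows how a wrapper that accepts na_max_prop but does not forward it can fail once na_type is "prop".

```r
# Hypothetical stand-in for a missing-value check; NOT R-Instat's na_check().
toy_na_check <- function(x, na_type, na_max_prop = NULL) {
  if (identical(na_type, "prop")) {
    # With "prop", the threshold (a percentage) must be supplied.
    if (is.null(na_max_prop)) {
      stop("na_max_prop is required when na_type is 'prop'")
    }
    return(100 * mean(is.na(x)) <= na_max_prop)
  }
  TRUE
}

# Broken wrapper: accepts na_max_prop but forgets to forward it.
toy_p33_broken <- function(x, na_type = "", na_max_prop = NULL) {
  if (!toy_na_check(x, na_type = na_type)) return(NA_real_)
  unname(quantile(x, probs = 0.33, na.rm = TRUE))
}

# Fixed wrapper: forwards na_max_prop alongside na_type.
toy_p33_fixed <- function(x, na_type = "", na_max_prop = NULL) {
  if (!toy_na_check(x, na_type = na_type, na_max_prop = na_max_prop)) {
    return(NA_real_)
  }
  unname(quantile(x, probs = 0.33, na.rm = TRUE))
}

x <- c(1, 2, NA, 4, 5)  # one value in five is missing

# The broken wrapper raises an error; capture its message instead of stopping.
broken_result <- tryCatch(
  toy_p33_broken(x, na_type = "prop", na_max_prop = 50),
  error = function(e) conditionMessage(e)
)

# The fixed wrapper returns a number when the threshold is met, NA otherwise.
fixed_result <- toy_p33_fixed(x, na_type = "prop", na_max_prop = 50)
strict_result <- toy_p33_fixed(x, na_type = "prop", na_max_prop = 10)
```

In this sketch the error only appears for na_type "prop", which would be consistent with other na_type options working while this one fails.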

Script I used to test the bug fix:

# Setting working directory, sourcing R code and loading R packages
setwd(dir="C:/Users/lclem/source/repos/RInstat/instat/static/InstatObject/R")

source(file="Rsetup.R")

data_book <- DataBook$new()

options(dplyr.summarise.inform=FALSE)

# Option: Number of digits to display
options(digits=4)

# Option: Show stars on summary tables of coefficients
options(show.signif.stars=FALSE)

# Code generated by the dialog, Import Dataset
new_RDS <- readRDS(file="C:/Program Files/R-Instat/0.7.6/static/Library/Climatic/Niger/niger_4_stns.RDS")
data_book$import_RDS(data_RDS=new_RDS)

rm(new_RDS)

# Code generated by the dialog, Column Summaries
data_book$calculate_summary(data_name="Data", columns_to_summarise=c("rain","hmax"), factors=c("year","month_abbr"), j=1, summaries=c("summary_count_non_missing", "summary_count", "summary_sum"), silent=TRUE)

Data_by_year_month_abbr <- data_book$get_data_frame("Data_by_year_month_abbr")

na_check(x = Data_by_year_month_abbr$sum_rain, na_type = "prop", na_max_prop=20)
# issue is na_max_prop is not being read into the function that uses it
data_book$calculate_summary(data_name="Data_by_year_month_abbr", columns_to_summarise="sum_rain",
                            factors="month_abbr", na.rm=TRUE, na_type=c("'prop'"),
                            na_max_prop=20, j=1, summaries=c("p10", "p33", "p67", "p90"), silent=TRUE,
                            return_output = TRUE)

@lilyclements
Contributor Author

@africanmathsinitiative/developers this is ready to review

@lilyclements
Contributor Author

@rdstern So the issue occurs when using the Column Summaries dialog.

  1. Import some data (say, Survey)
  2. Prepare > Data: Reshape > Column Summaries
  3. Put "Size" in the "Variables to Summarise" receiver. "Village" in the "By" receiver.
  4. When selecting summaries on the sdg, select "P33" (this occurs for any of the percentiles, though)
  5. Then on the main dlg, check "Omit Missing Values"
  6. Press "OPTIONS" next to this check box
  7. On this sdg, select "Maximum percentage of missing values allowed" (the third checkbox)
  8. And write in "1" (or any value!) in the input next to that checkbox
  9. Press Return, then OK on the main dialog.
  10. Error!

image

In R script form:

# Initialising R (e.g Loading R packages)

setwd(dir="C:/Users/lclem/source/repos/RInstat/instat/bin/Debug/static/InstatObject/R")

source(file="Rsetup.R")

data_book <- DataBook$new()

# Setting display options (e.g Number of significant digits)
options(digits=4, show.signif.stars=FALSE, dplyr.summarise.inform=FALSE)

# Code generated by the dialog, Import Dataset
rice_survey <- rio::import(file="C:/Program Files/R-Instat/0.7.6/static/Library/Problem Solving Examples/rice_survey.csv", stringsAsFactors=TRUE)
data_book$import_data(data_tables=list(rice_survey=rice_survey))


rm(rice_survey)

# Code generated by the dialog, Column Summaries
data_book$calculate_summary(data_name="rice_survey", columns_to_summarise="Size", factors="Village", na.rm=TRUE, na_type=c("'prop'"), na_max_prop=1, j=1, summaries=c("summary_count_non_missing", "summary_count", "summary_sum", "p33"), silent=TRUE)

This was occurring because na_max_prop wasn't being passed correctly into the functions that calculate the percentiles. This PR should fix the problem.
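
For readers who want to see what a "maximum percentage of missing values allowed" option does to a per-group percentile, the following is a rough sketch in base R. The helper toy_percentile and the data frame toy_survey are invented for this example, and the rule used (return NA when the percentage of missing values exceeds na_max_prop) is an assumption about the intended behaviour, not a copy of R-Instat's implementation.

```r
# Hypothetical helper: a percentile that returns NA when the percentage of
# missing values in x exceeds na_max_prop (assumed semantics, not R-Instat's code).
toy_percentile <- function(x, prob, na_max_prop) {
  pct_missing <- 100 * mean(is.na(x))
  if (pct_missing > na_max_prop) return(NA_real_)
  unname(quantile(x, probs = prob, na.rm = TRUE))
}

# Made-up data: village "A" has no missing sizes, village "B" has 50% missing.
toy_survey <- data.frame(
  Village = c("A", "A", "A", "A", "B", "B", "B", "B"),
  Size    = c(1, 2, 3, 4, 5, NA, 7, NA)
)

# 33rd percentile of Size by Village, allowing at most 1% missing values.
# Extra arguments after the function are passed on to toy_percentile.
p33_by_village <- tapply(
  toy_survey$Size, toy_survey$Village,
  toy_percentile, prob = 0.33, na_max_prop = 1
)
```

Under these assumptions, village "A" gets a numeric percentile and village "B" gets NA, because half of its values are missing.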

@rdstern
Collaborator

rdstern commented Jul 10, 2023

@lilyclements that's a very good spot. However, now the dialog seems to give an error the second time I use the dialog, whatever I do!
For example:
image

@lilyclements
Contributor Author

@rdstern this error came from my earlier changes to deleting keys, so it was actually an issue in master. I fixed it in #8422, which has now been merged, so it shouldn't be a problem anymore. Sorry for not being clearer that this might happen!

I've updated my branch, so if you re-pull this branch then this error shouldn't occur anymore.

Collaborator

@rdstern rdstern left a comment

@lilyclements great - this now seems to work fine.
@lloyddewit I hope this is ok for your checks. It is a correction I would really like to see in the version tomorrow if at all possible.

@lloyddewit lloyddewit added the bug label Jul 11, 2023
@lloyddewit lloyddewit changed the title from "Fixing an error when calculating column summaries with a missing-proportion option" to "Fixed calculating column summaries with a missing-proportion option" Jul 11, 2023
@lloyddewit lloyddewit merged commit f0c8e5d into IDEMSInternational:master Jul 11, 2023
4 checks passed
3 participants