Sexcoding#100
Conversation
Eliminated print statements that output the classes of ped$spouseID, ped$personID, ped$momID, and ped$dadID. This cleans up the function by removing unnecessary debug output.
Refactored and expanded test cases in test-joss_review.R to thoroughly check consistent handling of string and numeric IDs, including various missing value representations. Updated NEWS.md to document the repair of inconsistent ID handling.
Updates sex code handling in alignPedigreeWithRelations and pedigree.sexrepair to better support non-standard and numeric codes. Enhances tests to cover new config options and warning handling for sex code edge cases.
Wrapped ggpedigree plotting code in df_warroses.R with if (FALSE) to prevent execution during data processing. Improved formatting, argument alignment, and logic in test-ggpedigree_datainputs.R for better readability and consistency, including clearer handling of expect_warnings and test cases.
Introduces a new code_unknown parameter to getDefaultPlotConfig for specifying unknown sex codes. Updates pedigree.sexrepair to provide clearer error and warning messages when unknown or invalid sex values are detected. Adjusts related test to account for the new parameter.
Added more informative error messages when constructing pedigree objects fails due to incorrect or non-standard sex coding. Expanded and refactored tests for various sex code and ID input scenarios, including comprehensive cross-product testing of input types, missing values, and configuration options. Updated documentation to include new config option 'code_unknown'.
Sets scale_y_reverse limits to use minimum y position and normalizes y positions to start at zero in .adjustSpacing. Also updates test to clarify error handling for character IDs.
Adjust y-axis handling and spacing in pedigree plots
Introduces a new option to scale point sizes based on pedigree size in plotting functions. Updates documentation, function signatures, and tests to support the new parameter and ensure correct behavior.
43b803b to
91f5f96
Compare
91f5f96 to
17639d5
Compare
There was a problem hiding this comment.
Pull request overview
This pull request significantly improves sex coding and ID type handling in pedigree data. It adds more informative error messages when sex values are incorrectly specified, expands test coverage with comprehensive input validation tests, introduces new configuration options (code_unknown, point_scale_by_pedigree), and improves robustness in data type coercion and plotting. The changes collectively make the package more user-friendly when handling diverse pedigree data formats.
Key Changes
- Enhanced error messaging for sex coding issues with guidance on using configuration options
- Added comprehensive test suite for various sex codes, ID types, and missing value combinations
- Introduced automatic point size scaling based on pedigree size
- Improved ID type consistency handling to prevent mismatches between numeric and character IDs
Reviewed changes
Copilot reviewed 15 out of 15 changed files in this pull request and generated 13 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/testthat/test-joss_review.R | Removed redundant test cases that are now covered by the more comprehensive test suite |
| tests/testthat/test-ggpedigree_datainputs.R | Added extensive new test suite systematically testing combinations of sex codes, ID types, and missing parent encodings |
| tests/testthat/test-ggPedigree.R | Added explicit code_female parameter to existing tests for consistency |
| tests/testthat/test-defaultPlotConfig.R | Updated to reflect new parameter count (162) and added pedigree_size argument to buildPlotConfig calls |
| R/kinship2_pedigree.X | New file with alternative pedigree implementation (.X extension is problematic) |
| R/kinship2_pedigree.R | Improved sex repair logic to handle numeric strings and updated error messages |
| R/ggpedigreeHelpers.R | Added y-position normalization to ensure minimum value starts at zero |
| R/ggpedigree.R | Enhanced ID type coercion for consistency, added NA sex handling, improved spacing, and integrated pedigree_size parameter |
| R/ggPedigreeInteractive.R | Added pedigree_size parameter to buildPlotConfig call |
| R/calcCoordinates.R | Wrapped pedigree construction in tryCatch with improved error messages |
| R/defaultPlotConfig.R | Added code_unknown and point_scale_by_pedigree parameters with automatic point size scaling logic |
| man/getDefaultPlotConfig.Rd | Updated documentation for new parameters |
| man/buildPlotConfig.Rd | Added pedigree_size parameter documentation |
| data-raw/df_warroses.R | Wrapped example plotting code in if(FALSE) block |
| NEWS.md | Updated with summary of changes including error message improvements and test expansion |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| stop("Error in constructing pedigree object. Please check that the you've | ||
| correctly specificed the sex of individuals. Setting code_male may help if non-standard codes are used (e.g., 'M'/'F'; '1,2').") |
There was a problem hiding this comment.
There are two spelling errors in this error message: "the you've" should be "that you've" and "specificed" should be "specified".
There was a problem hiding this comment.
@copilot open a new pull request to apply changes based on this feedback
| stop("Error in constructing pedigree object. Please check that the you've | ||
| correctly specificed the sex of individuals. Setting code_male may help if non-standard codes are used (e.g., 'M'/'F'; '1,2').") |
There was a problem hiding this comment.
There are two spelling errors in this error message: "the you've" should be "that you've" and "specificed" should be "specified". This is identical to the error message on lines 277-278.
| #' Create a pedigree or pedigreeList object | ||
| #' | ||
| #' @param id Identification variable for individual | ||
| #' @param dadid Identification variable for father. Founders' parents should be coded | ||
| #' to NA, or another value specified by missid. | ||
| #' @param momid Identification variable for mother. Founders' parents should be coded | ||
| #' to NA, or another value specified by missid. | ||
| #' @param sex Gender of individual noted in `id'. Either character ("male","female", | ||
| #' "unknown","terminated") or numeric (1="male", 2="female", 3="unknown", 4="terminated") | ||
| #' data is allowed. For character data the string may be truncated, and of arbitrary case. | ||
| #' @param affected A variable indicating affection status. A multi-column matrix | ||
| #' can be used to give the status with respect to multiple traits. Logical, factor, and | ||
| #' integer types are converted to 0/1 representing unaffected and affected, respectively. | ||
| #' NAs are considered missing. | ||
| #' @param status Censor/Vital status (0="censored", 1="dead") | ||
| #' @param relation A matrix with 3 required columns (id1, id2, code) specifying special | ||
| #' relationship between pairs of individuals. Codes: 1=Monozygotic twin, 2=Dizygotic twin, | ||
| #' 3=Twin of unknown zygosity, 4=Spouse. (The last is necessary in order to place a marriage | ||
| #' with no children into the plot.) If famid is given in the call to create pedigrees, then | ||
| #' famid needs to be in the last column of the relation matrix. Note for tuples of >= 3 with | ||
| #' a mixture of zygosities, the plotting is limited to showing pairwise zygosity of adjacent | ||
| #' subjects, so it is only necessary to specify the pairwise zygosity, in the order the subjects | ||
| #' are given or appear on the plot. | ||
| #' @param famid An optional vector of family identifiers. If it is present the result will | ||
| #' contain individual pedigrees for each family in the set, which can be extacted using | ||
| #' subscripts. Individuals need to have a unique id \emph{within} family. | ||
| #' @param missid The founders are those with no father or mother in the pedigree. The dadid | ||
| #' and momid values for these subjects will either be NA or the value of this variable. The | ||
| #' default for missid is 0 if the id variable is numeric, and "" (empty string) otherwise. | ||
| #' @param x pedigree object in print and subset methods | ||
| #' @param max_message_n max number of individuals to list in error messages | ||
| #' @param ... optional arguments passed to internal functions | ||
| #' @param drop logical, used in subset function for dropping dimensionality | ||
| #' @return An object of class \code{pedigree} or \code{pedigreeList} Containing the following items: | ||
| #' famid id findex mindex sex affected status relation | ||
| #' @author Terry Therneau | ||
| #' @name pedigree | ||
| #' @export | ||
|
|
||
|
|
||
| pedigree <- function(id, dadid, momid, | ||
| sex, affected, | ||
| status, relation, | ||
| famid, missid, max_message_n = 5) { | ||
| n <- length(id) | ||
| ## Code transferred from noweb to markdown vignette. | ||
| ## Sections from the noweb/vignettes are noted here with | ||
| ## Doc: Error and Data Checks | ||
|
|
||
| pedigree.idcheck( | ||
| id = id, | ||
| momid = momid, | ||
| dadid = dadid, | ||
| sex = sex | ||
| ) | ||
|
|
||
|
|
||
| id <- pedigree.idrepair(id = id) | ||
|
|
||
| # momid <- pedigree.idrepair(id = momid) | ||
| # dadid <- pedigree.idrepair(id = dadid) | ||
|
|
||
| sex <- pedigree.sexrepair(sex = sex) | ||
|
|
||
| if (missing(missid)) { | ||
| missid <- pedigree.makemissingid(id = id) | ||
| } | ||
|
|
||
| nofather <- (is.na(dadid) | dadid == missid) | ||
| nomother <- (is.na(momid) | momid == missid) | ||
|
|
||
| if (!missing(famid)) { | ||
| if (any(is.na(famid))) stop("The family id cannot contain missing values") | ||
| if (is.factor(famid) || is.character(famid)) { | ||
| if (length(grep("^ *$", famid)) > 0) { | ||
| stop("The family id cannot be a blank or empty string") | ||
| } | ||
| } | ||
| # Make a temporary new id from the family and subject pair | ||
| oldid <- id | ||
| id <- paste(as.character(famid), as.character(id), sep = "/") | ||
| dadid <- paste(as.character(famid), as.character(dadid), sep = "/") | ||
| momid <- paste(as.character(famid), as.character(momid), sep = "/") | ||
| } | ||
| has_famid <- !missing(famid) | ||
|
|
||
| if (any(duplicated(id))) { | ||
| duplist <- id[duplicated(id)] | ||
| msg.n <- min(length(duplist), max_message_n) | ||
| stop(paste("Duplicate subject id:", duplist[1:msg.n])) | ||
| } | ||
| findex <- match(dadid, id, nomatch = 0) | ||
| if (any(sex[findex] != "male")) { | ||
| who <- unique((id[findex])[sex[findex] != "male"]) | ||
| msg.n <- 1:min(max_message_n, length(who)) # Don't list a zillion | ||
| stop(paste( | ||
| "Id not male, but is a father:", | ||
| paste(who[msg.n], collapse = " ") | ||
| )) | ||
| } | ||
|
|
||
| if (any(findex == 0 & !nofather)) { | ||
| who <- dadid[which(findex == 0 & !nofather)] | ||
| msg.n <- 1:min(max_message_n, length(who)) # Don't list a zillion | ||
| stop(paste( | ||
| "Value of 'dadid' not found in the id list", | ||
| paste(who[msg.n], collapse = " ") | ||
| )) | ||
| } | ||
|
|
||
| mindex <- match(momid, id, nomatch = 0) | ||
|
|
||
| if (any(sex[mindex] != "female")) { | ||
| who <- unique((id[mindex])[sex[mindex] != "female"]) | ||
| msg.n <- 1:min(max_message_n, length(who)) | ||
| stop(paste( | ||
| "Id not female, but is a mother:", | ||
| paste(who[msg.n], collapse = " ") | ||
| )) | ||
| } | ||
|
|
||
| if (any(mindex == 0 & !nomother)) { | ||
| who <- momid[which(mindex == 0 & !nomother)] | ||
| msg.n <- 1:min(max_message_n, length(who)) # Don't list a zillion | ||
| stop(paste( | ||
| "Value of 'momid' not found in the id list", | ||
| paste(who[msg.n], collapse = " ") | ||
| )) | ||
| } | ||
|
|
||
| if (any(mindex == 0 & findex != 0) || any(mindex != 0 & findex == 0)) { | ||
| who <- id[which((mindex == 0 & findex != 0) | (mindex != 0 & findex == 0))] | ||
| msg.n <- 1:min(max_message_n, length(who)) # Don't list a zillion | ||
| stop(paste( | ||
| "Subjects must have both a father and mother, or have neither", | ||
| paste(who[msg.n], collapse = " ") | ||
| )) | ||
| } | ||
|
|
||
| if (!missing(famid)) { | ||
| if (any(famid[mindex] != famid[mindex > 0])) { | ||
| who <- (id[mindex > 0])[famid[mindex] != famid[mindex > 0]] | ||
| msg.n <- 1:min(max_message_n, length(who)) | ||
| stop(paste( | ||
| "Mother's family != subject's family", | ||
| paste(who[msg.n], collapse = " ") | ||
| )) | ||
| } | ||
| if (any(famid[findex] != famid[findex > 0])) { | ||
| who <- (id[findex > 0])[famid[findex] != famid[findex > 0]] | ||
| msg.n <- 1:min(max_message_n, length(who)) | ||
| stop(paste( | ||
| "Father's family != subject's family", | ||
| paste(who[msg.n], collapse = " ") | ||
| )) | ||
| } | ||
| } | ||
| ## Doc: Creation of Pedigrees | ||
| if (missing(famid)) { | ||
| temp <- list(id = id, findex = findex, mindex = mindex, sex = sex) | ||
| } else { | ||
| temp <- list( | ||
| famid = famid, id = oldid, findex = findex, mindex = mindex, | ||
| sex = sex | ||
| ) | ||
| } | ||
|
|
||
| if (!missing(affected)) { | ||
| temp$affected <- pedigree.process_affected(affected, n) | ||
| } | ||
|
|
||
|
|
||
| if (!missing(status)) { | ||
| temp$status <- pedigree.process_status(status, n) | ||
| } | ||
|
|
||
| if (!missing(relation)) { | ||
| temp$relation <- pedigree.process_relation( | ||
| relation = relation, | ||
| has_famid = has_famid, | ||
| famid = if (has_famid) famid else NULL, | ||
| id = id, | ||
| momid = momid, | ||
| dadid = dadid, | ||
| sex = sex | ||
| ) | ||
| } | ||
| ## Doc: Finish | ||
| if (missing(famid)) { | ||
| class(temp) <- "pedigree" | ||
| } else { | ||
| class(temp) <- "pedigreeList" | ||
| } | ||
| temp | ||
| } | ||
|
|
||
| #' @keywords internal | ||
| #' @noRd | ||
| pedigree.process_relation <- function(relation, | ||
| has_famid, | ||
| famid, | ||
| id, | ||
| momid, | ||
| dadid, | ||
| sex) { | ||
| rel_parsed <- pedigree.parse_relation(relation, | ||
| has_famid = has_famid | ||
| ) | ||
| id1 <- rel_parsed$id1 | ||
| id2 <- rel_parsed$id2 | ||
| rel_code <- pedigree.coerce_relation_code(rel_parsed$code) | ||
| rel_famid <- rel_parsed$famid | ||
|
|
||
| # Ensure everyone in the relationship is in the pedigree | ||
| if (has_famid) { | ||
| key1 <- paste(as.character(rel_famid), as.character(id1), sep = "/") | ||
| key2 <- paste(as.character(rel_famid), as.character(id2), sep = "/") | ||
| } else { | ||
| key1 <- id1 | ||
| key2 <- id2 | ||
| } | ||
|
|
||
| indx1 <- match(key1, id, nomatch = 0L) | ||
| indx2 <- match(key2, id, nomatch = 0L) | ||
|
|
||
| if (any(indx1 == 0L | indx2 == 0L)) { | ||
| stop("Subjects in relationships that are not in the pedigree") | ||
| } | ||
| if (any(indx1 == indx2)) { | ||
| who <- indx1[indx1 == indx2] | ||
| stop(paste("Subject", id[who], "is their own spouse or twin")) | ||
| } | ||
|
|
||
| # Twin consistency checks | ||
| numeric_code <- as.numeric(rel_code) | ||
| if (any(numeric_code < 3L)) { | ||
| twins <- numeric_code < 3L | ||
| if (any(momid[indx1[twins]] != momid[indx2[twins]])) { | ||
| stop("Twins found with different mothers") | ||
| } | ||
| if (any(dadid[indx1[twins]] != dadid[indx2[twins]])) { | ||
| stop("Twins found with different fathers") | ||
| } | ||
| } | ||
|
|
||
| # MZ twin sex check | ||
| if (any(rel_code == "MZ twin")) { | ||
| mztwins <- rel_code == "MZ twin" | ||
| if (any(sex[indx1[mztwins]] != sex[indx2[mztwins]])) { | ||
| stop("MZ twins with different sexes") | ||
| } | ||
| } | ||
|
|
||
| if (has_famid == TRUE) { | ||
| data.frame( | ||
| famid = rel_famid, | ||
| indx1 = indx1, | ||
| indx2 = indx2, | ||
| code = rel_code | ||
| ) | ||
| } else { | ||
| data.frame( | ||
| indx1 = indx1, | ||
| indx2 = indx2, | ||
| code = rel_code | ||
| ) | ||
| } | ||
| } | ||
|
|
||
| pedigree.parse_relation <- function(relation, has_famid) { | ||
| if (is.matrix(relation)) { | ||
| expected_cols <- if (has_famid) 4L else 3L | ||
| if (ncol(relation) != expected_cols) { | ||
| if (has_famid) { | ||
| stop("Relation matrix must have 3 columns + famid") | ||
| } else { | ||
| stop("Relation matrix must have 3 columns: id1, id2, code") | ||
| } | ||
| } | ||
| id1 <- relation[, 1] | ||
| id2 <- relation[, 2] | ||
| code <- relation[, 3] | ||
| fam <- if (has_famid) relation[, 4] else NULL | ||
| } else if (is.data.frame(relation)) { | ||
| id1 <- relation$id1 | ||
| id2 <- relation$id2 | ||
| code <- relation$code | ||
| fam <- if (has_famid) relation$famid else NULL | ||
|
|
||
| if (is.null(id1) || is.null(id2) || is.null(code) || | ||
| (has_famid && is.null(fam))) { | ||
| if (has_famid) { | ||
| stop("Relation data must have id1, id2, code, and family id") | ||
| } else { | ||
| stop("Relation data frame must have id1, id2, and code") | ||
| } | ||
| } | ||
| } else { | ||
| if (has_famid) { | ||
| stop("Relation argument must be a matrix or a dataframe") | ||
| } else { | ||
| stop("Relation argument must be a matrix or a list") | ||
| } | ||
| } | ||
|
|
||
| list( | ||
| id1 = id1, | ||
| id2 = id2, | ||
| code = code, | ||
| famid = fam | ||
| ) | ||
| } | ||
|
|
||
| #' | ||
| #' @keywords internal | ||
| #' @noRd | ||
| #' @inheritParams pedigree | ||
|
|
||
| pedigree.idcheck <- function(id, momid, dadid, sex) { | ||
| n <- length(id) | ||
| if (length(momid) != n) stop("Mismatched lengths, id and momid") | ||
| if (length(dadid) != n) stop("Mismatched lengths, id and dadid") | ||
| if (length(sex) != n) stop("Mismatched lengths, id and sex") | ||
| if (any(is.na(id))) stop("Missing value for the id variable") | ||
| } | ||
|
|
||
| #' @keywords internal | ||
| #' @noRd | ||
| #' @inheritParams pedigree | ||
|
|
||
| pedigree.idrepair <- function(id) { | ||
| if (!is.numeric(id)) { | ||
| id <- as.character(id) | ||
| if (length(grep("^ *$", id)) > 0) { | ||
| stop("A blank or empty string is not allowed as the id variable") | ||
| } | ||
| } | ||
| id | ||
| } | ||
|
|
||
| #' @keywords internal | ||
| #' @noRd | ||
| #' @inheritParams pedigree | ||
|
|
||
| pedigree.sexrepair <- function(sex, codes = c("male", "female", "unknown", "terminated") | ||
| ) { | ||
| # Allow for character/numeric | ||
| # Allow for character/numeric/factor in the sex variable | ||
| if (is.factor(sex)) { | ||
| sex <- as.character(sex) | ||
| } | ||
|
|
||
| if (is.character(sex)) { | ||
|
|
||
| if(all(unique(sex) %in% codes, na.rm = TRUE)){ | ||
| sex <- charmatch(casefold(sex, upper = FALSE), codes, | ||
| nomatch = 3 | ||
| ) | ||
| } else if(all(unique(sex) %in% c("M","F","U", "T", NA), na.rm = TRUE)){ | ||
|
|
||
| sex <- charmatch(casefold(sex, upper = FALSE), c("M", "F", "U", "T"), | ||
| nomatch = 3 | ||
| ) | ||
|
|
||
| } else if (all(unique(suppressWarnings(as.numeric(sex)) | ||
| ) %in% c(0:4), na.rm = TRUE)) { | ||
| sex <- as.numeric(sex) | ||
| } else { | ||
| print(unique(sex)) | ||
| # stop("Invalid values for 'sex'") | ||
| sex <- charmatch(casefold(sex, upper = FALSE), codes, | ||
| nomatch = 3 | ||
| ) | ||
| } | ||
| } | ||
| # assume either 0/1/2/4 = female/male/unknown/term, or 1/2/3/4 | ||
| # if only 1/2 assume no unknowns | ||
| if (min(sex) == 0) { | ||
| sex <- sex + 1 | ||
| } | ||
| sex <- ifelse(sex < 1 | sex > 4, 3, sex) | ||
| if (all(sex > 2)) { | ||
| stop("Invalid values for 'sex'. All values coded as Unknown") | ||
| } else if (mean(sex == 3) > 0.25) { | ||
| warning("More than 25% of the sex values are 'unknown'") | ||
| } | ||
| sex <- factor(sex, 1:4, labels = codes) | ||
|
|
||
| sex | ||
| } | ||
|
|
||
|
|
||
| #' @keywords internal | ||
| #' @noRd | ||
| pedigree.process_status <- function(status, n) { | ||
| if (length(status) != n) { | ||
| stop("Wrong length for affected") | ||
| } | ||
| if (is.logical(status)) status <- as.integer(status) | ||
| if (any(status != 0L & status != 1L)) { | ||
| stop("Invalid status code") | ||
| } | ||
| status | ||
| } | ||
|
|
||
| #' @keywords internal | ||
| #' @noRd | ||
| pedigree.process_affected <- function(affected, n) { | ||
| if (is.matrix(affected)) { | ||
| if (nrow(affected) != n) stop("Wrong number of rows in affected") | ||
| if (is.logical(affected)) affected <- 1L * affected | ||
| } else { | ||
| if (length(affected) != n) { | ||
| stop("Wrong length for affected") | ||
| } | ||
|
|
||
| if (is.logical(affected)) affected <- as.numeric(affected) | ||
| if (is.factor(affected)) affected <- as.numeric(affected) - 1 | ||
| } | ||
| if (max(affected, na.rm = TRUE) > min(affected, na.rm = TRUE)) { | ||
| affected <- affected - min(affected, na.rm = TRUE) | ||
| } | ||
| if (!all(affected == 0 | affected == 1 | is.na(affected))) { | ||
| stop("Invalid code for affected status") | ||
| } | ||
| affected | ||
| } | ||
|
|
||
|
|
||
| ## Doc: Subscripting a pedigree | ||
| #' @rdname pedigree | ||
| #' @export | ||
| "[.pedigreeList" <- function(x, ..., drop = FALSE) { | ||
| if (length(list(...)) != 1) stop("Only 1 subscript allowed") | ||
| ufam <- unique(x$famid) | ||
| if (is.factor(..1) || is.character(..1)) { | ||
| indx <- ufam[match(..1, ufam)] | ||
| } else { | ||
| indx <- ufam[..1] | ||
| } | ||
|
|
||
| if (any(is.na(indx))) { | ||
| stop(paste("Family", (..1[is.na(indx)])[1], "not found")) | ||
| } | ||
|
|
||
| keep <- which(x$famid %in% indx) # which rows to keep | ||
| for (i in c("id", "famid", "sex")) { | ||
| x[[i]] <- (x[[i]])[keep] | ||
| } | ||
|
|
||
| kept.rows <- (1:length(x$findex))[keep] | ||
| x$findex <- match(x$findex[keep], kept.rows, nomatch = 0) | ||
| x$mindex <- match(x$mindex[keep], kept.rows, nomatch = 0) | ||
|
|
||
| # optional components | ||
| if (!is.null(x$status)) x$status <- x$status[keep] | ||
| if (!is.null(x$affected)) { | ||
| if (is.matrix(x$affected)) { | ||
| x$affected <- x$affected[keep, , drop = FALSE] | ||
| } else { | ||
| x$affected <- x$affected[keep] | ||
| } | ||
| } | ||
| if (!is.null(x$relation)) { | ||
| keep <- !is.na(match(x$relation$famid, indx)) | ||
| if (any(keep)) { | ||
| x$relation <- x$relation[keep, , drop = FALSE] | ||
| ## Update twin id indexes | ||
| x$relation$indx1 <- match(x$relation$indx1, kept.rows, nomatch = 0) | ||
| x$relation$indx2 <- match(x$relation$indx2, kept.rows, nomatch = 0) | ||
| ## If only one family chosen, remove famid | ||
| if (length(indx) == 1) { | ||
| x$relation$famid <- NULL | ||
| } | ||
| } else { | ||
| x$relation <- NULL | ||
| } # No relations matrix elements for this family | ||
| } | ||
|
|
||
| if (length(indx) == 1) { | ||
| class(x) <- "pedigree" | ||
| } # only one family chosen | ||
| else { | ||
| class(x) <- "pedigreeList" | ||
| } | ||
| x | ||
| } | ||
|
|
||
| #' @rdname pedigree | ||
| #' @export | ||
| "[.pedigree" <- function(x, ..., drop = FALSE) { | ||
| if (length(list(...)) != 1) stop("Only 1 subscript allowed") | ||
| if (is.character(..1) || is.factor(..1)) { | ||
| i <- match(..1, x$id) | ||
| } else { | ||
| i <- (1:length(x$id))[..1] | ||
| } | ||
|
|
||
| if (any(is.na(i))) paste("Subject", ..1[which(is.na(i))][1], "not found") | ||
|
|
||
| z <- list( | ||
| id = x$id[i], findex = match(x$findex[i], i, nomatch = 0), | ||
| mindex = match(x$mindex[i], i, nomatch = 0), | ||
| sex = x$sex[i] | ||
| ) | ||
| if (!is.null(x$affected)) { | ||
| if (is.matrix(x$affected)) { | ||
| z$affected <- x$affected[i, , drop = FALSE] | ||
| } else { | ||
| z$affected <- x$affected[i] | ||
| } | ||
| } | ||
| if (!is.null(x$famid)) z$famid <- x$famid[i] | ||
|
|
||
|
|
||
| if (!is.null(x$relation)) { | ||
| indx1 <- match(x$relation$indx1, i, nomatch = 0) | ||
| indx2 <- match(x$relation$indx2, i, nomatch = 0) | ||
| keep <- (indx1 > 0 & indx2 > 0) # keep only if both id's are kept | ||
| if (any(keep)) { | ||
| z$relation <- x$relation[keep, , drop = FALSE] | ||
| z$relation$indx1 <- indx1[keep] | ||
| z$relation$indx2 <- indx2[keep] | ||
| } | ||
| } | ||
|
|
||
| if (!is.null(x$hints)) { | ||
| temp <- list(order = x$hints$order[i]) | ||
| if (!is.null(x$hints$spouse)) { | ||
| indx1 <- match(x$hints$spouse[, 1], i, nomatch = 0) | ||
| indx2 <- match(x$hints$spouse[, 2], i, nomatch = 0) | ||
| keep <- (indx1 > 0 & indx2 > 0) # keep only if both id's are kept | ||
| if (any(keep)) { | ||
| temp$spouse <- cbind( | ||
| indx1[keep], indx2[keep], | ||
| x$hints$spouse[keep, 3] | ||
| ) | ||
| } | ||
| } | ||
| z$hints <- temp | ||
| } | ||
|
|
||
| if (any(z$findex == 0 & z$mindex > 0) | any(z$findex > 0 & z$mindex == 0)) { | ||
| stop("A subpedigree cannot choose only one parent of a subject") | ||
| } | ||
| class(z) <- "pedigree" | ||
| z | ||
| } | ||
| #' @keywords internal | ||
| #' @noRd | ||
| pedigree.coerce_relation_code <- function(code) { | ||
| levels <- c("MZ twin", "DZ twin", "UZ twin", "spouse") | ||
|
|
||
| if (is.factor(code)) { | ||
| code <- as.character(code) | ||
| } | ||
|
|
||
| if (is.numeric(code)) { | ||
| if (any(code < 1L | code > 4L | is.na(code))) { | ||
| stop("Invalid relationship code") | ||
| } | ||
| factor(code, levels = 1:4, labels = levels) | ||
| } else { | ||
| idx <- match(code, levels) | ||
| if (any(is.na(idx))) { | ||
| stop("Invalid relationship code") | ||
| } | ||
| factor(idx, levels = 1:4, labels = levels) | ||
| } | ||
| } | ||
| #' @rdname pedigree | ||
| #' @method print pedigree | ||
| print.pedigree <- function(x, ...) { | ||
| cat("Pedigree object with", length(x$id), "subjects") | ||
| if (!is.null(x$famid)) { | ||
| cat(", family id=", x$famid[1], "\n") | ||
| } else { | ||
| cat("\n") | ||
| } | ||
| cat("Bit size=", kinship2_bitSize(x)$bitSize, "\n") | ||
| } | ||
|
|
||
| #' @rdname pedigree | ||
| #' @method print pedigreeList | ||
| print.pedigreeList <- function(x, ...) { | ||
| cat( | ||
| "Pedigree list with", length(x$id), "total subjects in", | ||
| length(unique(x$famid)), "families\n" | ||
| ) | ||
| } | ||
|
|
||
|
|
||
| pedigree.makemissingid <- function(id) { | ||
| ## Doc: Errors2 | ||
|
|
||
| if (is.numeric(id)) { | ||
| 0 | ||
| } else { | ||
| "" | ||
| } | ||
| } |
There was a problem hiding this comment.
A new file "R/kinship2_pedigree.X" has been added that appears to contain an alternative implementation of pedigree functions. The .X extension is unusual for R source files and may cause issues with R package building, as R typically expects .R files. If this is meant to be temporary or experimental code, consider using a different approach like a feature branch or moving it outside the R/ directory. If it's meant to replace the existing kinship2_pedigree.R file, it should be properly renamed with a .R extension.
| # debug = TRUE, | ||
| config = list( | ||
| code_male = 0, | ||
| code_female=1, |
There was a problem hiding this comment.
Missing space after the equals sign. Should be code_female = 1 for consistent formatting with the rest of the codebase.
| ) | ||
| # if all numeric strings, convert to numeric | ||
| if (length(unique(suppressWarnings(as.numeric(sex)))) > 1 && | ||
| all(is.numeric(unique(suppressWarnings(as.numeric(sex)))), na.rm = TRUE)) { |
There was a problem hiding this comment.
The condition on line 354-355 has a logical issue. It checks if length(unique(suppressWarnings(as.numeric(sex)))) > 1 AND all(is.numeric(...)). However, as.numeric() always returns a numeric vector, so is.numeric() will always be TRUE. The intended check is likely whether the conversion from character to numeric produces valid numbers (i.e., not all NAs). Consider using !all(is.na(suppressWarnings(as.numeric(sex)))) instead of all(is.numeric(...), na.rm = TRUE) to check if the conversion was successful.
| all(is.numeric(unique(suppressWarnings(as.numeric(sex)))), na.rm = TRUE)) { | |
| !all(is.na(suppressWarnings(as.numeric(sex))))) { |
| sex <- as.numeric(sex) | ||
| } else { | ||
| print(unique(sex)) | ||
| # stop("Invalid values for 'sex'") |
There was a problem hiding this comment.
There's a commented-out debugging print statement that should be removed for production code. If debugging is needed, consider using the config$debug flag that's already established in the codebase.
| # stop("Invalid values for 'sex'") |
| .adjustSpacing <- function(ds, config) { | ||
| # set shift y to have min at zero | ||
| min_y <- min(ds$y_pos, na.rm = TRUE) | ||
| if (min_y > 0) { |
There was a problem hiding this comment.
The condition on line 100 checks if min_y > 0, but the goal stated in the comment is to "have min at zero". This logic seems inverted - if the minimum is already at or below zero, no adjustment would be needed. If the minimum is above zero, shifting down makes sense. However, if the minimum is negative, it should also be shifted up to zero. Consider changing the condition to if (min_y != 0) to handle both cases correctly.
| if (min_y > 0) { | |
| if (min_y != 0) { |
| expect_silent(buildPlotConfig(default_config, config, pedigree_size = 1)) | ||
| expect_warning(buildPlotConfig(default_config, config)) |
There was a problem hiding this comment.
Line 93 adds a test that expects a warning when pedigree_size is not provided to buildPlotConfig. However, this test appears to expect a warning for all cases where pedigree_size is missing, but according to the implementation in R/defaultPlotConfig.R (lines 768-774), the warning should only occur when point_scale_by_pedigree is TRUE and pedigree_size is NULL. Since the test uses the default config which has point_scale_by_pedigree = TRUE by default, this test should pass, but the test name and structure don't make this dependency clear. Consider adding a comment explaining why this warning is expected.
| test_that("buildPlotConfig returns a list", { | ||
| default_config <- getDefaultPlotConfig(function_name = "ggrelatednessmatrix") | ||
| custom <- list(point_size = 2) | ||
|
|
||
| result <- buildPlotConfig(default_config, custom) | ||
| pedigree_size <- 3 | ||
| result <- buildPlotConfig(default_config, custom, pedigree_size = pedigree_size) | ||
| expect_equal(result$point_size, 2 / sqrt(pedigree_size), tolerance = 1e-8) |
There was a problem hiding this comment.
The test on line 132 expects that when point_scale_by_pedigree is TRUE (the default), the point size will be scaled by dividing by sqrt(pedigree_size). However, this test is for "ggrelatednessmatrix" function which may have different defaults. The behavior might be inconsistent across different function types. Consider verifying that this scaling behavior is intended for all functions or adding assertions about the point_scale_by_pedigree setting.
| # debug = TRUE, | ||
| config = list( | ||
| code_male = 0, | ||
| code_female=1, |
There was a problem hiding this comment.
Missing space after the equals sign. Should be code_female = 1 for consistent formatting with the rest of the codebase.
|
@smasongarrison I've opened a new pull request, #101, to work on those changes. Once the pull request is ready, I'll request review from you. |
Co-authored-by: smasongarrison <6001608+smasongarrison@users.noreply.github.com>
Fix spelling errors in pedigree construction error messages
This pull request significantly improves how the package handles sex coding and ID types in pedigree data, especially for non-standard or inconsistent inputs. It adds more informative error messages, expands and refactors tests for edge cases, introduces new configuration options, and improves robustness in data handling and plotting. Below are the most important changes grouped by theme:
Improved Error Handling and Messaging
code_malefor non-standard codes (e.g., "M"/"F", "1"/"2") [1] [2].pedigree.sexrepairto clarify when all sex values are unknown or when a large proportion are unknown, suggesting configuration fixes.Expanded and Refactored Testing
test-ggpedigree_datainputs.R) that systematically tests combinations of sex codes, ID types (numeric/character), and missing parent encodings, checking for correct error/warning handling and round-trip data invariance.test-joss_review.R, consolidating input validation coverage into the new, more thorough test suite.code_male,code_female, andcode_nasettings [1] [2] [3] [4].Configuration and Documentation Enhancements
code_unknownparameter togetDefaultPlotConfigand its documentation, allowing users to specify codes for unknown sex values [1] [2] [3] [4] [5].Robustness in Data Handling
transformPedto ensure consistent ID types formomIDanddadID, preventing class mismatches that could cause errors [1] [2]..adjustSpacingto ensurey_posandy_famstart at zero for more consistent plotting.ggPedigree.coreto ensure the minimum y-value is respected.Miscellaneous
These changes collectively make the package more user-friendly, robust, and reliable when handling diverse and potentially inconsistent pedigree data inputs.