Skip to content

Conversation

@cristinamullin
Copy link
Collaborator

@cristinamullin cristinamullin commented Jul 29, 2025

  • Check vignettes to make sure recent updates have been incorporated
  • Consider adding tests to avoid issues in the future

add unit to grouping cols for TADA_AggregateMeasurements, fix TN and TP bug, add combos to harmonization table, update vignette
remove intermediate variables and update scatterplot examples
Combine TADA_AutoFilter with TADA_RetainRequired and delete TADA_AutoFilter
incorporate changes made to TADA_FindQCActivities, TADA_RetainRequired, and TADA_AutoFilter
@cristinamullin
Copy link
Collaborator Author

Need to look into these tomorrow:

── Failed tests ─────────────────────────────────────────────────────────────────────────────
Failure (test-TADARefTables.R:28:3): No combos were missed in NP key from harmonization table
dim(orphs)[1] == 0 is not TRUE

`actual`:   FALSE
`expected`: TRUE 

Failure (test-Transformations.R:21:3): np summation key matches nutrient harmonization ref
any(is.na(check$np)) is not FALSE

`actual`:   TRUE 
`expected`: FALSE

@hillarymarler
Copy link
Collaborator

hillarymarler commented Jul 30, 2025

Need to look into these tomorrow:

── Failed tests ─────────────────────────────────────────────────────────────────────────────
Failure (test-TADARefTables.R:28:3): No combos were missed in NP key from harmonization table
dim(orphs)[1] == 0 is not TRUE

`actual`:   FALSE
`expected`: TRUE 

Failure (test-Transformations.R:21:3): np summation key matches nutrient harmonization ref
any(is.na(check$np)) is not FALSE

`actual`:   TRUE 
`expected`: FALSE

I took a look at this and it looks like the following may be missing from TADA_GetNutrientSummationRef

image

I haven't worked on the NutrientSummation ref before, so if you'd like me to work on adding those additional rows, we might need to have a quick chat about everything I should consider when filling this ref table out, @cristinamullin

@cristinamullin
Copy link
Collaborator Author

#588 I will work to incorporate this feedback into this PR. Also, thanks @hillarymarler for looking into the missing combinations. They will need to get prioritized for inclusion in the TN/TP calculations in that ref table. I have not edited this since Elise and I worked on it when she was an ORISE with our branch. I'll take a look & may loop the TADA team in to review the logic.

@cristinamullin
Copy link
Collaborator Author

cristinamullin commented Jul 31, 2025

I found a bug in TADA_ConvertSpecialChars where we are getting new columns with TADA.TADA. if the function input column already contains the TADA prefix. Writing a test to double check for this issue.

image
test_that("Column names do not contain the pattern 'TADA.TADA.'", {
  test_TADA.TADA. <-
    TADA_ConvertSpecialChars(
      Data_Nutrients_UT,
      "TADA.DetectionQuantitationLimitMeasure.MeasureValue"
    )
  # Create a logical vector indicating which columns contain the pattern
  pattern_found <- grepl("TADA.TADA.", colnames(test_TADA.TADA.))
  
  # Test should pass if none of the columns contain the pattern
  expect_false(any(pattern_found), info = "Some column names contain the pattern 'TADA.TADA.'")
})

test_that("Column names do not contain the pattern 'TADA.TADA.'", {
  test_TADA.TADA. <-
    TADA_ConvertSpecialChars(
      Data_Nutrients_UT,
      "TADA.ResultMeasureValue"
    )
  # Create a logical vector indicating which columns contain the pattern
  pattern_found <- grepl("TADA.TADA.", colnames(test_TADA.TADA.))
  
  # Test should pass if none of the columns contain the pattern
  expect_false(any(pattern_found), info = "Some column names contain the pattern 'TADA.TADA.'")
})

test_that("Column names do not contain the pattern 'TADA.TADA.'", {
  test_TADA.TADA. <- TADA_AutoClean(Data_R5_TADAPackageDemo)
  # Create a logical vector indicating which columns contain the pattern
  pattern_found <- grepl("TADA.TADA.", colnames(test_TADA.TADA.))
  
  # Test should pass if none of the columns contain the pattern
  expect_false(any(pattern_found), info = "Some column names contain the pattern 'TADA.TADA.'")
})

I added logic to only run TADA_ConvertSpecialChars on a col if the associated flag col for that col does not already exist.

add tests and examples for TADA_ConvertSpecialChars and TADA_CalculateTotalNP
@wokenny13
Copy link
Collaborator

I am looking at the example in the Module 1 vignette and just for phosphorus ( TADAProfileClean5)

It seems like there are only 3 other phosphorus TADA.ComparableDataIdentifier in our data frame. Since they are already as P would it then be converted as is to Total Phosphorus? Should we expect the Total Phosphorus, Mixed Forms to only increase by a maximum of (the sum of 62 + 8 + 125) 195 rows once you run TADA_CalculateTotalNP?

It seems like it goes from 581 rows to 1180? Are duplicate rows being added on to the Total Phosphorus, Mixed Forms summation?

image image

@cristinamullin
Copy link
Collaborator Author

The TN/TP function is not working correctly...

today <- Sys.Date()
oneyearago <- as.character(today - 1 * 365)
testdat <- TADA_DataRetrieval(statecode = "UT", 
                              startDate = oneyearago, 
                              characteristicType = "Nutrient",
                              ask = FALSE)

testdat <- TADA_SimpleCensoredMethods(testdat,
                                      nd_method = "multiplier",
                                      nd_multiplier = 0.5,
                                      od_method = "as-is",
                                      od_multiplier = "null")

testdat2 <- TADA_CalculateTotalNP(testdat, daily_agg = "max")

unique(testdat2$TADA.ResultMeasureValueDataTypes.Flag)

# Filter data to show only rows where TADA.ResultMeasureValue is NA
filtered_data <- testdat2[is.na(testdat$TADA.ResultMeasureValue), ]

unique(filtered_data$TADA.NutrientSummation.Flag)
unique(filtered_data$TADA.ResultMeasureValueDataTypes.Flag)

# isolate rows that are NA related to TP and TN function
filtered_data_2 <- filtered_data[!(
  filtered_data$TADA.ResultMeasureValueDataTypes.Flag %in% 
    c("TP estimated from one or more subspecies.", 
      "TN estimated from one or more subspecies.")
)]

# isolate rows that are NA related to TP and TN function
# reduces number of columns... why
filtered_data_3 <- filtered_data[!(
  filtered_data$TADA.NutrientSummation.Flag %in% 
    c("Used to calculate Total Phosphorus as P.", 
      "Used to calculate Total Nitrogen as N." #,
      #"Not used to calculate Total N or P."
      )
)]

# these should not be used to calculate TN and TP if result value is NA
# and TADA.ResultMeasureValueDataTypes.Flag equals any of the following...
# [1] "NA - Not Available"                                        
# [2] "Result Value/Unit Cannot Be Estimated From Detection Limit"
# [3] "Text"

@cristinamullin
Copy link
Collaborator Author

I am looking at the example in the Module 1 vignette and just for phosphorus ( TADAProfileClean5)

It seems like there are only 3 other phosphorus TADA.ComparableDataIdentifier in our data frame. Since they are already as P would it then be converted as is to Total Phosphorus? Should we expect the Total Phosphorus, Mixed Forms to only increase by a maximum of (the sum of 62 + 8 + 125) 195 rows once you run TADA_CalculateTotalNP?

It seems like it goes from 581 rows to 1180? Are duplicate rows being added on to the Total Phosphorus, Mixed Forms summation?

image image

I noticed this too. I did add additional grouping cols on lines 342-351 in Transformations.R. I will comment these out for now. Can you test this again? Taking a break on the PR for now - please feel free to jump in if you can find/fix the issue!

# "TADA.ResultMeasure.MeasureUnitCode", # this may not need to be included since we have TADA.ComparableDataIdentifier
# consider adding columns below... 
# "OrganizationIdentifier", # should be okay to include here since this is by MonitoringLocationIdentifier which are org specific anyway
# "OrganizationFormalName",
# "CountryCode",
# "StateCode",
# "CountyCode",
# "HUCEightDigitCode",
# "MonitoringLocationTypeName",
# "MonitoringLocationDescriptionText",

@wokenny13
Copy link
Collaborator

Yes, I can take a look into this

@@ -449,10 +452,71 @@ TADA_CheckRequiredFields <- function(.data) {
#' reducedcols_Data_Nutrients_UT <- TADA_RetainRequired(Data_Nutrients_UT)
#'
TADA_RetainRequired <- function(.data) {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs review. I combined functions TADA_RetainRequired and TADA_AutoFilter here. Does that make sense or are they may be better left as separate functions? Possible new name suggestion: maybe TADA_ReduceCols

"TADA.ComparableDataIdentifier",
# "TADA.ResultMeasure.MeasureUnitCode",
# "TADA.ResultMeasure.MeasureUnitCode", # this may not need to be included since we have TADA.ComparableDataIdentifier
# consider adding columns below...
Copy link
Collaborator Author

@cristinamullin cristinamullin Aug 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wokenny13 FYI - these column additions were meant to address this issue with metadata being carried over. Maybe it can be addressed elsewhere in this function (or might not be possible). Needs more research: #588

@cristinamullin cristinamullin linked an issue Aug 1, 2025 that may be closed by this pull request
add dataRetrieval depends to EPATADA depends, improve print messages, add example to TADA_CalculateTotalNP, address QC and non-numeric results within TADA_CalculateTotalNP, add TADA_CalculateTotalNP tests
move functions to more applicable .R files
the default should be clean = FALSE for TADA_AggregateMeasurements and TADA_RunKeyFlagFunctions, add better messages for flag functions
add more TN/TP combinations, update TADA_CalculateTotalNP documentation, add tests
add additional matches, handle aggregated values, handle non-numeric results
@cristinamullin
Copy link
Collaborator Author

ideally, these could be run in any order... or we should think about running censored data handling in autoclean to ensure the proper order....

Suggest to add logic for NA TADA.ResultMeasure.MeasureUnitCode's to TADA_ConvertSpecialChars in future PR

test_that("TADA_SimpleCensoredMethods does not introduce duplicates or NAs in result or unit cols", {
  
  testdat <- TADA_RandomTestingData()
  
  testdat <- TADA_ConvertSpecialChars(testdat, 
                                      col = "TADA.ResultMeasureValue",
                                      clean = TRUE)
  
  # Test to ensure the column is entirely numeric
  expect_true(is.numeric(testdat$TADA.ResultMeasureValue))
  
  # Test to ensure value column does not contain any NA values
  expect_true(!any(is.na(testdat$TADA.ResultMeasureValue)))
  
  # TADA_ConvertSpecialChars does not handle this yet 8/11/25
  # # Test to ensure unit column does not contain any NA values
  # expect_true(!any(is.na(testdat$TADA.ResultMeasure.MeasureUnitCode)))
  
  testdat2 <- TADA_SimpleCensoredMethods(testdat,
                                         nd_method = "multiplier",
                                         nd_multiplier = 0.5,
                                         od_method = "as-is",
                                         od_multiplier = "null")
  
  testdat3 <- TADA_ConvertSpecialChars(testdat3, 
                                      col = "TADA.ResultMeasureValue",
                                      clean = TRUE)
  
  # Test to ensure the column is entirely numeric
  expect_true(is.numeric(testdat3$TADA.ResultMeasureValue))
  
  # Test to ensure value column does not contain any NA values
  expect_true(!any(is.na(testdat3$TADA.ResultMeasureValue)))
  
  # # Test to ensure unit column does not contain any NA values
  # expect_true(!any(is.na(testdat2$TADA.ResultMeasure.MeasureUnitCode)))
})

@cristinamullin
Copy link
Collaborator Author

See new combined PR: #630

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

4 participants