Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

agregating data in temporal-covarate-settings produces NA timeId in covaraitesContinuous #114

Open
javier-gracia-tabuenca-tuni opened this issue Jan 7, 2021 · 5 comments

Comments

@javier-gracia-tabuenca-tuni
tcovset <- createTemporalCovariateSettings(
   useVisitCconceptCount = TRUE, 
   temporalStartDays = c(1, 365), 
   temporalEndDays = c(365, 365*2),
)

tempCov <- getDbCovariateData(
   ...
   ...
   covariateSettings = tcovset, 
   aggregated = TRUE
)

tempCov$covariateContinuous %>% view()

Column timeId is all NA

I can't find why this happens.
I came to find this error when using CohortDiagnostics and I trying to get the temporal characterization of visit counts.

@anthonysena
Copy link
Collaborator

Hi @javier-gracia-tabuenca-tuni - can you link me to where in CohortDiagnostics where you found this code? Upon first glance, the parameter useVisitCconceptCount appears to be malformed (but might just be a typo here).

@javier-gracia-tabuenca-tuni
Copy link
Author

Sorry, this code is not in CohortDiagnostics.

I'm trying to run a custom cohort diagnosis which shows in Temporal Characterisation the count of visits in each time period.

Here is some code to reproduce the error:


title: "R Notebook"
output: html_notebook

library(tidyverse)
Sys.setenv(JAVA_HOME='C:\\Program Files\\AdoptOpenJDK\\jdk-8.0.232.09-hotspot\\')
library(SqlRender)
library(Eunomia)
library(CohortDiagnostics)

working_folder <- getwd()
connectionDetails <- getEunomiaConnectionDetails()
connection <- connect(connectionDetails)

cdmDatabaseSchema <- "main"
oracleTempSchema <- NULL
cohortDatabaseSchema <- "main"
cohortTable <- "tmp_cohort_table"
cohortSetReference <- tibble(
  atlasId = 1776012  , 
  atlasName = "Asthma",
  cohortId = 1776012  ,
  name = "Asthma"
)
createCohortTable(connectionDetails = connectionDetails,
                  cohortDatabaseSchema = cohortDatabaseSchema,
                  cohortTable = cohortTable)
baseUrl <- "http://api.ohdsi.org:80/WebAPI"
inclusionStatisticsFolder <- file.path(working_folder, "incStats")

instantiateCohortSet(connectionDetails = connectionDetails,
                     cdmDatabaseSchema = cdmDatabaseSchema,
                     oracleTempSchema = oracleTempSchema,
                     cohortDatabaseSchema = cohortDatabaseSchema,
                     cohortTable = cohortTable,
                     baseUrl = baseUrl,
                     cohortSetReference = cohortSetReference,
                     generateInclusionStats = TRUE,
                     inclusionStatisticsFolder = inclusionStatisticsFolder)

my_tempCovaraitesSettings <- createTemporalCovariateSettings(
   useVisitConceptCount = TRUE, 
   temporalStartDays = c(-365, 0   , 30*2+1, 365*1+1), 
   temporalEndDays =   c(  -1, 30*2,  365*1, 365*2+1)
)


databaseId <- "testDB"
exportFolder <- file.path(working_folder, "export")
runCohortDiagnostics(baseUrl = baseUrl,
                     cohortSetReference = cohortSetReference,
                     #
                     connectionDetails = connectionDetails,
                     cdmDatabaseSchema = cdmDatabaseSchema,
                     oracleTempSchema = oracleTempSchema,
                     cohortDatabaseSchema = cohortDatabaseSchema, 
                     cohortTable = cohortTable,
                     #
                     databaseId = databaseId,
                     databaseName = databaseId,
                     databaseDescription = databaseId, 
                     #
                     inclusionStatisticsFolder = inclusionStatisticsFolder,
                     exportFolder = exportFolder,
                     #
                     runInclusionStatistics = TRUE,
                     runIncludedSourceConcepts = TRUE,
                     runOrphanConcepts = TRUE,
                     runTimeDistributions = TRUE,
                     runBreakdownIndexEvents = TRUE,
                     runIncidenceRate = TRUE,
                     runCohortOverlap = TRUE,
                     runCohortCharacterization = TRUE,
                     minCellCount = 5, 
                     #
                     runTemporalCohortCharacterization = TRUE,
                     temporalCovariateSettings = my_tempCovaraitesSettings
                     )
read_csv("./export/temporal_time_ref.csv")
read_csv("./export/temporal_covariate_value.csv")
preMergeDiagnosticsFiles(exportFolder)
launchDiagnosticsExplorer(exportFolder)

image

@javier-gracia-tabuenca-tuni
Copy link
Author

However, I traced the error and it seems to originate in FeatureExtraction. For this reason I posted it here.

Here is a simpler code to reproduce the error in FeatureExtraction:


title: "R Notebook"
output: html_notebook

library(tidyverse)
Sys.setenv(JAVA_HOME='C:\\Program Files\\AdoptOpenJDK\\jdk-8.0.232.09-hotspot\\')
library(SqlRender)
library(Eunomia)
library(FeatureExtraction)

working_folder <- getwd()
connectionDetails <- getEunomiaConnectionDetails()
connection <- connect(connectionDetails)

cdmDatabaseSchema <- "main"
oracleTempSchema <- NULL
cohortDatabaseSchema <- "main"
cohortTable <- "tmp_cohort_table"
cohortSetReference <- tibble(
  atlasId = 1776012  , 
  atlasName = "Asthma",
  cohortId = 1776012  ,
  name = "Asthma"
)
createCohortTable(connectionDetails = connectionDetails,
                  cohortDatabaseSchema = cohortDatabaseSchema,
                  cohortTable = cohortTable)
baseUrl <- "http://api.ohdsi.org:80/WebAPI"
inclusionStatisticsFolder <- "~/incStats"

instantiateCohortSet(connectionDetails = connectionDetails,
                     cdmDatabaseSchema = cdmDatabaseSchema,
                     oracleTempSchema = oracleTempSchema,
                     cohortDatabaseSchema = cohortDatabaseSchema,
                     cohortTable = cohortTable,
                     baseUrl = baseUrl,
                     cohortSetReference = cohortSetReference,
                     generateInclusionStats = TRUE,
                     inclusionStatisticsFolder = inclusionStatisticsFolder)
my_tempCovaraitesSettings <- createTemporalCovariateSettings(
   useVisitConceptCount = TRUE, 
   temporalStartDays = c(-365, 0   , 30*2+1, 365*1+1), 
   temporalEndDays =   c(  -1, 30*2,  365*1, 365*2+1)
)

tempCov <- getDbCovariateData(
  connectionDetails = connectionDetails,
  oracleTempSchema = oracleTempSchema,
  cdmDatabaseSchema = cdmDatabaseSchema,
  cohortTable = cohortTable,
  cohortDatabaseSchema = cdmDatabaseSchema,
  cohortTableIsTemp = FALSE,
  cohortId = 1776012,
  rowIdField = "subject_id",
  covariateSettings = my_tempCovaraitesSettings,
  aggregated = TRUE
)
tempCov$covariatesContinuous

image

Should be one value for each time_id not only one NA

@anthonysena
Copy link
Collaborator

Hi @javier-gracia-tabuenca-tuni - thanks for the code to help reproduce this problem! I've taken your code and made some changes changed it since there were some functions you referenced that are missing and I don't have the Asthma cohort referenced. So, I constructed a very basic cohort to use anyone in the Eunomia data set that has an observation_period entry to keep things simple. Below is the code:

library(SqlRender)
library(Eunomia)
#> Loading required package: DatabaseConnector
library(FeatureExtraction)
#> Loading required package: Andromeda
#> Loading required package: dplyr
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

connectionDetails <- getEunomiaConnectionDetails()
connection <- connect(connectionDetails)
#> Connecting using SQLite driver

cdmDatabaseSchema <- "main"
oracleTempSchema <- NULL
cohortDatabaseSchema <- "main"
cohortTable <- "cohort"

# Construct a cohort of people
cohortMembers <- querySql(connection, "SELECT 1 cohort_definition_id, 
                                              person_id subject_id, 
                                              observation_period_start_date cohort_start_date, 
                                              observation_period_end_date cohort_end_date 
                                       FROM observation_period")

# Insert into the cohort table
DatabaseConnector::insertTable(connection = connection,
                               tableName = cohortTable,
                               data = cohortMembers,
                               dropTableIfExists = FALSE,
                               createTable = FALSE)

# Characterize visit counts over the following time windows
tempCovaraitesSettings <- createTemporalCovariateSettings(
  useVisitConceptCount = TRUE, 
  temporalStartDays = c(0,31,61,91), 
  temporalEndDays =   c(30,60,90,365*2+1)
)

tempCov <- getDbCovariateData(
  connectionDetails = connectionDetails,
  oracleTempSchema = oracleTempSchema,
  cdmDatabaseSchema = cdmDatabaseSchema,
  cohortTable = cohortTable,
  cohortDatabaseSchema = cdmDatabaseSchema,
  cohortTableIsTemp = FALSE,
  cohortId = 1,
  rowIdField = "subject_id",
  covariateSettings = tempCovaraitesSettings,
  aggregated = TRUE
)
#> Connecting using SQLite driver
#> Sending temp tables to server
#> Constructing features on server
#>   |                                                                              |                                                                      |   0%  |                                                                              |==                                                                    |   4%  |                                                                              |=====                                                                 |   7%  |                                                                              |========                                                              |  11%  |                                                                              |==========                                                            |  14%  |                                                                              |============                                                          |  18%  |                                                                              |===============                                                       |  21%  |                                                                              |==================                                                    |  25%  |                                                                              |====================                                                  |  29%  |                                                                              |======================                                                |  32%  |                                                                              |=========================                                             |  36%  |                                                                              |============================                                          |  39%  |                                                                              |==============================                                        |  43%  |                                                                              |================================                                      |  46%  |                                                                              |===================================                                   |  50%  |                                                                              |======================================                                |  54%  |                                                                              |========================================                              |  57%  |                                                                              |==========================================                            |  61%  |                                                                              |=============================================                         |  64%  |                                                                              |================================================                      |  68%  |                                                                              |==================================================                    |  71%  |                                                                              |====================================================                  |  75%  |                                                                              |=======================================================               |  79%  |                                                                              |==========================================================            |  82%  |                                                                              |============================================================          |  86%  |                                                                              |==============================================================        |  89%  |                                                                              |=================================================================     |  93%  |                                                                              |====================================================================  |  96%  |                                                                              |======================================================================| 100%
#> Executing SQL took 0.0273 secs
#> Fetching data from server
#> Fetching data took 0.125 secs

data.frame(tempCov$timeRef)
#>   timeId startDay endDay
#> 1      1        0     30
#> 2      2       31     60
#> 3      3       61     90
#> 4      4       91    731
data.frame(tempCov$covariatesContinuous)
#>   cohortDefinitionId covariateId countValue minValue maxValue averageValue
#> 1                  1     9201911         15        0        1  0.002807412
#>   standardDeviation medianValue p10Value p25Value p75Value p90Value timeId
#> 1                 0           0        0        0        0        0     NA

Created on 2021-01-12 by the reprex package (v0.3.0)

As shown, the temporal analysis attempts to use 4 different time periods but the resulting continuous covariate result does not use the timeId as you originally reported. So, I dug a bit deeper to try and find the issue and tracked it down to the ConceptCount.sql file. What I found is that some of the SQL operations in this file do not use the time_id column throughout the analysis for temporal analysis. Additionally, I found the following code in that script that seems to imply that we do not want to consider the time_id as part of the results:

{@temporal} ? {
CAST(NULL AS INT) AS time_id,
}

@schuemie - tagging you here to get your input on this particular code to ask: is this a bug or a decision to not include the temporal time identifiers as part of the results when constructing these types of covariates? Or does this perspective change at all when doing this at a patient level vs aggregating the results?

@javier-gracia-tabuenca-tuni
Copy link
Author

Thanks Anthon. The asthma cohort was in atlas-demo.ohdsi.org, but yeap, using query to build cohort much better for reproducing. I'll do it like that next time.

I see you dug deeper than me. Let's wait what @schuemie has to say.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants