Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Missing entrires in results of query1-0.json #24

Closed
liuchangshiye opened this issue Apr 5, 2022 · 4 comments · Fixed by #25
Closed

Missing entrires in results of query1-0.json #24

liuchangshiye opened this issue Apr 5, 2022 · 4 comments · Fixed by #25
Assignees

Comments

@liuchangshiye
Copy link
Collaborator

After executing the command with query1-0.json to obtain the specific users from the health dataset, COOL shows 8513 users. But the correct number of users should be 8592. The user ids that start with "P-19709" are all missed. The missed user ids are as follows.

 P-19709, P-19712, P-19713, P-19718, P-19720, P-19721, P-19732, P-19749, P-19762, P-19767,
 P-19772, P-19773, P-19776, P-19777, P-19778, P-19780, P-19782, P-19789, P-19790, P-19791,
 P-19797, P-19798, P-19799, P-19801, P-19803, P-19805, P-19806, P-19809, P-19812, P-19813,
 P-19820, P-19830, P-19834, P-19841, P-19842, P-19843, P-19844, P-19846, P-19860, P-19861,
 P-19864, P-19872, P-19874, P-19879, P-19882, P-19887, P-19890, P-19893, P-19901, P-19902,
 P-19907, P-19909, P-19910, P-19911, P-19915, P-19919, P-19923, P-19929, P-19931, P-19941,
 P-19944, P-19948, P-19950, P-19952, P-19954, P-19959, P-19962, P-19964, P-19972, P-19973,
 P-19975, P-19978, P-19983, P-19985, P-19989, P-19991, P-19992, P-19994, P-19998
@hugy718
Copy link
Collaborator

hugy718 commented Apr 5, 2022

Found the bug. It is due to incorrect dealing with time window, when birth time is determined in getUserBirthTime in the ExtendedCohortSelection class. maxDate of this class is the max date in dataset. The date range for locating birth event in the original code needs to fall in the date of the user's first record and the max date - window size. This creates a problem for tail users that have their records very close to the max date. So their ranges to find birth event are empty ranges. Now I set the range to look for birth event for one user to be from the date of its first record to the date of its last record. It can output all the users.

@KimballCai
Copy link
Contributor

I agree with the reason that causes this issue, but I do not agree to change the code because it is not the problem of the core logic. Instead, we need to add introductions about how to utilize the time window attribute.

@KimballCai KimballCai self-assigned this Apr 5, 2022
@KimballCai
Copy link
Contributor

If we want to use the time window, it means that we only want to select users who have records for at least x days where x is defined in the time window.

@KimballCai
Copy link
Contributor

But we also need to consider the users whose last record date is earlier than the max data in the data but do not have records for continuous x days.

KimballCai added a commit that referenced this issue Apr 17, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants