Missing entries in results of query1-0.json #24
Comments
Found the bug. It is caused by incorrect handling of the time window when the birth time is determined in getUserBirthTime in the ExtendedCohortSelection class. maxDate in this class is the max date in the dataset. In the original code, the date range for locating the birth event has to fall between the date of the user's first record and the max date minus the window size. This creates a problem for tail users whose records are very close to the max date: their ranges for finding the birth event are empty. I have now set the range for one user to run from the date of its first record to the date of its last record. With this change, all the users are output.
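To make the empty-range case concrete, here is a minimal sketch in Python. It is not the COOL code: the function name `birth_search_range`, the 7-day window, and all dates are hypothetical, chosen only to illustrate the behaviour described above.

```python
from datetime import date, timedelta


def birth_search_range(first_record, last_record, max_date, window_days, use_fix=False):
    """Return the (start, end) range searched for the birth event, or None if it is empty.

    Hypothetical helper for illustration; it only mirrors the description in this thread.
    """
    start = first_record
    if use_fix:
        # Proposed change: search from the user's first record to its last record.
        end = last_record
    else:
        # Original behaviour as described: upper bound is max date minus window size.
        end = max_date - timedelta(days=window_days)
    return (start, end) if start <= end else None


# Assumed example values, not taken from the health dataset.
max_date = date(2020, 12, 31)
window_days = 7

# A "tail" user whose records sit close to the max date.
tail_first = date(2020, 12, 28)
tail_last = date(2020, 12, 30)

original = birth_search_range(tail_first, tail_last, max_date, window_days)
patched = birth_search_range(tail_first, tail_last, max_date, window_days, use_fix=True)

print(original)  # None: the upper bound (Dec 24) is before the first record (Dec 28)
print(patched)   # range from the first record to the last record
```

In this sketch the tail user gets no search range under the original bound, which would explain why such users drop out of the result.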
I agree with the cause of this issue, but I do not agree with changing the code, because this is not a problem in the core logic. Instead, we need to add documentation on how to use the time window attribute.
Using the time window means that we only want to select users who have records for at least x days, where x is defined by the time window.
But we also need to consider the users whose last record date is earlier than the max date in the data but who do not have records for x continuous days.
After executing the command with query1-0.json to obtain the specific users from the health dataset, COOL shows 8513 users, but the correct number of users is 8592. All the user ids that start with "P-19709" are missing. The missing user ids are as follows.