Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Validate current clickhouse query #11

Open
alisman opened this issue May 14, 2024 · 1 comment
Open

Validate current clickhouse query #11

alisman opened this issue May 14, 2024 · 1 comment
Assignees

Comments

@alisman
Copy link
Collaborator

alisman commented May 14, 2024

No description provided.

@alisman alisman self-assigned this May 14, 2024
@inodb inodb assigned uklineale and unassigned alisman May 16, 2024
@inodb inodb changed the title Validate current CH query Validate current clickhouse query May 16, 2024
@alisman
Copy link
Collaborator Author

alisman commented May 20, 2024

@uklineale @haynescd

Here's are a few examples from the public portal of clinical data which is declared type=numeric but has non-numerical values

image.png

You can obtain this list with a query like this. There is also clinical_sample table to look through. We need to figure out:

  • What are the documented rules around these kind of values
  • Does legacy code actually handle these properly?
  • Is there any way to handle these performantly in Clickhouse since we're going to have to handle at least some of these cases
SELECT DISTINCT ATTR_VALUE FROM clinical_patient targ
                  JOIN clinical_attribute_meta cam on targ.ATTR_ID = cam.ATTR_ID
WHERE DATATYPE = 'number'
  AND NOT REGEXP_LIKE(ATTR_VALUE, '^-?[0-9.]+$')
UNION
SELECT DISTINCT ATTR_VALUE FROM clinical_sample targ
                                    JOIN clinical_attribute_meta cam on targ.ATTR_ID = cam.ATTR_ID
WHERE DATATYPE = 'number'
  AND NOT REGEXP_LIKE(ATTR_VALUE, '^-?[0-9.]+$')

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants