Expected behavior
Three Atlas functions that work on SQL Server should also work on Spark, but do not: Characterizations, Cohort Pathways, and Cohort Reports (Heracles reports). Characterizations worked in Spark/Databricks in the 2.11 release of Atlas and WebAPI, but does not work in 2.12.1.
Actual behavior
Each of these analyses fails, usually after a long runtime. In each case the root cause appears to be a SqlRender error in substituting <START_WITH> and <END_WITH>. See issue #330 in SqlRender for more detail. There is also discussion on this forum post.
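When a job fails this way, it can help to scan the rendered SQL for placeholders that were never substituted before sending it to Spark. The sketch below is a hypothetical diagnostic, not part of Atlas, WebAPI, or SqlRender. It assumes that leftover placeholders look like an upper-case name in angle brackets (as <START_WITH> and <END_WITH> do), and the function name find_unrendered_placeholders is invented for illustration.

```python
import re

# Assumed placeholder shape: an upper-case identifier wrapped in angle
# brackets, e.g. <START_WITH>. This pattern is a heuristic, not taken
# from SqlRender itself.
PLACEHOLDER = re.compile(r"<([A-Z][A-Z0-9_]*)>")


def find_unrendered_placeholders(sql: str) -> list[str]:
    """Return placeholder names left in sql, in order of first appearance."""
    seen: list[str] = []
    for match in PLACEHOLDER.finditer(sql):
        name = match.group(1)
        if name not in seen:
            seen.append(name)
    return seen


if __name__ == "__main__":
    # Hypothetical fragment of a badly rendered statement.
    sample = "SELECT <START_WITH> col FROM t <END_WITH>;"
    print(find_unrendered_placeholders(sample))  # ['START_WITH', 'END_WITH']
```

Ordinary comparisons such as "x < 3" are not matched, because the pattern requires an upper-case letter immediately after the opening bracket.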
Steps to reproduce behavior
Characterizations
Load the attached json into Characterizations. Select Executions and click Generate
Lipitor Users vs Zocor Users.json.txt
There are multiple queries such as "show columns in #analysis_ref" in which the temporary table name is not translated to something like "show columns in tmp.analysis_ref". However, those queries do not cause the job to fail.
Instead, the job fails at the first unprocessed <START_WITH> substitution. The error file is below.
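To separate the two symptoms in a rendered script (untranslated "#" temp table names, which do not stop the job, versus unsubstituted placeholders, which do), a rough triage can be sketched as follows. This is a hypothetical helper, not an Atlas or SqlRender feature. It assumes statements are separated by ";" (a naive split that ignores semicolons inside string literals) and uses heuristic patterns for both symptoms; the function name triage and its result keys are invented for illustration.

```python
import re

# Heuristic patterns (assumptions, not SqlRender's own rules):
# "#name" is a SQL Server-style temp table name left untranslated,
# "<NAME>" is a template placeholder left unsubstituted.
TEMP_TABLE = re.compile(r"#([A-Za-z_][A-Za-z0-9_]*)")
PLACEHOLDER = re.compile(r"<[A-Z][A-Z0-9_]*>")


def triage(script: str) -> dict:
    """Split a rendered script on ';' and classify its statements.

    Returns the indexes of statements that still reference '#' temp
    tables, and the index of the first statement that still contains a
    placeholder (None if there is none).
    """
    statements = [s.strip() for s in script.split(";") if s.strip()]
    temp_refs = [i for i, s in enumerate(statements) if TEMP_TABLE.search(s)]
    first_placeholder = next(
        (i for i, s in enumerate(statements) if PLACEHOLDER.search(s)), None
    )
    return {
        "temp_table_statements": temp_refs,
        "first_placeholder_statement": first_placeholder,
    }


if __name__ == "__main__":
    # Hypothetical script mixing both symptoms.
    script = "show columns in #analysis_ref; select 1; select <START_WITH> x <END_WITH>;"
    print(triage(script))
```

For the sample script, statement 0 is flagged for its temp table name and statement 2 is reported as the first one with a placeholder, matching the pattern described above where the job only fails at the placeholder.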
Lipitor Users vs Zocor Users.error.txt
Cohort Pathways
Load the attached json into Cohort Pathways. Select Executions, and click Generate
Diabetic Treatment Pathway.json.txt
The error message generated is attached.
Diabetic Treatment Pathway.error.txt
Cohort / Heracles Reports
Load the attached json cohort. Generate the cohort. Then, under Cohort Reporting, select that source and choose Quick Analysis
[COVID HCQ ID62 v1] New users of sulfasazine with prior rheumatoid arthritis.json.txt
The Heracles reports fail after a few seconds. The SQL query that caused the error is attached.
Cohort-Heracles.error.txt