Fix Issue #3: Add Support for Identifying Identical Table Names Across Different Data Sets #4
base: main
},
"second_table": {
  "id_in_accepted_values": "id IN (1, 2, 3)"
// Format: "schema": { "table": { "conditionName": "conditionQuery", ... }, ... }
Expected Format is here.
"childSchema": "dataform",
"childTable": "second_table",
add childSchema
  .tags("assert-data-completeness")
  .query(ctx => `SELECT COUNT(*) AS total_rows,
  SUM(CASE WHEN ${columnName} IS NULL THEN 1 ELSE 0 END) AS null_count
- FROM ${ctx.ref(tableName)}
+ FROM ${ctx.ref(schemaName, tableName)}
reference schemaName.tableName
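To make the effect of this change concrete, here is a minimal sketch that builds the same query text outside Dataform. `stubCtx` and `buildCompletenessQuery` are hypothetical names introduced for illustration; the stub only formats a backticked `schema.table` string and does not reproduce how the real `ctx.ref` resolves dependencies.

```javascript
// Hypothetical stand-in for Dataform's query context. The real ctx.ref
// resolves declared dependencies; this stub only formats a name.
const stubCtx = {
  ref: (...parts) => "`" + parts.join(".") + "`",
};

// Builds the null-count query from the diff for a single column.
function buildCompletenessQuery(ctx, schemaName, tableName, columnName) {
  return `SELECT COUNT(*) AS total_rows,
  SUM(CASE WHEN ${columnName} IS NULL THEN 1 ELSE 0 END) AS null_count
FROM ${ctx.ref(schemaName, tableName)}`;
}

const completenessQuery = buildCompletenessQuery(
  stubCtx,
  "dataform",
  "second_table",
  "id"
);
console.log(completenessQuery);
```

With the schema passed explicitly, two tables named `second_table` in different datasets produce different references, which is the ambiguity issue #3 describes.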
for (let schemaName in dataCompletenessConditions) {
  const tableNames = dataCompletenessConditions[schemaName];
  for (let tableName in tableNames) {
    const columnConditions = tableNames[tableName];
    createDataCompletenessAssertion(globalParams, schemaName, tableName, columnConditions);
  }
Fix nest structure.
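The nested traversal can be tried in isolation. The sketch below is not the PR's code: `listAssertionTargets` is a hypothetical helper that collects one record per schema and table pair instead of creating assertions, and the config values are illustrative only. It uses `Object.entries` rather than `for...in`, which skips inherited properties.

```javascript
// Illustrative config in the "schema" -> "table" -> conditions shape.
// The numeric values are placeholders, not recommended thresholds.
const dataCompletenessConditions = {
  dataform: {
    first_table: { id: 1, updated_date: 1 },
    second_table: { id: 30 },
  },
};

// Hypothetical helper: walks schema -> table and returns one record
// per pair, where the PR would call createDataCompletenessAssertion.
function listAssertionTargets(conditions) {
  const targets = [];
  for (const [schemaName, tables] of Object.entries(conditions)) {
    for (const [tableName, columnConditions] of Object.entries(tables)) {
      targets.push({ schemaName, tableName, columnConditions });
    }
  }
  return targets;
}

const targets = listAssertionTargets(dataCompletenessConditions);
console.log(targets.map((t) => `${t.schemaName}.${t.tableName}`));
```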
  .tags("assert-data-freshness")
  .query(ctx => `
  WITH
  freshness AS (
  SELECT
  DATE_DIFF(CURRENT_DATE(), MAX(${dateColumn}), ${timeUnit}) AS delay
  FROM
- ${ctx.ref(tableName)}
+ ${ctx.ref(schemaName, tableName)}
reference schemaName.tableName
  .database(globalParams.database)
  .schema(globalParams.schema)
  .description(`Check referential integrity for ${childTable}.${childKey} referencing ${parentTable}.${parentKey}`)
  .tags("assert-referential-integrity")
  .query(ctx => `
  SELECT pt.${parentKey}
- FROM ${ctx.ref(parentTable)} AS pt
- LEFT JOIN ${ctx.ref(childTable)} AS t ON t.${childKey} = pt.${parentKey}
+ FROM ${ctx.ref(parentSchema, parentTable)} AS pt
reference parentSchema.parentTable
- FROM ${ctx.ref(parentTable)} AS pt
- LEFT JOIN ${ctx.ref(childTable)} AS t ON t.${childKey} = pt.${parentKey}
+ FROM ${ctx.ref(parentSchema, parentTable)} AS pt
+ LEFT JOIN ${ctx.ref(childSchema, childTable)} AS t ON t.${childKey} = pt.${parentKey}
reference childSchema.childTable
- for (let tableName in dataCompletenessConditions) {
-   const columnConditions = dataCompletenessConditions[tableName];
-   createDataCompletenessAssertion(globalParams, tableName, columnConditions);
+ for (let schemaName in dataCompletenessConditions) {
- get schemaName
- get tableNames
@hrialan
Please review this PR!
Thanks a lot for your support, @KazuSh1geru! This is indeed a very important feature that needs to be implemented. What do you think about making the schema argument optional? A project can have many datasets, and specifying them one by one can be tedious, so we lose simplicity of use. I am thinking about something like this:
const commonAssertions = require("dataform-assertions");
const commonAssertionsResult = commonAssertions({
globalAssertionsParams: {
"schema": "dataform_assertions_" + dataform.projectConfig.vars.env,
"location": "asia-northeast1",
"tags": ["assertions"],
},
uniqueKeyConditions: {
"schema1.filter": ["Id"],
"schema2.filter": ["Id"],
"otherTable": ["Id"]
}
});
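As a rough illustration of how dotted keys like the ones proposed above could be interpreted, here is a minimal sketch. `parseTableKey` is a hypothetical helper, not part of dataform-assertions, and it assumes that neither schema names nor table names contain a dot.

```javascript
// Hypothetical helper: splits "schema.table" at the first dot. A key
// without a dot is treated as a table with no explicit schema.
function parseTableKey(key) {
  const dot = key.indexOf(".");
  if (dot === -1) {
    return { schemaName: undefined, tableName: key };
  }
  return { schemaName: key.slice(0, dot), tableName: key.slice(dot + 1) };
}

const uniqueKeyConditions = {
  "schema1.filter": ["Id"],
  "schema2.filter": ["Id"],
  "otherTable": ["Id"],
};

const parsed = Object.keys(uniqueKeyConditions).map(parseTableKey);
console.log(parsed);
```

A key with no schema would then fall back to whatever default dataset the project resolves, which keeps the short form available for single-dataset projects.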
Thank you for your comment! I believe my method of writing the schema in isolation is better from two perspectives. First, the above method of specifying tables is not sufficient. Second, it goes against the DRY principle.
const commonAssertionsResult = commonAssertions({
globalAssertionsParams: {
"schema": "dataform_assertions_" + dataform.projectConfig.vars.env,
"location": "asia-northeast1",
"tags": ["assertions"],
},
uniqueKeyConditions: {
"schema1.filter": ["Id"],
"schema1.preprocess": ["Id"],
"schema1.postprocess": ["Id"],
"schema2.filter": ["Id"],
"schema2.postprocess": ["Id"],
"schema3.filter": ["Id"],
"schema3.preprocess": ["Id"],
"schema3.postprocess": ["Id"],
"schema3.transform": ["Id"],
"otherTable": ["Id"]
}
});
Isolated schema pattern:
const commonAssertionsResult = commonAssertions({
globalAssertionsParams: {
"schema": "dataform_assertions_" + dataform.projectConfig.vars.env,
"location": "asia-northeast1",
"tags": ["assertions"],
},
uniqueKeyConditions: {
"schema1": {
"filter": ["Id"],
"preprocess": ["Id"],
"postprocess": ["Id"]
},
"schema2": {
"filter": ["Id"],
"postprocess": ["Id"]
},
"schema3": {
"filter": ["Id"],
"preprocess": ["Id"],
"postprocess": ["Id"],
"transform": ["Id"]
},
"otherTable": ["Id"]
}
});
@hrialan
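The isolated pattern above mixes schema objects with a top-level `"otherTable"` array. One possible way to read both shapes, sketched under the assumption that every leaf value is an array of key columns, is shown below. `normalizeUniqueKeyConditions` is a hypothetical name and this is not the PR's implementation.

```javascript
// Hypothetical normalizer: an array value means the key is a table with
// no schema; an object value means the key is a schema holding tables.
function normalizeUniqueKeyConditions(conditions) {
  const entries = [];
  for (const [key, value] of Object.entries(conditions)) {
    if (Array.isArray(value)) {
      entries.push({ schemaName: undefined, tableName: key, columns: value });
    } else {
      for (const [tableName, columns] of Object.entries(value)) {
        entries.push({ schemaName: key, tableName, columns });
      }
    }
  }
  return entries;
}

const entries = normalizeUniqueKeyConditions({
  schema2: { filter: ["Id"], postprocess: ["Id"] },
  otherTable: ["Id"],
});
console.log(entries);
```

The array check only works for condition types whose leaf is an array. Condition types whose leaf is itself an object would need a different way to tell a schema from a table.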
OK, I agree with the isolation pattern; it seems more readable. Before merging, here are two points to improve:
Can be implemented in a future version if it is not too complicated. As in the example here for "otherTable":
const commonAssertionsResult = commonAssertions({
globalAssertionsParams: {
"schema": "dataform_assertions_" + dataform.projectConfig.vars.env,
"location": "asia-northeast1",
"tags": ["assertions"],
},
uniqueKeyConditions: {
"schema3": {
"filter": ["Id"],
"preprocess": ["Id"],
"postprocess": ["Id"],
"transform": ["Id"]
},
"otherTable": ["Id"] // THIS is not working
}
});
In many projects, the environment is included in the dataset name. Do you have an idea of how we can do this with your solution? Example:
const commonAssertionsResult = commonAssertions({
globalAssertionsParams: {
"schema": "dataform_assertions_" + dataform.projectConfig.vars.env,
"location": "asia-northeast1",
"tags": ["assertions"],
},
uniqueKeyConditions: {
"schema3" + dataform.projectConfig.vars.env : { // THIS is not working
"filter": ["Id"],
"preprocess": ["Id"],
"postprocess": ["Id"],
"transform": ["Id"]
},
"otherTable": ["Id"]
}
});
Thanks!
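On the second point, an expression such as `"schema3" + env` is not valid as a bare key in a JavaScript object literal, but ES2015 computed property names accept it inside square brackets. The sketch below assumes a hard-coded `env` value in place of `dataform.projectConfig.vars.env`, which is not available outside a Dataform project.

```javascript
// Assumed environment suffix; in a Dataform project this would come from
// dataform.projectConfig.vars.env.
const env = "_dev";

const uniqueKeyConditions = {
  // Square brackets make the key an expression evaluated at runtime.
  ["schema3" + env]: {
    filter: ["Id"],
    preprocess: ["Id"],
    postprocess: ["Id"],
    transform: ["Id"],
  },
};

console.log(Object.keys(uniqueKeyConditions));
```

Because the nested pattern keeps the schema name in one key, the environment suffix only needs to be appended once per dataset.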
What is this PR?
This PR solves #3
Overview
Information
Context method: ref
The ref method uses individual arguments for the "schema" and "name" values. (document)