A CLI tool to parse the Dataform compilation output generated by running dataform compile --json
to get insights on
- cost
- errors
- other metadata
at table and tag level.
To explore all capabilities of dj, run the following command:
dj help
Cost of running all tables in a Dataform project
dj --json-file <path-to-json-file> table-ops cost --all
OR (using stdin as input)
dataform compile --json | dj table-ops cost --all
Cost of running a file
dj --json-file <path-to-json-file> table-ops cost --file <file-name-wo-extension>
Compiled query for a file
dj --json-file <path-to-json-file> table-ops query --file <file-name-wo-extension>
List unique tags in a Dataform project
dj --json-file <path-to-json-file> tag-ops --unique
Cost of running a tag
dj --json-file <path-to-json-file> tag-ops cost --tag <tag-name>
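When several of the reports above are needed together, it can be convenient to compile once and reuse the same output file. The following is a minimal sketch that uses only the commands shown above; the function name dj_report and the temporary file are illustrative and not part of dj.

```shell
# Hypothetical helper (not part of dj): compile once, reuse the JSON for several reports.
dj_report() {
  local out
  out="$(mktemp)" || return 1

  # Compile the Dataform project once; stop early if compilation fails.
  if ! dataform compile --json > "$out"; then
    rm -f "$out"
    return 1
  fi

  # Run the reports against the same compilation output.
  dj --json-file "$out" table-ops cost --all
  dj --json-file "$out" tag-ops --unique

  # Clean up the temporary compilation output.
  rm -f "$out"
}
```

Calling dj_report from the root of a Dataform project prints the table cost report followed by the list of unique tags.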
Latest release
curl -sSfL https://raw.githubusercontent.com/ashish10alex/dj/main/install_latest.sh | bash
OR
go install github.com/ashish10alex/dj@latest
This installs the binary dj to $GOBIN, which defaults to $GOPATH/bin.
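If dj is not found after go install, the install directory may be missing from PATH. A small sketch for locating that directory with go env; the function name go_bin_dir is hypothetical, and the colon-separated GOPATH handling assumes a Unix-like system.

```shell
# Hypothetical helper: print the directory where `go install` places binaries.
go_bin_dir() {
  local gobin gopath
  gobin="$(go env GOBIN)"
  if [ -n "$gobin" ]; then
    printf '%s\n' "$gobin"
  else
    # GOPATH may be a colon-separated list on Unix; go install uses the first entry.
    gopath="$(go env GOPATH)"
    printf '%s/bin\n' "${gopath%%:*}"
  fi
}
```

The result can then be appended to PATH in a shell profile, for example with export PATH="$PATH:$(go_bin_dir)".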
OR
Manually clone the repository, build the CLI, and add it to your system path
git clone <repo-url>
cd dj
go build -o dj
mv dj /usr/local/bin/dj
Load the cost of tags, along with dry run metadata, into BigQuery by adding the following recipe to your justfile
tags_cost_to_bigquery:
    dataform compile --json > out.json
    dj --json-file out.json tag-ops cost --all > res.json
    bq load --source_format=NEWLINE_DELIMITED_JSON --autodetect dataset_id.TAGS_RUN_COSTS res.json
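For setups without just, the same three steps could be wrapped in a shell function. This is a sketch, not part of dj: the function name and the optional table argument are assumptions, and the empty-file check is an added safeguard so that bq load is skipped when dj produced no output.

```shell
# Hypothetical shell equivalent of the recipe above; run it from the Dataform project root.
tags_cost_to_bigquery() {
  # Destination table; defaults to the placeholder used in the recipe.
  local table="${1:-dataset_id.TAGS_RUN_COSTS}"

  # Stop at the first failing step instead of loading stale or partial data.
  dataform compile --json > out.json || return 1
  dj --json-file out.json tag-ops cost --all > res.json || return 1

  # Skip the load when dj wrote nothing.
  if [ ! -s res.json ]; then
    echo "res.json is empty; nothing to load" >&2
    return 1
  fi

  bq load --source_format=NEWLINE_DELIMITED_JSON --autodetect "$table" res.json
}
```

Calling tags_cost_to_bigquery with no argument loads into dataset_id.TAGS_RUN_COSTS; passing a table ID as the first argument overrides it.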
Query the cost of a tag once it has been successfully loaded into BigQuery
SELECT
  GitMetadata.GitRepositoryId,
  GBProcessed,
  Cost,
  HasError,
  RunDateTime,
FROM `gcp-project-id.dataset_id.TAGS_RUN_COSTS`
WHERE 1=1
  AND Tag = "xxx"
  AND RunDateTime IN (
    SELECT RunDateTime FROM (
      SELECT
        RunDateTime,
        ROW_NUMBER() OVER (PARTITION BY GitMetadata.GitRepositoryId ORDER BY RunDateTime DESC) RN
      FROM `gcp-project-id.dataset_id.TAGS_RUN_COSTS`
      QUALIFY RN = 1
    )
  )
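The query above can also be run from the command line with bq query, passing the tag as a named query parameter instead of editing the SQL. This is a sketch under assumptions: the function name latest_tag_cost is hypothetical, and the table ID is inserted into the SQL text without validation, so it should come from a trusted source.

```shell
# Hypothetical helper: run the per-tag cost query with the tag as a query parameter.
# Usage: latest_tag_cost <project.dataset.table> <tag-name>
latest_tag_cost() {
  local table="$1" tag="$2" sql

  # Build the SQL; the backticks around the table ID are literal inside single quotes.
  sql="$(printf 'SELECT
  GitMetadata.GitRepositoryId, GBProcessed, Cost, HasError, RunDateTime
FROM `%s`
WHERE Tag = @tag
  AND RunDateTime IN (
    SELECT RunDateTime FROM (
      SELECT
        RunDateTime,
        ROW_NUMBER() OVER (PARTITION BY GitMetadata.GitRepositoryId ORDER BY RunDateTime DESC) AS RN
      FROM `%s`
      QUALIFY RN = 1
    )
  )' "$table" "$table")"

  # An empty type in name:type:value makes the parameter a STRING.
  bq query --use_legacy_sql=false --parameter="tag::${tag}" "$sql"
}
```

For example, latest_tag_cost gcp-project-id.dataset_id.TAGS_RUN_COSTS xxx runs the same query for the tag xxx.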
Query the cost of all tags in a Dataform project
SELECT
  GitMetadata.GitRepositoryId,
  Tag,
  GBProcessed,
  Cost,
  HasError,
  RunDateTime,
FROM `gcp-project-id.dataset_id.TAGS_RUN_COSTS`
WHERE 1=1
  AND GitMetadata.GitRepositoryId = 'xxx'
  AND RunDateTime IN (
    SELECT RunDateTime FROM (
      SELECT
        RunDateTime,
        ROW_NUMBER() OVER (PARTITION BY GitMetadata.GitRepositoryId ORDER BY RunDateTime DESC) RN
      FROM `gcp-project-id.dataset_id.TAGS_RUN_COSTS`
      QUALIFY RN = 1
    )
  )