Implementing some elementary data quality tests #3
Nice to see you getting started! Just write if I can help with anything!
Thanks! I would like to implement a simple use case/example:
There is one simple thing where you could perhaps help, @stefannegele: writing the part (in https://github.com/data-engineering-helpers/datacontract-cli/blob/main/quality.go) that resolves the specification when it is cross-referenced from another YAML file, which is the case in my example: https://github.com/data-engineering-helpers/data-contracts/blob/main/datacontract.com/contracts/data-contract-flight-route.yaml#L59
If you need contributor access to the https://github.com/data-engineering-helpers GitHub organization, do not hesitate. Just let me know and I'll add you to one of the teams (https://github.com/orgs/data-engineering-helpers/teams). |
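To make the request above concrete, here is a minimal sketch of reference inlining, written in Python for brevity (the CLI itself is written in Go). It assumes the contract has already been parsed into plain dictionaries and lists, and that a `load` callable returns the parsed content of a referenced file. The `inline_refs` name, the `FILES` stand-in, and the contract layout are all hypothetical, not taken from `quality.go`.

```python
from typing import Any, Callable


def inline_refs(node: Any, load: Callable[[str], Any],
                _seen: frozenset = frozenset()) -> Any:
    """Return a copy of `node` where each {"$ref": target} mapping
    is replaced by the (recursively resolved) result of load(target)."""
    if isinstance(node, dict):
        ref = node.get("$ref")
        if isinstance(ref, str) and len(node) == 1:
            if ref in _seen:
                raise ValueError(f"circular $ref: {ref}")
            return inline_refs(load(ref), load, _seen | {ref})
        return {key: inline_refs(value, load, _seen)
                for key, value in node.items()}
    if isinstance(node, list):
        return [inline_refs(item, load, _seen) for item in node]
    return node


# Hypothetical stand-in for reading and parsing a local or remote YAML file.
FILES = {
    "flight-route-quality.yaml": {"checks for flight_route": ["row_count > 0"]},
}

# Illustrative contract fragment; the real file may be laid out differently.
contract = {
    "id": "flight-route",
    "quality": {
        "type": "SodaCL",
        "specification": {"$ref": "flight-route-quality.yaml"},
    },
}

resolved = inline_refs(contract, FILES.__getitem__)
print(resolved["quality"]["specification"])
```

Passing the loader in as a parameter keeps the traversal independent of where the referenced file lives (local path or downloaded URL), and the `_seen` set guards against references that point back at themselves.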
Hi @da115115, thanks for the insights. I also think integrating SODA via command-line execution is fine for now. We will figure out a good solution while integrating other tools. I will put the resolution of referenced resources on my to-do list!
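As an illustration of what "integrating SODA via command-line execution" can look like, here is a minimal Python sketch that builds and runs a `soda scan` invocation as a child process. It assumes the Soda Core CLI is installed and on the `PATH`; the data source name and the configuration file name are hypothetical, and this is not the code used by the CLI.

```python
import shutil
import subprocess


def soda_scan_command(data_source: str, configuration: str, checks: str) -> list:
    """Build the argument list for `soda scan -d <source> -c <config> <checks>`."""
    return ["soda", "scan", "-d", data_source, "-c", configuration, checks]


def run_soda_scan(data_source: str, configuration: str, checks: str) -> int:
    """Run the scan and return the exit code of the soda process."""
    command = soda_scan_command(data_source, configuration, checks)
    if shutil.which(command[0]) is None:
        raise FileNotFoundError("soda executable not found on PATH")
    # An argument list (no shell) avoids quoting problems in file names.
    return subprocess.run(command, check=False).returncode


# Hypothetical data source and configuration names, for illustration only.
command = soda_scan_command(
    "duckdb_local",
    "soda-configuration.yaml",
    "contracts/data-contract-flight-route-quality.yaml",
)
print(" ".join(command))
```

Shelling out keeps the integration thin, at the cost of depending on whichever `soda` version happens to be installed on the machine.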
From what I can see in the Go code, yes, that is exactly what I meant. I have yet to test it, especially the part where a remote reference is downloaded locally. As a matter of fact, the same way there is a default file-name for local data contract files (that is |
We have just released version 0.3.0. The new Tell me, if I can help to merge my latest changes. |
Thanks! I've just tried the new Here's what I've tried:
2.2. Revert local changes
|
And on this branch/pull request, I've merged the last changes, up until 0.3.0, and there are now a few compilation issues:
I guess the Thanks! |
I tried the command on this file. Your "$ref ..." should be "$ref: ..." (with a colon after the key). Then it works.
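A missing colon like this is easy to overlook, because the YAML is still syntactically valid (the line is read as a plain string rather than a key and value). The following Python sketch flags such lines before the file reaches the CLI. The `refs_missing_colon` helper is hypothetical and only a text-level heuristic, not part of `datacontract`.

```python
import re

# A `$ref` key (optionally inside a list item) followed by whitespace
# and a value, with no colon in between.
MISSING_COLON = re.compile(r"^\s*(?:-\s*)?\$ref\s+[^:\s]")


def refs_missing_colon(yaml_text: str) -> list:
    """Return the 1-based numbers of lines that look like `$ref value`."""
    return [
        number
        for number, line in enumerate(yaml_text.splitlines(), start=1)
        if MISSING_COLON.match(line)
    ]


broken = "quality:\n  specification:\n    $ref quality.yaml\n"
fixed = "quality:\n  specification:\n    $ref: quality.yaml\n"

print(refs_missing_colon(broken))
print(refs_missing_colon(fixed))
```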
Correct. You now can just use |
Thanks, indeed the inlining now seems to work better. However, if I follow up with a
Moreover, it may be better, when executing the Moreover, if we want to keep the cross-reference pattern for local files, we could have specific parameters like |
I just added some tickets. That does not mean I will implement them right now, but they are open for discussion. I will check the bug with lint when I have time, probably at the end of the week. I wanted to do some cleanup and add better testing anyway. edit: |
Since my data contracts are in Git (https://github.com/data-engineering-helpers/data-contracts/blob/main/datacontract.com/contracts/data-contract-flight-route.yaml), it is easy enough to do |
@stefannegele, the integration of SodaCL is now working. It is still simple and not very robust, but it is a good start (IMHO).
$ python -mpip install -U duckdb
$ mkdir -p ~/dev/infra/data-contracts && \
  git clone https://github.com/data-engineering-helpers/data-contracts.git ~/dev/infra/data-contracts/data-contracts && \
  cd ~/dev/infra/data-contracts/data-contracts/datacontract.com
$ datacontract quality-init --file contracts/data-contract-flight-route.yaml --quality-file contracts/data-contract-flight-route-quality.yaml
$ duckdb quality/db.duckdb < sql/duckdb-ddl-create-view-from-csv.sql
$ datacontract quality-check --file contracts/data-contract-flight-route.yaml --quality-file contracts/data-contract-flight-route-quality.yaml
Basically:
Hi @da115115, nice work!
I had a brief look at your changes and added some comments. Please tell me if I can help or if I misunderstood something.
I plan to merge tomorrow when I have time for my adjustments. Thank you! |
datacontract.com
CLI utility: Validate-quality-object question #2