Develop your first Data quality test using AWS Deequ

ajithshetty/spark-deequ-demo

AWS Deequ Demo

https://ajithshetty28.medium.com/deequ-i-mean-data-quality-a0e6c048469d

Introduction

Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets. Python users may also be interested in PyDeequ, a Python interface for Deequ. You can find PyDeequ on GitHub, readthedocs, and PyPI. (Source: https://github.com/awslabs/deequ)

What does it do?

Deequ helps you with three tasks:

Metrics computation: Deequ computes data quality metrics such as maximum, minimum, correlation, and completeness. Once the metrics are calculated, you can store them (for example in S3) and analyse them at a later point.
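A minimal sketch of metrics computation, assuming Deequ and a compatible Spark SQL artifact are on the classpath. The object name, the `Item` schema, the sample rows, and the output path are hypothetical and chosen only for illustration; `AnalysisRunner`, the analyzers, and `AnalyzerContext.successMetricsAsDataFrame` are Deequ APIs.

```scala
// Sketch only: assumes Deequ and Spark SQL are on the classpath.
import com.amazon.deequ.analyzers.{Completeness, Correlation, Maximum, Minimum, Size}
import com.amazon.deequ.analyzers.runners.{AnalysisRunner, AnalyzerContext}
import org.apache.spark.sql.{DataFrame, SparkSession}

object MetricsComputationExample {

  // Hypothetical schema used only for this illustration.
  case class Item(id: Long, productName: String, numViews: Long, rating: Double)

  // Local Spark session; a real job would typically run on a cluster.
  lazy val spark: SparkSession = SparkSession.builder()
    .master("local[*]")
    .appName("deequ-metrics-sketch")
    .getOrCreate()

  // Hypothetical sample data: one productName is missing.
  lazy val data: DataFrame = {
    import spark.implicits._
    Seq(
      Item(1, "Thingy A", 12, 4.0),
      Item(2, "Thingy B", 0, 3.5),
      Item(3, null, 5, 2.0),
      Item(4, "Thingy D", 10, 5.0),
      Item(5, "Thingy E", 3, 1.0)
    ).toDF()
  }

  // Run several analyzers against the data and collect their metrics.
  lazy val analysis: AnalyzerContext = AnalysisRunner
    .onData(data)
    .addAnalyzer(Size())
    .addAnalyzer(Maximum("numViews"))
    .addAnalyzer(Minimum("numViews"))
    .addAnalyzer(Completeness("productName"))
    .addAnalyzer(Correlation("numViews", "rating"))
    .run()

  // Successful metrics as a DataFrame (columns: entity, instance, name, value).
  lazy val metrics: DataFrame =
    AnalyzerContext.successMetricsAsDataFrame(spark, analysis)

  def main(args: Array[String]): Unit = {
    metrics.show(truncate = false)
    // Hypothetical output location. An s3a:// path should also work,
    // assuming the hadoop-aws connector and credentials are configured.
    val outputPath = args.headOption.getOrElse("/tmp/deequ-metrics")
    metrics.write.mode("overwrite").parquet(outputPath)
    spark.stop()
  }
}
```

Writing the metrics DataFrame as Parquet is just one option; Deequ also ships metrics repositories for persisting results over time, which may suit recurring jobs better.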

Constraint verification: You define constraints on your data, and Deequ verifies them and generates a data quality report.
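A minimal sketch of constraint verification, again assuming Deequ and Spark SQL are on the classpath. The object name, schema, and rows are hypothetical; the sample data deliberately contains a null `productName` so that one constraint fails and the report has something to show.

```scala
// Sketch only: assumes Deequ and Spark SQL are on the classpath.
import com.amazon.deequ.{VerificationResult, VerificationSuite}
import com.amazon.deequ.checks.{Check, CheckLevel, CheckStatus}
import org.apache.spark.sql.{DataFrame, SparkSession}

object ConstraintVerificationExample {

  // Hypothetical schema used only for this illustration.
  case class Item(id: Long, productName: String, priority: String, numViews: Long)

  lazy val spark: SparkSession = SparkSession.builder()
    .master("local[*]")
    .appName("deequ-verification-sketch")
    .getOrCreate()

  // Hypothetical sample data: row 3 has no productName.
  lazy val data: DataFrame = {
    import spark.implicits._
    Seq(
      Item(1, "Thingy A", "high", 0),
      Item(2, "Thingy B", "low", 0),
      Item(3, null, "high", 5),
      Item(4, "Thingy D", "low", 10),
      Item(5, "Thingy E", "high", 12)
    ).toDF()
  }

  // A check groups constraints; CheckLevel.Error reports failures as errors.
  val check: Check = Check(CheckLevel.Error, "unit tests for the item data")
    .hasSize(_ == 5)                                 // exactly five rows
    .isComplete("id")                                // no nulls in id
    .isUnique("id")                                  // no duplicate ids
    .isComplete("productName")                       // fails: one null
    .isContainedIn("priority", Array("high", "low")) // only allowed values
    .isNonNegative("numViews")                       // no negative counts

  lazy val result: VerificationResult = VerificationSuite()
    .onData(data)
    .addCheck(check)
    .run()

  def main(args: Array[String]): Unit = {
    if (result.status == CheckStatus.Success) {
      println("All constraints passed.")
    } else {
      // The data quality report: one row per constraint with its status and message.
      VerificationResult.checkResultsAsDataFrame(spark, result).show(truncate = false)
    }
    spark.stop()
  }
}
```

With this data the overall status should be `Error`, and only the `isComplete("productName")` constraint should be reported as failed.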

Constraint suggestion: Deequ can profile your data and automatically suggest constraints based on it.
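A minimal sketch of constraint suggestion, assuming the same classpath setup as above. The object name, schema, and rows are hypothetical; `ConstraintSuggestionRunner` and `Rules.DEFAULT` are Deequ APIs. Suggestions are derived from the sample you give it, so they should be reviewed before being adopted as checks.

```scala
// Sketch only: assumes Deequ and Spark SQL are on the classpath.
import com.amazon.deequ.suggestions.{ConstraintSuggestionResult, ConstraintSuggestionRunner, Rules}
import org.apache.spark.sql.{DataFrame, SparkSession}

object ConstraintSuggestionExample {

  // Hypothetical schema used only for this illustration.
  case class Item(id: Long, productName: String, priority: String, numViews: Long)

  lazy val spark: SparkSession = SparkSession.builder()
    .master("local[*]")
    .appName("deequ-suggestion-sketch")
    .getOrCreate()

  lazy val data: DataFrame = {
    import spark.implicits._
    Seq(
      Item(1, "Thingy A", "high", 0),
      Item(2, "Thingy B", "low", 0),
      Item(3, "Thingy C", "high", 5),
      Item(4, "Thingy D", "low", 10),
      Item(5, "Thingy E", "high", 12)
    ).toDF()
  }

  // Profile the columns and apply Deequ's default suggestion rules.
  lazy val suggestions: ConstraintSuggestionResult = ConstraintSuggestionRunner()
    .onData(data)
    .addConstraintRules(Rules.DEFAULT)
    .run()

  def main(args: Array[String]): Unit = {
    // Print each suggestion with the Scala code that would express it as a constraint.
    suggestions.constraintSuggestions.foreach { case (column, columnSuggestions) =>
      columnSuggestions.foreach { s =>
        println(s"Suggestion for '$column': ${s.description}")
        println(s"  Scala code: ${s.codeForConstraint}")
      }
    }
    spark.stop()
  }
}
```

The printed `codeForConstraint` strings can be pasted into a `Check` like the one used for constraint verification.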
