DQ is the difference

Lake vs Swamp

The difference between a business-critical lake and a swamp is data quality (DQ). One organization's Data Lake may be another's Data Swamp; the difference lies in how the data is curated. A Data Lake is a vast amount of data that can be stored, assessed, and analyzed. A Data Swamp is a lake with little data governance, DQ automation, or contextual metadata.

The accuracy and cleanliness of data directly determine the quality of the insights end users can derive, and data lakes that gain broad adoption have strong governance programs. The challenge is that adding a DQ program typically takes 6-12 months, and the project never really ends because of the volume, variety, and velocity of incoming data. OwlDQ uses autoML to solve this problem. OwlDQ constantly monitors the lake with native integration and unlimited scale. Use OwlDQ to generate the equivalent of 10K rules while continuously adapting to the natural variance in your data. When erroneous data enters your lake, OwlDQ alerts the data steward and provides a rich visual of the break records, along with explainable AI describing the issue. OwlDQ's approach is to learn from data and become incrementally smarter each day, which supports a statistically defensible DQ program.
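To make "adapting to the natural variance in your data" more concrete, here is a minimal sketch of an adaptive check that learns a baseline from recent history instead of using a hand-written threshold. It is an illustration only and is not OwlDQ's actual algorithm or API: the function name `is_anomalous`, the z-score technique, the default parameters, and the sample null rates are all assumptions made for this example.

```python
from statistics import mean, stdev


def is_anomalous(history, today, z_threshold=3.0, min_history=7):
    """Flag today's metric if it falls outside the learned baseline.

    history: recent daily values of one DQ metric (hypothetical example:
             the null rate of a column).
    today:   the newly observed value of that metric.
    """
    if len(history) < min_history:
        # Not enough observations to learn a baseline yet.
        return False
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # The metric has been constant, so any change is a break.
        return today != mu
    # Distance from the baseline, measured in standard deviations.
    return abs(today - mu) / sigma > z_threshold


# Made-up null rates for the previous seven days.
null_rates = [0.010, 0.012, 0.011, 0.009, 0.010, 0.013, 0.011]

print(is_anomalous(null_rates, 0.012))  # within normal variance
print(is_anomalous(null_rates, 0.250))  # sudden spike in nulls
```

Because the baseline is recomputed from whatever history is passed in, the same check tightens or loosens as the data's normal range shifts, which is the general idea behind replacing static rules with learned ones.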