bdiouf/data-quality

Data Quality Libraries

This repository contains the source files of Talend Data Quality libraries.

Content structure

Project                       Description
dataquality-common            Abstractions of data analysis and low-level utilities, such as East Asian text pattern recognition
dataquality-email             Email validation library
dataquality-libraries         Parent POM aggregating the other library projects; DevOps tools
dataquality-record-linkage    Record matching algorithms, blocking key calculation, and T-Swoosh
dataquality-sampling          Reservoir sampling, data masking, data duplication
dataquality-semantic-model    Definitions of objects related to semantic categories
dataquality-semantic          API for semantic category analysis
dataquality-standardization   Standardization library based on Apache Lucene
dataquality-statistics        API for data analysis and statistics (requires JDK 1.8)
dataquality-wordnet           Content validation API based on the WordNet dictionary

Product Download

Talend Open Studio for Data Quality can be downloaded from the Talend website.

Build

  • All projects are Maven-based.
  • The parent POM (dataquality-libraries) builds all the libraries.
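As a rough sketch, a full build might look like the following. It assumes that Maven and a suitable JDK are installed, and that the parent POM sits in the dataquality-libraries directory at the repository root; the directory layout is an assumption and may differ in your checkout.

```shell
# Assumption: the parent POM is in dataquality-libraries/ at the repository root.
cd dataquality-libraries

# Standard Maven lifecycle: clean previous output, then compile, test,
# package, and install every aggregated module into the local repository.
mvn clean install
```

To build a single library instead, running the same command from that library's own directory should work, provided its parent POM has already been installed locally.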

License

Copyright (c) 2006-2016 Talend

Licensed under the Apache License, Version 2.0.