Venkatesh Venkataramani edited this page Nov 16, 2022 · 8 revisions

Introduction

pyQualitas is a data quality library built on top of Apache Spark (PySpark) that lets you define and validate data quality checks on your data. The results of the checks can be rendered as an HTML report that is easy for QA engineers to read.

How does it work?

  • Checks are defined in your Python code as check suites. Each check has a test case name, a test description, and the method that performs the check.
  • Running a check suite returns a collection containing the test case name, the test description, and the status of each check (Passed or Failed).
  • The library can also publish and save the test results as an HTML report that can be shared with stakeholders.
  • The library is packaged for pip and can be installed into any Python virtual environment.
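To make the workflow above concrete, here is a minimal, hypothetical sketch of the check-suite idea in plain Python. It is not the pyQualitas API: the names `Check`, `run_suite`, and `to_html` are illustrative assumptions, and where pyQualitas runs checks against Spark data, this sketch uses an in-memory list so it stays self-contained. It only shows the shape described above: each check carries a name, a description, and a method; the suite returns name, description, and a Passed/Failed status; and the results can be rendered as HTML.

```python
from dataclasses import dataclass
from html import escape
from typing import Callable, Dict, List


@dataclass
class Check:
    """Hypothetical container for one check (not the pyQualitas API)."""
    name: str
    description: str
    method: Callable[[], bool]  # returns True when the check passes


def run_suite(checks: List[Check]) -> List[Dict[str, str]]:
    """Run every check and collect its name, description and Passed/Failed status."""
    results = []
    for check in checks:
        try:
            passed = bool(check.method())
        except Exception:
            # Treat a check that raises as a failure instead of aborting the suite
            passed = False
        results.append({
            "name": check.name,
            "description": check.description,
            "status": "Passed" if passed else "Failed",
        })
    return results


def to_html(results: List[Dict[str, str]]) -> str:
    """Render the results as a minimal HTML table, escaping user-supplied text."""
    rows = "".join(
        f"<tr><td>{escape(r['name'])}</td><td>{escape(r['description'])}</td>"
        f"<td>{escape(r['status'])}</td></tr>"
        for r in results
    )
    return (
        "<html><body><table>"
        "<tr><th>Test Case Name</th><th>Test Description</th><th>Status</th></tr>"
        f"{rows}</table></body></html>"
    )


if __name__ == "__main__":
    # Illustrative in-memory data standing in for a Spark DataFrame
    records = [{"id": 1, "amount": 10.0}, {"id": 2, "amount": None}]
    suite = [
        Check("TC01", "id is never null",
              lambda: all(r["id"] is not None for r in records)),
        Check("TC02", "amount is never null",
              lambda: all(r["amount"] is not None for r in records)),
    ]
    for result in run_suite(suite):
        print(result["name"], result["status"])
```

Catching exceptions inside `run_suite` is a design choice of this sketch: one broken check is reported as Failed rather than stopping the rest of the suite, so the report still lists every test case.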

Getting started

The package can be installed using the following command:

pip install pyQualitas

Check out the next section: Writing your first suite of checks