Skip to content

profile tabular datasets, manage automatic validation for new datasets, automatic handling for quality issues.

License

Notifications You must be signed in to change notification settings

Ezzaldin97/dprofiler

Repository files navigation

[interrogate]

qprofiler

qprofiler is a Python package that provides an intelligent way to create a data quality profile for your development(train) dataset(s) and save it as a reference to use in creating quality check tests and automatic handling cases for production(test) datasets.

Table of Contents

Installation

The source code is currently hosted on GitHub at: dprofiler-github

Binary installers for the latest released version are available at the PyPi

# PyPi
pip install qprofiler

Dependencies

  • Polars(>=0.19.0 <0.20.0)
  • PyYAML(>=6.0.1 <7.0.0)
  • Pathlib(>=1.0 <2.0)
  • rumamel.yaml(>=0.17.32 <0.18.0)

Usage

check the notebook that contains everything about how to use DataProfiler module in profiling datasets, and how to use QTest module to create quality check tests.

check the notebook that contains everything about how to use QPipeline

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

Please make sure to update tests as appropriate.

Licence

MIT

New in v0.2.3

  • Create Quality Pipeline(v0.2.2).
  • enhance documentation.
  • add utility method in DataProfiler to .dprofiler structure tree.

New in v0.3.0

  • Modify file structure of .dprofiler.
  • add Metadata Module of .dprofiler.
  • modify test cases.