Ivan Zhang edited this page Dec 1, 2023 · 14 revisions

Welcome to the Panda Patrol Wiki!

Do you have data tests within your data pipelines? Do you have data tests in many different pipelines? Do you want to be able to monitor the health of your data tests and pipelines? Do you want all these tests and monitors in one place? Do you want greater context into your data pipelines and the data flowing through them at any given step of the process? If you answered yes to any of these questions, then Panda Patrol is for you!

Panda Patrol has the following core features:

  1. Provides out-of-the-box, general column-based data tests for accuracy, completeness, duplicates, enums, freshness, and volume
  2. Automatically generates data tests based on your data
  3. Stores and tracks data profiles that you generate from data in your pipelines
  4. Automatically creates dashboards, alerting, and silencing around custom data tests in your data pipelines
  5. Quickly and easily detects anomalies in your data
  6. Monitors each step of your data pipelines
  7. Is fully open-source and free to use

The best part? Panda Patrol doesn't require you to uproot your existing data pipeline setup. Simply drop it in and you're good to go, with fewer than 5 lines of code. To get started, check out the Quickstart guide.

Note that Panda Patrol is not a data testing library. It provides additional tooling and functionality around your existing data tests. For your data tests, you can write any Python code or use one of the many existing open-source data testing libraries out there.
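
To illustrate what "any Python code" can mean here, below is a minimal sketch of a hand-written completeness test in plain Python. It does not use the Panda Patrol API at all; the function names (`null_fraction`, `check_completeness`), the row format (a list of dicts), and the sample data are assumptions made up for this example. A test like this is the kind of existing check that Panda Patrol is meant to sit around.

```python
from typing import Any, Iterable, Mapping


def null_fraction(rows: Iterable[Mapping[str, Any]], column: str) -> float:
    """Return the fraction of rows whose value for `column` is missing (None)."""
    rows = list(rows)
    if not rows:
        return 0.0
    missing = sum(1 for row in rows if row.get(column) is None)
    return missing / len(rows)


def check_completeness(
    rows: Iterable[Mapping[str, Any]],
    column: str,
    max_null_fraction: float = 0.0,
) -> None:
    """Hypothetical data test: fail if too many values in `column` are missing."""
    fraction = null_fraction(rows, column)
    assert fraction <= max_null_fraction, (
        f"column {column!r}: {fraction:.0%} of values are missing, "
        f"allowed at most {max_null_fraction:.0%}"
    )


# Made-up sample data standing in for one step of a pipeline.
sample_rows = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": None},
]

check_completeness(sample_rows, "id")  # passes: no missing ids
check_completeness(sample_rows, "email", max_null_fraction=0.5)  # passes at the limit
```

Calling `check_completeness(sample_rows, "email")` with the default threshold would raise an `AssertionError`, which is the signal a surrounding pipeline step could act on.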

Note that Panda Patrol does not provide any orchestration. It is meant to be dropped into your existing data tests within your existing data pipelines. It is then executed as part of your data pipeline. There are many great orchestration tools out there already.
