Quickstart
Install the latest version of the Panda Patrol package using pip:
pip install panda-patrol
Attach patrols to your existing data tests with patrol_group and patrol. The following example uses dagster as the data pipeline, but you can use any Python-based data pipeline.
At a high level, you do the following:
- Import patrol_group
- Group several data tests with patrol_group
- Wrap each individual existing data test with patrol
from panda_patrol.patrols import patrol_group
...
with patrol_group(PATROL_GROUP_NAME) as patrol:
    @patrol(PATROL_NAME)
    def DATA_TEST_NAME():
        ...
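The skeleton above never calls the wrapped function explicitly, which suggests that the decorator itself runs the test. To make that pattern concrete, here is a minimal stand-in written from scratch. It is not Panda Patrol's implementation: `demo_patrol_group`, `run_demo`, and the result tuples are hypothetical names used only for illustration, and the assumption that the test runs at decoration time is inferred from the skeleton, not confirmed by the library.

```python
from contextlib import contextmanager


# Hypothetical stand-in for illustration only; NOT Panda Patrol's implementation.
@contextmanager
def demo_patrol_group(group_name, results):
    """Yield a decorator factory that runs each wrapped test immediately."""

    def patrol(patrol_name):
        def decorator(test_fn):
            # Assumption: the test runs as soon as it is decorated.
            try:
                value = test_fn()
                results.append((group_name, patrol_name, "success", value))
            except AssertionError as exc:
                results.append((group_name, patrol_name, "failure", str(exc)))
            return test_fn

        return decorator

    yield patrol


def run_demo():
    """Run two toy data tests and return the recorded outcomes."""
    results = []
    rows = [{"id": 1}, {"id": 2}]

    with demo_patrol_group("Demo group", results) as patrol:

        @patrol("Rows have ids")
        def rows_have_ids():
            for row in rows:
                assert "id" in row
            return len(rows)

        @patrol("Enough rows")
        def enough_rows():
            assert len(rows) > 5, "expected more than 5 rows"

    return results
```

In this sketch a failing assertion is recorded instead of stopping the run. Whether the real library records, re-raises, or alerts on failures is not covered here.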
Here is a more detailed example of how to wrap a data test in a dagster pipeline (hello-dagster.py from https://docs.dagster.io/getting-started/hello-dagster).
Before:
def hackernews_top_stories(context: AssetExecutionContext):
    """Get items based on story ids from the HackerNews items endpoint."""
    with open("hackernews_top_story_ids.json", "r") as f:
        hackernews_top_story_ids = json.load(f)
    results = []
    # Get information about each item including the url
    for item_id in hackernews_top_story_ids:
        item = requests.get(
            f"https://hacker-news.firebaseio.com/v0/item/{item_id}.json"
        ).json()
        results.append(item)
    # DATA TEST: Make sure that the item's URL is a valid URL
    for item in results:
        print(item["url"])
        get_item_response = requests.get(item["url"])
        assert get_item_response.status_code == 200
    ...
After:
+ from panda_patrol.patrols import patrol_group
  ...
  def hackernews_top_stories(context: AssetExecutionContext):
      """Get items based on story ids from the HackerNews items endpoint."""
      with open("hackernews_top_story_ids.json", "r") as f:
          hackernews_top_story_ids = json.load(f)
      results = []
      # Get information about each item including the url
      for item_id in hackernews_top_story_ids:
          item = requests.get(
              f"https://hacker-news.firebaseio.com/v0/item/{item_id}.json"
          ).json()
          results.append(item)
      # DATA TEST: Make sure that the item's URL is a valid URL
+     with patrol_group("Hackernews Items are Valid") as patrol:
+         @patrol("URLs work")
+         def urls_work():
              """URLs for stories should work."""
              for item in results:
                  print(item["url"])
                  get_item_response = requests.get(item["url"])
                  assert get_item_response.status_code == 200
              return len(results)
      ...
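The body of a wrapped data test is an ordinary function that asserts and returns a value, so it can be factored out and exercised without network access. The sketch below is one way to do that, not part of Panda Patrol or the dagster example: `check_urls` and `fetch_status` are hypothetical names. It also skips items that have no "url" field, since `item["url"]` raises a KeyError for such items.

```python
def check_urls(items, fetch_status):
    """Assert every item with a "url" field returns HTTP 200.

    fetch_status is a hypothetical callable mapping a URL to a status code.
    Returns the number of URLs that were checked.
    """
    checked = 0
    for item in items:
        url = item.get("url")
        if url is None:
            # Skip items without a URL instead of raising KeyError.
            continue
        status = fetch_status(url)
        assert status == 200, f"{url} returned {status}"
        checked += 1
    return checked


# Offline usage with a fake status lookup in place of real HTTP requests.
fake_statuses = {"https://example.com/a": 200, "https://example.com/b": 200}
items = [
    {"id": 1, "url": "https://example.com/a"},
    {"id": 2},
    {"id": 3, "url": "https://example.com/b"},
]
checked = check_urls(items, fake_statuses.get)
```

Inside the pipeline, `fetch_status` could be something like `lambda url: requests.get(url).status_code`, and the wrapped test would return `check_urls(results, fetch_status)`.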
Start and run your data pipelines as you normally would. For example, if you are using dagster, you can run the following command:
dagster dev -f hello-dagster.py
The output of your data pipeline should include a link to the Panda Patrol dashboard. Click the link to view the dashboard. The link should look like this:
See your Panda Patrol dashboard here: https://panda-patrol.vercel.app/public/public-xxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
You should see something like the following:
[Screenshot: Dashboard]
[Screenshot: Run Details]
Congrats! You have created your first data test dashboard! See Features for more information on other features like adjustable parameters, alerting, silencing, and saving data profiles.
Documentation