Quickstart

Ivan Zhang edited this page Nov 1, 2023 · 12 revisions

1) Installation

Install the latest version of the Panda Patrol package using pip:

pip install panda-patrol

2) Wrap your existing data tests

Attach patrols to your existing data tests with patrol_group and patrol. The following example uses dagster as the data pipeline, but any Python-based data pipeline works the same way.

At a high level, you do the following:

  1. Import patrol_group
  2. Group several data tests with patrol_group
  3. Wrap each individual existing data test with patrol

from panda_patrol.patrols import patrol_group
...
with patrol_group(PATROL_GROUP_NAME) as patrol:
    @patrol(PATROL_NAME)
    def DATA_TEST_NAME():
        ...
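
To make the pattern above easier to read, here is a minimal sketch of how a context manager can hand out a decorator that runs and records each wrapped test. It is a stand-in for illustration only, not Panda Patrol's implementation: the names sketch_patrol_group and results are hypothetical, and it assumes the decorator runs the test as soon as it is defined, which is consistent with the examples on this page never calling the wrapped function explicitly.

```python
from contextlib import contextmanager


@contextmanager
def sketch_patrol_group(group_name, results):
    """Illustrative stand-in for patrol_group (hypothetical, not the library's code)."""

    def patrol(patrol_name):
        def decorator(test_fn):
            # Assumption: the wrapped test runs immediately and its outcome is recorded.
            try:
                value = test_fn()
                results.append((group_name, patrol_name, "success", value))
            except AssertionError as exc:
                results.append((group_name, patrol_name, "failure", str(exc)))
            return test_fn

        return decorator

    # The value bound by "as patrol" is the decorator factory defined above.
    yield patrol


results = []
with sketch_patrol_group("Demo group", results) as patrol:

    @patrol("Row count is positive")
    def row_count_is_positive():
        rows = [1, 2, 3]
        assert len(rows) > 0
        return len(rows)

    @patrol("No negative values")
    def no_negative_values():
        rows = [1, -2, 3]
        assert all(r >= 0 for r in rows), "found a negative value"


for entry in results:
    print(entry)
```

In this sketch a failing assertion is recorded rather than raised, so the second test does not stop the first from being reported. Whether Panda Patrol itself swallows or re-raises failures is not stated on this page.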

Here is a more detailed example of how to wrap a data test in a dagster pipeline (hello-dagster.py from https://docs.dagster.io/getting-started/hello-dagster).

Before:

def hackernews_top_stories(context: AssetExecutionContext):
    """Get items based on story ids from the HackerNews items endpoint."""
    with open("hackernews_top_story_ids.json", "r") as f:
        hackernews_top_story_ids = json.load(f)

    results = []
    # Get information about each item including the url
    for item_id in hackernews_top_story_ids:
        item = requests.get(
            f"https://hacker-news.firebaseio.com/v0/item/{item_id}.json"
        ).json()
        results.append(item)

    # DATA TEST: Make sure that the item's URL is a valid URL
    for item in results:
        print(item["url"])
        get_item_response = requests.get(item["url"])
        assert get_item_response.status_code == 200
    ...

After:

+ from panda_patrol.patrols import patrol_group
...
def hackernews_top_stories(context: AssetExecutionContext):
    """Get items based on story ids from the HackerNews items endpoint."""
    with open("hackernews_top_story_ids.json", "r") as f:
        hackernews_top_story_ids = json.load(f)

    results = []
    # Get information about each item including the url
    for item_id in hackernews_top_story_ids:
        item = requests.get(
            f"https://hacker-news.firebaseio.com/v0/item/{item_id}.json"
        ).json()
        results.append(item)

    # DATA TEST: Make sure that the item's URL is a valid URL
+   with patrol_group("Hackernews Items are Valid") as patrol:
+       @patrol("URLs work")
+       def urls_work():
            """URLs for stories should work."""
            for item in results:
                print(item["url"])
                get_item_response = requests.get(item["url"])
                assert get_item_response.status_code == 200

            return len(results)
    ...
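
The data test above indexes item["url"] directly, but some Hacker News items (for example Ask HN posts) have no url field, and a request can fail before any status code comes back. The following sketch shows one way to harden the check. The helper find_broken_urls and its fetch_status parameter are hypothetical names introduced here, not part of Panda Patrol or dagster, and the fetcher is passed in so the example runs without network access.

```python
from typing import Callable, Iterable, List


def find_broken_urls(
    items: Iterable[dict], fetch_status: Callable[[str], int]
) -> List[str]:
    """Return the URLs that did not answer with HTTP 200 (hypothetical helper)."""
    broken = []
    for item in items:
        url = item.get("url")
        if url is None:
            # Text-only posts carry no URL, so there is nothing to check.
            continue
        try:
            status = fetch_status(url)
        except Exception:
            # Treat connection errors and timeouts as a broken URL.
            status = None
        if status != 200:
            broken.append(url)
    return broken


# Offline demo with a fake fetcher. With the requests library you could pass
# lambda url: requests.get(url, timeout=10).status_code instead.
fake_statuses = {
    "https://example.com/ok": 200,
    "https://example.com/gone": 404,
}
items = [
    {"id": 1, "url": "https://example.com/ok"},
    {"id": 2, "url": "https://example.com/gone"},
    {"id": 3},  # no "url" key
]
broken = find_broken_urls(items, fake_statuses.__getitem__)
print(broken)
```

Inside a patrol, the wrapped test could then assert that the returned list is empty, which reports every failing URL at once instead of stopping at the first one.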

3) Run your data pipeline

Start and run your data pipelines as you normally would. For example, if you are using dagster, you can run the following command:

dagster dev -f hello-dagster.py

4) View the results

In the output of your data pipeline, you should see a link to the Panda Patrol dashboard. It should look like this:

See your Panda Patrol dashboard here: https://panda-patrol.vercel.app/public/public-xxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx

Click the link to open the dashboard. You should see something like the following:

[Screenshot: Panda Patrol Dashboard]

[Screenshot: Run Details Log]

Congrats! You have created your first data test dashboard! See Features for details on adjustable parameters, alerting, silencing, and saving data profiles.