Skip to content

Latest commit

 

History

History
299 lines (229 loc) · 11.6 KB

debugging-queries.mdx

File metadata and controls

299 lines (229 loc) · 11.6 KB
title description
Debugging Queries
Learn techniques for debugging online and offline queries

import { Attribute, AttributeTable, SubAttribute, SubAttributeTable, } from '@/components/AttributeTable'


Online and offline queries can fuse data from multiple sources, and the structure of how data flows through queries can be complex. Chalk has a number of tools to help you debug queries. In this section, we'll explore how to use these tools on a few simple features and resolvers.

First, we'll define our feature classes. We'll work with a User and a Transaction object, representing users performing financial transactions. In this scenario, we'll write a simple aggregation that has a bug in its definition, and explore various ways to debug that issue.

from datetime import datetime
from chalk import features, has_many, DataFrame

@features
class Transaction:
    id: int
    user_id: int
    amount: float
    ts: datetime

@features
class User:
    id: int
    sum_transaction_amount: float
    transactions: DataFrame[Transaction] = has_many(lambda: Transaction.user_id == User.id)

Then, we'll define simple online resolvers to ingest user and transaction data from a postgres database:

-- (users.chalk.sql)
-- resolves: User
-- source: postgres

select id from users
-- (transactions.chalk.sql)
-- resolves: Transaction
-- source: postgres

select id, user_id, amount, ts from transactions

Next, we'll write a simple (buggy) aggregation resolver to compute the sum of a user's transactions:

from chalk import online

@online
def sum_transaction_amount(txns: User.transaction[Transaction.amount]) -> User.sum_transaction_amount:
    return txns[Transaction.amount].count() # this 'count' is a bug! we'll debug it shortly.

Notice that we've defined this aggregation with a bug! We're actually computing the count of the User's Transactions, rather than the sum of their amounts.

Let's run a query --

data = ChalkClient().query(input={User.id: 1}, output=[User.sum_transaction_amount])

print(data.get_value(User.sum_transaction_amount))

# Outputs 100!!

Whoops! Because we have prior knowledge of our data, we know that 100 is the wrong value for User 1s transaction sum. Let's debug this issue.


Query Plan Visualizer

The query plan visualizer is a tool that allows you to see the structure of a query, and also inspect the data that flows through it.

To make full use of the query plan visualizer, we can re-execute the previous query with the store_plan_stages=True kwarg. This stores all of the data that passes through each stage of the query plan.

ChalkClient.query(input={User.id: 1}, output=[User.sum_transactions], store_plan_stages=True)

# -or to debug multiple users at once-
ChalkClient.offline_query(input={User.id: [1,2,3]}, output=[User.sum_transactions], store_plan_stages=True)

Then, we navigate to the 'Queries' tab in the web dashboard. Our query appear under the 'Online' or 'Offline' tab depending on which kind of query. Here we'll take a look at the online query, and we can see the structure of the query:

Query Plan Visualizer

The query plan visualizer shows us the structure of the query. Note that the transaction resolver is executed to fetch the transactions for each user, and then the sum_transaction_amount resolver is executed to compute the sum.

We can also see the data that flows through the query. Clicking on the transaction resolver shows us the output of the resolver:

Transaction Output

Notice that the sum of the transaction amounts is definitely not 100! By inspecting the transaction.amount column we notice that the sum should be much larger. If we were paying attention, we'd probably notice that our (incorrect) output 100 happens to be identical to the count summary of the transaction resolver, which would be a good hint that our aggregation is actually computing the count of the transactions, rather than the sum of their amounts.

However, if we're having a slow day and don't notice that the aggregation happens to match the count, we have more tools we can use to debug. The next section will walk through how to execute the aggregation locally using the raw input data, so we can use a debugger.

(Note: the query plan visualizer UI displays the first 20 rows of each stage of the query plan in the preview areas. If you want to see more, click "Download data" to download the raw parquet file.)


Resolver Replay

Resolver execution can be examined using the Query Plan Visualizer, but Chalk also allows you to directly execute your resolver on your local computer using the same arguments that were used in your query.

To use the resolver_replay functionality:

  1. Run an offline_query with store_plan_stages=True to store the query plan stages, and recompute_features=True to allow resolver execution.
  2. On the returned dataset, call resolver_replay with your resolver function.

Example:

from chalk import ChalkClient

ChalkClient().offline_query(
    input={User.id: [1]},
    output=[User.sum_transactions],
    store_plan_stages=True,
    recompute_features=True
).resolver_replay(sum_transaction_amount)

This will execute the resolver locally, using the same input data that was used in the query. This allows you to debug using your IDE (vscode, pycharm, or even pdb):

Debugger

You can edit the definition of the resolver locally, and re-run resolver_replay to see the results of your changes in your terminal or by stepping through with a debugger.

Using logs to debug

If you're not able to use the query plan visualizer or resolver replay, you can also use the logs to debug your query. Use the chalk_logger from the chalk.clogging just like a standard python logger. This logger will output to the web interface, and to configured log sinks (i.e. your DataDog instance).

from chalk.clogging import chalk_logger

@online
def sum_transaction_amount(txns: User.transaction[Transaction.amount]) -> User.sum_transaction_amount:
    chalk_logger.info(f"Transactions: {txns}")
    return txns[Transaction.amount].count() # this 'count' is a bug! we'll debug it shortly.

You can also view these logs using resolver_replay, where they will be emitted to your terminal:

Logs output example

Query Error Categories

In using the debugging techniques above, you may encounter a few different categories of errors that have common root causes:

Request error

Request errors are raised before your resolver code executes. They are often caused by invalid feature names in the input or by requests that cannot be satisfied by the resolvers you have defined.

Feature error

Feature errors are raised when a specific resolver that maps to a feature fails. For this type of error, you'll find a feature and resolver attribute in the error type.

When a resolver crashes, you will receive null value in the response. To differentiate from a resolver returning a null value and a failure in the resolver, you need to check the error schema.

Network error

Network errors are thrown outside your resolvers. They can be caused by unauthenticated requests or other connection failures.


Error schema

The online query interface for resolvers returns the following schema:

Response Schema

FeatureResult[]}> The outputs features and any query metadata (discussed in detail at{' '} Query Basics.) ChalkError[]?}> Errors encountered while running the resolvers. Each element in the list is a{' '} ChalkError. If no errors were encountered, this field is empty.

ChalkError

ErrorCode}> The type of error, matching one of the error codes. ErrorCode.kind}> The category of the error, given in the type field for the{' '} error codes. This will be one of{' '} "REQUEST", "NETWORK", and{' '} "FIELD". A readable description of the error message. The exception that caused the failure, if applicable. The name of the class of the exception. The message taken from the exception. The stacktrace produced by the code. The fully qualified name of the failing feature, eg. user.identity.has_voip_phone. The fully qualified name of the failing resolver, eg. my.project.get_fraud_score.

Error codes

<AttributeTable title={"Values"}> <Attribute field={'PARSE_FAILED'} kind={<a href={"#request-error"}>REQUEST}> The query contained features that do not exist. <Attribute field={'RESOLVER_NOT_FOUND'} kind={<a href={'#request-error'}>REQUEST}> A resolver was required as part of running the dependency graph that could not be found. <Attribute field={'INVALID_QUERY'} kind={<a href={'#request-error'}>REQUEST}> The query is invalid. All supplied features need to be rooted in the same top-level entity. <Attribute field={'VALIDATION_FAILED'} kind={<a href={'#field-error'}>FIELD}> A feature value did not match the expected schema (eg. incompatible type "int"; expected "str") <Attribute field={'RESOLVER_FAILED'} kind={<a href={'#field-error'}>FIELD}> The resolver for a feature errored. <Attribute field={'RESOLVER_TIMED_OUT'} kind={<a href={'#field-error'}>FIELD}> The resolver for a feature timed out. <Attribute field={'UPSTREAM_FAILED'} kind={<a href={"#field-error"}>FIELD}> A crash in a resolver that was to produce an input for the resolver crashed, and so the resolver could not run crashed, and so the resolver could not run. <Attribute field={'UNAUTHENTICATED'} kind={<a href={'#network-error'}>NETWORK}> The request was submitted with an invalid authentication header. <Attribute field={'UNAUTHORIZED'} kind={<a href={'#network-error'}>NETWORK}> The request has credentials that do not provide the required authorization to execute an operation. <Attribute field={'INTERNAL_SERVER_ERROR'} kind={<a href={'#network-error'}>NETWORK}> An unspecified error occurred.