title: Online Queries
description: Fetch feature values via online queries

import { TipInfo } from '@/components/Tip'


Online queries access or compute feature values for a single feature set in real time. Here, "real time" means that responses should seem practically instantaneous, even when features must be computed on new data.

Online queries do more than retrieve data: they also store the results of the features that they compute. This provides visibility into, and long-term tracking of, the features you are generating. In this section, we provide a high-level overview of how online queries compute and record their outputs.

How Do Online Queries Work?

Chalk responds to online queries by getting and executing a query plan.

Getting a Query Plan

A query plan is a set of tasks, some of which can be executed in parallel, that together produce a target output. As a simplification, you can think of your resolvers as a subset of these tasks. Consider the following feature class and resolvers:

from chalk import online
from chalk.features import features

@features
class User:
  id: int
  name: str
  is_palindrome: bool
  is_short: bool
  is_short_palindrome: bool

# True when the name reads the same forwards and backwards.
@online
def get_is_palindrome(name: User.name) -> User.is_palindrome:
  return name == name[::-1]

# True when the name has fewer than three characters.
@online
def get_is_short(name: User.name) -> User.is_short:
  return len(name) < 3

# Depends on the outputs of the two resolvers above.
@online
def get_is_short_palindrome(is_short: User.is_short, is_palindrome: User.is_palindrome) -> User.is_short_palindrome:
  return is_short and is_palindrome

If you were to assemble a dependency graph for these features, you would wind up with something like the following:

┌───────────┐
│name       │
└┬─────────┬┘
┌▽───────┐┌▽────────────┐
│is_short││is_palindrome│
└┬───────┘└┬────────────┘
┌▽─────────▽────────┐
│is_short_palindrome│
└───────────────────┘
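
The graph above can be turned into an ordered set of stages with a topological sort. The following is a minimal sketch using only the Python standard library, not Chalk's planner: the `DEPENDENCIES` map and the `plan_stages` helper are hypothetical names introduced for illustration. Features that land in the same stage do not depend on each other, so they could be computed in parallel.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map mirroring the diagram:
# each feature maps to the set of features it needs.
DEPENDENCIES = {
    "name": set(),
    "is_short": {"name"},
    "is_palindrome": {"name"},
    "is_short_palindrome": {"is_short", "is_palindrome"},
}

def plan_stages(dependencies):
    """Group features into stages; features within a stage are independent."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only to make the output stable
        stages.append(ready)
        sorter.done(*ready)
    return stages

for number, stage in enumerate(plan_stages(DEPENDENCIES), start=1):
    print(number, stage)
```

Here `is_short` and `is_palindrome` end up in the same stage because both depend only on `name`.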

When you run an online query, such as:

chalk query --in user.id=1 --in user.name=bob --out is_short_palindrome

Chalk constructs a plan for how to "solve" this query. The plan depends on which features are supplied and requested, not on their values, so it is viable for other queries with the same input and output features.

Running chalk query with the --explain flag outputs your query plan.

The following query can reuse the plan generated by the one above:

chalk query --in user.id=2 --in user.name=bartholomew --out is_short_palindrome

Even though the input values are different, and the output value may be as well, the query plan remains valid.

Executing a Query Plan

As illustrated above, a query plan is not a linear sequence of tasks that must be executed one after another: a lot of work can often be performed in parallel.

After getting a query plan, Chalk distributes subtasks to workers, applies a number of optimizations to your resolvers and data source connections, and computes the target outputs of your query. These outputs are then returned.
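
To make the idea of staged, parallel execution concrete, here is a small sketch that runs the example resolvers as plain Python functions on a thread pool. It illustrates the general technique only: the `PLAN` structure and the `execute` helper are hypothetical, and Chalk's actual workers and optimizations are not modeled.

```python
from concurrent.futures import ThreadPoolExecutor

# Plain-Python stand-ins for the example resolvers (no Chalk decorators).
def get_is_palindrome(name):
    return name == name[::-1]

def get_is_short(name):
    return len(name) < 3

def get_is_short_palindrome(is_short, is_palindrome):
    return is_short and is_palindrome

# Hypothetical plan: each stage lists (output feature, resolver, input features).
PLAN = [
    [
        ("is_short", get_is_short, ["name"]),
        ("is_palindrome", get_is_palindrome, ["name"]),
    ],
    [
        ("is_short_palindrome", get_is_short_palindrome, ["is_short", "is_palindrome"]),
    ],
]

def execute(plan, inputs):
    """Run the tasks of each stage concurrently, then move on to the next stage."""
    values = dict(inputs)
    with ThreadPoolExecutor() as pool:
        for stage in plan:
            futures = {
                output: pool.submit(resolver, *(values[arg] for arg in args))
                for output, resolver, args in stage
            }
            # Wait for the whole stage before starting tasks that depend on it.
            for output, future in futures.items():
                values[output] = future.result()
    return values

print(execute(PLAN, {"name": "bob"}))
```

The second stage starts only once both first-stage results are available, which is the ordering constraint the dependency graph expresses.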

Caching and Writing

Online queries write computed values to two places: the offline store and the online store. However, computed features are only written to the online store if they have a caching policy. The online store is used to circumvent recomputation of expensive features that are either unlikely to have changed or can tolerate slightly stale values. Properly configuring the caching policies for your features can make your online queries significantly more efficient.

If you haven't specified a caching policy, Chalk recomputes the values for a feature each time it is requested. We go into more depth on query caching in a later section.
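
The write behavior described above can be modeled with a toy store. This is a sketch under stated assumptions, not Chalk's implementation or API: `FeatureStoreSketch`, its `max_staleness` mapping (seconds per feature), and the `compute` callback are all hypothetical. Every computed value is appended to an offline log, while only features with a caching policy are written to, and served from, the online store.

```python
import time

class FeatureStoreSketch:
    """Toy model of the two stores; names and structure are illustrative only."""

    def __init__(self, max_staleness):
        # feature name -> allowed staleness in seconds; absent means no caching policy
        self.max_staleness = max_staleness
        self.online = {}   # (feature, key) -> (value, computed_at)
        self.offline = []  # append-only log of every computed value

    def get(self, feature, key, compute, now=None):
        now = time.monotonic() if now is None else now
        ttl = self.max_staleness.get(feature)
        cached = self.online.get((feature, key))
        if ttl is not None and cached is not None and now - cached[1] <= ttl:
            return cached[0]  # fresh enough: skip recomputation
        value = compute(key)
        self.offline.append((feature, key, value, now))  # always recorded offline
        if ttl is not None:
            self.online[(feature, key)] = (value, now)   # cached only with a policy
        return value
```

In this model, a feature listed in `max_staleness` is recomputed only after its cached value has aged past the limit, whereas a feature without an entry is recomputed on every request.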