title	description
Performance Benchmarking	Chalk serves features efficiently with high throughput and low latency

Background

Chalk is built for applications that require low latency and high throughput online feature serving. Consequently, we benchmark our feature pipeline service in a variety of scenarios that are representative of real-world use cases.

Achieving maximum performance requires constructing feature pipelines that avoid calls to inefficient or unpredictable external storage systems. For applications that are the most sensitive to latency, we recommend ensuring that all features are resolved from data fetched from the Chalk online store.

As background for these benchmarks, we'll define a few terms:

Online Store: Chalk uses an online store to serve eligible pre-computed feature values to queries. This storage layer is optimized for extremely low latency and high throughput.
Resolver: A resolver is a Python function defined by a Chalk user, which may execute in order to compute features to satisfy queries.
Feature: A feature is a single point of data, computed by resolvers. A feature may specify a max staleness to become eligible for serving from the online store.

Chalk uses a horizontally scalable stateless compute tier that responds to queries submitted by API clients. Each online query is satisfied by executing resolvers in the compute tier, potentially in combination with requests to the online store.

Benchmark Setup

All benchmarks were executed using k6, an open-source load testing framework. All statistics reported are "end-to-end" over the network -- i.e., they are representative of the performance that would be observed by an actual API client.

K6 was configured to use many concurrent "virtual users" which submitted requests conforming to the scenario's parameters sequentially as fast as possible. The compute instances were pre-warmed prior to the beginning of each scenario.

Machine Setup

All tests were run against a deployment consisting of:

Compute instances: 8vCPUs, 16gb of memory. Deployed on GKE.
Online store: Cloud Memorystore (Redis). Single node. 16GB of memory.

All machines are co-located in the same GCP region, and queries were submitted from a single GCE instance with 8 CPUs and 16 GB of memory.

Benchmarking data and query patterns

All benchmarks were executed as queries against a set of 25 resolvers resolving 100 features. Feature data was generated randomly.

Queries sampled features randomly from the "pre-computed" feature class, which consists of data stored in the online store, and from the "on-the-fly" set, which consists of features that are never stored and instead always computed in response to queries.

The online store was pre-warmed to simulate a use-case where features are pre-ingested via cron scheduling or streaming resolvers.

Benchmarking Results

Scenario 1: Fetching pre-computed features

For the most latency-sensitive applications, we recommend serving all data from the online store. In this access pattern, Chalk pulls the latest value of a feature that meets your maximum staleness requirements, as determined by the maximum staleness on the feature, and any overrides to the max staleness for a particular request. Note that this value can be set to infinity when you want to serve a stored value, no matter how old. Cron and streaming are helpful tools for pre-computing the data in the online store.

Statistic	Observed Value
Mean	15.34ms
Median	13.49ms
P95	20.8ms
P99	28.3ms
Mean QPS	8146 requests/second
Error rate	0%

In this scenario, Chalk sustained roughly 8150 requests per second with 0 errors and overall 28.3ms P99 latency. Latency and throughput were consistent throughout the test.

Scenario 2: Computing features on-the-fly

We recommend pre-computing or pre-fetching features that are 'expensive' to resolve on-the-fly, but simple Python computations can be executed efficiently on-demand even in latency sensitive scenarios. This flexibility makes it easy to iterate on feature computation and evolve feature pipelines in response to evolving business needs without having to execute large migrations.

In this scenario, we tested Chalk's ability to execute a high volume of simple Python resolvers. No queries were served from the online feature store.

Statistic	Observed Value
Mean	12.83ms
Median	11.3ms
P95	16.22ms
P99	23.891ms
Mean QPS	9337 requests/second
Error rate	0%

In this scenario Chalk sustained over 9,300 QPS with a p99 latency of less than 24ms. Interestingly, the performance during the 'on-the-fly' computation scenario was even better than the performance during the previous 'pre-computed' scenario. This is reasonable when the computation that Chalk needs to perform is simple: the online storage tier is very fast, but executing pure Python can be even faster than making network round-trips to storage in some cases.

Scenario 3: Mixture of on-the-fly & pre-computed

Of course, most use-cases require a mix of on-the-fly and pre-computed computed feature values. We configured this scenario as a combination of Scenario 1 and Scenario 2. We submitted a 50/50 blend of requests for pre-computed features and features which needed to be computed on-the-fly.

This demonstrated Chalk's flexibility and ability to intermingle CPU and IO-bound work -- requests that could be served purely by compute were interleaved with requests that required Chalk's engine to make network requests to resolve the cached values.

Statistic	Observed Value
Mean	14.01ms
Median	12.52ms
P95	18.96ms
P99	29.49ms
Mean QPS	8742 r/s
Mean Pre-computed QPS	4079 r/s
Mean On-the-Fly QPS	4662 r/s
Error rate	0%

In this scenario Chalk sustained over 8740 QPS with an overall p99 latency of 29ms. This shows that Chalk is able to serve a blend of query patterns efficiently.

Conclusion

These benchmark results demonstrate that Chalk is capable of serving high feature volumes efficiently in both pre-computed and on-the-fly scenarios.

While sustaining between 8150 QPS and 9,350 QPS, Chalk achieved a 100% success rate, and in all scenarios achieved a p99 latency of less than 30ms. Chalk is currently well-suited to demanding applications that require significant throughput and minimal latency, and we're constantly working to improve performance even further.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

benchmarks.mdx

benchmarks.mdx

Background

Benchmark Setup

Machine Setup

Benchmarking data and query patterns

Benchmarking Results

Scenario 1: Fetching pre-computed features

Scenario 2: Computing features on-the-fly

Scenario 3: Mixture of on-the-fly & pre-computed

Conclusion

Files

benchmarks.mdx

Latest commit

History

benchmarks.mdx

File metadata and controls

Background

Benchmark Setup

Machine Setup

Benchmarking data and query patterns

Benchmarking Results

Scenario 1: Fetching pre-computed features

Scenario 2: Computing features on-the-fly

Scenario 3: Mixture of on-the-fly & pre-computed

Conclusion