The Zed Project: Stumbling Upon a New Data Model while Hacking on Packets
This README comprises a presentation given by Steve McCanne at Sharkfest '21, 8-9am on September 17, 2021. You can reproduce all the examples herein from the tools referenced and files in this repo. This is Part 2 of Steve's talk from 10 years ago at Sharkfest '11.
If you've ever tried to assemble operational infrastructure for search and analytics of network-oriented traffic logs, you know what a daunting task this can be. Roughly three years ago, we embarked upon a project to explore search and analytics for Zeek and Suricata "sensors" running on live network taps or over archived PCAP files. Having experimented extensively with well-known, open-source search and analytics systems and after talking to a wide range of practitioners of such tech stacks, we noticed a recurring and compelling design pattern: a search cluster is often deployed to hold recent logs for interactive queries, while some sort of data lake is deployed in parallel to hold historical data for batch analytics.
We wondered if this bifurcation between search and analytics was fundamental or if we could tackle this problem with a different approach. We eventually concluded these silos arise, at least in part, from the bifurcation of the underlying data models themselves: search systems typically use the schema-less model of JSON while analytics systems use more structured formats like Parquet (or relational tables), enforcing one-or-the-other design decisions.
In this talk, I'll describe a new data model called Zed -- designed to unify the document model of JSON with the relational model of databases -- where our ultimate goal is to converge search, analytics, and ETL. I will then discuss how we've leveraged the Zed data model in a new query engine that operates over Zed data instead of JSON objects or relational tables, and admits a new query language that is a superset of SQL and log-search style languages. Finally, I'll outline our "work in progress" adapting the Zed system to a Git-like data lake for cloud storage -- called a Zed lake -- providing time travel, live ingest, search indexes, and transactionally consistent views across distributed workers.
Feel free to follow along at github.com/brimdata/sharkfest-21
Drag demo.pcap into the Brim app to start the import...
Thanks to Erik Hjelmvik of Netresec for granting permission to distribute this pcap, and thanks also to his employer FM CERT at the time it was created.
Heads up and joke about video... screen may lag behind voice.
- Some ancient history: PCAP, BPF, tcpdump
- Ten years ago: Stanford Sharkfest '11 and Riverbed
- Present: Brim and Zed
- Zed: stumbling on a new data model through PCAP hacking
- Like Crockford and JSON, Zed was discovered not invented
About 18 months ago, we got some early feedback from smart people...
Steve... the world doesn't need another data model.
Steve... the world REALLY doesn't need another query language. Use SQL.
Steve... no one cares about your tech. What problem are you solving?
- I was stubborn and persevered.
- I couldn't really articulate it yet, but I felt we were onto something.
- We're just getting to the point where we can rationalize it all...
Despite the pushback, we built useful stuff and have anecdotal validation from our user community that we're doing something right...
"Once my data is in Zed, everything is easy..." - Community User
Another community user tweeted:
Underneath all this is a gut instinct:
Brim+Zed is bigger than just an app and search experience. There's something up with the data model we found underneath it all. We think Zed is all about ergonomics for easier data engineering.
Our audacious goal: Zed will do for data lakes what JSON did for APIs.
Zed and Brim have become our research vehicle for exploring the data model.
- Zed & Brim are open source (BSD license)
- Search-like experience optimized for Zeek and Suricata
- (quick demo of pcap drag into Brim)
While the PCAP is loading, here is the wiring behind the scenes...
- Data organized into "pools" like Mongo collections
- brimcap bundles integrations for Zeek and Suricata
- brimcap builds a PCAP index
- brimcap loads processed logs into a Zed pool
- Brim interacts with brimcap for click-to-packets
- Brim is
- A search tool
- 160.176.58.77 - search of an IP
- 181.42.0.0/16 - search for IPs in a network
- weird - show me Zeek's weird logs
- alert spam - look for who knows what
- An analytics engine
count() by _path
count() by query
_path=="conn" | bytes:=sum(orig_bytes+resp_bytes) by id.resp_p | sort -r bytes
every 10s count() by _path
- (try different intervals, 24h, 1w)
- This is the query the app uses to create the bar chart.
- A learning tool - beginner doesn't have to know complex queries
- Right-click filter by
- Right-click count by
- Right-click on pivot to logs
- Click on column header
- A security skin
- Click on conn to show Zeek details
- Click on Suricata Alerts query
- A Wireshark navigator - Zeek provides context for PCAP drill-downs
- Click to packets
The people behind Brim and Zed:
- James Kerr
- Mason Fish
- Steve McCanne
- Noah Treuhaft
- Matt Nibecker
- Al Landrum (ex-Brim)
- Henri Dubois-Ferriere (ex-Brim)
- Phil Rzewski
- Amy Ousterhout
- Silvery Fu
- Sylvia Ratnasamy
- Joe Hellerstein
Brim+Zed looks a lot like ELK...
Douglas Crockford: JSON
- Just send a JavaScript data structure to a JavaScript entity
- JSON APIs proliferate: so much easier than XML, SOAP, RPC
- And Node.js arrived on the backend and we had full stack
Shay Banon: Elastic
- Wrap Lucene Java lib in a REST API and shard indexes
- Post JSON docs to API
- Submit JSON search queries to API
It's hard to make things easy.
They did it. Brilliant, easy, simple.
Yet, search alone is usually not enough...
- We saw a recurring design pattern in large-scale deployments
- Need for historical analytics
- Bifurcated search/analytics model
- Search: OpenSearch, Elastic, Splunk
- Unstructured logs or semistructured JSON
- Analytics: in a lake or warehouse
- Relational tables, e.g., ClickHouse, BigQuery, Snowflake
- Parquet files on S3
Schemas are incredibly important
- ETL uses them to clean up data
- Warehouses use relational schemas to organize data across tables
- Elastic uses them to normalize data and recover type richness
But they're a double-edged sword:
Schemas are essential as an organizing policy for your data, but they get in the way as a mechanism for data storage and transport.
The ETL pipeline breaks when something changes
- Someone adds a new plugin to Zeek
- The plugin adds a new field to Zeek "conn logs"
- The ETL doesn't have a schema for that new column
The Zeek log format is type-rich and structured:
#fields ts uid id.orig_h id.orig_p ...
#types time string addr port ...
1521911721.255387 C8Tful1TvM3Zf5x8fl 10.164.94.120 39681 ...
1521911721.411148 CXWfTK3LRdiuQxBbM6 10.47.25.80 50817 ...
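(Jumping ahead a bit to the zq tool we'll meet shortly: since zq reads the Zeek TSV format directly, a log like this converts to Zed with all of its types intact. A quick sketch, assuming a complete conn.log in the format excerpted above:)

zq -z -i zeek conn.log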
Elastic Common Schema (ECS) is a type-rich schema system:
- data is communicated as regular, type-impoverished JSON
- ingest pipelines map JSON to ECS objects
So we have this song and dance:
Writing data to a Parquet file on S3:
- Scan all records in input to determine schema
- Open the Parquet file with the schema
- Write records that conform to schema
Sending structured data over Kafka with Avro:
- Sender has record to send
- Sender contacts the "schema registry" with record's schema and receives ID
- (or it retrieves the ID from a local cache)
- Sender tags data of record with ID
- Sender transmits data on Kafka topic
- Receiver pulls data from Kafka topic
- Receiver contacts schema registry with record's ID to get schema
- (or it retrieves the schema from a local cache)
- Receiver decodes record data using schema
Your business is a complex application with complex data pipelines:
- the data models are all defined as protobuf schemas called protos
- the system components all communicate using protobuf-compiled gRPC endpoints
- every time a model is updated, everything needs to be recompiled and redeployed
In all of the above examples:
- A schema defines the organizing policy for the data, and
- the very same schema constrains the mechanism for encoding the data.
An old design principle says you should separate policy from mechanism ...
Could this be good advice here?
What if the mechanism were self-describing data:
- A comprehensive type system embedded in the data itself
- First-class types for "types as values"
- Entities in the Zed system adapt to data types
- instead of being constrained by a rigid set of schemas
And what if policy were enabled by the flexible Zed type system instead of relying on rigid schemas?
Before we can explain how this will work, let's give a quick tour of the Zed data model.
We'll use our tooling for a quick tour of Zed.
We have taken a very modular, "composable tools" approach
- CLI tools written in Go
- any new functionality developed as CLI command
- fabulous for dev, test, and debug
- bite-sized chunks for learning the system
- most functionality exposed through service API
Like the docker command, everything is packaged under the zed command.
Here are just a few:
- zed query - perform Zed queries and analytics on files and Unix streams
- zed api - execute commands on a remote Zed lake service (or localhost)
- zed lake serve - run a service endpoint for a "Zed lake"

We have a couple of shortcuts:
- zq for zed query (a play on jq, if you know that tool)
- zapi for zed api
We'll use zq to take an input, do no processing, and display it in pretty-printed Zed.
echo '{"hello": "world"}' | zq -Z -
We leveraged the familiarity and simplicity of JSON:
- Zed is a superset of JSON
- The human-readable form of Zed is called ZSON
- We can take JSON/ZSON as input and pretty-print it:
echo '{"s":"hello","val":1,"a":[1,2],"b":true}' | zq -Z -
Note that we drop the quotes from field names and use them only when necessary:
echo '{"funny@name":1}' | zq -Z -
And we're fully compatible with all of JSON's corner cases:
echo '{"":{}}' | zq -Z -
zq supports a bunch of formats, so you can output to JSON and more:
echo '{"s":"hello","val":1,"a":[1,2],"b":true}' | zq -f json -
echo '{"s":"hello","val":1,"a":[1,2],"b":true}' | zq -f csv -
echo '{"s":"hello","val":1,"a":[1,2],"b":true}' | zq -f zeek -
Unlike JSON, Zed has a comprehensive type system.
Here is a ZSON record with a bunch of different types:
zq -Z values.zson
{
v1: 1,
v2: 1.5,
v3: 1 (uint8),
v5: 2018-03-24T17:30:20.600852Z,
v6: 2m30s,
v7: 192.168.1.1,
v8: 192.168.1.0/24,
v9: [1,2,3],
v10: [1(int32),2(int32),3(int32)],
v11: |["PUT","GET","POST"]| (=HTTP_Methods),
v12: |{{"key1","value1"},{"key2","value2"}}|,
v13: { a:1, r:{s1:"hello", s2:"world"}}
}
What we don't do here is define a schema and then fit the values into the schema.
- Data is always self describing.
- No need to declare types explicitly.
- No external schema registry needed.
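For instance, records of completely different shapes can flow through the same pipeline with nothing declared up front. A minimal sketch:

echo '{a:1}{s:"hello"}' | zq -z -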
As in Brim, the Zed query language lets you operate on Zed data:
zq -Z "cut v1,v2,v7,sum:=v1+v2,net:=network_of(v7)" values.zson
Most data structures from any modern language can be marshaled into ZSON without loss of information.
Zed has more types and "type unions", but I'll leave these details out of this talk. This is all documented in the ZSON spec.
- A Zed value always has a type
- Zed types are Zed values
- The typeof operator creates a type value from a value
echo '{s:"hello"}' | zq -Z "typeof_s:=typeof(s)" -
The type of a type value is type type:
echo '{s:"hello"}' | zq -Z "typeof_s:=typeof(s) | typeof_type:=typeof(typeof_s)" -
The special value "this" refers to the current record in a declarative style:
echo '{s:"hello"}' | zq -Z "put copy:=this" -
this is a value, so it always has a type:
echo '{s:"hello"}' | zq -Z "put typeof_this:=typeof(this)" -
Well that's starting to look like a schema...
Compare with JSON/JavaScript:
node
typeof({s:"hello"})
Hmm, another limitation of JavaScript. This is not helpful here.
Let's try something a little more complex:
echo '{name:"Sally",city:"Berkeley",salary:350000.}' | zq -Z "cut typeof_this:=typeof(this)" -
And now, let's name the schema with a Zed typedef:
echo '{name:"Sally",city:"Berkeley",salary:350000.}(=employee)' | zq -Z "cut typeof_this:=typeof(this)" -
Isn't this exactly a relational schema?
This is powerful: we can now get the schema from the data itself and do not have to synchronize data values with some external schema definition.
So this feels like a promising direction to converge the document and relational models.
Many relational tables start out as CSV.
Let's make some sample data:
- Take some CSVs, clean them, and form ZSON
- Tack on some junk that doesn't fit in junk.zson
- Import the "pile of stuff" into Brim and play around with it
cat employees.csv
cat deals.csv
cat junk.zson
zq -z -i csv "type deal = {name:string,customer:string,forecast:float64}; this:=cast(this,deal)" deals.csv > pile.zson
zq -z -i csv "type employee = {name:string,city:string,phone:string,salary:float64}; phone:=int64(phone) | this:=cast(this,employee)" employees.csv >> pile.zson
zq -z junk.zson >> pile.zson
Ok, let's get this file to Brim.
Instead of dragging it in, we'll go through the API with zapi.
Remember there is a zed lake serve process running to support Brim.
zapi create PileOfStuff
zapi use PileOfStuff@main
zapi load pile.zson
The "zapi use" command is like "git checkout".
Brim doesn't really know what to make of it.
- It doesn't look like Zeek or Suricata
- It doesn't have timestamps
- There's no single schema to organize it in a table.
But we can query it in Brim...
Since I was told no one wants a new query language and I should use SQL, we made Zed a superset of SQL...
SELECT * FROM employee
SELECT * FROM employee | cut typeof(this)
SELECT name,salary FROM employee WHERE salary >= 250000 ORDER by salary DESC
SELECT * FROM deal
SELECT name, sum(forecast) as forecast FROM deal GROUP BY name ORDER BY forecast DESC
SELECT name, union(forecast) as deals, sum(forecast) as total FROM deal GROUP BY name ORDER BY total DESC
SELECT d.customer as customer, d.name AS name, e.phone as phone FROM deal d JOIN employee e ON d.name=e.name
Note all the queries worked just fine even with the junk in the way!
This is because the Zed type defines the table at query time (policy), and the data is not stored in a fixed-schema relational table (mechanism).
The schema
- is neither required to define the data,
- nor does it stop you from putting the data into the system.
James, from our team, summarized it nicely:
So to summarize my understanding, a database has a mechanism to write data to disk. It also has a policy that data in a table must conform to a schema. Therefore to use the “write data to disk” mechanism, that data must conform to the policy of the table’s schema. With Zed, data need not conform to any policy before it gets saved to disk. Then later a policy can deem certain types of data valid based on its shape.
With Zed, we can let the data in before we know its shape.
Because data is "allowed in" without knowing its schema ahead of time, you can use the very same system to explore and discover the "shapes" of data (i.e., their schemas) before you've figured out how to clean it up.
Zed's type system is the key to introspection and "schema discovery".
Key insight from Henri:
Because Zed types are also values, we can put a type anywhere a value goes... in particular, a type can be a group-by key.
count() by typeof(this)
And now we can clearly see the junky shapes mixed in with our tables.
I love this query so much, I have it in my query library as Shapes.
And we can filter the junky values with this:
is(type({a:string,b:string,c:string})) or is(type({message:string}))
So the clean data is
is(type(deal)) or is(type(employee))
Let's put the clean data in a new pool...
zapi create CleanTables
zapi query "from PileOfStuff@main | is(type(deal)) or is(type(employee))" | zapi load -use CleanTables@main -
Now if I look at the shapes in the clean table, I can confirm it worked:
count() by typeof(this)
So let's go back to the PCAP example in the app and run Henri's query:
count() by typeof(this)
Ok, that's really powerful, but it would be more intuitive to see a sample value of each type... you can use the any aggregator!
any(this) by typeof(this) | cut any
We love this so much we call it sample:
sample
You can sample a field too...
sample uid
sample query
sample id.orig_h
But if we go back to the top level, we can see what the data would look like if we tried to fuse the shapes all into one uber-schema using the fuse operator...
sample | fuse
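The same trick works from the command line. As a sketch using the pile.zson file from earlier: fusing the heterogeneous records into one wide shape is what makes them exportable to rigid formats like CSV:

zq -f csv "fuse" pile.zson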
And here is an important insight:
This ultra-wide table with many columns is precisely how data warehouses work (or Parquet files that hold lots of columns). You define a single, very wide schema to hold any possible field that might show up, and as long as your ETL logic can find a slot in this single schema for all the fields of an incoming record, everything is fine. But when a field shows up that doesn't fit, you have problems. (To be fair, data warehouses compress all the null columns efficiently and perform column-oriented analytics really well.)
- No wonder there is such a big gap between the relational model and the document model.
- The two forms of data look nothing alike!
Ok, this separation of policy and mechanism argument sounds great, but how can you make this ZSON text format efficient?
Like JSON, text-based ZSON is horribly inefficient.
And columnar data warehouses are really fast!
- the wide-schema model has evolved over decades
- modern data warehouses use all sorts of clever tricks to make things fast
Of course ZSON can't compete.
But maybe the Zed data model can!
Let's just borrow the good ideas from Avro and Parquet...
- Avro from the Hadoop ecosystem
- Parquet from Google's Dremel paper
But leave out the schema-rigid bits...
- Avro requires a schema for every record or completely uniform records
- or a schema registry as mentioned earlier
- Parquet requires a schema for each file where all records conform to the schema
To this end, we ended up with a family of formats that emulate the efficiency of Avro and Parquet.
But they all conform to the same Zed data model.
ZNG works by including small typedefs in the binary stream every time a type is needed by a record.
These mappings are stored in a table called the "type context".
The type context is
- locally scoped so no need for a schema registry
- mergeable so different streams with different type contexts can be interleaved, and
- concatenatable so streams can easily be processed.
For example,
echo '{a:1}' | zq -f zng - > example1.zng
echo '{s:"hello"}' | zq -f zng - > example2.zng
hexdump -C example1.zng example2.zng
cat example1.zng example2.zng example2.zng example1.zng example2.zng | zq -
Note the ff end-of-stream marker.
Armed with the type context, we can create ZST files where the columnar layout is
- not defined by a single, wide schema created before writing to the file,
- but rather self-organizes around the types in the type context on the fly.
For example,
cat pile.zson
zq -f zst pile.zson > pile.zst
hexdump -C pile.zst
You can see the column layout in the hexdump output.
More importantly, the data model is exactly the same across format families, so we can boomerang through the formats without loss of information...
zq -f zng -i zst pile.zst > pile.zng
zq -z pile.zng > pile-boomerang.zson
diff pile.zson pile-boomerang.zson
Of course, you can't do a boomerang with Parquet...
zq -f parquet pile.zson > pile.parquet
Oops, it needs a single uber-schema, but we can get that with fuse
zq -f parquet "fuse" pile.zson > pile.parquet
But now when we read it, it's not the same!
zq -Z -i parquet pile.parquet
There you have it... the Zed format family
- ZSON (like JSON) - human readable like JSON
- ZNG (like Avro) - performant, binary, compressed record-based format
- ZST (like Parquet) - performant, binary, compressed column-based format
They are all perfectly compatible because they all adhere to the same data model: no loss of information transcoding between formats.
ZNG is typically 5-10X smaller than ZSON/JSON...
zapi query "from demo.pcap" > demo.zng
zq -f ndjson demo.zng > demo.ndjson
ls -lh demo.*
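For comparison, a sketch of the columnar flavor of the same data:

zq -f zst demo.zng > demo.zst
ls -lh demo.zst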
Okay, final chapter of the talk: How do we leverage the Zed data model at scale?
Enter the Zed Lake.
Built on a cloud storage model...
- Objects are stored as ZNG/ZST, immutable, and have globally unique names
- Search indexes are just ZNG objects (with simple b-tree indexing)
- All state changes to the lake view stored in a cloud-based transaction journal
We realized many compelling use cases could be supported by a Git-like model for the Zed lake.
- Create branches
- Work on a branch, test, debug
- Merge branch to main or drop
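In zapi terms, that workflow looks roughly like this (a sketch; the branch name scratch and the file more-data.zson are hypothetical):

zapi branch scratch
zapi use scratch
zapi load more-data.zson
zapi merge main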
The Zed Lake's use cases are malleable through its API
- the Zed service is not a monolithic black box
- rich API with many verbs for automation and tooling
- let's look at a few use cases
It's one thing to get data into a Zed lake, it's another to derive insights and automate that process...
A security example...
- Compute an edge graph of all communicating host pairs
- Add some connection stats
- Look for bad SSL certs
cat graph.zed
filter _path=="conn" OR _path=="ssl"
| summarize
count(_path=="conn"),
maxConnTime:=max(duration),
maxConnBytes:=max(orig_bytes+resp_bytes),
totConnBytes:=sum(orig_bytes+resp_bytes),
totConnTime:=sum(duration),
badSSL:=or(!(validation_status=="ok" OR validation_status==" " OR
validation_status=="" OR validation_status==null))
by id.resp_h,id.orig_h
Because everything is driven off the API, it is easy to run this automation at periodic intervals and populate a data pool with the analysis.
You could do this on a time window, but I'll do the whole pool by hand:
zapi create NetGraph
zapi query -use demo.pcap@main -I graph.zed | zapi load -use NetGraph@main -
(show bad certs in app, click through to logs, then to packets)
What about augmenting data in addition to deriving insights?
Enter join
Say you had a list of badguys.
zq badguys.zson
And you wanted to decorate your logs that had an IP in this list.
First we put the badguys list in its own pool...
zapi create BadGuys
zapi use BadGuys@main
zapi load badguys.zson
(see BadGuys pool in app... it's like a toy threat intel feed.)
Now we can do a join with the logs, but let's test it first on a branch.
cat join-badguys.zed
zapi use demo.pcap@main
zapi branch test
zapi use test
zapi query -I join-badguys.zed | zapi load -
Use log to see that we added the joined data on the branch, and run a test query.
zapi log
zapi query "count() by _path"
and narrow it down to the new records with a simple search...
zapi query "count() by _path | badguy"
Check the app and see that it's not there because it's not in main.
Okay, it looks good, so merge branch test into main!
zapi merge main
zapi use main
zapi log
Now check the app and see the records...
Now suppose, oops, you didn't want to merge that to main after all.
No problem!
zapi log
zapi revert <cut-and-paste-commit-id>
Now search for "badguy" in app... it's gone!
The data is still there. It just was removed by a commit to the log.
So if we grab the old commit ID, we can still see the old data...
zapi log
zapi query "from demo.pcap@<id> | badguy | head 10"
A main/live branching model for streaming pipelines... work in progress.
- Small, batch updates to tip of live branch
  - Many times per second
- Orchestration agent sweeps the live branch into main
  - Compact little data objects into big objects
  - Compute search indexes for new data
  - Merge live into main and rebase live back to tip of merged main
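As a purely hypothetical sketch of what the orchestration agent might boil down to (the real design is the work in progress above):

# hypothetical sweep loop: merge live into main once a minute
while sleep 60; do
  zapi use live      # point at the live branch
  zapi merge main    # sweep its commits into main
done
# (compaction, index building, and the rebase of live are still being designed)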
- Showed you our new app Brim
- Described the Zed data model underneath Brim that we stumbled upon while hacking PCAPs
- Proposed that Zed can
- unify the document and relational models
- elegantly separate the schema-policy from the data-mechanism
- be performant with its record- and columnar-based formats
- Walked through how it all comes together in a Git-like Zed lake
We have a ways to go, but we hope someday that...
Zed will do for data lakes what JSON did for APIs.
It's hard to make things easy...
Do you buy it? Dimitri Does!
If you're interested, please connect with us online
Steve McCanne is the "Coding CEO" at Brim, a small startup working on the open-source Zed Project and a new application called "Brim" that leverages Zed. Back in the days before the Web, Steve worked at the Lawrence Berkeley National Laboratory where he developed BPF, libpcap, the PCAP file format, and the tcpdump language and compiler, while also working on the Real-time Transport Protocol (RTP) for Internet video when the telcos claimed that real-time Internet communication was impossible without end-to-end virtual-circuit guarantees. (Guess who was right?) After a brief stint in academia in the late '90s, Steve crossed over to the dark side, became a tech entrepreneur, and never looked back. He has founded several startups and took his '02 company and Sharkfest's sponsor, Riverbed, public in '06. After many years working in other areas of tech, Steve has returned to his roots, dabbling again with PCAPs, leading him to Zed and a whole new way to approach network and IT observability data.