This repository has been archived by the owner on Apr 1, 2024. It is now read-only.

print-types.zeek: handle nesting more than two levels deep #15

Closed
philrz opened this issue Apr 16, 2020 · 1 comment
Labels
bug Something isn't working

Comments

@philrz

philrz commented Apr 16, 2020

This is a known limitation that is documented in print-types.zeek, but hadn't made its way over into this bug tracking system. The short summary is that the print-types script doesn't handle more than two levels of nesting. I had initially run into this with the openflow log, and here @philrz notes something similar with the smb_cmd log.

Repro is with the print-types.zeek script in Brim's fork of Zeek at commit e35de70 and zq commit 0397e42.

As of Zeek release v3.1.2, the smb_cmd.log is not generated by default. However, it can be brought to life by adding this line to local.zeek on a "stock" config:

@load policy/protocols/smb/log-cmds

Now we generate a typing config (attached):

# ZEEK_ALLOW_INIT_ERRORS=1 /usr/local/zeek-3.1.2/bin/zeek print-types.zeek /usr/local/zeek-3.1.2/share/zeek/site/local.zeek | tail +2 | jq | python -m json.tool > ~/work/zq/zeek/types-with-smb-cmd.txt

If I attempt to use it with zq commit 0397e42, it is rejected.

# zq -j ~/work/zq/zeek/types-with-smb-cmd.txt *
syntax error parsing type string

@henridf and I discussed this some time ago and noted that the cause of the error seems to lie within the referenced_file portion. If I remove that portion, the config is accepted:

# diff types-with-smb-cmd.txt types-with-smb-cmd-fixed.txt
2554,2598d2553
<                 "name": "referenced_file",
<                 "type": [
<                     {
<                         "name": "ts",
<                         "type": "time"
<                     },
<                     {
<                         "name": "uid",
<                         "type": "bstring"
<                     },
<                     {
<                         "name": "id",
<                         "type": "record conn_id"
<                     },
<                     {
<                         "name": "fuid",
<                         "type": "bstring"
<                     },
<                     {
<                         "name": "action",
<                         "type": "zenum"
<                     },
<                     {
<                         "name": "path",
<                         "type": "bstring"
<                     },
<                     {
<                         "name": "name",
<                         "type": "bstring"
<                     },
<                     {
<                         "name": "size",
<                         "type": "uint64"
<                     },
<                     {
<                         "name": "prev_name",
<                         "type": "bstring"
<                     },
<                     {
<                         "name": "times",
<                         "type": "record SMB::MACTimes"
<                     }
<                 ]
<             },
<             {
# zq -j ~/work/zq/zeek/types-with-smb-cmd-fixed.txt *
#0:record[_path:string,ts:time,peer:bstring,mem:uint64,pkts_proc:uint64,bytes_recv:uint64,pkts_dropped:uint64,pkts_link:uint64,pkt_lag:duration,events_proc:uint64,events_queued:uint64,active_tcp_conns:uint64,active_udp_conns:uint64,active_icmp_conns:uint64,tcp_conns:uint64,udp_conns:uint64,icmp_conns:uint64,timers:uint64,active_timers:uint64,files:uint64,active_files:uint64,dns_requests:uint64,active_dns_requests:uint64,reassem_tcp_size:uint64,reassem_file_size:uint64,reassem_frag_size:uint64,reassem_unknown_size:uint64,_write_ts:time]
0:[stats;1425565512.943615;bro;87;58;20905;-;-;-;410;13;0;0;1;0;0;1;36;31;0;0;0;0;0;0;0;0;1425565512.943615;]
...
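
The manual edit shown in the diff can also be scripted. The following is a minimal sketch in Python, not part of print-types.zeek or zq. It assumes only the layout visible in the diff: each field is an object with "name" and "type" keys, and an inline record is a field whose "type" is a list of such objects. The function names and the "descriptors" wrapper in the sample are hypothetical.

```python
import json


def _is_inline_record(item, field_name):
    """True if item is a field called field_name whose type is an inline list of fields."""
    return (
        isinstance(item, dict)
        and item.get("name") == field_name
        and isinstance(item.get("type"), list)
    )


def drop_inline_record(node, field_name):
    """Return a copy of node with every inline-record field named field_name removed."""
    if isinstance(node, list):
        return [
            drop_inline_record(item, field_name)
            for item in node
            if not _is_inline_record(item, field_name)
        ]
    if isinstance(node, dict):
        return {key: drop_inline_record(value, field_name) for key, value in node.items()}
    return node


# Hypothetical sample that mirrors the field layout shown in the diff.
sample = {
    "descriptors": {
        "smb_cmd_log": [
            {"name": "ts", "type": "time"},
            {
                "name": "referenced_file",
                "type": [
                    {"name": "ts", "type": "time"},
                    {"name": "id", "type": "record conn_id"},
                ],
            },
        ]
    }
}

fixed = drop_inline_record(sample, "referenced_file")
print(json.dumps(fixed, indent=4))
```

For a real file, the same function could be applied to the result of `json.load` and the output written back with `json.dump`. Note that this drops the field entirely, just as the manual edit does, so the resulting config no longer describes referenced_file.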

The comments in print-types.zeek imply there are limits to how it deals with recursion, so perhaps nested records such as id inside referenced_file are a source of trouble.
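
One way to check a generated typing config for this pattern is to look for fields inside an inline record whose type is still an unexpanded "record <name>" string, as with id and times in the diff. The sketch below is in Python and rests on assumptions: it relies only on the "name"/"type" layout visible in the diff, the function name and the "descriptors" wrapper in the sample are hypothetical, and treating these strings as the cause of the parse error is the hypothesis from this issue rather than a confirmed diagnosis.

```python
def unresolved_records(node, path=()):
    """Yield (dotted_path, type_string) for each field that sits inside an
    inline record and whose type is an unexpanded "record <name>" string."""
    if isinstance(node, list):
        for item in node:
            yield from unresolved_records(item, path)
    elif isinstance(node, dict):
        name, ftype = node.get("name"), node.get("type")
        if isinstance(name, str) and isinstance(ftype, list):
            # Inline record: descend with this field's name added to the path.
            yield from unresolved_records(ftype, path + (name,))
        elif isinstance(name, str) and isinstance(ftype, str):
            # Leaf field: report it only when nested and still unexpanded.
            if path and ftype.startswith("record "):
                yield ".".join(path + (name,)), ftype
        else:
            # Not a field object: keep walking whatever it contains.
            for value in node.values():
                yield from unresolved_records(value, path)


# Hypothetical sample that mirrors the field layout shown in the diff.
sample = {
    "descriptors": {
        "smb_cmd_log": [
            {"name": "ts", "type": "time"},
            {
                "name": "referenced_file",
                "type": [
                    {"name": "ts", "type": "time"},
                    {"name": "id", "type": "record conn_id"},
                    {"name": "times", "type": "record SMB::MACTimes"},
                ],
            },
        ]
    }
}

found = list(unresolved_records(sample))
for dotted, type_string in found:
    print(dotted, "->", type_string)
```

Any path this reports is a candidate for the manual removal described above, or for expanding by hand.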

Since this particular Zeek log isn't even enabled by default, this probably needn't be a high priority. However, the same symptom might be lurking among other Zeek logs that aren't on by default, so we may want to give it some consideration before too long: use of zq with Zeek JSON is likely to start soon, and we expect to guide users through running print-types.zeek as needed to generate their own custom schemas.

@philrz philrz added the bug Something isn't working label Apr 16, 2020
@henridf henridf changed the title print-types.zeek: unusable typing config for Zeek's smb_cmd log print-types.zeek: handle nesting more than two levels deep Apr 24, 2020
@henridf henridf added bug Something isn't working and removed bug Something isn't working labels Apr 24, 2020
@henridf henridf removed their assignment Apr 27, 2020
brim-bot pushed a commit to brimdata/zui that referenced this issue Apr 3, 2021
…ilrz

This is an auto-generated commit with a Zed dependency update. The Zed PR
brimdata/super#2489, authored by @philrz,
has been merged.

Add openflow logs to reference Zeek shaper

While checking if logs generated by Zeek v4.0.0 and v3.2.4 are still compatible with our published reference configs, I was reminded of the existence of brimdata/zeek#15 which was preventing the auto-generation of legacy `types.json` config that covered Zeek's `openflow` logs. Since the future lies in Zed shapers, however, I looked into what it would take to cover the logs in that approach instead.

I found an example Zeek openflow log at [https://github.com/zeek/zeek/blob/master/testing/btest/Baseline/scripts.base.frameworks.openflow.log-basic/openflow.log](https://github.com/zeek/zeek/blob/master/testing/btest/Baseline/scripts.base.frameworks.openflow.log-basic/openflow.log) and replaced its X'ed out timestamps with real ones, producing attached test file [openflow.log.gz](https://github.com/brimdata/zed/files/6251146/openflow.log.gz). From that it was easy to grab a type definition that's added in the shaper in this PR (zq commit `e113e89e` is currently in use):

```
$ zq -Z 'count() by typeof(.)' openflow.log.gz
{
    typeof: ({_path:string,ts:time,dpid:uint64,match:{in_port:uint64,dl_src:bstring,dl_dst:bstring,dl_vlan:uint64,dl_vlan_pcp:uint64,dl_type:uint64,nw_tos:uint64,nw_proto:uint64,nw_src:net,nw_dst:net,tp_src:uint64,tp_dst:uint64},flow_mod:{cookie:uint64,table_id:uint64,command:zenum=(string),idle_timeout:uint64,hard_timeout:uint64,priority:uint64,out_port:uint64,out_group:uint64,flags:uint64,actions:{out_ports:[uint64],vlan_vid:uint64,vlan_pcp:uint64,vlan_strip:bool,dl_src:bstring,dl_dst:bstring,nw_tos:uint64,nw_src:ip,nw_dst:ip,tp_src:uint64,tp_dst:uint64}}}),
    count: 7 (uint64)
} (=0)
```

This actually turns out to be a handy log to test with because it happens to be the first log type we've encountered that actually uses the Zeek `subnet` type, which of course translates to the Zed `net` type. Testing out the modified shaper on an NDJSON version of the log reminded me that brimdata/super#2113 is still an open issue, so these subnets will have to be content to live as strings for the moment.

```
$ zq -z -I ~/work/zed/zeek/shaper.zed openflow.ndjson
cast to net not implemented
{_path:"openflow",dpid:42,flow_mod:{actions:{dl_dst:null (null),dl_src:null (null),nw_dst:null (null),nw_src:null (null),nw_tos:null (null),out_ports:[3,7],tp_dst:null (null),tp_src:null (null),vlan_pcp:null (null),vlan_strip:false,vlan_vid:null (null)} (=0),command:"OpenFlow::OFPFC_ADD",cookie:4398046511105,flags:0,hard_timeout:0,idle_timeout:0,out_group:null (null),out_port:null (null),priority:0,table_id:null (null)} (=1),match:{dl_dst:null (null),dl_src:null (null),dl_type:null (null),dl_vlan:null (null),dl_vlan_pcp:null (null),in_port:null (null),nw_dst:null (null),nw_proto:null (null),nw_src:null (null),nw_tos:null (null),tp_dst:null (null),tp_src:null (null)} (=2),ts:"1970-01-01T00:00:00Z"} (=3)
{_path:"openflow",dpid:42,flow_mod:{actions:{dl_dst:null (null),dl_src:null (null),nw_dst:null (null),nw_src:null (null),nw_tos:null (null),out_ports:[] (4=([null])),tp_dst:null (null),tp_src:null (null),vlan_pcp:null (null),vlan_strip:false,vlan_vid:null (null)} (=5),command:"OpenFlow::OFPFC_ADD",cookie:4398046511147,flags:0,hard_timeout:0,idle_timeout:30,out_group:null (null),out_port:null (null),priority:5,table_id:null (null)} (=6),match:{dl_dst:null (null),dl_src:null (null),dl_type:2048,dl_vlan:null (null),dl_vlan_pcp:null (null),in_port:null (null),nw_dst:"74.53.140.153/32",nw_proto:6,nw_src:"10.10.1.4/32",nw_tos:null (null),tp_dst:25,tp_src:1470} (=7),ts:"2018-03-24T17:15:21.255387Z"} (=8)
...
```

I also found a couple little formatting glitches in the reference shaper config that I fixed along the way.
@philrz
Author

philrz commented Apr 1, 2024

The kind of shaping config output from this Zeek script is no longer supported in Zed, so it seems unlikely we'd ever go back and add this enhancement. While I've found this script still sometimes useful for getting a quick summary of additional default log types added in new Zeek versions, I've grown accustomed to performing the minimal, manual surgery to graft on deeply nested fields in modern Zed type definitions. Therefore I'm going to close out this issue.

@philrz philrz closed this as not planned Apr 1, 2024