Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Notify users of shaper "misses" #2776

Closed
philrz opened this issue Mar 18, 2021 · 2 comments · Fixed by #4686
Closed

Notify users of shaper "misses" #2776

philrz opened this issue Mar 18, 2021 · 2 comments · Fixed by #4686

Comments

@philrz
Copy link
Contributor

philrz commented Mar 18, 2021

#2369 and its fix in #2370 describe an example where the shaper currently imports additional fields that are not covered by the shaping config, assigning them inferred types. In that case it resulted in an IP address being stored as a string rather than an ip type, which has the follow-on effect of type-specific operations not working (e.g. CIDR matches in this case).

This behavior was last visited in #2309. Before that change, such additional fields would have been "cropped". After that change, now they're let in with inferred types. If choosing between these two options, indeed the current "infer" approach seems slightly better, but from a user perspective, they both seem sub-optimal: Whether the unexpected fields are lost entirely or stored incorrectly, I imagine the user will want to know what happened so they can adjust their queries, ask us for help, etc. Before #2298 was merged, the legacy types.json approach at least gave the user a heads-up in the form of a warning ("Unexpected additional field(s) 'tunnel.src_ip' 'tunnel.src_port' 'tunnel.dest_ip' 'tunnel.dest_port' 'tunnel.proto' 'tunnel.depth'. Please email this error message to support@brimsecurity.com.") which is how we got a heads-up about #2369 in the first place. But now we've lost that, hence this issue.

It seems the first order of business will be to just have the warnings generated by zqd and presented to the user in Brim. I expect the ultimate solution will be when we have a fairly rich shaper UX that will allow the user to react to the warnings themselves, which means them being able to see/modify/apply their own shaper configs. But as long as they're stuck with static shapers that we provide, some kind of notification like we had before feels like it would be an improvement.

brim-bot referenced this issue in brimdata/zui Mar 18, 2021
This is an auto-generated commit with a zq dependency update. The zq PR
brimdata/zed#2370, authored by @philrz,
has been merged.

Add tunnel fields to Suricata shaper

brimdata/zed#2369 describes a problem where missing NDJSON typing config for rarely-seen Suricata fields generated a user-facing warning. This PR adds the additional config to the Suricata shaper to cover these fields.

Our switchover from legacy `types.json` to the new shaping configs in the time since the user reported the issue actually changes the symptom, so before I show the fix having its impact, I'll take a step back and show how the symptom was looking as of Brim commit 8569148 which pointed at `zqd` commit 8ab4607. In that case, because of the change in brimdata/zed#2309, the user no longer was shown a warning, but the additional fields were silently imported with inferred types. So for instance, notice how the `tunnel` fields are shown in black font that's used for strings, rather than the blue that's used for the `ip` type fields.

![](https://user-images.githubusercontent.com/5934157/111556887-d2dc9a80-8748-11eb-9a86-329b652f1c1a.png)

More importantly, if the user doesn't notice this subtle difference and tries to treat them like the IP addresses they otherwise _appear_ to be, they get unexpected behavior, such as this failing CIDR match.

![](https://user-images.githubusercontent.com/5934157/111557022-1afbbd00-8749-11eb-9e7e-8258739777fe.png)

Of course, if they catch the problem and do the cast at query time, they can fix it. But this is what we want the shaper to do, hence the changes in this PR.

![](https://user-images.githubusercontent.com/5934157/111557058-2b139c80-8749-11eb-9ca2-746635067188.png)

If I start from the same Brim commit 8569148 and advance my `zq` pointer to 6613aba to use the shaper in this branch, then re-import, now we get the correct typing out of the gate.

![](https://user-images.githubusercontent.com/5934157/111557351-bc830e80-8749-11eb-897c-b43862ed9d54.png)

![](https://user-images.githubusercontent.com/5934157/111557392-cf95de80-8749-11eb-9d99-144e8479d31a.png)

A couple additional details:

1.  The sample data I'm using from brimdata/zed#2369 doesn't happen to generate the `tunnel.src_port` and `tunnel.dest_port` fields in the Suricata EVE JSON. However, since the original user's issue showed these being generated, I've included them here in the shaping config.
2.  Since we've been through so many example pcaps before seeing Suricata generates this field, some may wonder if it makes sense to make the current single schema even wider to always make room for them as opposed to letting multiple shapes get created. However, we've had users recently be confused by the disappearing columns that result from unexpected multiple shapes (#967). Therefore, until we have new UX in Brim that makes it easier for users to browse their data in such situations and/or also apply+modify their own shaper modifications, I'm keen to maintain the single schema.
3.  While this now covers us for all Suricata fields we're aware of for `alert` events, the fact the user was not being informed these `tunnel` fields were being imported with inferred types seems hazardous. I've opened separate issue brimsec/brim#1519 regarding this, as it seems like something we'll need to address in the wider context of shaper UX.

 Fixes brimdata/zed#2369
@philrz
Copy link
Contributor Author

philrz commented Mar 31, 2021

@mccanne had a slightly different take when reviewing this one. In addition to "crop" or "infer" he'd also imagined yet another approach where shaper "misses" are still stored in the lake but are marked as "unclassified" (e.g. tagged in a way that reflects that it was not a perfect match for a shaper rule and hence might need to be dealt with). This seems to imply workflows where users could periodically browse the unclassified data in their lake and re-shape/re-store it, delete it, etc. That said, I still imagine users may want to know ASAP if this has happened so they're aware that such data is present to be browsed, so the idea of notifications still seems to apply.

@philrz philrz transferred this issue from brimdata/zui Jun 1, 2021
@philrz philrz added this to the Data MVP1 milestone Jun 1, 2021
@philrz philrz removed this from the ETL Lake milestone Oct 25, 2022
@philrz
Copy link
Contributor Author

philrz commented Jun 29, 2023

First-class errors have been a part of the language for some time and provide an approach similar to the one attributed to @mccanne in the last comment above. Linked PR #4686 added some user-facing docs that describe how to make use of Zed errors to get the kind of notice of shaper "misses" as described in this issue. The docs show multiple approaches based on how severe the user thinks a shaper "miss" is for their use case, i.e., whether an entire record becomes an error value or just individual problematic fields. The subject of how user could be "notified" when such errors accumulate is fairly orthogonal, but mechanisms already exist such as running frequent has_error() checks (as proposed in the docs) on recently-loaded data.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant