Generic Histogram #1541

Open
1 task
jameskerr opened this issue Mar 29, 2021 · 3 comments · Fixed by #2785 or #2794
jameskerr (Member) commented Mar 29, 2021

Create a new histogram component to be used when the search results are clearly not "Zeek" or analytics. Keep the old one to render when the resulting schemas contain a "ts" and a "_path" field.

The underlying query will be <current filter> | every <lake key> count() by typeof(.)

  • Use a color mapping to apply to the data shapes automatically
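As a rough illustration of what the proposed query and color mapping compute, here is a minimal in-memory sketch in Python. It is not the app's implementation: it assumes the key field is time-typed, approximates `typeof(.)` with a string built from field names and Python type names, and the names `shape_of`, `count_by_shape`, `assign_colors`, and `PALETTE` are all hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta

# Arbitrary example palette (assumption, not the app's actual colors).
PALETTE = ["#1f77b4", "#ff7f0e", "#2ca02c", "#d62728"]


def shape_of(record: dict) -> str:
    """Stand-in for typeof(.): describe a record by its field names and types."""
    return "{" + ",".join(f"{k}:{type(v).__name__}" for k, v in record.items()) + "}"


def count_by_shape(records: list, key: str, span: timedelta) -> Counter:
    """Count records per (time bucket, shape), flooring each key value to `span`."""
    counts: Counter = Counter()
    for rec in records:
        ts = rec[key]
        epoch = datetime(1970, 1, 1, tzinfo=ts.tzinfo)
        bucket = ts - (ts - epoch) % span  # floor to the start of the bucket
        counts[(bucket, shape_of(rec))] += 1
    return counts


def assign_colors(shapes) -> dict:
    """Give each distinct shape a palette color, cycling if shapes outnumber colors."""
    return {s: PALETTE[i % len(PALETTE)] for i, s in enumerate(sorted(set(shapes)))}
```

Sorting the shapes before assigning colors keeps the mapping stable across queries that return the same set of shapes, which is one simple way to make the coloring "automatic" without user input.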
jameskerr added this to the Data MVP0 milestone Mar 29, 2021
jameskerr self-assigned this Mar 31, 2021
philrz modified the milestones: Data MVP0, Data MVP1 May 10, 2021
jameskerr modified the milestones: ETL Lake, v0.27.0 Oct 7, 2021
philrz modified the milestone: v0.27.0 Oct 8, 2021
mason-fish removed this from the v0.27.0 milestone Oct 26, 2021
mccanne (Contributor) commented Nov 3, 2021

[Image: shapes barchart]

philrz (Contributor) commented May 25, 2023

We had a longer team discussion about this one. There was consensus that generalizing the existing histogram to work with any time-typed field would be a logical minimum next step. We also discussed in more detail whether it would make sense to handle pool keys of any Zed type, and what that would mean for the "bucketing" that generates the values populating the histogram bars. As an example of how other solutions approach the problem, this width_bucket() function allows a specified number of buckets to be created across any min/max range, though in that case only numeric types are supported. We debated for a bit whether this would make sense for string types, for instance. @mccanne pointed out that a mathematical approach could be applied in which a function determines, for each type, where a value falls in the range [0,1], e.g., a string aaaa is "almost 0" and a string zzzz is "almost 1". Whether users would actually benefit from this specific bucketing shown in histogram form, as opposed to some other visualization for their non-time data, is not immediately clear to us. However, the discussion is being recorded in this comment in case we want to pursue the topic here or spawn a different issue when this one closes.
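To make the bucketing idea concrete, here is a minimal Python sketch under stated assumptions. `width_bucket` below is loosely modeled on the SQL-style convention (bucket 0 below the range, `count + 1` at or above it), and `string_to_unit` is a hypothetical helper that reads a lowercase ASCII string as a base-26 fraction. Neither is taken from Zed or the app; other alphabets or encodings would need a different mapping.

```python
ALPHABET = "abcdefghijklmnopqrstuvwxyz"  # assumption: lowercase ASCII input only


def string_to_unit(s: str, alphabet: str = ALPHABET) -> float:
    """Read `s` as a fraction in [0, 1), one alphabet 'digit' per character."""
    base = len(alphabet)
    value, scale = 0.0, 1.0
    for ch in s:
        scale /= base
        value += alphabet.index(ch) * scale  # ValueError if ch is not in alphabet
    return value


def width_bucket(x: float, low: float, high: float, count: int) -> int:
    """Return the 1-based bucket of x among `count` equal-width buckets in [low, high)."""
    if count <= 0 or not low < high:
        raise ValueError("need count > 0 and low < high")
    if x < low:
        return 0
    if x >= high:
        return count + 1
    # Clamp to guard against floating-point rounding at the top edge.
    return min(count, int((x - low) / (high - low) * count) + 1)
```

With ten buckets over [0, 1), `width_bucket(string_to_unit("aaaa"), 0.0, 1.0, 10)` gives 1 and `width_bucket(string_to_unit("zzzz"), 0.0, 1.0, 10)` gives 10. In this particular mapping aaaa lands exactly on 0.0 rather than "almost 0", because "a" is treated as the zero digit.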

philrz (Contributor) commented Jun 20, 2023

During a recent group discussion of what we've got thus far in #2785, @mccanne summarized his long-term vision of how he hopes the app can flexibly handle this histogram. (I'm paraphrasing a bit here, but I've got the original wording archived on video if anyone needs to reference it. 😄)

At a high level, I think the goal should be that the UI just figures out what to do without the user having to type in field names. It could do queries to figure out the cardinality of different fields (e.g., since low cardinality fields are probably "interesting fields" for segmentation) and use a heuristic to pick a default. It could similarly look for a time-typed field to use for the X-axis & bucketing. So ~90% of the time it would just do the right thing. Then for the 10% of the time the heuristic makes the "wrong" picks, there'd be simple pull-downs/checkboxes to pick alternatives, e.g., if there's several "interesting fields" or several time-typed values. (It should be noted that we don't currently have an efficient way to do a "count distinct" of every field, but the dictionaries in vcache may be able to provide something adequate here.)
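One way to picture the heuristic described above is the following Python sketch. It works on records already in memory rather than issuing queries, treats Python `datetime` values as "time-typed", and uses an arbitrary cardinality threshold; the function name `pick_histogram_fields` and the `max_cardinality` parameter are hypothetical and only illustrate the idea.

```python
from datetime import datetime
from typing import Any


def pick_histogram_fields(records: list, max_cardinality: int = 10) -> dict:
    """Pick a default time field and a low-cardinality grouping field.

    Returns the defaults plus the alternatives a pull-down could offer.
    """
    distinct: dict = {}
    time_fields: list = []
    for rec in records:
        for name, value in rec.items():
            if isinstance(value, datetime):
                if name not in time_fields:
                    time_fields.append(name)
            else:
                # repr() lets unhashable values (lists, dicts) be counted too.
                distinct.setdefault(name, set()).add(repr(value))
    # Lowest cardinality first; a single-valued field cannot segment anything.
    candidates = sorted(
        (len(values), name)
        for name, values in distinct.items()
        if 2 <= len(values) <= max_cardinality
    )
    result: dict[str, Any] = {
        "time_field": time_fields[0] if time_fields else None,
        "group_field": candidates[0][1] if candidates else None,
        "time_alternatives": time_fields[1:],
        "group_alternatives": [name for _, name in candidates[1:]],
    }
    return result
```

The exact distinct counts used here are the expensive part noted in the comment; an approximate count from some other source could stand in for the `distinct` sets without changing the selection logic.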
