Percentage discrepancy when creating "quick values" pie chart with large data table. #2639
Comments
Good point: the result apparently ignores the sum of the long-tail terms in the result set. Thanks for your comprehensive report!
For reference, we need to take into account the sum_other_doc_count return value of the aggregation.
Turns out this was purely a display bug with the pie chart; the data table was ok.
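To illustrate the point about `sum_other_doc_count`, here is a minimal sketch of how pie slices could be built so that the "Others" slice covers every document outside the named slices. It is not Graylog's actual code: the function name `pie_slices`, the `top_n` parameter, and the sample data are hypothetical. The only thing taken from Elasticsearch is the shape of a terms aggregation response, which contains `buckets` (each with `key` and `doc_count`) and `sum_other_doc_count`.

```python
def pie_slices(aggregation, top_n=5):
    """Return (label, percentage) pairs for a pie chart.

    `aggregation` is assumed to look like an Elasticsearch terms
    aggregation response: a list of buckets plus sum_other_doc_count,
    the number of documents that fell outside the returned buckets.
    """
    buckets = sorted(
        aggregation["buckets"], key=lambda b: b["doc_count"], reverse=True
    )
    top, rest = buckets[:top_n], buckets[top_n:]

    # "Others" = returned buckets that are not drawn individually
    #          + documents that were not returned in any bucket at all.
    others = sum(b["doc_count"] for b in rest)
    others += aggregation.get("sum_other_doc_count", 0)

    total = sum(b["doc_count"] for b in top) + others
    if total == 0:
        return []

    slices = [(b["key"], 100.0 * b["doc_count"] / total) for b in top]
    if others > 0:
        slices.append(("Others", 100.0 * others / total))
    return slices


# Hypothetical data: 50 returned buckets (670 documents) and a long tail
# of 330 documents that Elasticsearch reports only as a sum.
sample = {
    "sum_other_doc_count": 330,
    "buckets": [{"key": "10.10.16.1", "doc_count": 180}]
    + [{"key": f"10.10.17.{i}", "doc_count": 10} for i in range(49)],
}
slices = pie_slices(sample)
```

With this sample, the first slice is 180 of 1000 documents, so it is drawn at 18%, which is what the data table would report for the same address.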
kroepke added a commit that referenced this issue Aug 11, 2016
The others group did not take into account the other document groups which were outside of the first 45 "other" buckets. This led to incorrect rendering of pie chart slices. Fixes #2639
edmundoa added a commit that referenced this issue Aug 12, 2016
The others group did not take into account the other document groups which were outside of the first 45 "other" buckets. This led to incorrect rendering of pie chart slices. Fixes #2639
When analyzing the flow logs from my firewall and building a graph of IDS alerts centered around "source_address" (source IP), I get a pie graph and a data table. Often, the query returns 100 or more unique values for "source_address".
Expected Behavior
One would expect the percentage shown for a given value (in my case, a source_address) to be the same in the pie graph as in the data table below it.
Current Behavior
If the field used to create the pie graph has more than 50 unique values in the query results, the pie graph and the data table on the dashboard widget disagree. The data table appears to still build its percentages based on the entire query results (all 100+ IP addresses).
However, Graylog only shows 50 results for source_address in the data table, and the pie graph appears to calculate the percentage for each value (in my case, source_address) based only on those 50 displayed source_addresses, not on the full query results.
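The arithmetic behind the mismatch can be sketched with made-up numbers. The counts below are hypothetical and chosen only to show the effect; the two helper functions are illustrative names, not Graylog functions.

```python
def table_percentage(count, total_count):
    """Percentage relative to all documents matched by the query."""
    return 100.0 * count / total_count


def pie_percentage_displayed_only(count, displayed_counts):
    """Percentage relative to only the values shown in the data table."""
    return 100.0 * count / sum(displayed_counts)


# Hypothetical query: 1000 matching messages in total. The 50 displayed
# source addresses account for 670 of them; the remaining 330 belong to
# addresses beyond the 50 shown.
total_messages = 1000
displayed_counts = [180] + [10] * 49  # 50 values, 670 messages

table_pct = table_percentage(180, total_messages)
pie_pct = pie_percentage_displayed_only(180, displayed_counts)

print(f"data table: {table_pct:.1f}%")  # data table: 18.0%
print(f"pie graph:  {pie_pct:.1f}%")    # pie graph:  26.9%
```

The same address is reported as 18.0% in the table but would be drawn as roughly 26.9% of the pie, and the gap grows with the share of messages that fall outside the 50 displayed values.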
Possible Solution
I would suggest that the pie graph also be calculated and drawn based on percentages from the full query results, so that it visually matches what is displayed in the data table. For example, if the data table says that IP address 10.10.16.1 accounted for 18% of the results, then that slice should cover about 18% of the pie graph.
Steps to Reproduce (for bugs)
Context
Our use case is based on Juniper SRX firewall logs. We capture Intrusion Detection (IDS) logs and then build a dashboard item for "IDS alerts by Source IP". This is a "quick values" chart based on "source_address". It usually results in many hundreds of unique values for "source_address", of which only a few are statistically significant (above 3-5%). However, the pie graph looks very skewed when compared to the data table.
Your Environment