Support JSON as on-disk format for histograms #1854
Hi @idoqo! Good progress. What your commit is missing is a test case showcasing the functionality. If you haven't figured out how to write test cases yet, you can read this:
sql/sql_statistics.h
@@ -179,6 +182,7 @@ class Histogram
case SINGLE_PREC_HB:
return (uint) (((uint8 *) values)[i]);
case DOUBLE_PREC_HB:
case JSON:
So currently the code behaves identically for the JSON type and DOUBLE precision. That's fine for a first iteration.
When you move on to the actual implementation, keep in mind whether the get_value() method still makes sense with "uint i" as a parameter for JSON histograms.
@@ -154,6 +155,7 @@ class Histogram
case SINGLE_PREC_HB:
return ((uint) (1 << 8) - 1);
case DOUBLE_PREC_HB:
case JSON:
return ((uint) (1 << 16) - 1);
It's ok for now, but in the future this code should be removed as the concept of prec_factor
doesn't apply to Histograms that are stored as JSON.
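To illustrate why prec_factor is tied to the binary formats, here is a minimal standalone sketch. It assumes the binary histograms store each bucket endpoint as a position in the column's [min, max] range scaled to an integer of 8 or 16 bits. HistType, prec_factor() and quantize() below are hypothetical stand-ins written for this note, not the server's definitions.

```cpp
#include <cassert>

// Hypothetical stand-in for the histogram type enum (not the server's own).
enum class HistType { SINGLE_PREC_HB, DOUBLE_PREC_HB, JSON };

// Largest integer a packed bucket endpoint can hold for each type.
// JSON shares the 16-bit case only as a first-iteration placeholder,
// mirroring the diff under review.
inline unsigned prec_factor(HistType t) {
  switch (t) {
    case HistType::SINGLE_PREC_HB:
      return (1u << 8) - 1;
    case HistType::DOUBLE_PREC_HB:
    case HistType::JSON:
      return (1u << 16) - 1;
  }
  return 0;
}

// Scale a relative position in [0, 1] to the packed integer form.
// Assumption: this is how a binary bucket endpoint would be quantized.
inline unsigned quantize(double pos, HistType t) {
  assert(pos >= 0.0 && pos <= 1.0);
  return static_cast<unsigned>(pos * prec_factor(t));
}
```

A JSON histogram that keeps the endpoint values themselves has nothing to quantize, which is why the scaling step would have no counterpart there.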
Ok, this is good for Milestone-1
b00f809 to 0d082d2
0d082d2 to 0f9552a
Signed-off-by: Michael Okoko <okokomichaels@outlook.com>
This fixes the memory allocation for the JSON histogram builder and adds more column types for testing. Some challenges at the moment include:
* A garbage value at the end of the JSON array still persists.
* A garbage value also gets appended to bucket values if the column is a primary key.
* There's a memory leak resulting in a "Warning: Memory not freed" message at the end of tests.
Signed-off-by: Michael Okoko <okokomichaels@outlook.com>
Preparation for handling different kinds of histograms:
- In Column_statistics, change "Histogram histogram" into "Histogram *histogram_". This allows for different kinds of Histogram classes with virtual functions.
- [Almost] remove the usage of Histogram->set_values and Histogram->set_size. The code outside the histogram should not make any assumptions about what/how is stored in the Histogram.
- Introduce drafts of methods to read/save histograms to/from disk.
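The shape of that refactoring can be sketched as below. This is only an illustration under stated assumptions: every class with a _sketch suffix, the serialize() and type_name() methods, and the sample data are hypothetical and written for this note. The sketch uses std::unique_ptr where the commit message describes a raw "Histogram *histogram_" pointer, and the JSON output shown is not the actual on-disk format.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical interface: code outside the histogram sees only these calls.
class Histogram_base {
public:
  virtual ~Histogram_base() = default;
  virtual const char *type_name() const = 0;
  virtual std::string serialize() const = 0;  // bytes that would go to disk
};

// Binary flavour: packed integers, written out byte for byte.
class Histogram_binary_sketch : public Histogram_base {
  std::vector<std::uint8_t> values_;
public:
  explicit Histogram_binary_sketch(std::vector<std::uint8_t> v)
      : values_(std::move(v)) {}
  const char *type_name() const override { return "SINGLE_PREC_HB"; }
  std::string serialize() const override {
    return std::string(values_.begin(), values_.end());
  }
};

// JSON flavour: endpoint values kept as text (no escaping in this sketch).
class Histogram_json_sketch : public Histogram_base {
  std::vector<std::string> endpoints_;
public:
  explicit Histogram_json_sketch(std::vector<std::string> e)
      : endpoints_(std::move(e)) {}
  const char *type_name() const override { return "JSON"; }
  std::string serialize() const override {
    std::string out = "[";
    for (std::size_t i = 0; i < endpoints_.size(); ++i) {
      if (i) out += ", ";
      out += "\"" + endpoints_[i] + "\"";
    }
    return out + "]";
  }
};

// Column statistics hold a pointer to the base class, so the concrete
// histogram kind can vary per column.
struct Column_statistics_sketch {
  std::unique_ptr<Histogram_base> histogram_;
  std::string save() const {
    assert(histogram_ != nullptr);
    return histogram_->serialize();
  }
};

// Hypothetical factory with fixed sample data, for demonstration only.
inline std::unique_ptr<Histogram_base> make_histogram_sketch(bool json) {
  if (json)
    return std::make_unique<Histogram_json_sketch>(
        std::vector<std::string>{"a", "m", "z"});
  return std::make_unique<Histogram_binary_sketch>(
      std::vector<std::uint8_t>{0, 127, 255});
}
```

The caller never touches the stored values or their size directly, which is the point of dropping set_values and set_size from outside code.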
A demo of how to use an in-memory data structure for histograms. The patch shows how to
* convert the string form of data to binary form
* compare two values in binary form
* compute a fraction for val in the [X, Y] range.
grep for GSOC-TODO for notes.
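The three steps named in that commit message can be sketched for the simplest case, an integer column. This is not the patch's code: to_binary(), compare_values() and fraction_in_range() are hypothetical helpers, and the sketch assumes values are plain signed integers with linear interpolation inside a bucket. Other column types would need their own conversion and comparison.

```cpp
#include <cassert>
#include <string>

// Step 1: string form -> binary form (here simply a parsed integer).
inline long long to_binary(const std::string &s) { return std::stoll(s); }

// Step 2: three-way comparison of two values in binary form.
inline int compare_values(long long a, long long b) {
  return (a < b) ? -1 : (a > b) ? 1 : 0;
}

// Step 3: relative position of val inside the bucket [x, y],
// clamped to [0, 1]. A single-point bucket (x == y) never divides by zero
// because both clamping branches are checked first.
// Note: val - x can overflow for extreme inputs; ignored in this sketch.
inline double fraction_in_range(long long val, long long x, long long y) {
  assert(compare_values(x, y) <= 0);
  if (compare_values(val, x) <= 0) return 0.0;
  if (compare_values(val, y) >= 0) return 1.0;
  return static_cast<double>(val - x) / static_cast<double>(y - x);
}
```

For example, fraction_in_range(to_binary("15"), to_binary("10"), to_binary("20")) places the value halfway through the bucket.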
This fixes the wrong calculation of avg_frequency in JSON histograms by replacing the specific histogram objects with the generic Histogram_base class. It also restores the get/set size functions, as they were useful in calculating fields for binary histograms.
Signed-off-by: Michael Okoko <okokomichaels@outlook.com>
* it also adds an "explain select" statement to the test so that the fprintf calls can print the computed intervals to mysqld.1.err Signed-off-by: Michael Okoko <okokomichaels@outlook.com>
* Also merges tests relating to JSON statistics into one file Signed-off-by: Michael Okoko <okokomichaels@outlook.com>
2819cbb to 363aeec
@idoqo, @spetrunia is now working on getting JSON histograms into 10.8. As such the PR can be considered merged! Thank you for working on this. I am closing this PR given the current status. If there are fixes to add, please sync with @spetrunia on which branch to base them on and open a new PR for those.
@idoqo And just to clarify what I mean: The work you have done has made it as a preview release in 10.7.0, but it didn't make the cut into 10.7.1, due to regressions we have identified. @spetrunia is tracking the progress of the task here: https://jira.mariadb.org/browse/MDEV-21130 with different subtasks for current outstanding parts. You are more than welcome to contribute to those. The work is now happening on this branch:
Got it, thanks for the heads up!
This adds support for JSON as an on-disk format for histograms as described in MDEV-21130.
JSON histograms can now be enabled with:
set histogram_type=JSON;
and then running ANALYZE TABLE as usual.
Some improvements over the existing histogram format are also tracked at MDEV-26125.
Signed-off-by: Michael Okoko <okokomichaels@outlook.com>