Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support JSON as on-disk format for histograms #1854

Closed
wants to merge 32 commits into from

Conversation

idoqo
Copy link
Contributor

@idoqo idoqo commented Jun 9, 2021

This adds support for JSON as an on-disk format for histograms as described in MDEV-21130.

At the moment, JSON histograms can now be enabled with: set histogram_type=JSON; and running ANALYZE TABLE as usual.

Some improvements over the existing histogram format is also tracked at MDEV-26125.

Signed-off-by: Michael Okoko okokomichaels@outlook.com

@cvicentiu
Copy link
Member

Hi @idoqo!

Good progress. What your commit is missing is a test case showcasing the functionality.

If you haven't figured out how to write test cases yet, you can read this:

https://mariadb.org/get-involved/getting-started-for-developers/writing-good-test-cases-mariadb-server/

@@ -179,6 +182,7 @@ class Histogram
case SINGLE_PREC_HB:
return (uint) (((uint8 *) values)[i]);
case DOUBLE_PREC_HB:
case JSON:
Copy link
Member

@cvicentiu cvicentiu Jun 10, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So currently the code behaves identical for JSON type and DOUBLE precision. That's fine for a first iteration.

Do keep in mind when going to the actual implementation if the get_value() method will make sense with the "uint i" as a parameter for JSON histograms.

@an3l an3l added this to the 10.6 milestone Jun 10, 2021
@@ -154,6 +155,7 @@ class Histogram
case SINGLE_PREC_HB:
return ((uint) (1 << 8) - 1);
case DOUBLE_PREC_HB:
case JSON:
return ((uint) (1 << 16) - 1);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's ok for now, but in the future this code should be removed as the concept of prec_factor doesn't apply to Histograms that are stored as JSON.

@spetrunia
Copy link
Member

Ok, this is good for Milestone-1

@CLAassistant
Copy link

CLAassistant commented Aug 22, 2021

CLA assistant check
All committers have signed the CLA.

@idoqo idoqo changed the title Prepare JSON as valid histogram_type Support JSON as on-disk format for histograms Aug 23, 2021
idoqo and others added 20 commits August 23, 2021 11:05
Signed-off-by: Michael Okoko <okokomichaels@outlook.com>
Signed-off-by: Michael Okoko <okokomichaels@outlook.com>
Signed-off-by: Michael Okoko <okokomichaels@outlook.com>
Signed-off-by: Michael Okoko <okokomichaels@outlook.com>
This fixes the memory allocation for json histogram builder and add more column types for testing.
Some challenges at the moment include:
* Garbage value at the end of JSON array still persists.
* Garbage value also gets appended to bucket values if the column is a primary key.
* There's a memory leak resulting in a "Warning: Memory not freed" message at the end of tests.

Signed-off-by: Michael Okoko <okokomichaels@outlook.com>
Signed-off-by: Michael Okoko <okokomichaels@outlook.com>
Signed-off-by: Michael Okoko <okokomichaels@outlook.com>
Signed-off-by: Michael Okoko <okokomichaels@outlook.com>
Signed-off-by: Michael Okoko <okokomichaels@outlook.com>
Signed-off-by: Michael Okoko <okokomichaels@outlook.com>
Preparation for handling different kinds of histograms:

- In Column_statistics, change "Histogram histogram" into
  "Histogram *histogram_".  This allows for different kinds
  of Histogram classes with virtual functions.

- [Almost] remove the usage of Histogram->set_values and
  Histogram->set_size. The code outside the histogram should
  not make any assumptions about what/how is stored in the Histogram.

- Introduce drafts of methods to read/save histograms to/from disk.
Signed-off-by: Michael Okoko <okokomichaels@outlook.com>
Signed-off-by: Michael Okoko <okokomichaels@outlook.com>
Signed-off-by: Michael Okoko <okokomichaels@outlook.com>
Signed-off-by: Michael Okoko <okokomichaels@outlook.com>
A demo of how to use in-memory data structure for histogram.
The patch shows how to
* convert string form of data to binary form
* compare two values in binary form
* compute a fraction for val in [X, Y] range.

grep for GSOC-TODO for notes.
This fixes the wrong calculation for avg_frequency in json histograms
by replacing the specific histogram objects with the generic Histogram_base class.

It also restores get/set size functions as they were useful in calculating fields
for binary histogram.

Signed-off-by: Michael Okoko <okokomichaels@outlook.com>
Signed-off-by: Michael Okoko <okokomichaels@outlook.com>
* it also adds an "explain select" statement to the test so that the fprintf calls
  can print the computed intervals to mysqld.1.err

Signed-off-by: Michael Okoko <okokomichaels@outlook.com>
Signed-off-by: Michael Okoko <okokomichaels@outlook.com>
Signed-off-by: Michael Okoko <okokomichaels@outlook.com>
Signed-off-by: Michael Okoko <okokomichaels@outlook.com>
Signed-off-by: Michael Okoko <okokomichaels@outlook.com>
* Also merges tests relating to JSON statistics into one file

Signed-off-by: Michael Okoko <okokomichaels@outlook.com>
Signed-off-by: Michael Okoko <okokomichaels@outlook.com>
Signed-off-by: Michael Okoko <okokomichaels@outlook.com>
Signed-off-by: Michael Okoko <okokomichaels@outlook.com>
@idoqo idoqo changed the base branch from 10.6 to 10.7 August 23, 2021 10:08
@idoqo idoqo marked this pull request as ready for review August 23, 2021 10:09
Signed-off-by: Michael Okoko <okokomichaels@outlook.com>
Signed-off-by: Michael Okoko <okokomichaels@outlook.com>
Signed-off-by: Michael Okoko <okokomichaels@outlook.com>
@cvicentiu
Copy link
Member

cvicentiu commented Nov 26, 2021

@idoqo, @spetrunia is now working on getting Json histograms into 10.8. As such the PR can be considered merged! Thank you for working on this.

I am closing this PR given the current status.

If there are fixes to add, please sync with @spetrunia on which branch to base them on and open a new PR for those.

@cvicentiu cvicentiu closed this Nov 26, 2021
@cvicentiu
Copy link
Member

@idoqo And just to clarify what I mean:

The work you have done has made it as a preview release in 10.7.0, but it didn't make the cut into 10.7.1, due to regressions we have identified.

@spetrunia is tracking the progress of the task here: https://jira.mariadb.org/browse/MDEV-21130 with different subtasks for current outstanding parts. You are more than welcome to contribute to those.

The work is now happening on this branch:
https://github.com/MariaDB/server/tree/preview-10.7-MDEV-26519-json-histograms

@idoqo
Copy link
Contributor Author

idoqo commented Nov 29, 2021

Got it, thanks for the heads up!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
5 participants