Convert schema to JSON for Gemini #14

Closed
grighetto opened this issue Sep 24, 2020 · 5 comments

Comments

@grighetto
Collaborator

grighetto commented Sep 24, 2020

Issue is synchronized with this Asana task by Unito

@grighetto grighetto added this to the Gemini Robustness milestone Sep 24, 2020
@grighetto
Collaborator Author

DEPENDS ON #12

Let's first see the outcome of #12 before we go down this route.

@aboudreault
Contributor

@grighetto, about the --drop-schema option: it basically just ensures that the keyspace ks1 (the default name) is dropped and recreated at startup. If I create a custom ks1.table1 table and omit the -d option, Gemini fails to execute the statements because it tries to use its default schema (unless I provide one).

So I'm afraid we'll need to implement a --json option in our anonymizer if we want to use Gemini. You've probably looked at Gemini more than I have; am I missing something here? If not, our options are:

1- Implement the JSON option in the anonymizer. My concern is that we don't have a clear picture of what the Gemini schema format supports, so it's hard to know whether we'll later run into unsupported features, such as table options or data types.
2- Help the NoSQLBench devs implement what we need: accepting a schema as input and generating statements from it automatically.

@grighetto
Collaborator Author

grighetto commented Sep 28, 2020

@aboudreault right, we need to generate the JSON file either way if we want to use Gemini (more details in #12). I only suggested using --drop-schema=false to make sure Gemini does not recreate the schema, that is, that it uses the schema we create beforehand from the anonymizer output. This way we guarantee NoSQLBench and Gemini operate on exactly the same schema.

To answer your questions, I think the simplest solution for now is to generate the JSON file for Gemini. You can get some insight into the format by letting Gemini generate a random schema (which it does by default if you don't provide one) and then checking the JSON schema it prints to the console at startup.
There's already some work happening on the NoSQLBench side to generate the statements automatically, but that's more involved. Let's start with Gemini and add NoSQLBench later.

@aboudreault
Contributor

aboudreault commented Sep 28, 2020

OK, I see now what you were suggesting.

1- Use the anonymizer to generate the CQL schema file.
2- Load the CQL schema file on the clusters.
3- Run Gemini with -d=false so it uses the schema already loaded in the clusters. This ensures Gemini doesn't recreate anything, which could otherwise introduce inconsistencies between the CQL schema and the generated one.
4- Provide the appropriate schema.json to Gemini so it can generate statements, etc.

Sounds good. I will start looking at this.

@aboudreault
Contributor

For the MVP, UDTs (user-defined types) are going to be skipped.
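One hypothetical way to do this for the MVP is to filter each table's columns before writing schema.json. The sketch below (the names Column, nativeTypes and SplitSupported are made up for illustration) conservatively keeps only columns whose type is in an allowlist of CQL native types. UDTs are therefore skipped, but so are collections and tuples; the allowlist is an assumption for this sketch, not the set of types Gemini actually supports.

```go
package main

import (
	"fmt"
	"strings"
)

// Column is a hypothetical parsed CQL column: its name and its type as
// written in the CQL schema, e.g. "int" or "frozen<address>".
type Column struct {
	Name string
	Type string
}

// nativeTypes is an illustrative allowlist of CQL native types (an
// assumption, not Gemini's supported-type list).
var nativeTypes = map[string]bool{
	"ascii": true, "bigint": true, "blob": true, "boolean": true,
	"date": true, "decimal": true, "double": true, "float": true,
	"inet": true, "int": true, "smallint": true, "text": true,
	"time": true, "timestamp": true, "timeuuid": true, "tinyint": true,
	"uuid": true, "varchar": true, "varint": true,
}

// SplitSupported keeps columns whose type is in the allowlist and returns
// the names of the skipped ones. Anything else (UDTs, but also
// collections and tuples) is skipped conservatively.
func SplitSupported(cols []Column) (kept []Column, skipped []string) {
	for _, c := range cols {
		// Native CQL type names are case-insensitive, so normalize first.
		if nativeTypes[strings.ToLower(strings.TrimSpace(c.Type))] {
			kept = append(kept, c)
		} else {
			skipped = append(skipped, c.Name)
		}
	}
	return kept, skipped
}

func main() {
	cols := []Column{
		{Name: "id", Type: "uuid"},
		{Name: "home", Type: "frozen<address>"},
		{Name: "age", Type: "int"},
	}
	kept, skipped := SplitSupported(cols)
	fmt.Println(len(kept), skipped) // prints: 2 [home]
}
```

Returning the skipped column names makes it easy to log what was left out. A skipped column that is part of the primary key cannot simply be dropped; that case would require excluding the whole table, which this sketch does not handle.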
