Support generating bulk import data that covers a subset of the table. #89

Open
keith-turner opened this issue Jul 10, 2019 · 2 comments

@keith-turner

keith-turner commented Jul 10, 2019

It would be nice to be able to generate bulk import data for the CI test that covers a subset of the table instead of the entire table. This may be possible with command line options such as -o min=y -o max=z, but I am not sure. If it is possible, the example test scripts could be updated to suggest using it.

@keith-turner

I think that, for users, importing into a subset of the table is more common than importing data that covers the entire table.

@keith-turner

There are existing test options, test.ci.ingest.row.min and test.ci.ingest.row.max, that would support this. What is really needed is the ability to set them on the command line, as follows:

./bin/cingest bulk /tmp/bulk-subrange -o test.ci.ingest.row.min=1000000000 -o test.ci.ingest.row.max=2000000000

Other continuous ingest programs support setting test properties on the command line; the bulk command does not. The scripts and Java code need to be tweaked so that the -o options make it to ContinuousEnv in the MapReduce code.
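
As a rough illustration of the first step, here is a minimal sketch of collecting -o key=value pairs from an argument array into a Properties object. The class and method names (OptOverrides, parseOverrides) are hypothetical and are not taken from the project; how the resulting properties would actually be handed to ContinuousEnv is left open.

```java
import java.util.Properties;

// Hypothetical helper, not part of the existing code base.
public class OptOverrides {

  // Collects every "-o key=value" pair found in args; other arguments are ignored.
  public static Properties parseOverrides(String[] args) {
    Properties props = new Properties();
    for (int i = 0; i < args.length; i++) {
      if ("-o".equals(args[i])) {
        if (i + 1 >= args.length) {
          throw new IllegalArgumentException("-o requires a key=value argument");
        }
        String kv = args[++i];
        int eq = kv.indexOf('=');
        if (eq <= 0) {
          throw new IllegalArgumentException("Expected key=value but got: " + kv);
        }
        props.setProperty(kv.substring(0, eq), kv.substring(eq + 1));
      }
    }
    return props;
  }
}
```

With the command line shown above, this would yield two properties, one for test.ci.ingest.row.min and one for test.ci.ingest.row.max, which could then be layered over the values read from the test properties file.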

Also, it would be much more natural if the min and max options took hex values instead of decimal, because all of the data and split points are hex.
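
To show why decimal bounds are awkward to line up with hex data, here is a small sketch that converts between the two. It assumes rows are written as 16-digit, zero-padded, lowercase hex strings; the class and method names (HexRowBounds, parseHexRow, toHexRow) are made up for illustration.

```java
// Hypothetical helper, not part of the existing code base.
public class HexRowBounds {

  // Parses a bound given in hex, e.g. "3b9aca00", into a long.
  public static long parseHexRow(String hex) {
    return Long.parseLong(hex, 16);
  }

  // Formats a long as a 16-digit zero-padded hex string
  // (assumed here to be the row format used in the table).
  public static String toHexRow(long row) {
    return String.format("%016x", row);
  }
}
```

For example, the decimal bound 1000000000 from the command above corresponds to the hex row 000000003b9aca00, which is not obvious when looking at split points in the table.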

Once the min and max options are supported, the MapReduce job should intersect them with the tablet split points, using only the split points that fall within the min and max range. If this is not done, the MapReduce job could end up generating empty files.
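
The intersection could look something like the sketch below. It assumes split points are hex strings that fit in a long, and it keeps a split when min <= split < max; the exact boundary handling is an assumption and would need to match how the job partitions rows. The names (SplitFilter, splitsInRange) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Hypothetical helper, not part of the existing code base.
public class SplitFilter {

  // Returns the split points (hex strings) whose numeric value lies in [min, max),
  // preserving the order of the input collection.
  public static List<String> splitsInRange(Collection<String> splits, long min, long max) {
    List<String> kept = new ArrayList<>();
    for (String split : splits) {
      long value = Long.parseLong(split, 16);
      if (value >= min && value < max) {
        kept.add(split);
      }
    }
    return kept;
  }
}
```

Only the kept split points would then be used to partition the job's output, so partitions that lie entirely outside the generated range never produce a file.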
