This repository has been archived by the owner on Sep 28, 2022. It is now read-only.

Add an option to set import strategy #77

Open
yuce opened this issue May 23, 2018 · 2 comments

@yuce
Contributor

yuce commented May 23, 2018

The Go client supports importing with different strategies. It may be useful to have an option to specify which strategy to use.

@yuce yuce self-assigned this May 23, 2018
@jaffee
Member

jaffee commented May 24, 2018

I'd rather not add more API surface area right now (either here or in the Go client). I'm not convinced that having multiple strategies to choose from is a good user experience. I think we can combine the two (batch and timeout): normally batching is used, but a timer fires if data has been sitting in the buffer for too long, which caps the lag between data coming in and being indexed.

@travisturner
Member

Adding to this a little, I ran into an interesting case when importing batches of records that did not behave the way I expected (not necessarily wrong, but unexpected).

Importing 3M records (columns 0 to 3M-1) with BatchSize = 1M resulted in the following import pattern. I was expecting every post to the Pilosa server to contain 1M records. What actually happens is that each batch of 1M records is split by slice, and then one post is made per slice, so the posts vary in size:

IMPORT: slice: 0, records: 1000000
IMPORT: slice: 0, records: 48576
IMPORT: slice: 1, records: 951424
IMPORT: slice: 0, records: 0
IMPORT: slice: 1, records: 97152
IMPORT: slice: 2, records: 902848

(Note that FeatureBaseDB/go-pilosa#142 fixes the 0-records issue.)
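The uneven post sizes follow directly from batch boundaries not lining up with slice boundaries. A small sketch of the arithmetic, assuming a slice width of 2^20 = 1,048,576 columns (an assumption, but it reproduces the counts in the log above exactly). `post` and `postsForBatch` are hypothetical names, and the sketch never emits an empty post, so the `slice: 0, records: 0` line does not appear.

```go
package main

import "fmt"

// sliceWidth is assumed to be 2^20 columns; this value reproduces the
// logged counts.
const sliceWidth = 1 << 20

// post is a hypothetical record of one import request: which slice it
// targets and how many records it carries.
type post struct {
	Slice, Records uint64
}

// postsForBatch splits the contiguous column range [start, end) at slice
// boundaries and returns one post per slice touched, in slice order.
func postsForBatch(start, end uint64) []post {
	var posts []post
	for col := start; col < end; {
		slice := col / sliceWidth
		sliceEnd := (slice + 1) * sliceWidth
		if sliceEnd > end {
			sliceEnd = end
		}
		posts = append(posts, post{Slice: slice, Records: sliceEnd - col})
		col = sliceEnd
	}
	return posts
}

func main() {
	const total, batchSize = 3000000, 1000000
	for start := uint64(0); start < total; start += batchSize {
		for _, p := range postsForBatch(start, start+batchSize) {
			fmt.Printf("IMPORT: slice: %d, records: %d\n", p.Slice, p.Records)
		}
	}
}
```

For example, the second batch covers columns 1,000,000 to 1,999,999: the first 1,048,576 - 1,000,000 = 48,576 columns still belong to slice 0, and the remaining 951,424 fall into slice 1.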

I realize that waiting until a slice has 1M records before posting may not be ideal either, especially when columns are set randomly across many slices.
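The per-slice alternative can be sketched as one buffer per slice that posts only when that slice reaches a threshold. This is a hypothetical strategy for illustration (`sliceBuffers`, `Add`, and `Buffered` are not go-pilosa APIs), again assuming a slice width of 2^20. It shows the concern: when columns are spread across many slices, no buffer reaches the threshold and everything stays pending.

```go
package main

import "fmt"

// sliceWidth is assumed to be 2^20 columns.
const sliceWidth = 1 << 20

// sliceBuffers is a hypothetical per-slice buffer: each slice accumulates
// column IDs and is posted only once it holds threshold records.
type sliceBuffers struct {
	threshold int
	pending   map[uint64][]uint64
	flush     func(slice uint64, cols []uint64)
}

func newSliceBuffers(threshold int, flush func(uint64, []uint64)) *sliceBuffers {
	return &sliceBuffers{threshold: threshold, pending: make(map[uint64][]uint64), flush: flush}
}

// Add buffers col under its slice and posts that slice if it is full.
func (b *sliceBuffers) Add(col uint64) {
	s := col / sliceWidth
	b.pending[s] = append(b.pending[s], col)
	if len(b.pending[s]) >= b.threshold {
		b.flush(s, b.pending[s])
		delete(b.pending, s)
	}
}

// Buffered reports how many records are still waiting across all slices.
func (b *sliceBuffers) Buffered() int {
	n := 0
	for _, cols := range b.pending {
		n += len(cols)
	}
	return n
}

func main() {
	b := newSliceBuffers(4, func(s uint64, cols []uint64) {
		fmt.Printf("post slice %d: %d records\n", s, len(cols))
	})
	// 24 columns spread round-robin over 8 slices: each slice gets only 3.
	for i := uint64(0); i < 24; i++ {
		b.Add((i%8)*sliceWidth + i)
	}
	fmt.Println("buffered:", b.Buffered()) // prints "buffered: 24"
}
```

The same 24 records written to a single slice would have produced six full posts, so a pure per-slice threshold would likely still need something like a time-based cap to bound lag.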

So I agree with @jaffee: we need to think through the batch strategy and be smart about it ourselves. Putting it on the user will likely result in unexpected and/or poor performance.
