Using binary type and doc_values=true makes doc() lookups completely unusable #14469
Comments
I'm curious what your use case is. Until now, the only use case I knew of for doc values on binary fields was an image plugin for Elasticsearch (#5669), where doc values were consumed directly by the plugin, bypassing the scripting layer.
I have an index that contains roughly 100M documents. The indexed data are "persons", and a typical search is:

An array of all the questions I have answered looks like this (encoded as 64-bit ints):
[-8989044006797179904,5629656300523520,0,2323857442082914304,36028797287432192,153122456050075656,8388640,144115188075855872,158329681740288,1099511627776,274877906944,34632368128,8796093022208,8623497224,3467807172325277696,0,274945015808,2305843009213693984,8192,576460752303423488,144116287587549184,0,128,4096,2147483648,0,36028797018964992,0,2161728080043835392,536870912]

And for another person:
[-8989040706113233866,5629656858499072,3499296910466940960,2323857992107163712,45036563478904864,1306043995025050154,2199031669792,180143985099014144,562949960761856,36047626155590656,274877906952,34632384512,1225551944202847296,4611967502027857928,3756319573209513984,1477180677777523712,9075601440770,2377900603251622304,134225920,612489549322420544,2449959296801276032,2305843009213693952,128,100732928,576601492006502400,22517998271135872,36028797018963968,288230376151711744,6773554836496449536,3149824]

I first stored the data as a long array in Elasticsearch, but you cannot rely on the order, because doc_values are sorted from low to high.

byte[] decodeString = Base64.getDecoder().decode(encodedString);
ByteBuffer byteBuf = ByteBuffer.wrap( decodeString );
byteBuf.order( ByteOrder.BIG_ENDIAN );
LongBuffer longBuf = byteBuf.asLongBuffer();
long[] questions = new long[numberOfQuestions];
longBuf.get(questions);

I found a workaround for now: storing it as "text" with:
Thanks for explaining the use case. Unrelated to binary doc values, but I wonder whether storing the question ids directly could be a better option in terms of both storage and runtime. You could take ids from the shorter array and then use galloping search to find common ids in the other array. Otherwise, I agree that we should either document the limitations of doc values on binary fields or add support so that you can at least use them in scripts.
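As a rough illustration of the galloping-search idea suggested above, here is a minimal sketch in plain Java. It assumes both id arrays are sorted in ascending order and contain no duplicates; the class and method names (`GallopingIntersect`, `gallop`, `intersect`) are hypothetical and not part of Elasticsearch.

```java
import java.util.Arrays;

// Hypothetical helper, not an Elasticsearch API.
final class GallopingIntersect {

    // Returns the first index >= from with a[index] >= target, or a.length if none.
    static int gallop(long[] a, int from, long target) {
        if (from >= a.length || a[from] >= target) {
            return from;
        }
        int lo = from;          // invariant: a[lo] < target
        int step = 1;
        int hi = lo + step;
        // Double the step until we overshoot the target or the array end.
        while (hi < a.length && a[hi] < target) {
            lo = hi;
            step <<= 1;
            hi = lo + step;
        }
        if (hi > a.length) {
            hi = a.length;
        }
        // Binary search for the first position in (lo, hi] with a value >= target.
        int left = lo + 1;
        int right = hi;
        while (left < right) {
            int mid = (left + right) >>> 1;
            if (a[mid] < target) {
                left = mid + 1;
            } else {
                right = mid;
            }
        }
        return left;
    }

    // Intersects two sorted, duplicate-free id arrays by walking the shorter
    // one and galloping through the longer one.
    static long[] intersect(long[] a, long[] b) {
        long[] shorter = a.length <= b.length ? a : b;
        long[] longer = a.length <= b.length ? b : a;
        long[] out = new long[shorter.length];
        int n = 0;
        int pos = 0;
        for (long id : shorter) {
            pos = gallop(longer, pos, id);
            if (pos == longer.length) {
                break;          // every remaining id is larger than anything in longer
            }
            if (longer[pos] == id) {
                out[n++] = id;
            }
        }
        return Arrays.copyOf(out, n);
    }
}
```

For example, `GallopingIntersect.intersect(new long[]{3, 7, 42}, new long[]{1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 40, 41, 43})` returns the ids 3 and 7. The approach pays off mainly when one array is much shorter than the other, since each lookup costs time logarithmic in the distance skipped.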
(ES 2.0)
The documentation states that you can fetch binary doc_values by using "doc_values": true for that type: https://www.elastic.co/guide/en/elasticsearch/reference/current/binary.html.
When trying to fetch this in a Java plugin, I always get an unsupported_operation_exception (an UnsupportedOperationException thrown at BytesBinaryDVAtomicFieldData, line 104):
You can reproduce it by creating a simple index and adding some documents with binary data.
Writing a simple plugin that just calls doc() or doc().get('qa_data') throws exceptions immediately.
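For the "documents with binary data" step, a binary field takes its value as a Base64-encoded string. The sketch below shows one possible way to produce such a string from a long array, plus the matching decode, mirroring the decoding snippet posted in this thread. The class name `QuestionCodec` and its methods are hypothetical, and the big-endian layout is an assumption taken from that snippet.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Base64;

// Hypothetical helper, not an Elasticsearch API.
final class QuestionCodec {

    // Packs the longs as big-endian bytes and Base64-encodes the result.
    static String encode(long[] questions) {
        ByteBuffer buf = ByteBuffer.allocate(questions.length * Long.BYTES);
        buf.order(ByteOrder.BIG_ENDIAN);
        buf.asLongBuffer().put(questions);   // writes into buf's backing array
        return Base64.getEncoder().encodeToString(buf.array());
    }

    // Reverses encode(): Base64-decodes and reads the bytes back as big-endian longs.
    static long[] decode(String encoded) {
        byte[] bytes = Base64.getDecoder().decode(encoded);
        ByteBuffer buf = ByteBuffer.wrap(bytes).order(ByteOrder.BIG_ENDIAN);
        long[] questions = new long[bytes.length / Long.BYTES];
        buf.asLongBuffer().get(questions);
        return questions;
    }
}
```

Because the whole array is stored as one opaque value, the element order survives the round trip, which is the property the sorted long-array approach loses.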