Duplicate results when using a Scrolled Search #60
Comments
I think you may be running into this bug in Elasticsearch: elastic/elasticsearch#8788. Is this data that you have newly indexed on 1.4.4, or are these documents that you indexed on some older version? |
Hi @clintongormley, the index is brand new on both versions. I can't replicate the problem using curl alone, only when I use Search::Elasticsearch. |
@Edward-Francis Please could you do the following: Run this query and send the output:
Turn on trace logging, run your scroll request (which generates duplicates) and send me the logs, eg:
thanks |
Hi @clintongormley, I've had to change the id because that id has been lost during re-indexing. I've noticed that when I remove the sort clause it seems to be fine. Could this be the cause? The curl:
And the logging:
|
@clintongormley, any idea on this? |
I'm unable to replicate this locally on 1.4.3 or 1.5.0. You're sorting on a Either way, the problem is in Elasticsearch, not with the Perl API. Please could you reopen this ticket there (and it'd be great if you could provide a full recreation, if possible) thanks |
@Edward-Francis are you by any chance using logstash in your cluster? See elastic/elasticsearch#10244 (comment) for the reason I ask |
@clintongormley: I've written a little script to test this issue. It works as expected on v1.1.1 but not on v1.4.4, and I still can't replicate it through curl alone. We are also using Logstash on the clusters.
use v5.16.3;
use strict;
use warnings;
use DDP;
use Search::Elasticsearch;

my $es = Search::Elasticsearch->new(
    nodes => [
        'w-play-dev-es-1:9200', 'w-play-dev-es-2:9200',
        'w-play-dev-es-3:9200',
    ],
    trace_to => ['File', '/tmp/es_output'],
);

my $index = 'my_index';
my $type  = 'my_type';

# Start from a clean index; ignore the error if it does not exist yet.
eval { $es->indices->delete( index => $index ) };

# Create the index with an integer id and not_analyzed string fields.
$es->indices->create(
    index => $index,
    body  => {
        mappings => {
            $type => {
                properties => {
                    id => { type => 'integer' },
                    (
                        map {
                            $_ =>
                                { type => 'string', index => 'not_analyzed' }
                        } qw/name email country city/,
                    ),
                }
            }
        }
    }
);

# Index the test documents.
for ( data() ) {
    $es->index(
        index => $index,
        type  => $type,
        id    => $_->{id},
        body  => $_,
    );
}

# Give the index a moment to refresh so the documents are searchable.
sleep(1);

# Run the same scrolled search five times and compare the counts.
for ( 1 .. 5 ) {
    say "------";
    my $scroll = $es->scroll_helper(
        index => $index,
        type  => $type,
        size  => 500,
        body  => query(),
    );
    say "Total hits: " . $scroll->total;

    my @results;
    while ( $scroll->refill_buffer ) {
        push @results, $scroll->drain_buffer;
    }
    say "Total results: " . scalar @results;
}

# Match a single document by id, sorted on the city field.
sub query {
    return {
        query => { term => { id => 3 } },
        sort  => [ { city => { order => 'asc' } } ],
    };
}

# Test documents; several of them have no city value.
sub data {
    return (
        {   id      => 1,
            name    => "Christopher Schmidt",
            email   => "cschmidt0\@meetup.com",
            country => "Cameroon",
            city    => "Douala"
        },
        {   id      => 2,
            name    => "Gloria Banks",
            email   => "gbanks1\@joomla.org",
            country => "Argentina",
            city    => "Libertador General San Martín"
        },
        {   id      => 3,
            name    => "Elizabeth Shaw",
            email   => "eshaw2\@huffingtonpost.com",
            country => "Armenia"
        },
        {   id      => 4,
            name    => "Anna Fisher",
            email   => "afisher3\@wikipedia.org",
            country => "Indonesia"
        },
        {   id      => 5,
            name    => "Nicholas Ford",
            email   => "nford4\@trellian.com",
            country => "Yemen"
        },
        {   id      => 6,
            name    => "Terry Sanders",
            email   => "tsanders5\@cnn.com",
            country => "China"
        },
        {   id      => 7,
            name    => "Susan Shaw",
            email   => "sshaw6\@nba.com",
            country => "Russia",
            city    => "Kislovodsk"
        },
        {   id      => 8,
            name    => "Sara Flores",
            email   => "sflores7\@nytimes.com",
            country => "Brazil",
            city    => "Arapongas"
        },
        {   id      => 9,
            name    => "Mark White",
            email   => "mwhite8\@statcounter.com",
            country => "China",
            city    => "Bayan Hure"
        },
        {   id      => 10,
            name    => "Cynthia Medina",
            email   => "cmedina9\@miitbeian.gov.cn",
            country => "Russia"
        }
    );
}
|
Many thanks for the recreation. The problem is indeed caused by the older version of Elasticsearch that Logstash is using, i.e. the same as elastic/elasticsearch#10244. If you change the |
Ah great - thank you! |
When using the scroll search I am getting duplicate results. I am expecting 1 document to be returned with my query, but it returns either one, two or three documents. The documents returned are exactly the same and have the same ID.
This is what I am doing:
Response:
We currently have 2 clusters on different versions - the code works as expected on Elasticsearch 1.1.1 but not on Elasticsearch 1.4.4.
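For anyone trying to confirm the same symptom, here is a minimal Perl sketch that counts how often each document ID appears among the hits collected from a scroll. It assumes each hit is a hashref with an `_id` key, as in Elasticsearch search responses. The helper name `duplicate_ids` and the sample data are hypothetical and are not part of Search::Elasticsearch.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Search::Elasticsearch): given a list of
# hits as hashrefs with an `_id` key, return a hashref that maps every _id
# seen more than once to the number of times it was seen.
sub duplicate_ids {
    my @hits = @_;
    my %seen;
    $seen{ $_->{_id} }++ for @hits;
    return { map { $_ => $seen{$_} } grep { $seen{$_} > 1 } keys %seen };
}

# Stand-in data; in a real run these would be the hits drained from the scroll.
my @results = ( { _id => 3 }, { _id => 3 }, { _id => 7 } );

my $dupes = duplicate_ids(@results);
printf "id %s returned %d times\n", $_, $dupes->{$_} for sort keys %$dupes;
```

An empty result means every ID was returned exactly once, so comparing this against the reported total hits shows whether the extra results are true duplicates.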