Show force merge/optimize progress #15975

Open
markwalkom opened this issue Jan 14, 2016 · 16 comments
Labels
:Distributed/Engine (Anything around managing Lucene and the Translog in an open shard), >enhancement, help wanted (adoptme), Team:Distributed (Meta label for distributed team)

Comments

@markwalkom
Contributor

Are we able to expose this somehow, maybe pending tasks?
At the moment it just runs until finished, but I'm hoping there may be a way to figure out some kind of progress percentage, even a rough one.

I realise this is all a little vague, but I don't really know if it's possible, just that it'd be nice to have :p

@clintongormley

If it can be done, it'll be via the task management API. Closing in favour of #15117

@imotov
Contributor

imotov commented Jun 15, 2018

I think we dropped this issue on the floor at the end of the task management task. Now that the task management framework is in place, should we revisit this issue?

@imotov imotov reopened this Jun 15, 2018
@oleg-andreyev

@imotov I'm not an expert in Elasticsearch internals, but here are my thoughts as a consumer:

  • The force merge action should have a wait_for_completion flag, like the reindex action, if that is possible. That way I, as a consumer, could send a simple HTTP request with wait_for_completion=false and not wait for completion, because I could monitor the force merge from the _tasks endpoint.
  • It would also be great to know whether a force merge is cancellable. See "Force merge should be cancellable" #17094.
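The monitoring half of this idea can be sketched against the task list as it exists today. The block below is a minimal sketch, not an official client: it assumes the tasks API accepts `actions=*forcemerge*` and `detailed=true` as query parameters and returns tasks nested under `nodes.<node_id>.tasks`. The function names are made up for illustration.

```python
def forcemerge_tasks_url(base_url):
    """Build a task-list URL restricted to force-merge actions (assumed filter)."""
    return f"{base_url.rstrip('/')}/_tasks?actions=*forcemerge*&detailed=true"


def summarize_forcemerge_tasks(tasks_response):
    """Flatten a parsed task-list response into one row per task.

    Rows are sorted so the longest-running task comes first.
    """
    rows = []
    for node_id, node in tasks_response.get("nodes", {}).items():
        for task_id, task in node.get("tasks", {}).items():
            rows.append({
                "task_id": task_id,
                "node": node.get("name", node_id),
                "action": task.get("action"),
                # The description is only populated when detailed=true is set.
                "description": task.get("description", ""),
                "running_seconds": task.get("running_time_in_nanos", 0) / 1e9,
                "cancellable": task.get("cancellable", False),
            })
    return sorted(rows, key=lambda row: row["running_seconds"], reverse=True)
```

Fetching the URL and passing the decoded JSON body to `summarize_forcemerge_tasks` shows which force-merge tasks exist and how long each has been running. It says nothing about how far along any of them is.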

@colings86 colings86 added the :Distributed/Task Management Issues for anything around the Tasks API - both persistent and node level. label Jun 25, 2018
@elasticmachine
Collaborator

Pinging @elastic/es-distributed

@bleskes
Contributor

bleskes commented Jul 23, 2018

@jpountz is there any way to track progress of a user driven forced merge?

@jpountz
Contributor

jpountz commented Jul 27, 2018

Not everything is exposed publicly today, but we could expose information about ongoing merges. Giving progress information is more complex because Lucene doesn't merge document by document but data-structure by data-structure, e.g. the inverted index of _id, then the inverted index of foo, then doc values of seq_no, then doc values of foo, etc.

@redlus

redlus commented Feb 10, 2019

I understand this is a bit more complex than tracking the progress of other tasks in the cluster, but even showing the number of indices which already finished (in a multi-index force-merge) and the number of data-structures (or doc_values) already processed would be nice. Is there any plan to add this functionality?

Thanks

@cdenneen

Using the tasks API to see that a forcemerge might still be running doesn't really tell us anything about % completion, so we can't estimate when it will end. When this is combined with something like Curator to forcemerge a bunch of indices that meet some criteria, it would be useful to know when these will end so they don't step on each other or cause other contention issues.
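Until real progress is exposed, one workaround for the "stepping on each other" problem is to hold the next force merge until the task list shows none running. This is a rough sketch under that assumption: `fetch_tasks` is a hypothetical callable that returns the parsed body of a task-list request, and the helper names are illustrative only.

```python
import time


def count_forcemerge_tasks(tasks_response):
    """Count tasks whose action name mentions forcemerge."""
    return sum(
        1
        for node in tasks_response.get("nodes", {}).values()
        for task in node.get("tasks", {}).values()
        if "forcemerge" in task.get("action", "")
    )


def wait_until_no_forcemerge(fetch_tasks, poll_seconds=30.0, max_polls=None,
                             sleep=time.sleep):
    """Poll until no force-merge task is listed.

    Returns True once the list is clear, or False if max_polls polls
    all still showed a force merge running.
    """
    polls = 0
    while True:
        if count_forcemerge_tasks(fetch_tasks()) == 0:
            return True
        polls += 1
        if max_polls is not None and polls >= max_polls:
            return False
        sleep(poll_seconds)
```

This only serializes the jobs. It still cannot say when a running force merge will end, which is the gap this issue is about.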

@ywelsch ywelsch added :Distributed/Engine Anything around managing Lucene and the Translog in an open shard. and removed :Distributed/Task Management Issues for anything around the Tasks API - both persistent and node level. labels Mar 19, 2019
@SpencerLN

The ability to at least understand which index is currently merging would be very useful. We have had users accidentally start a force merge for an index actively being written to, and there was no apparent way to identify which index was being operated on by a given task.

martijnvg pushed a commit to martijnvg/elasticsearch that referenced this issue Aug 5, 2019
This is static information that is part of the force merge request.

Relates to elastic#15975
martijnvg added a commit that referenced this issue Aug 8, 2019
* Add description to force-merge tasks (#41365)

This is static information that is part of the force merge request.

Relates to #15975
@pedrosk

pedrosk commented Oct 22, 2019

Since some large indices with many segments may take a long time to force merge, it would be beneficial to:

  1. see the task in a LIST (!) and be able to see its status (e.g. in progress, failed, etc.)

I feel it is essential to be able to know what is happening with force merges that take an extended amount of time.

@rjernst rjernst added the Team:Distributed Meta label for distributed team label May 4, 2020
@tanandy

tanandy commented Jul 2, 2020

Hi, that's something we need too.

@phalverson

Even something as high level as whether the merge is running or complete would be huge. I don't need to have a play-by-play on what's going on in Lucene, I just need to know when I can start indexing documents again. Please address this soon.

@Ankithas93

Ankithas93 commented Sep 7, 2020

I am able to see merges: current by running the command below:
curl -XGET 'http://els2002:9200/_stats?pretty'

  "merges" : {
    "current" : 0,

Is this the value to monitor to check whether a merge is currently in progress?
I am running 7.8.0.
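For anyone trying the same approach, here is a small sketch that reads that field per index rather than cluster-wide. It assumes the `_stats` response nests per-index figures under `indices.<name>.total.merges`, and, as far as I understand it, `current` counts every merge running on the index, background merges included, not only force merges. The function name is made up.

```python
def indices_with_running_merges(stats_response):
    """Return the merge counters of every index that reports a running merge."""
    busy = {}
    for name, index_stats in stats_response.get("indices", {}).items():
        merges = index_stats.get("total", {}).get("merges", {})
        if merges.get("current", 0) > 0:
            busy[name] = {
                "current": merges["current"],
                "current_docs": merges.get("current_docs", 0),
                "current_size_in_bytes": merges.get("current_size_in_bytes", 0),
            }
    return busy
```

An empty result means no index reported a merge at the moment of the request. A non-empty result shows which indices are busy, but not which task caused the merge.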

@DaveCTurner
Contributor

The task list contains a node-level task for each node holding shards that are to be force-merged. I think we could decorate this task with things like how many shards it targets, how many of those shard-level tasks are complete, the shard of the currently-running task, and how long that shard-level task has been running. We have all of that information available I think, it's more a question of exposing it in the task list.

More detailed progress from Lucene would be nice too, but that's a much larger ask, so let's rule that out of scope for now.

I'm marking this as help wanted to invite community contributions as this isn't something we expect to work on in the near future. I don't think it's an enormous job.
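To make the proposed decoration concrete for a potential contributor, here is a purely illustrative sketch of the four pieces of information listed above. It is written in Python for brevity and is not Elasticsearch code: the class and field names are hypothetical, and a real implementation would live in the Java task status classes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ForceMergeNodeStatus:
    """Hypothetical status for one node-level force-merge task."""
    total_shards: int                           # shards this node-level task targets
    completed_shards: int = 0                   # shard-level tasks already finished
    current_shard: Optional[str] = None         # shard being merged right now, if any
    current_shard_running_seconds: float = 0.0  # how long that shard has been running

    def describe(self) -> str:
        """Render the status as a single human-readable line."""
        parts = [f"{self.completed_shards}/{self.total_shards} shards complete"]
        if self.current_shard is not None:
            parts.append(
                f"currently merging {self.current_shard} "
                f"for {self.current_shard_running_seconds:.0f}s"
            )
        return ", ".join(parts)
```

For example, `ForceMergeNodeStatus(5, 2, "[logs-1][0]", 90.4).describe()` returns `"2/5 shards complete, currently merging [logs-1][0] for 90s"`.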

@DaveCTurner DaveCTurner added the help wanted adoptme label May 11, 2021
@ruslaniv

ruslaniv commented Apr 19, 2023

But what exactly does current represent:

  1. Number of segments left to merge. I.e. total 29, merged 27, left 2, currently merging segment # 2
  2. Sequential number of the segment it is merging right now. I.e. total segments to merge 29, has merged segment # 1 now merging segment # 2

I really hope it is the former, because the merge process has been running for 7 days now and there is currently absolutely no way to check whether anything is happening or how much longer I have to wait for the merge to complete.

"merges": {
    "current": 2,
    "current_docs": 28652980,
    "current_size_in_bytes": 157314725265,
    "total": 29,

I wish there were a detailed explanation of the index_name/_stats/merge results somewhere.
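One way to at least see whether anything is changing is to take two samples of this `merges` object and compare them. The sketch below rests on an assumption that should be checked against the stats documentation: that `current` is the number of merges running at sampling time and `total` is a cumulative count of merges that have finished. The function name is made up.

```python
def merge_activity(before, after):
    """Compare two samples of the "merges" stats object taken some time apart."""
    return {
        # Assumed meaning: merges that finished between the two samples.
        "merges_finished": after.get("total", 0) - before.get("total", 0),
        # Assumed meaning: merges still running at the second sample.
        "running_now": after.get("current", 0),
        "running_bytes": after.get("current_size_in_bytes", 0),
        "still_active": after.get("current", 0) > 0,
    }
```

If the assumption holds, a single very large merge will show `merges_finished` at 0 and `running_now` unchanged for its whole duration, so this confirms that a merge is still running but not how far along it is.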

@przemekwitek
Contributor

there is absolutely no way currently to check if anything is happening and how much time I still have to wait for the merge to complete

+1

I have a similar issue with a large dataset that I'm trying to use in Rally.
I don't know whether the _forcemerge is progressing or not.
