Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use hypopg to simulate dropping existing indexes #77

Closed
mbanck opened this issue Jan 17, 2023 · 12 comments
Closed

Use hypopg to simulate dropping existing indexes #77

mbanck opened this issue Jan 17, 2023 · 12 comments

Comments

@mbanck
Copy link

mbanck commented Jan 17, 2023

Would it be possible to have hypopg simulate dropping existing indexes, to see how the query plans change if that is done? It is my understanding that this is not possible right now (or maybe at all?), but I did not find a prior issue discussing this while browsing through the list of closed issues.

@rjuju rjuju self-assigned this Jan 17, 2023
@rjuju rjuju added the question label Jan 17, 2023
@rjuju
Copy link
Member

rjuju commented Jan 17, 2023

Hi,

hypopg doesn't let you do that, but the feature isn't hard to do and there's actually an existing project providing just that: https://github.com/postgrespro/plantuner

Syntax
plantuner.forbid_index (deprecated)
plantuner.disable_index - List of indexes invisible to planner
plantuner.enable_index - List of indexes visible to planner even they are hided by plantuner.disable_index.

That's why I never implemented it in hypopg.

@mbanck
Copy link
Author

mbanck commented Feb 21, 2023

Right, but note that hypopg is now shipped in Debian/Ubuntu/RHEL via *.postgresql.org and on several major managed platforms and plantuner isn't (and can't be installed on the latter) so it would still be useful to have this in hypopg in my opinion.

@rjuju
Copy link
Member

rjuju commented Feb 21, 2023

That makes sense. I will work on adding such a feature. What do you think of this API:

  • hypopg_hide_index(oid)
  • hypopg_unhide_index(oid)
  • hypopg_unhide_all_indexes()
  • hypopg_hidden_indexes (a view)

Most specific usage could be done directly at the SQL level. Note that all of those would be backend-local only, so it wouldn't be enough to globally hide an index. Is that something you would need? If yes, I would have to handle a set of GUC as done in plantuner.

@mbanck
Copy link
Author

mbanck commented Feb 21, 2023

So far I am using hypopg in one session only and would assume hiding real indexes would work just like creating hypothetical ones - the latter are backend-local as well, aren't they? So GUCs to make hiding global would not be necessary in my opinion, especially as that would work only with EXPLAIN and might then confuse other users.

Right now, it is pretty obvious that the planner chooses a hypopg index based on the name, but if indexes were hidden globally, I guess other people would wonder a lot about why that index isn't being used by the planner if they are unaware of the hypopg.hide_index GUC.

Or maybe I misunderstood how that would work, I have to admit that I'm not a hypopg poweruser so far :)

@rjuju
Copy link
Member

rjuju commented Feb 21, 2023

Well, my understanding is that people need this kind of feature because they're not so sure whether an index can be safely drop or not, and also because they often don't really have good QA environment. So what they want is pretend to have dropped the index globally, to make sure that every part of the app is tested, and if then the phone rings panic and make the index magically reappears instantly.

So while I obviously don't think it's a sensible approach, I'm wondering if people actually need this kind of scenario, and therefore a GUC rather than a per-session setting (along with extra session_preload_libraries setting and anything else you would need).

I guess that it could be done later if needed, but knowing it now would help to choose better name, to avoid something like hidden indexes vs globally / super hidden indexes or something poor like that.

@mbanck
Copy link
Author

mbanck commented Feb 24, 2023

Hrm I don't know; I guess this could be accomplished right now by the (despised) way of setting the index' indisvalid to false and back? I think this should be something that is in Postgres proper at some point, like ALTER INDEX SET INVALID.

The plantuner (similar to pg_hint_plan) extension seems to be concerned with altering execution plans (which the above use case would fall under in my opinion), while I understood hypopg to be only concerned with query planning. That is why I think doing this globally should not be the case, but you're the maintainer of course, and maybe your vision/project statement is different.

@rjuju
Copy link
Member

rjuju commented Feb 25, 2023

In theory hypopg won't mess with the execution plan as it's only active if you run a plain EXPLAIN (without ANALYZE), but you can definitely hit the same problem as those extensions if you try hard enough. Consider a scenario like this:

rjuju=# set plan_cache_mode = force_generic_plan ;
SET

rjuju=# select hypopg_create_index('create index on tt (id)');
    hypopg_create_index
----------------------------
 (13805,<13805>btree_tt_id)
(1 row)

rjuju=# prepare s (int) as select * from tt where id = $1;
PREPARE

rjuju=# explain execute s(1);
                                   QUERY PLAN
--------------------------------------------------------------------------------
 Index Scan using "<13805>btree_tt_id" on tt  (cost=0.04..8.06 rows=1 width=14)
   Index Cond: (id = $1)
(2 rows)

rjuju=# execute s(1);
ERROR:  XX000: could not open relation with OID 13805

The plan_cache_mode isn't even required, it's simply to avoid the usual 5 plannings before switching to the generic plan.

So with a new feature to "hide" indexes you could hit the exact same problem, whether it's been done globally or not.

It could probably be leveraged by sending a cache invalidation for the underlying relation which would force the generation of a new plan in all backends at some point, but there's no guarantee that the invalidation will be processed immediately and it would only be possible if using some SQL wrapper for C functions like suggested above. It means that the only way to have an index globally hidden would be to have that knowledge in the function itself, e.g. hypopg_hide_index(oid indexid, bool hide_globally = FALSE) which then requires hypopg to be in shared_preload_libraries, which seems quite troublesome, and isn't something that I would recommend anyway.

@nutVI
Copy link
Contributor

nutVI commented Feb 28, 2023

I've been focusing on this recently and I learn hypopg code and realize 'invalid exist index' in explain. I think hypopg's target is adding support for hypothetical indexes, so hypothetically hiding exist index is necessary too. I hope that I can contribute to this issue in the code.

Regarding global hidding, hypopg was originally designed fors user to adjust indexes to monitor changes in explain query cost, which is typically done within a single session or backend. Therefore, it may not be necessary to consider global hiding. The new function simply needs to be consistent with the existing ones.

As @mbanck pointed out, if the user really wants to globally hide an index for executing queries, PostgreSql provides the option to use update pg_catalog.pg_index set indisvalid = false where indexrelid = $1. This will hide the index everywhere, and it will still be updated when the table is updated.

@rjuju
Copy link
Member

rjuju commented Feb 28, 2023

Regarding global hidding, hypopg was originally designed fors user to adjust indexes to monitor changes in explain query cost, which is typically done within a single session or backend. Therefore, it may not be necessary to consider global hiding. The new function simply needs to be consistent with the existing ones.

I was looking a bit at the archive, and the main consensus is that hiding an index should be done in a single backend only, not globally, so I think the previous function I mentioned should work quite well.

I hope that I can contribute to this issue in the code.

Great news! If you plan to submit a PR for I will wait for you. Let me know if you face any issue.

As @mbanck pointed out, if the user really wants to globally hide an index for executing queries, PostgreSql provides the option to use update pg_catalog.pg_index set indisvalid = false where indexrelid = $1. This will hide the index everywhere, and it will still be updated when the table is updated.

It's not really a great solution. You need to be superuser to do so, doing a plain UPDATE on the catalog is really a bad idea in general, and in this case the problem is that it won't invalidate the plancache in other backends, so you have no guarantee that the index is indeed invisible globally. To do so you would need to then do some kind of ALTER TABLE to make sure that a invalidation is sent.

@tpetry
Copy link

tpetry commented Mar 2, 2023

Having the option to hide indexes global would increase the daily usage of hypopg a lot. With MySQL you can make indexes invisible which is used a lot as described before. I see no one dropping an index before making it invisible for some time on big tables. As this is not natively supported on PG (except changing the isvalid flag manually) that would be an awesome addition to hypopg.

Best make it per-session by default and an option to make it global too.

@bradnicholson
Copy link

I’d add a big plus one to this being added if possible. Especially now that it is on RDS - it would be a very useful feature

@rjuju
Copy link
Member

rjuju commented May 27, 2023

For the record the feature has already been added (#78) and I just released version 1.4.0 that contains it. Thanks a lot to everyone!

@rjuju rjuju closed this as completed May 27, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants