Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New module postgresql_vacuum - Garbage-collect and optionally analyze a PostgreSQL database #57452

Open
wants to merge 18 commits into
base: devel
from

Conversation

Projects
None yet
5 participants
@Andersson007
Copy link
Contributor

commented Jun 6, 2019

SUMMARY

New module postgresql_vacuum - Garbage-collect and optionally analyze a PostgreSQL database.

Specific options:

  • db - Name of a database for vacuum (used as a database to connect to)
  • table - Name of a table that needs to be vacuumed
  • columns - List of column names of I(table) that need to be vacuumed
  • analyze - Updates statistics used by the planner to determine the most efficient way to execute a query.
  • analyze_only - Only update optimizer statistics, no vacuum
  • full - documentation
  • disable_page_skipping - documentation

Note:
I came across with strange behavior on Ubuntu 1604 when it doesn't allow to create ssl certs by user postgres into his home directory but there is no code changes in this PR that's related with ssl.
To avoid excluding Ubuntu 1604 from SSL tests, I fixed ssl.yml too.

ISSUE TYPE
  • New Module Pull Request
EXAMPLES
- name: Vacuum database acme
  postgresql_vacuum:
    db: acme

- name: Vacuum and analyze database acme
  postgresql_vacuum:
    db: acme
    analyze: yes

- name: Vacuum full and analyze database foo. Pay attention that the database will be locked
  postgresql_vacuum:
    db: foo
    full: yes
    analyze: yes

- name: Analyze table mytable in database bar
  postgresql_vacuum:
    db: bar
    table: mytable
    analyze_only: yes

- name: Vacuum and analyze columns col1, col2, and col3 of mytable in database acme
  postgresql_vacuum:
    db: acme
    table: mytable
    columns:
    - col1
    - col2
    - col3
    analyze: yes

- name: Vacuum of mytable with freeze
  postgresql_vacuum:
    db: acme
    table: mytable
    freeze: yes

- name: Vacuum of mytable with disabled page skipping
  postgresql_vacuum:
    db: acme
    table: mytable
    disable_page_skipping: yes
RETURN
queries:
  description: List of executed queries.
  returned: always
  type: str
  sample: [ "VACUUM (ANALYZE) mytable (col1, col2, col3)" ]

@Andersson007 Andersson007 force-pushed the Andersson007:postgresql_vacuum branch from 98b62a3 to b27b50f Jun 6, 2019

@Andersson007 Andersson007 reopened this Jun 6, 2019

@Andersson007 Andersson007 force-pushed the Andersson007:postgresql_vacuum branch 3 times, most recently from 62b9a3c to db3b939 Jun 6, 2019

@Andersson007

This comment has been minimized.

Copy link
Contributor Author

commented Jun 6, 2019

I came across with strange behavior on Ubuntu 1604 when it doesn't allow to create ssl certs by user postgres into his home directory but there is no code changes in this PR that's related with ssl.
To avoid excluding Ubuntu 1604 from SSL tests, I fixed ssl.yml too

@Andersson007

This comment has been minimized.

Copy link
Contributor Author

commented Jun 6, 2019

ready_for_review

@ansibot

This comment has been minimized.

Copy link
Contributor

commented Jun 6, 2019

@Dorn- @andytom @antoinell @archf @b6d @dschep @jbscalia @jensdepuydt @jscalia @kostiantyn-nemchenko @kustodian @matburt @nerzhul @sebasmannem @tcraxs @wrouesnel

As a maintainer of a module in the same namespace this new module has been submitted to, your vote counts for shipits. Please review this module and add shipit if you would like to see it merged.

click here for bot help

@amenonsen

This comment has been minimized.

Copy link
Contributor

commented Jun 13, 2019

What is the point of this module? Why not just run vacuum as an SQL command? There seems to be no sensible idempotent behaviour possible in this case, nor does the module return any useful information.

@Andersson007

This comment has been minimized.

Copy link
Contributor Author

commented Jun 13, 2019

@amenonsen , of course, we can run vacuum and any SQL commands by shell module instead of using postgresql_db, postgresql_user, etc. that were written by coreteam (similarly we can install packages by shell module instead of package/yum/apt modules, etc).
The module does useful things - vacuum, analyze or vacuum full - these are useful operations

@Andersson007

This comment has been minimized.

Copy link
Contributor Author

commented Jun 13, 2019

@amenonsen , if you about unstable integration tests, I don't know why postgres sometimes does not update analyze_count or vacuum_count after vacuum or analyze, the same tests but local are always well. I'll think about changing tests.

@amenonsen

This comment has been minimized.

Copy link
Contributor

commented Jun 13, 2019

@amenonsen , of course, we can run vacuum and any SQL commands by shell module instead of using postgresql_db, postgresql_user, etc. that were written by coreteam (similarly we can install packages by shell module instead of package/yum/apt modules, etc).

postgresql_user can do something useful as a module: it can not create the user if it already exists, or it can update the user's properties if the role exists but is different from what is in the playbook. Same with postgresql_db. package/apt/yum can do nothing if the package is already installed, or it can update the package to the latest (or specified) version.

postgresql_vacuum cannot do anything like that. So what's the point?

@Andersson007

This comment has been minimized.

Copy link
Contributor Author

commented Jun 13, 2019

@amenonsen the point is doing vacuum/analyze/vacuum full remotely by ansible specific module

- postgresql_copy:
    db: mydb
    from: '/tmp/data.csv'
    dst: mytable
    columns:
    - id
    - name

- name: do vacuum analyze
  postgresql_vacuum:
    db: mydb
    table: mytable
    analyze: yes
    columns:
    - id
    - name

at least the example above looks more elegant and in ansible style than:

- name: do vacuum analyze
  shell: 'psql mydb -c "VACUUM ANALYZE mytable (id, name);"'

For example, 1. I uploaded the data by postgresql_copy module to a table then 2. I'd like to use a specific module for analyze, not shell like above

(Moreover, users don't need to know SQL at all to do it)

@Andersson007

This comment has been minimized.

Copy link
Contributor Author

commented Jun 13, 2019

@kostiantyn-nemchenko @sebasmannem @andytom @kustodian and others, needs your opinion about the module.
If most of us say that it is useless, I'll close the PR

@Andersson007 Andersson007 reopened this Jun 13, 2019

@Andersson007 Andersson007 force-pushed the Andersson007:postgresql_vacuum branch from 2e4b7a0 to a0ba97e Jun 13, 2019

Andersson007 added some commits Jun 13, 2019

@sebasmannem

This comment has been minimized.

Copy link
Contributor

commented Jun 13, 2019

I would say Shipit

@amenonsen , of course, we can run vacuum and any SQL commands by shell module instead of using postgresql_db, postgresql_user, etc. that were written by coreteam (similarly we can install packages by shell module instead of package/yum/apt modules, etc). ...

@amenonsen fwiw: i see your point, and I agrre that all modules should be idempotent. But I dont think that idempotency is the only reason for writing a module. The generic sql module certainly has its benefits, but I do agree with @Andersson007 that this module makes running a vacuum far more readable, and less error prone then running queries with postgresql_sql. We could make it more idempotent, say only run if it has not already run last n seconds, where n us configurable and defaults to 60 seconds. That would save some vacuumes, when run multiple times. Does that help?

Sent with GitHawk

@Andersson007

This comment has been minimized.

Copy link
Contributor Author

commented Jun 13, 2019

ready_for_review

@Andersson007

This comment has been minimized.

Copy link
Contributor Author

commented Jun 13, 2019

@sebasmannem thank you for the opinion!

@amenonsen

This comment has been minimized.

Copy link
Contributor

commented Jun 14, 2019

We could make it more idempotent, say only run if it has not already run last n seconds, where n us configurable and defaults to 60 seconds. That would save some vacuumes, when run multiple times. Does that help?

That would convert it from being a minor annoyance into a serious one.

A Postgres DBA is unlikely to use postgresql_vacuum from a playbook to perform routine vacuuming of the database (the way we used to have to put vacuumdb commands in cron). More likely the use of such a module would be, as in @Andersson007's example, as part of some custom data loading job. But now with this timeout functionality if you zapped your data and restarted the custom load, it wouldn't re-vacuum that table. So the timeout would do exactly the opposite thing you would expect, and you'd always have to turn it off to achieve sensible behaviour in the only halfway plausible use-case we've seen so far.

So in the end this module would be taking a simple one-line SQL command, adding on a bunch of complexity to kinda-sorta simulate idempotence, creating an unnecessary maintenance burden by duplicating parts of the postgresql documentation (without attribution, but that's a separate matter) and hard-wiring the available vacuum options… for the sake of "more elegant and in ansible style".

And what comes next: postgresql_reindex, postgresql_truncate, postgresql_cluster?

@Andersson007

This comment has been minimized.

Copy link
Contributor Author

commented Jun 14, 2019

@amenonsen Despite the sarcasm in the last sentence of the comment, which looks pretty abrupt to me, regarding that I said before, I don't insist to merge this module - it looks useful to me but if other guys say that it's useless, of course, I'll close the PR. Thank you

@amenonsen

This comment has been minimized.

Copy link
Contributor

commented Jun 14, 2019

@amenonsen Despite the sarcasm in the last sentence of the comment, which looks pretty abrupt to me, regarding that I said before, I don't insist to merge this module - it looks useful to me but if other guys say that it's useless, of course, I'll close the PR. Thank you

@Andersson007 It was not meant to be sarcasm. I chose the examples carefully: REINDEX, CLUSTER, and TRUNCATE are all single-statement utility commands just like VACUUM, and all of them are often used as part of custom ETL processes, and so by analogy they should also make sense to implement as modules. But somehow it doesn't sound justified to me.

From your response, I can see that my earlier criticism of the idea seemed too personal, and therefore I apologise for hurting your feelings. I should have also made an effort to acknowledge and appreciate all of your recent work in this area. Thank you for that; and I hope you do not find my opinion discouraging, because that was not my intention.

@Andersson007

This comment has been minimized.

Copy link
Contributor Author

commented Jun 14, 2019

@amenonsen thank you for clarifying! ok, nps :)
I asked the community and we should wait their opinions.
If I close it now, it would be impolite.
Seems @kostiantyn-nemchenko and @sebasmannem aren't against this module but if they change their mind - no problem ;)

(btw: truncate is an option of postgresql_table module; we might add reindex option to postgresql_idx preferably after postgres 12 is released when it is possible in concurrent mode and add cluster option to postgresql_table if anybody wants to see it implemented;)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.