Utility for diagnose pg locked - Step 1 #15468

Merged
jrafanie merged 7 commits into ManageIQ:master from the lock-inspect branch on Jul 20, 2017

Conversation

ailisp
Member

@ailisp ailisp commented Jun 28, 2017

This step generates a YAML file from pg_stat_activity.
Story: https://www.pivotaltracker.com/n/projects/1608513/stories/147871949

cc @gtanzillo @jrafanie @yrudman

@miq-bot add-label wip, tools

@miq-bot miq-bot added the wip label Jun 28, 2017
@ailisp
Member Author

ailisp commented Jun 28, 2017

Current status: the script can output YAML for a local vmdb_production database. A version that accepts command line options and uses them to connect to remote databases is in progress. However, some questions need to be clarified:

[Screenshot from 2017-06-28 15-14-29]

  1. Some application names are duplicated. The rows differ, but the application name alone does not tell them apart. Should we keep only one of them, or add more information to the application name to distinguish them? See, for example, the 15th and 16th rows.

  2. The pid column is different from the pid in application_name. Is this correct?

  3. Should we take the db password from a command line option or ask the user for it interactively? What is the purpose of this step: to be used as a function or as a standalone script?

@ailisp
Member Author

ailisp commented Jun 28, 2017

Command line arguments are now supported: -p for password, -u for user, -s for server, --port for port, and a trailing filename.yml for the output.
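
A minimal sketch of how options like these could be parsed with Ruby's standard OptionParser. This is not the code from the PR: the method name, the hash keys, the long option names other than --port, and the default output file are assumptions made for illustration.

```ruby
require 'optparse'

# Hypothetical parser for the flags listed above. The hash keys and the
# default output file name are illustrative assumptions.
def parse_cli(argv)
  options = { :output => 'output.yml' }
  parser = OptionParser.new do |opts|
    opts.banner = "Usage: step1.rb [options] [filename.yml]"
    opts.on("-s", "--server HOST", "PostgreSQL server host") { |v| options[:host] = v }
    opts.on("--port PORT", Integer, "PostgreSQL server port") { |v| options[:port] = v }
    opts.on("-u", "--user USER", "Database user") { |v| options[:user] = v }
    opts.on("-p", "--password PASSWORD", "Database password") { |v| options[:password] = v }
  end
  # parse (without !) leaves argv untouched and returns the non-option arguments
  rest = parser.parse(argv)
  options[:output] = rest.last unless rest.empty?
  options
end

opts = parse_cli(%w[-s db.example.com --port 5433 -u root -p secret out.yml])
```

Passing the password on the command line leaves it visible in the shell history and the process list, which is one reason to prefer an interactive prompt.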

end

begin
res = conn.exec_params('SELECT application_name FROM pg_stat_activity ORDER BY application_name;')
Member

@ailisp we probably want to get a list of columns in pg_stat_activity and include them along with the application_name. We definitely need the pg backend pid/spid (I think that's pid), client_addr, client_hostname, backend_start, xact_start, query_start, waiting, and maybe query? cc @gtanzillo

Member

The idea is this YAML file can be processed in the later steps so we can try to correlate behavior with the cause of bad behavior.

Member Author

@jrafanie what does backend_xid mean? I see it's empty in every row. Also, client_addr/client_hostname are either ::1 or empty. So should we get these from application_name and the miq_server table?

Member

I think that's the postgresql transaction id. I'd think the client_addr and hostname should be populated if you're not connecting locally. Let's start with application_name and a few of those other columns from pg_stat_activity. Don't look at miq_server table yet. We'll have that in another step. This step needs to hit only the pg_stat_activity table so we avoid locks.

Member Author

OK! Thank you.
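
A rough sketch of the approach agreed on in this thread: read only pg_stat_activity, keep the columns named in the review, and dump the rows as YAML. The helper names are hypothetical, and `dump_activity` assumes the pg gem and a reachable server, so it is defined but not called here.

```ruby
require 'yaml'

# Columns named in the review discussion. `waiting` matches the sample output
# in this thread; PostgreSQL 9.6 and later expose wait_event_type/wait_event
# instead.
COLUMNS = %w[pid application_name client_addr client_hostname
             backend_start xact_start query_start waiting query].freeze

# Builds a query that touches only pg_stat_activity, as requested above.
def activity_query(columns = COLUMNS)
  "SELECT #{columns.join(', ')} FROM pg_stat_activity ORDER BY application_name;"
end

# Accepts any enumerable of hash-like rows (a PG::Result yields hashes).
def rows_to_yaml(rows)
  rows.map(&:to_h).to_yaml
end

# Assumption: the pg gem is installed. Not exercised in this sketch.
def dump_activity(conn_options, path)
  require 'pg'
  conn = PG.connect(conn_options)
  File.write(path, rows_to_yaml(conn.exec(activity_query)))
ensure
  conn.close if conn
end

puts activity_query(%w[pid application_name])
```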

@ailisp
Member Author

ailisp commented Jun 29, 2017

Why do we need to truncate application_name?

  def database_application_name
    zone = MiqServer.my_server.zone
    "MIQ #{Process.pid} #{minimal_class_name}[#{compressed_id}], s[#{miq_server.compressed_id}], #{zone.name}[#{zone.compressed_id}]".truncate(64)
  end

There seems to be no restriction on this length:
https://doxygen.postgresql.org/pgstat_8h_source.html#l00955

@jrafanie
Member

> Why do we need to truncate application_name?
>
>     def database_application_name
>       zone = MiqServer.my_server.zone
>       "MIQ #{Process.pid} #{minimal_class_name}[#{compressed_id}], s[#{miq_server.compressed_id}], #{zone.name}[#{zone.compressed_id}]".truncate(64)
>     end
>
> There seems to be no restriction on this length:
> https://doxygen.postgresql.org/pgstat_8h_source.html#l00955

I believe there is a limit: application_name must be shorter than NAMEDATALEN, which is 64 characters in a default installation:

https://www.postgresql.org/docs/9.6/static/runtime-config-logging.html#GUC-APPLICATION-NAME

@ailisp
Member Author

ailisp commented Jun 29, 2017

Oh, right. Currently I raise an error if the application name is truncated (ends with ..).
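
The check described above could look roughly like this. It is a sketch, not the PR's code: the method and error class names are hypothetical, and treating a trailing `..` as the truncation marker follows the comment above (presumably what remains of the `...` appended by `truncate(64)` once PostgreSQL cuts the name to 63 bytes).

```ruby
# PostgreSQL keeps fewer than NAMEDATALEN bytes of application_name,
# i.e. at most 63 in a standard build.
MAX_APPLICATION_NAME_BYTES = 63

# Hypothetical error class for names that cannot be parsed reliably.
class TruncatedApplicationName < StandardError; end

# Returns the name unchanged, or raises if it looks truncated.
# Heuristic (assumption): a truncated name ends with "..".
def check_application_name!(name)
  if name.end_with?("..")
    raise TruncatedApplicationName, "application_name looks truncated: #{name.inspect}"
  end
  name
end

check_application_name!("MIQ|8583|1|-|1|Server|default")
```

Raising instead of guessing keeps later steps from correlating a worker with the wrong server or zone when the tail of the name is missing.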

options[:user] = user
end

opts.on("-p", "--password [STRING]",
Member

For later - we may need to provide the password in a more secure way.

Member Author

Yes. This can be done in a more secure way, since we only ask customers to run this script once a deadlock happens. There is no need to run it periodically, so we can ask the user to enter the password interactively.
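
An interactive prompt along these lines is possible with Ruby's standard io/console library. The sketch below is only an illustration; the method name and its keyword arguments are assumptions, and the injectable input/output streams exist so the helper can be exercised without a terminal.

```ruby
require 'io/console'

# Prompts on `output` and reads one line from `input`. Echo is disabled only
# when `input` is a real terminal; pipes and test doubles are read as-is.
def ask_password(prompt = "Password: ", input: $stdin, output: $stderr)
  output.print(prompt)
  interactive = input.respond_to?(:noecho) && input.tty?
  line = interactive ? input.noecho(&:gets) : input.gets
  output.puts if interactive # the user's Enter key was not echoed
  line.to_s.chomp
end
```

Writing the prompt to standard error keeps it out of any YAML that the script writes to standard output.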

Member Author

Step 2 has a similar issue. Can we assume it always runs in the same Rails console that the ManageIQ application runs in? This would require running it with rails runner or as a rake task, but it would no longer need a password.

@ailisp
Member Author

ailisp commented Jul 11, 2017

Sample output (with the -a option, which outputs all pg_stat_activity rows, including non-MIQ ones):

---
- datid: '16386'
  datname: vmdb_production
  pid: '8729'
  usesysid: '16384'
  usename: root
  application_name: MIQ|8583|1|-|1|Server|default
  client_addr: "::1"
  client_hostname: 
  client_port: '39478'
  backend_start: '2017-07-11 10:45:29.793848-04'
  xact_start: 
  query_start: '2017-07-11 17:07:12.789297-04'
  state_change: '2017-07-11 17:07:12.789323-04'
  waiting: f
  state: idle
  backend_xid: 
  backend_xmin: 
  query: |-
    SELECT  id, lock_version, priority, role FROM "miq_queue" WHERE (    state = 'ready'
        AND (zone IS NULL OR zone = 'default')
        AND queue_name = 'reporting'
        AND (role IS NULL OR role IN ('automate','database_operations','database_owner','ems_inventory','ems_operations','event','reporting','scheduler','smartstate','user_interface','web_services','websocket'))
        AND (server_guid IS NULL OR server_guid = 'f6860425-5d14-40e6-b564-2d3c2638502d')
        AND (deliver_on IS NULL OR deliver_on <= '2017-07-11 21:07:12.788748')
        AND (priority <= 200)
    ) ORDER BY "miq_queue"."priority" ASC, "miq_queue"."id" ASC LIMIT $1
- datid: '16386'
  datname: vmdb_production
  pid: '8840'
  usesysid: '16384'
  usename: root
  application_name: MIQ|8794|1|98|1|Generic|default
  client_addr: "::1"
  client_hostname: 
  client_port: '39492'
  backend_start: '2017-07-11 10:45:35.810419-04'
  xact_start: 
  query_start: '2017-07-11 17:07:17.004598-04'
  state_change: '2017-07-11 17:07:17.004635-04'
  waiting: f
  state: idle
  backend_xid: 
  backend_xmin: 
  query: SELECT pg_backend_pid()
- datid: '16386'
  datname: vmdb_production
  pid: '8839'
  usesysid: '16384'
  usename: root
  application_name: MIQ|8803|1|99|1|Generic|default
  client_addr: "::1"
  client_hostname: 
  client_port: '39490'
  backend_start: '2017-07-11 10:45:35.779736-04'
  xact_start: 
  query_start: '2017-07-11 17:07:15.347931-04'
  state_change: '2017-07-11 17:07:15.34796-04'
  waiting: f
  state: idle
  backend_xid: 
  backend_xmin: 
  query: SELECT pg_backend_pid()
- datid: '16386'
  datname: vmdb_production
  pid: '8845'
  usesysid: '16384'
  usename: root
  application_name: MIQ|8812|1|100|1|Priority|default
  client_addr: "::1"
  client_hostname: 
  client_port: '39498'
  backend_start: '2017-07-11 10:45:35.89027-04'
  xact_start: 
  query_start: '2017-07-11 17:07:16.897365-04'
  state_change: '2017-07-11 17:07:16.897401-04'
  waiting: f
  state: idle
  backend_xid: 
  backend_xmin: 
  query: SELECT pg_backend_pid()
- datid: '16386'
  datname: vmdb_production
  pid: '8848'
  usesysid: '16384'
  usename: root
  application_name: MIQ|8820|1|101|1|Priority|default
  client_addr: "::1"
  client_hostname: 
  client_port: '39502'
  backend_start: '2017-07-11 10:45:35.949888-04'
  xact_start: 
  query_start: '2017-07-11 17:07:16.90967-04'
  state_change: '2017-07-11 17:07:16.909702-04'
  waiting: f
  state: idle
  backend_xid: 
  backend_xmin: 
  query: SELECT pg_backend_pid()
- datid: '16386'
  datname: vmdb_production
  pid: '8851'
  usesysid: '16384'
  usename: root
  application_name: MIQ|8830|1|102|1|Schedule|default
  client_addr: "::1"
  client_hostname: 
  client_port: '39506'
  backend_start: '2017-07-11 10:45:36.002046-04'
  xact_start: 
  query_start: '2017-07-11 17:07:12.770828-04'
  state_change: '2017-07-11 17:07:12.770843-04'
  waiting: f
  state: idle
  backend_xid: 
  backend_xmin: 
  query: SELECT "miq_schedules".* FROM "miq_schedules" WHERE (updated_at > '2017-07-11
    21:06:57.751393')
- datid: '16386'
  datname: vmdb_production
  pid: '8881'
  usesysid: '16384'
  usename: root
  application_name: MIQ|8854|1|103|1|EventHandler|default
  client_addr: "::1"
  client_hostname: 
  client_port: '39516'
  backend_start: '2017-07-11 10:45:46.788029-04'
  xact_start: 
  query_start: '2017-07-11 17:07:15.339799-04'
  state_change: '2017-07-11 17:07:15.339833-04'
  waiting: f
  state: idle
  backend_xid: 
  backend_xmin: 
  query: SELECT pg_backend_pid()
- datid: '16386'
  datname: vmdb_production
  pid: '8888'
  usesysid: '16384'
  usename: root
  application_name: MIQ|8863|1|104|1|Reporting|default
  client_addr: "::1"
  client_hostname: 
  client_port: '39520'
  backend_start: '2017-07-11 10:45:46.913674-04'
  xact_start: 
  query_start: '2017-07-11 17:07:15.547073-04'
  state_change: '2017-07-11 17:07:15.547108-04'
  waiting: f
  state: idle
  backend_xid: 
  backend_xmin: 
  query: SELECT pg_backend_pid()
- datid: '16386'
  datname: vmdb_production
  pid: '8889'
  usesysid: '16384'
  usename: root
  application_name: MIQ|8871|1|105|1|Reporting|default
  client_addr: "::1"
  client_hostname: 
  client_port: '39522'
  backend_start: '2017-07-11 10:45:46.919921-04'
  xact_start: 
  query_start: '2017-07-11 17:07:15.698499-04'
  state_change: '2017-07-11 17:07:15.698534-04'
  waiting: f
  state: idle
  backend_xid: 
  backend_xmin: 
  query: SELECT pg_backend_pid()
- datid: '16386'
  datname: vmdb_production
  pid: '9043'
  usesysid: '16384'
  usename: root
  application_name: MIQ|8882|1|106|1|Ui|default
  client_addr: "::1"
  client_hostname: 
  client_port: '39556'
  backend_start: '2017-07-11 10:47:19.242074-04'
  xact_start: 
  query_start: '2017-07-11 16:47:34.075548-04'
  state_change: '2017-07-11 16:47:34.075558-04'
  waiting: f
  state: idle
  backend_xid: 
  backend_xmin: 
  query: COMMIT
- datid: '16386'
  datname: vmdb_production
  pid: '8916'
  usesysid: '16384'
  usename: root
  application_name: MIQ|8882|1|106|1|Ui|default
  client_addr: "::1"
  client_hostname: 
  client_port: '39534'
  backend_start: '2017-07-11 10:45:47.315264-04'
  xact_start: 
  query_start: '2017-07-11 10:45:47.340997-04'
  state_change: '2017-07-11 10:45:47.341019-04'
  waiting: f
  state: idle
  backend_xid: 
  backend_xmin: 
  query: SELECT  "miq_databases".* FROM "miq_databases" ORDER BY "miq_databases"."id"
    ASC LIMIT $1
- datid: '16386'
  datname: vmdb_production
  pid: '8927'
  usesysid: '16384'
  usename: root
  application_name: MIQ|8882|1|106|1|Ui|default
  client_addr: "::1"
  client_hostname: 
  client_port: '39536'
  backend_start: '2017-07-11 10:45:47.353158-04'
  xact_start: 
  query_start: '2017-07-11 16:17:21.23533-04'
  state_change: '2017-07-11 16:17:21.23534-04'
  waiting: f
  state: idle
  backend_xid: 
  backend_xmin: 
  query: COMMIT
- datid: '16386'
  datname: vmdb_production
  pid: '8931'
  usesysid: '16384'
  usename: root
  application_name: MIQ|8898|1|107|1|WebService|default
  client_addr: "::1"
  client_hostname: 
  client_port: '39542'
  backend_start: '2017-07-11 10:45:47.524192-04'
  xact_start: 
  query_start: '2017-07-11 16:47:33.77212-04'
  state_change: '2017-07-11 16:47:33.772131-04'
  waiting: f
  state: idle
  backend_xid: 
  backend_xmin: 
  query: COMMIT
- datid: '16386'
  datname: vmdb_production
  pid: '8929'
  usesysid: '16384'
  usename: root
  application_name: MIQ|8898|1|107|1|WebService|default
  client_addr: "::1"
  client_hostname: 
  client_port: '39540'
  backend_start: '2017-07-11 10:45:47.48725-04'
  xact_start: 
  query_start: '2017-07-11 10:45:47.508673-04'
  state_change: '2017-07-11 10:45:47.508697-04'
  waiting: f
  state: idle
  backend_xid: 
  backend_xmin: 
  query: SELECT  "miq_databases".* FROM "miq_databases" ORDER BY "miq_databases"."id"
    ASC LIMIT $1
- datid: '16386'
  datname: vmdb_production
  pid: '8942'
  usesysid: '16384'
  usename: root
  application_name: MIQ|8907|1|108|1|Websocket|default
  client_addr: "::1"
  client_hostname: 
  client_port: '39546'
  backend_start: '2017-07-11 10:45:47.565191-04'
  xact_start: 
  query_start: '2017-07-11 16:47:33.976493-04'
  state_change: '2017-07-11 16:47:33.976503-04'
  waiting: f
  state: idle
  backend_xid: 
  backend_xmin: 
  query: COMMIT
- datid: '16386'
  datname: vmdb_production
  pid: '8944'
  usesysid: '16384'
  usename: root
  application_name: MIQ|8907|1|108|1|Websocket|default
  client_addr: "::1"
  client_hostname: 
  client_port: '39548'
  backend_start: '2017-07-11 10:45:47.585363-04'
  xact_start: 
  query_start: '2017-07-11 10:45:47.734921-04'
  state_change: '2017-07-11 10:45:47.734943-04'
  waiting: f
  state: idle
  backend_xid: 
  backend_xmin: 
  query: SELECT  "miq_databases".* FROM "miq_databases" ORDER BY "miq_databases"."id"
    ASC LIMIT $1
- datid: '13295'
  datname: postgres
  pid: '6897'
  usesysid: '16384'
  usename: root
  application_name: pgAdmin III - Browser
  client_addr: 192.168.122.1
  client_hostname: 
  client_port: '45326'
  backend_start: '2017-07-11 10:18:57.285196-04'
  xact_start: 
  query_start: '2017-07-11 10:18:57.308334-04'
  state_change: '2017-07-11 10:18:57.308526-04'
  waiting: f
  state: idle
  backend_xid: 
  backend_xmin: 
  query: SELECT count(*) FROM pg_attribute WHERE attrelid = 'pg_catalog.pg_proc'::regclass
    AND attname = 'proargdefaults'
- datid: '16386'
  datname: vmdb_production
  pid: '7843'
  usesysid: '16384'
  usename: root
  application_name: pgAdmin III - Browser
  client_addr: 192.168.122.1
  client_hostname: 
  client_port: '45336'
  backend_start: '2017-07-11 10:20:47.666429-04'
  xact_start: 
  query_start: '2017-07-11 15:57:31.79783-04'
  state_change: '2017-07-11 15:57:31.798359-04'
  waiting: f
  state: idle
  backend_xid: 
  backend_xmin: 
  query: "SELECT t.oid, t.xmin, t.*, relname, CASE WHEN relkind = 'r' THEN TRUE ELSE
    FALSE END AS parentistable,   nspname, des.description, l.lanname, p.prosrc, \n
    \ COALESCE(substring(pg_get_triggerdef(t.oid), 'WHEN (.*) EXECUTE PROCEDURE'),
    substring(pg_get_triggerdef(t.oid), 'WHEN (.*)  \\$trigger')) AS whenclause\n
    \ FROM pg_trigger t\n  JOIN pg_class cl ON cl.oid=tgrelid\n  JOIN pg_namespace
    na ON na.oid=relnamespace\n  LEFT OUTER JOIN pg_description des ON (des.objoid=t.oid
    AND des.classoid='pg_trigger'::regclass)\n  LEFT OUTER JOIN pg_proc p ON p.oid=t.tgfoid\n
    \ LEFT OUTER JOIN pg_language l ON l.oid=p.prolang\n WHERE NOT tgisinternal\n
    \ AND tgrelid = 18132::oid\n ORDER BY tgname"
- datid: '16386'
  datname: vmdb_production
  pid: '8733'
  usesysid: '16384'
  usename: root
  application_name: pgAdmin III - Edit Grid
  client_addr: 192.168.122.1
  client_hostname: 
  client_port: '45418'
  backend_start: '2017-07-11 10:45:30.256344-04'
  xact_start: 
  query_start: '2017-07-11 10:45:37.408198-04'
  state_change: '2017-07-11 10:45:37.408247-04'
  waiting: f
  state: idle
  backend_xid: 
  backend_xmin: 
  query: SELECT format_type(oid,NULL) as typname FROM pg_type WHERE oid = 25
- datid: '16386'
  datname: vmdb_production
  pid: '15111'
  usesysid: '16384'
  usename: root
  application_name: pgAdmin III - Edit Grid
  client_addr: 192.168.122.1
  client_hostname: 
  client_port: '50070'
  backend_start: '2017-07-11 15:57:43.299728-04'
  xact_start: 
  query_start: '2017-07-11 15:57:43.337474-04'
  state_change: '2017-07-11 15:57:43.337526-04'
  waiting: f
  state: idle
  backend_xid: 
  backend_xmin: 
  query: SELECT format_type(oid,NULL) as typname FROM pg_type WHERE oid = 16
- datid: '16386'
  datname: vmdb_production
  pid: '16420'
  usesysid: '16384'
  usename: root
  application_name: "./step1.rb"
  client_addr: 192.168.122.1
  client_hostname: 
  client_port: '51042'
  backend_start: '2017-07-11 17:07:17.39739-04'
  xact_start: '2017-07-11 17:07:17.399171-04'
  query_start: '2017-07-11 17:07:17.399171-04'
  state_change: '2017-07-11 17:07:17.399172-04'
  waiting: f
  state: active
  backend_xid: 
  backend_xmin: '118330'
  query: |
    SELECT *
    FROM pg_stat_activity
    ORDER BY application_name;
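
One way a later step might consume this dump, and a look at the duplicated application names raised earlier. The `pid` column is the PostgreSQL backend process id, while the pid inside `application_name` appears to come from the client's `Process.pid`, so a worker holding several connections shows up as several rows with the same name. The sketch below groups rows by `application_name`; the helper name is hypothetical, and the inline sample repeats three rows from the output above, trimmed to two keys.

```ruby
require 'yaml'

# Returns { application_name => [backend pids] } for every application_name
# that is shared by more than one connection.
def shared_application_names(rows)
  rows.group_by { |row| row["application_name"] }
      .select { |_name, group| group.size > 1 }
      .map { |name, group| [name, group.map { |row| row["pid"] }] }
      .to_h
end

sample = YAML.load(<<~DUMP)
  - pid: '9043'
    application_name: MIQ|8882|1|106|1|Ui|default
  - pid: '8916'
    application_name: MIQ|8882|1|106|1|Ui|default
  - pid: '8729'
    application_name: MIQ|8583|1|-|1|Server|default
DUMP

shared = shared_application_names(sample)
shared.each { |name, pids| puts "#{name}: backend pids #{pids.join(', ')}" }
```

Keying on the backend `pid` rather than on `application_name` keeps each connection distinct without changing the name format.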

@@ -0,0 +1,159 @@
#!/usr/bin/env ruby
Member

@gtanzillo gtanzillo Jul 17, 2017

Let's structure the directory and files as:

tools
   |- pg_inspector.rb
   |- pg_inspector
         |- cli.rb
         |- active_connections_to_yaml.rb (step 1)
         |- servers_to_yaml.rb (step 2)
         |- active_connections_to_human.rb (step 3)

@ailisp ailisp force-pushed the lock-inspect branch 2 times, most recently from 8e8f46c to 550d4e0 Compare July 18, 2017 18:34
@ailisp
Member Author

ailisp commented Jul 19, 2017

@yrudman @jrafanie @gtanzillo
Please review the new organization. I think it's better to get this merged before I move step 2 to this file organization. Thanks.

@ailisp
Member Author

ailisp commented Jul 19, 2017

@miq-bot remove-label wip

@miq-bot miq-bot changed the title [WIP] Utility for diagnose pg locked - Step 1 Utility for diagnose pg locked - Step 1 Jul 19, 2017
@miq-bot miq-bot removed the wip label Jul 19, 2017
Member

@gtanzillo gtanzillo left a comment

This looks good now. @jrafanie, @yrudman do you think we can merge at this point?


def connect_pg_server
conn_options = {
:dbname => 'vmdb_production',
Member

Maybe better to use postgres here instead of vmdb_production, because the postgres DB should always exist. I ran it locally and it failed because I didn't have vmdb_production.

Member Author

Thanks. I will set the dbname default to postgres and provide a command line option to set it.
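
A small sketch of that default-plus-override behaviour, assuming the parsed command line options arrive as a hash. The constant and method names are hypothetical, and the host and port defaults are illustrative assumptions; only the postgres default comes from this thread.

```ruby
# Defaults used when an option is not given on the command line.
# Only :dbname => 'postgres' comes from the review; host and port are assumed.
DEFAULT_CONNECTION = { :dbname => 'postgres', :host => 'localhost', :port => 5432 }.freeze

# Command line values win over defaults; options left unset (nil) are ignored.
def connection_options(cli_options)
  DEFAULT_CONNECTION.merge(cli_options.reject { |_key, value| value.nil? })
end

connection_options(:dbname => 'vmdb_production', :user => 'root')
```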

@miq-bot
Member

miq-bot commented Jul 20, 2017

Checked commits ailisp/manageiq@b660911~...0f439cb with ruby 2.2.6, rubocop 0.47.1, and haml-lint 0.20.0
6 files checked, 0 offenses detected
Everything looks fine. ⭐

@jrafanie jrafanie merged commit b04065e into ManageIQ:master Jul 20, 2017
@chessbyte chessbyte added this to the Sprint 65 Ending Jul 24, 2017 milestone Aug 15, 2017