Skip to content

Latest commit

 

History

History
230 lines (152 loc) · 16.2 KB

20161123_01_pdf_002.md

File metadata and controls

230 lines (152 loc) · 16.2 KB

报告时间段: 2016-11-24 10:57:04.081798 ~ 2016-11-24 11:10:13.201455

一、数据库定制参数信息

1. 用户或数据库级别定制参数

database role snap_ts setconfig

建议

定制参数需要关注, 优先级高于数据库的启动参数和配置文件中的参数, 特别是排错时需要关注.

二、数据库空间使用分析

1. 表空间使用情况

tablespace tbs_location snap_ts size
pg_default - 2016-11-24 11:08:50.448082+08 108 GB
pg_default - 2016-11-24 11:10:13.201455+08 108 GB
pg_global - 2016-11-24 11:08:50.448082+08 537 kB
pg_global - 2016-11-24 11:10:13.201455+08 537 kB

建议

注意检查表空间所在文件系统的剩余空间, (默认表空间在$PGDATA/base目录下), IOPS分配是否均匀, OS的sysstat包可以观察IO使用率.

2. 数据库使用情况

database snap_ts size
db0 2016-11-24 11:08:50.448082+08 69 MB
db0 2016-11-24 11:10:13.201455+08 69 MB
postgres 2016-11-24 11:08:50.448082+08 108 GB
postgres 2016-11-24 11:10:13.201455+08 108 GB
template0 2016-11-24 11:08:50.448082+08 7225 kB
template0 2016-11-24 11:10:13.201455+08 7225 kB
template1 2016-11-24 11:08:50.448082+08 7225 kB
template1 2016-11-24 11:10:13.201455+08 7225 kB

建议

注意检查数据库的大小, 是否需要清理历史数据.

三、数据库连接分析

1. 活跃度

state snap_ts connections
active 2016-11-24 11:08:50.448082+08 1
active 2016-11-24 11:10:13.201455+08 16
idle 2016-11-24 11:10:13.201455+08 2
idle in transaction 2016-11-24 11:10:13.201455+08 16

建议

如果active状态很多, 说明数据库比较繁忙. 如果idle in transaction很多, 说明业务逻辑设计可能有问题. 如果idle很多, 可能使用了连接池, 并且可能没有自动回收连接到连接池的最小连接数.

2. 剩余连接数

snap_ts max_enabled_connections used res_for_super res_for_normal
2016-11-24 11:08:50.448082+08 100 1 3 96
2016-11-24 11:10:13.201455+08 100 34 3 63

建议

给超级用户和普通用户设置足够的连接, 以免不能登录数据库.

3. 用户连接数限制

rolename snap_ts conn_limit connects
postgres 2016-11-24 11:08:50.448082+08 -1 1
postgres 2016-11-24 11:10:13.201455+08 -1 34

建议

给用户设置足够的连接数, alter role ... CONNECTION LIMIT .

4. 数据库连接限制

database snap_ts conn_limit connects
postgres 2016-11-24 11:08:50.448082+08 -1 1
postgres 2016-11-24 11:10:13.201455+08 -1 34

建议

给数据库设置足够的连接数, alter database ... CONNECTION LIMIT .

四、数据库性能分析

1. TOP 10 SQL : total_cpu_time

rolename database calls total_ms min_ms max_ms mean_ms stddev_ms rows shared_blks_hit shared_blks_read shared_blks_dirtied shared_blks_written local_blks_hit local_blks_read local_blks_dirtied shared_blks_written temp_blks_read temp_blks_written blk_read_time blk_write_time query
postgres postgres 3257431 801212.218000147 0.0155 661.5295 0.228815984402593 2.67856748675768 3257431 126748340 2100 12197 0 0 0 0 0 0 0 0 0 UPDATE pgbench_branches SET bbalance = bbalance + $1 WHERE bid = $2;
postgres postgres 3257443 398825.853000225 0.023 648.646 0.124109951147548 4.6639211699234 3257443 17604282 2921 2476560 0 0 0 0 0 0 0 0 0 UPDATE pgbench_accounts SET abalance = abalance + $1 WHERE aid = $2;
postgres postgres 3257439 357644.714999923 0.0145 653.85 0.102480527524248 1.36301121326012 3257439 114801162 905 27190 0 0 0 0 0 0 0 0 0 UPDATE pgbench_tellers SET tbalance = tbalance + $1 WHERE tid = $2;
postgres postgres 3257425 76471.4549991722 0.006 462.702 0.0232362901295874 1.77897215668638 3257425 3365375 20746 23380 0 0 0 0 0 0 0 0 0 INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES ($1, $2, $3, $4, CURRENT_TIMESTAMP);
postgres postgres 3257439 55327.0259991886 0.009 18.772 0.0170636195451918 0.040562028155207 3257439 13227016 0 0 0 0 0 0 0 0 0 0 0 SELECT abalance FROM pgbench_accounts WHERE aid = $1;
postgres postgres 1 19017.764 19017.764 19017.764 19017.764 0 1 2631936 2100 2130 0 0 0 0 0 14163 14081 0 0 select snap_database();
postgres postgres 1 7511.361 7511.361 7511.361 7511.361 0 0 816723 3 6 0 0 0 0 0 225 211 0 0 create table IF NOT EXISTS snap_pg_rel_space_bucket as select 1::int8 snap_id, now() snap_ts, current_database(), buk this_buk_no, cnt rels_in_this_buk, pg_size_pretty(min) buk_min, pg_size_pretty(max) buk_max from ( select row_number() over (partition by buk order by tsize), tsize, buk, min(tsize) over (partition by buk),max(tsize) over (partition by buk), count(*) over (partition by buk) cnt from ( select pg_relation_size(a.oid) tsize, width_bucket(pg_relation_size(a.oid),tmin-1,tmax+1,10) buk from ( select min(pg_relation_size(a.oid)) tmin, max(pg_relation_size(a.oid)) tmax from pg_class a, pg_namespace c where a.relnamespace=c.oid and nspname !~ $$^pg_$$ and nspname<>$$information_schema$$ ) t, pg_class a, pg_namespace c where a.relnamespace=c.oid and nspname !~ $$^pg_$$ and nspname<>$$information_schema$$ ) t ) t where row_number=1;
postgres postgres 1 3931.596 3931.596 3931.596 3931.596 0 0 909503 1 1 0 0 0 0 0 0 0 0 0 create table IF NOT EXISTS snap_pg_hash_idx as select 1::int8 snap_id, now() snap_ts, current_database(),pg_get_indexdef(oid) from pg_class where relkind=$$i$$ and pg_get_indexdef(oid) ~ $$USING hash$$;
postgres postgres 3 2363.977 392.25 1006.846 787.992333333333 280.356036853538 1043 27870 0 0 0 0 0 0 0 19604 19604 0 0 select snap_report_database(?,?);
postgres postgres 1 2158.239 2158.239 2158.239 2158.239 0 0 223555 3 4 0 0 0 0 0 4682 4666 0 0 create table IF NOT EXISTS snap_pg_table_bloat as select 1::int8 snap_id, now() snap_ts, current_database() AS db, schemaname, tablename, reltuples::bigint AS tups, relpages::bigint AS pages, otta, ROUND(CASE WHEN otta=0 OR sml.relpages=0 OR sml.relpages=otta THEN 0.0 ELSE sml.relpages/otta::numeric END,1) AS tbloat, CASE WHEN relpages < otta THEN 0 ELSE relpages::bigint - otta END AS wastedpages, CASE WHEN relpages < otta THEN 0 ELSE bs*(sml.relpages-otta)::bigint END AS wastedbytes, CASE WHEN relpages < otta THEN $$0 bytes$$::text ELSE (bs*(relpages-otta))::bigint &#124;&#124; $$ bytes$$ END AS wastedsize, iname, ituples::bigint AS itups, ipages::bigint AS ipages, iotta, ROUND(CASE WHEN iotta=0 OR ipages=0 OR ipages=iotta THEN 0.0 ELSE ipages/iotta::numeric END,1) AS ibloat, CASE WHEN ipages < iotta THEN 0 ELSE ipages::bigint - iotta END AS wastedipages, CASE WHEN ipages < iotta THEN 0 ELSE bs*(ipages-iotta) END AS wastedibytes, CASE WHEN ipages < iotta THEN $$0 bytes$$ ELSE (bs*(ipages-iotta))::bigint &#124;&#124; $$ bytes$$ END AS wastedisize, CASE WHEN relpages < otta THEN CASE WHEN ipages < iotta THEN 0 ELSE bs*(ipages-iotta::bigint) END ELSE CASE WHEN ipages < iotta THEN bs*(relpages-otta::bigint) ELSE bs*(relpages-otta::bigint + ipages-iotta::bigint) END END AS totalwastedbytes FROM ( SELECT nn.nspname AS schemaname, cc.relname AS tablename, COALESCE(cc.reltuples,0) AS reltuples, COALESCE(cc.relpages,0) AS relpages, COALESCE(bs,0) AS bs, COALESCE(CEIL((cc.reltuples*((datahdr+ma- (CASE WHEN datahdr%ma=0 THEN ma ELSE datahdr%ma END))+nullhdr2+4))/(bs-20::float)),0) AS otta, COALESCE(c2.relname,$$?$$) AS iname, COALESCE(c2.reltuples,0) AS ituples, COALESCE(c2.relpages,0) AS ipages, COALESCE(CEIL((c2.reltuples*(datahdr-12))/(bs-20::float)),0) AS iotta FROM pg_class cc JOIN pg_namespace nn ON cc.relnamespace = nn.oid AND nn.nspname <> $$information_schema$$ LEFT JOIN ( SELECT ma,bs,foo.nspname,foo.relname, (datawidth+(hdr+ma-(case when hdr%ma=0 THEN ma ELSE hdr%ma END)))::numeric AS datahdr, (maxfracsum*(nullhdr+ma-(case when nullhdr%ma=0 THEN ma ELSE nullhdr%ma END))) AS nullhdr2 FROM ( SELECT ns.nspname, tbl.relname, hdr, ma, bs, SUM((1-coalesce(null_frac,0))*coalesce(avg_width, 2048)) AS datawidth, MAX(coalesce(null_frac,0)) AS maxfracsum, hdr+( SELECT 1+count(*)/8 FROM pg_stats s2 WHERE null_frac<>0 AND s2.schemaname = ns.nspname AND s2.tablename = tbl.relname ) AS nullhdr FROM pg_attribute att JOIN pg_class tbl ON att.attrelid = tbl.oid JOIN pg_namespace ns ON ns.oid = tbl.relnamespace LEFT JOIN pg_stats s ON s.schemaname=ns.nspname AND s.tablename = tbl.relname AND s.inherited=false AND s.attname=att.attname, ( SELECT (SELECT current_setting($$block_size$$)::numeric) AS bs, CASE WHEN SUBSTRING(SPLIT_PART(v, $$ $$, 2) FROM $$#"[0-9]+.[0-9]+#"%$$ for $$#$$) IN ($$8.0$$,$$8.1$$,$$8.2$$) THEN 27 ELSE 23 END AS hdr, CASE WHEN v ~ $$mingw32$$ OR v ~ $$64-bit$$ THEN 8 ELSE 4 END AS ma FROM (SELECT version() AS v) AS foo ) AS constants WHERE att.attnum > 0 AND tbl.relkind=$$r$$ GROUP BY 1,2,3,4,5 ) AS foo ) AS rs ON cc.relname = rs.relname AND nn.nspname = rs.nspname LEFT JOIN pg_index i ON indrelid = cc.oid LEFT JOIN pg_class c2 ON c2.oid = i.indexrelid ) AS sml order by wastedbytes desc limit 10;

建议

检查SQL是否有优化空间, 配合auto_explain插件在csvlog中观察LONG SQL的执行计划是否正确.

2. 数据库统计信息, 回滚比例, 命中比例, 数据块读写时间, 死锁, 复制冲突

database snap_ts rollback_ratio hit_ratio blk_read_time blk_write_time conflicts deadlocks
db0 2016-11-24 11:08:50.448082+08 0.00 % 0.00 % 0 0 0 0
db0 2016-11-24 11:10:13.201455+08 0.00 % 0.00 % 0 0 0 0
postgres 2016-11-24 11:08:50.448082+08 0.00 % 99.94 % 0 0 0 0
postgres 2016-11-24 11:10:13.201455+08 0.00 % 99.98 % 0 0 0 0
template0 2016-11-24 11:08:50.448082+08 0.00 % 0.00 % 0 0 0 0
template0 2016-11-24 11:10:13.201455+08 0.00 % 0.00 % 0 0 0 0
template1 2016-11-24 11:08:50.448082+08 0.00 % 0.00 % 0 0 0 0
template1 2016-11-24 11:10:13.201455+08 0.00 % 0.00 % 0 0 0 0

建议

回滚比例大说明业务逻辑可能有问题, 命中率小说明shared_buffer要加大, 数据块读写时间长说明块设备的IO性能要提升, 死锁次数多说明业务逻辑有问题, 复制冲突次数多说明备库可能在跑LONG SQL.

3. 检查点, bgwriter 统计信息

checkpoints_timed checkpoints_req checkpoint_write_time checkpoint_sync_time buffers_checkpoint buffers_clean maxwritten_clean buffers_backend buffers_backend_fsync buffers_alloc
1 38 152252 1949 2510995 0 0 33459 0 30984

说明

checkpoints_timed , 统计周期内, 发生了多少次调度检查点.

checkpoints_req , 统计周期内, 发生了多少次人为执行检查点.

checkpoint_write_time , 检查点过程中, write系统调用的耗时ms.

checkpoint_sync_time , 检查点过程中, fsync系统调用的耗时ms.

buffers_checkpoint , 检查点过程中, ckpt进程写出(write)了多少buffer pages.

buffers_clean , 统计周期内, bgwriter进程写出(write)了多少buffer pages.

maxwritten_clean , 统计周期内, bgwriter被打断了多少次(由于write的pages超过一个bgwriter调度周期内的阈值).

buffers_backend , 统计周期内, 有多少pages是被backend process直接write out的.

buffers_backend_fsync , 统计周期内, 有多少pages是被backend process直接fsync的.

buffers_alloc , 统计周期内, 指派了多少个pages.

建议

checkpoint_write_time多说明检查点持续时间长, 检查点过程中产生了较多的脏页.

checkpoint_sync_time代表检查点开始时的shared buffer中的脏页被同步到磁盘的时间, 如果时间过长, 并且数据库在检查点时性能较差, 考虑一下提升块设备的IOPS能力.

buffers_backend_fsync太多说明需要加大shared buffer 或者 减小bgwriter_delay参数.

maxwritten_clean太多说明需要减小调大bgwriter_lru_maxpages和bgwriter_lru_multiplier参数.

4. 归档统计信息

archived_count last_archived_wal last_archived_time failed_count last_failed_wal last_failed_time now_insert_xlog_file
0 - 1970-01-01 00:00:00+08 728 00000001000000820000002D 2016-11-24 11:10:13.476726+08 000000010000008700000003

建议

last_archived_wal和now_insert_xlog_file相差很多, 说明失败的归档很多.

五、数据库年龄分析

1. 数据库年龄

database snap_ts age age_remain
db0 2016-11-24 11:08:50.448082+08 67739317 2079744331
db0 2016-11-24 11:10:13.201455+08 68949292 2078534356
postgres 2016-11-24 11:08:50.448082+08 67739317 2079744331
postgres 2016-11-24 11:10:13.201455+08 68949292 2078534356
template0 2016-11-24 11:08:50.448082+08 67739317 2079744331
template0 2016-11-24 11:10:13.201455+08 68949292 2078534356
template1 2016-11-24 11:08:50.448082+08 67739317 2079744331
template1 2016-11-24 11:10:13.201455+08 68949292 2078534356

建议

数据库的年龄正常情况下应该小于vacuum_freeze_table_age, 如果剩余年龄小于2亿, 建议人为干预, 将LONG SQL或事务杀掉后, 执行vacuum freeze.

2. 长事务, 2PC

snap_ts database user query xact_start xact_duration query_start query_duration state

snap_ts | name | statement | prepare_time | duration | parameter_types | from_sql ---|---|---|---|---|---|---|---|---

建议

长事务过程中产生的垃圾, 无法回收, 建议不要在数据库中运行LONG SQL, 或者错开DML高峰时间去运行LONG SQL. 2PC事务一定要记得尽快结束掉, 否则可能会导致数据库膨胀.

参考: http://blog.163.com/digoal@126/blog/static/1638770402015329115636287/

六、数据库安全或潜在风险分析

1. 用户密码到期时间

snap_ts | rolname | rolvaliduntil ---|---|---|--- 2016-11-24 11:10:13.201455+08 | a | 9999-01-01 00:00:00+08 2016-11-24 11:10:13.201455+08 | b | 9999-01-01 00:00:00+08 2016-11-24 11:10:13.201455+08 | dba | 9999-01-01 00:00:00+08 2016-11-24 11:10:13.201455+08 | digoal | 9999-01-01 00:00:00+08 2016-11-24 11:10:13.201455+08 | pg_signal_backend | 9999-01-01 00:00:00+08 2016-11-24 11:10:13.201455+08 | postgres | 9999-01-01 00:00:00+08 2016-11-24 11:10:13.201455+08 | test | 9999-01-01 00:00:00+08 2016-11-24 11:10:13.201455+08 | test1 | 9999-01-01 00:00:00+08

建议

到期后, 用户将无法登陆, 记得修改密码, 同时将密码到期时间延长到某个时间或无限时间, alter role ... VALID UNTIL 'timestamp'.