报告时间段: `2016-11-24 10:57:04.081798` ~ `2016-11-24 11:10:13.201455`

一、数据库定制参数信息

1. 用户或数据库级别定制参数

database	role	snap_ts	setconfig

建议

定制参数需要关注, 优先级高于数据库的启动参数和配置文件中的参数, 特别是排错时需要关注.

二、数据库空间使用分析

1. 表空间使用情况

tablespace	tbs_location	snap_ts	size
pg_default	`-`	`2016-11-24 11:08:50.448082+08`	108 GB
pg_default	`-`	`2016-11-24 11:10:13.201455+08`	108 GB
pg_global	`-`	`2016-11-24 11:08:50.448082+08`	537 kB
pg_global	`-`	`2016-11-24 11:10:13.201455+08`	537 kB

建议

注意检查表空间所在文件系统的剩余空间, (默认表空间在$PGDATA/base目录下), IOPS分配是否均匀, OS的sysstat包可以观察IO使用率.

2. 数据库使用情况

database	snap_ts	size
`db0`	`2016-11-24 11:08:50.448082+08`	69 MB
`db0`	`2016-11-24 11:10:13.201455+08`	69 MB
`postgres`	`2016-11-24 11:08:50.448082+08`	108 GB
`postgres`	`2016-11-24 11:10:13.201455+08`	108 GB
`template0`	`2016-11-24 11:08:50.448082+08`	7225 kB
`template0`	`2016-11-24 11:10:13.201455+08`	7225 kB
`template1`	`2016-11-24 11:08:50.448082+08`	7225 kB
`template1`	`2016-11-24 11:10:13.201455+08`	7225 kB

建议

注意检查数据库的大小, 是否需要清理历史数据.

三、数据库连接分析

1. 活跃度

state	snap_ts	connections
active	`2016-11-24 11:08:50.448082+08`	1
active	`2016-11-24 11:10:13.201455+08`	16
idle	`2016-11-24 11:10:13.201455+08`	2
idle in transaction	`2016-11-24 11:10:13.201455+08`	16

建议

如果active状态很多, 说明数据库比较繁忙. 如果idle in transaction很多, 说明业务逻辑设计可能有问题. 如果idle很多, 可能使用了连接池, 并且可能没有自动回收连接到连接池的最小连接数.

2. 剩余连接数

snap_ts	max_enabled_connections	used	res_for_super	res_for_normal
`2016-11-24 11:08:50.448082+08`	100	1	3	96
`2016-11-24 11:10:13.201455+08`	100	34	3	63

建议

给超级用户和普通用户设置足够的连接, 以免不能登录数据库.

3. 用户连接数限制

rolename	snap_ts	conn_limit	connects
`postgres`	`2016-11-24 11:08:50.448082+08`	-1	1
`postgres`	`2016-11-24 11:10:13.201455+08`	-1	34

建议

给用户设置足够的连接数, alter role ... CONNECTION LIMIT .

4. 数据库连接限制

database	snap_ts	conn_limit	connects
`postgres`	`2016-11-24 11:08:50.448082+08`	-1	1
`postgres`	`2016-11-24 11:10:13.201455+08`	-1	34

建议

给数据库设置足够的连接数, alter database ... CONNECTION LIMIT .

四、数据库性能分析

1. TOP 10 SQL : total_cpu_time

rolename	database	calls	total_ms	min_ms	max_ms	mean_ms	stddev_ms	rows	shared_blks_hit	shared_blks_read	shared_blks_dirtied	temp_blks_read	temp_blks_written	query
`postgres`	`postgres`	3257431	801212.218000147	0.0155	661.5295	0.228815984402593	2.67856748675768	3257431	126748340	2100	12197	0	0	`UPDATE pgbench_branches SET bbalance = bbalance + $1 WHERE bid = $2;`
`postgres`	`postgres`	3257443	398825.853000225	0.023	648.646	0.124109951147548	4.6639211699234	3257443	17604282	2921	2476560	0	0	`UPDATE pgbench_accounts SET abalance = abalance + $1 WHERE aid = $2;`
`postgres`	`postgres`	3257439	357644.714999923	0.0145	653.85	0.102480527524248	1.36301121326012	3257439	114801162	905	27190	0	0	`UPDATE pgbench_tellers SET tbalance = tbalance + $1 WHERE tid = $2;`
`postgres`	`postgres`	3257425	76471.4549991722	0.006	462.702	0.0232362901295874	1.77897215668638	3257425	3365375	20746	23380	0	0	`INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES ($1, $2, $3, $4, CURRENT_TIMESTAMP);`
`postgres`	`postgres`	3257439	55327.0259991886	0.009	18.772	0.0170636195451918	0.040562028155207	3257439	13227016	0	0	0	0	`SELECT abalance FROM pgbench_accounts WHERE aid = $1;`
`postgres`	`postgres`	1	19017.764	19017.764	19017.764	19017.764	0	1	2631936	2100	2130	14163	14081	`select snap_database();`
`postgres`	`postgres`	1	7511.361	7511.361	7511.361	7511.361	0	0	816723	3	6	225	211	create table IF NOT EXISTS snap_pg_rel_space_bucket as select 1::int8 snap_id, now() snap_ts, current_database(), buk this_buk_no, cnt rels_in_this_buk, pg_size_pretty(min) buk_min, pg_size_pretty(max) buk_max from ( select row_number() over (partition by buk order by tsize), tsize, buk, min(tsize) over (partition by buk),max(tsize) over (partition by buk), count(*) over (partition by buk) cnt from ( select pg_relation_size(a.oid) tsize, width_bucket(pg_relation_size(a.oid),tmin-1,tmax+1,10) buk from ( select min(pg_relation_size(a.oid)) tmin, max(pg_relation_size(a.oid)) tmax from pg_class a, pg_namespace c where a.relnamespace=c.oid and nspname !~ $$^pg_$$ and nspname<>$$information_schema$$ ) t, pg_class a, pg_namespace c where a.relnamespace=c.oid and nspname !~ $$^pg_$$ and nspname<>$$information_schema$$ ) t ) t where row_number=1;
`postgres`	`postgres`	1	3931.596	3931.596	3931.596	3931.596	0	0	909503	1	1	0	0	`create table IF NOT EXISTS snap_pg_hash_idx as select 1::int8 snap_id, now() snap_ts, current_database(),pg_get_indexdef(oid) from pg_class where relkind=$$i$$ and pg_get_indexdef(oid) ~ $$USING hash$$;`
`postgres`	`postgres`	3	2363.977	392.25	1006.846	787.992333333333	280.356036853538	1043	27870	0	0	19604	19604	`select snap_report_database(?,?);`
`postgres`	`postgres`	1	2158.239	2158.239	2158.239	2158.239	0	0	223555	3	4	4682	4666	create table IF NOT EXISTS snap_pg_table_bloat as select 1::int8 snap_id, now() snap_ts, current_database() AS db, schemaname, tablename, reltuples::bigint AS tups, relpages::bigint AS pages, otta, ROUND(CASE WHEN otta=0 OR sml.relpages=0 OR sml.relpages=otta THEN 0.0 ELSE sml.relpages/otta::numeric END,1) AS tbloat, CASE WHEN relpages < otta THEN 0 ELSE relpages::bigint - otta END AS wastedpages, CASE WHEN relpages < otta THEN 0 ELSE bs(sml.relpages-otta)::bigint END AS wastedbytes, CASE WHEN relpages < otta THEN $$0 bytes$$::text ELSE (bs(relpages-otta))::bigint \|\| $$ bytes$$ END AS wastedsize, iname, ituples::bigint AS itups, ipages::bigint AS ipages, iotta, ROUND(CASE WHEN iotta=0 OR ipages=0 OR ipages=iotta THEN 0.0 ELSE ipages/iotta::numeric END,1) AS ibloat, CASE WHEN ipages < iotta THEN 0 ELSE ipages::bigint - iotta END AS wastedipages, CASE WHEN ipages < iotta THEN 0 ELSE bs(ipages-iotta) END AS wastedibytes, CASE WHEN ipages < iotta THEN $$0 bytes$$ ELSE (bs(ipages-iotta))::bigint \|\| $$ bytes$$ END AS wastedisize, CASE WHEN relpages < otta THEN CASE WHEN ipages < iotta THEN 0 ELSE bs(ipages-iotta::bigint) END ELSE CASE WHEN ipages < iotta THEN bs(relpages-otta::bigint) ELSE bs(relpages-otta::bigint + ipages-iotta::bigint) END END AS totalwastedbytes FROM ( SELECT nn.nspname AS schemaname, cc.relname AS tablename, COALESCE(cc.reltuples,0) AS reltuples, COALESCE(cc.relpages,0) AS relpages, COALESCE(bs,0) AS bs, COALESCE(CEIL((cc.reltuples((datahdr+ma- (CASE WHEN datahdr%ma=0 THEN ma ELSE datahdr%ma END))+nullhdr2+4))/(bs-20::float)),0) AS otta, COALESCE(c2.relname,$$?$$) AS iname, COALESCE(c2.reltuples,0) AS ituples, COALESCE(c2.relpages,0) AS ipages, COALESCE(CEIL((c2.reltuples(datahdr-12))/(bs-20::float)),0) AS iotta FROM pg_class cc JOIN pg_namespace nn ON cc.relnamespace = nn.oid AND nn.nspname <> $$information_schema$$ LEFT JOIN ( SELECT ma,bs,foo.nspname,foo.relname, (datawidth+(hdr+ma-(case when hdr%ma=0 THEN ma ELSE hdr%ma END)))::numeric AS datahdr, (maxfracsum(nullhdr+ma-(case when nullhdr%ma=0 THEN ma ELSE nullhdr%ma END))) AS nullhdr2 FROM ( SELECT ns.nspname, tbl.relname, hdr, ma, bs, SUM((1-coalesce(null_frac,0))coalesce(avg_width, 2048)) AS datawidth, MAX(coalesce(null_frac,0)) AS maxfracsum, hdr+( SELECT 1+count()/8 FROM pg_stats s2 WHERE null_frac<>0 AND s2.schemaname = ns.nspname AND s2.tablename = tbl.relname ) AS nullhdr FROM pg_attribute att JOIN pg_class tbl ON att.attrelid = tbl.oid JOIN pg_namespace ns ON ns.oid = tbl.relnamespace LEFT JOIN pg_stats s ON s.schemaname=ns.nspname AND s.tablename = tbl.relname AND s.inherited=false AND s.attname=att.attname, ( SELECT (SELECT current_setting($$block_size$$)::numeric) AS bs, CASE WHEN SUBSTRING(SPLIT_PART(v, $$ $$, 2) FROM $$#"[0-9]+.[0-9]+#"%$$ for $$#$$) IN ($$8.0$$,$$8.1$$,$$8.2$$) THEN 27 ELSE 23 END AS hdr, CASE WHEN v ~ $$mingw32$$ OR v ~ $$64-bit$$ THEN 8 ELSE 4 END AS ma FROM (SELECT version() AS v) AS foo ) AS constants WHERE att.attnum > 0 AND tbl.relkind=$$r$$ GROUP BY 1,2,3,4,5 ) AS foo ) AS rs ON cc.relname = rs.relname AND nn.nspname = rs.nspname LEFT JOIN pg_index i ON indrelid = cc.oid LEFT JOIN pg_class c2 ON c2.oid = i.indexrelid ) AS sml order by wastedbytes desc limit 10;

建议

检查SQL是否有优化空间, 配合auto_explain插件在csvlog中观察LONG SQL的执行计划是否正确.

2. 数据库统计信息, 回滚比例, 命中比例, 数据块读写时间, 死锁, 复制冲突

database	snap_ts	hit_ratio
`db0`	`2016-11-24 11:08:50.448082+08`	0.00 %
`db0`	`2016-11-24 11:10:13.201455+08`	0.00 %
`postgres`	`2016-11-24 11:08:50.448082+08`	99.94 %
`postgres`	`2016-11-24 11:10:13.201455+08`	99.98 %
`template0`	`2016-11-24 11:08:50.448082+08`	0.00 %
`template0`	`2016-11-24 11:10:13.201455+08`	0.00 %
`template1`	`2016-11-24 11:08:50.448082+08`	0.00 %
`template1`	`2016-11-24 11:10:13.201455+08`	0.00 %

建议

回滚比例大说明业务逻辑可能有问题, 命中率小说明shared_buffer要加大, 数据块读写时间长说明块设备的IO性能要提升, 死锁次数多说明业务逻辑有问题, 复制冲突次数多说明备库可能在跑LONG SQL.

3. 检查点, bgwriter 统计信息

checkpoints_timed	checkpoints_req	checkpoint_write_time	checkpoint_sync_time	buffers_checkpoint	buffers_clean	maxwritten_clean	buffers_backend	buffers_backend_fsync	buffers_alloc
1	38	152252	1949	2510995	0	0	33459	0	30984

说明

checkpoints_timed , 统计周期内, 发生了多少次调度检查点.

checkpoints_req , 统计周期内, 发生了多少次人为执行检查点.

checkpoint_write_time , 检查点过程中, write系统调用的耗时ms.

checkpoint_sync_time , 检查点过程中, fsync系统调用的耗时ms.

buffers_checkpoint , 检查点过程中, ckpt进程写出(write)了多少buffer pages.

buffers_clean , 统计周期内, bgwriter进程写出(write)了多少buffer pages.

maxwritten_clean , 统计周期内, bgwriter被打断了多少次(由于write的pages超过一个bgwriter调度周期内的阈值).

buffers_backend , 统计周期内, 有多少pages是被backend process直接write out的.

buffers_backend_fsync , 统计周期内, 有多少pages是被backend process直接fsync的.

buffers_alloc , 统计周期内, 指派了多少个pages.

建议

checkpoint_write_time多说明检查点持续时间长, 检查点过程中产生了较多的脏页.

checkpoint_sync_time代表检查点开始时的shared buffer中的脏页被同步到磁盘的时间, 如果时间过长, 并且数据库在检查点时性能较差, 考虑一下提升块设备的IOPS能力.

buffers_backend_fsync太多说明需要加大shared buffer 或者减小bgwriter_delay参数.

maxwritten_clean太多说明需要减小调大bgwriter_lru_maxpages和bgwriter_lru_multiplier参数.

4. 归档统计信息

archived_count	last_archived_wal	last_archived_time	failed_count	last_failed_wal	last_failed_time	now_insert_xlog_file
0	-	`1970-01-01 00:00:00+08`	728	00000001000000820000002D	`2016-11-24 11:10:13.476726+08`	000000010000008700000003

建议

last_archived_wal和now_insert_xlog_file相差很多, 说明失败的归档很多.

五、数据库年龄分析

1. 数据库年龄

database	snap_ts	age	age_remain
`db0`	`2016-11-24 11:08:50.448082+08`	67739317	2079744331
`db0`	`2016-11-24 11:10:13.201455+08`	68949292	2078534356
`postgres`	`2016-11-24 11:08:50.448082+08`	67739317	2079744331
`postgres`	`2016-11-24 11:10:13.201455+08`	68949292	2078534356
`template0`	`2016-11-24 11:08:50.448082+08`	67739317	2079744331
`template0`	`2016-11-24 11:10:13.201455+08`	68949292	2078534356
`template1`	`2016-11-24 11:08:50.448082+08`	67739317	2079744331
`template1`	`2016-11-24 11:10:13.201455+08`	68949292	2078534356

建议

数据库的年龄正常情况下应该小于vacuum_freeze_table_age, 如果剩余年龄小于2亿, 建议人为干预, 将LONG SQL或事务杀掉后, 执行vacuum freeze.

2. 长事务, 2PC

snap_ts	database	user	query	xact_start	xact_duration	query_start	query_duration	state

snap_ts | name | statement | prepare_time | duration | parameter_types | from_sql ---|---|---|---|---|---|---|---|---

建议

长事务过程中产生的垃圾, 无法回收, 建议不要在数据库中运行LONG SQL, 或者错开DML高峰时间去运行LONG SQL. 2PC事务一定要记得尽快结束掉, 否则可能会导致数据库膨胀.

参考: http://blog.163.com/digoal@126/blog/static/1638770402015329115636287/

六、数据库安全或潜在风险分析

1. 用户密码到期时间

snap_ts | rolname | rolvaliduntil ---|---|---|--- 2016-11-24 11:10:13.201455+08 | a | 9999-01-01 00:00:00+08 2016-11-24 11:10:13.201455+08 | b | 9999-01-01 00:00:00+08 2016-11-24 11:10:13.201455+08 | dba | 9999-01-01 00:00:00+08 2016-11-24 11:10:13.201455+08 | digoal | 9999-01-01 00:00:00+08 2016-11-24 11:10:13.201455+08 | pg_signal_backend | 9999-01-01 00:00:00+08 2016-11-24 11:10:13.201455+08 | postgres | 9999-01-01 00:00:00+08 2016-11-24 11:10:13.201455+08 | test | 9999-01-01 00:00:00+08 2016-11-24 11:10:13.201455+08 | test1 | 9999-01-01 00:00:00+08

建议

到期后, 用户将无法登陆, 记得修改密码, 同时将密码到期时间延长到某个时间或无限时间, alter role ... VALID UNTIL 'timestamp'.

Files

20161123_01_pdf_002.md

Latest commit

History