Skip to content

Database monitoring

Dheepak Ramanathan edited this page May 28, 2026 · 37 revisions

Database monitoring and troubleshooting

Note: Almost all the sql queries here(except few which requires superuser) can be executed as non-super-user.

Monitoring queries

Identifying most expensive queries

-- Based on stats from last 30 days.
SELECT
  round(total_exec_time::numeric, 2)  AS total_exec_ms,
  round(total_plan_time::numeric, 2)  AS total_plan_ms,
  round(mean_exec_time::numeric, 2)  AS mean_plan_ms,
  round(mean_plan_time::numeric, 2)  AS mean_exec_ms,
  *
FROM pg_stat_statements
WHERE stats_since >= now() - interval '30 days' 
ORDER BY total_exec_time DESC, total_plan_time DESC, calls DESC
LIMIT 10;

Monitoring tables

SELECT
  t.schemaname AS schema_name,
  t.relname AS table_name,
  pg_size_pretty(pg_relation_size(t.relid)) AS table_size,
  t.n_dead_tup AS stat_dead_estimate,
  pst.dead_tuple_count AS actual_dead_count,
  pst.dead_tuple_percent,
  pst.free_percent,
  t.last_autovacuum,
  t.last_autoanalyze
FROM pg_stat_user_tables t
CROSS JOIN LATERAL pgstattuple(t.relid) AS pst
ORDER BY pst.dead_tuple_count DESC, pst.dead_tuple_percent DESC, pst.free_percent DESC
LIMIT 20;

Fixing bloated tables

Remediation for bloated tables can be done in 3 possible ways

  • Tuning the auto-vacuum settings so that the tables get auto-vacuumed regularly.(Recommended regular process)
  • Running the vacuum manually(lightweight and non-disruptive process) when there is an immediate need to remediate a bloated table.(Only when needed).
  • Running a full vacuum(It is a disruptive process which will lock the entire table during the process) if a table-space needs to be reduced after a vacuum.(This situation is not applicable to SIMS as we do not purge tables and expecting the table not to grow again).

Tuning the auto-vacuum settings

Listed below are the global auto-vacuum settings for all tables. image This can be retrieved by

SELECT 
    name, 
    setting, 
    unit, 
    short_desc 
FROM pg_settings 
WHERE name LIKE '%autovacuum%';

Monitoring indexes

All the indexes in SIMS database are btree indexes except for sims.user_first_name_last_name which is a gin index. The strategy to find bloated indexes is different for btree and gin indexes.

Identifying bloated Btree indexes(Requires extension pgstattuple)

select
	i.schemaname,
	i.relname as table_name,
	i.indexrelname as index_name,
	pg_size_pretty(pg_relation_size(i.indexrelid)) as index_size_pretty,
	stat.avg_leaf_density,
	stat.leaf_fragmentation,
	stat.empty_pages,
	stat.deleted_pages,
	case
		when stat.leaf_fragmentation > 50 then 'REINDEX NOW'
		when stat.leaf_fragmentation > 30 then 'REINDEX SOON'
		when stat.avg_leaf_density < 50 then 'BLOATED'
		else 'HEALTHY' end 
	as index_bloat_status
	from
		pg_stat_user_indexes i
	join 
    pg_class c on
		i.indexrelid = c.oid
	join 
    pg_am am on
		c.relam = am.oid
	cross join lateral
    pgstatindex(i.indexrelid) stat
	where
		am.amname = 'btree'
		and pg_relation_size(i.indexrelid) > 1024 * 1024
		-- Filter only indexes > 1MB
	order by
		stat.leaf_fragmentation desc;

Fixing bloated Btree indexes

--Run as nonsuperuser
REINDEX INDEX CONCURRENTLY index_name; 

--Optional: If needed to re-index all the indexes in a table
REINDEX TABLE CONCURRENTLY table_name;

Identifying bloated Gin indexes(Requires extension pgstattuple)

As we have only one GIN index sims.user_first_name_last_name, the steps to identify are very brief here.

The pgstattuple extension provides pgstatginindex(), but it only measures unflushed entries, not the actual empty space left behind by dead tuples. Because of this, GIN indexes suffer from two distinct types of bloat, and you have to check for them differently: Pending List Bloat and Structural Bloat.

Identifying Gin indexes with structural bloat

-- If the index_to_table_size_percentage is more than 100% then it can be considered as structural bloat.
-- Even after re-index if the index size still remains over 100% then it is the true size of the index without bloating.
SELECT
    i.schemaname,
    i.relname AS table_name,
    i.indexrelname AS index_name,
    pg_size_pretty(pg_relation_size(i.relid)) AS table_size,
    pg_size_pretty(pg_relation_size(i.indexrelid)) AS index_size,
    ROUND(
        CAST(pg_relation_size(i.indexrelid) AS NUMERIC) /
        NULLIF(pg_relation_size(i.relid), 0) * 100,
        2
    ) AS index_to_table_size_percentage
FROM
    pg_stat_user_indexes i
JOIN 
    pg_class c ON i.indexrelid = c.oid
JOIN 
    pg_am am ON c.relam = am.oid
WHERE
    am.amname = 'gin'
    AND pg_relation_size(i.indexrelid) > 1024 * 1024 -- Only check indexes > 10MB
ORDER BY
    index_to_table_size_percentage DESC;

Fixing Gin indexes with structural bloat

--Run as nonsuperuser
REINDEX INDEX CONCURRENTLY index_name; 

--Optional: If needed to re-index all the indexes in a table
REINDEX TABLE CONCURRENTLY table_name;

Identifying Gin indexes with pending list bloat

--Check for pending pages or pending tuples
select * from pgstatginindex('sims.user_first_name_last_name');

Fixing Gin indexes with pending pages/tuples

Note: Fix for the pending list must be executed only when there is no structural bloat. If structural bloat is present, then the reindex process will also clean up the pending list.

-- Option 1: VACUUM to flush pending list
VACUUM your_table;

-- Option 2: manual flush of pending list only
SELECT gin_clean_pending_list('sims.user_first_name_last_name');

Clone this wiki locally