Support Script to Eliminate Duplicate SQL Rows #112

jonc101 · 2018-04-04T16:38:08Z

In case accidentally run an insert script multiple times, will end up with duplicate rows. Notes below on how to find and eliminate them. Requires some manual work to check which table and which column combinations define a unique row. Would be convenient to have a quick support script that can take that information as input and then just generate / execute the respective SQL queries.

PostgreSQL duplicate removal (Trick is ctid system column that tracks unique physical location for every entry)

delete from stride_order_results A using
( select min(ctid) as ctid, order_proc_id, line
from stride_order_results
group by order_proc_id, line having count(*) > 1
) B
where A.order_proc_id = B.order_proc_id and A.line = B.line
and A.ctid <> B.ctid

https://stackoverflow.com/questions/6583916/delete-duplicate-records-in-postgresql

Another in place option, but requires a unique ID column to exist

delete
from patient_item as test
where exists ( select 'x'
from patient_item as i
where i.patient_id = test.patient_id
and i.clinical_item_id = test.clinical_item_id
and i.item_date = test.item_date
and i.patient_item_id < test.patient_item_id
)
limit 10;

jonc101 added this to To Do in DevOps via automation Apr 4, 2018

jonc101 added the Priority - 3 Low Nice to have bonus item. Not immediately essential label Jun 19, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Support Script to Eliminate Duplicate SQL Rows #112

Support Script to Eliminate Duplicate SQL Rows #112

jonc101 commented Apr 4, 2018

Support Script to Eliminate Duplicate SQL Rows #112

Support Script to Eliminate Duplicate SQL Rows #112

Comments

jonc101 commented Apr 4, 2018

PostgreSQL duplicate removal (Trick is ctid system column that tracks unique physical location for every entry)

Another in place option, but requires a unique ID column to exist