Skip to content

Support Script to Eliminate Duplicate SQL Rows #112

Open
@jonc101

Description

@jonc101

In case accidentally run an insert script multiple times, will end up with duplicate rows. Notes below on how to find and eliminate them. Requires some manual work to check which table and which column combinations define a unique row. Would be convenient to have a quick support script that can take that information as input and then just generate / execute the respective SQL queries.

PostgreSQL duplicate removal (Trick is ctid system column that tracks unique physical location for every entry)

delete from stride_order_results A using
( select min(ctid) as ctid, order_proc_id, line
from stride_order_results
group by order_proc_id, line having count(*) > 1
) B
where A.order_proc_id = B.order_proc_id and A.line = B.line
and A.ctid <> B.ctid

https://stackoverflow.com/questions/6583916/delete-duplicate-records-in-postgresql

Another in place option, but requires a unique ID column to exist

delete
from patient_item as test
where exists ( select 'x'
from patient_item as i
where i.patient_id = test.patient_id
and i.clinical_item_id = test.clinical_item_id
and i.item_date = test.item_date
and i.patient_item_id < test.patient_item_id
)
limit 10;

Metadata

Metadata

Assignees

No one assigned

    Labels

    Priority - 3 LowNice to have bonus item. Not immediately essential

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions