# `Avoiding Duplicate Entries in PSQL with Unique`

# <font color=red>Mr Fugu Data Science</font>

# (◕‿◕✿)

# `UNIQUE INDEX` or `Constraint` *`(Use with Existing DB)`*

`-----------------------------------------------`

# **`Unique Constraint:`**

+ This `constraint` can be applied to `1 or more columns`. Once it is in place, `two rows cannot hold the same value` in the constrained column(s). By default `NULL` values are viewed as `not equal` to each other, so several rows may hold `NULL`.
    + There is a provision to account for the `NULL` values (PostgreSQL 15+).
        + Use `NULLS NOT DISTINCT` to treat them as equal
+ If using with `multiple columns`, the combination of values across those columns cannot repeat
+ Creating a Unique Constraint also creates a unique index on the backend
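
The `NULL` behaviour described above can be sketched as follows. The table and column names are hypothetical, and `NULLS NOT DISTINCT` assumes PostgreSQL 15 or newer.

```sql
-- Hypothetical demo tables; NULLS NOT DISTINCT requires PostgreSQL 15+.
CREATE TABLE nulls_default_demo (
    id    integer PRIMARY KEY,
    email text UNIQUE                       -- default: NULLs are distinct
);
INSERT INTO nulls_default_demo VALUES (1, NULL);
INSERT INTO nulls_default_demo VALUES (2, NULL);  -- accepted: NULL is not equal to NULL

CREATE TABLE nulls_equal_demo (
    id    integer PRIMARY KEY,
    email text UNIQUE NULLS NOT DISTINCT    -- NULLs are treated as equal
);
INSERT INTO nulls_equal_demo VALUES (1, NULL);
INSERT INTO nulls_equal_demo VALUES (2, NULL);    -- rejected with a duplicate key error
```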

There are two types: `Table Constraint` and `Column Constraint`. They differ in notation, as the examples further down show.

**What is the difference between a** `primary key` and a `unique constraint`?

    
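|           Parameters           |    Primary_Key    |  Unique_Constraint |
|:------------------------------:|:-----------------:|:------------------:|
|          Number/Table          |       Only 1      |      Multiple      |
|           Null Values          |    Not Allowed    |       Allowed      |
| Update or Delete of key values |      Allowed      |       Allowed      |
|         Auto-Increment         |      Allowed      |       Allowed      |
|              Index             |   Unique B-tree   |    Unique B-tree   |
|               Use              | identify each row | prevent duplicates |

PostgreSQL builds an ordinary unique B-tree index for both. The clustered vs. non-clustered split and the restrictions on updates or auto-increment come from other database systems (for example SQL Server) and do not apply here.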



`---------------------------------------------------`
+ **`When to USE:`**
    + If you have `columns` that are `called often` in queries
    + When a `WHERE` clause or `JOIN` is performed often on these columns
+ **`Avoid IF:`**
    + Columns are `UPDATED` often
    + Table size is small
    + Columns are `Not used often`
    + This DOES create overhead for `INSERT` and `UPDATE`, so take that into consideration

`---------------------------------------------------`

# **`Unique Index:`**
+ Creates unique indexed columns
+ An `Index` is used for quick lookups, typically stored as a B-tree, and a unique one also aids data integrity
+ Great to use if a query narrows the data down to a small number of rows from a given table

`------------------------------------------------`

+ **`Creating an Index will also lock the table for writing operations`**
+ **`AVOID IF:`**
    + You have a small table
    + The result returned is large compared to table size (these 2 examples result in sequential scans!)
    + Don't add an index if the column has a large number of `NULL` values
    
In the first two cases the `index` is a waste of time: the planner finds that a sequential scan needs less `I/O` than going through the index, so the index goes unused but still has to be maintained on every write.
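
One way to check whether the planner uses an index or falls back to a sequential scan is `EXPLAIN`. This is a minimal sketch with a hypothetical table and a placeholder lookup value; the plan you see depends on your data and statistics.

```sql
-- Hypothetical demo table; the UNIQUE column gets a unique B-tree index.
CREATE TABLE explain_demo (
    id        integer PRIMARY KEY,
    user_name varchar(255) NOT NULL UNIQUE
);

-- Show the plan without running the query.
-- A plan line starting with "Seq Scan" means the index is skipped;
-- "Index Scan" or "Bitmap Index Scan" means it is used.
EXPLAIN
SELECT *
FROM explain_demo
WHERE user_name = 'some_user';   -- placeholder value
```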


Creating an index has side effects that are easy to overlook; for example, the table becomes locked for writes.
+ This occurs because the whole table is scanned and an index entry is built for each row. Inserts, updates and deletes are blocked until the index build finishes (reads are still allowed).
+ This `lock` can be avoided with the `CONCURRENTLY` option
    + The trade-off: the table is now scanned 2 times, and the build must wait for all transactions that could modify or use the index to finish.
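
A minimal sketch of the `CONCURRENTLY` option, using a hypothetical table and index name:

```sql
-- Hypothetical demo table.
CREATE TABLE concurrent_demo (
    id        integer PRIMARY KEY,
    user_name varchar(255) NOT NULL
);

-- Build the unique index without blocking inserts, updates or deletes.
-- Must run outside a transaction block (not between BEGIN and COMMIT).
CREATE UNIQUE INDEX CONCURRENTLY idx_concurrent_demo_user_name
ON concurrent_demo (user_name);
```

If a concurrent build fails (for example because duplicates exist), PostgreSQL leaves behind an index marked `INVALID`; drop it with `DROP INDEX` before trying again.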

`Caveats:` there are times when `REINDEX`ing is needed because the index has become bloated or fragmented
+ if rows are `updated often`
+ or `deleted often`
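
Rebuilding an index can look like this. The names are hypothetical, and the `CONCURRENTLY` form of `REINDEX` assumes PostgreSQL 12 or newer.

```sql
-- Hypothetical demo table and index.
CREATE TABLE reindex_demo (
    id        integer PRIMARY KEY,
    user_name varchar(255) NOT NULL
);
CREATE UNIQUE INDEX idx_reindex_demo_user_name ON reindex_demo (user_name);

-- Rebuild a single index without blocking writes.
-- Like CREATE INDEX CONCURRENTLY, this cannot run inside a transaction block.
REINDEX INDEX CONCURRENTLY idx_reindex_demo_user_name;
```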
    
    
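**Interesting thought**: with multi-threading (several sessions at once) and a `constraint`, two `inserts or updates` may try to write the same value at the same time. The constraint still guarantees uniqueness: the second transaction waits for the first to finish and then fails with a duplicate key error if the first one committed, so the application has to be ready to handle that error.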
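
One common way to handle that situation without catching errors is `INSERT ... ON CONFLICT`, which this notebook does not otherwise cover. A minimal sketch with a hypothetical table:

```sql
-- Hypothetical demo table with a unique column.
CREATE TABLE signup_demo (
    id        integer GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    user_name varchar(255) NOT NULL UNIQUE
);

-- First insert stores the row.
INSERT INTO signup_demo (user_name) VALUES ('fugu')
ON CONFLICT (user_name) DO NOTHING;   -- INSERT 0 1

-- Second insert hits the unique constraint and is skipped quietly.
INSERT INTO signup_demo (user_name) VALUES ('fugu')
ON CONFLICT (user_name) DO NOTHING;   -- INSERT 0 0
```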

`------------------------------------------------`

# EX. 1) `Create Table with Unique Constraint 3 variants`

**1A.) `Column Constraint`**

`CREATE TABLE Customer_orders ( Transaction_id integer PRIMARY KEY,
User_name VARCHAR(255) NOT NULL` **`UNIQUE,`**
`Order_date date,
Quantity_Items integer,
Order_notes varchar(200)
);`

**1B.) `Table Constraint`**

`CREATE TABLE Customer_orders ( Transaction_id integer PRIMARY KEY,
User_name VARCHAR(255) NOT NULL,
Order_date date,
Quantity_Items integer,
Order_notes varchar(200),`
**`UNIQUE (User_name)`**
`);`


**1C.) `Naming Constraint and having a Table Constraint`**

`CREATE TABLE Customer_orders ( Transaction_id integer PRIMARY KEY,
User_name VARCHAR(255) NOT NULL,
Order_date date,
Quantity_Items integer,
Order_notes varchar(200),`
**`CONSTRAINT User_unique UNIQUE (User_name)`**
`);`

*To make a combination of columns unique, list the additional columns inside the parentheses, separated by commas.*

+ The `Table Constraint` allows you to have uniqueness on a combination of columns
    + while individual columns can have repeats
+ A `Column Constraint` applies to the single column it is attached to, so that column cannot hold repeated values

+ *`Side Note:`* if you do not name a constraint, PK or index, PSQL will generate a default name for you!

`--------------------------------------------------------`

# EX. 2) `Alter Table:` Use this if you have an existing table


`ALTER TABLE Customer_orders
ADD CONSTRAINT User_unique UNIQUE (User_name);`
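
Adding the constraint to an existing table fails if the column already holds duplicates. A minimal sketch for finding them first, assuming the `Customer_orders` table from the examples above exists and does not yet have the constraint (`copies` is just an illustrative alias):

```sql
-- List every User_name that appears more than once.
-- An empty result means the UNIQUE constraint can be added safely.
SELECT User_name,
       COUNT(*) AS copies
FROM Customer_orders
GROUP BY User_name
HAVING COUNT(*) > 1;
```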

`-------------`

# EX. 3) `Delete "Drop" Constraint`

`ALTER TABLE Customer_orders`
**`DROP CONSTRAINT`** `User_unique;`

`-------------`

# EX. 4) `Unique Index`


**`CREATE UNIQUE INDEX`** `idx_transaction_id
ON Customer_orders(Transaction_id);`

`-------------`

# Ex. 5) **`Show Indexes for Current Database`**



`SELECT
tablename,
indexname,
indexdef
FROM
pg_indexes
WHERE
schemaname = 'public'
ORDER BY
tablename,
indexname;`

`---------------Output---------------`


|    tablename    |           indexname           |                                              indexdef                                               |
|:---------------:|:-----------------------------:|:---------------------------------------------------------------------------------------------------:|
| customer_orders | customer_orders_pkey          | CREATE UNIQUE INDEX customer_orders_pkey ON public.customer_orders USING btree (transaction_id)     |
| customer_orders | customer_orders_user_name_key | CREATE UNIQUE INDEX customer_orders_user_name_key ON public.customer_orders USING btree (user_name) |
| pizza_orders    | pizza_orders_user_name_key    | CREATE UNIQUE INDEX pizza_orders_user_name_key ON public.pizza_orders USING btree (user_name)       |
| refunds         | refund_id_pkey                | CREATE UNIQUE INDEX refund_id_pkey ON public.refunds USING btree (refund_id)                        |

*The columns of `pg_indexes` are named `tablename`, `indexname` and `indexdef`. Names created without double quotes are stored in lowercase, which is why the output is lowercase.*



# Ex. 6) **`Show Indexes for Current Table`**


`SELECT
tablename,
indexname,
indexdef
FROM
pg_indexes 
WHERE
tablename = 'customer_orders';`

*Unquoted names are stored in lowercase, so the filter uses `'customer_orders'`.*

`--------------Output--------------`


|    tablename    |           indexname           |                                              indexdef                                               |
|:---------------:|:-----------------------------:|:---------------------------------------------------------------------------------------------------:|
| customer_orders | customer_orders_pkey          | CREATE UNIQUE INDEX customer_orders_pkey ON public.customer_orders USING btree (transaction_id)     |
| customer_orders | customer_orders_user_name_key | CREATE UNIQUE INDEX customer_orders_user_name_key ON public.customer_orders USING btree (user_name) |

`------------------------------------------------`

# Ex. 7) `Tricky Example:`

Assume we have a Table with a State, City and Country. We want to make 2 or more columns unique. How can we think about setting this up without error? 

**`Ex 7a.)`** 

`CREATE TABLE Country_table_Ex (Country text NOT NULL,
City text NOT NULL` **`UNIQUE,`**
`State text NOT NULL` **`UNIQUE,`**
`Population int`
`);`

`Question:` 
+ How does this work?
+ Is there a situation where this will fail but it should pass?

**`Ex 7b.)`**

`CREATE TABLE Country_table_Ex_2 (Country text NOT NULL,
    City text NOT NULL,
    State text NOT NULL,
    Population int,`
    **`UNIQUE(City,State)`**
`);`

`How is this different and what is it doing?`

**`Answers Ex. 7a,7b:`**

`7a)` Each column constraint works on its own, so every `City` value and every `State` value must be distinct across the whole table. Two states that share a city name cannot both be stored, and neither can two cities in the same state, even though both are valid data.

`INSERT INTO Country_table_Ex VALUES (
    'USA',
    'Portland', 'Oregon'
);`

This insert succeeds:

`INSERT 0 1`

But, if we do:

`INSERT INTO Country_table_Ex VALUES (
    'USA',
    'Portland', 'Maine'
);`

**we get an error:**

`ERROR:  duplicate key value violates unique constraint "country_table_ex_city_key"`

`DETAIL:  Key (city)=(Portland) already exists.`

`------------------------------------------------------------`

`7b)` To rectify this we declare the constraint on the pair `(City, State)`. Rows may now share a city name as long as the state differs, and only an exact repeat of the same city and state is rejected.

`INSERT INTO Country_table_Ex_2 VALUES (
    'USA',
    'Portland', 'Oregon'
);`

`INSERT 0 1`

`INSERT INTO Country_table_Ex_2 VALUES (
    'USA',
    'Portland', 'Maine'
);`

`INSERT 0 1`

**This succeeds because uniqueness is checked on the combination of the two columns!**

# `Last TIP: no one gets this far.. so this is our surprise`

**Deferring** a constraint check is not covered in depth here, but it can help when a constraint or index may interfere with your processes. A constraint declared `DEFERRABLE` can have its check postponed until the end of the transaction, so intermediate steps may break uniqueness temporarily as long as the final result is valid.
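
A minimal sketch of a deferred unique constraint, using a hypothetical table where two rows swap their ordering value inside one transaction:

```sql
-- Hypothetical demo table: each song has a unique sort_order.
CREATE TABLE playlist_demo (
    song       text PRIMARY KEY,
    sort_order integer NOT NULL,
    CONSTRAINT playlist_sort_order_unique
        UNIQUE (sort_order) DEFERRABLE INITIALLY DEFERRED
);

INSERT INTO playlist_demo VALUES ('song_a', 1), ('song_b', 2);

BEGIN;
UPDATE playlist_demo SET sort_order = 2 WHERE song = 'song_a';  -- sort_order 2 is duplicated for a moment
UPDATE playlist_demo SET sort_order = 1 WHERE song = 'song_b';  -- duplicate resolved
COMMIT;  -- uniqueness is checked here, and the swap passes
```

Without `DEFERRABLE INITIALLY DEFERRED`, the first `UPDATE` would fail with a duplicate key error.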



**`Command Line:`**

**`\di`** this will output all the Indexes you have for the current DB

**`\d your_table_name_here`** this shows what you have for a specific table (columns, indexes, keys, constraints, etc.)




# `Citations`

# ◔̯◔

https://tomcam.github.io/postgres/

https://aws.amazon.com/blogs/database/hidden-dangers-of-duplicate-key-violations-in-postgresql-and-how-to-avoid-them/

https://subscription.packtpub.com/book/data/9781803248974/5/ch05lvl1sec63/preventing-duplicate-rows

https://codingsight.com/sql-insert-into-select-5-easy-ways-to-handle-duplicates/

https://www.geeksforgeeks.org/multiple-indexes-vs-multi-column-indexes/ (Index info)

https://www.postgresql.org/docs/current/indexes-multicolumn.html (Index key points)

https://devcenter.heroku.com/articles/postgresql-indexes (Index info 2)

https://rbranson.medium.com/10-things-i-hate-about-postgresql-20dbab8c2791 (issues with PSQL)

https://database.guide/2-ways-to-delete-duplicate-rows-in-postgresql-ignoring-the-primary-key/ (good examples)

https://www.c-sharpcorner.com/article/different-ways-to-find-and-delete-duplicate-rows-from-a-table-in-sql-server/

https://copyprogramming.com/howto/sql-delete-duplicate-combined-rows-in-postgresql (interesting read)

https://medium.com/flatiron-engineering/uniqueness-in-postgresql-constraints-versus-indexes-4cf957a472fd

https://www.geeksforgeeks.org/postgresql-list-indexes/

https://www.geeksforgeeks.org/difference-between-primary-key-and-unique-key/

https://www.postgresql.org/docs/current/sql-createindex.html

https://leopard.in.ua/2016/09/20/safe-and-unsafe-operations-postgresql

https://www.prisma.io/dataguide/postgresql/column-and-table-constraints#:~:text=Table%20constraints%20can%20express%20any,any%20of%20the%20table's%20columns

https://blog.quest.com/a-guide-to-using-postgres-indexes/

`Deferring Transactions:`

https://medium.com/the-missing-bit/keeping-an-ordered-collection-in-postgresql-9da0348c4bbe

https://github.com/prisma/prisma/discussions/8789

https://github.com/prisma/prisma/issues/8807

https://brunoscheufler.com/blog/2022-03-20-understanding-deferred-foreign-key-constraints-in-postgresql

https://www.alibabacloud.com/blog/postgresql-deferrable-constraints-unique-primary-key-foreign-key-and-exclude_597717
