## Entity Integrity and Check Constraints

Amongst other requirements, three that are **vital** for a database are:

  * **C**onfidentiality
  * **I**ntegrity
  * **A**vailability 
  
These are known, for short, as the CIA requirements.

If you look up the word _integrity_ in a dictionary, among the definitions you will see "internal consistency or lack of corruption in electronic data".  You will also see synonyms such as "scrupulousness, sincerity, truthfulness, trustworthiness".

There are several ways in which the integrity of a database can be compromised, and equally several preventive measures to protect it.  In this notebook we will look at **entity integrity**, one of the simplest and most foundational types of integrity, and at **check constraints**.


### Entity Integrity

_Entity Integrity_ simply means that no two records in the database can be identical in the values of all their fields.  The reason we want this is to avoid ambiguity: we need a way of uniquely referring to any record in a database.

##### Load the SQL Jupyter notebook extension

In [1]:
%load_ext sql 

##### Connect to the PG database
1. Ensure PG is running first :-)
2. If you are using Windows, you need to specify the password for the role:
`%sql postgres://isdb16:PASSWORD@localhost/postgres`

Since I use a Mac, my connection command below doesn't specify a password.

In [2]:
%sql postgres://isdb16@localhost/postgres 

'Connected: isdb16@postgres'

##### Create a very simple table

In [3]:
%%sql 

DROP TABLE IF EXISTS Students;

CREATE TABLE Students (
   name text, 
   gpa  numeric(3,2)
); 

Done.
Done.


[]

##### Insert some sample records

In [4]:
%%sql 

INSERT INTO Students (name, gpa)
     VALUES ('Jack', 3.5),
            ('Jill', 3.8);

2 rows affected.


[]

In [5]:
%sql SELECT * FROM Students;

2 rows affected.


name,gpa
Jack,3.5
Jill,3.8


##### Insert a couple more records

In [6]:
%%sql 

INSERT INTO Students (name, gpa)
     VALUES ('Bill', 3.5),
            ('Pat', 3.8);

2 rows affected.


[]

Now, if we look at the contents of the Students table, we see all four records, as expected.

In [7]:
%sql SELECT * FROM Students;

4 rows affected.


name,gpa
Jack,3.5
Jill,3.8
Bill,3.5
Pat,3.8


##### Executing the following statement violates entity integrity, i.e., we will have two records that are identical in the values of all their attributes.

In [8]:
%%sql 

INSERT INTO Students (name, gpa)
     VALUES ('Jack', 3.5);

1 rows affected.


[]

In [9]:
%sql SELECT * FROM Students;

5 rows affected.


name,gpa
Jack,3.5
Jill,3.8
Bill,3.5
Pat,3.8
Jack,3.5
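
One way to make the ambiguity visible is to ask the database which rows occur more than once. The sketch below is only illustrative: it assumes Python's built-in `sqlite3` module with an in-memory database as a stand-in for the PostgreSQL server used in this notebook, so that it runs on its own. The `GROUP BY ... HAVING COUNT(*) > 1` query itself is standard SQL.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the notebook's PostgreSQL database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Students (name text, gpa numeric(3,2))")
conn.executemany(
    "INSERT INTO Students (name, gpa) VALUES (?, ?)",
    [("Jack", 3.5), ("Jill", 3.8), ("Bill", 3.5), ("Pat", 3.8), ("Jack", 3.5)],
)

# Group rows that agree on every attribute and keep only the groups
# that contain more than one row.
duplicates = conn.execute(
    """
    SELECT name, gpa, COUNT(*) AS copies
      FROM Students
     GROUP BY name, gpa
    HAVING COUNT(*) > 1
    """
).fetchall()

print(duplicates)
```

With the five rows above, only the (Jack, 3.5) pair should be reported, with two copies. Nothing in the table lets us tell those two records apart.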


##### We obtain entity integrity by using PRIMARY KEYs.  

Once we declare a particular attribute to be a primary key, SQL will flag an integrity violation error if we try to insert a record whose primary key value already exists in the table.

Let's drop the original table and re-create it, this time specifying a primary key.  There are a number of syntactic ways of specifying the primary key.  We will conform to the guidelines given in the SQL style guide http://www.sqlstyle.guide/ .

Do spend time going over the style guide and familiarizing yourself with its recommendations.  We expect all SQL deliverables to conform to its guidelines.

In [10]:
%%sql 

DROP TABLE IF EXISTS Students;

CREATE TABLE Students (
   PRIMARY KEY(name), 
   name  text, 
   gpa   numeric(3,2)
);

Done.
Done.


[]

In [11]:
%%sql 

INSERT INTO Students (name, gpa)
     VALUES ('Jack', 3.5),
            ('Jill', 3.8);

2 rows affected.


[]

In [12]:
%sql SELECT * FROM Students;

2 rows affected.


name,gpa
Jack,3.5
Jill,3.8


##### Now, if we try to insert a new record with a pre-existing primary key we get an integrity violation error.

Note that the error output is a bit long because we get two error messages: (1) one from SQL and (2) one from the Python interface to SQL.
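
As a minimal, hedged sketch of this behaviour, the following uses Python's built-in `sqlite3` module with an in-memory database. This is an assumption made so the example is self-contained; it is not the PostgreSQL connection used elsewhere in this notebook, and PostgreSQL's error text and the exception class raised by its Python driver will differ.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the notebook's PostgreSQL database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Students (name text, gpa numeric(3,2), PRIMARY KEY (name))"
)
conn.executemany(
    "INSERT INTO Students (name, gpa) VALUES (?, ?)",
    [("Jack", 3.5), ("Jill", 3.8)],
)

rejected = False
try:
    # 'Jack' already exists as a primary key value, so this insert should fail.
    conn.execute("INSERT INTO Students (name, gpa) VALUES ('Jack', 3.5)")
except sqlite3.IntegrityError as err:
    rejected = True
    print("Rejected:", err)

# The table should still hold only the two original records.
row_count = conn.execute("SELECT COUNT(*) FROM Students").fetchone()[0]
print(row_count)
```

The duplicate insert is refused and the table keeps exactly one record per name, which is the guarantee that entity integrity asks for.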

### Check Constraints

In the Students table we have an attribute for gpa which has been declared to be of type numeric, with three digits in total, two of them to the right of the decimal point.  Data types are our first line of defense against improper data being entered.  For example, we cannot assign a string such as "hi" to the gpa attribute.

_But_, at the moment nothing prevents us from entering a GPA of 9.7, which is not correct as far as the business logic is concerned.  How can we ensure that values entered into the GPA field are potentially valid GPAs?  SQL provides _check constraints_ to guard against this type of integrity violation.

##### Let's go back to a clean slate by dropping the Students table and creating it afresh, but this time with a check constraint

We specify that values in the gpa field must lie in the closed interval [0, 4].  Syntactically, there are a couple of ways to specify this.  To check whether a value lies in a given range, it is usually clearer to use `BETWEEN`, which includes both endpoints, than a conjunction of two comparisons.

In [None]:
%%sql 

DROP TABLE IF EXISTS Students;

CREATE TABLE Students (
   PRIMARY KEY (name),
   name text,
   gpa  numeric(3,2)
        -- Equivalent alternative: CHECK (0 <= gpa AND gpa <= 4)
        CHECK (gpa BETWEEN 0 AND 4)
);
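
To see that the two ways of writing the range test agree, including at the endpoints 0 and 4, the sketch below evaluates both for a few sample values. It assumes Python's built-in `sqlite3` module as a stand-in for PostgreSQL, and the sample values are chosen here purely for illustration.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the notebook's PostgreSQL database.
conn = sqlite3.connect(":memory:")

samples = [-0.01, 0, 2.5, 4, 4.01]  # hypothetical test values
results = []
for value in samples:
    # Evaluate both formulations of the range test for the same value.
    between, conjunction = conn.execute(
        "SELECT ? BETWEEN 0 AND 4, 0 <= ? AND ? <= 4",
        (value, value, value),
    ).fetchone()
    results.append((value, bool(between), bool(conjunction)))

for row in results:
    print(row)
```

Both columns should match for every value, with the endpoints 0 and 4 accepted and the values just outside the interval rejected.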

Let's insert our example students, Jack and Jill.

In [None]:
%%sql 

INSERT INTO Students (name, gpa)
     VALUES ('Jack', 3.5),
            ('Jill', 3.8);
        
SELECT *
  FROM Students;

Everything is fine.

But if we now try to insert a student whose GPA is outside the interval [0, 4], an integrity violation is flagged.

In [None]:
%%sql
--raw

INSERT INTO Students (name, gpa)
     VALUES ('Bill', 5.5);
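
The same rejection can be sketched outside the notebook. The example below assumes Python's built-in `sqlite3` module with an in-memory database rather than the PostgreSQL server used here, so the exact error text will differ, but the check constraint is declared in the same way.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the notebook's PostgreSQL database.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE Students (
        name text,
        gpa  numeric(3,2) CHECK (gpa BETWEEN 0 AND 4),
        PRIMARY KEY (name)
    )
    """
)
conn.executemany(
    "INSERT INTO Students (name, gpa) VALUES (?, ?)",
    [("Jack", 3.5), ("Jill", 3.8)],
)

rejected = False
try:
    # 5.5 lies outside [0, 4], so the check constraint should refuse this row.
    conn.execute("INSERT INTO Students (name, gpa) VALUES ('Bill', 5.5)")
except sqlite3.IntegrityError as err:
    rejected = True
    print("Rejected:", err)

# Only the two valid students should remain in the table.
names = [row[0] for row in conn.execute("SELECT name FROM Students ORDER BY name")]
print(names)
```

Bill's record never reaches the table, so a later query cannot return an impossible GPA.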
