## Enumerated Data Types and Row Assertions

Continuing our discussion of _integrity_, in this notebook we look at two additional constructs provided by SQL.

**The Problem:** We motivate the need for these two constructs with the following situation. Suppose you have a table `Students` with three attributes: `name`, `status` (UG or G), and `gpa`. You also have the domain constraint that undergraduate GPAs are in the range `[0,4]` whereas graduate GPAs are in the range `[0,5]`. How can you ensure that the domain constraints are not violated (i.e., that integrity is not compromised) when student information is inserted into the table?

### Enumerated Data Types

When we define a _data type_ we are defining two things: (1) a set of values and (2) the set of operators that can be applied to those values. For example, the 32-bit integer data type consists of the values from `-2^31` to `2^31 - 1` together with a set of arithmetic operators. Among the common built-in types, Boolean has the smallest set of values: just two, `true` and `false`.
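As a small illustration of "a set of values plus operators", the sketch below probes the upper end of PostgreSQL's `integer` type. It is an illustrative aside, assuming a PostgreSQL connection like the one used in the rest of this notebook; run the statements one at a time, since the second is expected to fail.

```sql
-- The largest value in the integer type's value set (2^31 - 1).
SELECT 2147483647::integer AS max_int;

-- Applying the + operator would leave the value set,
-- so PostgreSQL rejects it with an "integer out of range" error.
SELECT 2147483647::integer + 1;
```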

All languages provide a number of built-in data types and also provide facilities for defining quite sophisticated data structures. PostgreSQL provides a very rich set of data types (above and beyond those required by the SQL:2011 standard). You can read more about the built-in data types at https://www.postgresql.org/docs/current/static/datatype.html

In the given problem, consider the attribute `status`. It can take only one of two values: `undergraduate` or `graduate`. How can we enforce this constraint?

We can achieve this in three ways:
1. Use check constraints
2. Use an enumerated data type
3. Use a foreign key

We will look at the first two ways in this notebook.

In [None]:
%load_ext sql

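# Windows users: specify your password in the URL (user:password@host)
# The 'postgresql://' scheme is used because recent SQLAlchemy versions reject 'postgres://'
%sql postgresql://isdb16@localhost/postgres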

The reason we need to do a 'cascaded drop' at this point will be explained later in the course.

In [None]:
%sql DROP TABLE IF EXISTS Students CASCADE;

We could define our Students table using `text` data types.  For simplicity we don't specify a primary key.

In [None]:
%%sql
CREATE TABLE Students (
    name    text,
    status  text,
    gpa     numeric(3,2) default 0
);

The obvious problem is that `status` is unconstrained: we could enter **any** text for `status`, whereas it should take on only one of two values.

In [None]:
%%sql
INSERT INTO Students (name, status)
     VALUES ('Jack', 'part time');
    
SELECT * FROM Students;

How can this be avoided?

### Option 1: Check Constraints

In [None]:
%%sql
DROP TABLE IF EXISTS Students CASCADE;

CREATE TABLE Students (
    name    text,
    status  text 
            check( status in ('undergraduate', 'graduate')),
            -- ** come back here **
    gpa     numeric(3,2) default 0
);

The inserts below work fine.

In [None]:
%%sql

INSERT INTO Students (name, status, gpa)
     VALUES ('Jack', 'undergraduate', 3.6),
            ('Jill', 'graduate', 4.2);
    
SELECT * FROM Students;

But if we try to insert a student with a different status, the check constraint is violated and an error is raised. The cell below is in 'raw' format and hence is not executable. Convert it to 'code' format to execute it, and then convert it back to 'raw' format.
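A minimal sketch of such a failing insert, for readers who do not have the raw cell at hand. The name `'Joe'` and the status `'part time'` are illustrative values chosen here, not taken from the original cell.

```sql
-- Expected to fail: 'part time' is not in the list
-- ('undergraduate', 'graduate') allowed by the check constraint.
INSERT INTO Students (name, status, gpa)
     VALUES ('Joe', 'part time', 3.1);
```

PostgreSQL should reject the row and report that it violates a check constraint on `students`; no row is added to the table.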

### Option 2: Enumerated Data Types

We create our own data type by specifying the values it can take.  Details are at https://www.postgresql.org/docs/current/static/datatype-enum.html

In [None]:
%%sql

DROP TYPE IF EXISTS Student_Status;
CREATE TYPE Student_Status as enum ('undergraduate', 'graduate');

In [None]:
%%sql

DROP TABLE IF EXISTS Students CASCADE;

CREATE TABLE Students (
    name    text,
    status  Student_Status, 
    gpa     numeric(3,2) default 0
);

The following executes fine.

In [None]:
%%sql

INSERT INTO Students (name, status, gpa)
     VALUES ('Jack', 'undergraduate', 3.6),
            ('Jill', 'graduate', 4.2);
    
SELECT * FROM Students;

Again, an insert with any other status value will be flagged as an error.
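A minimal sketch of an insert that the enumerated type should reject. As before, `'Joe'` and `'part time'` are illustrative values, not taken from the original notebook.

```sql
-- Expected to fail: 'part time' is not one of the labels
-- declared for the Student_Status enum.
INSERT INTO Students (name, status, gpa)
     VALUES ('Joe', 'part time', 3.1);
```

Here the error comes from the data type itself rather than from a constraint on the table: PostgreSQL cannot convert `'part time'` into a `Student_Status` value and reports an invalid input value for the enum.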

### Option 3: Foreign Keys
We will look at this option later.

### Question: What are the pros and cons of using check constraints vs. enumerated data types?
[raja pg_enum, pg_type]
--solution

1. 
2. 
3. 
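As a starting point for thinking about the question, note that an enumerated type is a catalog object in its own right. The sketch below, assuming the `Student_Status` type created earlier in this notebook, lists its labels by joining the system catalogs `pg_type` and `pg_enum`.

```sql
-- List the labels of the Student_Status enum in their declared order.
-- Unquoted identifiers are folded to lower case, hence 'student_status'.
SELECT t.typname,
       e.enumlabel,
       e.enumsortorder
  FROM pg_type t
  JOIN pg_enum e ON e.enumtypid = t.oid
 WHERE t.typname = 'student_status'
 ORDER BY e.enumsortorder;
```

One point to weigh when comparing the two options: the enum is defined once and can be reused by many tables, and it is extended with `ALTER TYPE ... ADD VALUE`, whereas a check constraint belongs to a single table and is changed by dropping and re-adding the constraint.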

# Lab Exercise 

_Suppose we have the restriction that the GPA of undergraduate (UG) students must be in [0,4], whereas the GPA of graduate students must be in [0,5]._

*How can we enforce these dual constraints on both `status` and `gpa`? This is termed a tuple or row constraint. Start with the code below. Hint: use (a) the enumerated data type `Student_Status` and (b) a check clause that tests a Boolean expression involving both `status` and `gpa` (multiple `and`s and one `or`). If a student is inserted without explicitly specifying a GPA, the GPA should default to 0.*

In [None]:
%%sql 

DROP TABLE IF EXISTS Students;

CREATE TABLE Students (
    name    text,
    status  text,
    gpa     numeric(3,2)
);

Place your solution in the cell below. Note that the cell is currently in 'raw' format.
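For reference, one possible shape of a solution that follows the hint is sketched below. It is not the only answer, and it assumes the `Student_Status` enum created earlier in this notebook still exists.

```sql
DROP TABLE IF EXISTS Students;

CREATE TABLE Students (
    name    text,
    status  Student_Status,            -- enum from Option 2
    gpa     numeric(3,2) DEFAULT 0,    -- used when no GPA is supplied
    -- Row (tuple) constraint: the allowed gpa range depends on status.
    CHECK (
        (status = 'undergraduate' AND gpa >= 0 AND gpa <= 4)
        OR
        (status = 'graduate' AND gpa >= 0 AND gpa <= 5)
    )
);
```

A check passes when its expression evaluates to NULL, so a row with a NULL `status` or NULL `gpa` can still be accepted by this sketch; adding `NOT NULL` to both columns would close that gap.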

After you have created the newly constrained `Students` table, the following should execute without error.

In [None]:
%%sql

INSERT INTO Students (name, gpa, status)
     VALUES ('Jack', 3.3, 'undergraduate'),
            ('Jill', 3.9, 'graduate');

SELECT * FROM Students;

If we insert Bill without specifying a GPA, the insert should succeed and Bill's GPA should default to 0.

In [None]:
%%sql

INSERT INTO Students (name, status)
     VALUES ('Bill', 'graduate');

SELECT * FROM Students;

Since Bill is a graduate student, setting Bill's GPA to 4.8 should be fine.

In [None]:
%%sql

UPDATE Students
   SET gpa = 4.8
 WHERE name = 'Bill';

SELECT * FROM Students;

But if we now increment Bill's GPA to 5.8, an error should be raised.
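A minimal sketch of that update, assuming your constrained `Students` table is in place and Bill's GPA is currently 4.8:

```sql
-- Expected to fail: 4.8 + 1 = 5.8 exceeds the graduate limit of 5,
-- so the row constraint should reject the update.
UPDATE Students
   SET gpa = gpa + 1
 WHERE name = 'Bill';
```

If the constraint is working, the statement is rejected and a subsequent `SELECT * FROM Students;` still shows Bill with a GPA of 4.8.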