# Better data quality with constraints

#### Integrity constraints
1. Attribute constraints, e.g. data types on columns (Chapter 2)
2. Key constraints, e.g. primary keys (Chapter 3)
3. Referential integrity constraints, enforced through foreign keys (Chapter 4)

#### Why constraints? 
- Constraints give the data structure
- Constraints help with consistency, and thus data quality
- Data quality is a business advantage and a prerequisite for data science
- Enforcing constraints by hand is difficult, but PostgreSQL helps


### Dealing with data types (casting)
```sql
CREATE TABLE weather (
    temperature integer,
    wind_speed text);
```

```sql
SELECT temperature * wind_speed AS wind_chill
FROM weather;
```

```
operator does not exist: integer * text
HINT: No operator matches the given name and argument type(s). You might need to add explicit type casts.
```

#### The right query 
```sql
SELECT temperature * CAST(wind_speed AS integer) AS wind_chill
FROM weather;
```


### Conforming with data types
For demonstration purposes, I created a fictional database table that only holds three records. The columns have the data types date, integer, and text, respectively.

```sql
CREATE TABLE transactions (
  transaction_date date,
  amount integer,
  fee text
);
```

Have a look at the contents of the transactions table.

The transaction_date column accepts date values. According to the PostgreSQL documentation, it accepts input in formats such as YYYY-MM-DD and, depending on the DateStyle setting, DD/MM/YY, among others.

Both amount and fee appear to be numeric; however, fee is modeled as text, which you will account for in the next exercise.

```python
# -- Let's add a record to the table
t = '''INSERT INTO transactions (transaction_date, amount, fee)
VALUES ('2018-09-24', 5454, 30);

-- Double-check the contents
SELECT *
FROM transactions;'''
```

- Data types restrict how data can be entered into a table. This may be tedious at the moment of insertion, but it saves a lot of headaches in the long run.

### Type CASTs
In the video, you saw that type casts are a possible solution for data type issues. If you know that a certain column stores numbers as text, you can cast the column to a numeric type, e.g. integer:

```sql
SELECT CAST(some_column AS integer)
FROM table_name;
```

Now, the some_column column is represented as integer instead of text for the duration of the query (the stored column type does not change), meaning that you can perform numeric calculations on it.

```python
# -- Calculate the net amount as amount + fee
c = '''SELECT transaction_date, amount + CAST(fee AS integer) AS net_amount
FROM transactions;'''
```
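If you want to try the net-amount query without a PostgreSQL server, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in database (an assumption; SQLite is not PostgreSQL). It loads the single record inserted earlier and runs the same cast. Keep in mind that SQLite is far more permissive about types than PostgreSQL: it would quietly coerce text to numbers, so it does not reproduce the "operator does not exist" error; only the CAST syntax carries over.

```python
import sqlite3

# In-memory SQLite database standing in for PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (transaction_date date, amount integer, fee text)")
conn.execute(
    "INSERT INTO transactions (transaction_date, amount, fee) "
    "VALUES ('2018-09-24', 5454, '30')"
)

# Cast the text column fee to integer before adding it to amount.
rows = conn.execute(
    "SELECT transaction_date, amount + CAST(fee AS integer) AS net_amount "
    "FROM transactions"
).fetchall()
print(rows)  # [('2018-09-24', 5484)]
```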

- Sometimes type casts are necessary to work with data. However, it is better to store columns in the right data type in the first place. You'll learn how to do this in the next exercises.

# Working with data types

#### Working with data types
- Enforced on columns (i.e. attributes)
- Define the so-called "domain" of a column
- Define what operations are possible
- Enforce consistent storage of values

#### The most common types
- text: character strings of any length
- varchar [ (x) ]: a maximum of x characters
- char [ (x) ]: a fixed-length string of x characters
- boolean: can only take three states, i.e. TRUE, FALSE and NULL (unknown)\
From the PostgreSQL documentation.


#### The most common types (cont'd.)
- date, time and timestamp: various formats for date and time calculations
- numeric: arbitrary precision numbers, e.g. 3.1457
- integer: whole numbers in the range -2147483648 to +2147483647 \
From the PostgreSQL documentation.
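To make the idea of a column's domain concrete, here is a small Python sketch that mimics a few of the checks PostgreSQL applies when it stores integer, varchar(x) and char(x) values. The helper names are hypothetical and the rules are simplified; for example, PostgreSQL truncates an over-long string instead of rejecting it when the excess characters are all spaces, which this sketch does not model.

```python
# Hypothetical helpers that mimic a few PostgreSQL type checks (simplified).
INT_MIN, INT_MAX = -2147483648, 2147483647


def check_integer(value: int) -> int:
    """Reject whole numbers outside the 4-byte integer range."""
    if not INT_MIN <= value <= INT_MAX:
        raise ValueError(f"integer out of range: {value}")
    return value


def check_varchar(value: str, max_len: int) -> str:
    """varchar(x): at most x characters, stored as given."""
    if len(value) > max_len:
        raise ValueError(f"value too long for type character varying({max_len})")
    return value


def check_char(value: str, length: int) -> str:
    """char(x): at most x characters, padded with spaces to exactly x."""
    if len(value) > length:
        raise ValueError(f"value too long for type character({length})")
    return value.ljust(length)


print(check_integer(2147483647))  # 2147483647
print(check_varchar("Karl", 64))  # Karl
print(repr(check_char("EP", 3)))  # 'EP '
```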


#### Specifying types upon table creation

```python
s = '''CREATE TABLE students (
    ssn integer,
    name varchar(64),
    dob date,
    average_grade numeric(3, 2), -- e.g. 5.54
    tuition_paid boolean);'''
```
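In numeric(3, 2), 3 is the precision (the total number of significant digits) and 2 is the scale (the digits after the decimal point), so the column can hold values from -9.99 to 9.99. When storing a value, PostgreSQL first rounds it to the declared scale and then raises an error if too many digits remain before the decimal point. A rough sketch of that rule using Python's decimal module; the function name is hypothetical:

```python
from decimal import Decimal, ROUND_HALF_UP


def check_numeric(value: str, precision: int, scale: int) -> Decimal:
    """Hypothetical helper mimicking storage in a numeric(precision, scale) column."""
    # Step 1: round to `scale` fractional digits, ties away from zero.
    rounded = Decimal(value).quantize(Decimal(1).scaleb(-scale), rounding=ROUND_HALF_UP)
    # Step 2: at most precision - scale digits may remain before the decimal point.
    if abs(rounded) >= Decimal(10) ** (precision - scale):
        raise ValueError("numeric field overflow")
    return rounded


print(check_numeric("5.54", 3, 2))   # 5.54
print(check_numeric("5.545", 3, 2))  # 5.55
```

A value such as 9.999 rounds to 10.00, which needs two digits before the decimal point, so it is rejected.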

```python
# Alter types after table creation

a = '''ALTER TABLE students
ALTER COLUMN name
TYPE varchar(128);'''

al = '''ALTER TABLE students
ALTER COLUMN average_grade
TYPE integer
-- Turns 5.54 into 6, not 5, before type conversion
USING ROUND(average_grade);'''
```
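ROUND on a numeric value rounds to the nearest whole number and sends ties away from zero, so 5.54 becomes 6 and 4.5 would become 5. Python's built-in round() resolves ties toward the nearest even number instead, so a sketch that mirrors the USING ROUND(average_grade) conversion needs the decimal module. The helper name is hypothetical:

```python
from decimal import Decimal, ROUND_HALF_UP


def round_like_postgres(value: str) -> int:
    """Hypothetical helper: round a numeric value to a whole number, ties away from zero."""
    return int(Decimal(value).quantize(Decimal(1), rounding=ROUND_HALF_UP))


print(round_like_postgres("5.54"))  # 6
print(round_like_postgres("4.5"))   # 5
print(round(4.5))                   # 4, because built-in round() rounds half to even
```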

#### Change types with ALTER COLUMN
The syntax for changing the data type of a column is straightforward. The following code changes the data type of the column_name column in table_name to varchar(10):

```sql
ALTER TABLE table_name
ALTER COLUMN column_name
TYPE varchar(10);
```

Now it's time to start adding constraints to your database.

```python
# -- Select the university_shortname column
d = '''SELECT DISTINCT university_shortname
FROM professors;'''
```

```python
# -- Specify the correct fixed-length character type
ap = '''ALTER TABLE professors
ALTER COLUMN university_shortname
TYPE CHAR(3);'''
```

```python
# -- Change the type of firstname
apf = '''ALTER TABLE professors
ALTER COLUMN firstname
TYPE VARCHAR(64);'''
```

#### Convert types USING a function
If you don't want to reserve too much space for a certain varchar column, you can truncate the values before converting the column's type.

For this, you can use the following syntax:

```sql
ALTER TABLE table_name
ALTER COLUMN column_name
TYPE varchar(x)
USING SUBSTRING(column_name FROM 1 FOR x);
```

You should read it like this: because you want to reserve only x characters for column_name, you have to retain a SUBSTRING of every value, i.e. its first x characters, and throw away the rest. This way, the values will fit the varchar(x) requirement.
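SUBSTRING(column_name FROM 1 FOR x) counts character positions from 1 and returns at most x characters, so shorter values pass through unchanged. Python indexes strings from 0, so the same truncation is a slice. A sketch with a hypothetical helper name and made-up sample values:

```python
def substring(value: str, start: int, length: int) -> str:
    """Hypothetical helper mimicking SUBSTRING(value FROM start FOR length), start >= 1."""
    # SQL positions are 1-based; Python slices are 0-based.
    return value[start - 1:start - 1 + length]


names = ["Karl", "Alexandra-Josephine"]  # hypothetical sample values
print([substring(n, 1, 16) for n in names])  # ['Karl', 'Alexandra-Joseph']
```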

```python
# -- Convert the values in firstname to a max. of 16 characters
al = '''ALTER TABLE professors
ALTER COLUMN firstname
TYPE varchar(16)
USING SUBSTRING(firstname FROM 1 FOR 16);'''
```

- Truncating values throws data away, so it's best avoided in your database; we'll revert this column to varchar(64). Now it's time to move on to the next set of attribute constraints!

# The not-null and unique constraints

#### What does NULL mean? An example

```sql
CREATE TABLE students (
    ssn integer NOT NULL,
    lastname varchar(64) NOT NULL,
    home_phone integer,
    office_phone integer);
```

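NULL != NULL: NULL represents a value that is unknown or does not exist (a student may have no office phone, or it may simply not be recorded), so one NULL is never considered equal to another. In SQL, comparing NULL with anything, even `NULL = NULL`, yields NULL (unknown) rather than TRUE.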
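SQL evaluates comparisons with three-valued logic: comparing anything with NULL yields NULL (unknown), and IS NULL is the way to test for a missing value. Here is a minimal sketch that uses Python's sqlite3 module as a stand-in for PostgreSQL (an assumption; both follow the SQL standard on these points) and also shows the students table rejecting a NULL ssn. The inserted values are hypothetical:

```python
import sqlite3

# In-memory SQLite database standing in for PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")

# NULL = NULL is unknown (returned to Python as None); NULL IS NULL is true (1).
print(conn.execute("SELECT NULL = NULL, NULL IS NULL").fetchone())  # (None, 1)

conn.execute(
    "CREATE TABLE students (ssn integer NOT NULL, lastname varchar(64) NOT NULL, "
    "home_phone integer, office_phone integer)"
)
# Allowed: the phone numbers are left NULL (unknown or nonexistent).
conn.execute("INSERT INTO students (ssn, lastname) VALUES (123456789, 'Doe')")

# Rejected: ssn is declared NOT NULL.
try:
    conn.execute("INSERT INTO students (ssn, lastname) VALUES (NULL, 'Roe')")
except sqlite3.IntegrityError as err:
    print("rejected:", err)
```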

#### How to add or remove a not-null constraint
- When creating a table...

```sql
CREATE TABLE students (
    ssn integer NOT NULL,
    lastname varchar(64) NOT NULL,
    home_phone integer,
    office_phone integer);
```

- After the table has been created...

```sql
ALTER TABLE students
ALTER COLUMN home_phone
SET NOT NULL;
```

```sql
ALTER TABLE students
ALTER COLUMN ssn
DROP NOT NULL;
```


#### Adding unique constraints
```sql
CREATE TABLE table_name (
    column_name integer UNIQUE  -- UNIQUE follows the column's data type
);
```

```sql
ALTER TABLE table_name
ADD CONSTRAINT some_name UNIQUE(column_name);
```
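A UNIQUE constraint rejects a second row with a value that is already present, while NULLs do not count as duplicates of each other (PostgreSQL's default behavior). A quick sketch, again using Python's sqlite3 module as a stand-in for PostgreSQL (an assumption), with the placeholder names from above and hypothetical values:

```python
import sqlite3

# In-memory SQLite database standing in for PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_name (column_name integer UNIQUE)")
conn.execute("INSERT INTO table_name VALUES (1)")

# Several NULLs are accepted: one NULL is never equal to another.
conn.execute("INSERT INTO table_name VALUES (NULL)")
conn.execute("INSERT INTO table_name VALUES (NULL)")

# A repeated non-NULL value violates the constraint.
try:
    conn.execute("INSERT INTO table_name VALUES (1)")
except sqlite3.IntegrityError as err:
    print("rejected:", err)

print(conn.execute("SELECT COUNT(*) FROM table_name").fetchone())  # (3,)
```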


```python
# -- Disallow NULL values in firstname
ap = '''ALTER TABLE professors
ALTER COLUMN firstname SET NOT NULL;'''
```

```python
# -- Disallow NULL values in lastname
al = '''ALTER TABLE professors
ALTER COLUMN lastname SET NOT NULL;'''
```

- It is no longer possible to add professors who have their first or last name set to NULL. Likewise, it is no longer possible to update an existing professor by setting their first or last name to NULL.

#### Make your columns UNIQUE with ADD CONSTRAINT
As seen in the video, you add the UNIQUE keyword after the data type of the column that should be unique. This, of course, only works for new tables:

```sql
CREATE TABLE table_name (
    column_name integer UNIQUE
);
```

If you want to add a unique constraint to an existing table, you do it like this:

```sql
ALTER TABLE table_name
ADD CONSTRAINT some_name UNIQUE(column_name);
```

Note that this is different from the ALTER COLUMN syntax for the not-null constraint. Also, with ADD CONSTRAINT you have to give the constraint a name, here some_name.
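Adding a unique constraint to an existing table only succeeds if the values already in the column are unique; otherwise PostgreSQL refuses to create it. SQLite, used here as a stand-in through Python's sqlite3 module (an assumption), has no ALTER TABLE ... ADD CONSTRAINT, so this sketch substitutes CREATE UNIQUE INDEX, which enforces the same rule (PostgreSQL itself backs every unique constraint with a unique index). The table contents are hypothetical:

```python
import sqlite3

# In-memory SQLite database standing in for PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE universities (university_shortname text)")
conn.executemany(
    "INSERT INTO universities VALUES (?)", [("EPF",), ("ETH",), ("EPF",)]
)


def add_unique(conn: sqlite3.Connection) -> None:
    # SQLite substitute for:
    #   ALTER TABLE universities
    #   ADD CONSTRAINT university_shortname_unq UNIQUE(university_shortname);
    conn.execute(
        "CREATE UNIQUE INDEX university_shortname_unq "
        "ON universities (university_shortname)"
    )


# Fails: 'EPF' appears twice.
try:
    add_unique(conn)
except sqlite3.IntegrityError as err:
    print("cannot add constraint:", err)

# Remove the duplicate (the third row inserted), then the constraint can be added.
conn.execute("DELETE FROM universities WHERE rowid = 3")
add_unique(conn)
```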

```python
# -- Make universities.university_shortname unique
au = '''ALTER TABLE universities
ADD CONSTRAINT university_shortname_unq UNIQUE(university_shortname);'''
```

```python
# -- Make organizations.organization unique
ao = '''ALTER TABLE organizations
ADD CONSTRAINT organization_unq UNIQUE(organization);'''
```

- Making sure universities.university_shortname and organizations.organization only contain unique values is a prerequisite for turning them into so-called primary keys, the subject of the next chapter!