# Integrity Constraints with SQL
This question is almost identical to the second problem of the Datalog/IC assignment: we will find integrity constraint violations in the `publications` dataset, but this time using SQL. The same rules for executing SQL in the Jupyter Notebook apply: connect to the database first, then write your SQL answer below the `%%sql` magic line.

![Publication](Publication_Table.png "Publication")

In [1]:
%reload_ext sql
%reload_ext lib.sqlite.sqlite_evaluate_magic
import os

### Connecting to the database

In [2]:
# The following command will connect you to the database.
# Any query that you will run after this cell will be run on the publications.db database.
publications_db_url = 'sqlite:///' + os.path.expanduser('~/data_readonly/sqlite/databases/publications.db')
%sql $publications_db_url

'Connected: @/home/jovyan/data_readonly/sqlite/databases/publications.db'

The database has two tables: (1) `Publication` and (2) `Cites`. The headers in the output of the following two queries show the column names of each table.
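
Instead of reading the query headers, SQLite can also list a table's columns with `PRAGMA table_info`. A minimal sketch with Python's `sqlite3`, on an in-memory schema that reuses the column names shown in this notebook; the column types are assumptions, since only the names are visible here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema with this notebook's column names; types are assumed.
# "no" is quoted defensively because NO is an SQLite keyword.
conn.executescript(
    """
    CREATE TABLE Publication (
        pid INTEGER, authors TEXT, year INTEGER, title TEXT, journal TEXT,
        vol INTEGER, "no" INTEGER, fp INTEGER, lp INTEGER, publisher TEXT
    );
    CREATE TABLE Cites (citing INTEGER, cited INTEGER);
    """
)


def column_names(conn, table):
    # Each PRAGMA table_info row is (cid, name, type, notnull, dflt_value, pk); keep the name.
    # The table name is interpolated directly, so only pass trusted names.
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


print(column_names(conn, "Publication"))
# ['pid', 'authors', 'year', 'title', 'journal', 'vol', 'no', 'fp', 'lp', 'publisher']
print(column_names(conn, "Cites"))  # ['citing', 'cited']
```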

In [3]:
%%sql 
select * from Publication limit 1;

 * @/home/jovyan/data_readonly/sqlite/databases/publications.db
 * sqlite:////home/jovyan/data_readonly/sqlite/databases/publications.db
Done.


pid,authors,year,title,journal,vol,no,fp,lp,publisher
6755,hyatt,1872,fossil,bullmcz,5,5,91,9,publisher1


In [4]:
%%sql 
select * from Cites limit 1;

 * @/home/jovyan/data_readonly/sqlite/databases/publications.db
 * sqlite:////home/jovyan/data_readonly/sqlite/databases/publications.db
Done.


citing,cited
4711,2020


## We will now write various queries to find "bad" (i.e., inconsistent) data. 
If the required output format is not clear from the question's wording, look at the expected output and make sure your query returns results in that form.

### [12 points] Question 1: Key Constraints

* **The key attribute ID (`pid`) should uniquely determine all other attributes.**

In DENIAL form, report all IC violations, i.e., cases where at least two rows share the same ID but differ in at least one other attribute value. The output should include all columns of the violating publications.
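
Because the key must determine *every* other attribute, a denial query should compare all non-key columns, not just one of them. A sketch with Python's `sqlite3` on an in-memory table filled with the rows displayed in this notebook (the column types, and the exact duplicate of row 6755, are assumptions added for illustration). It relies on SQLite's null-safe `IS NOT`, because `NULL != 'publisher2'` evaluates to `NULL` rather than true.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE Publication (pid INTEGER, authors TEXT, year INTEGER, title TEXT,
       journal TEXT, vol INTEGER, "no" INTEGER, fp INTEGER, lp INTEGER, publisher TEXT)"""
)
rows_in = [
    (6755, "hyatt", 1872, "fossil", "bullmcz", 5, 5, 91, 9, "publisher1"),
    (6755, "hyatt", 1872, "fossil", "bullmcz", 5, 5, 91, 9, "publisher1"),  # exact duplicate (assumed)
    (4407, "kummel", 1969, "ammonoids", "bullmcz", 137, 3, 476, None, "publisher2"),
    (4407, "doe", 2015, "foobar", "bullmcz", 10, 1, 10, 1, None),
]
conn.executemany("INSERT INTO Publication VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)", rows_in)

# Denial form: two rows with the same pid that differ in at least one other column.
# IS NOT is null-safe, so NULL on one side and a value on the other counts as different.
KEY_VIOLATIONS = """
SELECT DISTINCT p1.*
FROM Publication AS p1
JOIN Publication AS p2 ON p1.pid = p2.pid
WHERE p1.authors IS NOT p2.authors OR p1.year IS NOT p2.year
   OR p1.title IS NOT p2.title OR p1.journal IS NOT p2.journal
   OR p1.vol IS NOT p2.vol OR p1."no" IS NOT p2."no"
   OR p1.fp IS NOT p2.fp OR p1.lp IS NOT p2.lp
   OR p1.publisher IS NOT p2.publisher
ORDER BY p1.authors
"""
violations = conn.execute(KEY_VIOLATIONS).fetchall()
for row in violations:
    print(row)
```

The duplicated 6755 row is not reported: identical rows agree on every attribute, so they do not break the dependency from the key to the other columns.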

In [7]:
%%sql
Problem2a_FD_1 <<
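-- Your query goes here. Don't change variable name.

select distinct p1.pid, p1.authors, p1.year, p1.title, p1.journal, p1.vol, p1.no, p1.fp, p1.lp, p1.publisher
from publication as p1, publication as p2

-- IS NOT is SQLite's null-safe inequality: a NULL on only one side counts as a difference
where p1.pid = p2.pid
  and (p1.authors is not p2.authors or p1.year is not p2.year or p1.title is not p2.title
       or p1.journal is not p2.journal or p1.vol is not p2.vol or p1.no is not p2.no
       or p1.fp is not p2.fp or p1.lp is not p2.lp or p1.publisher is not p2.publisher)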

 * @/home/jovyan/data_readonly/sqlite/databases/publications.db
 * sqlite:////home/jovyan/data_readonly/sqlite/databases/publications.db
Done.
Saving data to local variable Problem2a_FD_1['result']
Saving query to local variable Problem2a_FD_1['query']


pid,authors,year,title,journal,vol,no,fp,lp,publisher
4407,kummel,1969,ammonoids,bullmcz,137,3,476,,publisher2
4407,doe,2015,foobar,bullmcz,10,1,10,1.0,


In [8]:
# Run this cell to see the expected output of the previous query
%sql_expected_output Problem2a_FD_1

pid,authors,year,title,journal,vol,no,fp,lp,publisher
4407,kummel,1969,ammonoids,bullmcz,137,3,476,,publisher2
4407,doe,2015,foobar,bullmcz,10,1,10,1.0,


In [9]:
# Test Q1
%sql_evaluate Problem2a_FD_1

### [11 points] Question 2: Functional Dependency
* **A journal has a single publisher, i.e., FD: Journal --> Publisher**

In DENIAL form, report the journals that have multiple publishers, i.e., two or more different publishers recorded in the table. The output should include only the journal and publisher columns.
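
Grouping offers a second way to express the FD check: find the journals with more than one distinct publisher value, then list their (journal, publisher) pairs. A sketch assuming SQLite semantics, on a hypothetical two-column stand-in for `Publication` (`jpaleo` and `publisher3` are invented). `COUNT(DISTINCT ...)` skips `NULL`, so the sketch counts a missing publisher separately, matching this question's expected output, which lists the empty publisher as its own value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical two-column stand-in for Publication.
conn.execute("CREATE TABLE Publication (journal TEXT, publisher TEXT)")
conn.executemany(
    "INSERT INTO Publication VALUES (?, ?)",
    [
        ("bullmcz", "publisher1"),
        ("bullmcz", "publisher2"),
        ("bullmcz", None),
        ("jpaleo", "publisher3"),  # invented, consistent journal
        ("jpaleo", "publisher3"),
    ],
)

# COUNT(DISTINCT publisher) ignores NULLs; MAX(publisher IS NULL) adds 1
# when the journal also has a row with a missing publisher.
FD_VIOLATIONS = """
SELECT DISTINCT journal, publisher
FROM Publication
WHERE journal IN (
    SELECT journal
    FROM Publication
    GROUP BY journal
    HAVING COUNT(DISTINCT publisher) + MAX(publisher IS NULL) > 1
)
ORDER BY journal, publisher
"""
rows = conn.execute(FD_VIOLATIONS).fetchall()
print(rows)  # [('bullmcz', None), ('bullmcz', 'publisher1'), ('bullmcz', 'publisher2')]
```

SQLite sorts `NULL` before any other value in ascending order, which is why the missing publisher comes first here.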

In [18]:
%%sql
Problem2a_FD_2 <<
-- Your query goes here. Don't change variable name.

select distinct p1.journal, p1.publisher
from publication as p1, publication as p2
where p1.journal = p2.journal
  -- IS NOT is null-safe: it also flags pairs where exactly one publisher is NULL
  and p1.publisher is not p2.publisher;

 * @/home/jovyan/data_readonly/sqlite/databases/publications.db
 * sqlite:////home/jovyan/data_readonly/sqlite/databases/publications.db
Done.
Saving data to local variable Problem2a_FD_2['result']
Saving query to local variable Problem2a_FD_2['query']


journal,publisher
bullmcz,publisher1
bullmcz,publisher2
bullmcz,


In [19]:
# Run this cell to see the expected output of the previous query
%sql_expected_output Problem2a_FD_2

journal,publisher
bullmcz,publisher1
bullmcz,publisher2
bullmcz,


In [20]:
# Test Q2
%sql_evaluate Problem2a_FD_2

### [11 points] Question 3: Semantic Constraint

* **The last page number cannot be smaller than the first page number.**

In DENIAL form, report the publications whose last page is smaller than their first page. The output should include all attribute columns of those publications.
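
Since this constraint involves only one row at a time, a plain `WHERE lp < fp` filter is enough. A sketch with Python's `sqlite3` on a reduced, hypothetical version of `Publication` (four columns only; the `smith` row is invented). It also shows that a row with a missing last page is not reported, because comparing with `NULL` yields `NULL`, which `WHERE` treats as false.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Publication (pid INTEGER, authors TEXT, fp INTEGER, lp INTEGER)")
conn.executemany(
    "INSERT INTO Publication VALUES (?, ?, ?, ?)",
    [
        (6755, "hyatt", 91, 9),       # lp < fp: violation
        (4407, "kummel", 476, None),  # missing last page: NULL comparison, not reported
        (4407, "doe", 10, 1),         # lp < fp: violation
        (1001, "smith", 5, 12),       # hypothetical valid row
    ],
)

# The constraint concerns a single row, so no self-join is needed.
rows = conn.execute(
    "SELECT pid, authors, fp, lp FROM Publication WHERE lp < fp ORDER BY pid"
).fetchall()
print(rows)  # [(4407, 'doe', 10, 1), (6755, 'hyatt', 91, 9)]
```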

In [29]:
%%sql
Problem2a_NC_1 <<
-- Your query goes here. Don't change variable name.

select distinct pid, authors, year, title, journal, vol, no, fp, lp, publisher
from publication

-- row-level check: compare each row's own first and last page (no self-join needed)
where lp < fp

 * @/home/jovyan/data_readonly/sqlite/databases/publications.db
 * sqlite:////home/jovyan/data_readonly/sqlite/databases/publications.db
Done.
Saving data to local variable Problem2a_NC_1['result']
Saving query to local variable Problem2a_NC_1['query']


pid,authors,year,title,journal,vol,no,fp,lp,publisher
6755,hyatt,1872,fossil,bullmcz,5,5,91,9,publisher1
4407,doe,2015,foobar,bullmcz,10,1,10,1,


In [30]:
# Run this cell to see the expected output of the previous query
%sql_expected_output Problem2a_NC_1

pid,authors,year,title,journal,vol,no,fp,lp,publisher
6755,hyatt,1872,fossil,bullmcz,5,5,91,9,publisher1
4407,doe,2015,foobar,bullmcz,10,1,10,1,


In [31]:
# Test Q3
%sql_evaluate Problem2a_NC_1

### [11 points] Question 4: Inclusion Dependency
- **Every cited publication in `CITES` also occurs in `PUBLICATION`.**

In DENIAL form, report the publication IDs that appear as `cited` in the `CITES` table but **not** as `pid` in the `PUBLICATION` table. The output should include only these publication IDs.
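
With `NOT IN`, a single `NULL` in the subquery makes the predicate `NULL` for every ID not found in the list, so the query silently returns nothing; that is a realistic risk in a dataset that deliberately contains bad rows. A sketch with Python's `sqlite3` contrasting `NOT IN` and `NOT EXISTS` on hypothetical toy tables (the `NULL` pid and the extra citations are invented for illustration).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Publication (pid INTEGER)")
conn.execute("CREATE TABLE Cites (citing INTEGER, cited INTEGER)")
# Hypothetical data; the NULL pid mimics a corrupted Publication row.
conn.executemany("INSERT INTO Publication VALUES (?)", [(2044,), (2580,), (4711,), (None,)])
conn.executemany(
    "INSERT INTO Cites VALUES (?, ?)",
    [(4711, 2020), (2044, 2580), (4711, 3799), (2044, 2020)],
)

# NOT IN: '2020 NOT IN (..., NULL)' evaluates to NULL, so every row is filtered out.
not_in_rows = conn.execute(
    "SELECT DISTINCT cited FROM Cites "
    "WHERE cited NOT IN (SELECT pid FROM Publication) ORDER BY cited"
).fetchall()

# NOT EXISTS only asks whether a matching pid exists, so the NULL row is harmless.
not_exists_rows = conn.execute(
    "SELECT DISTINCT c.cited AS cited_but_not_in_Publication FROM Cites AS c "
    "WHERE NOT EXISTS (SELECT 1 FROM Publication AS p WHERE p.pid = c.cited) "
    "ORDER BY c.cited"
).fetchall()

print(not_in_rows)      # []
print(not_exists_rows)  # [(2020,), (3799,)]
```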

In [35]:
%%sql
Problem2b_ID <<
-- Your query goes here. Don't change variable name.

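select distinct c.cited as cited_but_not_in_Publication
from Cites as c
-- NOT EXISTS stays correct even if some pid is NULL (NOT IN would then return no rows)
where not exists (select 1 from Publication as p where p.pid = c.cited)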

 * @/home/jovyan/data_readonly/sqlite/databases/publications.db
 * sqlite:////home/jovyan/data_readonly/sqlite/databases/publications.db
Done.
Saving data to local variable Problem2b_ID['result']
Saving query to local variable Problem2b_ID['query']


cited
2020
3799


In [36]:
# Run this cell to see the expected output of the previous query
%sql_expected_output Problem2b_ID

cited_but_not_in_Publication
2020
3799


In [37]:
# Test Q4
%sql_evaluate Problem2b_ID

### [11 points] Question 5: Semantic Constraint

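- **If P1 cites P2, then P2's year of publication cannot be greater than P1's.**

In DENIAL form, report each violating citation together with the publication years of the citing and the cited publication.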
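
The expected output pairs each violating citation with both publication years, and explicit column aliases are what make the result headers match it. A sketch with Python's `sqlite3` on hypothetical toy tables (only the 2044/2580 pair and its years come from this notebook; 4711/1990 is invented), which also checks the headers through `cursor.description`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Publication (pid INTEGER, year INTEGER)")
conn.execute("CREATE TABLE Cites (citing INTEGER, cited INTEGER)")
# 2044/1934 and 2580/1962 mirror the expected output; 4711/1990 is hypothetical.
conn.executemany("INSERT INTO Publication VALUES (?, ?)", [(2044, 1934), (2580, 1962), (4711, 1990)])
conn.executemany("INSERT INTO Cites VALUES (?, ?)", [(2044, 2580), (4711, 2044)])

cur = conn.execute(
    """
    SELECT c.citing AS citing, c.cited AS cited,
           p1.year AS citing_pub_year, p2.year AS cited_pub_year
    FROM Cites AS c
    JOIN Publication AS p1 ON p1.pid = c.citing
    JOIN Publication AS p2 ON p2.pid = c.cited
    WHERE p1.year < p2.year  -- the cited paper is newer than the citing one
    """
)
headers = [d[0] for d in cur.description]
rows = cur.fetchall()
print(headers)  # ['citing', 'cited', 'citing_pub_year', 'cited_pub_year']
print(rows)     # [(2044, 2580, 1934, 1962)]
```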

In [42]:
%%sql
Problem2b_NC_2 <<
-- Your query goes here. Don't change the variable name.

select c.citing, c.cited, p1.year as citing_pub_year, p2.year as cited_pub_year
from Cites as c

join publication as p1 on p1.pid = c.citing
join publication as p2 on p2.pid = c.cited

where p1.year < p2.year

 * @/home/jovyan/data_readonly/sqlite/databases/publications.db
 * sqlite:////home/jovyan/data_readonly/sqlite/databases/publications.db
Done.
Saving data to local variable Problem2b_NC_2['result']
Saving query to local variable Problem2b_NC_2['query']


citing,cited,year,year_1
2044,2580,1934,1962


In [43]:
# Run this cell to see the expected output of the previous query
%sql_expected_output Problem2b_NC_2

citing,cited,citing_pub_year,cited_pub_year
2044,2580,1934,1962


In [44]:
# Test Q5
%sql_evaluate Problem2b_NC_2