# SQL Set operations and subqueries
This notebook provides the commands from the SQL2 lecture.


## Initialization

Run the next cell to set up PostgreSQL

In [None]:
# install
!pip install psycopg2-binary
!apt install postgresql postgresql-contrib &>log

## Connect to a database 


In [None]:
# Set your database configuration information
# set this value to your database's username
dbuser = "netId"
# set this value to your database's name
dbName = "netIddb"
# set this value to your database's port
port = 5432
# set this value to your database's endpoint
endpoint = "postgres.clear.rice.edu"

# build the connection string
def make_conn_str(dbuser, password, endpoint, port, dbName):
    return f"postgresql+psycopg2://{dbuser}:{password}@{endpoint}:{port}/{dbName}"

        
import getpass
password = getpass.getpass()
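A password containing characters such as `@`, `:` or `/` would break the URL that `make_conn_str` builds. The sketch below shows one possible workaround, assuming that percent-encoding the password with the standard library's `urllib.parse.quote_plus` is acceptable for your setup. The name `make_safe_conn_str` and the sample password are hypothetical and not part of the lecture code.

```python
from urllib.parse import quote_plus


def make_safe_conn_str(dbuser, password, endpoint, port, dbName):
    # Hypothetical variant of make_conn_str: percent-encode the password so that
    # characters such as '@' or '/' are not mistaken for URL delimiters.
    return f"postgresql+psycopg2://{dbuser}:{quote_plus(password)}@{endpoint}:{port}/{dbName}"


# Made-up password used only to show the encoding
example = make_safe_conn_str("netId", "p@ss/word", "postgres.clear.rice.edu", 5432, "netIddb")
print(example)
```

If your password contains only letters and digits, both functions produce the same string.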

In [None]:
# set connection
%load_ext sql
conn_str = make_conn_str(dbuser, password, endpoint, port, dbName)
# Limit queries to 100 results. Increase this value if needed, but note that the notebook file will grow in size as well.
%config SqlMagic.displaylimit=100
%sql $conn_str

### Create the tables


In [None]:
%%sql
DROP TABLE IF EXISTS Frequents;
CREATE TABLE Frequents
(
    drinker VARCHAR(50) NOT NULL,
    cafe VARCHAR(50) NOT NULL,
    PRIMARY KEY (drinker, cafe)
);

DROP TABLE IF EXISTS Likes;
CREATE TABLE Likes
(
    drinker VARCHAR(50) NOT NULL,
    coffee VARCHAR(50) NOT NULL,
    PRIMARY KEY (drinker, coffee)
);


DROP TABLE IF EXISTS Serves;
CREATE TABLE Serves
(
    cafe VARCHAR(50) NOT NULL,
    coffee VARCHAR(50)  NOT NULL,
    PRIMARY KEY (cafe, coffee)
);

DROP TABLE IF EXISTS Rates;
CREATE TABLE Rates
(
    drinker VARCHAR(50) NOT NULL,
    coffee VARCHAR(50) NOT NULL,
    score INTEGER NOT NULL
);



Load some data

In [None]:
%%sql
DELETE FROM Frequents;
DELETE FROM Likes;
DELETE FROM Serves;
DELETE FROM Rates;

INSERT INTO Frequents VALUES 
('Chris', 'A Cafe'),
('Chris', 'Double Trouble'),
('Risa', 'Brew Joint'),
('Risa', 'Java Lava'),
('Ying', 'Java Lava'),
('Risa', 'Double Trouble');

INSERT INTO Likes VALUES 
('Chris', 'Drip'),
('Chris', 'Espresso'),
('Risa', 'Cold Brew'),
('Risa', 'Drip'),
('Risa', 'Espresso'),
('Carlos', 'Cappuccino'),
('Ying', 'Cold Brew'),
('Ying', 'Drip'),
('Ying', 'Espresso'),
('Ying', 'Cappuccino');

INSERT INTO Serves VALUES 
('A Cafe', 'Drip'),
('A Cafe', 'Espresso'),
('A Cafe', 'Cold Brew'),
('Brew Joint', 'Espresso'),
('Double Trouble', 'Espresso'),
('Double Trouble', 'Cold Brew');

INSERT INTO Rates VALUES
('Risa', 'Cold Brew', 5),
('Risa', 'Drip', 3),
('Risa', 'Espresso', 4),
('Chris', 'Espresso', 2),
('Chris', 'Drip', 1),
('Ying', 'Drip', 1);



## UNION and UNION ALL

### UNION

In [None]:
%%sql
SELECT f.Drinker
FROM Frequents f
UNION 
SELECT l.Drinker 
FROM Likes l;


In [None]:
%%sql
SELECT f.Drinker
FROM Frequents f
UNION ALL
SELECT l.Drinker 
FROM Likes l;

What is the difference?

your thoughts here

## Intersection and Difference

* Intersection - Can be implemented via ```INNER JOIN``` (PostgreSQL also provides ```INTERSECT```)

* Difference - Implemented via ```EXCEPT``` 
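To see both operations without giving away the exercises below, here is a minimal sketch that works on coffees rather than drinkers. It assumes an in-memory SQLite database (Python's `sqlite3` module) as a stand-in for the PostgreSQL server and loads only a small subset of the rows. The helper `column` is a hypothetical name introduced for this illustration.

```python
import sqlite3

# In-memory SQLite stands in for the PostgreSQL database (an assumption of this sketch)
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Likes (drinker TEXT NOT NULL, coffee TEXT NOT NULL, PRIMARY KEY (drinker, coffee));
CREATE TABLE Serves (cafe TEXT NOT NULL, coffee TEXT NOT NULL, PRIMARY KEY (cafe, coffee));
INSERT INTO Likes VALUES ('Chris', 'Drip'), ('Risa', 'Cold Brew'), ('Carlos', 'Cappuccino');
INSERT INTO Serves VALUES ('A Cafe', 'Drip'), ('A Cafe', 'Cold Brew'), ('Brew Joint', 'Espresso');
""")


def column(sql):
    # Run a single-column query and return its values as a sorted list
    return sorted(row[0] for row in conn.execute(sql))


# Coffees that are both liked and served, written two ways
via_join = column(
    "SELECT DISTINCT l.coffee FROM Likes l INNER JOIN Serves s ON l.coffee = s.coffee")
via_intersect = column("SELECT coffee FROM Likes INTERSECT SELECT coffee FROM Serves")

# Coffees that are liked but not served anywhere
difference = column("SELECT coffee FROM Likes EXCEPT SELECT coffee FROM Serves")

print(via_join, via_intersect, difference)
```

The join needs `DISTINCT` to behave like a set operation, whereas `INTERSECT` and `EXCEPT` remove duplicates on their own.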

Write a query to find the intersection of drinkers in Likes and Frequents

In [None]:
%%sql
-- your code here


Write a query to find the difference: the drinkers in Likes minus the drinkers in Frequents

In [None]:
%%sql
-- your code here


## Subqueries

### Subqueries returning a single value / scalar

Is the highest score for drinker Chris higher than 4?


In [None]:
%%sql
SELECT (SELECT r.score FROM Rates r WHERE r.drinker = 'Chris' ORDER BY r.score DESC LIMIT 1) > 4 AS gt4;

### Subqueries in the WHERE clause

Who goes to a cafe that serves 'Cold Brew'?

Here is the query using ```EXISTS```

In [None]:
%%sql
SELECT DISTINCT f.drinker
FROM Frequents f
WHERE EXISTS (
    SELECT s.cafe
    FROM Serves s
    WHERE f.cafe = s.cafe
      AND s.coffee = 'Cold Brew')


Rewrite this query using a join

In [None]:
%%sql
SELECT DISTINCT f.drinker
FROM Frequents f
WHERE 
   -- your code here



### Correlated subqueries

Who frequents cafes that serve at least 3 different coffees?

In [None]:
%%sql
SELECT DISTINCT f.drinker
FROM Serves s JOIN Frequents f ON s.cafe = f.cafe
WHERE (SELECT COUNT(DISTINCT coffee)
        FROM Serves s2 WHERE s2.cafe = s.cafe) >= 3


## ```IN```
Who likes 'Cold Brew' and 'Espresso'?

In [None]:
%%sql
SELECT DISTINCT l.drinker
FROM Likes l
WHERE l.coffee = 'Cold Brew' 
    AND l.drinker IN (
        SELECT l2.drinker 
        FROM Likes l2 
        WHERE l2.coffee = 'Espresso')


Sometimes it's easier to write these queries with a JOIN

In [None]:
%%sql
SELECT DISTINCT l1.drinker
FROM Likes l1, Likes l2
WHERE l1.drinker = l2.drinker 
    AND l1.coffee = 'Cold Brew'
    AND l2.coffee = 'Espresso'


## Who likes all of the coffees that Risa likes?

Start by writing a query that returns all of the coffees that Risa likes

In [None]:
%%sql 
SELECT l2.coffee
FROM Likes l2
WHERE l2.drinker = 'Risa' 


Who likes all of the coffees that Risa likes?

```
SELECT DISTINCT l1.drinker
FROM Likes l1
WHERE NOT EXISTS ({a coffee Risa likes that is not also liked by l1.drinker})
```


Further expanding:

```
SELECT DISTINCT l1.drinker
FROM Likes l1
WHERE NOT EXISTS (
    SELECT l2.coffee 
    FROM Likes l2
    WHERE l2.drinker = 'Risa' AND l2.coffee NOT IN (
        {the set of coffees liked by l1.drinker}))
```


In [None]:
%%sql
SELECT DISTINCT l1.drinker
FROM Likes l1
WHERE NOT EXISTS (
  SELECT l2.coffee 
  FROM Likes l2
  WHERE l2.drinker = 'Risa' 
    AND l2.coffee NOT IN (
      SELECT l3.coffee
      FROM LIKES l3
      WHERE l3.drinker = l1.drinker))
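The double negation in this query can be hard to read at first. The following sketch restates the same logic with Python sets: a drinker qualifies when no coffee that Risa likes is missing from that drinker's own set. It is only an illustration of the idea, with the rows copied by hand from the Likes table loaded earlier; the variable names are assumptions of this example.

```python
# (drinker, coffee) pairs copied from the Likes table loaded earlier
likes = [
    ("Chris", "Drip"), ("Chris", "Espresso"),
    ("Risa", "Cold Brew"), ("Risa", "Drip"), ("Risa", "Espresso"),
    ("Carlos", "Cappuccino"),
    ("Ying", "Cold Brew"), ("Ying", "Drip"), ("Ying", "Espresso"), ("Ying", "Cappuccino"),
]

# Group the coffees by drinker
liked_by = {}
for drinker, coffee in likes:
    liked_by.setdefault(drinker, set()).add(coffee)

risa_coffees = liked_by["Risa"]

# risa_coffees - coffees plays the role of the inner NOT IN subquery:
# the coffees Risa likes that this drinker does not.
# An empty difference corresponds to NOT EXISTS being true.
result = sorted(d for d, coffees in liked_by.items() if not (risa_coffees - coffees))
print(result)
```

Risa appears in the result because she trivially likes all of her own coffees.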


### SOME / ANY

 ```SOME``` / ```ANY``` returns ```TRUE``` if at least 1 tuple in the subquery makes the boolean operation evaluate to true.
 
 Of the coffees Risa has rated, list the coffees that are not Risa’s favorite.
 
 Start with a basic query that returns the coffees that Risa has rated

In [None]:
%%sql
SELECT r.coffee, r.score
FROM Rates r
WHERE r.drinker = 'Risa'

Next, compare the scores to Risa's other scores

In [None]:
%%sql
SELECT r.coffee
FROM Rates r
WHERE r.drinker = 'Risa' AND r.SCORE < SOME (
  SELECT r2.score 
  FROM Rates r2
  WHERE r2.drinker = 'Risa' )

What's going on here?

* The subquery returns the multiset of all the scores Risa has given to coffees
* The r.score < SOME clause evaluates to TRUE if r.score is less than at least one score in that multiset, so only Risa's highest-rated coffee is excluded
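As a rough analogy, `score < SOME (subquery)` behaves like Python's `any()` applied to the comparison against each subquery value. The sketch below replays the query by hand, using Risa's ratings copied from the Rates table loaded earlier; it ignores NULL handling and is meant only to illustrate the semantics.

```python
# Risa's ratings, copied from the Rates table loaded earlier
risa_ratings = {"Cold Brew": 5, "Drip": 3, "Espresso": 4}

# The multiset of scores that the subquery returns
scores = list(risa_ratings.values())

# score < SOME (subquery): true when at least one subquery value is larger
not_favorite = sorted(
    coffee for coffee, score in risa_ratings.items()
    if any(score < other for other in scores)
)
print(not_favorite)
```

Cold Brew is left out because no score in the multiset is greater than 5.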


### ALL

$<$expression$>$ $<$boolOp$>$ ALL (subquery)

Similar to ```SOME```

$<$boolOp$>$ must evaluate to true for **everything** in the subquery
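In the same spirit, `ALL` can be pictured as Python's `all()`. The sketch below uses made-up numbers (not rows from the tables) and two hypothetical helper functions to show the behavior, including the empty-subquery case, where `ALL` is true and `SOME` is false. NULL handling is ignored here.

```python
def less_than_all(x, values):
    # x < ALL (values): every comparison must succeed
    return all(x < v for v in values)


def less_than_some(x, values):
    # x < SOME (values): at least one comparison must succeed
    return any(x < v for v in values)


values = [3, 4, 5]  # hypothetical subquery result

print(less_than_all(2, values))   # 2 is below every value
print(less_than_all(3, values))   # 3 is not below 3
print(less_than_all(2, []))       # empty subquery: ALL holds vacuously
print(less_than_some(2, []))      # empty subquery: SOME has nothing to match
```

The empty case matters in practice: if the subquery's `WHERE` clause matches no rows, a `< ALL` condition keeps every row of the outer query.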

RATES (DRINKER, COFFEE, SCORE)


In [None]:
%%sql
SELECT DISTINCT r.drinker
FROM Rates r
WHERE r.score < ALL (
  SELECT r2.score
  FROM Rates r2
  WHERE r2.drinker = 'Risa')

What does this query return? 

Check the Rates table to figure it out

In [None]:
%%sql
SELECT r.*
FROM Rates r;

### Subqueries in the ```SELECT``` clause

Are all cafes frequented?

In [None]:
%%sql
SELECT (SELECT COUNT(DISTINCT cafe) FROM Frequents) 
    = (SELECT COUNT(DISTINCT cafe) FROM Serves) AS allFrequented
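Comparing counts only checks that the two tables mention the same number of cafes. With the sample data, Frequents contains 'Java Lava', which never appears in Serves, so the counts differ (4 versus 3) even though every cafe in Serves has at least one visitor. The sketch below contrasts the count comparison with a set-difference check, assuming that "all cafes" means every cafe listed in Serves. It runs against an in-memory SQLite database as a stand-in for PostgreSQL.

```python
import sqlite3

# In-memory SQLite stands in for the PostgreSQL database (an assumption of this sketch)
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Frequents (drinker TEXT NOT NULL, cafe TEXT NOT NULL, PRIMARY KEY (drinker, cafe));
CREATE TABLE Serves (cafe TEXT NOT NULL, coffee TEXT NOT NULL, PRIMARY KEY (cafe, coffee));
INSERT INTO Frequents VALUES
  ('Chris', 'A Cafe'), ('Chris', 'Double Trouble'), ('Risa', 'Brew Joint'),
  ('Risa', 'Java Lava'), ('Ying', 'Java Lava'), ('Risa', 'Double Trouble');
INSERT INTO Serves VALUES
  ('A Cafe', 'Drip'), ('A Cafe', 'Espresso'), ('A Cafe', 'Cold Brew'),
  ('Brew Joint', 'Espresso'), ('Double Trouble', 'Espresso'), ('Double Trouble', 'Cold Brew');
""")

# The lecture's query: compare the number of distinct cafes in each table
count_check = conn.execute("""
    SELECT (SELECT COUNT(DISTINCT cafe) FROM Frequents)
         = (SELECT COUNT(DISTINCT cafe) FROM Serves)
""").fetchone()[0]

# Alternative: is there any cafe in Serves that nobody frequents?
set_check = conn.execute("""
    SELECT NOT EXISTS (
        SELECT cafe FROM Serves
        EXCEPT
        SELECT cafe FROM Frequents)
""").fetchone()[0]

print(bool(count_check), bool(set_check))
```

Which check is appropriate depends on what the question means; the count version is only safe when both tables are known to draw from the same set of cafes.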


### Subqueries in the FROM clause

* Can have a subquery in FROM clause
* Treated as a temporary table
* MUST be assigned an alias (PostgreSQL versions before 16 reject the query without one)
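The points above can be sketched on a different question from the one in the exercise that follows: which cafes are frequented by someone who likes 'Cappuccino'? The example assumes an in-memory SQLite database as a stand-in for PostgreSQL and loads only a few rows; the alias `c` is an arbitrary choice.

```python
import sqlite3

# In-memory SQLite stands in for the PostgreSQL database (an assumption of this sketch)
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Frequents (drinker TEXT NOT NULL, cafe TEXT NOT NULL, PRIMARY KEY (drinker, cafe));
CREATE TABLE Likes (drinker TEXT NOT NULL, coffee TEXT NOT NULL, PRIMARY KEY (drinker, coffee));
INSERT INTO Frequents VALUES ('Chris', 'A Cafe'), ('Risa', 'Brew Joint'), ('Ying', 'Java Lava');
INSERT INTO Likes VALUES ('Chris', 'Drip'), ('Carlos', 'Cappuccino'), ('Ying', 'Cappuccino');
""")

# The parenthesized SELECT acts as a temporary table;
# the alias c lets the outer query refer to its columns.
rows = conn.execute("""
    SELECT DISTINCT f.cafe
    FROM Frequents f,
         (SELECT l.drinker FROM Likes l WHERE l.coffee = 'Cappuccino') c
    WHERE f.drinker = c.drinker
""").fetchall()
print(rows)
```

Carlos likes Cappuccino but frequents no cafe, so he contributes nothing to the result.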


Who goes to a cafe that serves 'Cold Brew'?

Old way:

In [None]:
%%sql
SELECT DISTINCT f.drinker
FROM Frequents f, Serves s
WHERE f.cafe = s.cafe 
    AND s.coffee = 'Cold Brew'


With a subquery in the FROM clause:

In [None]:
%%sql
SELECT DISTINCT f.drinker
FROM Frequents f, 
   (-- your code here) s2
WHERE f.cafe = s2.cafe


## VIEWS

Can make SQL much easier to read

In [None]:
%%sql
-- OR REPLACE lets this cell be re-run without a "relation already exists" error
CREATE OR REPLACE VIEW CB_CAFE AS
SELECT s.cafe FROM Serves s 
    WHERE s.coffee = 'Cold Brew';

SELECT DISTINCT f.drinker
FROM Frequents f, CB_CAFE c
WHERE f.cafe = c.cafe;



### List the coffees that are not Risa's favorite.

Create a VIEW that returns the coffees that Risa has rated

In [None]:
%%sql

Now use that VIEW to list the coffees that are not Risa's favorite

In [None]:
%%sql



In [None]:
%%sql


## Clean up -- remove any views
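One way to clean up is `DROP VIEW IF EXISTS`, which does not fail when the view is already gone. The sketch below demonstrates the statement against an in-memory SQLite database, used here as a stand-in for PostgreSQL. The helper `view_names` is hypothetical and relies on SQLite's `sqlite_master` catalog, which PostgreSQL does not have.

```python
import sqlite3

# In-memory SQLite stands in for the PostgreSQL database (an assumption of this sketch)
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Serves (cafe TEXT NOT NULL, coffee TEXT NOT NULL, PRIMARY KEY (cafe, coffee));
INSERT INTO Serves VALUES ('A Cafe', 'Cold Brew'), ('Brew Joint', 'Espresso');
CREATE VIEW CB_CAFE AS SELECT s.cafe FROM Serves s WHERE s.coffee = 'Cold Brew';
""")


def view_names():
    # List the views currently defined in this SQLite database
    return [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'view'")]


before = view_names()

# IF EXISTS keeps the statement from failing when the view was never created
conn.execute("DROP VIEW IF EXISTS CB_CAFE")
conn.execute("DROP VIEW IF EXISTS CB_CAFE")  # running it a second time is harmless

after = view_names()
print(before, after)
```

In the notebook itself, the same `DROP VIEW IF EXISTS CB_CAFE;` statement would go in a `%%sql` cell, followed by one `DROP VIEW` per additional view you created in the exercises above.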