In [1]:
%load_ext sql
%sql duckdb://

# GROUP BY and HAVING
Explanation:
In this code snippet, we demonstrate the usage of `GROUP BY` and `HAVING` clauses in SQL for advanced queries.

The `GROUP BY` clause is used to group rows based on one or more columns. In the first example, we group the employees by department and calculate the total salary for each department using the `SUM` function.

The `HAVING` clause is used to filter the grouped results based on a condition. In the second example, we retrieve the departments with a total salary greater than 6000.

In the third example, we calculate the average salary and the number of employees for each department using the `AVG` and `COUNT` functions. The `HAVING` clause then keeps only departments with more than 2 employees and an average salary of at least 5000 (`>= 5000`). IT, with 3 employees averaging exactly 5000, is the only department that qualifies.

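Together these examples show how the two clauses divide the work. `WHERE` filters individual rows before they are grouped. `HAVING` filters whole groups after the aggregates have been computed, which is why conditions on aggregates such as `SUM(salary) > 6000` belong in `HAVING`.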
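
To make the evaluation order concrete, here is a minimal sketch in plain Python (no database involved) of what the third example does: bucket rows by department, aggregate each bucket, then filter the buckets. The function `group_by_having` is a hypothetical helper written for illustration. It is not part of any SQL library, and a real engine may evaluate the query differently internally.

```python
from collections import defaultdict

# Sample rows mirroring the notebook's employees table: (name, department, salary)
employees = [
    ("John Doe", "IT", 5000),
    ("Jane Smith", "HR", 6000),
    ("Mike Johnson", "IT", 5500),
    ("Emily Brown", "Finance", 7000),
    ("David Lee", "IT", 4500),
]


def group_by_having(rows, min_count, min_avg):
    """Hypothetical emulation of:
    SELECT department, AVG(salary), COUNT(*) FROM employees
    GROUP BY department
    HAVING COUNT(*) > min_count AND AVG(salary) >= min_avg
    """
    # GROUP BY: bucket the salaries by the grouping key
    groups = defaultdict(list)
    for _name, dept, salary in rows:
        groups[dept].append(salary)

    # Aggregate each bucket into one (average, count) pair
    aggregated = {dept: (sum(s) / len(s), len(s)) for dept, s in groups.items()}

    # HAVING: keep or drop whole groups based on their aggregate values
    return {
        dept: (avg, count)
        for dept, (avg, count) in aggregated.items()
        if count > min_count and avg >= min_avg
    }


print(group_by_having(employees, min_count=2, min_avg=5000))
# {'IT': (5000.0, 3)}
```

Raising `min_avg` to 5001 would return an empty result, matching the fact that IT's average is exactly 5000.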

In [2]:
%%sql

CREATE OR REPLACE TABLE employees (
    id INT PRIMARY KEY,
    name VARCHAR(50),
    department VARCHAR(50),
    salary DECIMAL(10, 2)
);

INSERT INTO employees (id, name, department, salary)
VALUES
    (1, 'John Doe', 'IT', 5000),
    (2, 'Jane Smith', 'HR', 6000),
    (3, 'Mike Johnson', 'IT', 5500),
    (4, 'Emily Brown', 'Finance', 7000),
    (5, 'David Lee', 'IT', 4500);



In [3]:
%%sql

SELECT department, SUM(salary) AS total_salary
FROM employees
GROUP BY department;

department,total_salary
IT,15000.0
HR,6000.0
Finance,7000.0


In [4]:
%%sql

SELECT department, SUM(salary) AS total_salary
FROM employees
GROUP BY department
HAVING SUM(salary) > 6000;

department,total_salary
IT,15000.0
Finance,7000.0
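
The difference between filtering rows with `WHERE` and filtering groups with `HAVING` can be checked with a short script. This sketch assumes Python's built-in `sqlite3` module as a stand-in for DuckDB, with `salary` stored as `INTEGER` instead of `DECIMAL(10, 2)`. `ORDER BY` is added only to make the row order deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, department TEXT, salary INTEGER)"
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?, ?)",
    [
        (1, "John Doe", "IT", 5000),
        (2, "Jane Smith", "HR", 6000),
        (3, "Mike Johnson", "IT", 5500),
        (4, "Emily Brown", "Finance", 7000),
        (5, "David Lee", "IT", 4500),
    ],
)

# WHERE filters individual rows before grouping:
# only salaries above 5000 are summed.
where_rows = conn.execute("""
    SELECT department, SUM(salary) AS total_salary
    FROM employees
    WHERE salary > 5000
    GROUP BY department
    ORDER BY department
""").fetchall()

# HAVING filters whole groups after aggregation.
having_rows = conn.execute("""
    SELECT department, SUM(salary) AS total_salary
    FROM employees
    GROUP BY department
    HAVING SUM(salary) > 6000
    ORDER BY department
""").fetchall()

print(where_rows)   # [('Finance', 7000), ('HR', 6000), ('IT', 5500)]
print(having_rows)  # [('Finance', 7000), ('IT', 15000)]
```

With `WHERE salary > 5000`, IT's total drops to 5500 because only Mike Johnson's row survives the row filter. `HAVING SUM(salary) > 6000` keeps IT's full total of 15000 and instead drops HR, whose total is exactly 6000.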


In [5]:
%%sql

SELECT department, AVG(salary) AS average_salary, COUNT(*) AS employee_count
FROM employees
GROUP BY department;

department,average_salary,employee_count
IT,5000.0,3
HR,6000.0,1
Finance,7000.0,1


In [7]:
%%sql

SELECT department, AVG(salary) AS average_salary, COUNT(*) AS employee_count
FROM employees
GROUP BY department
HAVING COUNT(*) > 2 AND AVG(salary) >= 5000;

department,average_salary,employee_count
IT,5000.0,3


# Subqueries and Derived Tables
In this code snippet, we demonstrate the usage of subqueries and derived tables in SQL.

1. Example 1 shows a subquery in the SELECT statement. It calculates the average salary of all employees and displays it for each employee.

2. Example 2 demonstrates a subquery in the WHERE clause. It selects employees whose salary is higher than the average salary.

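3. Example 3 uses the EXISTS operator with a correlated subquery. For each employee `e`, the subquery checks whether any employee in the 'IT' department earns more than `e.salary`, so the query returns every employee who is out-earned by at least one IT employee. Here that is John Doe and David Brown. Mike Johnson, the highest-paid IT employee, is excluded. Because the outer query does not filter on department, a non-IT employee earning less than some IT employee would also be returned; adding `AND e.department = 'IT'` to the outer `WHERE` restricts the result to IT staff.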

4. Example 4 showcases a derived table. It calculates the average salary using a subquery and joins it with the employees table to display the average salary for each employee.

Subqueries and derived tables let one query use the result of another. A scalar subquery supplies a single value (such as the overall average salary), `EXISTS` tests whether a correlated subquery returns any rows, and a derived table in the `FROM` clause behaves like a temporary table that can be joined. This makes it possible to filter, aggregate, or join data without creating intermediate tables.

## Base Table

In [8]:
%%sql

CREATE OR REPLACE TABLE employees (
  id INT PRIMARY KEY,
  name VARCHAR(50),
  department VARCHAR(50),
  salary INT
);

INSERT INTO employees (id, name, department, salary)
VALUES (1, 'John Doe', 'IT', 5000),
       (2, 'Jane Smith', 'HR', 6000),
       (3, 'Mike Johnson', 'IT', 5500),
       (4, 'Emily Davis', 'Finance', 7000),
       (5, 'David Brown', 'IT', 4500);



## Subquery as Column

In [9]:
%%sql

SELECT name, department, salary,
       (SELECT AVG(salary) FROM employees) AS avg_salary
FROM employees;

name,department,salary,avg_salary
John Doe,IT,5000,5600.0
Jane Smith,HR,6000,5600.0
Mike Johnson,IT,5500,5600.0
Emily Davis,Finance,7000,5600.0
David Brown,IT,4500,5600.0


## Subquery in WHERE

In [10]:
%%sql

SELECT name, department, salary
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);

name,department,salary
Jane Smith,HR,6000
Emily Davis,Finance,7000


## Subquery with EXISTS Operator

In [11]:
%%sql

SELECT name, department
FROM employees e
WHERE EXISTS (
  SELECT 1
  FROM employees
  WHERE department = 'IT' AND salary > e.salary
);

name,department
John Doe,IT
David Brown,IT
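
The correlated subquery in the `EXISTS` cell can be read as a nested loop: for every outer row `e`, scan the table for an IT employee earning more than `e.salary`. The plain-Python sketch below, built around a hypothetical helper `exists_higher_paid_it`, expresses only that logical meaning. It is not how DuckDB executes the query; query engines, DuckDB included, may rewrite such subqueries into joins.

```python
# (name, department, salary) rows from the "Base Table" cell
employees = [
    ("John Doe", "IT", 5000),
    ("Jane Smith", "HR", 6000),
    ("Mike Johnson", "IT", 5500),
    ("Emily Davis", "Finance", 7000),
    ("David Brown", "IT", 4500),
]


def exists_higher_paid_it(rows):
    """Hypothetical emulation of the correlated EXISTS query."""
    result = []
    for name, dept, salary in rows:  # outer query, alias e
        # The subquery depends on the outer row's salary (e.salary),
        # so it is conceptually re-evaluated for each outer row.
        if any(d == "IT" and s > salary for _n, d, s in rows):
            result.append((name, dept))
    return result


print(exists_higher_paid_it(employees))
# [('John Doe', 'IT'), ('David Brown', 'IT')]
```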


## Derived Table

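The query below fails in DuckDB (via JupySQL) with a parser error. A plain `JOIN` must be followed by an `ON` or `USING` condition, and the derived table `d` has none, so the parser reaches the `;` while still expecting one. The JupySQL hint about the `--with` argument is a generic message and does not apply here. Because `d` contains exactly one row, writing `CROSS JOIN` instead of `JOIN` (or adding `ON TRUE`) should make the query run.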

In [12]:
%%sql

SELECT e.name, e.department, e.salary, d.avg_salary
FROM employees e
JOIN (
  SELECT AVG(salary) AS avg_salary
  FROM employees
) d;

RuntimeError: If using snippets, you may pass the --with argument explicitly.
For more details please refer: https://jupysql.ploomber.io/en/latest/compose.html#with-argument


Original error message from DB driver:
(duckdb.ParserException) Parser Error: syntax error at or near ";"
LINE 6: ) d;
           ^
[SQL: SELECT e.name, e.department, e.salary, d.avg_salary
FROM employees e
JOIN (
  SELECT AVG(salary) AS avg_salary
  FROM employees
) d;]
(Background on this error at: https://sqlalche.me/e/20/f405)

If you need help solving this issue, send us a message: https://ploomber.io/community
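
A sketch of the derived-table query with an explicit `CROSS JOIN`, which supplies the join type the failing cell was missing. It assumes Python's built-in `sqlite3` module as a stand-in for DuckDB and uses the rows from the "Base Table" cell. The same `CROSS JOIN` SQL should also run under `%%sql` with DuckDB.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, department TEXT, salary INTEGER)"
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?, ?)",
    [
        (1, "John Doe", "IT", 5000),
        (2, "Jane Smith", "HR", 6000),
        (3, "Mike Johnson", "IT", 5500),
        (4, "Emily Davis", "Finance", 7000),
        (5, "David Brown", "IT", 4500),
    ],
)

# The derived table d holds a single row (the overall average), so a
# CROSS JOIN attaches that value to every employee row.
rows = conn.execute("""
    SELECT e.name, e.department, e.salary, d.avg_salary
    FROM employees e
    CROSS JOIN (
        SELECT AVG(salary) AS avg_salary
        FROM employees
    ) d
    ORDER BY e.id
""").fetchall()

for row in rows:
    print(row)
# ('John Doe', 'IT', 5000, 5600.0)
# ('Jane Smith', 'HR', 6000, 5600.0)
# ('Mike Johnson', 'IT', 5500, 5600.0)
# ('Emily Davis', 'Finance', 7000, 5600.0)
# ('David Brown', 'IT', 4500, 5600.0)
```

The result matches the "Subquery as Column" cell; the derived-table form becomes more useful when the subquery returns several columns or rows that need a real join condition.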


# UNION, INTERSECT, EXCEPT
Explanation:
- The code snippet demonstrates the usage of the `UNION`, `INTERSECT`, and `EXCEPT` operators in SQL.
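- Two tables, `table1` and `table2`, are created with the same columns (`id INT`, `name VARCHAR(50)`). Set operators require both queries to return the same number of columns with compatible types.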
- Data is inserted into both tables.
- The `UNION` operator combines the rows from both tables, removing duplicates. The result is a single set of rows with unique values.
- The `INTERSECT` operator returns only the rows that appear in both tables.
- The `EXCEPT` operator returns the rows that appear in the first table but not in the second, so swapping the two queries changes the result.
- Each query runs in its own cell below, and its output is shown after the query.

Expected Output:
- UNION:
```
id | name
---+------
 1 | John
 2 | Jane
 3 | Alice
 4 | Bob
```
- INTERSECT:
```
id | name
---+------
 2 | Jane
 3 | Alice
```
- EXCEPT:
```
id | name
---+------
 1 | John
```
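
Since `UNION`, `INTERSECT`, and `EXCEPT` (without `ALL`) treat query results as sets of distinct rows, Python's set operators are a close analogy. In this sketch each row is modeled as an `(id, name)` tuple; SQL row order is not modeled, so the results are sorted before printing.

```python
# Each row is an (id, name) tuple; a Python set, like the SQL set
# operators without ALL, keeps only distinct rows.
table1 = {(1, "John"), (2, "Jane"), (3, "Alice")}
table2 = {(2, "Jane"), (3, "Alice"), (4, "Bob")}

union = table1 | table2      # UNION
intersect = table1 & table2  # INTERSECT
except_ = table1 - table2    # EXCEPT: operand order matters

print(sorted(union))            # [(1, 'John'), (2, 'Jane'), (3, 'Alice'), (4, 'Bob')]
print(sorted(intersect))        # [(2, 'Jane'), (3, 'Alice')]
print(sorted(except_))          # [(1, 'John')]
print(sorted(table2 - table1))  # [(4, 'Bob')]
```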

In [14]:
%%sql

CREATE OR REPLACE TABLE table1 (
    id INT,
    name VARCHAR(50)
);

CREATE OR REPLACE TABLE table2 (
    id INT,
    name VARCHAR(50)
);

INSERT INTO table1 (id, name)
VALUES (1, 'John'),
       (2, 'Jane'),
       (3, 'Alice');

INSERT INTO table2 (id, name)
VALUES (2, 'Jane'),
       (3, 'Alice'),
       (4, 'Bob');



## Union

In [15]:
%%sql

SELECT id, name
FROM table1
UNION
SELECT id, name
FROM table2;

id,name
1,John
2,Jane
3,Alice
4,Bob
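
`UNION` removes duplicates, whereas `UNION ALL` keeps every row from both inputs. The sketch below shows the difference on the same data, assuming Python's built-in `sqlite3` module in place of DuckDB. The trailing `ORDER BY id` applies to the combined result and is there because set operations do not guarantee row order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE table2 (id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?)", [(1, "John"), (2, "Jane"), (3, "Alice")]
)
conn.executemany(
    "INSERT INTO table2 VALUES (?, ?)", [(2, "Jane"), (3, "Alice"), (4, "Bob")]
)

# UNION removes duplicate rows across both inputs
union_rows = conn.execute(
    "SELECT id, name FROM table1 UNION SELECT id, name FROM table2 ORDER BY id"
).fetchall()

# UNION ALL keeps every row, so Jane and Alice appear twice
union_all_rows = conn.execute(
    "SELECT id, name FROM table1 UNION ALL SELECT id, name FROM table2 ORDER BY id"
).fetchall()

print(len(union_rows), len(union_all_rows))  # 4 6
```

`UNION ALL` skips the duplicate-elimination step, so it is the usual choice when the inputs are known not to overlap or when duplicates should be kept.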


## Intersect

In [16]:
%%sql

SELECT id, name
FROM table1
INTERSECT
SELECT id, name
FROM table2;

id,name
2,Jane
3,Alice


## Except

In [17]:
%%sql

SELECT id, name
FROM table1
EXCEPT
SELECT id, name
FROM table2;

id,name
1,John
