In [2]:
%load_ext sql
%sql duckdb://

# GROUP BY and HAVING

In this code snippet, we demonstrate the usage of `GROUP BY` and `HAVING` clauses in SQL for advanced queries.

The `GROUP BY` clause is used to group rows based on one or more columns. In the first example, we group the employees by department and calculate the total salary for each department using the `SUM` function.

The `HAVING` clause is used to filter the grouped results based on a condition. In the second example, we retrieve the departments with a total salary greater than 6000.

The third example calculates the average salary and the number of employees for each department using the `AVG` and `COUNT` functions. The fourth example adds a `HAVING` clause that keeps only the departments with more than 2 employees and an average salary of at least 5000.

These examples demonstrate how `GROUP BY` and `HAVING` can be used to perform advanced queries and apply conditions to grouped results.
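The difference between filtering rows and filtering groups can be sketched outside the notebook as well. The block below is a minimal sketch that assumes Python's built-in `sqlite3` module as a stand-in for DuckDB; the `WHERE salary >= 5000` filter is an illustrative addition that does not appear in the cells of this section.

```python
import sqlite3

# In-memory SQLite database, used here as a stand-in for DuckDB (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, department TEXT, salary REAL)"
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?, ?)",
    [
        (1, "John Doe", "IT", 5000),
        (2, "Jane Smith", "HR", 6000),
        (3, "Mike Johnson", "IT", 5500),
        (4, "Emily Brown", "Finance", 7000),
        (5, "David Lee", "IT", 4500),
    ],
)

# WHERE filters individual rows before grouping:
# only salaries of at least 5000 contribute to each department total.
where_rows = conn.execute(
    """
    SELECT department, SUM(salary) AS total_salary
    FROM employees
    WHERE salary >= 5000
    GROUP BY department
    ORDER BY department
    """
).fetchall()

# HAVING filters whole groups after aggregation:
# only departments whose total exceeds 6000 remain.
having_rows = conn.execute(
    """
    SELECT department, SUM(salary) AS total_salary
    FROM employees
    GROUP BY department
    HAVING SUM(salary) > 6000
    ORDER BY department
    """
).fetchall()

print(where_rows)
print(having_rows)
```

With `WHERE`, the IT total drops because one row is excluded before summing; with `HAVING`, HR disappears because its total is not above 6000.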

In [2]:
%%sql

CREATE OR REPLACE TABLE employees (
    id INT PRIMARY KEY,
    name VARCHAR(50),
    department VARCHAR(50),
    salary DECIMAL(10, 2)
);

INSERT INTO employees (id, name, department, salary)
VALUES
    (1, 'John Doe', 'IT', 5000),
    (2, 'Jane Smith', 'HR', 6000),
    (3, 'Mike Johnson', 'IT', 5500),
    (4, 'Emily Brown', 'Finance', 7000),
    (5, 'David Lee', 'IT', 4500);

Count


In [3]:
%%sql

SELECT department, SUM(salary) AS total_salary
FROM employees
GROUP BY department;

department,total_salary
IT,15000.0
HR,6000.0
Finance,7000.0


In [4]:
%%sql

SELECT department, SUM(salary) AS total_salary
FROM employees
GROUP BY department
HAVING SUM(salary) > 6000;

department,total_salary
IT,15000.0
Finance,7000.0


In [5]:
%%sql

SELECT department, AVG(salary) AS average_salary, COUNT(*) AS employee_count
FROM employees
GROUP BY department;

department,average_salary,employee_count
IT,5000.0,3
HR,6000.0,1
Finance,7000.0,1


In [7]:
%%sql

SELECT department, AVG(salary) AS average_salary, COUNT(*) AS employee_count
FROM employees
GROUP BY department
HAVING COUNT(*) > 2 AND AVG(salary) >= 5000;

department,average_salary,employee_count
IT,5000.0,3


# GROUP BY on two columns

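You can group by multiple columns at once, as shown below. Each distinct combination of values across the listed columns forms one group; the second column is not a secondary sort key, and `GROUP BY` by itself does not guarantee any row order.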

In [24]:
%%sql 

CREATE OR REPLACE TABLE employees (
    id INT PRIMARY KEY,
    name VARCHAR(50),
    department VARCHAR(50),
    salary DECIMAL(10, 2)
);

INSERT INTO employees (id, name, department, salary)
VALUES
    (1, 'John Doe', 'Sales', 5500),
    (2, 'Jane Smith', 'Marketing', 6000),
    (3, 'Mike Johnson', 'Sales', 5500),
    (4, 'Emily Brown', 'Marketing', 6500),
    (5, 'David Lee', 'IT', 7000),
    (6, 'Sarah Wilson', 'IT', 7500);

Count


In [25]:
%%sql

SELECT department, salary, COUNT(*)
FROM employees
GROUP BY department, salary;

department,salary,count_star()
Sales,5500.0,2
Marketing,6000.0,1
Marketing,6500.0,1
IT,7000.0,1
IT,7500.0,1


# Subqueries and Derived Tables
In this code snippet, we demonstrate the usage of subqueries and derived tables in SQL.

1. Example 1 shows a subquery in the SELECT statement. It calculates the average salary of all employees and displays it for each employee.

2. Example 2 demonstrates a subquery in the WHERE clause. It selects employees whose salary is higher than the average salary.

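3. Example 3 uses the `EXISTS` operator with a correlated subquery. It selects every employee for whom at least one employee in the 'IT' department earns a higher salary. The outer query does not filter on department, so the fact that only IT employees appear in the output is a property of this data set.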

4. Example 4 showcases a derived table. It calculates the average salary using a subquery and joins it with the employees table to display the average salary for each employee.

Subqueries and derived tables are powerful tools in SQL that allow us to perform complex queries and calculations. They can be used in various scenarios to filter, aggregate, or join data. Understanding and utilizing these features can greatly enhance the capabilities of SQL queries.

## Base Table

In [8]:
%%sql

CREATE OR REPLACE TABLE employees (
  id INT PRIMARY KEY,
  name VARCHAR(50),
  department VARCHAR(50),
  salary INT
);

INSERT INTO employees (id, name, department, salary)
VALUES (1, 'John Doe', 'IT', 5000),
       (2, 'Jane Smith', 'HR', 6000),
       (3, 'Mike Johnson', 'IT', 5500),
       (4, 'Emily Davis', 'Finance', 7000),
       (5, 'David Brown', 'IT', 4500);

Count


## Subquery as Column

In [9]:
%%sql

SELECT name, department, salary,
       (SELECT AVG(salary) FROM employees) AS avg_salary
FROM employees;

name,department,salary,avg_salary
John Doe,IT,5000,5600.0
Jane Smith,HR,6000,5600.0
Mike Johnson,IT,5500,5600.0
Emily Davis,Finance,7000,5600.0
David Brown,IT,4500,5600.0


## Subquery in WHERE

In [10]:
%%sql

SELECT name, department, salary
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);

name,department,salary
Jane Smith,HR,6000
Emily Davis,Finance,7000


## Subquery with EXISTS Operator

In [11]:
%%sql

SELECT name, department
FROM employees e
WHERE EXISTS (
  SELECT 1
  FROM employees
  WHERE department = 'IT' AND salary > e.salary
);

name,department
John Doe,IT
David Brown,IT
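
To check what the `EXISTS` query above actually tests, here is a minimal sketch that assumes Python's built-in `sqlite3` module as a stand-in for DuckDB. The sixth row ('Alex Kim', HR, 4000) is a hypothetical addition that is not part of the notebook's data; it shows that the outer query is not restricted to the IT department.

```python
import sqlite3

# In-memory SQLite database, used here as a stand-in for DuckDB (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, department TEXT, salary INTEGER)"
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?, ?)",
    [
        (1, "John Doe", "IT", 5000),
        (2, "Jane Smith", "HR", 6000),
        (3, "Mike Johnson", "IT", 5500),
        (4, "Emily Davis", "Finance", 7000),
        (5, "David Brown", "IT", 4500),
        (6, "Alex Kim", "HR", 4000),  # hypothetical extra row, not in the original data
    ],
)

# For each outer row e, EXISTS is true when at least one IT employee earns more than e.
exists_rows = conn.execute(
    """
    SELECT e.name, e.department
    FROM employees e
    WHERE EXISTS (
        SELECT 1
        FROM employees i
        WHERE i.department = 'IT' AND i.salary > e.salary
    )
    ORDER BY e.id
    """
).fetchall()

print(exists_rows)
```

The HR employee is returned because an IT employee earns more than 4000, even though the outer row is not in IT.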


## Derived Table

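This query fails in DuckDB because a plain `JOIN` requires an `ON` or `USING` condition, which is why the parser stops at the `;` after `) d`. The derived table itself is valid; replacing `JOIN` with `CROSS JOIN` (or adding `ON TRUE`) should make the query run.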

In [12]:
%%sql

SELECT e.name, e.department, e.salary, d.avg_salary
FROM employees e
JOIN (
  SELECT AVG(salary) AS avg_salary
  FROM employees
) d;

RuntimeError: If using snippets, you may pass the --with argument explicitly.
For more details please refer: https://jupysql.ploomber.io/en/latest/compose.html#with-argument


Original error message from DB driver:
(duckdb.ParserException) Parser Error: syntax error at or near ";"
LINE 6: ) d;
           ^
[SQL: SELECT e.name, e.department, e.salary, d.avg_salary
FROM employees e
JOIN (
  SELECT AVG(salary) AS avg_salary
  FROM employees
) d;]
(Background on this error at: https://sqlalche.me/e/20/f405)

If you need help solving this issue, send us a message: https://ploomber.io/community
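
A derived table with a single row can be attached to every employee row with `CROSS JOIN`, which needs no join condition. The block below is a minimal sketch that assumes Python's built-in `sqlite3` module as a stand-in for DuckDB; `CROSS JOIN` is standard SQL, so the same query text is expected to work in DuckDB, but it was only checked against SQLite here.

```python
import sqlite3

# In-memory SQLite database, used here as a stand-in for DuckDB (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, department TEXT, salary INTEGER)"
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?, ?)",
    [
        (1, "John Doe", "IT", 5000),
        (2, "Jane Smith", "HR", 6000),
        (3, "Mike Johnson", "IT", 5500),
        (4, "Emily Davis", "Finance", 7000),
        (5, "David Brown", "IT", 4500),
    ],
)

# The subquery in FROM is the derived table d; it yields exactly one row.
# CROSS JOIN pairs that single row with every row of employees.
derived_rows = conn.execute(
    """
    SELECT e.name, e.department, e.salary, d.avg_salary
    FROM employees e
    CROSS JOIN (
        SELECT AVG(salary) AS avg_salary
        FROM employees
    ) d
    ORDER BY e.id
    """
).fetchall()

for row in derived_rows:
    print(row)
```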


# UNION, INTERSECT, EXCEPT

- The code snippet demonstrates the usage of the `UNION`, `INTERSECT`, and `EXCEPT` operators in SQL.
- Two tables, `table1` and `table2`, are created with similar structures.
- Data is inserted into both tables.
- The `UNION` operator combines the rows from both tables and removes duplicates, so each distinct row appears once.
- The `INTERSECT` operator returns only the rows that appear in both tables.
- The `EXCEPT` operator returns the rows that appear in the first table but not in the second.
- Each query is executed and the results are printed to demonstrate the behavior of each operator.

Expected Output:
- UNION:
```
id | name
---+------
 1 | John
 2 | Jane
 3 | Alice
 4 | Bob
```
- INTERSECT:
```
id | name
---+------
 2 | Jane
 3 | Alice
```
- EXCEPT:
```
id | name
---+------
 1 | John
```
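
The three operators can also be checked programmatically. The block below is a minimal sketch that assumes Python's built-in `sqlite3` module as a stand-in for DuckDB; the helper name `combine` is hypothetical, and the `UNION ALL` comparison is an illustrative addition showing what "removing duplicates" means. An `ORDER BY id` is added because set operators do not guarantee row order on their own.

```python
import sqlite3

# In-memory SQLite database, used here as a stand-in for DuckDB (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE table1 (id INTEGER, name TEXT);
    CREATE TABLE table2 (id INTEGER, name TEXT);
    INSERT INTO table1 VALUES (1, 'John'), (2, 'Jane'), (3, 'Alice');
    INSERT INTO table2 VALUES (2, 'Jane'), (3, 'Alice'), (4, 'Bob');
    """
)


def combine(operator):
    """Run table1 <operator> table2 and return the rows sorted by id (hypothetical helper)."""
    sql = (
        f"SELECT id, name FROM table1 {operator} "
        "SELECT id, name FROM table2 ORDER BY id"
    )
    return conn.execute(sql).fetchall()


union_rows = combine("UNION")          # distinct rows from either table
union_all_rows = combine("UNION ALL")  # every row from both tables, duplicates kept
intersect_rows = combine("INTERSECT")  # rows present in both tables
except_rows = combine("EXCEPT")        # rows in table1 that are not in table2

print(union_rows)
print(len(union_all_rows))
print(intersect_rows)
print(except_rows)
```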

In [14]:
%%sql

CREATE OR REPLACE TABLE table1 (
    id INT,
    name VARCHAR(50)
);

CREATE OR REPLACE TABLE table2 (
    id INT,
    name VARCHAR(50)
);

INSERT INTO table1 (id, name)
VALUES (1, 'John'),
       (2, 'Jane'),
       (3, 'Alice');

INSERT INTO table2 (id, name)
VALUES (2, 'Jane'),
       (3, 'Alice'),
       (4, 'Bob');

Count


## Union

In [15]:
%%sql

SELECT id, name
FROM table1
UNION
SELECT id, name
FROM table2;

id,name
1,John
2,Jane
3,Alice
4,Bob


## Intersect

In [16]:
%%sql

SELECT id, name
FROM table1
INTERSECT
SELECT id, name
FROM table2;

id,name
2,Jane
3,Alice


## Except

In [17]:
%%sql

SELECT id, name
FROM table1
EXCEPT
SELECT id, name
FROM table2;

id,name
1,John


# ORDER BY (single column)

In SQL, the `ORDER BY` clause is used to sort the result set based on one or more columns. The `ORDER BY` clause can be used with a single column or multiple columns.

In the provided code snippet, we first create a table called `employees` to store employee information. Then, we insert some sample data into the table.

The queries below demonstrate sorting, first on a single column and then on two columns:

1. The first query retrieves all employees ordered by their names in ascending order using `ORDER BY name ASC`.
2. The second query retrieves all employees ordered by their ages in descending order using `ORDER BY age DESC`.
3. The third query retrieves employee names and salaries ordered by salary in descending order using `ORDER BY salary DESC`.
4. The fourth query retrieves employee names and ages ordered by age in ascending order, with ties in age broken by name in descending order, using `ORDER BY age ASC, name DESC`. The sample data has no two employees with the same age, so the secondary key does not change the output here.

Each query is followed by the expected output, which demonstrates the sorting behavior based on the specified column(s).

The `ORDER BY` clause is a powerful tool in SQL that allows us to sort query results in various ways, providing flexibility in result presentation and analysis.
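
The effect of a secondary sort key only shows up when the primary key has ties. The block below is a minimal sketch that assumes Python's built-in `sqlite3` module as a stand-in for DuckDB; the sixth row ('Zoe Adams', age 30) is a hypothetical addition that creates a tie on `age`.

```python
import sqlite3

# In-memory SQLite database, used here as a stand-in for DuckDB (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, age INTEGER, salary REAL)"
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?, ?)",
    [
        (1, "John Doe", 30, 5000),
        (2, "Jane Smith", 25, 6000),
        (3, "Mike Johnson", 35, 4500),
        (4, "Emily Davis", 28, 5500),
        (5, "David Brown", 32, 5200),
        (6, "Zoe Adams", 30, 5100),  # hypothetical extra row that ties with John Doe on age
    ],
)

# Rows are sorted by age first; name DESC only decides the order among equal ages.
sorted_rows = conn.execute(
    "SELECT name, age FROM employees ORDER BY age ASC, name DESC"
).fetchall()

for row in sorted_rows:
    print(row)
```

Among the two 30-year-olds, 'Zoe Adams' comes before 'John Doe' because the tie is broken by name in descending order.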

In [26]:
%%sql

CREATE OR REPLACE TABLE employees (
    id INT PRIMARY KEY,
    name VARCHAR(50),
    age INT,
    salary DECIMAL(10, 2)
);

INSERT INTO employees (id, name, age, salary)
VALUES (1, 'John Doe', 30, 5000),
       (2, 'Jane Smith', 25, 6000),
       (3, 'Mike Johnson', 35, 4500),
       (4, 'Emily Davis', 28, 5500),
       (5, 'David Brown', 32, 5200);

Count


In [27]:
%%sql

SELECT * FROM employees
ORDER BY name ASC;

id,name,age,salary
5,David Brown,32,5200.0
4,Emily Davis,28,5500.0
2,Jane Smith,25,6000.0
1,John Doe,30,5000.0
3,Mike Johnson,35,4500.0


In [28]:
%%sql

SELECT * FROM employees
ORDER BY age DESC;

id,name,age,salary
3,Mike Johnson,35,4500.0
5,David Brown,32,5200.0
1,John Doe,30,5000.0
4,Emily Davis,28,5500.0
2,Jane Smith,25,6000.0


In [29]:
%%sql

SELECT name, salary FROM employees
ORDER BY salary DESC;

name,salary
Jane Smith,6000.0
Emily Davis,5500.0
David Brown,5200.0
John Doe,5000.0
Mike Johnson,4500.0


In [30]:
%%sql

SELECT name, age FROM employees
ORDER BY age ASC, name DESC;

name,age
Jane Smith,25
Emily Davis,28
John Doe,30
David Brown,32
Mike Johnson,35


# SELECT...CASE...WHEN in Queries

In SQL, the `CASE...WHEN` expression lets you apply conditional logic within a query. It returns a different value depending on which condition holds.

In the first example, we have a table called `employees` with columns `id`, `name`, `age`, and `salary`. We use a `CASE...WHEN` expression to assign a bonus tier to each employee based on their age. The result is displayed in the `bonus` column.

In the second example, we calculate the salary after applying the bonus for each employee. The result is displayed in the `salary_with_bonus` column.

A `CASE...WHEN` expression evaluates each condition in order and returns the result of the first condition that is met. If none of the conditions are met, the optional `ELSE` clause supplies a default result; without `ELSE`, the result is `NULL`.

`CASE...WHEN` can be used in various scenarios, such as data transformation and conditional aggregation. It provides a flexible way to handle complex logic within SQL queries.
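
Conditional aggregation, mentioned above, places a `CASE` expression inside an aggregate function. The block below is a minimal sketch that assumes Python's built-in `sqlite3` module as a stand-in for DuckDB; the age bands and the column names `under_30` and `thirty_plus` are illustrative choices, not part of the notebook.

```python
import sqlite3

# In-memory SQLite database, used here as a stand-in for DuckDB (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, age INTEGER, salary REAL)"
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?, ?)",
    [
        (1, "John Doe", 30, 5000),
        (2, "Jane Smith", 25, 6000),
        (3, "Mike Johnson", 35, 7000),
        (4, "Emily Brown", 28, 5500),
    ],
)

# Each CASE yields 1 for rows in its band and 0 otherwise; SUM counts the rows per band.
under_30, thirty_plus = conn.execute(
    """
    SELECT SUM(CASE WHEN age < 30 THEN 1 ELSE 0 END) AS under_30,
           SUM(CASE WHEN age >= 30 THEN 1 ELSE 0 END) AS thirty_plus
    FROM employees
    """
).fetchone()

# Without ELSE, a CASE whose conditions all fail evaluates to NULL (None in Python).
no_match = conn.execute("SELECT CASE WHEN 1 = 2 THEN 'matched' END").fetchone()[0]

print(under_30, thirty_plus, no_match)
```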

In [31]:
%%sql

CREATE OR REPLACE TABLE employees (
    id INT PRIMARY KEY,
    name VARCHAR(100),
    age INT,
    salary DECIMAL(10, 2)
);

INSERT INTO employees (id, name, age, salary)
VALUES (1, 'John Doe', 30, 5000),
       (2, 'Jane Smith', 25, 6000),
       (3, 'Mike Johnson', 35, 7000),
       (4, 'Emily Brown', 28, 5500);

Count


In [32]:
%%sql

SELECT name,
       age,
       CASE
           WHEN age < 25 THEN 'No Bonus'
           WHEN age >= 25 AND age < 30 THEN '5% Bonus'
           WHEN age >= 30 AND age < 35 THEN '10% Bonus'
           ELSE '15% Bonus'
       END AS bonus
FROM employees;

name,age,bonus
John Doe,30,10% Bonus
Jane Smith,25,5% Bonus
Mike Johnson,35,15% Bonus
Emily Brown,28,5% Bonus


In [33]:
%%sql
SELECT name,
       salary,
       CASE
           WHEN age < 25 THEN salary
           WHEN age >= 25 AND age < 30 THEN salary * 1.05
           WHEN age >= 30 AND age < 35 THEN salary * 1.10
           ELSE salary * 1.15
       END AS salary_with_bonus
FROM employees;

name,salary,salary_with_bonus
John Doe,5000.0,5500.0
Jane Smith,6000.0,6300.0
Mike Johnson,7000.0,8050.0
Emily Brown,5500.0,5775.0


# IF...ELSE...ENDIF in Queries

This is a procedural concept (used for functions and stored procedures) and not applicable to JupySQL/DuckDB.

# IF EXISTS/IF NOT EXISTS

In [35]:
%%sql

CREATE TABLE IF NOT EXISTS employees (
    id INT PRIMARY KEY,
    name VARCHAR(100),
    age INT,
    salary DECIMAL(10, 2)
);

Count


In [36]:
%%sql

DROP TABLE IF EXISTS employees;

Success


# Clause Order of SELECT statement

[SELECT [DISTINCT]] [FROM] [JOIN] [WHERE] [GROUP BY] [HAVING] [ORDER BY] [LIMIT/OFFSET]

# Execution Order of SELECT statement

[FROM] [JOIN] [WHERE] [GROUP BY] [HAVING] [SELECT] [DISTINCT] [ORDER BY] [LIMIT/OFFSET]
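
The logical order can be observed through a query's results. The block below is a minimal sketch that assumes Python's built-in `sqlite3` module as a stand-in for DuckDB and reuses the data from the `GROUP BY` section; the thresholds 5000 and 5100 are illustrative choices.

```python
import sqlite3

# In-memory SQLite database, used here as a stand-in for DuckDB (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, department TEXT, salary REAL)"
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?, ?)",
    [
        (1, "John Doe", "IT", 5000),
        (2, "Jane Smith", "HR", 6000),
        (3, "Mike Johnson", "IT", 5500),
        (4, "Emily Brown", "Finance", 7000),
        (5, "David Lee", "IT", 4500),
    ],
)

# WHERE runs before grouping, so the 4500 row never reaches AVG:
# the IT average becomes 5250 and passes the HAVING test.
with_where = conn.execute(
    """
    SELECT department, AVG(salary) AS avg_salary
    FROM employees
    WHERE salary >= 5000
    GROUP BY department
    HAVING AVG(salary) > 5100
    ORDER BY avg_salary ASC
    LIMIT 2
    """
).fetchall()

# Without WHERE, the IT average is 5000, so HAVING removes IT before ORDER BY and LIMIT apply.
without_where = conn.execute(
    """
    SELECT department, AVG(salary) AS avg_salary
    FROM employees
    GROUP BY department
    HAVING AVG(salary) > 5100
    ORDER BY avg_salary ASC
    LIMIT 2
    """
).fetchall()

print(with_where)
print(without_where)
```

`ORDER BY` can refer to the alias `avg_salary` because the select list is evaluated before sorting, and `LIMIT` is applied last to the sorted rows.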

# Distinct

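`DISTINCT` removes duplicate rows from the result, comparing only the selected columns. In the example below, the two 'John Doe' rows differ in `id` and `salary`, but `SELECT DISTINCT name, age` returns them as one row.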
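
Which rows count as duplicates depends on the columns in the select list. The block below is a minimal sketch that assumes Python's built-in `sqlite3` module as a stand-in for DuckDB; the second query, which adds `salary` to the select list, is an illustrative addition.

```python
import sqlite3

# In-memory SQLite database, used here as a stand-in for DuckDB (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, age INTEGER, salary REAL)"
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?, ?)",
    [
        (1, "John Doe", 30, 5000),
        (2, "Jane Smith", 25, 6000),
        (3, "Mike Johnson", 35, 4500),
        (4, "Emily Davis", 28, 5500),
        (5, "David Brown", 32, 5200),
        (6, "John Doe", 30, 5500),
    ],
)

# The two 'John Doe' rows share name and age, so they collapse into one row.
by_name_age = conn.execute("SELECT DISTINCT name, age FROM employees").fetchall()

# Adding salary makes the two rows different again, so nothing is removed.
by_name_age_salary = conn.execute(
    "SELECT DISTINCT name, age, salary FROM employees"
).fetchall()

print(len(by_name_age), len(by_name_age_salary))
```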

In [4]:
%%sql

CREATE OR REPLACE TABLE employees (
    id INT PRIMARY KEY,
    name VARCHAR(50),
    age INT,
    salary DECIMAL(10, 2)
);

INSERT INTO employees (id, name, age, salary)
VALUES (1, 'John Doe', 30, 5000),
       (2, 'Jane Smith', 25, 6000),
       (3, 'Mike Johnson', 35, 4500),
       (4, 'Emily Davis', 28, 5500),
       (5, 'David Brown', 32, 5200),
       (6, 'John Doe', 30, 5500);

Count


In [8]:
%sql SELECT DISTINCT name, age FROM employees;

name,age
John Doe,30
Jane Smith,25
Mike Johnson,35
Emily Davis,28
David Brown,32


# Limit/Offset

Both take a numeric parameter.
`LIMIT` caps the number of rows returned, while `OFFSET` skips that many rows before the first one returned. Without an `ORDER BY`, which rows are returned is not guaranteed.
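
A common use of the two together is pagination. The block below is a minimal sketch that assumes Python's built-in `sqlite3` module as a stand-in for DuckDB; the helper `fetch_page` and its parameters are hypothetical names, and `ORDER BY id` is added so that each page is deterministic.

```python
import sqlite3

# In-memory SQLite database, used here as a stand-in for DuckDB (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, age INTEGER, salary REAL)"
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?, ?)",
    [
        (1, "John Doe", 30, 5000),
        (2, "Jane Smith", 25, 6000),
        (3, "Mike Johnson", 35, 4500),
        (4, "Emily Davis", 28, 5500),
        (5, "David Brown", 32, 5200),
        (6, "John Doe", 30, 5500),
    ],
)


def fetch_page(page_number, page_size):
    """Return the ids on a 1-based page (hypothetical helper for this sketch)."""
    offset = (page_number - 1) * page_size  # rows to skip before this page starts
    rows = conn.execute(
        "SELECT id FROM employees ORDER BY id LIMIT ? OFFSET ?",
        (page_size, offset),
    ).fetchall()
    return [row[0] for row in rows]


page_1 = fetch_page(1, 2)
page_2 = fetch_page(2, 2)
page_4 = fetch_page(4, 2)  # past the end of the table, so no rows come back

print(page_1, page_2, page_4)
```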

In [9]:
%%sql

CREATE OR REPLACE TABLE employees (
    id INT PRIMARY KEY,
    name VARCHAR(50),
    age INT,
    salary DECIMAL(10, 2)
);

INSERT INTO employees (id, name, age, salary)
VALUES (1, 'John Doe', 30, 5000),
       (2, 'Jane Smith', 25, 6000),
       (3, 'Mike Johnson', 35, 4500),
       (4, 'Emily Davis', 28, 5500),
       (5, 'David Brown', 32, 5200),
       (6, 'John Doe', 30, 5500);

Count


In [10]:
%sql SELECT * FROM employees LIMIT 2 OFFSET 1;

id,name,age,salary
2,Jane Smith,25,6000.0
3,Mike Johnson,35,4500.0
