## Exam DS201: Data Management in SQL; Data Management, Modeling, and Programming in R or Python

# 1.1 Perform data extraction, joining and aggregation tasks

* Aggregate numeric, categorical variables and dates by groups using PostgreSQL.

```sql
SELECT
  category,
  AVG(value) AS average_value,
  MAX(value) AS maximum_value,
  MIN(value) AS minimum_value,
  COUNT(*) AS total_count
FROM
  table_name
GROUP BY
  category;
```


```sql
SELECT
    column1,       -- Categorical variable 1
    column2,       -- Categorical variable 2
    MAX(numeric_column) AS max_numeric,   -- Aggregate function (e.g., MAX, MIN, SUM, AVG)
    MIN(numeric_column) AS min_numeric,
    AVG(numeric_column) AS avg_numeric,
    COUNT(*) AS count_rows,              -- Count of rows in each group
    COUNT(DISTINCT some_column) AS count_distinct,  -- Count of distinct values in a column
    DATE_TRUNC('day', date_column) AS truncated_date,  -- Truncate the date to the day level
    EXTRACT(YEAR FROM date_column) AS year   -- Extract the year from the date
FROM
    your_table_name
GROUP BY
    column1, column2, DATE_TRUNC('day', date_column), EXTRACT(YEAR FROM date_column)
ORDER BY
    column1, column2, DATE_TRUNC('day', date_column), EXTRACT(YEAR FROM date_column);
```


* Interpret a database schema and combine multiple tables by rows or columns using
PostgreSQL.

1. **Combining Tables by Rows (Union)**:
   When you want to stack the rows of multiple tables on top of each other, you can use the `UNION` or `UNION ALL` operator. The difference between them is that `UNION` removes duplicate rows, while `UNION ALL` includes all rows, even if they are duplicates.

   For example, if you have two tables with similar structures:

   ```plaintext
   Table 1: employees
   employee_id | first_name | last_name | department
   ------------+------------+-----------+-----------
   1           | John       | Doe       | HR
   2           | Jane       | Smith     | IT

   Table 2: contractors
   contractor_id | first_name | last_name | project
   --------------+------------+-----------+--------------
   101           | Mike       | Johnson   | Web Dev
   102           | Lisa       | Brown     | Data Analysis
   ```

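   `UNION` requires both queries to return the same number of columns with compatible types, so columns that exist in only one table are padded with `NULL`. You can combine the two tables like this: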

   ```sql
   SELECT employee_id, first_name, last_name, department, NULL AS contractor_id, NULL AS project
   FROM employees
   UNION
   SELECT NULL, first_name, last_name, NULL, contractor_id, project
   FROM contractors;
   ```

2. **Combining Tables by Columns (Join)**:
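   When you want to combine data from multiple tables based on a common column or key, you can use `JOIN` clauses. The join type controls which rows are kept: `INNER JOIN` returns only rows with a match in both tables, `LEFT JOIN` and `RIGHT JOIN` keep every row from the left or right table respectively (filling the other side with `NULL` when there is no match), and `FULL JOIN` keeps every row from both tables.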

   For example, if you have two tables with a common column `department_id`:

   ```plaintext
   Table 1: employees
   employee_id | first_name | last_name | department_id
   ------------+------------+-----------+--------------
   1           | John       | Doe       | 1001
   2           | Jane       | Smith     | 1002

   Table 2: departments
   department_id | department_name
   --------------+----------------
   1001          | HR
   1002          | IT
   ```

   You can combine them using an `INNER JOIN` like this:

   ```sql
   SELECT employee_id, first_name, last_name, department_name
   FROM employees
   INNER JOIN departments ON employees.department_id = departments.department_id;
   ```

   The result will be:

   ```plaintext
   employee_id | first_name | last_name | department_name
   ------------------------------------------------------
   1           | John       | Doe       | HR
   2           | Jane       | Smith     | IT
   ```

These examples are deliberately basic. To choose between the two methods, first work out how the tables relate: use `UNION` when the tables hold the same kind of record and you want more rows, and `JOIN` when they share a key and you want more columns.
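
To make the difference between `UNION` and `UNION ALL`, and between `INNER JOIN` and `LEFT JOIN`, concrete, here is a minimal sketch. It is an assumption-laden illustration rather than a PostgreSQL session: it uses Python's built-in `sqlite3` module as a stand-in database so it runs without a server, and the extra employee "Mike" with no department is hypothetical data added to show the effect. The SQL statements themselves should behave the same way in PostgreSQL.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (employee_id INTEGER, first_name TEXT, department_id INTEGER);
    CREATE TABLE contractors (contractor_id INTEGER, first_name TEXT);
    CREATE TABLE departments (department_id INTEGER, department_name TEXT);
    -- Hypothetical rows: employee 3 has no department and shares a name with a contractor.
    INSERT INTO employees VALUES (1, 'John', 1001), (2, 'Jane', 1002), (3, 'Mike', NULL);
    INSERT INTO contractors VALUES (101, 'Mike'), (102, 'Lisa');
    INSERT INTO departments VALUES (1001, 'HR'), (1002, 'IT');
""")

# UNION removes the duplicate 'Mike'; UNION ALL keeps both copies.
union_rows = conn.execute(
    "SELECT first_name FROM employees UNION SELECT first_name FROM contractors"
).fetchall()
union_all_rows = conn.execute(
    "SELECT first_name FROM employees UNION ALL SELECT first_name FROM contractors"
).fetchall()

# INNER JOIN drops employee 3 (no matching department).
inner_rows = conn.execute("""
    SELECT e.first_name, d.department_name
    FROM employees AS e
    INNER JOIN departments AS d ON e.department_id = d.department_id
    ORDER BY e.employee_id
""").fetchall()

# LEFT JOIN keeps employee 3 and fills department_name with NULL (None in Python).
left_rows = conn.execute("""
    SELECT e.first_name, d.department_name
    FROM employees AS e
    LEFT JOIN departments AS d ON e.department_id = d.department_id
    ORDER BY e.employee_id
""").fetchall()

print(len(union_rows), len(union_all_rows))  # 4 5
print(inner_rows)  # [('John', 'HR'), ('Jane', 'IT')]
print(left_rows)   # [('John', 'HR'), ('Jane', 'IT'), ('Mike', None)]
```

The row counts are the quickest sanity check after combining tables: a `UNION` that returns fewer rows than expected has removed duplicates, and an `INNER JOIN` that returns fewer rows than the left table has dropped unmatched keys.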

* Extract data based on different conditions using PostgreSQL.

To extract data based on different conditions using PostgreSQL, you can use the `SELECT` statement with a `WHERE` clause, building the condition from operators such as `AND`, `OR`, `IN`, `BETWEEN`, and `LIKE`. Only the rows for which the condition is true are returned.

Here are some common examples of extracting data based on different conditions:

1. **Basic Condition using WHERE**:
   This is the most common way to extract data based on a specific condition.

   ```sql
   SELECT column1, column2, ...
   FROM your_table_name
   WHERE condition;
   ```

   For example, if you have a table called "employees" and you want to retrieve all the employees whose department is "HR":

   ```sql
   SELECT employee_id, first_name, last_name
   FROM employees
   WHERE department = 'HR';
   ```

2. **Multiple Conditions using AND and OR**:
   You can combine multiple conditions using `AND` and `OR` operators.

   ```sql
   SELECT column1, column2, ...
   FROM your_table_name
   WHERE condition1 AND condition2;
   ```

   For example, if you want to retrieve employees whose department is "HR" and whose salary is greater than $50,000:

   ```sql
   SELECT employee_id, first_name, last_name
   FROM employees
   WHERE department = 'HR' AND salary > 50000;
   ```

3. **Matching Values in a List using IN**:
   You can use the `IN` operator to retrieve rows where a column's value matches any of the specified values in a list.

   ```sql
   SELECT column1, column2, ...
   FROM your_table_name
   WHERE column1 IN (value1, value2, ...);
   ```

   For example, to get employees from specific departments:

   ```sql
   SELECT employee_id, first_name, last_name
   FROM employees
   WHERE department IN ('HR', 'IT', 'Marketing');
   ```

4. **Range of Values using BETWEEN**:
   The `BETWEEN` operator allows you to retrieve rows where a column's value falls within a specified range. The range is inclusive, so both endpoints match.

   ```sql
   SELECT column1, column2, ...
   FROM your_table_name
   WHERE column1 BETWEEN value1 AND value2;
   ```

   For example, to get employees whose salary is between $40,000 and $60,000:

   ```sql
   SELECT employee_id, first_name, last_name
   FROM employees
   WHERE salary BETWEEN 40000 AND 60000;
   ```

5. **Pattern Matching using LIKE**:
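   The `LIKE` operator allows you to retrieve rows where a column's value matches a specific pattern. In the pattern, `%` matches any sequence of characters and `_` matches exactly one character. `LIKE` is case-sensitive in PostgreSQL; use `ILIKE` for a case-insensitive match.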

   ```sql
   SELECT column1, column2, ...
   FROM your_table_name
   WHERE column1 LIKE 'pattern';
   ```

   For example, to retrieve employees with a last name starting with "Smi":

   ```sql
   SELECT employee_id, first_name, last_name
   FROM employees
   WHERE last_name LIKE 'Smi%';
   ```
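
One detail worth checking when mixing `AND` and `OR` is operator precedence: `AND` binds more tightly than `OR`, so parentheses change the result. The sketch below illustrates this under stated assumptions: it uses Python's built-in `sqlite3` module as a stand-in for PostgreSQL (the precedence rule is the same in both), and the table contents are hypothetical sample rows.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (employee_id INTEGER, last_name TEXT, department TEXT, salary INTEGER);
    -- Hypothetical sample rows.
    INSERT INTO employees VALUES
        (1, 'Doe',      'HR', 45000),
        (2, 'Smith',    'IT', 65000),
        (3, 'Smithers', 'HR', 70000),
        (4, 'Brown',    'IT', 40000);
""")

# Without parentheses this is read as:
#   department = 'HR' OR (department = 'IT' AND salary > 50000)
# so every HR employee is returned regardless of salary.
no_parens = conn.execute("""
    SELECT employee_id FROM employees
    WHERE department = 'HR' OR department = 'IT' AND salary > 50000
    ORDER BY employee_id
""").fetchall()

# With parentheses the salary filter applies to both departments.
with_parens = conn.execute("""
    SELECT employee_id FROM employees
    WHERE (department = 'HR' OR department = 'IT') AND salary > 50000
    ORDER BY employee_id
""").fetchall()

print(no_parens)    # [(1,), (2,), (3,)]
print(with_parens)  # [(2,), (3,)]
```

When the `OR` branches all test the same column, `department IN ('HR', 'IT') AND salary > 50000` expresses the second query without needing parentheses.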

* Use subqueries to reference a second table (e.g. a different table, an aggregated table)
within a query in PostgreSQL

In PostgreSQL, a subquery is a query nested inside another query. It lets the outer query reference a different table, or an aggregated version of a table, without a separate step. Subqueries can appear in the `SELECT`, `FROM`, `WHERE`, `HAVING`, and `JOIN` clauses.


1. **Subquery in the SELECT Clause**:
   You can use a subquery in the `SELECT` clause to retrieve data from a second table and include it as a column in the main query.

   For example, let's say you have two tables, "employees" and "departments," and you want to retrieve the department name for each employee:

   ```sql
   SELECT
       employee_id,
       first_name,
       last_name,
       (SELECT department_name
        FROM departments
        WHERE departments.department_id = employees.department_id) AS department
   FROM employees;
   ```

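   The subquery `(SELECT department_name FROM departments WHERE departments.department_id = employees.department_id)` is a correlated scalar subquery: it is evaluated for each employee row and looks up the department name whose `department_id` in the "departments" table matches that employee's. It must return at most one row per employee; if it returns more, PostgreSQL raises an error.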

2. **Subquery in the FROM Clause**:
   You can use a subquery in the `FROM` clause to create a derived table that acts as a temporary table for the main query.

   For example, if you want to get the total number of employees in each department:

   ```sql
   SELECT departments.department_id, department_name, total_employees  -- qualify department_id: it exists in both tables
   FROM (
       SELECT department_id, COUNT(*) AS total_employees
       FROM employees
       GROUP BY department_id
   ) AS employee_counts
   JOIN departments ON departments.department_id = employee_counts.department_id;
   ```

   Here, the subquery `(SELECT department_id, COUNT(*) AS total_employees FROM employees GROUP BY department_id)` calculates the total number of employees in each department, and the main query then joins this derived table with the "departments" table to get the department names.

3. **Subquery in the WHERE Clause**:
   You can use a subquery in the `WHERE` clause to filter rows based on a condition from a different table.

   For example, if you want to retrieve employees who work in departments with more than 50 employees:

   ```sql
   SELECT employee_id, first_name, last_name, department_id
   FROM employees
   WHERE department_id IN (SELECT department_id FROM employees GROUP BY department_id HAVING COUNT(*) > 50);
   ```

   The subquery `(SELECT department_id FROM employees GROUP BY department_id HAVING COUNT(*) > 50)` identifies the department IDs with more than 50 employees, and the main query filters the employees based on those departments.
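
The `HAVING` clause can take a subquery too, which is useful for comparing each group against a value computed over the whole table. The following is a minimal sketch under stated assumptions: it uses Python's built-in `sqlite3` module as a stand-in for PostgreSQL, and the `salary` column and sample rows are hypothetical. It keeps only the departments whose average salary is above the overall average.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (employee_id INTEGER, department_id INTEGER, salary INTEGER);
    -- Hypothetical sample rows; the overall average salary is 62000.
    INSERT INTO employees VALUES
        (1, 1001, 40000),
        (2, 1001, 50000),
        (3, 1002, 70000),
        (4, 1002, 90000),
        (5, 1003, 60000);
""")

# The subquery in HAVING computes one value (the overall average),
# and each department's average is compared against it.
rows = conn.execute("""
    SELECT department_id, AVG(salary) AS avg_salary
    FROM employees
    GROUP BY department_id
    HAVING AVG(salary) > (SELECT AVG(salary) FROM employees)
    ORDER BY department_id
""").fetchall()

print(rows)  # [(1002, 80000.0)]
```

Here the subquery is not correlated: it does not refer to the outer query, so it yields the same single value for every group.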
