Welcome to my SQL learning repository. This is a personal space where I practice and organize everything I learn about SQL, especially advanced topics. I use simple examples, real-world projects, and clear comments in the code to help me understand each concept step by step. My goal is to improve my SQL skills, learn how to write better queries. You will find topics like window functions, subqueries, joins, functions, triggers, and more.
-
🛢️ Game Store Project
-
🧩 Window Functions
-
🧩 Subqueries and CTEs
-
🧩 CTE - Common Table Expressions
-
🧩 Set Operations
-
🧩 Function and Procedure Structure
-
🧩 Trigger
-
🧩 Conditional Logic
-
🧩 Text and Cleaning Functions
-
🧩 Query Performance
-
🧩 Views
-
📊 Analysis
-
🛠️ Tools Used
-
🎯 My Goal
-
🤝 Contributing
- Data with Baraa
- SQL for Data Analysis - Book
- Technology websites
The Game Store Project is a small database I created to help me learn SQL in a practical way. It simulates a simple video game store, with tables for customers, employees, products, purchases, and categories. I use this project to practice creating tables, writing queries, using foreign keys, and analyzing data like sales and customer behavior. This project helps me apply what I learn in real situations and makes studying more interesting.
Customer
– People who buy gamesAddress
– Customer addressEmployee
– Store workersPurchase
– Orders from customersPurchase_item
– Items in each orderProduct
– Games in the storeCategory
– Type of game (Action, Adventure, etc.)
You can find the SQL code in the Game Store folder
A window function is a special SQL function. It does a calculation across rows, but it keeps all the rows in the result.
It does not reduce the number of rows like GROUP BY
does.
Use a window function when you want to:
- Give a row number to each row →
ROW_NUMBER()
- Find the rank of each row →
RANK()
,DENSE_RANK()
,PERCENT_RANK()
- Find the relative position inside a group →
CUME_DIST()
,NTILE()
- Compare a row to the next or previous →
LEAD()
,LAG()
- Get the first or last value in a group →
FIRST_VALUE()
,LAST_VALUE()
- Make a running total →
SUM() OVER(...)
- Find a running average →
AVG() OVER(...)
- Get the highest or lowest value in a group →
MAX() OVER(...)
,MIN() OVER(...)
- Count how many rows in a group → COUNT() OVER(...)
- Use aggregate functions but keep row details → any function with
OVER(...)
SELECT
purchase_id,
purchase_date,
-- Previous and next values
LAG(total_price, 2, 0) OVER (ORDER BY purchase_date, purchase_id) AS total_2_rows_before,
total_price AS total_per_purchase,
LEAD(total_price, 1, 0) OVER (ORDER BY purchase_date, purchase_id) AS total_1_row_after,
-- Total per date
SUM(total_price) OVER (PARTITION BY purchase_date) AS total_per_day,
ROW_NUMBER() OVER (PARTITION BY purchase_date ORDER BY total_price DESC) AS rank_by_daily_total,
-- Cumulative total up to the current row
SUM(total_price) OVER (
ORDER BY purchase_date, purchase_id
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS cumulative_total,
-- Grand total
SUM(total_price) OVER () AS grand_total
FROM purchase
WHERE purchase_date >= '2024-01-01'
ORDER BY purchase_date, purchase_id
LIMIT 1000;
🔎 Find the code here
🧑💻 Lear more about in Data with Baraa
Subqueries are queries written inside other queries. They work like nested boxes - the inner query (subquery) runs first and provides results to the outer query (main query).
SELECT column_name
FROM table_name
WHERE column_name > (SELECT AVG(column_name) FROM table_name);
- Return one result (number, text, date)
- Used with operators: =, >, <, >=, <=, <>
- Example: Find products above average price
- Return multiple results
- Used with: IN, ANY, ALL, EXISTS
- Example: Find customers in cities with stores
Location | Purpose | Example Use |
---|---|---|
WHERE | Filter data | Find records above average |
SELECT | Add calculated columns | Show percentage of total |
FROM | Use results as temp table | Complex data combinations |
- Subqueries execute before the main query
- Can be nested multiple levels deep
- May impact performance with large datasets
- Often replaceable with JOINs for better speed
- Useful for breaking complex problems into steps
- Keep subqueries simple when possible
- Consider JOIN alternatives for performance
- Test with small data sets first
- Use meaningful aliases for readability
🔎 Find the code here
🧑💻 Lear more about in Data with Baraa
CTEs stands for "Common Table Expressions." They are a useful feature in SQL databases that let you create temporary named result sets within a query. Used to simplify writing complex queries, it helps break down logic into smaller parts making them easier to understand.
Basic Concept
WITH cte_name AS (
SELECT column1, column2
FROM table_name
)
SELECT *
FROM cte_name;
- Non-Recursive CTE: Used to structure and simplify queries.
- Recursive CTE: Used for hierarchical or tree-structured data.
- Nested CTE: When a CTE dependend on the other CTE
Purpose | Example Use |
---|---|
Simplify complex queries | Break down into readable steps |
Reuse a result set | Join same result multiple times |
Recursive queries | Hierarchical data like org charts, trees. |
Temporary aggregations | Pre-compute totals, counts, rankings |
- CTE is temporary, valid only for that query
- Defined with WITH keyword before the main query
- Can use multiple CTEs, separated by commas
- Recursive CTEs must include UNION ALL
- CTEs often replace subqueries or derived tables
- Rethink and refactor your CTEs before starting a new one.
- Don't use more than 5 CTEs in one query; otherside, your code will be hard to understand and maintain.
- Favor CTEs over deeply nested subqueries.
- Test performance with large datasets.
- Use descriptive names for CTEs.
We don’t use the ORDER BY clause directly in CTEs unless it's combined with a TOP or LIMIT clause, or you're using it inside a subquery that supports ordering.
🔎 Find the code here
🧑💻 Learn more about it in Data with Baraa
SQL set operations allow you to combine results from multiple SELECT statements into a single result set. These operations work similarly to mathematical set operations and are used to merge data from different queries
- UNION
- Combines results from two or more SELECT statements
- Removes duplicate rows
- Syntax:
SELECT column1, column2 FROM table1
UNION
SELECT column1, column2 FROM table2;
- UNION ALL
- Combines results from multiple SELECT statements
- Includes all rows, including duplicates
- More efficient than UNION if duplicates are acceptable
- Syntax:
SELECT column1, column2 FROM table1
UNION ALL
SELECT column1, column2 FROM table2;
- INTERSECT
- Returns only rows that appear in both result sets.
- Removes duplicates.
- Syntax:
SELECT column1, column2 FROM table1
INTERSECT
SELECT column1, column2 FROM table2;
- EXCEPT (or MINUS in some databases)
- Returns rows from the first query that aren't in the second query.
- Removes duplicates.
- Syntax:
SELECT column1, column2 FROM table1
EXCEPT
SELECT column1, column2 FROM table2;
- All SELECT statements must have the same number of columns
- Corresponding columns must have compatible data types
- Column names in the result set are taken from the first query
🔎 Find the code here
Functions and procedures are both database objects that contain reusable SQL code, but they have important differences in how they work and when to use them.
- Must return a value (can be scalar or table-valued)
- Can be used in SQL statements (SELECT, WHERE, HAVING, etc.)
- Cannot modify database state (with some exceptions)
- Run within the calling transaction
- No transaction control (cannot COMMIT/ROLLBACK)
- Syntax:
CREATE OR REPLACE FUNCTION function_name()
RETURNS return_type AS $$
BEGIN
-- function logic here
RETURN ...;
END;
$$ LANGUAGE plpgslq;
- Return a single value or table
- Use the result in a larger SQL query
- Need read-only operations
- Don't return values (but can have OUT parameters)
- Can modify database state
- Can control transactions (COMMIT/ROLLBACK allowed)
- Called with EXECUTE or CALL
- Cannot be used directly in SQL statements
- Syntax:
CREATE OR REPLACE PROCEDURE procedure_name(parameters)
AS $$
BEGIN
-- Procedure logic
END;
$$ LANGUAGE plpgsql;
- Perform data modification with transaction control
- Execute multiple SQL statements as a unit
- Don't need to use the result in a SQL statement
Feature | Function | Procedure |
---|---|---|
Return value | Required | Optional |
SQL statement use | Yes | No |
Transaction control | No | Yes |
Calling syntax | SELECT func() | CALL proc() |
Exception handling | Yes | Yes |
🔎 Find the code here
A trigger is a database object that automatically executes (or "fires") in response to specific events on a table or view, such as INSERT
, UPDATE
, DELETE
, or TRUNCATE
. Triggers are used to enforce business rules, maintain data integrity, audit changes, and automate tasks.
Triggers are powerful tools for automating database actions, but they should be used carefully to avoid performance issues and maintainability problems.
- Based on Timing
BEFORE
: Execute before the triggering event (e.g., before a row is inserted).AFTER
: Execute after the triggering event (e.g., after a row is updated)INTEAD OF
: Used with views to modify data when direct DML operations are not possible.
- Based on Event
INSERT
: Fires when a new row is inserted.UPDATE
: Fires when a row is modified.DELETE
:Fires when a row is deleted.TRUNCATE
: Fires when a table is truncated (not supported in all databases)
Trigger Syntax (PostgreSQL Example)
CREATE OR REPLACE TRIGGER trigger_name
{BEFORE | AFTER | INSTEAD OF} {INSERT | UPDATE | DELETE | TRUNCATE}
ON table_name
[FOR EACH ROW | FOR EACH STATEMENT]
[WHEN (condition)]
EXECUTE FUNCTION trigger_function();
trigger_name
: Name of the trigger.BEFORE/AFTER/INSTEAD OF
: When the trigger executes.INSERT/UPDATE/DELETE/TRUNCATE
: The event that fires the trigger.FOR EACH ROW
: Executes once per affected row.FOR EACH STATEMENT
: Executes once per SQL statement (default).WHEN (condition)
: Optional condition to control trigger execution.EXECUTE FUNCTION
Calls a function that contains the trigger logic.
🔎 Find the code here
Conditional logic allows you to control the flow of your SQL queries based on specified conditions. PostgreSQL offers several constructs for this:
- CASE Statement
-
The
CASE
expression allows conditional logic similar to if-then-else. -
Syntax:
CASE
WHEN condition1 THEN result1
WHEN condition2 THEN result2
...
ELSE resultN
END
- COALESCE
- Returns the first
non-null
value in the list. - Syntax:
COALESCE(valor1, valor2, ..., valorN)
- NULLIF
- Returns
NULL
if the two arguments are equal; otherwise returns the first argument. - Syntax:
NULLIF(valor1, valor2)
- IF-THEN-ELSE
- Inside functions or stored procedures, you can use
IF-THEN-ELSE
logic. - Syntax:
IF condição THEN
-- comandos
ELSIF outra_condição THEN
-- comandos
ELSE
-- comandos
END IF;
- Boolean Expressions
- Used directly in WHERE, JOIN, ON, etc.
- Syntax:
SELECT ...
FROM tabela
WHERE condição_booleana;
🔎 Find the code here
Text and cleaning functions in PostgreSQL help standardize and prepare string data for querying, analysis, and reporting. These functions are especially useful when working with product names, customer records, or address fields.
Function | Description | Example |
---|---|---|
UPPER(text) |
Converts text to uppercase | UPPER('gamepad') → 'GAMEPAD' |
LOWER(text) |
Converts text to lowercase | LOWER('GAMER') → 'gamer' |
INITCAP(text) |
Capitalizes the first letter of each word | INITCAP('john doe') → 'John Doe' |
TRIM(text) |
Removes spaces from both ends | TRIM(' game ') → 'game' |
REPLACE(text, from, to) |
Replaces all occurrences of a substring | REPLACE('XBOX_ONE', '_', ' ') → 'XBOX ONE' |
SUBSTRING(text, from, len) |
Extracts a substring | SUBSTRING('controller', 1, 4) → 'cont' |
SPLIT_PART(text, delimiter, n) |
Returns the n-th part of a split string | SPLIT_PART('PS5-DUALSENSE', '-', 2) → 'DUALSENSE' |
LENGTH(text) |
Returns length of string | LENGTH('keyboard') → 8 |
REGEXP_REPLACE |
Replace using regular expressions | REGEXP_REPLACE('a1b2', '[0-9]', '', 'g') → 'ab' |
REGEXP_MATCHES |
Search using regular expressions | REGEXP_MATCHES('abc123', '\d+') |
POSITION(sub IN text) |
Find position of a substring | POSITION('b' IN 'abc') → 2 |
🔎 Find the code here
SQL query optimization is the process of improving the efficiency of SQL queries to achieve faster execution times and reduce resource consumption, thereby enhancing overall database performance. This is crucial for ensuring applications run smoothly, especially as data volumes grow and queries become more complex
- Faster Query Execution: Optimized queries retrieve and manipulate data more quickly, improving application responsiveness and user experience.
- Reduced Resource Usage: Efficient queries minimize CPU, memory, and I/O load, enabling the database to handle more concurrent users.
- Cost Savings and Scalability: Optimized queries reduce infrastructure costs and ensure performance remains stable as data grows.
- Improved Productivity: Faster queries allow developers and analysts to work more effectively without long wait times
Creating indexes on columns frequently used in WHERE
, JOIN
, ORDER BY
, and GROUP BY
clauses drastically reduces data scanning time. However, over-indexing can slow down write operations and consume extra storage, so indexes should be applied judiciously on critical columns.
CREATE INDEX nome_do_indice
ON nome_da_tabela (coluna1 [, coluna2, ...]);
Avoid using SELECT *
; instead, specify only the necessary columns. This reduces the volume of data processed and transferred, speeding up query execution.
Choosing the right type of join (e.g., INNER JOIN
vs. LEFT JOIN
) based on the data retrieval needs can improve performance. INNER JOIN
s are generally faster when only matching rows are needed.
Techniques such as predicate pushdown (applying filters early), subquery unnesting (replacing subqueries with joins), join elimination, and expression simplification help the query optimizer generate better execution plans.
Analyzing execution plans (e.g., via EXPLAIN
) helps identify bottlenecks and inefficiencies in queries, guiding targeted optimizations.
EXPLAIN
SELECT coluna1, coluna2
FROM tabela
WHERE condição;
Precompiled stored procedures reduce parsing and compilation overhead for frequently executed queries, improving performance.
Proper normalization reduces data redundancy, while strategic denormalization can speed up complex queries. Table partitioning can also help by limiting data access to relevant partitions.
Using clauses like LIMIT
or TOP
restricts the number of rows returned, which is especially useful for large datasets and improves response times.
Keeping database statistics current ensures the query optimizer makes informed decisions, avoiding suboptimal execution plans.
Caching frequent query results reduces database load and speeds up response times
🧑💻 Learn more about it in DataCamp, AccelData, SeveralNines
Views are virtual tables in SQL that don't store data themselves but instead display data from one or more underlying tables. They act as saved SQL queries that you can reference like regular tables.
- Virtual Tables: Views don't physically store data (except materialized views)
- Dynamic: Data is fetched from base tables when the view is queried
- Customizable: Can show selected columns, calculated fields, or filtered data
- Security Layer: Can restrict access to sensitive data
CREATE VIEW view_name AS
SELECT column1, column2, ...
FROM table_name
WHERE condition;
CREATE OR REPLACE VIEW view_name AS
SELECT new_columns...
FROM tables...
WHERE new_conditions...;
DROP VIEW view_name;
🔎 Find the code here
🧑💻 Learn more about it in Data with Baraa
This project aims to build a Data Warehouse using PostgreSQL to support sales analysis for a video game store. The architecture follows an ETL pipeline and a Star Schema dimensional model, allowing data visualization through BI tools like Power BI or Tableau.
- Identify the best-selling products
- Evaluate shipping performance (delivery time, delays)
- Understand customer behavior
- Explore regional sales performance (region)
- Bronze: raw - normalized relational database.
- Silver: Extract, Transform and Load the data into the data warehouse.
- Gold Layer: Star Schema with fact and dimension tables.
- Analisy the data and Ask the question.
- BI: Dashboard built with Power BI / Tableau.
Explore the SQL scripts here
- PostgreSQL – SQL database
- DBeaver / pgAdmin – Tools to manage the database
- VS Code – Editor for SQL code
I want to:
- Practice English and SQL
- Create small but real projects
- Learn step by step with fun ideas
This is a study project.
But you can give ideas or tips.
Feel free to open an issue!