# SQL Practice: HR System (Employees & Salaries)
This notebook contains SQL + Python (SQLite + pandas) practice tasks.
Follow the instructions in each task. The code cell after each task contains a worked solution and its printed output.


## Beginner

# 📝 Task 1 — Create a SQLite database `hr.db`
### Instructions:
1. Create a new SQLite database called **`hr.db`** (the solution keeps it in a `db/` folder, i.e. `db/hr.db`).  
2. Create a table `employees (id, name, department, base_salary)`.  
3. Insert at least 5 employees.  
4. Select all rows to verify data.

In [13]:
import os
import sqlite3
import pandas as pd

employees_data = [
    (1, "Alice",   "HR",          4000),
    (2, "Bob",     "HR",          4200),
    (3, "Charlie", "IT",          5500),
    (4, "Diana",   "IT",          6000),
    (5, "Ethan",   "IT",          5800),
    (6, "Fiona",   "Finance",     5000),
    (7, "George",  "Finance",     5200),
    (8, "Hannah",  "Marketing",   4500),
    (9, "Ivan",    "Marketing",   4700),
    (10, "Julia",  "Sales",       4800),
]

# sqlite3.connect() cannot create missing parent folders, so make sure db/ exists
os.makedirs("db", exist_ok=True)
conn = sqlite3.connect('db/hr.db')
cursor = conn.cursor()

# IF NOT EXISTS lets the cell be re-run without an error
cursor.execute("""
    CREATE TABLE IF NOT EXISTS employees (
        id INTEGER PRIMARY KEY,
        name TEXT,
        department TEXT,
        base_salary REAL
    );
""")

# INSERT OR IGNORE skips rows whose id already exists, so re-runs add no duplicates
cursor.executemany(
    "INSERT OR IGNORE INTO employees VALUES (?, ?, ?, ?);",
    employees_data
)
conn.commit()

# Verify the data
print(pd.read_sql("SELECT * FROM employees", conn))

conn.close()

   id     name department  base_salary
0   1    Alice         HR       4000.0
1   2      Bob         HR       4200.0
2   3  Charlie         IT       5500.0
3   4    Diana         IT       6000.0
4   5    Ethan         IT       5800.0
5   6    Fiona    Finance       5000.0
6   7   George    Finance       5200.0
7   8   Hannah  Marketing       4500.0
8   9     Ivan  Marketing       4700.0
9  10    Julia      Sales       4800.0
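The cell above relies on `INSERT OR IGNORE` so that it can be re-run safely. The sketch below checks that behaviour on a throwaway in-memory database, so `db/hr.db` is not touched. It uses only two of the sample rows, which is an assumption made to keep it short. Running the insert twice leaves two rows. A plain `INSERT` with an existing `id` raises `sqlite3.IntegrityError` instead.

```python
import sqlite3

# Throwaway in-memory database (assumption: stands in for db/hr.db)
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE IF NOT EXISTS employees (
        id INTEGER PRIMARY KEY,
        name TEXT,
        department TEXT,
        base_salary REAL
    )
""")

rows = [(1, "Alice", "HR", 4000), (2, "Bob", "HR", 4200)]

# Simulate running the notebook cell twice
for _ in range(2):
    conn.executemany("INSERT OR IGNORE INTO employees VALUES (?, ?, ?, ?)", rows)
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM employees").fetchone()[0]
print(count)  # 2: the second run hit the PRIMARY KEY and was silently skipped

# Without OR IGNORE, a duplicate id is rejected with an IntegrityError
try:
    conn.execute("INSERT INTO employees VALUES (1, 'Alice', 'HR', 4000)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
print(duplicate_rejected)  # True

conn.close()
```

`OR IGNORE` also means that editing a value in `employees_data` and re-running the cell will not update a row that already exists. The old value stays in the table.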


# 📝 Task 2 — Load data into pandas
### Instructions:
1. Write a query `SELECT * FROM employees`.  
2. Load it into a pandas DataFrame.  
3. Print the DataFrame.

In [14]:
import sqlite3
import pandas as pd

conn = sqlite3.connect('db/hr.db')
print(pd.read_sql("SELECT * FROM employees;", conn))
conn.close()

   id     name department  base_salary
0   1    Alice         HR       4000.0
1   2      Bob         HR       4200.0
2   3  Charlie         IT       5500.0
3   4    Diana         IT       6000.0
4   5    Ethan         IT       5800.0
5   6    Fiona    Finance       5000.0
6   7   George    Finance       5200.0
7   8   Hannah  Marketing       4500.0
8   9     Ivan  Marketing       4700.0
9  10    Julia      Sales       4800.0
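`pd.read_sql` can also filter rows through query parameters, and it can use a column as the DataFrame index. The sketch below uses `pd.read_sql_query` with `params` and `index_col`. The in-memory table and its three rows are assumptions made so the example is self-contained. The `?` placeholders are filled in by `sqlite3`, so values are never pasted into the SQL string.

```python
import sqlite3
import pandas as pd

# In-memory stand-in for db/hr.db with a few sample rows
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, department TEXT, base_salary REAL)"
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?, ?)",
    [(1, "Alice", "HR", 4000), (3, "Charlie", "IT", 5500), (4, "Diana", "IT", 6000)],
)

# The department value is bound to "?" by sqlite3 via params
it_staff = pd.read_sql_query(
    "SELECT * FROM employees WHERE department = ? ORDER BY id",
    conn,
    params=("IT",),
    index_col="id",  # use the primary key as the DataFrame index
)
print(it_staff)
conn.close()
```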


## Intermediate

# 📝 Task 3 — Create salaries table
### Instructions:
1. Create a table `salaries (id, employee_id, bonus, date)`.  
2. Insert demo salary records.  
3. JOIN `employees` and `salaries` to show each employee's `name` and total salary (`base_salary + bonus`).  
4. Load into pandas.

In [16]:
import sqlite3
import pandas as pd

salary_data = [
    (1, 1, 500.0, "2025-01-15"),   # Alice (HR)
    (2, 2, 600.0, "2025-01-15"),   # Bob (HR)
    (3, 3, 800.0, "2025-02-01"),   # Charlie (IT)
    (4, 4, 1000.0, "2025-02-01"),  # Diana (IT)
    (5, 6, 700.0, "2025-03-01"),   # Fiona (Finance)
    (6, 8, 550.0, "2025-03-15"),   # Hannah (Marketing)
    (7, 10, 650.0, "2025-03-20"),  # Julia (Sales)
]

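conn = sqlite3.connect('db/hr.db')
# SQLite enforces FOREIGN KEY constraints only when this pragma is enabled (per connection)
conn.execute("PRAGMA foreign_keys = ON;")
cursor = conn.cursor()

cursor.execute("""
    CREATE TABLE IF NOT EXISTS salaries (
        id INTEGER PRIMARY KEY,
        employee_id INTEGER,
        bonus REAL,
        date TEXT,
        CONSTRAINT fk_employee FOREIGN KEY (employee_id)
            REFERENCES employees(id)
    );
""")
cursor.executemany(
    "INSERT OR IGNORE INTO salaries VALUES (?, ?, ?, ?);",
    salary_data
)
conn.commit()

# Total salary = base salary + bonus.
# INNER JOIN keeps only employees that have a row in salaries,
# so Ethan, George and Ivan (no bonus record) are not listed.
query = """
    SELECT e.name, e.base_salary + s.bonus AS total_salary
    FROM salaries s
    INNER JOIN employees e ON s.employee_id = e.id
"""
df = pd.read_sql(query, conn)
print(df)
conn.close()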

      name  total_salary
0    Alice        4500.0
1      Bob        4800.0
2  Charlie        6300.0
3    Diana        7000.0
4    Fiona        5700.0
5   Hannah        5050.0
6    Julia        5450.0
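The INNER JOIN above drops employees without a bonus row. It would also return one row per bonus if an employee had several. One way to list every employee with a single total is a `LEFT JOIN` with `SUM` and `COALESCE`, sketched below. It runs on a small in-memory copy of the tables. The second bonus for Charlie (`200` on `2025-06-01`) is a hypothetical row added only to show the aggregation.

```python
import sqlite3
import pandas as pd

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, department TEXT, base_salary REAL);
    CREATE TABLE salaries (id INTEGER PRIMARY KEY,
                           employee_id INTEGER REFERENCES employees(id),
                           bonus REAL, date TEXT);
    INSERT INTO employees VALUES
        (1, 'Alice', 'HR', 4000), (3, 'Charlie', 'IT', 5500), (5, 'Ethan', 'IT', 5800);
    -- Charlie gets two bonuses (the second is hypothetical); Ethan gets none
    INSERT INTO salaries VALUES
        (1, 1, 500, '2025-01-15'), (2, 3, 800, '2025-02-01'), (3, 3, 200, '2025-06-01');
""")

query = """
    SELECT e.name,
           e.base_salary + COALESCE(SUM(s.bonus), 0) AS total_salary  -- no bonus -> add 0
    FROM employees e
    LEFT JOIN salaries s ON s.employee_id = e.id                      -- keep every employee
    GROUP BY e.id, e.name, e.base_salary                              -- one row per employee
    ORDER BY e.id
"""
totals = pd.read_sql_query(query, conn)
print(totals)
conn.close()
```

Without `COALESCE`, Ethan's total would be `NULL`, because `SUM` over no matching rows returns `NULL` and any arithmetic with `NULL` is `NULL`.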


# 📝 Task 4 — Average salary by department
### Instructions:
1. Use GROUP BY to calculate the average base salary per department.  
2. Show only departments with average salary > 5000.

In [20]:
import sqlite3
import pandas as pd

conn = sqlite3.connect('db/hr.db')

query = """
    SELECT department, AVG(base_salary) AS avg_salary
    FROM employees
    GROUP BY department 
    HAVING AVG(base_salary) > 5000
"""
df = pd.read_sql(query, conn)
print(df)
conn.close()

  department   avg_salary
0    Finance  5100.000000
1         IT  5766.666667
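To cross-check the SQL result, the same aggregation can be written in pandas. `GROUP BY department` with `AVG(...)` maps to `groupby(...).mean()`. `HAVING` becomes a boolean filter applied after aggregation. The DataFrame below is rebuilt from the Task 1 sample data, so this sketch needs no database.

```python
import pandas as pd

# Same employees and base salaries as employees_data in Task 1
employees = pd.DataFrame(
    [
        ("Alice", "HR", 4000), ("Bob", "HR", 4200),
        ("Charlie", "IT", 5500), ("Diana", "IT", 6000), ("Ethan", "IT", 5800),
        ("Fiona", "Finance", 5000), ("George", "Finance", 5200),
        ("Hannah", "Marketing", 4500), ("Ivan", "Marketing", 4700),
        ("Julia", "Sales", 4800),
    ],
    columns=["name", "department", "base_salary"],
)

# GROUP BY department, AVG(base_salary) AS avg_salary
dept_avg = (
    employees.groupby("department", as_index=False)["base_salary"]
    .mean()
    .rename(columns={"base_salary": "avg_salary"})
)

# HAVING AVG(base_salary) > 5000: filter the aggregated rows
high = dept_avg[dept_avg["avg_salary"] > 5000].reset_index(drop=True)
print(high)
```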


## Advanced

# 📝 Task 5 — CTE for department averages
### Instructions:
1. Create a CTE to calculate average salary per department.  
2. Select only departments with average salary > 5000.

In [22]:
import sqlite3
import pandas as pd

conn = sqlite3.connect('db/hr.db')

query = """
    WITH dept_avg AS (
    SELECT department, AVG(base_salary) AS avg_salary
    FROM employees
    GROUP BY department 
    )
    SELECT * FROM dept_avg WHERE avg_salary > 5000
"""

df = pd.read_sql(query, conn)
print(df)
conn.close()

  department   avg_salary
0    Finance  5100.000000
1         IT  5766.666667
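A CTE works like a named temporary result, so it can also be joined back to the base table. The sketch below reuses the `dept_avg` CTE to list employees earning more than their own department's average. It runs on an in-memory table holding the HR and IT rows from Task 1, which is an assumption made to keep it self-contained.

```python
import sqlite3
import pandas as pd

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, department TEXT, base_salary REAL)"
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?, ?)",
    [(1, "Alice", "HR", 4000), (2, "Bob", "HR", 4200),
     (3, "Charlie", "IT", 5500), (4, "Diana", "IT", 6000), (5, "Ethan", "IT", 5800)],
)

query = """
    WITH dept_avg AS (
        SELECT department, AVG(base_salary) AS avg_salary
        FROM employees
        GROUP BY department
    )
    -- join each employee to the average of their own department
    SELECT e.name, e.department, e.base_salary, d.avg_salary
    FROM employees e
    JOIN dept_avg d ON d.department = e.department
    WHERE e.base_salary > d.avg_salary
    ORDER BY e.id
"""
above_dept_avg = pd.read_sql_query(query, conn)
print(above_dept_avg)
conn.close()
```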


# 📝 Task 6 — Subquery for above-average salaries
### Instructions:
1. Use a subquery to find employees whose total salary is above the company average.

In [24]:
import sqlite3
import pandas as pd

conn = sqlite3.connect('db/hr.db')

query = """
    SELECT name, base_salary
    FROM employees
    WHERE base_salary > 
    (
        SELECT AVG(base_salary) FROM employees
    )
"""

df = pd.read_sql(query, conn)
print(df)

conn.close()

      name  base_salary
0  Charlie       5500.0
1    Diana       6000.0
2    Ethan       5800.0
3    Fiona       5000.0
4   George       5200.0
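The cell above compares `base_salary` only. The task wording asks about total salary (base plus bonus). A possible variant is sketched below. It computes each employee's total with a `LEFT JOIN` (employees without a bonus count as `+ 0`) and compares it with a scalar subquery over the same totals. It uses the Task 1 and Task 3 sample data in an in-memory database.

```python
import sqlite3
import pandas as pd

employees_data = [
    (1, "Alice", "HR", 4000), (2, "Bob", "HR", 4200),
    (3, "Charlie", "IT", 5500), (4, "Diana", "IT", 6000), (5, "Ethan", "IT", 5800),
    (6, "Fiona", "Finance", 5000), (7, "George", "Finance", 5200),
    (8, "Hannah", "Marketing", 4500), (9, "Ivan", "Marketing", 4700),
    (10, "Julia", "Sales", 4800),
]
salary_data = [
    (1, 1, 500.0, "2025-01-15"), (2, 2, 600.0, "2025-01-15"),
    (3, 3, 800.0, "2025-02-01"), (4, 4, 1000.0, "2025-02-01"),
    (5, 6, 700.0, "2025-03-01"), (6, 8, 550.0, "2025-03-15"),
    (7, 10, 650.0, "2025-03-20"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, department TEXT, base_salary REAL)")
conn.execute("CREATE TABLE salaries (id INTEGER PRIMARY KEY, employee_id INTEGER, bonus REAL, date TEXT)")
conn.executemany("INSERT INTO employees VALUES (?, ?, ?, ?)", employees_data)
conn.executemany("INSERT INTO salaries VALUES (?, ?, ?, ?)", salary_data)

query = """
    WITH totals AS (
        SELECT e.id, e.name,
               e.base_salary + COALESCE(SUM(s.bonus), 0) AS total_salary
        FROM employees e
        LEFT JOIN salaries s ON s.employee_id = e.id
        GROUP BY e.id, e.name, e.base_salary
    )
    SELECT name, total_salary
    FROM totals
    WHERE total_salary > (SELECT AVG(total_salary) FROM totals)  -- scalar subquery
    ORDER BY id
"""
above_avg = pd.read_sql_query(query, conn)
print(above_avg)
conn.close()
```

With this data the company-wide average total is 5450. Julia's total is exactly 5450, so she is not strictly above it.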


# 📝 Task 7 — Index and RANK
### Instructions:
1. Create an index on `employee_id` in `salaries`.  
2. Use RANK() to find the top-3 employees per department.

In [25]:
import sqlite3
import pandas as pd

conn = sqlite3.connect('db/hr.db')
cursor = conn.cursor()
cursor.execute("""
    CREATE INDEX IF NOT EXISTS idx_salaries_employee_id 
    ON salaries(employee_id)
""")
conn.commit()
query = """
WITH ranked AS (
    SELECT 
        department,
        name,
        base_salary,
        RANK() OVER (
            PARTITION BY department
            ORDER BY base_salary DESC
        ) AS rnk
    FROM employees
)
SELECT department, name, base_salary, rnk
FROM ranked
WHERE rnk <= 3;
"""
df = pd.read_sql(query, conn)
print(df)

conn.close()

  department     name  base_salary  rnk
0    Finance   George       5200.0    1
1    Finance    Fiona       5000.0    2
2         HR      Bob       4200.0    1
3         HR    Alice       4000.0    2
4         IT    Diana       6000.0    1
5         IT    Ethan       5800.0    2
6         IT  Charlie       5500.0    3
7  Marketing     Ivan       4700.0    1
8  Marketing   Hannah       4500.0    2
9      Sales    Julia       4800.0    1
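`RANK()` gives tied rows the same rank. As a result, `rnk <= 3` can return more than three employees per department. The sketch below shows this with a hypothetical fourth IT employee, `Kim`, whose salary ties with Charlie's. `ROW_NUMBER()` with a tie-breaker (`id` here) always yields exactly three rows. Which function is "right" depends on whether ties should all be kept.

```python
import sqlite3
import pandas as pd

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, department TEXT, base_salary REAL)"
)
# Hypothetical IT team: Kim (not in the original data) ties with Charlie at 5500
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?, ?)",
    [(1, "Diana", "IT", 6000), (2, "Ethan", "IT", 5800),
     (3, "Charlie", "IT", 5500), (4, "Kim", "IT", 5500)],
)

query = """
    SELECT name, base_salary,
           RANK() OVER (
               PARTITION BY department ORDER BY base_salary DESC
           ) AS rnk,
           ROW_NUMBER() OVER (
               PARTITION BY department ORDER BY base_salary DESC, id  -- id breaks ties
           ) AS row_num
    FROM employees
    ORDER BY row_num
"""
ranked = pd.read_sql_query(query, conn)
conn.close()

top3_rank = ranked[ranked["rnk"] <= 3]          # keeps both tied employees
top3_rownum = ranked[ranked["row_num"] <= 3]    # exactly three rows
print(ranked)
print(len(top3_rank), len(top3_rownum))
```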
