# SQL Interview Questions - Lab

## Introduction

In this lab, we'll test our SQL skills against some real-world interview questions from major companies!

## Objectives

You will be able to:

* Write SQL queries to filter and order results
* Decide and perform whichever type of join is best for retrieving desired data
* Write subqueries to decompose complex queries

## Getting Started

In this lab, we'll work through four interview questions that test your SQL knowledge. We didn't write these questions; we found them out in the real world. They have been used in the past by major technology companies such as Facebook, Amazon, and Twitter. Our goal here isn't to memorize the questions. After all, it's extremely unlikely that they are still in use now that they've become publicly available online. Instead, our goal is to treat these questions as if they were the real thing and gain some insight into the types of questions we'll need to answer in order to pass an interview involving SQL.


### A Note on Answering These Questions

Since these are interview questions, they'll almost always be posed as hypotheticals. This means that you won't have a real database to work with and test your code on. It also means that any given problem listed here has multiple valid solutions. Be sure to double-check the code you write for bugs and errors. It's much harder to write bug-free code when you aren't able to test it against a database!

If these questions seem hard, that's normal. These are real questions that job seekers at major companies have reported to online forums. They are unlikely to still be in use at these companies, but they remain a great way for us to test our skills against the kinds of questions we can expect to be asked in an interview!

## Question 1

From Facebook:

Assume we have a table of employee information, which includes salary information. Write a query to find the names and salaries of the top 5 highest paid employees, in descending order.

In [1]:
# Your code here
import sqlite3
import pandas as pd

# Create an in-memory mock database for testing
conn = sqlite3.connect(":memory:")

# Mock employee table, just for testing
conn.execute("""
CREATE TABLE employees (
    id INTEGER PRIMARY KEY,
    name TEXT,
    salary INTEGER
);
""")

# Insert some sample data
conn.executemany("INSERT INTO employees (name, salary) VALUES (?, ?)", [
    ("Alice", 120000),
    ("Bob", 90000),
    ("Charlie", 150000),
    ("Diana", 110000),
    ("Eve", 95000),
    ("Frank", 175000),
    ("Grace", 140000)
])
conn.commit()

#  Query: Top 5 highest salaries 
q1 = """
SELECT name, salary
FROM employees
ORDER BY salary DESC
LIMIT 5;
"""

pd.read_sql(q1, conn)


|   |   name  | salary |
|:-:|:-------:|:------:|
| 0 |  Frank  | 175000 |
| 1 | Charlie | 150000 |
| 2 |  Grace  | 140000 |
| 3 |  Alice  | 120000 |
| 4 |  Diana  | 110000 |
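
One point worth raising with an interviewer is how ties should be handled: `LIMIT 5` returns exactly five rows, so if two employees share the fifth-highest salary, only one of them appears. The following is a minimal sketch of one alternative, assuming "top 5" means everyone earning one of the five highest distinct salaries. The connection name `conn_ties` and the extra employee Heidi (tied with Diana) are made up for illustration and are not part of the original question.

```python
import sqlite3

# Separate in-memory database so this sketch does not touch `conn` above
conn_ties = sqlite3.connect(":memory:")
conn_ties.execute(
    "CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, salary INTEGER)"
)
conn_ties.executemany(
    "INSERT INTO employees (name, salary) VALUES (?, ?)",
    [
        ("Alice", 120000), ("Bob", 90000), ("Charlie", 150000),
        ("Diana", 110000), ("Eve", 95000), ("Frank", 175000),
        ("Grace", 140000),
        ("Heidi", 110000),  # hypothetical employee tied with Diana
    ],
)

# The subquery picks the five highest distinct salary values;
# the outer query returns every employee earning one of them.
q1_ties = """
SELECT name, salary
FROM employees
WHERE salary IN (
    SELECT DISTINCT salary
    FROM employees
    ORDER BY salary DESC
    LIMIT 5
)
ORDER BY salary DESC, name ASC;
"""

top_paid = conn_ties.execute(q1_ties).fetchall()
for name, salary in top_paid:
    print(name, salary)
```

With this sample data the query returns six rows rather than five, because Diana and Heidi both earn the fifth-highest salary. Whether that is the desired behavior depends on how the interviewer defines "top 5".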


## Question 2

From Amazon:

Assume we have two SQL tables: `authors` and `books`. The `authors` table has a few million rows and looks like this:

| author_name | book_name |
|:-----------:|:---------:|
|   author_1  |   book_1  |
|   author_1  |   book_2  |
|   author_2  |   book_3  |
|   author_2  |   book_4  |
|   author_2  |   book_5  |
|   author_3  |   book_6  |

The `books` table also has a few million rows and looks like this:

| book_name | copies_sold |
|:---------:|:-----------:|
|   book_1  |    10000    |
|   book_2  |     2575    |
|   book_3  |    60000    |
|   book_4  |    98000    |
|   book_5  |     5250    |
|   book_6  |    19775    |

Write an SQL query that shows the top 3 authors who sold the most total books. 

In [2]:
# Your code here
import sqlite3
import pandas as pd

# Connect to in-memory database (or use 'books.sqlite' if you want a file)
conn = sqlite3.connect(":memory:")

# Create Authors table
conn.execute("""
CREATE TABLE authors (
    author_name TEXT,
    book_name TEXT
);
""")

# Insert sample data for authors
conn.executemany("""
INSERT INTO authors (author_name, book_name)
VALUES (?, ?)
""", [
    ("author_1", "book_1"),
    ("author_1", "book_2"),
    ("author_2", "book_3"),
    ("author_2", "book_4"),
    ("author_2", "book_5"),
    ("author_3", "book_6"),
])

#  Create Books table 
conn.execute("""
CREATE TABLE books (
    book_name TEXT,
    copies_sold INTEGER
);
""")

# Insert sample data for books
conn.executemany("""
INSERT INTO books (book_name, copies_sold)
VALUES (?, ?)
""", [
    ("book_1", 10000),
    ("book_2", 2575),
    ("book_3", 60000),
    ("book_4", 98000),
    ("book_5", 5250),
    ("book_6", 19775),
])

conn.commit()

# Query: Top 3 authors by total sales
q2 = """
SELECT a.author_name, SUM(b.copies_sold) AS total_sold
FROM authors a
JOIN books b
  ON a.book_name = b.book_name
GROUP BY a.author_name
ORDER BY total_sold DESC
LIMIT 3;
"""

pd.read_sql(q2, conn)


|   | author_name | total_sold |
|:-:|:-----------:|:----------:|
| 0 |   author_2  |   163250   |
| 1 |   author_3  |    19775   |
| 2 |   author_1  |    12575   |
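
Because the question stresses that both tables hold a few million rows, an interviewer may follow up by asking how the join could be kept fast. The sketch below shows one possible answer, assuming we are allowed to add an index on the join column of `books`. The index name `idx_books_book_name` and the connection name `conn_idx` are hypothetical, and whether the index actually helps depends on the database engine and the data.

```python
import sqlite3

# Separate in-memory database so this sketch does not touch `conn` above
conn_idx = sqlite3.connect(":memory:")
conn_idx.executescript("""
CREATE TABLE authors (author_name TEXT, book_name TEXT);
CREATE TABLE books (book_name TEXT, copies_sold INTEGER);
-- Hypothetical index; the original question does not mention indexes
CREATE INDEX idx_books_book_name ON books (book_name);
""")
conn_idx.executemany("INSERT INTO authors VALUES (?, ?)", [
    ("author_1", "book_1"), ("author_1", "book_2"),
    ("author_2", "book_3"), ("author_2", "book_4"),
    ("author_2", "book_5"), ("author_3", "book_6"),
])
conn_idx.executemany("INSERT INTO books VALUES (?, ?)", [
    ("book_1", 10000), ("book_2", 2575), ("book_3", 60000),
    ("book_4", 98000), ("book_5", 5250), ("book_6", 19775),
])

q2_indexed = """
SELECT a.author_name, SUM(b.copies_sold) AS total_sold
FROM authors a
JOIN books b
  ON a.book_name = b.book_name
GROUP BY a.author_name
ORDER BY total_sold DESC
LIMIT 3;
"""

# Ask SQLite how it plans to run the query; the exact wording of the
# plan varies between SQLite versions, so we only print it.
for step in conn_idx.execute("EXPLAIN QUERY PLAN " + q2_indexed):
    print(step)

top_authors = conn_idx.execute(q2_indexed).fetchall()
print(top_authors)
```

The query itself is unchanged; only the schema differs. In an interview, describing which column you would index and why is usually enough.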


## Question 3

From Amazon:

Assume you have two tables, `customers` and `orders`. Write a SQL query to select all customers who purchased at least 2 items on two separate days. 

In [3]:
# Your code here

import sqlite3
import pandas as pd

# Open a fresh in-memory database for this question
conn = sqlite3.connect(":memory:")

#  Customers Table 
conn.execute("""
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,
    customer_name TEXT
);
""")

# Insert sample data
conn.executemany("""
INSERT INTO customers (customer_id, customer_name)
VALUES (?, ?)
""", [
    (1, "Alice"),
    (2, "Bob"),
    (3, "Charlie"),
    (4, "Diana")
])

#  Orders Table 
conn.execute("""
CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY,
    customer_id INTEGER,
    order_date TEXT,
    items INTEGER,
    FOREIGN KEY (customer_id) REFERENCES customers(customer_id)
);
""")

# Insert sample orders (Alice and Diana each ordered 2+ items on two separate days)
conn.executemany("""
INSERT INTO orders (order_id, customer_id, order_date, items)
VALUES (?, ?, ?, ?)
""", [
    (1, 1, "2023-01-01", 3),   # Alice
    (2, 1, "2023-01-02", 2),   # Alice
    (3, 2, "2023-01-01", 1),   # Bob
    (4, 2, "2023-01-02", 2),   # Bob (only 1 day >=2 items)
    (5, 3, "2023-01-03", 5),   # Charlie
    (6, 4, "2023-01-03", 2),   # Diana
    (7, 4, "2023-01-04", 2)    # Diana (valid: 2 days)
])

conn.commit()

#  Query 
q3 = """
SELECT c.customer_id, c.customer_name
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
WHERE o.items >= 2
GROUP BY c.customer_id, c.customer_name
HAVING COUNT(DISTINCT o.order_date) >= 2;
"""

pd.read_sql(q3, conn)


|   | customer_id | customer_name |
|:-:|:-----------:|:-------------:|
| 0 |      1      |     Alice     |
| 1 |      4      |     Diana     |
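
The question is ambiguous, and the query above assumes each row in `orders` records one order with an `items` count. Under that schema, a customer who places two single-item orders on the same day would not be counted for that day. The following is a minimal sketch of an alternative reading, assuming "at least 2 items" refers to the total number of items bought per day. It uses a subquery to total the items per customer per day first. The connection name `conn_daily` and the customer Erin are made up for illustration.

```python
import sqlite3

# Separate in-memory database so this sketch does not touch `conn` above
conn_daily = sqlite3.connect(":memory:")
conn_daily.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, customer_name TEXT);
CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(customer_id),
    order_date TEXT,
    items INTEGER
);
""")
conn_daily.executemany("INSERT INTO customers VALUES (?, ?)", [
    (1, "Alice"), (2, "Bob"), (3, "Erin"),
])
conn_daily.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", [
    (1, 1, "2023-01-01", 3),  # Alice: 3 items
    (2, 1, "2023-01-02", 2),  # Alice: 2 items on a second day
    (3, 2, "2023-01-01", 1),  # Bob: only 1 item this day
    (4, 2, "2023-01-02", 2),  # Bob: just one qualifying day
    (5, 3, "2023-01-05", 1),  # Erin: two single-item orders...
    (6, 3, "2023-01-05", 1),  # ...on the same day (2 items in total)
    (7, 3, "2023-01-06", 2),  # Erin: 2 items on a second day
])

# Inner query: total items per customer per day.
# Outer query: keep days with 2+ items, then customers with 2+ such days.
q3_daily = """
SELECT c.customer_id, c.customer_name
FROM customers c
JOIN (
    SELECT customer_id, order_date, SUM(items) AS items_that_day
    FROM orders
    GROUP BY customer_id, order_date
) d ON d.customer_id = c.customer_id
WHERE d.items_that_day >= 2
GROUP BY c.customer_id, c.customer_name
HAVING COUNT(*) >= 2
ORDER BY c.customer_id;
"""

qualifying = conn_daily.execute(q3_daily).fetchall()
print(qualifying)
```

Here Erin qualifies even though no single order of hers on 2023-01-05 contains two items. Stating which interpretation you chose, and why, is often as important in an interview as the query itself.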


## Question 4

From Twitter:

A company uses 2 data tables, `Employee` and `Department`, to store data about its employees and departments. 

Table Name: Employee   
Attributes:   
ID Integer,   
NAME String,   
SALARY Integer,   
DEPT_ID Integer   

Table Name: Department   
Attributes:   
DEPT_ID Integer,   
NAME String,   
LOCATION String   

Write a query to print the respective Department Name and number of employees for all departments in the Department table (even unstaffed ones). 

Sort your result in descending order of employees per department; if two or more departments have the same number of employees, then sort those departments alphabetically by Department Name.

In [4]:
# Your code here
# Reuses `conn`, the in-memory connection opened for Question 3

# Department table
conn.execute("""
CREATE TABLE Department (
    DEPT_ID INTEGER PRIMARY KEY,
    NAME TEXT,
    LOCATION TEXT
);
""")

# Employee table (named as in the question)
conn.execute("""
CREATE TABLE Employee (
    ID INTEGER PRIMARY KEY,
    NAME TEXT,
    SALARY INTEGER,
    DEPT_ID INTEGER,
    FOREIGN KEY (DEPT_ID) REFERENCES Department(DEPT_ID)
);
""")

# Insert sample departments (Finance is left unstaffed on purpose)
conn.executemany("""
INSERT INTO Department (DEPT_ID, NAME, LOCATION) VALUES (?, ?, ?)
""", [
    (1, "HR", "New York"),
    (2, "Engineering", "San Francisco"),
    (3, "Marketing", "Chicago"),
    (4, "Finance", "Boston")
])

# Insert sample employees
conn.executemany("""
INSERT INTO Employee (ID, NAME, SALARY, DEPT_ID) VALUES (?, ?, ?, ?)
""", [
    (1, "Alice", 70000, 2),
    (2, "Bob", 80000, 2),
    (3, "Charlie", 50000, 1),
    (4, "Diana", 60000, 3)
])

conn.commit()

# LEFT JOIN keeps departments with no employees;
# grouping by DEPT_ID as well keeps same-named departments separate
q4 = """
SELECT d.NAME AS DepartmentName,
       COUNT(e.ID) AS NumEmployees
FROM Department d
LEFT JOIN Employee e
       ON d.DEPT_ID = e.DEPT_ID
GROUP BY d.DEPT_ID, d.NAME
ORDER BY NumEmployees DESC, DepartmentName ASC;
"""
print("Q4: Department and Employee Counts")
print(pd.read_sql(q4, conn), "\n")

Q4: Department and Employee Counts
  DepartmentName  NumEmployees
0    Engineering             2
1             HR             1
2      Marketing             1
3        Finance             0 
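
A detail that is easy to get wrong here is the choice of `COUNT(e.ID)` over `COUNT(*)`. With a `LEFT JOIN`, an unstaffed department still produces one result row whose employee columns are all `NULL`, so `COUNT(*)` would report 1 for it, while `COUNT(e.ID)` skips the `NULL` and reports 0. The sketch below compares the two on a small, made-up sample; the connection name `conn_cnt` and the column aliases are hypothetical.

```python
import sqlite3

# Separate in-memory database so this sketch does not touch `conn` above
conn_cnt = sqlite3.connect(":memory:")
conn_cnt.executescript("""
CREATE TABLE Department (DEPT_ID INTEGER PRIMARY KEY, NAME TEXT, LOCATION TEXT);
CREATE TABLE Employee (
    ID INTEGER PRIMARY KEY, NAME TEXT, SALARY INTEGER, DEPT_ID INTEGER
);
INSERT INTO Department VALUES (1, 'HR', 'New York'), (2, 'Finance', 'Boston');
-- Only HR has an employee; Finance is unstaffed
INSERT INTO Employee VALUES (1, 'Charlie', 50000, 1);
""")

q4_compare = """
SELECT d.NAME,
       COUNT(*)    AS count_star,  -- counts the NULL-padded row too
       COUNT(e.ID) AS count_id     -- ignores rows where e.ID is NULL
FROM Department d
LEFT JOIN Employee e
       ON d.DEPT_ID = e.DEPT_ID
GROUP BY d.DEPT_ID, d.NAME
ORDER BY d.NAME;
"""

counts = conn_cnt.execute(q4_compare).fetchall()
for name, count_star, count_id in counts:
    print(name, count_star, count_id)
```

For the unstaffed Finance department, `count_star` is 1 and `count_id` is 0, which is why the solution above counts a column from the `Employee` side of the join.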



## Summary

In this lab, we tested our knowledge of SQL queries against some real-world interview questions!