# SQL Interview Questions - Lab

## Introduction

In this lab, we'll test our SQL skills against some real-world interview questions from major companies!

## Objectives

You will be able to:

* Write SQL queries to filter and order results
* Decide and perform whichever type of join is best for retrieving desired data
* Write subqueries to decompose complex queries

## Getting Started

In this lab, we'll work through four interview questions that test your SQL knowledge. We didn't write these questions; we found them out in the real world. They have been used in the past by major technology companies such as Facebook, Amazon, and Twitter. Our goal here isn't to memorize the questions. After all, it's extremely unlikely that they are still in use now that they are publicly available online. Instead, we'll treat these questions as if they were the real thing, to get some insight into the types of questions we'll need to answer in order to pass an interview involving SQL.


### A Note on Answering These Questions

Since these are interview questions, they'll almost always be posed as hypotheticals. This means that you won't have a real database to work with and test your code on. It also means that any given problem listed here has more than one valid solution. Be sure to double-check the code you write for bugs and errors. It's much harder to write bug-free code when you aren't able to test it against a database!

If these questions seem hard, that's normal. These are real questions that job seekers at major companies have reported on online forums, and they are a great way for us to test our skills against the kinds of questions we can expect to be asked in an interview!

## Question 1

From Facebook:

Assume we have a table of employee information, which includes salary information. Write a query to find the names and salaries of the top 5 highest paid employees, in descending order.

In [2]:
# importing libraries

import sqlite3
import pandas as pd

# Creating an in-memory SQLite database
conn = sqlite3.connect(":memory:") 
cursor = conn.cursor()

In [3]:
# Your code here

cursor.execute("""
CREATE TABLE employee (
    id INTEGER PRIMARY KEY,
    name TEXT,
    salary INTEGER,
    dept_id INTEGER
);
""")

# Insert sample employee data
cursor.executemany("""
INSERT INTO employee (name, salary, dept_id) VALUES (?, ?, ?);
""", [
    ("Charity", 95000, 1),
    ("Allan", 120000, 2),
    ("Charles", 78000, 1),
    ("Ivet", 135000, 3),
    ("Viola", 110000, 2),
    ("Frank", 125000, 1)
])

conn.commit()

In [4]:
# Top 5 highest-paid employees
query = """
SELECT name, salary
FROM employee
ORDER BY salary DESC
LIMIT 5;
"""
pd.read_sql_query(query, conn)

|   name  | salary |
|:-------:|:------:|
|   Ivet  | 135000 |
|  Frank  | 125000 |
|  Allan  | 120000 |
|  Viola  | 110000 |
| Charity |  95000 |
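
One follow-up an interviewer might raise is what happens when several employees tie on salary. The sketch below uses a small hypothetical table (the names `A` to `G` and their salaries are made up for illustration) to compare a plain `LIMIT 5` with a subquery that keeps every employee whose salary is among the five highest distinct salary values. Which behavior is wanted depends on how the question is read, so treat this as one possible interpretation rather than the expected answer.

```python
import sqlite3

# Hypothetical data: E and F tie on the fifth-highest salary.
tie_conn = sqlite3.connect(":memory:")
tie_conn.execute("CREATE TABLE employee (name TEXT, salary INTEGER)")
tie_conn.executemany(
    "INSERT INTO employee VALUES (?, ?)",
    [("A", 100), ("B", 90), ("C", 80), ("D", 70), ("E", 60), ("F", 60), ("G", 50)],
)

# Plain LIMIT returns exactly five rows, so one of the tied employees is dropped.
limit_rows = tie_conn.execute(
    "SELECT name, salary FROM employee ORDER BY salary DESC, name LIMIT 5"
).fetchall()

# The subquery selects the five highest distinct salaries; the outer query
# returns every employee who earns one of them.
tie_rows = tie_conn.execute(
    """
    SELECT name, salary
    FROM employee
    WHERE salary IN (
        SELECT DISTINCT salary FROM employee ORDER BY salary DESC LIMIT 5
    )
    ORDER BY salary DESC, name
    """
).fetchall()

print(limit_rows)
print(tie_rows)
```

With this data, the first query returns five rows and the second returns six, because both employees earning 60 are kept.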


## Question 2

From Amazon:

Assume we have two SQL tables: `authors` and `books`. The authors table has a few million rows, and looks like this: 

| author_name | book_name |
|:-----------:|:---------:|
|   author_1  |   book_1  |
|   author_1  |   book_2  |
|   author_2  |   book_3  |
|   author_2  |   book_4  |
|   author_2  |   book_5  |
|   author_3  |   book_6  |

The books table also has a few million rows, and looks like this:

| book_name | copies_sold |
|:---------:|:-----------:|
|   book_1  |    10000    |
|   book_2  |     2575    |
|   book_3  |    60000    |
|   book_4  |    98000    |
|   book_5  |     5250    |
|   book_6  |    19775    |

Write an SQL query that shows the top 3 authors who sold the most total books. 

In [6]:
# Your code here

cursor.execute("""
CREATE TABLE authors (
    author_name TEXT,
    book_name TEXT
);
""")

cursor.executemany("""
INSERT INTO authors VALUES (?, ?);
""", [
    ("Author_1", "Book_1"),
    ("Author_1", "Book_2"),
    ("Author_2", "Book_3"),
    ("Author_2", "Book_4"),
    ("Author_2", "Book_5"),
    ("Author_3", "Book_6"),
])

cursor.execute("""
CREATE TABLE books (
    book_name TEXT,
    copies_sold INTEGER
);
""")

cursor.executemany("""
INSERT INTO books VALUES (?, ?);
""", [
    ("Book_1", 10000),
    ("Book_2", 2575),
    ("Book_3", 60000),
    ("Book_4", 98000),
    ("Book_5", 5250),
    ("Book_6", 19775),
])

conn.commit()

In [7]:
# Top 3 authors with the highest total book sales
query = """
SELECT a.author_name, SUM(b.copies_sold) AS total_sales
FROM authors a
JOIN books b ON a.book_name = b.book_name
GROUP BY a.author_name
ORDER BY total_sales DESC
LIMIT 3;
"""
pd.read_sql_query(query, conn)

| author_name | total_sales |
|:-----------:|:-----------:|
|   Author_2  |    163250   |
|   Author_3  |    19775    |
|   Author_1  |    12575    |
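
Choosing the join type matters when the two tables do not line up perfectly. As a minimal sketch, assume a hypothetical case where one author's book has no row in `books`: an inner join drops that author entirely, while a `LEFT JOIN` combined with `COALESCE` keeps them with a total of 0. The two-row tables below are invented for illustration and are not the lab's sample data.

```python
import sqlite3

# Hypothetical data: book_2 has no sales row in the books table.
join_conn = sqlite3.connect(":memory:")
join_conn.executescript(
    """
    CREATE TABLE authors (author_name TEXT, book_name TEXT);
    CREATE TABLE books (book_name TEXT, copies_sold INTEGER);
    INSERT INTO authors VALUES ('author_1', 'book_1'), ('author_2', 'book_2');
    INSERT INTO books VALUES ('book_1', 10000);
    """
)

# Inner join: only authors with at least one matching books row survive.
inner_rows = join_conn.execute(
    """
    SELECT a.author_name, SUM(b.copies_sold) AS total_sales
    FROM authors a
    JOIN books b ON a.book_name = b.book_name
    GROUP BY a.author_name
    ORDER BY total_sales DESC, a.author_name
    """
).fetchall()

# Left join: every author is kept; COALESCE turns a NULL sum into 0.
left_rows = join_conn.execute(
    """
    SELECT a.author_name, COALESCE(SUM(b.copies_sold), 0) AS total_sales
    FROM authors a
    LEFT JOIN books b ON a.book_name = b.book_name
    GROUP BY a.author_name
    ORDER BY total_sales DESC, a.author_name
    """
).fetchall()

print(inner_rows)
print(left_rows)
```

For a "top 3 by sales" question the inner join is usually enough, since authors with no recorded sales cannot reach the top unless there are fewer than three authors with sales.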


## Question 3

From Amazon:

Assume you have two tables, `customers` and `orders`. Write a SQL query to select all customers who purchased at least 2 items on two separate days. 

In [9]:
# Your code here

# Creating the 'orders' table
create_table_query = """
CREATE TABLE IF NOT EXISTS orders (
    order_id INTEGER,
    customer_id INTEGER,
    order_date DATE,
    item_count INTEGER
);
"""
conn.execute(create_table_query)
conn.commit()

In [10]:
# Inserting sample data
insert_data_query = """
INSERT INTO orders (order_id, customer_id, order_date, item_count) VALUES
(1, 101, '2024-03-01', 2),
(2, 101, '2024-03-05', 1),
(3, 102, '2024-03-01', 3),
(4, 102, '2024-03-02', 2),
(5, 103, '2024-03-02', 1),
(6, 103, '2024-03-05', 2),
(7, 101, '2024-03-06', 2);
"""
conn.execute(insert_data_query)
conn.commit()

In [11]:
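# Customers who purchased at least 2 items on each of two separate days
query = """
SELECT customer_id
FROM (
    -- One row per customer per day on which they bought at least 2 items
    SELECT customer_id, order_date
    FROM orders
    GROUP BY customer_id, order_date
    HAVING SUM(item_count) >= 2
) AS qualifying_days
GROUP BY customer_id
HAVING COUNT(*) >= 2
ORDER BY customer_id;
"""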

customers = pd.read_sql_query(query, conn)
print(customers)

| customer_id |
|:-----------:|
|     101     |
|     102     |
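
The wording of this question is ambiguous, so it is worth testing a solution against edge cases, in particular customers who place several orders on the same day. The sketch below uses hypothetical customers 201 and 202 (not part of the lab's sample rows) to compare two ways of writing the check: counting individual order rows with at least 2 items, and first summing items per customer per day in a subquery. It assumes the intended reading is "at least 2 items in total on each of two different days".

```python
import sqlite3

# Hypothetical edge-case data, not part of the lab's sample rows.
edge_conn = sqlite3.connect(":memory:")
edge_conn.execute(
    "CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, "
    "order_date DATE, item_count INTEGER)"
)
edge_conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, 201, "2024-03-01", 2),  # 201: two 2-item orders on the same day...
        (2, 201, "2024-03-01", 2),
        (3, 201, "2024-03-02", 1),  # ...and only 1 item on the second day
        (4, 202, "2024-03-01", 1),  # 202: two 1-item orders on the same day...
        (5, 202, "2024-03-01", 1),
        (6, 202, "2024-03-02", 2),  # ...and 2 items on the second day
    ],
)

# Row-level check: counts order rows with >= 2 items, regardless of the day.
row_level = edge_conn.execute(
    """
    SELECT customer_id
    FROM orders
    GROUP BY customer_id
    HAVING COUNT(DISTINCT order_date) >= 2
       AND SUM(CASE WHEN item_count >= 2 THEN 1 ELSE 0 END) >= 2
    ORDER BY customer_id
    """
).fetchall()

# Per-day check: sums items per customer per day, then counts qualifying days.
per_day = edge_conn.execute(
    """
    SELECT customer_id
    FROM (
        SELECT customer_id, order_date
        FROM orders
        GROUP BY customer_id, order_date
        HAVING SUM(item_count) >= 2
    ) AS qualifying_days
    GROUP BY customer_id
    HAVING COUNT(*) >= 2
    ORDER BY customer_id
    """
).fetchall()

print(row_level)
print(per_day)
```

On this data the two checks disagree: the row-level check selects customer 201, who bought 2 or more items on only one day, while the per-day check selects customer 202. In an interview, stating which reading you assume is as important as the query itself.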


## Question 4

From Twitter:

A company uses 2 data tables, `Employee` and `Department`, to store data about its employees and departments. 

Table Name: Employee   
Attributes:   
ID Integer,   
NAME String,   
SALARY Integer,   
DEPT_ID Integer   

Table Name: Department   
Attributes:   
DEPT_ID Integer,   
NAME String,   
LOCATION String   

Write a query to print the respective Department Name and number of employees for all departments in the Department table (even unstaffed ones). 

Sort your result in descending order of employees per department; if two or more departments have the same number of employees, then sort those departments alphabetically by Department Name.

In [13]:
# Your code here

cursor.execute("""
CREATE TABLE department (
    dept_id INTEGER PRIMARY KEY,
    name TEXT,
    location TEXT
);
""")

cursor.executemany("""
INSERT INTO department VALUES (?, ?, ?);
""", [
    (1, "Engineering", "New York"),
    (2, "Marketing", "San Francisco"),
    (3, "Sales", "Chicago"),
    (4, "HR", "Los Angeles")  # No employees in this department
])

conn.commit()

In [14]:
# Employees per department, including unstaffed departments
query = """
SELECT d.name AS department_name, 
       COUNT(e.id) AS employee_count
FROM department d
LEFT JOIN employee e ON d.dept_id = e.dept_id
GROUP BY d.dept_id, d.name
ORDER BY employee_count DESC, department_name ASC;
"""
pd.read_sql_query(query, conn)


| department_name | employee_count |
|:---------------:|:--------------:|
|   Engineering   |        3       |
|    Marketing    |        2       |
|      Sales      |        1       |
|        HR       |        0       |
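
A detail that is easy to get wrong here is which expression to count. After a `LEFT JOIN`, an unstaffed department still produces one row, with `NULL` in every employee column. `COUNT(*)` counts that row, while `COUNT(e.id)` skips it because the aggregate ignores `NULL` values. The sketch below shows the difference on a small hypothetical pair of tables (two departments, two employees) rather than the lab's data.

```python
import sqlite3

# Hypothetical data: HR has no employees.
count_conn = sqlite3.connect(":memory:")
count_conn.executescript(
    """
    CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, dept_id INTEGER);
    INSERT INTO department VALUES (1, 'Engineering'), (2, 'HR');
    INSERT INTO employee VALUES (1, 'Ann', 1), (2, 'Bob', 1);
    """
)

# Same join and grouping; only the counted expression differs.
template = """
    SELECT d.name, {count_expr} AS employee_count
    FROM department d
    LEFT JOIN employee e ON d.dept_id = e.dept_id
    GROUP BY d.dept_id, d.name
    ORDER BY employee_count DESC, d.name ASC
"""

# COUNT(*) counts the NULL-padded row, so HR wrongly appears to have 1 employee.
star_counts = count_conn.execute(template.format(count_expr="COUNT(*)")).fetchall()

# COUNT(e.id) ignores NULLs, so HR correctly gets 0.
id_counts = count_conn.execute(template.format(count_expr="COUNT(e.id)")).fetchall()

print(star_counts)
print(id_counts)
```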


## Summary

In this lab, we tested our knowledge of SQL queries against some real-world interview questions!