# Populating a Relational Database
## DSA Interview Questions
In this example, we build a SQLite database of DSA (data structures and algorithms) interview questions and populate it with SwellDB.

## Creating the initial database
Let's create our initial tables. First, we create and populate the two small lookup tables, *question_type* and *question_level*. Then we create the larger *questions* table, which we will populate later with SwellDB.

```python
import os
import sqlite3

import pandas as pd
import pyarrow as pa

# Start from a fresh database file on every run
if os.path.exists("dsa.db"):
    os.remove("dsa.db")

conn = sqlite3.connect("dsa.db")
```

```python
# Create a question type table
conn.execute('''
CREATE TABLE IF NOT EXISTS question_type (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL
)
'''
)

# Populate it with the following: Binary Search, Graph, Two Pointers, Dynamic Programming
conn.execute('''
INSERT INTO question_type (name) VALUES
    ('Binary Search'),
    ('Graph'),
    ('Two Pointers'),
    ('Dynamic Programming')
'''
)

# Create a question level table
conn.execute('''
CREATE TABLE IF NOT EXISTS question_level (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL
)
'''
)

# Populate it with the following: Easy, Medium, Hard
conn.execute('''
INSERT INTO question_level (name) VALUES
    ('Easy'),
    ('Medium'),
    ('Hard')
'''
)

# Create a questions table
conn.execute('''
CREATE TABLE IF NOT EXISTS questions (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    link TEXT NOT NULL,
    question_type_id INTEGER,
    question_level_id INTEGER,
    FOREIGN KEY (question_type_id) REFERENCES question_type(id),
    FOREIGN KEY (question_level_id) REFERENCES question_level(id)
)
'''
)

conn.commit()
```
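Note that the FOREIGN KEY clauses in the questions table are declarations only: SQLite checks them only when `PRAGMA foreign_keys = ON` has been set on the connection, and the cell above does not set it. A minimal sketch on throwaway in-memory databases (not `dsa.db`) showing the difference; `make_db` and `try_insert` are hypothetical helpers written for this illustration.

```python
import sqlite3

# Same lookup + questions schema as above, with one row in each lookup table
SCHEMA = """
CREATE TABLE question_type (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE question_level (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE questions (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    link TEXT NOT NULL,
    question_type_id INTEGER,
    question_level_id INTEGER,
    FOREIGN KEY (question_type_id) REFERENCES question_type(id),
    FOREIGN KEY (question_level_id) REFERENCES question_level(id)
);
INSERT INTO question_type (name) VALUES ('Binary Search');
INSERT INTO question_level (name) VALUES ('Easy');
"""


def make_db(enforce_fk: bool) -> sqlite3.Connection:
    """Hypothetical helper: build an in-memory copy of the schema."""
    conn = sqlite3.connect(":memory:")
    # OFF is SQLite's default; it is set explicitly so the demo does not
    # depend on build options. The pragma is a no-op inside a transaction,
    # so it runs before anything else on the connection.
    conn.execute(f"PRAGMA foreign_keys = {'ON' if enforce_fk else 'OFF'}")
    conn.executescript(SCHEMA)
    return conn


def try_insert(conn: sqlite3.Connection, type_id: int, level_id: int) -> str:
    """Hypothetical helper: insert one question and report the outcome."""
    try:
        conn.execute(
            "INSERT INTO questions (name, link, question_type_id, question_level_id) "
            "VALUES (?, ?, ?, ?)",
            ("Example question", "https://example.com", type_id, level_id),
        )
        return "inserted"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"


# question_type_id 99 does not exist in question_type
unenforced = try_insert(make_db(enforce_fk=False), 99, 1)
enforced = try_insert(make_db(enforce_fk=True), 99, 1)
valid = try_insert(make_db(enforce_fk=True), 1, 1)

print(unenforced)  # inserted
print(enforced)    # rejected: FOREIGN KEY constraint failed
print(valid)       # inserted
```

To get the same checks on `dsa.db`, one would run `conn.execute("PRAGMA foreign_keys = ON")` right after `sqlite3.connect(...)`, since the setting is per connection.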

# Prepare the input to SwellDB
To populate the questions table, SwellDB needs an input table that lists every combination of question type and question level. A CROSS JOIN of the two lookup tables produces exactly that.

```python
# Get all type/level combinations
combos_query = """
SELECT 
  qt.id as question_type_id,
  qt.name as question_type_name,
  ql.id as question_level_id,
  ql.name as question_level_name
FROM question_type as qt
CROSS JOIN question_level as ql
"""
```

```python
combos = pd.read_sql(combos_query, conn)
combos
```

|   | question_type_id | question_type_name | question_level_id | question_level_name |
|---|---|---|---|---|
| 0 | 1 | Binary Search | 1 | Easy |
| 1 | 1 | Binary Search | 2 | Medium |
| 2 | 1 | Binary Search | 3 | Hard |
| 3 | 2 | Graph | 1 | Easy |
| 4 | 2 | Graph | 2 | Medium |
| 5 | 2 | Graph | 3 | Hard |
| 6 | 3 | Two Pointers | 1 | Easy |
| 7 | 3 | Two Pointers | 2 | Medium |
| 8 | 3 | Two Pointers | 3 | Hard |
| 9 | 4 | Dynamic Programming | 1 | Easy |
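The CROSS JOIN pairs every row of *question_type* with every row of *question_level*, so with 4 types and 3 levels it yields 4 × 3 = 12 rows, the same pairs Python's `itertools.product` produces. A small self-contained check on an in-memory database (named `demo_conn` so it does not overwrite the notebook's `conn`); the `ORDER BY` is added here only because SQL does not otherwise guarantee row order.

```python
import itertools
import sqlite3

types = ["Binary Search", "Graph", "Two Pointers", "Dynamic Programming"]
levels = ["Easy", "Medium", "Hard"]

demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE question_type (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
demo_conn.execute("CREATE TABLE question_level (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
demo_conn.executemany("INSERT INTO question_type (name) VALUES (?)", [(t,) for t in types])
demo_conn.executemany("INSERT INTO question_level (name) VALUES (?)", [(lvl,) for lvl in levels])

# Every type paired with every level
rows = demo_conn.execute("""
    SELECT qt.name, ql.name
    FROM question_type AS qt
    CROSS JOIN question_level AS ql
    ORDER BY qt.id, ql.id
""").fetchall()

expected = list(itertools.product(types, levels))
print(len(rows))         # 12
print(rows == expected)  # True
```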


```python
# Convert the combinations to a pyarrow table
data: pa.Table = pa.Table.from_pandas(combos)
```

# SwellDB
Now let's populate the *questions* table with SwellDB.

```python
import datafusion

# SwellDB imports
from swelldb import SwellDB, OpenAILLM
from swelldb.swelldb import Mode

# Initialize a SwellDB instance backed by OpenAI's gpt-4o model
swelldb: SwellDB = SwellDB(llm=OpenAILLM(model="gpt-4o"))
```

## Create the names table

```python
# Generate question names with the LLM, using the type/level combinations in `data` as input
names = (
    swelldb.table_builder()
    .set_table_name("question")
    .set_content("A table that contains DSA question names (like LeetCode questions). Return as many rows as possible for each category/level")
    .set_schema("name str, question_type_name str, question_level_name str")
    .set_base_columns(["question_type_name", "question_level_name"])
    .set_table_gen_mode(Mode.LLM)
    .set_data(data)
).build()
```

## Create the links table

```python
# Find a link for each question name via search (Mode.SEARCH)
links = (
    swelldb.table_builder()
    .set_table_name("links")
    .set_content("A table that contains DSA question names and links")
    .set_schema("name str, link str")
    .set_base_columns(["name"])
    .set_table_gen_mode(Mode.SEARCH)
).build()
```

## Create a table chain

```python
# Chain the two tables: the output of `names` feeds `links`
table_chain = names | links

table_chain.explain()
```

```
SearchEngineTable[schema=['name', 'link']
--LLMTable[schema=['name', 'question_type_name', 'question_level_name']
----CustomTable[schema=['question_type_id', 'question_type_name', 'question_level_id', 'question_level_name']
```
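The `explain()` output reads like a query plan: each stage appears to be printed above the stage that feeds it, with the input data (`CustomTable`) at the bottom. To make the `|` composition concrete, here is a toy, stdlib-only model of that idea. `ToyTable` and its methods are hypothetical illustrations, not SwellDB's classes, and SwellDB's real chaining logic may well differ. The variables are prefixed with `toy_` so they do not overwrite the notebook's `names` and `links`.

```python
from dataclasses import dataclass
from typing import List, Optional


# Hypothetical toy model of a table chain; NOT SwellDB's implementation
@dataclass
class ToyTable:
    kind: str                           # stage type, e.g. "LLMTable"
    schema: List[str]                   # columns this stage produces
    child: Optional["ToyTable"] = None  # upstream stage whose rows feed this one

    def __or__(self, downstream: "ToyTable") -> "ToyTable":
        # `upstream | downstream` returns a copy of `downstream` that reads from `upstream`
        return ToyTable(downstream.kind, downstream.schema, child=self)

    def explain(self, depth: int = 0) -> str:
        # Render this stage, then its input, indenting two dashes per level
        line = "-" * (2 * depth) + f"{self.kind}[schema={self.schema}]"
        if self.child is None:
            return line
        return line + "\n" + self.child.explain(depth + 1)


# Stages mirroring the plan printed above
toy_custom = ToyTable("CustomTable", ["question_type_id", "question_type_name",
                                      "question_level_id", "question_level_name"])
toy_names = ToyTable("LLMTable", ["name", "question_type_name", "question_level_name"],
                     child=toy_custom)
toy_links = ToyTable("SearchEngineTable", ["name", "link"])

plan = (toy_names | toy_links).explain()
print(plan)
```

The design choice here is that `|` never mutates either operand; it builds a new node pointing at its input, which is why the printed plan nests downward from the last stage in the chain.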


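```python
# pyarrow does not always expose `pa.dataset` unless the submodule is imported
import pyarrow.dataset

# Materialize the table chain
ds = table_chain.materialize()

# Register the result with a fresh DataFusion context so it can be queried with SQL
sc = datafusion.SessionContext()
sc.register_dataset("questions", pa.dataset.dataset(ds))
```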

```python
ds.to_pandas()
```

|   | name | link | question_type_name | question_level_name | question_type_id | question_level_id |
|---|---|---|---|---|---|---|
| 0 | Binary Search - Find First and Last Position o... | https://leetcode.com/problems/find-first-and-l... | Binary Search | Medium | 1 | 2 |
| 1 | Binary Search - Search Insert Position | https://leetcode.com/problems/search-insert-po... | Binary Search | Easy | 1 | 1 |
| 2 | Binary Search - Median of Two Sorted Arrays | https://leetcode.com/problems/median-of-two-so... | Binary Search | Hard | 1 | 3 |
| 3 | Graph - Number of Islands | https://leetcode.com/problems/number-of-islands/ | Graph | Medium | 2 | 2 |
| 4 | Graph - Clone Graph | https://leetcode.com/problems/clone-graph/ | Graph | Medium | 2 | 2 |
| 5 | Graph - Course Schedule | https://leetcode.com/problems/course-schedule/ | Graph | Hard | 2 | 3 |
| 6 | Two Pointers - Two Sum II | https://leetcode.com/problems/two-sum-ii-input... | Two Pointers | Easy | 3 | 1 |
| 7 | Two Pointers - Container With Most Water | https://leetcode.com/problems/container-with-m... | Two Pointers | Medium | 3 | 2 |
| 8 | Two Pointers - Trapping Rain Water | https://leetcode.com/problems/trapping-rain-wa... | Two Pointers | Hard | 3 | 3 |
| 9 | Dynamic Programming - Climbing Stairs | https://leetcode.com/problems/climbing-stairs/ | Dynamic Programming | Easy | 4 | 1 |


```python
# Query the registered dataset with DataFusion SQL
sc.sql("""
SELECT *
FROM questions
""")
```

| name | link | question_type_name | question_level_name | question_type_id | question_level_id |
|---|---|---|---|---|---|
| Binary Search - Find First and Last Position of Element | https://leetcode.com/problems/find-first-and-last-position-of-element-in-sorted-array/ | Binary Search | Medium | 1 | 2 |
| Binary Search - Search Insert Position | https://leetcode.com/problems/search-insert-position/ | Binary Search | Easy | 1 | 1 |
| Binary Search - Median of Two Sorted Arrays | https://leetcode.com/problems/median-of-two-sorted-arrays/ | Binary Search | Hard | 1 | 3 |
| Graph - Number of Islands | https://leetcode.com/problems/number-of-islands/ | Graph | Medium | 2 | 2 |
| Graph - Clone Graph | https://leetcode.com/problems/clone-graph/ | Graph | Medium | 2 | 2 |
| Graph - Course Schedule | https://leetcode.com/problems/course-schedule/ | Graph | Hard | 2 | 3 |
| Two Pointers - Two Sum II | https://leetcode.com/problems/two-sum-ii-input-array-is-sorted/ | Two Pointers | Easy | 3 | 1 |
| Two Pointers - Container With Most Water | https://leetcode.com/problems/container-with-most-water/ | Two Pointers | Medium | 3 | 2 |
| Two Pointers - Trapping Rain Water | https://leetcode.com/problems/trapping-rain-water/ | Two Pointers | Hard | 3 | 3 |
| Dynamic Programming - Climbing Stairs | https://leetcode.com/problems/climbing-stairs/ | Dynamic Programming | Easy | 4 | 1 |


```python
# Drop the name columns: the questions table stores only the foreign-key ids
ds.to_pandas().drop(columns=["question_type_name", "question_level_name"]).to_sql(
    "questions", conn, if_exists="append", index=False
)
```

```
12
```
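The `drop` above works because the questions table stores only the foreign-key ids; the type and level names can always be recovered by joining through the lookup tables. A stdlib-only sketch of that write-then-join round trip, on an in-memory database named `demo_conn` (so it does not overwrite the notebook's `conn`) and two rows copied from the SwellDB output above. `materialized` is a hypothetical stand-in for `ds`, not SwellDB's actual return type.

```python
import sqlite3

demo_conn = sqlite3.connect(":memory:")
demo_conn.executescript("""
CREATE TABLE question_type (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE question_level (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
INSERT INTO question_type (name) VALUES
    ('Binary Search'), ('Graph'), ('Two Pointers'), ('Dynamic Programming');
INSERT INTO question_level (name) VALUES ('Easy'), ('Medium'), ('Hard');
CREATE TABLE questions (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    link TEXT NOT NULL,
    question_type_id INTEGER,
    question_level_id INTEGER,
    FOREIGN KEY (question_type_id) REFERENCES question_type(id),
    FOREIGN KEY (question_level_id) REFERENCES question_level(id)
);
""")

# Hypothetical stand-in for `ds`: two rows copied from the output above
materialized = [
    {"name": "Graph - Number of Islands",
     "link": "https://leetcode.com/problems/number-of-islands/",
     "question_type_name": "Graph", "question_level_name": "Medium",
     "question_type_id": 2, "question_level_id": 2},
    {"name": "Dynamic Programming - Climbing Stairs",
     "link": "https://leetcode.com/problems/climbing-stairs/",
     "question_type_name": "Dynamic Programming", "question_level_name": "Easy",
     "question_type_id": 4, "question_level_id": 1},
]

# Keep only the columns the questions table stores; SQLite assigns `id`
columns = ["name", "link", "question_type_id", "question_level_id"]
placeholders = ", ".join("?" * len(columns))  # "?, ?, ?, ?"
demo_conn.executemany(
    f"INSERT INTO questions ({', '.join(columns)}) VALUES ({placeholders})",
    [tuple(row[c] for c in columns) for row in materialized],
)
demo_conn.commit()

# Joining back through the lookup tables recovers the dropped names
restored = demo_conn.execute("""
    SELECT q.id, q.name, qt.name, ql.name
    FROM questions AS q
    JOIN question_type AS qt ON q.question_type_id = qt.id
    JOIN question_level AS ql ON q.question_level_id = ql.id
    ORDER BY q.id
""").fetchall()

for row in restored:
    print(row)
# (1, 'Graph - Number of Islands', 'Graph', 'Medium')
# (2, 'Dynamic Programming - Climbing Stairs', 'Dynamic Programming', 'Easy')
```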

```python
query = """
SELECT q.name AS name, qt.name AS Type, ql.name AS Level, q.link AS Link
FROM questions AS q
JOIN question_type AS qt ON q.question_type_id = qt.id
JOIN question_level AS ql ON q.question_level_id = ql.id
"""

# Show full question names and links instead of truncating them
pd.set_option('display.max_colwidth', 400)

df = pd.read_sql(query, conn)
```

```python
df
```

|   | name | Type | Level | Link |
|---|---|---|---|---|
| 0 | Binary Search - Find First and Last Position of Element | Binary Search | Medium | https://leetcode.com/problems/find-first-and-last-position-of-element-in-sorted-array/ |
| 1 | Binary Search - Search Insert Position | Binary Search | Easy | https://leetcode.com/problems/search-insert-position/ |
| 2 | Binary Search - Median of Two Sorted Arrays | Binary Search | Hard | https://leetcode.com/problems/median-of-two-sorted-arrays/ |
| 3 | Graph - Number of Islands | Graph | Medium | https://leetcode.com/problems/number-of-islands/ |
| 4 | Graph - Clone Graph | Graph | Medium | https://leetcode.com/problems/clone-graph/ |
| 5 | Graph - Course Schedule | Graph | Hard | https://leetcode.com/problems/course-schedule/ |
| 6 | Two Pointers - Two Sum II | Two Pointers | Easy | https://leetcode.com/problems/two-sum-ii-input-array-is-sorted/ |
| 7 | Two Pointers - Container With Most Water | Two Pointers | Medium | https://leetcode.com/problems/container-with-most-water/ |
| 8 | Two Pointers - Trapping Rain Water | Two Pointers | Hard | https://leetcode.com/problems/trapping-rain-water/ |
| 9 | Dynamic Programming - Climbing Stairs | Dynamic Programming | Easy | https://leetcode.com/problems/climbing-stairs/ |
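Because the LLM decides how many questions to return for each type/level pair, some combinations may come back with no questions at all. One way to check (a sketch, not part of the original workflow) is to LEFT JOIN the questions onto the same CROSS JOIN of lookup tables and count matches per pair. The example runs on an in-memory database named `demo_conn` with a trimmed schema (no `link` column), loaded with the ten rows displayed above as sample data; against `dsa.db` you would run only the `SELECT`.

```python
import sqlite3

demo_conn = sqlite3.connect(":memory:")
demo_conn.executescript("""
CREATE TABLE question_type (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE question_level (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- Trimmed copy of the questions table: the link column is omitted
CREATE TABLE questions (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    question_type_id INTEGER REFERENCES question_type(id),
    question_level_id INTEGER REFERENCES question_level(id)
);
INSERT INTO question_type (name) VALUES
    ('Binary Search'), ('Graph'), ('Two Pointers'), ('Dynamic Programming');
INSERT INTO question_level (name) VALUES ('Easy'), ('Medium'), ('Hard');
""")

# Sample data: the ten rows displayed above (name, type id, level id)
demo_conn.executemany(
    "INSERT INTO questions (name, question_type_id, question_level_id) VALUES (?, ?, ?)",
    [
        ("Binary Search - Find First and Last Position of Element", 1, 2),
        ("Binary Search - Search Insert Position", 1, 1),
        ("Binary Search - Median of Two Sorted Arrays", 1, 3),
        ("Graph - Number of Islands", 2, 2),
        ("Graph - Clone Graph", 2, 2),
        ("Graph - Course Schedule", 2, 3),
        ("Two Pointers - Two Sum II", 3, 1),
        ("Two Pointers - Container With Most Water", 3, 2),
        ("Two Pointers - Trapping Rain Water", 3, 3),
        ("Dynamic Programming - Climbing Stairs", 4, 1),
    ],
)

# One row per type/level pair; COUNT(q.id) is 0 where the LEFT JOIN found no question
coverage = demo_conn.execute("""
    SELECT qt.name, ql.name, COUNT(q.id) AS n_questions
    FROM question_type AS qt
    CROSS JOIN question_level AS ql
    LEFT JOIN questions AS q
      ON q.question_type_id = qt.id AND q.question_level_id = ql.id
    GROUP BY qt.id, ql.id
    ORDER BY qt.id, ql.id
""").fetchall()

missing = [(type_name, level_name) for type_name, level_name, n in coverage if n == 0]
print(missing)
# [('Graph', 'Easy'), ('Dynamic Programming', 'Medium'), ('Dynamic Programming', 'Hard')]
```

The printed result applies only to this sample; what the query reports for the real `dsa.db` depends on what SwellDB generated.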
