## Import tool for qns-svc

The dataset is taken from:
- https://arxiv.org/abs/2504.14655
- https://github.com/newfacade/LeetCodeDataset/tree/main
- https://huggingface.co/datasets/newfacade/LeetCodeDataset

### Dependencies

In [11]:
import pandas as pd
import requests

In [None]:
# Base URL of the running qns-svc instance
QNS_SVC_URL = "http://127.0.0.1:8000"

In [5]:
# Log in (e.g. with `huggingface-cli login`) to access this dataset;
# reading hf:// paths also requires the huggingface_hub package
splits = {'train': 'LeetCodeDataset-train.jsonl', 'test': 'LeetCodeDataset-test.jsonl'}
lc_df = pd.read_json("hf://datasets/newfacade/LeetCodeDataset/" + splits["train"], lines=True)
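`splits` also lists a test file, but only the train split is loaded above. If both splits are wanted, one possible approach (a sketch, not part of the original notebook) is to concatenate the two frames and drop any repeated `question_id`, so the same problem is not posted to qns-svc twice. The helper name `combine_splits` is hypothetical.

```python
import pandas as pd


def combine_splits(*frames: pd.DataFrame) -> pd.DataFrame:
    """Concatenate dataset splits and keep the first row for each question_id.

    Hypothetical helper: assumes every frame has a 'question_id' column.
    """
    combined = pd.concat(frames, ignore_index=True)
    # Drop repeats so a problem present in more than one split is imported once
    deduped = combined.drop_duplicates(subset="question_id", keep="first")
    return deduped.reset_index(drop=True)


# Usage (needs network access and Hugging Face credentials):
# base = "hf://datasets/newfacade/LeetCodeDataset/"
# train_df = pd.read_json(base + splits["train"], lines=True)
# test_df = pd.read_json(base + splits["test"], lines=True)
# lc_df = combine_splits(train_df, test_df)
```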

In [7]:
lc_df.head(2)

| | task_id | question_id | difficulty | tags | problem_description | starter_code | estimated_date | prompt | completion | entry_point | test | input_output | query | response |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | two-sum | 1 | Easy | [Array, Hash Table] | Given an array of integers nums and an integer... | class Solution:\n def twoSum(self, nums: Li... | 2015-08-07 | import random\nimport functools\nimport collec... | class Solution:\n def twoSum(self, nums: Li... | Solution().twoSum | def check(candidate):\n assert candidate(nu... | [{'input': 'nums = [3,3], target = 6', 'output... | You are an expert Python programmer. You will ... | To solve this problem efficiently, we can use ... |
| 1 | add-two-numbers | 2 | Medium | [Recursion, Linked List, Math] | You are given two non-empty linked lists repre... | # Definition for singly-linked list.\n# class ... | 2015-08-07 | import heapq\nimport itertools\nfrom sortedcon... | # Definition for singly-linked list.\n# class ... | Solution().addTwoNumbers | def check(candidate):\n assert is_same_list... | [{'input': 'l1 = [9,8,7], l2 = [1,2,3]', 'outp... | You are an expert Python programmer. You will ... | ```python\n# Definition for singly-linked list... |


### Process Difficulty

In [13]:
# sorted() gives a stable order; plain set() order can change between runs
unique_difficulties = sorted(lc_df['difficulty'].unique())
unique_difficulties

['Easy', 'Hard', 'Medium']

In [16]:
for difficulty in unique_difficulties:
    res = requests.post(f"{QNS_SVC_URL}/difficulty", json={"level": difficulty})
    print(res.json())

{'message': 'Difficulty level Easy created successfully'}
{'message': 'Difficulty level Hard created successfully'}
{'message': 'Difficulty level Medium created successfully'}
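The difficulty cell only prints each response, while the categories and questions cells further down keep lists of successes and failures. A small helper could share that bookkeeping across all three imports. This is a sketch under assumptions: `post_all` is a hypothetical name, and it takes the POST function as a parameter (for example `requests.post`) so it can be exercised without a running qns-svc.

```python
from typing import Any, Callable, Iterable, List, Tuple


def post_all(
    post: Callable[..., Any],
    url: str,
    payloads: Iterable[dict],
    timeout: float = 10.0,
) -> Tuple[List[dict], List[Tuple[dict, int]]]:
    """POST each payload to url and split the results into successes and failures.

    `post` is any callable with the requests.post calling convention whose
    return value exposes `.ok` and `.status_code` (as requests.Response does).
    """
    succeeded: List[dict] = []
    failed: List[Tuple[dict, int]] = []
    for payload in payloads:
        res = post(url, json=payload, timeout=timeout)
        if res.ok:
            succeeded.append(payload)
        else:
            # Keep the status code so failures can be triaged afterwards
            failed.append((payload, res.status_code))
    return succeeded, failed


# Usage against the service:
# ok, failed = post_all(requests.post, f"{QNS_SVC_URL}/difficulty",
#                       [{"level": d} for d in unique_difficulties])
```

Passing `post` in, rather than calling `requests.post` directly, is only a testing convenience; a `requests.Session().post` bound method fits the same slot.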


### Process Categories

In [17]:
unique_categories = list(set(lc_df['tags'].sum()))
unique_categories

['Array',
 'Breadth-First Search',
 'Hash Table',
 'Bucket Sort',
 'Line Sweep',
 'Eulerian Circuit',
 'Stack',
 'Probability and Statistics',
 'Rolling Hash',
 'String',
 'Quickselect',
 'Depth-First Search',
 'Number Theory',
 'Sliding Window',
 'Bit Manipulation',
 'Binary Tree',
 'Recursion',
 'Strongly Connected Component',
 'Hash Function',
 'Greedy',
 'Union Find',
 'Binary Search Tree',
 'Geometry',
 'Tree',
 'Two Pointers',
 'Segment Tree',
 'Monotonic Queue',
 'Brainteaser',
 'Merge Sort',
 'Game Theory',
 'Simulation',
 'Ordered Set',
 'Counting Sort',
 'Randomized',
 'Matrix',
 'Math',
 'Monotonic Stack',
 'Heap (Priority Queue)',
 'Interactive',
 'Combinatorics',
 'Suffix Array',
 'Prefix Sum',
 'Binary Search',
 'String Matching',
 'Radix Sort',
 'Enumeration',
 'Biconnected Component',
 'Minimum Spanning Tree',
 'Queue',
 'Backtracking',
 'Graph',
 'Bitmask',
 'Concurrency',
 'Counting',
 'Topological Sort',
 'Shortest Path',
 'Dynamic Programming',
 'Divide and Conquer'
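Iterating over a `set` of strings does not give a stable order between Python sessions (string hashing is randomized by default), so the category list above may come out in a different order on each run. An alternative sketch, not the notebook's original approach, flattens the tag lists with `Series.explode()` and sorts the result; `dropna()` discards the NaN that `explode()` produces for a question with an empty tag list. The name `unique_tags` is hypothetical.

```python
import pandas as pd


def unique_tags(tags: pd.Series) -> list:
    """Return the distinct tags in a column of tag lists, sorted alphabetically."""
    flat = tags.explode()  # one row per (question, tag) pair
    # Empty lists explode to NaN; drop them before de-duplicating
    return sorted(flat.dropna().unique())


# Usage: unique_categories = unique_tags(lc_df["tags"])
```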

In [18]:
success_imported_cat = []
failure_imported_cat = []
for cat in unique_categories:
    res = requests.post(f"{QNS_SVC_URL}/category", json={"name": cat})
    if res.ok:
        success_imported_cat.append(cat)
    else:
        failure_imported_cat.append(cat)

print(success_imported_cat)
print(failure_imported_cat)

['Array', 'Breadth-First Search', 'Hash Table', 'Bucket Sort', 'Line Sweep', 'Eulerian Circuit', 'Stack', 'Probability and Statistics', 'Rolling Hash', 'String', 'Quickselect', 'Depth-First Search', 'Number Theory', 'Sliding Window', 'Bit Manipulation', 'Binary Tree', 'Recursion', 'Strongly Connected Component', 'Hash Function', 'Greedy', 'Union Find', 'Binary Search Tree', 'Geometry', 'Tree', 'Two Pointers', 'Segment Tree', 'Monotonic Queue', 'Brainteaser', 'Merge Sort', 'Game Theory', 'Simulation', 'Ordered Set', 'Counting Sort', 'Randomized', 'Matrix', 'Math', 'Monotonic Stack', 'Heap (Priority Queue)', 'Interactive', 'Combinatorics', 'Suffix Array', 'Prefix Sum', 'Binary Search', 'String Matching', 'Radix Sort', 'Enumeration', 'Biconnected Component', 'Minimum Spanning Tree', 'Queue', 'Backtracking', 'Graph', 'Bitmask', 'Concurrency', 'Counting', 'Topological Sort', 'Shortest Path', 'Dynamic Programming', 'Divide and Conquer', 'Memoization', 'Sorting', 'Linked List', 'Binary Indexe

### Process Questions

In [19]:
lc_df.columns

Index(['task_id', 'question_id', 'difficulty', 'tags', 'problem_description',
       'starter_code', 'estimated_date', 'prompt', 'completion', 'entry_point',
       'test', 'input_output', 'query', 'response'],
      dtype='object')
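Before posting questions, it may help to check that every difficulty and tag a question references was actually created, since a question pointing at a missing category would likely be rejected (an assumption about qns-svc's validation, not something the notebook confirms). A minimal sketch with the hypothetical helper `unresolved_questions`:

```python
import pandas as pd


def unresolved_questions(df: pd.DataFrame, difficulties: set, categories: set) -> list:
    """Return task_ids whose difficulty, or any of whose tags, is not in the given sets."""
    bad = []
    for task_id, difficulty, tags in zip(df["task_id"], df["difficulty"], df["tags"]):
        # set(tags) <= categories is True only if every tag was imported
        if difficulty not in difficulties or not set(tags) <= categories:
            bad.append(task_id)
    return bad


# Usage:
# unresolved_questions(lc_df, set(unique_difficulties), set(success_imported_cat))
```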

In [None]:
failure_imported_qns = []

for index, row in lc_df.iterrows():
    # Map dataset columns onto the qns-svc question fields
    qns_payload = {
        "title": row['task_id'],                    # slug, e.g. "two-sum"
        "description": row['problem_description'],
        "difficulty": row['difficulty'],            # Easy / Medium / Hard
        "categories": row['tags'],                  # list of category names
        "code_template": row['starter_code'],
        "solution_sample": row['completion']
    }
    print(f"{index}: {row['task_id']}")
    res = requests.post(f"{QNS_SVC_URL}/questions/", json=qns_payload)
    if not res.ok:
        # Record the row label and slug so failed rows can be revisited
        failure_imported_qns.append((index, row['task_id']))

print(failure_imported_qns)

0: two-sum
1: add-two-numbers
2: longest-substring-without-repeating-characters
3: median-of-two-sorted-arrays
4: longest-palindromic-substring
5: zigzag-conversion
6: reverse-integer
7: string-to-integer-atoi
8: palindrome-number
9: regular-expression-matching
10: container-with-most-water
11: integer-to-roman
12: roman-to-integer
13: longest-common-prefix
14: 3sum
15: 3sum-closest
16: letter-combinations-of-a-phone-number
17: 4sum
18: remove-nth-node-from-end-of-list
19: valid-parentheses
20: merge-two-sorted-lists
21: generate-parentheses
22: merge-k-sorted-lists
23: swap-nodes-in-pairs
24: reverse-nodes-in-k-group
25: remove-duplicates-from-sorted-array
26: remove-element
27: find-the-index-of-the-first-occurrence-in-a-string
28: divide-two-integers
29: substring-with-concatenation-of-all-words
30: next-permutation
31: longest-valid-parentheses
32: search-in-rotated-sorted-array
33: find-first-and-last-position-of-element-in-sorted-array
34: search-insert-position
35: valid-sudoku
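`failure_imported_qns` holds `(index, task_id)` pairs, and the index values are `lc_df` row labels from `iterrows()`, so the failed rows can be pulled back out for inspection or a second attempt. A sketch; `rows_to_retry` is a hypothetical helper name:

```python
import pandas as pd


def rows_to_retry(df: pd.DataFrame, failures: list) -> pd.DataFrame:
    """Select the rows of df whose index labels appear in (index, task_id) failure pairs."""
    labels = [index for index, _task_id in failures]
    # .loc preserves the order in which the failures were recorded
    return df.loc[labels]


# Usage: retry_df = rows_to_retry(lc_df, failure_imported_qns)
# retry_df[["task_id", "difficulty", "tags"]] shows what was rejected
```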
