Takes the Kaggle dataset 'leetcode-solutions'
(https://www.kaggle.com/datasets/erichartford/leetcode-solutions) and turns its problems and solutions into basic
dialogue using a preset list of user prompt templates.

In [6]:
ONE_STEP_TEMPLATES = [    
    "Can you write a program in ${lang} where\n${content}",
    "How would you implement a function in ${lang} that\n${content}",
    "Write a ${lang} function for\n${content}",
    "Can you create a ${lang} program that\n${content}",
    "Implement a function in ${lang} to\n${content}",
    "Write a ${lang} script for\n${content}",
    "How would you code a program in ${lang} to\n${content}",
    "Create a ${lang} function for\n${content}",
    "Write a ${lang} program that can\n${content}",
    "Can you implement a function in ${lang} that\n${content}"
]
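
The `${lang}` and `${content}` placeholders happen to match the syntax of Python's standard `string.Template`, so a template can also be filled in one call. The following is a minimal sketch of that idea, not the approach used later in this notebook; the `fill_template` helper, the `demo_templates` list, and the sample problem text are hypothetical.

```python
import random
from string import Template

# Two of the templates from ONE_STEP_TEMPLATES, copied here so the sketch is self-contained.
demo_templates = [
    "Write a ${lang} function for\n${content}",
    "Implement a function in ${lang} to\n${content}",
]


def fill_template(template, lang, content):
    """Fill ${lang} and ${content} in a single pass (hypothetical helper)."""
    # safe_substitute does not raise on unknown placeholders, and the
    # substituted values themselves are never re-scanned for placeholders.
    return Template(template).safe_substitute(lang=lang, content=content)


rng = random.Random(0)  # seeded so the chosen template is reproducible
demo_content = "Given an array of integers, return the largest one."
demo_prompt = fill_template(rng.choice(demo_templates), "python", demo_content)
print(demo_prompt)
```

A seeded `random.Random` instance is used here only so that repeated runs pick the same template; the global `random.choice` works equally well when reproducibility is not needed.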

In [9]:
import os
import kaggle
import pandas as pd
import random
from IPython.display import display
from datasets import Dataset

data_source = "https://www.kaggle.com/datasets/erichartford/leetcode-solutions"
output_dir = "data"
os.makedirs(output_dir, exist_ok=True)

In [10]:
# Requires Kaggle API credentials (e.g. ~/.kaggle/kaggle.json)
kaggle.api.dataset_download_files("erichartford/leetcode-solutions", path=output_dir, unzip=True)

In [16]:
leetcode_solutions = pd.read_json(os.path.join(output_dir, "leetcode-solutions.jsonl"), lines=True)

# Build a dataframe with columns INSTRUCTION, RESPONSE, SOURCE.
# INSTRUCTION is a random choice from ONE_STEP_TEMPLATES with the language and content filled in.
# RESPONSE is the solution to the problem in that language.
# SOURCE is the URL of the dataset.
oa_leet10k = []
for _, row in leetcode_solutions.iterrows():
    content = row["content"]
    for lang in ["c++", "java", "javascript", "python"]:
        # row["answer"] maps a language name to its solution; skip languages without one
        if lang in row["answer"]:
            oa_leet10k.append(
                {
                    "INSTRUCTION": random.choice(ONE_STEP_TEMPLATES).replace("${lang}", lang).replace("${content}", content),
                    "RESPONSE": row["answer"][lang],
                    "SOURCE": data_source,
                }
            )
oa_leet10k = pd.DataFrame(oa_leet10k)

# Show the first 5 rows of the dataframe, left-aligned, with newline characters preserved in the text columns
with pd.option_context("display.max_colwidth", 80):
    display(
        oa_leet10k.head(5).style.set_properties(
            **{
                "text-align": "left",
                "white-space": "pre-wrap",
            }
        )
    )

Unnamed: 0,INSTRUCTION,RESPONSE,SOURCE
0,"How would you implement a function in c++ that Given an array of integers `nums` and an integer `target`, return _indices of the two numbers such that they add up to `target`_. You may assume that each input would have **_exactly_ one solution**, and you may not use the _same_ element twice. You can return the answer in any order. **Example 1:** **Input:** nums = \[2,7,11,15\], target = 9 **Output:** \[0,1\] **Explanation:** Because nums\[0\] + nums\[1\] == 9, we return \[0, 1\]. **Example 2:** **Input:** nums = \[3,2,4\], target = 6 **Output:** \[1,2\] **Example 3:** **Input:** nums = \[3,3\], target = 6 **Output:** \[0,1\] **Constraints:** * `2 <= nums.length <= 104` * `-109 <= nums[i] <= 109` * `-109 <= target <= 109` * **Only one valid answer exists.** **Follow-up:** Can you come up with an algorithm that is less than `O(n2)` time complexity?","```cpp #include #include std::vector twoSum(std::vector& nums, int target) {  std::unordered_map map;  for (int i = 0; i < nums.size(); i++) {  int complement = target - nums[i];  if (map.find(complement) != map.end()) {  return {map[complement], i};  }  map[nums[i]] = i;  }  return {}; } ```",https://www.kaggle.com/datasets/erichartford/leetcode-solutions
1,"Can you write a program in java where Given an array of integers `nums` and an integer `target`, return _indices of the two numbers such that they add up to `target`_. You may assume that each input would have **_exactly_ one solution**, and you may not use the _same_ element twice. You can return the answer in any order. **Example 1:** **Input:** nums = \[2,7,11,15\], target = 9 **Output:** \[0,1\] **Explanation:** Because nums\[0\] + nums\[1\] == 9, we return \[0, 1\]. **Example 2:** **Input:** nums = \[3,2,4\], target = 6 **Output:** \[1,2\] **Example 3:** **Input:** nums = \[3,3\], target = 6 **Output:** \[0,1\] **Constraints:** * `2 <= nums.length <= 104` * `-109 <= nums[i] <= 109` * `-109 <= target <= 109` * **Only one valid answer exists.** **Follow-up:** Can you come up with an algorithm that is less than `O(n2)` time complexity?","```java import java.util.HashMap; import java.util.Map; public int[] twoSum(int[] nums, int target) {  Map map = new HashMap<>();  for (int i = 0; i < nums.length; i++) {  int complement = target - nums[i];  if (map.containsKey(complement)) {  return new int[]{map.get(complement), i};  }  map.put(nums[i], i);  }  throw new IllegalArgumentException(""No two sum solution""); } ```",https://www.kaggle.com/datasets/erichartford/leetcode-solutions
2,"Implement a function in javascript to Given an array of integers `nums` and an integer `target`, return _indices of the two numbers such that they add up to `target`_. You may assume that each input would have **_exactly_ one solution**, and you may not use the _same_ element twice. You can return the answer in any order. **Example 1:** **Input:** nums = \[2,7,11,15\], target = 9 **Output:** \[0,1\] **Explanation:** Because nums\[0\] + nums\[1\] == 9, we return \[0, 1\]. **Example 2:** **Input:** nums = \[3,2,4\], target = 6 **Output:** \[1,2\] **Example 3:** **Input:** nums = \[3,3\], target = 6 **Output:** \[0,1\] **Constraints:** * `2 <= nums.length <= 104` * `-109 <= nums[i] <= 109` * `-109 <= target <= 109` * **Only one valid answer exists.** **Follow-up:** Can you come up with an algorithm that is less than `O(n2)` time complexity?","```javascript function twoSum(nums, target) {  const map = new Map();  for (let i = 0; i < nums.length; i++) {  const complement = target - nums[i];  if (map.has(complement)) {  return [map.get(complement), i];  }  map.set(nums[i], i);  }  return []; } ```",https://www.kaggle.com/datasets/erichartford/leetcode-solutions
3,"Can you write a program in python where Given an array of integers `nums` and an integer `target`, return _indices of the two numbers such that they add up to `target`_. You may assume that each input would have **_exactly_ one solution**, and you may not use the _same_ element twice. You can return the answer in any order. **Example 1:** **Input:** nums = \[2,7,11,15\], target = 9 **Output:** \[0,1\] **Explanation:** Because nums\[0\] + nums\[1\] == 9, we return \[0, 1\]. **Example 2:** **Input:** nums = \[3,2,4\], target = 6 **Output:** \[1,2\] **Example 3:** **Input:** nums = \[3,3\], target = 6 **Output:** \[0,1\] **Constraints:** * `2 <= nums.length <= 104` * `-109 <= nums[i] <= 109` * `-109 <= target <= 109` * **Only one valid answer exists.** **Follow-up:** Can you come up with an algorithm that is less than `O(n2)` time complexity?","```python def twoSum(nums, target):  map = {}  for i, num in enumerate(nums):  complement = target - num  if complement in map:  return [map[complement], i]  map[num] = i  return [] ```",https://www.kaggle.com/datasets/erichartford/leetcode-solutions
4,"Write a c++ script for You are given two **non-empty** linked lists representing two non-negative integers. The digits are stored in **reverse order**, and each of their nodes contains a single digit. Add the two numbers and return the sum as a linked list. You may assume the two numbers do not contain any leading zero, except the number 0 itself. **Example 1:** **Input:** l1 = \[2,4,3\], l2 = \[5,6,4\] **Output:** \[7,0,8\] **Explanation:** 342 + 465 = 807. **Example 2:** **Input:** l1 = \[0\], l2 = \[0\] **Output:** \[0\] **Example 3:** **Input:** l1 = \[9,9,9,9,9,9,9\], l2 = \[9,9,9,9\] **Output:** \[8,9,9,9,0,0,0,1\] **Constraints:** * The number of nodes in each linked list is in the range `[1, 100]`. * `0 <= Node.val <= 9` * It is guaranteed that the list represents a number that does not have leading zeros.","```cpp struct ListNode {  int val;  ListNode *next;  ListNode(int x) : val(x), next(nullptr) {} }; ListNode* addTwoNumbers(ListNode* l1, ListNode* l2) {  ListNode dummy(0);  ListNode* current = &dummy;  int carry = 0;  while (l1 || l2 || carry) {  int sum = (l1 ? l1->val : 0) + (l2 ? l2->val : 0) + carry;  carry = sum / 10;  current->next = new ListNode(sum % 10);  current = current->next;  if (l1) l1 = l1->next;  if (l2) l2 = l2->next;  }  return dummy.next; } ```",https://www.kaggle.com/datasets/erichartford/leetcode-solutions
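
The row-building loop above can be checked on a couple of hand-written records before running it on the full dataset. The sketch below uses only the standard library; `sample_records`, `sample_templates`, and `build_rows` are hypothetical names, and the record shape (a `content` string plus an `answer` mapping from language to solution) is inferred from how the loop indexes each row.

```python
import random

# Hypothetical stand-ins for two dataset rows; the real ones come from the JSONL file.
sample_records = [
    {
        "content": "Return the sum of two integers.",
        "answer": {
            "python": "def add(a, b):\n    return a + b",
            "c++": "int add(int a, int b) { return a + b; }",
        },
    },
    {
        "content": "Return the length of a string.",
        "answer": {"java": "int len(String s) { return s.length(); }"},
    },
]
sample_templates = [
    "Write a ${lang} function for\n${content}",
    "Create a ${lang} function for\n${content}",
]
LANGS = ["c++", "java", "javascript", "python"]
SOURCE_URL = "https://www.kaggle.com/datasets/erichartford/leetcode-solutions"


def build_rows(records, templates, source, seed=0):
    """Emit one INSTRUCTION/RESPONSE/SOURCE dict per (problem, language) pair."""
    rng = random.Random(seed)  # seeded so template choices are reproducible
    rows = []
    for record in records:
        for lang in LANGS:
            if lang not in record["answer"]:
                continue  # no solution in this language
            template = rng.choice(templates)
            instruction = template.replace("${lang}", lang).replace("${content}", record["content"])
            rows.append(
                {"INSTRUCTION": instruction, "RESPONSE": record["answer"][lang], "SOURCE": source}
            )
    return rows


sample_rows = build_rows(sample_records, sample_templates, SOURCE_URL)
for sample_row in sample_rows:
    print(repr(sample_row["INSTRUCTION"]))
```

Passing a fixed `seed` is a design choice for this sketch only: it makes the generated instructions repeatable across runs, which the notebook's use of the global `random.choice` does not guarantee.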


In [18]:
# Write the dataset to Parquet and load it back as a Hugging Face Dataset
oa_leet10k.to_parquet("oa_leet10k.parquet", row_group_size=100, engine="pyarrow")
ds = Dataset.from_parquet("oa_leet10k.parquet")
# Uncomment to push the dataset to the Hugging Face Hub
# ds.push_to_hub("dctanner/oa_recipes")

Downloading and preparing dataset parquet/default to /home/eric/.cache/huggingface/datasets/parquet/default-eefae5a05b69c25e/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec...


Downloading data files: 100%|██████████| 1/1 [00:00<00:00, 3155.98it/s]
Extracting data files: 100%|██████████| 1/1 [00:00<00:00, 533.36it/s]
                                                                  

Dataset parquet downloaded and prepared to /home/eric/.cache/huggingface/datasets/parquet/default-eefae5a05b69c25e/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec. Subsequent calls will reuse this data.


