<a href="https://colab.research.google.com/github/CherpanovNazim/learn-llm/blob/main/notebooks/05_Code_generation_and_conversion_.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Code generation and conversion
In this notebook, we'll solve some tasks that are related to code generation:

* Creating a Python function according to a description.
* Creating a function with additional constraints.
* Creating a function from an input-output example.
* Code conversion to optimize a function.
* Infilling the missing part of a function.

All tasks are solved using the following steps:
- Generate initial implementation of a function.
- Generate unit tests using hints to steer the generation.
- Iteratively prompt LLM to fix the issues that appear when running unit tests.
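
The steps above can be condensed into a short control loop. The sketch below is only an illustration: `generate`, `run_tests`, and `fix` are hypothetical callables standing in for the LLM calls and the test runner defined later in this notebook, and test generation is assumed to be folded into `run_tests`.

```python
from typing import Callable


def generate_test_fix_loop(
    generate: Callable[[], str],
    run_tests: Callable[[str], list[str]],
    fix: Callable[[str, list[str]], str],
    max_attempts: int = 5,
) -> tuple[str, list[str]]:
    """Keep the candidate with the fewest failing tests across attempts."""
    best_code = generate()
    best_issues = run_tests(best_code)
    for _ in range(max_attempts):
        if not best_issues:
            break  # all tests pass, nothing left to fix
        candidate = fix(best_code, best_issues)
        issues = run_tests(candidate)
        # Accept the candidate only if it is not worse than the best so far.
        if len(issues) <= len(best_issues):
            best_code, best_issues = candidate, issues
    return best_code, best_issues


# Toy stand-ins (assumptions for illustration): the "code" is a string that
# encodes how many bugs it still has, and each fix removes one bug.
def fake_generate() -> str:
    return "bugs=2"


def fake_run_tests(code: str) -> list[str]:
    return [f"issue {i}" for i in range(int(code.split("=")[1]))]


def fake_fix(code: str, issues: list[str]) -> str:
    return f"bugs={len(issues) - 1}"


final_code, final_issues = generate_test_fix_loop(fake_generate, fake_run_tests, fake_fix)
print(final_code, final_issues)
```

With the toy stand-ins, the loop stops as soon as the list of issues is empty.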

Be careful!
Running the code that was generated by LLM can be dangerous!
To minimize security risks, make sure you run it in an isolated sandbox environment!
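
As a minimal sketch of one precaution, generated code can be run in a separate interpreter process with a timeout. The helper name `run_in_subprocess` is hypothetical and is not part of this notebook's utilities. Note that this gives only process separation and a time limit; it is not a security sandbox, so a container or VM is still advisable.

```python
import subprocess
import sys


def run_in_subprocess(code: str, timeout_s: float = 5.0) -> tuple[int, str, str]:
    """Run code in a fresh interpreter and return (exit code, stdout, stderr)."""
    completed = subprocess.run(
        # -I: isolated mode (ignores PYTHON* environment variables
        #     and the user site-packages directory)
        [sys.executable, "-I", "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout_s,  # raises subprocess.TimeoutExpired when exceeded
    )
    return completed.returncode, completed.stdout, completed.stderr


exit_code, stdout, stderr = run_in_subprocess("print(1 + 1)")
print(exit_code, stdout.strip())
```

A crash or an endless loop in the generated code then affects only the child process, not the notebook kernel.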

<hr style="border-top: 1px solid rgb(230, 230, 230);">

* [GitHub Repo](https://github.com/CherpanovNazim/learn-llm)

<hr style="border-top: 1px solid rgb(230, 230, 230);">

In [None]:
!pip install -qU openai==1.35.14 langchain==0.2.13 pandas==2.2.2 vllm==0.5.4 transformers==4.44.0 langchain-community==0.2.12

In [None]:
%%time
# wait ~5 min for installations

import json
import random
import unittest

import openai
from tqdm import tqdm

!wget -q 'https://raw.githubusercontent.com/CherpanovNazim/learn-llm/main/notebooks/utils/llm_for_code.py' -O llm_for_code.py
!python3 llm_for_code.py

from llm_for_code import (
    contains_class,
    contains_function,
    extract_code,
    run_unit_tests,
    wrap_code,
)

# Download widgets from GitHub
!wget -q 'https://raw.githubusercontent.com/CherpanovNazim/learn-llm/main/notebooks/utils/widgets.py' -O widgets.py
!python3 widgets.py

from widgets import LLMCallVisualiser


# Load the default model
DEFAULT_MODEL = {"model": "PrunaAI/ibm-granite-granite-8b-code-instruct-AWQ-4bit-smashed", "api_base": "http://localhost:8000/v1", "api_key": "EMPTY"}

# Run vLLM
!nohup vllm serve {DEFAULT_MODEL['model']} --quantization awq --max-model-len=4096 > vllm.log &
!tail -f vllm.log | grep -q "Uvicorn running" && echo "Now you can start using the model"


In [None]:
# Set the base URL and API key.
# For production apps, prefer a secret management system and never store the key in a git repo :)
client = openai.OpenAI(base_url=DEFAULT_MODEL["api_base"], api_key=DEFAULT_MODEL["api_key"])

def chat_completion(
    prompt,
    system_prompt: str = None,
    temperature: float = 0.,
    max_tokens: int = None,
    llm_call_visualiser: LLMCallVisualiser = None,
    **kwargs
) -> str:
    if system_prompt is None:
        # Here we define the output format
        system_prompt = "Generate some reasoning and then the python"\
                        " code wrapped into ```python\n...\n``` block."

    response = client.chat.completions.create(
        model=DEFAULT_MODEL["model"],
        temperature=temperature,
        max_tokens=max_tokens,
        frequency_penalty=1,
        timeout=120,
        messages=[{"role": "system", "content": system_prompt},
                  {"role": "user", "content": prompt}],
        stream=True,  # To see generation in real-time
        **kwargs)

    try:
        completion = ""
        for chunk in response:
            chunk_text = chunk.choices[0].delta.content
            if chunk_text:
                completion += chunk_text

                # Update llm_call_visualiser UI element with next
                #   chunk of streamed LLM output.
                if llm_call_visualiser is not None:
                    llm_call_visualiser.append_streamed_text(chunk_text)
    finally:
        if llm_call_visualiser is not None:
            llm_call_visualiser.finalize_streamed_text()

    return completion.strip(" \n")
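
The default system prompt asks for reasoning followed by a fenced Python block, so the reply has to be parsed before the code can be used. This notebook relies on `extract_code` from `llm_for_code.py` for that; the sketch below is a simplified, hypothetical stand-in (`extract_python_block`) that only shows the idea and does not reproduce the real helper's options.

```python
import re
from typing import Optional

# Built programmatically so that this example contains no literal fence.
FENCE = "`" * 3


def extract_python_block(llm_output: str) -> Optional[str]:
    """Return the body of the first fenced python block, or None if absent."""
    pattern = re.compile(FENCE + r"python\n(.*?)" + FENCE, re.DOTALL)
    match = pattern.search(llm_output)
    return match.group(1).strip() if match else None


reply = (
    "First I will think about the task.\n"
    + FENCE + "python\n"
    + "def f():\n    return 1\n"
    + FENCE + "\nDone."
)
print(extract_python_block(reply))
```

Returning `None` for a reply without a code block lets the caller decide whether to retry the generation.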

# Defining code generation tasks

In [None]:
code_generation_tasks_examples = [
    # Example 0:
    # Creating a python function according to description.
    {
        # The description contains an example of what versions look like,
        #   rather than describing the version composition rules.
        "function_generation_instruction":
            "Write version_compare function."
            " It must be able to compare versions like:"
            " version_compare('1.2.3', '1.2.13.7').",

        # Less capable models might miss some important types of checks.
        # We can explicitly ask which test cases to include if the LLM can't come up with those itself.
        "unit_testing_hints":
            "Include tests with different version parts lengths comparison:"
            " version_compare('1.2.3', '1.2.13.7').",
    },

    # Example 1:
    # Creating a function, constraining not to wrap well-known implementation.
    {
        # If we want the full function code that is not simply a wrapper for the
        #   base64 implementation, then we can add such a constraint.
        "function_generation_instruction":
            "Write str_to_base64 function. It must not use base64 lib.",

        # When generating unit tests, LLM might hallucinate the base64 encoded
        #   expected values. So we ask LLM to compare to the values that are
        #   being produced by the base64.b64encode function.
        # Set the hint to None to see if LLM can generate robust expected values without it.
        "unit_testing_hints":
            "The last line of each test case MUST be exactly (up to a single character):"
            " self.assertEqual(str_to_base64(test_str), base64.b64encode(test_str.encode()).decode())",
    },

    # Example 2:
    # Creating a function, from inputs-outputs example.
    {
        # Sometimes we might need LLM to propose the implementation, when
        #   only knowing the inputs-outputs of some function.
        "function_generation_instruction":
            "Write function implementation and name it appropriately. When inputs are invalid, raise exception.\n"
            "\n"
            "def name_this_fn_appropriately(a: list[list[float]],"
            " b: list[list[float]]) -> list[list[float]]\n"
            "Input : a = [[1, 7, 3],\n"
            "             [3, 5, 6],\n"
            "             [6, 8, 9]]\n"
            "        b = [[1, 1, 1, 2],\n"
            "            [6, 7, 3, 0],\n"
            "            [4, 5, 9, 1]]\n"
            "Output : [55, 65, 49, 5]\n"
            "         [57, 68, 72, 12]\n"
            "         [90, 107, 111, 21]",

        # When generating unit tests, LLM might generate incorrect
        #   expected values. So we hope to reduce those by recommending
        #   using numpy.
        "unit_testing_hints":
            "Use numpy to generate expected values in test cases.",
    },

    # Example 3:
    # Code conversion.
    {
        # Here we ask to vectorize the function.
        "function_generation_instruction":
            "Add vectorized function variant, so that function inputs, calculations"
            " and outputs are all vectorized. Name new function as vectorized_calculate,"
            " and output BOTH nonvectorized_calculate and vectorized_calculate functions:\n"
            "\n"
            "def nonvectorized_calculate(a, b):\n"
            "    if not a or not b:\n"
            "        return []\n"
            "\n"
            "    if len(a[0]) != len(b):\n"
            "        raise ValueError('Number of columns in the first matrix must be equal"
            " to the number of rows in the second matrix')\n"
            "\n"
            "    result = [[0 for _ in range(len(b[0]))] for _ in range(len(a))]\n"
            "\n"
            "    for i in range(len(a)):\n"
            "        for j in range(len(b[0])):\n"
            "            for k in range(len(b)):\n"
            "                result[i][j] += a[i][k] * b[k][j]\n"
            "\n"
            "    for i in range(len(result)):\n"
            "        for j in range(len(result[0])):\n"
            "            result[i][j] = result[i][j] * result[i][j]\n"
            "\n"
            "    return result\n",

        # We guide the model how to generate expected values.
        # Set the hint to None to see if LLM can generate robust expected values without it.
        "unit_testing_hints":
            "Function to test is vectorized_calculate. Generate expected values for"
            " testing it using nonvectorized_calculate function and NEVER change its code.",
    },

    # Example 4:
    # Infilling the missing part of a function.
    {
        # Sometimes we need to reconstruct only a part of the code.
        # But this is a relatively hard task for less capable LLMs, which might
        #   change the code outside of the MISSING PART.
        "function_generation_instruction":
            "Generate the MISSING part, but output the whole function"
            " keeping EXACT COPY of the rest of the code.\n"
            "\n"
            "def compare_if_nested_lists_are_equal(l1: list, l2: list) -> bool:\n"
            "    \"\"\"\n"
            "    Compares nested lists, which can have any depth (e.g. \n"
            "      list[list[list[float]]]) and arbitrary lengths of lists\n"
            "    \"\"\"\n"
            "    \n"
            "    result = None\n"
            "    \n"
            "    # MISSING PART\n"
            "    \n"
            "            else:\n"
            "                if l1[i] != l2[i]:\n"
            "                    result = False\n"
            "                    break\n"
            "    \n"
            "    return result\n",

        # We ask the model not to generate too deeply nested lists as expected values.
        # Otherwise long chains might be wrong. For instance,
        #   "[1, [2, [3, [4, [5, [6, [7, [8, [9, [10]]]]]]]]]" misses 1 closing "]".
        "unit_testing_hints":
            "Use maximum depth of 4 for nested lists in test cases.",
    },
]
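
To illustrate the hint in Example 1, here is a sketch of a test whose expected values are computed by a trusted reference (`base64.b64encode`) instead of being hard-coded. The `str_to_base64` body is a hand-written stand-in for what an LLM might produce, not actual model output, and the test class name is an assumption.

```python
import base64
import unittest

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def str_to_base64(text: str) -> str:
    """Hand-rolled Base64 encoder (illustrative stand-in for LLM output)."""
    data = text.encode()
    encoded = []
    for start in range(0, len(data), 3):
        chunk = data[start:start + 3]
        # Pack up to 3 bytes into 24 bits, zero-padding a short final chunk.
        bits = int.from_bytes(chunk.ljust(3, b"\x00"), "big")
        # Split the 24 bits into four 6-bit indices into the alphabet.
        chars = [ALPHABET[(bits >> shift) & 0x3F] for shift in (18, 12, 6, 0)]
        # A chunk of n bytes yields n + 1 meaningful characters; pad the rest.
        keep = len(chunk) + 1
        encoded.append("".join(chars[:keep]) + "=" * (4 - keep))
    return "".join(encoded)


class TestStrToBase64(unittest.TestCase):
    def test_matches_reference_encoder(self):
        for test_str in ["", "a", "ab", "abc", "hello world", "naïve ☕"]:
            with self.subTest(test_str=test_str):
                # Expected value is computed, never typed in by hand.
                self.assertEqual(
                    str_to_base64(test_str),
                    base64.b64encode(test_str.encode()).decode(),
                )
```

Because the expected side of each assertion comes from the standard library, a hallucinated Base64 string cannot sneak into the tests.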

# Select code generation task

In [None]:
# Here you can choose the example with the function generation task definition
#   from the code_generation_tasks_examples
#
task_example_id = 0  # Change this to select a different task
#
print(f"Selected code generation task example id:\n{'-' * 64}\n{task_example_id}\n\n")

code_generation_task = code_generation_tasks_examples[task_example_id]

function_generation_instruction = code_generation_task["function_generation_instruction"]
print(f"Function generation instruction:\n{'-' * 64}\n{function_generation_instruction}\n\n")

unit_testing_hints = code_generation_task["unit_testing_hints"]
print(f"Function unit testing hints:\n{'-' * 64}\n{unit_testing_hints}")

# Initial code generation

In [None]:
def generate_code(
    prompt: str,
) -> tuple[str, str]:
    """Generating code according to prompt"""

    # Output UI element that will visualize the LLM call
    llm_call_visualiser = LLMCallVisualiser(prompt=prompt)
    llm_call_visualiser.display()

    llm_results = chat_completion(
        prompt,
        temperature=0.15,  # To make generations more diverse (creativity)
        llm_call_visualiser=llm_call_visualiser)

    # Here we only extract the generated function without imports section.
    # That's ok for us, since we'll have a bug-fix stage, which will recover those.
    generated_code = extract_code(
        llm_results,
        extract_only_funcs_and_classes=True,
        remove_starting_code=True)

    # Show generated code
    if llm_call_visualiser is not None:
        llm_call_visualiser.update_content(
            tab_name="Generated Code",
            text=generated_code,
            change_tab=True)

    return llm_results, generated_code

In [None]:
print(f"Generating function...")

# Initial function code generation is done here.
llm_results, generated_code = generate_code(
    prompt=function_generation_instruction)
assert contains_function(generated_code), "Function is missing!"\
                                          " Rerun the cell until assertion passes."

# Load the code into global context by running it.
# Generated function is not being called here, just defined.
exec(generated_code, globals())
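
The cell above loads the generated function into the notebook's global scope. As an alternative sketch, the code can be checked with `ast` and then executed into a dedicated dictionary, which keeps generated names separate from notebook variables. The helper `defines_function` is hypothetical and only approximates what `contains_function` might check.

```python
import ast


def defines_function(source: str) -> bool:
    """True if source parses and defines at least one top-level function."""
    try:
        tree = ast.parse(source)
    except SyntaxError:
        return False
    return any(
        isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        for node in tree.body
    )


generated = "def add(a, b):\n    return a + b\n"

namespace: dict = {}
if defines_function(generated):
    # Definitions land in `namespace` only, not in globals().
    exec(generated, namespace)

print(namespace["add"](2, 3))
```

A separate namespace does not make `exec` safe; it only prevents generated definitions from overwriting existing notebook names.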

# Creating unit tests using LLM

In [None]:
def create_unit_tests(
    code_to_be_tested: str,
    unit_testing_hints: str = None,
) -> tuple[str, str]:
    """
    Creating unit tests for code_to_be_tested taking into account unit_testing_hints.
    """

    if unit_testing_hints is None:
        unit_testing_hints = ""

    prompt = "\n".join([
        f"Write a perfectly runnable Test(unittest.TestCase) with several fully"
            f" implemented test cases including edge cases.",
        f"{unit_testing_hints}",
        f"",
        f"Function to be tested:",
        f"{wrap_code(code_to_be_tested)}",
    ])

    # Output UI element that will visualize the LLM call
    llm_call_visualiser = LLMCallVisualiser(prompt=prompt)
    llm_call_visualiser.display()

    llm_results = chat_completion(
        prompt,
        temperature=0.15,
        llm_call_visualiser=llm_call_visualiser)

    # We are interested only in code within the answer
    generated_code = extract_code(
        llm_results,
        extract_only_funcs_and_classes=True,
        remove_starting_code=True)

    # Show generated code
    if llm_call_visualiser is not None:
        llm_call_visualiser.update_content(
            tab_name="Generated Code",
            text=generated_code,
            change_tab=True)

    # Make sure we return a concatenation of tested code + unit tests
    if contains_function(generated_code):
        # If LLM included tested code along with unit tests, then ok.
        code_with_unit_tests = generated_code
    else:
        # Prepend the generated unit tests with the tested code.
        code_with_unit_tests = f"{code_to_be_tested}\n\n{generated_code}"

    return llm_results, code_with_unit_tests

In [None]:
print(f"Generating unit tests...")

# Initial unit tests code generation is done here.
llm_results, code_with_unit_tests = create_unit_tests(
    generated_code,
    unit_testing_hints)
assert contains_class(code_with_unit_tests), "Unit tests class is missing!"\
                                             " Rerun the cell until assertion passes."

# Make sure there is at least 1 issue after running unit tests

In [None]:
tests_were_runnable, unit_tests_issues = run_unit_tests(code_with_unit_tests, scope_namespace=globals())
if len(unit_tests_issues) == 0:
    # Introduce a simple bug by dropping the first character of the code
    #   (e.g. a leading "def" becomes "ef", which is a syntax error).
    # We need at least 1 bug to demonstrate the bug-fix
    code_with_unit_tests = code_with_unit_tests[1:]
    tests_were_runnable, unit_tests_issues = run_unit_tests(code_with_unit_tests, scope_namespace=globals())

print(f"Unit tests were runnable:\n{'-' * 64}\n{tests_were_runnable}\n\n")
print(f"Issues count:\n{'-' * 64}\n{len(unit_tests_issues)}\n\n")
unit_tests_issues_str = "\n\n\n".join(
    [f"{i}. {issue}"
     for i, issue in enumerate(unit_tests_issues, start=1)]) or "None"
print(f"Issues:\n{'-' * 64}\n{unit_tests_issues_str}")
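
For readers curious how issues could be collected programmatically, below is a minimal sketch built on the standard `unittest` API. It is an assumption about the general approach, not the implementation of `run_unit_tests` from `llm_for_code.py`; the name `collect_test_issues` is hypothetical. It uses `exec` in the current process, so the safety warning from the top of this notebook applies.

```python
import io
import unittest


def collect_test_issues(source: str) -> tuple[bool, list[str]]:
    """Return (tests_were_runnable, issues) for code that defines TestCase classes."""
    namespace = {"__name__": "generated_tests"}
    try:
        exec(source, namespace)  # define the function and the test classes
    except Exception as exc:  # includes SyntaxError, NameError, ImportError
        return False, [f"{type(exc).__name__}: {exc}"]

    loader = unittest.TestLoader()
    suite = unittest.TestSuite()
    for obj in list(namespace.values()):
        is_test_case = isinstance(obj, type) and issubclass(obj, unittest.TestCase)
        if is_test_case and obj is not unittest.TestCase:
            suite.addTests(loader.loadTestsFromTestCase(obj))

    # Send the runner's report to a buffer; we only need the structured result.
    result = unittest.TextTestRunner(stream=io.StringIO(), verbosity=0).run(suite)
    return True, [text for _, text in result.failures + result.errors]


SAMPLE = '''
import unittest

def add(a, b):
    return a + b

class TestAdd(unittest.TestCase):
    def test_ok(self):
        self.assertEqual(add(2, 3), 5)

    def test_wrong_expectation(self):
        self.assertEqual(add(2, 2), 5)
'''

runnable, issues = collect_test_issues(SAMPLE)
print(runnable, len(issues))
```

Each issue is the traceback text of one failing or erroring test, which is the kind of text that can be pasted into a bug-fix prompt.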

# Try fixing bugs using LLM

In [None]:
def try_to_fix_bugs(
    code: str,
    code_issues: list[str],
    generation_hints: str = None,
    max_issues_shown_to_llm: int = 2,
) -> tuple[str, str, bool, list[str]]:
    """
    Trying to fix bugs with 1 LLM call
    """

    code_issues_copy = code_issues.copy()
    random.shuffle(code_issues_copy)
    issues_shown_to_llm = code_issues_copy[:max_issues_shown_to_llm]
    issues_shown_to_llm_str = "\n\n\n".join(
        [f"{i}. {issue}"
         for i, issue in enumerate(issues_shown_to_llm, start=1)]) or "None"

    if generation_hints is None:
        generation_hints = ""

    prompt = "\n".join([
        f"Instruction:",

        # Here we encourage the model to think before generation (Chain-of-Thought),
        #   which in general improves answers and allows you to see how the LLM thinks.
        f"1. Analyze failing test cases in plain text."
            f" Remember that sometimes bugs are within the test cases.",

        # This is to prevent LLM from skipping tests that succeed.
        f"2. Rewrite the whole Code including tested function and FULL implementations"
            f" of ALL unit tests, so that all Tests Issues are resolved.",

        # To remember unit testing hints.
        f"{generation_hints}",

        f"",
        f"Code:",
        f"{wrap_code(code)}",
        f"",
        f"Tests Issues ({len(issues_shown_to_llm)} out of {len(code_issues)}):",
        f"```",
        f"{issues_shown_to_llm_str}",
        f"```",
        f"",

        # # This is not to ignore steps from Instruction above.
        # f"Number each step from Instruction."
    ])

    # Output UI element that will visualize the LLM call.
    llm_call_visualiser = LLMCallVisualiser(prompt=prompt)
    llm_call_visualiser.display()

    llm_results = chat_completion(
        prompt,
        temperature=0.15,
        llm_call_visualiser=llm_call_visualiser)

    if len(llm_results.split("```")) % 2 != 1:
        print(f"Probably, fixed code is not complete (not all of ``` have pairs).\n")
        tests_were_runnable = False
        unit_tests_issues = ["Valid code block wasn't found"]
    else:
        candidate_code = extract_code(
            llm_results,
            extract_only_funcs_and_classes=False,
            remove_starting_code=True)

        # Show generated code
        if llm_call_visualiser is not None:
            llm_call_visualiser.update_content(
                tab_name="Generated Code",
                text=candidate_code,
                change_tab=True)

        if not contains_function(candidate_code) or not contains_class(candidate_code):
            tests_were_runnable = False

            unit_tests_issues = []
            if not contains_function(candidate_code):
                unit_tests_issues.append("Fixed code misses tested function.")
            if not contains_class(candidate_code):
                unit_tests_issues.append("Fixed code misses unit test definition.")

            print("\n".join(unit_tests_issues))
        else:
            code = candidate_code

            tests_were_runnable, unit_tests_issues = run_unit_tests(
                code,
                scope_namespace=globals())
            unit_tests_issues_str = "\n\n\n".join(
                [f"{i}. {issue}"
                 for i, issue in enumerate(unit_tests_issues, start=1)]) or "None"

            # Show unit tests issues
            if llm_call_visualiser is not None:
                llm_call_visualiser.update_content(
                    tab_name="Code Issues",
                    text=unit_tests_issues_str,
                    change_tab=(len(unit_tests_issues) > 0))

    return llm_results, code, tests_were_runnable, unit_tests_issues

def fix_bugs_in_several_attempts(
    code_with_unit_tests: str,
    tests_were_runnable: bool = None,
    unit_tests_issues: list[str] = None,
    generation_hints: str = None,
    attempts_count: int = 20,
) -> tuple[str, list[str]]:
    """
    Trying to fix bugs iteratively using several attempts
    """

    if tests_were_runnable is None or unit_tests_issues is None:
        tests_were_runnable, unit_tests_issues = run_unit_tests(
            code_with_unit_tests,
            scope_namespace=globals())

    # Initialize best known solution as the initial one
    total_issues_count = len(unit_tests_issues) if tests_were_runnable else 1000000
    total_issues_count_str = str(total_issues_count) if tests_were_runnable else "Unit tests weren't runnable"
    best_code_with_unit_tests = code_with_unit_tests
    best_unit_tests_issues = unit_tests_issues
    best_issues_count = total_issues_count
    best_issues_count_str = total_issues_count_str

    print(f"Initial number of unit tests issues: {total_issues_count_str}\n")

    for attempt in range(1, attempts_count + 1):
        if total_issues_count == 0:
            print("All unit tests succeeded!\n")
            break

        print(f"\n{'=' * 4} Bug-fix attempt {attempt} out of {attempts_count} {'=' * 48}")

        llm_results, code_with_unit_tests, tests_were_runnable, unit_tests_issues = try_to_fix_bugs(
            best_code_with_unit_tests,
            best_unit_tests_issues,
            generation_hints=generation_hints,
            max_issues_shown_to_llm=2,  # We keep it small not to make the prompt huge
        )

        if tests_were_runnable:
            # Update best known solution if results are not worse than previous one
            total_issues_count = len(unit_tests_issues)
            total_issues_count_str = str(total_issues_count)
            if total_issues_count <= best_issues_count:
                best_code_with_unit_tests = code_with_unit_tests
                best_unit_tests_issues = unit_tests_issues
                best_issues_count = total_issues_count
                best_issues_count_str = str(best_issues_count)
        else:
            total_issues_count_str = "Unit tests weren't runnable"

        print(f"Current attempt unit tests issues: {total_issues_count_str};"
              f" Best issues count so far: {best_issues_count_str}\n")

    return best_code_with_unit_tests, best_unit_tests_issues
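
The completeness check inside `try_to_fix_bugs` relies on a small counting argument: n fence markers split a string into n + 1 pieces, so an odd number of pieces means every opening fence has a closing one. A standalone sketch (the helper name `fences_are_paired` is hypothetical):

```python
# Built programmatically so that this example contains no literal fence.
FENCE = "`" * 3


def fences_are_paired(text: str) -> bool:
    """True when the number of fence markers in text is even."""
    return len(text.split(FENCE)) % 2 == 1


complete = "Reasoning\n" + FENCE + "python\nprint(1)\n" + FENCE + "\n"
truncated = "Reasoning\n" + FENCE + "python\nprint(1)"

print(fences_are_paired(complete), fences_are_paired(truncated))
```

A reply with no fences at all also passes this check, which is why the function afterwards verifies that the extracted code contains both a function and a test class.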

In [None]:
# Actual bug-fix starts here.
# Be careful! Sometimes the LLM is not able to fix all issues (if so, you can try again
#   starting from the "Initial code generation" section).
# It might even cheat by tuning the unit tests so that they pass on buggy code!
# Also, the code doesn't check whether all required libs are installed.
# Note: The most capable models like GPT-4 (as of 2024-04-15) can easily solve all of the examples.
code_with_unit_tests, unit_tests_issues = fix_bugs_in_several_attempts(
    code_with_unit_tests,
    tests_were_runnable=tests_were_runnable,
    unit_tests_issues=unit_tests_issues,
    generation_hints=unit_testing_hints,
)


Now you can go back to the **Select code generation task** section and try other examples by changing the *task_example_id* variable.

# Summary

- Code generation can be guided by textual description, defining constraints, showing input-output examples, etc.
- Unit tests for code validation can be generated, but when possible, prefer prompting to use calculated expected values.
- Generated code and unit tests can be iteratively improved by asking LLM to fix the issues.
- Final code is not guaranteed to be bug-free even if all generated tests pass.
- We covered only relatively simple functions. More complex ones might need a more sophisticated bug-fixing flow (e.g. adding debug printing, passing previous solutions within the prompt, etc.).
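
As one possible shape for the "passing previous solutions within the prompt" idea, the sketch below appends a short history of unsuccessful attempts to a fix prompt. The function name, prompt wording, and `max_history` parameter are all assumptions for illustration; this notebook's `try_to_fix_bugs` does not do this.

```python
def build_fix_prompt_with_history(
    code: str,
    issues: list[str],
    previous_attempts: list[tuple[str, int]],
    max_history: int = 2,
) -> str:
    """Build a fix prompt that also shows the most recent failed attempts.

    previous_attempts holds (attempt_code, remaining_issue_count) pairs,
    oldest first.
    """
    lines = [
        "Fix the code so that all listed test issues are resolved.",
        "",
        "Code:",
        code,
        "",
        "Test issues:",
    ]
    lines += [f"{i}. {issue}" for i, issue in enumerate(issues, start=1)]

    # Keep the history short so that the prompt fits into the context window.
    recent = previous_attempts[-max_history:] if max_history > 0 else []
    if recent:
        lines += ["", "Previous attempts that did NOT resolve the issues (do not repeat them):"]
        for n, (attempt_code, issue_count) in enumerate(recent, start=1):
            lines += [f"Attempt {n} ({issue_count} issues remaining):", attempt_code]
    return "\n".join(lines)


prompt = build_fix_prompt_with_history(
    code="CURRENT_CODE",
    issues=["AssertionError: 4 != 5"],
    previous_attempts=[("CODE_A", 3), ("CODE_B", 2), ("CODE_C", 2)],
)
print(prompt)
```

Only the two most recent attempts are included by default, so the oldest one (`CODE_A`) is left out of the prompt.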

# Homework

1. Play with the existing examples to see how they work without unit_testing_hints, or modify function_generation_instruction.

2. Using an LLM, generate JSON with countries, the continents they are located on, their capitals, and their populations.

3. Extend code_generation_tasks_examples above with a task to generate a function that loads the generated JSON as a pandas DataFrame.

4. Generate one more function to show the resulting DataFrame as a bar chart of continent populations.

Good luck!

# Other cool LLM stuff related to coding

Here is a link to a comprehensive review of LLM research for code: https://github.com/codefuse-ai/Awesome-Code-LLM.
It was initially introduced for the paper "Unifying the Perspectives of NLP and Software Engineering: A Survey on Language Models for Code" (https://arxiv.org/abs/2311.07989), but it is updated frequently (last updated on 2024-04-15, which was yesterday as of writing).

It contains references to papers, code, and models for most of the popular code-related tasks.
Here is the **Table of Contents**:

1. [Surveys](#1-surveys)

2. [Models](#2-models)

   2.1 [Off-the-Shelf LLM](#21-off-the-shelf-llm)

   2.2 [Existing LLM Adapted to Code](#22-existing-llm-adapted-to-code)

   2.3 [General Pretraining on Code](#23-general-pretraining-on-code)

   - [Encoder](#encoder)
   - [Decoder](#decoder)
   - [Encoder-Decoder](#encoder-decoder)
   - [UniLM](#unilm)

   2.4 [(Instruction) Fine-Tuning on Code](#24-instruction-fine-tuning-on-code)

   2.5 [Reinforcement Learning on Code](#25-reinforcement-learning-on-code)

3. [When Coding Meets Reasoning](#3-when-coding-meets-reasoning)

   3.1 [Coding for Reasoning](#31-coding-for-reasoning)

   3.2 [Code Simulation](#32-code-simulation)

   3.3 [Coding via Planning](#33-coding-via-planning)

   3.4 [Interactive Coding](#34-interactive-coding)

4. [Code LLM for Low-Resource, Low-Level, and Domain-Specific Languages](#4-code-llm-for-low-resource-low-level-and-domain-specific-languages)

5. [Methods/Models for Downstream Tasks](#5-methodsmodels-for-downstream-tasks)

   - [Code Generation](#code-generation)
   - [Code Translation](#code-translation)
   - [Code Summarization](#code-summarization)
   - [Program Repair](#program-repair)
   - [Vulnerability Detection](#vulnerability-detection)
   - [Type Prediction](#type-prediction)
   - [Malicious Code Detection](#malicious-code-detection)
   - [Repository-Level Coding](#repository-level-coding)
   - [Compiler Optimization](#compiler-optimization)
   - [Frontend Development & Web Agents](#frontend-development--web-agents)
   - [Decompilation](#decompilation)
   - [Test Generation](#test-generation)

6. [Analysis of AI-Generated Code](#6-analysis-of-ai-generated-code)

7. [User-LLM Interaction](#7-user-llm-interaction)

8. [Datasets](#8-datasets)

   8.1 [Pretraining](#81-pretraining)

   8.2 [Benchmarks](#82-benchmarks)

   - [Program Synthesis](#program-synthesis)
   - [Text-to-SQL](#text-to-sql)
   - [Code Translation](#code-translation-1)
   - [Program Repair](#program-repair-1)
   - [Code Summarization](#code-summarization-1)
   - [Defect/Vulnerability Detection](#defectvulnerability-detection)
   - [Code Retrieval](#code-retrieval)
   - [Type Inference](#type-inference)
   - [Commit Message Generation](#commit-message-generation)
   - [Repo-Level Coding](#repo-level-coding)

9. [Recommended Readings](#9-recommended-readings)

10. [Citation](#citation)

11. [Star History](#star-history)

12. [Join Us](#join-us)
