# Code generation

In addition to being trained on massive volumes of text, many of today's LLMs were trained on billions of lines of code. LLMs such as `GPT-4o`, `Gemini 1.5 Flash`, and `Llama 3.2` can generate code, comment code, review code, optimize code, find bugs in code, and more, and they support dozens of programming languages. Let's try a few examples to see what `GPT-4o` can do. Start by using it to implement a bubble sort in Python:

In [None]:
from openai import OpenAI
from IPython.display import Code, display

client = OpenAI(api_key='OPENAI_API_KEY') # Replace the placeholder with your OpenAI API key

prompt = '''
    Create a Python function that accepts an array of numbers as
    input, bubble sorts the numbers, and returns a sorted array.
    Return the source code only. Do not use markdown.
    '''

messages = [{ 'role': 'user', 'content': prompt }]

response = client.chat.completions.create(
    model='gpt-4o',
    messages=messages,
    temperature=0 # Use a low temperature for code generation
)

display(Code(response.choices[0].message.content, language='python'))
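Generated code is worth checking before you rely on it. As a point of comparison, here is a minimal hand-written bubble sort. It is a reference sketch rather than the model's actual output, and the name `bubble_sort` is chosen purely for illustration.

```python
def bubble_sort(numbers):
    """Return a new list containing the numbers sorted in ascending order."""
    result = list(numbers)  # Copy so the caller's list is left untouched
    n = len(result)
    for i in range(n - 1):
        swapped = False
        # After each pass, the largest remaining value settles at the end
        for j in range(n - 1 - i):
            if result[j] > result[j + 1]:
                result[j], result[j + 1] = result[j + 1], result[j]
                swapped = True
        if not swapped:
            break  # No swaps in a full pass means the list is sorted
    return result
```

One quick way to vet a generated function is to run it on a few inputs and compare the results with Python's built-in `sorted`.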

LLMs are adept at explaining code, too, and even adding comments to source code. Suppose you wanted to write a tool to add comments to uncommented source-code files before checking them in. Here's a Python file that lacks comments:

In [2]:
with open('Data/create_database.py', 'r') as file:
    contents = file.read()
    display(Code(contents, language='python'))

Now use `GPT-4o` to add comments:

In [3]:
with open('Data/create_database.py', 'r') as file:
    contents = file.read()

    prompt = f'''
        Add comments to the following source code. Return the source code only.
        Do not use markdown.

        {contents}
        '''

    messages = [{ 'role': 'user', 'content': prompt }]
    
    response = client.chat.completions.create(
        model='gpt-4o',
        messages=messages
    )

    display(Code(response.choices[0].message.content, language='python'))
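A tool that rewrites files before check-in should probably confirm that the model changed only the comments. One possible safeguard for Python files, sketched below, is to compare the syntax trees of the two versions, because Python's `ast` module ignores comments. The function name `same_code_ignoring_comments` is hypothetical, and the check is deliberately strict: if the model adds docstrings, the trees will differ and the function will return `False`.

```python
import ast

def same_code_ignoring_comments(original, commented):
    """Return True if two Python sources parse to the same syntax tree."""
    try:
        # ast.dump omits line numbers by default, and comments never
        # appear in the tree, so only real code changes cause a mismatch
        return ast.dump(ast.parse(original)) == ast.dump(ast.parse(commented))
    except SyntaxError:
        # Unparseable output (for example, stray markdown) is a failure
        return False
```

In the cell above, the two arguments would be `contents` and `response.choices[0].message.content`.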

It works with C# source code as well. Here's an uncommented C# file that uses Microsoft's [ML.NET library](https://dotnet.microsoft.com/en-us/apps/machinelearning-ai/ml-dotnet) to train a machine-learning model that performs sentiment analysis:

In [4]:
with open('Data/Program.cs', 'r') as file:
    contents = file.read()
    display(Code(contents, language='C#'))

And here it is after `GPT-4o` dresses it up:

In [5]:
with open('Data/Program.cs', 'r') as file:
    contents = file.read()

    prompt = f'''
        Add comments to the following source code. Return the source code only.
        Do not use markdown.

        {contents}
        '''

    messages = [{ 'role': 'user', 'content': prompt }]
    
    response = client.chat.completions.create(
        model='gpt-4o',
        messages=messages
    )

    display(Code(response.choices[0].message.content, language='C#'))
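The Python and C# cells differ only in the file they read, so the shared logic could be factored into a helper. The sketch below is one way to do that. The names `build_comment_prompt` and `add_comments` are hypothetical, and `client` is assumed to be the `OpenAI` client created earlier. Only the `chat.completions.create` call already used in this notebook is involved.

```python
import textwrap

def build_comment_prompt(source):
    """Build the comment-request prompt used in the cells above."""
    instructions = textwrap.dedent('''
        Add comments to the following source code. Return the source code only.
        Do not use markdown.
        ''')
    # Append the source separately so its own indentation is preserved
    return f'{instructions}\n{source}\n'

def add_comments(client, source, model='gpt-4o'):
    """Send source code to a chat model and return the commented version."""
    messages = [{'role': 'user', 'content': build_comment_prompt(source)}]
    response = client.chat.completions.create(model=model, messages=messages)
    return response.choices[0].message.content
```

With this in place, commenting a file reduces to reading it and calling `add_comments(client, contents)`.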

Speaking of C#: Can `GPT-4o` rewrite a block of C# code that manually iterates over a `List` to use LINQ?

In [6]:
prompt = '''
    Rewrite the following code to use LINQ. Return the source code only.
    Do not use markdown.

    var picks = new List<DailyStock>();

    foreach (var stock in stocks)
    {
        if (stock.Close > stock.Open)
            picks.Add(stock);
    }

    foreach(var pick in picks)
    {
        Console.WriteLine($"{pick.Symbol}: {pick.Open:c} -> {pick.Close:c}");
    }
    '''

messages = [{ 'role': 'user', 'content': prompt }]

response = client.chat.completions.create(
    model='gpt-4o',
    messages=messages,
    temperature=0
)

display(Code(response.choices[0].message.content, language='C#'))
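The prompts in this notebook ask the model not to use markdown, but a model may occasionally wrap its answer in a code fence anyway. A defensive sketch for that case follows. The helper name `strip_code_fences` is hypothetical, and it handles only the simple case in which the entire response is a single fenced block.

```python
import re

# Matches a response that consists of exactly one fenced block,
# with an optional language tag after the opening fence
_FENCE = re.compile(r'^```[^\n]*\n(.*?)\n?```\s*$', re.DOTALL)

def strip_code_fences(text):
    """Return text without a single enclosing markdown fence, if present."""
    stripped = text.strip()
    match = _FENCE.match(stripped)
    return match.group(1) if match else stripped
```

Passing `response.choices[0].message.content` through this helper before displaying it leaves unfenced responses unchanged apart from trimming surrounding whitespace.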

One of the more remarkable aspects of LLMs is their ability to find bugs in code. Just after my book went to the printer in 2022, I discovered a bug in the latest version of scikit-learn that prevented some of my samples from working properly. I spent a couple of hours in the source code and found the bug. I filed a [bug report](https://github.com/scikit-learn/scikit-learn/issues/24942), and the scikit-learn team confirmed the bug and promised to fix it in the next version. I had to scramble to rewrite some of the code samples in my book to work around the bug and get the changes to the printer before the presses started rolling. Let's see if `GPT-4o` can find the bug:

In [7]:
with open('Data/lfw.py', 'r') as file:
    contents = file.read()

    prompt = f'''
        Find the bug in the code below that prevents the _load_imgs function
        from properly cropping images as specified by the slice_ parameter:

        {contents}
        '''

    messages = [{ 'role': 'user', 'content' : prompt }]
    
    response = client.chat.completions.create(
        model='gpt-4o',
        messages=messages
    )
    
    print(response.choices[0].message.content)

The issue in the code is with how the `crop` method is being used. The `crop` method is called on `pil_img` but the result is not being reassigned or used further. The `Image.crop` method in PIL does not modify the original image in place; instead, it returns a new `Image` object representing the cropped area.

To fix this bug, you need to capture the return value of the `crop` method so that you use the cropped version of the image for further processing. Here's the corrected part of the code:

```python
# Incorrect usage:
# pil_img.crop((w_slice.start, h_slice.start, w_slice.stop, h_slice.stop))

# Corrected usage:
pil_img = pil_img.crop((w_slice.start, h_slice.start, w_slice.stop, h_slice.stop))
```

With this change, the cropped image will be stored in `pil_img`, and any subsequent operations, such as resizing and conversion to a numpy array, will be performed on this cropped image.


The fact that foundational LLMs are capable of understanding code and generating code of their own is a game-changer for application developers. Tools such as [GitHub Copilot](https://github.com/features/copilot) are making programmers more productive and revolutionizing the way software is written. Without LLMs, such tools wouldn't be possible.