<a href="https://colab.research.google.com/github/jeffheaton/app_generative_ai/blob/main/t81_559_class_02_1_dev.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# T81-559: Applications of Generative Artificial Intelligence
**Module 2: Code Generation**
* Instructor: [Jeff Heaton](https://sites.wustl.edu/jeffheaton/), McKelvey School of Engineering, [Washington University in St. Louis](https://engineering.wustl.edu/Programs/Pages/default.aspx)
* For more information visit the [class website](https://sites.wustl.edu/jeffheaton/t81-558/).

# Module 2 Material

* **Part 2.1: Prompting for Code Generation** [[Video]]() [[Notebook]](t81_559_class_02_1_dev.ipynb)
* Part 2.2: Handling Revision Prompts [[Video]]() [[Notebook]](t81_559_class_02_2_multi_prompt.ipynb)
* Part 2.3: Using a LLM to Help Debug [[Video]]() [[Notebook]](t81_559_class_02_3_llm_debug.ipynb)
* Part 2.4: Tracking Prompts in Software Development [[Video]]() [[Notebook]](t81_559_class_02_4_software_eng.ipynb)
* Part 2.5: Limits of LLM Code Generation [[Video]]() [[Notebook]](t81_559_class_02_5_code_gen_limits.ipynb)


# Google CoLab Instructions

The following code ensures that Google CoLab is running and maps Google Drive if needed.

In [None]:
import os

try:
    from google.colab import drive, userdata
    COLAB = True
    print("Note: using Google CoLab")
except ImportError:
    print("Note: not using Google CoLab")
    COLAB = False

# OpenAI Secrets
if COLAB:
    os.environ["OPENAI_API_KEY"] = userdata.get('OPENAI_API_KEY')

# Install needed libraries in CoLab
if COLAB:
    !pip install langchain langchain_openai langchain_ollama

Note: using Google CoLab


# 2.1: Prompting for Code Generation

## OpenAI for Code Generation

LLMs are adept at generating code and can considerably boost programmers' productivity. This technical course requires you to write programs for the assignments. You might wonder whether I consider it "cheating" to use LLMs to help with your homework. I do not; in this course, I expect you to use AI to help with assignments.

For code generation, you can use the same OpenAI LLMs that your OpenAI API key grants access to. Other options may offer even greater code generation capabilities, though OpenAI should be sufficient for this class.

Three LLM-based code generation tools are listed below. All three require additional fees for use.

* [GitHub Copilot](https://github.com/features/copilot)
* [ChatGPT](https://chat.openai.com/)
* [Amazon CodeWhisperer](https://aws.amazon.com/codewhisperer/)

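The code below defines a helper function that sends a prompt to an LLM and displays the generated code. As written, it runs the `codegemma` model locally through Ollama.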

In [1]:
from langchain_core.messages import HumanMessage, SystemMessage
from langchain_core.prompts.chat import (
    ChatPromptTemplate,
    HumanMessagePromptTemplate,
    SystemMessagePromptTemplate,
)
from langchain_ollama.llms import OllamaLLM
from IPython.display import display_markdown

MODEL = 'codegemma:latest'

def generate_code(prompt):
  messages = [
      SystemMessage(
          content="You are a helpful assistant that writes reliable computer program code."
      ),
      HumanMessage(content=prompt),
  ]

  # Initialize the local Ollama LLM; a temperature of 0.0 keeps the output as repeatable as possible
  llm = OllamaLLM(
    model=MODEL,
    temperature=0.0)

  print(MODEL)
  print("Model response:")
  output = llm.invoke(messages)
  display_markdown(output, raw=True)

With the above function defined, you can now generate code. The code below generates a Python function to create a Fibonacci sequence.

In [2]:
generate_code("""Write Python code to return a fibonacci sequence of a length specified by the parameter l.""")

codegemma:latest
Model response:


```python
def fibonacci(l):
    """
    Returns a fibonacci sequence of a length specified by the parameter l.

    Args:
        l: The length of the fibonacci sequence.

    Returns:
        A list containing the fibonacci sequence.
    """

    fib_sequence = [0, 1]
    for i in range(2, l):
        fib_sequence.append(fib_sequence[i-1] + fib_sequence[i-2])

    return fib_sequence
```

**Example Usage:**

```python
# Get the length of the fibonacci sequence from the user.
l = int(input("Enter the length of the fibonacci sequence: "))

# Generate the fibonacci sequence.
fib_sequence = fibonacci(l)

# Print the fibonacci sequence.
print(fib_sequence)
```

**Output:**

```
Enter the length of the fibonacci sequence: 10
[0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```
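
Generated code is worth testing at its edges. The function above starts from the list `[0, 1]`, so a requested length of 0 or 1 still returns two elements. The following is a minimal sketch of one way to handle short lengths; the name `fibonacci_checked` is hypothetical and is not part of the model's response.

```python
def fibonacci_checked(l):
    """Return the first l Fibonacci numbers (hypothetical variant of the generated code)."""
    if l < 0:
        raise ValueError("l must be non-negative")

    # Slice the seed values so that l = 0 and l = 1 return the right length.
    fib_sequence = [0, 1][:l]
    for i in range(2, l):
        fib_sequence.append(fib_sequence[i - 1] + fib_sequence[i - 2])

    return fib_sequence


# Quick checks, including the short lengths the generated version gets wrong.
assert fibonacci_checked(0) == []
assert fibonacci_checked(1) == [0]
assert fibonacci_checked(10) == [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```

A few assertions like these are a cheap way to confirm that a generated function does what the prompt asked.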

## Generating Methods

In [7]:
generate_code("""
Write a Python function named loan_amortization that accepts these parameters.
loan_amount - The amount of the loan.
apr - The interest rate.
term - The number of months in the loan.
The function should return a Pandas dataframe that contains the following columns:
month - The current month.
amount - The amount left on the loan.
principal - The amount paid to the principal this month.
interest - The amount paid in interest this month.
payment - The total payment this month.
Additionally, build a dictionary of columns to create the Pandas dataframe.
""")

codegemma:latest
Model response:


```python
import pandas as pd

def loan_amortization(loan_amount, apr, term):
    """
    Calculates the loan amortization schedule.

    Args:
        loan_amount: The amount of the loan.
        apr: The interest rate.
        term: The number of months in the loan.

    Returns:
        A Pandas dataframe containing the loan amortization schedule.
    """

    # Calculate the monthly interest rate.
    monthly_interest_rate = apr / 12 / 100

    # Calculate the monthly payment.
    monthly_payment = loan_amount * (monthly_interest_rate / (1 - (1 + monthly_interest_rate) ** (-term)))

    # Create a dictionary of columns for the dataframe.
    columns = {
        "month": [],
        "amount": [],
        "principal": [],
        "interest": [],
        "payment": [],
    }

    # Calculate the amortization schedule.
    for month in range(1, term + 1):
        # Calculate the amount left on the loan.
        amount = loan_amount - (monthly_payment * (month - 1))

        # Calculate the principal and interest payments.
        principal = monthly_payment - amount * monthly_interest_rate
        interest = amount * monthly_interest_rate

        # Add the data to the dictionary of columns.
        columns["month"].append(month)
        columns["amount"].append(amount)
        columns["principal"].append(principal)
        columns["interest"].append(interest)
        columns["payment"].append(monthly_payment)

    # Create the Pandas dataframe.
    df = pd.DataFrame(columns)

    return df
```
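
The model wraps its code in a Markdown fence, so the code has to be separated from the surrounding text before it can be run. The following is a minimal sketch of one way to do that with a regular expression. The helper name `extract_python_blocks` is hypothetical, and the sketch assumes the model labels its fences `python`, as the responses in this notebook do. Because `generate_code` displays the response rather than returning it, the sketch works on a plain string.

```python
import re

# Build the fence marker programmatically so this example can itself sit inside a fence.
FENCE = "`" * 3
PYTHON_BLOCK = re.compile(FENCE + r"python[ \t]*\n(.*?)" + FENCE, re.DOTALL)


def extract_python_blocks(markdown_text):
    """Return the contents of every python-labeled fenced block (hypothetical helper)."""
    return [block.strip() for block in PYTHON_BLOCK.findall(markdown_text)]


# A stand-in for a model response; this is not real model output.
response = (
    "Here is the function:\n"
    + FENCE + "python\n"
    + "def add(a, b):\n    return a + b\n"
    + FENCE + "\n"
    + "**Output:**\n"
    + FENCE + "\n3\n" + FENCE + "\n"
)

blocks = extract_python_blocks(response)
print(blocks[0])
```

Unlabeled fences, such as sample output, are skipped. The extracted code should still be reviewed before it is executed.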

In [10]:
import pandas as pd

def loan_amortization(loan_amount, apr, term):
    """
    Calculates the loan amortization schedule.

    Args:
        loan_amount: The amount of the loan.
        apr: The interest rate.
        term: The number of months in the loan.

    Returns:
        A Pandas dataframe containing the loan amortization schedule.
    """

    # Calculate the monthly interest rate.
    monthly_interest_rate = apr / 12 / 100

    # Calculate the monthly payment.
    monthly_payment = loan_amount * (monthly_interest_rate / (1 - (1 + monthly_interest_rate) ** (-term)))

    # Create a dictionary of columns for the dataframe.
    columns = {
        "month": [],
        "amount": [],
        "principal": [],
        "interest": [],
        "payment": [],
    }

    # Calculate the amortization schedule.
    for month in range(1, term + 1):
        # Calculate the amount left on the loan.
        amount = loan_amount - (monthly_payment * (month - 1))

        # Calculate the principal and interest payments.
        principal = monthly_payment - amount * monthly_interest_rate
        interest = amount * monthly_interest_rate

        # Add the data to the dictionary of columns.
        columns["month"].append(month)
        columns["amount"].append(amount)
        columns["principal"].append(principal)
        columns["interest"].append(interest)
        columns["payment"].append(monthly_payment)

    # Create the Pandas dataframe.
    df = pd.DataFrame(columns)

    return df

# Example usage
loan_amount = 100000
apr = 0.05
term = 360

amortization_df = loan_amortization(loan_amount, apr, term)

print(amortization_df.head())


   month         amount   principal  interest     payment
0      1  100000.000000  275.705440  4.166667  279.872106
1      2   99720.127894  275.717101  4.155005  279.872106
2      3   99440.255787  275.728762  4.143344  279.872106
3      4   99160.383681  275.740424  4.131683  279.872106
4      5   98880.511575  275.752085  4.120021  279.872106
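
Generated code should be checked before it is trusted. Two things stand out in the run above. Because the function divides `apr` by 100, passing `apr = 0.05` is treated as 0.05% rather than 5%. In addition, the `amount` column falls by the full payment each month instead of by the principal portion only. The following is a minimal sketch of one way to track the balance, under the assumption that `apr` is given as a percentage (5 means 5%). The name `amortization_columns` is hypothetical, and it returns the dictionary of columns rather than a dataframe so that it runs without Pandas.

```python
def amortization_columns(loan_amount, apr, term):
    """Build amortization columns; apr is assumed to be a percentage (5 means 5%)."""
    monthly_rate = apr / 12 / 100

    # Standard fixed-payment formula, with a fallback for a zero-interest loan.
    if monthly_rate == 0:
        payment = loan_amount / term
    else:
        payment = loan_amount * monthly_rate / (1 - (1 + monthly_rate) ** (-term))

    columns = {"month": [], "amount": [], "principal": [], "interest": [], "payment": []}

    balance = loan_amount
    for month in range(1, term + 1):
        interest = balance * monthly_rate   # interest accrues on the remaining balance
        principal = payment - interest      # the rest of the payment reduces the balance

        columns["month"].append(month)
        columns["amount"].append(balance)   # balance at the start of the month
        columns["principal"].append(principal)
        columns["interest"].append(interest)
        columns["payment"].append(payment)

        balance -= principal                # only the principal lowers the balance

    return columns


schedule = amortization_columns(100000, 5, 360)

# The principal payments should add up to the original loan amount.
assert abs(sum(schedule["principal"]) - 100000) < 1e-4
```

Passing the returned dictionary to `pd.DataFrame` yields a dataframe with the columns the prompt asked for.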



## Generating Larger Programs



In [9]:
generate_code("""
Create a PyTorch GPU-enabled neural network for a Kaggle competition that asks me to predict the age of people in provided images.
The images are stored at this path: /kaggle/input/applications-of-deep-learning-wustl-spring-2024/faces-age.
The training data is in the file: /kaggle/input/applications-of-deep-learning-wustl-spring-2024/faces-age/train.csv.
The training data has 3 columns, id, filename, and age. The field age is the target, to be predicted, numeric age in years of
the person. The file contains the filename of the image that corresponds to each row, the images are named 1.jpg, 2.jpg, etc,
which corresponds to both the id and the filename fields. There is also a test dataset that we must generate a submission
dataframe for. The test data is in the file /kaggle/input/applications-of-deep-learning-wustl-spring-2024/faces-age/test.csv,
and has the id and filename columns, but we need to generate a submit dataframe with just id and age(the prediction). Train the neural network, use early stopping and generate the submit dataframe.
""")

codegemma:latest
Model response:


```python
import torch
import torch.nn as nn
import pandas as pd
from PIL import Image
from torchvision.transforms import transforms

# Load the training data
train_df = pd.read_csv('/kaggle/input/applications-of-deep-learning-wustl-spring-2024/faces-age/train.csv')

# Define the data transformation
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225))
])

# Define the neural network
class AgeNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.model = torch.hub.load('pytorch/vision:v0.6.0', 'resnet18', pretrained=True)
        self.model.fc = nn.Linear(self.model.fc.in_features, 1)

    def forward(self, x):
        return self.model(x)

# Create the neural network
model = AgeNet().cuda()

# Define the loss function and optimizer
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters())

# Train the neural network
epochs = 10
early_stopping_counter = 0
best_val_loss = float('inf')

for epoch in range(epochs):
    # Train the model
    model.train()
    for filename, age in zip(train_df['filename'], train_df['age']):
        image = Image.open(f'/kaggle/input/applications-of-deep-learning-wustl-spring-2024/faces-age/faces/{filename}')
        image = transform(image).cuda()
        age = torch.tensor([age]).cuda()

        optimizer.zero_grad()
        output = model(image)
        loss = criterion(output, age)
        loss.backward()
        optimizer.step()

    # Validate the model
    model.eval()
    val_loss = 0
    for filename, age in zip(train_df['filename'], train_df['age']):
        image = Image.open(f'/kaggle/input/applications-of-deep-learning-wustl-spring-2024/faces-age/faces/{filename}')
        image = transform(image).cuda()
        age = torch.tensor([age]).cuda()

        output = model(image)
        val_loss += criterion(output, age)

    val_loss /= len(train_df)

    # Early stopping
    if val_loss < best_val_loss:
        best_val_loss = val_loss
        early_stopping_counter = 0
    else:
        early_stopping_counter += 1

    if early_stopping_counter > 3:
        break

# Load the test data
test_df = pd.read_csv('/kaggle/input/applications-of-deep-learning-wustl-spring-2024/faces-age/test.csv')

# Generate the submit dataframe
submit_df = pd.DataFrame({'id': test_df['id']})

# Predict the ages
model.eval()
for filename in test_df['filename']:
    image = Image.open(f'/kaggle/input/applications-of-deep-learning-wustl-spring-2024/faces-age/faces/{filename}')
    image = transform(image).cuda()

    output = model(image)
    submit_df = submit_df.append({'id': filename, 'age': output.item()}, ignore_index=True)

# Save the submit dataframe
submit_df.to_csv('submission.csv', index=False)
```
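
Larger generated programs need the same scrutiny. In the response above, the "validation" loop iterates over the same rows of `train_df` that were used for training, so the early-stopping signal is not measured on held-out data, and the `> 3` test stops only after four epochs without improvement. The early-stopping rule itself can be isolated and checked without a GPU. The following is a minimal sketch, assuming a non-empty list of per-epoch validation losses; the function name `early_stopping_epoch` and the `patience` parameter are hypothetical.

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return the index of the epoch at which training would stop.

    Training stops once `patience` consecutive epochs fail to improve on
    the best validation loss seen so far. If that never happens, the index
    of the last epoch is returned.
    """
    best_loss = float("inf")
    epochs_without_improvement = 0

    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch

    return len(val_losses) - 1


# Illustrative losses, not real training results.
losses = [5.0, 4.0, 3.0, 3.5, 3.2, 3.1, 2.0]
stop_at = early_stopping_epoch(losses, patience=3)
assert stop_at == 5
```

In a real training loop, the losses would come from a validation split held out of the training data.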

# Module 2 Assignment

You can find this module's assignment here: [assignment 2](https://github.com/jeffheaton/app_deep_learning/blob/main/assignments/assignment_yourname_class2.ipynb)