This repo contains a sample of using a FM for converting PL/SQL code to Python. The code is prepared to work with
Amazon Claude v3 in Amazon Bedrock (any of its variants) and an Amazon SageMaker Endpoint running
CodeLlama Instruct (
sample deployment script can be found in
deploy_codellama_endpoint.py
).
The code tries to handle the most common causes for error, including:
- Generic errors calling the endpoint (by retrying).
- Timeouts when calling the endpoint (by reducing the amount of output tokens in the request and retrying).
- Incomplete code translations (by iteratively passing the converted code chunk to the FM).
- Incorrect translated code (by making sure that the code compiles before saving it to disk or retrying, if needed).
More details on the process used for conversion can be found below.
Please, consider the cost of running the example in your code. Amazon Bedrock pricing is detailed
here and in case you want to use the Amazon SageMaker endpoint,
consider that the reference script will deploy an Amazon SageMaker endpoint in a
ml.g5.12xlarge
instance. Please review
the AWS SageMaker pricing page to understand the full cost
implications before proceeding.
The code has been tested to work in Python 3.11, you will also need:
- The requirements in
requirements.txt
- If using CodeLlama Instruct for the code conversion:
- A SageMaker endpoint running the FM or AWS quota to launch a
ml.g5.12xlarge
instance as a SageMaker inference endpoint. - An AWS role with permission to create and invoke to SageMaker endpoints. Typically, SageMaker's default
execution role should work fine. When running in SageMaker, the default role will be specified, otherwise
you must supply it to the
deploy_endpoint.py
script as an input argument.
- A SageMaker endpoint running the FM or AWS quota to launch a
You can use deploy_codellama_endpoint.py
to deploy CodeLlama Instruct 13B
with default options (check the script for details). If you run the script in SageMaker, it should be able
to deploy the endpoint correctly, otherwise you can pass the role name in the command line.
# Deploy the endpoint using the default execution role
python deploy_endpoint.py
# Alternatively pass the desired role name in the command line, adapt to your
# default SageMaker execution role name
python deploy_endpoint.py --role-name=[AmazonSageMaker-ExecutionRole-YYYYMMDDTHHMMSS] --accept-license
If everything is successful, you should see some example responses from the model after a few minutes.
The code is translated one stored procedure at a time, and those are separated using a heuristics process,
For this heuristic code to work, make sure that the PROCEDURE
, FUNCTION
& END;
instructions are placed
at the start of the line, with no added spaces before them.
Once you've deployed the endpoint, you can execute the code conversion script. Here are some examples:
# Convert the code in the `scripts` folder using Claude v3 Haiku in Amazon Bedrock
python convert_code.py bedrock --model_id=anthropic.claude-3-haiku-20240307-v1:0
# Convert the code in the `scripts` folder using Claude v3 Sonnet in Amazon Bedrock
python convert_code.py bedrock --model_id=anthropic.claude-3-sonnet-20240229-v1:0
# Convert the code in the `scripts` folder with a SageMaker endpoint provided its name
python convert_code.py sagemaker --endpoint=codellama-13b
# Convert the code in a user-specified folder using Claude v3 Haiku in Amazon Bedrock
python convert_code.py --sources_dir=[PATH_TO_YOUR_SOURCES_DIR] bedrock --model_id=anthropic.claude-3-haiku-20240307-v1:0
# Convert the code in a user-specified folder using Claude v3 Sonnet in Amazon Bedrock
python convert_code.py --sources_dir=[PATH_TO_YOUR_SOURCES_DIR] bedrock --model_id=anthropic.claude-3-sonnet-20240229-v1:0
# Convert the code in a user-specified folder with a SageMaker endpoint provided its name
python convert_code.py --sources_dir=[PATH_TO_YOUR_SOURCES_DIR] sagemaker --endpoint=codellama-13b
This should start a somewhat lengthy process that will write the converted code to a converted/[MODEL_ID]
folder and the non-converted stored procedures to a non-converted/[MODEL_ID]
folder inside the scripts folder.
- If you get an
ValidationException
error when calling theCreateModel
operation (botocore.exceptions.ClientError: An error occurred (ValidationException) when calling the CreateModel operation: The execution role ARN "arn:aws:iam::012345678901:role/Admin" is invalid. Please ensure that the role exists and that its trust relationship policy allows the action "sts:AssumeRole" for the service principal "sagemaker.amazonaws.com".
) please make sure that you execute the code inside SageMaker or to provide the name of the default SageMaker execution role in the command line, as the code is probably using the default CLI role.
The code will process the input code using the flow below, which includes some of the error handling strategies mentiones in the introduction.
The code will look for .pkb
files in the scripts
folder and apply heuristics to extract
individual stored procedures or functions from the code and will try to convert them one by one using
the flow described below. Also, the whole process will be retried up to three times per stored procedure
in case of error (this is not depicted in the diagram below for clarity).
flowchart TD
source_files[("Source files")] -->|Regular expression|routine[["FUNCTION / PROCEDURE"]]
routine --> FM
FM["Foundation Model"] --> fm_output["FM output"]
fm_output --> fm_output_complete{"Complete?"}
fm_output_complete -->|no, i≤4|FM
fm_output_complete -->|no, i>4|non_converted_files
fm_output_complete -->|yes|compiles{"Compiles?"}
compiles -->|no, j≤3|FM
compiles -->|no, j>3|non_converted_files
compiles -->|yes|converted_files[("Converted code")]
non_converted_files[("Code not converted")]
Converted code will be stored in the scripts/[MODEL_ID]/converted
folder, whereas non-converted stored procedures
will be written to scripts/[MODEL_ID]/non-converted
for tracking purposes.
The conversion code has been tested with the examples in scripts
and with internal code. As a general rule:
- Claude models on Bedrock tend to operate better on very long code, being able to produce longer chunks before needing to iterate with more code chunk.
- Simple (yet non-trivial) code tends to yield good translations that follow the original code's intended purpose and inner working.
- Medium-complexity code typically is translated in a way that follows the original intent, but lacks specific details regarding corner case handling and/or specific logical code branches.
- Complex code that includes advanced use or cursors or very long bodies yield poorer performance, with code not being converted or the converted code lacking many details from the original code. Bigger Claude 3 variants appear to perform better for these cases, but manual review/conversion is required.
With that said, you'll see differences in the code style is different between the models, with some examples showing that CodeLlama would convert some stored procedure local variables into function parameters whereas Claude might correctly add them as function variables.
Also, Sonnet appears to provide more correct outputs (pending testing with Opus). Consider the following block of
code from pck_chinook_reports.pkb
:
PROCEDURE GET_ARTIST_BY_ALBUM(P_ARTIST_ID ALBUM.TITLE%TYPE)
IS
V_ARTIST_NAME ARTIST.NAME%TYPE;
BEGIN
SELECT ART.NAME
INTO V_ARTIST_NAME
FROM ALBUM ALB
JOIN ARTIST ART USING (ARTISTID)
WHERE ALB.TITLE = P_ARTIST_ID;
DBMS_OUTPUT
.
PUT_LINE
('ArtistName: ' || V_ARTIST_NAME);
END;
This procedure accepts an input parameter named P_ARTIST_ID
which actually is the album title whose artist
name you want to retrieve. Both CodeLlama and Claude3 Haiku will respect that, but Sonnet actually renames the
variable so that it makes sense:
def GET_ARTIST_BY_ALBUM(db_conn, p_album_title):
with db_conn.cursor() as cursor:
query = """
SELECT ART.NAME
FROM ALBUM ALB
JOIN ARTIST ART USING (ARTISTID)
WHERE ALB.TITLE = :album_title
"""
cursor.execute(query, album_title=p_album_title)
result = cursor.fetchone()
if result:
v_artist_name = result[0]
print(f"ArtistName: {v_artist_name}")
else:
print("No artist found for the given album title.")
Also, Sonnet seems to exhibit a deeper knowledge of Oracle PL/SQL error codes. Think of the following pseudo-code representing a situation found with some internal code:
CREATE PROCEDURE TEST
IS
BEGIN
INSERT INTO TABLE (COLUMNS) VALUES (ACTUAL_VALUES);
EXCEPTION
WHEN DUP_VAL_ON_INDEX
THEN
DO_SOME_THINGS();
WHEN OTHERS
THEN
LOG_ERROR();
End;
While Haiku & CodeLlama Instruct 13B will handle all IntegrityError
s, Sonnet is actually handling the
ORA-00001
message separately from the rest of
errors one might get:
def test(db_conn):
try:
cursor = db_conn.cursor()
sql = """
INSERT INTO TABLE (COLUMNS)
VALUES (:1)
"""
cursor.execute(sql, actual_values)
db_conn.commit()
except cx_Oracle.IntegrityError as e:
if e.args[0].code == 1: # DUP_VAL_ON_INDEX
do_some_things()
else:
log_error()
pck_chinook_reports.pkb
contains some sample PL/SQL stored procedures that have
been translated with this code and left as an example. You can check the results for the different examples
in the converted
folder, and you can test your own code by adding your own pkb files
to the scripts
folder and running the code as described above.
If you deployed the SageMaker endpoint, you can remove it as follows:
# Replace the endpoint name if you specified a custom one
aws sagemaker delete-endpoint --endpoint-name codellama-13b
aws sagemaker delete-endpoint-config --endpoint-config-name codellama-13b