# Welcome to the Prompt Engineering Workshop
Let's first install the prerequisite Python packages.

In [None]:
# pandas is a library for parsing CSVs
!pip install pandas
# we will be calling Claude, so we need the anthropic SDK
!pip install anthropic
# and for testing we need a testing framework
!pip install pytest

We will be generating code for parsing some CSVs, so let's quickly create a CSV file to work with.

In [None]:
# For workshop purposes, so that everyone can access the same data, we are writing the file dynamically
import csv

def create_csv(file_name):
    # Create data as list of lists
    data = [
        ["Flow ID","Tx Frames","Rx Frames","Frames Delta","Loss %","Tx Frame Rate","Rx Frame Rate","Tx L1 Rate (bps)","Rx L1 Rate (bps)","Rx Bytes","Tx Rate (Bps)","Rx Rate (Bps)","Tx Rate (bps)","Rx Rate (bps)","Tx Rate (Kbps)","Rx Rate (Kbps)","Tx Rate (Mbps)","Rx Rate (Mbps)","Store-Forward Avg Latency (ns)","Store-Forward Min Latency (ns)","Store-Forward Max Latency (ns)","First TimeStamp","Last TimeStamp"],
        ["Flow 1",1000,1000,0,0.000,0.000,0.000,0.000,0.000,66000,0.000,0.000,0.000,0.000,0.000,0.000,0.000,0.000,0,0,0,"00:00:01.063","00:00:04.393"],
        ["Flow 2",1000,100,0,0.000,0.000,0.000,0.000,0.000,66000,0.000,0.000,0.000,0.000,0.000,0.000,0.000,0.000,0,0,0,"00:00:01.063","00:00:04.393"],
        ["Flow 3",1000,500,0,0.000,0.000,0.000,0.000,0.000,66000,0.000,0.000,0.000,0.000,0.000,0.000,0.000,0.000,0,0,0,"00:00:01.063","00:00:04.393"],
        ["Flow 4",1000,"N/A",0,0.000,0.000,0.000,0.000,0.000,66000,0.000,0.000,0.000,0.000,0.000,0.000,0.000,0.000,0,0,0,"00:00:01.063","00:00:04.393"]
    ]

    # Write to CSV file
    with open(file_name, 'w', newline='') as file:
        writer = csv.writer(file)
        writer.writerows(data)

create_csv("stats.csv")


In [None]:
# check that the file was created
!cat stats.csv

## Let's get started with our first prompt
### Level 1: The "Zero-Shot"
Asking the model to perform a task with no examples, no context, and no constraints. You are relying entirely on the assumptions the model picked up from its training data.

In [None]:
from anthropic import AnthropicFoundry
import re

def extract_and_save_python_code(text, output_file='output.py'):
    # Pattern to match ```python ... ``` blocks
    pattern = r'```python\s*(.*?)```'
    
    # Find all matches (re.DOTALL allows . to match newlines)
    matches = re.findall(pattern, text, re.DOTALL)
    
    # Join all code blocks if multiple exist
    python_code = '\n\n'.join(matches)
    
    # Write to file
    with open(output_file, 'w') as f:
        f.write(python_code)
    
    return python_code
 
def get_response_from_llm(prompt, token_limit=1024):
    message = client.messages.create(
        model=deployment_name,
        messages=[
            {"role": "user", "content": prompt}
        ],
        max_tokens=token_limit,
    )
    return message.content[0].text

# API Configuration
endpoint = "https://foundary-codegen-1.services.ai.azure.com/anthropic/"
deployment_name = "claude-sonnet-4-5"
api_key = ""

client = AnthropicFoundry(
    api_key=api_key,
    base_url=endpoint
)

prompt = """
Write a python script to read a csv file called 'stats.csv' 
calculate the total loss by looking at Tx Frames and Rx Frames.
I want you to calculate loss as (tx - rx) / tx * 100 and print the result for Flow 2, Flow 3 and Flow 4.
we will have multiple columns but the relevant columns are Flow ID, Tx Frames, Rx Frames.
return only the python code without any explanation or comments.
"""

response = get_response_from_llm(prompt)
print(response)
extract_and_save_python_code(response, output_file='traffic_loss_calculator.py')


In [None]:
# let's run the python code generated by the model
!python traffic_loss_calculator.py

Got an error, right? Do you know why? The LLM knows nothing about the format of your CSV, for example that `Rx Frames` can contain a non-numeric value such as `N/A`.
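
To make the failure concrete, here is a minimal sketch. It assumes the generated script converts each `Rx Frames` value with `int()`; the code the model actually produced may differ. `try_convert` is a hypothetical helper used only for this illustration.

```python
# Values from the "Rx Frames" column of stats.csv, as the csv module returns them (all strings).
rx_values = ["1000", "100", "500", "N/A"]


def try_convert(value):
    """Return the integer value, or the error text if the conversion fails."""
    try:
        return int(value)
    except ValueError as exc:
        return f"ValueError: {exc}"


results = [try_convert(v) for v in rx_values]
for value, result in zip(rx_values, results):
    print(f"{value!r} -> {result}")
```

The first three values convert cleanly; the `"N/A"` from Flow 4 is the one that raises.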

### Level 2: The "Few-Shot"
Providing examples (shots) of the input data and the desired output format. This guides the model's pattern recognition so that it handles the data format correctly.
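
The sample rows do not have to be typed by hand; they can be taken from the data itself. The sketch below is one possible way to do that. `build_sample_block` is a hypothetical helper, not part of the workshop code, and the inline `sample_csv` string stands in for the real file.

```python
import csv
import io


def build_sample_block(csv_text, columns, max_rows=3):
    """Format the header and the first rows of the chosen columns for use in a prompt."""
    reader = csv.DictReader(io.StringIO(csv_text))
    lines = [",".join(f'"{c}"' for c in columns)]
    for i, row in enumerate(reader):
        if i >= max_rows:
            break
        lines.append(",".join(row[c] for c in columns))
    return "\n".join(lines)


# Stand-in for the contents of stats.csv (assumption: a trimmed-down copy).
sample_csv = (
    "Flow ID,Tx Frames,Rx Frames,Loss %\n"
    "Flow 1,1000,1000,0.000\n"
    "Flow 4,1000,N/A,0.000\n"
)
shots = build_sample_block(sample_csv, ["Flow ID", "Tx Frames", "Rx Frames"])
print(shots)
```

Including a row with `N/A` matters here: the examples should show the model the messy cases, not only the clean ones.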

In [None]:
prompt = """
Write a python script to read a csv file called 'stats.csv' 
Calculate the total loss by looking at Tx Frames and Rx Frames.
I want you to calculate loss as (tx - rx) / tx * 100 and print the result for Flow 2, Flow 3 and Flow 4.
We will have multiple columns but the relevant columns are Flow ID, Tx Frames, Rx Frames.

It's a messy csv.
Here is what part of my csv looks like:
"Flow ID","Tx Frames","Rx Frames"
"Flow 1",1000,1000,....
"Flow 2",1000,"N/A",....
.
.
.
"Flow N",1000,200,....

Return only the python code without any explanation or comments.
"""

response = get_response_from_llm(prompt)
print(response)
extract_and_save_python_code(response, output_file='traffic_loss_calculator2.py')

In [None]:
# now let's run the new code generated by the model
!python traffic_loss_calculator2.py

Nice, right? But there is still a lot it cannot handle, such as proper type casts, missing rows, and missing files.
This leads us to the next level.

### Level 3: Chain of Thought

Forcing the model to "show its work" or verbalize its reasoning before generating code. This tends to reduce hallucination and makes it more likely that edge cases are considered.
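
To make "edge cases" concrete, here is a rough sketch of the kind of defensive code this style of prompt is meant to elicit: safe type conversion, skipping non-numeric values, and a graceful message for a missing file. The function names are hypothetical, and the model's actual output will look different.

```python
import csv
from typing import Dict, List, Optional


def safe_int(value) -> Optional[int]:
    """Convert to int, returning None instead of raising on bad input."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return None


def loss_percent(tx, rx) -> Optional[float]:
    """Loss as (tx - rx) / tx * 100, or None when the inputs are unusable."""
    tx_i, rx_i = safe_int(tx), safe_int(rx)
    if tx_i is None or rx_i is None or tx_i == 0:
        return None
    return (tx_i - rx_i) / tx_i * 100


def read_rows(path: str) -> List[Dict[str, str]]:
    """Read all rows, reporting a missing file instead of crashing."""
    try:
        with open(path, newline="") as f:
            return list(csv.DictReader(f))
    except FileNotFoundError:
        print(f"File not found: {path}")
        return []


print(loss_percent("1000", "500"))
print(loss_percent("1000", "N/A"))
print(read_rows("no_such_file_for_this_sketch.csv"))
```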

In [None]:
prompt = """
Write a python script to read a csv file called 'stats.csv' 
Calculate the total loss by looking at Tx Frames and Rx Frames.
I want you to calculate loss as (tx - rx) / tx * 100 and print the result for Flow 2, Flow 3 and Flow 4.
We will have multiple columns but the relevant columns are Flow ID, Tx Frames, Rx Frames.

It's a messy csv.
Here is what part of my csv looks like:
"Flow ID","Tx Frames","Rx Frames"
"Flow 1",1000,1000,....
"Flow 2",1000,"N/A",....
.
.
.
"Flow N",1000,200,....

Before coding, consider the following edge cases and handle them in your code:
1. How do we handle missing files?
2. How do we handle non-numeric rows?
3. How do we safely convert types?

Return only the python code without any explanation or comments.
"""

response = get_response_from_llm(prompt, token_limit=2048)
print(response)
extract_and_save_python_code(response, output_file='traffic_loss_calculator3.py')

In [None]:
# let's now delete the file and see whether the new code handles it gracefully and returns a proper error message
!rm stats.csv
!python traffic_loss_calculator3.py

Seems pretty neat! But is it maintainable? Does it look like code written by a senior engineer?

### Level 4: Role + Constraints

Assigning a Persona (Senior Engineer) and strict Constraints (Standards) to force production-quality structure and formatting.
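
For orientation, this is roughly the shape of script those constraints push the model toward: `argparse` for the CLI, type hints, `logging` instead of `print`, and small functions. It is a skeleton under assumptions, not the model's output; the `--file` option name and the logger name are choices made for this sketch.

```python
import argparse
import logging
from typing import List, Optional

logger = logging.getLogger("loss_calculator")


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse command-line arguments; argv=None falls back to sys.argv."""
    parser = argparse.ArgumentParser(description="Compute frame loss per flow.")
    parser.add_argument("--file", default="stats.csv", help="Path to the CSV file")
    return parser.parse_args(argv)


def loss_percent(tx: int, rx: int) -> float:
    """Loss as (tx - rx) / tx * 100."""
    return (tx - rx) / tx * 100


def main(argv: Optional[List[str]] = None) -> int:
    logging.basicConfig(level=logging.INFO)
    args = parse_args(argv)
    logger.info("Would process %s", args.file)
    logger.info("Example loss: %.1f%%", loss_percent(1000, 500))
    return 0


# Passing argv explicitly keeps the sketch runnable inside a notebook,
# where sys.argv holds the kernel's own arguments.
exit_code = main(["--file", "stats.csv"])
```

In a standalone script, the last line would typically become `raise SystemExit(main())` under an `if __name__ == "__main__":` guard.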

In [None]:
# let's create the CSV again, as we deleted it in the previous step
create_csv("stats.csv")

prompt = """
Write a python script to read a csv file called 'stats.csv' 
Calculate the total loss by looking at Tx Frames and Rx Frames.
I want you to calculate loss as (tx - rx) / tx * 100 and print the result for Flow 2, Flow 3 and Flow 4.
We will have multiple columns but the relevant columns are Flow ID, Tx Frames, Rx Frames.

It's a messy csv.
Here is what part of my csv looks like:
"Flow ID","Tx Frames","Rx Frames"
"Flow 1",1000,1000,....
"Flow 2",1000,"N/A",....
.
.
.
"Flow N",1000,200,....

Before coding, consider the following edge cases and handle them in your code:
1. How do we handle missing files?
2. How do we handle non-numeric rows?
3. How do we safely convert types?

Act as a Senior Engineer.
Constraints:
1. Use `argparse` for CLI usage, with a `--file` argument for the CSV path.
2. Use Type Hinting (PEP 484).
3. Use `logging` instead of print.
4. Modular functions.

Return only the python code without any explanation or comments.
"""
response = get_response_from_llm(prompt, token_limit=2048)
print(response)
extract_and_save_python_code(response, output_file='traffic_loss_calculator4.py')

In [None]:
# let's run the generated code to see if it works
!python traffic_loss_calculator4.py --file stats.csv

Great! Nice formatting, but is it scalable? Can it handle a 50GB CSV?

### Level 5: Reflection (Optimization)

Asking the model to critique its own work for performance or security flaws before generating the final code. This can catch hidden issues.
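
The memory concern is easiest to see in code. The sketch below streams rows one at a time with a generator instead of loading the whole file into a list, so memory use stays roughly constant regardless of file size. `iter_losses` is a hypothetical name, and the in-memory `sample` stands in for an open file handle.

```python
import csv
import io
from typing import Iterable, Iterator, Tuple


def iter_losses(lines: Iterable[str]) -> Iterator[Tuple[str, float]]:
    """Yield (flow_id, loss %) one row at a time, never holding the whole file."""
    for row in csv.DictReader(lines):
        try:
            flow_id = row["Flow ID"]
            tx = int(row["Tx Frames"])
            rx = int(row["Rx Frames"])
        except (KeyError, TypeError, ValueError):
            continue  # skip rows with missing or non-numeric values
        if tx == 0:
            continue  # avoid division by zero
        yield flow_id, (tx - rx) / tx * 100


# Stand-in for an open file; any iterable of lines works.
sample = io.StringIO(
    "Flow ID,Tx Frames,Rx Frames\n"
    "Flow 1,1000,1000\n"
    "Flow 3,1000,500\n"
    "Flow 4,1000,N/A\n"
)
streamed = list(iter_losses(sample))
print(streamed)
```

With a real file, the same function would be driven by `with open("stats.csv", newline="") as f: for flow, loss in iter_losses(f): ...`, processing each row as it is read.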

In [None]:
prompt = """
Write a python script to read a csv file called 'stats.csv' 
Calculate the total loss by looking at Tx Frames and Rx Frames.
I want you to calculate loss as (tx - rx) / tx * 100 and print the result for Flow 2, Flow 3 and Flow 4.
We will have multiple columns but the relevant columns are Flow ID, Tx Frames, Rx Frames.

It's a messy csv.
Here is what part of my csv looks like:
"Flow ID","Tx Frames","Rx Frames"
"Flow 1",1000,1000,....
"Flow 2",1000,"N/A",....
.
.
.
"Flow N",1000,200,....

Before coding, consider the following edge cases and handle them in your code:
1. How do we handle missing files?
2. How do we handle non-numeric rows?
3. How do we safely convert types?

Act as a Senior Engineer.
Constraints:
1. Use `argparse` for CLI usage, with a `--file` argument for the CSV path.
2. Use Type Hinting (PEP 484).
3. Use `logging` instead of print.
4. Modular functions.

Critique your initial plan for **Memory Safety** assuming a 50GB file size.
Correction: Rewrite the logic to be memory efficient.

Return only the python code without any explanation or comments.
"""
response = get_response_from_llm(prompt, token_limit=2048)
print(response)
extract_and_save_python_code(response, output_file='traffic_loss_calculator5.py')

In [None]:
# let's check that the new, memory-efficient code still runs
!python traffic_loss_calculator5.py --file stats.csv

Now you have memory-efficient, fully structured, well-documented, working code! But do you?
We considered only one approach. Are there other, better ones?

### Level 6: Tree of Thoughts

Generating multiple solutions, evaluating the trade-offs (Pros/Cons) of each, and deliberately selecting the best one.
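
The brainstorm, evaluate, and select structure can be templated so that the same pattern is reusable for other decisions. A minimal sketch follows; `build_tot_prompt` is a hypothetical helper and the wording of the steps is only one possible phrasing.

```python
from typing import List


def build_tot_prompt(task: str, approaches: List[str], constraint: str) -> str:
    """Assemble a three-step prompt: brainstorm, evaluate, then pick a winner."""
    options = " vs ".join(approaches)
    return (
        f"{task}\n\n"
        f"Step 1: Brainstorm {len(approaches)} approaches ({options}).\n"
        f"Step 2: Evaluate Pros/Cons of each for {constraint}.\n"
        "Step 3: Pick the winner and explain the trade-offs.\n"
    )


tot_prompt = build_tot_prompt(
    "Review this CSV parser for memory efficiency.",
    ["Pandas", "CSV Lib", "Polars"],
    "a microservice with low memory",
)
print(tot_prompt)
```

Naming the candidate approaches and the deployment constraint explicitly gives the model concrete criteria to compare against, rather than leaving "best" undefined.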

In [None]:
with open('traffic_loss_calculator5.py', 'r') as f:
    code_content = f.read()

prompt = f"""
For my use-case of parsing a csv file which can go up to 50GB in size,
you provided me well-structured code earlier. Now I want you to review the code for memory efficiency.

Step 1: Brainstorm 3 approaches (Pandas vs CSV Lib vs Polars).
Step 2: Evaluate Pros/Cons for a **microservice with low memory**.
Step 3: Pick the winner and change the code accordingly.

Here is the code:
{code_content}

Explain why you chose the approach and the trade-offs with the other approaches.
When explaining the approach, do not use a ```python ... ``` block; it should be plain text.
Then return the final python code within a ```python ... ``` block. It should appear only once.
"""

response = get_response_from_llm(prompt, token_limit=3000)
print(response)
# first, save the response as Markdown so that we can read both the explanation and the code properly
with open('code_review_traffic_loss_calculator.md', 'w', encoding='utf-8') as f:
    f.write(response)

# now let's extract the code and save it to a new file
extract_and_save_python_code(response, output_file='traffic_loss_calculator6.py')

In [None]:
# let's check that the new code still runs fine
!python traffic_loss_calculator6.py --file stats.csv

At this point everything seems complete, right? But have we tested all the error scenarios?
Let's generate test cases, this time with a different prompt format.

### Level 7: Decomposition

Breaking a large request into **structural components** (Architecture, Interface, Implementation). This moves from scripting to software engineering.
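
As a reference point for what a generated test might look like, here is a small sketch. `compute_losses` is a hypothetical stand-in for the function under test, since the real function names depend on what the model generated. The test uses pytest's built-in `tmp_path` fixture, which provides a per-test temporary directory, so the test needs no cleanup code of its own.

```python
import csv
from pathlib import Path
from typing import List, Tuple


def compute_losses(path: Path) -> List[Tuple[str, float]]:
    """Hypothetical stand-in for the function under test."""
    results = []
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            try:
                tx, rx = int(row["Tx Frames"]), int(row["Rx Frames"])
            except ValueError:
                continue  # skip non-numeric rows such as "N/A"
            results.append((row["Flow ID"], (tx - rx) / tx * 100))
    return results


def test_skips_non_numeric_rows(tmp_path: Path) -> None:
    # Mock data: one valid row and one row with a non-numeric Rx value.
    csv_file = tmp_path / "stats.csv"
    csv_file.write_text(
        "Flow ID,Tx Frames,Rx Frames\n"
        "Flow 3,1000,500\n"
        "Flow 4,1000,N/A\n"
    )
    assert compute_losses(csv_file) == [("Flow 3", 50.0)]
```

Saved to a file, this can be run the same way as the generated tests, with `pytest -v` followed by the file name.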

In [None]:
prompt = f"""
We have created a python script to read a csv file called 'stats.csv'
The code is saved with filename 'traffic_loss_calculator5.py'.
and we can execute it as: python traffic_loss_calculator5.py --file stats.csv 
Here is the code:
{code_content}

I want to generate unit tests for this code using simple python functions and pytest framework.
Let's break down the task which needs to be done.
1. Analyze the code and identify key areas we can test.
2. Create test cases in which you create mock data to simulate different scenarios and call the python code.
3. If you create mock data files, make sure you clean them up after the test is done.
4. Finally, combine all the test cases (3 or 4 in number) into a single python file.


Return only the python code within ```python ... ``` block.
"""
response = get_response_from_llm(prompt, token_limit=4000)
print(response)

extract_and_save_python_code(response, output_file='check_loss_calculator.py')

In [None]:
# let's run the tests now
!pytest -s -v check_loss_calculator.py

Now we have unit tests that exercise each part of the code.
Coming up with prompts like these takes a lot of effort.
Is there a standard way to get a starting structure and build upon it?

### Level 8: Meta-Prompting

The Force Multiplier. Instead of writing the code prompt yourself, you ask the AI to write the perfect prompt for you.
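
One possible next step, not shown in this notebook, is to feed the generated prompt back to the model. The sketch below takes the model call as a parameter so that it runs without an API key; in this notebook, `ask` could be `get_response_from_llm`. `generate_via_meta_prompt` and `fake_llm` are hypothetical names.

```python
from typing import Callable, List


def generate_via_meta_prompt(ask: Callable[[str], str], use_case: str) -> str:
    """Two-step chain: the model writes a prompt, then answers that prompt."""
    meta_prompt = (
        "You are a prompt engineering expert.\n"
        f"Generate a prompt to achieve this use-case:\n{use_case}"
    )
    generated_prompt = ask(meta_prompt)  # step 1: the model writes the prompt
    return ask(generated_prompt)  # step 2: the model answers its own prompt


# Stub standing in for a real model call; it records what it was asked.
calls: List[str] = []


def fake_llm(prompt: str) -> str:
    calls.append(prompt)
    return f"reply #{len(calls)}"


result = generate_via_meta_prompt(fake_llm, "Parse stats.csv memory-efficiently.")
print(result)
```

In practice it is worth reading and editing the generated prompt before the second call, rather than chaining the two blindly.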


In [None]:
prompt = """You are a prompt engineering expert.
Generate a prompt to achieve this use-case:
We have a python script to read a csv file called 'stats.csv'. The code should be memory efficient and handle edge cases.
The prompt should instruct that the code quality should be modular and at senior-engineer level.
"""

response = get_response_from_llm(prompt)
print(response)

with open('prompt.md', 'w', encoding='utf-8') as f:
    f.write(response)