In [1]:
!pip install langchain_community transformers --quiet

In [2]:
from langchain_core.prompts import PromptTemplate
from langchain_community.llms import HuggingFacePipeline
from transformers import AutoTokenizer, CodeGenForCausalLM, pipeline
import warnings

warnings.filterwarnings('ignore')

In [3]:
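# Load the model and wrap the transformers pipeline for use with LangChain

# model_name = 'Salesforce/codegen-350M-mono'  # smaller variant, used for the comparison in the write-up
model_name = 'Salesforce/codegen-2B-mono'

hf_pipe = HuggingFacePipeline(
    pipeline=pipeline(
        'text-generation',
        model=CodeGenForCausalLM.from_pretrained(model_name),
        tokenizer=AutoTokenizer.from_pretrained(model_name),
        max_new_tokens=128,  # cap on generated tokens; longer answers are cut off
        do_sample=True,
        temperature=0.3,     # low temperature keeps sampling close to the most likely tokens
        max_length=1000      # has no effect here: max_new_tokens takes precedence (see the warning in the cell outputs)
    )
)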


Some weights of the model checkpoint at Salesforce/codegen-2B-mono were not used when initializing CodeGenForCausalLM: ['transformer.h.0.attn.causal_mask', 'transformer.h.1.attn.causal_mask', 'transformer.h.10.attn.causal_mask', 'transformer.h.11.attn.causal_mask', 'transformer.h.12.attn.causal_mask', 'transformer.h.13.attn.causal_mask', 'transformer.h.14.attn.causal_mask', 'transformer.h.15.attn.causal_mask', 'transformer.h.16.attn.causal_mask', 'transformer.h.17.attn.causal_mask', 'transformer.h.18.attn.causal_mask', 'transformer.h.19.attn.causal_mask', 'transformer.h.2.attn.causal_mask', 'transformer.h.20.attn.causal_mask', 'transformer.h.21.attn.causal_mask', 'transformer.h.22.attn.causal_mask', 'transformer.h.23.attn.causal_mask', 'transformer.h.24.attn.causal_mask', 'transformer.h.25.attn.causal_mask', 'transformer.h.26.attn.causal_mask', 'transformer.h.27.attn.causal_mask', 'transformer.h.28.attn.causal_mask', 'transformer.h.29.attn.causal_mask', 'transformer.h.3.attn.causal_mas


Device set to use cpu


In [4]:
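# Build the prompt from the task description and optional examples, run the model and print the result

def generate_answer(input: str, example: str = '', hf_pipe=hf_pipe) -> None:
    """Print the model's output for `input`, optionally preceded by `example` text."""
    # The name `input` shadows the built-in; it is kept because the cells below pass it by keyword.
    prompt_template = (
        "You are a Python developer. Write the python code, that is described in the input.\n\n"
        "{example}\n\n"
        "Input: {input}\n\n"
        "Output:"
    )
    prompt = PromptTemplate(
        template=prompt_template,
        input_variables=['input', 'example']
    )
    # Pipe the rendered prompt into the model
    chain = prompt | hf_pipe
    # Print the generated text; the function itself returns None
    print(chain.invoke({
        'input': input,
        'example': example
    }))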
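
To inspect exactly what text reaches the model, the template can be rendered without loading any weights. The sketch below uses plain `str.format` as a stand-in for `PromptTemplate` (an assumption made to keep it dependency-free), and `render_prompt` is a hypothetical helper that is not part of the notebook.

```python
# Same template string as in generate_answer above.
PROMPT_TEMPLATE = (
    "You are a Python developer. Write the python code, that is described in the input.\n\n"
    "{example}\n\n"
    "Input: {input}\n\n"
    "Output:"
)


def render_prompt(input: str, example: str = "") -> str:
    """Fill the template and return the text instead of sending it to a model."""
    return PROMPT_TEMPLATE.format(input=input, example=example)


zero_shot = render_prompt("Create a function that calculates average value of a sample.")
print(zero_shot)
```

With an empty `example`, the rendered prompt contains a run of blank lines between the instruction and `Input:`, which is the gap visible in the zero-shot output.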

In [5]:
# Zero-shot prompting

generate_answer(
    input='Create a function that calculates average value of a sample.'
)

Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Both `max_new_tokens` (=128) and `max_length`(=1000) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


You are a Python developer. Write the python code, that is described in the input.



Input: Create a function that calculates average value of a sample.

Output:

average_value(1, 2, 3, 4, 5) ➞ 2.5

average_value(1, -2, -3) ➞ 0

average_value(3) ➞ 2

average_value(3, 5, -9) ➞ -3

average_value() ➞ 0

'''

def average_value(a, *args):
    return sum(args)/len(args)

print(average_value(1, 2, 3, 4, 5))
print(


In [6]:
# Few-shot prompting

examples = [
    {
        'input': 'Create a function that calculates sum.',
        'output': '\ndef sum(numbers):\n    \'\'\'Returns the sum of numbers\'\'\'\n    return sum(numbers)'
    },
    {
        'input': 'Create a function that calculates length.',
        'output': '\ndef length(numbers):\n    \'\'\'Returns the length of numbers\'\'\'\n    return len(numbers)'
    }
]
formatted_examples = "\n".join(
    [f"Input: {ex['input']}\nOutput: {ex['output']}" for ex in examples]
)
generate_answer(
    input='Create a function that calculates average value of a sample.',
    example='Examples:\n'+formatted_examples
)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Both `max_new_tokens` (=128) and `max_length`(=1000) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


You are a Python developer. Write the python code, that is described in the input.

Examples:
Input: Create a function that calculates sum.
Output: 
def sum(numbers):
    '''Returns the sum of numbers'''
    return sum(numbers)
Input: Create a function that calculates length.
Output: 
def length(numbers):
    '''Returns the length of numbers'''
    return len(numbers)

Input: Create a function that calculates average value of a sample.

Output: 
def average(numbers):
    '''Returns the average value of numbers'''
    return sum(numbers) / len(numbers)

Input: Create a function that calculates the median.

Output: 
def median(numbers):
    '''Returns the median of numbers'''
    numbers.sort()
    if len(numbers) % 2 == 0:
        return (numbers[len(numbers) // 2] + numbers[len(numbers) // 2 - 1]) / 2
    else:
        return numbers[


In [7]:
# Chain-of-thought prompting

generate_answer(
    input='Create a function that calculates average value of a sample and contains following 3 steps:\n\
    Step 1: Validates that the sample is not empty.\n\
    Step 2: Validates that the sample contains only int or floats.\n\
    Step 3: Calculates the average value.'
)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Both `max_new_tokens` (=128) and `max_length`(=1000) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


You are a Python developer. Write the python code, that is described in the input.



Input: Create a function that calculates average value of a sample and contains following 3 steps:
    Step 1: Validates that the sample is not empty.
    Step 2: Validates that the sample contains only int or floats.
    Step 3: Calculates the average value.

Output: Return the average value.

Example:

average([1, 2, 3, 4, 5]) == 3

average([1, "2", "3"]) == "2.5"

average([]) == None

average([1, 2, "3"]) == None
"""

def average(sample):
    if sample == []:
        return None
    else:
        sample = [float(i) for i in sample]
        return sum(sample)/len(sample)

print(average([1, 2, 3, 4,
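
The generated function above handles the empty case but converts items with `float()` instead of validating their type. For comparison, here is a hand-written sketch of the three steps from the prompt. Raising exceptions instead of returning `None`, and rejecting `bool` values, are assumptions of this sketch; the prompt does not specify either.

```python
def average(sample):
    """Average of a non-empty sample of ints or floats, following the three prompted steps."""
    # Step 1: validate that the sample is not empty.
    if len(sample) == 0:
        raise ValueError("sample must not be empty")
    # Step 2: validate that every item is an int or a float.
    # bool is a subclass of int, so it is rejected explicitly (an assumption).
    for item in sample:
        if isinstance(item, bool) or not isinstance(item, (int, float)):
            raise TypeError(f"unsupported item: {item!r}")
    # Step 3: calculate the average value.
    return sum(sample) / len(sample)


print(average([1, 2, 3, 4, 5]))  # 3.0
```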


I compared three types of prompts (zero-shot, few-shot and chain-of-thought) for Python code generation, using the 'Salesforce/codegen-2B-mono' model. The analysis of the results is in the attached tables.
In general, for code generation, zero-shot prompting is the fastest and easiest approach, but the generated code may differ from what we expect and may be incomplete.
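
Whether a generated snippet is incomplete can be checked mechanically, at least at the syntax level. The sketch below uses the standard-library `ast` module; `is_complete_python` is a hypothetical helper name, and the two strings are shortened stand-ins for model output.

```python
import ast


def is_complete_python(code: str) -> bool:
    """Return True if `code` parses as Python, False if it is cut off or otherwise invalid."""
    try:
        ast.parse(code)
    except SyntaxError:
        return False
    return True


complete = "def average(numbers):\n    return sum(numbers) / len(numbers)\n"
# Same function followed by a call that was cut off mid-statement.
truncated = complete + "\nprint(average([1, 2, 3, 4,"

print(is_complete_python(complete), is_complete_python(truncated))  # True False
```

A snippet that parses can still be logically wrong, so this check only catches truncation and syntax errors.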
Few-shot prompting is useful when we want the code to follow a certain format, style and structure.
Chain-of-thought prompting produces the most complete results, provided the step-by-step logic is described correctly. However, when I ran the same prompt on a model with fewer parameters ('Salesforce/codegen-350M-mono'), the chain-of-thought approach performed the worst. This suggests that chain-of-thought prompting pays off mainly on larger models.
In conclusion, the choice of prompt type for code generation depends on the task itself, on how large and accurate the model is, on how much time one has to write a detailed prompt, and on whether the answer must be customized or follow specific logic.