### Project: Code-Focused Inference

Load a pre-trained GPT-2 model and configure it to answer *only* questions related to Python coding.

1. **Load Model and Tokenizer:** Load a suitable pre-trained GPT-2 model and its corresponding tokenizer. Use `transformers.AutoModelForCausalLM` and `transformers.AutoTokenizer`. A smaller model such as `gpt2` or `gpt2-medium` should be sufficient.
2. **Implement a Filtering Mechanism:** Use prompt techniques to decide whether an incoming prompt is a Python coding question.
3. **Generate Response:** If the prompt is deemed a Python coding question, generate a response using the loaded GPT-2 model.
4. **Handle Non-Coding Questions:** If the prompt is not related to Python coding, return a predefined message indicating that the model can only answer coding questions.
5. **Test:** Test the implementation with various prompts, including both Python coding questions and non-coding questions, to ensure the filtering mechanism works correctly.
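
The cells below call `is_python_coding_question`, but its definition does not appear in this section. A minimal sketch of one possible filter is a keyword match with `re`. The keyword list and the regular expression here are assumptions made for illustration, not the notebook's actual implementation, and a keyword heuristic will misclassify some prompts.

```python
import re

# Hypothetical keyword list (an assumption): tune it for your own prompts.
_PYTHON_KEYWORDS = (
    "python", "program", "code", "function", "class", "loop",
    "list", "tuple", "dict", "dictionary", "string", "script",
    "import", "module", "exception", "variable",
)

# Match any keyword as a whole word, ignoring case.
_PYTHON_PATTERN = re.compile(
    r"\b(" + "|".join(map(re.escape, _PYTHON_KEYWORDS)) + r")\b",
    re.IGNORECASE,
)


def is_python_coding_question(prompt: str) -> bool:
    """Return True if the prompt contains at least one coding keyword."""
    return bool(_PYTHON_PATTERN.search(prompt))
```

Whole-word matching keeps a keyword such as `class` from firing on words like "classic", at the cost of missing plurals such as "lists" unless they are added to the list.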

In [56]:
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
import re

MODEL_NAME = "openai-community/gpt2"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
# GPT-2 has no dedicated padding token, so reuse the end-of-sequence token.
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)


In [64]:
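def generate_python_response1(prompt: str, max_length: int = 100):
    """
    Generate a response only if the prompt is related to Python coding.

    Non-coding prompts get a fixed refusal message instead.
    """

    # Filter first: refuse anything that is not a Python coding question.
    if not is_python_coding_question(prompt):
        return " This model can only answer Python coding-related questions."

    # Prompt template that steers the model toward emitting Python code.
    system_prompt = f"""

Question: {prompt}

Python Code:
"""
    inputs = tokenizer(system_prompt, return_tensors="pt", padding=True)

    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_length=max_length,  # counts prompt tokens plus generated tokens
            do_sample=True,         # sample instead of greedy decoding
            temperature=0.7,
            top_p=0.7,
            eos_token_id=tokenizer.eos_token_id,
            pad_token_id=tokenizer.pad_token_id,
            no_repeat_ngram_size=3  # forbid repeating any 3-gram
        )

    # The decoded text contains the prompt template followed by the continuation.
    response = tokenizer.decode(outputs[0], skip_special_tokens=True)

    return response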

In [65]:
prompt = "Write a program to create a tuple"
print(generate_python_response1(prompt))



Question: Write a program to create a tuple

Python Code:

import sys import random def tuple(x): x = x + 1 return tuple(y) def tuple2(x, y): return tuple2((x,y)) def tuple3(x:xs, y:ys): return {x, x + y}

Answer: Python code

I wrote a program that creates a tuple and then prints the result.

The program does


In [66]:
prompt = "What is the capital of France?"
print(generate_python_response1(prompt))

 This model can only answer Python coding-related questions.


In [67]:
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
import re

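# Switch to the larger gpt2-medium checkpoint. This rebinds the global
# `tokenizer` and `model`, so earlier functions will use it from now on.
MODEL_NAME = "gpt2-medium"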

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)


In [70]:
def generate_python_response2(prompt: str, max_length: int = 150):
    """
    Generate a response only if the prompt is related to Python coding.

    Non-coding prompts get a fixed refusal message instead.
    """

    # Filter first: refuse anything that is not a Python coding question.
    if not is_python_coding_question(prompt):
        return " This model can only answer Python coding-related questions."

    # Prompt engineering to restrict domain and guide generation
    system_prompt = f"""

Question: {prompt}

Code:
"""

    inputs = tokenizer(system_prompt, return_tensors="pt", padding=True)

    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_length=max_length,  # counts prompt tokens plus generated tokens
            do_sample=True,         # sample instead of greedy decoding
            temperature=0.7,
            top_p=0.7,
            eos_token_id=tokenizer.eos_token_id,
            pad_token_id=tokenizer.pad_token_id,
            no_repeat_ngram_size=3  # forbid repeating any 3-gram
        )
    # The decoded text contains the prompt template followed by the continuation.
    response = tokenizer.decode(outputs[0], skip_special_tokens=True)
    return response

In [71]:
prompt = "Write a program to create a list"
print(generate_python_response2(prompt))



Question: Write a program to create a list

Code:

import std.algorithm.comparison : equal;

auto list = equal(list, "foo", "bar");

assert(list == "foo");
,

The program above prints:
, The program above outputs:
.

Note that the equal() function is used to compare two lists. It compares the elements of the two lists, and returns true if the first list is equal to the second list.
, and the program above outputs:



In [72]:
prompt1 = "What is photosynthesis?"
print(generate_python_response2(prompt1))

 This model can only answer Python coding-related questions.
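
The manual checks above cover one coding prompt and one non-coding prompt per model. As a rough sketch of the broader testing the project steps call for, the filter can be checked against a labelled list of prompts without running generation at all. `check_filter`, `cases`, and `toy_classifier` are hypothetical names introduced for this example; in the notebook, the real filter function would be passed in place of the stand-in.

```python
from typing import Callable, Iterable, List, Tuple


def check_filter(
    classifier: Callable[[str], bool],
    cases: Iterable[Tuple[str, bool]],
) -> List[str]:
    """Return the prompts whose classification differs from the expected label."""
    return [prompt for prompt, expected in cases if classifier(prompt) != expected]


# Labelled prompts: True means "should be treated as a Python coding question".
cases = [
    ("Write a program to create a tuple", True),
    ("Write a program to create a list", True),
    ("What is the capital of France?", False),
    ("What is photosynthesis?", False),
]


# Stand-in classifier for illustration only; pass the real filter instead.
def toy_classifier(prompt: str) -> bool:
    return "program" in prompt.lower()


failures = check_filter(toy_classifier, cases)
print(failures)  # an empty list means every case matched its label
```

Keeping the filter check separate from generation makes the test deterministic, since the sampled model output changes from run to run.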
