<a href="https://colab.research.google.com/github/Syed8855/Titanic_Dataset_Profile_Repo/blob/main/Build_Your_Own_GPT_Creating_a_Custom_Text_Generation_Engine.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Assignment: Code-Focused Inference**

Your task is to load a pre-trained GPT-2 model and configure it to answer *only* questions related to Python coding.

1. **Load Model and Tokenizer:** Load a suitable pre-trained GPT-2 model and its corresponding tokenizer. You can use `transformers.AutoModelForCausalLM` and `transformers.AutoTokenizer`. A smaller model like `gpt2` or `gpt2-medium` might be sufficient.
2. **Implement a Filtering Mechanism:** Before generating a response, check if the input prompt is related to Python coding. You can use simple keyword matching (e.g., "Python", "code", "function", "class", "import") or a more sophisticated approach using a text classification model (optional).
3. **Generate Response:** If the prompt is deemed a Python coding question, generate a response using the loaded GPT-2 model.
4. **Handle Non-Coding Questions:** If the prompt is not related to Python coding, return a predefined message indicating that the model can only answer coding questions.
5. **Test:** Test your implementation with various prompts, including both Python coding questions and non-coding questions, to ensure the filtering mechanism works correctly.
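
Step 2 mentions a text classification model as an optional alternative to keyword matching. A minimal sketch of that idea follows, assuming a zero-shot classifier built with the `transformers` `pipeline` API. The checkpoint `facebook/bart-large-mnli`, the two candidate labels, the 0.5 threshold, and the helper names `build_zero_shot_scorer` and `route` are illustrative assumptions, not part of the assignment.

```python
from typing import Callable

REFUSAL = "I'm sorry, I can only answer questions about Python coding."


def build_zero_shot_scorer(
    model_name: str = "facebook/bart-large-mnli",  # assumed checkpoint
) -> Callable[[str], float]:
    """Return a function that scores how likely a prompt is about Python."""
    # Imported lazily because building the pipeline downloads a large model.
    from transformers import pipeline

    classifier = pipeline("zero-shot-classification", model=model_name)
    labels = ["Python programming", "something else"]  # assumed labels

    def score(prompt: str) -> float:
        result = classifier(prompt, candidate_labels=labels)
        # The returned labels are sorted by score, so look the label up by name.
        return dict(zip(result["labels"], result["scores"]))["Python programming"]

    return score


def route(
    prompt: str,
    score: Callable[[str], float],
    generate: Callable[[str], str],
    threshold: float = 0.5,  # assumed cut-off; tune on real prompts
) -> str:
    """Generate only when the scorer rates the prompt as on-topic."""
    if score(prompt) < threshold:
        return REFUSAL
    return generate(prompt)
```

Because `route` takes the scorer and the generator as plain callables, the keyword filter and a classifier can be swapped without touching the generation code.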

## Load Model and Tokenizer

## Implement a Filtering Mechanism

## Generate Response

In [1]:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load Model and Tokenizer

class PythonCodeGPT:
    def __init__(self, model_name='gpt2-medium'):
        print(f"Loading model: {model_name}...")
        self.device = "cuda" if torch.cuda.is_available() else "cpu"
        self.tokenizer = AutoTokenizer.from_pretrained(model_name)
        self.model = AutoModelForCausalLM.from_pretrained(model_name).to(self.device)
        # GPT-2 has no pad token, so reuse the end-of-sequence token for padding.
        self.tokenizer.pad_token = self.tokenizer.eos_token
        self.python_keywords = [
            'python', 'code', 'function', 'class', 'import', 'list',
            'dict', 'tuple', 'pandas', 'numpy', 'def', 'return', 'for',
            'while', 'try', 'except', 'lambda', 'pip'
        ]
        print(f"Model loaded successfully on device: {self.device}")

    # Implement a Filtering Mechanism

    def _is_python_question(self, prompt: str) -> bool:
        """Return True if any keyword occurs in the prompt (case-insensitive substring match)."""
        prompt_lower = prompt.lower()
        return any(keyword in prompt_lower for keyword in self.python_keywords)


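    # Generate Response

    def generate_response(self, prompt: str, max_length: int = 100) -> str:
        """Answer Python-related prompts with GPT-2 and refuse everything else.

        Note that max_length counts the prompt tokens as well as the generated ones.
        """
        if not self._is_python_question(prompt):
            return "I'm sorry, I can only answer questions about Python coding."

        # Encode the prompt as a tensor of token ids on the model's device.
        inputs = self.tokenizer.encode(
            prompt,
            return_tensors='pt'
        ).to(self.device)

        # no_repeat_ngram_size=2 prevents any bigram from appearing twice.
        outputs = self.model.generate(
            inputs,
            max_length=max_length,
            num_return_sequences=1,
            pad_token_id=self.tokenizer.eos_token_id,
            no_repeat_ngram_size=2
        )

        # The decoded text contains the original prompt followed by the continuation.
        generated_text = self.tokenizer.decode(outputs[0], skip_special_tokens=True)
        return generated_text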
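
The filter in `_is_python_question` matches substrings, so a short keyword such as `for` also fires inside an unrelated word like "information". One possible refinement, sketched here as standalone functions, is to match whole words with a regular expression. The names `is_python_question_substring`, `is_python_question_words`, and `KEYWORD_PATTERN` are hypothetical, and the optional trailing `s` for plurals is an assumption about what should count as a match.

```python
import re

PYTHON_KEYWORDS = [
    'python', 'code', 'function', 'class', 'import', 'list',
    'dict', 'tuple', 'pandas', 'numpy', 'def', 'return', 'for',
    'while', 'try', 'except', 'lambda', 'pip',
]

# \b anchors each keyword at word boundaries; "s?" also accepts simple plurals.
KEYWORD_PATTERN = re.compile(
    r"\b(?:" + "|".join(map(re.escape, PYTHON_KEYWORDS)) + r")s?\b",
    re.IGNORECASE,
)


def is_python_question_substring(prompt: str) -> bool:
    """Same behavior as the notebook's filter: plain substring matching."""
    prompt_lower = prompt.lower()
    return any(keyword in prompt_lower for keyword in PYTHON_KEYWORDS)


def is_python_question_words(prompt: str) -> bool:
    """Whole-word variant: a keyword must appear as its own word."""
    return KEYWORD_PATTERN.search(prompt) is not None


off_topic = "Any tourist information about Paris?"
print(is_python_question_substring(off_topic))  # "for" matches inside "information"
print(is_python_question_words(off_topic))
```

Whole-word matching removes this class of false positive, but generic English words in the list (`for`, `while`, `try`, `list`) can still match off-topic prompts when they appear on their own.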

In [2]:
python_bot = PythonCodeGPT()

Loading model: gpt2-medium...
Model loaded successfully on device: cpu


## Test

In [3]:
test_prompts = [
    "How do I write a function in Python to add two numbers?",
    "What is the capital of France?",
    "Can you show me a Python code example for reading a file?",
    "What's the weather like today?",
    "Explain the 'import' statement in Python."
]

print("\n Testing Prompts ")
for i, prompt in enumerate(test_prompts):
    print(f"\n[Test {i+1}]")
    print(f"User Prompt: {prompt}")
    response = python_bot.generate_response(prompt)
    print(f"Bot Response: {response}")

The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.



 Testing Prompts 

[Test 1]
User Prompt: How do I write a function in Python to add two numbers?
Bot Response: How do I write a function in Python to add two numbers?

The answer is to use the add function.
, , and are the two arguments to the function, and the result is the number of the second argument. The add() function takes two parameters, the first is a number and a string. If you want to write the same function as above, you can use a list of numbers. For example, to create a new function that adds two integers, we can write:

[Test 2]
User Prompt: What is the capital of France?
Bot Response: I'm sorry, I can only answer questions about Python coding.

[Test 3]
User Prompt: Can you show me a Python code example for reading a file?
Bot Response: Can you show me a Python code example for reading a file?

Yes, you can.
, the file is read from the standard input, and the output is written to the stdout. The file can be read by any program that can read files. For example, if you w
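
The attention-mask warning printed before the test results appears because `tokenizer.encode` returns only token ids, so `generate` receives no attention mask. The responses also start by repeating the prompt, since the decoded sequence includes the prompt tokens, and `max_length=100` counts those tokens too. One possible adjustment is sketched below as a standalone helper. The name `generate_continuation` and its parameters are hypothetical, not from the notebook; it assumes a Hugging Face tokenizer and causal language model are passed in.

```python
def generate_continuation(model, tokenizer, prompt, max_new_tokens=80, device="cpu"):
    """Generate text for `prompt` and return only the newly generated part."""
    # Calling the tokenizer directly (instead of .encode) returns both
    # input_ids and attention_mask.
    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    prompt_len = len(inputs["input_ids"][0])

    outputs = model.generate(
        **inputs,                       # passes input_ids and attention_mask
        max_new_tokens=max_new_tokens,  # budget excludes the prompt tokens
        pad_token_id=tokenizer.eos_token_id,
        no_repeat_ngram_size=2,
    )

    # Drop the echoed prompt tokens before decoding.
    new_tokens = outputs[0][prompt_len:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True).strip()
```

With the class defined earlier, this could be called as `generate_continuation(python_bot.model, python_bot.tokenizer, prompt, device=python_bot.device)`.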

## Dynamic Testing

In [4]:
print("Starting the interactive Python Code Bot...")
print("Type 'quit' or 'exit' to end the conversation.")
print("-" * 40)

while True:
    user_input = input("> You: ")

    if user_input.lower() in ['quit', 'exit']:
        print("\n🤖 Bot: Goodbye!")
        break

    response = python_bot.generate_response(user_input)
    print(f"\n🤖 Bot: {response}\n")

Starting the interactive Python Code Bot...
Type 'quit' or 'exit' to end the conversation.
----------------------------------------
> You: difference between char and str

🤖 Bot: I'm sorry, I can only answer questions about Python coding.

> You: difference between list and tuples

🤖 Bot: difference between list and tuples.

The difference between lists and lists of tuple is that tupled lists are immutable. This means that if you want to change the order of elements in a list, you can't change its order. You can, however, change how the elements are ordered. For example, if we wanted to add a new element to a tuple, we could do so by adding a value to the tuple. If we want the element in the list to be the

> You: quit

🤖 Bot: Goodbye!




# **Conclusion**

* The project demonstrated how placing a **filter in front of a pre-trained LLM** can create a specialized chatbot.
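* A **simple keyword filter** handled the test prompts as intended, passing Python-related queries and rejecting off-topic ones, which suggests that even basic input control can be effective. It is not airtight, though: it matches substrings, so generic keywords such as `for` or `list` can misfire.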
* However, the **base model (gpt2-medium)** showed clear limitations. Its answers were often inaccurate; for example, it incorrectly implied that lists are immutable and that elements can be added to a tuple. It produced text that looked technical but lacked factual correctness.
* The experiment highlights that a **small general-purpose base model** such as gpt2-medium, used without fine-tuning, is insufficient for expert tasks.
* Overall, the project underscored both the **power of filtering** and the **importance of domain-specific models**.

