### Assignment: Code-Focused Inference

Your task is to load a pre-trained GPT-2 model and configure it to answer *only* questions related to Python coding.

1. **Load Model and Tokenizer:** Load a suitable pre-trained GPT-2 model and its corresponding tokenizer. You can use `transformers.AutoModelForCausalLM` and `transformers.AutoTokenizer`. A smaller model like `gpt2` or `gpt2-medium` might be sufficient.
2. **Implement a Filtering Mechanism:** Before generating a response, check if the input prompt is related to Python coding. You can use simple keyword matching (e.g., "Python", "code", "function", "class", "import") or a more sophisticated approach using a text classification model (optional).
3. **Generate Response:** If the prompt is deemed a Python coding question, generate a response using the loaded GPT-2 model.
4. **Handle Non-Coding Questions:** If the prompt is not related to Python coding, return a predefined message indicating that the model can only answer coding questions.
5. **Test:** Test your implementation with various prompts, including both Python coding questions and non-coding questions, to ensure the filtering mechanism works correctly.

In [1]:
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
import torch

2025-09-21 17:23:08.050665: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1758475388.227898      36 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1758475388.280433      36 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered


## 1. Loading Models and Tokenizer

In [2]:
model_name = "gpt2"  
model_name2 = "gpt2-medium"
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer2 = AutoTokenizer.from_pretrained(model_name2)

model = AutoModelForCausalLM.from_pretrained(model_name)
model2 = AutoModelForCausalLM.from_pretrained(model_name2)


# Build pipeline for convenience
generator = pipeline("text-generation", model=model, tokenizer=tokenizer ) #, device=0 if torch.cuda.is_available() else -1)
generator2 = pipeline("text-generation", model=model2, tokenizer=tokenizer)  # , device=0 if torch.cuda.is_available() else -1) this code if for my gpu use but this time i using kaggle


tokenizer_config.json:   0%|          | 0.00/26.0 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/665 [00:00<?, ?B/s]

vocab.json:   0%|          | 0.00/1.04M [00:00<?, ?B/s]

merges.txt:   0%|          | 0.00/456k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/1.36M [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/26.0 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/718 [00:00<?, ?B/s]

vocab.json:   0%|          | 0.00/1.04M [00:00<?, ?B/s]

merges.txt:   0%|          | 0.00/456k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/1.36M [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/548M [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/124 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/1.52G [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/124 [00:00<?, ?B/s]

Device set to use cuda:0
Device set to use cuda:0


## 2. Filtering mechanism (simple keyword-based)

In [3]:
def is_python_question(prompt: str) -> bool:
    keywords = ["python", "code", "function", "class", "import", "def", "list", "dict"]
    prompt_lower = prompt.lower()
    return any(keyword in prompt_lower for keyword in keywords)

## 3. Generate response if coding related, else return fallback

In [4]:
def code_focused_inference(prompt: str, max_length: int = 150):
    if is_python_question(prompt):
        response = generator(prompt, max_length=max_length, num_return_sequences=1, pad_token_id=tokenizer.eos_token_id)
        return response[0]["generated_text"]
    else:
        return "⚠️ This model only answers Python coding-related questions."

def code_focused_inference2(prompt: str, max_length: int = 150):
    if is_python_question(prompt):
        response = generator2(prompt, max_length=max_length, num_return_sequences=1, pad_token_id=tokenizer.eos_token_id)
        return response[0]["generated_text"]
    else:
        return "⚠️ This model only answers Python coding-related questions."

## 4. Testing with different inputs

In [6]:
test_prompts = [
    "Write a Python function to reverse a string.",
    "What is the capital of France?",
    "Show me how to import pandas and create a DataFrame.",
    "Who won the World Cup in 2018?",
    "Give me list of python Data Structure ?",
    "Who is the owner of Tesla",
    "What is the meaning of list in python",
    "What is Youtube",
    "Who created the Python coding language and when"
    
]

for prompt in test_prompts:
    print(f"Prompt: {prompt}")
    print('----'*20)
    print(f"Response: {code_focused_inference(prompt)}\n")
    print('----'*20)
    print(f"Response2: {code_focused_inference2(prompt)}\n")
    print('----'*20)
    print('----'*20)
    
    

Both `max_new_tokens` (=256) and `max_length`(=150) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


Prompt: Write a Python function to reverse a string.
--------------------------------------------------------------------------------


Both `max_new_tokens` (=256) and `max_length`(=150) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


Response: Write a Python function to reverse a string.

def reverse_string ( args ): return {'a': False, 'b': False}

So, let's say we have a pattern on a string, and a pattern of types "a", "b", and "c" on a string.

def reverse_pattern ( args ): from sqlite3 import ReverseString def get_pattern ( pattern ): if pattern.startswith( 'a' ): print "%s".format(pattern.strip() + " " + pattern.width) def get_pattern ( pattern ): print "%s".format(pattern.strip() + " " + pattern.width + " " + pattern.length) def get_pattern_or_strip ( pattern ): print "%s".format(pattern.strip() + " " + pattern.width) def get_pattern_or_strip_all ( pattern ): print "%s".format(pattern.strip() + " " + pattern.width + " " + pattern.length) def get_pattern_or_strip_all_all ( pattern ): print "%s".format(pattern.strip() + " " + pattern.width + " " + pattern.length) def get_pattern_

--------------------------------------------------------------------------------


Both `max_new_tokens` (=256) and `max_length`(=150) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


Response2: Write a Python function to reverse a string.

A Python function that allows reversing pairs of strings.

A Python function that turns a string into a list.

A Python function that makes a list of the characters that you enter.

A Python function that makes a list of the characters that you select.

A Python function that makes a list of the characters that you change.

A Python function that makes a list of the characters that you change.

A Python function that makes a list of the characters that you insert.

A Python function that makes a list of the characters that you insert.

A Python function that makes a list of the characters that you change.

A Python function that makes a list of the characters that you set.

A Python function that makes a list of the characters that you set.

A Python function that makes a list of the characters that you set.

A Python function that makes a list of the characters that you check.

A Python function that makes a list of the characte

Both `max_new_tokens` (=256) and `max_length`(=150) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


Response: Show me how to import pandas and create a DataFrame.

import pandas as pd import pandas.core.model.DB import pandas.core.model.Constant class DataFrame () {

return (

<DataFrame a="{}" b="{}" />)

}

}

Output:

{ "id" : "828c1b-4f5b-4f66-ba55-0d20b5ca5b9", "title" : "Constant DataFrame", "version" : "0.10.0", "description" : "Constant DataFrame", "data" : [ "data:data" ]

}

Output:

Constant DataFrame.

Constant DataFrame.

Constant DataFrame.

Constant DataFrame.

Constant DataFrame.

Constant DataFrame.

Constant DataFrame.

Constant DataFrame.

Constant DataFrame.

Constant DataFrame.

Constant DataFrame.

Constant DataFrame.

Constant DataFrame.

Constant DataFrame.

Constant DataFrame

--------------------------------------------------------------------------------


Both `max_new_tokens` (=256) and `max_length`(=150) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


Response2: Show me how to import pandas and create a DataFrame.

import pandas as pd import numpy as np d = pd. read_csv ( 'input.csv' ) # get row 1 df = d. read_csv ('result.csv' ) df. skip. append ( df [ 1 :]) df. skip. append ( df [ 2 ]) df. skip. append ( df [ 3 ]) # get row 2 df = d. read_csv ('result.csv' ) df. skip. append ( df [ 4 :]) df. skip. append ( df [ 5 ]) # end of result df. skip. append ( df [ 6 ]) # end of result df. skip. append ( df [ 7 ]) # end of result

You can use this on your own data.

import pd from pandas import DataFrame import data.frame as df from datetime import datetime import str, timedelta from datetime import timedelta as d t1 = df. read_csv ('result.csv' ) t2 = df. read_csv ('result.csv' ) # start with row 1 df. skip. append ( t1 ) df. skip. append ( t2 ) df. skip. append ( t3

--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
Prompt: Who 

Both `max_new_tokens` (=256) and `max_length`(=150) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


Response: Give me list of python Data Structure ?

Answer: There is a very simple way

to use Data Structure in Python

I use it to:

Generate a list of strings

Find a list of strings (using Python 4)

Find a list of strings (using Python 3)

Find a list of strings (using Python 2)

Read a list of strings (using Python 1)

Read a list of strings (using Python 0)

Read a list of strings (using Python 0)

Check if python 3.4 is using

(Check if python 3.4 is using Python 2)

I use Python 2 to check if python 3.4 is using Python 3.2

I use Python 2 to check if python 3.2 is using Python 3.1

I use Python 2 to check if python 3.1 is using python 3.0

I use Python 2 to check if python 2 is using python 3.0

I use Python 2 to check if python 2 is using Python 3.0

I use Python 2 to check if python 2 is using python 3.0

So what do I do next?

I do:

Check if Python

--------------------------------------------------------------------------------


Both `max_new_tokens` (=256) and `max_length`(=150) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
Both `max_new_tokens` (=256) and `max_length`(=150) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


Response2: Give me list of python Data Structure ?

no, you can use simple one, not list.

How can I make my own list of python Data Structure?

You can use the standard Python list type class and get a list of lists:

>>> import lists >>> list = lists.copy() >>> list.sort() >>> list.append(['a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z']) >>> lists.contains(list) True >>> lists.contains(['a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z']) False

or you can use list.append() method instead:

>>> list = lists.append(['a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','

--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
Prompt: Who is the owner of Tesla
--------------------------------------------------------------------------------
Res

Both `max_new_tokens` (=256) and `max_length`(=150) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


Response2: What is the meaning of list in python?

This is a collection of lists that we can sort.

List sorted by length

Examples:

[1, 2, 3, 4]

[1, 2, 3, 4]

[1, 2, 3, 4]

[1, 2, 3, 4]

[1, 2, 3, 4]

[1, 2, 3, 4]

[1, 2, 3, 4]

[1, 2, 3, 4]

[1, 2, 3, 4]

[1, 2, 3, 4]

[1, 2, 3, 4]

[1, 2, 3, 4]

[1, 2, 3, 4]

[1, 2, 3, 4]

[1, 2, 3, 4]

[1, 2, 3, 4]

[1, 2, 3, 4]

[1, 2, 3, 4]

[1, 2, 3, 4]

[1, 2, 3, 4]

[1, 2, 3, 4]


--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
Prompt: What is Youtube
--------------------------------------------------------------------------------
Response: ⚠️ This model only answers Python coding-related questions.

--------------------------------------------------------------------------------
Response2: ⚠️ This model only answers Python coding-related questions.

--------------------------------------------------------------------------------

Both `max_new_tokens` (=256) and `max_length`(=150) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


Response: Who created the Python coding language and when did it become the most widely used and used programming language? In this talk, you'll learn about the origins of Python, and how it's evolved since then. You'll also learn about the Python programming language's potential and its future.

What is the Python programming language?

The Python programming language is the Python programming language. Python is the Python programming language which is the code that is written for the Python programming language.

The Python programming language was written with the Python programming language in mind.

To understand the Python programming language, you will need to know:

the type system for Python

the type system for Python the number type system

the type system for Python the type system and the type system are the same

The type system of Python is an abstract type system.

The "Python programming language" is a set of abstract types, such as the Python type system.

The Python

##### The gpt-2 medium good result and exact result that we ask , gpt2 is give the response more that not ask or required 