### Assignment: Code-Focused Inference

Your task is to load a pre-trained GPT-2 model and configure it to answer *only* questions related to Python coding.

1. **Load Model and Tokenizer:** Load a suitable pre-trained GPT-2 model and its corresponding tokenizer. You can use `transformers.AutoModelForCausalLM` and `transformers.AutoTokenizer`. A smaller model like `gpt2` or `gpt2-medium` might be sufficient.
2. **Implement a Filtering Mechanism:** Before generating a response, check if the input prompt is related to Python coding. You can use simple keyword matching (e.g., "Python", "code", "function", "class", "import") or a more sophisticated approach using a text classification model (optional).
3. **Generate Response:** If the prompt is deemed a Python coding question, generate a response using the loaded GPT-2 model.
4. **Handle Non-Coding Questions:** If the prompt is not related to Python coding, return a predefined message indicating that the model can only answer coding questions.
5. **Test:** Test your implementation with various prompts, including both Python coding questions and non-coding questions, to ensure the filtering mechanism works correctly.
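
Step 5 is easier to check when the filter is evaluated on its own, separately from text generation. The sketch below assumes a small hand-labeled prompt set and a stand-in filter. The names `simple_filter`, `evaluate_filter`, `DEMO_KEYWORDS`, and `LABELED_PROMPTS`, as well as the labels themselves, are illustrative assumptions and not part of the assignment.

```python
import re

# Hypothetical stand-in filter: whole-word matching against a tiny keyword set.
DEMO_KEYWORDS = {"python", "function", "class", "import", "pandas"}

def simple_filter(prompt):
    """Return True if any whole word in the prompt is a demo keyword."""
    words = set(re.findall(r"[a-z_]+", prompt.lower()))
    return not words.isdisjoint(DEMO_KEYWORDS)

def evaluate_filter(filter_fn, labeled_prompts):
    """Collect the prompts the filter gets wrong, split by error type."""
    false_positives, false_negatives = [], []
    for prompt, is_coding in labeled_prompts:
        predicted = filter_fn(prompt)
        if predicted and not is_coding:
            false_positives.append(prompt)
        elif is_coding and not predicted:
            false_negatives.append(prompt)
    return false_positives, false_negatives

# Illustrative labels (assumed for this sketch).
LABELED_PROMPTS = [
    ("How do I write a function to reverse a list in Python?", True),
    ("What is the capital of France?", False),
    ("Which class of antibiotics treats strep throat?", False),
    ("How do I merge two dictionaries?", True),
]

false_pos, false_neg = evaluate_filter(simple_filter, LABELED_PROMPTS)
print("False positives:", false_pos)
print("False negatives:", false_neg)
```

The two misclassified prompts show the typical failure modes of keyword matching: a keyword used in a non-coding sense, and a coding question that contains none of the listed keywords.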

In [1]:
from transformers import AutoModelForCausalLM, AutoTokenizer
import re
import torch

# Keywords are stored lowercase because the prompt is lowercased before matching.
# Many entries ('for', 'if', 'with', 'as', ...) are also common English words,
# so this filter can still produce false positives.
KEYWORDS = {
    'python', 'code', 'function', 'class', 'import', 'def', 'lambda', 'list', 'dict', 'tuple', 'set',
    'loop', 'for', 'while', 'if', 'elif', 'else', 'exception', 'try', 'except', 'pandas', 'numpy',
    'matplotlib', 'plot', 'read_csv', 'dataframe', 'script', 'decorator', 'generator', 'comprehension',
    'object', 'inheritance', 'method', 'attribute', 'self', 'init', 'main', 'package', 'module', 'pip',
    'virtualenv', 'venv', 'jupyter', 'notebook', 'pytest', 'unittest', 'assert', 'type', 'int', 'str', 'float',
    'bool', 'none', 'true', 'false', 'print', 'input', 'open', 'file', 'with', 'as', 'from',
    'raise', 'super', 'staticmethod', 'classmethod', 'property', 'yield', 'return', 'break', 'continue', 'pass'
}

def is_python_coding_question(prompt):
    """Return True if any whole word in the prompt is a known keyword."""
    # Match whole words rather than substrings, so 'as' does not match 'last'.
    # Plurals such as 'functions' would need their own entries.
    words = set(re.findall(r'[a-z_]+', prompt.lower()))
    return not words.isdisjoint(KEYWORDS)

# Load model and tokenizer (small GPT-2 for demo)
tokenizer = AutoTokenizer.from_pretrained('gpt2')
model = AutoModelForCausalLM.from_pretrained('gpt2')
model.eval()

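def code_focused_inference(prompt, max_length=100):
    """Generate a reply for Python coding prompts; refuse everything else."""
    if not is_python_coding_question(prompt):
        return "Sorry, I can only answer Python coding questions."

    inputs = tokenizer(prompt, return_tensors='pt')
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_length=max_length,  # total length in tokens, prompt included
            pad_token_id=tokenizer.eos_token_id,  # GPT-2 defines no pad token
            eos_token_id=tokenizer.eos_token_id,
            do_sample=True,  # sampling makes the output differ between runs
            top_p=0.95,
            top_k=50
        )
    # The decoded text starts with the prompt, followed by the continuation.
    return tokenizer.decode(outputs[0], skip_special_tokens=True)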

# Test cases
prompts = [
    "How do I write a function to reverse a list in Python?",
    "What is the capital of France?",
    "Show me how to use pandas to read a CSV file.",
    "Tell me a joke.",
    "How do I create a class with inheritance in Python?"
]

for i, prompt in enumerate(prompts, 1):
    print(f"Prompt {i}: {prompt}")
    print("Response:")
    print(code_focused_inference(prompt, max_length=60))
    print("-"*60)


Prompt 1: How do I write a function to reverse a list in Python?
Response:
How do I write a function to reverse a list in Python?

There is a function called reverse.py. It accepts two arguments:

data = [] def reverse ( list ): self . list = list

The first argument consists of a list of tuples

>>> len
------------------------------------------------------------
Prompt 2: What is the capital of France?
Response:
Sorry, I can only answer Python coding questions.
------------------------------------------------------------
Prompt 3: Show me how to use pandas to read a CSV file.
Response:
Show me how to use pandas to read a CSV file.

I had to take on a whole new approach to this project:

I took the simple, yet awesome, example above and ran a series of tests to verify that the dataset worked. By looking at some of the examples
------------------------------------------------------------
Prompt 4: Tell me a joke.
Response:
Sorry, I can only answer Python coding questions.
-------------

### Conclusion

This notebook builds a simple code-focused inference system on top of a pre-trained GPT-2 model. A keyword-based filter decides whether a prompt is related to Python coding: if it is, the model generates a response; otherwise, the system returns a predefined message. In the sample run, the filter routed the displayed prompts correctly, but the generated text only loosely follows the question, which is expected from a base GPT-2 model that has not been tuned for question answering. The approach is therefore a starting point for more sophisticated domain-specific language models rather than a usable coding assistant.

Further improvements could involve:
- Employing a more robust filtering mechanism (e.g., using a text classification model).
- Fine-tuning the GPT-2 model on a dataset of Python coding questions and answers.
- Using prompt engineering (for example, few-shot question/answer examples) to steer the model toward accurate, helpful code-related responses.
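
As one possible starting point for the prompt engineering idea, the sketch below wraps a question in a few-shot question/answer template and trims the generated text down to the first answer. The helper names `build_prompt` and `extract_answer`, the example pairs, and the simulated model output are all assumptions made for illustration; nothing here has been run against GPT-2, and a base model may not follow the pattern reliably.

```python
# Hypothetical few-shot examples; replace with vetted question/answer pairs.
FEW_SHOT_EXAMPLES = [
    ("How do I get the length of a list in Python?",
     "Use len(my_list)."),
    ("How do I open a file for reading in Python?",
     "Use open(path) inside a with statement."),
]

def build_prompt(question, examples=FEW_SHOT_EXAMPLES):
    """Format the examples and the new question as Q/A pairs."""
    parts = [f"Q: {q}\nA: {a}" for q, a in examples]
    parts.append(f"Q: {question}\nA:")
    return "\n\n".join(parts)

def extract_answer(generated_text, prompt):
    """Drop the echoed prompt and anything after the next 'Q:' marker."""
    if generated_text.startswith(prompt):
        generated_text = generated_text[len(prompt):]
    return generated_text.split("\nQ:", 1)[0].strip()

prompt = build_prompt("How do I reverse a list in Python?")

# Simulated model output (assumed, not real GPT-2 text) to show the parsing.
simulated = prompt + " Use my_list[::-1].\n\nQ: What is a tuple?\nA: ..."
print(extract_answer(simulated, prompt))
```

With the notebook's model, the string from `build_prompt` would be tokenized and passed to `model.generate`, and the decoded text would go through `extract_answer`. Because the template adds tokens to the prompt, the generation length limit would need to be raised accordingly.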