### Prompt Engineering Playground 🚀

This notebook demonstrates **prompt engineering** using **Hugging Face LLMs** and **Gradio**. You'll learn how to:

✅ Explore different prompt types  
✅ Control LLM outputs with parameters (temperature, top-p, max tokens)  
✅ Build an interactive **Gradio app** for live prompt experimentation  

By the end of this notebook, you'll have an **interactive playground** you can deploy on **Hugging Face Spaces**!
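
To build some intuition for the **temperature** and **top-p** parameters before loading a real model, here is a small pure-Python sketch of how they reshape a next-token distribution. The helper names (`softmax_with_temperature`, `top_p_filter`, `sample_token`) and the toy logits are assumptions made for illustration. This is not how `transformers` implements sampling internally, only a simplified picture of the idea.

```python
import math
import random


def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw scores into probabilities after dividing by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p=0.85):
    """Keep the most likely tokens until their cumulative mass reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    # Renormalize so the surviving probabilities sum to 1
    return {i: probs[i] / mass for i in kept}


def sample_token(logits, temperature=0.9, top_p=0.85, rng=random):
    """Sample one token id from the temperature-scaled, top-p-filtered distribution."""
    probs = softmax_with_temperature(logits, temperature)
    nucleus = top_p_filter(probs, top_p)
    ids = list(nucleus)
    weights = [nucleus[i] for i in ids]
    return rng.choices(ids, weights=weights, k=1)[0]


# Toy scores for a 4-token vocabulary (hypothetical values)
toy_logits = [2.0, 1.0, 0.5, -1.0]
print(softmax_with_temperature(toy_logits, temperature=0.5))  # sharper distribution
print(softmax_with_temperature(toy_logits, temperature=2.0))  # flatter distribution
print(sample_token(toy_logits))
```

Lower temperatures concentrate probability on the top token, higher temperatures spread it out, and a smaller `top_p` trims more of the unlikely tail before sampling.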


In [1]:
# Step 1: Import the tokenizer and model classes
from transformers import AutoTokenizer, AutoModelForCausalLM

# Step 2: Load the Model and Tokenizer
model_id = "distilgpt2"
print(f"🔄 Loading model: {model_id}...")


🔄 Loading model: distilgpt2...


In [2]:
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Step 3: Set device to CPU (your laptop)
device = "cpu"
model.to(device)

print("Model loaded successfully and moved to CPU!")

Model loaded successfully and moved to CPU!


In [4]:
# Step 4: Define the Generate Function
def generate_response(user_prompt, temperature=0.9, top_p=0.85, max_tokens=100):
    """
    Generate a response from the model for the given prompt.

    temperature and top_p control how random the sampling is;
    max_tokens limits how long the output can be.
    """
    # Tokenize the prompt, truncating inputs longer than 512 tokens
    inputs = tokenizer(
        user_prompt,
        return_tensors="pt",
        truncation=True,
        max_length=512  # Or set the model's context window
    )

    # Move the input tensors to the selected device (CPU here)
    inputs = {k: v.to(device) for k, v in inputs.items()}

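    # Generate tokens. max_new_tokens counts only the generated tokens,
    # whereas max_length would also count the prompt and could leave no
    # room for a response when the prompt is long.
    outputs = model.generate(
        **inputs,
        max_new_tokens=max_tokens,
        temperature=temperature,
        top_p=top_p,
        repetition_penalty=1.5,  # Try values between 1.1 and 2.0
        do_sample=True  # Sampling must be enabled for temperature/top_p to apply
    )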

    # Decode output tokens
    generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)

    #print(f"🖋️ Model Response:\n{generated_text}\n")
    
    return generated_text

# Step 5: Example Usage
if __name__ == "__main__":
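    # Example prompts: a plain instruction and a few-shot pattern
    user_prompt1 = "Explain how neural networks work in simple terms."

    # Not used below; pass it to generate_response to try few-shot prompting
    user_prompt2 = """Translate English to French: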
1. Hello → Bonjour
2. Thank you → Merci
3. I love you →"""

    # Parameters for generation
    temperature = 0.9
    top_p = 0.85
    max_tokens = 100

    # Generate and display the response
    response = generate_response(user_prompt1, temperature, top_p, max_tokens)
    
    print("=" * 60)
    print(f"Final Output:\n{response}")
    print("=" * 60)


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Final Output:
Explain how neural networks work in simple terms.
I'm not quite sure where these are going to get out, but I think there will be a lot more of them that come into the next few years and it may very well become better for us as we grow faster than previously thought about this phenomenon . So what is possible? It's still up to human beings with different brain types -- particularly those who have had many or even thousands of new neurons since their early days (before they were really
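
The `repetition_penalty=1.5` argument passed to `model.generate` in `generate_response` discourages the model from reusing tokens that already appear in the text. As a rough sketch of the commonly described rule (scores of already-seen tokens are divided by the penalty when positive and multiplied by it when negative), here is a pure-Python toy version. `apply_repetition_penalty` is a hypothetical helper written for this notebook, not a `transformers` function, and the scores are made up.

```python
def apply_repetition_penalty(logits, previous_token_ids, penalty=1.5):
    """Lower the scores of tokens that were already generated.

    Positive scores are divided by the penalty and negative scores are
    multiplied by it, so in both cases the token becomes less likely.
    """
    adjusted = list(logits)
    for token_id in set(previous_token_ids):
        score = adjusted[token_id]
        adjusted[token_id] = score / penalty if score > 0 else score * penalty
    return adjusted


# Hypothetical scores for a 3-token vocabulary; tokens 0 and 1 were already used
scores = [3.0, -2.0, 1.0]
print(apply_repetition_penalty(scores, previous_token_ids=[0, 1], penalty=1.5))
```

A penalty of 1.0 leaves the scores unchanged, and larger values push repeated tokens down harder, which may help explain why very high settings can make small models drift off topic.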
