Gentle Introduction to Hugging Face Transformers Library
===

This notebook demonstrates the code examples from this [article](https://tobeadatascientist.substack.com/p/gentle-introduction-to-hugging-face-transformers), showcasing the before and after of each technique.

For more resources like this, visit [tobeadatascientist.com](https://tobeadatascientist.com).

# Method 1: Using the pipeline API

In [4]:
from transformers import pipeline

# Initialize the text-generation pipeline
generator = pipeline("text-generation", model="gpt2")

# Generate text (max_length caps the prompt and generated tokens together)
prompt = "Hugging Face makes NLP"
output = generator(prompt, max_length=30)

Device set to use cpu
Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


In [5]:
# Print the generated text
print(f"Input: {prompt}")
print(f"Generated Text: {output[0]['generated_text']}")

Input: Hugging Face makes NLP
Generated Text: Hugging Face makes NLP better than I expected, it's my fault she didn't take a more important part of the game and pushed us out
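
The truncation warning above is triggered by `max_length`, which in text generation caps the prompt and the continuation together. As a rough sketch in plain Python (no model download), the hypothetical helper `new_token_budget` below computes how many new tokens a call may add under either setting. The rule that `max_new_tokens` takes precedence when both are given reflects our understanding of `generate()`, not a guarantee for every library version, and the token counts are made up.

```python
from typing import Optional


def new_token_budget(prompt_len: int,
                     max_length: Optional[int] = None,
                     max_new_tokens: Optional[int] = None) -> int:
    """Hypothetical helper: how many tokens generation may append to a prompt.

    Assumption: max_new_tokens counts only generated tokens and takes
    precedence; max_length counts prompt + generated tokens together.
    """
    if max_new_tokens is not None:
        return max_new_tokens
    if max_length is None:
        raise ValueError("set max_length or max_new_tokens")
    # A prompt at or beyond max_length leaves no room; the real generate()
    # warns in this case, and its exact behavior may vary by version.
    return max(0, max_length - prompt_len)


# A 5-token prompt (hypothetical count) with max_length=30 leaves 25 new tokens
print(new_token_budget(5, max_length=30))
print(new_token_budget(5, max_new_tokens=30))
```

Preferring `max_new_tokens` keeps the length of the reply independent of how long the prompt is.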


# Method 2: Using AutoTokenizer and AutoModelForCausalLM

In [6]:
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Prepare the input text
prompt = "Hugging Face makes NLP"
inputs = tokenizer(prompt, return_tensors="pt")

# Generate text (greedy decoding by default; max_length=30 includes the prompt tokens)
outputs = model.generate(**inputs, max_length=30)

# Decode the output
generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


In [7]:
# Print the generated text
print(f"Input: {prompt}")
print(f"Generated Text: {generated_text}")

Input: Hugging Face makes NLP
Generated Text: Hugging Face makes NLP a great way to get your face into the game.

The NLP is a great way to get your face
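
Method 2 calls `model.generate` without `do_sample=True`, so with GPT-2's default generation settings it decodes greedily: at every step it appends the single most likely next token. That is a common cause of loops like the repeated "a great way to get your face" above. The toy sketch below reproduces the effect with a hypothetical hand-written next-token table (`NEXT_TOKEN_PROBS`) standing in for GPT-2; it illustrates greedy decoding in general and is not the library's implementation.

```python
from typing import List

# Hypothetical toy "model": next-token probabilities given only the previous token.
# A real language model conditions on the whole context; this bigram table is a simplification.
NEXT_TOKEN_PROBS = {
    "NLP": {"a": 0.6, "better": 0.4},
    "a": {"great": 0.7, "new": 0.3},
    "great": {"way": 0.8, "tool": 0.2},
    "way": {"to": 0.9, "forward": 0.1},
    "to": {"get": 0.6, "learn": 0.4},
    "get": {"a": 0.7, "started": 0.3},
}


def greedy_decode(prompt: List[str], max_new_tokens: int) -> List[str]:
    """Append the most probable next token at each step (greedy decoding)."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        candidates = NEXT_TOKEN_PROBS.get(tokens[-1])
        if not candidates:  # no known continuation: stop early
            break
        tokens.append(max(candidates, key=candidates.get))
    return tokens


print(" ".join(greedy_decode(["Hugging", "Face", "makes", "NLP"], max_new_tokens=10)))
# Hugging Face makes NLP a great way to get a great way to get
```

Because the highest-probability path cycles back on itself, greedy decoding repeats it forever. Sampling (`do_sample=True`) or generation options such as `no_repeat_ngram_size` and `repetition_penalty` are the usual ways to break such loops.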


In [None]:
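from transformers import AutoTokenizer, AutoModelForCausalLM

# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# GPT-2 ships without a padding token, so reuse the end-of-sequence token,
# and pad on the left so the prompt ends exactly where generation starts
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "left"

# Step 1: Simulate a user input
user_input = "Hello, I am looking for a bot to help me. Can you do that??"

# Step 2: Tokenize with truncation and padding
inputs = tokenizer(
    user_input,
    max_length=30,  # Allow up to 30 tokens
    padding="max_length",  # Pad short inputs up to 30 tokens
    truncation=True,  # Truncate inputs longer than 30 tokens
    return_tensors="pt"
)

# Step 3: Generate a response with controlled length
output_ids = model.generate(
    **inputs,  # input_ids plus the attention_mask that marks the padding
    max_new_tokens=20,  # Limit the response itself to 20 new tokens
    num_return_sequences=1,  # Generate one response
    do_sample=True,  # Sample instead of greedy decoding so temperature applies
    temperature=0.7,  # Below 1.0 makes sampling more focused, less random
    pad_token_id=tokenizer.pad_token_id,
)

# Step 4: Decode only the newly generated tokens (skip the prompt) back to text
prompt_length = inputs["input_ids"].shape[1]
response = tokenizer.decode(output_ids[0][prompt_length:], skip_special_tokens=True)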
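
Padding and truncation are easy to picture on plain lists of token IDs. The sketch below uses a hypothetical helper, `pad_and_truncate`, in plain Python: it keeps the first `max_length` IDs, fills the remainder with a pad ID, and builds the matching attention mask (1 for real tokens, 0 for padding). It also suggests why left padding is the usual choice for decoder-only models such as GPT-2: the prompt's last real token then sits at the end of the sequence, right where generation continues. The token IDs are illustrative, and truncating from the end is an assumption about the tokenizer's settings.

```python
from typing import List, Tuple


def pad_and_truncate(ids: List[int], max_length: int, pad_id: int,
                     side: str = "left") -> Tuple[List[int], List[int]]:
    """Hypothetical helper mimicking tokenizer padding/truncation on raw IDs.

    Assumes truncation drops tokens from the end (keeps the first max_length).
    Returns (padded_ids, attention_mask).
    """
    kept = ids[:max_length]            # truncation
    n_pad = max_length - len(kept)     # how much padding is needed
    pads, real = [pad_id] * n_pad, [1] * len(kept)
    if side == "left":
        return pads + kept, [0] * n_pad + real
    if side == "right":
        return kept + pads, real + [0] * n_pad
    raise ValueError("side must be 'left' or 'right'")


PAD = 50256  # GPT-2's end-of-sequence ID, reused as the pad ID
print(pad_and_truncate([15496, 11, 314], 5, PAD, side="left"))
# ([50256, 50256, 15496, 11, 314], [0, 0, 1, 1, 1])
print(pad_and_truncate([15496, 11, 314], 5, PAD, side="right"))
# ([15496, 11, 314, 50256, 50256], [1, 1, 1, 0, 0])
print(pad_and_truncate(list(range(8)), 5, PAD))
# ([0, 1, 2, 3, 4], [1, 1, 1, 1, 1])
```

With right padding, the model would be asked to continue after a run of pad tokens, which is why the attention mask and padding side both matter for generation.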

In [None]:
# Print the input and chatbot's response
print("User Input:", user_input)
print("\nChatbot Response:", response)
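
Temperature only has an effect when tokens are sampled (`do_sample=True`). It is commonly described as dividing the model's logits by T before the softmax: T below 1 sharpens the distribution toward the top token, and T above 1 flattens it. The sketch below applies that formula to a few made-up logits in plain Python; the numbers are illustrative, not GPT-2 outputs.

```python
import math
from typing import List


def softmax_with_temperature(logits: List[float], temperature: float) -> List[float]:
    """Turn logits into probabilities after scaling them by 1 / temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
for t in (0.7, 1.0, 1.5):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

A lower temperature makes the top token more likely without ruling out the others, which is why a value like 0.7 tends to give focused but still varied replies.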

*Find more information in the official [documentation](https://huggingface.co/docs/transformers/en/index)*