# **Hugging Face Direct Model Usage with Examples**
Using Hugging Face models directly provides more control over the entire process, including tokenization, model inference, and decoding. Below is a step-by-step guide.

In [1]:
%pip install -q transformers

Note: you may need to restart the kernel to use updated packages.


In [2]:
from kaggle_secrets import UserSecretsClient
user_secrets = UserSecretsClient()
secret_value_0 = user_secrets.get_secret("HF_TOKEN")

In [3]:
from huggingface_hub import login
login(secret_value_0)

# **1. Import Necessary Libraries and Load a Pre-trained Model and Tokenizer**

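Use `AutoTokenizer` and `AutoModelForCausalLM` to load the tokenizer and model. Replace `"meta-llama/Llama-3.2-1B"` with the desired model name. The Meta Llama repositories are gated, so the token used in the login step must belong to an account that has been granted access to the model.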

In [4]:
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B")
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B")


# **2. Tokenize the Input Text**
Tokenization splits text into tokens and maps each one to an integer ID, which is the numerical representation the model actually consumes.

In [5]:
input_text = "High in the halls of the kings who are gone"
inputs = tokenizer(input_text, return_tensors="pt")  # "pt" for PyTorch tensors
print(inputs)

{'input_ids': tensor([[128000,  12243,    304,    279,  52473,    315,    279,  45619,    889,
            527,   8208]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]])}
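
To make the text-to-ID mapping concrete, here is a minimal sketch of a greedy longest-match tokenizer. It is a simplification, not how the Llama tokenizer works internally (real tokenizers use learned subword merges), and the vocabulary, the IDs, and the `toy_encode` helper are all invented for illustration. It returns the same two fields as the output above.

```python
# Toy vocabulary: every entry and ID here is invented for illustration.
TOY_VOCAB = {"<bos>": 0, "High": 1, " in": 2, " the": 3, " hall": 4, "s": 5}

def toy_encode(text, vocab=TOY_VOCAB, add_bos=True):
    """Greedy longest-match tokenization into integer IDs."""
    ids = [vocab["<bos>"]] if add_bos else []
    pos = 0
    while pos < len(text):
        # Try the longest vocabulary entry that matches at this position.
        for end in range(len(text), pos, -1):
            piece = text[pos:end]
            if piece in vocab:
                ids.append(vocab[piece])
                pos = end
                break
        else:
            raise ValueError(f"no token covers {text[pos:]!r}")
    # Wrap in an outer list to mimic the batch dimension of real tokenizer output.
    return {"input_ids": [ids], "attention_mask": [[1] * len(ids)]}

encoded = toy_encode("High in the halls")
print(encoded)
# {'input_ids': [[0, 1, 2, 3, 4, 5]], 'attention_mask': [[1, 1, 1, 1, 1, 1]]}
```

Note how "halls" is split into two pieces (`" hall"` and `"s"`) because the toy vocabulary has no single entry for it; real subword tokenizers handle rare words in a similar spirit.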


The `return_tensors` parameter of a Hugging Face tokenizer specifies the type of tensors returned after tokenizing the input text, and therefore which framework the result is compatible with. Here's a breakdown:

---

## **1. What is `return_tensors="pt"`?**
- `pt` stands for **PyTorch**, a popular deep learning framework.
- When you use `return_tensors="pt"`, the tokenizer returns a PyTorch tensor that is directly compatible with PyTorch-based models.

Example:
```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
tokens = tokenizer("Hello, Hugging Face!", return_tensors="pt")
print(tokens)
```

Output:
```python
{'input_ids': tensor([[  101,  7592,  1010, 17662,  2227,   999,   102]]), 
 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1]])}
```

Here:
- `input_ids`: Encoded tokens of the input sentence.
- `attention_mask`: Binary mask indicating which tokens should be attended to (1) and which should be ignored (0).
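
With a single sentence the mask is all ones, so the zeros only show up once sequences of different lengths are padded into one batch. The following is a rough pure-Python sketch of that idea, not the tokenizer's actual implementation. `pad_batch` is a hypothetical helper, and `PAD_ID = 0` is an assumption that matches BERT's `[PAD]` token; other tokenizers use other IDs.

```python
# Assumed pad ID (BERT's [PAD] is 0); other tokenizers differ.
PAD_ID = 0

def pad_batch(sequences, pad_id=PAD_ID):
    """Right-pad every sequence to the longest length and build the 0/1 mask."""
    max_len = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        input_ids.append(seq + [pad_id] * n_pad)
        attention_mask.append([1] * len(seq) + [0] * n_pad)
    return {"input_ids": input_ids, "attention_mask": attention_mask}

batch = pad_batch([
    [101, 7592, 1010, 17662, 2227, 999, 102],  # the IDs printed above
    [101, 7592, 102],                          # a shorter, made-up sequence reusing some of them
])
print(batch["input_ids"][1])       # [101, 7592, 102, 0, 0, 0, 0]
print(batch["attention_mask"][1])  # [1, 1, 1, 0, 0, 0, 0]
```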

---

## **2. Other Options for `return_tensors`**

### **a. `return_tensors="tf"`**
- Returns a **TensorFlow** tensor.
- Used when working with TensorFlow/Keras models.
- Example:
  ```python
  tokens = tokenizer("Hello, Hugging Face!", return_tensors="tf")
  print(tokens)
  ```
  Output:
  ```python
  {'input_ids': <tf.Tensor: shape=(1, 7), dtype=int32, numpy=array([[  101,  7592,  1010, 17662,  2227,   999,   102]])>, 
   'attention_mask': <tf.Tensor: shape=(1, 7), dtype=int32, numpy=array([[1, 1, 1, 1, 1, 1, 1]])>}
  ```

---

### **b. `return_tensors="np"`**
- Returns a **NumPy array**, which is not tied to a specific deep learning framework.
- Useful for preprocessing, lightweight testing, or when you don’t need a specific framework.
- Example:
  ```python
  tokens = tokenizer("Hello, Hugging Face!", return_tensors="np")
  print(tokens)
  ```
  Output:
  ```python
  {'input_ids': array([[  101,  7592,  1010, 17662,  2227,   999,   102]]), 
   'attention_mask': array([[1, 1, 1, 1, 1, 1, 1]])}
  ```

---

## **3. When to Use Each Option**

| Option          | Use Case                                                                                       |
|------------------|-----------------------------------------------------------------------------------------------|
| `return_tensors="pt"` | When working with **PyTorch** models.                                                     |
| `return_tensors="tf"` | When working with **TensorFlow/Keras** models.                                            |
| `return_tensors="np"` | When you want a general-purpose **NumPy** array for preprocessing or lightweight analysis.|

---

## **4. What Happens If You Don’t Use `return_tensors`?**
If you don’t specify `return_tensors`, the tokenizer returns **Python lists** by default. While this format is readable and straightforward, it's not optimized for computation and cannot be directly passed to models.

Example:
```python
tokens = tokenizer("Hello, Hugging Face!")
print(tokens)
```

Output:
```python
{'input_ids': [101, 7592, 1010, 17662, 2227, 999, 102], 
 'attention_mask': [1, 1, 1, 1, 1, 1, 1]}
```

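You would need to convert these lists into tensors yourself before passing them to a model, for example with `torch.tensor([tokens["input_ids"]])` in PyTorch, where the extra pair of brackets adds the batch dimension.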

---

### **Summary**
- `return_tensors` controls whether tokenized data is returned as plain Python lists or as tensors compatible with a deep learning framework.
- Options:
  - `"pt"`: PyTorch tensors.
  - `"tf"`: TensorFlow tensors.
  - `"np"`: NumPy arrays.
- Choose the option based on the framework you're using.

# **3. Generate Output Using the Model**

Use the model's `generate` method to produce a continuation of the input tokens.

In [6]:
outputs = model.generate(
    **inputs,                # input_ids plus the attention_mask from the tokenizer
    max_length=50,           # Maximum total length (prompt + generated tokens)
    num_return_sequences=1,  # Number of sequences to generate
    do_sample=True,          # Enable sampling for more varied outputs
    pad_token_id=tokenizer.eos_token_id,  # No pad token is configured, so reuse EOS
)


# **4. Decode the Generated Tokens**
The model outputs tokens, which you need to decode back into readable text.

In [7]:
generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_text)

High in the halls of the kings who are gone
They know the way to the great hall
The way to the great hall
Where the kings who are gone are laid
The way to the great hall
Where the kings who are gone
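
Conceptually, decoding is the reverse lookup: each ID is mapped back to its token string and the pieces are concatenated. The sketch below illustrates that, along with what `skip_special_tokens=True` does. The vocabulary, the IDs, and the `toy_decode` helper are invented for this example and do not come from the Llama tokenizer.

```python
# Toy ID-to-string table: every entry here is invented for illustration.
TOY_VOCAB = {0: "<bos>", 1: "High", 2: " in", 3: " the", 4: " hall", 5: "s", 6: "<eos>"}
SPECIAL_IDS = {0, 6}

def toy_decode(token_ids, skip_special_tokens=False):
    """Map IDs back to token strings and concatenate them."""
    pieces = [
        TOY_VOCAB[i]
        for i in token_ids
        if not (skip_special_tokens and i in SPECIAL_IDS)
    ]
    return "".join(pieces)

ids = [0, 1, 2, 3, 4, 5, 6]
print(toy_decode(ids))                            # <bos>High in the halls<eos>
print(toy_decode(ids, skip_special_tokens=True))  # High in the halls
```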


# **Advanced Options**
## **1. Modify Generation Parameters**
You can control the output by adjusting parameters like:
- `temperature`: Scales the logits before sampling (lower = more deterministic, higher = more random).
- `top_k`: Restricts sampling to the K most probable tokens.
- `top_p`: Enables nucleus sampling, which keeps the smallest set of most probable tokens whose cumulative probability reaches `top_p`.

Example:
```python
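outputs = model.generate(
    **inputs,                # input_ids and attention_mask
    max_length=50,
    do_sample=True,          # temperature/top_k/top_p only apply when sampling
    temperature=0.7,
    top_k=50,
    top_p=0.9,
    num_return_sequences=3,  # more than one sequence needs sampling or beam search
)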
```
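
To see what these three parameters do to a next-token distribution, here is a simplified pure-Python sketch. It is an approximation of the idea rather than the library's implementation. The `filtered_distribution` and `sample` helpers and the four logits are made up for illustration.

```python
import math
import random

def filtered_distribution(logits, temperature=1.0, top_k=0, top_p=1.0):
    """Return {token_index: probability} after temperature, top-k and top-p."""
    # Temperature: divide the logits before the softmax (lower = sharper).
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Rank tokens from most to least probable.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)

    # Top-k: keep only the k most probable tokens (0 disables the filter).
    if top_k > 0:
        ranked = ranked[:top_k]
    mass = sum(probs[i] for i in ranked)

    # Top-p: keep the smallest prefix whose renormalised cumulative
    # probability reaches top_p.
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i] / mass
        if cumulative >= top_p:
            break

    # Renormalise the surviving tokens so they sum to 1.
    kept_total = sum(probs[i] for i in kept)
    return {i: probs[i] / kept_total for i in kept}

def sample(dist, rng):
    """Draw one token index according to the filtered probabilities."""
    tokens = list(dist)
    return rng.choices(tokens, weights=[dist[t] for t in tokens], k=1)[0]

logits = [2.0, 1.0, 0.5, -1.0]  # made-up scores for a 4-token vocabulary
dist = filtered_distribution(logits, temperature=0.7, top_k=3, top_p=0.9)
print(dist)                       # only tokens 0 and 1 survive the filters
print(sample(dist, random.Random(0)))
```

With these settings, top-k first drops the least likely token, and top-p then drops the third one because the first two already cover more than 90% of the remaining probability mass.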

## **2. Batch Processing**
You can process multiple inputs simultaneously for efficiency.
```python
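# Reuse EOS as the pad token if none is defined; left-pad for decoder-only generation.
tokenizer.pad_token = tokenizer.pad_token or tokenizer.eos_token
tokenizer.padding_side = "left"
input_texts = ["Hello, how are you?", "Once upon a time,"]
inputs = tokenizer(input_texts, return_tensors="pt", padding=True, truncation=True)
outputs = model.generate(**inputs, max_length=50)  # passes the attention mask too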
```
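
For decoder-only models the padding side matters, because generation continues from the last position of each row. The sketch below uses plain lists and invented token IDs to show why left padding is convenient: every prompt ends in the same column, so a single slice separates the prompts from the newly generated tokens. `pad_left`, `PAD_ID`, and the "generated" tokens are all hypothetical stand-ins.

```python
PAD_ID = 0  # assumed pad ID for this toy example

def pad_left(sequences, pad_id=PAD_ID):
    """Left-pad so that every prompt ends in the last column."""
    width = max(len(seq) for seq in sequences)
    return [[pad_id] * (width - len(seq)) + seq for seq in sequences]

prompts = [[11, 12, 13, 14], [21, 22]]  # invented token IDs
padded = pad_left(prompts)
prompt_width = len(padded[0])

# Pretend the model appended two new tokens to each row.
fake_new_tokens = [[91, 92], [93, 94]]
outputs = [row + new for row, new in zip(padded, fake_new_tokens)]

# Because of left padding, one slice removes every prompt.
continuations = [row[prompt_width:] for row in outputs]
print(padded)         # [[11, 12, 13, 14], [0, 0, 21, 22]]
print(continuations)  # [[91, 92], [93, 94]]
```

With real tensors the equivalent slice is `outputs[:, inputs["input_ids"].shape[1]:]`, since `generate` returns the prompt followed by the new tokens for decoder-only models.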

## **3. Save and Load Models Locally**
If you want to save the model locally:
```python
model.save_pretrained("./local_model")
tokenizer.save_pretrained("./local_model")
```
To load it later:
```python
tokenizer = AutoTokenizer.from_pretrained("./local_model")
model = AutoModelForCausalLM.from_pretrained("./local_model")
```

---

## **Documentation and Tutorials**

1. **Hugging Face Transformers Documentation:**
   - [AutoTokenizer](https://huggingface.co/docs/transformers/model_doc/auto)
   - [AutoModelForCausalLM](https://huggingface.co/docs/transformers/model_doc/auto#transformers.AutoModelForCausalLM)
   - [Text generation strategies](https://huggingface.co/docs/transformers/main/en/generation_strategies#text-generation-strategies)

2. **Hugging Face Tutorials:**
   - [Getting Started with Transformers](https://huggingface.co/docs/transformers/main/en/index)
   - [Generation with Transformers](https://huggingface.co/blog/how-to-generate)

3. **Model Repository:**
   - [Meta-Llama Models](https://huggingface.co/meta-llama)

4. **Hugging Face Forums:**
   - [Community Discussions](https://discuss.huggingface.co/)