# **Text Generation with a Pre-trained LLM (GPT-2)**

## 1. Importing Required Libraries

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer
```

- **`transformers`**: Hugging Face's library of pre-trained models for natural language processing (NLP), including GPT-2.
- **`GPT2LMHeadModel`**: Loads the pre-trained GPT-2 model with a language-modeling head: a linear layer on top of the transformer that turns each hidden state into a score for every token in the vocabulary. These scores are what the model uses to predict the next token when generating text.
- **`GPT2Tokenizer`**: Loads the tokenizer that matches the GPT-2 model (a byte-level Byte-Pair Encoding tokenizer). It converts text into the integer token IDs the model processes (tokenization) and converts the model's output IDs back into text (detokenization).

### Summary

The code performs the following steps:
1. Imports the necessary classes from the Hugging Face Transformers library.
2. Loads a pre-trained GPT-2 model and its corresponding tokenizer.
3. Encodes an initial text prompt into a format suitable for the model.
4. Uses the model to generate text based on the encoded input.
5. Decodes the generated token IDs back into human-readable text.
6. Prints the generated text.

## 2. Loading the Pre-trained Model and Tokenizer

```python
model_name = "gpt2"
model = GPT2LMHeadModel.from_pretrained(model_name)
tokenizer = GPT2Tokenizer.from_pretrained(model_name)
```

- **`model_name`**: The Hugging Face Hub identifier of the checkpoint to load. `"gpt2"` is the smallest GPT-2 checkpoint (124M parameters).
- **`from_pretrained(model_name)`**: Downloads the pre-trained weights, configuration, and vocabulary files on first use, caches them locally, and builds the model or tokenizer from them. This means we don't have to train the model from scratch.

## 3. Encoding Input Text

```python
# Read the prompt from the user and convert it to a batch of token IDs
input_text = input("Enter the text you want the model to continue: ")
input_ids = tokenizer.encode(input_text, return_tensors="pt")
```

```
Enter the text you want the model to continue: i love
```


- **`input_text`**: The initial text prompt, read from the user at run time, that the model will continue.
- **`tokenizer.encode(input_text, return_tensors="pt")`**: Converts the prompt into token IDs. With `return_tensors="pt"`, the result is a PyTorch tensor of shape `(1, sequence_length)` (a batch holding one sequence) rather than a plain Python list, which is the format `model.generate` expects.
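To make the encode/decode round trip concrete, here is a minimal pure-Python sketch of Byte-Pair Encoding (BPE), the algorithm behind `GPT2Tokenizer`. It is a simplification, not the library's implementation: the merge rules, vocabulary, and IDs are hypothetical, it works on characters rather than bytes, and its word-splitting regular expression is much simpler than GPT-2's. Like the real tokenizer, it keeps the leading space attached to each word and returns a batch containing one sequence.

```python
import re

# Hypothetical merge rules in priority order (lower index = applied first).
MERGES = [(" ", "l"), ("o", "v"), (" l", "ov"), (" lov", "e")]
MERGE_RANK = {pair: rank for rank, pair in enumerate(MERGES)}

# Hypothetical vocabulary: single characters, every merged token, and one
# special end-of-text token. Characters outside it raise KeyError; the real
# byte-level tokenizer can encode any text.
TOKENS = [" ", "e", "i", "l", "o", "v"] + [a + b for a, b in MERGES] + ["<|endoftext|>"]
TOKEN_TO_ID = {tok: i for i, tok in enumerate(TOKENS)}
ID_TO_TOKEN = {i: tok for tok, i in TOKEN_TO_ID.items()}
SPECIAL_IDS = {TOKEN_TO_ID["<|endoftext|>"]}


def bpe(word):
    """Start from single characters and apply the best-ranked merge until none applies."""
    parts = list(word)
    while len(parts) > 1:
        # (rank, position) of every adjacent pair; unknown pairs get rank infinity
        candidates = [
            (MERGE_RANK.get(pair, float("inf")), i)
            for i, pair in enumerate(zip(parts, parts[1:]))
        ]
        rank, i = min(candidates)
        if rank == float("inf"):
            break
        parts[i:i + 2] = [parts[i] + parts[i + 1]]
    return parts


def encode(text):
    """Return [[id, ...]]: a batch holding one sequence, like the (1, n) tensor from return_tensors="pt"."""
    words = re.findall(r" ?\S+", text)  # keep each word's leading space, as GPT-2 does
    return [[TOKEN_TO_ID[tok] for word in words for tok in bpe(word)]]


def decode(ids, skip_special_tokens=False):
    """Map IDs back to token strings and join them."""
    return "".join(
        ID_TO_TOKEN[i] for i in ids if not (skip_special_tokens and i in SPECIAL_IDS)
    )


batch = encode("i love")
print(batch)             # [[2, 9]]
print(decode(batch[0]))  # i love
```

Because the leading space is part of the token, `bpe(" love")` collapses to the single token `" love"` while `bpe("love")` stays split into `["l", "ov", "e"]`; the real GPT-2 vocabulary likewise treats a word with and without a preceding space as different tokens.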

## 4. Generating Text

```python
output = model.generate(input_ids, max_length=50, num_return_sequences=1)
```

```
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
```


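- **`model.generate`**: Generates a continuation of the input token IDs, one token at a time. With no decoding options set, GPT-2 uses greedy decoding: at each step it appends the single most likely next token. Greedy decoding tends to fall into loops, which is why the generated text in Section 5 repeats the same sentence.
- **`input_ids`**: The encoded prompt.
- **`max_length=50`**: The maximum total length of the output **in tokens**, including the prompt. Generation stops when this length is reached, or earlier if the model produces the end-of-sequence token. (`max_new_tokens` instead limits only the newly generated tokens.)
- **`num_return_sequences=1`**: The number of sequences to return for the prompt. Greedy decoding can only produce one; values above 1 require sampling (`do_sample=True`) or beam search (`num_beams` of at least that value).
- The warning printed by this cell appears because only `input_ids` were passed. GPT-2 has no padding token, so `generate` falls back to the end-of-sequence token (ID 50256). For a single, unpadded prompt this is harmless; to silence it, build the inputs with `inputs = tokenizer(input_text, return_tensors="pt")` and call `model.generate(**inputs, max_length=50, pad_token_id=tokenizer.eos_token_id)`, which passes the `attention_mask` explicitly.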
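The loop below is a minimal sketch of greedy decoding, assuming a hypothetical hand-written score table in place of GPT-2. A real model scores every vocabulary token given the whole sequence so far; this toy one only looks at the last token. It follows the same stopping rules as `max_length`: the limit counts the prompt, and generation ends early if the end-of-sequence token is chosen.

```python
# Toy greedy decoder. NEXT_TOKEN_SCORES is a hypothetical stand-in for GPT-2's
# next-token scores; a real model conditions on the whole sequence, while this
# sketch conditions only on the last token (a bigram model).
EOS = "<eos>"

NEXT_TOKEN_SCORES = {
    "i": {"love": 0.9, "am": 0.1},
    "love": {"to": 0.6, "you": 0.4},
    "to": {"say": 0.5, "be": 0.3, EOS: 0.2},
    "say": {"that": 0.7, EOS: 0.3},
    "that": {"i": 0.6, EOS: 0.4},
    "you": {EOS: 1.0},
}


def next_token_scores(tokens):
    """Score the candidate next tokens; unknown contexts can only end the sequence."""
    return NEXT_TOKEN_SCORES.get(tokens[-1], {EOS: 1.0})


def greedy_generate(prompt_tokens, max_length):
    """Append the highest-scoring token until the sequence (prompt included)
    reaches max_length tokens or the end-of-sequence token is chosen."""
    tokens = list(prompt_tokens)
    while len(tokens) < max_length:
        scores = next_token_scores(tokens)
        best = max(scores, key=scores.get)  # greedy: always take the argmax
        tokens.append(best)
        if best == EOS:
            break
    return tokens


print(" ".join(greedy_generate(["i", "love"], max_length=12)))
# i love to say that i love to say that i love
print(greedy_generate(["you"], max_length=12))
# ['you', '<eos>']
```

Because the highest-scoring successor of "that" is "i", the greedy loop cycles back to the start of the prompt and repeats itself until `max_length` cuts it off, the same kind of loop visible in this notebook's GPT-2 output.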

## 5. Decoding and Printing the Generated Text
```python
generated_text = tokenizer.decode(output[0], skip_special_tokens=True)
print(generated_text)
```

```
i love to say that I'm not a fan of the idea of a "real" world. I'm not a fan of the idea of a "real" world. I'm not a fan of the idea of a "real" world.
```

- **`tokenizer.decode(output[0], skip_special_tokens=True)`**: Converts the generated token IDs back into human-readable text. `output` is a batch of sequences, so `output[0]` selects the first (and here only) one; it contains the prompt tokens followed by the new tokens, which is why the text begins with "i love". `skip_special_tokens=True` drops special tokens, such as GPT-2's end-of-text marker `<|endoftext|>`, from the result.
- **`print(generated_text)`**: Prints the generated text.
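Since `output[0]` begins with the prompt, one way to print only the newly generated text is to skip the first `input_ids.shape[-1]` IDs before decoding, e.g. `tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True)`. The sketch below mimics that slicing with plain lists; the ID-to-token table is hypothetical except for 50256, the end-of-text ID reported in the warning from Section 4.

```python
# Hypothetical ID-to-token table; only 50256 (GPT-2's <|endoftext|>) is a real GPT-2 ID.
EOS_ID = 50256
ID_TO_TOKEN = {0: "i", 1: " love", 2: " to", 3: " say", EOS_ID: "<|endoftext|>"}
SPECIAL_IDS = {EOS_ID}


def decode(ids, skip_special_tokens=False):
    """Concatenate token strings, optionally dropping special tokens."""
    return "".join(
        ID_TO_TOKEN[i] for i in ids if not (skip_special_tokens and i in SPECIAL_IDS)
    )


prompt_ids = [0, 1]                # stands in for input_ids[0]
output_ids = [0, 1, 2, 3, EOS_ID]  # stands in for output[0]: prompt, then new tokens

full_text = decode(output_ids, skip_special_tokens=True)
new_text = decode(output_ids[len(prompt_ids):], skip_special_tokens=True)
print(repr(full_text))  # 'i love to say'
print(repr(new_text))   # ' to say'
```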


---

## **Name - Aatish Kumar Baitha**
- M.Tech (Data Science), 2nd-year student
- LinkedIn profile:
  - https://www.linkedin.com/in/aatish-kumar-baitha-ba9523191
- Blog:
  - https://computersciencedatascience.blogspot.com/
- GitHub profile:
  - https://github.com/Aatishkb

# **Thank you!**