<a href="https://colab.research.google.com/github/bandiajay/Generative-AI/blob/main/04_Text_Generation_Hugging_Face_API.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<div style="text-align: center;">
<h1>  Text Generation using Hugging Face</h1>
</div>

<b> Objective: </b> <p> The aim of this worksheet is to equip participants with the skills to utilize the <b> Hugging Face </b> library for text generation tasks. Through practical exercises and examples, learners will explore the cutting-edge capabilities of Hugging Face's transformer models to generate text that is coherent, contextually relevant, and stylistically appropriate. This worksheet is designed to enable participants to apply these techniques in various domains such as automated content creation, dialogue systems, and personalized communication, thereby enhancing their proficiency in leveraging artificial intelligence for natural language generation.</p>

**Introduction**: Hugging Face is a technology company and research organization known for its contributions to natural language processing (NLP). It develops and maintains the open-source "Transformers" library, which has become a standard tool in the NLP community because it offers state-of-the-art pre-trained models that can be easily customized and scaled. Models available through the library, such as BERT, GPT-2, and T5, are widely used for NLP tasks including text classification, sentiment analysis, question answering, and text generation.

The Hugging Face platform not only provides access to these powerful models but also fosters a vibrant community of AI researchers and practitioners who contribute to the continuous improvement and expansion of the library. This collaborative environment, combined with a strong commitment to open-source development, makes Hugging Face a hub for cutting-edge innovations in AI and machine learning. With their user-friendly tools and extensive documentation, Hugging Face significantly lowers the barrier to entry for individuals and organizations looking to implement advanced NLP solutions.

<b> Requirements: </b>
<ol>
<li> <i> Transformers </i> - A versatile library from Hugging Face providing state-of-the-art pre-trained models for natural language processing tasks </li>
</ol>

<b> Steps: </b>
<ol>
    <li> Install <code> transformers </code> package.</li>
     <li> Write source code
        <p> 2.1 Import <code> transformers </code> module <br>
            2.2 Get an <b> API token </b> from Hugging Face. <br>
            2.3 Load the Text generation pipeline. <br>
            2.4 Give a prompt <br>
            2.5 Generate the text <br>
            2.6 Print the output <br>
        </p>
      </li>  
</ol>

<h3> Step 1 : Install <code> transformers </code> package </h3>

**Note** : If the following command fails, run `python -m pip install transformers` instead.

In [None]:
pip install transformers
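
Before moving on, it can help to confirm that the install succeeded. The following is a minimal sketch that uses only the Python standard library; the helper name `installed_version` is made up for this worksheet and is not part of `transformers`.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Report the transformers version found in the current environment, if any.
version = installed_version("transformers")
print("transformers version:", version if version else "not installed")
```

If this prints `not installed`, rerun the install command above and restart the notebook runtime.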



<h3> Step 2 : Write source code </h3>

<h4> Step 2.1 : Import the <code>transformers</code> module </h4>

The *pipeline* function from the transformers module builds a ready-to-use model for a named task, here text generation. The *os* module is used to set the environment variable that holds the Hugging Face token.

In [None]:
from transformers import pipeline
import os

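<h4> Step 2.2 : Get the API token from Hugging Face </h4>

Configure the environment with your Hugging Face access token. The Hugging Face Hub client looks for the token in the `HF_TOKEN` environment variable.

**Note** : You need to create an account at [Hugging Face](https://huggingface.co/) and create an access token at https://huggingface.co/settings/tokens. Treat the token like a password and never publish a real one in a shared notebook.

In [None]:
# Placeholder value: replace it with your own token before running.
os.environ["HF_TOKEN"] = "hf_your_token_here"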
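
Hard-coding a token in a notebook makes it easy to leak. As a minimal sketch, assuming the token has already been stored outside the notebook under the name `HF_TOKEN` (the name the Hub client reports looking for), the code below reads it from the environment and only reports whether it was found. The helper `resolve_token` is a hypothetical name used for illustration.

```python
import os
from typing import Mapping, Optional


def resolve_token(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the token stored under HF_TOKEN, or None if it is unset or blank."""
    token = env.get("HF_TOKEN", "").strip()
    return token or None


token = resolve_token()
if token is None:
    # Public models can still be downloaded without authentication.
    print("HF_TOKEN is not set; continuing without authentication.")
else:
    # Never print the token itself; report only that it was found.
    print("HF_TOKEN found (%d characters)." % len(token))
```

Passing a plain dictionary, for example `resolve_token({"HF_TOKEN": "hf_example"})`, makes the helper easy to try without touching the real environment.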

<h4> Step 2.3 : Load the Text generation pipeline </h4>

Here,

*   `text-generation` - names the task the pipeline performs.
*   `model` - the pre-trained model used by the pipeline. Here we are using `distilgpt2`, a smaller, distilled version of GPT-2.

**Note** : You can find more models at [Hugging Face](https://huggingface.co/)

In [None]:
generator = pipeline("text-generation", model="distilgpt2")

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.



<h4> Step 2.4 : Give a prompt </h4>

In [None]:
prompt = "After years of research, scientists finally discovered the secret to..."

<h4> Step 2.5 : Generate the text </h4>

Here,


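*   `generator` - the pipeline object loaded in Step 2.3; calling it runs the generation.
*   `prompt` - the input text that the model continues.
*   `max_length` - maximum length of the output in tokens, not words. The count includes the prompt's tokens, so `100` caps prompt plus continuation at 100 tokens.
*   `do_sample` - whether to sample the next token. `True` draws each token at random from the predicted distribution, which gives more varied text than always picking the single most likely token.
*   `temperature` - controls the randomness of sampling.
    * < 1 - favours high-probability tokens, giving more predictable output.
    * \> 1 - gives low-probability tokens a better chance, giving more varied output.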
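
The effect of `temperature` and `do_sample` can be seen on a toy example. The sketch below is a simplification in plain Python, not the library's implementation: it divides raw scores (logits) by the temperature before applying softmax, then compares picking the top token with sampling. The four tokens and their scores are made up for illustration.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for four candidate next tokens.
tokens = ["cure", "answer", "formula", "banana"]
logits = [2.0, 1.5, 1.0, -1.0]

# Lower temperatures concentrate probability on the top-scoring token.
for t in (0.7, 1.0, 1.5):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])

# Greedy decoding (do_sample=False) always takes the top-scoring token.
greedy = tokens[logits.index(max(logits))]

# Sampling (do_sample=True) draws tokens in proportion to their probability.
rng = random.Random(0)  # fixed seed so the draw is repeatable
sampled = rng.choices(tokens, weights=softmax_with_temperature(logits, 0.7), k=5)
print(greedy, sampled)
```

Comparing the printed rows shows the top token's share shrinking as the temperature rises, which is why values above 1 tend to produce more surprising text.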



In [None]:
generated_text = generator(prompt, max_length=100, do_sample=True, temperature=0.7)

Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


<h4> Step 2.6: Print the output </h4>

In [None]:
print(generated_text[0]['generated_text'])

After years of research, scientists finally discovered the secret to...

You can see a screenshot of the video below.
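
The printed text starts with the prompt because the pipeline returns the prompt and the continuation together. As a minimal sketch, the helper below (a hypothetical name, `continuation_only`) removes the prompt from a result of the same shape as `generated_text`. It uses a hand-written stand-in for the pipeline output, so it runs without downloading a model.

```python
def continuation_only(result, prompt):
    """Return the generated text with the leading prompt removed, if present."""
    text = result[0]["generated_text"]
    if text.startswith(prompt):
        return text[len(prompt):].lstrip()
    return text


# Stand-in with the same structure as the pipeline's return value:
# a list holding one dictionary with a "generated_text" key.
example_prompt = "After years of research, scientists finally discovered the secret to..."
example_result = [
    {"generated_text": example_prompt + "\n\nYou can see a screenshot of the video below."}
]

print(continuation_only(example_result, example_prompt))
```

The text-generation pipeline also accepts `return_full_text=False`, which asks it to return only the continuation; the helper above is useful when the full text has already been generated.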
