<center><img src="https://gitlab.com/accredian/insaid-data/-/raw/main/Logo-Accredian/Case-Study-Cropped.png" width= 30% /></center>

# <center><b>LLM with Hugging Face</b></center>

---
# **Table of Contents**
---

**1.** [**Introduction**](#Section1)<br>
  - **1.1** [**Common LLM Applications**](#Section11)<br>
  - **1.2** [**Prompt Engineering**](#Section12)<br>
  - **1.3** [**Hugging Face**](#Section13)<br>
  

**2.** [**Problem Statement**](#Section2)<br>
**3.** [**Installing & Importing Libraries**](#Section3)<br>
  - **3.1** [**Installing Libraries**](#Section31)
  - **3.2** [**Importing Libraries**](#Section32)
  - **3.3** [**Logging Notebook with Token**](#Section33)
  

**4.** [**Data Description**](#Section4)<br>
  
**5.** [**LLM Model Training**](#Section5)<br>
  - **5.1** [**Summarization**](#Section51)
  - **5.2** [**Sentiment Analysis**](#Section52)
  - **5.3** [**Zero Shot Classification**](#Section53)
  - **5.4** [**Few Shot Learning**](#Section54)

**6.** [**Conclusion**](#Section6)<br>
  

---
<a name = Section1></a>
# **1. Introduction**
---

<br>

<a name = Section11></a>
### **1.1 Common LLM Applications**

With **Hugging Face**, we can tour the common LLM applications:
- **Summarization**: Summarization takes two forms:
    - **Extractive**: **Selecting** representative passages from the text.
    - **Abstractive**: **Generating** a new summary in the model's own words.
    
- **Sentiment Analysis**: A **text classification** task that estimates whether a piece of text is positive, negative, neutral, or carries any other sentiment label.
- **Zero-shot classification**: The task of classifying a piece of text into given labels **without** training the model on any labeled **examples** for those labels. The model relies on its own knowledge here.
- **Few-shot learning**: The task where the model is given **instructions** by the user along with some **query-response examples**, and then generates a response for a new query.

**Visit** the Hugging Face [models](https://huggingface.co/models) page and select a **model** suited to your application.





<center><img src="https://lh5.googleusercontent.com/Rl3jaVxNteHl9tthHyprvvZD0ZlnKO5KHXKqKXDrLOR_Uyz1B4pOncHIRP3Ktap3YVQFS7KVbsu4kOzeEMrQ8qHPHs2HHcr7wZNTE5yfqwe5vX00ViNY6dwR6pFl9zAzIItp8blBV478vS2ZoI3buu4" width = 100% /></center>

<a name = Section12></a>
### **1.2 Prompt Engineering**

#### **Prompt**:
A prompt is a **specific instruction** or query given to a computer program or model to **generate a response**. It's the input or question provided to **solicit** a particular output or answer. In the context of large language models, prompts are used to **guide** the model in generating text or completing tasks.

#### **Prompt Engineering**:
This is the process of **designing** and **refining prompts** to achieve desired outcomes when using large language models. It involves **crafting prompts** that are clear, specific, and effectively communicate the desired task or information to the model. Prompt engineering is **important** because the way a prompt is formulated can significantly impact the quality and relevance of the **model's responses**.
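
As a small illustration of crafting a prompt, the sketch below assembles an instruction, a few query-response examples, and a new query into one prompt string. The function name `build_few_shot_prompt`, the field labels, and the `###` separator are assumptions chosen for this illustration and are not part of any library; the example description is shortened.

```python
def build_few_shot_prompt(instruction, examples, query,
                          query_label="[book title]",
                          response_label="[book description]",
                          separator="###"):
    """Assemble an instruction, worked examples, and a new query into one prompt.

    All names and labels here are hypothetical and chosen for illustration.
    """
    lines = [instruction, ""]
    for example_query, example_response in examples:
        lines.append(f'{query_label}: "{example_query}"')
        lines.append(f'{response_label}: "{example_response}"')
        lines.append(separator)          # marks the end of one worked example
    # The new query is left open so the model continues with its response.
    lines.append(f'{query_label}: "{query}"')
    lines.append(f"{response_label}:")
    return "\n".join(lines)


demo_prompt = build_few_shot_prompt(
    instruction="Generate a book summary from the title:",
    examples=[("Dune", "This novel is set in the distant future.")],
    query="Blue Mars",
)
print(demo_prompt)
```

Keeping the format of every example identical gives the model a clear pattern to continue, and the separator gives a natural point at which to stop generation.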

<center><img src="https://nextgeninvent.com/wp-content/uploads/2023/05/Prompt-Engineering-Best-Practices.png" width = 100% /></center>




<a name = Section13></a>
### **1.3 Hugging Face**
Hugging Face is a company and platform known for its work in natural language processing (NLP) and artificial intelligence (AI). It has developed various tools and resources for NLP tasks, and one of its most notable contributions is the Transformers library.

Some models on Hugging Face cannot be accessed unless you have a Hugging Face account.
Your notebook needs a token from Hugging Face to connect to these models.

**Generating a token on Hugging Face**

**Step 1**: Visit [Hugging Face](https://huggingface.co/) and click `Sign Up` in the top-right corner of the website. You will be directed to the signup page.

<center><img src="https://drive.google.com/uc?export=download&id=1yNPOQ7x-x0h0edmyajT9HhiH3zJlNr1q" width = 60% /></center>

**Step 2**: Enter your email address and set your password. After that you are directed here:

<center><img src="https://drive.google.com/uc?export=download&id=1JM1Ft8k-wt9fOeS7XgD0ynNxBdHMxja8" width = 60% /></center>

**Step 3**: Complete your profile. A confirmation email will be sent to your email address. Confirm your email, and you will be directed to the welcome page.

<center><img src="https://drive.google.com/uc?export=download&id=1YezZGlkoaT1xLKmK97_7z3mbZk7TWkfF" width = 80% /></center>

**Step 4**: Go to the right corner of your page, and click on your profile. Go to the settings.

<center><img src="https://drive.google.com/uc?export=download&id=17RIVyqdcMfcPueGKLD_Df8f_RJobnj3K" width = 60% /></center>

**Step 5**: Now go to the `Access Tokens` on the left navigation pane. You will see an option for `New Token`.

<center><img src="https://drive.google.com/uc?export=download&id=1Tm0SJVsIzNcTq2ciaDdaM53ZZkcLEB0C" width = 80% /></center>

**Step 6**: Click on `New Token`. A window will appear.

<center><img src="https://drive.google.com/uc?export=download&id=1c4rACPkJznuVIIRZyNJziPwx6-H6dWr4" width = 50% /></center>

**Step 7**: Fill in the name you want and change the role to `write`. Then click on `Generate a token`.

<center><img src="https://drive.google.com/uc?export=download&id=1Xeui9hu8a1h-zbu6oudo44xBqOAI2rur" width = 50% /></center>

**Step 8**: Your token will be generated. Copy the token and use it when you log in from your Colab notebook.

<center><img src="https://drive.google.com/uc?export=download&id=12g2TJ-b4mOtk8LYycloHR8jMTEhKyWfJ" width = 80% /></center>


**Note**: While running your Colab notebook, go to `Runtime`, select `Change runtime type`, and choose `TPU`. Close other tabs while running your LLM Colab notebook; otherwise, the Colab session may crash.

<center><img src="https://drive.google.com/uc?export=download&id=1hrNZ5fiwNlA5TjTs5mPRlJj76P8gVf3k" width = 50% /></center>




---
<a name = Section2></a>
# **2. Problem Statement**
---
Using **pre-trained LLMs**, we'll explore each LLM application:

- **Summarization**: From a **set of BBC articles and summaries**, we'll do abstractive summarization.
- **Sentiment Analysis**: Using a text classification task, we'll classify the sentiment of **poem verses tagged with sentiments**: negative, positive, no_impact, and mixed.
- **Zero-shot classification**: We'll **categorize** articles from the BBC news dataset.
- **Few-shot learning**: Using **prompt engineering**, we'll generate a book summary from a given title.

---
<a name = Section3></a>
# **3. Installing & Importing Libraries**
---

<a name = Section31></a>
### **3.1 Installing Libraries**

In [None]:
!pip install -q datasets                            # Designed to simplify the process of working with various datasets
!pip install -q transformers                        # Provides pre-trained models and tools for working with LLM tasks
!pip install -q huggingface_hub                     # Platform for sharing and hosting models, datasets, and other resources related to LLM tasks
!pip install -q sentencepiece                       # Subword tokenization library often used in NLP to handle tokenization and text preprocessing

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
gcsfs 2024.10.0 requires fsspec==2024.10.0, but you have fsspec 2024.9.0 which is incompatible.

<a name = Section32></a>
### **3.2 Importing Libraries**

In [None]:
import pandas as pd                                                              # Used for data manipulation and analysis in Python
from datasets import load_dataset                                                # Used to load datasets for LLM tasks
from transformers import pipeline                                                # Provides a high-level API for running various LLM tasks using pre-trained models
from huggingface_hub import notebook_login                                       # Library allows you to log in to the Hugging Face model hub
from transformers import AutoTokenizer                                           # Used for tokenizing text data when working with pre-trained models
from transformers import PegasusForConditionalGeneration                         # A class that represents the Pegasus model specifically designed for text generation tasks, text summarization and text completion.

<a name = Section33></a>
### **3.3 Logging Notebook with Token**

In [None]:
notebook_login()

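
When the interactive login widget is not convenient, a token stored in an environment variable can be passed to `huggingface_hub.login` instead. The sketch below is a minimal illustration under assumptions: the helper name `login_from_env` is hypothetical, and the variable name `HF_TOKEN` is assumed to hold the token generated in Section 1.3.

```python
import os


def login_from_env(env_var="HF_TOKEN"):
    """Log in to the Hugging Face Hub with a token read from an environment variable.

    Hypothetical helper: returns True if a token was found and passed to `login`,
    and False if the variable is missing or empty.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        return False
    # Imported lazily so the helper can be defined even before the package is installed.
    from huggingface_hub import login
    login(token=token)
    return True
```

Calling `login_from_env()` in place of `notebook_login()` avoids pasting the token into a widget and keeps it out of the notebook's saved output.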

---
<a name = Section4></a>
# **4. Data Description**
---
- **Summarization & Zero Shot Classification**: The [xsum](https://huggingface.co/datasets/xsum) dataset consists of BBC news articles and their summaries.
- **Sentiment Analysis**: The [poem_sentiment](https://huggingface.co/datasets/poem_sentiment) dataset is tagged with sentiment labels: `negative` (0), `positive` (1), `no_impact` (2), and `mixed` (3).



---
<a name = Section5></a>
# **5. LLM Model Training**
---

<a name = Section51></a>
### **5.1 Summarization**

In [None]:
# Loading Data
xsum_dataset = load_dataset(
    "xsum", version="1.2.0")
xsum_dataset

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.



The repository for xsum contains custom code which must be executed to correctly load the dataset. You can inspect the repository content at https://hf.co/datasets/xsum.
You can avoid this prompt in future by passing the argument `trust_remote_code=True`.

Do you wish to run the custom code? [y/N] y



DatasetDict({
    train: Dataset({
        features: ['document', 'summary', 'id'],
        num_rows: 204045
    })
    validation: Dataset({
        features: ['document', 'summary', 'id'],
        num_rows: 11332
    })
    test: Dataset({
        features: ['document', 'summary', 'id'],
        num_rows: 11334
    })
})

In [None]:
# Converting the data to pandas dataframe
xsum_sample = xsum_dataset["train"].select(range(10))
display(xsum_sample.to_pandas())

| | document | summary | id |
|---|---|---|---|
| 0 | The full cost of damage in Newton Stewart, one... | Clean-up operations are continuing across the ... | 35232142 |
| 1 | A fire alarm went off at the Holiday Inn in Ho... | Two tourist buses have been destroyed by fire ... | 40143035 |
| 2 | Ferrari appeared in a position to challenge un... | Lewis Hamilton stormed to pole position at the... | 35951548 |
| 3 | John Edward Bates, formerly of Spalding, Linco... | A former Lincolnshire Police officer carried o... | 36266422 |
| 4 | Patients and staff were evacuated from Cerahpa... | An armed man who locked himself into a room at... | 38826984 |
| 5 | Simone Favaro got the crucial try with the las... | Defending Pro12 champions Glasgow Warriors bag... | 34540833 |
| 6 | Veronica Vanessa Chango-Alverez, 31, was kille... | A man with links to a car that was involved in... | 20836172 |
| 7 | Belgian cyclist Demoitie died after a collisio... | Welsh cyclist Luke Rowe says changes to the sp... | 35932467 |
| 8 | Gundogan, 26, told BBC Sport he "can see the f... | Manchester City midfielder Ilkay Gundogan says... | 40758845 |
| 9 | The crash happened about 07:20 GMT at the junc... | A jogger has been hit by an unmarked police ca... | 30358490 |


In [None]:
# From hugging face, selecting the model and then building the pipeline.
model_name = "google/pegasus-large"                                               # Model Name

tokenizer = AutoTokenizer.from_pretrained(model_name)                             # Converting texts into tokens such that the model can understand
model = PegasusForConditionalGeneration.from_pretrained(model_name)               # Loads the Pegasus model with conditional generation to initialize the model with pre trained weights and architecture

summarizer = pipeline(                                                            # Creating a summarization pipeline
    task="summarization",                                                         # Defining task
    model=model,                                                                  # Defining model
    tokenizer=tokenizer,                                                          # Defining tokenizer
    min_length=10,                                                                # Setting minimum token length for generating summaries
    max_length=60,                                                                # Setting maximum token length for generating summaries
    do_sample=True,                                                               # Sampling
    top_k=50,                                                                     # Top-k sampling setting to 50 tokens with highest probabilities
    top_p=0.95,                                                                   # Used in conjunction with Top-k to control the diversity over probabilistic mass of tokens
    temperature=0.7)                                                              # Token Randomness



Some weights of PegasusForConditionalGeneration were not initialized from the model checkpoint at google/pegasus-large and are newly initialized: ['model.decoder.embed_positions.weight', 'model.encoder.embed_positions.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.



Device set to use cpu
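
To make the `temperature`, `top_k`, and `top_p` settings above more concrete, the sketch below applies the three steps to a toy list of scores in plain Python. It is a simplified illustration, not the library's implementation: the function name `filter_next_token_probs` and the toy scores are made up for this example.

```python
import math


def filter_next_token_probs(logits, temperature=0.7, top_k=50, top_p=0.95):
    """Return {token_index: probability} for the tokens that survive filtering.

    Simplified sketch of temperature scaling followed by top-k and top-p filtering.
    """
    # Temperature: dividing the scores by a value below 1 sharpens the distribution.
    scaled = [value / temperature for value in logits]
    peak = max(scaled)
    weights = [math.exp(value - peak) for value in scaled]

    # Top-k: keep only the k highest-weighted tokens, then renormalize.
    ranked = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)[:top_k]
    total = sum(weights[i] for i in ranked)
    probs = {i: weights[i] / total for i in ranked}

    # Top-p: keep the smallest prefix of ranked tokens whose cumulative mass reaches top_p.
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    # Renormalize so the surviving tokens form a proper distribution to sample from.
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}


toy_logits = [2.0, 1.0, 0.0, -1.0]   # made-up scores for a 4-token vocabulary
print(filter_next_token_probs(toy_logits, temperature=1.0, top_k=3, top_p=0.9))
```

Lower `temperature` and smaller `top_k` or `top_p` values concentrate the probability on fewer tokens, which makes the generated summaries less varied.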


In [None]:
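# Rebuilding the pipeline by model name with the model's default generation settings (no training happens here).
# This replaces the sampling-based `summarizer` defined above.
summarizer = pipeline(task = "summarization",
                      model = "google/pegasus-large",
                      min_length = 10,
                      max_length = 60,
                      truncation = True)                                         # Input articles longer than the model's maximum input length are truncated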

Some weights of PegasusForConditionalGeneration were not initialized from the model checkpoint at google/pegasus-large and are newly initialized: ['model.decoder.embed_positions.weight', 'model.encoder.embed_positions.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Device set to use cpu


In [None]:
# Results
summarization_results = summarizer(xsum_sample["document"])
summarization_results

[{'summary_text': 'Many businesses and householders were affected by flooding in Newton Stewart after the River Cree overflowed into the town. "Obviously it is heart-breaking for people who have been forced out of their homes and the impact on businesses." He said it was important that "immediate steps" were taken to'},
 {'summary_text': 'Insp David Gibson said: "It appears as though the fire started under one of the buses before spreading to the second.'},
 {'summary_text': "Mercedes were wary of Ferrari's pace before qualifying after Vettel and Raikkonen finished one-two in final practice, and their concerns appeared to be well founded as the red cars mixed it with the silver through most of qualifying. But Hamilton saved his best for last, fastest in every sector of his"},
 {'summary_text': 'Mrs Hale said: "The complainant\'s recollection is that on a number of occasions sexual acts would happen with the defendant either in the defendant\'s car or in his cottage." She told the jury 

In [None]:
# Displaying results for all

display(
    pd.DataFrame.from_dict(summarization_results)
    .rename({"summary_text": "generated_summary"}, axis=1)
    .join(pd.DataFrame.from_dict(xsum_sample))[
        ["generated_summary", "summary", "document"]
    ]
)

| | generated_summary | summary | document |
|---|---|---|---|
| 0 | Many businesses and householders were affected... | Clean-up operations are continuing across the ... | The full cost of damage in Newton Stewart, one... |
| 1 | Insp David Gibson said: "It appears as though ... | Two tourist buses have been destroyed by fire ... | A fire alarm went off at the Holiday Inn in Ho... |
| 2 | Mercedes were wary of Ferrari's pace before qu... | Lewis Hamilton stormed to pole position at the... | Ferrari appeared in a position to challenge un... |
| 3 | Mrs Hale said: "The complainant's recollection... | A former Lincolnshire Police officer carried o... | John Edward Bates, formerly of Spalding, Linco... |
| 4 | The chief consultant of Cerahpasa hospital, Ze... | An armed man who locked himself into a room at... | Patients and staff were evacuated from Cerahpa... |
| 5 | It took 24 minutes for a disjointed game to pr... | Defending Pro12 champions Glasgow Warriors bag... | Simone Favaro got the crucial try with the las... |
| 6 | Veronica Vanessa Chango-Alverez, 31, was kille... | A man with links to a car that was involved in... | Veronica Vanessa Chango-Alverez, 31, was kille... |
| 7 | "Say we put a 10 kilometres per hour limit on ... | Welsh cyclist Luke Rowe says changes to the sp... | Belgian cyclist Demoitie died after a collisio... |
| 8 | He said: "It is heavy mentally to accept that.... | Manchester City midfielder Ilkay Gundogan says... | Gundogan, 26, told BBC Sport he "can see the f... |
| 9 | A spokeswoman for Essex Police said it was not... | A jogger has been hit by an unmarked police ca... | The crash happened about 07:20 GMT at the junc... |


**Observations**

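- We took the first 10 articles of the **train** split and applied a pre-trained **summarization** model; no training was performed.
- You can swap in any summarization model listed on the Hugging Face website.
- The first pipeline sets the **temperature** and **token length** limits; the second pipeline, which produced the results above, keeps only the token length limits.
- Within those limits, the model **summarized** each document.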
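
To compare a generated summary with its reference beyond reading them side by side, a rough word-overlap score can be computed. The sketch below is only a simplified stand-in in the spirit of unigram-overlap metrics, not an official metric implementation; the function name `unigram_f1` and the toy sentences are assumptions made for illustration.

```python
from collections import Counter


def unigram_f1(candidate, reference):
    """Word-overlap F1 between two texts (lowercased, split on whitespace)."""
    candidate_counts = Counter(candidate.lower().split())
    reference_counts = Counter(reference.lower().split())
    # Counter intersection keeps, for each word, the smaller of the two counts.
    overlap = sum((candidate_counts & reference_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(candidate_counts.values())
    recall = overlap / sum(reference_counts.values())
    return 2 * precision * recall / (precision + recall)


# Toy sentences, not taken from the dataset.
print(unigram_f1("the cat sat", "the cat ran fast"))
```

With the notebook's data, the same function could be applied row by row to the `generated_summary` and `summary` columns of the joined DataFrame.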

<a name = Section52></a>
### **5.2 Sentiment Analysis**

In [None]:
# Loading the data
poem_dataset = load_dataset(
    "poem_sentiment", version="1.0.0"
)
poem_sample = poem_dataset["train"].select(range(10))

# Converting data into pandas dataframe
display(poem_sample.to_pandas())


Generating train split:   0%|          | 0/892 [00:00<?, ? examples/s]

Generating validation split:   0%|          | 0/105 [00:00<?, ? examples/s]

Generating test split:   0%|          | 0/104 [00:00<?, ? examples/s]

| | id | verse_text | label |
|---|---|---|---|
| 0 | 0 | with pale blue berries. in these peaceful shad... | 1 |
| 1 | 1 | it flows so long as falls the rain, | 2 |
| 2 | 2 | and that is why, the lonesome day, | 0 |
| 3 | 3 | when i peruse the conquered fame of heroes, an... | 3 |
| 4 | 4 | of inward strife for truth and liberty. | 3 |
| 5 | 5 | the red sword sealed their vows! | 3 |
| 6 | 6 | and very venus of a pipe. | 2 |
| 7 | 7 | who the man, who, called a brother. | 2 |
| 8 | 8 | and so on. then a worthless gaud or two, | 0 |
| 9 | 9 | to hide the orb of truth--and every throne | 2 |


In [None]:
# Building pipeline
sentiment_classifier = pipeline(
    task="text-classification",
    model="nickwong64/bert-base-uncased-poems-sentiment",
)


In [None]:
results = sentiment_classifier(poem_sample["verse_text"])

In [None]:
joined_data = (
    pd.DataFrame.from_dict(results)
    .rename({"label": "predicted_label"}, axis=1)
    .join(pd.DataFrame.from_dict(poem_sample).rename({"label": "true_label"}, axis=1))
)

In [None]:
# Results
sentiment_labels = {0: "negative", 1: "positive", 2: "no_impact", 3: "mixed"}
joined_data = joined_data.replace({"true_label": sentiment_labels})

display(joined_data[["predicted_label", "true_label", "score", "verse_text"]])

| | predicted_label | true_label | score | verse_text |
|---|---|---|---|---|
| 0 | positive | positive | 0.996594 | with pale blue berries. in these peaceful shad... |
| 1 | no_impact | no_impact | 0.998741 | it flows so long as falls the rain, |
| 2 | negative | negative | 0.995966 | and that is why, the lonesome day, |
| 3 | mixed | mixed | 0.968735 | when i peruse the conquered fame of heroes, an... |
| 4 | mixed | mixed | 0.975967 | of inward strife for truth and liberty. |
| 5 | mixed | mixed | 0.96658 | the red sword sealed their vows! |
| 6 | no_impact | no_impact | 0.998639 | and very venus of a pipe. |
| 7 | no_impact | no_impact | 0.998611 | who the man, who, called a brother. |
| 8 | negative | negative | 0.996557 | and so on. then a worthless gaud or two, |
| 9 | no_impact | no_impact | 0.998519 | to hide the orb of truth--and every throne |


**Observations**

- Using the **text classification model**, we classified the sentiment of 10 verses from the poem dataset.
- The `score` column is the model's **confidence** in its predicted label, not an accuracy. On these 10 verses, every predicted label matches the true label.
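
One way to compute an accuracy over these rows is sketched below in plain Python. The label lists are copied from the two tables above; the helper name `accuracy` is an assumption made for this illustration.

```python
# Predicted labels and true label ids, copied from the tables above.
predicted_labels = ["positive", "no_impact", "negative", "mixed", "mixed",
                    "mixed", "no_impact", "no_impact", "negative", "no_impact"]
true_label_ids = [1, 2, 0, 3, 3, 3, 2, 2, 0, 2]
sentiment_labels = {0: "negative", 1: "positive", 2: "no_impact", 3: "mixed"}


def accuracy(predicted, true_ids, id_to_name):
    """Fraction of rows where the predicted label equals the true label."""
    true_names = [id_to_name[label_id] for label_id in true_ids]
    matches = sum(p == t for p, t in zip(predicted, true_names))
    return matches / len(true_names)


sample_accuracy = accuracy(predicted_labels, true_label_ids, sentiment_labels)
print(sample_accuracy)
```

With the notebook's `joined_data` DataFrame, the equivalent expression is `(joined_data["predicted_label"] == joined_data["true_label"]).mean()`. Ten rows from the training split are too few to judge the model, so a larger held-out sample would be needed for a meaningful figure.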

<a name = Section53></a>
### **5.3 Zero Shot Classification**

In [None]:
# Building pipeline
zero_shot_pipeline = pipeline(
    task="zero-shot-classification",
    model="cross-encoder/nli-deberta-v3-small"
)

# Defining helper function for categorizing article
def categorize_article(article: str) -> None:
    """
    This helper function defines the categories (labels) which the model must use to label articles.
    Note that our model was NOT fine-tuned to use these specific labels,
    but it "knows" what the labels mean from its more general training.

    This function then prints out the predicted labels alongside their confidence scores.
    """
    results = zero_shot_pipeline(
        article,
        candidate_labels=[
            "politics",
            "finance",
            "sports",
            "science and technology",
            "pop culture",
            "breaking news",
        ],
    )
    # Print the results nicely
    del results["sequence"]
    display(pd.DataFrame(results))




In [None]:
# Article 1

categorize_article(
    """
Simone Favaro got the crucial try with the last move of the game, following earlier touchdowns by Chris Fusaro, Zander Fagerson and Junior Bulumakau.
Rynard Landman and Ashton Hewitt got a try in either half for the Dragons.
Glasgow showed far superior strength in depth as they took control of a messy match in the second period.
Home coach Gregor Townsend gave a debut to powerhouse Fijian-born Wallaby wing Taqele Naiyaravoro, and centre Alex Dunbar returned from long-term injury, while the Dragons gave first starts of the season to wing Aled Brew and hooker Elliot Dee.
Glasgow lost hooker Pat McArthur to an early shoulder injury but took advantage of their first pressure when Rory Clegg slotted over a penalty on 12 minutes.
It took 24 minutes for a disjointed game to produce a try as Sarel Pretorius sniped from close range and Landman forced his way over for Jason Tovey to convert - although it was the lock's last contribution as he departed with a chest injury shortly afterwards.
Glasgow struck back when Fusaro drove over from a rolling maul on 35 minutes for Clegg to convert.
But the Dragons levelled at 10-10 before half-time when Naiyaravoro was yellow-carded for an aerial tackle on Brew and Tovey slotted the easy goal.
The visitors could not make the most of their one-man advantage after the break as their error count cost them dearly.
It was Glasgow's bench experience that showed when Mike Blair's break led to a short-range score from teenage prop Fagerson, converted by Clegg.
Debutant Favaro was the second home player to be sin-binned, on 63 minutes, but again the Warriors made light of it as replacement wing Bulumakau, a recruit from the Army, pounced to deftly hack through a bouncing ball for an opportunist try.
The Dragons got back within striking range with some excellent combined handling putting Hewitt over unopposed after 72 minutes.
However, Favaro became sinner-turned-saint as he got on the end of another effective rolling maul to earn his side the extra point with the last move of the game, Clegg converting.
Dragons director of rugby Lyn Jones said: "We're disappointed to have lost but our performance was a lot better [than against Leinster] and the game could have gone either way.
"Unfortunately too many errors behind the scrum cost us a great deal, though from where we were a fortnight ago in Dublin our workrate and desire was excellent.
"It was simply error count from individuals behind the scrum that cost us field position, it's not rocket science - they were correct in how they played and we had a few errors, that was the difference."
Glasgow Warriors: Rory Hughes, Taqele Naiyaravoro, Alex Dunbar, Fraser Lyle, Lee Jones, Rory Clegg, Grayson Hart; Alex Allan, Pat MacArthur, Zander Fagerson, Rob Harley (capt), Scott Cummings, Hugh Blake, Chris Fusaro, Adam Ashe.
Replacements: Fergus Scott, Jerry Yanuyanutawa, Mike Cusack, Greg Peterson, Simone Favaro, Mike Blair, Gregor Hunter, Junior Bulumakau.
Dragons: Carl Meyer, Ashton Hewitt, Ross Wardle, Adam Warren, Aled Brew, Jason Tovey, Sarel Pretorius; Boris Stankovich, Elliot Dee, Brok Harris, Nick Crosswell, Rynard Landman (capt), Lewis Evans, Nic Cudd, Ed Jackson.
Replacements: Rhys Buckley, Phil Price, Shaun Knight, Matthew Screech, Ollie Griffiths, Luc Jones, Charlie Davies, Nick Scott.
"""
)

| | labels | scores |
|---|---|---|
| 0 | sports | 0.469011 |
| 1 | breaking news | 0.223165 |
| 2 | science and technology | 0.107025 |
| 3 | pop culture | 0.104471 |
| 4 | politics | 0.05739 |
| 5 | finance | 0.038938 |


In [None]:
# Article 2

categorize_article(
    """
The full cost of damage in Newton Stewart, one of the areas worst affected, is still being assessed.
Repair work is ongoing in Hawick and many roads in Peeblesshire remain badly affected by standing water.
Trains on the west coast mainline face disruption due to damage at the Lamington Viaduct.
Many businesses and householders were affected by flooding in Newton Stewart after the River Cree overflowed into the town.
First Minister Nicola Sturgeon visited the area to inspect the damage.
The waters breached a retaining wall, flooding many commercial properties on Victoria Street - the main shopping thoroughfare.
Jeanette Tate, who owns the Cinnamon Cafe which was badly affected, said she could not fault the multi-agency response once the flood hit.
However, she said more preventative work could have been carried out to ensure the retaining wall did not fail.
"It is difficult but I do think there is so much publicity for Dumfries and the Nith - and I totally appreciate that - but it is almost like we're neglected or forgotten," she said.
"That may not be true but it is perhaps my perspective over the last few days.
"Why were you not ready to help us a bit more when the warning and the alarm alerts had gone out?"
Meanwhile, a flood alert remains in place across the Borders because of the constant rain.
Peebles was badly hit by problems, sparking calls to introduce more defences in the area.
Scottish Borders Council has put a list on its website of the roads worst affected and drivers have been urged not to ignore closure signs.
The Labour Party's deputy Scottish leader Alex Rowley was in Hawick on Monday to see the situation first hand.
He said it was important to get the flood protection plan right but backed calls to speed up the process.
"I was quite taken aback by the amount of damage that has been done," he said.
"Obviously it is heart-breaking for people who have been forced out of their homes and the impact on businesses."
He said it was important that "immediate steps" were taken to protect the areas most vulnerable and a clear timetable put in place for flood prevention plans.
Have you been affected by flooding in Dumfries and Galloway or the Borders? Tell us about your experience of the situation and how it was handled. Email us on selkirk.news@bbc.co.uk or dumfries@bbc.co.uk.
"""
)

| | labels | scores |
|---|---|---|
| 0 | breaking news | 0.208211 |
| 1 | politics | 0.17379 |
| 2 | pop culture | 0.173753 |
| 3 | science and technology | 0.157181 |
| 4 | sports | 0.154562 |
| 5 | finance | 0.132503 |


**Observations**

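- With a zero-shot classification model, we **categorized** each article into the defined labels.
- For each label, the output shows the model's **confidence score**, not an accuracy. The rugby article is labeled `sports` with a score of about 0.47, while the flooding article receives low, nearly uniform scores across all labels.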
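
Because the scores are confidences, a simple rule can decide when to trust the top label. The sketch below picks the best label and rejects it when its score falls under a cutoff. The helper name `top_label` and the cutoff of 0.3 are assumptions for illustration; the dictionaries mirror the `labels` and `scores` shown in the two outputs above.

```python
def top_label(result, min_score=0.3):
    """Return (label, score) for the best label, or None if its score is below min_score."""
    label, score = max(zip(result["labels"], result["scores"]), key=lambda pair: pair[1])
    if score < min_score:
        return None
    return label, score


# Scores copied from the outputs for Article 1 and Article 2.
article_1 = {
    "labels": ["sports", "breaking news", "science and technology",
               "pop culture", "politics", "finance"],
    "scores": [0.469011, 0.223165, 0.107025, 0.104471, 0.05739, 0.038938],
}
article_2 = {
    "labels": ["breaking news", "politics", "pop culture",
               "science and technology", "sports", "finance"],
    "scores": [0.208211, 0.17379, 0.173753, 0.157181, 0.154562, 0.132503],
}

print(top_label(article_1))
print(top_label(article_2))
```

A cutoff like this flags articles such as the second one, where no candidate label stands out, so they can be reviewed or given a different set of labels.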

<a name = Section54></a>
### **5.4 Few Shot Learning**

In [None]:
# Building pipeline
# Limiting the response length with max_new_tokens

few_shot_pipeline = pipeline(
    task="text-generation",
    model="EleutherAI/gpt-neo-1.3B",
    max_new_tokens=50
)

# Get the token ID for "###", which we will use as the EOS token below.
eos_token_id = few_shot_pipeline.tokenizer.encode("###")[0]


In [None]:
# Few-shot prompt: an instruction, three worked examples, and a new query (no training is involved)

prompt = """Generate a book summary from the title:

[book title]: "Stranger in a Strange Land"
[book description]: "This novel tells the story of Valentine Michael Smith, a human who comes to Earth in early adulthood after being born on the planet Mars and raised by Martians, and explores his interaction with and eventual transformation of Terran culture."
###
[book title]: "The Adventures of Tom Sawyer"
[book description]: "This novel is about a boy growing up along the Mississippi River. It is set in the 1840s in the town of St. Petersburg, which is based on Hannibal, Missouri, where Twain lived as a boy. In the novel, Tom Sawyer has several adventures, often with his friend Huckleberry Finn."
###
[book title]: "Dune"
[book description]: "This novel is set in the distant future amidst a feudal interstellar society in which various noble houses control planetary fiefs. It tells the story of young Paul Atreides, whose family accepts the stewardship of the planet Arrakis. While the planet is an inhospitable and sparsely populated desert wasteland, it is the only source of melange, or spice, a drug that extends life and enhances mental abilities.  The story explores the multilayered interactions of politics, religion, ecology, technology, and human emotion, as the factions of the empire confront each other in a struggle for the control of Arrakis and its spice."
###
[book title]: "Blue Mars"
[book description]:"""

In [None]:
results = few_shot_pipeline(prompt, do_sample=True, eos_token_id=eos_token_id)

print(results[0]["generated_text"])

Setting `pad_token_id` to `eos_token_id`:21017 for open-end generation.


Generate a book summary from the title:

[book title]: "Stranger in a Strange Land"
[book description]: "This novel tells the story of Valentine Michael Smith, a human who comes to Earth in early adulthood after being born on the planet Mars and raised by Martians, and explores his interaction with and eventual transformation of Terran culture."
###
[book title]: "The Adventures of Tom Sawyer"
[book description]: "This novel is about a boy growing up along the Mississippi River. It is set in the 1840s in the town of St. Petersburg, which is based on Hannibal, Missouri, where Twain lived as a boy. In the novel, Tom Sawyer has several adventures, often with his friend Huckleberry Finn."
###
[book title]: "Dune"
[book description]: "This novel is set in the distant future amidst a feudal interstellar society in which various noble houses control planetary fiefs. It tells the story of young Paul Atreides, whose family accepts the stewardship of the planet Arrakis. While the planet is an in

**Observations**

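- In few-shot learning, the model is **not trained**; the examples are placed directly in the prompt.
- Guided by these examples, the model continues the prompt with a description for the new title, and generation stops at the `###` separator or after 50 new tokens.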
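
The printed `generated_text` repeats the whole prompt before the new text. A small helper can keep only the newly generated description; the sketch below is an illustration with a hypothetical function name, `extract_completion`, and a made-up continuation.

```python
def extract_completion(generated_text, prompt, separator="###"):
    """Return only the newly generated text, cut at the first separator.

    If the text does not start with the prompt, it is processed as-is.
    """
    if generated_text.startswith(prompt):
        completion = generated_text[len(prompt):]
    else:
        completion = generated_text
    # Drop everything from the separator onwards, then trim whitespace.
    return completion.split(separator, 1)[0].strip()


# Made-up prompt tail and continuation, for illustration only.
demo_prompt = '[book title]: "Blue Mars"\n[book description]:'
demo_output = demo_prompt + ' "A made-up continuation."\n###'
print(extract_completion(demo_output, demo_prompt))
```

In the notebook, this would be called as `extract_completion(results[0]["generated_text"], prompt)`. The text-generation pipeline also accepts `return_full_text=False`, which returns the continuation without the prompt.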

---
<a name = Section6></a>
# **6. Conclusion**
---

- By using **pre-trained models** from Hugging Face, we ran summarization, sentiment analysis, zero-shot classification, and few-shot generation without training any model.
- You can **experiment** with more Hugging Face models [here](https://huggingface.co/models).
- You can also **explore** datasets [here](https://huggingface.co/datasets) and **build** your own LLM applications.