# Q1. What are the advantages and limitations of the transformers library?



**Advantages:**

1. **State-of-the-art Performance**: Transformer models have achieved state-of-the-art performance on a wide range of natural language processing (NLP) tasks, including text classification, machine translation, text generation, and more. They have surpassed previous approaches like recurrent neural networks (RNNs) and convolutional neural networks (CNNs) in many cases.

2. **Parallelization**: Unlike RNNs, which process input sequentially, transformers can process all tokens in the input sequence simultaneously, making them highly parallelizable. This leads to faster training, especially on hardware like GPUs and TPUs. Autoregressive generation at inference time still produces output tokens one at a time.

3. **Long-range Dependencies**: Transformers excel at capturing long-range dependencies in sequences, which is crucial for tasks like machine translation where understanding context across distant words is essential.

4. **Attention Mechanism**: The self-attention mechanism in transformers allows the model to weigh the importance of different input tokens when generating an output token. This mechanism helps the model focus on relevant parts of the input sequence, improving performance.

5. **Transfer Learning**: Pre-trained transformer models can be fine-tuned on downstream tasks with relatively small amounts of task-specific data, thanks to transfer learning. This approach has democratized access to state-of-the-art NLP models, even for users with limited computational resources.

6. **Model Architectures**: The Hugging Face Transformers library provides a vast collection of pre-trained transformer architectures, ranging from small models like DistilBERT to larger ones like BERT and GPT-style models, catering to various use cases and computational budgets.
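
The self-attention idea from item 4 can be sketched in a few lines of plain Python. This is a toy illustration, not the library's implementation: the function names (`softmax`, `self_attention`) and the tiny 2-dimensional vectors are made up for the example, and the learned query/key/value projections are omitted, so each token vector serves as its own query, key, and value.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(vectors):
    """Scaled dot-product self-attention without learned projections."""
    d = len(vectors[0])
    outputs, all_weights = [], []
    for q in vectors:
        # Similarity of this token (query) to every token (keys), scaled by sqrt(d)
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in vectors]
        weights = softmax(scores)
        # Output is the attention-weighted average of all token vectors (values)
        out = [sum(w * v[j] for w, v in zip(weights, vectors)) for j in range(d)]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights

# Hypothetical 2-dimensional "embeddings" for three tokens
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(tokens)
for row in weights:
    print([round(w, 3) for w in row])
```

Each printed row shows how strongly one token attends to every token in the sequence. All rows are computed independently of each other, which is what makes the operation easy to parallelize.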

**Limitations:**

1. **Computational Resources**: Training large transformer models requires significant computational resources, including powerful GPUs or TPUs and large amounts of memory. This can be a barrier for individuals or organizations with limited resources.

2. **Data Efficiency**: While transfer learning allows fine-tuning pre-trained models on downstream tasks with limited data, transformers still require large amounts of pre-training data to achieve state-of-the-art performance. Obtaining and preprocessing such data can be challenging and resource-intensive.

3. **Interpretability**: The attention mechanisms in transformers provide some interpretability by indicating which parts of the input sequence the model focuses on. However, understanding how the model arrives at its predictions and interpreting its decisions can still be challenging, especially for large models with millions or billions of parameters.

4. **Fine-tuning Challenges**: Fine-tuning transformer models on specific downstream tasks requires careful hyperparameter tuning, and sometimes architectural changes such as adding a task-specific head, to achieve optimal performance. This process can be time-consuming and may require domain expertise.

5. **Maximum Input Length**: Most transformer models accept sequences only up to a maximum length (for example, 512 tokens for BERT), and the cost of self-attention grows quadratically with sequence length. This can be a limitation for tasks involving long inputs, such as document classification or summarization. Truncation discards information, splitting the text into chunks loses context across chunk boundaries, and padding shorter sequences in a batch adds computational overhead.
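
One common workaround for long inputs is to split the token sequence into overlapping windows that each fit the model's limit. The sketch below is a plain-Python illustration under assumed names: `chunk_token_ids`, `max_length`, and `stride` (here meaning the number of tokens shared between neighbouring windows) are hypothetical and are not taken from the library.

```python
def chunk_token_ids(token_ids, max_length, stride):
    """Split a long sequence into overlapping windows of at most max_length tokens."""
    if not 0 <= stride < max_length:
        raise ValueError("stride must be >= 0 and smaller than max_length")
    step = max_length - stride  # how far each new window advances
    chunks = []
    for start in range(0, len(token_ids), step):
        chunks.append(token_ids[start:start + max_length])
        if start + max_length >= len(token_ids):
            break  # the last window already reaches the end of the sequence
    return chunks

ids = list(range(10))  # stand-in for ten token IDs
chunks = chunk_token_ids(ids, max_length=4, stride=1)
print(chunks)
```

The overlap gives each window a little context from its neighbour, at the cost of processing some tokens twice.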



# Q2. What are the different supported tasks in Hugging Face transformers?

1. **Text Classification**: Assigning labels or categories to text, useful for sentiment analysis, topic classification, and spam detection.

2. **Named Entity Recognition (NER)**: Identifying and classifying named entities like names of people, organizations, and locations within text, crucial for information extraction tasks.

3. **Question Answering (QA)**: Providing answers to questions based on a given context, valuable for tasks like reading comprehension and customer support chatbots.

4. **Text Generation**: Generating coherent and contextually relevant text based on a given prompt, used for story generation, dialogue completion, and summarization.

5. **Machine Translation**: Translating text from one language to another, essential for cross-lingual communication and localization of content.

6. **Summarization**: Condensing longer text into shorter, coherent summaries while preserving key information, beneficial for distilling insights from large documents or articles.

7. **Text-to-Speech (TTS)**: Converting text input into synthesized speech output, enabling applications like virtual assistants and accessibility tools.

8. **Speech Recognition**: Transcribing spoken language into text, supporting tasks such as voice commands and dictation.

9. **Language Understanding**: Handling various NLP tasks such as text classification, sentiment analysis, and intent detection, pivotal for natural language understanding applications.

10. **Language Modeling**: Predicting the next word or token in a sequence of text, facilitating tasks like autocomplete and text generation.


# Q3. Write code to summarize text using the transformers pipeline's 'google-t5/t5-base' model.

In [None]:
!pip install transformers



In [None]:
from transformers import pipeline

In [None]:
summarizer_pipeline = pipeline(task="summarization", model="google-t5/t5-base", tokenizer="google-t5/t5-base")


In [None]:
# Input text to be summarized
input_text = """
    The Hugging Face Transformers library is a powerful tool for natural language processing tasks.
    It provides an easy-to-use interface for accessing state-of-the-art models like BERT, GPT, and T5.
    With Hugging Face Transformers, you can perform tasks such as text classification, question answering, and text generation.
    The library supports various architectures and pre-trained models, making it suitable for a wide range of applications.
"""

In [None]:
# Greedy/beam decoding (no sampling), with the summary length bounded in tokens
summary = summarizer_pipeline(input_text, max_length=50, min_length=20, do_sample=False)
summary

[{'summary_text': 'the Hugging Face Transformers library is a powerful tool for natural language processing tasks . it provides an easy-to-use interface for accessing state-of-the-art models like BERT, GPT, and T5'}]

In [None]:
summary[0]

{'summary_text': 'the Hugging Face Transformers library is a powerful tool for natural language processing tasks . it provides an easy-to-use interface for accessing state-of-the-art models like BERT, GPT, and T5'}

In [None]:
summary[0]['summary_text']

'the Hugging Face Transformers library is a powerful tool for natural language processing tasks . it provides an easy-to-use interface for accessing state-of-the-art models like BERT, GPT, and T5'

# Q4. Write code for sentiment analysis using the transformers pipeline.

In [None]:
sentiment_analysis_pipeline = pipeline(task = 'sentiment-analysis')


No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.


In [None]:
input_text = "I love using the Hugging Face Transformers library. It's incredibly powerful and easy to use!"

In [None]:
sentiment = sentiment_analysis_pipeline(input_text)


In [None]:
sentiment

[{'label': 'POSITIVE', 'score': 0.9997500777244568}]

In [None]:
sentiment[0]['label']

'POSITIVE'

In [None]:
sentiment[0]['score']

0.9997500777244568

# Q5. Write code for classifying the given news headlines into four categories: ["entertainment", "sports", "politics", "science"]. Headlines:
 ◼ Oscars 2024 winning actor Cillian Murphy to play lead role in 'Peaky Blinders' movie

 ◼ Australia beat India in the final of the ICC Cricket World Cup 2023 by six wickets

 ◼ Lok Sabha Election Date 2024: Seven-phase voting to begin from April 19, results on June 4

 ◼ After 6-month trip to ISS, NASA astronaut Loral O’Hara, others back on Earth

In [None]:
text_classification_pipeline = pipeline('zero-shot-classification')


No model was supplied, defaulted to facebook/bart-large-mnli and revision c626438 (https://huggingface.co/facebook/bart-large-mnli).
Using a pipeline without specifying a model name and revision in production is not recommended.


In [None]:
headlines = [
    "Oscars 2024 winning actor Cillian Murphy to play lead role in 'Peaky Blinders' movie",
    "Australia beat India in the final of the ICC Cricket World Cup 2023 by six wickets",
    "Lok Sabha Election Date 2024: Seven-phase voting to begin from April 19, results on June 4",
    "After 6-month trip to ISS, NASA astronaut Loral O’Hara, others back on Earth"
]


In [None]:
candidate_labels = ["entertainment", "sports", "politics", "science"]

In [None]:
text_classification = text_classification_pipeline(headlines,candidate_labels)


In [None]:
# Print the classification results for each headline
for i, headline in enumerate(headlines):
    print(f"Headline: {headline}")
    print("Predicted Category:", text_classification[i]['labels'][0])
    print("Confidence Score:", text_classification[i]['scores'][0])
    print()

Headline: Oscars 2024 winning actor Cillian Murphy to play lead role in 'Peaky Blinders' movie
Predicted Category: entertainment
Confidence Score: 0.9071799516677856

Headline: Australia beat India in the final of the ICC Cricket World Cup 2023 by six wickets
Predicted Category: sports
Confidence Score: 0.9708090424537659

Headline: Lok Sabha Election Date 2024: Seven-phase voting to begin from April 19, results on June 4
Predicted Category: politics
Confidence Score: 0.9470001459121704

Headline: After 6-month trip to ISS, NASA astronaut Loral O’Hara, others back on Earth
Predicted Category: entertainment
Confidence Score: 0.4154137670993805



# Q6. Initialize the tokenizer for the distilbert-base-uncased model.

In [None]:
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

input_text = "Initialize tokenizer for DistilBERT-base-uncased model."

tokens = tokenizer(input_text)
print(tokens)

{'input_ids': [101, 3988, 4697, 19204, 17629, 2005, 4487, 16643, 23373, 1011, 2918, 1011, 4895, 28969, 2944, 1012, 102], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]}


# Q7. What are input_ids and attention_mask in the above cell output?

**'input_ids'** is the list of token IDs that represents the input text after tokenization. It includes the special tokens added by the tokenizer: `[CLS]` (ID 101) at the start and `[SEP]` (ID 102) at the end of the sequence.

**'attention_mask'** is a list indicating which tokens should be attended to (1) and which are padding and should be ignored (0). In this case there is no padding, so all tokens are attended to (all values are 1).
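
To see where zeros in the attention mask come from, here is a plain-Python sketch of padding a batch of two sequences to the same length. The helper name `pad_batch` is hypothetical, and the IDs other than 101, 102, and the padding ID 0 are illustrative placeholders rather than looked-up vocabulary entries.

```python
def pad_batch(batch_ids, pad_id=0):
    """Right-pad every sequence to the longest one and build matching attention masks."""
    max_len = max(len(ids) for ids in batch_ids)
    input_ids, attention_mask = [], []
    for ids in batch_ids:
        n_pad = max_len - len(ids)
        input_ids.append(ids + [pad_id] * n_pad)
        # 1 marks a real token, 0 marks padding the model should ignore
        attention_mask.append([1] * len(ids) + [0] * n_pad)
    return {"input_ids": input_ids, "attention_mask": attention_mask}

batch = pad_batch([[101, 7592, 102], [101, 7592, 2088, 999, 102]])
print(batch["input_ids"])
print(batch["attention_mask"])
```

The shorter sequence receives two padding IDs and two zeros in its mask, while the longer sequence keeps a mask of all ones.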

# Q8. How to return PyTorch tensors in the above code?

In [None]:
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

input_text = "Initialize tokenizer for DistilBERT-base-uncased model."

# return_tensors='pt' returns PyTorch tensors instead of Python lists
tokens = tokenizer(input_text, return_tensors='pt')
tokens


{'input_ids': tensor([[  101,  3988,  4697, 19204, 17629,  2005,  4487, 16643, 23373,  1011,
          2918,  1011,  4895, 28969,  2944,  1012,   102]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]])}

# Q9. What do these special tokens represent in the BERT model: ['[PAD]', '[UNK]', '[CLS]', '[SEP]', '[MASK]']?

In BERT (Bidirectional Encoder Representations from Transformers) and similar transformer-based models, special tokens have specific purposes within the input sequence:

1. `[PAD]`: This token is used for padding sequences to equalize their length. It's added to the end of sequences that are shorter than the maximum sequence length in a batch during training or inference.

2. `[UNK]`: This token represents unknown or out-of-vocabulary (OOV) words. When a word in the input text is not found in the model's vocabulary, it's replaced with this token.

3. `[CLS]`: This token marks the beginning of a sequence and is prepended to the input text. For classification tasks, the final hidden state corresponding to this token is used as the aggregate representation of the whole sequence.

4. `[SEP]`: This token is used to separate two different sequences in the input. It's appended to the end of each sequence, indicating the end of one segment and the beginning of another.

5. `[MASK]`: This token is used during the pre-training phase of the model. It's randomly substituted for some tokens in the input sequence, and the model's objective is to predict the original tokens based on the context provided by the surrounding tokens.
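
The layout these tokens produce for a sentence pair can be sketched without the library. The helper below (`build_bert_input`, a hypothetical name) works on already-split word lists and also returns segment IDs that mark which sentence each token belongs to; a real tokenizer would additionally split words into subwords and map them to IDs.

```python
def build_bert_input(tokens_a, tokens_b=None, max_length=None):
    """Arrange tokens as [CLS] A [SEP] (B [SEP]) and pad with [PAD] up to max_length."""
    tokens = ["[CLS]"] + tokens_a + ["[SEP]"]
    segment_ids = [0] * len(tokens)  # first segment
    if tokens_b:
        tokens += tokens_b + ["[SEP]"]
        segment_ids += [1] * (len(tokens_b) + 1)  # second segment, including its [SEP]
    if max_length is not None:
        n_pad = max(0, max_length - len(tokens))
        tokens += ["[PAD]"] * n_pad
        segment_ids += [0] * n_pad
    return tokens, segment_ids

tokens, segment_ids = build_bert_input(["how", "are", "you"], ["fine", "thanks"], max_length=10)
print(tokens)
print(segment_ids)
```

`[UNK]` and `[MASK]` do not appear here because they replace individual tokens (unknown words, or tokens hidden during pre-training) rather than shaping the sequence layout.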


# Q10. Convert the given sentence into input IDs using the BERT tokenizer: “This is an outstanding product. I love it.”

In [None]:
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
sentence = "This is an outstanding product. I love it."
tokens = tokenizer.tokenize(sentence)
tokens = ["[CLS]"] + tokens + ["[SEP]"]
input_ids = tokenizer.convert_tokens_to_ids(tokens)

input_ids


[101, 2023, 2003, 2019, 5151, 4031, 1012, 1045, 2293, 2009, 1012, 102]

# Q11. Convert the given list of input IDs into tokens using the BERT tokenizer: [0, 100, 101, 102, 103]

In [5]:
from transformers import BertTokenizer
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
input_ids = [0, 100, 101, 102, 103]
tokens = tokenizer.convert_ids_to_tokens(input_ids)
tokens

['[PAD]', '[UNK]', '[CLS]', '[SEP]', '[MASK]']

# Q12. How to download any model from the Hugging Face Hub using the Transformers library in PyTorch and TensorFlow?

In [7]:
# AutoModel loads the PyTorch implementation of the model
from transformers import AutoModel
model_name = 'distilbert-base-uncased'
model = AutoModel.from_pretrained(model_name)
model


DistilBertModel(
  (embeddings): Embeddings(
    (word_embeddings): Embedding(30522, 768, padding_idx=0)
    (position_embeddings): Embedding(512, 768)
    (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
    (dropout): Dropout(p=0.1, inplace=False)
  )
  (transformer): Transformer(
    (layer): ModuleList(
      (0-5): 6 x TransformerBlock(
        (attention): MultiHeadSelfAttention(
          (dropout): Dropout(p=0.1, inplace=False)
          (q_lin): Linear(in_features=768, out_features=768, bias=True)
          (k_lin): Linear(in_features=768, out_features=768, bias=True)
          (v_lin): Linear(in_features=768, out_features=768, bias=True)
          (out_lin): Linear(in_features=768, out_features=768, bias=True)
        )
        (sa_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
        (ffn): FFN(
          (dropout): Dropout(p=0.1, inplace=False)
          (lin1): Linear(in_features=768, out_features=3072, bias=True)
          (lin2): Li

In [8]:
# for tensorflow
from transformers import TFAutoModel
model_name = 'distilbert-base-uncased'
model = TFAutoModel.from_pretrained(model_name)
model


Some weights of the PyTorch model were not used when initializing the TF 2.0 model TFDistilBertModel: ['vocab_transform.weight', 'vocab_layer_norm.weight', 'vocab_layer_norm.bias', 'vocab_transform.bias', 'vocab_projector.bias']
- This IS expected if you are initializing TFDistilBertModel from a PyTorch model trained on another task or with another architecture (e.g. initializing a TFBertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing TFDistilBertModel from a PyTorch model that you expect to be exactly identical (e.g. initializing a TFBertForSequenceClassification model from a BertForSequenceClassification model).
All the weights of TFDistilBertModel were initialized from the PyTorch model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFDistilBertModel for predictions without further training.


<transformers.models.distilbert.modeling_tf_distilbert.TFDistilBertModel at 0x7fc79a30b700>

# Q13. The code below outputs a tensor shape. What does each number represent?

In [13]:
from transformers import AutoTokenizer,AutoModel

model_name = 'distilbert-base-uncased'
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)
inputs = tokenizer('who is MS Dhoni?',return_tensors = 'pt')
output = model(**inputs)
output
output.last_hidden_state.shape




torch.Size([1, 8, 768])

In [None]:
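# Here:
# 1   -> batch size (one sentence in the batch)
# 8   -> number of tokens in the sequence, including [CLS] and [SEP]
# 768 -> hidden size: the length of the hidden-state vector produced for each token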


# Q14. Explain Trainer and TrainingArguments classes in the Transformers library.

Trainer Class: Handles the entire training process, including the training loop, evaluation, checkpointing, logging, gradient accumulation, and mixed precision training.

TrainingArguments Class: Specifies training configurations and hyperparameters such as epochs, batch sizes, learning rate, logging directories, save intervals, evaluation strategies, and reporting platforms.

# Q15. What are autoregressive models?

Autoregressive models predict future values based on past observations, assuming that the current value depends on its previous values.

They are widely used in time series analysis and forecasting, and in deep learning, they refer to models that generate output sequentially based on previous outputs, such as autoregressive language models like GPT.
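
The sequential loop behind autoregressive generation can be illustrated with a toy model. In the sketch below, a hand-written bigram table stands in for a language model; the table contents, the `generate` function, and the `<eos>` end marker are all invented for the example, and the next token is chosen greedily.

```python
def generate(bigram_probs, start_token, max_new_tokens, end_token="<eos>"):
    """Greedy autoregressive generation: each new token depends on what was generated so far."""
    sequence = [start_token]
    for _ in range(max_new_tokens):
        candidates = bigram_probs.get(sequence[-1])
        if not candidates:
            break  # the toy model has no prediction for this context
        next_token = max(candidates, key=candidates.get)  # pick the most likely next token
        if next_token == end_token:
            break
        sequence.append(next_token)  # the output is fed back in as input for the next step
    return sequence

# Hypothetical next-token probabilities, conditioned only on the previous token
bigram_probs = {
    "the": {"cat": 0.6, "dog": 0.4},
    "cat": {"sat": 0.7, "ran": 0.3},
    "sat": {"down": 0.9, "<eos>": 0.1},
    "down": {"<eos>": 1.0},
}
result = generate(bigram_probs, "the", max_new_tokens=10)
print(result)
```

A model like GPT follows the same loop, except that it conditions on the entire preceding sequence rather than only the last token.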

# Q16. `output = model.generate(**input, num_beams=5, no_repeat_ngram_size=2, max_new_tokens=50)`: what is the function of `**input`, `num_beams`, `no_repeat_ngram_size`, and `max_new_tokens` in this code?

**`**input`**: In Python, the `**` operator unpacks a dictionary into keyword arguments. So `**input` implies that `input` is a dictionary-like object holding the model inputs, typically the tokenizer output with keys such as `input_ids` and `attention_mask`.

**num_beams**: This parameter sets the number of beams used during beam search decoding. Beam search keeps the `num_beams` most probable partial sequences at each step instead of only the single best one, and returns the highest-scoring completed sequence. Increasing the number of beams can lead to better results but also increases computation and memory use.

**no_repeat_ngram_size**: This parameter sets the size of n-grams (sequences of n tokens) that may appear only once in the generated sequence. With `no_repeat_ngram_size=2`, no pair of consecutive tokens can be generated twice, which helps avoid repetitive or redundant output.

**max_new_tokens**: This parameter limits the maximum number of new tokens (words or subwords) that can be generated in the output sequence. It controls the length of the generated text by specifying the maximum number of tokens to be added to the input.
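
The n-gram blocking idea can be sketched in plain Python. The function below is an illustrative reimplementation under an assumed name (`banned_next_tokens`), not the library's code: given the tokens generated so far, it returns the tokens that would complete an n-gram that has already appeared.

```python
def banned_next_tokens(generated, ngram_size):
    """Return tokens that would repeat an existing n-gram if generated next."""
    if ngram_size < 1 or len(generated) + 1 < ngram_size:
        return set()
    # The last (n - 1) tokens form the start of the n-gram the next token would complete
    prefix = tuple(generated[len(generated) - ngram_size + 1:])
    banned = set()
    for i in range(len(generated) - ngram_size + 1):
        ngram = tuple(generated[i:i + ngram_size])
        if ngram[:-1] == prefix:
            banned.add(ngram[-1])
    return banned

generated = ["the", "cat", "sat", "on", "the"]
print(banned_next_tokens(generated, 2))
print(banned_next_tokens(generated, 3))
```

With a size of 2, "cat" is blocked because "the cat" has already been generated. With a size of 3, nothing is blocked yet because no trigram starts with "on the". A decoder applying this rule would give the banned tokens zero probability at the next step.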


# Q17. How do temperature, top_p, and top_k affect generation?

**1. Temperature**:
Function: Temperature rescales the model's logits (they are divided by the temperature) before the softmax that produces the sampling probabilities.

Effect: Higher temperature values flatten the distribution, so less likely tokens are sampled more often and the output becomes more diverse and random. Lower temperature values sharpen the distribution toward the highest-probability tokens, producing more conservative, nearly deterministic text.

Typical Range: Temperature must be positive. A value of 1.0 leaves the distribution unchanged, values below 1 make the output more deterministic, and values above 1 make it more random. Values between roughly 0 and 1 are common in practice.

**2. Top-p (Nucleus Sampling)**:
Function: Top-p, also known as nucleus sampling, keeps the smallest set of most likely tokens whose cumulative probability reaches a predefined threshold (p) and samples only from that set.

Effect: The size of the candidate set adapts to the model's confidence: it is small when a few tokens dominate and larger when the distribution is flat. This preserves diversity while cutting off the tail of low-probability tokens.

Typical Range: The top-p threshold (p) ranges from 0 to 1, with higher values resulting in a larger subset of tokens being considered. A value of 1 keeps every token.

**3. Top-k (Top-k Sampling)**:
Function: Top-k sampling limits the selection of tokens to the k most likely tokens according to their probabilities.

Effect: Top-k sampling selects from a fixed number of most likely tokens, providing a balance between diversity and control. It helps keep the generated text fluent and coherent by excluding improbable tokens, but the fixed cutoff does not adapt to how peaked or flat the distribution is.

Typical Range: The top-k value (k) is an integer representing the number of tokens to consider. Typical values depend on the vocabulary size and task complexity but often range from 10 to 50.
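
The three controls can be compared on a small example. The sketch below uses plain Python with made-up logits for a four-token vocabulary; the function names are hypothetical and the code illustrates the general techniques rather than the library's implementation.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Divide logits by the temperature, then apply softmax."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_filter(probs, k):
    """Keep the k most likely token indices and renormalize their probabilities."""
    keep = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in keep)
    return {i: probs[i] / total for i in keep}

def top_p_filter(probs, p):
    """Keep the smallest set of top tokens whose cumulative probability reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cumulative = [], 0.0
    for i in order:
        keep.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    total = sum(probs[i] for i in keep)
    return {i: probs[i] / total for i in keep}

logits = [2.0, 1.0, 0.5, -1.0]  # hypothetical scores for a 4-token vocabulary
sharp = softmax_with_temperature(logits, 0.5)
base = softmax_with_temperature(logits, 1.0)
flat = softmax_with_temperature(logits, 2.0)
print([round(x, 3) for x in sharp])
print([round(x, 3) for x in base])
print([round(x, 3) for x in flat])
print(sorted(top_k_filter(base, 2)))
print(sorted(top_p_filter(base, 0.8)), sorted(top_p_filter(base, 0.95)))
```

Lowering the temperature concentrates probability on the top token, while raising it spreads probability out. Raising `p` from 0.8 to 0.95 admits one more token into the candidate set. Sampling would then draw from the filtered, renormalized distribution.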

# Q18. Write code for BLEU score, METEOR, ROUGE, and perplexity evaluation using the Hugging Face evaluation module.

**BLEU Score**: BLEU (Bilingual Evaluation Understudy) is a metric used to evaluate the quality of machine-translated text by comparing it to one or more reference translations. It measures the similarity between generated and reference texts using n-gram overlap.

**ROUGE Score**: ROUGE (Recall-Oriented Understudy for Gisting Evaluation) is a set of metrics used to evaluate the quality of summaries by comparing them to one or more reference summaries. It measures the overlap of n-grams between the generated and reference texts.

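**METEOR Score**: METEOR (Metric for Evaluation of Translation with Explicit Ordering) is a metric for automatic machine translation evaluation. It aligns unigrams between the candidate and the reference using exact, stem, and synonym matches, then combines precision and recall in a weighted harmonic mean (with recall weighted more heavily) and applies a penalty for fragmented word order.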

**Perplexity**: Perplexity is a measure of how well a probability distribution or a language model predicts a sample. In the context of language modeling, perplexity measures the uncertainty of a language model in predicting the next token in a sequence. Lower perplexity values indicate better performance.
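
As a quick numeric illustration of perplexity, it can be computed as the exponential of the average negative log-probability the model assigned to each token. The sketch below uses invented per-token probabilities and a hypothetical helper name (`perplexity_from_probs`); a real evaluation would obtain these probabilities from a language model.

```python
import math

def perplexity_from_probs(token_probs):
    """Perplexity = exp(mean negative log-likelihood of the observed tokens)."""
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)

# A model that is always unsure among 4 equally likely tokens
uniform = perplexity_from_probs([0.25, 0.25, 0.25, 0.25])
# A model that assigns high probability to every observed token
confident = perplexity_from_probs([0.9, 0.8, 0.95, 0.85])
print(uniform, confident)
```

The uniform case gives a perplexity of about 4, matching the intuition that the model is effectively choosing among four options at each step, while the confident model scores much closer to 1.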

In [16]:
!pip install datasets


In [4]:
!pip install evaluate


In [5]:
!pip install rouge_score


In [2]:
import evaluate
bleu = evaluate.load('bleu')
prediction = ["the cat sat on the mat"]
reference = ["the cat is on mat"]
bleu.compute(predictions=prediction, references=reference, max_order=2)

{'bleu': 0.36514837167011077,
 'precisions': [0.6666666666666666, 0.2],
 'brevity_penalty': 1.0,
 'length_ratio': 1.2,
 'translation_length': 6,
 'reference_length': 5}
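
To see where the numbers in this output come from, the same score can be recomputed by hand. The sketch below is a simplified BLEU under stated assumptions (whitespace tokenization, a single reference, no smoothing); `simple_bleu` and `ngrams` are hypothetical helper names, not part of the `evaluate` library.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def simple_bleu(prediction, reference, max_order=2):
    """Simplified BLEU: clipped n-gram precisions, geometric mean, brevity penalty."""
    pred, ref = prediction.split(), reference.split()
    if not pred:
        return 0.0, []
    precisions = []
    for n in range(1, max_order + 1):
        pred_counts = Counter(ngrams(pred, n))
        ref_counts = Counter(ngrams(ref, n))
        # Clip each n-gram's count by how often it occurs in the reference
        overlap = sum(min(c, ref_counts[g]) for g, c in pred_counts.items())
        total = sum(pred_counts.values())
        precisions.append(overlap / total if total else 0.0)
    if min(precisions) == 0:
        return 0.0, precisions
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_order)
    # Penalize predictions that are shorter than the reference
    bp = 1.0 if len(pred) > len(ref) else math.exp(1 - len(ref) / len(pred))
    return bp * geo_mean, precisions

score, precisions = simple_bleu("the cat sat on the mat", "the cat is on mat", max_order=2)
print(score, precisions)
```

Four of the six predicted unigrams match after clipping (the second "the" is not counted), one of the five bigrams matches ("the cat"), and the brevity penalty is 1 because the prediction is longer than the reference, so the score is the square root of (4/6 × 1/5).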

In [4]:
reference = ["The NASA Opportunity rover is battling a massive dust storm on mars."]
candidate1 = ["The Opportunity rover is combating a big sandstorm on mars."]
candidate2 = ["A NASA rover is fighting a massive storm on Mars."]

meteor = evaluate.load('meteor')
meteor.compute(predictions=candidate2, references=reference)

[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package omw-1.4 to /root/nltk_data...
[nltk_data]   Package omw-1.4 is already up-to-date!


{'meteor': 0.6722608024691359}

In [6]:
rouge = evaluate.load("rouge")

rouge.compute(predictions=candidate1, references=reference, rouge_types=['rouge1', 'rouge2', 'rougeL'])

{'rouge1': 0.6363636363636365, 'rouge2': 0.3, 'rougeL': 0.6363636363636365}

In [8]:
perplexity = evaluate.load("perplexity")
text = ["we are learning machin learning"]

perplexity.compute(model_id="gpt2", predictions=text)

{'perplexities': [3143.050048828125], 'mean_perplexity': 3143.050048828125}