# Question Answering

Types of QA systems:

    Extractive QA systems: These systems take the answer directly from the given text by selecting the span of text that contains it.

    Abstractive QA systems: These systems generate a new answer by understanding the meaning of the question and synthesizing information from various sources.

Classical (pre-deep-learning) QA systems:

    Information Retrieval based QA systems: These systems use information retrieval techniques to search for relevant documents and retrieve the most relevant answers.

    Knowledge Graph based QA systems: These systems represent information in a structured format and use graph-based algorithms to answer questions.

    Watson QA system: This system, developed by IBM, uses a combination of natural language processing, machine learning, and information retrieval techniques to answer questions in a wide range of domains.
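
The retrieval step of an information retrieval based QA system can be illustrated with a small TF-IDF ranker. This is a minimal sketch in plain Python, not how any particular production system works; the function names (`tokenize`, `tfidf_rank`) and the smoothed IDF formula are assumptions made for illustration.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and keep runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())

def tfidf_rank(question, documents):
    """Return document indices sorted from most to least relevant (hypothetical helper)."""
    doc_tokens = [tokenize(d) for d in documents]
    n_docs = len(documents)
    # Document frequency: in how many documents each term appears
    df = Counter(term for tokens in doc_tokens for term in set(tokens))
    query_terms = set(tokenize(question))
    scores = []
    for tokens in doc_tokens:
        tf = Counter(tokens)
        score = 0.0
        for term in query_terms:
            if term in tf:
                idf = math.log((1 + n_docs) / (1 + df[term])) + 1.0  # smoothed IDF
                score += (tf[term] / len(tokens)) * idf
        scores.append(score)
    return sorted(range(n_docs), key=lambda i: scores[i], reverse=True)

documents = [
    "Paris is the capital and most populous city of France.",
    "Russia is the largest country in the world by area.",
    "George Washington was the first president of the United States.",
]
ranking = tfidf_rank("What is the capital of France?", documents)
print(documents[ranking[0]])
```

The top-ranked document would then be handed to an answer extraction step, such as the neural readers described later in this section.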

Evaluation of QA and Stanford Question Answering Dataset (SQuAD):

SQuAD is a popular reading comprehension dataset used for evaluating QA systems. It consists of more than 100,000 crowdsourced questions on Wikipedia passages, where each answer is a span of text from the corresponding passage; SQuAD 2.0 adds questions that cannot be answered from the passage. Systems are typically scored by Exact Match (EM) and token-level F1 against the reference answers.
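
A minimal sketch of the two scores commonly reported on SQuAD, Exact Match and token-level F1. It follows the usual normalisation (lowercasing, removing punctuation and English articles, collapsing whitespace), but it is a simplified re-implementation for illustration rather than the official evaluation script; the function names are assumptions.

```python
import re
import string
from collections import Counter

def normalize_answer(text):
    """Lowercase, remove punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction, truth):
    """1.0 if the normalised strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(truth))

def f1_score(prediction, truth):
    """Harmonic mean of token precision and recall after normalisation."""
    pred_tokens = normalize_answer(prediction).split()
    truth_tokens = normalize_answer(truth).split()
    if not pred_tokens or not truth_tokens:
        # Two empty answers count as a match; one empty answer counts as a miss
        return float(pred_tokens == truth_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(truth_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(truth_tokens)
    return 2 * precision * recall / (precision + recall)

print(exact_match("Paris", "paris."))
print(f1_score("the city of Paris", "Paris"))
```

F1 gives partial credit when the predicted span is longer or shorter than the reference, which Exact Match does not.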

Language models for QA systems:

    BiDAF (Bidirectional Attention Flow): This model computes attention in both directions, from passage to question and from question to passage, to build a question-aware representation of each passage word, and then predicts the start and end of the answer span.

    Encoder-decoder transformers: These models use transformer networks to encode the input text and generate the output answer.

    SpanBERT: This model extends BERT (Bidirectional Encoder Representations from Transformers) with span-focused pre-training: it masks contiguous spans of tokens rather than individual tokens and trains the span boundary representations to predict the masked content. This makes it well suited to extractive QA, where the answer is a span of the passage.
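
Span-based readers typically produce one start score and one end score per token, and the answer is the span with the best combined score. The sketch below shows one simple way to pick that span while enforcing that the end does not precede the start; `best_span` and `max_answer_len` are hypothetical names, and the logits are made-up numbers chosen so that taking the two argmaxes independently would give an invalid span.

```python
def best_span(start_logits, end_logits, max_answer_len=30):
    """Return (start, end), with `end` inclusive, maximising
    start_logits[start] + end_logits[end] subject to
    start <= end < start + max_answer_len."""
    best_score = float("-inf")
    best = (0, 0)
    for start, start_score in enumerate(start_logits):
        last = min(start + max_answer_len, len(end_logits))
        for end in range(start, last):
            score = start_score + end_logits[end]
            if score > best_score:
                best_score = score
                best = (start, end)
    return best

# Independent argmax would pick start=2 and end=1, which is not a valid span
start_logits = [0.1, 0.2, 5.0, 0.3, 0.1]
end_logits = [0.0, 6.0, 0.2, 4.0, 0.1]
print(best_span(start_logits, end_logits))
```

The search is quadratic in the worst case but bounded by `max_answer_len`, which is usually small compared with the passage length.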

In [15]:
import torch
from transformers import AutoTokenizer, AutoModelForQuestionAnswering

# Load a BERT model fine-tuned on SQuAD 2.0 and its tokenizer
# (this checkpoint is BERT, not BiDAF)
tokenizer = AutoTokenizer.from_pretrained('deepset/bert-base-cased-squad2')
model = AutoModelForQuestionAnswering.from_pretrained('deepset/bert-base-cased-squad2')

# Define a sample question and passage
question = "What is the capital of France?"
passage = "France, officially the French Republic, is a country primarily located in Western Europe, consisting of metropolitan France and several overseas regions and territories. Paris is the capital and most populous city of France."

# Encode the question and passage as a pair; if the pair is too long, truncate only the passage
inputs = tokenizer(question, passage, return_tensors='pt', max_length=512, truncation='only_second')
input_ids = inputs['input_ids']
token_type_ids = inputs['token_type_ids']
attention_mask = inputs['attention_mask']

# Pass the encoded input through the BERT QA model (no gradients are needed for inference)
with torch.no_grad():
    outputs = model(input_ids, attention_mask=attention_mask, token_type_ids=token_type_ids, return_dict=True)
start_logits = outputs.start_logits
end_logits = outputs.end_logits

# Decode the predicted start and end positions to get the answer.
# The predicted end index is inclusive, so add 1 to use it as a slice bound.
start_index = torch.argmax(start_logits).item()
end_index = torch.argmax(end_logits).item() + 1

# Slice the answer span out of the input and drop any special tokens ([CLS], [SEP])
answer_tokens = input_ids[0][start_index:end_index]
answer = tokenizer.decode(answer_tokens, skip_special_tokens=True)
print("Answer:", answer)

Answer: Paris


# Task 1.1
Task Description: In this task, you will be given a set of questions and a corresponding set of passages. Your goal is to use a QA model to find the answer to each question in its corresponding passage.

Materials Required: For this task, you will need a pre-trained QA model and a tokenizer. You can use any pre-trained QA model and tokenizer, such as BERT or RoBERTa, that is compatible with the Transformers library.

Instructions:

    Begin by introducing the participants to the task and the QA model that will be used. Provide a brief overview of how the model works and how it can be used to find answers to questions.

    Divide the participants into small groups, with each group consisting of 2-3 people. Provide each group with a set of questions and a corresponding set of passages.

    Instruct the participants to use the QA model to find the answer to each question in its corresponding passage. They should start by encoding the question and passage using the tokenizer, and then pass the encoded input through the QA model to obtain the predicted answer.

    Once the participants have obtained the predicted answer, they should decode the answer from the corresponding tokens using the tokenizer, and then compare the predicted answer to the actual answer.

    After each group has finished answering all the questions, bring the participants together and review the answers to each question. Discuss any common mistakes or misconceptions that arose during the task, and provide feedback and guidance to help the participants improve their performance.

    To wrap up the task, ask the participants to reflect on what they learned and how they can apply this knowledge in their work or studies.

Example Questions and Passages:

Question 1: What is the capital of the United States?

Passage 1: The capital of the United States is Washington, D.C. It is located on the east coast of the country, and is home to many important government buildings and monuments.

Question 2: Who wrote the novel "To Kill a Mockingbird"?

Passage 2: "To Kill a Mockingbird" is a novel written by Harper Lee. It was published in 1960 and has since become a classic of American literature.

Question 3: What is the largest country in the world by area?

Passage 3: Russia is the largest country in the world by area. It covers more than 17 million square kilometers and spans 11 time zones.

Question 4: What is the capital of France?

Passage 4: Paris is the capital and most populous city of France. It is located in the north-central part of the country, and is known for its rich history, art, and culture.

Question 5: Who was the first president of the United States?

Passage 5: George Washington was the first president of the United States. He served from 1789 to 1797, and is widely regarded as one of the most important figures in American history.

In [17]:
# Solution for reference

import torch
from transformers import AutoTokenizer, AutoModelForQuestionAnswering

# Load the QA model and tokenizer
model_name = "distilbert-base-cased-distilled-squad"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForQuestionAnswering.from_pretrained(model_name)

# Define a set of questions and passages
questions = [
    "What is the capital of the United States?",
    "Who wrote the novel \"To Kill a Mockingbird\"?",
    "What is the largest country in the world by area?",
    "What is the capital of France?",
    "Who was the first president of the United States?"
]
passages = [
    "The capital of the United States is Washington, D.C. It is located on the east coast of the country, and is home to many important government buildings and monuments.",
    "\"To Kill a Mockingbird\" is a novel written by Harper Lee. It was published in 1960 and has since become a classic of American literature.",
    "Russia is the largest country in the world by area. It covers more than 17 million square kilometers and spans 11 time zones.",
    "Paris is the capital and most populous city of France. It is located in the north-central part of the country, and is known for its rich history, art, and culture.",
    "George Washington was the first president of the United States. He served from 1789 to 1797, and is widely regarded as one of the most important figures in American history."
]

# Loop over each question and passage, and use the QA model to find the answer
for i, (question, passage) in enumerate(zip(questions, passages)):
    # Encode the question and passage as a pair; if the pair is too long, truncate only the passage
    inputs = tokenizer(question, passage, return_tensors='pt', max_length=512, truncation='only_second')
    input_ids = inputs['input_ids']
    attention_mask = inputs['attention_mask']

    # Pass the encoded input through the QA model (no gradients are needed for inference)
    with torch.no_grad():
        outputs = model(input_ids, attention_mask=attention_mask, return_dict=True)
    start_logits = outputs.start_logits
    end_logits = outputs.end_logits


    # Decode the predicted start and end positions to get the answer.
    # The predicted end index is inclusive, so add 1 to use it as a slice bound.
    start_index = torch.argmax(start_logits).item()
    end_index = torch.argmax(end_logits).item() + 1

    # Decode the answer from the tokens inside the predicted span
    answer_tokens = input_ids[0][start_index:end_index]
    answer = tokenizer.decode(answer_tokens, skip_special_tokens=True)

    # Print the question, passage, and answer
    print("Question {}: {}".format(i+1, question))
    print("Passage {}: {}".format(i+1, passage))
    print("Answer {}: {}\n".format(i+1, answer))


Question 1: What is the capital of the United States?
Passage 1: The capital of the United States is Washington, D.C. It is located on the east coast of the country, and is home to many important government buildings and monuments.
Answer 1: Washington, D. C

Question 2: Who wrote the novel "To Kill a Mockingbird"?
Passage 2: "To Kill a Mockingbird" is a novel written by Harper Lee. It was published in 1960 and has since become a classic of American literature.
Answer 2: Harper Lee

Question 3: What is the largest country in the world by area?
Passage 3: Russia is the largest country in the world by area. It covers more than 17 million square kilometers and spans 11 time zones.
Answer 3: Russia

Question 4: What is the capital of France?
Passage 4: Paris is the capital and most populous city of France. It is located in the north-central part of the country, and is known for its rich history, art, and culture.
Answer 4: Paris

Question 5: Who was the first president of the United States

# Task 1.2
Task: Test a question answering model with the SQuAD dataset using the Hugging Face Transformers library.

Steps:

    Load a pre-trained question answering model and a tokenizer from the Hugging Face Transformers library.

    Load the SQuAD dataset. Link: https://rajpurkar.github.io/SQuAD-explorer/

    Iterate over the questions in the SQuAD dataset and pass each question with its context to the question answering pipeline, which tokenizes the inputs internally and returns the answer.

    Print the question, context, and answer to the console.
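
Before running a model over the dataset, it can help to look at how the SQuAD JSON file is nested: articles contain paragraphs, each paragraph has one context, and each context has a list of questions with their answers. The sketch below flattens that structure into one record per question. The `sample` dictionary is a hand-written stand-in for the real file, and `iter_examples` is a hypothetical helper; the `is_impossible` field is only present in SQuAD 2.0, so it is read with a default.

```python
# A tiny hand-written sample in the SQuAD JSON layout (not real dataset content)
sample = {
    "version": "v2.0",
    "data": [
        {
            "title": "France",
            "paragraphs": [
                {
                    "context": "Paris is the capital and most populous city of France.",
                    "qas": [
                        {
                            "id": "example-0",
                            "question": "What is the capital of France?",
                            "answers": [{"text": "Paris", "answer_start": 0}],
                            "is_impossible": False,
                        }
                    ],
                }
            ],
        }
    ],
}

def iter_examples(squad):
    """Yield one flat record per question in a SQuAD-style dictionary."""
    for article in squad["data"]:
        for paragraph in article["paragraphs"]:
            context = paragraph["context"]
            for qa in paragraph["qas"]:
                yield {
                    "id": qa["id"],
                    "question": qa["question"],
                    "context": context,
                    "answers": [a["text"] for a in qa["answers"]],
                    "is_impossible": qa.get("is_impossible", False),
                }

examples = list(iter_examples(sample))
print(examples[0]["question"], "->", examples[0]["answers"])
```

With the real file, `sample` would be replaced by the dictionary returned from `json.load`.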

In [None]:
import json
from transformers import pipeline

# Load the question answering pipeline; it bundles the model and its tokenizer
qa_pipeline = pipeline('question-answering', model='distilbert-base-cased-distilled-squad')

# Load the SQuAD dataset
with open('path/to/squad.json', 'r', encoding='utf-8') as f:
    squad = json.load(f)

# Iterate over the questions in the SQuAD dataset
for article in squad['data']:
    for paragraph in article['paragraphs']:
        context = paragraph['context']
        for qa in paragraph['qas']:
            # The pipeline tokenizes the question and context internally
            # and returns a dict that includes the 'answer' text
            result = qa_pipeline(question=qa['question'], context=context)
            answer = result['answer']

            # Print the question, context, and answer
            print("Question: ", qa['question'])
            print("Context: ", context)
            print("Answer: ", answer)
            print()

# Text Summarisation

Guidelines for Training Language Models: This topic covers best practices for training language models, including data preparation, model selection, hyperparameter tuning, and evaluation metrics. Participants will learn how to preprocess text data, train and fine-tune language models, and evaluate model performance using metrics such as perplexity and accuracy.
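
Perplexity, mentioned above, is the exponential of the average negative log-likelihood that a model assigns to each token. The sketch below computes it from a list of per-token log-probabilities; the `perplexity` function is a hypothetical helper, natural logarithms are assumed, and the input values are made up.

```python
import math

def perplexity(token_log_probs):
    """Perplexity = exp(mean negative log-likelihood per token), natural log assumed."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    avg_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(avg_nll)

# A model that assigns probability 0.25 to every token is as uncertain as a
# uniform choice among 4 options, so its perplexity is about 4
uniform = [math.log(0.25)] * 8
print(perplexity(uniform))

# Higher probabilities on the observed tokens give lower perplexity
confident = [math.log(0.9)] * 8
print(perplexity(confident))
```

Lower perplexity means the model found the text less surprising; a perfect model that assigns probability 1 to every token would score 1.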

Overview of Natural Language Generation (NLG): This topic provides an introduction to natural language generation, which involves using machine learning models to generate human-like text. Participants will learn about the different types of NLG, including template-based generation, rule-based generation, and machine learning-based generation. They will also learn about the applications of NLG in areas such as chatbots, content generation, and data-to-text generation.
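
As a small illustration of the template-based and rule-based approaches named above, the sketch below turns one structured record into a sentence. The record fields, the temperature thresholds, and the `describe_weather` function are all invented for this example.

```python
def describe_weather(record):
    """Render one data record as a sentence using a fixed template plus one rule."""
    # Rule-based word choice: pick an adjective from the temperature
    if record["high_c"] >= 25:
        feel = "warm"
    elif record["high_c"] >= 10:
        feel = "mild"
    else:
        feel = "cold"
    # Template-based realisation: slot the values into a fixed sentence
    template = "{city} will be {feel} and {condition}, with a high of {high_c} degrees Celsius."
    return template.format(feel=feel, **record)

print(describe_weather({"city": "Paris", "condition": "sunny", "high_c": 27}))
```

Templates like this are predictable and easy to control, but every new sentence pattern has to be written by hand, which is the main motivation for machine learning-based generation.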

Generative Pre-trained Transformer (GPT, GPT-3): This topic covers the Generative Pre-trained Transformer (GPT) model and its variants, including the widely used GPT-3 model. Participants will learn about the architecture of these models, their pre-training process, and their applications in natural language generation, language modeling, and text completion.

Text Summarization Methods Based on Language Models: This topic covers methods for text summarization using language models. Participants will learn about two approaches to text summarization: zero-shot and few-shot. They will also learn about the advantages and limitations of these methods, and how to evaluate their performance using metrics such as ROUGE.
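
The difference between zero-shot and few-shot summarization lies mostly in how the prompt is built. The sketch below constructs both kinds of prompt as plain strings; it does not call a model. The "TL;DR:" suffix is one commonly used zero-shot cue, and the "Article:" / "Summary:" layout is an assumed format, not a requirement of any particular model.

```python
def zero_shot_prompt(article):
    """Append a summarization cue to the article, with no worked examples."""
    return f"{article.strip()}\n\nTL;DR:"

def few_shot_prompt(examples, article):
    """Prepend (article, summary) demonstrations, then leave the last summary open."""
    parts = [
        f"Article: {ex_article.strip()}\nSummary: {ex_summary.strip()}"
        for ex_article, ex_summary in examples
    ]
    parts.append(f"Article: {article.strip()}\nSummary:")
    return "\n\n".join(parts)

demonstrations = [
    (
        "The United States is the third most populous country in the world, with over 330 million people.",
        "The United States is the third most populous country in the world.",
    ),
]
new_article = "China is the most populous country in the world, with over 1.4 billion people."

print(zero_shot_prompt(new_article))
print()
print(few_shot_prompt(demonstrations, new_article))
```

Either string could then be encoded and passed to a generative model's `generate` method; the text produced after the final cue is taken as the summary.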

Evaluation of Summarization Models: This topic covers evaluation metrics for text summarization models, with a focus on the ROUGE metric. Participants will learn about the different variants of ROUGE, how to interpret its scores, and how to use it to evaluate the performance of text summarization models. They will also implement and run simple text summarization code using a language model.
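
To make the ROUGE variants concrete, the sketch below computes ROUGE-N (n-gram overlap) and ROUGE-L (longest common subsequence) from scratch, returning recall, precision, and F1 for each. It is a simplified illustration that splits on whitespace and counts clipped n-gram matches, so its numbers may differ from those of ROUGE packages that tokenize or count differently; all function names are assumptions.

```python
from collections import Counter

def ngrams(tokens, n):
    """Count the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def _prf(overlap, hyp_total, ref_total):
    p = overlap / hyp_total if hyp_total else 0.0
    r = overlap / ref_total if ref_total else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return {"r": r, "p": p, "f": f}

def rouge_n(hypothesis, reference, n=1):
    """ROUGE-N: overlap of n-grams between hypothesis and reference."""
    hyp = ngrams(hypothesis.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    overlap = sum((hyp & ref).values())  # clipped n-gram matches
    return _prf(overlap, sum(hyp.values()), sum(ref.values()))

def lcs_length(a, b):
    """Length of the longest common subsequence, keeping only one previous row."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]

def rouge_l(hypothesis, reference):
    """ROUGE-L: based on the longest common subsequence of tokens."""
    hyp, ref = hypothesis.lower().split(), reference.lower().split()
    return _prf(lcs_length(hyp, ref), len(hyp), len(ref))

reference = "the cat sat on the mat"
hypothesis = "the cat lay on the mat today"
print("ROUGE-1:", rouge_n(hypothesis, reference, n=1))
print("ROUGE-2:", rouge_n(hypothesis, reference, n=2))
print("ROUGE-L:", rouge_l(hypothesis, reference))
```

Recall measures how much of the reference is covered by the generated summary, while precision measures how much of the generated summary is supported by the reference. A long summary that copies the whole input tends to score high recall and low precision.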

In summary, this module covers various aspects of training language models and their applications in natural language generation and text summarization. Participants will gain hands-on experience in training and fine-tuning language models, as well as evaluating their performance using metrics such as perplexity, accuracy, and ROUGE. They will also learn about the latest advances in language modeling, including the GPT model and its variants, and how to use these models for text generation and summarization.

In [19]:
import torch
from transformers import GPT2Tokenizer, GPT2LMHeadModel

# Load the GPT-2 tokenizer and model
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
model = GPT2LMHeadModel.from_pretrained('gpt2')

# Set the maximum length of the generated text
max_length = 100

# Define the input prompt
prompt = "The quick brown fox"

# Encode the input prompt using the tokenizer
input_ids = tokenizer.encode(prompt, return_tensors='pt')

# Generate the text using the GPT-2 model
output_ids = model.generate(input_ids=input_ids, max_length=max_length, do_sample=True)

# Decode the generated text using the tokenizer
output_text = tokenizer.decode(output_ids[0], skip_special_tokens=True)

# Print the generated text
print(output_text)

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


The quick brown fox has been on a killing spree in Canada, going down in flames to several different locations including the Toronto suburb of Mississauga. A few fires were reported in Edmonton and Edmonton city limits but all were contained to the east of town.

"All he had left there was the small kitchen. It was like he was walking down an unfinished alley," said his mom and stepfather, the couple recently separated. "He was talking to himself. He was doing his job, it


In [24]:
# Text summarisation example

#!pip install rouge

import torch
from transformers import BartTokenizer, BartForConditionalGeneration
from transformers import pipeline
from rouge import Rouge

# Load the BART tokenizer and model for abstractive summarization
tokenizer_abstractive = BartTokenizer.from_pretrained('facebook/bart-large-cnn')
model_abstractive = BartForConditionalGeneration.from_pretrained('facebook/bart-large-cnn')

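# Load the default summarization pipeline for comparison. Its default checkpoint is a
# distilled BART model, which is abstractive as well, so the "extractive" names below are only labels.
pipeline_extractive = pipeline('summarization')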

# Define the input text
input_text = "The quick brown fox jumps over the lazy dog. This is a test sentence for summarization. Here is another sentence for testing."

# Define the target summary
target_summary = "The quick brown fox jumps over the lazy dog. This is a test sentence for summarization."

# Perform abstractive summarization using BART
inputs = tokenizer_abstractive([input_text], max_length=1024, truncation=True, padding='max_length', return_tensors='pt')
outputs = model_abstractive.generate(inputs['input_ids'], attention_mask=inputs['attention_mask'], max_length=60, num_beams=4, length_penalty=2.0)
summary_abstractive = tokenizer_abstractive.decode(outputs[0], skip_special_tokens=True)

# Summarize the same text with the default pipeline
summary_extractive = pipeline_extractive(input_text, max_length=60)[0]['summary_text']

# Evaluate the summaries using the ROUGE metric
rouge = Rouge()
scores_abstractive = rouge.get_scores(summary_abstractive, target_summary)
scores_extractive = rouge.get_scores(summary_extractive, target_summary)

# Print the summaries and ROUGE scores
print("Input Text: ", input_text)
print("Target Summary: ", target_summary)
print("Abstractive Summary: ", summary_abstractive)
print("ROUGE Scores for Abstractive Summary: ", scores_abstractive)
print("Extractive Summary: ", summary_extractive)
print("ROUGE Scores for Extractive Summary: ", scores_extractive)

No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 and revision a4f8f3e (https://huggingface.co/sshleifer/distilbart-cnn-12-6).
Using a pipeline without specifying a model name and revision in production is not recommended.
Your max_length is set to 60, but you input_length is only 28. You might consider decreasing max_length manually, e.g. summarizer('...', max_length=14)


Input Text:  The quick brown fox jumps over the lazy dog. This is a test sentence for summarization. Here is another sentence for testing.
Target Summary:  The quick brown fox jumps over the lazy dog. This is a test sentence for summarization.
Abstractive Summary:  The quick brown fox jumps over the lazy dog. This is a test sentence for summarization. Here is another sentence for testing. The quick brownfox jumps over a lazy dog to get to the other side of the road. The lazy dog is the one who falls over the quick
ROUGE Scores for Abstractive Summary:  [{'rouge-1': {'r': 1.0, 'p': 0.5517241379310345, 'f': 0.7111111065283952}, 'rouge-2': {'r': 1.0, 'p': 0.3488372093023256, 'f': 0.5172413754756243}, 'rouge-l': {'r': 1.0, 'p': 0.5517241379310345, 'f': 0.7111111065283952}}]
Extractive Summary:   The quick brown fox jumps over the lazy dog. This is a test sentence for summarization. Here is another sentence for testing. Here are another sentences for summarizing. The test sentence is a shor

# Task 2
Task: Given a news article, summarize its content in 3-4 sentences.

Dataset: You can use a dataset of news articles such as the CNN/DailyMail dataset, which contains news articles and their corresponding summaries.

Instructions:

    Load the news articles and their summaries from the dataset.
    Choose an extractive or abstractive summarization method, and preprocess the news articles accordingly.
    Implement the summarization method and generate the summary for each news article.
    Evaluate the performance of the summarization method using a metric such as ROUGE.
    Compare the performance of different summarization methods and discuss their strengths and limitations.

This task can be adapted for different levels of expertise and time constraints. For example, beginners can start with a simpler extractive summarization method and a small dataset, while advanced participants can explore more complex abstractive summarization methods and larger datasets. The task can also be extended to include additional challenges, such as summarizing news articles in different languages, summarizing multiple news articles into a single summary, or summarizing news articles with conflicting viewpoints.
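
For the simpler extractive route mentioned above, one possible starting point is a frequency-based sentence scorer: sentences whose words occur often in the article are assumed to be the most representative. The sketch below is a deliberately basic baseline in plain Python; the stopword list, the sentence splitter, and the function names are all assumptions, and stronger extractive methods exist.

```python
import re
from collections import Counter

# A very small, hand-picked stopword list (an assumption for this sketch)
STOPWORDS = {"the", "a", "an", "of", "in", "is", "it", "and", "to", "with", "for", "on", "are", "which"}

def split_sentences(text):
    """Split on whitespace that follows sentence-ending punctuation."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]

def content_words(text):
    return [w for w in re.findall(r"[a-z0-9']+", text.lower()) if w not in STOPWORDS]

def extractive_summary(text, num_sentences=1):
    """Return the highest-scoring sentences, copied verbatim, in their original order."""
    sentences = split_sentences(text)
    freq = Counter(content_words(text))

    def score(sentence):
        tokens = content_words(sentence)
        # Average word frequency, so long sentences are not favoured automatically
        return sum(freq[t] for t in tokens) / len(tokens) if tokens else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:num_sentences])
    return " ".join(sentences[i] for i in chosen)

article = (
    "The Amazon rainforest is the largest tropical rainforest in the world, covering an area "
    "of over 6 million square kilometers. It is home to a vast array of plant and animal species, "
    "many of which are found nowhere else on Earth. The rainforest is also a vital carbon sink, "
    "helping to regulate the Earth's climate."
)
print(extractive_summary(article, num_sentences=2))
```

Because every output sentence is copied from the input, this kind of summary cannot introduce facts that are not in the article, at the cost of less fluent and less compressed output than an abstractive model.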


Also try to resolve the warning 'Using a pipeline without specifying a model name and revision in production is not recommended.', which the summarisation example above produces.

In [1]:
# Solution for reference
import pandas as pd
from transformers import pipeline
from rouge import Rouge

# Load the CNN/DailyMail dataset
df = pd.read_csv('./daily_cnn.csv')

# Define the summarization pipeline. Naming the model explicitly avoids the
# "No model was supplied" warning; this is the checkpoint the pipeline defaulted to above.
summarizer = pipeline('summarization', model='sshleifer/distilbart-cnn-12-6')

# Define the maximum summary length (in tokens)
max_length = 100

# Define the ROUGE metric
rouge = Rouge()

# Iterate over the news articles and generate summaries
for index, row in df.iterrows():
    # Extract the article and the reference summary
    article = row['article']
    reference_summary = row['highlights']

    # Generate the summary; truncation=True keeps long articles within the model's input limit
    summary = summarizer(article, max_length=max_length, truncation=True)[0]['summary_text']

    # Compute the ROUGE score for the generated summary
    scores = rouge.get_scores(summary, reference_summary)[0]

    # Print the article, reference summary, generated summary, and ROUGE scores
    print("Article:\n", article)
    print("Reference Summary:\n", reference_summary)
    print("Generated Summary:\n", summary)
    print("ROUGE Scores:\n", scores)
    print()

Article:
 The United States is the third most populous country in the world, with over 330 million people. It is a diverse country, with a wide range of ethnic and cultural backgrounds. The most populous state in the US is California, with over 39 million people.
Reference Summary:
 The United States is the third most populous country in the world.
Generated Summary:
  The United States is the third most populous country in the world, with over 330 million people . It is a diverse country, with a wide range of ethnic and cultural backgrounds . The most populous state in the US is California with over 39 million people, the most populous .
ROUGE Scores:
 {'rouge-1': {'r': 0.9090909090909091, 'p': 0.3125, 'f': 0.4651162752623039}, 'rouge-2': {'r': 0.9090909090909091, 'p': 0.23255813953488372, 'f': 0.37037036712620025}, 'rouge-l': {'r': 0.9090909090909091, 'p': 0.3125, 'f': 0.4651162752623039}}



Your max_length is set to 100, but you input_length is only 68. You might consider decreasing max_length manually, e.g. summarizer('...', max_length=34)


Article:
 The Amazon rainforest is the largest tropical rainforest in the world, covering an area of over 6 million square kilometers. It is home to a vast array of plant and animal species, many of which are found nowhere else on Earth. The rainforest is also a vital carbon sink, helping to regulate the Earth's climate.
Reference Summary:
 The Amazon rainforest is the largest tropical rainforest in the world.
Generated Summary:
  The Amazon rainforest is the largest tropical rainforest in the world, covering an area of over 6 million square kilometers . It is home to a vast array of plant and animal species, many of which are found nowhere else on Earth . Rainforest is also a vital carbon sink, helping to regulate the Earth's climate .
ROUGE Scores:
 {'rouge-1': {'r': 0.8888888888888888, 'p': 0.17777777777777778, 'f': 0.29629629351851855}, 'rouge-2': {'r': 0.9, 'p': 0.16981132075471697, 'f': 0.2857142830435878}, 'rouge-l': {'r': 0.8888888888888888, 'p': 0.17777777777777778, 'f': 0.296

Your max_length is set to 100, but you input_length is only 56. You might consider decreasing max_length manually, e.g. summarizer('...', max_length=28)


Article:
 Water covers over 70% of the Earth's surface, with the majority of it being in the form of saltwater. Freshwater, which is essential for human and animal life, makes up less than 3% of the Earth's total water supply. The largest freshwater lake in the world is Lake Superior, located in North America.
Reference Summary:
 Water covers over 70% of the Earth's surface.
Generated Summary:
  Water covers over 70% of the Earth's surface, with the majority of it being in the form of saltwater . Freshwater makes up less than 3% of Earth's total water supply . The largest freshwater lake in the world is Lake Superior, located in North America .
ROUGE Scores:
 {'rouge-1': {'r': 0.875, 'p': 0.2, 'f': 0.32558139232017314}, 'rouge-2': {'r': 0.8571428571428571, 'p': 0.14285714285714285, 'f': 0.2448979567346939}, 'rouge-l': {'r': 0.875, 'p': 0.2, 'f': 0.32558139232017314}}



Your max_length is set to 100, but you input_length is only 51. You might consider decreasing max_length manually, e.g. summarizer('...', max_length=25)


Article:
 The European Union is a political and economic union of 27 member states located primarily in Europe. It was formed in the aftermath of World War II with the aim of promoting peace and economic prosperity among its members. The euro is the official currency of 19 of the member states.
Reference Summary:
 The European Union is a political and economic union of 27 member states.
Generated Summary:
  The European Union is a political and economic union of 27 member states located primarily in Europe . It was formed in the aftermath of World War II with the aim of promoting peace and economic prosperity . The euro is the official currency of 19 of the member states and is the euro's official currency .
ROUGE Scores:
 {'rouge-1': {'r': 1.0, 'p': 0.37142857142857144, 'f': 0.5416666627170139}, 'rouge-2': {'r': 1.0, 'p': 0.24489795918367346, 'f': 0.3934426197903789}, 'rouge-l': {'r': 1.0, 'p': 0.37142857142857144, 'f': 0.5416666627170139}}



Your max_length is set to 100, but you input_length is only 58. You might consider decreasing max_length manually, e.g. summarizer('...', max_length=29)


Article:
 China is the most populous country in the world, with over 1.4 billion people. It is also the second-largest economy in the world, after the United States. The capital of China is Beijing, and the official language is Mandarin.
Reference Summary:
 China is the most populous country in the world.
Generated Summary:
  China is the most populous country in the world, with over 1.4 billion people . The capital of China is Beijing, and the official language is Mandarin . It is also the second-largest economy in world, after the United States . The official language of the country is Mandarin and the national capital is Beijing .
ROUGE Scores:
 {'rouge-1': {'r': 0.875, 'p': 0.22580645161290322, 'f': 0.3589743557133465}, 'rouge-2': {'r': 0.875, 'p': 0.14583333333333334, 'f': 0.2499999975510205}, 'rouge-l': {'r': 0.875, 'p': 0.22580645161290322, 'f': 0.3589743557133465}}



Your max_length is set to 100, but you input_length is only 61. You might consider decreasing max_length manually, e.g. summarizer('...', max_length=30)


Article:
 The human brain is the most complex organ in the body, consisting of over 100 billion neurons and trillions of connections between them. It controls everything we do, from our thoughts and emotions to our movements and senses. The brain is divided into different regions that are responsible for different functions.
Reference Summary:
 The human brain is the most complex organ in the body.
Generated Summary:
  The human brain is the most complex organ in the body, consisting of over 100 billion neurons and trillions of connections between them . It controls everything we do, from our thoughts and emotions to our movements and senses . The brain is divided into different regions that are responsible for different functions .
ROUGE Scores:
 {'rouge-1': {'r': 0.9, 'p': 0.21428571428571427, 'f': 0.34615384304733726}, 'rouge-2': {'r': 0.9, 'p': 0.1836734693877551, 'f': 0.30508474294742893}, 'rouge-l': {'r': 0.9, 'p': 0.21428571428571427, 'f': 0.34615384304733726}}



Your max_length is set to 100, but you input_length is only 57. You might consider decreasing max_length manually, e.g. summarizer('...', max_length=28)


Article:
 The Great Barrier Reef is the largest coral reef system in the world, stretching over 2,300 kilometers along the coast of Australia. It is home to thousands of species of marine life, including sea turtles, dolphins, and sharks. The reef is under threat from climate change and other environmental factors.
Reference Summary:
 The Great Barrier Reef is the largest coral reef system in the world.
Generated Summary:
  The Great Barrier Reef is the largest coral reef system in the world, stretching over 2,300 kilometers along the coast of Australia . It is home to thousands of species of marine life, including sea turtles, dolphins, and sharks . The reef is under threat from climate change and other environmental factors .
ROUGE Scores:
 {'rouge-1': {'r': 0.9166666666666666, 'p': 0.2682926829268293, 'f': 0.41509433611961555}, 'rouge-2': {'r': 0.9166666666666666, 'p': 0.22448979591836735, 'f': 0.36065573454447736}, 'rouge-l': {'r': 0.9166666666666666, 'p': 0.2682926829268293, 'f': 

Your max_length is set to 100, but you input_length is only 63. You might consider decreasing max_length manually, e.g. summarizer('...', max_length=31)


Article:
 The International Space Station (ISS) is a habitable artificial satellite that orbits the Earth. It is a joint project between five space agencies, including NASA, Roscosmos, and the European Space Agency. The ISS is used for scientific research, space exploration, and international cooperation.
Reference Summary:
 The International Space Station (ISS) is a habitable artificial satellite that orbits the Earth.
Generated Summary:
  The International Space Station (ISS) is a habitable artificial satellite that orbits the Earth . It is a joint project between NASA, Roscosmos, and the European Space Agency . The ISS is used for scientific research, space exploration, and international cooperation . The station is used by five space agencies, including NASA .
ROUGE Scores:
 {'rouge-1': {'r': 1.0, 'p': 0.3684210526315789, 'f': 0.5384615345266273}, 'rouge-2': {'r': 1.0, 'p': 0.2826086956521739, 'f': 0.44067796266590065}, 'rouge-l': {'r': 1.0, 'p': 0.3684210526315789, 'f': 0.53846153

Your max_length is set to 100, but you input_length is only 54. You might consider decreasing max_length manually, e.g. summarizer('...', max_length=27)


Article:
 The African elephant is the largest land animal in the world, weighing up to 6,000 kilograms and standing over three meters tall. It is found in 37 countries in sub-Saharan Africa and is known for its distinctive trunk and large, floppy ears. The elephant is under threat from habitat loss and poaching.
Reference Summary:
 The African elephant is the largest land animal in the world.
Generated Summary:
  The African elephant is the largest land animal in the world weighing up to 6,000 kilograms and standing over three meters tall . It is found in 37 countries in sub-Saharan Africa and is known for its distinctive trunk and large, floppy ears . The elephant is under threat from habitat loss and poaching .
ROUGE Scores:
 {'rouge-1': {'r': 1.0, 'p': 0.24390243902439024, 'f': 0.3921568595924645}, 'rouge-2': {'r': 1.0, 'p': 0.2, 'f': 0.33333333055555564}, 'rouge-l': {'r': 1.0, 'p': 0.24390243902439024, 'f': 0.3921568595924645}}

Article:
 The Eiffel Tower is an iconic landmark in P