# Chunk Size Pipeline

https://blog.llamaindex.ai/evaluating-the-ideal-chunk-size-for-a-rag-system-using-llamaindex-6207e5d3fec5

https://medium.com/@eren9677/text-summarization-387836c9e178

https://www.nltk.org/

In [51]:
import nest_asyncio

nest_asyncio.apply()

from llama_index import (
    SimpleDirectoryReader,
    VectorStoreIndex,
    ServiceContext,
)

from llama_index.evaluation import (
    DatasetGenerator,
    QueryResponseDataset,
    FaithfulnessEvaluator,
    RelevancyEvaluator,
)

from llama_index.llms import OpenAI

import openai
import time
import os

openai.api_key = os.getenv("OPENAI_API_KEY")

In [4]:
# !mkdir -p 'data/10k/'
# !wget 'https://raw.githubusercontent.com/jerryjliu/llama_index/main/docs/examples/data/10k/uber_2021.pdf' -O 'data/10k/uber_2021.pdf'

# !mkdir -p 'data/paul_graham/'
# !wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'

In [48]:
documents = SimpleDirectoryReader("../../resource/data/GC").load_data()

In [49]:
import csv

file_name = "../../resource/data/GC/eval_questions.csv"

# Generate and write
data_generator = DatasetGenerator.from_documents(documents)
# eval_questions = data_generator.generate_questions_from_nodes()
# with open(file_name, mode="w", newline="") as file:
#     writer = csv.writer(file)
#     for value in eval_questions:
#         writer.writerow([value])
# Read
eval_questions = []
with open(file_name, mode="r") as file:
    reader = csv.reader(file)
    for row in reader:
        eval_questions.append(row[0])


In [54]:
gpt = OpenAI(temperature=0, model="gpt-3.5-turbo")
service_context_gpt = ServiceContext.from_defaults(llm=gpt)
vector_index = VectorStoreIndex.from_documents(
    documents, service_context=service_context_gpt
)
responses = []
query_engine = vector_index.as_query_engine()
for q in eval_questions:
    response = query_engine.query(q)
    responses.append(response)
    print(response)

The file type of the document "Notice_Assurances_Carte_Hello_Prime.pdf" is PDF.
I'm sorry, but I cannot answer that question based on the given context information.
The underwriter of the insurance contract mentioned in the document is not provided in the given context information.
Chapter 4 of the document provides information about the guarantee for vehicle rental.
The file size of "Notice_Assurances_Carte_Hello_Prime.pdf" is not provided in the given context information.
The company responsible for managing insurance claims on behalf of Europ Assistance is the "gestionnaires de sinistres" or claims managers.
The creation date of the document "Notice_Assurances_Carte_Hello_Prime.pdf" is not provided in the given context information.
The file "Notice_Assurances_Carte_Hello_Prime.pdf" is located in the directory ../../resource/data/GC/.
The last modified date of the document "Notice_Assurances_Carte_Hello_Prime.pdf" is not provided in the given context information.
Chapter 3 of the doc

In [26]:
eval_documents = documents

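# Take irrelevant documents to test context relevance
eval_documents_inrelative = SimpleDirectoryReader(
    "../../resource/data/10k"
).load_data()[0 : len(eval_documents)]

# Assumed to be relevant : irrelevant ratios, e.g. 1 / 9 is one relevant per nine irrelevant
ratios = [1 / 9, 2 / 8, 3 / 7, 4 / 6, 5 / 5, 6 / 4, 7 / 3, 8 / 2, 9 / 1]

mixed_documents_list = []
for ratio in ratios:
    # Convert the ratio r to the relevant fraction r / (1 + r); multiplying by
    # the raw ratio would select only relevant documents for every ratio >= 1.
    mid_index = round(len(eval_documents) * ratio / (1 + ratio))
    mixed_documents = eval_documents[:mid_index] + eval_documents_inrelative[mid_index:]
    mixed_documents_list.append(mixed_documents)
    print(len(mixed_documents))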

29
29
29
29
29
29
29
29
29


In [47]:
import nltk
from nltk import word_tokenize, pos_tag
from nltk.corpus import stopwords
from nltk.tree import Tree
import string

nltk.download("punkt")
nltk.download("averaged_perceptron_tagger")
nltk.download("stopwords")
nltk.download("maxent_ne_chunker")
nltk.download("words")


# Data preparation


def get_chunks(sentence):
    words = word_tokenize(sentence)
    tags = pos_tag(words)
    chunk_tree = nltk.ne_chunk(tags)
    chunks = []
    for subtree in chunk_tree:
        if isinstance(subtree, Tree):
            chunks.append(" ".join([word for word, tag in subtree.leaves()]))
        else:
            chunks.append(subtree[0])
    return chunks


# Question without question words and other stop words
def reorder_question_simplified(question):
    stop_words = set(stopwords.words("english"))
    chunks = get_chunks(question)

    # Remove stopwords and reorder non-stopword chunks
    reordered_chunks = [chunk for chunk in chunks if chunk.lower() not in stop_words]

    # Reconstruct the sentence
    reordered_sentence = " ".join(reordered_chunks)

    return reordered_sentence


# Question with question words replaced by noun phrases
def reorder_question_without_question_words(question):
    question_words_mapping = {
        "what": "The thing",
        "when": "The time",
        "where": "The place",
        "why": "The reason",
        "how": "The method",
    }

    words = word_tokenize(question)

    reordered_words = []
    for word in words:
        lower_word = word.lower()
        if lower_word in question_words_mapping:
            reordered_words.append(question_words_mapping[lower_word])
        else:
            # Remove punctuation
            word_without_punct = word.translate(
                str.maketrans("", "", string.punctuation)
            )
            reordered_words.append(word_without_punct)

    reordered_question = " ".join(reordered_words)

    # Remove question marks
    reordered_question = reordered_question.replace("?", "")

    return reordered_question


reorder_question_simplified_list = []
reorder_question_without_question_words_list = []
for q in eval_questions:
    reorder_question_simplified_list.append(reorder_question_simplified(q))
    reorder_question_without_question_words_list.append(
        reorder_question_without_question_words(q)
    )

[nltk_data] Downloading package punkt to /home/lujun/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /home/lujun/nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-
[nltk_data]       date!
[nltk_data] Downloading package stopwords to /home/lujun/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package maxent_ne_chunker to
[nltk_data]     /home/lujun/nltk_data...
[nltk_data]   Package maxent_ne_chunker is already up-to-date!
[nltk_data] Downloading package words to /home/lujun/nltk_data...
[nltk_data]   Package words is already up-to-date!
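
To make the second rewrite concrete, here is a minimal sketch of the question-word replacement applied to one sample question. It is an illustration under assumptions, not the cell's exact behavior: whitespace splitting stands in for `word_tokenize` so that no NLTK data is needed, and both the helper name `rewrite_without_question_words` and the sample question are hypothetical.

```python
import string

# Same mapping as in the cell above.
QUESTION_WORDS = {
    "what": "The thing",
    "when": "The time",
    "where": "The place",
    "why": "The reason",
    "how": "The method",
}


def rewrite_without_question_words(question):
    """Replace wh-words with noun phrases and strip punctuation.

    Assumption: str.split() replaces nltk.word_tokenize, so punctuation
    stays attached to words and is stripped per token.
    """
    strip_punct = str.maketrans("", "", string.punctuation)
    rewritten = []
    for word in question.split():
        bare = word.translate(strip_punct)
        if not bare:
            continue  # the token was pure punctuation
        rewritten.append(QUESTION_WORDS.get(bare.lower(), bare))
    return " ".join(rewritten)


# Hypothetical question, not taken from eval_questions.
print(rewrite_without_question_words("What is covered in Chapter 4?"))
```

The rewritten query keeps the content words in their original order, so the retriever sees the same topic terms with the interrogative form removed.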


In [57]:
combined_lists = list(
    zip(
        eval_questions,
        reorder_question_simplified_list,
        reorder_question_without_question_words_list,
        responses,
    )
)

csv_file_path = "../../resource/data/GC/eval_questions_multiple_Dataset.csv"

with open(csv_file_path, mode="w", newline="", encoding="utf-8") as file:
    writer = csv.writer(file)
    writer.writerow(
        [
            "Original Question",
            "Reordered Simplified",
            "Reordered Without Question Words",
            "Formal Responses",
        ]
    )
    writer.writerows(combined_lists)

In [79]:
from nltk.translate.bleu_score import sentence_bleu
from nltk.metrics.distance import edit_distance
from tqdm import tqdm
import json


csv_file_path = "../../resource/data/GC/eval_questions_multiple_Dataset.csv"
eval_questions = []
reordered_simplified = []
reordered_without_question_words = []
eval_reference_answers = []

with open(csv_file_path, mode="r", encoding="utf-8") as file:
    reader = csv.reader(file)
    header = next(reader)
    for row in reader:
        eval_questions.append(row[0])
        reordered_simplified.append(row[1])
        reordered_without_question_words.append(row[2])
        eval_reference_answers.append(row[3])


gpt = OpenAI(temperature=0, model="gpt-3.5-turbo")
service_context_gpt = ServiceContext.from_defaults(llm=gpt)
faithfulness_gpt = FaithfulnessEvaluator(service_context=service_context_gpt)
relevancy_gpt = RelevancyEvaluator(service_context=service_context_gpt)


def calculate_bleu(reference, candidate):
    reference = [reference.split()]
    candidate = candidate.split()
    return sentence_bleu(reference, candidate)


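def calculate_rouge_l(reference, candidate):
    # ROUGE-L is the F1 score over the longest common subsequence (LCS) of
    # the reference and candidate tokens, not an edit distance.
    ref_tokens = reference.split()
    cand_tokens = candidate.split()
    if not ref_tokens or not cand_tokens:
        return 0.0

    # Dynamic-programming LCS length, keeping only the previous row.
    previous = [0] * (len(cand_tokens) + 1)
    for ref_token in ref_tokens:
        current = [0]
        for j, cand_token in enumerate(cand_tokens, start=1):
            if ref_token == cand_token:
                current.append(previous[j - 1] + 1)
            else:
                current.append(max(previous[j], current[j - 1]))
        previous = current
    lcs_length = previous[-1]

    recall = lcs_length / len(ref_tokens)
    precision = lcs_length / len(cand_tokens)

    if precision + recall == 0:
        return 0.0
    return 2 * (precision * recall) / (precision + recall)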


def evaluate_response_time_and_accuracy(
    chunk_size, eval_documents, eval_questions, reference_answers
):
    total_response_time = 0
    total_faithfulness = 0
    total_relevancy = 0
    total_bleu_score = 0
    total_rouge_l_score = 0

    llm = OpenAI(model="gpt-3.5-turbo")
    service_context = ServiceContext.from_defaults(llm=llm, chunk_size=chunk_size)
    vector_index = VectorStoreIndex.from_documents(
        eval_documents, service_context=service_context
    )

    query_engine = vector_index.as_query_engine()
    num_questions = len(eval_questions)
    evaluation_results = {}
    for i in tqdm(
        range(num_questions), desc=f"Evaluating with chunk size {chunk_size}"
    ):
        question = eval_questions[i]
        start_time = time.time()
        response_vector = query_engine.query(question)
        elapsed_time = time.time() - start_time

        faithfulness_result = faithfulness_gpt.evaluate_response(
            response=response_vector
        ).passing

        relevancy_result = relevancy_gpt.evaluate_response(
            query=question, response=response_vector
        ).passing

        total_response_time += elapsed_time
        total_faithfulness += faithfulness_result
        total_relevancy += relevancy_result

        # Index into the references passed in, which are sliced like the questions.
        reference_answer = reference_answers[i]
        generated_answer = response_vector.response

        bleu_score = calculate_bleu(reference_answer, generated_answer)
        rouge_l_score = calculate_rouge_l(reference_answer, generated_answer)

        total_bleu_score += bleu_score
        total_rouge_l_score += rouge_l_score

        question_key = f"Q{i + 1}"
        evaluation_results[question_key] = {
            "question": question,
            "response_time": elapsed_time,
            "faithfulness": faithfulness_result,
            "relevancy": relevancy_result,
            "bleu_score": bleu_score,
            "rouge_l_score": rouge_l_score,
            "reference_answer": reference_answer,
            "generated_answer": generated_answer,
        }

    evaluation_results["overall"] = {
        "average_response_time": total_response_time / num_questions,
        "average_faithfulness": total_faithfulness / num_questions,
        "average_relevancy": total_relevancy / num_questions,
        "average_bleu_score": total_bleu_score / num_questions,
        "average_rouge_l_score": total_rouge_l_score / num_questions,
    }
    return evaluation_results


# Iterate over different chunk sizes and collect the metrics that guide the choice of chunk size.
question_slice = slice(10, 13)
for chunk_size in [256, 512, 768, 1024, 1280, 1536, 1792, 2048]:
    for question_list in [
        {"table_name": "eval_questions", "table": eval_questions[question_slice]},
        {
            "table_name": "reorder_question_simplified_list",
            "table": reorder_question_simplified_list[question_slice],
        },
        {
            "table_name": "reorder_question_without_question_words_list",
            "table": reorder_question_without_question_words_list[question_slice],
        },
    ]:
        for document in [eval_documents, eval_documents_inrelative]:
            # Ratio 1 means only relevant documents, 0 only irrelevant ones.
            ratio = 1 if document is eval_documents else 0
            evaluation_results = evaluate_response_time_and_accuracy(
                chunk_size,
                document,
                question_list["table"],
                # Slice the references the same way as the questions so that
                # each question is scored against its own reference answer.
                eval_reference_answers[question_slice],
            )

            #  Print
            print(f"Question List Name{question_list['table_name']}")
            print(f"Chunk Size: {chunk_size}")
            print(f"Document Ratio: {ratio}")
            evaluation_results["overall"]["Question List Name"] = question_list[
                "table_name"
            ]
            evaluation_results["overall"]["Chunk Size"] = chunk_size
            evaluation_results["overall"]["Document Ratio"] = ratio
            # Put everything to json
            output_filename = f"evaluation_results_chunk_{chunk_size}_ratio_{ratio}_quesitons_{question_list['table_name']}.json"
            with open(output_filename, "w") as json_file:
                json.dump(evaluation_results, json_file)

The hypothesis contains 0 counts of 2-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()
The hypothesis contains 0 counts of 3-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()
The hypothesis contains 0 counts of 4-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()
Evaluating with chunk size 256: 100%|██████████| 3/3 [00:11<00:00,  3.94s/it]


Question List Nameeval_questions
Chunk Size: 256
Document Ratio: 1


Evaluating with chunk size 256: 100%|██████████| 3/3 [00:11<00:00,  3.81s/it]


Question List Nameeval_questions
Chunk Size: 256
Document Ratio: 0


Evaluating with chunk size 256: 100%|██████████| 3/3 [00:13<00:00,  4.36s/it]


Question List Namereorder_question_simplified_list
Chunk Size: 256
Document Ratio: 1


Evaluating with chunk size 256: 100%|██████████| 3/3 [00:18<00:00,  6.18s/it]


Question List Namereorder_question_simplified_list
Chunk Size: 256
Document Ratio: 0


Evaluating with chunk size 256: 100%|██████████| 3/3 [00:12<00:00,  4.07s/it]


Question List Namereorder_question_without_question_words_list
Chunk Size: 256
Document Ratio: 1


Evaluating with chunk size 256: 100%|██████████| 3/3 [00:13<00:00,  4.46s/it]


Question List Namereorder_question_without_question_words_list
Chunk Size: 256
Document Ratio: 0


Evaluating with chunk size 512: 100%|██████████| 3/3 [00:12<00:00,  4.14s/it]


Question List Nameeval_questions
Chunk Size: 512
Document Ratio: 1


Evaluating with chunk size 512: 100%|██████████| 3/3 [00:14<00:00,  4.83s/it]


Question List Nameeval_questions
Chunk Size: 512
Document Ratio: 0


Evaluating with chunk size 512: 100%|██████████| 3/3 [00:22<00:00,  7.42s/it]


Question List Namereorder_question_simplified_list
Chunk Size: 512
Document Ratio: 1


The hypothesis contains 0 counts of 3-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()
The hypothesis contains 0 counts of 4-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()
Evaluating with chunk size 512: 100%|██████████| 3/3 [00:11<00:00,  3.72s/it]


Question List Namereorder_question_simplified_list
Chunk Size: 512
Document Ratio: 0


Evaluating with chunk size 512: 100%|██████████| 3/3 [00:16<00:00,  5.52s/it]


Question List Namereorder_question_without_question_words_list
Chunk Size: 512
Document Ratio: 1


Evaluating with chunk size 512: 100%|██████████| 3/3 [00:18<00:00,  6.19s/it]


Question List Namereorder_question_without_question_words_list
Chunk Size: 512
Document Ratio: 0


Evaluating with chunk size 768: 100%|██████████| 3/3 [00:26<00:00,  8.91s/it]


Question List Nameeval_questions
Chunk Size: 768
Document Ratio: 1


Evaluating with chunk size 768: 100%|██████████| 3/3 [00:17<00:00,  5.76s/it]


Question List Nameeval_questions
Chunk Size: 768
Document Ratio: 0


Evaluating with chunk size 768: 100%|██████████| 3/3 [00:17<00:00,  5.75s/it]


Question List Namereorder_question_simplified_list
Chunk Size: 768
Document Ratio: 1


Evaluating with chunk size 768: 100%|██████████| 3/3 [00:11<00:00,  3.89s/it]


Question List Namereorder_question_simplified_list
Chunk Size: 768
Document Ratio: 0


Evaluating with chunk size 768: 100%|██████████| 3/3 [00:29<00:00,  9.82s/it]


Question List Namereorder_question_without_question_words_list
Chunk Size: 768
Document Ratio: 1


Evaluating with chunk size 768: 100%|██████████| 3/3 [00:18<00:00,  6.11s/it]


Question List Namereorder_question_without_question_words_list
Chunk Size: 768
Document Ratio: 0


Evaluating with chunk size 1024: 100%|██████████| 3/3 [00:16<00:00,  5.65s/it]


Question List Nameeval_questions
Chunk Size: 1024
Document Ratio: 1


Evaluating with chunk size 1024: 100%|██████████| 3/3 [00:13<00:00,  4.36s/it]


Question List Nameeval_questions
Chunk Size: 1024
Document Ratio: 0


Evaluating with chunk size 1024: 100%|██████████| 3/3 [00:18<00:00,  6.14s/it]


Question List Namereorder_question_simplified_list
Chunk Size: 1024
Document Ratio: 1


The hypothesis contains 0 counts of 3-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()
The hypothesis contains 0 counts of 4-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()
Evaluating with chunk size 1024: 100%|██████████| 3/3 [00:12<00:00,  4.16s/it]


Question List Namereorder_question_simplified_list
Chunk Size: 1024
Document Ratio: 0


Evaluating with chunk size 1024: 100%|██████████| 3/3 [00:15<00:00,  5.16s/it]


Question List Namereorder_question_without_question_words_list
Chunk Size: 1024
Document Ratio: 1


Evaluating with chunk size 1024: 100%|██████████| 3/3 [00:17<00:00,  5.70s/it]


Question List Name: reorder_question_without_question_words_list
Chunk Size: 1024
Document Ratio: 0


Evaluating with chunk size 1280: 100%|██████████| 3/3 [00:14<00:00,  4.68s/it]


Question List Name: eval_questions
Chunk Size: 1280
Document Ratio: 1


Evaluating with chunk size 1280: 100%|██████████| 3/3 [00:13<00:00,  4.43s/it]


Question List Name: eval_questions
Chunk Size: 1280
Document Ratio: 0


Evaluating with chunk size 1280: 100%|██████████| 3/3 [00:16<00:00,  5.66s/it]


Question List Name: reorder_question_simplified_list
Chunk Size: 1280
Document Ratio: 1


Evaluating with chunk size 1280: 100%|██████████| 3/3 [00:15<00:00,  5.18s/it]


Question List Name: reorder_question_simplified_list
Chunk Size: 1280
Document Ratio: 0


Evaluating with chunk size 1280: 100%|██████████| 3/3 [00:14<00:00,  4.69s/it]


Question List Name: reorder_question_without_question_words_list
Chunk Size: 1280
Document Ratio: 1


Evaluating with chunk size 1280: 100%|██████████| 3/3 [00:17<00:00,  5.68s/it]


Question List Name: reorder_question_without_question_words_list
Chunk Size: 1280
Document Ratio: 0


Evaluating with chunk size 1536: 100%|██████████| 3/3 [00:16<00:00,  5.59s/it]


Question List Name: eval_questions
Chunk Size: 1536
Document Ratio: 1


Evaluating with chunk size 1536: 100%|██████████| 3/3 [00:17<00:00,  5.91s/it]


Question List Name: eval_questions
Chunk Size: 1536
Document Ratio: 0


Evaluating with chunk size 1536: 100%|██████████| 3/3 [00:20<00:00,  6.72s/it]


Question List Name: reorder_question_simplified_list
Chunk Size: 1536
Document Ratio: 1


Evaluating with chunk size 1536: 100%|██████████| 3/3 [00:13<00:00,  4.39s/it]


Question List Name: reorder_question_simplified_list
Chunk Size: 1536
Document Ratio: 0


Evaluating with chunk size 1536: 100%|██████████| 3/3 [00:15<00:00,  5.29s/it]


Question List Name: reorder_question_without_question_words_list
Chunk Size: 1536
Document Ratio: 1


Evaluating with chunk size 1536: 100%|██████████| 3/3 [00:15<00:00,  5.32s/it]


Question List Name: reorder_question_without_question_words_list
Chunk Size: 1536
Document Ratio: 0


Evaluating with chunk size 1792: 100%|██████████| 3/3 [00:16<00:00,  5.65s/it]


Question List Name: eval_questions
Chunk Size: 1792
Document Ratio: 1


Evaluating with chunk size 1792: 100%|██████████| 3/3 [00:18<00:00,  6.31s/it]


Question List Name: eval_questions
Chunk Size: 1792
Document Ratio: 0


Evaluating with chunk size 1792: 100%|██████████| 3/3 [00:21<00:00,  7.32s/it]


Question List Name: reorder_question_simplified_list
Chunk Size: 1792
Document Ratio: 1


Evaluating with chunk size 1792: 100%|██████████| 3/3 [00:26<00:00,  9.00s/it]


Question List Name: reorder_question_simplified_list
Chunk Size: 1792
Document Ratio: 0


Evaluating with chunk size 1792: 100%|██████████| 3/3 [00:21<00:00,  7.26s/it]


Question List Name: reorder_question_without_question_words_list
Chunk Size: 1792
Document Ratio: 1


Evaluating with chunk size 1792: 100%|██████████| 3/3 [00:29<00:00,  9.95s/it]


Question List Name: reorder_question_without_question_words_list
Chunk Size: 1792
Document Ratio: 0


Evaluating with chunk size 2048: 100%|██████████| 3/3 [00:17<00:00,  5.94s/it]


Question List Name: eval_questions
Chunk Size: 2048
Document Ratio: 1


Evaluating with chunk size 2048: 100%|██████████| 3/3 [00:28<00:00,  9.61s/it]


Question List Name: eval_questions
Chunk Size: 2048
Document Ratio: 0


Evaluating with chunk size 2048: 100%|██████████| 3/3 [00:21<00:00,  7.08s/it]


Question List Name: reorder_question_simplified_list
Chunk Size: 2048
Document Ratio: 1


Evaluating with chunk size 2048: 100%|██████████| 3/3 [00:26<00:00,  8.81s/it]


Question List Name: reorder_question_simplified_list
Chunk Size: 2048
Document Ratio: 0


Evaluating with chunk size 2048: 100%|██████████| 3/3 [00:18<00:00,  6.33s/it]


Question List Name: reorder_question_without_question_words_list
Chunk Size: 2048
Document Ratio: 1


Evaluating with chunk size 2048: 100%|██████████| 3/3 [00:29<00:00,  9.75s/it]

Question List Name: reorder_question_without_question_words_list
Chunk Size: 2048
Document Ratio: 0
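
The BLEU warnings in this output come from NLTK: when a hypothesis shares no n-grams of some order with the reference, the unsmoothed score collapses to effectively zero, regardless of lower-order overlap. Below is a minimal sketch of the two remedies the warning itself suggests, a smoothing function and a lower n-gram order. The two sentences are made-up placeholders, not data from this run, and `method1` is only one of several smoothing methods NLTK provides; which one suits this pipeline is an open choice.

```python
from nltk.translate.bleu_score import SmoothingFunction, sentence_bleu

# Placeholder token lists (assumptions for illustration, not pipeline data).
reference = "the retriever returns the most relevant chunk".split()
hypothesis = "the retriever returns a relevant passage".split()

# Option 1: keep the default 4-gram BLEU, but smooth zero n-gram counts
# so a missing 4-gram match no longer forces the whole score to zero.
smoother = SmoothingFunction()
smoothed = sentence_bleu(
    [reference], hypothesis, smoothing_function=smoother.method1
)

# Option 2: lower the n-gram order. Equal weights on unigrams and bigrams
# give BLEU-2, which only needs unigram and bigram overlap to be non-zero.
bleu2 = sentence_bleu([reference], hypothesis, weights=(0.5, 0.5))

print(f"smoothed BLEU-4: {smoothed:.3f}")
print(f"BLEU-2:          {bleu2:.3f}")
```

For short generated answers such as the ones evaluated here, 4-gram overlap with a reference is rare, so either option tends to give a more informative score than the unsmoothed default.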



