# Uncertainty Estimation in Large Language Models to Support Biodiversity Conservation

**María Mora-Cross and Saúl Calderón-Ramírez**
Costa Rica Institute of Technology
{maria.mora, sacalderon}@itcr.ac.cr

July - December, 2023

### Abstract

Large Language Models (LLMs) provide significant value in question answering (QA) scenarios and have practical applications in complex decision-making contexts such as biodiversity conservation. However, despite substantial performance improvements, they may still produce inaccurate outputs. Consequently, reporting an uncertainty estimate alongside each prediction is essential for mitigating the risks associated with their use. This study presents an exploratory analysis of Monte Carlo Dropout (MCD) and Expected Calibration Error (ECE) for assessing the uncertainty of generative language models. To that end, we analyzed two publicly available language models (Falcon-7B and DistilGPT-2). Our findings suggest that ECE is a viable metric for estimating uncertainty in generative LLMs.

The findings from this research contribute to a broader project that aims to provide free and open access to standardized and integrated data and services about Costa Rica’s biodiversity, in support of science, education, and biodiversity conservation.

**A detailed description of the experiment is available in:**
Maria Mora-Cross and Saul Calderon-Ramirez (2024). Uncertainty Estimation in Large Language Models to Support Biodiversity Conservation. Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies.

This notebook was used to run the experiments detailed in the associated publication.

Please note that after the project's completion, the authors discovered that the ELI5 dataset is no longer available on Hugging Face. The dataset page has the message: "Dataset eli5 is defunct and no longer accessible due to unavailability of the source data".

In [7]:
# Libraries
import torch
import pandas as pd
import os
import numpy as np
from tqdm import tqdm
import random
import torch.nn.functional as F
from datetime import date

# sklearn
from sklearn.preprocessing import MinMaxScaler

# imbalanced-learn library
from imblearn.under_sampling import RandomUnderSampler

# Hugging Face
from transformers import AutoTokenizer, AutoModelForCausalLM, GPT2Config
from huggingface_hub import notebook_login
import datasets
from datasets import load_dataset
from datasets import DatasetDict
from collections import defaultdict

# Metrics
from datasets import load_metric
from transformers import pipeline

import matplotlib.pyplot as plt

# hugging face metrics
from evaluate import load

# It is recommended to install accelerate 0.21
import accelerate
accelerate.__version__


# Utilities 
from utilityV02 import *
setup()


Using transformers v4.36.2
Using datasets v2.14.4


In [8]:
# Variables
device = "cuda" if torch.cuda.is_available() else "cpu"


In [17]:
# Accessing huggingface

notebook_login()


### Load and preprocess the dataset

**Dataset:**

The **ELI5 dataset is an English-language dataset of questions and answers gathered from three subreddits (topic-specific Reddit communities)** where users ask factual questions requiring paragraph-length or **longer answers**. The dataset was created to support the task of **open-domain long-form abstractive question answering, and covers questions about:** general topics in its r/explainlikeimfive subset, **science in its r/askscience subset**, and history in its r/AskHistorians subset.

**Data Fields**

- q_id: a string question identifier for each example, corresponding to its ID in the Pushshift.io Reddit submission dumps.
- subreddit: One of explainlikeimfive, askscience, or AskHistorians, indicating which subreddit the question came from
- title: title of the question, with URLs extracted and replaced by URL_n tokens
- title_urls: list of the extracted URLs, the nth element of the list was replaced by URL_n
- selftext: either an empty string or an elaboration of the question
- selftext_urls: similar to title_urls but for selftext
- answers: **a list of answers**, each answer has:
   - a_id: a string answer identifier for each answer, corresponding to its ID in the Pushshift.io Reddit comments dumps.
   - text: the answer text with the URLs normalized
   - score: the number of upvotes the answer had received when the dumps were created
- answers_urls: a list of the extracted URLs. All answers use the same list, the numbering of the normalization token continues across answer texts

Fan, A., Jernite, Y., Perez, E., Grangier, D., Weston, J., & Auli, M.
Dataset Card for ELI5. https://huggingface.co/datasets/eli5
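
To make the field layout above concrete, the following minimal sketch builds one record with the same nested structure and picks its highest-scored answer. It assumes that `answers` is a dictionary of parallel lists, which is how this notebook indexes it (`['answers']['text']`, `['answers']['score']`). The record values and the helper name `best_answer` are invented for illustration; they are not real dataset rows and not part of the notebook's utilities.

```python
# Hand-made record that mimics the ELI5 field layout described above.
# The values are invented for illustration; they are not real dataset rows.
record = {
    "q_id": "abc123",
    "subreddit": "askscience",
    "title": "Why is the sky blue?",
    "selftext": "",
    "answers": {
        "a_id": ["a1", "a2", "a3"],
        "text": ["Short answer.", "Detailed answer about scattering.", "Joke answer."],
        "score": [3, 12, 1],
    },
}


def best_answer(rec):
    """Return (a_id, text, score) of the highest-scored answer in a record."""
    answers = rec["answers"]
    # The three lists are parallel: position i describes the same answer.
    best = max(range(len(answers["score"])), key=lambda i: answers["score"][i])
    return answers["a_id"][best], answers["text"][best], answers["score"][best]


a_id, text, score = best_answer(record)
print(a_id, score)
```

Selecting by `score` is only one possible way to choose a reference answer; whether the experiments use the top answer or all answers is not stated here.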

In [7]:
# Exploring the contents of the dataset
eli5_test = load_dataset("eli5", split="test_asks[:]" )
print(eli5_test)

Dataset({
    features: ['q_id', 'title', 'selftext', 'document', 'subreddit', 'answers', 'title_urls', 'selftext_urls', 'answers_urls'],
    num_rows: 4462
})


In [9]:
print(eli5_test[0]['answers']['text'])
print(eli5_test[0]['answers']['a_id'])

['Muscles are grouped in either voluntary or involuntary muscle. From there it breaks down into skeletal\n muscle ( biceps triceps etc) which is considered voluntary, smooth muscle is involuntary, this includes the muscles in your intestines to digest food or pupils in your eyes to dilatue or not, and lastly to answer your question there is cardiac muscle. It is also involuntary. It is able to generate its own stimulus to contract when i needed. \n\n\nSorry for errors. On phone :p', "Cardiac tissue isn't under voluntary control. The parts of the brain that control things like breathing, digestion, heartbeat ect are controllable only to a certain extent. For example, if you concentrate you can slow your heart rate, but it will keep beating. Similarly you can control your breathing, but you can't make yourself suffocate by holding your breath, because eventually your body will force you to breath again. I'm not sure how this has evolved, but it certainly works out for the best that way. 

In [11]:
# What the data looks like

sample_seq = 4461

# Question
print(eli5_test[sample_seq]['title'])

print("==============================")
print("Number of answers", len(eli5_test[sample_seq]['answers']['text']))
print("==============================")

# Second answer (index 1) to the question, then the scores of all its answers
print(eli5_test[sample_seq]['answers']['text'][1])
print(eli5_test[sample_seq]['answers']['score'])

Are there any limits to what science can explain?
Number of answers 4
Science attempts to observe and explain the facts of the universe. However, I would argue (though some may disagree) that it is beyond the realm of science to answer questions of value. In other words, science seeks to understand the origin of the universe but is not interested in providing any kind of philosophical rationale. Nor is science interested in/capable of determining whether or not a work of art is good or if your life has any meaning. Even though we may someday fully understand the neurochemical mechanisms and evolutionary prompts for the human creation of value, it is up to us to assign any significance to it. That's why people who say that either science or spirituality/philosophy must eliminate the other are naïve -- the two should stick to asking entirely different sets of questions.
[8, 7, 5, 3]


### Load the model, the tokenizer and define the model dropout


In [None]:
# 1)For GPT-2
# Load the model, the tokenizer and define the model dropout
model_name_tokenizer = "distilgpt2"
model_name = "mariamoracrossitcr/distilgpt2_finetuneWithEli5V2"

#config = GPT2Config.from_pretrained(model_name_tokenizer)
#model = AutoModelForCausalLM.from_pretrained(model_name, config=config).to(device)

model = AutoModelForCausalLM.from_pretrained(model_name).to(device)

# Load a DistilGPT2 tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name_tokenizer)
tokenizer.pad_token_id = tokenizer.eos_token_id

In [12]:
# 2) For Falcon-7B
# Falcon 7B was quantized using 4 bits and saved locally

from transformers import  FalconForCausalLM, BitsAndBytesConfig
#model = "tiiuae/falcon-7b"
model_name = "./FineTunningFalcon/results06"
model_name_tokenizer = "./FineTunningFalcon/results06"

tokenizer = AutoTokenizer.from_pretrained(model_name)

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    #trust_remote_code=True,
    device_map="auto"
)
model.config.use_cache = False

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]
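
The idea behind Monte Carlo Dropout can be sketched without any deep learning framework: keep dropout active at inference time, repeat the forward pass several times, and summarize the spread of the outputs. The toy linear model, its weights, and the function names below are assumptions made for illustration only; the dropout rate (0.04) and the number of passes (10) mirror the hyperparameters used in this notebook. In PyTorch, the same effect is typically obtained by leaving the `torch.nn.Dropout` modules in training mode while generating.

```python
import random
import statistics


def dropout(values, p, rng):
    """Inverted dropout: zero each value with probability p, rescale the rest."""
    keep = 1.0 - p
    return [v / keep if rng.random() < keep else 0.0 for v in values]


def stochastic_forward(x, weights, p, rng):
    """One forward pass of a toy linear model with dropout on its inputs."""
    dropped = dropout(x, p, rng)
    return sum(w * v for w, v in zip(weights, dropped))


def mc_dropout(x, weights, p, n_passes, seed=0):
    """Run n_passes stochastic passes; return (mean, variance) of the outputs."""
    rng = random.Random(seed)
    outputs = [stochastic_forward(x, weights, p, rng) for _ in range(n_passes)]
    return statistics.fmean(outputs), statistics.pvariance(outputs)


# Toy input and weights (hypothetical values).
x = [1.0, 2.0, 3.0, 4.0]
weights = [0.5, -0.25, 0.1, 0.3]

# With dropout enabled, repeated passes may differ, so the variance can be > 0.
mean_out, var_out = mc_dropout(x, weights, p=0.04, n_passes=10)
# With p = 0 every pass is identical, so the variance is exactly 0.
det_mean, det_var = mc_dropout(x, weights, p=0.0, n_passes=10)
print(mean_out, var_out)
print(det_mean, det_var)
```

The mean over passes plays the role of the averaged prediction and the variance that of the uncertainty signal; how the notebook's own helper combines them is not shown in this section.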

### Calibration

Process:
- Call the calibration function, which runs the algorithm and returns the three elements required to compute the ECE:
    - model_confidence_per_sample,
    - avg_predictions_per_sample,
    - variance_predictions_per_sample
- The results are normalized with MinMaxScaler.
- The data is saved in the ECE database.
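
As a rough illustration of the last two steps, the sketch below rescales raw scores to [0, 1] (the operation MinMaxScaler performs on a single column) and then computes a standard binned ECE. It is a generic textbook formulation, not the implementation inside the `calibration` helper, whose source is not shown here. The function names, the sample values, and the choice of what counts as "accuracy" (for example, a quality score in [0, 1]) are assumptions.

```python
def min_max_normalize(values):
    """Rescale values to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def expected_calibration_error(confidences, accuracies, num_bins=10):
    """ECE = sum over bins of (bin size / N) * |mean accuracy - mean confidence|."""
    n = len(confidences)
    bins = [[] for _ in range(num_bins)]
    for conf, acc in zip(confidences, accuracies):
        # Equal-width bins on [0, 1]; a confidence of exactly 1.0 goes in the last bin.
        idx = min(int(conf * num_bins), num_bins - 1)
        bins[idx].append((conf, acc))
    ece = 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        mean_conf = sum(c for c, _ in members) / len(members)
        mean_acc = sum(a for _, a in members) / len(members)
        ece += (len(members) / n) * abs(mean_acc - mean_conf)
    return ece


# Hypothetical raw confidence scores and per-sample quality scores.
raw_confidence = [2.0, 4.0, 6.0, 8.0, 10.0]
quality = [0.0, 0.25, 0.5, 0.75, 1.0]

conf = min_max_normalize(raw_confidence)
print(expected_calibration_error(conf, quality, num_bins=10))
```

In this toy case the normalized confidence equals the quality score for every sample, so the gap in each bin is zero. A model that reports high confidence on low-quality answers would yield a larger value.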

In [13]:
### Calibration for an experiment
# Hyperparameters
model_type = 'falcon'

#### with dropout
dropout = 0.04
#####
answer_length = 512

# Falcon 7B
# 5.2. Generating Text with Contrastive Search: https://huggingface.co/blog/introducing-csearch
penalty_alpha = 0.1 
topk=5

#bertscore 
bertscore_lang = 'en'

# For gpt2
topp=0.95

# Random selection of samples 
samples_test = select_samples(eli5_test, 500)

num_bins =10

num_repetions_per_sample = 10

experiment_id = 22

# CALIBRATION
model_confidence_per_sample, avg_predictions_per_sample, variance_predictions_per_sample = calibration(
    num_bins, dropout, samples_test, num_repetions_per_sample, tokenizer, model,
    answer_length, topk, topp, experiment_id, model_type, penalty_alpha,
    bertscore_lang, padToken_id=50256,
)


# Update database
""" Postprocessing and statistics
    For an experiment_id, update the ECE database:
       Read data from the QUESTION and SAMPLING tables, normalize perplexity and divide it into ranges.
       Save data in QUESTION_SAMPLING
"""
# 11 linspace points generate 10 classes
num_bins = 11
postprocessing_tasks_QUESTION_SAMPLING(experiment_id, num_bins)


# Extract the normalized data from QUESTION_SAMPLING in the database and save it in
# the QUESTION_SAMPLING_STAT table
query_text = "select q.q_id, q.experiment_id,  q.mod_perplexity_norm, \
                               q.samp_perplexity_norm, q.bin from question_sampling q \
                                  where q.experiment_id = " + str(experiment_id)
postprocessing_tasks_QUESTION_SAMPLING_STAT(query_text)

                                                                                

+------+-------------+---------------+--------------------+--------------------+-----------+----+----+-------+-----------+------------------+------------------+------------------+
|  q_id|experiment_id|experiment_date|            question|        model_answer| perplexity|topk|topp|dropout|text_length|                f1|         precision|            recall|
+------+-------------+---------------+--------------------+--------------------+-----------+----+----+-------+-----------+------------------+------------------+------------------+
| k9e7s|           22|     2024-01-10|I had to ask in /...|[I had to ask in ...|3.908203125|   5|0.95|   0.04|        512|0.8033592104911804|0.7652708292007446|0.8454376459121704|
| j0ebk|           22|     2024-01-10|Is it plausible t...|[Is it plausible ...| 2.83203125|   5|0.95|   0.04|        512|0.7868825197219849|0.7403596043586731|0.8396442532539368|
|1tj6fo|           22|     2024-01-10|Since neutrinos a...|[Since neutrinos ...|3.017578125|   5|0.95|   0.04|        512|0.8281917572021484|0.8377081751823425|0.8188890218734741|
| ixy3v|           22|     2024-01-10|Need Help Making ...|[Need Help Making...| 5.16015625|   5|0.95|   0.04|        512|0.8182324767112732|0.8197071552276611|0.8167629837989807|
|10j7z4|           22|     2024-01-10|Someone posted a ...|[Someone posted a...|2.416015625|   5|0.95|   0.04|        512|0.7855162620544434|0.7230498790740967|0.8597967028617859|
| u9abz|           22|     2024-01-10|Where in the scie...|[Where in the sci...| 2.77734375|   5|0.95|   0.04|        512|0.7938931584358215| 0.781606912612915|0.8065717220306396|
| ybhed|           22|     2024-01-10|[Weekly Discussio...|[[Weekly Discussi...| 4.33984375|   5|0.95|   0.04|        512|0.8000369071960449|0.7974190711975098|0.8026720285415649|
|290ykx|           22|     2024-01-10|What causes bays ...|[What causes bays...|2.361328125|   5|0.95|   0.04|        512|0.8066962361335754|0.7769883871078491|0.8387661576271057|
|187ibe|           22|     2024-01-10|Does music sound ...|[Does music sound...|        4.5|   5|0.95|   0.04|        512|0.8135408759117126|0.8120381236076355|0.8150490522384644|
| o2prl|           22|     2024-01-10|A question about ...|[A question about...| 2.22265625|   5|0.95|   0.04|        512|0.7902577519416809|0.7707135677337646|0.8108189702033997|
|5ncfdu|           22|     2024-01-10|How many OH and m...|[How many OH and ...| 4.35546875|   5|0.95|   0.04|        512|0.8224431872367859| 0.844935953617096|0.8011168837547302|
| mj8vg|           22|     2024-01-10|Why are repetitiv...|[Why are repetiti...|   3.421875|   5|0.95|   0.04|        512|0.7929863333702087|0.7912518382072449|0.7947283983230591|
| pgkgi|           22|     2024-01-10|Are there any lim...|[Are there any li...|2.837890625|   5|0.95|   0.04|        512|0.8201762437820435|0.8286517858505249|0.8118723630905151|
|2yqosg|           22|     2024-01-10|What would happen...|[What would happe...| 2.66796875|   5|0.95|   0.04|        512| 0.760465681552887|0.7433466911315918|0.7783917784690857|
|116qn4|           22|     2024-01-11|Omnidirectional I...|[Omnidirectional ...| 6.62890625|   5|0.95|   0.04|        512|0.8307012915611267|0.8164325952529907|0.8454776406288147|
|5zkfgy|           22|     2024-01-11|Are there any big...|[Are there any bi...|    2.96875|   5|0.95|   0.04|        512|0.8030558228492737|0.7908989191055298|0.8155922889709473|
|4fy82p|           22|     2024-01-11|Why does the hypo...|[Why does the hyp...| 4.00390625|   5|0.95|   0.04|        512|0.8219936490058899| 0.812795877456665| 0.831402063369751|
|18fwun|           22|     2024-01-11|Question about th...|[Question about t...|        2.5|   5|0.95|   0.04|        512|0.7964836359024048|0.7708814144134521|0.8238447308540344|
|2vdw7k|           22|     2024-01-11|Generally in sci-...|[Generally in sci...| 4.65234375|   5|0.95|   0.04|        512|0.8238973021507263|0.8128405809402466| 0.835258960723877|
|6g4jwk|           22|     2024-01-11|Death - what is i...|[Death - what is ...|   3.453125|   5|0.95|   0.04|        512|0.8168654441833496| 0.810334324836731|0.8235027194023132|
|88wy46|           22|     2024-01-11|Is there any way ...|[Is there any way...|3.291015625|   5|0.95|   0.04|        512|0.8306335806846619|0.8346015214920044|0.8267033100128174|
|3hd7rw|           22|     2024-01-11|What's the math t...|[What's the math ...|  2.6640625|   5|0.95|   0.04|        512|0.8016236424446106|0.7983138561248779| 0.804961085319519|
|238mbf|           22|     2024-01-11|Where does the wi...|[Where does the w...| 4.66015625|   5|0.95|   0.04|        512|0.8031213283538818|0.8055399656295776|0.8007171154022217|
+------+-------------+---------------+--------------------+--------------------+-----------+----+----+-------+-----------+------------------+------------------+------------------+

[Truncated output: per-question reference-answer tables with columns q_id, a_id, score, true_answer, experiment_id, ...]

+------+-------------+---------------+--------------------+--------------------+------------+----+----+-------+-----------+------------------+------------------+------------------+
|  q_id|experiment_id|experiment_date|            question|        model_answer|  perplexity|topk|topp|dropout|text_length|                f1|         precision|            recall|
+------+-------------+---------------+--------------------+--------------------+------------+----+----+-------+-----------+------------------+------------------+------------------+
|3ogzkj|           22|     2024-01-11|Efficiency of hig...|[Efficiency of hi...|1.6748046875|   5|0.95|   0.04|        512|0.7630094289779663|0.7503607273101807|0.7760919332504272|
+------+-------------+---------------+--------------------+--------------------+------------+----+----+-------+-----------+------------------+------------------+------------------+

+------+-------+-----+--------------------+-------------+
|  q_id|   a_id|score|         true_

+------+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+-----------------+------------------+
|  q_id|experiment_id|experiment_date|            question|        model_answer|perplexity|topk|topp|dropout|text_length|                f1|        precision|            recall|
+------+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+-----------------+------------------+
|1edx4r|           22|     2024-01-11|Is Powerball actu...|[Is Powerball act...|   2.40625|   5|0.95|   0.04|        512|0.7877386808395386|0.766416609287262|0.8102811574935913|
+------+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+-----------------+------------------+

+------+-------+-----+--------------------+-------------+
|  q_id|   a_id|score|         true_answer|experime

+------+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+------------------+
|  q_id|experiment_id|experiment_date|            question|        model_answer|perplexity|topk|topp|dropout|text_length|                f1|         precision|            recall|
+------+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+------------------+
|2vpqmz|           22|     2024-01-11|How does my Gamec...|[How does my Game...|  3.640625|   5|0.95|   0.04|        512|0.7872135639190674|0.7848563194274902|0.7895849347114563|
+------+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+------------------+

+------+-------+-----+--------------------+-------------+
|  q_id|   a_id|score|         true_answer|exp

+------+-------------+---------------+--------------------+--------------------+-----------+----+----+-------+-----------+------------------+------------------+------------------+
|  q_id|experiment_id|experiment_date|            question|        model_answer| perplexity|topk|topp|dropout|text_length|                f1|         precision|            recall|
+------+-------------+---------------+--------------------+--------------------+-----------+----+----+-------+-----------+------------------+------------------+------------------+
|a69lsk|           22|     2024-01-11|Is there a way to...|[Is there a way t...|3.517578125|   5|0.95|   0.04|        512|0.8048315048217773|0.7958285808563232|0.8140404224395752|
+------+-------------+---------------+--------------------+--------------------+-----------+----+----+-------+-----------+------------------+------------------+------------------+

+------+-------+-----+--------------------+-------------+
|  q_id|   a_id|score|         true_answe

+------+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+-----------------+------------------+------------------+
|  q_id|experiment_id|experiment_date|            question|        model_answer|perplexity|topk|topp|dropout|text_length|               f1|         precision|            recall|
+------+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+-----------------+------------------+------------------+
|5ugk4h|           22|     2024-01-11|How do programs l...|[How do programs ...|3.89453125|   5|0.95|   0.04|        512|0.795689046382904|0.8181554675102234|0.7744234800338745|
+------+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+-----------------+------------------+------------------+

+------+-------+-----+--------------------+-------------+
|  q_id|   a_id|score|         true_answer|experime

+-----+-------------+---------------+--------------------+--------------------+-----------+----+----+-------+-----------+------------------+------------------+------------------+
| q_id|experiment_id|experiment_date|            question|        model_answer| perplexity|topk|topp|dropout|text_length|                f1|         precision|            recall|
+-----+-------------+---------------+--------------------+--------------------+-----------+----+----+-------+-----------+------------------+------------------+------------------+
|keobm|           22|     2024-01-11|I am able to hear...|[I am able to hea...|3.330078125|   5|0.95|   0.04|        512|0.7842522859573364|0.7539588212966919|0.8170819282531738|
+-----+-------------+---------------+--------------------+--------------------+-----------+----+----+-------+-----------+------------------+------------------+------------------+

+-----+-------+-----+--------------------+-------------+
| q_id|   a_id|score|         true_answer|exper

+------+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+-----------------+
|  q_id|experiment_id|experiment_date|            question|        model_answer|perplexity|topk|topp|dropout|text_length|                f1|         precision|           recall|
+------+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+-----------------+
|13pzus|           22|     2024-01-11|What's the consen...|[What's the conse...|4.85546875|   5|0.95|   0.04|        512|0.8147167563438416|0.8126376867294312|0.816806435585022|
+------+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+-----------------+

+------+-------+-----+--------------------+-------------+
|  q_id|   a_id|score|         true_answer|experime

+------+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+-----------------+
|  q_id|experiment_id|experiment_date|            question|        model_answer|perplexity|topk|topp|dropout|text_length|                f1|         precision|           recall|
+------+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+-----------------+
|197rxn|           22|     2024-01-11|I picked up this ...|[I picked up this...|2.98828125|   5|0.95|   0.04|        512|0.8028514385223389|0.7756469249725342|0.832033634185791|
+------+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+-----------------+

+------+-------+-----+--------------------+-------------+
|  q_id|   a_id|score|         true_answer|experime

+------+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+-----------------+------------------+
|  q_id|experiment_id|experiment_date|            question|        model_answer|perplexity|topk|topp|dropout|text_length|                f1|        precision|            recall|
+------+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+-----------------+------------------+
|4o15co|           22|     2024-01-11|What is wrong wit...|[What is wrong wi...|    2.4375|   5|0.95|   0.04|        512|0.8014385104179382|0.781898558139801|0.8219801187515259|
+------+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+-----------------+------------------+

+------+-------+-----+--------------------+-------------+
|  q_id|   a_id|score|         true_answer|experime

+------+-------------+---------------+--------------------+--------------------+-----------+----+----+-------+-----------+------------------+------------------+------------------+
|  q_id|experiment_id|experiment_date|            question|        model_answer| perplexity|topk|topp|dropout|text_length|                f1|         precision|            recall|
+------+-------------+---------------+--------------------+--------------------+-----------+----+----+-------+-----------+------------------+------------------+------------------+
|42kfmo|           22|     2024-01-11|How does one hast...|[How does one has...|3.369140625|   5|0.95|   0.04|        512|0.8180918097496033|0.8223172426223755|0.8139094710350037|
+------+-------------+---------------+--------------------+--------------------+-----------+----+----+-------+-----------+------------------+------------------+------------------+

+------+-------+-----+--------------------+-------------+
|  q_id|   a_id|score|         true_answe

+-----+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+------------------+
| q_id|experiment_id|experiment_date|            question|        model_answer|perplexity|topk|topp|dropout|text_length|                f1|         precision|            recall|
+-----+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+------------------+
|tqcpl|           22|     2024-01-11|Hi AskScience! Do...|[Hi AskScience! D...|6.36328125|   5|0.95|   0.04|        512|0.8033430576324463|0.7947571277618408|0.8121166229248047|
+-----+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+------------------+

+-----+-------+-----+--------------------+-------------+
| q_id|   a_id|score|         true_answer|experiment

+------+-------------+---------------+--------------------+--------------------+-----------+----+----+-------+-----------+------------------+------------------+------------------+
|  q_id|experiment_id|experiment_date|            question|        model_answer| perplexity|topk|topp|dropout|text_length|                f1|         precision|            recall|
+------+-------------+---------------+--------------------+--------------------+-----------+----+----+-------+-----------+------------------+------------------+------------------+
|7r8bwu|           22|     2024-01-12|How does a paraco...|[How does a parac...|3.439453125|   5|0.95|   0.04|        512|0.7993740439414978|0.7844583988189697|0.8148677945137024|
+------+-------------+---------------+--------------------+--------------------+-----------+----+----+-------+-----------+------------------+------------------+------------------+

+------+-------+-----+--------------------+-------------+
|  q_id|   a_id|score|         true_answe

+-----+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+-----------------+------------------+
| q_id|experiment_id|experiment_date|            question|        model_answer|perplexity|topk|topp|dropout|text_length|                f1|        precision|            recall|
+-----+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+-----------------+------------------+
|mydgn|           22|     2024-01-12|r/askscience pane...|[r/askscience pan...| 4.9453125|   5|0.95|   0.04|        512|0.8128411769866943|0.781883955001831|0.8463507890701294|
+-----+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+-----------------+------------------+

+-----+-------+-----+--------------------+-------------+
| q_id|   a_id|score|         true_answer|experiment_id|


+------+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+------------------+
|  q_id|experiment_id|experiment_date|            question|        model_answer|perplexity|topk|topp|dropout|text_length|                f1|         precision|            recall|
+------+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+------------------+
|1m0umx|           22|     2024-01-12|What happens if y...|[What happens if ...|3.18359375|   5|0.95|   0.04|        512|0.8081039190292358|0.7982432246208191|0.8182111978530884|
+------+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+------------------+

+------+-------+-----+--------------------+-------------+
|  q_id|   a_id|score|         true_answer|exp

+------+-------------+---------------+--------------------+--------------------+-----------+----+----+-------+-----------+------------------+------------------+------------------+
|  q_id|experiment_id|experiment_date|            question|        model_answer| perplexity|topk|topp|dropout|text_length|                f1|         precision|            recall|
+------+-------------+---------------+--------------------+--------------------+-----------+----+----+-------+-----------+------------------+------------------+------------------+
|12siel|           22|     2024-01-12|Do dock leaves ha...|[Do dock leaves h...|2.904296875|   5|0.95|   0.04|        512|0.7882795333862305|0.7646647691726685|0.8133992552757263|
+------+-------------+---------------+--------------------+--------------------+-----------+----+----+-------+-----------+------------------+------------------+------------------+

+------+-------+-----+--------------------+-------------+
|  q_id|   a_id|score|         true_answe

+------+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+------------------+
|  q_id|experiment_id|experiment_date|            question|        model_answer|perplexity|topk|topp|dropout|text_length|                f1|         precision|            recall|
+------+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+------------------+
|7hlsmh|           22|     2024-01-12|If swelling is pa...|[If swelling is p...|3.42578125|   5|0.95|   0.04|        512|0.8150724172592163|0.7815873622894287|0.8515549302101135|
+------+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+------------------+

+------+-------+-----+--------------------+-------------+
|  q_id|   a_id|score|         true_answer|exp

+------+-------------+---------------+--------------------+--------------------+-----------+----+----+-------+-----------+------------------+------------------+------------------+
|  q_id|experiment_id|experiment_date|            question|        model_answer| perplexity|topk|topp|dropout|text_length|                f1|         precision|            recall|
+------+-------------+---------------+--------------------+--------------------+-----------+----+----+-------+-----------+------------------+------------------+------------------+
|7hqmqp|           22|     2024-01-12|how does certific...|[how does certifi...|2.447265625|   5|0.95|   0.04|        512|0.7611750960350037|0.7366939783096313|0.7873393297195435|
+------+-------------+---------------+--------------------+--------------------+-----------+----+----+-------+-----------+------------------+------------------+------------------+

+------+-------+-----+--------------------+-------------+
|  q_id|   a_id|score|         true_answe

+-----+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+------------------+
| q_id|experiment_id|experiment_date|            question|        model_answer|perplexity|topk|topp|dropout|text_length|                f1|         precision|            recall|
+-----+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+------------------+
|nyqcq|           22|     2024-01-12|When I turn on my...|[When I turn on m...|3.40234375|   5|0.95|   0.04|        512|0.8127516508102417|0.7703193426132202|0.8601311445236206|
+-----+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+------------------+

+-----+-------+-----+--------------------+-------------+
| q_id|   a_id|score|         true_answer|experiment

+------+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+-----------------+------------------+
|  q_id|experiment_id|experiment_date|            question|        model_answer|perplexity|topk|topp|dropout|text_length|                f1|        precision|            recall|
+------+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+-----------------+------------------+
|1hozmj|           22|     2024-01-12|Why does a comput...|[Why does a compu...|3.30859375|   5|0.95|   0.04|        512|0.8111518025398254|0.777042031288147|0.8483936786651611|
+------+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+-----------------+------------------+

+------+-------+-----+--------------------+-------------+
|  q_id|   a_id|score|         true_answer|experime

+------+-------------+---------------+--------------------+--------------------+-----------+----+----+-------+-----------+------------------+------------------+------------------+
|  q_id|experiment_id|experiment_date|            question|        model_answer| perplexity|topk|topp|dropout|text_length|                f1|         precision|            recall|
+------+-------------+---------------+--------------------+--------------------+-----------+----+----+-------+-----------+------------------+------------------+------------------+
|9tlwqt|           22|     2024-01-12|Why does "turning...|[Why does "turnin...|2.798828125|   5|0.95|   0.04|        512|0.7829318642616272|0.7514103651046753|0.8172137141227722|
+------+-------------+---------------+--------------------+--------------------+-----------+----+----+-------+-----------+------------------+------------------+------------------+

+------+-------+-----+--------------------+-------------+
|  q_id|   a_id|score|         true_answe

+------+-------------+---------------+--------------------+--------------------+-----------+----+----+-------+-----------+------------------+------------------+------------------+
|  q_id|experiment_id|experiment_date|            question|        model_answer| perplexity|topk|topp|dropout|text_length|                f1|         precision|            recall|
+------+-------------+---------------+--------------------+--------------------+-----------+----+----+-------+-----------+------------------+------------------+------------------+
|3dh12z|           22|     2024-01-12|What is so signif...|[What is so signi...|3.494140625|   5|0.95|   0.04|        512|0.8221174478530884|0.8117148876190186|0.8327901363372803|
+------+-------------+---------------+--------------------+--------------------+-----------+----+----+-------+-----------+------------------+------------------+------------------+

+------+-------+-----+--------------------+-------------+
|  q_id|   a_id|score|         true_answe

+------+-------------+---------------+--------------------+--------------------+-----------+----+----+-------+-----------+-----------------+------------------+------------------+
|  q_id|experiment_id|experiment_date|            question|        model_answer| perplexity|topk|topp|dropout|text_length|               f1|         precision|            recall|
+------+-------------+---------------+--------------------+--------------------+-----------+----+----+-------+-----------+-----------------+------------------+------------------+
|4noifr|           22|     2024-01-12|Suppose you throw...|[Suppose you thro...|3.314453125|   5|0.95|   0.04|        512|0.809394121170044|0.8331421613693237|0.7869623899459839|
+------+-------------+---------------+--------------------+--------------------+-----------+----+----+-------+-----------+-----------------+------------------+------------------+

+------+-------+-----+--------------------+-------------+
|  q_id|   a_id|score|         true_answer|exp

+-----+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+------------------+
| q_id|experiment_id|experiment_date|            question|        model_answer|perplexity|topk|topp|dropout|text_length|                f1|         precision|            recall|
+-----+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+------------------+
|zqkie|           22|     2024-01-12|Please help ident...|[Please help iden...|  5.234375|   5|0.95|   0.04|        512|0.8134163022041321|0.7864237427711487|0.8423275947570801|
+-----+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+------------------+

+-----+-------+-----+--------------------+-------------+
| q_id|   a_id|score|         true_answer|experiment

+------+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+------------------+
|  q_id|experiment_id|experiment_date|            question|        model_answer|perplexity|topk|topp|dropout|text_length|                f1|         precision|            recall|
+------+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+------------------+
|1mvoff|           22|     2024-01-12|Is there a foolpr...|[Is there a foolp...|   2.65625|   5|0.95|   0.04|        512|0.8042161464691162|0.7935986518859863|0.8151215314865112|
+------+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+------------------+

+------+-------+-----+--------------------+-------------+
|  q_id|   a_id|score|         true_answer|exp

+------+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+------------------+
|  q_id|experiment_id|experiment_date|            question|        model_answer|perplexity|topk|topp|dropout|text_length|                f1|         precision|            recall|
+------+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+------------------+
|2ulqi4|           22|     2024-01-12|Is it possible to...|[Is it possible t...|  3.328125|   5|0.95|   0.04|        512|0.7900394797325134|0.7666338682174683|0.8149191737174988|
+------+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+------------------+

+------+-------+-----+--------------------+-------------+
|  q_id|   a_id|score|         true_answer|exp

+------+-------------+---------------+--------------------+--------------------+-----------+----+----+-------+-----------+------------------+------------------+------------------+
|  q_id|experiment_id|experiment_date|            question|        model_answer| perplexity|topk|topp|dropout|text_length|                f1|         precision|            recall|
+------+-------------+---------------+--------------------+--------------------+-----------+----+----+-------+-----------+------------------+------------------+------------------+
|17j2hu|           22|     2024-01-12|With all these se...|[With all these s...|2.833984375|   5|0.95|   0.04|        512|0.7961516380310059|0.7490862011909485|0.8495279550552368|
+------+-------------+---------------+--------------------+--------------------+-----------+----+----+-------+-----------+------------------+------------------+------------------+

+------+-------+-----+--------------------+-------------+
|  q_id|   a_id|score|         true_answe

+-----+-----------+--------------------+----------+----+----+-------+-----------+-------------+------------------+
| q_id|sampling_id|              answer|perplexity|topk|topp|dropout|text_length|experiment_id|      bertscore_f1|
+-----+-----------+--------------------+----------+----+----+-------+-----------+-------------+------------------+
|jx55a|          1|After seeing so m...|    5.5625|   5|0.95|   0.04|        512|           22|0.8169700503349304|
|jx55a|          2|After seeing so m...|5.30078125|   5|0.95|   0.04|        512|           22|0.7389756441116333|
|jx55a|          3|After seeing so m...| 5.3515625|   5|0.95|   0.04|        512|           22|0.8203624486923218|
|jx55a|          4|After seeing so m...|5.86328125|   5|0.95|   0.04|        512|           22|0.8199554085731506|
|jx55a|          5|After seeing so m...|4.83984375|   5|0.95|   0.04|        512|           22|0.8095964193344116|
|jx55a|          6|After seeing so m...|5.72265625|   5|0.95|   0.04|        512
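
The sampling table above lists several stochastic generations per question (`sampling_id`), each with its own `bertscore_f1`. As a minimal sketch of how such per-sample scores could be summarized for one question, the snippet below computes the mean and sample standard deviation of the five complete rows shown for `q_id` `jx55a`. The function name `summarize_samples` is hypothetical, and this is an illustration in plain Python, not the notebook's own aggregation code.

```python
from statistics import mean, stdev

# BERTScore F1 values copied from the five complete rows printed above
# for q_id "jx55a" (sampling_id 1 to 5); the sixth row is truncated in the
# output and is therefore left out.
jx55a_f1 = {
    1: 0.8169700503349304,
    2: 0.7389756441116333,
    3: 0.8203624486923218,
    4: 0.8199554085731506,
    5: 0.8095964193344116,
}


def summarize_samples(scores):
    """Return count, mean, and sample standard deviation of a list of scores.

    Hypothetical helper: the spread across samples is used here only as a
    rough indicator of how much the generations vary for one question.
    """
    return {
        "n": len(scores),
        "mean": mean(scores),
        "std": stdev(scores),  # sample standard deviation (n - 1 denominator)
    }


summary = summarize_samples(list(jx55a_f1.values()))
lowest_id = min(jx55a_f1, key=jx55a_f1.get)  # sampling_id with the lowest F1

print(summary)
print("lowest-scoring sample:", lowest_id)
```

A larger standard deviation means the sampled answers differ more from one another in their similarity to the reference. With only five of the samples visible here, the numbers are illustrative rather than a result.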

+------+-------+-----+--------------------+-------------+
|  q_id|   a_id|score|         true_answer|experiment_id|
+------+-------+-----+--------------------+-------------+
|1of3oi|ccrjp9i|    4|The "active ingre...|           22|
+------+-------+-----+--------------------+-------------+

+------+-----------+--------------------+-----------+----+----+-------+-----------+-------------+------------------+
|  q_id|sampling_id|              answer| perplexity|topk|topp|dropout|text_length|experiment_id|      bertscore_f1|
+------+-----------+--------------------+-----------+----+----+-------+-----------+-------------+------------------+
|1of3oi|          1|Why doesn't a rhe...|  5.4609375|   5|0.95|   0.04|        512|           22|0.8044908046722412|
|1of3oi|          2|Why doesn't a rhe...|3.435546875|   5|0.95|   0.04|        512|           22|0.8144739270210266|
|1of3oi|          3|Why doesn't a rhe...| 4.30078125|   5|0.95|   0.04|        512|           22|0.8160281181335449|
|1of3oi

+------+-----------+--------------------+-----------+----+----+-------+-----------+-------------+------------------+
|  q_id|sampling_id|              answer| perplexity|topk|topp|dropout|text_length|experiment_id|      bertscore_f1|
+------+-----------+--------------------+-----------+----+----+-------+-----------+-------------+------------------+
|2o5p8l|          1|Ask Anything Wedn...|  5.9140625|   5|0.95|   0.04|        512|           22| 0.793583869934082|
|2o5p8l|          2|Ask Anything Wedn...|3.044921875|   5|0.95|   0.04|        512|           22|0.7953444719314575|
|2o5p8l|          3|Ask Anything Wedn...|   4.546875|   5|0.95|   0.04|        512|           22|0.7944787740707397|
|2o5p8l|          4|Ask Anything Wedn...| 4.66796875|   5|0.95|   0.04|        512|           22|0.7937726974487305|
|2o5p8l|          5|Ask Anything Wedn...|2.595703125|   5|0.95|   0.04|        512|           22|0.7656604647636414|
|2o5p8l|          6|Ask Anything Wedn...| 4.76171875|   5|0.95| 

+------+-------+-----+--------------------+-------------+
|  q_id|   a_id|score|         true_answer|experiment_id|
+------+-------+-----+--------------------+-------------+
|1exe0h|ca4peal|    7|Quantum mechanics...|           22|
|1exe0h|ca4rtaf|    5|Schrodinger's equ...|           22|
|1exe0h|ca4ptl6|    4|Everything is aff...|           22|
|1exe0h|ca4q051|    2|Color is quantum ...|           22|
|1exe0h|ca4t0ni|    2|The question of w...|           22|
+------+-------+-----+--------------------+-------------+

+------+-----------+--------------------+-----------+----+----+-------+-----------+-------------+------------------+
|  q_id|sampling_id|              answer| perplexity|topk|topp|dropout|text_length|experiment_id|      bertscore_f1|
+------+-----------+--------------------+-----------+----+----+-------+-----------+-------------+------------------+
|1exe0h|          1|Do quantum mechan...| 7.63671875|   5|0.95|   0.04|        512|           22|0.8296823501586914|
|1exe0h| 

+------+-------+-----+--------------------+-------------+
|  q_id|   a_id|score|         true_answer|experiment_id|
+------+-------+-----+--------------------+-------------+
|15pe5w|c7op11q|    2|Not sure if I'm d...|           22|
+------+-------+-----+--------------------+-------------+

+------+-----------+--------------------+-----------+----+----+-------+-----------+-------------+------------------+
|  q_id|sampling_id|              answer| perplexity|topk|topp|dropout|text_length|experiment_id|      bertscore_f1|
+------+-----------+--------------------+-----------+----+----+-------+-----------+-------------+------------------+
|15pe5w|          1|What is actually ...|  4.4453125|   5|0.95|   0.04|        512|           22|0.7820372581481934|
|15pe5w|          2|What is actually ...|  3.7265625|   5|0.95|   0.04|        512|           22| 0.776974081993103|
|15pe5w|          3|What is actually ...|3.490234375|   5|0.95|   0.04|        512|           22|0.7810114622116089|
|15pe5w

+------+-------+-----+--------------------+-------------+
|  q_id|   a_id|score|         true_answer|experiment_id|
+------+-------+-----+--------------------+-------------+
|1d6ukr|c9nh0y7|   13|This is a very de...|           22|
|1d6ukr|c9nj8gy|    8|All of physics is...|           22|
|1d6ukr|c9nk8b4|    3|One thing that co...|           22|
|1d6ukr|c9njtan|    3|What is reality? ...|           22|
|1d6ukr|c9nk819|    3|To a physicist, i...|           22|
+------+-------+-----+--------------------+-------------+

+------+-----------+--------------------+----------+----+----+-------+-----------+-------------+------------------+
|  q_id|sampling_id|              answer|perplexity|topk|topp|dropout|text_length|experiment_id|      bertscore_f1|
+------+-----------+--------------------+----------+----+----+-------+-----------+-------------+------------------+
|1d6ukr|          1|Are equations lik...|4.77734375|   5|0.95|   0.04|        512|           22|0.7971522808074951|
|1d6ukr|     

+-----+-------------+---------------+--------------------+--------------------+-----------+----+----+-------+-----------+------------------+------------------+------------------+
| q_id|experiment_id|experiment_date|            question|        model_answer| perplexity|topk|topp|dropout|text_length|                f1|         precision|            recall|
+-----+-------------+---------------+--------------------+--------------------+-----------+----+----+-------+-----------+------------------+------------------+------------------+
|ig8cn|           22|     2024-01-13|Can anyone explai...|[Can anyone expla...|2.451171875|   5|0.95|   0.04|        512|0.7803729176521301|0.7608640193939209|0.8009085655212402|
+-----+-------------+---------------+--------------------+--------------------+-----------+----+----+-------+-----------+------------------+------------------+------------------+

+-----+-------+-----+--------------------+-------------+
| q_id|   a_id|score|         true_answer|exper

+-----+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+-----------------+
| q_id|experiment_id|experiment_date|            question|        model_answer|perplexity|topk|topp|dropout|text_length|                f1|         precision|           recall|
+-----+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+-----------------+
|x8bjw|           22|     2024-01-13|Why can a human h...|[Why can a human ...|4.13671875|   5|0.95|   0.04|        512|0.7967013716697693|0.8063262104988098|0.787303626537323|
+-----+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+-----------------+

+-----+-------+-----+--------------------+-------------+
| q_id|   a_id|score|         true_answer|experiment_id|

+-----+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+------------------+
| q_id|experiment_id|experiment_date|            question|        model_answer|perplexity|topk|topp|dropout|text_length|                f1|         precision|            recall|
+-----+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+------------------+
|ire3l|           22|     2024-01-13|So lightning trie...|[So lightning tri...|4.75390625|   5|0.95|   0.04|        512|0.8076081871986389|0.8089408874511719|0.8062798976898193|
+-----+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+------------------+

+-----+-------+-----+--------------------+-------------+
| q_id|   a_id|score|         true_answer|experiment

+-----+-------------+---------------+--------------------+--------------------+-----------+----+----+-------+-----------+------------------+------------------+------------------+
| q_id|experiment_id|experiment_date|            question|        model_answer| perplexity|topk|topp|dropout|text_length|                f1|         precision|            recall|
+-----+-------------+---------------+--------------------+--------------------+-----------+----+----+-------+-----------+------------------+------------------+------------------+
|ivqox|           22|     2024-01-13|In-depth AC Curre...|[In-depth AC Curr...|2.744140625|   5|0.95|   0.04|        512|0.8185232281684875|0.8252367973327637|0.8119180202484131|
+-----+-------------+---------------+--------------------+--------------------+-----------+----+----+-------+-----------+------------------+------------------+------------------+

+-----+-------+-----+--------------------+-------------+
| q_id|   a_id|score|         true_answer|exper

+------+-------------+---------------+--------------------+--------------------+-----------+----+----+-------+-----------+------------------+-----------------+------------------+
|  q_id|experiment_id|experiment_date|            question|        model_answer| perplexity|topk|topp|dropout|text_length|                f1|        precision|            recall|
+------+-------------+---------------+--------------------+--------------------+-----------+----+----+-------+-----------+------------------+-----------------+------------------+
|18skrh|           22|     2024-01-13|Why do I appear t...|[Why do I appear ...|2.470703125|   5|0.95|   0.04|        512|0.7996916174888611|0.773158073425293|0.8281110525131226|
+------+-------------+---------------+--------------------+--------------------+-----------+----+----+-------+-----------+------------------+-----------------+------------------+

+------+-------+-----+--------------------+-------------+
|  q_id|   a_id|score|         true_answer|exp

+------+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+-----------------+------------------+
|  q_id|experiment_id|experiment_date|            question|        model_answer|perplexity|topk|topp|dropout|text_length|                f1|        precision|            recall|
+------+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+-----------------+------------------+
|2y74lh|           22|     2024-01-13|How do today's co...|[How do today's c...| 4.0546875|   5|0.95|   0.04|        512|0.8266816139221191|0.817297637462616|0.8362836837768555|
+------+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+-----------------+------------------+

+------+-------+-----+--------------------+-------------+
|  q_id|   a_id|score|         true_answer|experime

+-----+-------------+---------------+--------------------+--------------------+-----------+----+----+-------+-----------+------------------+------------------+------------------+
| q_id|experiment_id|experiment_date|            question|        model_answer| perplexity|topk|topp|dropout|text_length|                f1|         precision|            recall|
+-----+-------------+---------------+--------------------+--------------------+-----------+----+----+-------+-----------+------------------+------------------+------------------+
|mk0hf|           22|     2024-01-13|Tidal effects on ...|[Tidal effects on...|3.263671875|   5|0.95|   0.04|        512|0.8209373950958252|0.8292094469070435|0.8128288388252258|
+-----+-------------+---------------+--------------------+--------------------+-----------+----+----+-------+-----------+------------------+------------------+------------------+

+-----+-------+-----+--------------------+-------------+
| q_id|   a_id|score|         true_answer|exper

+-----+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+------------------+
| q_id|experiment_id|experiment_date|            question|        model_answer|perplexity|topk|topp|dropout|text_length|                f1|         precision|            recall|
+-----+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+------------------+
|lpfoi|           22|     2024-01-13|Why do my neighbo...|[Why do my neighb...|    4.1875|   5|0.95|   0.04|        512|0.8068931698799133|0.7911851406097412|0.8232375383377075|
+-----+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+------------------+

+-----+-------+-----+--------------------+-------------+
| q_id|   a_id|score|         true_answer|experiment

+------+-------------+---------------+--------------------+--------------------+-----------+----+----+-------+-----------+------------------+------------------+-----------------+
|  q_id|experiment_id|experiment_date|            question|        model_answer| perplexity|topk|topp|dropout|text_length|                f1|         precision|           recall|
+------+-------------+---------------+--------------------+--------------------+-----------+----+----+-------+-----------+------------------+------------------+-----------------+
|39kl55|           22|     2024-01-13|AskScience AMA Se...|[AskScience AMA S...|3.111328125|   5|0.95|   0.04|        512|0.7715451121330261|0.7275182008743286|0.821243941783905|
+------+-------------+---------------+--------------------+--------------------+-----------+----+----+-------+-----------+------------------+------------------+-----------------+

+------+-------+-----+--------------------+-------------+
|  q_id|   a_id|score|         true_answer|exp

+-----+-----------+--------------------+-----------+----+----+-------+-----------+-------------+------------------+
| q_id|sampling_id|              answer| perplexity|topk|topp|dropout|text_length|experiment_id|      bertscore_f1|
+-----+-----------+--------------------+-----------+----+----+-------+-----------+-------------+------------------+
|ttxj1|          1|Questions about E...|2.974609375|   5|0.95|   0.04|        512|           22|0.7859689593315125|
|ttxj1|          2|Questions about E...| 13.6953125|   5|0.95|   0.04|        512|           22|0.7753408551216125|
|ttxj1|          3|Questions about E...| 5.36328125|   5|0.95|   0.04|        512|           22| 0.793346107006073|
|ttxj1|          4|Questions about E...|    3.15625|   5|0.95|   0.04|        512|           22|0.7888255715370178|
|ttxj1|          5|Questions about E...|3.314453125|   5|0.95|   0.04|        512|           22|0.7678654789924622|
|ttxj1|          6|Questions about E...|  7.8359375|   5|0.95|   0.04|  

+------+-----------+--------------------+----------+----+----+-------+-----------+-------------+------------------+
|  q_id|sampling_id|              answer|perplexity|topk|topp|dropout|text_length|experiment_id|      bertscore_f1|
+------+-----------+--------------------+----------+----+----+-------+-----------+-------------+------------------+
|25c3gs|          1|Are there any ong...|5.46484375|   5|0.95|   0.04|        512|           22|0.8126732110977173|
|25c3gs|          2|Are there any ong...|  4.296875|   5|0.95|   0.04|        512|           22|0.8023010492324829|
|25c3gs|          3|Are there any ong...|4.80859375|   5|0.95|   0.04|        512|           22|0.8090031147003174|
|25c3gs|          4|Are there any ong...|5.40234375|   5|0.95|   0.04|        512|           22|0.8082781434059143|
|25c3gs|          5|Are there any ong...|3.41015625|   5|0.95|   0.04|        512|           22|0.7961015701293945|
|25c3gs|          6|Are there any ong...|5.75390625|   5|0.95|   0.04|  

+------+-------+-----+--------------------+-------------+
|  q_id|   a_id|score|         true_answer|experiment_id|
+------+-------+-----+--------------------+-------------+
|13j48m|c7dldt7|    3|I wouldn't go so ...|           22|
|13j48m|c7elncg|    2|Check out the Enn...|           22|
|13j48m|c74pzj4|    2|Sorry OP, I don't...|           22|
+------+-------+-----+--------------------+-------------+

+------+-----------+--------------------+-----------+----+----+-------+-----------+-------------+------------------+
|  q_id|sampling_id|              answer| perplexity|topk|topp|dropout|text_length|experiment_id|      bertscore_f1|
+------+-----------+--------------------+-----------+----+----+-------+-----------+-------------+------------------+
|13j48m|          1|Knowing that the ...| 5.34765625|   5|0.95|   0.04|        512|           22|0.8229785561561584|
|13j48m|          2|Knowing that the ...|  6.4140625|   5|0.95|   0.04|        512|           22|0.8204338550567627|
|13j48m|

+------+-------+-----+--------------------+-------------+
|  q_id|   a_id|score|         true_answer|experiment_id|
+------+-------+-----+--------------------+-------------+
|4xds4l|d6eltak|   19|That definition o...|           22|
|4xds4l|d6fg3p0|    3|As /u/Midtek poin...|           22|
+------+-------+-----+--------------------+-------------+

+------+-----------+--------------------+-----------+----+----+-------+-----------+-------------+------------------+
|  q_id|sampling_id|              answer| perplexity|topk|topp|dropout|text_length|experiment_id|      bertscore_f1|
+------+-----------+--------------------+-----------+----+----+-------+-----------+-------------+------------------+
|4xds4l|          1|Can the barycente...|3.615234375|   5|0.95|   0.04|        512|           22|0.8069972991943359|
|4xds4l|          2|Can the barycente...|3.931640625|   5|0.95|   0.04|        512|           22|0.8006327748298645|
|4xds4l|          3|Can the barycente...|  4.5390625|   5|0.95|   

+------+-------+-----+--------------------+-------------+
|  q_id|   a_id|score|         true_answer|experiment_id|
+------+-------+-----+--------------------+-------------+
|1d59rl|c9n0eda|   34|[Observe closely ...|           22|
|1d59rl|c9n0ekl|   22|Conservation of a...|           22|
|1d59rl|c9n0iib|    5|Master in Physic ...|           22|
+------+-------+-----+--------------------+-------------+

+------+-----------+--------------------+-----------+----+----+-------+-----------+-------------+------------------+
|  q_id|sampling_id|              answer| perplexity|topk|topp|dropout|text_length|experiment_id|      bertscore_f1|
+------+-----------+--------------------+-----------+----+----+-------+-----------+-------------+------------------+
|1d59rl|          1|How is a pulsar m...|3.759765625|   5|0.95|   0.04|        512|           22|0.7816426157951355|
|1d59rl|          2|How is a pulsar m...| 4.66796875|   5|0.95|   0.04|        512|           22|0.7892708778381348|
|1d59rl|

+-----+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+------------------+
| q_id|experiment_id|experiment_date|            question|        model_answer|perplexity|topk|topp|dropout|text_length|                f1|         precision|            recall|
+-----+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+------------------+
|s13lz|           22|     2024-01-13|Where do the laws...|[Where do the law...| 3.5078125|   5|0.95|   0.04|        512|0.8069979548454285|0.7716290354728699|0.8457651138305664|
+-----+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+------------------+

+-----+-------+-----+--------------------+-------------+
| q_id|   a_id|score|         true_answer|experiment

+------+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+------------------+
|  q_id|experiment_id|experiment_date|            question|        model_answer|perplexity|topk|topp|dropout|text_length|                f1|         precision|            recall|
+------+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+------------------+
|7ez32n|           22|     2024-01-13|How do we know wh...|[How do we know w...| 6.2265625|   5|0.95|   0.04|        512|0.8371094465255737|0.8402694463729858|0.8339730501174927|
+------+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+------------------+

+------+-------+-----+--------------------+-------------+
|  q_id|   a_id|score|         true_answer|exp

+------+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+------------------+
|  q_id|experiment_id|experiment_date|            question|        model_answer|perplexity|topk|topp|dropout|text_length|                f1|         precision|            recall|
+------+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+------------------+
|41x64v|           22|     2024-01-13|What is the curre...|[What is the curr...| 4.6953125|   5|0.95|   0.04|        512|0.8185563683509827|0.8248023986816406|0.8124042749404907|
+------+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+------------------+

+------+-------+-----+--------------------+-------------+
|  q_id|   a_id|score|         true_answer|exp

+------+-------------+---------------+--------------------+--------------------+-----------+----+----+-------+-----------+------------------+------------------+------------------+
|  q_id|experiment_id|experiment_date|            question|        model_answer| perplexity|topk|topp|dropout|text_length|                f1|         precision|            recall|
+------+-------------+---------------+--------------------+--------------------+-----------+----+----+-------+-----------+------------------+------------------+------------------+
|2oy6h7|           22|     2024-01-13|How are there sti...|[How are there st...|3.544921875|   5|0.95|   0.04|        512|0.8037664294242859|0.8174750804901123|0.7905099391937256|
+------+-------------+---------------+--------------------+--------------------+-----------+----+----+-------+-----------+------------------+------------------+------------------+

+------+-------+-----+--------------------+-------------+
|  q_id|   a_id|score|         true_answe

+------+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+-----------------+------------------+------------------+
|  q_id|experiment_id|experiment_date|            question|        model_answer|perplexity|topk|topp|dropout|text_length|               f1|         precision|            recall|
+------+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+-----------------+------------------+------------------+
|4lvxop|           22|     2024-01-13|If you're stuck i...|[If you're stuck ...|4.51953125|   5|0.95|   0.04|        512|0.825114905834198|0.8372371792793274|0.8133386969566345|
+------+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+-----------------+------------------+------------------+

+------+-------+-----+--------------------+-------------+
|  q_id|   a_id|score|         true_answer|experime

+------+-------------+---------------+--------------------+--------------------+-----------+----+----+-------+-----------+------------------+------------------+------------------+
|  q_id|experiment_id|experiment_date|            question|        model_answer| perplexity|topk|topp|dropout|text_length|                f1|         precision|            recall|
+------+-------------+---------------+--------------------+--------------------+-----------+----+----+-------+-----------+------------------+------------------+------------------+
|1l0lt3|           22|     2024-01-14|Why do we use Cae...|[Why do we use Ca...|3.935546875|   5|0.95|   0.04|        512|0.8153501152992249|0.8093662261962891|0.8214231729507446|
+------+-------------+---------------+--------------------+--------------------+-----------+----+----+-------+-----------+------------------+------------------+------------------+

+------+-------+-----+--------------------+-------------+
|  q_id|   a_id|score|         true_answe

+-----+-------------+---------------+--------------------+--------------------+-----------+----+----+-------+-----------+------------------+-----------------+------------------+
| q_id|experiment_id|experiment_date|            question|        model_answer| perplexity|topk|topp|dropout|text_length|                f1|        precision|            recall|
+-----+-------------+---------------+--------------------+--------------------+-----------+----+----+-------+-----------+------------------+-----------------+------------------+
|tjyhn|           22|     2024-01-14|Are there studies...|[Are there studie...|3.373046875|   5|0.95|   0.04|        512|0.8040626049041748|0.791022539138794|0.8175398111343384|
+-----+-------------+---------------+--------------------+--------------------+-----------+----+----+-------+-----------+------------------+-----------------+------------------+

+-----+-------+-----+--------------------+-------------+
| q_id|   a_id|score|         true_answer|experiment

+------+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+------------------+
|  q_id|experiment_id|experiment_date|            question|        model_answer|perplexity|topk|topp|dropout|text_length|                f1|         precision|            recall|
+------+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+------------------+
|45z7k7|           22|     2024-01-14|How does this jpg...|[How does this jp...|   7.15625|   5|0.95|   0.04|        512|0.8273161053657532|0.8329873085021973|0.8217216730117798|
+------+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+------------------+

+------+-------+-----+--------------------+-------------+
|  q_id|   a_id|score|         true_answer|exp

+-----+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+------------------+
| q_id|experiment_id|experiment_date|            question|        model_answer|perplexity|topk|topp|dropout|text_length|                f1|         precision|            recall|
+-----+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+------------------+
|iqdo4|           22|     2024-01-14|Which is more "gr...|[Which is more "g...|   3.53125|   5|0.95|   0.04|        512|0.7965657711029053|0.7568868398666382|0.8406350612640381|
+-----+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+------------------+

+-----+-------+-----+--------------------+-------------+
| q_id|   a_id|score|         true_answer|experiment

+-----+-------------+---------------+--------------------+--------------------+-----------+----+----+-------+-----------+------------------+------------------+------------------+
| q_id|experiment_id|experiment_date|            question|        model_answer| perplexity|topk|topp|dropout|text_length|                f1|         precision|            recall|
+-----+-------------+---------------+--------------------+--------------------+-----------+----+----+-------+-----------+------------------+------------------+------------------+
|y1f2x|           22|     2024-01-14|In a hypothetical...|[In a hypothetica...|3.517578125|   5|0.95|   0.04|        512|0.8030703663825989|0.8078778386116028|0.7983197569847107|
+-----+-------------+---------------+--------------------+--------------------+-----------+----+----+-------+-----------+------------------+------------------+------------------+

+-----+-------+-----+--------------------+-------------+
| q_id|   a_id|score|         true_answer|exper

+------+-------------+---------------+--------------------+--------------------+-----------+----+----+-------+-----------+------------------+------------------+-----------------+
|  q_id|experiment_id|experiment_date|            question|        model_answer| perplexity|topk|topp|dropout|text_length|                f1|         precision|           recall|
+------+-------------+---------------+--------------------+--------------------+-----------+----+----+-------+-----------+------------------+------------------+-----------------+
|boj86f|           22|     2024-01-14|How do we tell th...|[How do we tell t...|3.662109375|   5|0.95|   0.04|        512|0.7987279295921326|0.8161487579345703|0.782035231590271|
+------+-------------+---------------+--------------------+--------------------+-----------+----+----+-------+-----------+------------------+------------------+-----------------+

+------+-------+-----+--------------------+-------------+
|  q_id|   a_id|score|         true_answer|exp

+------+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+------------------+
|  q_id|experiment_id|experiment_date|            question|        model_answer|perplexity|topk|topp|dropout|text_length|                f1|         precision|            recall|
+------+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+------------------+
|1y24cu|           22|     2024-01-14|Why don't we shie...|[Why don't we shi...|   3.28125|   5|0.95|   0.04|        512|0.8034979701042175|0.7980591654777527|0.8090114593505859|
+------+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+------------------+

+------+-------+-----+--------------------+-------------+
|  q_id|   a_id|score|         true_answer|exp

+------+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+------------------+
|  q_id|experiment_id|experiment_date|            question|        model_answer|perplexity|topk|topp|dropout|text_length|                f1|         precision|            recall|
+------+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+------------------+
|bh1bdv|           22|     2024-01-14|How do we know it...|[How do we know i...|3.30859375|   5|0.95|   0.04|        512|0.8016996383666992|0.7896528840065002|0.8141196370124817|
+------+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+------------------+

+------+-------+-----+--------------------+-------------+
|  q_id|   a_id|score|         true_answer|exp

+------+-------------+---------------+--------------------+--------------------+-----------+----+----+-------+-----------+------------------+------------------+------------------+
|  q_id|experiment_id|experiment_date|            question|        model_answer| perplexity|topk|topp|dropout|text_length|                f1|         precision|            recall|
+------+-------------+---------------+--------------------+--------------------+-----------+----+----+-------+-----------+------------------+------------------+------------------+
|2en0rs|           22|     2024-01-14|How is the landin...|[How is the landi...|3.486328125|   5|0.95|   0.04|        512|0.7933474779129028|0.7705841064453125|0.8174967169761658|
+------+-------------+---------------+--------------------+--------------------+-----------+----+----+-------+-----------+------------------+------------------+------------------+

+------+-------+-----+--------------------+-------------+
|  q_id|   a_id|score|         true_answe

+------+-------------+---------------+--------------------+--------------------+-----------+----+----+-------+-----------+-----------------+------------------+------------------+
|  q_id|experiment_id|experiment_date|            question|        model_answer| perplexity|topk|topp|dropout|text_length|               f1|         precision|            recall|
+------+-------------+---------------+--------------------+--------------------+-----------+----+----+-------+-----------+-----------------+------------------+------------------+
|c2iqoq|           22|     2024-01-14|AskScience AMA Se...|[AskScience AMA S...|3.708984375|   5|0.95|   0.04|        512|0.796703577041626|0.7729617357254028|0.8219501376152039|
+------+-------------+---------------+--------------------+--------------------+-----------+----+----+-------+-----------+-----------------+------------------+------------------+

+------+-------+-----+--------------------+-------------+
|  q_id|   a_id|score|         true_answer|exp

+-----+-----------+--------------------+-----------+----+----+-------+-----------+-------------+------------------+
| q_id|sampling_id|              answer| perplexity|topk|topp|dropout|text_length|experiment_id|      bertscore_f1|
+-----+-----------+--------------------+-----------+----+----+-------+-----------+-------------+------------------+
|moi60|          1|If you mix 1 gall...|   3.234375|   5|0.95|   0.04|        512|           22|0.7984679341316223|
|moi60|          2|If you mix 1 gall...|3.369140625|   5|0.95|   0.04|        512|           22|0.7895218133926392|
|moi60|          3|If you mix 1 gall...|    3.84375|   5|0.95|   0.04|        512|           22|0.8031167387962341|
|moi60|          4|If you mix 1 gall...| 4.44921875|   5|0.95|   0.04|        512|           22|0.8141524195671082|
|moi60|          5|If you mix 1 gall...|3.689453125|   5|0.95|   0.04|        512|           22|0.7995749115943909|
|moi60|          6|If you mix 1 gall...|   4.109375|   5|0.95|   0.04|  

+-----+-------+-----+--------------------+-------------+
| q_id|   a_id|score|         true_answer|experiment_id|
+-----+-------+-----+--------------------+-------------+
|s1u61|c4ago8d|   42|Dilution of absin...|           22|
|s1u61|c4agp5q|   23|Louching. Differe...|           22|
|s1u61|c4ahbdm|   13|Louching is corre...|           22|
|s1u61|c4af1lu|   12|Alcohols are noto...|           22|
|s1u61|c4ah5z0|    2|Try pouring out a...|           22|
+-----+-------+-----+--------------------+-------------+

+-----+-----------+--------------------+-----------+----+----+-------+-----------+-------------+------------------+
| q_id|sampling_id|              answer| perplexity|topk|topp|dropout|text_length|experiment_id|      bertscore_f1|
+-----+-----------+--------------------+-----------+----+----+-------+-----------+-------------+------------------+
|s1u61|          1|Just what exactly...| 6.67578125|   5|0.95|   0.04|        512|           22|0.8211256265640259|
|s1u61|          2|Jus

+------+-------+-----+--------------------+-------------+
|  q_id|   a_id|score|         true_answer|experiment_id|
+------+-------+-----+--------------------+-------------+
|7yey6x|dugp0ki|    5|There is dust eve...|           22|
|7yey6x|dugbt7s|    5|It's easier than ...|           22|
|7yey6x|dughy58|    3|On the Rosetta pr...|           22|
+------+-------+-----+--------------------+-------------+

+------+-----------+--------------------+-----------+----+----+-------+-----------+-------------+------------------+
|  q_id|sampling_id|              answer| perplexity|topk|topp|dropout|text_length|experiment_id|      bertscore_f1|
+------+-----------+--------------------+-----------+----+----+-------+-----------+-------------+------------------+
|7yey6x|          1|How do they catch...| 6.24609375|   5|0.95|   0.04|        512|           22|0.8429498076438904|
|7yey6x|          2|How do they catch...| 4.04296875|   5|0.95|   0.04|        512|           22|0.8322837352752686|
|7yey6x|

+-----+-------+-----+--------------------+-------------+
| q_id|   a_id|score|         true_answer|experiment_id|
+-----+-------+-----+--------------------+-------------+
|kkln5|c2l0aeo|    7|Here's the [curre...|           22|
+-----+-------+-----+--------------------+-------------+

+-----+-----------+--------------------+-----------+----+----+-------+-----------+-------------+------------------+
| q_id|sampling_id|              answer| perplexity|topk|topp|dropout|text_length|experiment_id|      bertscore_f1|
+-----+-----------+--------------------+-----------+----+----+-------+-----------+-------------+------------------+
|kkln5|          1|Went to shoot som...|     4.3125|   5|0.95|   0.04|        512|           22|0.8145556449890137|
|kkln5|          2|Went to shoot som...|  5.5390625|   5|0.95|   0.04|        512|           22|0.8069400191307068|
|kkln5|          3|Went to shoot som...|  5.1484375|   5|0.95|   0.04|        512|           22|0.8148607015609741|
|kkln5|          4

+-----+-------+-----+--------------------+-------------+
| q_id|   a_id|score|         true_answer|experiment_id|
+-----+-------+-----+--------------------+-------------+
|soim6|c4fot30|   42|The neurons in yo...|           22|
|soim6|c4fu70m|   24|We don't really k...|           22|
|soim6|c4fvkd4|    3|As a sub-question...|           22|
+-----+-------+-----+--------------------+-------------+

+-----+-----------+--------------------+-----------+----+----+-------+-----------+-------------+------------------+
| q_id|sampling_id|              answer| perplexity|topk|topp|dropout|text_length|experiment_id|      bertscore_f1|
+-----+-----------+--------------------+-----------+----+----+-------+-----------+-------------+------------------+
|soim6|          1|How does getting ...|3.955078125|   5|0.95|   0.04|        512|           22|0.7795219421386719|
|soim6|          2|How does getting ...| 2.82421875|   5|0.95|   0.04|        512|           22|0.7260559797286987|
|soim6|          3|H

+------+-------+-----+--------------------+-------------+
|  q_id|   a_id|score|         true_answer|experiment_id|
+------+-------+-----+--------------------+-------------+
|26ytew|chvs6tg|   72|It depends on the...|           22|
|26ytew|chw7xb4|   12|There have been a...|           22|
|26ytew|chvrfrn|    7|General medicatio...|           22|
|26ytew|chvybfh|    4|Most drugs don't ...|           22|
|26ytew|chwar9c|    2|Thats hard to say...|           22|
|26ytew|chvxutr|    2|They don't. It's ...|           22|
+------+-------+-----+--------------------+-------------+

+------+-----------+--------------------+-----------+----+----+-------+-----------+-------------+------------------+
|  q_id|sampling_id|              answer| perplexity|topk|topp|dropout|text_length|experiment_id|      bertscore_f1|
+------+-----------+--------------------+-----------+----+----+-------+-----------+-------------+------------------+
|26ytew|          1|How do medication...|3.658203125|   5|0.95|   0.

+------+-------+-----+--------------------+-------------+
|  q_id|   a_id|score|         true_answer|experiment_id|
+------+-------+-----+--------------------+-------------+
|11gsi1|c6mea7t|   98|The tensile stren...|           22|
|11gsi1|c6mdlhv|   18|Also another ques...|           22|
|11gsi1|c6mgl58|   14|An important thin...|           22|
|11gsi1|c6mdpy5|    6|Another question:...|           22|
|11gsi1|c6mpb4k|    5|metric conversion...|           22|
|11gsi1|c6mlzop|    4|Not really an ans...|           22|
|11gsi1|c6mdetd|    4|They state somewh...|           22|
|11gsi1|c6mfi15|    3|How does the ball...|           22|
+------+-------+-----+--------------------+-------------+

+------+-----------+--------------------+----------+----+----+-------+-----------+-------------+------------------+
|  q_id|sampling_id|              answer|perplexity|topk|topp|dropout|text_length|experiment_id|      bertscore_f1|
+------+-----------+--------------------+----------+----+----+-------+-

+------+-------+-----+--------------------+-------------+
|  q_id|   a_id|score|         true_answer|experiment_id|
+------+-------+-----+--------------------+-------------+
|1eab0n|c9yfcy4|    2|This seems to me ...|           22|
+------+-------+-----+--------------------+-------------+

+------+-----------+--------------------+----------+----+----+-------+-----------+-------------+------------------+
|  q_id|sampling_id|              answer|perplexity|topk|topp|dropout|text_length|experiment_id|      bertscore_f1|
+------+-----------+--------------------+----------+----+----+-------+-----------+-------------+------------------+
|1eab0n|          1|[Econ] Would trac...|  6.609375|   5|0.95|   0.04|        512|           22|0.8305953145027161|
|1eab0n|          2|[Econ] Would trac...|4.83984375|   5|0.95|   0.04|        512|           22|0.8080331683158875|
|1eab0n|          3|[Econ] Would trac...|3.87109375|   5|0.95|   0.04|        512|           22|0.7980952262878418|
|1eab0n|     

+-----+-------+-----+--------------------+-------------+
| q_id|   a_id|score|         true_answer|experiment_id|
+-----+-------+-----+--------------------+-------------+
|nqsmp|c3b98ie|   33|As elchip stated,...|           22|
|nqsmp|c3b85x6|    8|One of the nutrit...|           22|
|nqsmp|c3b7lfx|    7|frying them of co...|           22|
|nqsmp|c3b90i7|    6|He ate raw eggs b...|           22|
|nqsmp|c3b79fy|    5|Apparently, there...|           22|
|nqsmp|c3bam7f|    5|I don't think tha...|           22|
+-----+-------+-----+--------------------+-------------+

+-----+-----------+--------------------+----------+----+----+-------+-----------+-------------+------------------+
| q_id|sampling_id|              answer|perplexity|topk|topp|dropout|text_length|experiment_id|      bertscore_f1|
+-----+-----------+--------------------+----------+----+----+-------+-----------+-------------+------------------+
|nqsmp|          1|So i was watching...|     6.625|   5|0.95|   0.04|        512|   

+-----+-------+-----+--------------------+-------------+
| q_id|   a_id|score|         true_answer|experiment_id|
+-----+-------+-----+--------------------+-------------+
|v5a90|c51gnkl|   10|What is he a prof...|           22|
|v5a90|c51h085|    9|As other commente...|           22|
|v5a90|c51gnii|    6|There is no diffe...|           22|
|v5a90|c51h1no|    4|> Microevolution ...|           22|
|v5a90|c51grc1|    2|Basically, every ...|           22|
+-----+-------+-----+--------------------+-------------+

+-----+-----------+--------------------+----------+----+----+-------+-----------+-------------+------------------+
| q_id|sampling_id|              answer|perplexity|topk|topp|dropout|text_length|experiment_id|      bertscore_f1|
+-----+-----------+--------------------+----------+----+----+-------+-----------+-------------+------------------+
|v5a90|          1|This letter to th...|3.73046875|   5|0.95|   0.04|        512|           22|0.8013885021209717|
|v5a90|          2|This le

+-----+-------+-----+--------------------+-------------+
| q_id|   a_id|score|         true_answer|experiment_id|
+-----+-------+-----+--------------------+-------------+
|ym13k|c5wspc2|    2|Guy with master's...|           22|
+-----+-------+-----+--------------------+-------------+

+-----+-----------+--------------------+----------+----+----+-------+-----------+-------------+------------------+
| q_id|sampling_id|              answer|perplexity|topk|topp|dropout|text_length|experiment_id|      bertscore_f1|
+-----+-----------+--------------------+----------+----+----+-------+-----------+-------------+------------------+
|ym13k|          1|Field Biologists/...|   5.21875|   5|0.95|   0.04|        512|           22|0.8045538067817688|
|ym13k|          2|Field Biologists/...|5.77734375|   5|0.95|   0.04|        512|           22|0.8067424893379211|
|ym13k|          3|Field Biologists/...| 5.0234375|   5|0.95|   0.04|        512|           22|0.7906307578086853|
|ym13k|          4|Field

+------+-------+-----+--------------------+-------------+
|  q_id|   a_id|score|         true_answer|experiment_id|
+------+-------+-----+--------------------+-------------+
|18sdkr|c8hjsdm|    6|The shadow is bei...|           22|
+------+-------+-----+--------------------+-------------+

+------+-----------+--------------------+----------+----+----+-------+-----------+-------------+------------------+
|  q_id|sampling_id|              answer|perplexity|topk|topp|dropout|text_length|experiment_id|      bertscore_f1|
+------+-----------+--------------------+----------+----+----+-------+-----------+-------------+------------------+
|18sdkr|          1|Askscience, can y...|   4.40625|   5|0.95|   0.04|        512|           22|0.7955167889595032|
|18sdkr|          2|Askscience, can y...|  5.171875|   5|0.95|   0.04|        512|           22|   0.8078573346138|
|18sdkr|          3|Askscience, can y...|  3.171875|   5|0.95|   0.04|        512|           22|0.7836781144142151|
|18sdkr|     

+-----+-------+-----+--------------------+-------------+
| q_id|   a_id|score|         true_answer|experiment_id|
+-----+-------+-----+--------------------+-------------+
|j2qo9|c28nins|   25|The central idea ...|           22|
|j2qo9|c28nknt|   16|If you want to ac...|           22|
|j2qo9|c28njt4|   15|There isn't *real...|           22|
|j2qo9|c28nnzl|    8|The major hurdle ...|           22|
|j2qo9|c28p176|    6|There seem to be ...|           22|
|j2qo9|c28nl8r|    6|Can you explain w...|           22|
|j2qo9|c28pjh1|    3|Explaining quantu...|           22|
|j2qo9|c28o9qq|    2|If you want to ge...|           22|
+-----+-------+-----+--------------------+-------------+

+-----+-----------+--------------------+-----------+----+----+-------+-----------+-------------+------------------+
| q_id|sampling_id|              answer| perplexity|topk|topp|dropout|text_length|experiment_id|      bertscore_f1|
+-----+-----------+--------------------+-----------+----+----+-------+-----------+-

+-----+-------+-----+--------------------+-------------+
| q_id|   a_id|score|         true_answer|experiment_id|
+-----+-------+-----+--------------------+-------------+
|tnmay|c4o5g3z|    4|The condition is ...|           22|
+-----+-------+-----+--------------------+-------------+

+-----+-----------+--------------------+-----------+----+----+-------+-----------+-------------+------------------+
| q_id|sampling_id|              answer| perplexity|topk|topp|dropout|text_length|experiment_id|      bertscore_f1|
+-----+-----------+--------------------+-----------+----+----+-------+-----------+-------------+------------------+
|tnmay|          1|saw this picture ...|2.873046875|   5|0.95|   0.04|        512|           22|0.7584500312805176|
|tnmay|          2|saw this picture ...|   9.359375|   5|0.95|   0.04|        512|           22|0.7931640148162842|
|tnmay|          3|saw this picture ...| 4.81640625|   5|0.95|   0.04|        512|           22|0.7597366571426392|
|tnmay|          4

+------+-------+-----+--------------------+-------------+
|  q_id|   a_id|score|         true_answer|experiment_id|
+------+-------+-----+--------------------+-------------+
|3hfq33|cu6zx7h|   16|My question is a ...|           22|
|3hfq33|cu6ybku|   12|What technology w...|           22|
|3hfq33|cu6yc6i|   10|If you were a bet...|           22|
|3hfq33|cu71916|    6|Hi pfisico! What ...|           22|
|3hfq33|cu75xqv|    6|When you build yo...|           22|
|3hfq33|cu6zdxm|    5|What's your take ...|           22|
|3hfq33|cu6zjpz|    5|Do you work with ...|           22|
|3hfq33|cu79cfh|    2|Are you hiring he...|           22|
|3hfq33|cu72uou|    2|As a high school ...|           22|
|3hfq33|cu74m86|    2|Question on Antar...|           22|
|3hfq33|cu7bfh8|    2|Been trying to ge...|           22|
|3hfq33|cu75lmq|    2|Hi there. I have ...|           22|
|3hfq33|cu7evc6|    2|Oh man, I know so...|           22|
|3hfq33|cu72u60|    2|People often scof...|           22|
|3hfq33|cu72oh

+------+-------------+---------------+--------------------+--------------------+-----------+----+----+-------+-----------+------------------+------------------+-----------------+
|  q_id|experiment_id|experiment_date|            question|        model_answer| perplexity|topk|topp|dropout|text_length|                f1|         precision|           recall|
+------+-------------+---------------+--------------------+--------------------+-----------+----+----+-------+-----------+------------------+------------------+-----------------+
|2mrx6b|           22|     2024-01-15|How does an equal...|[How does an equa...|3.029296875|   5|0.95|   0.04|        512|0.8073487877845764|0.8082796335220337|0.806420087814331|
+------+-------------+---------------+--------------------+--------------------+-----------+----+----+-------+-----------+------------------+------------------+-----------------+

+------+-------+-----+--------------------+-------------+
|  q_id|   a_id|score|         true_answer|exp

+------+-----------+--------------------+-----------+----+----+-------+-----------+-------------+------------------+
|  q_id|sampling_id|              answer| perplexity|topk|topp|dropout|text_length|experiment_id|      bertscore_f1|
+------+-----------+--------------------+-----------+----+----+-------+-----------+-------------+------------------+
|20ercb|          1|FAQ Friday: Pi Da...| 4.81640625|   5|0.95|   0.04|        512|           22|0.7998464107513428|
|20ercb|          2|FAQ Friday: Pi Da...| 3.80078125|   5|0.95|   0.04|        512|           22|0.7985919713973999|
|20ercb|          3|FAQ Friday: Pi Da...|  4.6328125|   5|0.95|   0.04|        512|           22|0.7835824489593506|
|20ercb|          4|FAQ Friday: Pi Da...| 4.27734375|   5|0.95|   0.04|        512|           22|0.8196255564689636|
|20ercb|          5|FAQ Friday: Pi Da...| 3.09765625|   5|0.95|   0.04|        512|           22|0.7843578457832336|
|20ercb|          6|FAQ Friday: Pi Da...|3.833984375|   5|0.95| 

+------+-------+-----+--------------------+-------------+
|  q_id|   a_id|score|         true_answer|experiment_id|
+------+-------+-----+--------------------+-------------+
|1x2qsv|cf7mdrj|   10|Hey everyone :) \...|           22|
|1x2qsv|cf7my09|    9|What are the big ...|           22|
|1x2qsv|cf7rubo|    7|Every time I see ...|           22|
|1x2qsv|cf7pxgl|    7|Hi, I have always...|           22|
|1x2qsv|cf7p5s0|    6|What exactly is a...|           22|
|1x2qsv|cf7ojxu|    6|Edit: If volume i...|           22|
|1x2qsv|cf7to86|    6|Why is Calculus i...|           22|
|1x2qsv|cf7nboi|    5|Can someone give ...|           22|
|1x2qsv|cf7mhkz|    5|I'm not sure if t...|           22|
|1x2qsv|cf7nhko|    5|If everything was...|           22|
|1x2qsv|cf7m2go|    5|What exactly is i...|           22|
|1x2qsv|cf7r4yy|    4|Specific ChemE qu...|           22|
|1x2qsv|cf7odfy|    4|Mathematics quest...|           22|
|1x2qsv|cf7qr3j|    4|With unlimited re...|           22|
|1x2qsv|cf7mka

+------+-----------+--------------------+-----------+----+----+-------+-----------+-------------+------------------+
|  q_id|sampling_id|              answer| perplexity|topk|topp|dropout|text_length|experiment_id|      bertscore_f1|
+------+-----------+--------------------+-----------+----+----+-------+-----------+-------------+------------------+
|bw9o1n|          1|How / Is domestic...| 6.87890625|   5|0.95|   0.04|        512|           22|0.8280754089355469|
|bw9o1n|          2|How / Is domestic...|3.130859375|   5|0.95|   0.04|        512|           22|0.8097304105758667|
|bw9o1n|          3|How / Is domestic...|  8.3203125|   5|0.95|   0.04|        512|           22|0.8299633264541626|
|bw9o1n|          4|How / Is domestic...|   5.484375|   5|0.95|   0.04|        512|           22| 0.810135543346405|
|bw9o1n|          5|How / Is domestic...| 6.01953125|   5|0.95|   0.04|        512|           22|0.8100488185882568|
|bw9o1n|          6|How / Is domestic...|  7.8828125|   5|0.95| 

+------+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+------------------+
|  q_id|experiment_id|experiment_date|            question|        model_answer|perplexity|topk|topp|dropout|text_length|                f1|         precision|            recall|
+------+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+------------------+
|15jvz2|           22|     2024-01-15|Is this one scene...|[Is this one scen...|     4.375|   5|0.95|   0.04|        512|0.7891635894775391|0.7667708992958069|0.8129034042358398|
+------+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+------------------+

+------+-------+-----+--------------------+-------------+
|  q_id|   a_id|score|         true_answer|exp

+-----+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+------------------+
| q_id|experiment_id|experiment_date|            question|        model_answer|perplexity|topk|topp|dropout|text_length|                f1|         precision|            recall|
+-----+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+------------------+
|tklkm|           22|     2024-01-15|Is there an entym...|[Is there an enty...|5.26171875|   5|0.95|   0.04|        512|0.8089430928230286|0.8039867877960205|0.8139609098434448|
+-----+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+------------------+

+-----+-------+-----+--------------------+-------------+
| q_id|   a_id|score|         true_answer|experiment

+-----+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+-----------------+
| q_id|experiment_id|experiment_date|            question|        model_answer|perplexity|topk|topp|dropout|text_length|                f1|         precision|           recall|
+-----+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+-----------------+
|xvtwz|           22|     2024-01-15|What is (or can) ...|[What is (or can)...|4.69140625|   5|0.95|   0.04|        512|0.8049991726875305|0.8037896156311035|0.806212306022644|
+-----+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+-----------------+

+-----+-------+-----+--------------------+-------------+
| q_id|   a_id|score|         true_answer|experiment_id|


+------+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+-----------------+
|  q_id|experiment_id|experiment_date|            question|        model_answer|perplexity|topk|topp|dropout|text_length|                f1|         precision|           recall|
+------+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+-----------------+
|1oi2rx|           22|     2024-01-15|Question about He...|[Question about H...|2.29296875|   5|0.95|   0.04|        512|0.7595802545547485|0.7483642101287842|0.771137535572052|
+------+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+-----------------+

+------+-------+-----+--------------------+-------------+
|  q_id|   a_id|score|         true_answer|experime

+------+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+-----------------+
|  q_id|experiment_id|experiment_date|            question|        model_answer|perplexity|topk|topp|dropout|text_length|                f1|         precision|           recall|
+------+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+-----------------+
|16hwjc|           22|     2024-01-15|What is the retro...|[What is the retr...|   5.09375|   5|0.95|   0.04|        512|0.8014703392982483|0.7891312837600708|0.814201295375824|
+------+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+-----------------+

+------+-------+-----+--------------------+-------------+
|  q_id|   a_id|score|         true_answer|experime

+-----+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+------------------+
| q_id|experiment_id|experiment_date|            question|        model_answer|perplexity|topk|topp|dropout|text_length|                f1|         precision|            recall|
+-----+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+------------------+
|we3d5|           22|     2024-01-15|I have a lamp tha...|[I have a lamp th...|3.24609375|   5|0.95|   0.04|        512|0.7884899377822876|0.7626361846923828|0.8161580562591553|
+-----+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+------------------+

+-----+-------+-----+--------------------+-------------+
| q_id|   a_id|score|         true_answer|experiment

+------+-----------+--------------------+----------+----+----+-------+-----------+-------------+------------------+
|  q_id|sampling_id|              answer|perplexity|topk|topp|dropout|text_length|experiment_id|      bertscore_f1|
+------+-----------+--------------------+----------+----+----+-------+-----------+-------------+------------------+
|1lp9or|          1|What actually hap...| 4.4140625|   5|0.95|   0.04|        512|           22| 0.793377161026001|
|1lp9or|          2|What actually hap...|4.83203125|   5|0.95|   0.04|        512|           22|0.7950102686882019|
|1lp9or|          3|What actually hap...|4.76953125|   5|0.95|   0.04|        512|           22|0.8011575937271118|
|1lp9or|          4|What actually hap...|3.50390625|   5|0.95|   0.04|        512|           22|0.7678574919700623|
|1lp9or|          5|What actually hap...|  4.703125|   5|0.95|   0.04|        512|           22|0.7938244938850403|
|1lp9or|          6|What actually hap...|4.87109375|   5|0.95|   0.04|  

+------+-------+-----+--------------------+-------------+
|  q_id|   a_id|score|         true_answer|experiment_id|
+------+-------+-----+--------------------+-------------+
|4kzlyh|d3j7l4e|   28|The universe is a...|           22|
|4kzlyh|d3jprh8|    8|Close minded may ...|           22|
|4kzlyh|d3j7lt7|    3|The Goldilocks zo...|           22|
|4kzlyh|d3k54zh|    3|Life is a chemica...|           22|
|4kzlyh|d3jo5ph|    2|It's where life *...|           22|
+------+-------+-----+--------------------+-------------+

+------+-----------+--------------------+----------+----+----+-------+-----------+-------------+------------------+
|  q_id|sampling_id|              answer|perplexity|topk|topp|dropout|text_length|experiment_id|      bertscore_f1|
+------+-----------+--------------------+----------+----+----+-------+-----------+-------------+------------------+
|4kzlyh|          1|When looking for ...|  7.140625|   5|0.95|   0.04|        512|           22| 0.825448751449585|
|4kzlyh|     

+------+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+------------------+
|  q_id|experiment_id|experiment_date|            question|        model_answer|perplexity|topk|topp|dropout|text_length|                f1|         precision|            recall|
+------+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+------------------+
|1820zu|           22|     2024-01-15|Is there a maximu...|[Is there a maxim...|2.02734375|   5|0.95|   0.04|        512|0.7800736427307129|0.7649280428886414|0.7958310842514038|
+------+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+------------------+

+------+-------+-----+--------------------+-------------+
|  q_id|   a_id|score|         true_answer|exp

+------+-------------+---------------+--------------------+--------------------+-----------+----+----+-------+-----------+-----------------+------------------+------------------+
|  q_id|experiment_id|experiment_date|            question|        model_answer| perplexity|topk|topp|dropout|text_length|               f1|         precision|            recall|
+------+-------------+---------------+--------------------+--------------------+-----------+----+----+-------+-----------+-----------------+------------------+------------------+
|1rtwlk|           22|     2024-01-15|How can I find th...|[How can I find t...|3.041015625|   5|0.95|   0.04|        512|0.816680908203125|0.8131568431854248|0.8202357292175293|
+------+-------------+---------------+--------------------+--------------------+-----------+----+----+-------+-----------+-----------------+------------------+------------------+

+------+-------+-----+--------------------+-------------+
|  q_id|   a_id|score|         true_answer|exp

+-----+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+----------------+
| q_id|experiment_id|experiment_date|            question|        model_answer|perplexity|topk|topp|dropout|text_length|                f1|         precision|          recall|
+-----+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+----------------+
|ktpn9|           22|     2024-01-15|Not sure where to...|[Not sure where t...|3.23828125|   5|0.95|   0.04|        512|0.7952874898910522|0.7549537420272827|0.84017413854599|
+-----+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+----------------+

+-----+-------+-----+--------------------+-------------+
| q_id|   a_id|score|         true_answer|experiment_id|
+----

+-----+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+------------------+
| q_id|experiment_id|experiment_date|            question|        model_answer|perplexity|topk|topp|dropout|text_length|                f1|         precision|            recall|
+-----+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+------------------+
|lgcbr|           22|     2024-01-15|How does current ...|[How does current...|3.47265625|   5|0.95|   0.04|        512|0.7989594340324402|0.8033317923545837|0.7946344017982483|
+-----+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+------------------+

+-----+-------+-----+--------------------+-------------+
| q_id|   a_id|score|         true_answer|experiment

+------+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+-----------------+------------------+
|  q_id|experiment_id|experiment_date|            question|        model_answer|perplexity|topk|topp|dropout|text_length|                f1|        precision|            recall|
+------+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+-----------------+------------------+
|2exq0t|           22|     2024-01-15|I am learning abo...|[I am learning ab...| 5.9140625|   5|0.95|   0.04|        512|0.8213942050933838|0.827143669128418|0.8157241344451904|
+------+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+-----------------+------------------+

+------+-------+-----+--------------------+-------------+
|  q_id|   a_id|score|         true_answer|experime

+------+-------------+---------------+--------------------+--------------------+-----------+----+----+-------+-----------+------------------+------------------+------------------+
|  q_id|experiment_id|experiment_date|            question|        model_answer| perplexity|topk|topp|dropout|text_length|                f1|         precision|            recall|
+------+-------------+---------------+--------------------+--------------------+-----------+----+----+-------+-----------+------------------+------------------+------------------+
|2v1iqu|           22|     2024-01-15|If I'm travelling...|[If I'm travellin...|2.728515625|   5|0.95|   0.04|        512|0.8161048889160156|0.7836393713951111|0.8513766527175903|
+------+-------------+---------------+--------------------+--------------------+-----------+----+----+-------+-----------+------------------+------------------+------------------+

+------+-------+-----+--------------------+-------------+
|  q_id|   a_id|score|         true_answe

+------+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+-----------------+
|  q_id|experiment_id|experiment_date|            question|        model_answer|perplexity|topk|topp|dropout|text_length|                f1|         precision|           recall|
+------+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+-----------------+
|17ysps|           22|     2024-01-16|Settle a planet s...|[Settle a planet ...|  2.859375|   5|0.95|   0.04|        512|0.7948578596115112|0.7787840366363525|0.811609148979187|
+------+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+-----------------+

+------+-------+-----+--------------------+-------------+
|  q_id|   a_id|score|         true_answer|experime

+-----+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+-----------------+------------------+------------------+
| q_id|experiment_id|experiment_date|            question|        model_answer|perplexity|topk|topp|dropout|text_length|               f1|         precision|            recall|
+-----+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+-----------------+------------------+------------------+
|mtfms|           22|     2024-01-16|School "tradition...|[School "traditio...|  5.578125|   5|0.95|   0.04|        512|0.804888904094696|0.7888216972351074|0.8216242790222168|
+-----+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+-----------------+------------------+------------------+

+-----+-------+-----+--------------------+-------------+
| q_id|   a_id|score|         true_answer|experiment_id|


+------+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+-----------------+------------------+-----------------+
|  q_id|experiment_id|experiment_date|            question|        model_answer|perplexity|topk|topp|dropout|text_length|               f1|         precision|           recall|
+------+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+-----------------+------------------+-----------------+
|142uer|           22|     2024-01-16|So my mom was dia...|[So my mom was di...|4.58984375|   5|0.95|   0.04|        512|0.775231122970581|0.7691047191619873|0.781455934047699|
+------+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+-----------------+------------------+-----------------+

+------+-------+-----+--------------------+-------------+
|  q_id|   a_id|score|         true_answer|experiment_id

+-----+-------------+---------------+--------------------+--------------------+-----------+----+----+-------+-----------+------------------+-----------------+------------------+
| q_id|experiment_id|experiment_date|            question|        model_answer| perplexity|topk|topp|dropout|text_length|                f1|        precision|            recall|
+-----+-------------+---------------+--------------------+--------------------+-----------+----+----+-------+-----------+------------------+-----------------+------------------+
|w74y9|           22|     2024-01-16|What animal does ...|[What animal does...|2.806640625|   5|0.95|   0.04|        512|0.7844117879867554|0.757047176361084|0.8138288259506226|
+-----+-------------+---------------+--------------------+--------------------+-----------+----+----+-------+-----------+------------------+-----------------+------------------+

+-----+-------+-----+--------------------+-------------+
| q_id|   a_id|score|         true_answer|experiment

+-----+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+-----------------+------------------+
| q_id|experiment_id|experiment_date|            question|        model_answer|perplexity|topk|topp|dropout|text_length|                f1|        precision|            recall|
+-----+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+-----------------+------------------+
|rxa0y|           22|     2024-01-16|If I hung a tank ...|[If I hung a tank...|  4.765625|   5|0.95|   0.04|        512|0.8265182375907898|0.837007999420166|0.8162881135940552|
+-----+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+-----------------+------------------+

+-----+-------+-----+--------------------+-------------+
| q_id|   a_id|score|         true_answer|experiment_id|


+------+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+------------------+
|  q_id|experiment_id|experiment_date|            question|        model_answer|perplexity|topk|topp|dropout|text_length|                f1|         precision|            recall|
+------+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+------------------+
|5jlwcj|           22|     2024-01-16|Is there such a t...|[Is there such a ...|4.03515625|   5|0.95|   0.04|        512|0.8331998586654663|0.8144122362136841|0.8528748154640198|
+------+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+------------------+

+------+-------+-----+--------------------+-------------+
|  q_id|   a_id|score|         true_answer|exp

+------+-------------+---------------+--------------------+--------------------+-----------+----+----+-------+-----------+------------------+------------------+------------------+
|  q_id|experiment_id|experiment_date|            question|        model_answer| perplexity|topk|topp|dropout|text_length|                f1|         precision|            recall|
+------+-------------+---------------+--------------------+--------------------+-----------+----+----+-------+-----------+------------------+------------------+------------------+
|1mj7uc|           22|     2024-01-16|Is natural select...|[Is natural selec...|3.080078125|   5|0.95|   0.04|        512|0.7922447919845581|0.7959842681884766|0.7885403633117676|
+------+-------------+---------------+--------------------+--------------------+-----------+----+----+-------+-----------+------------------+------------------+------------------+

+------+-------+-----+--------------------+-------------+
|  q_id|   a_id|score|         true_answe

+------+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+------------------+
|  q_id|experiment_id|experiment_date|            question|        model_answer|perplexity|topk|topp|dropout|text_length|                f1|         precision|            recall|
+------+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+------------------+
|1ff1sd|           22|     2024-01-16|How are new surgi...|[How are new surg...|4.58984375|   5|0.95|   0.04|        512|0.8101515173912048|0.8074473738670349|0.8128737211227417|
+------+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+------------------+

+------+-------+-----+--------------------+-------------+
|  q_id|   a_id|score|         true_answer|exp

+-----+-----------+--------------------+-----------+----+----+-------+-----------+-------------+------------------+
| q_id|sampling_id|              answer| perplexity|topk|topp|dropout|text_length|experiment_id|      bertscore_f1|
+-----+-----------+--------------------+-----------+----+----+-------+-----------+-------------+------------------+
|vx8dp|          1|I saw the sky nor...|  7.3515625|   5|0.95|   0.04|        512|           22|0.8244578242301941|
|vx8dp|          2|I saw the sky nor...|        6.0|   5|0.95|   0.04|        512|           22|0.8009543418884277|
|vx8dp|          3|I saw the sky nor...| 5.08984375|   5|0.95|   0.04|        512|           22|0.7955801486968994|
|vx8dp|          4|I saw the sky nor...|  8.9140625|   5|0.95|   0.04|        512|           22|0.8241574168205261|
|vx8dp|          5|I saw the sky nor...|3.658203125|   5|0.95|   0.04|        512|           22|0.7972234487533569|
|vx8dp|          6|I saw the sky nor...| 4.39453125|   5|0.95|   0.04|  

+-----+-------+-----+--------------------+-------------+
| q_id|   a_id|score|         true_answer|experiment_id|
+-----+-------+-----+--------------------+-------------+
|r4a0f|c42vkg3|    2|A great Dane and ...|           22|
+-----+-------+-----+--------------------+-------------+

+-----+-----------+--------------------+----------+----+----+-------+-----------+-------------+------------------+
| q_id|sampling_id|              answer|perplexity|topk|topp|dropout|text_length|experiment_id|      bertscore_f1|
+-----+-----------+--------------------+----------+----+----+-------+-----------+-------------+------------------+
|r4a0f|          1|Have we yet bred ...|  5.484375|   5|0.95|   0.04|        512|           22|0.7970539331436157|
|r4a0f|          2|Have we yet bred ...|4.81640625|   5|0.95|   0.04|        512|           22|0.8019782900810242|
|r4a0f|          3|Have we yet bred ...|5.40234375|   5|0.95|   0.04|        512|           22|0.8007242679595947|
|r4a0f|          4|Have 

+------+-------------+---------------+--------------------+--------------------+-----------+----+----+-------+-----------+------------------+------------------+------------------+
|  q_id|experiment_id|experiment_date|            question|        model_answer| perplexity|topk|topp|dropout|text_length|                f1|         precision|            recall|
+------+-------------+---------------+--------------------+--------------------+-----------+----+----+-------+-----------+------------------+------------------+------------------+
|16reh2|           22|     2024-01-16|Since electricity...|[Since electricit...|3.357421875|   5|0.95|   0.04|        512|0.8085730671882629|0.7930305004119873|0.8247370719909668|
+------+-------------+---------------+--------------------+--------------------+-----------+----+----+-------+-----------+------------------+------------------+------------------+

+------+-------+-----+--------------------+-------------+
|  q_id|   a_id|score|         true_answe

+-----+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+------------------+
| q_id|experiment_id|experiment_date|            question|        model_answer|perplexity|topk|topp|dropout|text_length|                f1|         precision|            recall|
+-----+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+------------------+
|le5ps|           22|     2024-01-16|I've been out of ...|[I've been out of...| 6.5859375|   5|0.95|   0.04|        512|0.8337713479995728|0.8238707184791565|0.8439128398895264|
+-----+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+------------------+

+-----+-------+-----+--------------------+-------------+
| q_id|   a_id|score|         true_answer|experiment

+------+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+------------------+
|  q_id|experiment_id|experiment_date|            question|        model_answer|perplexity|topk|topp|dropout|text_length|                f1|         precision|            recall|
+------+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+------------------+
|11gprd|           22|     2024-01-16|I've read that ex...|[I've read that e...| 8.1953125|   5|0.95|   0.04|        512|0.8350102305412292|0.8373740911483765|0.8326597213745117|
+------+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+------------------+

+------+-------+-----+--------------------+-------------+
|  q_id|   a_id|score|         true_answer|exp

+------+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+------------------+
|  q_id|experiment_id|experiment_date|            question|        model_answer|perplexity|topk|topp|dropout|text_length|                f1|         precision|            recall|
+------+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+------------------+
|buz4j9|           22|     2024-01-16|Red sprites what ...|[Red sprites what...| 2.7109375|   5|0.95|   0.04|        512|0.7977606654167175|0.7999634146690369|0.7955701351165771|
+------+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+------------------+

+------+-------+-----+--------------------+-------------+
|  q_id|   a_id|score|         true_answer|exp

+-----+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+-----------------+------------------+
| q_id|experiment_id|experiment_date|            question|        model_answer|perplexity|topk|topp|dropout|text_length|                f1|        precision|            recall|
+-----+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+-----------------+------------------+
|nv8ed|           22|     2024-01-16|Does breathing th...|[Does breathing t...| 4.9765625|   5|0.95|   0.04|        512|0.8181844353675842|0.808514416217804|0.8280885815620422|
+-----+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+-----------------+------------------+

+-----+-------+-----+--------------------+-------------+
| q_id|   a_id|score|         true_answer|experiment_id|


+------+-----------+--------------------+-----------+----+----+-------+-----------+-------------+------------------+
|  q_id|sampling_id|              answer| perplexity|topk|topp|dropout|text_length|experiment_id|      bertscore_f1|
+------+-----------+--------------------+-----------+----+----+-------+-----------+-------------+------------------+
|2xl0x9|          1|Why are the 3 mos...| 4.96484375|   5|0.95|   0.04|        512|           22|0.8159142136573792|
|2xl0x9|          2|Why are the 3 mos...|     5.0625|   5|0.95|   0.04|        512|           22|0.8128487467765808|
|2xl0x9|          3|Why are the 3 mos...| 4.33203125|   5|0.95|   0.04|        512|           22|0.8080530762672424|
|2xl0x9|          4|Why are the 3 mos...|  5.0234375|   5|0.95|   0.04|        512|           22|0.7970060706138611|
|2xl0x9|          5|Why are the 3 mos...| 4.53515625|   5|0.95|   0.04|        512|           22|0.8117544054985046|
|2xl0x9|          6|Why are the 3 mos...|  4.0390625|   5|0.95| 

+------+-------+-----+--------------------+-------------+
|  q_id|   a_id|score|         true_answer|experiment_id|
+------+-------+-----+--------------------+-------------+
|1fzrel|caff3j4|   61|Marine biologist ...|           22|
|1fzrel|cafhem4|   11|I'm not sure exac...|           22|
|1fzrel|cafekgy|    4|There have been n...|           22|
+------+-------+-----+--------------------+-------------+

+------+-----------+--------------------+----------+----+----+-------+-----------+-------------+------------------+
|  q_id|sampling_id|              answer|perplexity|topk|topp|dropout|text_length|experiment_id|      bertscore_f1|
+------+-----------+--------------------+----------+----+----+-------+-----------+-------------+------------------+
|1fzrel|          1|Would there be ne...|7.89453125|   5|0.95|   0.04|        512|           22|0.8180540204048157|
|1fzrel|          2|Would there be ne...| 4.5390625|   5|0.95|   0.04|        512|           22|0.8196537494659424|
|1fzrel|     

+------+-------+-----+--------------------+-------------+
|  q_id|   a_id|score|         true_answer|experiment_id|
+------+-------+-----+--------------------+-------------+
|15gsnw|c7mb2cj|   17|You *can*, but yo...|           22|
|15gsnw|c7mbjjs|    6|It's more accurat...|           22|
|15gsnw|c7mbqk8|    5|I had a long disc...|           22|
|15gsnw|c7mb9sd|    2|I think people sa...|           22|
+------+-------+-----+--------------------+-------------+

+------+-----------+--------------------+-----------+----+----+-------+-----------+-------------+------------------+
|  q_id|sampling_id|              answer| perplexity|topk|topp|dropout|text_length|experiment_id|      bertscore_f1|
+------+-----------+--------------------+-----------+----+----+-------+-----------+-------------+------------------+
|15gsnw|          1|Can we really say...| 5.25390625|   5|0.95|   0.04|        512|           22|0.8106337189674377|
|15gsnw|          2|Can we really say...|3.736328125|   5|0.95|   0

+------+-------------+---------------+--------------------+--------------------+-----------+----+----+-------+-----------+------------------+------------------+------------------+
|  q_id|experiment_id|experiment_date|            question|        model_answer| perplexity|topk|topp|dropout|text_length|                f1|         precision|            recall|
+------+-------------+---------------+--------------------+--------------------+-----------+----+----+-------+-----------+------------------+------------------+------------------+
|32ai88|           22|     2024-01-16|Theoretically, co...|[Theoretically, c...|2.103515625|   5|0.95|   0.04|        512|0.7925633192062378|0.7635771632194519|0.8238369226455688|
+------+-------------+---------------+--------------------+--------------------+-----------+----+----+-------+-----------+------------------+------------------+------------------+

+------+-------+-----+--------------------+-------------+
|  q_id|   a_id|score|         true_answe

+------+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+------------------+
|  q_id|experiment_id|experiment_date|            question|        model_answer|perplexity|topk|topp|dropout|text_length|                f1|         precision|            recall|
+------+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+------------------+
|3qa403|           22|     2024-01-16|If I dug down to ...|[If I dug down to...|2.78515625|   5|0.95|   0.04|        512|0.8104963898658752|0.7814352512359619|0.8418025970458984|
+------+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+------------------+

+------+-------+-----+--------------------+-------------+
|  q_id|   a_id|score|         true_answer|exp

+-----+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+------------------+
| q_id|experiment_id|experiment_date|            question|        model_answer|perplexity|topk|topp|dropout|text_length|                f1|         precision|            recall|
+-----+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+------------------+
|liqfp|           22|     2024-01-17|Is there a way to...|[Is there a way t...|7.55078125|   5|0.95|   0.04|        512|0.8279849886894226|0.8157148361206055|0.8406299352645874|
+-----+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+------------------+

+-----+-------+-----+--------------------+-------------+
| q_id|   a_id|score|         true_answer|experiment

+------+-------------+---------------+--------------------+--------------------+-----------+----+----+-------+-----------+------------------+------------------+------------------+
|  q_id|experiment_id|experiment_date|            question|        model_answer| perplexity|topk|topp|dropout|text_length|                f1|         precision|            recall|
+------+-------------+---------------+--------------------+--------------------+-----------+----+----+-------+-----------+------------------+------------------+------------------+
|3q6g54|           22|     2024-01-17|How does the Born...|[How does the Bor...|3.810546875|   5|0.95|   0.04|        512|0.8012317419052124|0.8117799758911133|0.7909542322158813|
+------+-------------+---------------+--------------------+--------------------+-----------+----+----+-------+-----------+------------------+------------------+------------------+

+------+-------+-----+--------------------+-------------+
|  q_id|   a_id|score|         true_answe

+------+-------------+---------------+--------------------+--------------------+-----------+----+----+-------+-----------+------------------+------------------+------------------+
|  q_id|experiment_id|experiment_date|            question|        model_answer| perplexity|topk|topp|dropout|text_length|                f1|         precision|            recall|
+------+-------------+---------------+--------------------+--------------------+-----------+----+----+-------+-----------+------------------+------------------+------------------+
|bqx0zd|           22|     2024-01-17|How do you calcul...|[How do you calcu...|2.744140625|   5|0.95|   0.04|        512|0.7662392854690552|0.7527003288269043|0.7802742123603821|
+------+-------------+---------------+--------------------+--------------------+-----------+----+----+-------+-----------+------------------+------------------+------------------+

+------+-------+-----+--------------------+-------------+
|  q_id|   a_id|score|         true_answe

+------+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+------------------+
|  q_id|experiment_id|experiment_date|            question|        model_answer|perplexity|topk|topp|dropout|text_length|                f1|         precision|            recall|
+------+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+------------------+
|1uuju0|           22|     2024-01-17|When doing a extr...|[When doing a ext...| 3.8828125|   5|0.95|   0.04|        512|0.8076701164245605|0.7843775749206543|0.8323884010314941|
+------+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+------------------+

+------+-------+-----+--------------------+-------------+
|  q_id|   a_id|score|         true_answer|exp

+------+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+-----------------+
|  q_id|experiment_id|experiment_date|            question|        model_answer|perplexity|topk|topp|dropout|text_length|                f1|         precision|           recall|
+------+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+-----------------+
|37aty8|           22|     2024-01-17|Is there a way to...|[Is there a way t...|  2.640625|   5|0.95|   0.04|        512|0.7841000556945801|0.7721912860870361|0.796381950378418|
+------+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+-----------------+

+------+-------+-----+--------------------+-------------+
|  q_id|   a_id|score|         true_answer|experime

+-----+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+-----------------+
| q_id|experiment_id|experiment_date|            question|        model_answer|perplexity|topk|topp|dropout|text_length|                f1|         precision|           recall|
+-----+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+-----------------+
|vxdhy|           22|     2024-01-17|World's oldest ax...|[World's oldest a...| 4.0546875|   5|0.95|   0.04|        512|0.8135702610015869|0.8033981919288635|0.824003279209137|
+-----+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+-----------------+

+-----+-------+-----+--------------------+-------------+
| q_id|   a_id|score|         true_answer|experiment_id|


+-----+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+------------------+
| q_id|experiment_id|experiment_date|            question|        model_answer|perplexity|topk|topp|dropout|text_length|                f1|         precision|            recall|
+-----+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+------------------+
|r3hte|           22|     2024-01-17|What were those t...|[What were those ...|6.62890625|   5|0.95|   0.04|        512|0.8254246115684509|0.8004842400550842|0.8519691228866577|
+-----+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+------------------+

+-----+-------+-----+--------------------+-------------+
| q_id|   a_id|score|         true_answer|experiment

+------+-------------+---------------+--------------------+--------------------+-----------+----+----+-------+-----------+------------------+------------------+-----------------+
|  q_id|experiment_id|experiment_date|            question|        model_answer| perplexity|topk|topp|dropout|text_length|                f1|         precision|           recall|
+------+-------------+---------------+--------------------+--------------------+-----------+----+----+-------+-----------+------------------+------------------+-----------------+
|5dpu0z|           22|     2024-01-17|What is the faste...|[What is the fast...|3.197265625|   5|0.95|   0.04|        512|0.8023340702056885|0.7978113889694214|0.806908369064331|
+------+-------------+---------------+--------------------+--------------------+-----------+----+----+-------+-----------+------------------+------------------+-----------------+

+------+-------+-----+--------------------+-------------+
|  q_id|   a_id|score|         true_answer|exp

+------+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+-----------------+-----------------+------------------+
|  q_id|experiment_id|experiment_date|            question|        model_answer|perplexity|topk|topp|dropout|text_length|               f1|        precision|            recall|
+------+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+-----------------+-----------------+------------------+
|1yxv0t|           22|     2024-01-17|The recent talk a...|[The recent talk ...|5.04296875|   5|0.95|   0.04|        512|0.820966899394989|0.807094156742096|0.8353248238563538|
+------+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+-----------------+-----------------+------------------+

+------+-------+-----+--------------------+-------------+
|  q_id|   a_id|score|         true_answer|experiment_id

+------+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+------------------+
|  q_id|experiment_id|experiment_date|            question|        model_answer|perplexity|topk|topp|dropout|text_length|                f1|         precision|            recall|
+------+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+------------------+
|70wiw4|           22|     2024-01-17|There is a video ...|[There is a video...|11.9921875|   5|0.95|   0.04|        512|0.8161483407020569|0.8297010660171509|0.8030312061309814|
+------+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+------------------+

+------+-------+-----+--------------------+-------------+
|  q_id|   a_id|score|         true_answer|exp

+------+-------------+---------------+--------------------+--------------------+-----------+----+----+-------+-----------+------------------+-----------------+------------------+
|  q_id|experiment_id|experiment_date|            question|        model_answer| perplexity|topk|topp|dropout|text_length|                f1|        precision|            recall|
+------+-------------+---------------+--------------------+--------------------+-----------+----+----+-------+-----------+------------------+-----------------+------------------+
|3thdzs|           22|     2024-01-17|Can you distingui...|[Can you distingu...|2.318359375|   5|0.95|   0.04|        512|0.7848502397537231|0.758049488067627|0.8136154413223267|
+------+-------------+---------------+--------------------+--------------------+-----------+----+----+-------+-----------+------------------+-----------------+------------------+

+------+-------+-----+--------------------+-------------+
|  q_id|   a_id|score|         true_answer|exp

+------+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+------------------+
|  q_id|experiment_id|experiment_date|            question|        model_answer|perplexity|topk|topp|dropout|text_length|                f1|         precision|            recall|
+------+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+------------------+
|175krg|           22|     2024-01-17|How cold would it...|[How cold would i...|3.80078125|   5|0.95|   0.04|        512|0.8103695511817932|0.7834510207176208|0.8392037153244019|
+------+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+------------------+

+------+-------+-----+--------------------+-------------+
|  q_id|   a_id|score|         true_answer|exp

+------+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+-----------------+------------------+------------------+
|  q_id|experiment_id|experiment_date|            question|        model_answer|perplexity|topk|topp|dropout|text_length|               f1|         precision|            recall|
+------+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+-----------------+------------------+------------------+
|55raco|           22|     2024-01-17|If everything con...|[If everything co...|4.12109375|   5|0.95|   0.04|        512|0.790366530418396|0.7744258046150208|0.8069772720336914|
+------+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+-----------------+------------------+------------------+

+------+-------+-----+--------------------+-------------+
|  q_id|   a_id|score|         true_answer|experime

+------+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+------------------+
|  q_id|experiment_id|experiment_date|            question|        model_answer|perplexity|topk|topp|dropout|text_length|                f1|         precision|            recall|
+------+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+------------------+
|1oqq1z|           22|     2024-01-17|Is there a physic...|[Is there a physi...|5.69140625|   5|0.95|   0.04|        512|0.8264742493629456|0.8257762789726257|0.8271733522415161|
+------+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+------------------+

+------+-------+-----+--------------------+-------------+
|  q_id|   a_id|score|         true_answer|exp

+------+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+------------------+
|  q_id|experiment_id|experiment_date|            question|        model_answer|perplexity|topk|topp|dropout|text_length|                f1|         precision|            recall|
+------+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+------------------+
|1osa8g|           22|     2024-01-17|What's the psycho...|[What's the psych...|2.76953125|   5|0.95|   0.04|        512|0.7932797074317932|0.7783790826797485|0.8087617754936218|
+------+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+------------------+

+------+-------+-----+--------------------+-------------+
|  q_id|   a_id|score|         true_answer|exp

+------+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+-----------------+------------------+
|  q_id|experiment_id|experiment_date|            question|        model_answer|perplexity|topk|topp|dropout|text_length|                f1|        precision|            recall|
+------+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+-----------------+------------------+
|1dpax3|           22|     2024-01-17|Motherboard Circu...|[Motherboard Circ...|3.45703125|   5|0.95|   0.04|        512|0.7914592027664185|0.772700846195221|0.8111510872840881|
+------+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+-----------------+------------------+

+------+-------+-----+--------------------+-------------+
|  q_id|   a_id|score|         true_answer|experime

+------+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+-----------------+-----------------+------------------+
|  q_id|experiment_id|experiment_date|            question|        model_answer|perplexity|topk|topp|dropout|text_length|               f1|        precision|            recall|
+------+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+-----------------+-----------------+------------------+
|7i1xvy|           22|     2024-01-17|Can you bend spac...|[Can you bend spa...|   4.09375|   5|0.95|   0.04|        512|0.823735237121582|0.819415807723999|0.8281004428863525|
+------+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+-----------------+-----------------+------------------+

+------+-------+-----+--------------------+-------------+
|  q_id|   a_id|score|         true_answer|experiment_id

+------+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+------------------+
|  q_id|experiment_id|experiment_date|            question|        model_answer|perplexity|topk|topp|dropout|text_length|                f1|         precision|            recall|
+------+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+------------------+
|8r7xwe|           22|     2024-01-17|What happened to ...|[What happened to...|3.09765625|   5|0.95|   0.04|        512|0.8233450651168823|0.8136363625526428|0.8332881927490234|
+------+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+------------------+

+------+-------+-----+--------------------+-------------+
|  q_id|   a_id|score|         true_answer|exp

+------+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+------------------+
|  q_id|experiment_id|experiment_date|            question|        model_answer|perplexity|topk|topp|dropout|text_length|                f1|         precision|            recall|
+------+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+------------------+
|cihgyf|           22|     2024-01-17|Why don't autogyr...|[Why don't autogy...|     3.625|   5|0.95|   0.04|        512|0.8136127591133118|0.7978034615516663|0.8300611972808838|
+------+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+------------------+

+------+-------+-----+--------------------+-------------+
|  q_id|   a_id|score|         true_answer|exp

+-----+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+------------------+
| q_id|experiment_id|experiment_date|            question|        model_answer|perplexity|topk|topp|dropout|text_length|                f1|         precision|            recall|
+-----+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+------------------+
|r5few|           22|     2024-01-17|Upwelling vs Down...|[Upwelling vs Dow...|3.01171875|   5|0.95|   0.04|        512|0.8192566633224487|0.8126188516616821|0.8260038495063782|
+-----+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+------------------+

+-----+-------+-----+--------------------+-------------+
| q_id|   a_id|score|         true_answer|experiment

+------+-------------+---------------+--------------------+--------------------+-----------+----+----+-------+-----------+-----------------+------------------+------------------+
|  q_id|experiment_id|experiment_date|            question|        model_answer| perplexity|topk|topp|dropout|text_length|               f1|         precision|            recall|
+------+-------------+---------------+--------------------+--------------------+-----------+----+----+-------+-----------+-----------------+------------------+------------------+
|1syy4a|           22|     2024-01-17|Avagadro's consta...|[Avagadro's const...|2.771484375|   5|0.95|   0.04|        512|0.802669107913971|0.8135932087898254|0.7920345067977905|
+------+-------------+---------------+--------------------+--------------------+-----------+----+----+-------+-----------+-----------------+------------------+------------------+

+------+-------+-----+--------------------+-------------+
|  q_id|   a_id|score|         true_answer|exp

+------+-------------+---------------+--------------------+--------------------+-----------+----+----+-------+-----------+------------------+------------------+------------------+
|  q_id|experiment_id|experiment_date|            question|        model_answer| perplexity|topk|topp|dropout|text_length|                f1|         precision|            recall|
+------+-------------+---------------+--------------------+--------------------+-----------+----+----+-------+-----------+------------------+------------------+------------------+
|2pta7j|           22|     2024-01-18|Would it be possi...|[Would it be poss...|3.333984375|   5|0.95|   0.04|        512|0.8062828183174133|0.7827916145324707|0.8312276005744934|
+------+-------------+---------------+--------------------+--------------------+-----------+----+----+-------+-----------+------------------+------------------+------------------+

+------+-------+-----+--------------------+-------------+
|  q_id|   a_id|score|         true_answe

+-----+-------------+---------------+--------------------+--------------------+-----------+----+----+-------+-----------+------------------+------------------+------------------+
| q_id|experiment_id|experiment_date|            question|        model_answer| perplexity|topk|topp|dropout|text_length|                f1|         precision|            recall|
+-----+-------------+---------------+--------------------+--------------------+-----------+----+----+-------+-----------+------------------+------------------+------------------+
|ua00e|           22|     2024-01-18|AskScience AMA Se...|[AskScience AMA S...|3.619140625|   5|0.95|   0.04|        512|0.7609689235687256|0.7368762493133545|0.7866902351379395|
+-----+-------------+---------------+--------------------+--------------------+-----------+----+----+-------+-----------+------------------+------------------+------------------+

+-----+-------+-----+--------------------+-------------+
| q_id|   a_id|score|         true_answer|exper

+-----+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+------------------+
| q_id|experiment_id|experiment_date|            question|        model_answer|perplexity|topk|topp|dropout|text_length|                f1|         precision|            recall|
+-----+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+------------------+
|ikjdt|           22|     2024-01-18|Which major would...|[Which major woul...| 4.6796875|   5|0.95|   0.04|        512|0.8127272725105286|0.7881322503089905|0.8389067649841309|
+-----+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+------------------+

+-----+-------+-----+--------------------+-------------+
| q_id|   a_id|score|         true_answer|experiment

+------+-------------+---------------+--------------------+--------------------+-----------+----+----+-------+-----------+------------------+------------------+------------------+
|  q_id|experiment_id|experiment_date|            question|        model_answer| perplexity|topk|topp|dropout|text_length|                f1|         precision|            recall|
+------+-------------+---------------+--------------------+--------------------+-----------+----+----+-------+-----------+------------------+------------------+------------------+
|4w3w9d|           22|     2024-01-18|Are there any exa...|[Are there any ex...|2.845703125|   5|0.95|   0.04|        512|0.8011584281921387|0.7868137955665588|0.8160357475280762|
+------+-------------+---------------+--------------------+--------------------+-----------+----+----+-------+-----------+------------------+------------------+------------------+

+------+-------+-----+--------------------+-------------+
|  q_id|   a_id|score|         true_answe

+------+-----------+--------------------+-----------+----+----+-------+-----------+-------------+------------------+
|  q_id|sampling_id|              answer| perplexity|topk|topp|dropout|text_length|experiment_id|      bertscore_f1|
+------+-----------+--------------------+-----------+----+----+-------+-----------+-------------+------------------+
|2wp8vb|          1|Does the surface ...|  4.2265625|   5|0.95|   0.04|        512|           22|0.8132811784744263|
|2wp8vb|          2|Does the surface ...| 4.25390625|   5|0.95|   0.04|        512|           22|0.8419330716133118|
|2wp8vb|          3|Does the surface ...|  5.1015625|   5|0.95|   0.04|        512|           22|0.8179787397384644|
|2wp8vb|          4|Does the surface ...|3.541015625|   5|0.95|   0.04|        512|           22|0.8164732456207275|
|2wp8vb|          5|Does the surface ...|   6.046875|   5|0.95|   0.04|        512|           22|0.8359659910202026|
|2wp8vb|          6|Does the surface ...|     4.1875|   5|0.95| 

+-----+-----------+--------------------+----------+----+----+-------+-----------+-------------+------------------+
| q_id|sampling_id|              answer|perplexity|topk|topp|dropout|text_length|experiment_id|      bertscore_f1|
+-----+-----------+--------------------+----------+----+----+-------+-----------+-------------+------------------+
|k3npw|          1|with all this gro...| 6.1796875|   5|0.95|   0.04|        512|           22|0.8263835310935974|
|k3npw|          2|with all this gro...| 5.6328125|   5|0.95|   0.04|        512|           22|0.8188007473945618|
|k3npw|          3|with all this gro...|      9.25|   5|0.95|   0.04|        512|           22|0.8284591436386108|
|k3npw|          4|with all this gro...|      5.25|   5|0.95|   0.04|        512|           22|0.8245086073875427|
|k3npw|          5|with all this gro...|  9.828125|   5|0.95|   0.04|        512|           22|0.8255050778388977|
|k3npw|          6|with all this gro...|   5.40625|   5|0.95|   0.04|        512

+------+-------+-----+--------------------+-------------+
|  q_id|   a_id|score|         true_answer|experiment_id|
+------+-------+-----+--------------------+-------------+
|1gza0c|capcl2v|  148|Please everyone r...|           22|
|1gza0c|capd0dr|  108|The individual be...|           22|
|1gza0c|capiutx|   75|I haven't been ab...|           22|
|1gza0c|capqy2a|    7|I've just tried i...|           22|
|1gza0c|capgrrr|    7|Has anyone mentio...|           22|
|1gza0c|capnpid|    5|Could it be that ...|           22|
|1gza0c|caqqqal|    5|I found some actu...|           22|
|1gza0c|carc34c|    3|Steve Mould just ...|           22|
|1gza0c|ccgmgmm|    2|Let us model this...|           22|
+------+-------+-----+--------------------+-------------+

+------+-----------+--------------------+-----------+----+----+-------+-----------+-------------+------------------+
|  q_id|sampling_id|              answer| perplexity|topk|topp|dropout|text_length|experiment_id|      bertscore_f1|
+------+---

+-----+-------+-----+--------------------+-------------+
| q_id|   a_id|score|         true_answer|experiment_id|
+-----+-------+-----+--------------------+-------------+
|n8516|c377fdc|    2|This happened to ...|           22|
+-----+-------+-----+--------------------+-------------+

+-----+-----------+--------------------+-----------+----+----+-------+-----------+-------------+------------------+
| q_id|sampling_id|              answer| perplexity|topk|topp|dropout|text_length|experiment_id|      bertscore_f1|
+-----+-----------+--------------------+-----------+----+----+-------+-----------+-------------+------------------+
|n8516|          1|Why do certain th...| 6.37109375|   5|0.95|   0.04|        512|           22|0.7920876741409302|
|n8516|          2|Why do certain th...|  6.1328125|   5|0.95|   0.04|        512|           22| 0.806871235370636|
|n8516|          3|Why do certain th...|     4.5625|   5|0.95|   0.04|        512|           22|0.8013501167297363|
|n8516|          4

+------+-------+-----+--------------------+-------------+
|  q_id|   a_id|score|         true_answer|experiment_id|
+------+-------+-----+--------------------+-------------+
|5jkwtx|dbgwyiy|  455|If you're wonderi...|           22|
|5jkwtx|dbh611y|  323|How does someone ...|           22|
|5jkwtx|dbh5ut4|  188|Is it possible th...|           22|
|5jkwtx|dbh0x9p|   90|What effect, if a...|           22|
|5jkwtx|dbhbpok|   76|I read the BBC ar...|           22|
|5jkwtx|dbh7oj8|   62|Why is there such...|           22|
|5jkwtx|dbgy1pw|   40|Would a black hol...|           22|
|5jkwtx|dbh7rvp|   30|The paper says th...|           22|
|5jkwtx|dbhv6dg|   23|What are the chan...|           22|
|5jkwtx|dbhkyyt|   21|Hey. This isn't e...|           22|
|5jkwtx|dbh8kay|   20|Observation here,...|           22|
|5jkwtx|dbhc0zu|   15|Correct me if I'm...|           22|
|5jkwtx|dbh53sn|   15|Here's yet anothe...|           22|
|5jkwtx|dbh9ndf|   11|any ideas why mat...|           22|
|5jkwtx|dbhcsk

+------+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+------------------+
|  q_id|experiment_id|experiment_date|            question|        model_answer|perplexity|topk|topp|dropout|text_length|                f1|         precision|            recall|
+------+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+------------------+
|287l02|           22|     2024-01-18|why would natural...|[why would natura...|6.87890625|   5|0.95|   0.04|        512|0.8206769227981567|0.8242698311805725|0.8171151876449585|
+------+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+------------------+

+------+-------+-----+--------------------+-------------+
|  q_id|   a_id|score|         true_answer|exp

+------+-------------+---------------+--------------------+--------------------+-----------+----+----+-------+-----------+------------------+------------------+------------------+
|  q_id|experiment_id|experiment_date|            question|        model_answer| perplexity|topk|topp|dropout|text_length|                f1|         precision|            recall|
+------+-------------+---------------+--------------------+--------------------+-----------+----+----+-------+-----------+------------------+------------------+------------------+
|1ggjcg|           22|     2024-01-18|Do "eye crusties"...|[Do "eye crusties...|3.986328125|   5|0.95|   0.04|        512|0.8182999491691589|0.7983654737472534|0.8392554521560669|
+------+-------------+---------------+--------------------+--------------------+-----------+----+----+-------+-----------+------------------+------------------+------------------+

+------+-------+-----+--------------------+-------------+
|  q_id|   a_id|score|         true_answe

+------+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+------------------+
|  q_id|experiment_id|experiment_date|            question|        model_answer|perplexity|topk|topp|dropout|text_length|                f1|         precision|            recall|
+------+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+------------------+
|1necyi|           22|     2024-01-18|Can people teach ...|[Can people teach...|2.78515625|   5|0.95|   0.04|        512|0.8291912078857422|0.8168652057647705|0.8418948650360107|
+------+-------------+---------------+--------------------+--------------------+----------+----+----+-------+-----------+------------------+------------------+------------------+

+------+-------+-----+--------------------+-------------+
|  q_id|   a_id|score|         true_answer|exp

+------+-------------+--------------+-----------+---------------+-------------------+--------------------+---+
|  q_id|experiment_id|mod_perplexity|sampling_id|samp_perplexity|mod_perplexity_norm|samp_perplexity_norm|bin|
+------+-------------+--------------+-----------+---------------+-------------------+--------------------+---+
|100ofr|           22|        5.6875|          6|     4.52734375|  0.368220742150333| 0.08115789473684212|  4|
|100ofr|           22|        5.6875|          7|     4.66015625|  0.368220742150333| 0.08473684210526317|  4|
|100ofr|           22|        5.6875|          8|     3.40234375|  0.368220742150333|0.050842105263157904|  4|
|100ofr|           22|        5.6875|          9|     3.43359375|  0.368220742150333|  0.0516842105263158|  4|
|100ofr|           22|        5.6875|         10|     4.25390625|  0.368220742150333| 0.07378947368421054|  4|
|100ofr|           22|        5.6875|          1|     4.06640625|  0.368220742150333| 0.06873684210526315|  4|
|
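
The `mod_perplexity_norm`, `samp_perplexity_norm`, and `bin` columns above suggest a normalize-then-bin step before computing the Expected Calibration Error. The sketch below is a minimal, hypothetical illustration of that idea in plain Python, not the notebook's actual code. It assumes min-max normalization and ten equal-width, 1-based bins; the function names are invented for this example. How the notebook maps perplexity and BERTScore onto "confidence" and "accuracy" is not visible in this output, so the ECE function simply takes two lists of values in [0, 1].

```python
def min_max_normalize(values):
    """Rescale values to [0, 1] (assumed normalization; not confirmed by the source)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def assign_bin(x, n_bins=10):
    """Return a 1-based equal-width bin index for x in [0, 1]; x == 1.0 goes to the last bin."""
    return min(int(x * n_bins), n_bins - 1) + 1


def expected_calibration_error(confidences, accuracies, n_bins=10):
    """ECE = sum over bins of (bin size / N) * |mean accuracy - mean confidence|."""
    n = len(confidences)
    stats = {}  # bin index -> (count, sum of confidences, sum of accuracies)
    for conf, acc in zip(confidences, accuracies):
        b = assign_bin(conf, n_bins)
        count, conf_sum, acc_sum = stats.get(b, (0, 0.0, 0.0))
        stats[b] = (count + 1, conf_sum + conf, acc_sum + acc)
    return sum(
        count / n * abs(acc_sum / count - conf_sum / count)
        for count, conf_sum, acc_sum in stats.values()
    )
```

Under these assumptions, `assign_bin(0.368220742150333)` returns 4, which agrees with the `bin` value shown for `q_id` `100ofr`. That agreement is consistent with, but does not prove, a ten-bin scheme in the original experiment.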

## References
Fan, A., Jernite, Y., Perez, E., Grangier, D., Weston, J. & Auli, M. (2019). ELI5: Long Form Question Answering. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 3558–3567. https://aclanthology.org/P19-1346.pdf

Zhang, T., Kishore, V., Wu, F., Weinberger, K. & Artzi, Y. (2020). BERTScore: Evaluating Text Generation with BERT. International Conference on Learning Representations (ICLR). https://openreview.net/pdf?id=SkeHuCVFDr

Hanna, M. & Bojar, O. (2021). A Fine-Grained Analysis of BERTScore. Proceedings of the Sixth Conference on Machine Translation (WMT), pages 507–517. https://aclanthology.org/2021.wmt-1.59.pdf
