In [1]:
!pip install datasets

Collecting datasets
  Downloading datasets-3.5.1-py3-none-any.whl.metadata (19 kB)
Collecting dill<0.3.9,>=0.3.0 (from datasets)
  Downloading dill-0.3.8-py3-none-any.whl.metadata (10 kB)
Collecting xxhash (from datasets)
  Downloading xxhash-3.5.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (12 kB)
Collecting multiprocess<0.70.17 (from datasets)
  Downloading multiprocess-0.70.16-py311-none-any.whl.metadata (7.2 kB)
Collecting fsspec<=2025.3.0,>=2023.1.0 (from fsspec[http]<=2025.3.0,>=2023.1.0->datasets)
  Downloading fsspec-2025.3.0-py3-none-any.whl.metadata (11 kB)
Downloading datasets-3.5.1-py3-none-any.whl (491 kB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m491.4/491.4 kB[0m [31m7.5 MB/s[0m eta [36m0:00:00[0m
[?25hDownloading dill-0.3.8-py3-none-any.whl (116 kB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m116.3/116.3 kB[0m [31m5.3 MB/s[0m eta [36m0:00:00[0m
[?25hDownloading fsspec-2025.3.0-py3-none-any.whl (1

In [None]:
from openai import OpenAI
import os
import re
import pandas as pd
import time
from tqdm import tqdm
import requests
from datasets import load_dataset
import pandas as pd
from transformers import AutoTokenizer, AutoModelForCausalLM

pd.options.display.max_rows = 999


import warnings
warnings.filterwarnings("ignore")


In [None]:
class Evaluator:
    EVALUATION_PROMPT_TEMPLATE = """

    You will be given a query and a bot response. Your task is to evaluate the bot response based on the specified metric.

    **Instructions**:
    1. Understand and evaluate the bot response.
    2. Penalize:
    - Irrelevant, distracting, or contradictory details or additional information.
    (Note: Do not penalize additional information if it enhances clarity, context, or usefulness without distracting from the main message.)

    3. Only give a high score if the bot response is concise, focused, and includes all critical points without any unnecessary or distracting content.

    Do **not** write any explanation—only return the numeric score.


    **Evaluation Criteria**:
    {criteria}

    **Steps for Evaluation**:
    {steps}

    **Question from the User**:
    {document}

    **Bot Response**:
    {summary}

    **Evaluation Form (scores ONLY, numeric value between 1 and 5)**:
    {metric_name} (Provide only a number as the response, no text or explanation. It ca be a decimal number, e.g., 2.5):
    """

    METRICS = {
        "Overall":
            (
            """
            This overall scoring framework combines elements from clarity, exhaustiveness, and correctness to provide a balanced evaluation of the bot's response.
            Compare the bot response with the expected response based on these criteria.
            """,
            """
            Criteria for score:
            Evaluate the bot response to determine the score.

            **Correctness Check:**
            - Is the response factually accurate with no misleading information?
            - If the response contains **a partial inaccuracy that does not fully distort the meaning**, it should receive **a score of 5, not less**.

            **Exhaustiveness Check:**
            - Does the response cover all key points from the expected response?
            - If the response is mostly complete but **misses minor details** or **implicitly conveys the point**, it qualifies for **a score of 4 or 5**.
            - If the response omits **critical** parts of the expected response, it qualifies for a **score of 1 or 2**.
            - Does the response fulfill the user's intent or the task's objective appropriately?
            - Even if technically correct and complete, if the response lacks practical usefulness or relevance to the intended audience, it should be scored lower (3 or below).
            *Note: For open-ended tasks (creative, generative, role-based), “critical details” refer to essential elements implied by the task (e.g., a clear call to action in promotional text, functional code in developer tasks, valid moves in game simulations). Missing stylistic flourishes or non-essential elaboration should not be treated as missing critical details.*


            **Clarity Check:**
            - Is the response clear, logical, and easy to understand?
            - Poor structure or vague wording reduces the score to **2 or 1**, depending on severity.
            - If a response is somewhat unclear but does not **mislead the reader**, it qualifies for **a score of 3 rather than 2 or 1**.

            **Assign a Score:**
            - **Score 4.5-5 (Excellent – Fully correct, comprehensive, and crystal clear. No meaningful improvements needed)**
            - Fully accurate, no factual errors or misleading content.
            - No factual errors or misleading information.
            - Covers all key points without omitting critical details.
            - Clear, well-structured and easy to understand.

            - **Score 4.0-4.49 (Very Good – Almost perfect. Minor details could be improved, but overall highly effective)**
            - Minor issues in phrasing or nuance, but factually sound.
            - Misses minor details or presents some info implicitly, but still complete.
            - Very clear, with minor improvements possible in writing or organization.


            - **Score 3.0-3.9 (Adequate / Fair – Generally solid, but some areas (content, accuracy, or clarity) need refinement)**

            - Small inaccuracies or ambiguities that do not change the core meaning
            - Misses some important elements, but conveys the main idea.
            - Understandable, but may contain awkward phrasing or structural issues.

            - **Score 2.0-2.9 (Poor – Significant issues with accuracy, missing important points, or unclear wording.)**

            - Partially incorrect content, possibly misleading.
            - Omits critical information or significantly reduces content.
            - Hard to follow or understand without rereading.

            - **Score 1.0-1.9 (Unacceptable – Major factual errors, critical omissions, or very poor clarity. Requires complete revision)**

            - Major factual errors or fabrications that significantly distort meaning.
            - Omits critical information or significantly reduces content.
            - Very unclear or poorly written, hindering comprehension.

            **Example Application:**
            - If all three aspects (Correctness, Exhaustiveness, Clarity) are excellent, assign a score close to 5.0.
            - If one aspect is clearly weaker or if there are minor issues across two, assign a score between 4.0 and 4.9, depending on severity.
            - If two aspects are moderately lacking or one is clearly problematic, assign a score between 3.0 and 3.9.
            - If all three aspects have notable issues, or if there's a major factual error, assign a score between 2.0 and 2.9.
            - If the response is factually incorrect, unclear, and incomplete, assign a score between 1.0 and 1.9.
            - Minor inaccuracies or vague wording should be scored 3.0 or higher, unless they significantly alter the meaning (then score below 3.0).
            """
            )


    }

    def __init__(self, client):
        self.client = client

    def get_geval_score(self, criteria: str, steps: str, expected_response_value: str, bot_response: str, metric_name: str) -> str:
        prompt = self.EVALUATION_PROMPT_TEMPLATE.format(
            criteria=criteria,
            steps=steps,
            document=expected_response_value,
            summary=bot_response,
            metric_name=metric_name,

        )
        response = self.client.chat.completions.create(
            model = "gpt-4-turbo",  
            messages=[{"role": "user", "content": prompt}],
            seed = 42,
            temperature=0.001,
            top_p=1,
         
        )
        score = response.choices[0].message.content.strip()

  

        return score




    def evaluate(self, metric_name: str, expected_response_value: str, bot_response: str) -> str:
        score_criteria, score_steps = self.METRICS[metric_name]
        return self.get_geval_score(score_criteria, score_steps, expected_response_value, bot_response, metric_name)



OpenAI GPT Evaluator

In [None]:
#from evaluation import Evaluator

In [None]:
### OPEN AI KEY

In [None]:
evaluator = Evaluator(key)

In [7]:

# Cargar el dataset UltraFeedback (full con score)
dataset = load_dataset("argilla/ultrafeedback-binarized-preferences", split="train")

# Convertir a DataFrame
df = pd.DataFrame(dataset)


print(df.head())


README.md:   0%|          | 0.00/8.62k [00:00<?, ?B/s]

(…)-00000-of-00001-9dffc9d46d32c335.parquet:   0%|          | 0.00/110M [00:00<?, ?B/s]

Generating train split:   0%|          | 0/63619 [00:00<?, ? examples/s]

          source                                        instruction  \
0  evol_instruct  Can you write a C++ program that prompts the u...   
1  evol_instruct  Suppose you are a content creator and want to ...   
2  evol_instruct  Identify the interrelated economic, political,...   
3  evol_instruct  How can I convert the decimal number 31 to bin...   
4  evol_instruct  Can you modify the C++ code provided below to ...   

                                     chosen_response  \
0  Here's a C++ program that prompts the user to ...   
1  To use GPT for generating compelling titles an...   
2  The stock market crash of 1929, which marked t...   
3  Yes, I can provide the JavaScript code for con...   
4  Unfortunately, I cannot modify the given C++ c...   

                                   rejected_response  chosen_avg_rating  \
0  Sure, here is the program using the C++11 algo...               5.00   
1  Sure! Although GPT cannot generate text on its...               4.75   
2  The stoc

In [8]:
rows = []
for _, row in df.iterrows():
    rows.append({
        "instruction": row["instruction"],
        "response": row["chosen_response"],
        "score": row.get("chosen_avg_rating", 1.0)
    })
    rows.append({
        "instruction": row["instruction"],
        "response": row["rejected_response"],
        "score": row.get("rejected_avg_rating", 0.0)
    })

dff = pd.DataFrame(rows)

df_shuffled = dff.sample(n=250, random_state=11).reset_index(drop=True)

# Mostrar ejemplo
print(df_shuffled.head(20))

                                          instruction  \
0   Conduct a comprehensive analysis and evaluatio...   
1   My favorite Mexican restaurant adds pickled re...   
2   Q: Given the sentence "Three people eating in ...   
3   I want to write a character description, simil...   
4   Definition: Given a sentence in Spanish, provi...   
5   I want you to act as a web developer. Give me ...   
6   Q: In this task you are given a small conversa...   
7         Use Blender to create a physics simulation.   
8   [Ignore previous instructions]\n\nWhen creatin...   
9   **Q**\nClaim: "The Oakland Raiders appear in t...   
10                You know the earth is round, right?   
11  Two analogies on manipulating items in a kitch...   
12  Does achieving mastery in a sport help make yo...   
13  How can I search for synonyms of two given wor...   
14  Can you give me 10 examples of blog posts to w...   
15  Teacher:In this task, you are given two senten...   
16  Answer the question below: 

In [19]:
evaluation_metrics = ["Overall"]
results = []

for index, row in tqdm(df_shuffled.iterrows(), total=len(df_shuffled)):
    bot_response = row['response']
    instruction = row['instruction']
    reference_score = row['score']  # score humano (de 1.0 a 5.0)

    try:
      # predicted_score = evaluator.evaluate("Overall",instruction, bot_response) # gpt
      predicted_score = evaluator.evaluate("Overall", instruction, bot_response) # llama
      print(predicted_score)
      results.append({
          "Question": instruction,
          "Response": bot_response,
          "Score_humano": reference_score,
          "Score_llama": predicted_score
      })

    except Exception as e:
      print(f"Error processing row {index}: {e}")
      continue

final_df = pd.DataFrame(results)


  0%|          | 1/250 [00:19<1:19:23, 19.13s/it]

4.0


  1%|          | 2/250 [00:30<59:46, 14.46s/it]  

5


  1%|          | 3/250 [00:49<1:08:52, 16.73s/it]

3.5


  2%|▏         | 4/250 [01:11<1:16:46, 18.72s/it]

5


  2%|▏         | 5/250 [01:19<1:00:41, 14.86s/it]

5


  2%|▏         | 6/250 [01:36<1:03:53, 15.71s/it]

4.4


  3%|▎         | 7/250 [02:11<1:28:46, 21.92s/it]

1.5


  3%|▎         | 8/250 [02:20<1:11:33, 17.74s/it]

1.0


  4%|▎         | 9/250 [02:40<1:13:39, 18.34s/it]

As a leader of a demand generation team sending numerous emails each week, I understand the crucial impact of opening an email effectively. The initial sentences can determine the success of the entire message, so it's essential to craft them thoughtfully. Here are some tips and examples to help you start your emails on the right note:

1. **Address the Recipient by Name**: Personalizing the greeting with the recipient's name makes the email feel individualized and shows that you've taken the time to acknowledge them specifically.
   - *Example*: *Hi John,*

2. **Begin with a Warm Greeting**: Starting with a friendly salutation sets a positive tone and can make the recipient feel more at ease.
   - *Example*: *I hope this email finds you well.*

3. **Provide Context**: Offering some background helps the recipient understand the purpose of your email and sets expectations for the content.
   - *Example*: *I'm reaching out to follow up on our recent conversation about the new product lau

  4%|▍         | 10/250 [02:48<1:01:19, 15.33s/it]

5


  4%|▍         | 11/250 [03:01<57:53, 14.53s/it]  

2.0


  5%|▍         | 12/250 [03:21<1:04:13, 16.19s/it]

2.5


  5%|▌         | 13/250 [03:38<1:05:32, 16.59s/it]

5


  6%|▌         | 14/250 [03:51<1:01:07, 15.54s/it]

1.5


  6%|▌         | 15/250 [04:01<53:14, 13.59s/it]  

5


  6%|▋         | 16/250 [04:12<50:47, 13.02s/it]

4.8


  7%|▋         | 17/250 [04:30<55:53, 14.39s/it]

4.3


  7%|▋         | 18/250 [04:53<1:05:23, 16.91s/it]

3.0


  8%|▊         | 19/250 [05:05<59:31, 15.46s/it]  

2.0


  8%|▊         | 20/250 [05:15<53:03, 13.84s/it]

5


  8%|▊         | 21/250 [05:28<52:32, 13.76s/it]

2.0


  9%|▉         | 22/250 [05:48<58:50, 15.49s/it]

2.5


  9%|▉         | 23/250 [06:02<56:31, 14.94s/it]

5.0


 10%|▉         | 24/250 [06:11<49:42, 13.20s/it]

5


 10%|█         | 25/250 [06:23<49:00, 13.07s/it]

1.0


 10%|█         | 26/250 [06:36<48:25, 12.97s/it]

2.5


 11%|█         | 27/250 [06:54<53:08, 14.30s/it]

5


 11%|█         | 28/250 [07:04<48:32, 13.12s/it]

5


 12%|█▏        | 29/250 [07:21<52:55, 14.37s/it]

5


 12%|█▏        | 30/250 [07:34<51:05, 13.93s/it]

4.0


 12%|█▏        | 31/250 [07:44<46:07, 12.64s/it]

1.0


 13%|█▎        | 32/250 [07:53<41:45, 11.49s/it]

2.0


 13%|█▎        | 33/250 [08:06<43:39, 12.07s/it]

5


 14%|█▎        | 34/250 [08:17<42:45, 11.88s/it]

1.5


 14%|█▍        | 35/250 [08:25<38:11, 10.66s/it]

4.8


 14%|█▍        | 36/250 [08:44<47:14, 13.24s/it]

3.0


 15%|█▍        | 37/250 [08:59<48:26, 13.65s/it]

4.3


 15%|█▌        | 38/250 [09:11<46:21, 13.12s/it]

3.0


 16%|█▌        | 39/250 [09:22<43:40, 12.42s/it]

2.0


 16%|█▌        | 40/250 [09:34<43:16, 12.36s/it]

5


 16%|█▋        | 41/250 [09:46<42:37, 12.24s/it]

1.5


 17%|█▋        | 42/250 [10:14<58:30, 16.88s/it]

2.5


 17%|█▋        | 43/250 [10:34<1:02:01, 17.98s/it]

4.3


 18%|█▊        | 44/250 [10:54<1:03:08, 18.39s/it]

3.0


 18%|█▊        | 45/250 [11:01<51:22, 15.04s/it]  

5


 18%|█▊        | 46/250 [11:14<49:27, 14.55s/it]

2.5


 19%|█▉        | 47/250 [11:23<43:16, 12.79s/it]

2.0


 19%|█▉        | 48/250 [11:38<45:15, 13.44s/it]

1.0


 20%|█▉        | 49/250 [11:52<45:24, 13.56s/it]

1.0


 20%|██        | 50/250 [12:15<55:18, 16.59s/it]

2.5


 20%|██        | 51/250 [12:29<52:35, 15.86s/it]

1.0


 21%|██        | 52/250 [12:42<49:18, 14.94s/it]

4.0


 21%|██        | 53/250 [12:52<43:56, 13.38s/it]

5


 22%|██▏       | 54/250 [13:11<49:42, 15.22s/it]

3.0


 22%|██▏       | 55/250 [13:19<42:09, 12.97s/it]

2.5


 22%|██▏       | 56/250 [13:28<37:46, 11.68s/it]

1.5


 23%|██▎       | 57/250 [13:48<45:30, 14.15s/it]

5


 23%|██▎       | 58/250 [13:57<40:08, 12.54s/it]

1.0


 24%|██▎       | 59/250 [14:13<43:34, 13.69s/it]

5


 24%|██▍       | 60/250 [14:28<44:48, 14.15s/it]

1.0


 24%|██▍       | 61/250 [14:40<42:44, 13.57s/it]

1.5


 25%|██▍       | 62/250 [14:51<39:52, 12.72s/it]

5


 25%|██▌       | 63/250 [15:09<44:06, 14.15s/it]

3.5


 26%|██▌       | 64/250 [15:27<48:08, 15.53s/it]

5.0


 26%|██▌       | 65/250 [15:38<43:10, 14.00s/it]

5


 26%|██▋       | 66/250 [15:53<44:22, 14.47s/it]

4.3


 27%|██▋       | 67/250 [16:09<45:22, 14.88s/it]

5


 27%|██▋       | 68/250 [16:26<46:40, 15.39s/it]

4.5


 28%|██▊       | 69/250 [16:37<42:22, 14.05s/it]

2.0


 28%|██▊       | 70/250 [16:54<44:37, 14.88s/it]

1.0


 28%|██▊       | 71/250 [17:12<47:45, 16.01s/it]

4.0


 29%|██▉       | 72/250 [17:24<43:38, 14.71s/it]

5


 29%|██▉       | 73/250 [17:41<45:52, 15.55s/it]

4.0


 30%|██▉       | 74/250 [18:03<51:08, 17.43s/it]

4.3


 30%|███       | 75/250 [18:25<54:25, 18.66s/it]

1.5


 30%|███       | 76/250 [18:41<52:18, 18.04s/it]

4.8


 31%|███       | 77/250 [18:55<48:08, 16.70s/it]

2.5


 31%|███       | 78/250 [19:10<46:17, 16.15s/it]

5


 32%|███▏      | 79/250 [19:25<45:03, 15.81s/it]

1.5


 32%|███▏      | 80/250 [19:42<46:06, 16.27s/it]

1.5


 32%|███▏      | 81/250 [20:00<47:05, 16.72s/it]

4.5


 33%|███▎      | 82/250 [20:16<46:31, 16.62s/it]

1


 33%|███▎      | 83/250 [20:29<43:19, 15.57s/it]

2.5


 34%|███▎      | 84/250 [20:47<44:55, 16.24s/it]

5.0


 34%|███▍      | 85/250 [20:54<36:54, 13.42s/it]

1.0


 34%|███▍      | 86/250 [21:17<44:11, 16.17s/it]

Error processing row 85: Error code: 400 - {'error': {'message': 'Invalid prompt: your prompt was flagged as potentially violating our usage policy. Please try again with a different prompt: https://platform.openai.com/docs/guides/reasoning#advice-on-prompting', 'type': 'invalid_request_error', 'param': None, 'code': 'invalid_prompt'}}


 35%|███▍      | 87/250 [21:29<40:41, 14.98s/it]

5


 35%|███▌      | 88/250 [21:46<42:28, 15.73s/it]

1.5


 36%|███▌      | 89/250 [22:19<55:28, 20.68s/it]

4.0


 36%|███▌      | 90/250 [22:37<53:23, 20.02s/it]

2.5


 36%|███▋      | 91/250 [22:51<48:34, 18.33s/it]

1.5


 37%|███▋      | 92/250 [23:04<43:28, 16.51s/it]

1.0


 37%|███▋      | 93/250 [23:24<46:21, 17.72s/it]

2.5


 38%|███▊      | 94/250 [23:33<38:52, 14.95s/it]

4.2


 38%|███▊      | 95/250 [23:42<33:58, 13.15s/it]

5


 38%|███▊      | 96/250 [23:57<35:33, 13.85s/it]

4.0


 39%|███▉      | 97/250 [24:08<33:06, 12.98s/it]

1.0


 39%|███▉      | 98/250 [24:26<37:00, 14.61s/it]

4.2


 40%|███▉      | 99/250 [24:42<37:47, 15.02s/it]

1.5


 40%|████      | 100/250 [25:03<41:22, 16.55s/it]

4.0


 40%|████      | 101/250 [25:12<35:53, 14.46s/it]

2.5


 41%|████      | 102/250 [25:26<35:00, 14.19s/it]

5


 41%|████      | 103/250 [25:40<34:37, 14.13s/it]

1.0


 42%|████▏     | 104/250 [25:57<36:31, 15.01s/it]

1.5


 42%|████▏     | 105/250 [26:05<31:34, 13.07s/it]

1.0


 42%|████▏     | 106/250 [26:17<30:19, 12.63s/it]

5


 43%|████▎     | 107/250 [26:54<47:52, 20.09s/it]

Certainly! Creating a visually appealing resume in HTML format using Swift involves generating HTML content, styling it with CSS, and displaying it within your Swift application. Below, I'll guide you through the process step by step, ensuring each element is strategically placed to maximize John Doe's qualifications and experience.

---

### **1. Create the HTML Structure**

First, let's create the HTML content for John Doe's resume using appropriate tags:

- Use `<h1>` for the name.
- Use `<ul>` and `<li>` for the education and experience sections.
- Make the email address a clickable link.

**HTML Content:**

```html
<!DOCTYPE html>
<html>
<head>
    <title>John Doe's Resume</title>
    <style>
        /* CSS will be added here */
    </style>
</head>
<body>
    <h1>John Doe</h1>
    <div class="contact">
        <p>Email: <a href="mailto:john.doe@example.com">john.doe@example.com</a></p>
    </div>
    <h2>Professional Summary</h2>
    <p>Financial Analyst with over 2 years of expe

 43%|████▎     | 108/250 [27:06<41:22, 17.48s/it]

1.0


 44%|████▎     | 109/250 [27:27<43:44, 18.61s/it]

1.5


 44%|████▍     | 110/250 [27:47<44:37, 19.12s/it]

4.0


 44%|████▍     | 111/250 [27:57<37:21, 16.12s/it]

5


 45%|████▍     | 112/250 [28:21<42:55, 18.66s/it]

5


 45%|████▌     | 113/250 [28:36<40:10, 17.60s/it]

4.0


 46%|████▌     | 114/250 [28:46<34:18, 15.14s/it]

1.0


 46%|████▌     | 115/250 [28:55<30:30, 13.56s/it]

5


 46%|████▋     | 116/250 [29:13<33:14, 14.89s/it]

5


 47%|████▋     | 117/250 [29:23<29:33, 13.33s/it]

1.5


 47%|████▋     | 118/250 [29:36<28:43, 13.06s/it]

5


 48%|████▊     | 119/250 [29:47<27:39, 12.67s/it]

1.5


 48%|████▊     | 120/250 [30:12<35:18, 16.30s/it]

4.2


 48%|████▊     | 121/250 [30:26<33:22, 15.52s/it]

4.2


 49%|████▉     | 122/250 [30:40<32:21, 15.17s/it]

2.5


 49%|████▉     | 123/250 [30:56<32:30, 15.36s/it]

4.9


 50%|████▉     | 124/250 [31:08<29:57, 14.27s/it]

1.5


 50%|█████     | 125/250 [31:20<28:39, 13.76s/it]

1.5


 50%|█████     | 126/250 [31:29<25:28, 12.33s/it]

5


 51%|█████     | 127/250 [31:36<21:42, 10.59s/it]

5.0


 51%|█████     | 128/250 [31:49<23:21, 11.49s/it]

4.5


 52%|█████▏    | 129/250 [32:06<26:08, 12.96s/it]

3.5


 52%|█████▏    | 130/250 [32:29<31:55, 15.96s/it]

1.5


 52%|█████▏    | 131/250 [32:43<30:49, 15.54s/it]

1.5


 53%|█████▎    | 132/250 [32:56<28:40, 14.58s/it]

1.5


 53%|█████▎    | 133/250 [33:15<31:19, 16.07s/it]

5


 54%|█████▎    | 134/250 [33:26<28:14, 14.61s/it]

1.5


 54%|█████▍    | 135/250 [33:37<25:51, 13.49s/it]

5


 54%|█████▍    | 136/250 [33:56<28:46, 15.14s/it]

3.7


 55%|█████▍    | 137/250 [34:00<22:03, 11.71s/it]

1.0


 55%|█████▌    | 138/250 [34:15<23:47, 12.74s/it]

2.0


 56%|█████▌    | 139/250 [34:29<23:59, 12.96s/it]

2.0


 56%|█████▌    | 140/250 [34:36<20:53, 11.40s/it]

5


 56%|█████▋    | 141/250 [34:51<22:42, 12.50s/it]

5


 57%|█████▋    | 142/250 [35:03<22:03, 12.25s/it]

5


 57%|█████▋    | 143/250 [35:16<22:05, 12.39s/it]

1.0


 58%|█████▊    | 144/250 [35:35<25:42, 14.56s/it]

4.5


 58%|█████▊    | 145/250 [35:50<25:20, 14.48s/it]

5


 58%|█████▊    | 146/250 [36:02<23:56, 13.81s/it]

4.5


 59%|█████▉    | 147/250 [36:17<24:18, 14.16s/it]

1.0


 59%|█████▉    | 148/250 [36:29<22:52, 13.46s/it]

1.5


 60%|█████▉    | 149/250 [36:41<22:18, 13.25s/it]

4.5


 60%|██████    | 150/250 [37:03<26:16, 15.76s/it]

5


 60%|██████    | 151/250 [37:12<22:41, 13.76s/it]

5


 61%|██████    | 152/250 [37:34<26:37, 16.30s/it]

3.5


 61%|██████    | 153/250 [37:47<24:41, 15.28s/it]

4.3


 62%|██████▏   | 154/250 [37:57<21:35, 13.49s/it]

1.5


 62%|██████▏   | 155/250 [38:11<21:42, 13.71s/it]

1.5


 62%|██████▏   | 156/250 [38:16<17:35, 11.23s/it]

5


 63%|██████▎   | 157/250 [38:41<23:42, 15.29s/it]

2.0


 63%|██████▎   | 158/250 [38:54<22:26, 14.63s/it]

5


 64%|██████▎   | 159/250 [39:05<20:36, 13.59s/it]

5


 64%|██████▍   | 160/250 [39:15<18:29, 12.32s/it]

1.5


 64%|██████▍   | 161/250 [39:56<31:07, 20.98s/it]

3.0


 65%|██████▍   | 162/250 [40:09<27:22, 18.67s/it]

3.5


 65%|██████▌   | 163/250 [40:20<23:27, 16.18s/it]

5


 66%|██████▌   | 164/250 [40:31<21:02, 14.69s/it]

1.0


 66%|██████▌   | 165/250 [40:43<19:34, 13.82s/it]

5.0


 66%|██████▋   | 166/250 [40:55<18:39, 13.33s/it]

5


 67%|██████▋   | 167/250 [41:11<19:42, 14.25s/it]

2.5


 67%|██████▋   | 168/250 [41:19<17:01, 12.45s/it]

1.0


 68%|██████▊   | 169/250 [41:34<17:50, 13.22s/it]

4.0


 68%|██████▊   | 170/250 [41:49<18:15, 13.69s/it]

2.5


 68%|██████▊   | 171/250 [42:01<17:19, 13.15s/it]

4.8


 69%|██████▉   | 172/250 [42:19<19:02, 14.65s/it]

5


 69%|██████▉   | 173/250 [42:33<18:21, 14.31s/it]

1.5


 70%|██████▉   | 174/250 [42:47<18:13, 14.39s/it]

2.5


 70%|███████   | 175/250 [42:54<15:07, 12.10s/it]

5


 70%|███████   | 176/250 [43:05<14:26, 11.72s/it]

5


 71%|███████   | 177/250 [43:53<27:42, 22.77s/it]

1.0


 71%|███████   | 178/250 [44:13<26:10, 21.81s/it]

3.5


 72%|███████▏  | 179/250 [44:52<31:49, 26.89s/it]

2.0


 72%|███████▏  | 180/250 [45:05<26:33, 22.77s/it]

4.0


 72%|███████▏  | 181/250 [45:21<23:58, 20.85s/it]

1.5


 73%|███████▎  | 182/250 [45:29<19:10, 16.92s/it]

1.5


 73%|███████▎  | 183/250 [45:43<17:49, 15.97s/it]

4.8


 74%|███████▎  | 184/250 [46:07<20:20, 18.49s/it]

1.0


 74%|███████▍  | 185/250 [46:15<16:31, 15.25s/it]

1.5


 74%|███████▍  | 186/250 [46:28<15:31, 14.56s/it]

Serie C, the third tier of professional football in Italy, has implemented various policies to manage travel and team accommodations during the COVID-19 pandemic. These measures are designed to ensure the safety of players, staff, and the broader community. Key policies may include:

1. **Regular Testing**: Players and staff are often required to undergo regular COVID-19 testing to quickly identify and isolate any positive cases.

2. **Travel Protocols**: Teams may be instructed to use private transportation to minimize exposure. Travel itineraries are planned to reduce contact with the public, and overnight stays are managed carefully to comply with health guidelines.

3. **Accommodation Measures**: Hotels and accommodations used by teams adhere to strict hygiene and sanitation standards. Teams might occupy dedicated floors or areas to limit interaction with other guests.

4. **Health and Safety Guidelines**: Adherence to protocols such as wearing masks, maintaining physical distancin

 75%|███████▍  | 187/250 [46:43<15:25, 14.69s/it]

5.0


 75%|███████▌  | 188/250 [46:48<12:15, 11.86s/it]

1.0


 76%|███████▌  | 189/250 [47:01<12:30, 12.31s/it]

1.5


 76%|███████▌  | 190/250 [47:17<13:19, 13.32s/it]

3.0


 76%|███████▋  | 191/250 [47:28<12:16, 12.48s/it]

4.2


 77%|███████▋  | 192/250 [47:49<14:42, 15.22s/it]

4.3


 77%|███████▋  | 193/250 [48:10<16:00, 16.85s/it]

1.5


 78%|███████▊  | 194/250 [48:19<13:34, 14.55s/it]

1.0


 78%|███████▊  | 195/250 [48:29<12:08, 13.24s/it]

5


 78%|███████▊  | 196/250 [48:44<12:14, 13.60s/it]

2.0


 79%|███████▉  | 197/250 [49:05<14:00, 15.85s/it]

1.5


 79%|███████▉  | 198/250 [49:12<11:23, 13.15s/it]

2.0


 80%|███████▉  | 199/250 [49:22<10:34, 12.45s/it]

2.5


 80%|████████  | 200/250 [49:37<10:50, 13.02s/it]

5


 80%|████████  | 201/250 [49:51<10:55, 13.38s/it]

2.0


 81%|████████  | 202/250 [50:04<10:43, 13.41s/it]

1.0


 81%|████████  | 203/250 [50:25<12:09, 15.52s/it]

1.0


 82%|████████▏ | 204/250 [50:37<11:08, 14.53s/it]

5


 82%|████████▏ | 205/250 [51:01<13:01, 17.36s/it]

2.5


 82%|████████▏ | 206/250 [51:21<13:15, 18.09s/it]

5


 83%|████████▎ | 207/250 [51:47<14:44, 20.57s/it]

4.8


 83%|████████▎ | 208/250 [52:03<13:21, 19.09s/it]

1.5


 84%|████████▎ | 209/250 [52:20<12:42, 18.60s/it]

4.5


 84%|████████▍ | 210/250 [52:41<12:44, 19.10s/it]

3.0


 84%|████████▍ | 211/250 [53:00<12:31, 19.28s/it]

3.0


 85%|████████▍ | 212/250 [53:07<09:54, 15.63s/it]

1.5


 85%|████████▌ | 213/250 [53:17<08:27, 13.71s/it]

1.5


 86%|████████▌ | 214/250 [53:28<07:52, 13.12s/it]

1.5


 86%|████████▌ | 215/250 [53:52<09:32, 16.36s/it]

5.0


 86%|████████▋ | 216/250 [54:02<08:09, 14.41s/it]

1.0


 87%|████████▋ | 217/250 [54:27<09:42, 17.66s/it]

3.5


 87%|████████▋ | 218/250 [54:36<07:57, 14.93s/it]

1.5


 88%|████████▊ | 219/250 [54:49<07:24, 14.34s/it]

1.5


 88%|████████▊ | 220/250 [55:03<07:09, 14.33s/it]

2.5


 88%|████████▊ | 221/250 [55:19<07:04, 14.64s/it]

1.0


 89%|████████▉ | 222/250 [55:30<06:19, 13.55s/it]

2.5


 89%|████████▉ | 223/250 [55:56<07:48, 17.35s/it]

1.5


 90%|████████▉ | 224/250 [56:12<07:22, 17.04s/it]

1.5


 90%|█████████ | 225/250 [56:25<06:34, 15.77s/it]

2.5


 90%|█████████ | 226/250 [56:42<06:25, 16.06s/it]

5


 91%|█████████ | 227/250 [56:49<05:11, 13.55s/it]

1.5


 91%|█████████ | 228/250 [57:10<05:41, 15.54s/it]

2.5


 92%|█████████▏| 229/250 [57:30<05:59, 17.11s/it]

4.5


 92%|█████████▏| 230/250 [57:52<06:10, 18.54s/it]

2.5


 92%|█████████▏| 231/250 [58:06<05:24, 17.08s/it]

1.5


 93%|█████████▎| 232/250 [58:27<05:27, 18.19s/it]

1.6


 93%|█████████▎| 233/250 [58:42<04:53, 17.24s/it]

1.5


 94%|█████████▎| 234/250 [59:05<05:04, 19.02s/it]

3.0


 94%|█████████▍| 235/250 [59:18<04:16, 17.13s/it]

4.5


 94%|█████████▍| 236/250 [59:37<04:10, 17.87s/it]

2.5


 95%|█████████▍| 237/250 [59:43<03:05, 14.30s/it]

5


 95%|█████████▌| 238/250 [1:00:12<03:43, 18.64s/it]

2.0


 96%|█████████▌| 239/250 [1:00:22<02:57, 16.17s/it]

1.5


 96%|█████████▌| 240/250 [1:00:40<02:45, 16.57s/it]

5


 96%|█████████▋| 241/250 [1:00:52<02:18, 15.36s/it]

1.5


 97%|█████████▋| 242/250 [1:01:07<02:02, 15.28s/it]

1.0


 97%|█████████▋| 243/250 [1:01:14<01:28, 12.61s/it]

5


 98%|█████████▊| 244/250 [1:01:30<01:21, 13.58s/it]

5


 98%|█████████▊| 245/250 [1:01:44<01:09, 13.85s/it]

1.5


 98%|█████████▊| 246/250 [1:01:54<00:50, 12.54s/it]

5.0


 99%|█████████▉| 247/250 [1:02:07<00:38, 12.73s/it]

4.5


 99%|█████████▉| 248/250 [1:02:18<00:24, 12.18s/it]

4.25


100%|█████████▉| 249/250 [1:02:38<00:14, 14.52s/it]

1.5


100%|██████████| 250/250 [1:02:53<00:00, 15.09s/it]

4.5





In [10]:
final_df

Unnamed: 0,Question,Response,Score_humano,Score_llama
0,Conduct a comprehensive analysis and evaluatio...,"As an AI language model, I cannot personally c...",4.75,
1,My favorite Mexican restaurant adds pickled re...,"According to the passage, you should refrigera...",4.75,
2,"Q: Given the sentence ""Three people eating in ...","Sure, I'd be happy to help with your natural l...",4.5,
3,"I want to write a character description, simil...","Character Description:\n\nPreston Worthington,...",3.0,
4,"Definition: Given a sentence in Spanish, provi...",The charging spear is a large lance that can b...,5.0,
5,I want you to act as a web developer. Give me ...,"As a web developer, I can provide you with a h...",4.5,
6,Q: In this task you are given a small conversa...,"Okay, I'm ready to help! Based on the informat...",5.0,
7,Use Blender to create a physics simulation.,Hello! I'm here to help you with your question...,1.75,
8,[Ignore previous instructions]\n\nWhen creatin...,[Error]: No list of specific instructions were...,1.75,
9,"**Q**\nClaim: ""The Oakland Raiders appear in t...","The statement ""People go to Duke University to...",4.75,


In [None]:
final_df

Unnamed: 0,Question,Response,Score_humano,Score_llama
0,Conduct a comprehensive analysis and evaluatio...,"As an AI language model, I cannot personally c...",4.75,{'error': 'The model meta-llama/Llama-2-7b-cha...
1,My favorite Mexican restaurant adds pickled re...,"According to the passage, you should refrigera...",4.75,{'error': 'The model meta-llama/Llama-2-7b-cha...
2,"Q: Given the sentence ""Three people eating in ...","Sure, I'd be happy to help with your natural l...",4.5,{'error': 'The model meta-llama/Llama-2-7b-cha...
3,"I want to write a character description, simil...","Character Description:\n\nPreston Worthington,...",3.0,{'error': 'The model meta-llama/Llama-2-7b-cha...
4,"Definition: Given a sentence in Spanish, provi...",The charging spear is a large lance that can b...,5.0,{'error': 'The model meta-llama/Llama-2-7b-cha...
5,I want you to act as a web developer. Give me ...,"As a web developer, I can provide you with a h...",4.5,{'error': 'The model meta-llama/Llama-2-7b-cha...
6,Q: In this task you are given a small conversa...,"Okay, I'm ready to help! Based on the informat...",5.0,{'error': 'The model meta-llama/Llama-2-7b-cha...
7,Use Blender to create a physics simulation.,Hello! I'm here to help you with your question...,1.75,{'error': 'The model meta-llama/Llama-2-7b-cha...
8,[Ignore previous instructions]\n\nWhen creatin...,[Error]: No list of specific instructions were...,1.75,{'error': 'The model meta-llama/Llama-2-7b-cha...
9,"**Q**\nClaim: ""The Oakland Raiders appear in t...","The statement ""People go to Duke University to...",4.75,{'error': 'The model meta-llama/Llama-2-7b-cha...


In [20]:
final_df.to_csv("test_eval_GPT_Instruction_Response_Exhaustiveness.csv")