### Import dataset to dataframe

Note that the dataset might be coming from BigQuery or from a query on the local database (created from the SE Data Dump). The two data sources should be interchangeable in the code.

In [2]:
import json
import os
import pandas as pd
from pathlib import Path

DATASET_FILE = "saved_dataset.csv"      # The file name of the saved dataset (saved on / loaded from local disk)
cwd = Path().absolute()                 # Current working directory (note: possibly different from execution directory)

# Load a saved copy of the dataset from local disk (if it exists)
try:
    dataset_path = os.path.join(cwd, DATASET_FILE)
    results = pd.read_csv(dataset_path)
    results = results.astype({"creation_date": "datetime64[ns]"})
    print("Saved copy of dataset loaded from local disk.")
except FileNotFoundError:
    print("Saved dataset not found!")

Saved copy of dataset loaded from local disk.


### Cull / filter dataset

For the demo we're just arbitrarily culling the size of the dataset to make it more manageable, but you could also filter for other reasons such as focusing on a specific tag, or sampling based on answer and/or question scores.

A pandas dataframe can be sampled either:
* Using a fractional value: e.g., ``.sample(frac=0.01)`` will result in a number of samples equivalent to 1% of the dataset.
* Using an integer value: e.g., ``.sample(n=1000)`` will result in 1000 samples from the dataset.
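
The two sampling modes, and the tag/score filtering mentioned above, can be sketched on a small made-up dataframe. This is only an illustration: the `demo` data is invented, `random_state` is an optional addition for reproducible samples, and the `<tag>` format is assumed to match the `tags` column of the dataset.

```python
import pandas as pd

# Made-up stand-in for the real dataset (hypothetical values)
demo = pd.DataFrame({
    "id": range(1000),
    "tags": ["<php><regex>" if i % 4 == 0 else "<python><pandas>" for i in range(1000)],
    "question_score": [i % 5 for i in range(1000)],
})

# Fractional sampling: 1% of 1000 rows
by_frac = demo.sample(frac=0.01, random_state=42)

# Integer sampling: exactly 10 rows
by_n = demo.sample(n=10, random_state=42)

# Filter by tag and score first, then sample from what is left
php_only = demo[demo["tags"].str.contains("<php>", regex=False)]
php_scored = php_only[php_only["question_score"] >= 1]
php_sample = php_scored.sample(n=5, random_state=42)

print(len(by_frac), len(by_n), len(php_sample))
```

Passing `random_state` makes the sample repeatable between runs, which may or may not be desirable depending on whether the saved JSONL files should stay stable.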

In [3]:
# Randomly sample dataset
fd_tiny = results.sample(frac=0.0001)
fd_nano = results.sample(n=10)
fd_1k   = results.sample(n=1000)
fd_10k  = results.sample(n=10000)
fd_100k = results.sample(n=100000)

# Convenience: alias the filtered data so we can change it easily for later code
#   (copied so that the columns we add later go onto an independent dataframe)
wd = fd_nano.copy()

# Dump info about the filtered result
print("Number of questions in currently selected filtered dataset:", len(wd))
wd.head()

Number of questions in currently selected filtered dataset: 10


Unnamed: 0,id,title,body,accepted_answer_id,view_count,tags,answer_count,question_score,creation_date,answer_score,stackoverflow_answer
166728,70703290,change event handler in angular not reading op...,<p>I am trying to get the onPageSizeUpdate fun...,70703621,96,<angular><typescript><event-handling>,2,0,2022-01-13 21:25:46,0,<p>Instead of</p>&#xA;<pre><code>onPageSizeUpd...
113902,70287406,how to replace all accented characters with En...,<p>Hi in my aura component below code is used ...,70288180,3194,<javascript><aura.js>,2,1,2021-12-09 09:23:19,6,"<pre class=""lang-js prettyprint-override""><cod..."
320689,71940097,Why are React defaultProps not passing values?,"<pre class=""lang-js prettyprint-override""><cod...",71940260,473,<css><reactjs><react-memo>,1,0,2022-04-20 12:58:22,0,<pre><code>Button.defaultProps = {&#xA; size:...
621929,74478463,Deploy sql workflow with DBX,<p>I am developing deployment via DBX to Azure...,74488928,315,<sql><databricks><databricks-dbx>,2,0,2022-11-17 15:56:55,1,<p>There are various ways to do that.</p>&#xA;...
275583,71571109,How to write multiple excel files with multipl...,<p>I would like to split a data frame in order...,71623188,893,<r><tidyverse><purrr>,1,0,2022-03-22 11:13:23,1,"<p>In your example, you are using the same fil..."


### Strip HTML from filtered dataset

We decided we would strip HTML and use this "stripped" version as our default for evaluations. The stripped text is appended as a separate column in our dataframe.
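
To make "stripping" concrete, here is a minimal dependency-free sketch of the idea using Python's built-in `html.parser`. It is not the BeautifulSoup approach used in the cell below, and `TextExtractor` and `strip_html` are hypothetical names introduced only for this illustration.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects only the text content of a document, discarding all tags."""

    def __init__(self):
        # convert_charrefs=True turns entities such as &#xA; into real characters
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(markup):
    """Return the text of `markup` with HTML tags removed."""
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.parts)


sample = "<p>Instead of</p>&#xA;<pre><code>x = 1</code></pre>"
stripped = strip_html(sample)
print(repr(stripped))
```

The two approaches may differ on malformed markup, so the stripped columns in this notebook should be taken as BeautifulSoup's output rather than this sketch's.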

In [4]:
from bs4 import BeautifulSoup as soup

# Separate out the text columns we want for convenience
titles  = wd["title"]
bodies  = wd["body"]
answers = wd["stackoverflow_answer"]

# Sanity check (surely this will always be true, but *just in case*)
if len(titles) == len(bodies) and len(titles) == len(answers):
    pass
else:
    raise ValueError("columns are different lengths!")

# Create new lists to store HTML-stripped versions of text
s_titles  = []
s_bodies  = []
s_answers = []

# Iterating is slow(!) but comparatively easy to understand (modified Harvey approach)
for idx, *row in wd.itertuples():
    
    # Strip HTML from title, question, and answer; save to lists
    s_titles.append(soup(titles[idx], "html.parser").get_text())
    s_bodies.append(soup(bodies[idx], "html.parser").get_text())
    s_answers.append(soup(answers[idx], "html.parser").get_text())

# Add the populated lists into our dataframe
wd["stripped_title"] = s_titles
wd["stripped_body"] = s_bodies
wd["stripped_stackoverflow_answer"] = s_answers

### Install OpenAI library and configure OpenAI API key

The key is currently read from a secrets.json file located at the root directory. An alternative method (which would require code changes) would be to read it from an environment variable.

Key can be generated from: https://platform.openai.com/account/api-keys
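
The environment-variable alternative mentioned above could look roughly like the sketch below. It is not what the notebook does: `load_api_key_from_env` and its `environ` parameter are hypothetical, and the fallback assumes the same `secrets.json` layout as the function in the next cell.

```python
import json
import os


def load_api_key_from_env(env_var="OPENAI_API_KEY", secrets_file="secrets.json", environ=None):
    """Prefer the environment variable; fall back to the secrets file."""
    # `environ` is injectable purely so the function can be demonstrated safely
    environ = os.environ if environ is None else environ

    key = environ.get(env_var)
    if key:
        return key

    # Fallback: same file format as the secrets.json approach
    with open(secrets_file) as f:
        return json.load(f)[env_var]


# Demonstration with a placeholder value rather than a real key
demo_key = load_api_key_from_env(environ={"OPENAI_API_KEY": "sk-placeholder"})
print(demo_key)
```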

In [5]:
# Install OpenAI
%pip install openai
import openai

# Function to load OpenAI API key from file
# https://stackoverflow.com/a/76148268
def load_api_key(secrets_file="secrets.json"):
    with open(secrets_file) as f:
        secrets = json.load(f)
    return secrets["OPENAI_API_KEY"]

# Read and set our OpenAI API key
api_key = load_api_key()
openai.api_key = api_key

Note: you may need to restart the kernel to use updated packages.


### Split dataframe into chunks for batching

In [7]:
import numpy as np

# Split into n parts (in our case 10) in a list
wd_list = np.array_split(wd, 10)

# Check the result of the first chunk
wd_list[0].head()

Unnamed: 0,id,title,body,accepted_answer_id,view_count,tags,answer_count,question_score,creation_date,answer_score,stackoverflow_answer,stripped_title,stripped_body,stripped_stackoverflow_answer
166728,70703290,change event handler in angular not reading op...,<p>I am trying to get the onPageSizeUpdate fun...,70703621,96,<angular><typescript><event-handling>,2,0,2022-01-13 21:25:46,0,<p>Instead of</p>&#xA;<pre><code>onPageSizeUpd...,change event handler in angular not reading op...,I am trying to get the onPageSizeUpdate functi...,Instead of\nonPageSizeUpdate($any($event.targe...
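
Recent pandas releases may emit a deprecation warning when `np.array_split` is called directly on a dataframe. A possible workaround, sketched here under that assumption, is to split the row positions instead and slice with `.iloc`. The helper name `split_dataframe` is hypothetical.

```python
import numpy as np
import pandas as pd


def split_dataframe(df, n_chunks):
    """Split df row-wise into n_chunks pieces of near-equal size."""
    # Split the row positions rather than the dataframe itself
    positions = np.array_split(np.arange(len(df)), n_chunks)
    # .copy() gives each chunk its own data, so new columns can be added safely
    return [df.iloc[pos].copy() for pos in positions]


demo = pd.DataFrame({"id": range(10)})
chunks = split_dataframe(demo, 3)
sizes = [len(chunk) for chunk in chunks]
print(sizes)
```

As with `np.array_split`, earlier chunks absorb the remainder when the row count does not divide evenly.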


### Get GPT answers to SO questions

In [8]:
# Used for rate limit handling with OpenAI API
%pip install tenacity

Note: you may need to restart the kernel to use updated packages.
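
For readers unfamiliar with the rate-limit handling that tenacity provides, the general idea (retry with randomised exponential backoff) can be sketched in plain Python. This is an illustration of the technique only, not tenacity's implementation or its exact wait formula; `call_with_backoff` and its parameters are hypothetical.

```python
import random
import time


def call_with_backoff(func, max_attempts=6, min_wait=1.0, max_wait=60.0, sleep=time.sleep):
    """Call func(), retrying on any exception with a growing random wait."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except Exception:
            # Out of attempts: let the last error propagate
            if attempt == max_attempts:
                raise
            # The upper bound doubles each attempt, capped at max_wait
            upper = min(max_wait, min_wait * 2 ** attempt)
            sleep(random.uniform(min_wait, upper))


# Simulated API call that fails twice before succeeding
calls = {"count": 0}

def flaky():
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("simulated rate limit")
    return "ok"


waits = []                                   # record waits instead of sleeping
result = call_with_backoff(flaky, sleep=waits.append)
print(result, len(waits))
```

Randomising the wait spreads retries out, which helps when several requests hit the rate limit at the same moment.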


In [9]:
# Used to estimate token counts
%pip install tiktoken

Note: you may need to restart the kernel to use updated packages.


In [10]:
import tiktoken
from tenacity import (
    retry,
    stop_after_attempt,
    wait_random_exponential,
)  # for exponential backoff

MODEL_NAME = "gpt-4"
ENCODING = tiktoken.encoding_for_model(MODEL_NAME)

@retry(wait=wait_random_exponential(min=1, max=60), stop=stop_after_attempt(6))
def completion_with_backoff(**kwargs):
    response = openai.ChatCompletion.create(**kwargs)
    return response

def chat_format(question):
    """Insert the full prompt into chat format."""
    messages = [
        {"role": "user", "content": question},
    ]
    return messages

skipped = 0

for chunk in wd_list:
    GPT_answers = []
    GPT_finished = []
    full_responses = []

    for idx, row in chunk.iterrows():

        title_and_question = row["stripped_title"] + "\n\n" + row["stripped_body"]
        SO_text = title_and_question + row["stripped_stackoverflow_answer"]

        # Estimate tokens for SO T+Q+A, and skip anything too long
        if len(ENCODING.encode(SO_text)) < 4000:

            # Get the response from GPT
            prompt = chat_format(title_and_question)
            GPT_answer = completion_with_backoff(model=MODEL_NAME, messages=prompt, temperature=0, max_tokens=2000)
            extracted_answer = GPT_answer.choices[0].message.content

            # Check if the GPT response completed or terminated early (because e.g. hit token limit)
            if GPT_answer.choices[0].finish_reason == "stop":
                finished = True
            else:
                finished = False

        else:
            skipped += 1
            extracted_answer = None
            GPT_answer = None
            finished = False

        # Add to our lists
        GPT_answers.append(extracted_answer)
        GPT_finished.append(finished)
        full_responses.append(GPT_answer)

    # Add answers back into the chunk dataframe
    COL_NAME = f"{MODEL_NAME}_answer"
    chunk[COL_NAME] = GPT_answers
    chunk["GPT_finished"] = GPT_finished
    chunk["full_GPT_response"] = full_responses

# Note how many questions have been skipped for length reasons
print(skipped, "entries were skipped due to token length.")

0


In [11]:
# Check result of first chunk
wd_list[0].head()

Unnamed: 0,id,title,body,accepted_answer_id,view_count,tags,answer_count,question_score,creation_date,answer_score,stackoverflow_answer,stripped_title,stripped_body,stripped_stackoverflow_answer,gpt-4_answer,GPT_finished,full_GPT_response
166728,70703290,change event handler in angular not reading op...,<p>I am trying to get the onPageSizeUpdate fun...,70703621,96,<angular><typescript><event-handling>,2,0,2022-01-13 21:25:46,0,<p>Instead of</p>&#xA;<pre><code>onPageSizeUpd...,change event handler in angular not reading op...,I am trying to get the onPageSizeUpdate functi...,Instead of\nonPageSizeUpdate($any($event.targe...,The issue is that you are not setting the valu...,True,{'id': 'chatcmpl-83JxCm9GK225VdHnBpjhwEUovO6dc...


In [12]:
# Check result of last chunk
wd_list[9].head()

Unnamed: 0,id,title,body,accepted_answer_id,view_count,tags,answer_count,question_score,creation_date,answer_score,stackoverflow_answer,stripped_title,stripped_body,stripped_stackoverflow_answer,gpt-4_answer,GPT_finished,full_GPT_response
38630,69700210,How to display output in rows of five numbers?,<p>I'm new to programming and I have to displa...,69700396,315,<c++><loops><vector><counter><primes>,1,2,2021-10-24 19:45:49,3,<p>You are seriously over-complicating your ou...,How to display output in rows of five numbers?,I'm new to programming and I have to display a...,You are seriously over-complicating your outpu...,Your code is almost correct. You are using the...,True,{'id': 'chatcmpl-83K22BAwXPoudciLqs7Fr5trhVbOF...


### Prepare JSONL file for evals

In [13]:
# Strip and save a copy of the GPT answer
#   (so that the eval is on fair footing, with HTML tags removed from both human and AI)

for index, chunk in enumerate(wd_list):

    # Create a new list to store HTML-stripped versions of text
    s_gpt_answers  = []

    # Iterating is slow(!) but comparatively easy to understand (modified Harvey approach)
    for idx, *row in chunk.itertuples():

        # Questions skipped for token length have no GPT answer, so there is nothing to strip
        gpt_answer = chunk[COL_NAME][idx]
        if pd.isna(gpt_answer):
            s_gpt_answers.append(None)
            continue

        # Strip HTML from GPT's answer; save to list
        s_gpt_answers.append(soup(gpt_answer, "html.parser").get_text())

    # Add the populated lists into our dataframe
    stripped_col_name = "stripped_" + COL_NAME
    chunk[stripped_col_name] = s_gpt_answers

    # Preview result
    #chunk.head()

In [14]:
# Check first chunk
wd_list[0].head()

Unnamed: 0,id,title,body,accepted_answer_id,view_count,tags,answer_count,question_score,creation_date,answer_score,stackoverflow_answer,stripped_title,stripped_body,stripped_stackoverflow_answer,gpt-4_answer,GPT_finished,full_GPT_response,stripped_gpt-4_answer
166728,70703290,change event handler in angular not reading op...,<p>I am trying to get the onPageSizeUpdate fun...,70703621,96,<angular><typescript><event-handling>,2,0,2022-01-13 21:25:46,0,<p>Instead of</p>&#xA;<pre><code>onPageSizeUpd...,change event handler in angular not reading op...,I am trying to get the onPageSizeUpdate functi...,Instead of\nonPageSizeUpdate($any($event.targe...,The issue is that you are not setting the valu...,True,{'id': 'chatcmpl-83JxCm9GK225VdHnBpjhwEUovO6dc...,The issue is that you are not setting the valu...


In [15]:
# Check last chunk
wd_list[9].head()

Unnamed: 0,id,title,body,accepted_answer_id,view_count,tags,answer_count,question_score,creation_date,answer_score,stackoverflow_answer,stripped_title,stripped_body,stripped_stackoverflow_answer,gpt-4_answer,GPT_finished,full_GPT_response,stripped_gpt-4_answer
38630,69700210,How to display output in rows of five numbers?,<p>I'm new to programming and I have to displa...,69700396,315,<c++><loops><vector><counter><primes>,1,2,2021-10-24 19:45:49,3,<p>You are seriously over-complicating your ou...,How to display output in rows of five numbers?,I'm new to programming and I have to display a...,You are seriously over-complicating your outpu...,Your code is almost correct. You are using the...,True,{'id': 'chatcmpl-83K22BAwXPoudciLqs7Fr5trhVbOF...,Your code is almost correct. You are using the...


In [22]:
for index, chunk in enumerate(wd_list):

    json_list = []
    for idx, row in chunk.iterrows():
        
        # Skip if the GPT answer is either unfinished or not present (e.g. token limit)
        #   Note: this has never been explicitly tested because none of my samples hit these conditions
        if pd.isna(row[COL_NAME]) or not row["GPT_finished"]:
            continue

        # Strip and chat-format the text as appropriate
        title_and_question = row["stripped_title"] + "\n\n" + row["stripped_body"]
        prompt = chat_format(title_and_question)

        # Create the JSON object using the stripped and chat-formatted text
        json_object = {
            "input": prompt,
            "ideal": row["stripped_stackoverflow_answer"],
            "completion": row[stripped_col_name]
        }

        # Add the object to our list
        json_list.append(json.dumps(json_object))

    # because re-running the notebook changes the sampling we DON'T want a persistent jsonl file
    #   this step relies on the directories existing; they are not created here
    #   so this will fail with e.g. FileNotFoundError if they don't exist
    JSONL_FILENAME = f"new_samples_{index}.jsonl"
    JSONL_FILEPATH = os.path.join(cwd, "eval_samples", JSONL_FILENAME)
    with open(JSONL_FILEPATH, "w") as outfile:
        outfile.write("\n".join(json_list))
        print("Saved JSONL file: " + JSONL_FILEPATH)

Saved JSONL file: c:\Users\Mark\Documents\A2I2 T2 2023\eval_samples\new_samples_0.jsonl
Saved JSONL file: c:\Users\Mark\Documents\A2I2 T2 2023\eval_samples\new_samples_1.jsonl
Saved JSONL file: c:\Users\Mark\Documents\A2I2 T2 2023\eval_samples\new_samples_2.jsonl
Saved JSONL file: c:\Users\Mark\Documents\A2I2 T2 2023\eval_samples\new_samples_3.jsonl
Saved JSONL file: c:\Users\Mark\Documents\A2I2 T2 2023\eval_samples\new_samples_4.jsonl
Saved JSONL file: c:\Users\Mark\Documents\A2I2 T2 2023\eval_samples\new_samples_5.jsonl
Saved JSONL file: c:\Users\Mark\Documents\A2I2 T2 2023\eval_samples\new_samples_6.jsonl
Saved JSONL file: c:\Users\Mark\Documents\A2I2 T2 2023\eval_samples\new_samples_7.jsonl
Saved JSONL file: c:\Users\Mark\Documents\A2I2 T2 2023\eval_samples\new_samples_8.jsonl
Saved JSONL file: c:\Users\Mark\Documents\A2I2 T2 2023\eval_samples\new_samples_9.jsonl
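
Before handing the files to evals, it can be worth reading one back to confirm that every line parses and has the expected keys. The sketch below does that on a throwaway file; `read_jsonl` is a hypothetical helper and the record contents are placeholders, but the `input` / `ideal` / `completion` keys match the objects built above.

```python
import json
import os
import tempfile


def read_jsonl(path):
    """Parse a JSONL file into a list of dicts, ignoring blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Placeholder record with the same shape as the ones written above
records = [{
    "input": [{"role": "user", "content": "Title\n\nQuestion body"}],
    "ideal": "human answer",
    "completion": "model answer",
}]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "new_samples_demo.jsonl")
    with open(path, "w", encoding="utf-8") as outfile:
        outfile.write("\n".join(json.dumps(r) for r in records))

    loaded = read_jsonl(path)

# Every record should carry exactly the three expected keys
keys_ok = all(set(r) == {"input", "ideal", "completion"} for r in loaded)
print(len(loaded), keys_ok)
```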


### Install OpenAI evals

At the time of writing, the version of evals on pip is wildly different from the version of evals available on GitHub (and even worse, they share the same version tag despite this). As such, even though we're not making our own eval, we want the current version from GitHub.

In [19]:
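import shutil

# openai evals uses git-lfs, but installation may be system-specific
#!git lfs install

# Get a local copy of evals (and if one already exists, nuke it first)
#   shutil is used instead of "rm -rf" so this also works on Windows
shutil.rmtree("evals", ignore_errors=True)
!git clone https://github.com/MHLoppy/evals.git

# Complete the remaining setup steps
#   "!cd evals" does not persist between shell commands, so point git at the repo with -C
!git -C evals lfs fetch --all
!git -C evals lfs pull
%pip install -e evals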

Cloning into 'evals'...

fetch: Fetching all references...
Obtaining file:///C:/Users/Mark/Documents/A2I2%20T2%202023/evals
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Checking if build backend supports build_editable: started
  Checking if build backend supports build_editable: finished with status 'done'
  Getting requirements to build editable: started
  Getting requirements to build editable: finished with status 'done'
  Preparing editable metadata (pyproject.toml): started
  Preparing editable metadata (pyproject.toml): finished with status 'done'
Collecting spacy-universal-sentence-encoder (from evals==1.0.3.post1)
  Downloading spacy_universal_sentence_encoder-0.4.6.tar.gz (15 kB)
  Preparing metadata (setup.py): started
  Preparing metadata (setup.py): finished with status 'done'
Collecting jiwer (from evals==1.0.3.post1)
  Obtaining dependency information for jiwer from https://files.pythonhosted.org/packages/0d/4f/ee537ab20144811dd99321735f

ERROR: Could not install packages due to an OSError: [WinError 5] Access is denied: 'C:\\Users\\Mark\\anaconda3\\envs\\testenv\\Lib\\site-packages\\~yarrow\\arrow.dll'
Consider using the `--user` option or check the permissions.



### Use/Run evals

As an aside, it is possible to use evals without magic commands or manual file creation (see https://medium.com/@sergioli/evaluating-chatgpt-using-openai-evals-7ca85c0ad139), but it's comparatively more complex.

In [23]:
# Note how the record and log paths below are wrapped in an extra set of quotes
#   this is being done because they're passed to shell commands (and our paths contain spaces)
#   the sample paths are only used from Python (shutil.copy), so they stay unquoted

# Construct list of sample file paths
sample_paths = []
for index, chunk in enumerate(wd_list):
    sample_path = os.path.join(cwd, "eval_samples", f"new_samples_{index}.jsonl")
    sample_paths.append(sample_path)

# Construct list of record file paths
record_paths = []
for index, chunk in enumerate(wd_list):
    record_path = os.path.join(cwd, "eval_records", f"eval_record_{index}.jsonl")
    record_paths.append(f'\"{record_path}\"')

# Construct list of log file paths
log_paths = []
for index, chunk in enumerate(wd_list):
    log_path = os.path.join(cwd, "eval_logs", f"eval_log_{index}.jsonl")
    log_paths.append(f'\"{log_path}\"')

In [24]:
import shutil

samples_file_path = os.path.join(cwd, "evals", "evals", "registry", "data", "coqa", "samples.jsonl")

# Run chunked evals
for index, chunk in enumerate(wd_list):

    # Update the samples file programmatically each iteration
    shutil.copy(sample_paths[index], samples_file_path)

    # Run the evaluation and save the results (records) and log file as specified
    record = record_paths[index]
    log = log_paths[index]
    !oaieval gpt-4 coqa-fact --record_path $record --log_to_file $log


  0%|          | 0/1 [00:00<?, ?it/s]
100%|██████████| 1/1 [00:10<00:00, 10.39s/it]

  0%|          | 0/1 [00:00<?, ?it/s]
100%|██████████| 1/1 [00:14<00:00, 14.67s/it]

  0%|          | 0/1 [00:00<?, ?it/s]
100%|██████████| 1/1 [00:15<00:00, 15.42s/it]

  0%|          | 0/1 [00:00<?, ?it/s]
100%|██████████| 1/1 [00:33<00:00, 33.68s/it]

  0%|          | 0/1 [00:00<?, ?it/s]
100%|██████████| 1/1 [00:27<00:00, 27.12s/it]

  0%|          | 0/1 [00:00<?, ?it/s]
100%|██████████| 1/1 [00:20<00:00, 20.44s/it]

  0%|          | 0/1 [00:00<?, ?it/s]
100%|██████████| 1/1 [00:29<00:00, 29.80s/it]

  0%|          | 0/1 [00:00<?, ?it/s]
100%|██████████| 1/1 [00:35<00:00, 35.36s/it

In [70]:
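# Obed's answer-reading code
#   (because by his evaluation, the metric disagrees with the written response ~10% of the time)
import re
def get_answer_from_response(text):
    """Parses the output text for the evaluation choice."""

    # Guard against empty responses (text[-1] would raise IndexError)
    if not text:
        return None

    last_letter = text[-1]
    if last_letter not in ['A', 'B', 'C', 'D', 'E']:
        # Fall back to the last parenthesised choice, e.g. "(A)"
        #   (raw string so that "\(" is not treated as an invalid escape sequence)
        matches = re.findall(r'\((.*?)\)', text)
        return matches[-1] if matches else None
    return last_letter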


# Iterate through our evals results
for index, chunk in enumerate(wd_list):

    # Update our record path each iteration
    #   (strip the shell quotes added earlier; open() needs the bare path)
    record_path = record_paths[index].strip('"')

    # Empty lists for us to store stuff in, to later add to the chunk's dataframe
    answer_list = []
    provided_answer_list = []
    sampled_list = []
    metric_list = []

    # Open the record (.jsonl) for this iteration
    with open(record_path) as f:
        # Skip the first two lines (they're just metadata about the query and response)
        for _ in range(2):
            next(f)

        # Iterate through the rest of the file line-by-line
        for line in f:
            eval_line = json.loads(line)

            # process the "sampling" half of each evaluation response
            if eval_line["type"] == "sampling":

                # Extract the response from the answer received
                answer = eval_line["data"]["sampled"][0]
                extr_choice = get_answer_from_response(answer)

                # Add the extracted response and the raw response to our lists
                answer_list.append(extr_choice)
                sampled_list.append(eval_line)

            # Process the "metric" half of each evaluation response
            elif eval_line["type"] == "metric":

                # Also pull evals' self-reported response
                og_choice = eval_line["data"]["choice"]

                # Add that and the raw response to our lists
                provided_answer_list.append(og_choice)
                metric_list.append(eval_line)

    # Add the populated lists into our chunk's dataframe
    chunk["original_eval_choice"] = provided_answer_list
    chunk["extracted_eval_choice"] = answer_list
    chunk["eval_full_sampled"] = sampled_list
    chunk["eval_full_metric"] = metric_list

Unnamed: 0,index,id,title,body,accepted_answer_id,view_count,tags,answer_count,question_score,answer_score,stackoverflow_answer,stripped_title,stripped_body,stripped_stackoverflow_answer,gpt-3.5-turbo-16k_answer,stripped_gpt-3.5-turbo-16k_answer,eval_choice
0,196461,71067715,CRA error. Html Webpack Plugin: Error: Child c...,<p>I create a default react app with <code>npx...,71086154,221,node.js|reactjs|webpack|create-react-app,1,0,0,<p>Solution: install <strong>react-scripts@4.0...,CRA error. Html Webpack Plugin: Error: Child c...,I create a default react app with npx create-r...,Solution: install react-scripts@4.0.3. If you ...,It seems like you are encountering an error re...,It seems like you are encountering an error re...,A
1,243249,71098325,Hyperledger Fabric | Orderer PODs keeps restar...,<p>I'm running Hyperledger Fabric network in A...,71168925,239,azure|kubernetes|hyperledger-fabric|raft,1,0,0,<p>Turns out my WAL logs directory was deleted...,Hyperledger Fabric | Orderer PODs keeps restar...,I'm running Hyperledger Fabric network in Azur...,Turns out my WAL logs directory was deleted. A...,The error message suggests that the raft log o...,The error message suggests that the raft log o...,C
2,375295,73155785,Grouping character strings,<p>I have 3 text :</p>\n<ul>\n<li>Simple test ...,73155978,21,php|regex,1,0,2,"<p>If you want to match the square brackets, y...",Grouping character strings,I have 3 text :\n\nSimple test 1 [https://www....,"If you want to match the square brackets, you ...","To achieve the desired grouping, you can use t...","To achieve the desired grouping, you can use t...",A
3,254120,71900912,Convert array from spreadsheet into associativ...,<p>I've been having difficulty visualizing how...,71901148,96,php|arrays|multidimensional-array,1,0,1,<p>This is a common issue with spreadsheets an...,Convert array from spreadsheet into associativ...,I've been having difficulty visualizing how to...,This is a common issue with spreadsheets and C...,"Yes, the `array_combine` function can be used ...","Yes, the `array_combine` function can be used ...",A
4,361337,71269489,ID components may not include unresolved token...,<p>I am trying to use a CfnParameter in the AW...,71313692,1213,python|amazon-web-services|amazon-cloudformati...,1,0,1,<p>Managed to resolve by using --context inste...,ID components may not include unresolved token...,I am trying to use a CfnParameter in the AWS P...,Managed to resolve by using --context instead ...,The error message suggests that you are trying...,The error message suggests that you are trying...,B
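
Since the extracted choice and the metric's self-reported choice can disagree, a quick tally of both is a useful check once the columns are populated. A minimal sketch follows, using made-up choice lists rather than real results; `summarise_choices` is a hypothetical helper.

```python
from collections import Counter


def summarise_choices(original, extracted):
    """Tally extracted choices and list the positions where the two sources disagree."""
    disagreements = [
        position
        for position, (orig, extr) in enumerate(zip(original, extracted))
        if orig != extr
    ]
    return Counter(extracted), disagreements


# Illustrative values only (not taken from an actual eval run)
original_choices = ["A", "C", "A", "A", "B"]
extracted_choices = ["A", "C", "A", "D", "B"]

counts, disagreements = summarise_choices(original_choices, extracted_choices)
print(counts)
print(disagreements)
```

With the real data, the two lists would be the `original_eval_choice` and `extracted_eval_choice` columns of the combined dataframe.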


In [None]:
# Combine each of the chunked dataframes, export to CSV
combined = pd.concat(wd_list)
combined.to_csv("dataset_results.csv")

# Preview the result
combined.head()

Extra bit for estimating the token count of the evals prompt (since this will count against our token limit!).

In [1]:
# Estimate token count of evals prompt
import tiktoken

encoding = tiktoken.encoding_for_model("gpt-4")

text = "\n\nCompare the factual content of the submitted answer with the expert answer. Ignore any differences in style, grammar, or punctuation.\nThe submitted answer may either be a subset or superset of the expert answer, or it may conflict with it. Determine which case applies. Answer the question by selecting one of the following options:\n(A) The submitted answer is a subset of the expert answer and is fully consistent with it.\n(B) The submitted answer is a superset of the expert answer and is fully consistent with it.\n(C) The submitted answer contains all the same details as the expert answer.\n(D) There is a disagreement between the submitted answer and the expert answer.\n(E) The answers differ, but these differences don't matter from the perspective of factuality.\n\nFirst, write out in a step by step manner your reasoning to be sure that your conclusion is correct. Avoid simply stating the correct answer at the outset. Then print only a single choice from \"A\" or \"B\" or \"C\" or \"D\" or \"E\" (without quotes or punctuation) on its own line corresponding to the correct answer. At the end, repeat just the answer by itself on a new line.\n\nReasoning:"

tokens = len(encoding.encode(text, disallowed_special=()))

print(tokens)

243
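
To see why this count matters, here is a rough worst-case budget using the limits from earlier in the notebook: SO title, question and answer under 4000 tokens, and a GPT answer of at most 2000 tokens. The 8192-token context window is an assumption about the base gpt-4 model, and `remaining_budget` is a hypothetical helper.

```python
EVAL_PROMPT_TOKENS = 243    # measured in the cell above
CONTEXT_LIMIT = 8192        # assumption: context window of the base gpt-4 model


def remaining_budget(so_text_tokens, gpt_answer_tokens,
                     prompt_tokens=EVAL_PROMPT_TOKENS, context_limit=CONTEXT_LIMIT):
    """Tokens left for the evaluator's written reasoning and final choice."""
    return context_limit - (prompt_tokens + so_text_tokens + gpt_answer_tokens)


# Worst case allowed by the earlier filter: SO text just under 4000 tokens,
# plus a GPT answer that used its full 2000-token allowance
worst_case = remaining_budget(so_text_tokens=3999, gpt_answer_tokens=2000)
print(worst_case)
```

This is only an estimate: chat formatting adds a few tokens per message, so the real margin is slightly smaller.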
