## Generating Evaluation Data
We treat the original FAQ documents as the ground truth and ask an LLM, acting as a prospective student, to generate five questions per record. This gives us a reasonably large set of evaluation data.

In [76]:
# import libraries
import json
import os
from dotenv import load_dotenv
from openai import OpenAI
from tqdm.auto import tqdm
import pickle
import pandas as pd

In [44]:
# load the API key from the environment file
load_dotenv('.envrc')
openai_api_key = os.getenv('OPENAI_API_KEY')

# start an OpenAI client with the loaded key
client = OpenAI(api_key=openai_api_key)

In [45]:
# load the cleaned up json file
with open('cleaned_Data.json', 'rt') as f_in:
    docs_raw = json.load(f_in)

# attach the course name (there is only one, ASU Online) to each FAQ record
documents = []

for doc_id, doc in enumerate(docs_raw['documents']):
    doc['id'] = doc_id  # position in the file serves as a unique id
    doc['course'] = docs_raw['course']
    documents.append(doc)
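
The positional id used above changes whenever records are added, removed, or reordered in the source file. As a hedged alternative (not what this notebook uses), an id can be derived from the record's content. The helper name `stable_doc_id` and the choice of fields and hash length are assumptions made for illustration:

```python
import hashlib

def stable_doc_id(doc: dict) -> str:
    """Derive a short id from the record's content rather than its position."""
    # Combine fields that together identify the record (an illustrative choice).
    combined = f"{doc['course']}-{doc['question']}-{doc['text'][:10]}"
    return hashlib.sha256(combined.encode("utf-8")).hexdigest()[:8]

# Hypothetical records in the same shape as the FAQ documents.
doc_a = {"course": "ASU Online", "question": "First question?", "text": "First answer text."}
doc_b = {"course": "ASU Online", "question": "Second question?", "text": "Second answer text."}

print(stable_doc_id(doc_a), stable_doc_id(doc_b))
```

The rest of the notebook keeps the integer ids, so this is only relevant if the FAQ file is expected to change between runs.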

In [46]:
documents[0]

{'text': 'ASU Online credits are no different than credits earned at our campuses and transcripts do not distinguish between online or on-campus courses. When transferring credits, it is always up to the receiving institution to determine transfer eligibility and how transfer credits may apply to a specific program of interest.',
 'section': 'ASU email basics',
 'question': 'Are all ASU Online credits transferable to other four-year universities?',
 'id': 0,
 'course': 'ASU Online'}

In [47]:
prompt_template = """
You emulate a prospective student interested in ASU online.
Formulate 5 questions this student might ask based on a FAQ record. The record
should contain the answer to the questions, and the questions should be complete and not too short.
If possible, use as few words as possible from the record. 

The record:

section: {section}
question: {question}
answer: {text}

Provide the output in parsable JSON without using code blocks:

["question1","question2","question3","question4","question5"]
""".strip()

In [48]:
# try an example prompt
prompt = prompt_template.format(**documents[0])
print(prompt)

You emulate a prospective student interested in ASU online.
Formulate 5 questions this student might ask based on a FAQ record. The record
should contain the answer to the questions, and the questions should be complete and not too short.
If possible, use as few words as possible from the record. 

The record:

section: ASU email basics
question: Are all ASU Online credits transferable to other four-year universities?
answer: ASU Online credits are no different than credits earned at our campuses and transcripts do not distinguish between online or on-campus courses. When transferring credits, it is always up to the receiving institution to determine transfer eligibility and how transfer credits may apply to a specific program of interest.

Provide the output in parsable JSON without using code blocks:

["question1","question2","question3","question4","question5"]


In [49]:
def generate_questions(doc):
    '''
    Generate 5 questions for a single document using the prompt template above.
    Returns the raw model response (a JSON-formatted string).
    '''
    prompt = prompt_template.format(**doc)

    response = client.chat.completions.create(
        model='gpt-4o-mini',
        messages=[{"role": "user", "content": prompt}]
    )

    json_response = response.choices[0].message.content
    return json_response
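
API calls can fail transiently (rate limits, timeouts). The sketch below shows one way to wrap a function such as `generate_questions` in a retry loop with exponential backoff. The helper `call_with_retries` and the stand-in `flaky` function are hypothetical, and catching a bare `Exception` is a simplification; in practice one would narrow it to the client's own error types.

```python
import time

def call_with_retries(fn, *args, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fn(*args), retrying with exponential backoff on any exception."""
    for attempt in range(attempts):
        try:
            return fn(*args)
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)

# Stand-in for generate_questions that fails twice, then succeeds.
calls = {"n": 0}

def flaky(doc):
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("temporary failure")
    return '["q1"]'

# The sleep function is replaced here so the demonstration runs instantly.
print(call_with_retries(flaky, {"id": 0}, sleep=lambda seconds: None))
```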

In [50]:
results = {}

In [51]:
for doc in tqdm(documents):
    doc_id = doc['id']
    # skip documents that already have questions, so this cell can be re-run safely
    if doc_id in results:
        continue

    questions = generate_questions(doc)
    results[doc_id] = questions
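
Because `results` lives only in memory, a kernel restart loses every generated question. A minimal sketch of checkpointing with `pickle` (imported above) follows. The helper names `save_results` and `load_results` and the file name `results.bin` are assumptions made for illustration:

```python
import os
import pickle
import tempfile

def save_results(results, path):
    """Write the results dict via a temporary file so a failed save cannot corrupt it."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "wb") as f_out:
        pickle.dump(results, f_out)
    os.replace(tmp_path, path)

def load_results(path):
    """Return previously saved results, or an empty dict if none exist."""
    if not os.path.exists(path):
        return {}
    with open(path, "rb") as f_in:
        return pickle.load(f_in)

# Demonstration in a temporary directory.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "results.bin")
    demo_results = load_results(path)   # empty on the first run
    demo_results[0] = '["question1"]'
    save_results(demo_results, path)
    restored = load_results(path)

print(restored)
```

Calling `save_results` every few iterations of the generation loop, and starting from `load_results`, would let the existing `if doc_id in results` check skip work that was already paid for.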

  0%|          | 0/53 [00:00<?, ?it/s]

In [69]:
results[0]

'["Are the credits earned through ASU Online the same as those from on-campus classes?","How do other four-year universities view ASU Online credits when considering transfers?","Do transcripts show whether courses were taken online or on-campus at ASU?","Who decides if ASU Online credits can be transferred to a different institution?","Can I find out how my ASU Online credits apply to a specific program when transferring?"]'

In [70]:
# normalize whitespace before parsing, because some responses were not clean JSON
cleaned_parsed_results = {}

for doc_id, result in results.items():
    questions = ' '.join(result.split())
    cleaned_parsed_results[doc_id] = json.loads(questions)
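
Whitespace normalization does not cover every way a response can go wrong: the model may wrap the JSON in a code fence despite the prompt, or return something other than a list of strings. The sketch below shows a somewhat stricter parser. The helper `parse_questions` and its `expected` parameter are hypothetical, and the checks are one reasonable choice rather than a complete solution:

```python
import json

FENCE = "`" * 3  # Markdown code-fence marker

def parse_questions(raw, expected=None):
    """Parse a model response into a list of question strings."""
    text = raw.strip()
    if text.startswith(FENCE):
        lines = text.splitlines()[1:]            # drop the opening fence line
        if lines and lines[-1].strip() == FENCE:
            lines = lines[:-1]                   # drop the closing fence line
        text = "\n".join(lines)
    questions = json.loads(text)
    if not isinstance(questions, list) or not all(isinstance(q, str) for q in questions):
        raise ValueError("expected a JSON list of strings")
    if expected is not None and len(questions) != expected:
        raise ValueError(f"expected {expected} questions, got {len(questions)}")
    return questions

fenced = FENCE + 'json\n["What is X?", "How does Y work?"]\n' + FENCE
print(parse_questions(fenced, expected=2))
```

Documents whose responses still fail to parse can be collected and regenerated instead of stopping the whole loop.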

In [71]:
cleaned_parsed_results

{0: ['Are the credits earned through ASU Online the same as those from on-campus classes?',
  'How do other four-year universities view ASU Online credits when considering transfers?',
  'Do transcripts show whether courses were taken online or on-campus at ASU?',
  'Who decides if ASU Online credits can be transferred to a different institution?',
  'Can I find out how my ASU Online credits apply to a specific program when transferring?'],
 1: ['Is it possible to complete my program on a part-time basis?',
  'What is the minimum number of credit hours required each semester?',
  'Are there any financial aid implications for enrolling part-time?',
  'How can I find out more about maintaining my financial aid while studying part-time?',
  'Who can I contact for assistance with my academic schedule and program duration?'],
 2: ['What additional steps are involved in the ASU Online admission process besides submitting an application?',
  'Is there a specific fee that needs to be paid for 

In [72]:
# flatten into (doc_id, course, question) tuples, one per generated question
doc_index = {d['id']: d for d in documents}

final_results = []

for doc_id, questions in cleaned_parsed_results.items():
    course = doc_index[doc_id]['course']
    for q in questions:
        final_results.append((doc_id, course, q))
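
The prompt asks for five questions per record, but nothing enforces that the model complied. A quick sanity check over tuples shaped like those in `final_results` can flag documents with an unexpected count. The helper `unexpected_counts` and the sample rows are hypothetical:

```python
from collections import Counter

def unexpected_counts(rows, expected=5):
    """Return {doc_id: count} for documents whose question count differs from expected."""
    counts = Counter(doc_id for doc_id, _course, _question in rows)
    return {doc_id: n for doc_id, n in counts.items() if n != expected}

# Made-up rows: document 0 has five questions, document 1 only four.
sample = [(0, "ASU Online", f"question {i}") for i in range(5)]
sample += [(1, "ASU Online", f"question {i}") for i in range(4)]

print(unexpected_counts(sample))
```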


In [75]:
final_results[22]

(4,
 'ASU Online',
 'Are there any specific guidelines for participating in graduation ceremonies at ASU?')

In [77]:
# convert to pandas df
df = pd.DataFrame(final_results, columns=['id', 'course', 'question'])


In [78]:
df

      id  course      question
0      0  ASU Online  Are the credits earned through ASU Online the ...
1      0  ASU Online  How do other four-year universities view ASU O...
2      0  ASU Online  Do transcripts show whether courses were taken...
3      0  ASU Online  Who decides if ASU Online credits can be trans...
4      0  ASU Online  Can I find out how my ASU Online credits apply...
...  ...  ...         ...
260   52  ASU Online  What resources does ASU Online provide for aca...
261   52  ASU Online  Is there a specific process to locate my advisor?
262   52  ASU Online  How can I contact my advisor once I find them?
263   52  ASU Online  What support services are available for online...


In [79]:
# save the df as ground truth csv
df.to_csv('ground_truth_data.csv', index=False)

In [80]:
!head ground_truth_data.csv

id,course,question
0,ASU Online,Are the credits earned through ASU Online the same as those from on-campus classes?
0,ASU Online,How do other four-year universities view ASU Online credits when considering transfers?
0,ASU Online,Do transcripts show whether courses were taken online or on-campus at ASU?
0,ASU Online,Who decides if ASU Online credits can be transferred to a different institution?
0,ASU Online,Can I find out how my ASU Online credits apply to a specific program when transferring?
1,ASU Online,Is it possible to complete my program on a part-time basis?
1,ASU Online,What is the minimum number of credit hours required each semester?
1,ASU Online,Are there any financial aid implications for enrolling part-time?
1,ASU Online,How can I find out more about maintaining my financial aid while studying part-time?
