## Generating Evaluation Data
We treat the original FAQ documents as the ground truth and ask an LLM, role-playing a prospective student, to generate five questions per document. This gives us a reasonably sized evaluation set in which every question is tied to the document that answers it.

In [6]:
# import libraries
import json
import os
from dotenv import load_dotenv
from openai import OpenAI
from tqdm.auto import tqdm
import pickle
import pandas as pd

In [7]:
# load the API key from the environment file
load_dotenv('../.envrc')
openai_api_key = os.getenv('OPENAI_API_KEY')

# start an OpenAI client with the key loaded above
client = OpenAI(api_key=openai_api_key)

In [8]:
# load the cleaned up json file
with open('../data/cleaned_Data.json', 'rt') as f_in:
    docs_raw = json.load(f_in)

# attach a unique id and the course name (there is only one, ASU Online) to every FAQ record
documents = []

for doc_id, doc in enumerate(docs_raw['documents']):
    doc['id'] = doc_id  # unique id for this record
    doc['course'] = docs_raw['course']
    documents.append(doc)

In [9]:
documents[0]

{'text': 'ASU Online credits are no different than credits earned at our campuses and transcripts do not distinguish between online or on-campus courses. When transferring credits, it is always up to the receiving institution to determine transfer eligibility and how transfer credits may apply to a specific program of interest.',
 'section': 'ASU email basics',
 'question': 'Are all ASU Online credits transferable to other four-year universities?',
 'id': 0,
 'course': 'ASU Online'}

In [10]:
prompt_template = """
You emulate a prospective student interested in ASU online.
Formulate 5 questions this student might ask based on a FAQ record. The record
should contain the answer to the questions, and the questions should be complete and not too short.
If possible, use as few words as possible from the record. 

The record:

section: {section}
question: {question}
answer: {text}

Provide the output in parsable JSON without using code blocks:

["question1","question2","question3","question4","question5"]
""".strip()

In [11]:
# try an example prompt
prompt = prompt_template.format(**documents[0])
print(prompt)

You emulate a prospective student interested in ASU online.
Formulate 5 questions this student might ask based on a FAQ record. The record
should contain the answer to the questions, and the questions should be complete and not too short.
If possible, use as few words as possible from the record. 

The record:

section: ASU email basics
question: Are all ASU Online credits transferable to other four-year universities?
answer: ASU Online credits are no different than credits earned at our campuses and transcripts do not distinguish between online or on-campus courses. When transferring credits, it is always up to the receiving institution to determine transfer eligibility and how transfer credits may apply to a specific program of interest.

Provide the output in parsable JSON without using code blocks:

["question1","question2","question3","question4","question5"]


In [12]:
def generate_questions(doc):
    '''
    Generate 5 questions for a single FAQ record using the prompt template above.
    Returns the raw model response (a JSON-formatted string), not a parsed list.
    '''
    prompt = prompt_template.format(**doc)

    response = client.chat.completions.create(
        model='gpt-4o-mini',
        messages=[{"role": "user", "content": prompt}]
    )

    json_response = response.choices[0].message.content
    return json_response
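
The call above makes one network request per document, so a single transient failure would interrupt the whole run. A minimal sketch of a retry wrapper follows. `call_with_retries` is a hypothetical helper, not part of the original notebook, and catching every exception is a simplification that should be narrowed to the errors you actually expect.

```python
import time


def call_with_retries(fn, *args, max_attempts=3, base_delay=1.0, **kwargs):
    """Call fn(*args, **kwargs), retrying with exponential backoff.

    Hypothetical helper (an assumption, not from the notebook).
    Re-raises the last exception once max_attempts is reached.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fn(*args, **kwargs)
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # wait base_delay, then 2x, 4x, ... before the next attempt
            time.sleep(base_delay * 2 ** (attempt - 1))
```

With this in place, the loop could call `call_with_retries(generate_questions, doc)` instead of `generate_questions(doc)`.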

In [13]:
results = {}

In [14]:
for doc in tqdm(documents): 
    doc_id = doc['id']
    if doc_id in results:
        continue

    questions = generate_questions(doc)
    results[doc_id] = questions

  0%|          | 0/53 [00:00<?, ?it/s]
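
Each iteration of this loop makes an API call, and `results` lives only in memory, so restarting the kernel loses everything generated so far. One possible way to make the `if doc_id in results` check useful across sessions is to checkpoint the dictionary to disk. The sketch below uses `pickle`; the function names and the file path in the usage note are assumptions, not part of the original notebook.

```python
import os
import pickle


def save_checkpoint(results, path):
    """Write results to a temporary file, then rename it into place,
    so an interrupted write cannot leave a half-written checkpoint."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "wb") as f_out:
        pickle.dump(results, f_out)
    os.replace(tmp_path, path)


def load_checkpoint(path):
    """Return the saved results dict, or an empty dict if none exists yet."""
    if not os.path.exists(path):
        return {}
    with open(path, "rb") as f_in:
        return pickle.load(f_in)
```

For example, `results = load_checkpoint('../data/results.bin')` before the loop and `save_checkpoint(results, '../data/results.bin')` after each new document (the path is hypothetical) would let a rerun skip documents that already have questions.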

In [15]:
results[0]

'["Can I transfer ASU Online credits to other universities?","How does ASU Online credit transfer work?","Are ASU Online courses treated differently from campus courses?","Will the receiving institution accept my ASU Online credits?","Do ASU transcripts show online or on-campus course distinction?"]'

In [16]:
# light cleanup: collapse stray whitespace and newlines in the model responses, then parse the JSON
cleaned_parsed_results = {}

for doc_id, result in results.items():
    questions = ' '.join(result.split())
    cleaned_parsed_results[doc_id] = json.loads(questions)
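
The cleanup above assumes every response is valid JSON once whitespace is collapsed. Model output can deviate from that, for instance by wrapping the list in a Markdown code fence despite the prompt, or by returning fewer than five questions. A more defensive parser might look like the sketch below. `parse_questions` is a hypothetical helper, and the checks it applies are assumptions about what counts as a usable response.

```python
import json

FENCE = "`" * 3  # a Markdown code fence marker


def parse_questions(raw, expected=5):
    """Parse a model response into a list of question strings.

    Hypothetical helper: tolerates a code fence around the JSON and raises
    ValueError unless the payload is a list of `expected` non-empty strings.
    """
    text = raw.strip()
    if text.startswith(FENCE):
        # drop the opening fence line (possibly tagged "json") and the closing fence
        text = text.split("\n", 1)[1] if "\n" in text else ""
        text = text.rsplit(FENCE, 1)[0]
    # same whitespace collapse as the notebook cell, then parse
    questions = json.loads(" ".join(text.split()))
    if not isinstance(questions, list) or len(questions) != expected:
        raise ValueError(f"expected a list of {expected} questions, got: {questions!r}")
    if not all(isinstance(q, str) and q.strip() for q in questions):
        raise ValueError(f"non-string or empty question in: {questions!r}")
    return questions
```

Since `json.JSONDecodeError` is a subclass of `ValueError`, a caller can catch `ValueError` alone and regenerate the questions for that document.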

In [17]:
cleaned_parsed_results

{0: ['Can I transfer ASU Online credits to other universities?',
  'How does ASU Online credit transfer work?',
  'Are ASU Online courses treated differently from campus courses?',
  'Will the receiving institution accept my ASU Online credits?',
  'Do ASU transcripts show online or on-campus course distinction?'],
 1: ['Can I pursue my degree on a part-time basis, and are there specific credit limits per semester?',
  'What are the implications of enrolling part-time on my financial aid and scholarship eligibility?',
  'If I want to study part-time, how might that affect the duration of my degree completion?',
  'Are there any loans or grants that require a specific number of credits to receive assistance?',
  'Who can I contact for more details about financial aid options related to part-time enrollment?'],
 2: ['What materials are necessary to complete the ASU Online admission process?',
  'Is an application fee required for ASU Online admission?',
  "Do I need to submit my high sch

In [18]:
# convert to final results
doc_index = {d['id']: d for d in documents}

final_results = []

for doc_id, questions in cleaned_parsed_results.items():
    course = doc_index[doc_id]['course']
    for q in questions:
        final_results.append((doc_id,course,q))


In [19]:
final_results[22]

(4,
 'ASU Online',
 'Where can I find details about graduation ceremonies for ASU Online?')
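
Before saving, it may be worth checking that the flattened list has the shape we expect: five questions per document and no repeated question within a document. The sketch below works on the `(doc_id, course, question)` tuples built above. `check_ground_truth` is a hypothetical helper, not part of the original notebook.

```python
from collections import Counter


def check_ground_truth(rows, expected_per_doc=5):
    """Return (wrong_count, duplicates) for a list of (doc_id, course, question) tuples.

    wrong_count maps doc ids to their question count when it differs from
    expected_per_doc; duplicates lists (doc_id, question) pairs seen more than once.
    Documents with no rows at all are not reported.
    """
    per_doc = Counter(doc_id for doc_id, _course, _question in rows)
    wrong_count = {d: n for d, n in per_doc.items() if n != expected_per_doc}
    pair_counts = Counter((doc_id, question) for doc_id, _course, question in rows)
    duplicates = [pair for pair, n in pair_counts.items() if n > 1]
    return wrong_count, duplicates
```

Calling `check_ground_truth(final_results)` and finding both return values empty suggests the data is ready to save.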

In [20]:
# convert to pandas df
df = pd.DataFrame(final_results, columns=['id', 'course', 'question'])


In [21]:
df

|     | id | course | question |
|-----|----|--------|----------|
| 0 | 0 | ASU Online | Can I transfer ASU Online credits to other uni... |
| 1 | 0 | ASU Online | How does ASU Online credit transfer work? |
| 2 | 0 | ASU Online | Are ASU Online courses treated differently fro... |
| 3 | 0 | ASU Online | Will the receiving institution accept my ASU O... |
| 4 | 0 | ASU Online | Do ASU transcripts show online or on-campus co... |
| ... | ... | ... | ... |
| 260 | 52 | ASU Online | What steps should I follow to locate my academ... |
| 261 | 52 | ASU Online | Where can I find the contact information for m... |
| 262 | 52 | ASU Online | Is it necessary to log in with my ASURITE ID a... |
| 263 | 52 | ASU Online | What information will I see about my advisor o... |


In [22]:
# save the df as ground truth csv
df.to_csv('../data/ground_truth_data.csv', index=False)

In [25]:
!tail ../data/ground_truth_data.csv

51,ASU Online,What resources are available to help evaluate my transfer credits to ASU Online?
51,ASU Online,How can I find out which of my credits may be applicable to a degree at ASU Online?
51,ASU Online,Are there specific advisors I should contact for information about transferring credits?
51,ASU Online,What process should I follow to receive a pre-evaluation of my transfer credits?
51,ASU Online,Can I get assistance in understanding how my credits align with ASU's degree programs?
52,ASU Online,What steps should I follow to locate my academic advisor at ASU?
52,ASU Online,Where can I find the contact information for my advisor after logging into My ASU?
52,ASU Online,Is it necessary to log in with my ASURITE ID and password to access advisor details?
52,ASU Online,What information will I see about my advisor once I click on the 'Advising' section?
52,ASU Online,Can I identify my degree program when I log into My ASU before finding my advisor?
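
When the ground truth is consumed later, it is often convenient to have the questions grouped by the document they were generated from. A minimal sketch using only the standard library is shown below. `load_ground_truth` is a hypothetical helper, and it assumes the CSV has the `id`, `course`, and `question` columns written above.

```python
import csv
from collections import defaultdict


def load_ground_truth(path):
    """Read the ground truth CSV and return {doc_id: [question, ...]}."""
    questions_by_doc = defaultdict(list)
    with open(path, newline="", encoding="utf-8") as f_in:
        for row in csv.DictReader(f_in):
            # ids are stored as text in the CSV, so convert them back to int
            questions_by_doc[int(row["id"])].append(row["question"])
    return dict(questions_by_doc)
```

For example, `load_ground_truth('../data/ground_truth_data.csv')[52]` would return the questions generated for the last FAQ record.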
