A Jupyter notebook for testing the code generation module

In [17]:
import os
import re
import json
import pandas as pd
from openai import OpenAI
from dotenv import load_dotenv

# Load OPENAI_API_KEY (and any other settings) from a local .env file, if present
load_dotenv()

In [18]:
# Read the CSV file into a Pandas DataFrame
# print(os.getcwd())
calendar_data = pd.read_csv("../data/calendar_data.csv")
print(f"The number of calendar data is {len(calendar_data)}.")

with open("../data/question.json") as json_file:
    json_data = json.load(json_file)
    # print(json_data[question_index])

print(f"The number of the question is {len(json_data)}.")
print(f"Selected question is: {json_data[0]['question']}.")
print(f"Selected answer is: {json_data[0]['answer']}.")

The number of calendar data is 20.
The number of the question is 10.
Selected question is: How many meetings do I have attended in total?.
Selected answer is: 18.


In [19]:

MODEL = "gpt-4"
# MODEL = "gpt-3.5-turbo"

question_index = 9 # for question test

In [20]:
def get_completion(model, prompt):
    """Send a single-turn user prompt to the chat model and return the reply text."""
    client = OpenAI(
        # This is the default and can be omitted
        api_key=os.environ.get("OPENAI_API_KEY"),
    )

    chat_completion = client.chat.completions.create(
        messages=[
            {
                "role": "user",
                "content": prompt,
            }
        ],
        model=model,
    )

    # Return only the text of the first choice
    return chat_completion.choices[0].message.content

In [28]:
def get_prompt(question):
    """Build the code-generation prompt for one calendar question."""
    PROMPT = f"""You are provided with a calendar. The calendar is a Pandas DataFrame named calendar_data with columns = [ID, status, summary, start, end, duration, attendees].
    All your calendar events are stored in this DataFrame calendar_data.
    Your task is to generate a Python function that queries this DataFrame and answers the question. Output the Python code enclosed in triple backticks.

    The input has the following columns:
    - ID: meeting ID;
    - status: meeting status, one of: cancelled, confirmed, tentative; cancelled means that the meeting is cancelled;
    - summary: meeting or event topic;
    - start: the start date of the meeting, date format: YYYY-MM-DD hh:mm:ss.fff-zz:xx, for example "2024-02-05 12:00:00-00:00";
    - end: the end date of the meeting, date format: YYYY-MM-DD hh:mm:ss.fff-zz:xx, for example "2024-02-05 13:00:00-00:00";
    - duration: meeting duration (seconds);
    - attendees: people who attend the meeting, delimited by the line terminator within 1 sentence.

    The output is executable Python code enclosed in triple backticks:
    ```python
    <your code here>
    ```

    The input of the Python code is a Pandas DataFrame named calendar_data, and the answer is saved in the variable answer.

    For example, the output has the following format:
    ```python
    import pandas as pd
    def query(calendar_data):
        return len(calendar_data)
    answer = query(calendar_data)
    ```
    Today's date is '2024-03-28 18:02:30-03:20', date format: YYYY-MM-DD hh:mm:ss.fff-zz:xx.

    Question to be resolved: {question}
    """
    return PROMPT



In [29]:
def get_python_code(llm_reply_with_code):
    """Return the first fenced python block from the LLM reply."""
    _python_code_re_pattern = r"```python\n(.*?)```"
    python_code_list = re.findall(_python_code_re_pattern, llm_reply_with_code, re.DOTALL)
    if not python_code_list:
        # Fail with a clear message instead of an IndexError on an empty list
        raise ValueError("No fenced python code block found in the LLM reply.")
    return python_code_list[0]
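
The pattern in `get_python_code` only matches a fence tagged exactly `python` and followed by a newline. Models sometimes reply with a bare fence or a `py`/`Python` tag instead, so a more tolerant extractor may be useful. The following is a minimal sketch, not part of the module under test: the helper name `extract_code_blocks` and the set of accepted tags are assumptions made for illustration.

```python
import re

# Build the fence marker programmatically so this snippet contains no literal fence.
FENCE = "`" * 3

# Hypothetical pattern: accepts a fence tagged python/py (any case) or an untagged fence.
_FENCE_RE = re.compile(
    FENCE + r"[ \t]*(?:python|py)?[ \t]*\r?\n(.*?)" + FENCE,
    re.DOTALL | re.IGNORECASE,
)


def extract_code_blocks(reply):
    """Return every fenced code block found in an LLM reply, in order."""
    return [block.strip() for block in _FENCE_RE.findall(reply)]


# A made-up reply with one untagged block and one python-tagged block.
reply = (
    f"Here is the code:\n{FENCE}\nanswer = 1 + 1\n{FENCE}\n"
    f"and again:\n{FENCE}python\nanswer = 3\n{FENCE}"
)
blocks = extract_code_blocks(reply)
print(blocks)
```

A fence tagged with some other language is not matched as an opening fence, but its closing fence could then be mistaken for one, so this sketch is only suitable for replies whose fenced blocks are all Python.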

In [30]:
for question in json_data: 
    print(question['question'])
    print(question['answer'])

How many meetings do I have attended in total?
18
How many events do I have scheduled for today?
1
How many attendees are there for the meeting with ID masbk72a24cb0a8k9c7jo0e9s6?
2
What is the longest meeting ID on my calendar?
malrq85j74yb0m3n8j8ro2v5d9
How many events with duration longer than 1 hours?
6
How many meeting were cancelled in total?
2
How many events are scheduled to start tomorrow?
0
Do I have me with natalia tomorrow?
False
How many meeting did I have yestaerday ?
0
How many meetings do I have for next 3 days?
0


In [34]:
result = {'success': 0, 'wrong_answer': 0, 'error': 0}
q_a_pair = json_data[0]
PROMPT = get_prompt(q_a_pair['question'])
llm_reply_with_code = get_completion(MODEL, PROMPT)
python_code = get_python_code(llm_reply_with_code)
print(python_code)
try:
    # Run the generated code; it is expected to define the variable `answer`
    exec(python_code)
except Exception as E:
    print(E)
    result['error'] += 1
else:
    print(f"generated answer: {answer}")
    if q_a_pair['answer'] == answer:
        result['success'] += 1
        print("success")
    else:
        result['wrong_answer'] += 1
        print("wrong answer")
        

import pandas as pd 

def total_meetings_attended(calendar_data):
    # Filter the dataframe to include confirmed and tentative meetings only
    confirmed_meetings = calendar_data[(calendar_data['status'] == 'confirmed') | (calendar_data['status'] == 'tentative')]
    
    # Count the number of confirmed and tentative meetings
    total_meetings = len(confirmed_meetings)
    
    return total_meetings

answer = total_meetings_attended(calendar_data)

generated answer: 18
success
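
Because `exec` in the cell above runs in the notebook's global namespace, an `answer` left over from an earlier run can be read back when the generated code fails to define one. A possible alternative, sketched here under the assumption that the generated code needs only `calendar_data` as input, is to run it in a fresh dictionary and read `answer` from that dictionary. `run_generated_code` is a hypothetical helper name, and a plain list of dicts stands in for the DataFrame so the sketch runs without the data file.

```python
def run_generated_code(code, calendar_data):
    """Execute generated code in its own namespace and return its `answer`.

    Raises KeyError if the code never assigns `answer`.
    """
    namespace = {"calendar_data": calendar_data}
    exec(code, namespace)  # the same fresh dict serves as globals and locals
    return namespace["answer"]


# Stand-in data (assumption): the notebook would pass the real DataFrame here.
fake_calendar = [
    {"status": "confirmed"},
    {"status": "cancelled"},
    {"status": "tentative"},
]

# A made-up piece of "generated" code in the format the prompt asks for.
generated = (
    "def query(calendar_data):\n"
    "    return sum(1 for row in calendar_data if row['status'] != 'cancelled')\n"
    "answer = query(calendar_data)\n"
)
print(run_generated_code(generated, fake_calendar))
```

This keeps the generated variables out of the notebook's globals, but it does not sandbox the code: anything the generated code does (file access, network calls) still runs with the notebook's permissions.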


In [35]:
result = {'success': 0, 'wrong_answer': 0, 'error': 0}

for q_a_pair in json_data:
    PROMPT = get_prompt(q_a_pair['question'])
    llm_reply_with_code = get_completion(MODEL, PROMPT)

    print(f"Question: {q_a_pair['question']}\n True answer: {q_a_pair['answer']}")

    try:
        # Extract and run the generated code; it is expected to define `answer`.
        # Extraction happens inside the try so one malformed reply does not stop the loop.
        python_code = get_python_code(llm_reply_with_code)
        exec(python_code)
    except Exception as E:
        print(E)
        result['error'] += 1
    else:
        print(f"generated answer: {answer}")
        if q_a_pair['answer'] == answer:
            result['success'] += 1
            print("success")
        else:
            result['wrong_answer'] += 1
            print("wrong answer")

    # Running tally after each question
    print(result)
    

Question: How many meetings do I have attended in total?
 True answer: 18
generated answer: 18
success
{'success': 1, 'wrong_answer': 0, 'error': 0}
Question: How many events do I have scheduled for today?
 True answer: 1
generated answer: 0
wrong answer
{'success': 1, 'wrong_answer': 1, 'error': 0}
Question: How many attendees are there for the meeting with ID masbk72a24cb0a8k9c7jo0e9s6?
 True answer: 2
generated answer: 1
wrong answer
{'success': 1, 'wrong_answer': 2, 'error': 0}
Question: What is the longest meeting ID on my calendar?
 True answer: malrq85j74yb0m3n8j8ro2v5d9
generated answer: malrq85j74yb0m3n8j8ro2v5d9
success
{'success': 2, 'wrong_answer': 2, 'error': 0}


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy


Question: How many events with duration longer than 1 hours?
 True answer: 6
generated answer: 6
success
{'success': 3, 'wrong_answer': 2, 'error': 0}
Question: How many meeting were cancelled in total?
 True answer: 2
generated answer: 2
success
{'success': 4, 'wrong_answer': 2, 'error': 0}
Question: How many events are scheduled to start tomorrow?
 True answer: 0
generated answer: 0
success
{'success': 5, 'wrong_answer': 2, 'error': 0}
Question: Do I have me with natalia tomorrow?
 True answer: False
generated answer: False
wrong answer
{'success': 5, 'wrong_answer': 3, 'error': 0}


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy


Question: How many meeting did I have yestaerday ?
 True answer: 0
generated answer: 0
success
{'success': 6, 'wrong_answer': 3, 'error': 0}
Question: How many meetings do I have for next 3 days?
 True answer: 0
generated answer: 0
success
{'success': 7, 'wrong_answer': 3, 'error': 0}
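
In the run above, the question about Natalia prints `True answer: False` and `generated answer: False` yet is counted as a wrong answer. A likely cause (an assumption, since the notebook does not show the types involved) is that `==` is comparing values of different types, for example the string "False" against a boolean. A sketch of a more lenient comparison, using a hypothetical helper named `answers_match`, could compare normalised string forms instead:

```python
def answers_match(expected, generated):
    """Compare two answers by their trimmed, lower-cased string forms."""
    def normalise(value):
        return str(value).strip().lower()

    return normalise(expected) == normalise(generated)


print(answers_match("False", False))  # string vs. bool
print(answers_match(18, "18"))        # int vs. string
print(answers_match(2, 1))            # genuinely different values
```

The trade-off is that string comparison is strict about formatting: `1` and `1.0` would not match, so numeric answers may need their own normalisation step.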


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
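
The warning in the output comes from pandas and is typically raised when code assigns a new column to a filtered slice of a DataFrame. The generated code that triggered it is not shown, so the following only illustrates the usual remedy: take an explicit `.copy()` of the slice before adding columns. The column names follow the prompt; the rows are made-up sample data.

```python
import pandas as pd

# Made-up sample rows using column names from the prompt.
calendar_data = pd.DataFrame(
    {
        "ID": ["a1", "b2", "c3"],
        "status": ["confirmed", "cancelled", "confirmed"],
        "duration": [1800, 3600, 7200],  # seconds, as described in the prompt
    }
)

# Copy the filtered rows so the new column is written to an independent frame.
active = calendar_data[calendar_data["status"] != "cancelled"].copy()
active["duration_hours"] = active["duration"] / 3600

# Count non-cancelled events longer than one hour.
long_events = int((active["duration_hours"] > 1).sum())
print(long_events)
```

One way to steer the model toward this pattern would be to mention it in the prompt, although whether that removes the warning in practice would need to be checked by rerunning the questions.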
