A Jupyter notebook to test the code generation module

In [281]:
import os
import json
import pandas as pd
from openai import OpenAI
from dotenv import load_dotenv

In [282]:
# Read the CSV file into a Pandas DataFrame
# print(os.getcwd())
calendar_data = pd.read_csv("../data/calendar_data.csv")
print(f"calendar_data contains {len(calendar_data)} rows.")

calendar_data contains 20 rows.


1. Load the dataset and the JSON questions
2. Prompt GPT-4 with the dataset columns and the JSON question to generate a Python function
3. Pass the generated function and the DataFrame into the test function
4. Save the result into json-answer
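
Steps 3 and 4 are not carried out in the cells below. A minimal sketch of what they might look like, assuming scalar answers (such as a meeting count). The helper names `check_generated_function` and `save_answer`, and the layout of the saved record, are hypothetical and not taken from the project:

```python
import json

import pandas as pd


def check_generated_function(query_fn, calendar_data, expected):
    """Run a generated query function on a copy of the data and compare.

    Hypothetical helper; assumes the answer is a scalar such as a count.
    """
    result = query_fn(calendar_data.copy())
    return {
        "answer": result,
        "expected": expected,
        "passed": bool(result == expected),
    }


def save_answer(record, path):
    """Write the result record as JSON (hypothetical file layout)."""
    with open(path, "w") as f:
        # default=str is a fallback for values json cannot encode directly
        json.dump(record, f, indent=2, default=str)
```

Running the generated function on a copy keeps any in-place column conversions from leaking into the shared DataFrame.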

In [283]:
question_index = 9  # index of the question to test

with open("../data/question.json") as json_file:
    json_data = json.load(json_file)
    print(json_data[question_index])

# print(f"The total number of the question is {len(json_data)}.")
print(f"Selected question is: <{json_data[question_index]['question']}>.")
print(f"Selected answer is: {json_data[question_index]['answer']}.")

{'question': 'How many meetings do I have for next 3 days?', 'answer': 0}
Selected question is: <How many meetings do I have for next 3 days?>.
Selected answer is: 0.


In [284]:
text = f"""
{json_data[question_index]['question']}
"""
PROMPT = f"""You are provided with your own calendar. This calendar is a Pandas dataframe named calendar_data, columns = [ID, status, summary, start, end, duration, attendees].
DataFrame calendar_data includes all your calendars.
Your task is to generate a Python function to query this dataframe and answer the question. Output the Python code by enclosing it in triple backticks.

The input has the following columns:
- ID: meeting ID;
- status: meeting status, one of: cancelled, confirmed, tentative; cancelled means that the meeting is cancelled.
- summary: meeting or event topic;
- start: the start date of the meeting, date format: YYYY-MM-DD hh:mm:ss.fff-zz:xx, for example "2024-02-05 12:00:00-00:00";
- end: the end date of the meeting, date format: YYYY-MM-DD hh:mm:ss.fff-zz:xx, for example "2024-02-05 13:00:00-00:00";
- duration: meeting duration (seconds);
- attendees: people who attend the meeting delimited by the line terminator within 1 sentence.

Output is executable Python code by enclosing it in triple backticks:
```python
<your code here>
```

The input of the Python code is a Pandas dataframe named calendar_data, and the answer is saved in the variable answer.

For example, the output has the following format:
```
import pandas as pd 
def query(calendar_data):
    return calendar_data[0]
answer = query(calendar_data)
```
Today's date is '2024-03-28 18:02:30-03:20', date format: YYYY-MM-DD hh:mm:ss.fff-zz:xx.

Question to be resolved: {text} 
"""

In [285]:
print(PROMPT)

You are provided with your own calendar. This calendar is a Pandas dataframe named calendar_data, columns = [ID, status, summary, start, end, duration, attendees]. DataFrame calendar_data includes all your calendars.
Your task is to generate a Python function to query this dataframe and answer the question. Output the Python code by enclosing it in triple backticks.

The input has the following columns:
- ID: meeting ID;
- status: meeting status, one of: cancelled, confirmed, tentative; cancelled means that the meeting is cancelled.
- summary: meeting or event topic;
- start: the start date of the meeting, date format: YYYY-MM-DD hh:mm:ss.fff-zz:xx, for example "2024-02-05 12:00:00-00:00";
- end: the end date of the meeting, date format: YYYY-MM-DD hh:mm:ss.fff-zz:xx, for example "2024-02-05 13:00:00-00:00";
- duration: meeting duration (seconds);
- attendees: people who attend the meeting delimited by the line terminator within 1 sentence.

Output is executable Python code by enclosi

In [286]:

MODEL = "gpt-4"
# MODEL = "gpt-3.5-turbo"

def get_completion(model, prompt):
    # Load OPENAI_API_KEY from a local .env file, if one exists
    load_dotenv()
    client = OpenAI(
        # This is the default and can be omitted
        api_key=os.environ.get("OPENAI_API_KEY"),
    )

    chat_completion = client.chat.completions.create(
        messages=[
            {
                "role": "user",
                "content": prompt,
            }
        ],
        model=model,
    )

    # Return the text of the first choice
    return chat_completion.choices[0].message.content

llm_reply_with_code = get_completion(MODEL, PROMPT)
print(llm_reply_with_code)

```python
import pandas as pd
from datetime import datetime, timedelta

# Convert the string date into datetime objects
calendar_data['start'] = pd.to_datetime(calendar_data['start'])
calendar_data['end'] = pd.to_datetime(calendar_data['end'])

def query(calendar_data):

    # Today's date
    today = datetime.strptime('2024-03-28 18:02:30-03:20', "%Y-%m-%d %H:%M:%S%z")

    # Find the date and time for 3 days from today.
    three_days_later = today + timedelta(days=3)

    # Find the meetings that are scheduled within the next 3 days.
    query = (calendar_data['start'] >= today) & (calendar_data['end'] <= three_days_later) & (calendar_data['status'] != 'cancelled')

    # Calculte the total number of meetings within the next 3 days.
    num_of_meetings = calendar_data[query].shape[0]
    
    return num_of_meetings

answer = query(calendar_data)
```
The function `query()` takes a dataframe `calendar_data` and filters out only the rows where the start date is after today and end date
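
The generated filter counts a meeting only when it starts after the reference time and also ends before the 3-day mark. A sketch of one alternative reading, which counts meetings by start time alone and normalizes every timestamp to UTC first. The function name `count_meetings_in_window` is hypothetical, and the sketch assumes `now` carries a UTC offset:

```python
import pandas as pd


def count_meetings_in_window(calendar_data, now, days=3):
    """Count non-cancelled meetings starting within `days` of `now`.

    Hypothetical helper; assumes `now` is a timestamp string with an offset.
    """
    # utc=True converts values with different offsets to one UTC dtype
    start = pd.to_datetime(calendar_data["start"], utc=True)
    now = pd.Timestamp(now).tz_convert("UTC")
    window_end = now + pd.Timedelta(days=days)
    mask = (
        (start >= now)
        & (start < window_end)
        & (calendar_data["status"] != "cancelled")
    )
    return int(mask.sum())
```

Parsing with `utc=True` avoids an object-dtype column when rows carry different offsets, which would otherwise complicate the comparisons.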

In [287]:
import re

_python_code_re_pattern = r"```python\n(.*?)```"
# Reply text with the fenced code removed (kept for inspection)
llm_reply_without_code = re.sub(
    _python_code_re_pattern, "", llm_reply_with_code, flags=re.DOTALL
)
python_code_list = re.findall(_python_code_re_pattern, llm_reply_with_code, re.DOTALL)
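
The pattern above only matches fences that carry a `python` language tag, while the example fence inside the prompt has no tag. A reply that copies the example format would produce an empty list, and indexing `python_code_list[0]` would then raise an `IndexError`. A minimal sketch of a more tolerant extractor; the name `extract_code_blocks` is hypothetical and not part of the project:

```python
import re

# Build the fence marker programmatically to keep this snippet easy to embed
FENCE = "`" * 3

# Matches fences with or without a "python" language tag
_CODE_FENCE_RE = re.compile(
    FENCE + r"(?:python)?[ \t]*\n(.*?)" + FENCE, re.DOTALL
)


def extract_code_blocks(reply: str) -> list[str]:
    """Return the body of every fenced code block found in an LLM reply."""
    return [block.strip() for block in _CODE_FENCE_RE.findall(reply)]
```

An empty return value can then be handled explicitly, for example by re-prompting, instead of failing on the index lookup.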


In [288]:
python_code_list[0]

'import pandas as pd\nfrom datetime import datetime, timedelta\n\n# Convert the string date into datetime objects\ncalendar_data[\'start\'] = pd.to_datetime(calendar_data[\'start\'])\ncalendar_data[\'end\'] = pd.to_datetime(calendar_data[\'end\'])\n\ndef query(calendar_data):\n\n    # Today\'s date\n    today = datetime.strptime(\'2024-03-28 18:02:30-03:20\', "%Y-%m-%d %H:%M:%S%z")\n\n    # Find the date and time for 3 days from today.\n    three_days_later = today + timedelta(days=3)\n\n    # Find the meetings that are scheduled within the next 3 days.\n    query = (calendar_data[\'start\'] >= today) & (calendar_data[\'end\'] <= three_days_later) & (calendar_data[\'status\'] != \'cancelled\')\n\n    # Calculte the total number of meetings within the next 3 days.\n    num_of_meetings = calendar_data[query].shape[0]\n    \n    return num_of_meetings\n\nanswer = query(calendar_data)\n'

In [289]:
exec(python_code_list[0])
answer

0
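
Calling `exec` as above runs the generated code in the notebook's global namespace, so the `pd.to_datetime` assignments in the reply overwrite the `start` and `end` columns of the shared `calendar_data`. A minimal sketch of a more contained run, assuming a hypothetical helper `run_in_namespace` (not part of the project). It only limits name leakage and in-place mutation; it is not a security sandbox:

```python
import pandas as pd


def run_in_namespace(code: str, calendar_data: pd.DataFrame):
    """Execute generated code against a copy of the data and return `answer`.

    Hypothetical helper; returns None if the code never sets `answer`.
    """
    # The code sees only this dict as its globals, plus Python builtins
    namespace = {"calendar_data": calendar_data.copy()}
    exec(code, namespace)
    return namespace.get("answer")


# Toy data and toy "generated" code, for illustration only
frame = pd.DataFrame({"start": ["2024-03-29 10:00:00+00:00"]})
generated = (
    "import pandas as pd\n"
    "calendar_data['start'] = pd.to_datetime(calendar_data['start'])\n"
    "answer = len(calendar_data)\n"
)
result = run_in_namespace(generated, frame)
```

Because the code runs against a copy, `frame` still holds the original string values afterwards, and `answer` does not appear in the notebook's globals.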