<a href="https://colab.research.google.com/github/123saga/intent_analyzer/blob/main/intent_analyzer.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
import sys
import os
import pandas as pd
import numpy as np
import json
np.random.seed(123)

In [5]:
## read sample data
sample_data = pd.read_csv("hc_bot_prompt_intent.csv")
sample_data.head(10)

Unnamed: 0,prompt,completion
0,Schedule an appointment with a doctor.,schedule_appointment
1,I need to schedule a follow-up appointment.,schedule_appointment
2,Can I reschedule my appointment?,reschedule_appointment
3,I want to reschedule my appointment.,reschedule_appointment
4,How can I reschedule my appointment?,reschedule_appointment
5,I need to cancel my appointment.,cancel_appointment
6,Can I cancel my appointment?,cancel_appointment
7,How do I cancel my appointment?,cancel_appointment
8,I need a prescription for my medication.,request_prescription
9,Can you refill my prescription?,request_prescription_refill


In [6]:
## possible intents in the sample
sample_data['completion'].unique()

array(['schedule_appointment', 'reschedule_appointment',
       'cancel_appointment', 'request_prescription',
       'request_prescription_refill', 'insurance_questions',
       'preventive_care', 'chronic_care', 'urgent_care',
       'symptom_checker', 'find_new_doctor', 'find_specialist',
       'medication_questions', 'mental_health'], dtype=object)
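Before fine-tuning, it helps to know how many examples each intent has, so that under-represented classes stand out. Below is a minimal sketch. It uses only the ten rows shown in the preview above as stand-in data. On the full file you would call `value_counts()` on `sample_data['completion']` directly.

```python
import pandas as pd

# The ten rows shown in the preview above, used here as stand-in data.
rows = [
    ("Schedule an appointment with a doctor.", "schedule_appointment"),
    ("I need to schedule a follow-up appointment.", "schedule_appointment"),
    ("Can I reschedule my appointment?", "reschedule_appointment"),
    ("I want to reschedule my appointment.", "reschedule_appointment"),
    ("How can I reschedule my appointment?", "reschedule_appointment"),
    ("I need to cancel my appointment.", "cancel_appointment"),
    ("Can I cancel my appointment?", "cancel_appointment"),
    ("How do I cancel my appointment?", "cancel_appointment"),
    ("I need a prescription for my medication.", "request_prescription"),
    ("Can you refill my prescription?", "request_prescription_refill"),
]
preview = pd.DataFrame(rows, columns=["prompt", "completion"])

# Number of examples per intent, most frequent first.
counts = preview["completion"].value_counts()
print(counts)
```

The full file has 242 pairs across 14 intents, which averages about 17 examples per intent. Intents far below that number are candidates for additional examples.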

In [7]:
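## format the data for OpenAI fine-tuning: end every prompt with a fixed separator,
## start every completion with a space and end it with the " END" stop sequence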
sample_data['prompt'] = sample_data['prompt'] + "\n\nIntent:\n\n"
sample_data['completion'] = " "+sample_data['completion'] + " END"
sample_data.head()

Unnamed: 0,prompt,completion
0,Schedule an appointment with a doctor.\n\nInte...,schedule_appointment END
1,I need to schedule a follow-up appointment.\n\...,schedule_appointment END
2,Can I reschedule my appointment?\n\nIntent:\n\n,reschedule_appointment END
3,I want to reschedule my appointment.\n\nIntent...,reschedule_appointment END
4,How can I reschedule my appointment?\n\nIntent...,reschedule_appointment END


In [16]:
## convert the DataFrame to JSONL (one JSON record per line) for fine-tuning:
sample_data.to_json("intent_sample.jsonl", orient='records', lines=True)
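`to_json(orient='records', lines=True)` writes one JSON object per line and uses the column names as keys. This is the prompt/completion layout that the `fine_tunes` CLI commands in the following cells read. The small sketch below serializes two formatted rows to a string instead of a file, so you can inspect the result.

```python
import json
import pandas as pd

# Two rows formatted the same way as in the notebook: separator on the prompt,
# leading space and " END" stop sequence on the completion.
df = pd.DataFrame({
    "prompt": ["Can I cancel my appointment?", "Can you refill my prescription?"],
    "completion": ["cancel_appointment", "request_prescription_refill"],
})
df["prompt"] = df["prompt"] + "\n\nIntent:\n\n"
df["completion"] = " " + df["completion"] + " END"

# With no path argument, to_json returns the JSONL text as a string.
jsonl = df.to_json(orient="records", lines=True)
lines = jsonl.splitlines()
for line in lines:
    print(line)

# Each line parses back into a {"prompt": ..., "completion": ...} record.
records = [json.loads(line) for line in lines]
```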

In [9]:
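## install the openai package, pinned to 0.27.2 (the version this run installed);
## openai 1.0 removed `openai.Completion` and the `openai api fine_tunes.*` CLI commands used below
!pip install openai==0.27.2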

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting openai
  Downloading openai-0.27.2-py3-none-any.whl (70 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 70.1/70.1 KB 3.5 MB/s eta 0:00:00
Collecting aiohttp
  Downloading aiohttp-3.8.4-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.0 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.0/1.0 MB 27.0 MB/s eta 0:00:00
Collecting frozenlist>=1.1.1
  Downloading frozenlist-1.3.3-cp39-cp39-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (158 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 158.8/158.8 KB 16.3 MB/s eta 0:00:00
Collecting async-timeout<5.0,>=4.0.0a3
  Downloading async_timeout-4.0.2

In [10]:
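## pass the API key (getpass hides it, so the key is not saved in the notebook output)
import openai
from getpass import getpass
os.environ["OPENAI_API_KEY"] = getpass("Paste your OpenAI key here and hit enter: ")
openai.api_key = os.environ["OPENAI_API_KEY"]  # the Python client reads the env var only at import time, so set it explicitly

Paste your OpenAI key here and hit enter: [key redacted]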


In [17]:
## use the OpenAI CLI's data-preparation tool to clean the data and split it into training and validation sets:
!openai tools fine_tunes.prepare_data -f intent_sample.jsonl

Analyzing...

- Your file contains 242 prompt-completion pairs
- Based on your data it seems like you're trying to fine-tune a model for classification
- For classification, we recommend you try one of the faster and cheaper models, such as `ada`
- For classification, you can estimate the expected model performance by keeping a held out dataset, which is not used for training
- There are 1 duplicated prompt-completion sets. These are rows: [224]
- All prompts end with suffix `\n\nIntent:\n\n`. This suffix seems very long. Consider replacing with a shorter suffix, such as `\n\n###\n\n`

Based on the analysis we will perform the following actions:
- [Recommended] Remove 1 duplicate rows [Y/n]: y
- [Recommended] Would you like to split into training and validation set? [Y/n]: y


Your data will be written to a new JSONL file. Proceed [Y/n]: y

Wrote modified files to `intent_sample_prepared_train.jsonl` and `intent_sample_prepared_valid.jsonl`
Feel free to take a look!

Now use that file 
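The tool recommends two actions: remove duplicate pairs and hold out a validation set. Both can be approximated in pandas. This is a sketch, not the CLI's own implementation. The 20% validation fraction, the per-intent (stratified) sampling, and the seed of 123 (matching the notebook's `np.random.seed(123)`) are assumptions chosen here. With stratified sampling, every intent that has enough examples appears in both sets. `split_train_valid` is a hypothetical helper.

```python
import pandas as pd

def split_train_valid(df, valid_frac=0.2, seed=123):
    """Hypothetical helper: drop duplicate prompt-completion pairs, then
    hold out roughly `valid_frac` of each intent as a validation set."""
    deduped = df.drop_duplicates(subset=["prompt", "completion"]).reset_index(drop=True)
    # Sample within each intent so class proportions are roughly preserved.
    valid = deduped.groupby("completion").sample(frac=valid_frac, random_state=seed)
    # Everything not sampled into validation goes to training.
    train = deduped.drop(valid.index)
    return train, valid

# Toy data: 5 examples for each of two intents, plus one exact duplicate.
toy = pd.DataFrame({
    "prompt": [f"cancel request {i}" for i in range(5)]
              + [f"refill request {i}" for i in range(5)]
              + ["cancel request 0"],
    "completion": ["cancel_appointment"] * 5
                  + ["request_prescription_refill"] * 5
                  + ["cancel_appointment"],
})
train, valid = split_train_valid(toy)
print(len(train), len(valid))
```

You could then write the two frames with `to_json(..., orient="records", lines=True)`, as in the earlier cell. Intents with very few examples may round down to zero validation rows and land entirely in training.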

In [27]:
## create the fine-tune job: upload the training and validation files and train on the `davinci` base model
!openai api fine_tunes.create -t "intent_sample_prepared_train.jsonl" -v "intent_sample_prepared_valid.jsonl" -m 'davinci'

Found potentially duplicated files with name 'intent_sample_prepared_train.jsonl', purpose 'fine-tune' and size 21778 bytes
file-bK1zDXPmBQ7tOVywKCX3zl5O
file-S1LoZBJBxRrWSXMFKL1REnGo
file-gAYrrUeUtDOgGRREOeuxLUId
Enter file ID to reuse an already uploaded file, or an empty string to upload this file anyway: 
Upload progress: 100% 21.8k/21.8k [00:00<00:00, 21.6Mit/s]
Uploaded file from intent_sample_prepared_train.jsonl: file-nLYdJDErXjspGyd77WLb3tyk
Found potentially duplicated files with name 'intent_sample_prepared_valid.jsonl', purpose 'fine-tune' and size 5723 bytes
file-lalV9BbhuqoSDQQEUBLouoKt
file-lBCyNigxzhzDmi67jhaChHI6
file-962DY2WcqzmdWSNPGPFhJxIX
Enter file ID to reuse an already uploaded file, or an empty string to upload this file anyway: 
Upload progress: 100% 5.72k/5.72k [00:00<00:00, 6.37Mit/s]
Uploaded file from intent_sample_prepared_valid.jsonl: file-5fUj7PrunT3gYc0lsrGXlh2V
Created fine-tune: ft-uFZB4NTBEga77wafewJZa5H1
Streaming events until fine-tuning is comple

In [40]:
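## stream status events for a fine-tune job until it finishes; note that this follows
## ft-fRUjjY4eeCdUi4YEl90GQZju, while the job created above is in the commented-out line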
!openai api fine_tunes.follow -i ft-fRUjjY4eeCdUi4YEl90GQZju
#!openai api fine_tunes.follow -i ft-uFZB4NTBEga77wafewJZa5H1

[2023-03-27 20:01:49] Created fine-tune: ft-fRUjjY4eeCdUi4YEl90GQZju
[2023-03-27 20:05:00] Fine-tune costs $0.48
[2023-03-27 20:05:00] Fine-tune enqueued. Queue number: 21
[2023-03-27 20:05:06] Fine-tune is in the queue. Queue number: 20
[2023-03-27 20:05:08] Fine-tune is in the queue. Queue number: 19
[2023-03-27 20:05:22] Fine-tune is in the queue. Queue number: 18
[2023-03-27 20:06:31] Fine-tune is in the queue. Queue number: 17
[2023-03-27 20:08:40] Fine-tune is in the queue. Queue number: 16
[2023-03-27 20:10:01] Fine-tune is in the queue. Queue number: 15
[2023-03-27 20:22:35] Fine-tune is in the queue. Queue number: 14
[2023-03-27 20:22:36] Fine-tune is in the queue. Queue number: 13
[2023-03-27 20:22:42] Fine-tune is in the queue. Queue number: 12
[2023-03-27 20:22:44] Fine-tune is in the queue. Queue number: 10
[2023-03-27 20:22:44] Fine-tune is in the queue. Queue number: 10
[2023-03-27 20:22:45] Fine-tune is in the queue. Queue number: 8
[2023-03-27 20:22:45] Fine-tune is in

In [77]:
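def intent_inference():
    # set the key once, outside the loop
    openai.api_key = os.environ["OPENAI_API_KEY"]
    while True:
        prompt = input("What's on your mind? ")
        response = openai.Completion.create(
            model='davinci:ft-personal-2023-03-27-20-35-22',  # the fine-tuned model
            # reuse exactly the training separator (no extra leading space)
            prompt=prompt + "\n\nIntent:\n\n",
            # room for the longest labels (e.g. request_prescription_refill), which can take more than 4 tokens
            max_tokens=10,
            temperature=0,  # deterministic: always pick the most likely label
            top_p=1,
            frequency_penalty=0,
            presence_penalty=1,
            stop=[" END"]  # the stop sequence appended to every training completion
        )
        # completions were trained with a leading space, so strip it
        print(response['choices'][0]['text'].strip())
        user_input = input("Do you want to continue? (y/n): ")
        if user_input.lower() == 'n':
            break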
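The model was trained on prompts ending in `\n\nIntent:\n\n` and on completions of the form ` <intent> END`. Inference should therefore reuse exactly that separator and treat anything outside the known label set as unrecognized. Below is a minimal sketch of both steps, with no API call. `build_prompt`, `parse_intent`, and the `None` fallback are illustrative choices, not part of the OpenAI library. The label set is the output of `sample_data['completion'].unique()` above.

```python
from typing import Optional

SEPARATOR = "\n\nIntent:\n\n"  # must match the suffix used in training
STOP = " END"                  # stop sequence appended to every training completion

KNOWN_INTENTS = {
    "schedule_appointment", "reschedule_appointment", "cancel_appointment",
    "request_prescription", "request_prescription_refill", "insurance_questions",
    "preventive_care", "chronic_care", "urgent_care", "symptom_checker",
    "find_new_doctor", "find_specialist", "medication_questions", "mental_health",
}

def build_prompt(user_text: str) -> str:
    """Strip stray whitespace from the user's message, then append the training separator."""
    return user_text.strip() + SEPARATOR

def parse_intent(raw_completion: str) -> Optional[str]:
    """Return the predicted intent, or None if it is not a known label."""
    text = raw_completion
    # Defensive: cut at the stop sequence if it appears in the returned text.
    if STOP in text:
        text = text.split(STOP, 1)[0]
    label = text.strip()
    return label if label in KNOWN_INTENTS else None
```

In `intent_inference`, the two helpers would wrap the API call: `prompt=build_prompt(prompt)` going in, and `parse_intent(response['choices'][0]['text'])` coming out. A `None` result also catches truncated labels, such as a long intent cut off by `max_tokens`.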

In [None]:
## try the model:
intent_inference()

 mental_health
What's on your mind? what is cost of urgent care appointment 
 urgent_care
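The data-preparation output suggests estimating model performance on a held-out dataset. Below is a minimal sketch of that evaluation. `predict` stands for any function that maps a prompt to an intent, for example a wrapper around the `openai.Completion.create` call above. Here a keyword stub stands in for the model so the snippet runs offline. The example pairs are illustrative, not results from this model.

```python
from collections import Counter

def evaluate(examples, predict):
    """Return overall accuracy and a Counter of (true, predicted) error pairs.

    examples: list of (prompt, true_intent) pairs from the validation set.
    predict:  callable mapping a prompt to a predicted intent (or None).
    """
    correct = 0
    errors = Counter()
    for prompt, true_intent in examples:
        predicted = predict(prompt)
        if predicted == true_intent:
            correct += 1
        else:
            errors[(true_intent, predicted)] += 1
    accuracy = correct / len(examples) if examples else 0.0
    return accuracy, errors

# Illustrative stub: keyword rules standing in for the fine-tuned model.
def stub_predict(prompt):
    if "cancel" in prompt:
        return "cancel_appointment"
    if "refill" in prompt:
        return "request_prescription_refill"
    return "schedule_appointment"

examples = [
    ("Can I cancel my appointment?", "cancel_appointment"),
    ("Can you refill my prescription?", "request_prescription_refill"),
    ("Can I reschedule my appointment?", "reschedule_appointment"),
    ("Schedule an appointment with a doctor.", "schedule_appointment"),
]
accuracy, errors = evaluate(examples, stub_predict)
print(f"accuracy: {accuracy:.2f}")  # prints "accuracy: 0.75"
print(errors.most_common())
```

The prompts in `intent_sample_prepared_valid.jsonl` already end with the separator, and their labels carry the leading space and ` END`. Strip both before comparing them with model predictions.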
