## Importing the libraries and installing the OpenAI package

In [84]:
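# Uncomment on first run to install the dependencies.
# This notebook relies on the legacy fine_tunes CLI and openai.Completion,
# which predate version 1.0 of the openai package.
# !pip install --upgrade openai
# !pip install wandb
import numpy as np
import pandas as pd
import openai
from google.colab import drive

# Mount Google Drive so the notebook can read and write the project files
drive.mount('/content/drive/')
project_path = '/content/drive/My Drive/Data Science/GPT3/'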

Drive already mounted at /content/drive/; to attempt to forcibly remount, call drive.mount("/content/drive/", force_remount=True).


## Loading the dataset and adding a separator

In [85]:
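df = pd.read_csv(project_path+'sample_snps_data.csv')
df.drop('case_id',axis=1,inplace=True)
# The fine-tuning tools expect columns named 'prompt' and 'completion'
df.rename(columns = {'q7':'prompt','q6':'completion'}, inplace = True)
# Note: data.jsonl is written here, before the separator is appended below,
# so the file on disk holds the raw prompts without " -> "
df.to_json(project_path+"data.jsonl", orient='records', lines=True)

def add_separator(x):
  return x+" -> "

df['prompt'] = df['prompt'].apply(add_separator)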

In [86]:
df.prompt.iloc[0]

'All I keep hearing is I can not help you.  I have been put off and hung up on.   All I want go do is cancel my HP smart friend account. I am done with hp! -> '
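The formatting of a single prompt-completion pair can be sketched without pandas. This is a minimal illustration, not the notebook's own code: `format_pair` and `to_jsonl` are hypothetical helper names, the two sample rows are made up, and the default separator `" ->"` is assumed to match the one used in the prediction prompts of this notebook.

```python
import json

def format_pair(prompt, completion, separator=" ->"):
    """Return one training record with a prompt suffix and a completion prefix."""
    return {
        "prompt": prompt.rstrip() + separator,
        # A leading space on the completion tends to tokenize better
        "completion": " " + completion.strip(),
    }

def to_jsonl(records):
    """Serialize records as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(r) for r in records)

# Hypothetical rows, for illustration only
rows = [
    ("The laptop works, but support took a while to answer.", "passive"),
    ("I was put on hold and then hung up on.", "detractor"),
]
records = [format_pair(p, c) for p, c in rows]
jsonl_text = to_jsonl(records)
print(jsonl_text)
```

Writing the file only after every record has passed through a helper like this keeps the separator on disk consistent with the one used at prediction time.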

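## Converting the DataFrame to JSONL format for fine-tuning the GPT-3 model

1. `!openai`: invokes the OpenAI command-line tool (installed with the `openai` package)

2. `tools fine_tunes.prepare_data`: validates the data, suggests fixes, and writes a prepared JSONL file for fine-tuning the GPT-3 model

3. `-f`: path to the input file

4. `-q`: quiet mode, which accepts all suggested fixes without prompting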

In [151]:
!openai tools fine_tunes.prepare_data -f "/content/drive/My Drive/Data Science/GPT3/data.jsonl" -q

Analyzing...

- Your file contains 1000 prompt-completion pairs
- Based on your data it seems like you're trying to fine-tune a model for classification
- For classification, we recommend you try one of the faster and cheaper models, such as `ada`
- For classification, you can estimate the expected model performance by keeping a held out dataset, which is not used for training
- There are 1 duplicated prompt-completion sets. These are rows: [663]
- Your data does not contain a common separator at the end of your prompts. Having a separator string appended to the end of the prompt makes it clearer to the fine-tuned model where the completion should begin. See https://beta.openai.com/docs/guides/fine-tuning/preparing-your-dataset for more detail and examples. If you intend to do open-ended generation, then you should leave the prompts empty
- The completion should start with a whitespace character (` `). This tends to produce better results due to the tokenization we use. See https://bet
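Two of the points in this report, the duplicated pair and the recommendation to keep a held-out dataset, can be handled by hand as well. The sketch below is one possible approach and not what `prepare_data` does internally; `dedupe_and_split` is a hypothetical helper, the 20% validation share and the seed are assumptions, and the records are synthetic.

```python
import random

def dedupe_and_split(records, valid_fraction=0.2, seed=0):
    """Drop exact duplicate pairs, then split into train and validation lists."""
    seen = set()
    unique = []
    for r in records:
        key = (r["prompt"], r["completion"])
        if key not in seen:
            seen.add(key)
            unique.append(r)
    # Shuffle with a fixed seed so the split is reproducible
    rng = random.Random(seed)
    rng.shuffle(unique)
    n_valid = max(1, int(len(unique) * valid_fraction))
    return unique[n_valid:], unique[:n_valid]

# Synthetic records: nine distinct pairs plus one exact duplicate
records = [{"prompt": f"example {i} ->", "completion": " passive"} for i in range(9)]
records.append(dict(records[0]))
train, valid = dedupe_and_split(records)
print(len(train), len(valid))
```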

# **Fine-tuning GPT-3**
1. `!openai`: invokes the OpenAI command-line tool
2. `-k`: passes the API key
3. `api fine_tunes.create`: creates a fine-tuning job through the API
4. `-t`: path to the training data
5. `-v`: path to the validation data
6. `-m`: base model to fine-tune (here `ada`)
7. `--compute_classification_metrics`: computes classification metrics (accuracy and weighted F1) on the validation data
8. `--classification_n_classes 3`: sets the number of classes to 3

In [88]:
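# Optional: set the key once for the session instead of passing -k each time
# (`!export` runs in a subshell, so the variable would not persist across cells)
# %env OPENAI_API_KEY=Your - API key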

!openai -k "Your - API key" api fine_tunes.create -t "/content/drive/My Drive/Data Science/GPT3/data_prepared_train.jsonl" -v "/content/drive/My Drive/Data Science/GPT3/data_prepared_valid.jsonl" -m ada --compute_classification_metrics --classification_n_classes 3

Found potentially duplicated files with name 'data_prepared_train.jsonl', purpose 'fine-tune' and size 324112 bytes
file-YnC4gk1VuNKvVwvlWP5HtXG7
Enter file ID to reuse an already uploaded file, or an empty string to upload this file anyway: YnC4gk1VuNKvVwvlWP5HtXG7
File id 'YnC4gk1VuNKvVwvlWP5HtXG7' is not among the IDs of the potentially duplicated files

Enter file ID to reuse an already uploaded file, or an empty string to upload this file anyway: file-YnC4gk1VuNKvVwvlWP5HtXG7
Reusing already uploaded file: file-YnC4gk1VuNKvVwvlWP5HtXG7
Upload progress: 100% 85.9k/85.9k [00:00<00:00, 152Mit/s]
Uploaded file from /content/drive/My Drive/Data Science/GPT3/data_prepared_valid.jsonl: file-infRk9VHi6wEy33sWFpfYNvk
Created fine-tune: ft-U8Xkz19BrpE5RwUma9FBulOD
Streaming events until fine-tuning is complete...

(Ctrl-C will interrupt the stream, but not cancel the fine-tune)
[2022-07-28 06:29:15] Created fine-tune: ft-U8Xkz19BrpE5RwUma9FBulOD
[2022-07-28 06:29:22] Fine-tune costs $0.11
[
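Because the file paths contain spaces, it can be easier to assemble the command as an argument list than to quote it by hand. The following is a sketch only: `build_fine_tune_command` is a hypothetical helper, it uses just the flags listed in this section, and running it assumes the legacy `openai` CLI is installed and `OPENAI_API_KEY` is already set in the environment.

```python
import shlex

def build_fine_tune_command(train_path, valid_path, model="ada", n_classes=3):
    """Assemble the fine_tunes.create CLI call as an argument list."""
    return [
        "openai", "api", "fine_tunes.create",
        "-t", train_path,
        "-v", valid_path,
        "-m", model,
        "--compute_classification_metrics",
        "--classification_n_classes", str(n_classes),
    ]

base = "/content/drive/My Drive/Data Science/GPT3/"
cmd = build_fine_tune_command(base + "data_prepared_train.jsonl",
                              base + "data_prepared_valid.jsonl")
# shlex.join quotes the arguments that contain spaces
print(shlex.join(cmd))

# To launch the job (assumes OPENAI_API_KEY is set in the environment):
# import subprocess
# subprocess.run(cmd, check=True)
```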

In [124]:
!openai -k "Your - API key" api fine_tunes.results -i ft-U8Xkz19BrpE5RwUma9FBulOD > result.csv

In [125]:
!openai -k "Your - API key" api completions.create -m ada:ft-student-2022-07-28-06-37-48 -p "I dint get a good resolution ->"

I dint get a good resolution -> detractor -> detractor detractor detractor detractor detractor detractor detract

In [126]:
results = pd.read_csv('result.csv')
results[results['classification/accuracy'].notnull()].tail(1)

Unnamed: 0,step,elapsed_tokens,elapsed_examples,training_loss,training_sequence_accuracy,training_token_accuracy,validation_loss,validation_sequence_accuracy,validation_token_accuracy,classification/accuracy,classification/weighted_f1_score
3196,3197,278437,3197,0.134664,1.0,1.0,,,,0.885,0.858391
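The `notnull()` filter in the cell above suggests that the classification columns are empty on some rows of the results file. The same lookup can be sketched with the standard library alone; `last_classification_row` is a hypothetical helper, and in the sample below only the second data row copies values from the output above (the first is made up to show the filter at work).

```python
import csv
import io

def last_classification_row(csv_text):
    """Return the last row whose classification/accuracy field is non-empty."""
    rows = csv.DictReader(io.StringIO(csv_text))
    scored = [r for r in rows if r["classification/accuracy"]]
    return scored[-1] if scored else None

# Two-row sample: the first data row is made up, the second copies values shown above
sample = (
    "step,training_loss,classification/accuracy,classification/weighted_f1_score\n"
    "3196,0.2,,\n"
    "3197,0.134664,0.885,0.858391\n"
)
row = last_classification_row(sample)
print(row["step"], float(row["classification/accuracy"]))
```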


# Python version of the prediction

In [110]:
openai.api_key = 'Your - API key'
ft_model = 'ada:ft-student-2022-07-28-05-56-26'
openai.Completion.create(model=ft_model, prompt="I dint get a good resolution ->")

<OpenAIObject text_completion id=cmpl-5YrdVPevwFxqjoRRxi5XJJg8UXcoc at 0x7f8a960598f0> JSON: {
  "choices": [
    {
      "finish_reason": "length",
      "index": 0,
      "logprobs": null,
      "text": " detractor detractor detractor detractor detractor detractor detractor detractor"
    }
  ],
  "created": 1658991013,
  "id": "cmpl-5YrdVPevwFxqjoRRxi5XJJg8UXcoc",
  "model": "ada:ft-student-2022-07-28-05-56-26",
  "object": "text_completion",
  "usage": {
    "completion_tokens": 16,
    "prompt_tokens": 8,
    "total_tokens": 24
  }
}

In [149]:
result = openai.Completion.create(model=ft_model, prompt="I purchased the computer June 2017, it has been working well until now. ->")

In [150]:
result['choices'][0]['text'][0:10]

' passive p'
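Slicing the first ten characters only works when every label has the same length. A slightly more robust option, sketched here under the assumption that each label is a single word, is to take the first whitespace-delimited word of the completion; `extract_label` is a hypothetical helper name.

```python
def extract_label(completion_text):
    """Return the first whitespace-delimited word of a completion, or None."""
    words = completion_text.split()
    return words[0] if words else None

print(extract_label(" detractor detractor detractor"))
print(extract_label(" passive p"))
```

With the result object from the cell above, the call would be `extract_label(result['choices'][0]['text'])`.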

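# Key things to watch out for in the documentation:

### The custom separator " -> " is not applied even though it is added to the prompts; the prepared data uses the default separator " ->" - Check documentation

A likely cause is that `data.jsonl` is written before the separator is appended to the `prompt` column, so `prepare_data` sees prompts without a separator and adds its own.

### The predicted class is repeated multiple times in the completion - Check documentation

The completion request accepts `max_tokens` and `stop` parameters, either of which can cut the output off after the first label.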
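One way to keep the label from repeating is to cap the completion length in the request itself. The sketch below only builds the keyword arguments; `classification_request` is a hypothetical helper, and `max_tokens=1` is an assumption that a single token is enough to tell the labels apart, which depends on how the labels tokenize and has not been checked here. If one token returns only a fragment of the label, a slightly larger cap combined with first-word parsing is an alternative.

```python
def classification_request(model, text, separator=" ->", max_tokens=1):
    """Build keyword arguments for a short, deterministic completion request."""
    return {
        "model": model,
        "prompt": text.rstrip() + separator,
        "max_tokens": max_tokens,  # cap the length so the label is not repeated
        "temperature": 0,          # always pick the most likely token
    }

kwargs = classification_request("ada:ft-student-2022-07-28-05-56-26",
                                "I dint get a good resolution")
print(kwargs["prompt"])

# With the legacy client used in this notebook:
# result = openai.Completion.create(**kwargs)
```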