# Fine-Tuning a GPT-3 NLP Model for Complex Claim Decomposition
To improve the final veracity score of automated fact-checking systems, we investigate whether decomposing complex claims into binary sub-claims improves entailment classification by NLI models.

To do so, we fine-tune a GPT-3 model to decompose complex input claims into binary sub-questions, each addressing a different aspect of the claim.

This script uses a preprocessed version of the ClaimDecomp dataset introduced in the `chen-etal-2022-generating` paper from the University of Texas.

## Install Required Libraries

In [None]:
!pip install tqdm
!pip install pandas
!pip install beautifulsoup4
!pip install argparse
!pip install requests
!pip install allennlp==2.7
!pip install torch==1.9.0

## Loading the dataset

In [None]:
!pip install --upgrade pip setuptools
!pip install unicodecsv

In [None]:
import csv
import json

csv_path = "./filtered_data/train.csv"
jsonl_path = "./filtered_jsonl/train.jsonl"

# Read the CSV file and convert it to a list of dictionaries
with open(csv_path, "r", encoding="utf-8-sig") as csv_file, open(jsonl_path, "w", encoding="utf-8") as jsonl_file:
    csv_reader = csv.DictReader(csv_file)
    for row in csv_reader:
        jsonl_data = {"prompt": row["prompt"], "completion": row["completion"]}
        jsonl_file.write(json.dumps(jsonl_data, ensure_ascii=False) + "\n")
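
Before handing the file to the OpenAI tooling, it can help to read the JSONL back and confirm that every line parses and carries both fields. The following is a minimal sketch: `validate_jsonl` is a hypothetical helper that is not part of the original pipeline, and the demo writes to a temporary file rather than the real `./filtered_jsonl/train.jsonl`.

```python
import json
import os
import tempfile


def validate_jsonl(path, required_keys=("prompt", "completion")):
    """Return (number of valid records, list of problem descriptions)."""
    valid, problems = 0, []
    with open(path, "r", encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            if not line.strip():
                continue  # ignore blank lines
            try:
                record = json.loads(line)
            except json.JSONDecodeError as error:
                problems.append(f"line {line_number}: invalid JSON ({error.msg})")
                continue
            if not isinstance(record, dict):
                problems.append(f"line {line_number}: not a JSON object")
                continue
            # A key counts as missing if it is absent or maps to an empty value.
            missing = [key for key in required_keys if not record.get(key)]
            if missing:
                problems.append(f"line {line_number}: missing or empty {missing}")
            else:
                valid += 1
    return valid, problems


# Demo on a small temporary file (illustrative records, not ClaimDecomp data).
with tempfile.NamedTemporaryFile(
    "w", suffix=".jsonl", delete=False, encoding="utf-8"
) as tmp:
    tmp.write(json.dumps({"prompt": "claim text\nsub-questions:", "completion": "Is X true?"}) + "\n")
    tmp.write(json.dumps({"prompt": "", "completion": "orphan"}) + "\n")
    demo_path = tmp.name

valid_count, issues = validate_jsonl(demo_path)
os.remove(demo_path)
print(valid_count, issues)
```

To check the real file, call `validate_jsonl(jsonl_path)` after the conversion cell has run.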

# Train OpenAI model

In [None]:
!pip install --upgrade openai

In [None]:
%env OPENAI_API_KEY=[INSERT YOUR OPENAI KEY HERE]

In [None]:
import openai
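
Since the CLI commands below read the key from the environment, a quick check that `OPENAI_API_KEY` is set, and is no longer the placeholder text, can save a failed upload. This is a minimal sketch; `require_api_key` is a hypothetical helper, and treating any value that starts with `[` as the unreplaced placeholder is an assumption based on the cell above.

```python
import os


def require_api_key(variable="OPENAI_API_KEY"):
    """Return the API key from the environment or raise a clear error."""
    key = os.environ.get(variable, "").strip()
    # Assumption: a value starting with "[" is the unreplaced placeholder.
    if not key or key.startswith("["):
        raise RuntimeError(
            f"{variable} is not set to a real key; set it before running the CLI cells."
        )
    return key


try:
    require_api_key()
    print("API key found")
except RuntimeError as error:
    print(error)
```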

## Final preprocessing for your training dataset using OpenAI's CLI tool

In [None]:
# ./filtered_jsonl/train.jsonl is the path to the dataset you want to fine-tune on; change it as needed.
!openai tools fine_tunes.prepare_data -f ./filtered_jsonl/train.jsonl

Analyzing...

- Your file contains 793 prompt-completion pairs
- All prompts end with suffix `\nsub-questions:`. This suffix seems very long. Consider replacing with a shorter suffix, such as `\n\n###\n\n`
- Your data does not contain a common ending at the end of your completions. Having a common ending string appended to the end of the completion makes it clearer to the fine-tuned model where the completion should end. See https://platform.openai.com/docs/guides/fine-tuning/preparing-your-dataset for more detail and examples.
- The completion should start with a whitespace character (` `). This tends to produce better results due to the tokenization we use. See https://platform.openai.com/docs/guides/fine-tuning/preparing-your-dataset for more details

Based on the analysis we will perform the following actions:
- [Recommended] Add a suffix ending ` END` to all completions [Y/n]: Y
- [Recommended] Add a whitespace character to the beginning of the completion [Y/n]: Y


Your data will
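
The two accepted actions above (a leading space and a trailing ` END` marker on every completion) can be approximated by hand, which is useful for understanding what the prepared file is expected to contain. The sketch below is an approximation under that assumption, not the tool's actual implementation; `apply_recommended_fixes` is a hypothetical name, and the example record is made up for illustration.

```python
def apply_recommended_fixes(record, end_marker=" END"):
    """Mimic the two accepted actions on one prompt-completion pair."""
    completion = record["completion"]
    # Action 2: the completion should start with a whitespace character.
    if not completion.startswith(" "):
        completion = " " + completion
    # Action 1: append a common ending so the model learns where to stop.
    if not completion.endswith(end_marker):
        completion = completion + end_marker
    return {"prompt": record["prompt"], "completion": completion}


# Illustrative record only; not taken from the dataset.
example = {
    "prompt": "Some complex claim\nsub-questions:",
    "completion": "Is A true?\nIs B true?",
}
prepared = apply_recommended_fixes(example)
print(repr(prepared["completion"]))
```

Because the function checks before modifying, applying it twice leaves the record unchanged.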

## Fine-tuning with the dataset prepared

In [None]:
# The suffix is optional; it helps you tell different fine-tuned versions apart.
!openai api fine_tunes.create -t ./filtered_jsonl/train_prepared.jsonl -m davinci --suffix "train-append-subQ"

Upload progress:   0% 0.00/1.07M [00:00<?, ?it/s]
Upload progress: 100% 1.07M/1.07M [00:00<00:00, 818Mit/s]
Uploaded file from ./filtered_jsonl/train_prepared.jsonl: file-LAP6Eb5w6BYKqZywGu9ilbrw
Created fine-tune: ft-NipGhN0IwwTPx4jUKO8NoGyr
Streaming events until fine-tuning is complete...

(Ctrl-C will interrupt the stream, but not cancel the fine-tune)
[2023-03-27 17:43:42] Created fine-tune: ft-NipGhN0IwwTPx4jUKO8NoGyr

Stream interrupted (client disconnected).
To resume the stream, run:

  openai api fine_tunes.follow -i ft-NipGhN0IwwTPx4jUKO8NoGyr



In [None]:
# To cancel the fine-tuning job
!openai api fine_tunes.cancel -i [YOUR FINE-TUNING JOB ID, e.g. ft-Rz3v9tzlfifRFTZI8AEwIPz7]

In [None]:
# To follow the fine-tuning progress
!openai api fine_tunes.follow -i [YOUR FINE-TUNING JOB ID, e.g. ft-Rz3v9tzlfifRFTZI8AEwIPz7]

[2023-03-27 17:43:42] Created fine-tune: ft-NipGhN0IwwTPx4jUKO8NoGyr
[2023-03-27 17:47:31] Fine-tune costs $27.25
[2023-03-27 17:47:31] Fine-tune enqueued
[2023-03-27 17:48:14] Fine-tune is in the queue. Queue number: 31
[2023-03-27 17:48:39] Fine-tune is in the queue. Queue number: 30
[2023-03-27 17:49:24] Fine-tune is in the queue. Queue number: 29
[2023-03-27 17:51:17] Fine-tune is in the queue. Queue number: 28
[2023-03-27 17:51:51] Fine-tune is in the queue. Queue number: 27
[2023-03-27 17:52:57] Fine-tune is in the queue. Queue number: 26
[2023-03-27 17:53:45] Fine-tune is in the queue. Queue number: 25
[2023-03-27 17:55:37] Fine-tune is in the queue. Queue number: 24
[2023-03-27 17:56:06] Fine-tune is in the queue. Queue number: 23
[2023-03-27 17:57:30] Fine-tune is in the queue. Queue number: 22
[2023-03-27 18:01:40] Fine-tune is in the queue. Queue number: 21
[2023-03-27 18:01:48] Fine-tune is in the queue. Queue number: 20
[2023-03-27 18:01:58] Fine-tune is in the queue. Queu
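
If the streamed events are captured as text, the queue position can be tracked programmatically instead of by eye. The sketch below assumes the log lines keep the `[timestamp] Fine-tune is in the queue. Queue number: N` shape seen above; `queue_positions` is a hypothetical helper, and the sample log reuses a few lines from that output.

```python
import re

# Matches lines such as:
# [2023-03-27 17:48:14] Fine-tune is in the queue. Queue number: 31
QUEUE_PATTERN = re.compile(
    r"^\[(?P<timestamp>[^\]]+)\] Fine-tune is in the queue\. "
    r"Queue number: (?P<position>\d+)"
)


def queue_positions(log_text):
    """Return a list of (timestamp, queue position) pairs found in the log."""
    positions = []
    for line in log_text.splitlines():
        match = QUEUE_PATTERN.match(line.strip())
        if match:
            positions.append((match.group("timestamp"), int(match.group("position"))))
    return positions


sample_log = """[2023-03-27 17:47:31] Fine-tune enqueued
[2023-03-27 17:48:14] Fine-tune is in the queue. Queue number: 31
[2023-03-27 17:48:39] Fine-tune is in the queue. Queue number: 30
"""
positions = queue_positions(sample_log)
print(positions[-1])
```

Lines that do not match the pattern, including truncated ones, are skipped rather than raising an error.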

In [None]:
# To pull up the available flags for completions
!openai api completions.create -h
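
Once the fine-tuned model is available, a request should mirror the training format: the prompt ends with the `\nsub-questions:` suffix and generation stops at the ` END` marker. The sketch below only builds the request parameters and parses a completion, so it runs without network access. It assumes a prompt is simply the claim followed by the suffix and that the model returns one sub-question per line; `build_prompt`, `build_request`, `parse_sub_questions`, and the model name placeholder are all hypothetical.

```python
PROMPT_SUFFIX = "\nsub-questions:"  # suffix reported by the data preparation tool
STOP_SEQUENCE = " END"              # ending appended to every training completion


def build_prompt(claim):
    """Assumed prompt format: the claim followed by the training suffix."""
    return claim.strip() + PROMPT_SUFFIX


def build_request(claim, model="YOUR_FINE_TUNED_MODEL_NAME", max_tokens=256):
    """Collect completion parameters; the values are illustrative defaults."""
    return {
        "model": model,  # placeholder, replace with your fine-tuned model name
        "prompt": build_prompt(claim),
        "max_tokens": max_tokens,
        "temperature": 0,
        "stop": [STOP_SEQUENCE],
    }


def parse_sub_questions(completion_text):
    """Split a completion into sub-questions, assuming one per line."""
    text = completion_text.rstrip()
    if text.endswith(STOP_SEQUENCE):
        text = text[: -len(STOP_SEQUENCE)]
    return [line.strip() for line in text.splitlines() if line.strip()]


request = build_request("An illustrative complex claim.")
# With the legacy (pre-1.0) Python package, these parameters could be passed
# as openai.Completion.create(**request); that call is not made here.
questions = parse_sub_questions(" Is A true?\nIs B true? END")
print(request["prompt"])
print(questions)
```

Setting `stop` to the training ending keeps the model from running past the last sub-question, and the parser tolerates completions with or without the marker.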