
# Build a Fine-Tuned GPT Model

The goal of this project is to build a model that generates a detailed personality description of a person from their age, gender, and personality traits.

The notebook illustrates the process of fine-tuning a GPT model, using Ada as the base model, and walks through the specific steps used to adapt the model to this task.

Note that this demonstration uses Ada, a more cost-effective base model, which may produce lower-quality responses.

In [None]:
!pip install openai

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting openai
  Downloading openai-0.27.7-py3-none-any.whl (71 kB)
Collecting aiohttp (from openai)
  Downloading aiohttp-3.8.4-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.0 MB)
Collecting multidict<7.0,>=4.5 (from aiohttp->openai)
  Downloading multidict-6.0.4-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (114 kB)
Collecting async-timeout<5.0,>=4.0.0a3 (from aiohttp->openai)
  Downloading async_timeout-4.0.2-py3-none-any.whl (5.8 kB)
Collecting yarl<2.0,>=1.0 (from aiohttp->openai)
  Downloadin

In [None]:
import openai

In [None]:
from google.colab import files
uploaded_files = files.upload()
api_key = uploaded_files['api_key.txt'].decode()

Saving api_key.txt to api_key.txt


In [None]:
openai.api_key = api_key

In [None]:
import os
os.environ['OPENAI_API_KEY'] = api_key
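
Outside Colab, the same key could be loaded without the upload widget. The sketch below is one possible approach, not part of the original notebook: the helper name `load_api_key`, the fallback order (environment variable first, then a local file), and the default file name are all assumptions. It also strips surrounding whitespace, since a key file saved from a text editor often ends with a newline.

```python
import os
from pathlib import Path


def load_api_key(path="api_key.txt", env_var="OPENAI_API_KEY"):
    """Return the API key from the environment, falling back to a local file.

    Hypothetical helper: the default path and the lookup order are
    assumptions made for this sketch.
    """
    key = os.environ.get(env_var)
    if key:
        return key.strip()
    # strip() drops the trailing newline that editors usually append
    return Path(path).read_text().strip()
```

The returned string can then be assigned to `openai.api_key` exactly as in the cells above.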

# Preparation of Personality Description dataset for fine-tuning

This section builds the training dataset. The code iterates over every combination of age, gender, and personality trait, builds a prompt for each combination, and sends it to the OpenAI Completion API (the `text-davinci-003` model) three times. Each response is stored in a dataframe together with its age, gender, personality, prompt, short prompt, and finish reason. Finally, the dataframe is saved as a CSV file named "out_openai_completion.csv".
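
To see how many API requests this produces before spending any credits, the combinations can be enumerated offline. The sketch below uses `itertools.product` in place of the nested loops; the list contents and the short prompt template are copied from the next cell, while `n_samples`, `sub_prompts`, and `total_requests` are names introduced only for this illustration.

```python
from itertools import product

l_age = ['young', 'old']
l_gender = ['man', 'woman']
l_personality = ['optimistic', 'energetic', 'introverted']
n_samples = 3  # completions requested per combination, as in the notebook

f_sub_prompt = "{age}, {gender}, {personality}"

# One short prompt per (age, gender, personality) combination,
# in the same order the nested loops visit them
sub_prompts = [
    f_sub_prompt.format(age=a, gender=g, personality=p)
    for a, g, p in product(l_age, l_gender, l_personality)
]

total_requests = len(sub_prompts) * n_samples
print(len(sub_prompts), total_requests)  # prints: 12 36
```

The resulting 36 rows make for a very small training set, which fits the demonstration purpose of the notebook.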

In [None]:
import pandas as pd

In [None]:
l_age = ['young', 'old']
l_gender = ['man', 'woman']
l_personality = ['optimistic', 'energetic', 'introverted'] 

f_prompt = "Write a detailed description of a {age} {gender} who has {personality} personality. Write out the entire description in a maximum of 20 words in great detail:"
f_sub_prompt = "{age}, {gender}, {personality}"

rows = []
for age in l_age:
    for gender in l_gender:
        for personality in l_personality:
            for _ in range(3):  # three samples per combination
                prompt = f_prompt.format(age=age, gender=gender, personality=personality)
                sub_prompt = f_sub_prompt.format(age=age, gender=gender, personality=personality)
                print(sub_prompt)

                response = openai.Completion.create(
                    model="text-davinci-003",
                    prompt=prompt,
                    temperature=1,
                    max_tokens=500,
                    top_p=1,
                    frequency_penalty=0,
                    presence_penalty=0
                )

                finish_reason = response['choices'][0]['finish_reason']
                response_txt = response['choices'][0]['text']

                rows.append({
                    'age': age,
                    'gender': gender,
                    'personality': personality,
                    'prompt': prompt,
                    'sub_prompt': sub_prompt,
                    'response_txt': response_txt,
                    'finish_reason': finish_reason})

# Build the dataframe once instead of concatenating inside the loop
df = pd.DataFrame(rows)
df.to_csv("out_openai_completion.csv")

young, man, optimistic
young, man, optimistic
young, man, optimistic
young, man, energetic
young, man, energetic
young, man, energetic
young, man, introverted
young, man, introverted
young, man, introverted
young, woman, optimistic
young, woman, optimistic
young, woman, optimistic
young, woman, energetic
young, woman, energetic
young, woman, energetic
young, woman, introverted
young, woman, introverted
young, woman, introverted
old, man, optimistic
old, man, optimistic
old, man, optimistic
old, man, energetic
old, man, energetic
old, man, energetic
old, man, introverted
old, man, introverted
old, man, introverted
old, woman, optimistic
old, woman, optimistic
old, woman, optimistic
old, woman, energetic
old, woman, energetic
old, woman, energetic
old, woman, introverted
old, woman, introverted
old, woman, introverted


# Data Preparation and Fine-Tuning Initialization

This section prepares the data and starts the fine-tuning job. It reads the CSV file, selects the short prompt and response columns, renames them to `prompt` and `completion`, and saves the result to a new CSV file. The OpenAI CLI then converts that CSV file to JSONL format and creates the fine-tuning job with Ada as the base model.
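
For intuition about the target format, the CSV-to-JSONL step can be sketched by hand with the standard library. This is only an illustration of the file layout (one JSON object with `prompt` and `completion` keys per line); the function name `csv_to_jsonl` is hypothetical, and the `openai tools fine_tunes.prepare_data` command used in this notebook also analyzes the data and may apply further adjustments, so its output can differ from this plain conversion.

```python
import csv
import json


def csv_to_jsonl(csv_path, jsonl_path):
    """Write one {"prompt": ..., "completion": ...} JSON object per line.

    Hypothetical helper for illustration; returns the number of records written.
    """
    count = 0
    with open(csv_path, newline="", encoding="utf-8") as src, \
            open(jsonl_path, "w", encoding="utf-8") as dst:
        for row in csv.DictReader(src):
            record = {"prompt": row["prompt"], "completion": row["completion"]}
            dst.write(json.dumps(record) + "\n")
            count += 1
    return count
```

Calling `csv_to_jsonl('prepared_data.csv', 'manual.jsonl')` after the cell below has written `prepared_data.csv` would produce one line per training example (the output file name here is arbitrary).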

In [None]:
import subprocess

In [None]:
df = pd.read_csv('out_openai_completion.csv')

In [None]:
df['response_txt'][10].strip() #check 

'An optimistic young woman with a bubbly personality, an eternal smile, and a desire to make the world a better place.'

In [None]:
prepared_data = df.loc[:,['sub_prompt','response_txt']]
prepared_data.rename(columns={'sub_prompt':'prompt', 'response_txt':'completion'}, inplace=True)
prepared_data.to_csv('prepared_data.csv',index=False)


## prepared_data.csv --> prepared_data_prepared.jsonl
subprocess.run('openai tools fine_tunes.prepare_data --file prepared_data.csv --quiet'.split())

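## Start fine-tuning
## Note: str.split() does not interpret quotes, so the suffix is passed with its literal double quotes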
subprocess.run('openai api fine_tunes.create --training_file prepared_data_prepared.jsonl --model ada --suffix "Personality"'.split())

CompletedProcess(args=['openai', 'api', 'fine_tunes.create', '--training_file', 'prepared_data_prepared.jsonl', '--model', 'ada', '--suffix', '"Personality"'], returncode=0)
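
The output above shows two things worth handling more carefully: the suffix reaches the CLI as `'"Personality"'` with its quotes intact, and `subprocess.run` reports only a return code. A minimal sketch of a stricter wrapper follows; the helper name `run_cli` is an assumption, and it relies only on the standard `shlex` and `subprocess` modules.

```python
import shlex
import subprocess


def run_cli(command):
    """Run a command, raise on a non-zero exit code, and return its stdout.

    Hypothetical helper: accepts either a shell-style string or a list of arguments.
    """
    # shlex.split honors shell-style quoting, so "Personality" becomes Personality
    args = shlex.split(command) if isinstance(command, str) else list(command)
    result = subprocess.run(args, capture_output=True, text=True, check=True)
    return result.stdout
```

With this wrapper, the fine-tuning command string from the cell above could be passed unchanged, for example `print(run_cli('openai api fine_tunes.create --training_file prepared_data_prepared.jsonl --model ada --suffix "Personality"'))`, and a failure would raise `subprocess.CalledProcessError` instead of passing silently.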