<a href="https://colab.research.google.com/github/Zihooo/Text-selection-codes-pub/blob/main/GPT4_prediction_model.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Transformer Models for Personality Score Prediction
This Colab notebook is written in **Python** to illustrate how to use state-of-the-art **Transformer** models to predict personality scores. In this code sample, we use **GPT-4** (specifically the `gpt-4o-2024-05-13` model) as an example transformer and **Extraversion** as a sample personality trait. We've made notes in the code about the changes you'd need to make to use other transformers or predict other personality traits.

In [None]:
# Install the necessary package
!pip install openai

Collecting openai
  Downloading openai-1.35.0-py3-none-any.whl (326 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 326.0/326.0 kB 1.9 MB/s eta 0:00:00
Collecting httpx<1,>=0.23.0 (from openai)
  Downloading httpx-0.27.0-py3-none-any.whl (75 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 75.6/75.6 kB 6.1 MB/s eta 0:00:00
Collecting httpcore==1.* (from httpx<1,>=0.23.0->openai)
  Downloading httpcore-1.0.5-py3-none-any.whl (77 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 77.9/77.9 kB 3.9 MB/s eta 0:00:00
Collecting h11<0.15,>=0.13 (from httpcore==1.*->httpx<1,>=0.23.0->openai)
  Downloading h11-0.14.0-py3-none-any.whl (58 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 58.3/58.3 kB 4.5 MB/s eta 0:00:00
Installing collected packages: h11, httpcore, httpx, openai
Successfully installed h11-0.14.0 httpcore-1.0.5 ht

In [None]:
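# Import pandas for reading data, and mount Google Drive (Colab only) so the /content/drive/ paths below work.
import pandas as pd
from google.colab import drive
drive.mount('/content/drive')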

In [None]:
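# Import Data function
def import_data(path, text_col, label_col, index_col=None, index_val=None, enc='latin1'):
  """Import a CSV of sentences.

  Args:
    path: A CSV file path.
    text_col: Name of the column containing sentences.
    label_col: Name of the column containing labels (pass None if there are no labels).
    index_col: Name of a column used to filter rows (optional).
    index_val: Keep only rows where `index_col` equals this value (optional).
    enc: File encoding to be used (optional).

  Returns:
    (texts, labels, df) if label_col is given, otherwise (texts, df).
  """
  # Read data; keep_default_na=False keeps empty cells as '' instead of NaN
  df = pd.read_csv(path, encoding=enc, keep_default_na=False)
  # Optionally keep only the rows whose index_col value equals index_val
  if index_val is not None:
    df = df[df[index_col] == index_val]
  if label_col is None:
    return df[text_col].tolist(), df
  return df[text_col].tolist(), df[label_col].tolist(), df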
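A quick illustration of why `import_data` passes `keep_default_na=False`, assuming a toy CSV with made-up values (not study data). By default pandas reads a blank cell as `NaN` (a float), whereas the chat messages built later expect every response to be a string; with `keep_default_na=False`, a blank response stays an empty string. The last line shows that the optional `index_col`/`index_val` filter is an ordinary boolean mask.

```python
import io

import pandas as pd

# Hypothetical toy CSV (not from the study): respondent 2 left the response blank.
csv_text = "id,All_response,escore\n1,I love parties,4.5\n2,,2.0\n"

# Default parsing: the blank cell is read as NaN, which is a float.
default_texts = pd.read_csv(io.StringIO(csv_text))["All_response"].tolist()

# As in import_data: keep_default_na=False keeps the blank cell as an empty string.
kept_df = pd.read_csv(io.StringIO(csv_text), keep_default_na=False)
kept_texts = kept_df["All_response"].tolist()

print(default_texts)  # ['I love parties', nan]
print(kept_texts)     # ['I love parties', '']

# The index_col/index_val filter in import_data is a plain boolean mask:
print(kept_df[kept_df["id"] == 1]["All_response"].tolist())  # ['I love parties']
```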

In [None]:
# Load the aggregated responses ('All_response' column in the dataset) as the text input. The labels ('escore') are loaded but not needed in this example, because we didn't fine-tune the model.
all_text, all_labels, all_raw_data = import_data("/content/drive/MyDrive/Text Selection Paper Codes/data/all_text_latent_extract_10.csv", "All_response", "escore")

In [None]:
# Build one prompt per respondent with a for loop.
# Each prompt has a 'system' message, which holds the task instructions, and a 'user' message, which holds that respondent's response.
# To predict another trait, change the trait named in the last instruction sentence ("Please report the personality scores of ...").
ds_test_formatted_E = []
for i in range(len(all_text)):

  ds_test_formatted_E.append([
    {"role": "system", "content": """Read the responses to open-ended personality questions. Predict the responders Big Five personalities based on their text response.
        Note that individuals who are agreeable tend to be good-natured, compliant, modest, gentle, and cooperative while individuals that are not agreeable tend to be irritable, ruthless, suspicious and inflexible.
        Note that individuals who are open to experiences tend to be intellectual, imaginative, sensitive and open-minded while individuals that are not open to experiences tend to be down-to-earth, insensitive and conventional.
        Note that individuals who are conscientious tend to be careful, thorough, organized and scrupulous while individuals that are not conscientious tend to be irresponsible, disorganized and unscrupulous.
        Note that individuals who are extraverted tend to be sociable, talkative, assertive and active while individuals that are not extraverted tend to be retiring, reserved and cautious.
        Note that individuals who are neurotic tend to be anxious, depressed, angry and insecure while individuals that are not neurotic tend to be calm, poised and emotionally stable.
        Please report the personality scores of extraversion. The score values can only be a number between 1-5, and 5 indicates high extraversion level. Report the score value only in your response, DO NOT include any other information in your response."""},
    {"role": "user", "content": all_text[i]}
  ])
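To predict a different trait, only the final instruction sentence needs to change. The sketch below uses a hypothetical helper, `build_messages` (not part of the original notebook), that swaps the trait name into that sentence. It assumes `base_instructions` holds the opening sentence plus the five "Note that ..." descriptions; because the whitespace differs slightly from the triple-quoted string above, the resulting prompt is not byte-identical to the original, and scores could differ a little.

```python
def build_messages(text, trait, base_instructions):
    """Hypothetical helper: return a system/user message pair asking for a 1-5 score on `trait`.

    `base_instructions` should hold the shared opening sentence and the five
    "Note that ..." trait descriptions from the prompt above.
    """
    # Same task sentence as the original prompt, with the trait name swapped in.
    task = (
        f"Please report the personality scores of {trait}. "
        f"The score values can only be a number between 1-5, and 5 indicates high {trait} level. "
        "Report the score value only in your response, DO NOT include any other information in your response."
    )
    return [
        {"role": "system", "content": base_instructions.rstrip() + "\n" + task},
        {"role": "user", "content": text},
    ]


# Example with a shortened base instruction and a made-up response.
base = ("Read the responses to open-ended personality questions. "
        "Predict the responders Big Five personalities based on their text response.")
messages = build_messages("I like helping my friends move house.", "agreeableness", base)
print(messages[0]["content"])
```

In the notebook, the list for another trait could then be built as `[build_messages(t, "agreeableness", base_instructions) for t in all_text]`, where `base_instructions` is your full instruction text.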

In [None]:
# An example from the list of prompts
ds_test_formatted_E[0]

[{'role': 'system',
  'content': 'Read the responses to open-ended personality questions. Predict the responders Big Five personalities based on their text response.\n        Note that individuals who are agreeable tend to be good-natured, compliant, modest, gentle, and cooperative while individuals that are not agreeable tend to be irritable, ruthless, suspicious and inflexible.\n        Note that individuals who are open to experiences tend to be intellectual, imaginative, sensitive and open-minded while individuals that are not open to experiences tend to be down-to-earth, insensitive and conventional.\n        Note that individuals who are conscientious tend to be careful, thorough, organized and scrupulous while individuals that are not conscientious tend to be irresponsible, disorganized and unscrupulous.\n        Note that individuals who are extraverted tend to be sociable, talkative, assertive and active while individuals that are not extraverted tend to be retiring, reserved 

In [None]:
# Import the modules needed to make an API call to OpenAI.
import os
from openai import OpenAI

# Set the OpenAI API key (replace the placeholder with your own key,
# and avoid saving a real key in a shared notebook).
os.environ["OPENAI_API_KEY"] = "your API key"

# Create the client; OpenAI() reads the key from the OPENAI_API_KEY environment variable.
client = OpenAI()
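Hardcoding the key in a cell means it is saved with the notebook. A minimal alternative, assuming you are happy to type the key once per session, is to read it from the environment and fall back to a hidden prompt via the standard-library `getpass`. The helper name `ensure_api_key` is hypothetical. In Colab you can also store the key in the Secrets panel and read it with `google.colab.userdata.get("OPENAI_API_KEY")`.

```python
import os
from getpass import getpass


def ensure_api_key(var_name="OPENAI_API_KEY"):
    """Hypothetical helper: return the key from the environment, prompting (hidden input) if it is unset."""
    if not os.environ.get(var_name):
        # getpass does not echo the key, so it never appears in the saved cell output.
        os.environ[var_name] = getpass(f"Enter {var_name}: ")
    return os.environ[var_name]
```

Call `ensure_api_key()` before `client = OpenAI()` in place of the hardcoded assignment.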

In [None]:
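# Iterate over all prompts in the list and request a score prediction for each.
# Create an empty list to store the scores.
pred_scores = []
for i in range(len(ds_test_formatted_E)):
  # Request a GPT-generated score for prompt i
  completion = client.chat.completions.create(
    model='gpt-4o-2024-05-13',  # change this name to use a different OpenAI model
    # the system + user messages for this respondent
    messages=ds_test_formatted_E[i],
    # top_p=0 and temperature=0 make sampling as deterministic as the API allows;
    # a fixed seed further improves reproducibility (best effort, not guaranteed)
    top_p=0,
    temperature=0,
    seed=123
  )
  # The reply is a string such as '3'
  pred_scores.append(completion.choices[0].message.content)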
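A long loop over every respondent can be interrupted by transient failures such as rate limits or dropped connections. The `openai` v1 client already retries some errors on its own (`OpenAI(max_retries=...)`, default 2); if you want more control, a generic exponential-backoff wrapper is one option. The sketch below is a hypothetical helper, `call_with_retry`, demonstrated with a stand-in function instead of a real API call.

```python
import time


def call_with_retry(fn, max_attempts=5, base_delay=1.0, retry_on=(Exception,), sleep=time.sleep):
    """Call fn() and return its result, retrying with exponential backoff.

    After the k-th failed attempt (k = 0, 1, ...), waits base_delay * 2**k seconds.
    The last exception is re-raised once max_attempts attempts have failed.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)


# Demo with a stand-in function that fails twice before succeeding (no API call).
state = {"calls": 0}

def flaky_request():
    state["calls"] += 1
    if state["calls"] < 3:
        raise ConnectionError("temporary failure")
    return "3"

waits = []
result = call_with_retry(flaky_request, base_delay=0.5, retry_on=(ConnectionError,), sleep=waits.append)
print(result, waits)  # 3 [0.5, 1.0]
```

Inside the prediction loop, the request could be wrapped as `call_with_retry(lambda: client.chat.completions.create(...), retry_on=(openai.RateLimitError, openai.APIConnectionError))`, with `import openai` added.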

In [None]:
pred_scores

['2',
 '3',
 '5',
 '2',
 '2',
 '4',
 '3',
 '4',
 '4',
 '5',
 '3',
 '3',
 '4',
 '4',
 '3',
 '2',
 '2',
 '4',
 '3',
 '2',
 '4',
 '2',
 '4',
 '2',
 '2',
 '3',
 '2',
 '3',
 '4',
 '4',
 '2',
 '3',
 '4',
 '2',
 '2',
 '4',
 '2',
 '2',
 '3',
 '4',
 '4',
 '4',
 '5',
 '3',
 '4',
 '3',
 '4',
 '3',
 '3',
 '3',
 '4',
 '4',
 '3',
 '2',
 '4',
 '3',
 '5',
 '3',
 '3',
 '5',
 '3',
 '2',
 '3',
 '3',
 '4',
 '4',
 '3',
 '3',
 '4',
 '3',
 '3',
 '4',
 '4',
 '2',
 '4',
 '3',
 '5',
 '2',
 '4',
 '3',
 '3',
 '3',
 '3',
 '4',
 '3',
 '4',
 '5',
 '3',
 '4',
 '2',
 '4',
 '2',
 '2',
 '4',
 '3',
 '3',
 '5',
 '2',
 '3',
 '4',
 '4',
 '2',
 '5',
 '2',
 '4',
 '3',
 '4',
 '4',
 '2',
 '4',
 '3',
 '4',
 '2',
 '5',
 '2',
 '5',
 '4',
 '4',
 '3',
 '4',
 '3',
 '3',
 '4',
 '4',
 '2',
 '5',
 '3',
 '5',
 '4',
 '3',
 '3',
 '3',
 '3',
 '2',
 '3',
 '4',
 '4',
 '4',
 '4',
 '3',
 '4',
 '3',
 '2',
 '2',
 '3',
 '2',
 '4',
 '3',
 '3',
 '4',
 '5',
 '2',
 '2',
 '4',
 '3',
 '3',
 '4',
 '2',
 '3',
 '4',
 '3',
 '3',
 '3',
 '3',
 '4',
 '4',
 '4'
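The model returns each score as a string. The prompt asks for a bare number from 1 to 5, but nothing enforces that, so it may be worth checking the replies before analysis. A minimal sketch, assuming the first number in a reply is the score; `parse_score` is a hypothetical helper, and the sample replies are made up to show edge cases, not taken from the results above.

```python
import re


def parse_score(reply, low=1.0, high=5.0):
    """Hypothetical helper: extract the first number in a reply; return None if missing or out of range."""
    match = re.search(r"\d+(?:\.\d+)?", reply)
    if match is None:
        return None
    value = float(match.group())
    return value if low <= value <= high else None


# Made-up replies illustrating the cases handled.
replies = ['2', '3', ' 4\n', 'Extraversion: 5', 'N/A', '7']
print([parse_score(r) for r in replies])  # [2.0, 3.0, 4.0, 5.0, None, None]
```

Applied to the notebook, `[parse_score(s) for s in pred_scores]` gives numeric scores, with `None` marking replies that need a manual look.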

In [None]:
# Save the predicted scores (as the strings returned by the model) to Google Drive.
preddf = pd.DataFrame(pred_scores, columns=["pred_E"])
preddf.to_csv('/content/drive/MyDrive/Escore.csv')
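The cell above saves the predictions on their own, so they can only be matched back to respondents by row order. One option is to attach them as a numeric column of the DataFrame returned by `import_data` (`all_raw_data` in this notebook) and save both together. The sketch below uses small stand-in objects with made-up values (`raw_df`, `replies`) so that it runs on its own; `errors="coerce"` turns any non-numeric reply into `NaN` instead of raising.

```python
import pandas as pd

# Stand-ins (made-up values) for all_raw_data and pred_scores from earlier cells.
raw_df = pd.DataFrame({"All_response": ["text A", "text B", "text C"],
                       "escore": [3.2, 2.5, 4.0]})
replies = ['2', '3', 'N/A']

# Assigning a same-length array pairs predictions with rows by position,
# which matches the order in which the prompts were built.
scored = raw_df.assign(pred_E=pd.to_numeric(replies, errors="coerce"))
print(scored)

# Pass a file path (e.g. on Drive) instead of nothing to write the CSV to disk.
csv_text = scored.to_csv(index=False)
```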