# Personality Prediction via Large Language Models (LLMs)

This notebook predicts the Big Five personality traits of each speaker in a two-speaker conversation using OpenAI’s GPT models or any server that exposes an OpenAI-compatible API. It runs in two stages:

1. **Speaker Attribute Analysis**  
2. **Personality Prediction**  

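> **Before you begin:**  
> 1. Get an OpenAI API key and base URL.  
> 2. Purchase sufficient quota for [OpenAI’s API](https://openai.com/index/openai-api/).  
> 3. Enter your credentials in the cell below (`api_key`, `api_base`) and uncomment the `client = OpenAI(api_key=api_key, base_url=api_base)` line. As written, the cell connects to a local Ollama server instead.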

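Hard-coding a key in a notebook makes it easy to leak when the notebook is shared. The following is a minimal sketch of an alternative, assuming the key is exported as `OPENAI_API_KEY` and, optionally, the base URL as `OPENAI_BASE_URL`. `resolve_credentials` is a hypothetical helper and is not part of this repository; the fallback mirrors the local Ollama settings used in the cell below.

```python
import os

# Endpoint and placeholder key copied from the setup cell of this notebook.
LOCAL_OLLAMA_URL = "http://localhost:11434/v1"
LOCAL_OLLAMA_KEY = "ollama"  # required by the client, but unused by Ollama


def resolve_credentials(env=None):
    """Hypothetical helper: return (api_key, api_base) from environment variables.

    Falls back to the local Ollama endpoint when no API key is set.
    """
    env = os.environ if env is None else env
    api_key = env.get("OPENAI_API_KEY")
    api_base = env.get("OPENAI_BASE_URL")  # None lets the client use its default
    if api_key:
        return api_key, api_base
    return LOCAL_OLLAMA_KEY, LOCAL_OLLAMA_URL


api_key, api_base = resolve_credentials()
# The client would then be created exactly as in the cell below:
# client = OpenAI(api_key=api_key, base_url=api_base)
```
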
---

In [1]:
##################################################
############### API Key of ChatGPT ###############
##################################################

# api_key = "sk_..."
# api_base = ""

##################################################
##################################################
##################################################

import warnings
warnings.filterwarnings("ignore")

import librosa
import numpy as np
from openai import OpenAI
import glob
import os

import sys
sys.path.append("../sho_util/pyfiles/")
from basic import get_bool_base_on_conditions

sys.path.append('../pyfiles/')
from dialog import concatenate_close_voice, update_information, get_start_end_referencedf, most_frequent
from llmprediction import GetResult_Personality, get_prompt_character

import pandas as pd
pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 504)
pd.set_option('display.max_colwidth', None)

# client = OpenAI(api_key=api_key, base_url=api_base)
client = OpenAI(
    base_url = 'http://localhost:11434/v1',
    api_key='ollama', # required, but unused
)

---
# Speaker Attribute Analysis

In this step, we analyze each speaker’s textual, acoustic, and behavioral cues and map them to relative discrete values (e.g., “Normal”, “Rare”, “Frequent”). These values are normalized using the mean and IQR computed over the first 995 dialogue samples from the Fisher dataset (`mean_dir` and `iqr_dir`).  

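The mapping from a raw attribute value to a discrete label can be sketched as a standalone function. This is an illustrative restatement of the thresholding done at the end of the cell below (`ratio1 = 0.8`, `ratio2 = 1.5 * ratio1`), not a replacement for it; the function name `discretize` is an assumption, and the two input values are taken from the results table at the end of this notebook.

```python
# Reference statistics copied from `mean_dir` and `iqr_dir` in the cell below.
MEAN = {"n_turns": 7.383293694948496, "avg_duration": 3.3336954492304645}
IQR = {"n_turns": 2.605129382565053, "avg_duration": 1.6044905337979838}

RATIO1 = 0.8           # distance from the mean, in IQRs, for the first level
RATIO2 = 1.5 * RATIO1  # distance from the mean, in IQRs, for the "Very" level


def discretize(value, mean, iqr, high_text, low_text):
    """Hypothetical helper: map a raw value to a relative discrete label."""
    if value > mean + RATIO2 * iqr:
        return "Very " + high_text
    if value > mean + RATIO1 * iqr:
        return high_text
    if value < mean - RATIO2 * iqr:
        return "Very " + low_text
    if value < mean - RATIO1 * iqr:
        return low_text
    return "Normal"


# Speaker A takes 10.212766 turns per minute.
print(discretize(10.212766, MEAN["n_turns"], IQR["n_turns"], "Many", "Few"))  # Many
# Speaker B speaks for 1.585882 seconds per turn on average.
print(discretize(1.585882, MEAN["avg_duration"], IQR["avg_duration"], "Long", "Short"))  # Short
```
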
Please adjust the following variables as needed:

- `audiopath`: A string containing the path to your two‑channel audio file.  
- `feature_dir`: A string specifying the directory where all preprocessed outputs are saved.  
- `mean_dir`: A dictionary mapping each attribute name to its precomputed mean, used for normalization.  
- `iqr_dir`: A dictionary mapping each attribute name to its precomputed IQR, used for normalization.  

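The cell below derives its input paths from `audiopath` and `feature_dir` and loads the file under `results/`. A minimal sketch of that derivation, useful for checking that the preprocessed files exist before running the cell; `feature_paths` is a hypothetical helper, and the subfolder names are the ones used in the cell below.

```python
import os

audiopath = "../audio/sample.wav"
feature_dir = "../audio/features/sample/"


def feature_paths(audiopath, feature_dir):
    """Hypothetical helper: build the per-recording .npy paths under feature_dir."""
    stem = os.path.splitext(os.path.basename(audiopath))[0]  # "sample"
    return {
        name: feature_dir + f"{name}/{stem}.npy"
        for name in ("whisper", "laughs", "results")
    }


paths = feature_paths(audiopath, feature_dir)
missing = [p for p in paths.values() if not os.path.exists(p)]
if missing:
    print("Missing preprocessed files:", missing)
```
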
The following behavioral attributes are computed for each speaker. `mean_dir` and `iqr_dir` hold reference values for all of them except `avg_bc`; edit those values to normalize against a different corpus:

- `n_turns`: Number of turns per minute  
- `avg_duration`: Average speaking time per turn, in seconds  
- `avg_laughter`: Laughter events per minute  
- `avg_bc`: Total backchannel events per minute  
- `avg_emotive`: Emotive backchannel events per minute  
- `avg_cognitive`: Cognitive backchannel events per minute  
- `n_ci`: Controlling interjection events per minute  

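In the cell below, `n_turns` and `n_ci` are event counts divided by the conversation length in minutes, where the length runs from the earliest token start to the latest token end. A minimal sketch of that rate computation; `per_minute` is a hypothetical helper and the numbers are illustrative, not taken from the sample recording.

```python
def per_minute(n_events, start_sec, end_sec):
    """Hypothetical helper: events per minute over the span [start_sec, end_sec]."""
    minutes = (end_sec - start_sec) / 60
    if minutes <= 0:
        raise ValueError("end_sec must be later than start_sec")
    return n_events / minutes


# Illustrative values: 30 turns in a conversation spanning 0 s to 180 s.
print(per_minute(30, 0.0, 180.0))  # 10.0
```
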
---


In [2]:
###########################################
########## Adjustable Parameters ##########
###########################################

audiopath = "../audio/sample.wav"
feature_dir = "../audio/features/sample/"

###########################################
###########################################
###########################################

# These values represent the mean and iqr of the speaker attribute values computed from the first 995 samples in Fisher. 
mean_dir = {
    'n_turns': 7.383293694948496,
    'avg_duration': 3.3336954492304645,
    'avg_laughter': 1.8922364427060465,
    'avg_emotive': 0.8121129481276123,
    'avg_cognitive': 1.9553241783407014,
    'n_ci': 0.5467134601571795
}
iqr_dir = {
    'n_turns': 2.605129382565053,
    'avg_duration': 1.6044905337979838,
    'avg_laughter': 2.4286077576460636,
    'avg_emotive': 0.8618173313355536,
    'avg_cognitive': 1.7463671894145367,
    'n_ci': 0.5053406298424978
}

x, fs = librosa.load(audiopath)
personality_list = ["openness", "conscientiousness", "extraversion", "agreeableness", "neuroticism"]

score_dir = {
    "highly aligned": 100,
    "aligned": 50,
    "positive": 100,
    "neutral": 0,
    "opposed": -50,
    "negative": -100,
    "highly opposed": -100,
    
    "slightly opposed": -50,
    "slightly aligned": 50,
    "introversion": -100,
    "emotionally stable": -50,
    "antagonism": -100,
    "aligned with introversion": -50,
    
    "Sentiment": {
        "positive": 10,
        "neutral": 0,
        "negative": -10,
    },
}

print("#######################")
print("##### Preparation #####")
print("#######################")

labels = {
    "Sentence-Emotion": ["anger", "disgust", "fear", "joy", "neutral", "sadness", "surprise",],
    "Sentence-Sentiment": ["positive", "neutral", "negative"],
}
sentiment_score = False
classes = ["Sentence-Emotion"]
if not(sentiment_score):
    classes += ["Sentence-Sentiment"]
counts = {cl: {label: 0 for label in labels[cl]} for cl in classes}
classes = []
if sentiment_score:
    classes += ["Sentence-Sentiment"]
scores = {cl: 0 for cl in classes}
columns = []
for cl in ["filename", "speaker", "n_turns", "duration", "n_newwords", "ratio_newwords", "avg_duration", "avg_laughter", "avg_emotive", "avg_cognitive", "avg_bc", "n_ci"]:
    columns += [("basics", cl)]
for key in counts:
    cl1 = key.split("-")[-1].lower()
    for cl2 in counts[key]:
        columns += [(cl1, cl2)]
for key in scores:
    cl1 = key.split("-")[-1].lower()
    columns += [("scores", cl1)]
columns = pd.MultiIndex.from_tuples(columns)

thres_duration_tobe_bc = 1.0
thres_being_bc = 1.5

resultpath = feature_dir + "whisper/" + os.path.basename(audiopath[:-4]) + ".npy"
laughpath = feature_dir + "laughs/" + os.path.basename(audiopath[:-4]) + ".npy"
tablepath = laughpath.replace("laughs", "results")

a = np.load(tablepath, allow_pickle=True).item()
rawdata, data1, data2, data3, dfci = a["rawdata"], a["data1"], a["data2"], a["data3"], a["dfci"]
minutes = (rawdata["end"].values.max()-rawdata["start"].values.min())/60

print("####################################################")
print("##### Put Sentence-level Sentiment and Emotion #####")
print("####################################################")

tt_classes = ["Sentiment", "Emotion"]
dirname = feature_dir + "classification_results/"
keys = ["_".join(os.path.basename(a[:-4]).split("_")[1:3]) for a in glob.glob(dirname+"classificationresults_*.npy")]
keys.sort()

udfst = data1.copy()
udfst["Sentence-Emotion"] = ""
udfst["Sentence-Sentiment"] = np.nan
for key in keys:
    # Obtain the prediction results
    path = dirname + f"classificationresults_{key}.npy"
    hey = np.load(path, allow_pickle=True).item()
    result_te, result_ts = hey["result_te"], hey["result_ts"]
    a = [a["label"] for a in result_te[0]]
    b = [a["score"] for a in result_te[0]]
    result_te = a[np.argmax(b)]
    result_ts = result_ts[0]["label"]

    # Insert the information to udfst
    startllm, endllm = [int(a) for a in key.split("_")]
    try:
        start = np.arange(len(udfst))[np.abs(rawdata.iloc[startllm]["start"]-udfst["start"])<1e-5]
        end = np.arange(len(udfst))[np.abs(rawdata.iloc[endllm]["end"]-udfst["end"])<1e-5]
        idx = list(set(list([udfst.iloc[a].name for a in start])) & set(list([udfst.iloc[a].name for a in end])))[0]
        udfst.loc[idx, "Sentence-Emotion"] = result_te
        udfst.loc[idx, "Sentence-Sentiment"] = score_dir["Sentiment"][result_ts]
        if not(sentiment_score):
            udfst.loc[udfst["Sentence-Sentiment"]==10, "Sentence-Sentiment"] = "positive"
            udfst.loc[udfst["Sentence-Sentiment"]==0, "Sentence-Sentiment"] = "neutral"
            udfst.loc[udfst["Sentence-Sentiment"]==-10, "Sentence-Sentiment"] = "negative"
    except IndexError:
        continue
        
print("##################################################")
print("##### Put Backchannel Classification Results #####")
print("##################################################")

dfbc = data2[data2["BC-Candidates"]]
# model = "gpt-4o"
model = "gemma3:12b"

addname = "" if model=="" else "_"+model.split("-")[-1].replace(":", "_")

dirname = feature_dir + f'LLM_responses{addname}/' + os.path.basename(audiopath)[:-4] + "/"
tt_classes = ["interjection type"]
keys = ["_".join(os.path.basename(a).split("_")[1:3]) for a in glob.glob(dirname+"backchannel_*_0.npy")]
keys.sort()

udfbc = dfbc.copy()
udfbc["BC-Labels"] = ""
for cl in tt_classes[1:]:
    udfbc["BC-"+cl] = ""
for key in keys:
    # Obtain the prediction results
    paths = glob.glob(dirname+f"backchannel_{key}_*.npy")
    results = {cl.lower(): [] for cl in tt_classes}
    for path in paths:
        a = np.load(path, allow_pickle=True).item()

        for cl in tt_classes:
            exist = cl in a
            if not(exist):
                cl = cl.lower()
                exist = cl in a
            if exist:
                try:
                    results[cl] += [a[cl].lower()]
                except AttributeError:
                    results[cl] += [key.lower() for key in a[cl]]

    if len(results[tt_classes[0]])==0:
        continue

    summary = {}
    for freq_key in [a.lower() for a in tt_classes[:2]]:
        summary[freq_key] = most_frequent(results[freq_key])
    for score_key in [a.lower() for a in tt_classes[2:]]:
        summary[score_key] = np.mean([score_dir[score_key[0].upper()+score_key[1:]][a] for a in results[score_key]])

    # Insert the information to udfbc
    startllm, endllm = [int(a) for a in key.split("_")]
    try:
        start = np.arange(len(udfbc))[np.abs(rawdata.iloc[startllm]["start"]-udfbc["start"])<1e-5][0]
        idx = udfbc.iloc[start].name
        udfbc.loc[idx, "BC-Labels"] = summary["interjection type"]
        for cl in tt_classes[1:]:
            udfbc.loc[idx, "BC-"+cl] = summary[cl.lower()]
    except IndexError:
        continue
        
print("##############################################")
print("### Put Sentence Labels on Each Text Token ###")
print("##############################################")

sldata = rawdata.copy()
sldata["Sentence Label"] = ""
sldata_dir = {
    "A": sldata[get_bool_base_on_conditions(sldata, {"speaker": ["A"]})],
    "B": sldata[get_bool_base_on_conditions(sldata, {"speaker": ["B"]})],
}

slidx = list(sldata.columns).index("Sentence Label")
# Interjections (backchanneling)
df = udfbc[get_bool_base_on_conditions(udfbc, {"BC-Labels": ["cognitive", "emotive", "not backchannel"]})]
for i in range(len(df)):
    array = df.iloc[i]
    speaker = array["speaker"]
    start, end = get_start_end_referencedf(sldata_dir[speaker], array)
    sldata_dir[speaker].iloc[start:end+1, slidx] = np.array(["interjection-"+array["BC-Labels"]+"-"+str(i) for i in range(end-start+1)])

if dfci is not None:
    # Interjections (controlling)
    df = dfci.copy()
    for i in range(len(df)):
        array = df.iloc[i]
        speaker = array["speaker"]
        start, end = get_start_end_referencedf(sldata_dir[speaker], array)
        sldata_dir[speaker].iloc[start:end+1, slidx] = np.array(["interjection-controlling-"+str(i) for i in range(end-start+1)])

# Turn-taking
df = udfst.copy()
for i in range(len(df)):
    array = df.iloc[i]
    speaker = array["speaker"]
    start, end = get_start_end_referencedf(sldata_dir[speaker], array)
    sl_list = sldata_dir[speaker].iloc[start:end+1, slidx].values
    start_list = [0] if sldata_dir[speaker].iloc[start, slidx]=="" else []
    start_list += [idx+2 for idx in range(len(sl_list[1:])) if sl_list[idx+1]!=""]
    for j in range(len(start_list)):
        sentence_start = start_list[j]
        if sentence_start>=len(sl_list):
            break
        try:
            sentence_end = start_list[j+1]-1
        except IndexError:
            sentence_end = len(sl_list)
        # sldata_dir[speaker].iloc[start+sentence_start:start+sentence_end, slidx] = np.array(["turntaking-"+array["Sentence-Labels"]+"-"+str(i) for i in range(sentence_end-sentence_start)])
        sldata_dir[speaker].iloc[start+sentence_start:start+sentence_end, slidx] = np.array(["turntaking-"+str(i) for i in range(sentence_end-sentence_start)])

sldata = pd.concat([sldata_dir["A"], sldata_dir["B"]], axis=0).loc[np.arange(len(sldata))]
        
print("#####################################")
print("##### Obtain Speaker Attributes #####")
print("#####################################")

arrays = []
df = udfst.copy()
nturns = {speaker: get_bool_base_on_conditions(df, {"speaker": [speaker]}).sum() for speaker in ["A", "B"]}
for speaker in ["A", "B"]:
    dfspk = data2[get_bool_base_on_conditions(data2, {"speaker":[speaker]})]
    rawspk = sldata[get_bool_base_on_conditions(sldata, {"speaker": [speaker]})]
    dflaugh = sldata[get_bool_base_on_conditions(sldata, {"speaker": [speaker], "transcription":["[Laugh]", "[StartLaugh]"]})]
    stspk = udfst[get_bool_base_on_conditions(udfst, {"speaker": [speaker]})]
    bcspk = udfbc[get_bool_base_on_conditions(udfbc, {"speaker": [speaker]})]
    if dfci is not None:
        cispk = dfci[get_bool_base_on_conditions(dfci, {"speaker": [speaker]})]
    else:
        cispk = []

    # vocab diversity
    vocabfull = list(rawspk.transcription)
    vocab = len(set(vocabfull))
    dur = dfspk.duration.values.sum()/60
    n = vocab/dur

    # average duration per response
    df1spk = data1[get_bool_base_on_conditions(data1, {"speaker":[speaker]})]
    dur1 = df1spk.duration.values.sum()
    avgdur = dur1/nturns[speaker]

    # average Laugh
    avglaugh = len(dflaugh)/dfspk.duration.values.sum()*60

    temdf = stspk[stspk["duration"]>=thres_being_bc].copy()
    classes = ["Sentence-Emotion"] # Both turn-taking and interjections
    if not(sentiment_score):
        classes += ["Sentence-Sentiment"]
    counts = {cl: {label: 0 for label in labels[cl]} for cl in classes}
    for cl in classes:
        for dfref in [temdf]:
            df = dfref.loc[:, [cl, "speaker"]].groupby([cl]).count()
            for key in df.index:
                if not(key in labels[cl]):
                    continue
                counts[cl][key] += df.loc[key].iloc[0]  # positional access to the single count column

    for cl in counts:
        total = np.array(list(counts[cl].values())).sum()
        for key in counts[cl]:
            counts[cl][key] = counts[cl][key]/total

    ### Get the scores for other text features
    if not(sentiment_score):
        df = temdf.loc[:, ["Sentence-Sentiment"]]
        df[df==""] = np.nan
        if pd.isna(df).values.mean()==0:
            scores = dict(df.mean())

    stdiffspk = udfst[get_bool_base_on_conditions(udfst, {"speaker": list(set(["A", "B"])-set([speaker]))})]
    stdiffspk = stdiffspk[stdiffspk["duration"]>=thres_duration_tobe_bc]
    total = stdiffspk["duration"].sum()/60
    avgemobc = (get_bool_base_on_conditions(bcspk, {"BC-Labels": ["emotive"]}).sum())/total
    avgcogbc = (get_bool_base_on_conditions(bcspk, {"BC-Labels": ["cognitive"]}).sum())/total
    avgallbc = avgemobc + avgcogbc
    ratiobc = [avgemobc/avgallbc, avgcogbc/avgallbc]
    nci = len(cispk) + (udfbc["BC-Labels"]=="not backchannel").sum() # number of controlling interjections (successful and unsuccessful)

    array = [
        os.path.basename(audiopath)[:-4], speaker, nturns[speaker]/minutes, dur, vocab, vocab/len(vocabfull), avgdur, avglaugh, avgemobc, avgcogbc, avgallbc, nci/minutes,
    ]
    for key in counts:
        for cl in counts[key]:
            array += [counts[key][cl]]
    for key in scores:
        array += [scores[key]]
    arrays += [array]

data = pd.DataFrame(np.array(arrays), columns=columns)
data.loc[:, columns[2:]] = data.loc[:, columns[2:]].values.astype(float)
data = data.loc[:, [("basics", cl) for cl in ["filename", "speaker"]+list(mean_dir.keys())]+list(data[["emotion"]].columns)+list(data[["sentiment"]].columns)]

print("######################################")
print("##### Analyze Speaker Attributes #####")
print("######################################")

dvcolumns = []
for cl in mean_dir:
    dvcolumns += [("basics", cl)]
dvcolumns = pd.MultiIndex.from_tuples(dvcolumns)

cl2name = {
    "n_turns": "Number of turns", # Per minute
    "duration": "Total talking time",
    "n_newwords": "Number of new words used in the conversation",
    "ratio_newwords": "Frequency of new words usage",
    "avg_duration": "Talking time per turn",
    "avg_laughter": "Frequency of Laughter",
    "avg_bc": "Frequency of Backchannels",
    "avg_emotive": "Frequency of Emotive Backchannel",
    "avg_cognitive": "Frequency of Cognitive Backchannel",
    "n_ci": "Frequency of interjections" # Per minute
}
eval2step = {
    "samples": "Summarize the sample responses.",
    "basics": "Summarize the basic statistics.",
    "emotion": "Summarize the emotion distribution.",
    "sentiment": "Summarize the sentiment scores",
}

# add = "scores" if sentiment_score else "sentiment"
# averages = {cl: data[cl].mean() for cl in ["emotion", add]}

ratio1 = 0.8
ratio2 = 1.5*ratio1

status = data.copy()
status.loc[:, dvcolumns] = "Normal"
for cl in dvcolumns:
    if cl[1] in ["duration", "avg_duration"]:
        high_text, low_text = "Long", "Short"
        high_emp, low_emp = "Very", "Very"
    elif cl[1] in ["n_turns", "n_newwords"]:
        high_text, low_text = "Many", "Few"
        high_emp, low_emp = "So", "Very"
    elif cl[1] in ["ratio_newwords", "avg_laughter", "avg_emotive", "avg_cognitive", "avg_bc", "n_ci"]:
        high_text, low_text = "Frequent", "Rare"
        high_emp, low_emp = "Very", "Very"
    else:
        assert False
    values = data[cl].values
    mean = mean_dir[cl[1]]
    iqr = iqr_dir[cl[1]]
    # print(cl, mean, iqr)
    status.loc[values>mean+ratio1*iqr, cl] = high_text
    status.loc[values>mean+ratio2*iqr, cl] = "Very " + high_text
    status.loc[values<mean-ratio1*iqr, cl] = low_text
    status.loc[values<mean-ratio2*iqr, cl] = "Very " + low_text
status

#######################
##### Preparation #####
#######################
####################################################
##### Put Sentence-level Sentiment and Emotion #####
####################################################
##################################################
##### Put Backchannel Classification Results #####
##################################################
##############################################
### Put Sentence Labels on Each Text Token ###
##############################################
#####################################
##### Obtain Speaker Attributes #####
#####################################
######################################
##### Analyze Speaker Attributes #####
######################################


| | basics: filename | basics: speaker | basics: n_turns | basics: avg_duration | basics: avg_laughter | basics: avg_emotive | basics: avg_cognitive | basics: n_ci | emotion: anger | emotion: disgust | emotion: fear | emotion: joy | emotion: neutral | emotion: sadness | emotion: surprise | sentiment: positive | sentiment: neutral | sentiment: negative |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | sample | A | Many | Normal | Normal | Very Frequent | Rare | Normal | 0.0 | 0.1 | 0.0 | 0.1 | 0.4 | 0.1 | 0.3 | 0.1 | 0.8 | 0.1 |
| 1 | sample | B | Normal | Short | Normal | Normal | Very Frequent | Normal | 0.0 | 0.0 | 0.0 | 0.0 | 0.444444 | 0.444444 | 0.111111 | 0.222222 | 0.444444 | 0.333333 |


---
# Personality Prediction

In this step, we feed the analyzed speaker attributes into GPT models to predict each speaker’s personality profile. The prompt includes:

- `samples`: Example utterances from the speaker  
- `emotion`: Distribution of emotion labels  
- `sentiment`: Distribution of sentiment labels (positive, neutral, negative)  
- `basics`: Behavioral attributes from the previous step  

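The `emotion` and `sentiment` entries are relative frequencies: the previous step counts the label assigned to each sufficiently long turn and divides by the total. A minimal sketch of that normalization, with hypothetical per-turn labels; `label_distribution` is an assumed name, and unlike the notebook code it returns all zeros when no turn carries a known label.

```python
from collections import Counter

EMOTIONS = ["anger", "disgust", "fear", "joy", "neutral", "sadness", "surprise"]


def label_distribution(turn_labels, classes):
    """Hypothetical helper: relative frequency of each class among the labels."""
    counts = Counter(label for label in turn_labels if label in classes)
    total = sum(counts.values())
    if total == 0:
        # Design choice of this sketch: avoid dividing by zero.
        return {c: 0.0 for c in classes}
    return {c: counts[c] / total for c in classes}


# Hypothetical labels for ten turns of one speaker.
turn_labels = ["neutral"] * 4 + ["surprise"] * 3 + ["joy", "disgust", "sadness"]
dist = label_distribution(turn_labels, EMOTIONS)
print(dist["neutral"], dist["surprise"], dist["anger"])  # 0.4 0.3 0.0
```
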
Please adjust the following variables as needed:

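- `repeatnum`: An integer specifying how many times to query the LLM for each speaker and model. Repeated queries average out variation between responses; note that the display step below keeps at most five collected labels per trait.  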
- `model_list`: A list of GPT model names (e.g., `["gpt-4o", "gpt-4.1"]`). To add more models, edit `gpt_api_no_stream` in `./sho_util/pyfiles/gpt.py`.  
- `orders`: A list defining which attribute categories to include in the prompt (e.g., `["samples", "emotion", "sentiment", "basics"]`).  

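Each response assigns a textual label to every trait, and the display step converts the labels to numbers through `score_dir` before reporting the mean and standard deviation over the repeated queries. A minimal sketch of that aggregation, using a subset of the `score_dir` entries and hypothetical responses; `aggregate` is an assumed name, and `pstdev` is used because `np.std` computes the population standard deviation by default.

```python
from statistics import mean, pstdev

# Subset of `score_dir` from the Speaker Attribute Analysis cell.
SCORE = {
    "highly aligned": 100,
    "aligned": 50,
    "neutral": 0,
    "opposed": -50,
    "highly opposed": -100,
}


def aggregate(labels):
    """Hypothetical helper: mean and population std of the scores for one trait."""
    scores = [SCORE[l.lower()] for l in labels if l.lower() in SCORE]
    if not scores:
        return None  # no usable response for this trait
    return mean(scores), pstdev(scores)


# Hypothetical labels returned for one trait over five repeated queries.
labels = ["Highly aligned", "aligned", "aligned", "neutral", "aligned"]
trait_mean, trait_std = aggregate(labels)
print(trait_mean, round(trait_std, 2))  # 50 31.62
```
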
---

In [3]:
###########################################
########## Adjustable Parameters ##########
###########################################

repeatnum = 10
model_list = ["gpt-4o", "gpt-4.1", "gpt-4.1mini", "gpt-o4mini"]
orders = ["samples", "emotion", "sentiment", "basics"] # "emotion", "basics", "labels", "samples"

###########################################
###########################################
###########################################

get_response = True

print("##################################")
print("##### Personality Prediction #####")
print("##################################")

target_columns = ["n_turns", "avg_duration", "avg_laughter", "avg_emotive", "avg_cognitive", "n_ci"]
if len(orders)>2:
    nsamples = 20
else:
    nsamples = 30

for model in model_list:
    print(model)
    addname = "_"+model.split("-")[-1]
    addname2 = "" if len(orders)>2 else "_withoutemosentiment"
    for idx in range(2): # two speakers
        array = status.iloc[idx]
        spk = array[("basics", "speaker")]

        dirname = feature_dir + f'LLM_responses{addname}{addname2}/' + os.path.basename(audiopath)[:-4] + "/"
        os.makedirs(dirname, exist_ok=True)
        a = glob.glob(dirname + f"personalityprediction_{spk}_*.npy")
        a.sort()
        b = {int(os.path.basename(path).split("_")[-1][:-4]): path for path in a}
        iter_list = list(set(list(range(repeatnum))) - set(list(b.keys())))
        if len(iter_list)==0:
            continue


        df = udfst.copy()
        df = update_information(df) 
        df = concatenate_close_voice(df, np.inf) 
        df = update_information(df) 
        df = df[get_bool_base_on_conditions(df, {"speaker": [array[("basics", "speaker")]]})]
        # df = df[(10.0>=df.duration)*(df.duration>=5.0)]
        df = df[(np.inf>=df.duration)*(df.duration>=2.0)]
        np.random.seed(0)
        df = df.iloc[np.random.choice(np.arange(len(df)), size=min(nsamples, len(df)), replace=False)]

        prompt = get_prompt_character(array, target_columns, orders, cl2name, eval2step, df)
        if get_response:
            for r in iter_list:
                savepath = dirname + f'personalityprediction_{spk}_{r}.npy'
                if os.path.exists(savepath):
                    continue
                response = GetResult_Personality(client, prompt, model, display_print=True)
                np.save(savepath, response)
                
print("#####################################")
print("##### Display Prediction Result #####")
print("#####################################")

allresults = {}
non_defined = []
should_be_deleted = []
orders_list = [["samples", "emotion", "sentiment", "basics"]]
allinfo = ["samples", "basics", "emotion", "sentiment"]
num = len(data)
for orders in orders_list:
    for model in model_list:
        addname = "_"+model.split("-")[-1]
        if len(orders)==4:
            addname2 = ""
        elif len(orders)==2:
            if "emotion" in orders:
                addname2 = "_onlyemosentiment"
            else:
                addname2 = "_withoutemosentiment"
        elif len(orders)==3:
            if "samples" in orders:
                addname2 = "_withoutinterjections"
            else:
                addname2 = "_withoutsamples"
        elif len(orders)==1:
            if "samples" in orders:
                addname2 = "_onlysamples"
            elif "basics" in orders:
                addname2 = "_basics"
        dirname = feature_dir + f'LLM_responses{addname}{addname2}/' + os.path.basename(audiopath)[:-4] + "/"
        
        meandata = data.copy()
        for cl in personality_list:
            meandata[("personality", cl)] = ""
        stddata = meandata.copy()
        for idx in range(num):
            array = data.iloc[idx]
            fn = array[("basics", "filename")]
            spk = array[("basics", "speaker")]
            paths = glob.glob(dirname + f"personalityprediction_{spk}_*.npy")
            results = {cl: [] for cl in personality_list}
            prediction_num = 0
            for path in paths:
                a = np.load(path, allow_pickle=True).item()
                for cl in personality_list:
                    exist = cl in a
                    if not(exist):
                        cl = cl.lower()
                        exist = cl in a
                    if exist:
                        try:
                            added_key = a[cl].lower()
                            if added_key in score_dir:
                                results[cl] += [added_key]
                            else:
                                non_defined += [added_key]
                        except AttributeError:
                            if type(a[cl])==dict:
                                results[cl] += [key.lower() for key in a[cl].values()]
                            else:
                                try:
                                    results[cl] += [key.lower() for key in a[cl]]
                                except AttributeError:
                                    results[cl] += [key.lower() for key in a[cl][0].values()]
                prediction_num += 1
            if prediction_num==0:
                continue
            for score_key in personality_list:
                results[score_key] = results[score_key][:5]

            mean = {}
            std = {}
            for score_key in personality_list:
                results[score_key] = list(np.array(results[score_key])[np.array(results[score_key])!=""])
            for score_key in personality_list:
                mean[score_key] = np.mean([score_dir[a] for a in results[score_key]])
                std[score_key] = np.std([score_dir[a] for a in results[score_key]])

            meandata.loc[idx, "personality"] = np.array(list(mean.values()))
            stddata.loc[idx, "personality"] = np.array(list(std.values()))
        meandata = meandata[["personality", "basics", "emotion", "sentiment"]]
        meandata = meandata[pd.isna(meandata["personality"]).sum(axis=1)==0]
        meandata = meandata[meandata[("personality", "openness")]!=""]
        stddata = stddata[["personality", "basics", "emotion", "sentiment"]]
        stddata = stddata[pd.isna(stddata["personality"]).sum(axis=1)==0]
        stddata = stddata[stddata[("personality", "openness")]!=""]
        meandata.loc[:, "personality"] = meandata.loc[:, "personality"].values.astype(float)
        stddata.loc[:, "personality"] = stddata.loc[:, "personality"].values.astype(float)
        key = "-".join([str(a in orders) for a in allinfo])
        allresults[model+"_"+"-".join(orders)] = {}
        allresults[model+"_"+"-".join(orders)]["mean"] = meandata
        allresults[model+"_"+"-".join(orders)]["std"] = stddata
        
columns = []
for cl in ["model name"]+allinfo:
    columns += [("condition", cl)]
columns = pd.MultiIndex.from_tuples(columns)

texts = []
df_list = []
for key in allresults:
    df = allresults[key]["mean"].groupby([("basics", "filename"), ("basics", "speaker")]).mean()
    b, c = key.split("_")
    mn = b.split("-")[1]
    c = c.split("-")
    l = []
    for hey in [hey in c for hey in allinfo]:
        l += ["☑︎" if hey else ""]
    texts += [[mn]+l]*2
    df_list += [df]
    
dfs = pd.concat(df_list, axis=0)
dfcon = pd.DataFrame(np.array(texts), columns=columns, index=pd.MultiIndex.from_tuples(list(df.index)*(len(texts)//2)))
dfs.index = dfcon.index
dfresult = pd.concat([dfcon, dfs], axis=1)
dfresult.sort_index()

##################################
##### Personality Prediction #####
##################################
gpt-4o
gpt-4.1
gpt-4.1mini
gpt-o4mini
#####################################
##### Display Prediction Result #####
#####################################


| filename | speaker | condition: model name | condition: samples | condition: basics | condition: emotion | condition: sentiment | personality: openness | personality: conscientiousness | personality: extraversion | personality: agreeableness | personality: neuroticism | basics: n_turns | basics: avg_duration | basics: avg_laughter | basics: avg_emotive | basics: avg_cognitive | basics: n_ci | emotion: anger | emotion: disgust | emotion: fear | emotion: joy | emotion: neutral | emotion: sadness | emotion: surprise | sentiment: positive | sentiment: neutral | sentiment: negative |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| sample | A | 4o | ☑︎ | ☑︎ | ☑︎ | ☑︎ | 20.0 | -10.0 | 100.0 | 90.0 | -90.0 | 10.212766 | 3.4404 | 3.479471 | 2.552764 | 0.0 | 0.510638 | 0.0 | 0.1 | 0.0 | 0.1 | 0.4 | 0.1 | 0.3 | 0.1 | 0.8 | 0.1 |
| sample | A | 4.1 | ☑︎ | ☑︎ | ☑︎ | ☑︎ | 30.0 | -10.0 | 100.0 | 70.0 | -10.0 | 10.212766 | 3.4404 | 3.479471 | 2.552764 | 0.0 | 0.510638 | 0.0 | 0.1 | 0.0 | 0.1 | 0.4 | 0.1 | 0.3 | 0.1 | 0.8 | 0.1 |
| sample | A | 4.1mini | ☑︎ | ☑︎ | ☑︎ | ☑︎ | 30.0 | 30.0 | 90.0 | 80.0 | -50.0 | 10.212766 | 3.4404 | 3.479471 | 2.552764 | 0.0 | 0.510638 | 0.0 | 0.1 | 0.0 | 0.1 | 0.4 | 0.1 | 0.3 | 0.1 | 0.8 | 0.1 |
| sample | A | o4mini | ☑︎ | ☑︎ | ☑︎ | ☑︎ | 0.0 | 20.0 | 90.0 | 70.0 | -90.0 | 10.212766 | 3.4404 | 3.479471 | 2.552764 | 0.0 | 0.510638 | 0.0 | 0.1 | 0.0 | 0.1 | 0.4 | 0.1 | 0.3 | 0.1 | 0.8 | 0.1 |
| sample | B | 4o | ☑︎ | ☑︎ | ☑︎ | ☑︎ | -30.0 | -20.0 | 0.0 | 40.0 | 60.0 | 8.680851 | 1.585882 | 3.773815 | 0.920726 | 5.524353 | 0.510638 | 0.0 | 0.0 | 0.0 | 0.0 | 0.444444 | 0.444444 | 0.111111 | 0.222222 | 0.444444 | 0.333333 |
| sample | B | 4.1 | ☑︎ | ☑︎ | ☑︎ | ☑︎ | -40.0 | 20.0 | 0.0 | 50.0 | 50.0 | 8.680851 | 1.585882 | 3.773815 | 0.920726 | 5.524353 | 0.510638 | 0.0 | 0.0 | 0.0 | 0.0 | 0.444444 | 0.444444 | 0.111111 | 0.222222 | 0.444444 | 0.333333 |
| sample | B | 4.1mini | ☑︎ | ☑︎ | ☑︎ | ☑︎ | -50.0 | 10.0 | 30.0 | 40.0 | 40.0 | 8.680851 | 1.585882 | 3.773815 | 0.920726 | 5.524353 | 0.510638 | 0.0 | 0.0 | 0.0 | 0.0 | 0.444444 | 0.444444 | 0.111111 | 0.222222 | 0.444444 | 0.333333 |
| sample | B | o4mini | ☑︎ | ☑︎ | ☑︎ | ☑︎ | -40.0 | -20.0 | -40.0 | 50.0 | 50.0 | 8.680851 | 1.585882 | 3.773815 | 0.920726 | 5.524353 | 0.510638 | 0.0 | 0.0 | 0.0 | 0.0 | 0.444444 | 0.444444 | 0.111111 | 0.222222 | 0.444444 | 0.333333 |
