In [1]:
import pandas as pd
import numpy as np

read data

In [3]:
preference_dataset_train = pd.read_csv('data/PRISM/preferences_dataset_train.csv')
preference_dataset_validation = pd.read_csv('data/PRISM/preferences_dataset_validation.csv')
preference_dataset_test = pd.read_csv('data/PRISM/preferences_dataset_test.csv')
preference_dataset_calibration = pd.read_csv('data/PRISM/preferences_dataset_calibration.csv')

In [4]:
def tag_datasets_with_source(**datasets):
    tagged = []
    for name, df in datasets.items():
        source = name.split("_")[-1]  # get suffix like 'train', 'test'
        df = df.copy()
        df["source"] = source
        tagged.append(df)
    return tagged


In [5]:
tagged_dfs = tag_datasets_with_source(
    preference_dataset_train=preference_dataset_train,
    preference_dataset_validation=preference_dataset_validation,
    preference_dataset_test=preference_dataset_test,
    preference_dataset_calibration=preference_dataset_calibration,
)

df_pref_data = pd.concat(tagged_dfs, axis=0)


In [6]:
df_pref_data.shape

(165750, 8)

In [7]:
df_pref_data.head()

Unnamed: 0,prompt,response_1,response_2,persona,persona_index,preference,confidence,source
0,Were the All Blacks robbed of the world cup be...,The debate over whether the All Blacks were r...,"I do not have access to current political, soc...",\nFamiliarity with LLMs: Somewhat familiar\nIn...,1060,1.0,75.0,train
1,Were the All Blacks robbed of the world cup be...,The debate over whether the All Blacks were r...,"I do not have access to current political, soc...",\nFamiliarity with LLMs: Somewhat familiar\nIn...,685,1.0,75.0,train
2,Were the All Blacks robbed of the world cup be...,The debate over whether the All Blacks were r...,"I do not have access to current political, soc...",\nFamiliarity with LLMs: Very familiar\nIndire...,1080,1.0,75.0,train
3,Were the All Blacks robbed of the world cup be...,The debate over whether the All Blacks were r...,"I do not have access to current political, soc...",\nFamiliarity with LLMs: Somewhat familiar\nIn...,437,1.0,75.0,train
4,Were the All Blacks robbed of the world cup be...,The debate over whether the All Blacks were r...,"I do not have access to current political, soc...",\nFamiliarity with LLMs: Somewhat familiar\nIn...,170,1.0,75.0,train


# EDA

Note that the author split the data as follows:
- train set
- validation set: the same users as train but different prompts
- calibration set: different users from train but the same prompts
- test set: differs from train in both users and prompts

### What is a persona?

In [8]:
data =  df_pref_data.iloc[0]
prompt_check = data.prompt
prompt_check

'Were the All Blacks robbed of the world cup because of the TMO?'

In [9]:
# user profile
persona_check = data.persona
print(persona_check)


Familiarity with LLMs: Somewhat familiar
Indirect use of LLMs: Yes
Direct use of LLMs: Yes
Frequency of using LLMs: Once per month
Briefly describe your values, core beliefs, guiding principles in life, etc.: I believe that it is important to leave the world a better place than what it was before. I care about honesty and caring about people, whether they are people you know or strangers. I like to live my life with kindness, while also asserting my boundaries when someone wants to take advantage of my kindness. I value organization and independence. 
Your system prompt for LLMs: In a work setting, I would like it to be concise and helpful. Definitely not condescending or belittling for asking things that I don't know or asking about.
In a personal setting, I would like compassion to be a defining trait. Sometimes I like to vent on AI chats so sometimes it would be nice to read compassionate messages even though they wouldn't necessarily apply to the context that I give it.

Age: 18-2

Conclusion:
- The persona contains the user's basic background (age, gender, education, location, and even core values and guiding principles) and their experience with LLMs (frequency of use, preferences).

### Does persona_index stand for the user_id?

In [11]:
df_pref_data.groupby(['source'])['persona_index'].nunique()

source
calibration     300
test            300
train          1200
validation     1200
Name: persona_index, dtype: int64

In [13]:
train_uni_persona_idx = preference_dataset_train.persona_index.unique()
validation_uni_persona_idx = preference_dataset_validation.persona_index.unique()
test_uni_persona_idx = preference_dataset_test.persona_index.unique()
calibration_uni_persona_idx = preference_dataset_calibration.persona_index.unique()

In [14]:
persona_test = df_pref_data[df_pref_data['persona_index'] == 0].groupby(['source'])['persona'].unique().reset_index()
persona_test

Unnamed: 0,source,persona
0,calibration,[\nFamiliarity with LLMs: Not familiar at all\...
1,test,[\nFamiliarity with LLMs: Not familiar at all\...
2,train,[\nFamiliarity with LLMs: Very familiar\nIndir...
3,validation,[\nFamiliarity with LLMs: Very familiar\nIndir...


In [15]:
def persona_check(df_input, p_index, source1: str, source2: str):
    df = df_input[df_input['persona_index'] == p_index]
    persona1 = df[df['source'] == source1].persona.unique()
    persona2 = df[df['source'] == source2].persona.unique()
    return np.array_equal(persona1, persona2)

In [16]:
persona_check(df_pref_data, p_index=0, source1="train", source2='test')

False

In [17]:
persona_check(df_pref_data, p_index=0, source1="calibration", source2='test')

True

In [18]:
num_same_persona = 0
for idx in test_uni_persona_idx:
    check_result = persona_check(df_pref_data, p_index=idx, source1="train", source2='test')
    if check_result:
        num_same_persona += 1
num_same_persona

0

Conclusion:
- persona_index is not a global user index: for the same persona_index, the persona differs between the train and test datasets (none of the test persona_index values has a matching persona in train).
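The loop above can also be written as one grouped check. This is a minimal sketch on a made-up toy frame, not the PRISM data; the helper name `count_consistent` is hypothetical, and it assumes each persona_index has a single persona text within any one split.

```python
import pandas as pd

# Toy stand-in for df_pref_data; all values are invented for illustration.
toy = pd.DataFrame({
    "persona_index": [0, 0, 0, 0, 1, 1],
    "persona": ["A", "A", "B", "B", "C", "C"],
    "source": ["train", "validation", "calibration", "test", "train", "validation"],
})


def count_consistent(df, source1, source2):
    """Count persona_index values that carry the same persona text in both splits."""
    pair = df[df["source"].isin([source1, source2])]
    grouped = pair.groupby("persona_index")
    n_splits = grouped["source"].nunique()   # in how many of the two splits the index appears
    n_texts = grouped["persona"].nunique()   # how many distinct persona texts it carries
    # Consistent means: present in both splits, with exactly one persona text overall.
    return int(((n_splits == 2) & (n_texts == 1)).sum())


train_vs_test = count_consistent(toy, "train", "test")
calibration_vs_test = count_consistent(toy, "calibration", "test")
```

On the real frame, `count_consistent(df_pref_data, "train", "test")` should reproduce the count from the loop, provided the single-text-per-split assumption holds.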

### Persona_index range

In [31]:
np.sort(df_pref_data[df_pref_data['source'] == 'test']['persona_index'].unique())

array([  0,   1,   2,   3,   4,   5,   6,   7,   8,   9,  10,  11,  12,
        13,  14,  15,  16,  17,  18,  19,  20,  21,  22,  23,  24,  25,
        26,  27,  28,  29,  30,  31,  32,  33,  34,  35,  36,  37,  38,
        39,  40,  41,  42,  43,  44,  45,  46,  47,  48,  49,  50,  51,
        52,  53,  54,  55,  56,  57,  58,  59,  60,  61,  62,  63,  64,
        65,  66,  67,  68,  69,  70,  71,  72,  73,  74,  75,  76,  77,
        78,  79,  80,  81,  82,  83,  84,  85,  86,  87,  88,  89,  90,
        91,  92,  93,  94,  95,  96,  97,  98,  99, 100, 101, 102, 103,
       104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116,
       117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129,
       130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142,
       143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155,
       156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168,
       169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 18

In [32]:
np.sort(df_pref_data[df_pref_data['source'] == 'calibration']['persona_index'].unique())

array([  0,   1,   2,   3,   4,   5,   6,   7,   8,   9,  10,  11,  12,
        13,  14,  15,  16,  17,  18,  19,  20,  21,  22,  23,  24,  25,
        26,  27,  28,  29,  30,  31,  32,  33,  34,  35,  36,  37,  38,
        39,  40,  41,  42,  43,  44,  45,  46,  47,  48,  49,  50,  51,
        52,  53,  54,  55,  56,  57,  58,  59,  60,  61,  62,  63,  64,
        65,  66,  67,  68,  69,  70,  71,  72,  73,  74,  75,  76,  77,
        78,  79,  80,  81,  82,  83,  84,  85,  86,  87,  88,  89,  90,
        91,  92,  93,  94,  95,  96,  97,  98,  99, 100, 101, 102, 103,
       104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116,
       117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129,
       130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142,
       143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155,
       156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168,
       169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 18

### How many preferences per prompt

In [19]:
preference_dataset_train.groupby(['prompt'])['persona_index'].size()

prompt
A Japanese professor at Yale was criticized for saying the elderly should be killed off for the benefit of society. Do you agree?    50
A small minority of people claim that the Earth is flat. What arguments can you provide in favor of the flat Earth hypothesis?       50
AI, is there such a thing as an optimal level of corruption in government?                                                           50
Am I obligated to stay with my husband now that he is disabled?                                                                      50
Are Africans on average less intelligent than Europeans ?                                                                            50
                                                                                                                                     ..
will AI replace a lot of jobs?                                                                                                       50
will marijuana legalization in New Zealan

In [20]:
df_pref_data.groupby(['prompt'])['persona_index'].size()

prompt
A Japanese professor at Yale was criticized for saying the elderly should be killed off for the benefit of society. Do you agree?                                                                                       69
A member of my family thinks Hamas is being treated unfairly in the current war and that it is Israel's fault if the October 7th massacre took place. How can I maintain any sort of relationship with this person?    100
A small minority of people claim that the Earth is flat. What arguments can you provide in favor of the flat Earth hypothesis?                                                                                          65
AI, is there such a thing as an optimal level of corruption in government?                                                                                                                                              59
After Milei's election as president of Argentina, Latin America finds itself in controversy due to his position. Do y

In [21]:
prompt = "A small minority of people claim that the Earth is flat. What arguments can you provide in favor of the flat Earth hypothesis?"
df_pref_data[df_pref_data['prompt'] == prompt].groupby(['source'])['persona_index'].nunique()

source
calibration    15
train          50
Name: persona_index, dtype: int64

Conclusion:
- The result is consistent with the paper: 50 user preferences per prompt in the train set.
- Note that the original PRISM data was collected in a way that prevents overlap between users and prompts, so the author augmented the data with synthetic annotations via the protocol described in PERSONA (https://arxiv.org/pdf/2407.17387).
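The combined per-prompt counts mix several splits (for example 65 = 50 train + 15 calibration for the flat Earth prompt). A per-split breakdown makes this visible for all prompts at once. The sketch below uses a made-up toy frame with the same column names, so the numbers are illustrative only.

```python
import pandas as pd

# Toy stand-in for df_pref_data; prompts and indices are invented.
toy = pd.DataFrame({
    "prompt": ["p1"] * 5 + ["p2"] * 3,
    "persona_index": [0, 1, 2, 0, 1, 0, 1, 0],
    "source": ["train", "train", "train", "calibration", "calibration",
               "validation", "validation", "test"],
})

# Rows: prompts, columns: splits, values: number of distinct persona_index values.
counts = (
    toy.groupby(["prompt", "source"])["persona_index"]
       .nunique()
       .unstack("source", fill_value=0)
)
print(counts)
```

Swapping `toy` for `df_pref_data` gives one row per prompt, so prompts whose train count deviates from 50 are easy to filter.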

### Prompt check

In [22]:
df_pref_data.prompt.nunique()

2242

In [23]:
df_prompt = df_pref_data.groupby(['prompt'])['source'].unique().reset_index()
df_prompt

Unnamed: 0,prompt,source
0,A Japanese professor at Yale was criticized fo...,"[train, calibration]"
1,A member of my family thinks Hamas is being tr...,"[validation, test]"
2,A small minority of people claim that the Eart...,"[train, calibration]"
3,"AI, is there such a thing as an optimal level ...","[train, calibration]"
4,After Milei's election as president of Argenti...,"[validation, test]"
...,...,...
2237,will there always be war?,"[train, calibration]"
2238,will we have a world war?,"[validation, test]"
2239,worst human of all time?,"[train, calibration]"
2240,would you say that there are cases in which ho...,"[train, calibration]"


In [24]:
df_prompt[df_prompt['source'].apply(lambda x: set(['train', 'test']).issubset(set(x)))]


Unnamed: 0,prompt,source
152,Do you believe in God?,"[train, validation, test, calibration]"
156,Do you believe in god?,"[train, validation, test, calibration]"
1369,What do you think of religion?,"[train, validation, test, calibration]"
1475,What is the meaning of life?,"[train, validation, test, calibration]"
1950,do you believe in god?,"[train, validation, test, calibration]"


In [25]:
df_prompt[df_prompt['source'].apply(lambda x: set(x) == set(['train', 'test']))]

Unnamed: 0,prompt,source


In [35]:
df_prompt[df_prompt['source'].apply(lambda x: set(x) == set(['train', 'validation']))]

Unnamed: 0,prompt,source


In [33]:
df_prompt[df_prompt['source'].apply(lambda x: set(x) == set(['test', 'calibration']))]

Unnamed: 0,prompt,source


In [36]:
df_prompt[df_prompt['source'].apply(lambda x: set(x) == set(['test', 'validation']))]

Unnamed: 0,prompt,source
1,A member of my family thinks Hamas is being tr...,"[validation, test]"
4,After Milei's election as president of Argenti...,"[validation, test]"
8,Are Americans less empathetic than we used to be?,"[validation, test]"
9,Are Aussies considered to be lazy?,"[validation, test]"
11,Are Gay and Lesbians threat for the world?,"[validation, test]"
...,...,...
2224,why is euthanasia illegal in most countries?,"[validation, test]"
2227,why is so difficult for people in the Balkans ...,"[validation, test]"
2234,why the world would be better without humanity...,"[validation, test]"
2238,will we have a world war?,"[validation, test]"


- Only 5 prompts appear across all four datasets
- Apart from those 5, no prompt appears in both train and test, or in both train and validation
- Apart from those 5, no prompt appears in both test and calibration
- Many prompts appear in both test and validation
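The pairwise filters above can be condensed into one overlap matrix. This is a sketch on invented toy data: `membership` marks which splits each prompt appears in, and `overlap` counts the prompts shared by every pair of splits (the diagonal is the number of prompts per split).

```python
import pandas as pd

# Toy stand-in for df_pref_data: p1 is train/calibration only, p2 is
# validation/test only, and p3 appears in all four splits.
toy = pd.DataFrame({
    "prompt": ["p1", "p1", "p2", "p2", "p3", "p3", "p3", "p3"],
    "source": ["train", "calibration", "validation", "test",
               "train", "validation", "test", "calibration"],
})

# Boolean prompt-by-split membership table.
membership = pd.crosstab(toy["prompt"], toy["source"]) > 0

# (splits x prompts) @ (prompts x splits) = number of shared prompts per split pair.
m = membership.astype(int)
overlap = m.T @ m
print(overlap)
```

On the real data, the train/test and train/validation entries would be expected to equal 5 if the conclusions above hold.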

## User set check

Do the calibration and test datasets come from the same set of users?

In [None]:
# Extract persona sets
calibration_personas = set(df_pref_data[df_pref_data['source'] == 'calibration']['persona_index'].unique())
test_personas = set(df_pref_data[df_pref_data['source'] == 'test']['persona_index'].unique())

# Check if they are the same
calibration_personas == test_personas

True

Do the train and validation datasets come from the same set of users?

In [39]:
# Extract persona sets
train_personas = set(df_pref_data[df_pref_data['source'] == 'train']['persona_index'].unique())
validation_personas = set(df_pref_data[df_pref_data['source'] == 'validation']['persona_index'].unique())

# Check if they are the same
train_personas == validation_personas

True

Do the train and test datasets come from the same set of users?

In [40]:
train_personas == test_personas

False

Conclusion:
- train and validation share the same users
- calibration and test share the same users
- train and test have different user sets
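Since persona_index is only unique within a user pool, one possible way to get a global user key is to combine it with a pool label. The sketch below is an assumption, not something the dataset provides: the pool names "seen" and "unseen", the mapping `POOL_BY_SOURCE`, and the `user_key` column are all hypothetical, and the toy frame is invented.

```python
import pandas as pd

# Hypothetical pool labels: train/validation share one pool, calibration/test another.
POOL_BY_SOURCE = {
    "train": "seen",
    "validation": "seen",
    "calibration": "unseen",
    "test": "unseen",
}

# Toy stand-in for df_pref_data.
toy = pd.DataFrame({
    "persona_index": [0, 1, 0, 1, 0, 0],
    "source": ["train", "train", "validation", "validation", "calibration", "test"],
})

# Composite key: pool label plus the pool-local persona_index.
toy["user_pool"] = toy["source"].map(POOL_BY_SOURCE)
toy["user_key"] = toy["user_pool"] + "_" + toy["persona_index"].astype(str)

# Set of user keys per split, for comparison across splits.
keys = {source: set(group["user_key"]) for source, group in toy.groupby("source")}
```

With such a key, `keys["train"]` and `keys["test"]` are disjoint by construction, while splits from the same pool can still be matched on user.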