## In the code below we perform the following steps

- Read in the item text data (https://chatgpt.com/share/66fa7c2a-101c-800b-88a5-7334934a995d)
- Calculate item embeddings
- Reverse item embeddings if necessary (we have no reversed items here, and this approach may be suboptimal; for reversed items we could instead use a fine-tuned model, as in Hommell (2024))
- Compute cosine similarities
- Store results
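
The sign-reversal step in the list above can be illustrated on toy vectors. This is a minimal, dependency-free sketch: the vectors are made-up values rather than model output, and `cosine` is a hypothetical helper written only for this illustration. It shows that multiplying one embedding by -1 flips the sign of its cosine similarity with any other embedding while leaving the magnitude unchanged.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy 3-dimensional "embeddings" (made-up values, not model output).
item_a = [0.2, -0.5, 0.7]
item_b = [0.1, -0.4, 0.9]

# Reverse-key item_b by multiplying its embedding by -1.
item_b_reversed = [-x for x in item_b]

sim_original = cosine(item_a, item_b)
sim_reversed = cosine(item_a, item_b_reversed)

# The two similarities have equal magnitude and opposite sign.
print(sim_original, sim_reversed)
```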

### 1- Read in the item text data

In [1]:
import pandas as pd
import numpy as np
# read in file with items text etc.
df_items = pd.read_csv('./Data/dass_21_items_mod_2.csv')
df_items.head()

Unnamed: 0,Number,Factor,Item,Item_simp,Item_mod,Sign
0,1,Depression,I couldn't seem to experience any positive fee...,when feeling depressed I Couldn't seem to expe...,Depression is characterised principally by a l...,+
1,2,Depression,I found it difficult to work up the initiative...,when feeling depressed I Found it difficult to...,Depression is characterised principally by a l...,+
2,3,Depression,I felt that I had nothing to look forward to.,when feeling depressed I Felt that I had nothi...,Depression is characterised principally by a l...,+
3,4,Depression,I felt down-hearted and blue.,when feeling depressed I Felt down-hearted and...,Depression is characterised principally by a l...,+
4,5,Depression,I was unable to become enthusiastic about anyt...,when feeling depressed I was Unable to become ...,Depression is characterised principally by a l...,+


### 2- Calculate embeddings (and reverse code if necessary)

In [2]:
# First we create a list of models (mostly multilingual; nli-distilroberta-base-v2 is an English model)
models = ['nli-distilroberta-base-v2',
          'paraphrase-multilingual-mpnet-base-v2',
          'paraphrase-multilingual-MiniLM-L12-v2',
          'intfloat/multilingual-e5-base',
          'LaBSE'] #consider adding the finetuned model for psicometrista

# Import the necessary libraries and functions
from sentence_transformers import SentenceTransformer, util

# Create an empty data frame, which we will then populate with the different type of embeddings
facet_embeddings_sentences = pd.DataFrame()

for mod in models:
    model = SentenceTransformer(mod)  # load the model
    item_embed = []      # item-level embeddings
    item_embed_rev = []  # item-level embeddings, sign-reversed for negatively keyed items
    for item in range(len(df_items)):  # loop over all the items
        # encode the item once and reuse the result
        embedding = model.encode(df_items['Item_mod'].iloc[item])
        item_embed.append(embedding)
        # if the item is negatively keyed, reverse the embedding
        if df_items['Sign'].iloc[item][0] == '-':
            item_embed_rev.append(embedding * -1)
        else:
            item_embed_rev.append(embedding)
    # store both embedding lists in columns named after the model
    df_items[mod + '_embeddings'] = item_embed
    df_items[mod + '_embeddings_rev'] = item_embed_rev
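
As an alternative to the per-item `if`/`else` in the cell above, the sign reversal could be applied to all items at once with NumPy broadcasting. This is only a sketch under stated assumptions: `embeddings` and `signs` are toy stand-ins (made-up values) for the stacked model output and the `Sign` column, not the real data.

```python
import numpy as np

# Toy stand-ins (made-up values): 3 items with 4-dimensional embeddings.
embeddings = np.array([[0.1, 0.2, -0.3, 0.4],
                       [0.5, -0.6, 0.7, 0.8],
                       [-0.9, 1.0, 1.1, -1.2]])
signs = np.array(['+', '-', '+'])  # hypothetical keying, mimicking the 'Sign' column

# +1 for positively keyed items, -1 for negatively keyed ones.
multipliers = np.where(signs == '-', -1.0, 1.0)

# Broadcast the per-item multiplier across each embedding row.
embeddings_rev = embeddings * multipliers[:, None]
```

Rows belonging to positively keyed items are left unchanged, and only the rows of negatively keyed items are negated.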


In [5]:
df_items

Unnamed: 0,Number,Factor,Item,Item_simp,Item_mod,Sign,nli-distilroberta-base-v2_embeddings,nli-distilroberta-base-v2_embeddings_rev,paraphrase-multilingual-mpnet-base-v2_embeddings,paraphrase-multilingual-mpnet-base-v2_embeddings_rev,paraphrase-multilingual-MiniLM-L12-v2_embeddings,paraphrase-multilingual-MiniLM-L12-v2_embeddings_rev,intfloat/multilingual-e5-base_embeddings,intfloat/multilingual-e5-base_embeddings_rev,LaBSE_embeddings,LaBSE_embeddings_rev
0,1,Depression,I couldn't seem to experience any positive fee...,when feeling depressed I Couldn't seem to expe...,Depression is characterised principally by a l...,+,"[-0.59129614, 0.14790781, 0.2387956, 0.391238,...","[-0.59129614, 0.14790781, 0.2387956, 0.391238,...","[-0.11339817, 0.2065327, -0.0093202945, -0.048...","[-0.11339817, 0.2065327, -0.0093202945, -0.048...","[-0.004934678, -0.078225255, 0.10577597, 0.353...","[-0.004934678, -0.078225255, 0.10577597, 0.353...","[-0.006457227, 0.054376926, -0.014666253, 0.01...","[-0.006457227, 0.054376926, -0.014666253, 0.01...","[-0.059610132, 0.03526358, -0.04124071, -0.036...","[-0.059610132, 0.03526358, -0.04124071, -0.036..."
1,2,Depression,I found it difficult to work up the initiative...,when feeling depressed I Found it difficult to...,Depression is characterised principally by a l...,+,"[-0.48975688, 0.078090005, 0.39947817, 0.75822...","[-0.48975688, 0.078090005, 0.39947817, 0.75822...","[-0.049589682, 0.15710999, -0.0064599286, -0.0...","[-0.049589682, 0.15710999, -0.0064599286, -0.0...","[0.034562506, 0.05709009, 0.04878727, 0.357426...","[0.034562506, 0.05709009, 0.04878727, 0.357426...","[-0.010948448, 0.056557536, -0.0057068355, 0.0...","[-0.010948448, 0.056557536, -0.0057068355, 0.0...","[-0.049506135, 0.017580919, -0.01812656, -0.05...","[-0.049506135, 0.017580919, -0.01812656, -0.05..."
2,3,Depression,I felt that I had nothing to look forward to.,when feeling depressed I Felt that I had nothi...,Depression is characterised principally by a l...,+,"[-0.6578825, 0.07359761, 0.11234485, 0.3634547...","[-0.6578825, 0.07359761, 0.11234485, 0.3634547...","[-0.09742633, 0.2089626, -0.009370378, -0.0858...","[-0.09742633, 0.2089626, -0.009370378, -0.0858...","[-0.030781953, 0.020666908, 0.10233986, 0.3430...","[-0.030781953, 0.020666908, 0.10233986, 0.3430...","[-0.013531556, 0.062577516, -0.0071180756, 0.0...","[-0.013531556, 0.062577516, -0.0071180756, 0.0...","[-0.05181289, 0.023035247, -0.034173265, -0.04...","[-0.05181289, 0.023035247, -0.034173265, -0.04..."
3,4,Depression,I felt down-hearted and blue.,when feeling depressed I Felt down-hearted and...,Depression is characterised principally by a l...,+,"[-0.5270264, 0.32158655, 0.07868067, 0.591353,...","[-0.5270264, 0.32158655, 0.07868067, 0.591353,...","[-0.12377266, 0.09526069, -0.009377572, -0.071...","[-0.12377266, 0.09526069, -0.009377572, -0.071...","[0.05855624, 0.04589368, 0.008427292, 0.435617...","[0.05855624, 0.04589368, 0.008427292, 0.435617...","[0.00046947206, 0.05491685, -0.0062893643, 0.0...","[0.00046947206, 0.05491685, -0.0062893643, 0.0...","[-0.062714405, 0.0328216, -0.026738133, -0.053...","[-0.062714405, 0.0328216, -0.026738133, -0.053..."
4,5,Depression,I was unable to become enthusiastic about anyt...,when feeling depressed I was Unable to become ...,Depression is characterised principally by a l...,+,"[-0.22317879, -0.09874177, 0.15772057, 0.28280...","[-0.22317879, -0.09874177, 0.15772057, 0.28280...","[-0.13284642, 0.19245203, -0.00826132, -0.0756...","[-0.13284642, 0.19245203, -0.00826132, -0.0756...","[0.12185396, 0.053762186, 0.06442843, 0.372469...","[0.12185396, 0.053762186, 0.06442843, 0.372469...","[0.0020632262, 0.055967305, -0.007361045, 0.02...","[0.0020632262, 0.055967305, -0.007361045, 0.02...","[-0.05564849, 0.015690329, -0.03849111, -0.046...","[-0.05564849, 0.015690329, -0.03849111, -0.046..."
5,6,Depression,I felt I wasn't worth much as a person.,when feeling depressed I Felt I wasn't worth m...,Depression is characterised principally by a l...,+,"[-0.56089735, 0.18738821, 0.018788807, 0.40086...","[-0.56089735, 0.18738821, 0.018788807, 0.40086...","[-0.0743325, 0.11809171, -0.009042634, -0.0076...","[-0.0743325, 0.11809171, -0.009042634, -0.0076...","[0.07714252, 0.10064267, 0.027455876, 0.292942...","[0.07714252, 0.10064267, 0.027455876, 0.292942...","[-0.0016914401, 0.053250335, -0.008374933, 0.0...","[-0.0016914401, 0.053250335, -0.008374933, 0.0...","[-0.05692354, 0.02230686, -0.038568024, -0.048...","[-0.05692354, 0.02230686, -0.038568024, -0.048..."
6,7,Depression,I felt that life was meaningless.,when feeling depressed I Felt that life was me...,Depression is characterised principally by a l...,+,"[-0.6420574, 0.41412872, 0.048758984, 0.536828...","[-0.6420574, 0.41412872, 0.048758984, 0.536828...","[-0.057775594, 0.15011202, -0.010155034, -0.05...","[-0.057775594, 0.15011202, -0.010155034, -0.05...","[-0.014743155, 0.031444214, 0.07128177, 0.3435...","[-0.014743155, 0.031444214, 0.07128177, 0.3435...","[-0.0015579559, 0.053133342, -0.005263186, 0.0...","[-0.0015579559, 0.053133342, -0.005263186, 0.0...","[-0.058416516, 0.013483906, -0.025171958, -0.0...","[-0.058416516, 0.013483906, -0.025171958, -0.0..."
7,8,Anxiety,I was aware of dryness of my mouth.,when feeling anxious I Aware of dryness of my ...,Anxiety is a relatively enduring state of anxi...,+,"[0.14491464, -0.07425076, 0.42837176, 0.090149...","[0.14491464, -0.07425076, 0.42837176, 0.090149...","[-0.026776224, -0.14144194, -0.006944206, -0.1...","[-0.026776224, -0.14144194, -0.006944206, -0.1...","[0.056603447, -0.13646995, -0.20966533, 0.3277...","[0.056603447, -0.13646995, -0.20966533, 0.3277...","[0.0022844165, 0.041721098, 0.0005980095, 0.02...","[0.0022844165, 0.041721098, 0.0005980095, 0.02...","[0.007018052, 0.024268072, -0.045407318, -0.06...","[0.007018052, 0.024268072, -0.045407318, -0.06..."
8,9,Anxiety,"I experienced breathing difficulty (e.g., exce...",when feeling anxious I experienced breathing d...,Anxiety is a relatively enduring state of anxi...,+,"[0.20752575, 0.27778625, 0.72708166, 0.2310034...","[0.20752575, 0.27778625, 0.72708166, 0.2310034...","[-0.10335731, -0.10619373, -0.0064917593, -0.0...","[-0.10335731, -0.10619373, -0.0064917593, -0.0...","[0.20126317, -0.012986422, -0.36496776, 0.5446...","[0.20126317, -0.012986422, -0.36496776, 0.5446...","[-0.006160595, 0.044270344, 0.0005525648, 0.00...","[-0.006160595, 0.044270344, 0.0005525648, 0.00...","[-0.020950727, 0.04246694, -0.040927473, -0.02...","[-0.020950727, 0.04246694, -0.040927473, -0.02..."
9,10,Anxiety,"I experienced trembling (e.g., in the hands).",when feeling anxious I Experienced trembling (...,Anxiety is a relatively enduring state of anxi...,+,"[0.13728693, 0.029439833, 0.49849817, 0.208474...","[0.13728693, 0.029439833, 0.49849817, 0.208474...","[-0.0726398, -0.06698165, -0.0066921012, -0.04...","[-0.0726398, -0.06698165, -0.0066921012, -0.04...","[0.11722563, -0.06363736, -0.23205829, 0.54360...","[0.11722563, -0.06363736, -0.23205829, 0.54360...","[-0.01161314, 0.04862203, -0.006124796, 0.0245...","[-0.01161314, 0.04862203, -0.006124796, 0.0245...","[0.017739387, 0.01912253, -0.015784563, -0.044...","[0.017739387, 0.01912253, -0.015784563, -0.044..."


### 3- Compute cosine similarities and store the data

In [6]:
# To keep the output file names short, we create a list of short model names, which we then use when saving the cosine similarity matrices.
# Make sure these names are meaningful and in the same order as the models in the cell above.
model_short = ['distilroberta', 'mpnet', 'miniLM', 'e5', 'labse']

# Below, we loop over the models used in the study and compute the cosine similarity matrices.
for mod, short_name in zip(models, model_short):
  # stack the per-item embedding vectors into a 2-D array of shape (n_items, dim)
  item_embeddings = np.stack(df_items[mod + '_embeddings'].tolist())

  # create the item-by-item cosine similarity matrix
  cosine_similarities_item = util.pytorch_cos_sim(item_embeddings, item_embeddings).numpy()

  # we don't have reversed items, so no matrix is computed for the '_embeddings_rev' columns

  # fill the diagonal with 1. This is done to avoid EFA functions reading the cosine matrix as a covariance matrix
  np.fill_diagonal(cosine_similarities_item, 1)

  # store results
  item_labels = df_items['Item_simp'].unique()
  pd.DataFrame(cosine_similarities_item, columns=item_labels, index=item_labels).to_csv('./Data/cos_matrices/matrix_concatenated_item_' + short_name + '.csv', index=False)
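
To make the computation in this step concrete, here is a minimal NumPy sketch of a pairwise cosine similarity matrix followed by the diagonal fill. It is a stand-in for illustration, not the `sentence_transformers` implementation: `cosine_matrix` is a hypothetical helper and `toy_embeddings` contains made-up values. The idea is that each row is scaled to unit length, so the matrix product of the scaled rows gives all pairwise cosines; floating-point rounding can leave the diagonal very slightly off 1, which the explicit fill removes.

```python
import numpy as np

def cosine_matrix(embeddings):
    """Pairwise cosine similarities between the rows of a 2-D array."""
    embeddings = np.asarray(embeddings, dtype=float)
    # Length of each row vector, kept as a column for broadcasting.
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / norms
    # Dot products of unit vectors are cosine similarities.
    return unit @ unit.T

# Toy embeddings (made-up values): 3 items, 3 dimensions.
toy_embeddings = np.array([[0.2, -0.5, 0.7],
                           [0.1, -0.4, 0.9],
                           [-0.3, 0.8, 0.1]])

sim = cosine_matrix(toy_embeddings)

# Force an exact unit diagonal, as done for the real matrices above.
np.fill_diagonal(sim, 1.0)
```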
