## What's going to be in this blogpost

- Describe the issue with standard survey development
- Introduce the idea of embeddings psychometrics
    - Create text (items) embeddings
    - Perform psychometric analysis
- Introduce step 1
- Cliff-hanger to step 2
- Close off with some remarks and words of caution.

## The workhorse of modern people listening

**How many times have you used a survey to determine your employees' engagement? Or customers' satisfaction?** And how about using tests to assess someone's personality or cognitive abilities to determine if they are a suitable candidate for a program, a job, etc.? **Most companies/institutions use surveys or tests as data sources to draw insights and make data-driven decisions.**

These measurement tools are not perfect. They require careful piloting, calibration and psychometric analyses to ensure that the gathered data is reliable and valid. The risks of using bad measures are often too high to bear, such as... find examples...

Oftentimes, however, there is a lack of time or resources to develop proper measurement tools since:

- **They require lots of money and resources**. Survey tools themselves are cheap, but the expertise required to develop good questions and evaluate their fit to the theme/construct one wants to measure is expensive. One also needs to collect large data sets to validate the survey and determine its accuracy. In addition, developing good questions takes time and, in the best-case scenario, a thorough review of the qualitative and scientific literature;
- **There is a high risk of failure**: Measurement experts know too well that scale development is almost an art, which requires lots of trial and error. For instance, formulating questions positively ("I like parties") or negatively ("I hate parties") can trigger different response processes. And, while one may want to try all possible permutations of a question, this increases the risk of respondent fatigue and reduced motivation when filling out a survey, which results in low-quality data and an increased risk of response biases (e.g. social desirability or careless responding);
- and much more...

## Doping the workhorse

So, one may simply ask: is it really not possible to simplify scale development and assessment and make them less resource-intensive? The short answer is "Kinda, and you are on the right page to discover how!"

Large Language Model embeddings psychometrics (LLMEP) is a new and exciting area of research that can help streamline and speed up scale development like never before.

## It all began with an LLM
Large Language Models (LLMs) are one of the most exciting scientific innovations of the last decade and are becoming ubiquitous, from chatbots (ChatGPT) to..... An LLM is defined as a computational model notable for its ability to achieve general-purpose language generation and other natural language processing tasks such as classification and next-word prediction (wiki).

These models can have different types of architecture depending on the task they are developed for. For instance, if one wants to produce text, a decoder model (such as GPT) would be preferred. For LLMEP we will focus mostly on analyzing text, which is better achieved with encoder models. You can think of an encoder as a transformation tool that takes text and transforms it into a multidimensional numerical vector, also called an embedding. (maybe add text about the transformation to explain more clearly) See also: [encoder and a decoder layer](https://magazine.sebastianraschka.com/p/understanding-encoder-and-decoder)

The most popular architecture for encoding models is that of Bidirectional Encoder Representations from Transformers ([BERT](https://towardsdatascience.com/bert-explained-state-of-the-art-language-model-for-nlp-f8b21a9b6270)). BERT models are exceptionally good at understanding words due to their ability to capture word context by "looking" at the text both to the left and right of the word. However, when dealing with questions from a survey, which consist of entire sentences rather than single words, we need models that can transform and analyze full sentences. This is where Sentence-BERT (SBERT) comes into play.

## Challenges with Traditional Sentence Embeddings

Before SBERT, one had to aggregate word-level embeddings (typically through max or mean pooling) to obtain sentence embeddings. The problem with this approach is that capturing the semantic meaning of sentences can be challenging. For instance, simple averaging might miss subtle nuances in sentence meaning.
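To make the pooling problem concrete, here is a minimal sketch with made-up numbers. It assumes three hypothetical static word vectors (`toy_vectors`) and a helper named `pool`, neither of which comes from a real model. With static vectors, mean pooling cannot distinguish two sentences that contain the same words in a different order. BERT's contextual vectors do depend on word order, so the effect there is weaker, but the intuition about information lost by averaging still applies.

```python
import numpy as np

# Hypothetical 3-dimensional static word vectors (toy values, not from a real model)
toy_vectors = {
    "dog": np.array([1.0, 0.0, 2.0]),
    "bites": np.array([0.0, 3.0, 1.0]),
    "man": np.array([2.0, 1.0, 0.0]),
}

def pool(sentence, how="mean"):
    """Stack the word vectors of a sentence and pool them into a single vector."""
    vectors = np.stack([toy_vectors[word] for word in sentence.split()])
    return vectors.mean(axis=0) if how == "mean" else vectors.max(axis=0)

mean_a = pool("dog bites man", how="mean")
mean_b = pool("man bites dog", how="mean")
max_a = pool("dog bites man", how="max")

# Two sentences with opposite meanings end up with the same pooled vector
print(np.allclose(mean_a, mean_b))  # True
```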

Let's start by demonstrating how to compute word embeddings using a BERT model and then perform mean pooling to obtain sentence embeddings.

**Compute Word Embeddings with BERT**

```python
from transformers import BertTokenizer, BertModel
import torch

# Load pre-trained BERT model and tokenizer
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased')

# Example sentences
sentence1 = "Embedding psychometrics is important for better survey analysis."
sentence2 = "Embedding psychometrics enhances the reliability of surveys."

# Tokenize the sentences
inputs1 = tokenizer(sentence1, return_tensors='pt', add_special_tokens=True)
inputs2 = tokenizer(sentence2, return_tensors='pt', add_special_tokens=True)

with torch.no_grad():
    outputs1 = model(**inputs1)
    outputs2 = model(**inputs2)

# Get word embeddings (excluding special tokens)
word_embeddings1 = outputs1.last_hidden_state.squeeze(0)[1:-1]
word_embeddings2 = outputs2.last_hidden_state.squeeze(0)[1:-1]

print("Word Embeddings Shape (Sentence 1):", word_embeddings1.shape)
print("Word Embeddings Shape (Sentence 2):", word_embeddings2.shape)
```

```
Word Embeddings Shape (Sentence 1): torch.Size([13, 768])
Word Embeddings Shape (Sentence 2): torch.Size([13, 768])
```


#### Mean Pooling to Obtain Sentence Embeddings

To obtain a sentence embedding from the word embeddings, we can use mean pooling. This technique averages the embeddings of all tokens in the sentence to create a single vector representing the entire sentence.

**Compute Sentence Embeddings Using Mean Pooling**

```python
# Mean pooling of word embeddings to get the sentence embeddings
sentence_embedding_mean_pooling1 = word_embeddings1.mean(dim=0).numpy()
sentence_embedding_mean_pooling2 = word_embeddings2.mean(dim=0).numpy()

print("Sentence Embedding (Mean Pooling) Shape (Sentence 1):", sentence_embedding_mean_pooling1.shape)
print("Sentence Embedding (Mean Pooling) Shape (Sentence 2):", sentence_embedding_mean_pooling2.shape)
```

```
Sentence Embedding (Mean Pooling) Shape (Sentence 1): (768,)
Sentence Embedding (Mean Pooling) Shape (Sentence 2): (768,)
```
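
The mean pooling above works on one sentence at a time, so every row it averages is a real token. When several sentences are encoded in one batch, shorter ones are padded, and the padded positions should not enter the average. The sketch below shows one common way to handle this in NumPy with toy numbers. The function name `masked_mean_pool` and the toy arrays are illustrative assumptions, not part of the original notebook. With the Hugging Face tokenizer, the mask would typically be the `attention_mask` returned when tokenizing with `padding=True`.

```python
import numpy as np

def masked_mean_pool(token_embeddings, attention_mask):
    """Average token embeddings per sentence, ignoring padded positions.

    token_embeddings: array of shape (batch, tokens, dim)
    attention_mask:   array of shape (batch, tokens), 1 = real token, 0 = padding
    """
    mask = attention_mask[:, :, None].astype(float)   # (batch, tokens, 1)
    summed = (token_embeddings * mask).sum(axis=1)    # (batch, dim)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)    # (batch, 1), avoids division by zero
    return summed / counts

# Toy batch: 2 sentences, up to 3 token positions, 2-dimensional embeddings
toy_embeddings = np.array([
    [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]],  # third position is padding
    [[2.0, 0.0], [4.0, 2.0], [6.0, 4.0]],
])
toy_mask = np.array([[1, 1, 0], [1, 1, 1]])

pooled = masked_mean_pool(toy_embeddings, toy_mask)
print(pooled.shape)  # (2, 2)
```

Without the mask, the padding row `[9.0, 9.0]` would pull the first sentence's average away from its real tokens.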


## Enter Sentence-BERT (SBERT)
SBERT models learn the semantic meaning of sentences more effectively by leveraging [siamese and triplet networks in their architecture](https://towardsdatascience.com/sbert-deb3d4aef8a4). These models significantly improve the ability to capture the meaning of entire sentences, such as those in survey questions, allowing us to transform them into item embeddings that can be used for psychometric analyses.

**Compute Sentence Embeddings Using SBERT**

Let's compute sentence embeddings using SBERT and compare them with the mean-pooled embeddings.

```python
from sentence_transformers import SentenceTransformer

# Load pre-trained SBERT model
sbert_model = SentenceTransformer('bert-base-nli-mean-tokens')

# Compute sentence embeddings using SBERT for both sentences
sbert_sentence_embedding1 = sbert_model.encode(sentence1)
sbert_sentence_embedding2 = sbert_model.encode(sentence2)

print("SBERT Sentence Embedding Shape (Sentence 1):", sbert_sentence_embedding1.shape)
print("SBERT Sentence Embedding Shape (Sentence 2):", sbert_sentence_embedding2.shape)
```

```
SBERT Sentence Embedding Shape (Sentence 1): (768,)
SBERT Sentence Embedding Shape (Sentence 2): (768,)
```


## Comparing BERT with SBERT

Finally, let's compare the similarity between the two sentences using both the mean-pooled word-level embeddings and the SBERT sentence-level embeddings. This comparison shows how differently the two approaches score the same pair of sentences.

**Compute Cosine Similarity Between Sentence Embeddings**

```python
from sklearn.metrics.pairwise import cosine_similarity

# Compute cosine similarity between the mean-pooled word-level embeddings
similarity_mean_pooling = cosine_similarity(
    [sentence_embedding_mean_pooling1],
    [sentence_embedding_mean_pooling2]
)[0][0]

# Compute cosine similarity between SBERT sentence embeddings
similarity_sbert = cosine_similarity(
    [sbert_sentence_embedding1],
    [sbert_sentence_embedding2]
)[0][0]

print(f"Cosine Similarity (Mean-Pooling BERT): {similarity_mean_pooling}")
print(f"Cosine Similarity (SBERT): {similarity_sbert}")
```

```
Cosine Similarity (Mean-Pooling BERT): 0.9296162724494934
Cosine Similarity (SBERT): 0.7960948348045349
```
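
For readers who want to see what the similarity score measures, cosine similarity is the dot product of two vectors divided by the product of their lengths, so it depends only on the angle between them. The sketch below is a plain NumPy version with toy vectors. The helper name `cosine` is an illustrative assumption, and the code is meant as an explanation rather than a replacement for `sklearn`'s `cosine_similarity`.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity: dot(u, v) / (||u|| * ||v||)."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

same_direction = cosine([1.0, 2.0], [2.0, 4.0])   # parallel vectors, close to 1
orthogonal = cosine([1.0, 0.0], [0.0, 1.0])       # perpendicular vectors, 0
opposite = cosine([1.0, 2.0], [-1.0, -2.0])       # opposite vectors, close to -1

print(same_direction, orthogonal, opposite)
```

Because the score ignores vector length, two embeddings can be highly similar even when their magnitudes differ a lot.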


## Conclusion
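In the run above, mean-pooled BERT scores the two sentences at about 0.93 and SBERT at about 0.80. A higher cosine similarity is not by itself evidence of a better representation, and a single pair cannot show which score is more accurate; that would require comparing many sentence pairs against human similarity judgments. SBERT-based models are trained specifically to produce sentence-level representations and generally capture the meaning of sentences better than BERT-based models with mean pooling. This allows us to transform entire sentences, such as items in a questionnaire, into more accurate item embeddings for psychometric analyses. With embedding psychometrics, we aim to make survey analysis more robust, efficient, and reliable.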

## The secrets of LLMEP

- Small intro to NLP and Transformers
    - NLP
        - Definition: NLP processes data encoded in natural language and is thus closely related to information retrieval, knowledge representation and computational linguistics, a subfield of linguistics. Typically, data is collected in text corpora and processed using rule-based, statistical or neural approaches from machine learning and deep learning.
        - The advent of transformers: NLP has come a long way in understanding human language through the use of increasingly complex models. But the biggest revolution was the move from recurrent neural networks to transformers. Recurrent neural networks processed language sequentially, token (e.g., word) by token. Since the revolutionary paper "Attention Is All You Need", transformers have become the gold standard by capturing context efficiently. Compared to previous models, transformers were able to:
            - process text in parallel
            - capture long-range text dependencies through context
    - Transformers 
        - Word-based transformers
        - Sentence-based transformers
- What are embeddings
    -  Embeddings 
- How can we use embeddings for scale development
    - The unexpected twist: "using sentence embeddings as response vectors" [Mata and Guenole]
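
As a rough illustration of the last point in the outline above, the sketch below shows one possible reading of "using sentence embeddings as response vectors": each item's embedding is treated like a column of responses, with embedding dimensions standing in for respondents, so that an item-by-item correlation matrix can be computed without collecting any data. This is an assumption made for illustration, not a confirmed description of the cited authors' procedure. The item embeddings are simulated (`theme_a`, `theme_b` and the noise level are hypothetical) rather than produced by a real model.

```python
import numpy as np

rng = np.random.default_rng(42)
n_dims = 50  # hypothetical embedding size, kept small for readability

# Simulated embeddings: items 0-1 share one theme, items 2-3 share another
theme_a = rng.normal(size=n_dims)
theme_b = rng.normal(size=n_dims)
noise = 0.3
item_embeddings = np.stack([
    theme_a + noise * rng.normal(size=n_dims),
    theme_a + noise * rng.normal(size=n_dims),
    theme_b + noise * rng.normal(size=n_dims),
    theme_b + noise * rng.normal(size=n_dims),
])  # shape: (items, dimensions)

# np.corrcoef treats each row as a variable, so items play the role of
# survey questions and embedding dimensions play the role of respondents
item_correlations = np.corrcoef(item_embeddings)

print(np.round(item_correlations, 2))
```

In this simulated setup, items built from the same theme correlate strongly with each other and weakly with items from the other theme, which is the kind of structure a factor analysis would look for in real response data.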

The most popular architecture for encoder models is that of Bidirectional Encoder Representations from Transformers ([BERT](https://towardsdatascience.com/bert-explained-state-of-the-art-language-model-for-nlp-f8b21a9b6270)). BERT models are extremely good at understanding words due to their ability to capture word context by "looking" at the text both to the left and to the right of the word. However, survey questions are not single words but entire sentences, which is why we need models that can transform and analyze entire sentences, such as Sentence-BERT (SBERT).

Before SBERT, one had to aggregate word-level embeddings (through max or mean pooling) in order to obtain sentence embeddings. The problem with this approach was that the semantic meaning of sentences would be hard to capture. For instance....
SBERT models learn the semantic meaning of sentences through the [addition of siamese and triplet networks in their architecture](https://towardsdatascience.com/sbert-deb3d4aef8a4).
The idea was to first create sentence embeddings through max/mean pooling for two sentences, and then concatenate these embeddings with a third vector obtained from their difference. This concatenated vector was then used in a classification task to predict whether the two sentences entailed each other, contradicted each other, or were neutral to each other.
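
The classification step described above can be sketched in a few lines of NumPy. The sentence embeddings `u` and `v` and the weight matrix `W` below are random placeholders (assumptions for illustration, not trained values), so the printed probabilities are meaningless. The point is only to show the shape of the computation: concatenate the two embeddings with their element-wise absolute difference, then apply a softmax classifier over three labels.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 4  # toy embedding size; real SBERT models use e.g. 768

# Placeholder sentence embeddings (in SBERT these come from pooled BERT outputs)
u = rng.normal(size=dim)
v = rng.normal(size=dim)

# Concatenate u, v and their element-wise absolute difference
features = np.concatenate([u, v, np.abs(u - v)])  # shape: (3 * dim,)

# Untrained placeholder weights mapping the features to three classes
labels = ["entailment", "contradiction", "neutral"]
W = rng.normal(size=(len(labels), features.size))

logits = W @ features
probs = np.exp(logits - logits.max())  # subtract the max for numerical stability
probs = probs / probs.sum()            # softmax

for label, p in zip(labels, probs):
    print(f"{label}: {p:.3f}")
```

During training, the error on this classification task is propagated back through both copies of the encoder, which is what pushes the sentence embeddings toward capturing meaning.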

SBERT-based models capture the meaning of sentences much better than BERT-based models, allowing us to transform sentences, such as items in a questionnaire, into item embeddings, which we can in turn use for psychometric analyses.

But before taking our next step into embedding psychometrics let's demonstrate all that we have discussed so far.






GPT is a decoder model, which takes the input and outputs text.

The transformer architecture these models build on comprises an [encoder and a decoder layer](https://magazine.sebastianraschka.com/p/understanding-encoder-and-decoder). You can think of an encoder as a transformation tool that takes text and transforms it into a multidimensional numerical vector, also called an embedding. Loosely speaking, each dimension of this text embedding captures an aspect of that text. For example, one dimension may capture whether the text is about someone or something, whereas another may represent whether the text is joyful or sad. The decoder then takes this vector and, using its own transformational properties, generates the desired output (e.g., translated text).
For en



Goal of the blog post

Situation
- Self-report measures are among the most widely used tools to evaluate people's opinions, beliefs, needs, cognitive abilities, etc.

Complication
Developing good measures is 
- Expense and Resources: Detail the cost implications, including the need for large sample sizes and extensive data collection efforts.
- Time Consumption: Elaborate on the time required to review existing literature, develop questions, pilot tests, and analyze data.
- Risk of Failure: Discuss the risks of poorly designed measures, including respondent fatigue, biased answers, and unreliable data.
- Limitations in Testing: Explain the practical limitations of testing numerous questions and variations, and how this increases the risk of introducing biases.

Question
How can we speed up and simplify scale development so that, when we use self-report measures, we can be reasonably sure they work well?

Solution
Embedding psychometrics!
What is embedding psychometrics:....
    and what is sentence embeddings to begin with...

- Embedding psychometrics is cheap: it does not require extensive data collection or large sample sizes, as it simply leverages the power of pre-trained LLMs
- Embedding psychometrics is fast: questions still need to be collected and reviewed, but one can pilot existing questions all at once and analyze the data in real time without waiting for data collection.
- Embedding psychometrics fails fast
- Embedding psychometrics is not bias-free, but it is less bias-sensitive




What is embedding Psychometrics
    How is it different from classic psychometrics

Why embedding psychometrics?

How? 


Classic Psychometrics
- Research question
Embedding Psychometrics


Structure of the blogpost
- E