### Language Technologies. Deliverable 4
---
#### Generating a single embedding for the Big Five traits (NEO-FFI)

This notebook builds a single embedding that captures the semantics of the Big Five questionnaire items, so that users' posts can be compared against one vector. This is an important step in reducing the size of the dataset: **for each user, we look for the posts that are semantically closest to the questionnaire items**.
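
The selection step described above can be sketched as a top-k search by cosine similarity. This is only an illustration under stated assumptions: the function name `top_k_similar_posts` and the toy 3-dimensional vectors are hypothetical, and real post embeddings would come from the same sentence-embedding model used for the questionnaire items.

```python
import numpy as np

def top_k_similar_posts(post_embeddings, trait_embedding, k=3):
    """Return the indices and cosine scores of the k posts closest to the trait embedding.

    Hypothetical helper for illustration; not part of the original notebook.
    """
    posts = np.asarray(post_embeddings, dtype=float)
    query = np.asarray(trait_embedding, dtype=float)
    # Cosine similarity: dot product divided by the product of the norms.
    scores = posts @ query / (np.linalg.norm(posts, axis=1) * np.linalg.norm(query))
    k = min(k, len(scores))
    # Sort by descending score and keep the first k indices.
    top = np.argsort(-scores)[:k]
    return top, scores[top]

# Toy 3-dimensional vectors standing in for real 384-dimensional embeddings.
toy_posts = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.9, 0.1, 0.0],
    [-1.0, 0.0, 0.0],
])
toy_trait = np.array([1.0, 0.0, 0.0])

top_idx, top_scores = top_k_similar_posts(toy_posts, toy_trait, k=2)
print(top_idx, top_scores)
```

Dividing by both norms makes the score independent of vector length, so it does not matter whether the trait embedding has unit norm.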

The resulting file is stored in `material/all_traits_embedding.csv` and also as `.npy`, which is more efficient to load from Python.
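
As a minimal sketch of writing both formats, the snippet below saves a vector as `.npy` and as a one-row `.csv`, then reads both back. The helper `save_embedding`, the toy vector, and the temporary directory are assumptions made for illustration; the code shown in this notebook calls only `np.save`.

```python
import os
import tempfile

import numpy as np

def save_embedding(vector, directory, stem="all_traits_embedding"):
    """Write a 1-D vector as <stem>.npy and <stem>.csv (hypothetical helper)."""
    os.makedirs(directory, exist_ok=True)
    npy_path = os.path.join(directory, stem + ".npy")
    csv_path = os.path.join(directory, stem + ".csv")
    # Binary format: keeps dtype and shape, fast to load with np.load.
    np.save(npy_path, vector)
    # Text format: a single row with one column per embedding dimension.
    np.savetxt(csv_path, vector.reshape(1, -1), delimiter=",")
    return npy_path, csv_path

# Toy vector with the same length as the embedding produced in this notebook.
toy_vector = np.linspace(-1.0, 1.0, 384).astype(np.float32)

with tempfile.TemporaryDirectory() as tmp_dir:
    npy_path, csv_path = save_embedding(toy_vector, tmp_dir)
    from_npy = np.load(npy_path)
    from_csv = np.loadtxt(csv_path, delimiter=",", dtype=np.float32)

print(from_npy.shape, from_csv.shape)
```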

In [1]:
import pandas as pd
from sentence_transformers import SentenceTransformer
import numpy as np
import os

# ------------------------------------------------------
# 1️⃣ Define the NEO-FFI questionnaire items
# ------------------------------------------------------
neo_ffi = {
    "Agreeableness": {
        "positive": [
            "I try to be courteous to everyone I meet.",
            "I would rather cooperate with others than compete with them.",
            "Most people I know like me.",
            "I generally try to be thoughtful and considerate."
        ],
        "negative": [
            "I often get into arguments with my family and co-workers.",
            "Some people think I’m selfish and egotistical.",
            "I tend to be cynical and skeptical of other’s intentions.",
            "I believe that most people will take advantage of you if you let them.",
            "Some people think of me as cold and calculating.",
            "I’m hard-headed and tough-minded in my attitudes.",
            "If I don’t like people, I let them know it.",
            "If necessary, I am willing to manipulate people."
        ]
    },
    "Openness": {
        "positive": [
            "I am intrigued by the patterns I find in art and nature.",
            "I often try new and foreign foods.",
            "Sometimes when I am reading poetry or looking at a work of art, I feel a chill or wave of excitement.",
            "I have a lot of intellectual curiosity.",
            "I often enjoy playing with theories or abstract ideas."
        ],
        "negative": [
            "I don’t like to waste my time daydreaming.",
            "Once I find the right way to do something, I stick to it.",
            "I believe letting students hear controversial speakers can only confuse and mislead them.",
            "Poetry has little or no effect on me.",
            "I seldom notice the moods or feelings that different environments produce.",
            "I believe we should look to our religious authorities for decisions on moral issues.",
            "I have little interest in speculating on the nature of the universe or the human condition."
        ]
    },
    "Conscientiousness": {
        "positive": [
            "I keep my belongings clean and neat.",
            "I’m pretty good about pacing myself so as to get things done on time.",
            "I try to perform all the tasks assigned to me conscientiously.",
            "I have a clear set of goals and work toward them in an orderly fashion.",
            "I work hard to accomplish my goals.",
            "When I make a commitment, I can always be counted on to follow through.",
            "I am a productive person who always gets the job done.",
            "I strive for excellence in everything I do."
        ],
        "negative": [
            "I am not a very methodical person.",
            "I waste a lot of time before settling down to work.",
            "Sometimes I’m not as dependable or reliable as I should be.",
            "I never seem to be able to get organized."
        ]
    },
    "Extraversion": {
        "positive": [
            "I like to have a lot of people around me.",
            "I laugh easily.",
            "I really enjoy talking to people.",
            "I like to be where the action is.",
            "I often feel as if I am bursting with energy.",
            "I am a cheerful, high-spirited person.",
            "My life is fast-paced.",
            "I am a very active person."
        ],
        "negative": [
            "I don’t consider myself especially ‘lighthearted’.",
            "I usually prefer to do things alone.",
            "I am not a cheerful optimist.",
            "I would rather go my own way than be a leader of others."
        ]
    },
    "Neuroticism": {
        "positive": [
            "I often feel inferior to others.",
            "When I’m under a great deal of stress, sometimes I feel like I’m going to pieces.",
            "I often feel tense and jittery.",
            "Sometimes I feel completely worthless.",
            "I often get angry at the way people treat me.",
            "Too often, when things go wrong, I get discouraged and feel like giving up.",
            "I often feel helpless and want someone else to solve my problems.",
            "At times I have been so ashamed I just wanted to hide."
        ],
        "negative": [
            "I am not a worrier.",
            "I rarely feel lonely or blue.",
            "I rarely feel fearful or anxious.",
            "I am seldom sad or depressed."
        ]
    }
}

# ------------------------------------------------------
# 2️⃣ Merge all sentences into a single list
# ------------------------------------------------------
all_sentences = []
for polarities in neo_ffi.values():
    for sentences in polarities.values():
        all_sentences.extend(sentences)

# ------------------------------------------------------
# 3️⃣ Load the model
# ------------------------------------------------------
print("🔹 Loading model...")
model = SentenceTransformer("all-MiniLM-L6-v2")

# ------------------------------------------------------
# 4️⃣ Compute the mean embedding of all sentences
# ------------------------------------------------------
print("🔹 Encoding all sentences...")
embeddings = model.encode(all_sentences, normalize_embeddings=True)
mean_embedding = np.mean(embeddings, axis=0)

# ------------------------------------------------------
# 5️⃣ Save the mean embedding as .npy
# ------------------------------------------------------
os.makedirs("material", exist_ok=True)
np.save("material/all_traits_embedding.npy", mean_embedding)
print(f"✅ Saved all_traits_embedding.npy with shape {mean_embedding.shape}")

🔹 Loading model...
🔹 Encoding all sentences...
✅ Saved all_traits_embedding.npy with shape (384,)
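
One property of step 4 is worth keeping in mind: the individual sentence embeddings are unit-length because of `normalize_embeddings=True`, but their mean generally is not. The toy sketch below (2-dimensional vectors chosen purely for illustration) shows the effect and how the mean could be re-normalised if a later step relies on plain dot products rather than cosine similarity.

```python
import numpy as np

# Two unit-length toy vectors standing in for normalised sentence embeddings.
unit_vectors = np.array([
    [1.0, 0.0],
    [0.0, 1.0],
])

# Averaging unit vectors that point in different directions shrinks the norm.
mean_vector = unit_vectors.mean(axis=0)
mean_norm = np.linalg.norm(mean_vector)

# Optional re-normalisation so that dot products behave like cosine similarities.
renormalised = mean_vector / mean_norm

print(mean_norm, np.linalg.norm(renormalised))
```

Cosine similarity divides by the norms anyway, so this step only matters when similarities are computed as raw dot products.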
