# Analyzing the Personality of a Population 🌊

Analysis of the results produced by the **Persona-Predict V2** 🧠 service. The analyses use **100** randomly selected records from the *Brasil Escola* 🇧🇷 dataset, which contains essays written by Brazilian students:

  - [Dataset Brasil Escola](https://github.com/gpassero/uol-redacoes-xml/tree/master/brasilescola)
  - [Batch Notebook](https://nbviewer.org/github/NeuroQuestAi/neuroquest-examples/blob/main/products/persona-predict/notebooks/Persona-Predict-Batch-PT-BR.ipynb?flush_cache=true)
  
Trait scores follow the [IPIP](https://ipip.ori.org/) standard and are classified as follows:

- Below 45: 'low' score.
- From 45 to 55: 'average' score.
- Above 55: 'high' score.
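
The classification rule above can be written as a small Python function. This is only a sketch and is not part of the notebook's `utility` module; `classify_score` is a hypothetical name, and counting exactly 45 and 55 as 'average' follows from the "below 45" and "above 55" wording.

```python
def classify_score(score: float) -> str:
    """Map a trait score to the low/average/high band.

    Assumes thresholds of 45 and 55, with both boundary values
    counted as 'average'.
    """
    if score < 45:
        return "low"
    if score > 55:
        return "high"
    return "average"


for s in (38.2, 45, 50.0, 55, 61.7):
    print(s, classify_score(s))
```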

If you have questions about interpreting the results, please consult a psychologist who specializes in personality theory.

If the charts do not render 🚫, open the notebook in NBViewer:

  - [View in NBViewer](https://nbviewer.org/github/NeuroQuestAi/neuroquest-examples/blob/main/products/persona-predict/notebooks/Persona-Predict-Pop-PT-BR.ipynb?flush_cache=true)

For more information 🔍 about the service, visit: [docs.neuroquest.ai/persona-predict](https://docs.neuroquest.ai/persona-predict/)


In [None]:
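import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import utility as U  # notebook helpers: data loading, stop-word removal, plots

from wordcloud import WordCloud

# Limit how many rows and columns pandas displays in the tables below
pd.set_option("display.max_rows", 50)
pd.set_option("display.max_columns", 10)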

## Read Data 📊

In [None]:
df = pd.DataFrame(U.get_traits_from_batch_json_results())
df.head(5)
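
To check how the 100 records spread across the IPIP bands, the same thresholds can be applied column by column. A minimal sketch, assuming each trait is a numeric column of `df` (as the cells below use them); `band_counts` and the sample values are hypothetical.

```python
import numpy as np
import pandas as pd


def band_counts(frame: pd.DataFrame, columns: list[str]) -> pd.DataFrame:
    """Count how many rows fall in each band (low/average/high) per column."""
    counts = {}
    for col in columns:
        scores = frame[col].to_numpy()
        bands = np.select(
            [scores < 45, scores > 55],  # low and high conditions
            ["low", "high"],
            default="average",           # 45 <= score <= 55
        )
        counts[col] = pd.Series(bands).value_counts()
    # Rows are bands, columns are traits; bands with no rows become 0.
    return (
        pd.DataFrame(counts)
        .reindex(["low", "average", "high"])
        .fillna(0)
        .astype(int)
    )


# Hypothetical scores standing in for the real `df`.
sample = pd.DataFrame({
    "openness": [40.0, 47.5, 58.1, 55.0],
    "neuroticism": [62.3, 44.0, 51.2, 30.9],
})
print(band_counts(sample, ["openness", "neuroticism"]))
```

With the real data, the call would be `band_counts(df, ["openness", "conscientiousness", "extraversion", "agreeableness", "neuroticism"])`.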

## Most Used Words 🗣️

In [None]:
force_download_data = False
wordcloud = WordCloud(
    width=800, height=400, max_words=100, background_color="white"
).generate(U.remove_stop_words_from_essay(df=df, download_data=force_download_data))

plt.figure(figsize=(10, 5))
plt.imshow(wordcloud, interpolation="bilinear")
plt.axis("off")
plt.title("Top 100 Most Used Words")
plt.show()
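
The word cloud depends on removing stop words before counting. `U.remove_stop_words_from_essay` is not shown here, so the following is only a sketch of the general idea, using a hypothetical and deliberately tiny Portuguese stop-word list; it does not reproduce the helper's actual behavior.

```python
import re
from collections import Counter

# Hypothetical, very small stop-word list for illustration only;
# a real pipeline would use a much fuller list.
STOP_WORDS = {"a", "o", "e", "de", "do", "da", "que", "em", "um", "uma", "os", "as", "é"}


def word_frequencies(texts: list[str], top_n: int = 100) -> dict[str, int]:
    """Lowercase, tokenize, drop stop words and numbers, then count words."""
    counter = Counter()
    for text in texts:
        # In Python 3, \w matches accented letters such as "ç" and "ã".
        tokens = re.findall(r"\w+", text.lower())
        counter.update(t for t in tokens if t not in STOP_WORDS and not t.isdigit())
    return dict(counter.most_common(top_n))


essays = [
    "A educação é o caminho para o futuro.",
    "O futuro depende da educação de qualidade.",
]
print(word_frequencies(essays, top_n=3))
```

A frequency dictionary like this can be passed to `WordCloud(...).generate_from_frequencies(freqs)` instead of handing raw text to `generate`.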

## 1. Big-Five Dimensions 🎭

In [None]:
big5_acronym = ["O", "C", "E", "A", "N"]

correlation_matrix = df[
    ["openness", "conscientiousness", "extraversion", "agreeableness", "neuroticism"]
].corr()
ax = sns.heatmap(correlation_matrix, annot=True, cmap="coolwarm", fmt=".2f")

ax.set_xticklabels(big5_acronym)
ax.set_yticklabels(big5_acronym)

plt.title("Big-Five Correlation", fontsize=12)
plt.tight_layout()
plt.show()
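
`DataFrame.corr()` computes the Pearson coefficient by default (`method="pearson"`), so each heatmap cell is the covariance of two traits divided by the product of their standard deviations. A small sketch on hypothetical data that computes one cell by hand and checks it against pandas:

```python
import numpy as np
import pandas as pd


def pearson(x: np.ndarray, y: np.ndarray) -> float:
    """Pearson correlation: centered cross-products over the product of norms."""
    xc = x - x.mean()
    yc = y - y.mean()
    return float((xc * yc).sum() / np.sqrt((xc ** 2).sum() * (yc ** 2).sum()))


# Hypothetical scores for five respondents.
scores = pd.DataFrame({
    "extraversion": [62.0, 48.5, 55.1, 40.2, 51.7],
    "neuroticism": [38.4, 52.0, 45.3, 60.8, 49.9],
})
manual = pearson(scores["extraversion"].to_numpy(), scores["neuroticism"].to_numpy())
print(round(manual, 3), round(scores.corr().loc["extraversion", "neuroticism"], 3))
```

A negative value in a cell means that respondents scoring higher on one trait tend to score lower on the other.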

In [None]:
big5_dimensions = [
    "openness",
    "conscientiousness",
    "extraversion",
    "agreeableness",
    "neuroticism",
]

U.plot_eda_boxplot(
    df=df,
    targets=big5_dimensions,
    ticktext=[x.capitalize() for x in big5_dimensions],
    title="Big-Five Facets",
    color=1,
)
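
Each box summarizes a distribution by its quartiles. Whether `U.plot_eda_boxplot` uses the common Tukey convention (whiskers at 1.5 × IQR) and pandas' default linear quantile interpolation is an assumption; the sketch below computes those quantities for one hypothetical column with a hypothetical helper, `box_summary`.

```python
import pandas as pd


def box_summary(values: pd.Series) -> dict[str, float]:
    """Quartiles and Tukey whisker limits (1.5 x IQR) for one column."""
    q1, median, q3 = values.quantile([0.25, 0.5, 0.75])
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = values[(values >= low_fence) & (values <= high_fence)]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": inside.min(),   # lowest point within the fences
        "whisker_high": inside.max(),  # highest point within the fences
        "n_outliers": int(((values < low_fence) | (values > high_fence)).sum()),
    }


# Hypothetical openness scores; 95.0 lies beyond the upper fence.
openness = pd.Series([42.0, 46.0, 48.0, 50.0, 52.0, 54.0, 58.0, 95.0])
print(box_summary(openness))
```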

In [None]:
U.plot_eda_radar(
    df=df, targets=big5_dimensions, title="Big-Five Personality Comparison"
)
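
A radar chart places each trait on its own spoke. It is an assumption that `U.plot_eda_radar` plots the mean score of each target; under that assumption, the coordinates can be computed as follows (the helper `radar_coordinates` and the data are hypothetical):

```python
import numpy as np
import pandas as pd


def radar_coordinates(frame: pd.DataFrame, targets: list[str]):
    """Mean score per target plus evenly spaced angles, closed into a loop."""
    means = frame[targets].mean().to_numpy()
    angles = np.linspace(0, 2 * np.pi, len(targets), endpoint=False)
    # Repeat the first point so the polygon closes on itself.
    return np.append(angles, angles[0]), np.append(means, means[0])


sample = pd.DataFrame({
    "openness": [50.0, 60.0],
    "conscientiousness": [40.0, 44.0],
    "extraversion": [55.0, 45.0],
})
angles, values = radar_coordinates(sample, list(sample.columns))
print(np.degrees(angles).round(1), values)
```

With Matplotlib, `ax = plt.subplot(polar=True)` followed by `ax.plot(angles, values)` and `ax.fill(angles, values, alpha=0.25)` draws the closed polygon.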

## 2. Big-Five Openness & Facets 🧑‍🎨

In [None]:
openness_facets = [
    "imagination",
    "artistic_interests",
    "emotionality",
    "adventurousness",
    "intellect",
    "liberalism",
]

title = [x.replace("_", "-").capitalize() for x in openness_facets]

U.plot_eda_boxplot(
    df=df,
    targets=openness_facets,
    ticktext=title,
    title="Openness Facets",
    color=2,
)
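
Without the `utility` helpers, a similar facet boxplot can be built from a long-format table, with one row per (respondent, facet) pair. A minimal sketch with hypothetical data, reusing the label convention from the cell above:

```python
import pandas as pd

# Hypothetical facet scores for two respondents.
wide = pd.DataFrame({
    "imagination": [61.0, 47.0],
    "artistic_interests": [52.5, 39.0],
    "intellect": [58.0, 66.0],
})

# Wide -> long: every column becomes rows of (facet, score).
long_df = wide.melt(var_name="facet", value_name="score")
# Same labeling rule as the notebook: underscores to hyphens, then capitalize.
long_df["facet"] = long_df["facet"].str.replace("_", "-").str.capitalize()
print(long_df)
```

From here, `sns.boxplot(data=long_df, x="facet", y="score")` would draw one box per facet.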

In [None]:
U.plot_eda_radar(df=df, targets=openness_facets, title="Openness Facets Comparison")

## 3. Big-Five Conscientiousness & Facets 🧑‍🔬

In [None]:
conscientiousness_facets = [
    "self_efficacy",
    "orderliness",
    "dutifulness",
    "achievement_striving",
    "self_discipline",
    "cautiousness",
]

title = [x.replace("_", "-").capitalize() for x in conscientiousness_facets]

U.plot_eda_boxplot(
    df=df,
    targets=conscientiousness_facets,
    ticktext=title,
    title="Conscientiousness Facets",
    color=3,
)

In [None]:
U.plot_eda_radar(
    df=df, targets=conscientiousness_facets, title="Conscientiousness Facets Comparison"
)

## 4. Big-Five Extraversion & Facets 🕺💃

In [None]:
extraversion_facets = [
    "friendliness",
    "gregariousness",
    "assertiveness",
    "activity_level",
    "excitement_seeking",
    "cheerfulness",
]

title = [x.replace("_", "-").capitalize() for x in extraversion_facets]

U.plot_eda_boxplot(
    df=df,
    targets=extraversion_facets,
    ticktext=title,
    title="Extraversion Facets",
    color=4,
)

In [None]:
U.plot_eda_radar(
    df=df, targets=extraversion_facets, title="Extraversion Facets Comparison"
)

## 5. Big-Five Agreeableness & Facets 🙋🧡

In [None]:
agreeableness_facets = [
    "trust",
    "morality",
    "altruism",
    "cooperation",
    "modesty",
    "sympathy",
]

title = [x.replace("_", "-").capitalize() for x in agreeableness_facets]

U.plot_eda_boxplot(
    df=df,
    targets=agreeableness_facets,
    ticktext=title,
    title="Agreeableness Facets",
    color=5,
)

In [None]:
U.plot_eda_radar(
    df=df, targets=agreeableness_facets, title="Agreeableness Facets Comparison"
)

## 6. Big-Five Neuroticism & Facets 🙅💢

In [None]:
neuroticism_facets = [
    "anxiety",
    "anger",
    "depression",
    "self_consciousness",
    "immoderation",
    "vulnerability",
]

title = [x.replace("_", "-").capitalize() for x in agreeableness_facets]

U.plot_eda_boxplot(
    df=df,
    targets=neuroticism_facets,
    ticktext=title,
    title="Neuroticism Facets",
    color=6,
)
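
Each section rebuilds its tick labels from a facet list, so labels derived from a different list than the one passed as `targets` would mislabel the axis without raising an error. A small helper (hypothetical, not part of `utility`) that derives the labels from the same list keeps the two in sync:

```python
def facet_labels(facets: list[str]) -> list[str]:
    """Turn column names like 'self_consciousness' into 'Self-consciousness'."""
    return [name.replace("_", "-").capitalize() for name in facets]


neuroticism = [
    "anxiety",
    "anger",
    "depression",
    "self_consciousness",
    "immoderation",
    "vulnerability",
]
labels = facet_labels(neuroticism)
# Printing column -> label pairs makes any mismatch easy to spot.
print(dict(zip(neuroticism, labels)))
```

Note that `str.capitalize()` lowercases everything after the first character; `str.title()` would instead produce labels such as "Self-Consciousness".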