# Analyzing Personality Population (Brasil Escola) 🌊

This notebook analyzes the personality predictions produced by the **Persona-Predict V2** 🧠 service. The analyses use **100** randomly selected records from the Brasil Escola 🇧🇷 dataset, which comprises essays written by Brazilian students:

  - [Brasil Escola](https://github.com/gpassero/uol-redacoes-xml/tree/master/brasilescola)

If the charts do not render 🚫, open the notebook at:

  - [View in NBViewer](https://nbviewer.org/github/NeuroQuestAi/neuroquest-examples/blob/main/products/persona-predict/notebooks/Persona-Predict-Pop-PT-BR.ipynb?flush_cache=true)

For more information 🔍 about the service, visit: [docs.neuroquest.ai/persona-predict](https://docs.neuroquest.ai/persona-predict/)


In [None]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import utility as U

from wordcloud import WordCloud

In [None]:
import json
import glob

directory = 'results/batch/*.json'

json_files = glob.glob(directory)

file_count = len(json_files)
print(f"Number of JSON files found: {file_count}")

dataframes = []

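# Process each JSON file
for file_path in json_files:
    # Load the JSON response; the encoding is explicit because the essays contain accented characters
    with open(file_path, 'r', encoding='utf-8') as file:
        data = json.load(file)

        # Show the person's name and the analyzed essay text
        print(data["data"]["person"]["name"])
        print(data["data"]["person"]["analysis"]["essay"]["analyzed_text"])
        personalities = data["data"]["person"]["analysis"]["personalities"]

        # Short alias: one list entry per personality dimension
        x = personalities

        # Preview the first entry (openness) and its overall score
        print(x[0])
        print(x[0]["openness"]["result"])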

        user = {
            "openness": x[0]["openness"]["result"],
            "imagination": x[0]["openness"]["traits"][0]["result"], 
            "artistic_interests": x[0]["openness"]["traits"][1]["result"], 
            "emotionality": x[0]["openness"]["traits"][2]["result"], 
            "adventurousness": x[0]["openness"]["traits"][3]["result"],
            "intellect": x[0]["openness"]["traits"][4]["result"], 
            "liberalism": x[0]["openness"]["traits"][5]["result"],
        
            "conscientiousness": x[1]["conscientiousness"]["result"],
            "self_efficacy": x[1]["conscientiousness"]["traits"][0]["result"], 
            "orderliness": x[1]["conscientiousness"]["traits"][1]["result"],
            "dutifulness": x[1]["conscientiousness"]["traits"][2]["result"],
            "achievement_striving": x[1]["conscientiousness"]["traits"][3]["result"],
            "self_discipline": x[1]["conscientiousness"]["traits"][4]["result"],
            "cautiousness": x[1]["conscientiousness"]["traits"][5]["result"],

            "extraversion": x[2]["extraversion"]["result"],
            "friendliness": x[2]["extraversion"]["traits"][0]["result"], 
            "gregariousness": x[2]["extraversion"]["traits"][1]["result"],
            "assertiveness": x[2]["extraversion"]["traits"][2]["result"],
            "activity_level": x[2]["extraversion"]["traits"][3]["result"],
            "excitement_seeking": x[2]["extraversion"]["traits"][4]["result"],
            "cheerfulness": x[2]["extraversion"]["traits"][5]["result"],

            "agreeableness": x[3]["agreeableness"]["result"],
            "trust": x[3]["agreeableness"]["traits"][0]["result"], 
            "morality": x[3]["agreeableness"]["traits"][1]["result"],
            "altruism": x[3]["agreeableness"]["traits"][2]["result"],
            "cooperation": x[3]["agreeableness"]["traits"][3]["result"],
            "modesty": x[3]["agreeableness"]["traits"][4]["result"],
            "sympathy": x[3]["agreeableness"]["traits"][5]["result"],
        }

        dataframes.append(user)



Number of JSON files found: 100
Michelle Castillo
é cada vez mais crescente o número de mortos e desabrigados, por conta das enchentes a qual que vem se tornando o pesadelo de muitas pessoas, sobretudo as mais carentes. com a falta de planejamento deurbanização das cidades, as pessoas ficam mais perecíveis**a tais riscos, por conta de invadirem espaços os quais não deveriam ser habitados. a chuva é um fenômeno climático, sendo essencial para a vida animal e vegetal, porém o excesso da mesma, decorrente da altíssima umidade do ar, causa enormes estragos como deslizamento de terras, erosões ( ) entre outros, ocasionando assim as enchentes- équando um leito de um rio recebe uma quantidade proveniente da chuva maior que sua capacidade suportável- deixando sérias consequências, chegando a deixar milhares de mortos e desabrigados, além da grande quantidade de pessoas infectadas e os enormes prejuízos. é comum que ocorra as enchentes, pois faz parte de um processo natural, já que todo rio nec
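
The loader above reads every score by list position. The sketch below shows one way the same extraction could be driven by a lookup table instead. It assumes the layout implied by the loader (a list of dicts keyed by dimension name, each holding a `result` and an ordered `traits` list); `FACETS` and `flatten_personalities` are hypothetical names, not part of the service or of the `utility` module.

```python
# Facet order per dimension, copied from the positional indices used in the loader.
FACETS = {
    "openness": ["imagination", "artistic_interests", "emotionality",
                 "adventurousness", "intellect", "liberalism"],
    "conscientiousness": ["self_efficacy", "orderliness", "dutifulness",
                          "achievement_striving", "self_discipline", "cautiousness"],
    "extraversion": ["friendliness", "gregariousness", "assertiveness",
                     "activity_level", "excitement_seeking", "cheerfulness"],
    "agreeableness": ["trust", "morality", "altruism",
                      "cooperation", "modesty", "sympathy"],
}


def flatten_personalities(personalities, facets=FACETS):
    """Turn the list of per-dimension dicts into one flat row of scores."""
    row = {}
    for entry in personalities:
        for dimension, names in facets.items():
            if dimension not in entry:
                continue
            payload = entry[dimension]
            row[dimension] = payload["result"]
            # Pair each trait with its name by position, as the loader does.
            for name, trait in zip(names, payload["traits"]):
                row[name] = trait["result"]
    return row


# Synthetic record (made-up numbers) shaped like the assumed layout.
sample = [
    {"openness": {"result": 70.0,
                  "traits": [{"result": float(i)} for i in range(6)]}},
    {"conscientiousness": {"result": 40.0,
                           "traits": [{"result": float(10 + i)} for i in range(6)]}},
]
row = flatten_personalities(sample)
print(row["openness"], row["liberalism"], row["self_efficacy"])
```

Inside the loop, `dataframes.append(flatten_personalities(personalities))` would then stand in for the hand-written `user` dict, under the same layout assumption.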

In [95]:
results_df = pd.DataFrame(dataframes)

results_df

| | openness | imagination | artistic_interests | emotionality | adventurousness | intellect | liberalism | conscientiousness | self_efficacy | orderliness | ... | achievement_striving | self_discipline | cautiousness | extraversion | friendliness | gregariousness | assertiveness | activity_level | excitement_seeking | cheerfulness |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 76.867932 | 78.453399 | 74.823980 | 78.170694 | 73.910109 | 79.290641 | 76.558766 | 33.919726 | 36.822471 | 29.552695 | ... | 37.022312 | 35.415906 | 36.469088 | 31.005555 | 34.145332 | 33.524949 | 30.231250 | 27.841278 | 30.599065 | 29.691456 |
| 1 | 82.566448 | 81.832274 | 82.066499 | 84.867097 | 80.898725 | 84.486025 | 81.248069 | 30.908253 | 33.552268 | 27.594440 | ... | 33.975709 | 33.116611 | 32.536689 | 31.302416 | 32.855867 | 34.029079 | 30.661172 | 28.183777 | 31.457377 | 30.627226 |
| 2 | 72.663569 | 73.002011 | 71.464401 | 75.057528 | 70.608785 | 75.261979 | 70.586713 | 47.463825 | 50.384282 | 42.918353 | ... | 50.878324 | 50.255349 | 49.159172 | 49.281543 | 52.350520 | 53.535906 | 47.905736 | 45.235120 | 50.862220 | 45.799756 |
| 3 | 62.405338 | 63.151641 | 61.569786 | 65.172355 | 58.827372 | 66.151626 | 59.559246 | 57.824276 | 60.664552 | 52.472703 | ... | 62.836137 | 60.708083 | 59.287646 | 32.103571 | 35.530701 | 36.828828 | 30.018707 | 27.983467 | 31.815237 | 30.444488 |
| 4 | 82.818649 | 82.371376 | 81.625064 | 85.189129 | 81.841138 | 84.361286 | 81.523899 | 29.499466 | 31.714872 | 26.652062 | ... | 32.084479 | 31.439832 | 30.863546 | 39.335351 | 38.093221 | 42.436059 | 38.876306 | 36.462907 | 40.470732 | 39.672883 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 95 | 57.036765 | 58.495051 | 54.210771 | 59.160418 | 54.470978 | 58.206923 | 57.676447 | 59.040431 | 60.912610 | 55.361771 | ... | 61.160568 | 61.276517 | 59.440355 | 37.819546 | 40.079751 | 41.590261 | 34.913822 | 34.915557 | 38.036675 | 37.381211 |
| 96 | 77.874097 | 76.319039 | 76.796174 | 81.284135 | 76.927946 | 80.864864 | 75.052423 | 24.563095 | 26.688072 | 21.779463 | ... | 26.783135 | 25.570844 | 27.484639 | 35.354942 | 36.014073 | 40.230152 | 34.833233 | 31.115996 | 35.370559 | 34.565638 |
| 97 | 79.680753 | 79.417467 | 79.494418 | 82.035602 | 78.192715 | 81.938686 | 77.005628 | 39.671439 | 41.344428 | 36.344666 | ... | 42.569539 | 41.933488 | 40.602382 | 38.892887 | 42.095110 | 42.849850 | 37.022000 | 35.017532 | 39.999401 | 36.373430 |
| 98 | 65.406902 | 67.465344 | 61.738342 | 66.998342 | 63.990237 | 66.870872 | 65.378278 | 64.466656 | 66.024035 | 60.468021 | ... | 67.382593 | 66.832179 | 65.717243 | 35.817304 | 40.267482 | 39.676385 | 33.208470 | 31.821715 | 36.050737 | 33.879032 |


## Most Used Words 🗣️

In [None]:
wordcloud = WordCloud(
    width=800, height=400, max_words=100, background_color="white"
).generate(U.remove_stop_words_from_essay(df=df, download_data=False))

plt.figure(figsize=(10, 5))
plt.imshow(wordcloud, interpolation="bilinear")
plt.axis("off")
plt.title("Top 100 Most Used Words")
plt.show()
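
The helper `U.remove_stop_words_from_essay` is not shown in this notebook, so its behavior is not documented here. As a rough illustration of the idea behind the cloud, the sketch below counts words after dropping stop words using only the standard library. The stop-word set is a tiny illustrative sample and `top_words` is a hypothetical name; neither reflects what the `utility` module actually does.

```python
import re
from collections import Counter

# Illustrative sample only; a real Portuguese stop-word list is much longer.
STOP_WORDS = {"a", "o", "e", "de", "que", "do", "da", "em", "um", "para", "as", "os"}


def top_words(texts, stop_words=STOP_WORDS, n=100):
    """Return the n most frequent non-stop words across all texts."""
    counts = Counter()
    for text in texts:
        # Runs of letters only; accented characters are kept.
        tokens = re.findall(r"[^\W\d_]+", text.lower())
        counts.update(t for t in tokens if t not in stop_words)
    return counts.most_common(n)


essays = ["a chuva e as enchentes", "as enchentes causam prejuízos"]
top = top_words(essays, n=3)
print(top)
```

A word-to-count mapping such as `dict(top)` can be passed to `WordCloud.generate_from_frequencies` when the counts are computed ahead of time.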

## 1. Big-Five Dimensions 🎭

In [None]:
big5_acronym = ["O", "C", "E", "A", "N"]

correlation_matrix = df[
    ["openness", "conscientiousness", "extraversion", "agreeableness", "neuroticism"]
].corr()
ax = sns.heatmap(correlation_matrix, annot=True, cmap="coolwarm", fmt=".2f")

ax.set_xticklabels(big5_acronym)
ax.set_yticklabels(big5_acronym)

plt.title("Big-Five Correlation", fontsize=12)
plt.tight_layout()
plt.show()
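
By default, `DataFrame.corr()` computes the Pearson coefficient for each pair of columns. The sketch below spells that calculation out for a single pair, using the first five `openness` and `conscientiousness` values from the `results_df` output shown earlier. Five rows are far too few for any conclusion; they only make the arithmetic concrete. `pearson` is a hypothetical helper name.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# First five rows of the results_df output.
openness = [76.867932, 82.566448, 72.663569, 62.405338, 82.818649]
conscientiousness = [33.919726, 30.908253, 47.463825, 57.824276, 29.499466]

r = pearson(openness, conscientiousness)
print(f"{r:.2f}")
```

For these five rows the coefficient is strongly negative: the essays that score higher on openness score lower on conscientiousness.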

In [None]:
big5_dimensions = [
    "openness",
    "conscientiousness",
    "extraversion",
    "agreeableness",
    "neuroticism",
]

U.plot_eda_boxplot(
    df=df,
    targets=big5_dimensions,
    ticktext=[x.capitalize() for x in big5_dimensions],
    title="Big-Five Dimensions",
    color=1,
)
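
A box plot typically summarizes each column by its quartiles and range. The sketch below computes that five-number summary with the standard library for the first five `extraversion` values in the `results_df` output. `five_number_summary` is a hypothetical helper, and the quartile method used by `U.plot_eda_boxplot` may differ from the `inclusive` method assumed here.

```python
import statistics


def five_number_summary(values):
    """Minimum, quartiles and maximum of a sequence of numbers."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
    }


# First five extraversion scores from the results_df output.
extraversion = [31.005555, 31.302416, 49.281543, 32.103571, 39.335351]
summary = five_number_summary(extraversion)
print(summary)
```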

In [None]:
U.plot_eda_radar(
    df=df, targets=big5_dimensions, title="Big-Five Personality Comparison"
)

## 2. Big-Five Openness & Facets 🧑‍🎨

In [None]:
openness_facets = [
    "facet_imagination",
    "facet_artistic_interests",
    "facet_emotionality",
    "facet_adventurousness",
    "facet_intellect",
    "facet_liberalism",
]

title = [x.replace("facet_", "").replace("_", "-") for x in openness_facets]

U.plot_eda_boxplot(
    df=df,
    targets=openness_facets,
    ticktext=[x.capitalize() for x in title],
    title="Openness Facets",
    color=2,
)
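
The tick labels for each facet chart are built with the same pattern: strip the `facet_` prefix, swap underscores for hyphens, and capitalize. A small helper could capture that once. `facet_label` is a hypothetical name used only for this sketch.

```python
def facet_label(column):
    """Turn a column name such as 'facet_artistic_interests' into a tick label."""
    return column.replace("facet_", "").replace("_", "-").capitalize()


labels = [facet_label(c) for c in ["facet_imagination", "facet_artistic_interests"]]
print(labels)  # ['Imagination', 'Artistic-interests']
```

Note that `str.capitalize` lowercases everything after the first character, so multi-word facets come out as `Artistic-interests` rather than `Artistic-Interests`.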

In [None]:
U.plot_eda_radar(df=df, targets=openness_facets, title="Openness Facets Comparison")

## 3. Big-Five Conscientiousness & Facets 🧑‍🔬

In [None]:
conscientiousness_facets = [
    "facet_self_efficacy",
    "facet_orderliness",
    "facet_dutifulness",
    "facet_achievement_striving",
    "facet_self_discipline",
    "facet_cautiousness",
]

title = [x.replace("facet_", "").replace("_", "-") for x in conscientiousness_facets]

U.plot_eda_boxplot(
    df=df,
    targets=conscientiousness_facets,
    ticktext=[x.capitalize() for x in title],
    title="Conscientiousness Facets",
    color=3,
)

In [None]:
U.plot_eda_radar(
    df=df, targets=conscientiousness_facets, title="Conscientiousness Facets Comparison"
)

## 4. Big-Five Extraversion & Facets 🕺💃

In [None]:
extraversion_facets = [
    "facet_friendliness",
    "facet_gregariousness",
    "facet_assertiveness",
    "facet_activity_level",
    "facet_excitement_seeking",
    "facet_cheerfulness",
]

title = [x.replace("facet_", "").replace("_", "-") for x in extraversion_facets]

U.plot_eda_boxplot(
    df=df,
    targets=extraversion_facets,
    ticktext=[x.capitalize() for x in title],
    title="Extraversion Facets",
    color=4,
)

In [None]:
U.plot_eda_radar(
    df=df, targets=extraversion_facets, title="Extraversion Facets Comparison"
)

## 5. Big-Five Agreeableness & Facets 🙋🧡

In [None]:
agreeableness_facets = [
    "facet_trust",
    "facet_morality",
    "facet_altruism",
    "facet_cooperation",
    "facet_modesty",
    "facet_sympathy",
]

title = [x.replace("facet_", "") for x in agreeableness_facets]

U.plot_eda_boxplot(
    df=df,
    targets=agreeableness_facets,
    ticktext=[x.capitalize() for x in title],
    title="Agreeableness Facets",
    color=5,
)

In [None]:
U.plot_eda_radar(
    df=df, targets=agreeableness_facets, title="Agreeableness Facets Comparison"
)

## 6. Big-Five Neuroticism & Facets 🙅💢

In [None]:
neuroticism_facets = [
    "facet_anxiety",
    "facet_anger",
    "facet_depression",
    "facet_self_consciousness",
    "facet_immoderation",
    "facet_vulnerability",
]

title = [x.replace("facet_", "").replace("_", "-") for x in neuroticism_facets]

U.plot_eda_boxplot(
    df=df,
    targets=neuroticism_facets,
    ticktext=[x.capitalize() for x in title],
    title="Neuroticism Facets",
    color=6,
)