# Classification of a character's personality trait (OCEAN) when speaking about another character, using unsupervised techniques

**Openness** - Describes an individual's openness to experience. A high score on this trait is valuable in a fast-growing company, where change and ambiguity are common. Mid scorers also tend to do well, as they are usually open-minded but also practical and level-headed. Pay attention to the very low end of the spectrum: individuals who score low may resist change and can hinder innovation and progress.

**Conscientiousness** - The degree to which a person is characterized by dependability, efficiency, and purposeful action. This dimension is a good predictor of successful individual performance in the workplace.

**Extraversion** - Refers to a person's comfort level with his or her environment. A person high in extraversion is usually comfortable talking with new people, likes to look at the big picture, and tends to be a successful influencer. This trait is often seen in CEOs and entrepreneurs.

**Agreeableness** - Measures how well a person gets along with others, including how cooperative or competitive they are. People who score high on this trait are empathetic and work well in a team. Highly agreeable individuals may thrive in roles that involve counseling, social work, and leadership. On the negative side, a highly agreeable person may conform to the group, perhaps to avoid disagreements or to fit in. A mid scorer is more likely to bring up hard topics and to be assertive in situations that require decision-making.

**Neuroticism** - Measures a person's emotional stability (a high score indicates low stability). High neuroticism can lead to issues in the workplace, but a high score does not mean an individual should be disregarded completely. The concern is more about the type of job he or she will be performing. For example, a person who scores high in neuroticism may not do well as a server in a busy restaurant but may thrive in a quiet, slower-paced setting such as a library. Some people who get stressed easily handle their stress well and use it as motivation to get their tasks done.


In [None]:
```python
import pandas as pd

# Load the per-pair emotion and sentiment scores (Colab path)
df = pd.read_csv("/content/Final.csv")
```

In [None]:
```python
df
```

| Unnamed: 0.1 | Unnamed: 0 | characters | Anger | Happy | Surprise | Sad | Fear | Positive | Negative | Neutral | Compound | Jaccard_Similarity |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | ['Dumbledore', 'McGonagall'] | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.555556 |
| 1 | 1 | ['McGonagall', 'Dumbledore'] | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.492 | 0.565 | 1.943 | 0.3632 | 0.555556 |
| 2 | 2 | ['Dumbledore', 'Hagrid'] | 0.0 | 0.5 | 0.0 | 0.0 | 1.5 | 1.44 | 0.372 | 2.188 | 1.581 | 0.454545 |
| 3 | 3 | ['McGonagall', 'Hagrid'] | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.454545 |
| 4 | 4 | ['Hagrid', 'Dumbledore'] | 1.0 | 1.0 | 0.0 | 2.5 | 0.5 | 1.417 | 0.0 | 7.583 | 1.5167 | 0.454545 |
| 5 | 5 | ['Hagrid', 'McGonagall'] | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.454545 |
| 6 | 6 | ['Dumbledore', 'Harry'] | 1.0 | 0.0 | 1.5 | 1.0 | 1.5 | 0.0 | 1.217 | 6.783 | -1.0446 | 0.428571 |
| 7 | 7 | ['Harry', 'Petunia'] | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.142857 |
| 8 | 8 | ['Petunia', 'Dudley'] | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.649 | 0.0 | 0.351 | 0.6765 | 0.4 |
| 9 | 9 | ['Harry', 'Vernon'] | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.688 | 0.312 | -0.296 | 0.142857 |
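The `Unnamed: ...` columns look like row indices that were written out when `Final.csv` was saved (their values simply count 0, 1, 2, ...). The clustering step drops only `characters`, so these columns are min-max scaled and clustered alongside the emotion and sentiment scores; in the scaled `clusters` table at the end, `Unnamed: 0` rises in steps of 1/58 (0.017241, 0.034483, ...), which suggests row position is acting as a feature. A minimal sketch of keeping only the ten score columns, shown on the first three rows of the table above (the `feature_cols` filter is an assumption based on the column names in this output, not the notebook's own code):

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# First three rows of the table above, including the index-like columns
df = pd.DataFrame({
    "Unnamed: 0.1": [0, 1, 2],
    "Unnamed: 0": [0, 1, 2],
    "characters": ["['Dumbledore', 'McGonagall']",
                   "['McGonagall', 'Dumbledore']",
                   "['Dumbledore', 'Hagrid']"],
    "Anger": [0.0, 0.0, 0.0],
    "Happy": [0.0, 0.0, 0.5],
    "Surprise": [0.0, 0.0, 0.0],
    "Sad": [0.0, 1.0, 0.0],
    "Fear": [0.0, 0.0, 1.5],
    "Positive": [0.0, 0.492, 1.44],
    "Negative": [0.0, 0.565, 0.372],
    "Neutral": [1.0, 1.943, 2.188],
    "Compound": [0.0, 0.3632, 1.581],
    "Jaccard_Similarity": [0.555556, 0.555556, 0.454545],
})

# Keep every column except the text column and the "Unnamed: ..." index columns
feature_cols = [c for c in df.columns
                if c != "characters" and not c.startswith("Unnamed")]

# Rescale each remaining feature to [0, 1]; constant columns (e.g. Anger here) become 0
X = MinMaxScaler().fit_transform(df[feature_cols])

print(feature_cols)
print(X.shape)
```

On the full file, the same `feature_cols` list could replace `df.drop("characters", axis=1)` before scaling, at the cost of the clusters (and the labels printed later) changing.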


In [None]:

```python
import numpy as np
import plotly.graph_objects as go
from sklearn.cluster import KMeans
from sklearn.preprocessing import MinMaxScaler

# Feature matrix: every column except the character pair, rescaled to [0, 1]
X = df.drop("characters", axis=1)
X = MinMaxScaler().fit_transform(X)

# Elbow method: fit k-means for k = 1..10 and record each fit's inertia
# (sum of squared distances from points to their cluster centre)
inertia = []
for k in range(1, 11):
    kmeans = KMeans(n_clusters=k, init="k-means++", n_init=10,
                    tol=1e-04, random_state=42)
    kmeans.fit(X)
    inertia.append(kmeans.inertia_)

fig = go.Figure(data=go.Scatter(x=np.arange(1, 11), y=inertia))
fig.update_layout(
    title="Inertia vs Cluster Number",
    xaxis=dict(range=[0, 11], title="Cluster Number"),
    yaxis=dict(title="Inertia"),
    # Hand-placed marker at k = 3
    annotations=[dict(x=3, y=inertia[2], xref="x", yref="y", text="Elbow!",
                      showarrow=True, arrowhead=7, ax=20, ay=-40)],
)
fig.show()
```
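The "Elbow!" annotation is placed by hand at k = 3, while the next cell fits five clusters, presumably one per OCEAN trait. When the inertia curve has no sharp bend, the silhouette score (`sklearn.metrics.silhouette_score`) is a common second check: it compares each point's mean distance to its own cluster with its mean distance to the nearest other cluster, and higher is better. A minimal sketch on synthetic `make_blobs` data with three well-separated groups, standing in (as an assumption) for the scaled feature matrix `X`:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic stand-in for the scaled features: three tight, well-separated blobs
X, _ = make_blobs(n_samples=150, centers=3, cluster_std=0.5, random_state=42)

scores = {}
for k in range(2, 8):  # the silhouette score is undefined for a single cluster
    labels = KMeans(n_clusters=k, init="k-means++", n_init=10,
                    random_state=42).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

# The k with the highest mean silhouette is the best-separated partition
best_k = max(scores, key=scores.get)
print({k: round(s, 3) for k, s in scores.items()})
print("best k:", best_k)
```

Running the same loop on the notebook's real `X` would show how k = 5 compares with the hand-picked elbow.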

In [None]:
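```python
import plotly.express as px

# Fit the final model with five clusters
kmeans = KMeans(n_clusters=5, init="k-means++", n_init=10,
                tol=1e-04, random_state=42)
kmeans.fit(X)

# Attach each row's cluster label to its scaled features
clusters = pd.DataFrame(X, columns=df.drop("characters", axis=1).columns)
clusters["label"] = kmeans.labels_

# Mean of each scaled feature per cluster, reshaped to long format
# (one row per label/feature pair) for the radar chart
polar = clusters.groupby("label").mean().reset_index()
polar = pd.melt(polar, id_vars=["label"])
fig4 = px.line_polar(polar, r="value", theta="variable", color="label",
                     line_close=True, height=800, width=1400)
fig4.show()
```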


The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
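`px.line_polar` draws one closed line per cluster and needs long-format data, which is why the cell averages each cluster and then calls `pd.melt`. A small sketch of that reshaping, using two scaled features from three rows of the `clusters` table at the end (rows 0 and 1 carry label 0 and row 6 carries label 4 in the `kmeans.labels_` output):

```python
import pandas as pd

# Scaled Anger/Sad values for rows 0, 1 and 6, with their numeric cluster labels
clusters = pd.DataFrame({
    "Anger": [0.0, 0.0, 0.307692],
    "Sad": [0.0, 0.176678, 0.176678],
    "label": [0, 0, 4],
})

# Wide: one row per cluster holding the mean of each feature (the cluster profile)
polar = clusters.groupby("label").mean().reset_index()

# Long: one row per (label, feature) pair, with columns label / variable / value
polar = pd.melt(polar, id_vars=["label"])
print(polar)
```

Each distinct `label` becomes one line in the chart, with `variable` giving the spoke and `value` the radius.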





In [None]:

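```python
# Number of rows assigned to each cluster
pie = clusters.groupby("label").size().reset_index(name="value")

# color_discrete_sequence sets the slice colours; the `color` argument expects
# a column (or array of category values), not colour names
px.pie(pie, values="value", names="label",
       color_discrete_sequence=["aqua", "pink", "green", "orange", "yellow"])
```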

In [None]:
```python
kmeans.labels_
```

array([0, 0, 0, 0, 0, 0, 4, 0, 0, 0, 0, 0, 0, 0, 0, 3, 3, 0, 0, 0, 2, 4,
       0, 2, 0, 0, 2, 0, 0, 0, 0, 2, 2, 1, 1, 1, 4, 2, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 4, 4, 1, 1, 1, 1, 1, 1], dtype=int32)

## Labels based on clustering score

*   Openness = 0-0.15 (Label=0)
*   Conscientiousness = 0.85-1.0 (Label=1)
*   Neuroticism = 0.35-0.55 (Label=2)
*   Extraversion = 0.2-0.35 (Label=3)
*   Agreeableness = 0.55-0.85 (Label=4)
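Applying this mapping to the `kmeans.labels_` array printed above gives the size of each trait group, i.e. the numbers behind the pie chart. A small sketch; the `TRAITS` dictionary is a hypothetical name that simply restates the list above:

```python
import numpy as np
import pandas as pd

# kmeans.labels_ as printed above (59 rows)
labels = np.array([
    0, 0, 0, 0, 0, 0, 4, 0, 0, 0, 0, 0, 0, 0, 0, 3, 3, 0, 0, 0, 2, 4,
    0, 2, 0, 0, 2, 0, 0, 0, 0, 2, 2, 1, 1, 1, 4, 2, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1, 1, 4, 4, 1, 1, 1, 1, 1, 1,
])

# Cluster id -> OCEAN trait, following the list above
TRAITS = {0: "Openness", 1: "Conscientiousness", 2: "Neuroticism",
          3: "Extraversion", 4: "Agreeableness"}

# Translate ids to trait names and count rows per trait (largest first)
counts = pd.Series(labels).map(TRAITS).value_counts()
print(counts)
```

With the array above this gives Openness 24, Conscientiousness 22, Neuroticism 6, Agreeableness 5 and Extraversion 2 rows, so two clusters hold most of the data.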


In [None]:
```python
df
```

In [None]:
```python
# Replace numeric cluster ids with trait names (see the list above).
# Series.map assigns the whole column at once, avoiding chained assignment
# and pandas' SettingWithCopyWarning
trait_names = {0: "Openness", 1: "Conscientiousness", 2: "Neuroticism",
               3: "Extraversion", 4: "Agreeableness"}
clusters["label"] = clusters["label"].map(trait_names)
```



In [None]:
```python
# Re-attach the character pair to each clustered row (aligned on the index)
clusters["characters"] = df["characters"]
```

In [None]:
```python
clusters
```

| Unnamed: 0.1 | Unnamed: 0 | Anger | Happy | Surprise | Sad | Fear | Positive | Negative | Neutral | Compound | Jaccard_Similarity | label | characters |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.028103 | 0.427084 | 0.813333 | Openness | ['Dumbledore', 'McGonagall'] |
| 1 | 0.017241 | 0.0 | 0.0 | 0.0 | 0.176678 | 0.0 | 0.140813 | 0.32084 | 0.066623 | 0.515214 | 0.813333 | Openness | ['McGonagall', 'Dumbledore'] |
| 2 | 0.034483 | 0.0 | 0.142857 | 0.0 | 0.0 | 0.428571 | 0.412135 | 0.211244 | 0.076631 | 0.81071 | 0.643636 | Openness | ['Dumbledore', 'Hagrid'] |
| 3 | 0.051724 | 0.0 | 0.0 | 0.0 | 0.0 | 0.285714 | 0.0 | 0.0 | 0.028103 | 0.427084 | 0.643636 | Openness | ['McGonagall', 'Hagrid'] |
| 4 | 0.068966 | 0.307692 | 0.285714 | 0.0 | 0.441696 | 0.142857 | 0.405552 | 0.0 | 0.297006 | 0.795108 | 0.643636 | Openness | ['Hagrid', 'Dumbledore'] |
| 5 | 0.086207 | 0.0 | 0.0 | 0.0 | 0.176678 | 0.0 | 0.0 | 0.0 | 0.028103 | 0.427084 | 0.643636 | Openness | ['Hagrid', 'McGonagall'] |
| 6 | 0.103448 | 0.307692 | 0.0 | 0.45045 | 0.176678 | 0.428571 | 0.0 | 0.691085 | 0.264327 | 0.173614 | 0.6 | Agreeableness | ['Dumbledore', 'Harry'] |
| 7 | 0.12069 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.028103 | 0.427084 | 0.12 | Openness | ['Harry', 'Petunia'] |
| 8 | 0.137931 | 0.0 | 0.0 | 0.0 | 0.0 | 0.285714 | 0.185747 | 0.0 | 0.001593 | 0.591236 | 0.552 | Openness | ['Petunia', 'Dudley'] |
| 9 | 0.155172 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.390687 | 0.0 | 0.355261 | 0.12 | Openness | ['Harry', 'Vernon'] |
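After `pd.read_csv`, the `characters` column holds strings such as `"['Dumbledore', 'McGonagall']"` rather than Python lists, so it has to be parsed before grouping by character. A minimal sketch using `ast.literal_eval` on the ten rows shown above, tallying trait labels per character; treating the first name in each pair as the speaker is an assumption based on the notebook's title:

```python
import ast

import pandas as pd

# The ten rows shown above: character pair (string, as stored in the CSV) and trait
clusters = pd.DataFrame({
    "characters": [
        "['Dumbledore', 'McGonagall']", "['McGonagall', 'Dumbledore']",
        "['Dumbledore', 'Hagrid']", "['McGonagall', 'Hagrid']",
        "['Hagrid', 'Dumbledore']", "['Hagrid', 'McGonagall']",
        "['Dumbledore', 'Harry']", "['Harry', 'Petunia']",
        "['Petunia', 'Dudley']", "['Harry', 'Vernon']",
    ],
    "label": ["Openness"] * 6 + ["Agreeableness"] + ["Openness"] * 3,
})

# Safely turn "['A', 'B']" into ['A', 'B'], then split the pair
pairs = clusters["characters"].apply(ast.literal_eval)
clusters["speaker"] = pairs.str[0]    # assumption: first name is the speaker
clusters["addressee"] = pairs.str[1]

# Rows per (assumed) speaker and trait label
trait_counts = pd.crosstab(clusters["speaker"], clusters["label"])
print(trait_counts)
```

`ast.literal_eval` only evaluates Python literals, so it is a safer choice than `eval` for parsing these strings.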
