# ***Learner Personality Traits And Their Department***

This project deals with learners' personality scores, in relation to the department they applied for. The aim is to draw insights from `personality_scores.csv` and `departments.csv`. These insights should help determine which learners are well suited for their chosen department, and which are not, based on their personality.

## Import dependencies:

In [1]:
import pandas as pd
import ast

## Load the data:

In [2]:
personality_df = pd.read_csv("../data/personality_scores.csv", delimiter=";")
loaded_departments = pd.read_csv("../data/departments.csv", delimiter=";")

## Manipulate the data:

In [3]:
pd.set_option("display.max_columns", None)

Setting `display.max_columns` to `None` removes the limit on displayed columns, so every column of a DataFrame is shown instead of being truncated.

In [4]:
personality_df.head()

Unnamed: 0,ID,Section 5 of 6 [I am always prepared.],Section 5 of 6 [I am easily disturbed.],Section 5 of 6 [I am exacting (demanding) in my work.],Section 5 of 6 [I am full of ideas.],Section 5 of 6 [I am interested in people.],Section 5 of 6 [I am not interested in abstract ideas.],Section 5 of 6 [I am not interested in other people's problems.],Section 5 of 6 [I am not really interested in others.],Section 5 of 6 [I am quick to understand things.],Section 5 of 6 [I am quiet around strangers.],Section 5 of 6 [I am relaxed most of the time.],Section 5 of 6 [I am the life of the party.],Section 5 of 6 [I change my mood a lot.],Section 5 of 6 [I do not have a good imagination.],Section 5 of 6 [I don't like to draw attention to myself.],Section 5 of 6 [I don't mind being the center of attention.],Section 5 of 6 [I don't talk a lot.],Section 5 of 6 [I feel comfortable around people.],Section 5 of 6 [I feel little concern for others.],Section 5 of 6 [I feel others' emotions.],Section 5 of 6 [I follow a schedule.],Section 5 of 6 [I get chores done right away.],Section 5 of 6 [I get irritated easily.],Section 5 of 6 [I get stressed out easily.],Section 5 of 6 [I get upset easily.],Section 5 of 6 [I have a rich vocabulary.],Section 5 of 6 [I have a soft (kind) heart.],Section 5 of 6 [I have a vivid imagination.],Section 5 of 6 [I have difficulty understanding abstract ideas.],Section 5 of 6 [I have excellent ideas.],Section 5 of 6 [I have frequent mood swings.],Section 5 of 6 [I have little to say.],Section 5 of 6 [I insult people.],Section 5 of 6 [I keep in the background.],Section 5 of 6 [I leave my belongings lying around.],Section 5 of 6 [I like order.],Section 5 of 6 [I make a mess of things.],Section 5 of 6 [I make people feel at ease.],Section 5 of 6 [I neglect my duties.],Section 5 of 6 [I often feel blue (down).],Section 5 of 6 [I often forget to put things back in their proper place],Section 5 of 6 [I pay attention to details.],Section 5 of 6 [I seldom feel blue (down).],Section 5 of 6 [I spend time reflecting on things.],Section 5 of 6 [I start conversations.],Section 5 of 6 [I sympathize with others' feelings.],Section 5 of 6 [I take time out for others.],Section 5 of 6 [I talk to a lot of different people at parties.],Section 5 of 6 [I use difficult words.],Section 5 of 6 [I worry about things.],Unnamed: 51,Unnamed: 52,Unnamed: 53,Unnamed: 54,Unnamed: 55,Unnamed: 56,Unnamed: 57,Unnamed: 58,Unnamed: 59,Unnamed: 60,Unnamed: 61,Unnamed: 62,Unnamed: 63,Unnamed: 64,Unnamed: 65,Unnamed: 66,Unnamed: 67,Unnamed: 68,IPIP_HIGH_RISK
0,0,"(3, 5)","(4, 5)","(3, 5)","(5, 5)","(2, 3)","(5, 3)","(2, 3)","(2, 5)","(5, 5)","(1, 3)","(4, 3)","(1, 3)","(4, 5)","(5, 5)","(1, 3)","(1, 1)","(1, 3)","(1, 5)","(2, 1)","(2, 3)","(3, 5)","(3, 5)","(4, 3)","(4, 3)","(4, 3)","(5, 5)","(2, 5)","(5, 3)","(5, 5)","(5, 5)","(4, 5)","(1, 3)","(2, 5)","(1, 3)","(3, 5)","(3, 5)","(3, 3)","(2, 5)","(3, 5)","(4, 3)","(3, 5)","(3, 5)","(4, 3)","(5, 5)","(1, 3)","(2, 5)","(2, 5)","(1, 3)","(5, 1)","(4, 3)",,,,,,,,,,,,,,,,,,,
1,1,"(3, 5)","(4, 5)","(3, 5)","(5, 5)","(2, 5)","(5, 3)","(2, 5)","(2, 5)","(5, 5)","(1, 3)","(4, 5)","(1, 3)","(4, 5)","(5, 5)","(1, 3)","(1, 3)","(1, 5)","(1, 5)","(2, 1)","(2, 5)","(3, 5)","(3, 5)","(4, 5)","(4, 3)","(4, 3)","(5, 3)","(2, 5)","(5, 3)","(5, 5)","(5, 5)","(4, 5)","(1, 5)","(2, 5)","(1, 5)","(3, 5)","(3, 5)","(3, 5)","(2, 5)","(3, 5)","(4, 5)","(3, 5)","(3, 1)","(4, 1)","(5, 5)","(1, 5)","(2, 5)","(2, 5)","(1, 5)","(5, 3)","(4, 3)",,,,,,,,,,,,,,,,,,,
2,2,"(3, 5)","(4, 3)","(3, 3)","(5, 5)","(2, 5)","(5, 5)","(2, 5)","(2, 5)","(5, 5)","(1, 1)","(4, 5)","(1, 3)","(4, 5)","(5, 5)","(1, 3)","(1, 1)","(1, 3)","(1, 5)","(2, 1)","(2, 3)","(3, 5)","(3, 5)","(4, 3)","(4, 3)","(4, 5)","(5, 3)","(2, 3)","(5, 5)","(5, 5)","(5, 5)","(4, 5)","(1, 3)","(2, 5)","(1, 3)","(3, 1)","(3, 3)","(3, 3)","(2, 3)","(3, 5)","(4, 5)","(3, 5)","(3, 5)","(4, 1)","(5, 3)","(1, 3)","(2, 5)","(2, 5)","(1, 3)","(5, 1)","(4, 3)",,,,,,,,,,,,,,,,,,,
3,3,"(3, 5)","(4, 5)","(3, 3)","(5, 5)","(2, 5)","(5, 3)","(2, 3)","(2, 3)","(5, 3)","(1, 3)","(4, 3)","(1, 1)","(4, 5)","(5, 5)","(1, 1)","(1, 3)","(1, 3)","(1, 5)","(2, 1)","(2, 3)","(3, 5)","(3, 5)","(4, 5)","(4, 5)","(4, 5)","(5, 3)","(2, 3)","(5, 5)","(5, 3)","(5, 5)","(4, 5)","(1, 3)","(2, 5)","(1, 1)","(3, 1)","(3, 5)","(3, 3)","(2, 5)","(3, 5)","(4, 5)","(3, 1)","(3, 5)","(4, 1)","(5, 5)","(1, 5)","(2, 5)","(2, 5)","(1, 5)","(5, 1)","(4, 1)",,,,,,,,,,,,,,,,,,,
4,4,"(3, 3)","(4, 5)","(3, 3)","(5, 3)","(2, 3)","(5, 3)","(2, 3)","(2, 3)","(5, 5)","(1, 1)","(4, 3)","(1, 3)","(4, 3)","(5, 5)","(1, 3)","(1, 1)","(1, 5)","(1, 3)","(2, 1)","(2, 3)","(3, 5)","(3, 5)","(4, 3)","(4, 5)","(4, 3)","(5, 3)","(2, 5)","(5, 3)","(5, 5)","(5, 3)","(4, 3)","(1, 3)","(2, 5)","(1, 3)","(3, 5)","(3, 5)","(3, 5)","(2, 3)","(3, 5)","(4, 5)","(3, 5)","(3, 5)","(4, 5)","(5, 5)","(1, 3)","(2, 3)","(2, 5)","(1, 3)","(5, 1)","(4, 3)",,,,,,,,,,,,,,,,,,,


In [5]:
personality_df.shape

(1558, 70)

`personality_df` currently has 1558 rows and 70 columns.

In [6]:
def assert_sequential_indexes(df):
    assert (
        df.index == pd.RangeIndex(len(df))
    ).all(), f"Expected: {True} meaning the DataFrame indexes are sequential, but got: {(df.index == pd.RangeIndex(len(df))).all()} meaning the DataFrame indexes are not sequential."

In [7]:
assert_sequential_indexes(personality_df)

Prior to any data manipulation, `personality_df` indexes are sequential.

In [8]:
personality_df_initial_rows = personality_df.shape[0]
personality_df_initial_columns = personality_df.shape[1]

The initial number of `personality_df` rows and columns are recorded for later DataFrame dimension assertion. This makes it easier to validate whether the dimensions of `personality_df` will meet expectations after data manipulation.

In [9]:
def assert_duplicate_rows(df):
    assert df.duplicated(
        subset="ID"
    ).any(), f"Expected: {True} for any duplicate rows, but got: {df.duplicated(subset='ID').any()}, which means the DataFrame has no duplicate rows"

In [10]:
assert_duplicate_rows(personality_df)

Therefore there are duplicate rows.

In [11]:
personality_df_duplicate_rows = personality_df[
    personality_df.duplicated(subset="ID", keep="last")
]
personality_df_duplicate_rows_count = len(personality_df_duplicate_rows)

In [12]:
f"Number of duplicate rows in personality_df: {personality_df_duplicate_rows_count}"

'Number of duplicate rows in personality_df: 3'

In [13]:
personality_df = personality_df.drop(personality_df_duplicate_rows.index)

Three duplicates from `personality_df` have been removed. Therefore there should be 1555 rows remaining of the initial 1558 rows.
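As a minimal sketch of how `keep="last"` behaves, the toy frame below (hypothetical data, not taken from `personality_scores.csv`) flags every occurrence of a repeated `ID` except the final one, so dropping the flagged rows retains the last row per `ID`.

```python
import pandas as pd

# Hypothetical toy data: ID 1 appears twice.
toy_df = pd.DataFrame({"ID": [0, 1, 1, 2], "answer": ["a", "b", "c", "d"]})

# keep="last" marks earlier occurrences as duplicates and leaves the last one unmarked.
toy_duplicates = toy_df[toy_df.duplicated(subset="ID", keep="last")]

# Dropping the flagged rows keeps one row per ID (the last occurrence).
toy_deduplicated = toy_df.drop(toy_duplicates.index)
```

Under the same assumptions, `toy_df.drop_duplicates(subset="ID", keep="last")` should give the same result in a single call.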

In [14]:
def assert_df_dimensions_equal(df, rows, columns):
    assert df.shape == (
        rows,
        columns,
    ), f"Expected the following DataFrame dimensions: {(rows, columns)}, but got: {(df.shape[0], df.shape[1])}."

In [15]:
assert_df_dimensions_equal(
    personality_df,
    personality_df_initial_rows - personality_df_duplicate_rows_count,
    personality_df_initial_columns,
)

Since duplicate rows were removed, the indexes of `personality_df` may no longer be sequential. In other words, some consecutive rows may have index labels that differ by more than 1.

In [16]:
personality_df_indexes_removed = personality_df_duplicate_rows.index
f"Indexes of the rows removed from personality_df: {personality_df_indexes_removed}"

"Indexes of the rows removed from personality_df: Index([67, 157, 997], dtype='int64')"

- Since there are missing indexes in `personality_df` according to cell 16, because of the duplicate rows that were removed, this means that the indexes are not sequential. 

- At this stage, data is being cleaned for better use later, none of the indexes in `personality_df` are used as unique identifiers yet, so it is safe to reset indexes to make them sequential.  

In [17]:
personality_df = personality_df.reset_index(drop=True)

In [18]:
assert_sequential_indexes(personality_df)

The indexes have been reset to ensure they are sequential, which streamlines operations and enhances the usability of `personality_df`.
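The effect of dropping rows and then resetting the index can be sketched on a small, hypothetical frame. The check mirrors the comparison against `pd.RangeIndex` used in `assert_sequential_indexes`; the variable names are illustrative only.

```python
import pandas as pd

# Hypothetical toy data with a default 0..3 index.
toy_df = pd.DataFrame({"ID": [10, 11, 12, 13]})

# Dropping a row leaves a gap in the index labels: 0, 2, 3.
toy_df = toy_df.drop(index=[1])
is_sequential_before = bool((toy_df.index == pd.RangeIndex(len(toy_df))).all())

# reset_index(drop=True) relabels the rows 0..n-1 and discards the old labels.
toy_df = toy_df.reset_index(drop=True)
is_sequential_after = bool((toy_df.index == pd.RangeIndex(len(toy_df))).all())
```

Passing `drop=True` matters here: without it, the old labels would be kept as an extra `index` column.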

In [19]:
def assert_empty_columns(df):
    assert (
        df.isnull().all().any()
    ), f"Expected {True} for any empty columns, but got: {df.isnull().all().any()} which means the DataFrame has no empty columns"

In [20]:
assert_empty_columns(personality_df)

Therefore there are empty columns. Prior to removing empty columns from `personality_df`, `personality_df` has 70 columns and 1555 rows.

In [21]:
personality_df_empty_columns = len(
    personality_df.columns[personality_df.isna().all()].to_list()
)

In [22]:
f"Number of empty columns: {personality_df_empty_columns}"

'Number of empty columns: 19'

In [23]:
personality_df = personality_df.dropna(axis=1, how="all")

Out of the 70 columns that were in `personality_df`, 19 were empty and removed. Therefore there should be 51 columns remaining.
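A small sketch, on hypothetical data, of what `dropna(axis=1, how="all")` removes: only columns in which every value is missing are dropped, while columns with some missing values are kept.

```python
import pandas as pd

# Hypothetical toy data; the column names are illustrative.
toy_df = pd.DataFrame(
    {
        "ID": [0, 1, 2],
        "answer": ["(1, 3)", None, "(2, 5)"],  # partly missing: kept
        "Unnamed: 3": [None, None, None],  # entirely missing: dropped
    }
)

# Names of the columns in which every value is missing.
empty_column_names = toy_df.columns[toy_df.isna().all()].to_list()

# how="all" drops a column only when all of its values are missing.
toy_cleaned = toy_df.dropna(axis=1, how="all")
```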


In [24]:
assert_df_dimensions_equal(
    personality_df,
    personality_df_initial_rows - personality_df_duplicate_rows_count,
    personality_df_initial_columns - personality_df_empty_columns,
)

In [25]:
personality_df.shape

(1555, 51)

Therefore there are 51 `personality_df` columns remaining.

In [26]:
for character in ["Section 5 of 6 [", ".", "]"]:
    personality_df.columns = personality_df.columns.str.replace(character, "", regex=False)

personality_df.head(0)

Unnamed: 0,ID,I am always prepared,I am easily disturbed,I am exacting (demanding) in my work,I am full of ideas,I am interested in people,I am not interested in abstract ideas,I am not interested in other people's problems,I am not really interested in others,I am quick to understand things,I am quiet around strangers,I am relaxed most of the time,I am the life of the party,I change my mood a lot,I do not have a good imagination,I don't like to draw attention to myself,I don't mind being the center of attention,I don't talk a lot,I feel comfortable around people,I feel little concern for others,I feel others' emotions,I follow a schedule,I get chores done right away,I get irritated easily,I get stressed out easily,I get upset easily,I have a rich vocabulary,I have a soft (kind) heart,I have a vivid imagination,I have difficulty understanding abstract ideas,I have excellent ideas,I have frequent mood swings,I have little to say,I insult people,I keep in the background,I leave my belongings lying around,I like order,I make a mess of things,I make people feel at ease,I neglect my duties,I often feel blue (down),I often forget to put things back in their proper place,I pay attention to details,I seldom feel blue (down),I spend time reflecting on things,I start conversations,I sympathize with others' feelings,I take time out for others,I talk to a lot of different people at parties,I use difficult words,I worry about things


Cleaned all `personality_df` column names for better readability.
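The renaming step can be sketched on a few hypothetical column names. Passing `regex=False` explicitly is an assumption made here for safety: it makes `"["`, `"."` and `"]"` literal characters regardless of the default in the installed pandas version.

```python
import pandas as pd

# Hypothetical subset of column names in the raw format.
toy_columns = pd.Index(
    ["ID", "Section 5 of 6 [I am always prepared.]", "Section 5 of 6 [I like order.]"]
)

# regex=False treats each pattern as plain text rather than regex syntax.
for character in ["Section 5 of 6 [", ".", "]"]:
    toy_columns = toy_columns.str.replace(character, "", regex=False)

cleaned_names = toy_columns.to_list()
```

Note that replacing `"."` removes every full stop in a name, not only the trailing one, which is harmless for the statements shown above.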

In [27]:
personality_score_df = personality_df.copy()
personality_score_df.head()

Unnamed: 0,ID,I am always prepared,I am easily disturbed,I am exacting (demanding) in my work,I am full of ideas,I am interested in people,I am not interested in abstract ideas,I am not interested in other people's problems,I am not really interested in others,I am quick to understand things,I am quiet around strangers,I am relaxed most of the time,I am the life of the party,I change my mood a lot,I do not have a good imagination,I don't like to draw attention to myself,I don't mind being the center of attention,I don't talk a lot,I feel comfortable around people,I feel little concern for others,I feel others' emotions,I follow a schedule,I get chores done right away,I get irritated easily,I get stressed out easily,I get upset easily,I have a rich vocabulary,I have a soft (kind) heart,I have a vivid imagination,I have difficulty understanding abstract ideas,I have excellent ideas,I have frequent mood swings,I have little to say,I insult people,I keep in the background,I leave my belongings lying around,I like order,I make a mess of things,I make people feel at ease,I neglect my duties,I often feel blue (down),I often forget to put things back in their proper place,I pay attention to details,I seldom feel blue (down),I spend time reflecting on things,I start conversations,I sympathize with others' feelings,I take time out for others,I talk to a lot of different people at parties,I use difficult words,I worry about things
0,0,"(3, 5)","(4, 5)","(3, 5)","(5, 5)","(2, 3)","(5, 3)","(2, 3)","(2, 5)","(5, 5)","(1, 3)","(4, 3)","(1, 3)","(4, 5)","(5, 5)","(1, 3)","(1, 1)","(1, 3)","(1, 5)","(2, 1)","(2, 3)","(3, 5)","(3, 5)","(4, 3)","(4, 3)","(4, 3)","(5, 5)","(2, 5)","(5, 3)","(5, 5)","(5, 5)","(4, 5)","(1, 3)","(2, 5)","(1, 3)","(3, 5)","(3, 5)","(3, 3)","(2, 5)","(3, 5)","(4, 3)","(3, 5)","(3, 5)","(4, 3)","(5, 5)","(1, 3)","(2, 5)","(2, 5)","(1, 3)","(5, 1)","(4, 3)"
1,1,"(3, 5)","(4, 5)","(3, 5)","(5, 5)","(2, 5)","(5, 3)","(2, 5)","(2, 5)","(5, 5)","(1, 3)","(4, 5)","(1, 3)","(4, 5)","(5, 5)","(1, 3)","(1, 3)","(1, 5)","(1, 5)","(2, 1)","(2, 5)","(3, 5)","(3, 5)","(4, 5)","(4, 3)","(4, 3)","(5, 3)","(2, 5)","(5, 3)","(5, 5)","(5, 5)","(4, 5)","(1, 5)","(2, 5)","(1, 5)","(3, 5)","(3, 5)","(3, 5)","(2, 5)","(3, 5)","(4, 5)","(3, 5)","(3, 1)","(4, 1)","(5, 5)","(1, 5)","(2, 5)","(2, 5)","(1, 5)","(5, 3)","(4, 3)"
2,2,"(3, 5)","(4, 3)","(3, 3)","(5, 5)","(2, 5)","(5, 5)","(2, 5)","(2, 5)","(5, 5)","(1, 1)","(4, 5)","(1, 3)","(4, 5)","(5, 5)","(1, 3)","(1, 1)","(1, 3)","(1, 5)","(2, 1)","(2, 3)","(3, 5)","(3, 5)","(4, 3)","(4, 3)","(4, 5)","(5, 3)","(2, 3)","(5, 5)","(5, 5)","(5, 5)","(4, 5)","(1, 3)","(2, 5)","(1, 3)","(3, 1)","(3, 3)","(3, 3)","(2, 3)","(3, 5)","(4, 5)","(3, 5)","(3, 5)","(4, 1)","(5, 3)","(1, 3)","(2, 5)","(2, 5)","(1, 3)","(5, 1)","(4, 3)"
3,3,"(3, 5)","(4, 5)","(3, 3)","(5, 5)","(2, 5)","(5, 3)","(2, 3)","(2, 3)","(5, 3)","(1, 3)","(4, 3)","(1, 1)","(4, 5)","(5, 5)","(1, 1)","(1, 3)","(1, 3)","(1, 5)","(2, 1)","(2, 3)","(3, 5)","(3, 5)","(4, 5)","(4, 5)","(4, 5)","(5, 3)","(2, 3)","(5, 5)","(5, 3)","(5, 5)","(4, 5)","(1, 3)","(2, 5)","(1, 1)","(3, 1)","(3, 5)","(3, 3)","(2, 5)","(3, 5)","(4, 5)","(3, 1)","(3, 5)","(4, 1)","(5, 5)","(1, 5)","(2, 5)","(2, 5)","(1, 5)","(5, 1)","(4, 1)"
4,4,"(3, 3)","(4, 5)","(3, 3)","(5, 3)","(2, 3)","(5, 3)","(2, 3)","(2, 3)","(5, 5)","(1, 1)","(4, 3)","(1, 3)","(4, 3)","(5, 5)","(1, 3)","(1, 1)","(1, 5)","(1, 3)","(2, 1)","(2, 3)","(3, 5)","(3, 5)","(4, 3)","(4, 5)","(4, 3)","(5, 3)","(2, 5)","(5, 3)","(5, 5)","(5, 3)","(4, 3)","(1, 3)","(2, 5)","(1, 3)","(3, 5)","(3, 5)","(3, 5)","(2, 3)","(3, 5)","(4, 5)","(3, 5)","(3, 5)","(4, 5)","(5, 5)","(1, 3)","(2, 3)","(2, 5)","(1, 3)","(5, 1)","(4, 3)"


Stored changes made to `personality_df` as `personality_score_df`.

In [28]:
score_totals = {
    1: [],
    2: [],
    3: [],
    4: [],
    5: [],
}

`score_totals` maps each of the five subscale numbers to a list of per-learner totals. Each subscale corresponds to one of the Big Five personality traits. These traits will become column names, and each list in `score_totals` will supply the rows of its column.

In [29]:
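def sum_trait_scores(df):
    # Each response cell holds a "(subscale, score)" string, e.g. "(3, 5)".
    for row_index in df.index:
        # Start one running total per subscale for this learner.
        for totals in score_totals.values():
            totals.append(0)
        # Skip the first column ("ID"); the rest are questionnaire items.
        for column_name in df.columns[1:]:
            subscale, score = ast.literal_eval(df.loc[row_index, column_name])
            if subscale in score_totals:
                # Add the score to this learner's total for the subscale.
                score_totals[subscale][-1] += score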

In [30]:
sum_trait_scores(personality_score_df)
head_score_totals = {}

for subscale, totals in score_totals.items():
    head_score_totals[subscale] = totals[:5]
head_score_totals

{1: [30, 42, 28, 30, 28],
 2: [40, 46, 40, 38, 34],
 3: [48, 46, 40, 38, 46],
 4: [36, 40, 38, 40, 38],
 5: [42, 42, 42, 38, 36]}

`sum_trait_scores` sums each learner's responses per subscale, working from the first row of `personality_score_df` to the last. Summing each trait's scores per learner gives an overview of how each individual scored on the Big Five personality traits.
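For a single learner, the per-subscale summation can be sketched as follows. The responses are hypothetical and use only the standard library; each cell is assumed to be a `"(subscale, score)"` string, as in the table above.

```python
import ast
from collections import defaultdict

# Hypothetical responses for one learner: each cell is a "(subscale, score)" string.
toy_row = ["(3, 5)", "(4, 5)", "(3, 3)", "(1, 1)", "(4, 3)"]

toy_totals = defaultdict(int)
for cell in toy_row:
    # literal_eval safely parses the string into a tuple, e.g. "(3, 5)" -> (3, 5).
    subscale, score = ast.literal_eval(cell)
    toy_totals[subscale] += score

toy_totals = dict(toy_totals)
```

Here subscale 3 totals 8 (5 + 3), subscale 4 totals 8 (5 + 3), and subscale 1 totals 1.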

In [31]:
personality_score_df[
    [
        "Conscientiousness",
        "Emotional Stability",
        "Openness to experience",
        "Agreeableness",
        "Extraversion",
    ]
] = pd.DataFrame(
    {
        "Conscientiousness": score_totals[3],
        "Emotional Stability": score_totals[4],
        "Openness to experience": score_totals[5],
        "Agreeableness": score_totals[2],
        "Extraversion": score_totals[1],
    }
)

The following 5 columns were added to `personality_score_df`: `"Conscientiousness", "Emotional Stability", "Openness to experience", "Agreeableness", "Extraversion"`

In [32]:
personality_score_totals_df = personality_score_df
personality_score_totals_df.head()

Unnamed: 0,ID,I am always prepared,I am easily disturbed,I am exacting (demanding) in my work,I am full of ideas,I am interested in people,I am not interested in abstract ideas,I am not interested in other people's problems,I am not really interested in others,I am quick to understand things,I am quiet around strangers,I am relaxed most of the time,I am the life of the party,I change my mood a lot,I do not have a good imagination,I don't like to draw attention to myself,I don't mind being the center of attention,I don't talk a lot,I feel comfortable around people,I feel little concern for others,I feel others' emotions,I follow a schedule,I get chores done right away,I get irritated easily,I get stressed out easily,I get upset easily,I have a rich vocabulary,I have a soft (kind) heart,I have a vivid imagination,I have difficulty understanding abstract ideas,I have excellent ideas,I have frequent mood swings,I have little to say,I insult people,I keep in the background,I leave my belongings lying around,I like order,I make a mess of things,I make people feel at ease,I neglect my duties,I often feel blue (down),I often forget to put things back in their proper place,I pay attention to details,I seldom feel blue (down),I spend time reflecting on things,I start conversations,I sympathize with others' feelings,I take time out for others,I talk to a lot of different people at parties,I use difficult words,I worry about things,Conscientiousness,Emotional Stability,Openness to experience,Agreeableness,Extraversion
0,0,"(3, 5)","(4, 5)","(3, 5)","(5, 5)","(2, 3)","(5, 3)","(2, 3)","(2, 5)","(5, 5)","(1, 3)","(4, 3)","(1, 3)","(4, 5)","(5, 5)","(1, 3)","(1, 1)","(1, 3)","(1, 5)","(2, 1)","(2, 3)","(3, 5)","(3, 5)","(4, 3)","(4, 3)","(4, 3)","(5, 5)","(2, 5)","(5, 3)","(5, 5)","(5, 5)","(4, 5)","(1, 3)","(2, 5)","(1, 3)","(3, 5)","(3, 5)","(3, 3)","(2, 5)","(3, 5)","(4, 3)","(3, 5)","(3, 5)","(4, 3)","(5, 5)","(1, 3)","(2, 5)","(2, 5)","(1, 3)","(5, 1)","(4, 3)",48,36,42,40,30
1,1,"(3, 5)","(4, 5)","(3, 5)","(5, 5)","(2, 5)","(5, 3)","(2, 5)","(2, 5)","(5, 5)","(1, 3)","(4, 5)","(1, 3)","(4, 5)","(5, 5)","(1, 3)","(1, 3)","(1, 5)","(1, 5)","(2, 1)","(2, 5)","(3, 5)","(3, 5)","(4, 5)","(4, 3)","(4, 3)","(5, 3)","(2, 5)","(5, 3)","(5, 5)","(5, 5)","(4, 5)","(1, 5)","(2, 5)","(1, 5)","(3, 5)","(3, 5)","(3, 5)","(2, 5)","(3, 5)","(4, 5)","(3, 5)","(3, 1)","(4, 1)","(5, 5)","(1, 5)","(2, 5)","(2, 5)","(1, 5)","(5, 3)","(4, 3)",46,40,42,46,42
2,2,"(3, 5)","(4, 3)","(3, 3)","(5, 5)","(2, 5)","(5, 5)","(2, 5)","(2, 5)","(5, 5)","(1, 1)","(4, 5)","(1, 3)","(4, 5)","(5, 5)","(1, 3)","(1, 1)","(1, 3)","(1, 5)","(2, 1)","(2, 3)","(3, 5)","(3, 5)","(4, 3)","(4, 3)","(4, 5)","(5, 3)","(2, 3)","(5, 5)","(5, 5)","(5, 5)","(4, 5)","(1, 3)","(2, 5)","(1, 3)","(3, 1)","(3, 3)","(3, 3)","(2, 3)","(3, 5)","(4, 5)","(3, 5)","(3, 5)","(4, 1)","(5, 3)","(1, 3)","(2, 5)","(2, 5)","(1, 3)","(5, 1)","(4, 3)",40,38,42,40,28
3,3,"(3, 5)","(4, 5)","(3, 3)","(5, 5)","(2, 5)","(5, 3)","(2, 3)","(2, 3)","(5, 3)","(1, 3)","(4, 3)","(1, 1)","(4, 5)","(5, 5)","(1, 1)","(1, 3)","(1, 3)","(1, 5)","(2, 1)","(2, 3)","(3, 5)","(3, 5)","(4, 5)","(4, 5)","(4, 5)","(5, 3)","(2, 3)","(5, 5)","(5, 3)","(5, 5)","(4, 5)","(1, 3)","(2, 5)","(1, 1)","(3, 1)","(3, 5)","(3, 3)","(2, 5)","(3, 5)","(4, 5)","(3, 1)","(3, 5)","(4, 1)","(5, 5)","(1, 5)","(2, 5)","(2, 5)","(1, 5)","(5, 1)","(4, 1)",38,40,38,38,30
4,4,"(3, 3)","(4, 5)","(3, 3)","(5, 3)","(2, 3)","(5, 3)","(2, 3)","(2, 3)","(5, 5)","(1, 1)","(4, 3)","(1, 3)","(4, 3)","(5, 5)","(1, 3)","(1, 1)","(1, 5)","(1, 3)","(2, 1)","(2, 3)","(3, 5)","(3, 5)","(4, 3)","(4, 5)","(4, 3)","(5, 3)","(2, 5)","(5, 3)","(5, 5)","(5, 3)","(4, 3)","(1, 3)","(2, 5)","(1, 3)","(3, 5)","(3, 5)","(3, 5)","(2, 3)","(3, 5)","(4, 5)","(3, 5)","(3, 5)","(4, 5)","(5, 5)","(1, 3)","(2, 3)","(2, 5)","(1, 3)","(5, 1)","(4, 3)",46,38,36,34,28


The resulting DataFrame was stored as `personality_score_totals_df`.

In [33]:
personality_score_totals_df.shape

(1555, 56)

Therefore, `personality_score_totals_df` has 1555 rows and 56 columns. Having an overview of how each individual scored on the Big Five personality traits makes it possible to relate learners' scores to the department they applied to. In the next few cells, this is achieved by merging the departments DataFrame with `personality_score_totals_df`.

In [34]:
loaded_departments.head()

Unnamed: 0,ID,Department
0,0,Data
1,1,Data
2,2,Data
3,3,Data
4,4,Data


`loaded_departments` contains the departments DataFrame.

In [35]:
loaded_departments.shape

(1555, 2)

`loaded_departments` currently has 1555 rows and 2 columns.

To ensure that everything aligns perfectly and to safely merge `loaded_departments` and `personality_score_totals_df` on the `ID` column common to both DataFrames, with no resulting discrepancies, these `ID` columns need to be identical with no missing values. This is achieved in the following cells.

In [36]:
def assert_no_null_values(df):
    assert (
        not df.isnull().values.any()
    ), f"Expected: {False} for no null values, but got: {df.isnull().values.any()} which means that there are null values in the column."

In [37]:
assert_no_null_values(loaded_departments["ID"])

In [38]:
assert_no_null_values(personality_score_totals_df["ID"])

In [39]:
def assert_df_columns_identical(column_1, column_2):
    assert column_1.equals(
        column_2
    ), f"Expected: {True} for identical columns, but got: {column_1.equals(column_2)} which means the columns are not identical"

In [40]:
assert_df_columns_identical(loaded_departments["ID"], personality_score_totals_df["ID"])

Since the `ID` columns in `loaded_departments` and `personality_score_totals_df` are identical, their lengths are identical too. This allows `loaded_departments` and `personality_score_totals_df` to be safely merged.

In [41]:
merged_personality_department_df = pd.merge(
    personality_score_totals_df, loaded_departments, on="ID"
)
merged_personality_department_df.head()

Unnamed: 0,ID,I am always prepared,I am easily disturbed,I am exacting (demanding) in my work,I am full of ideas,I am interested in people,I am not interested in abstract ideas,I am not interested in other people's problems,I am not really interested in others,I am quick to understand things,I am quiet around strangers,I am relaxed most of the time,I am the life of the party,I change my mood a lot,I do not have a good imagination,I don't like to draw attention to myself,I don't mind being the center of attention,I don't talk a lot,I feel comfortable around people,I feel little concern for others,I feel others' emotions,I follow a schedule,I get chores done right away,I get irritated easily,I get stressed out easily,I get upset easily,I have a rich vocabulary,I have a soft (kind) heart,I have a vivid imagination,I have difficulty understanding abstract ideas,I have excellent ideas,I have frequent mood swings,I have little to say,I insult people,I keep in the background,I leave my belongings lying around,I like order,I make a mess of things,I make people feel at ease,I neglect my duties,I often feel blue (down),I often forget to put things back in their proper place,I pay attention to details,I seldom feel blue (down),I spend time reflecting on things,I start conversations,I sympathize with others' feelings,I take time out for others,I talk to a lot of different people at parties,I use difficult words,I worry about things,Conscientiousness,Emotional Stability,Openness to experience,Agreeableness,Extraversion,Department
0,0,"(3, 5)","(4, 5)","(3, 5)","(5, 5)","(2, 3)","(5, 3)","(2, 3)","(2, 5)","(5, 5)","(1, 3)","(4, 3)","(1, 3)","(4, 5)","(5, 5)","(1, 3)","(1, 1)","(1, 3)","(1, 5)","(2, 1)","(2, 3)","(3, 5)","(3, 5)","(4, 3)","(4, 3)","(4, 3)","(5, 5)","(2, 5)","(5, 3)","(5, 5)","(5, 5)","(4, 5)","(1, 3)","(2, 5)","(1, 3)","(3, 5)","(3, 5)","(3, 3)","(2, 5)","(3, 5)","(4, 3)","(3, 5)","(3, 5)","(4, 3)","(5, 5)","(1, 3)","(2, 5)","(2, 5)","(1, 3)","(5, 1)","(4, 3)",48,36,42,40,30,Data
1,1,"(3, 5)","(4, 5)","(3, 5)","(5, 5)","(2, 5)","(5, 3)","(2, 5)","(2, 5)","(5, 5)","(1, 3)","(4, 5)","(1, 3)","(4, 5)","(5, 5)","(1, 3)","(1, 3)","(1, 5)","(1, 5)","(2, 1)","(2, 5)","(3, 5)","(3, 5)","(4, 5)","(4, 3)","(4, 3)","(5, 3)","(2, 5)","(5, 3)","(5, 5)","(5, 5)","(4, 5)","(1, 5)","(2, 5)","(1, 5)","(3, 5)","(3, 5)","(3, 5)","(2, 5)","(3, 5)","(4, 5)","(3, 5)","(3, 1)","(4, 1)","(5, 5)","(1, 5)","(2, 5)","(2, 5)","(1, 5)","(5, 3)","(4, 3)",46,40,42,46,42,Data
2,2,"(3, 5)","(4, 3)","(3, 3)","(5, 5)","(2, 5)","(5, 5)","(2, 5)","(2, 5)","(5, 5)","(1, 1)","(4, 5)","(1, 3)","(4, 5)","(5, 5)","(1, 3)","(1, 1)","(1, 3)","(1, 5)","(2, 1)","(2, 3)","(3, 5)","(3, 5)","(4, 3)","(4, 3)","(4, 5)","(5, 3)","(2, 3)","(5, 5)","(5, 5)","(5, 5)","(4, 5)","(1, 3)","(2, 5)","(1, 3)","(3, 1)","(3, 3)","(3, 3)","(2, 3)","(3, 5)","(4, 5)","(3, 5)","(3, 5)","(4, 1)","(5, 3)","(1, 3)","(2, 5)","(2, 5)","(1, 3)","(5, 1)","(4, 3)",40,38,42,40,28,Data
3,3,"(3, 5)","(4, 5)","(3, 3)","(5, 5)","(2, 5)","(5, 3)","(2, 3)","(2, 3)","(5, 3)","(1, 3)","(4, 3)","(1, 1)","(4, 5)","(5, 5)","(1, 1)","(1, 3)","(1, 3)","(1, 5)","(2, 1)","(2, 3)","(3, 5)","(3, 5)","(4, 5)","(4, 5)","(4, 5)","(5, 3)","(2, 3)","(5, 5)","(5, 3)","(5, 5)","(4, 5)","(1, 3)","(2, 5)","(1, 1)","(3, 1)","(3, 5)","(3, 3)","(2, 5)","(3, 5)","(4, 5)","(3, 1)","(3, 5)","(4, 1)","(5, 5)","(1, 5)","(2, 5)","(2, 5)","(1, 5)","(5, 1)","(4, 1)",38,40,38,38,30,Data
4,4,"(3, 3)","(4, 5)","(3, 3)","(5, 3)","(2, 3)","(5, 3)","(2, 3)","(2, 3)","(5, 5)","(1, 1)","(4, 3)","(1, 3)","(4, 3)","(5, 5)","(1, 3)","(1, 1)","(1, 5)","(1, 3)","(2, 1)","(2, 3)","(3, 5)","(3, 5)","(4, 3)","(4, 5)","(4, 3)","(5, 3)","(2, 5)","(5, 3)","(5, 5)","(5, 3)","(4, 3)","(1, 3)","(2, 5)","(1, 3)","(3, 5)","(3, 5)","(3, 5)","(2, 3)","(3, 5)","(4, 5)","(3, 5)","(3, 5)","(4, 5)","(5, 5)","(1, 3)","(2, 3)","(2, 5)","(1, 3)","(5, 1)","(4, 3)",46,38,36,34,28,Data


- `pd.merge()` merges `personality_score_totals_df`'s "ID" column on the "ID" column in `loaded_departments`, resulting in 1 "ID" column in `merged_personality_department_df` identical to that of `personality_score_totals_df` and `loaded_departments`.

- Cell 33 shows that the initial number of columns in `personality_score_totals_df` is 56 and that of `loaded_departments` in cell 35 is 2.

- When the "ID" columns from these two DataFrames are merged onto one another, the remaining unmerged columns are 55 and 1 respectively.

- `pd.merge()` takes these remaining columns from these two DataFrames and matches rows where the `ID` in `personality_score_totals_df` corresponds to the `ID` in `loaded_departments`, then merges these columns accordingly.

- As a result, the `ID` column that comes from the merged `ID` columns of `personality_score_totals_df` and `loaded_departments`, plus the remaining columns from these two DataFrames, 55 and 1 respectively, that were merged, should add up to 57.

- So, the resulting DataFrame `merged_personality_department_df` should have 57 columns from merging `personality_score_totals_df` and `loaded_departments` together.

- Cell 33 shows that the initial number of rows in `personality_score_totals_df` is 1555.

- Cell 35 shows that `loaded_departments` also has 1555 rows.

- Cells 37 and 38 show that the `ID` columns of `loaded_departments` and `personality_score_totals_df` have no null values.

- Cell 40 shows that the `ID` columns of `loaded_departments` and `personality_score_totals_df` are identical. Therefore, the length of each `ID` column equals the number of rows of either DataFrame.

- Since `personality_score_totals_df` and `loaded_departments` were merged to create `merged_personality_department_df`, `merged_personality_department_df` should therefore have 1555 rows.
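The column and row arithmetic in the points above can be sketched on two small, hypothetical frames. The `validate="one_to_one"` argument is an optional safeguard that the notebook itself does not use; it makes `pd.merge` raise an error if an `ID` repeats on either side.

```python
import pandas as pd

# Hypothetical toy data; "Web" is an illustrative department name.
toy_scores = pd.DataFrame({"ID": [0, 1, 2], "Extraversion": [30, 42, 28]})
toy_departments = pd.DataFrame({"ID": [0, 1, 2], "Department": ["Data", "Data", "Web"]})

# validate="one_to_one" raises pandas.errors.MergeError if an ID repeats on either side.
toy_merged = pd.merge(toy_scores, toy_departments, on="ID", validate="one_to_one")

# One shared key column plus the non-key columns from each side.
expected_columns = 1 + (toy_scores.shape[1] - 1) + (toy_departments.shape[1] - 1)
```

Applied to the real frames, the same formula gives 1 + 55 + 1 = 57 columns.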

In [42]:
assert_df_dimensions_equal(merged_personality_department_df, 1555, 57)

Since `merged_personality_department_df` relates the learners' personality trait scores to the department they applied to, the scores can be compared against a metric to determine whose personality fits well with their chosen department and whose does not. In the next cell, this is achieved by creating a column named `"Risk Status"`, whose rows contain either `"High risk"` or `"Low risk"` depending on the metric. `"High risk"` tags learners whose personality does not fit well with their chosen department, and `"Low risk"` tags those whose personality does.

In [43]:
merged_personality_department_df["Risk Status"] = (
    merged_personality_department_df.apply(
        lambda row: (
            "High risk"
            if (row["Emotional Stability"] < 30)
            and (row["Conscientiousness"] < 30)
            and (row["Agreeableness"] < 30)
            else "Low risk"
        ),
        axis=1,
    )
)

A new column `"Risk Status"` was created in `merged_personality_department_df`. A row is labelled "High risk" only when its `"Emotional Stability"`, `"Conscientiousness"` and `"Agreeableness"` totals are all less than 30; every other row is labelled "Low risk".
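The same rule can also be written without a row-wise `apply`. The sketch below uses hypothetical scores and a boolean mask; it is an alternative formulation, not the notebook's own implementation.

```python
import pandas as pd

# Hypothetical trait totals for three learners.
toy_df = pd.DataFrame(
    {
        "Emotional Stability": [28, 36, 25],
        "Conscientiousness": [29, 48, 31],
        "Agreeableness": [20, 40, 22],
    }
)

trait_columns = ["Emotional Stability", "Conscientiousness", "Agreeableness"]

# True only where every one of the three trait totals is below the threshold of 30.
all_below_threshold = (toy_df[trait_columns] < 30).all(axis=1)

# Default every learner to "Low risk", then overwrite the rows that meet the rule.
toy_df["Risk Status"] = "Low risk"
toy_df.loc[all_below_threshold, "Risk Status"] = "High risk"
```

The third learner stays "Low risk" because a Conscientiousness total of 31 is not below 30, even though the other two totals are.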

In [44]:
risk_status_df = merged_personality_department_df
risk_status_df.head()

Unnamed: 0,ID,I am always prepared,I am easily disturbed,I am exacting (demanding) in my work,I am full of ideas,I am interested in people,I am not interested in abstract ideas,I am not interested in other people's problems,I am not really interested in others,I am quick to understand things,I am quiet around strangers,I am relaxed most of the time,I am the life of the party,I change my mood a lot,I do not have a good imagination,I don't like to draw attention to myself,I don't mind being the center of attention,I don't talk a lot,I feel comfortable around people,I feel little concern for others,I feel others' emotions,I follow a schedule,I get chores done right away,I get irritated easily,I get stressed out easily,I get upset easily,I have a rich vocabulary,I have a soft (kind) heart,I have a vivid imagination,I have difficulty understanding abstract ideas,I have excellent ideas,I have frequent mood swings,I have little to say,I insult people,I keep in the background,I leave my belongings lying around,I like order,I make a mess of things,I make people feel at ease,I neglect my duties,I often feel blue (down),I often forget to put things back in their proper place,I pay attention to details,I seldom feel blue (down),I spend time reflecting on things,I start conversations,I sympathize with others' feelings,I take time out for others,I talk to a lot of different people at parties,I use difficult words,I worry about things,Conscientiousness,Emotional Stability,Openness to experience,Agreeableness,Extraversion,Department,Risk Status
0,0,"(3, 5)","(4, 5)","(3, 5)","(5, 5)","(2, 3)","(5, 3)","(2, 3)","(2, 5)","(5, 5)","(1, 3)","(4, 3)","(1, 3)","(4, 5)","(5, 5)","(1, 3)","(1, 1)","(1, 3)","(1, 5)","(2, 1)","(2, 3)","(3, 5)","(3, 5)","(4, 3)","(4, 3)","(4, 3)","(5, 5)","(2, 5)","(5, 3)","(5, 5)","(5, 5)","(4, 5)","(1, 3)","(2, 5)","(1, 3)","(3, 5)","(3, 5)","(3, 3)","(2, 5)","(3, 5)","(4, 3)","(3, 5)","(3, 5)","(4, 3)","(5, 5)","(1, 3)","(2, 5)","(2, 5)","(1, 3)","(5, 1)","(4, 3)",48,36,42,40,30,Data,Low risk
1,1,"(3, 5)","(4, 5)","(3, 5)","(5, 5)","(2, 5)","(5, 3)","(2, 5)","(2, 5)","(5, 5)","(1, 3)","(4, 5)","(1, 3)","(4, 5)","(5, 5)","(1, 3)","(1, 3)","(1, 5)","(1, 5)","(2, 1)","(2, 5)","(3, 5)","(3, 5)","(4, 5)","(4, 3)","(4, 3)","(5, 3)","(2, 5)","(5, 3)","(5, 5)","(5, 5)","(4, 5)","(1, 5)","(2, 5)","(1, 5)","(3, 5)","(3, 5)","(3, 5)","(2, 5)","(3, 5)","(4, 5)","(3, 5)","(3, 1)","(4, 1)","(5, 5)","(1, 5)","(2, 5)","(2, 5)","(1, 5)","(5, 3)","(4, 3)",46,40,42,46,42,Data,Low risk
2,2,"(3, 5)","(4, 3)","(3, 3)","(5, 5)","(2, 5)","(5, 5)","(2, 5)","(2, 5)","(5, 5)","(1, 1)","(4, 5)","(1, 3)","(4, 5)","(5, 5)","(1, 3)","(1, 1)","(1, 3)","(1, 5)","(2, 1)","(2, 3)","(3, 5)","(3, 5)","(4, 3)","(4, 3)","(4, 5)","(5, 3)","(2, 3)","(5, 5)","(5, 5)","(5, 5)","(4, 5)","(1, 3)","(2, 5)","(1, 3)","(3, 1)","(3, 3)","(3, 3)","(2, 3)","(3, 5)","(4, 5)","(3, 5)","(3, 5)","(4, 1)","(5, 3)","(1, 3)","(2, 5)","(2, 5)","(1, 3)","(5, 1)","(4, 3)",40,38,42,40,28,Data,Low risk
3,3,"(3, 5)","(4, 5)","(3, 3)","(5, 5)","(2, 5)","(5, 3)","(2, 3)","(2, 3)","(5, 3)","(1, 3)","(4, 3)","(1, 1)","(4, 5)","(5, 5)","(1, 1)","(1, 3)","(1, 3)","(1, 5)","(2, 1)","(2, 3)","(3, 5)","(3, 5)","(4, 5)","(4, 5)","(4, 5)","(5, 3)","(2, 3)","(5, 5)","(5, 3)","(5, 5)","(4, 5)","(1, 3)","(2, 5)","(1, 1)","(3, 1)","(3, 5)","(3, 3)","(2, 5)","(3, 5)","(4, 5)","(3, 1)","(3, 5)","(4, 1)","(5, 5)","(1, 5)","(2, 5)","(2, 5)","(1, 5)","(5, 1)","(4, 1)",38,40,38,38,30,Data,Low risk
4,4,"(3, 3)","(4, 5)","(3, 3)","(5, 3)","(2, 3)","(5, 3)","(2, 3)","(2, 3)","(5, 5)","(1, 1)","(4, 3)","(1, 3)","(4, 3)","(5, 5)","(1, 3)","(1, 1)","(1, 5)","(1, 3)","(2, 1)","(2, 3)","(3, 5)","(3, 5)","(4, 3)","(4, 5)","(4, 3)","(5, 3)","(2, 5)","(5, 3)","(5, 5)","(5, 3)","(4, 3)","(1, 3)","(2, 5)","(1, 3)","(3, 5)","(3, 5)","(3, 5)","(2, 3)","(3, 5)","(4, 5)","(3, 5)","(3, 5)","(4, 5)","(5, 5)","(1, 3)","(2, 3)","(2, 5)","(1, 3)","(5, 1)","(4, 3)",46,38,36,34,28,Data,Low risk


The DataFrame that results from adding the `"Risk Status"` column is stored in `risk_status_df`.

In [45]:
risk_status_df["Department"].unique()

array(['Data', 'Web Dev', 'Copywriting', 'Design', 'Strategy', 'Web dev'],
      dtype=object)

Since `Web Dev` and `Web dev` refer to the same department, leaving both spellings in place would split that department's learners into two groups in any per-department count. To avoid this, `Web dev` must be replaced with `Web Dev` throughout the `Department` column.

In [46]:
risk_status_df["Department"] = risk_status_df["Department"].replace(
    "Web dev", "Web Dev"
)

For consistency in the `Department` column, "Web dev" was replaced with "Web Dev".
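When several labels differ only by case or stray whitespace, replacing each variant by hand gets tedious. One possible alternative, sketched below on made-up labels, is to normalize the whole column with pandas string methods. It assumes that title case is the intended spelling for every department. That holds for the five departments in this notebook, but it is an assumption for any other data.

```python
import pandas as pd

# Hypothetical labels, including a variant with surrounding whitespace.
departments = pd.Series(["Data", "Web Dev", "Web dev", " web dev ", "Copywriting"])

# Trim whitespace, then capitalize the first letter of every word.
normalized = departments.str.strip().str.title()
print(normalized.unique())
```

A label whose intended spelling is not title case (an acronym, for example) would be altered by `str.title`, so an explicit `replace` mapping is the safer choice in that situation.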

In [47]:
status_summary = {
    "Copywriting": [],
    "Data": [],
    "Design": [],
    "Strategy": [],
    "Web Dev": [],
}

In `status_summary`, each key is a department and will become a column name in the summary table. Each value is a list that will hold that department's counts, which become the rows.

In [48]:
for column_name in status_summary:
    # Risk statuses of the learners in this department
    department_statuses = risk_status_df.loc[
        risk_status_df["Department"] == column_name, "Risk Status"
    ]
    # Append the "Low risk" count first, then the "High risk" count
    status_summary[column_name].append((department_statuses == "Low risk").sum())
    status_summary[column_name].append((department_statuses == "High risk").sum())
status_summary

{'Copywriting': [325, 1],
 'Data': [328, 1],
 'Design': [120, 0],
 'Strategy': [449, 0],
 'Web Dev': [331, 0]}

The learners tagged `"High risk"` or `"Low risk"` were counted per department. In each list, the first value is the number of low-risk learners and the second is the number of high-risk learners.

In [49]:
risk_status_summary_df = (
    pd.DataFrame(status_summary)
    .rename_axis("Risk Status", axis=1)
    .rename(index={0: "Low risk", 1: "High risk"})
)


risk_status_summary_df

Risk Status,Copywriting,Data,Design,Strategy,Web Dev
Low risk,325,328,120,449,331
High risk,1,1,0,0,0


The counts of learners tagged `"High risk"` or `"Low risk"` were stored in `risk_status_summary_df`. This summarizes which learners' personalities fit their chosen department, and which do not.
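A table of this shape can likely also be built in one step with `pd.crosstab`, which counts how often each combination of two columns occurs. The sketch below uses a small made-up DataFrame rather than `risk_status_df`. The name `toy_df` and its values are hypothetical.

```python
import pandas as pd

# Hypothetical sample; not taken from the real data set.
toy_df = pd.DataFrame(
    {
        "Department": ["Data", "Data", "Design", "Web Dev", "Data"],
        "Risk Status": ["Low risk", "High risk", "Low risk", "Low risk", "Low risk"],
    }
)

# Rows are risk statuses, columns are departments, cells are learner counts.
# reindex fixes the row order and keeps a status row even if it never occurs.
summary = pd.crosstab(toy_df["Risk Status"], toy_df["Department"]).reindex(
    index=["Low risk", "High risk"], fill_value=0
)
print(summary)
```

Compared with filling a dictionary in a loop, this picks up departments from the data itself, so a department that is missing from a hand-written list cannot be silently left out.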

## Conclusion:

Learners who are well matched to their chosen department were identified using their personality trait scores. According to the above analysis, only 2 of the 1,555 learners are at risk (one in Copywriting and one in Data), whilst the rest are well suited to the department they have chosen. This helps one anticipate how learners may perform in their chosen department, so that one can take proactive steps to prevent potential pitfalls. One can choose to inform learners of these findings to help them decide whether to continue pursuing their chosen department or switch to a different one.
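To express the proportion of at-risk learners as a number, the counts from the summary table can be turned into a share. The sketch below re-enters those counts by hand so that it runs without the original CSV files. The variable names are illustrative only.

```python
import pandas as pd

# Counts copied from the summary table shown earlier in this notebook.
summary = pd.DataFrame(
    {
        "Copywriting": [325, 1],
        "Data": [328, 1],
        "Design": [120, 0],
        "Strategy": [449, 0],
        "Web Dev": [331, 0],
    },
    index=["Low risk", "High risk"],
)

total_learners = int(summary.to_numpy().sum())
high_risk_learners = int(summary.loc["High risk"].sum())
high_risk_share = high_risk_learners / total_learners

print(
    f"{high_risk_learners} of {total_learners} learners "
    f"({high_risk_share:.2%}) are tagged high risk"
)
```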