# Data Wrangling

**Data wrangling, sometimes referred to as data munging, 
is the process of transforming and mapping data from one "raw" data form into another 
format with the intent of making it more appropriate 
and valuable for a variety of downstream purposes such as analytics.**
*Wikipedia*

## Importing data

In [1]:
import pandas as pd
import numpy as np

In [2]:
personality_data = pd.read_csv('personality_scores.csv',sep=';')

In [3]:
personality_data.head()

Unnamed: 0,ID,Section 5 of 6 [I am always prepared.],Section 5 of 6 [I am easily disturbed.],Section 5 of 6 [I am exacting (demanding) in my work.],Section 5 of 6 [I am full of ideas.],Section 5 of 6 [I am interested in people.],Section 5 of 6 [I am not interested in abstract ideas.],Section 5 of 6 [I am not interested in other people's problems.],Section 5 of 6 [I am not really interested in others.],Section 5 of 6 [I am quick to understand things.],...,Unnamed: 60,Unnamed: 61,Unnamed: 62,Unnamed: 63,Unnamed: 64,Unnamed: 65,Unnamed: 66,Unnamed: 67,Unnamed: 68,IPIP_HIGH_RISK
0,0,"(3, 5)","(4, 5)","(3, 5)","(5, 5)","(2, 3)","(5, 3)","(2, 3)","(2, 5)","(5, 5)",...,,,,,,,,,,
1,1,"(3, 5)","(4, 5)","(3, 5)","(5, 5)","(2, 5)","(5, 3)","(2, 5)","(2, 5)","(5, 5)",...,,,,,,,,,,
2,2,"(3, 5)","(4, 3)","(3, 3)","(5, 5)","(2, 5)","(5, 5)","(2, 5)","(2, 5)","(5, 5)",...,,,,,,,,,,
3,3,"(3, 5)","(4, 5)","(3, 3)","(5, 5)","(2, 5)","(5, 3)","(2, 3)","(2, 3)","(5, 3)",...,,,,,,,,,,
4,4,"(3, 3)","(4, 5)","(3, 3)","(5, 3)","(2, 3)","(5, 3)","(2, 3)","(2, 3)","(5, 5)",...,,,,,,,,,,


In [4]:
# personality_data.info()

In [5]:
personality_data.isnull().sum()

ID                                                           0
Section 5 of 6 [I am always prepared.]                       0
Section 5 of 6 [I am easily disturbed.]                      0
Section 5 of 6 [I am exacting (demanding) in my work.]       0
Section 5 of 6 [I am full of ideas.]                         0
                                                          ... 
Unnamed: 65                                               1555
Unnamed: 66                                               1555
Unnamed: 67                                               1555
Unnamed: 68                                               1555
IPIP_HIGH_RISK                                            1555
Length: 70, dtype: int64
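
The null counts above can be narrowed down to the columns that are empty in every row. The following is a minimal sketch on a toy frame; `toy`, its values, and `empty_columns` are hypothetical stand-ins, not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for personality_data (hypothetical values).
toy = pd.DataFrame({
    "ID": [0, 1, 2],
    "answer": ["(3, 5)", "(4, 5)", "(3, 3)"],
    "Unnamed: 60": [np.nan, np.nan, np.nan],
})

# Number of missing values per column.
null_counts = toy.isnull().sum()

# Columns where every row is missing: the null count equals the row count.
empty_columns = null_counts[null_counts == len(toy)].index.tolist()
print(empty_columns)
```

Comparing the count to `len(toy)` separates fully empty columns from columns that only have a few gaps.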

## Find duplicate rows based on selected columns

In [6]:
personality_data.ID.unique()

array([   0,    1,    2, ..., 1552, 1553, 1554])

In [7]:
personality_data[personality_data.duplicated(keep=False)]


Unnamed: 0,ID,Section 5 of 6 [I am always prepared.],Section 5 of 6 [I am easily disturbed.],Section 5 of 6 [I am exacting (demanding) in my work.],Section 5 of 6 [I am full of ideas.],Section 5 of 6 [I am interested in people.],Section 5 of 6 [I am not interested in abstract ideas.],Section 5 of 6 [I am not interested in other people's problems.],Section 5 of 6 [I am not really interested in others.],Section 5 of 6 [I am quick to understand things.],...,Unnamed: 60,Unnamed: 61,Unnamed: 62,Unnamed: 63,Unnamed: 64,Unnamed: 65,Unnamed: 66,Unnamed: 67,Unnamed: 68,IPIP_HIGH_RISK


In [8]:
personality_data[personality_data['ID'].duplicated()]

Unnamed: 0,ID,Section 5 of 6 [I am always prepared.],Section 5 of 6 [I am easily disturbed.],Section 5 of 6 [I am exacting (demanding) in my work.],Section 5 of 6 [I am full of ideas.],Section 5 of 6 [I am interested in people.],Section 5 of 6 [I am not interested in abstract ideas.],Section 5 of 6 [I am not interested in other people's problems.],Section 5 of 6 [I am not really interested in others.],Section 5 of 6 [I am quick to understand things.],...,Unnamed: 60,Unnamed: 61,Unnamed: 62,Unnamed: 63,Unnamed: 64,Unnamed: 65,Unnamed: 66,Unnamed: 67,Unnamed: 68,IPIP_HIGH_RISK


In [9]:
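# Summing the per-ID group sizes gives the total number of rows;
# it equals the number of unique IDs only when no ID is duplicated.
dups_id=personality_data.pivot_table(index=['ID'],aggfunc='size').sum()

print(dups_id)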

1555
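
A more direct way to count duplicated IDs is sketched here on a toy frame. The frame `toy` and the names `n_duplicate_ids` and `duplicated_ids` are assumptions made for illustration only.

```python
import pandas as pd

# Toy frame with one repeated ID (hypothetical values).
toy = pd.DataFrame({"ID": [0, 1, 1, 2], "score": [10, 20, 20, 30]})

# Number of rows whose ID repeats an earlier row.
n_duplicate_ids = int(toy["ID"].duplicated().sum())

# Group sizes per ID; any size above 1 marks a duplicated ID.
sizes = toy.groupby("ID").size()
duplicated_ids = sizes[sizes > 1].index.tolist()

print(n_duplicate_ids, duplicated_ids)
```

A result of zero for `n_duplicate_ids` would correspond to the empty tables shown in the cells above.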


## Drop any duplicates that exist

In [10]:

data=personality_data.drop_duplicates(subset=['ID'])


### Comparing the length of the new data frame with the old one, which may have had duplicates

*Python's assert statement is a debugging aid that tests a condition.
If the condition is true, it does nothing and your program just continues to execute.
But if the assert condition evaluates to false,
it raises an AssertionError exception with an optional error message.*


*The assert statement should show that the number of unique IDs in the original
data equals the number of rows in the data set where duplicates are dropped.*
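
The optional error message makes a failed check easier to read. This is a small sketch on a toy frame; `toy`, `deduplicated`, `n_unique`, and `n_rows` are hypothetical names. Note that assertions are skipped when Python runs with the `-O` flag, so they suit sanity checks rather than production validation.

```python
import pandas as pd

# Toy frame with one repeated ID (hypothetical values).
toy = pd.DataFrame({"ID": [0, 1, 1, 2], "score": [10, 20, 20, 30]})
deduplicated = toy.drop_duplicates(subset=["ID"])

n_unique = toy["ID"].nunique()
n_rows = len(deduplicated)

# The message after the comma is shown only if the condition is false.
assert n_unique == n_rows, f"expected {n_unique} rows, got {n_rows}"
```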

In [11]:
old_data=len(personality_data.ID.unique())

In [12]:
new_data=len(data)

In [13]:
assert old_data==new_data

In [14]:
print('The length of unique IDs is:',old_data)

The length of unique IDs is: 1555


In [15]:
print('The length of data set after dropping duplicates:',new_data)

The length of data set after dropping duplicates: 1555


*The number of unique IDs in the old data set equals the length
of the new data set, which is the old data set with duplicate IDs dropped.*

### Dropping columns with null values

In [16]:
data=data.dropna(axis='columns')
data

Unnamed: 0,ID,Section 5 of 6 [I am always prepared.],Section 5 of 6 [I am easily disturbed.],Section 5 of 6 [I am exacting (demanding) in my work.],Section 5 of 6 [I am full of ideas.],Section 5 of 6 [I am interested in people.],Section 5 of 6 [I am not interested in abstract ideas.],Section 5 of 6 [I am not interested in other people's problems.],Section 5 of 6 [I am not really interested in others.],Section 5 of 6 [I am quick to understand things.],...,Section 5 of 6 [I often forget to put things back in their proper place],Section 5 of 6 [I pay attention to details.],Section 5 of 6 [I seldom feel blue (down).],Section 5 of 6 [I spend time reflecting on things.],Section 5 of 6 [I start conversations.],Section 5 of 6 [I sympathize with others' feelings.],Section 5 of 6 [I take time out for others.],Section 5 of 6 [I talk to a lot of different people at parties.],Section 5 of 6 [I use difficult words.],Section 5 of 6 [I worry about things.]
0,0,"(3, 5)","(4, 5)","(3, 5)","(5, 5)","(2, 3)","(5, 3)","(2, 3)","(2, 5)","(5, 5)",...,"(3, 5)","(3, 5)","(4, 3)","(5, 5)","(1, 3)","(2, 5)","(2, 5)","(1, 3)","(5, 1)","(4, 3)"
1,1,"(3, 5)","(4, 5)","(3, 5)","(5, 5)","(2, 5)","(5, 3)","(2, 5)","(2, 5)","(5, 5)",...,"(3, 5)","(3, 1)","(4, 1)","(5, 5)","(1, 5)","(2, 5)","(2, 5)","(1, 5)","(5, 3)","(4, 3)"
2,2,"(3, 5)","(4, 3)","(3, 3)","(5, 5)","(2, 5)","(5, 5)","(2, 5)","(2, 5)","(5, 5)",...,"(3, 5)","(3, 5)","(4, 1)","(5, 3)","(1, 3)","(2, 5)","(2, 5)","(1, 3)","(5, 1)","(4, 3)"
3,3,"(3, 5)","(4, 5)","(3, 3)","(5, 5)","(2, 5)","(5, 3)","(2, 3)","(2, 3)","(5, 3)",...,"(3, 1)","(3, 5)","(4, 1)","(5, 5)","(1, 5)","(2, 5)","(2, 5)","(1, 5)","(5, 1)","(4, 1)"
4,4,"(3, 3)","(4, 5)","(3, 3)","(5, 3)","(2, 3)","(5, 3)","(2, 3)","(2, 3)","(5, 5)",...,"(3, 5)","(3, 5)","(4, 5)","(5, 5)","(1, 3)","(2, 3)","(2, 5)","(1, 3)","(5, 1)","(4, 3)"
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1550,1550,"(3, 5)","(4, 5)","(3, 1)","(5, 5)","(2, 5)","(5, 5)","(2, 5)","(2, 5)","(5, 5)",...,"(3, 1)","(3, 5)","(4, 1)","(5, 3)","(1, 5)","(2, 5)","(2, 3)","(1, 1)","(5, 1)","(4, 5)"
1551,1551,"(3, 3)","(4, 5)","(3, 5)","(5, 3)","(2, 5)","(5, 3)","(2, 3)","(2, 5)","(5, 5)",...,"(3, 3)","(3, 3)","(4, 1)","(5, 3)","(1, 3)","(2, 5)","(2, 5)","(1, 5)","(5, 1)","(4, 3)"
1552,1552,"(3, 5)","(4, 3)","(3, 5)","(5, 5)","(2, 5)","(5, 5)","(2, 3)","(2, 3)","(5, 5)",...,"(3, 3)","(3, 5)","(4, 5)","(5, 5)","(1, 5)","(2, 5)","(2, 5)","(1, 5)","(5, 3)","(4, 3)"
1553,1553,"(3, 5)","(4, 5)","(3, 5)","(5, 5)","(2, 5)","(5, 5)","(2, 5)","(2, 5)","(5, 5)",...,"(3, 5)","(3, 5)","(4, 1)","(5, 5)","(1, 5)","(2, 3)","(2, 3)","(1, 1)","(5, 1)","(4, 3)"
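
By default `dropna(axis='columns')` removes a column as soon as it contains a single missing value. The sketch below contrasts that default with `how='all'`, which drops only fully empty columns. The toy frame and its column names are hypothetical.

```python
import numpy as np
import pandas as pd

# Toy frame: one partly missing column and one fully missing column.
toy = pd.DataFrame({
    "ID": [0, 1, 2],
    "partly_missing": ["(3, 5)", None, "(3, 3)"],
    "all_missing": [np.nan, np.nan, np.nan],
})

# how="any" is the default: any missing value removes the column.
any_dropped = toy.dropna(axis="columns")

# how="all" removes a column only when every value is missing.
all_dropped = toy.dropna(axis="columns", how="all")

print(list(any_dropped.columns), list(all_dropped.columns))
```

Which variant fits depends on whether partly answered questions should be kept.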


## converting all the rows into a list

### creating a dictionary

In [17]:
Dict={1:'Extraversion', 2:'Agreeableness', 3:'Conscientiousness',
      4:'Emotional Stability',5:'Intellect'}

In [18]:
data=data.set_index('ID')

In [19]:
list_of_rows=data.values.tolist()

In [20]:
# lis_of_rows=data.values.tolist()
len(list_of_rows)

1555

In [21]:
data

ID,Section 5 of 6 [I am always prepared.],Section 5 of 6 [I am easily disturbed.],Section 5 of 6 [I am exacting (demanding) in my work.],Section 5 of 6 [I am full of ideas.],Section 5 of 6 [I am interested in people.],Section 5 of 6 [I am not interested in abstract ideas.],Section 5 of 6 [I am not interested in other people's problems.],Section 5 of 6 [I am not really interested in others.],Section 5 of 6 [I am quick to understand things.],Section 5 of 6 [I am quiet around strangers.],...,Section 5 of 6 [I often forget to put things back in their proper place],Section 5 of 6 [I pay attention to details.],Section 5 of 6 [I seldom feel blue (down).],Section 5 of 6 [I spend time reflecting on things.],Section 5 of 6 [I start conversations.],Section 5 of 6 [I sympathize with others' feelings.],Section 5 of 6 [I take time out for others.],Section 5 of 6 [I talk to a lot of different people at parties.],Section 5 of 6 [I use difficult words.],Section 5 of 6 [I worry about things.]
0,"(3, 5)","(4, 5)","(3, 5)","(5, 5)","(2, 3)","(5, 3)","(2, 3)","(2, 5)","(5, 5)","(1, 3)",...,"(3, 5)","(3, 5)","(4, 3)","(5, 5)","(1, 3)","(2, 5)","(2, 5)","(1, 3)","(5, 1)","(4, 3)"
1,"(3, 5)","(4, 5)","(3, 5)","(5, 5)","(2, 5)","(5, 3)","(2, 5)","(2, 5)","(5, 5)","(1, 3)",...,"(3, 5)","(3, 1)","(4, 1)","(5, 5)","(1, 5)","(2, 5)","(2, 5)","(1, 5)","(5, 3)","(4, 3)"
2,"(3, 5)","(4, 3)","(3, 3)","(5, 5)","(2, 5)","(5, 5)","(2, 5)","(2, 5)","(5, 5)","(1, 1)",...,"(3, 5)","(3, 5)","(4, 1)","(5, 3)","(1, 3)","(2, 5)","(2, 5)","(1, 3)","(5, 1)","(4, 3)"
3,"(3, 5)","(4, 5)","(3, 3)","(5, 5)","(2, 5)","(5, 3)","(2, 3)","(2, 3)","(5, 3)","(1, 3)",...,"(3, 1)","(3, 5)","(4, 1)","(5, 5)","(1, 5)","(2, 5)","(2, 5)","(1, 5)","(5, 1)","(4, 1)"
4,"(3, 3)","(4, 5)","(3, 3)","(5, 3)","(2, 3)","(5, 3)","(2, 3)","(2, 3)","(5, 5)","(1, 1)",...,"(3, 5)","(3, 5)","(4, 5)","(5, 5)","(1, 3)","(2, 3)","(2, 5)","(1, 3)","(5, 1)","(4, 3)"
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1550,"(3, 5)","(4, 5)","(3, 1)","(5, 5)","(2, 5)","(5, 5)","(2, 5)","(2, 5)","(5, 5)","(1, 3)",...,"(3, 1)","(3, 5)","(4, 1)","(5, 3)","(1, 5)","(2, 5)","(2, 3)","(1, 1)","(5, 1)","(4, 5)"
1551,"(3, 3)","(4, 5)","(3, 5)","(5, 3)","(2, 5)","(5, 3)","(2, 3)","(2, 5)","(5, 5)","(1, 5)",...,"(3, 3)","(3, 3)","(4, 1)","(5, 3)","(1, 3)","(2, 5)","(2, 5)","(1, 5)","(5, 1)","(4, 3)"
1552,"(3, 5)","(4, 3)","(3, 5)","(5, 5)","(2, 5)","(5, 5)","(2, 3)","(2, 3)","(5, 5)","(1, 3)",...,"(3, 3)","(3, 5)","(4, 5)","(5, 5)","(1, 5)","(2, 5)","(2, 5)","(1, 5)","(5, 3)","(4, 3)"
1553,"(3, 5)","(4, 5)","(3, 5)","(5, 5)","(2, 5)","(5, 5)","(2, 5)","(2, 5)","(5, 5)","(1, 5)",...,"(3, 5)","(3, 5)","(4, 1)","(5, 5)","(1, 5)","(2, 3)","(2, 3)","(1, 1)","(5, 1)","(4, 3)"


### creating a function for addition

In [22]:
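import ast

p = []
for row in list_of_rows:
    # Each cell is a string such as "(3, 5)": (trait key, score).
    # ast.literal_eval parses it without executing arbitrary code.
    pairs = [ast.literal_eval(cell) for cell in row]

    # Sum the scores per trait key for this respondent.
    totals = {}
    for uid, x in pairs:
        totals[uid] = totals.get(uid, 0) + x
    p.append(totals)

# Build the DataFrame once, after the loop has finished.
a = pd.DataFrame(p)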

In [23]:
a.columns = a.columns.to_series().map(Dict)

In [24]:
a

Unnamed: 0,Conscientiousness,Emotional Stability,Intellect,Agreeableness,Extraversion
0,48,36,42,40,30
1,46,40,42,46,42
2,40,38,42,40,28
3,38,40,38,38,30
4,46,38,36,34,28
...,...,...,...,...,...
1550,38,44,44,48,32
1551,40,44,34,48,40
1552,40,46,48,44,42
1553,48,36,44,38,36
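
The per-row summation can also be written with `DataFrame.apply`, which keeps the `ID` index on the result. This is only a sketch under assumptions: the toy frame, the question columns `q1` to `q3`, and the helper `trait_totals` are hypothetical, while the trait mapping mirrors `Dict` from the notebook.

```python
import ast
import pandas as pd

trait_names = {1: "Extraversion", 2: "Agreeableness", 3: "Conscientiousness",
               4: "Emotional Stability", 5: "Intellect"}

# Toy answers: each cell is a "(trait key, score)" string (hypothetical values).
toy = pd.DataFrame({
    "q1": ["(3, 5)", "(3, 3)"],
    "q2": ["(4, 5)", "(4, 1)"],
    "q3": ["(3, 1)", "(3, 5)"],
}, index=pd.Index([0, 1], name="ID"))

def trait_totals(row):
    """Sum the scores per trait key for one respondent."""
    totals = {}
    for cell in row:
        trait, score = ast.literal_eval(cell)
        totals[trait] = totals.get(trait, 0) + score
    return pd.Series(totals)

# One row of totals per respondent, with trait keys renamed to trait names.
totals = toy.apply(trait_totals, axis=1).rename(columns=trait_names)
print(totals)
```

Because the result shares the index of the input, it can be joined back to the answers without relying on row order.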


### merging the dataframes

*merging data and a*

In [25]:
result = pd.concat([data, a], axis=1, ignore_index=False)

In [26]:
result

Unnamed: 0,Section 5 of 6 [I am always prepared.],Section 5 of 6 [I am easily disturbed.],Section 5 of 6 [I am exacting (demanding) in my work.],Section 5 of 6 [I am full of ideas.],Section 5 of 6 [I am interested in people.],Section 5 of 6 [I am not interested in abstract ideas.],Section 5 of 6 [I am not interested in other people's problems.],Section 5 of 6 [I am not really interested in others.],Section 5 of 6 [I am quick to understand things.],Section 5 of 6 [I am quiet around strangers.],...,Section 5 of 6 [I sympathize with others' feelings.],Section 5 of 6 [I take time out for others.],Section 5 of 6 [I talk to a lot of different people at parties.],Section 5 of 6 [I use difficult words.],Section 5 of 6 [I worry about things.],Conscientiousness,Emotional Stability,Intellect,Agreeableness,Extraversion
0,"(3, 5)","(4, 5)","(3, 5)","(5, 5)","(2, 3)","(5, 3)","(2, 3)","(2, 5)","(5, 5)","(1, 3)",...,"(2, 5)","(2, 5)","(1, 3)","(5, 1)","(4, 3)",48,36,42,40,30
1,"(3, 5)","(4, 5)","(3, 5)","(5, 5)","(2, 5)","(5, 3)","(2, 5)","(2, 5)","(5, 5)","(1, 3)",...,"(2, 5)","(2, 5)","(1, 5)","(5, 3)","(4, 3)",46,40,42,46,42
2,"(3, 5)","(4, 3)","(3, 3)","(5, 5)","(2, 5)","(5, 5)","(2, 5)","(2, 5)","(5, 5)","(1, 1)",...,"(2, 5)","(2, 5)","(1, 3)","(5, 1)","(4, 3)",40,38,42,40,28
3,"(3, 5)","(4, 5)","(3, 3)","(5, 5)","(2, 5)","(5, 3)","(2, 3)","(2, 3)","(5, 3)","(1, 3)",...,"(2, 5)","(2, 5)","(1, 5)","(5, 1)","(4, 1)",38,40,38,38,30
4,"(3, 3)","(4, 5)","(3, 3)","(5, 3)","(2, 3)","(5, 3)","(2, 3)","(2, 3)","(5, 5)","(1, 1)",...,"(2, 3)","(2, 5)","(1, 3)","(5, 1)","(4, 3)",46,38,36,34,28
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1550,"(3, 5)","(4, 5)","(3, 1)","(5, 5)","(2, 5)","(5, 5)","(2, 5)","(2, 5)","(5, 5)","(1, 3)",...,"(2, 5)","(2, 3)","(1, 1)","(5, 1)","(4, 5)",38,44,44,48,32
1551,"(3, 3)","(4, 5)","(3, 5)","(5, 3)","(2, 5)","(5, 3)","(2, 3)","(2, 5)","(5, 5)","(1, 5)",...,"(2, 5)","(2, 5)","(1, 5)","(5, 1)","(4, 3)",40,44,34,48,40
1552,"(3, 5)","(4, 3)","(3, 5)","(5, 5)","(2, 5)","(5, 5)","(2, 3)","(2, 3)","(5, 5)","(1, 3)",...,"(2, 5)","(2, 5)","(1, 5)","(5, 3)","(4, 3)",40,46,48,44,42
1553,"(3, 5)","(4, 5)","(3, 5)","(5, 5)","(2, 5)","(5, 5)","(2, 5)","(2, 5)","(5, 5)","(1, 5)",...,"(2, 3)","(2, 3)","(1, 1)","(5, 1)","(4, 3)",48,36,44,38,36
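
`pd.concat` with `axis=1` aligns rows on index labels, not on position. In the notebook the IDs run from 0 to 1554, so they coincide with the default index of `a`. The sketch below shows what happens when they do not coincide; the frames `left` and `right` are hypothetical.

```python
import pandas as pd

# left is indexed by ID labels that differ from the default 0, 1, 2.
left = pd.DataFrame({"x": [1, 2, 3]}, index=pd.Index([10, 11, 12], name="ID"))
right = pd.DataFrame({"y": [7, 8, 9]})  # default index 0, 1, 2

# No labels match, so the result holds the union of both indexes with NaN gaps.
misaligned = pd.concat([left, right], axis=1)

# Giving both frames the same index makes the rows line up.
right.index = left.index
aligned = pd.concat([left, right], axis=1)

print(len(misaligned), len(aligned))
```

Checking the row count after a concat is a cheap way to catch this kind of misalignment.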


## Import the data in departments.csv.

*Merge this data frame with the personality score data frame.*

In [27]:
department_data = pd.read_csv('departments.csv',sep=';')

In [28]:
department_data

Unnamed: 0,ID,Department
0,0,Data
1,1,Data
2,2,Data
3,3,Data
4,4,Data
...,...,...
1550,1550,Web dev
1551,1551,Web dev
1552,1552,Web dev
1553,1553,Web dev


In [29]:
merge = pd.concat([result, department_data] ,axis=1, ignore_index=False)

In [30]:
merge

Unnamed: 0,Section 5 of 6 [I am always prepared.],Section 5 of 6 [I am easily disturbed.],Section 5 of 6 [I am exacting (demanding) in my work.],Section 5 of 6 [I am full of ideas.],Section 5 of 6 [I am interested in people.],Section 5 of 6 [I am not interested in abstract ideas.],Section 5 of 6 [I am not interested in other people's problems.],Section 5 of 6 [I am not really interested in others.],Section 5 of 6 [I am quick to understand things.],Section 5 of 6 [I am quiet around strangers.],...,Section 5 of 6 [I talk to a lot of different people at parties.],Section 5 of 6 [I use difficult words.],Section 5 of 6 [I worry about things.],Conscientiousness,Emotional Stability,Intellect,Agreeableness,Extraversion,ID,Department
0,"(3, 5)","(4, 5)","(3, 5)","(5, 5)","(2, 3)","(5, 3)","(2, 3)","(2, 5)","(5, 5)","(1, 3)",...,"(1, 3)","(5, 1)","(4, 3)",48,36,42,40,30,0,Data
1,"(3, 5)","(4, 5)","(3, 5)","(5, 5)","(2, 5)","(5, 3)","(2, 5)","(2, 5)","(5, 5)","(1, 3)",...,"(1, 5)","(5, 3)","(4, 3)",46,40,42,46,42,1,Data
2,"(3, 5)","(4, 3)","(3, 3)","(5, 5)","(2, 5)","(5, 5)","(2, 5)","(2, 5)","(5, 5)","(1, 1)",...,"(1, 3)","(5, 1)","(4, 3)",40,38,42,40,28,2,Data
3,"(3, 5)","(4, 5)","(3, 3)","(5, 5)","(2, 5)","(5, 3)","(2, 3)","(2, 3)","(5, 3)","(1, 3)",...,"(1, 5)","(5, 1)","(4, 1)",38,40,38,38,30,3,Data
4,"(3, 3)","(4, 5)","(3, 3)","(5, 3)","(2, 3)","(5, 3)","(2, 3)","(2, 3)","(5, 5)","(1, 1)",...,"(1, 3)","(5, 1)","(4, 3)",46,38,36,34,28,4,Data
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1550,"(3, 5)","(4, 5)","(3, 1)","(5, 5)","(2, 5)","(5, 5)","(2, 5)","(2, 5)","(5, 5)","(1, 3)",...,"(1, 1)","(5, 1)","(4, 5)",38,44,44,48,32,1550,Web dev
1551,"(3, 3)","(4, 5)","(3, 5)","(5, 3)","(2, 5)","(5, 3)","(2, 3)","(2, 5)","(5, 5)","(1, 5)",...,"(1, 5)","(5, 1)","(4, 3)",40,44,34,48,40,1551,Web dev
1552,"(3, 5)","(4, 3)","(3, 5)","(5, 5)","(2, 5)","(5, 5)","(2, 3)","(2, 3)","(5, 5)","(1, 3)",...,"(1, 5)","(5, 3)","(4, 3)",40,46,48,44,42,1552,Web dev
1553,"(3, 5)","(4, 5)","(3, 5)","(5, 5)","(2, 5)","(5, 5)","(2, 5)","(2, 5)","(5, 5)","(1, 5)",...,"(1, 1)","(5, 1)","(4, 3)",48,36,44,38,36,1553,Web dev
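
An alternative to concatenating side by side is a key-based merge on `ID`, which does not depend on both frames being in the same row order. This is a sketch on toy data; `scores`, `departments`, and `merged` are hypothetical names, and `validate="one_to_one"` is an optional safeguard rather than something the notebook uses.

```python
import pandas as pd

# Toy scores indexed by ID, and departments listed in a different order.
scores = pd.DataFrame({"Conscientiousness": [48, 26]},
                      index=pd.Index([0, 1], name="ID"))
departments = pd.DataFrame({"ID": [1, 0],
                            "Department": ["Copywriting", "Data"]})

# Move ID back into a column, then match rows by ID.
# validate raises MergeError if an ID appears more than once on either side.
merged = scores.reset_index().merge(
    departments, on="ID", how="left", validate="one_to_one"
)
print(merged)
```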


*Use an assert statement to check that the newly created merged data frame has the same number of
rows as the department data frame, and the expected number of columns.*

In [31]:
assert len(merge)==len(department_data)

In [32]:
len(result)


1555

In [33]:
len(department_data)

1555

*The merged DataFrame is expected to have 57 columns:
the personality data has 55 columns (after the empty columns are dropped)
and the department data has 2.*

In [34]:
assert len(merge.columns)==57

In [35]:
len(merge.columns)

57

In [36]:
len(result.columns)

55
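
The expected shape can also be derived from the inputs instead of being hard-coded. A minimal sketch, assuming two toy frames `left` and `right` that share the same default index:

```python
import pandas as pd

# Toy inputs with matching default indexes (hypothetical values).
left = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
right = pd.DataFrame({"ID": [0, 1], "Department": ["Data", "Web dev"]})

combined = pd.concat([left, right], axis=1)

# Rows should match the department frame; columns should add up.
expected_shape = (len(right), left.shape[1] + right.shape[1])
assert combined.shape == expected_shape, (
    f"expected {expected_shape}, got {combined.shape}"
)
```

Deriving the numbers keeps the check valid if a column is added or dropped earlier in the notebook.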

In [37]:
for x in merge['ID']:
    id=[]
    if merge['conscientiousness'] < 30:
        
            return ID
    

SyntaxError: 'return' outside function (<ipython-input-37-d4e54a232928>, line 5)

In [None]:
len(score_30)

In [83]:
score = merge[(merge['Conscientiousness']<30) & (merge['Agreeableness']<30) & (merge['Emotional Stability'] <30)]

In [84]:
score

Unnamed: 0,Section 5 of 6 [I am always prepared.],Section 5 of 6 [I am easily disturbed.],Section 5 of 6 [I am exacting (demanding) in my work.],Section 5 of 6 [I am full of ideas.],Section 5 of 6 [I am interested in people.],Section 5 of 6 [I am not interested in abstract ideas.],Section 5 of 6 [I am not interested in other people's problems.],Section 5 of 6 [I am not really interested in others.],Section 5 of 6 [I am quick to understand things.],Section 5 of 6 [I am quiet around strangers.],...,Section 5 of 6 [I use difficult words.],Section 5 of 6 [I worry about things.],Conscientiousness,Emotional Stability,Intellect,Agreeableness,Extraversion,ID,Department,Risk
881,"(3, 3)","(4, 1)","(3, 1)","(5, 5)","(2, 1)","(5, 3)","(2, 5)","(2, 3)","(5, 3)","(1, 5)",...,"(5, 1)","(4, 1)",26,28,36,28,30,881,Data,high risk
1197,"(3, 5)","(4, 5)","(3, 1)","(5, 1)","(2, 1)","(5, 3)","(2, 5)","(2, 1)","(5, 1)","(1, 5)",...,"(5, 5)","(4, 1)",26,26,28,22,40,1197,Copywriting,high risk


In [85]:
individuals=score[['ID','Department']]

In [86]:
print(individuals)

        ID   Department
881    881         Data
1197  1197  Copywriting


In [92]:
individuals

Unnamed: 0,ID,Department
881,881,Data
1197,1197,Copywriting


In [81]:
merge['Risk']=np.where((merge['Conscientiousness']<30) & (merge['Agreeableness']<30) & (merge['Emotional Stability'] <30),'high risk','low risk')

In [82]:
merge

Unnamed: 0,Section 5 of 6 [I am always prepared.],Section 5 of 6 [I am easily disturbed.],Section 5 of 6 [I am exacting (demanding) in my work.],Section 5 of 6 [I am full of ideas.],Section 5 of 6 [I am interested in people.],Section 5 of 6 [I am not interested in abstract ideas.],Section 5 of 6 [I am not interested in other people's problems.],Section 5 of 6 [I am not really interested in others.],Section 5 of 6 [I am quick to understand things.],Section 5 of 6 [I am quiet around strangers.],...,Section 5 of 6 [I use difficult words.],Section 5 of 6 [I worry about things.],Conscientiousness,Emotional Stability,Intellect,Agreeableness,Extraversion,ID,Department,Risk
0,"(3, 5)","(4, 5)","(3, 5)","(5, 5)","(2, 3)","(5, 3)","(2, 3)","(2, 5)","(5, 5)","(1, 3)",...,"(5, 1)","(4, 3)",48,36,42,40,30,0,Data,low risk
1,"(3, 5)","(4, 5)","(3, 5)","(5, 5)","(2, 5)","(5, 3)","(2, 5)","(2, 5)","(5, 5)","(1, 3)",...,"(5, 3)","(4, 3)",46,40,42,46,42,1,Data,low risk
2,"(3, 5)","(4, 3)","(3, 3)","(5, 5)","(2, 5)","(5, 5)","(2, 5)","(2, 5)","(5, 5)","(1, 1)",...,"(5, 1)","(4, 3)",40,38,42,40,28,2,Data,low risk
3,"(3, 5)","(4, 5)","(3, 3)","(5, 5)","(2, 5)","(5, 3)","(2, 3)","(2, 3)","(5, 3)","(1, 3)",...,"(5, 1)","(4, 1)",38,40,38,38,30,3,Data,low risk
4,"(3, 3)","(4, 5)","(3, 3)","(5, 3)","(2, 3)","(5, 3)","(2, 3)","(2, 3)","(5, 5)","(1, 1)",...,"(5, 1)","(4, 3)",46,38,36,34,28,4,Data,low risk
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1550,"(3, 5)","(4, 5)","(3, 1)","(5, 5)","(2, 5)","(5, 5)","(2, 5)","(2, 5)","(5, 5)","(1, 3)",...,"(5, 1)","(4, 5)",38,44,44,48,32,1550,Web dev,low risk
1551,"(3, 3)","(4, 5)","(3, 5)","(5, 3)","(2, 5)","(5, 3)","(2, 3)","(2, 5)","(5, 5)","(1, 5)",...,"(5, 1)","(4, 3)",40,44,34,48,40,1551,Web dev,low risk
1552,"(3, 5)","(4, 3)","(3, 5)","(5, 5)","(2, 5)","(5, 5)","(2, 3)","(2, 3)","(5, 5)","(1, 3)",...,"(5, 3)","(4, 3)",40,46,48,44,42,1552,Web dev,low risk
1553,"(3, 5)","(4, 5)","(3, 5)","(5, 5)","(2, 5)","(5, 5)","(2, 5)","(2, 5)","(5, 5)","(1, 5)",...,"(5, 1)","(4, 3)",48,36,44,38,36,1553,Web dev,low risk
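
The same condition is used both to filter the high-risk rows and to label them, so it can be built once and reused. The sketch below does this on a toy frame; the data values and the names `traits`, `high_risk`, and `flagged` are hypothetical, while the threshold of 30 and the three traits come from the cells above.

```python
import numpy as np
import pandas as pd

# Toy scores (hypothetical values).
toy = pd.DataFrame({
    "ID": [0, 1, 2],
    "Department": ["Data", "Data", "Copywriting"],
    "Conscientiousness": [48, 26, 26],
    "Agreeableness": [40, 28, 22],
    "Emotional Stability": [36, 28, 31],
})

traits = ["Conscientiousness", "Agreeableness", "Emotional Stability"]

# True only for rows where all three trait scores are below 30.
high_risk = (toy[traits] < 30).all(axis=1)

# Reuse the mask for the label column and for the filtered view.
toy["Risk"] = np.where(high_risk, "high risk", "low risk")
flagged = toy.loc[high_risk, ["ID", "Department"]]
print(flagged)
```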


In [None]:
def f(row):
    if row['A'] == row['B']:
        val = 0
    elif row['A'] > row['B']:
        val = 1
    else:
        val = -1
    return val

In [77]:
def f(row):
    # Label one row: high risk only if all three trait scores are below 30.
    if (row['Conscientiousness'] < 30 and row['Agreeableness'] < 30
            and row['Emotional Stability'] < 30):
        return 'high risk'
    return 'low risk'

# Apply the function row by row and collect the labels.
i = merge.apply(f, axis=1).tolist()

*Apply this to the DataFrame; first add the column.*

In [64]:
score = merge[(merge['Conscientiousness']<30) & (merge['Agreeableness']<30) & (merge['Emotional Stability'] <30)]


In [65]:
score

Unnamed: 0,Section 5 of 6 [I am always prepared.],Section 5 of 6 [I am easily disturbed.],Section 5 of 6 [I am exacting (demanding) in my work.],Section 5 of 6 [I am full of ideas.],Section 5 of 6 [I am interested in people.],Section 5 of 6 [I am not interested in abstract ideas.],Section 5 of 6 [I am not interested in other people's problems.],Section 5 of 6 [I am not really interested in others.],Section 5 of 6 [I am quick to understand things.],Section 5 of 6 [I am quiet around strangers.],...,Section 5 of 6 [I use difficult words.],Section 5 of 6 [I worry about things.],Conscientiousness,Emotional Stability,Intellect,Agreeableness,Extraversion,ID,Department,Risk
881,"(3, 3)","(4, 1)","(3, 1)","(5, 5)","(2, 1)","(5, 3)","(2, 5)","(2, 3)","(5, 3)","(1, 5)",...,"(5, 1)","(4, 1)",26,28,36,28,30,881,Data,
1197,"(3, 5)","(4, 5)","(3, 1)","(5, 1)","(2, 1)","(5, 3)","(2, 5)","(2, 1)","(5, 1)","(1, 5)",...,"(5, 5)","(4, 1)",26,26,28,22,40,1197,Copywriting,


In [98]:
dataframe = 

In [99]:
print(dataframe)

     Department
0          Data
1          Data
2          Data
3          Data
4          Data
...         ...
1550    Web dev
1551    Web dev
1552    Web dev
1553    Web dev
1554    Web dev

[1555 rows x 1 columns]
