## Processing the results of a psychological test (using the Pandas library)

The file `extraversion.csv` contains the results of an educational psychometric study. Its purpose is to identify the relationship between a person's level of extraversion and their propensity to participate in volunteer activities.

In [1]:
import pandas as pd
import numpy as np

In [2]:
ps = pd.read_csv("extraversion.csv", encoding="UTF-8")
ps.head()

Unnamed: 0.2,Unnamed: 0.1,Unnamed: 0,sex,volunteer,Q 1,Q 2,Q 3,Q 4,Q 5,Q 6,...,Q 48,Q 49,Q 50,Q 51,Q 52,Q 53,Q 54,Q 55,Q 56,Q 57
0,0,1,Женский,Нет,Да,Нет,Нет,Да,Да,Да,...,Да,Нет,Нет,Да,Да,Нет,Нет,Нет,Нет,Да
1,1,2,Женский,Да,Да,Да,Нет,Да,Да,Да,...,Да,Да,Нет,Нет,Нет,Да,Нет,Да,Да,Нет
2,2,3,Женский,Да,Да,Да,Нет,Нет,Да,Да,...,Да,Да,Нет,Нет,Да,Нет,Нет,Да,Да,Нет
3,3,4,Женский,Нет,Да,Да,Нет,Нет,Да,Да,...,Да,Нет,Да,Да,Нет,Нет,Нет,Да,Нет,Нет
4,4,5,Женский,Нет,Да,Да,Нет,Да,Да,Да,...,Да,Да,Нет,Нет,Да,Да,Да,Да,Да,Нет


The dataframe contains the following columns:

* `sex`: gender of the respondent (Female, Male; stored in Russian, e.g. `Женский`);
* `volunteer`: regular participation in volunteer activities (Yes, No; stored as `Да`, `Нет`);
* `Q 1` - `Q 57`: answers to the questions of the Eysenck questionnaire (Yes, No; stored as `Да`, `Нет`). Information about the questionnaire and the questions themselves can be found on [this page](http://ipp.hse.ru/57-testytest-ajzenka-ekstraversiya-introversiya-nejrotizm).

In [3]:
print(f'The dataframe contains {ps.shape[0]} rows, {ps.shape[1]} columns.')

The dataframe contains 52 rows, 61 columns.


We rename the columns `Q 1`-`Q 57` to `Q1`-`Q57` by removing the spaces from every column name. Any other name that contains a space (for example, `Unnamed: 0`) loses it as well.

In [4]:
ps.columns = list(map(lambda elem: elem.replace(' ', ''), ps.columns))
ps.head()

Unnamed: 0,Unnamed:0.1,Unnamed:0,sex,volunteer,Q1,Q2,Q3,Q4,Q5,Q6,...,Q48,Q49,Q50,Q51,Q52,Q53,Q54,Q55,Q56,Q57
0,0,1,Женский,Нет,Да,Нет,Нет,Да,Да,Да,...,Да,Нет,Нет,Да,Да,Нет,Нет,Нет,Нет,Да
1,1,2,Женский,Да,Да,Да,Нет,Да,Да,Да,...,Да,Да,Нет,Нет,Нет,Да,Нет,Да,Да,Нет
2,2,3,Женский,Да,Да,Да,Нет,Нет,Да,Да,...,Да,Да,Нет,Нет,Да,Нет,Нет,Да,Да,Нет
3,3,4,Женский,Нет,Да,Да,Нет,Нет,Да,Да,...,Да,Нет,Да,Да,Нет,Нет,Нет,Да,Нет,Нет
4,4,5,Женский,Нет,Да,Да,Нет,Да,Да,Да,...,Да,Да,Нет,Нет,Да,Да,Да,Да,Да,Нет
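The same renaming can also be written with the vectorized string methods of the column index. The sketch below uses a small toy dataframe (`demo`, with made-up values) rather than the real `ps`, so it only illustrates the technique.

```python
import pandas as pd

# Toy frame with hypothetical data; only the column names matter here.
demo = pd.DataFrame({"sex": ["Женский"], "Q 1": ["Да"], "Q 2": ["Нет"]})

# Vectorized equivalent of the map/lambda version:
# remove every space from each column name (literal match, no regex).
demo.columns = demo.columns.str.replace(" ", "", regex=False)

print(list(demo.columns))  # ['sex', 'Q1', 'Q2']
```
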


Next we calculate the extraversion index. To do this, we:
* select columns Q1, Q3, Q8, Q10, Q13, Q17, Q22, Q25, Q27, Q39, Q44, Q46, Q49, Q53, Q56 and save them to a separate `extra_yes` dataframe,
* select columns Q5, Q15, Q20, Q29, Q32, Q34, Q37, Q41, Q51 and save them to a separate `extra_no` dataframe.

In [5]:
extra_yes = ps.loc[:, ['Q1', 'Q3', 'Q8', 'Q10', 'Q13', 'Q17', 'Q22', 'Q25', 'Q27', 'Q39', 'Q44', 'Q46', 'Q49', 'Q53', 'Q56']]
extra_yes.head()

Unnamed: 0,Q1,Q3,Q8,Q10,Q13,Q17,Q22,Q25,Q27,Q39,Q44,Q46,Q49,Q53,Q56
0,Да,Нет,Нет,Нет,Да,Нет,Нет,Нет,Да,Нет,Нет,Нет,Нет,Нет,Нет
1,Да,Нет,Да,Да,Нет,Да,Нет,Да,Да,Да,Нет,Нет,Да,Да,Да
2,Да,Нет,Нет,Нет,Нет,Да,Нет,Да,Да,Да,Да,Нет,Да,Нет,Да
3,Да,Нет,Нет,Нет,Нет,Да,Нет,Нет,Нет,Нет,Нет,Нет,Нет,Нет,Нет
4,Да,Нет,Нет,Нет,Да,Да,Нет,Да,Да,Нет,Нет,Да,Да,Да,Да


In [6]:
extra_no = ps.loc[:, ['Q5', 'Q15', 'Q20', 'Q29', 'Q32', 'Q34', 'Q37', 'Q41', 'Q51']]
extra_no.head()

Unnamed: 0,Q5,Q15,Q20,Q29,Q32,Q34,Q37,Q41,Q51
0,Да,Да,Да,Да,Да,Нет,Да,Нет,Да
1,Да,Нет,Да,Нет,Нет,Да,Нет,Нет,Нет
2,Да,Да,Да,Нет,Нет,Да,Нет,Нет,Нет
3,Да,Нет,Да,Да,Да,Да,Да,Нет,Да
4,Да,Нет,Да,Нет,Да,Да,Да,Нет,Нет
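Typing long lists of column names by hand is error-prone, so one option is to keep only the item numbers and generate the names from them. The following is a minimal sketch under that assumption. The names `YES_ITEMS`, `NO_ITEMS` and `toy` are illustrative and not part of the original notebook, and `toy` is a stand-in for `ps` with a single made-up respondent.

```python
import pandas as pd

# Item numbers copied from the two lists above.
YES_ITEMS = [1, 3, 8, 10, 13, 17, 22, 25, 27, 39, 44, 46, 49, 53, 56]
NO_ITEMS = [5, 15, 20, 29, 32, 34, 37, 41, 51]

# Build the column names Q1, Q3, ... from the item numbers.
yes_cols = [f"Q{i}" for i in YES_ITEMS]
no_cols = [f"Q{i}" for i in NO_ITEMS]

# Toy stand-in for ps: one hypothetical respondent who answered "Да" everywhere.
toy = pd.DataFrame([["Да"] * 57], columns=[f"Q{i}" for i in range(1, 58)])

extra_yes_toy = toy[yes_cols]
extra_no_toy = toy[no_cols]
print(extra_yes_toy.shape, extra_no_toy.shape)  # (1, 15) (1, 9)
```

Together the two lists cover 24 items, which is therefore the largest value the index can take.
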


We calculate the number of `Да` (Yes) answers for each row in the `extra_yes` dataframe and save the result to the `extra_yes_sum` variable.

In [7]:
extra_yes_sum = extra_yes.isin(['Да']).sum(axis=1)
extra_yes_sum.head()

0     3
1    10
2     8
3     2
4     9
dtype: int64

We calculate the number of `Нет` (No) answers for each row in the `extra_no` dataframe and save the result to the `extra_no_sum` variable.

In [8]:
extra_no_sum = extra_no.isin(['Нет']).sum(axis=1)
extra_no_sum.head()

0    2
1    6
2    5
3    2
4    4
dtype: int64

We add an `extra` column to the original dataframe.

This is the extraversion index. For each respondent it is the number of "Yes" answers in `extra_yes` plus the number of "No" answers in `extra_no`.

In [9]:
ps['extra'] = extra_yes_sum + extra_no_sum
ps.head()

Unnamed: 0,Unnamed:0.1,Unnamed:0,sex,volunteer,Q1,Q2,Q3,Q4,Q5,Q6,...,Q49,Q50,Q51,Q52,Q53,Q54,Q55,Q56,Q57,extra
0,0,1,Женский,Нет,Да,Нет,Нет,Да,Да,Да,...,Нет,Нет,Да,Да,Нет,Нет,Нет,Нет,Да,5
1,1,2,Женский,Да,Да,Да,Нет,Да,Да,Да,...,Да,Нет,Нет,Нет,Да,Нет,Да,Да,Нет,16
2,2,3,Женский,Да,Да,Да,Нет,Нет,Да,Да,...,Да,Нет,Нет,Да,Нет,Нет,Да,Да,Нет,13
3,3,4,Женский,Нет,Да,Да,Нет,Нет,Да,Да,...,Нет,Да,Да,Нет,Нет,Нет,Да,Нет,Нет,4
4,4,5,Женский,Нет,Да,Да,Нет,Да,Да,Да,...,Да,Нет,Нет,Да,Да,Да,Да,Да,Нет,13
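As a check on the arithmetic, the first respondent has 3 "Yes" answers in `extra_yes` and 2 "No" answers in `extra_no`, giving an index of 3 + 2 = 5, which matches the `extra` column above. The whole calculation can also be wrapped in one helper. The function name `extraversion_index` and the three-question toy data below are assumptions made for illustration only.

```python
import pandas as pd

def extraversion_index(df, yes_cols, no_cols, yes="Да", no="Нет"):
    """Count 'yes' answers in yes_cols plus 'no' answers in no_cols, per row."""
    yes_count = (df[yes_cols] == yes).sum(axis=1)
    no_count = (df[no_cols] == no).sum(axis=1)
    return yes_count + no_count

# Toy data: two hypothetical respondents and three questions.
toy = pd.DataFrame({
    "Q1": ["Да", "Нет"],
    "Q3": ["Да", "Да"],
    "Q5": ["Нет", "Да"],
})

# Q1 and Q3 are scored on "Да", Q5 is scored on "Нет".
idx = extraversion_index(toy, ["Q1", "Q3"], ["Q5"])
print(idx.tolist())  # [3, 1]
```
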


We add a `female` column to the original dataframe. It consists of the values 0 and 1 (0 is Male, 1 is Female).

In [10]:
ps['female'] = (ps['sex'] == 'Женский').astype(int)
ps.head(15)

Unnamed: 0,Unnamed:0.1,Unnamed:0,sex,volunteer,Q1,Q2,Q3,Q4,Q5,Q6,...,Q50,Q51,Q52,Q53,Q54,Q55,Q56,Q57,extra,female
0,0,1,Женский,Нет,Да,Нет,Нет,Да,Да,Да,...,Нет,Да,Да,Нет,Нет,Нет,Нет,Да,5,1
1,1,2,Женский,Да,Да,Да,Нет,Да,Да,Да,...,Нет,Нет,Нет,Да,Нет,Да,Да,Нет,16,1
2,2,3,Женский,Да,Да,Да,Нет,Нет,Да,Да,...,Нет,Нет,Да,Нет,Нет,Да,Да,Нет,13,1
3,3,4,Женский,Нет,Да,Да,Нет,Нет,Да,Да,...,Да,Да,Нет,Нет,Нет,Да,Нет,Нет,4,1
4,4,5,Женский,Нет,Да,Да,Нет,Да,Да,Да,...,Нет,Нет,Да,Да,Да,Да,Да,Нет,13,1
5,5,6,Женский,Нет,Да,Нет,Да,Да,Нет,Нет,...,Нет,Нет,Нет,Да,Да,Нет,Да,Нет,19,1
6,6,7,Женский,Нет,Да,Нет,Нет,Да,Да,Нет,...,Нет,Нет,Да,Да,Да,Да,Нет,Нет,15,1
7,7,8,Женский,Да,Да,Да,Да,Нет,Да,Да,...,Да,Нет,Да,Да,Да,Да,Да,Нет,21,1
8,8,9,Женский,Нет,Да,Да,Нет,Да,Да,Да,...,Нет,Нет,Да,Да,Нет,Да,Да,Нет,12,1
9,9,10,Женский,Да,Да,Да,Нет,Нет,Да,Нет,...,Нет,Нет,Нет,Да,Да,Да,Да,Нет,19,1
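An alternative way to build such an indicator is an explicit mapping from labels to codes. The sketch below assumes that male respondents are labelled `Мужской`. That label is a guess, since the rows displayed above contain only `Женский`.

```python
import pandas as pd

# Toy data; 'Мужской' is an assumed label, not confirmed by the displayed rows.
toy = pd.DataFrame({"sex": ["Женский", "Мужской", "Женский"]})

# Explicit label-to-code mapping: 1 is Female, 0 is Male.
codes = {"Женский": 1, "Мужской": 0}
toy["female"] = toy["sex"].map(codes)

print(toy["female"].tolist())  # [1, 0, 1]
```

With `map`, any label missing from the dictionary becomes `NaN`, so misspelled or unexpected values stay visible. The comparison `ps['sex'] == 'Женский'` would silently code them as 0.
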


We select the rows of the original dataframe that correspond either to volunteers with an extraversion index above 15 or to non-volunteers with an extraversion index below 15 (respondents with an index of exactly 15 fall into neither group), and save them to the `pure` dataframe.

In [11]:
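# .copy() makes pure an independent dataframe, so columns can be added to it later without a SettingWithCopyWarning
pure = ps[((ps['volunteer'] == 'Да') & (ps['extra'] > 15)) | ((ps['volunteer'] == 'Нет') & (ps['extra'] < 15))].copy()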
pure.head()

Unnamed: 0,Unnamed:0.1,Unnamed:0,sex,volunteer,Q1,Q2,Q3,Q4,Q5,Q6,...,Q50,Q51,Q52,Q53,Q54,Q55,Q56,Q57,extra,female
0,0,1,Женский,Нет,Да,Нет,Нет,Да,Да,Да,...,Нет,Да,Да,Нет,Нет,Нет,Нет,Да,5,1
1,1,2,Женский,Да,Да,Да,Нет,Да,Да,Да,...,Нет,Нет,Нет,Да,Нет,Да,Да,Нет,16,1
3,3,4,Женский,Нет,Да,Да,Нет,Нет,Да,Да,...,Да,Да,Нет,Нет,Нет,Да,Нет,Нет,4,1
4,4,5,Женский,Нет,Да,Да,Нет,Да,Да,Да,...,Нет,Нет,Да,Да,Да,Да,Да,Нет,13,1
7,7,8,Женский,Да,Да,Да,Да,Нет,Да,Да,...,Да,Нет,Да,Да,Да,Да,Да,Нет,21,1
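A compound condition like this is often easier to read and check when each part is given a name. The following sketch applies the same rule to a toy dataframe with made-up values. The mask names are illustrative.

```python
import pandas as pd

# Toy data: five hypothetical respondents.
toy = pd.DataFrame({
    "volunteer": ["Да", "Да", "Нет", "Нет", "Да"],
    "extra": [16, 13, 4, 19, 15],
})

# Name each half of the condition separately.
high_volunteers = (toy["volunteer"] == "Да") & (toy["extra"] > 15)
low_non_volunteers = (toy["volunteer"] == "Нет") & (toy["extra"] < 15)

# Keep rows matching either half; .copy() gives an independent dataframe.
pure_toy = toy[high_volunteers | low_non_volunteers].copy()

print(pure_toy.index.tolist())  # [0, 2]
```

The last toy respondent is a volunteer with an index of exactly 15 and is dropped, because both inequalities are strict.
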


In [12]:
print('The number of volunteers in the pure dataframe is:', pure[pure['volunteer'] == 'Да'].shape[0])

The number of volunteers in the pure dataframe is: 6


In [13]:
print('The number of non-volunteers in the pure dataframe is:', pure[pure['volunteer'] == 'Нет'].shape[0])

The number of non-volunteers in the pure dataframe is: 21
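The two counts above add up to 6 + 21 = 27 rows in `pure`. Both can be obtained in a single call with `value_counts`, sketched here on a toy column rather than the real data.

```python
import pandas as pd

# Toy stand-in for pure['volunteer'] with hypothetical values.
pure_toy = pd.DataFrame({"volunteer": ["Да", "Нет", "Нет", "Нет"]})

# One call returns the number of rows per distinct value.
counts = pure_toy["volunteer"].value_counts()

print(counts.to_dict())  # {'Нет': 3, 'Да': 1}
```
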


In [15]:
extra_min = pure['extra'].min()
print(f'The minimum value of the extraversion index in the pure dataframe: {extra_min}')

The minimum value of the extraversion index in the pure dataframe: 2


In [16]:
extra_max = pure['extra'].max()
print(f'The maximum value of the extraversion index in the pure dataframe: {extra_max}')

The maximum value of the extraversion index in the pure dataframe: 21


In [18]:
extra_mean = pure['extra'].mean()
print(f'The average value of the extraversion index in the pure dataframe: {round(extra_mean, 2)}')

The average value of the extraversion index in the pure dataframe: 11.22


In [20]:
extra_median = pure['extra'].median()
print(f'The median value of the extraversion index in the pure dataframe: {extra_median}')

The median value of the extraversion index in the pure dataframe: 11.0
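The four statistics computed above can also be collected in one step with `Series.agg`. This is a minimal sketch on a toy series. The values are made up and do not reproduce the figures reported for `pure`.

```python
import pandas as pd

# Toy stand-in for pure['extra'].
extra_toy = pd.Series([5, 16, 4, 13, 21], name="extra")

# A list of aggregation names returns one labelled value per statistic.
summary = extra_toy.agg(["min", "max", "mean", "median"])

print(summary.round(2))
```
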


We add a `high` column to the `pure` dataframe, consisting of 0 and 1, where:
* 1 corresponds to respondents whose extraversion index is higher than $m = \max\{\text{median}, \text{mean}\}$, that is, the larger of the median and the mean of the index,
* 0 corresponds to respondents whose extraversion index is not higher than $m$.

In [22]:
pure.head()

Unnamed: 0,Unnamed:0.1,Unnamed:0,sex,volunteer,Q1,Q2,Q3,Q4,Q5,Q6,...,Q50,Q51,Q52,Q53,Q54,Q55,Q56,Q57,extra,female
0,0,1,Женский,Нет,Да,Нет,Нет,Да,Да,Да,...,Нет,Да,Да,Нет,Нет,Нет,Нет,Да,5,1
1,1,2,Женский,Да,Да,Да,Нет,Да,Да,Да,...,Нет,Нет,Нет,Да,Нет,Да,Да,Нет,16,1
3,3,4,Женский,Нет,Да,Да,Нет,Нет,Да,Да,...,Да,Да,Нет,Нет,Нет,Да,Нет,Нет,4,1
4,4,5,Женский,Нет,Да,Да,Нет,Да,Да,Да,...,Нет,Нет,Да,Да,Да,Да,Да,Нет,13,1
7,7,8,Женский,Да,Да,Да,Да,Нет,Да,Да,...,Да,Нет,Да,Да,Да,Да,Да,Нет,21,1
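The output above does not yet contain a `high` column, and the cell that creates it is not shown. One possible way to build it is sketched below on a toy dataframe. This is an assumption about how the step could be done, not the original author's code. With the values reported earlier (median 11.0, mean about 11.22), the threshold for the real data would be the mean.

```python
import pandas as pd

# Toy stand-in for pure with hypothetical index values.
pure_toy = pd.DataFrame({"extra": [5, 16, 4, 13, 21]})

# Threshold m is the larger of the median and the mean.
m = max(pure_toy["extra"].median(), pure_toy["extra"].mean())

# 1 if the index is strictly above m, otherwise 0.
pure_toy["high"] = (pure_toy["extra"] > m).astype(int)

print(m)                          # 13.0
print(pure_toy["high"].tolist())  # [0, 1, 0, 0, 1]
```

In the toy data the median (13) exceeds the mean (11.8), so the respondent with an index of exactly 13 receives 0.
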
