## Measuring association between categorical variables

Here we consider responses to the NPI (Narcissistic Personality Inventory), a personality test with 40 questions about personal preferences and self-view.
There are two possible responses to each question. The sample we’ll be working with contains responses to the following five questions:

influence: yes = I have a natural talent for influencing people; no = I am not good at influencing people.
blend_in: yes = I prefer to blend in with the crowd; no = I like to be the center of attention.
special: yes = I think I am a special person; no = I am no better or worse than most people.
leader: yes = I see myself as a good leader; no = I am not sure if I would make a good leader.
authority: yes = I like to have authority over other people; no = I don’t mind following orders.

In [5]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import os

Read the NPI responses CSV and display the first few rows.

In [6]:
os.listdir()
npi_df=pd.read_csv('NPI_responses.csv')
npi_df.head()

Unnamed: 0,influence,blend_in,special,leader,authority
0,no,yes,yes,yes,yes
1,no,yes,no,no,no
2,yes,no,yes,yes,yes
3,yes,no,no,yes,yes
4,yes,yes,no,yes,no


Contingency tables (cross-tabs) help us assess the association between two categorical variables.
To see whether respondents who consider themselves "special" also tend to like having authority over others, we cross-tabulate the two variables.

In [7]:
special_auth_freq=pd.crosstab(npi_df.special,npi_df.authority)
special_auth_freq

special \ authority,no,yes
no,4069,1905
yes,2229,2894


The table shows that 4069 respondents answered "no" to both questions: they do not consider themselves special and do not like having authority over others. This is the largest of the four cells.

To make the association easier to read, we express the contingency table values as proportions of all responses.

In [11]:
special_auth_freq_prop=special_auth_freq/len(npi_df)
special_auth_freq_prop

special \ authority,no,yes
no,0.366676,0.171668
yes,0.200865,0.260791


The proportions tell the same story: the largest share of respondents (about 37%) answered "no" to both questions, followed by those who answered "yes" to both (about 26%).
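
As a cross-check on the proportions above, `pd.crosstab` can also normalize the table directly through its `normalize` argument. The sketch below is illustrative only: `demo_df` is a hypothetical stand-in rebuilt from the four counts in the frequency table, so it runs without `NPI_responses.csv`.

```python
import pandas as pd

# Hypothetical stand-in for npi_df, rebuilt from the four cell counts
# of the frequency table (columns: special, authority).
demo_df = pd.DataFrame(
    [("no", "no")] * 4069 + [("no", "yes")] * 1905
    + [("yes", "no")] * 2229 + [("yes", "yes")] * 2894,
    columns=["special", "authority"],
)

# normalize=True divides every cell by the grand total
prop_direct = pd.crosstab(demo_df.special, demo_df.authority, normalize=True)

# Same computation as in the notebook: frequency table / number of rows
prop_manual = pd.crosstab(demo_df.special, demo_df.authority) / len(demo_df)

print(prop_direct.round(6))
```

Both routes should give the same table; `normalize=True` simply saves the explicit division.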

We compute the marginal proportions for "authority" and "special", i.e. the overall share of "yes" and "no" answers to each question regardless of the answer to the other.

In [12]:
authority_prop=special_auth_freq_prop.sum(axis=0)
special_prop=special_auth_freq_prop.sum(axis=1)
print(authority_prop,special_prop)

authority
no     0.567541
yes    0.432459
dtype: float64 special
no     0.538344
yes    0.461656
dtype: float64


So a majority of respondents answered "no" to each question: about 57% do not like having authority over others, and about 54% do not consider themselves special.
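
The marginals can also be read straight off the cross-tab by asking `pd.crosstab` for totals with `margins=True`, which appends an `All` row and column. This is a minimal sketch, again using a hypothetical `demo_df` rebuilt from the counts in the frequency table rather than the original CSV.

```python
import pandas as pd

# Hypothetical stand-in for npi_df, rebuilt from the four cell counts
demo_df = pd.DataFrame(
    [("no", "no")] * 4069 + [("no", "yes")] * 1905
    + [("yes", "no")] * 2229 + [("yes", "yes")] * 2894,
    columns=["special", "authority"],
)

# margins=True adds an "All" row and column holding the totals
with_totals = pd.crosstab(demo_df.special, demo_df.authority, margins=True)

# Dividing by the number of respondents turns the totals into the
# marginal proportions computed above
with_totals_prop = with_totals / len(demo_df)

print(with_totals)
print(with_totals_prop.round(6))
```

The `All` column then holds the marginal proportions for `special`, and the `All` row holds those for `authority`.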

To judge the association, we can compute the expected contingency table, i.e. the frequencies we would see if the two variables were not associated, and then compare our observed table with it.

In [17]:
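from scipy.stats import chi2_contingency
# chi2_contingency returns the statistic, the p-value, the degrees of freedom and the expected frequencies
chi2,pval,dof,expected=chi2_contingency(special_auth_freq)
print(np.round(expected))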

[[3390. 2584.]
 [2908. 2215.]]


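Compared with the observed frequency table, there are more respondents than expected in the no/no and yes/yes cells (4069 vs. 3390 and 2894 vs. 2215) and fewer than expected in the two mixed cells (1905 vs. 2584 and 2229 vs. 2908), which suggests that the two answers are positively associated.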
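
To see where the expected table comes from: under no association, each expected count is (row total × column total) / grand total. The sketch below recomputes it with NumPy from the observed counts in the frequency table. The names `observed` and `expected_manual` are introduced here for illustration only.

```python
import numpy as np

# Observed counts from the frequency table
# (rows: special no/yes, columns: authority no/yes)
observed = np.array([[4069, 1905],
                     [2229, 2894]])

n = observed.sum()                   # total number of respondents
row_totals = observed.sum(axis=1)    # totals for special = no / yes
col_totals = observed.sum(axis=0)    # totals for authority = no / yes

# Expected count in each cell if the two answers were independent
expected_manual = np.outer(row_totals, col_totals) / n

print(np.round(expected_manual))
```

Rounded, this should reproduce the table returned by `chi2_contingency`.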

Rather than comparing the tables cell by cell, we can use the chi-square statistic to summarize how different the two tables are:

$$\chi^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}}$$

As a rule of thumb, for a 2x2 table a chi-square value above about 4 suggests that the variables are associated (the 5% critical value with 1 degree of freedom is 3.84).

In [18]:
chi2

679.1219526170606

The value obtained is far above 4, which is strong evidence that the two variables are associated.
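
The chi-square formula can also be applied by hand as a sanity check. One detail to be aware of: for 2x2 tables, `chi2_contingency` applies Yates' continuity correction by default (`correction=True`), which reduces each |observed - expected| by 0.5 before squaring, so the plain formula gives a slightly larger value than the one reported above. This is a sketch under that assumption, with illustrative variable names.

```python
import numpy as np

# Observed counts from the frequency table
observed = np.array([[4069, 1905],
                     [2229, 2894]])

# Expected counts under independence: row total * column total / n
n = observed.sum()
expected_manual = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / n

# Plain chi-square statistic, exactly as in the formula
chi2_plain = ((observed - expected_manual) ** 2 / expected_manual).sum()

# With Yates' continuity correction: shrink each absolute difference by 0.5
abs_diff = np.abs(observed - expected_manual)
chi2_yates = (np.maximum(abs_diff - 0.5, 0) ** 2 / expected_manual).sum()

print(chi2_plain, chi2_yates)
```

The corrected value should agree with the statistic printed by the notebook, and the uncorrected one comes out about one unit higher. Either way the conclusion is unchanged.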