# Statistical Inference - Non-Parametric Tests Demonstration in Python

## Objectives

1. To demonstrate applications of non-parametric tests of hypothesis using Python
2. To interpret the output for statistical inferences

### Import Libraries

In [6]:
import pandas as pd
from scipy.stats import mannwhitneyu
from scipy.stats import wilcoxon
from scipy.stats import kruskal
from scipy.stats import chi2_contingency


### 1. Mann-Whitney test 

### Background 
The objective of the study is to understand the factors driving the buying behaviour of potential customers. One of the factors is the ‘Colour’ of the mobile phone.

The rating indicates the importance given to ‘Colour’ when buying a mobile phone.

The factor rating is measured on a Likert scale (1-5).

1: Least important   5: Most important
     
### Data Description
| Columns     | Description                                      | Type     |
|--------------|--------------------------------------------------|----------|
| resid  | Respondent ID     | Numeric  |
| Gender  | Gender   | Character  |
| Color  | Importance rating given to colour (1-5)     | Numeric  |


In [7]:
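data = pd.read_csv('Mobile Consumer Behaviour.csv')
# Colour-importance ratings split by gender
group1 = data[data['Gender'] == 'M']['Color']
group2 = data[data['Gender'] == 'F']['Color']
# Two-sided test: H0 is that both groups have the same rating distribution
mannwhitneyu(group1, group2, alternative="two-sided")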


MannwhitneyuResult(statistic=523.5, pvalue=0.13191003525847111)

#### Inference :
Since the p-value (0.132) is greater than 0.05, do not reject H0. There is no significant evidence that the importance ratings given to ‘Colour’ when buying a mobile phone differ between males and females.
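To make the reported statistic easier to read, here is a minimal sketch that runs the same test on a small set of made-up ratings (the values and the names `male_ratings` and `female_ratings` are hypothetical, not the survey data). It checks a general property of the test: the U statistics obtained with the two samples in either order add up to n1 * n2. It then applies the usual 5% decision rule.

```python
from scipy.stats import mannwhitneyu

# Hypothetical 1-5 importance ratings, made up for illustration only
male_ratings = [3, 4, 2, 5, 3, 4, 1, 3]
female_ratings = [4, 5, 3, 5, 4, 2, 5]

# Run the test with the samples in both orders
res_mf = mannwhitneyu(male_ratings, female_ratings, alternative="two-sided")
res_fm = mannwhitneyu(female_ratings, male_ratings, alternative="two-sided")

# The two U statistics always sum to n1 * n2 (ties contribute 0.5 each)
n1, n2 = len(male_ratings), len(female_ratings)
u_sum = res_mf.statistic + res_fm.statistic
print(u_sum, n1 * n2)

# Decision rule at the 5% level of significance
alpha = 0.05
decision = "reject H0" if res_mf.pvalue < alpha else "do not reject H0"
print(f"U = {res_mf.statistic}, p-value = {res_mf.pvalue:.4f}: {decision}")
```

Because of this relationship, the statistic is best read alongside the group sizes rather than on its own.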


### 2. Wilcoxon Signed Rank Test for paired data

### Background
Each patient’s assessment of pain level is measured twice: before treatment and after treatment.

The patient’s assessment is recorded on a 1-4 scale.

4: Severe pain   3: Moderate pain   2: Mild pain   1: No pain

The objective is to compare pain levels before and after treatment.



### Data Description

| Columns     | Description                                      | Type     |
|--------------|--------------------------------------------------|----------|
| patient_id  | Patient ID    | Numeric  |
| pain_before   | Pain before treatment      | Numeric  |
| pain_after   | Pain after treatment       | Numeric  |

In [8]:
data = pd.read_csv('Pain Level Assessment.csv')
# One-sided test: H1 is that (pain_before - pain_after) tends to be greater than zero
wilcoxon(data['pain_before'], data['pain_after'], alternative='greater')

WilcoxonResult(statistic=406.0, pvalue=9.429983332079046e-07)

#### Inference :
Since the p-value (9.43e-07) is less than 0.05, reject H0 and conclude that the pain level has decreased significantly after treatment.
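The direction of the one-sided alternative depends on the order of the two arguments, which is easy to get wrong. The sketch below uses made-up pain scores (hypothetical values, not the study data) in which every patient improves, and compares `alternative="greater"` with `alternative="less"`. SciPy may emit a warning about ties or the choice of method on such small ordinal samples; the comparison still holds.

```python
from scipy.stats import wilcoxon

# Hypothetical pain scores (1 = no pain ... 4 = severe pain), made up for illustration
pain_before = [4, 3, 4, 2, 3, 4, 3, 4, 2, 3, 4, 3]
pain_after = [2, 2, 3, 1, 1, 2, 2, 1, 1, 2, 2, 1]

# 'greater': H1 is that (before - after) tends to be above zero, i.e. pain decreased
res_decrease = wilcoxon(pain_before, pain_after, alternative="greater")

# 'less': H1 is that (before - after) tends to be below zero, i.e. pain increased
res_increase = wilcoxon(pain_before, pain_after, alternative="less")

print(f"H1 pain decreased: p-value = {res_decrease.pvalue:.4g}")
print(f"H1 pain increased: p-value = {res_increase.pvalue:.4g}")
```

With the arguments in the order (before, after), `"greater"` is the alternative that matches a reduction in pain; swapping the arguments would require `"less"` instead.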



### 3. Kruskal Wallis test

### Background

An HR manager is interested in comparing feedback ratings across 3 functions: Marketing, Finance and IT.

The objective of the study is to assess employees' feedback about the performance appraisal process in a large company.

The feedback is measured on a Likert scale (1-5).

1: Not satisfied at all   5: Very satisfied

### Data Description

| Columns     | Description                                      | Type     |
|--------------|--------------------------------------------------|----------|
| Empno  | Employee No.     | Numeric  |
| Satscore   | Satisfaction score       | Numeric  |
| Function   | Department       | Character  |



In [9]:
data = pd.read_csv('Performance Appraisal Feedback.csv')

# Satisfaction scores for each function
group1 = data[data['Function'] == 'IT']['Satscore']
group2 = data[data['Function'] == 'Finance']['Satscore']
group3 = data[data['Function'] == 'Marketing']['Satscore']

# H0: the score distribution is the same in all 3 functions
kruskal(group1, group2, group3)


KruskalResult(statistic=0.36732288209940916, pvalue=0.8322175108236503)

#### Inference :
Since the p-value (0.832) is greater than 0.05, do not reject H0. There is no significant evidence that feedback differs among employees in the 3 functions.
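When the grouping column has many levels, selecting each group by hand becomes tedious. As a sketch under assumed data (the scores below are made up, and the variable names are illustrative), the groups can be built with `groupby` and unpacked into `kruskal`. The result matches the group-by-group call, since the test does not depend on the order of the groups.

```python
import pandas as pd
from scipy.stats import kruskal

# Hypothetical satisfaction scores, made up for illustration only
df = pd.DataFrame({
    "Function": ["IT"] * 4 + ["Finance"] * 4 + ["Marketing"] * 4,
    "Satscore": [3, 4, 2, 5, 4, 3, 3, 5, 2, 4, 4, 3],
})

# One array of scores per function, however many functions the column contains
samples = [scores.to_numpy() for _, scores in df.groupby("Function")["Satscore"]]
res_grouped = kruskal(*samples)

# The same test with each group selected explicitly
res_manual = kruskal(
    df[df["Function"] == "IT"]["Satscore"],
    df[df["Function"] == "Finance"]["Satscore"],
    df[df["Function"] == "Marketing"]["Satscore"],
)

print(res_grouped.statistic, res_grouped.pvalue)
print(res_manual.statistic, res_manual.pvalue)
```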


### 4. Chi-square test of Association

### Background
The data consist of information on the performance and recruitment source of each employee.

The objective is to check whether performance and source of recruitment are associated.

### Data Description
| Columns  | Description         | Type      |
|-----------|---------------------|-----------|
| sn               | Serial no.                         | Numeric   |
| performance      | Performance of an employee         | Character |
| source           | Recruitment source of an employee  | Character |


In [10]:
data = pd.read_csv('Recruitment Source.csv')
cont_table = pd.crosstab(data.performance, data.source)
chi2_contingency(cont_table)


Chi2ContingencyResult(statistic=107.37856396477088, pvalue=2.6359873347121296e-22, dof=4, expected_freq=array([[110.        ,  83.33333333,  96.66666667],
       [113.79310345,  86.20689655, 100.        ],
       [106.20689655,  80.45977011,  93.33333333]]))

#### Inference :
Since the p-value (2.64e-22) is less than 0.05, reject the null hypothesis and conclude that ‘Recruitment Source’ and ‘Employee Performance’ are associated.
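The `dof` and `expected_freq` values in the output follow directly from the contingency table: the degrees of freedom are (rows - 1) * (columns - 1), which gives 4 for a 3 x 3 table, and each expected frequency is row total * column total / grand total. The sketch below reproduces these by hand on a made-up table (the counts and the category labels are hypothetical, not taken from the recruitment data).

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

# Hypothetical counts and labels, made up for illustration only
observed = pd.DataFrame(
    [[30, 20, 10], [20, 25, 15], [10, 15, 35]],
    index=["High", "Medium", "Low"],            # performance levels (assumed)
    columns=["Campus", "Referral", "Agency"],   # recruitment sources (assumed)
)

stat, pvalue, dof, expected = chi2_contingency(observed)

# Expected frequencies under independence: row total * column total / grand total
counts = observed.to_numpy()
row_totals = counts.sum(axis=1)
col_totals = counts.sum(axis=0)
expected_by_hand = np.outer(row_totals, col_totals) / counts.sum()

# Degrees of freedom and the chi-square statistic computed by hand
dof_by_hand = (counts.shape[0] - 1) * (counts.shape[1] - 1)
stat_by_hand = ((counts - expected_by_hand) ** 2 / expected_by_hand).sum()

print(dof, dof_by_hand)
print(stat, stat_by_hand)
```

The hand-computed statistic matches here because the continuity correction in `chi2_contingency` is applied only when there is a single degree of freedom (a 2 x 2 table).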
