### Statistical Experiment: Job Level and Correlation
- After extracting the features of the data earlier, I noticed that, interestingly, job (management) level and employee satisfaction are not correlated according to Pearson's correlation.
- However, Pearson's correlation alone is not a good gauge of whether there is a relationship in our data; it mainly measures the strength of a linear relationship.
    - We can have minimal correlation but significant ANOVA results, because ANOVA focuses on differences between group summary statistics (here, group means) rather than on a linear trend.
- I will conduct a hypothesis test to better understand the relationship.
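
To make the point about correlation versus group differences concrete, here is a minimal sketch on synthetic data (not the Glassdoor data). The group means, sample sizes, and noise level are all assumptions chosen for illustration: the middle level is given a higher mean than the other two, so the relationship is real but not linear.

```python
import numpy as np

# Synthetic data only: three hypothetical management levels, 200 employees each.
rng = np.random.default_rng(0)
levels = np.repeat([1, 2, 3], 200)

# Assumed group means: the middle level is more satisfied than the other two,
# so the relationship exists but is not linear.
true_means = np.array([3.0, 4.0, 3.0])
satisfaction = true_means[levels - 1] + rng.normal(0.0, 1.0, size=levels.size)

# Pearson's r only captures the linear trend across levels.
pearson_r = np.corrcoef(levels, satisfaction)[0, 1]

# Per-level means expose the difference that r misses.
group_means = {
    int(lvl): float(satisfaction[levels == lvl].mean()) for lvl in (1, 2, 3)
}

print(f"Pearson r: {pearson_r:.3f}")
print(f"Mean satisfaction by level: {group_means}")
```

In this setup the correlation should come out close to zero even though one level clearly differs from the others, which is the situation a test on group means is designed to detect.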

#### Permutation test:
Looking at the three management-level groups, we test whether management level affects employee satisfaction. Specifically, we check whether the mean satisfaction score differs significantly between our three independent groups, that is, whether at least one level has a mean that differs from the others.

Null Hypothesis `𝐻0`: The mean satisfaction score is equal across all levels.

Alternative Hypothesis `𝐻1`: At least one level has a different mean satisfaction score.
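
One way to test these hypotheses with a permutation approach is sketched below. This is an illustration under stated assumptions, not necessarily the procedure used in the rest of this notebook: it uses the one-way ANOVA F statistic as the test statistic, shuffles the group labels to simulate `𝐻0`, and runs on synthetic data. The function names `f_statistic` and `permutation_test_group_means` are hypothetical helpers introduced here.

```python
import numpy as np

def f_statistic(values, groups):
    """One-way ANOVA F statistic: between-group over within-group variance."""
    values = np.asarray(values, dtype=float)
    groups = np.asarray(groups)
    labels = np.unique(groups)
    grand_mean = values.mean()
    ss_between = 0.0
    ss_within = 0.0
    for label in labels:
        group_values = values[groups == label]
        ss_between += group_values.size * (group_values.mean() - grand_mean) ** 2
        ss_within += ((group_values - group_values.mean()) ** 2).sum()
    df_between = labels.size - 1
    df_within = values.size - labels.size
    return (ss_between / df_between) / (ss_within / df_within)

def permutation_test_group_means(values, groups, num_permutations=2000, seed=0):
    """Permutation p-value for H0: all group means are equal."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    groups = np.asarray(groups)
    observed_f = f_statistic(values, groups)
    exceed = 0
    for _ in range(num_permutations):
        # Shuffling the labels breaks any link between group and value
        if f_statistic(values, rng.permutation(groups)) >= observed_f:
            exceed += 1
    # +1 counts the observed labelling itself, so the p-value is never 0
    p_value = (exceed + 1) / (num_permutations + 1)
    return observed_f, p_value

# Synthetic demo: level 3 is assumed to have a higher mean than levels 1 and 2.
rng = np.random.default_rng(1)
demo_levels = np.repeat([1, 2, 3], 100)
demo_scores = np.array([3.0, 3.0, 3.8])[demo_levels - 1] + rng.normal(0.0, 1.0, size=300)

observed_f, p_value = permutation_test_group_means(demo_scores, demo_levels)
print(f"Observed F: {observed_f:.2f}")
print(f"P-value: {p_value:.4f}")
```

On the real data, a call such as `permutation_test_group_means(df['overall_rating'], df['level'])` would be the analogous step, assuming `overall_rating` is the satisfaction measure of interest and rows with missing values have been dropped first.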

In [4]:
import numpy as np
import pandas as pd

In [7]:
df = pd.read_csv('/Users/Andre/OneDrive/Desktop/1_Glassdoor_Project/CleanedData/clean_data_2.csv')
df.head(5)

Unnamed: 0,industry,firm,job_title,level,status,years,location,overall_rating,work_life_balance,culture_values,...,comp_benefits,senior_mgmt,headline,pros,cons,combined_text,processed_text,word_count,char_count,sentiment_score
0,Retail,ASDA,Night Stocker,1,0.0,1.0,"Glasgow, Scotland, Scotland",3,3.0,2.0,...,3.0,2.0,"Mixed, it very much depends upon the Skills of...",If you live nearby and are physically sound t...,Multi skills or greater performance are not fi...,"Mixed, it very much depends upon the Skills of...",mixed depend Skills Manager seriously absent c...,48,358,0.8692
1,Retail,ASDA,Warehouse Operative,2,1.0,10.0,"London, England, England",5,5.0,5.0,...,5.0,5.0,"very good the freshness,good,support,freedom a...","the freshness,good,support,freedom and attitude",nothing nothing nothing nothing nothing nothing,"very good the freshness,good,support,freedom a...",good freshness good support freedom attitude...,11,88,0.4927
2,Retail,ASDA,Availibility,2,1.0,1.0,"London, England, England",3,3.0,2.0,...,3.0,2.0,"Good company, cares about employees...",Helpful and friendly working environment,Salary is not attractive compare to the curren...,"Good company, cares about employees... Helpful...",good company care employee helpful friendly wo...,13,101,0.8589
3,Retail,ASDA,Customer Service Assistant,1,1.0,5.0,"Glasgow, Scotland, Scotland",3,5.0,4.0,...,2.0,4.0,"Good culture, Great group of people to work wi...","Easy work, good training, 10% off discount car...","Career progression slow, with many people sitt...","Good culture, Great group of people to work wi...",good culture great group people work career pr...,52,353,0.9268
4,Retail,ASDA,Checkout Support,1,1.0,1.0,"Cardiff, Wales, Wales",5,4.0,5.0,...,4.0,5.0,Working in asda,I have felt like i was working among my family...,I can't really think of any .,Working in asda I have felt like i was working...,work asda feel like work family work think,8,42,0.3612


In [1]:
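def permutation_test_corr(x, y, num_permutations=10000):
    """Two-sided permutation test for Pearson's correlation between x and y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    # Drop pairs with a missing value; np.corrcoef returns NaN otherwise
    valid = ~(np.isnan(x) | np.isnan(y))
    x, y = x[valid], y[valid]

    # Calculate the observed correlation
    observed_corr = np.corrcoef(x, y)[0, 1]
    # Array to store permuted correlations
    permuted_corrs = np.zeros(num_permutations)

    for i in range(num_permutations):
        # Shuffle y to break any pairing with x, then recompute the correlation
        shuffled_y = np.random.permutation(y)
        permuted_corrs[i] = np.corrcoef(x, shuffled_y)[0, 1]

    # Two-sided p-value: share of permuted correlations at least as extreme as
    # the observed one. The +1 counts the observed arrangement itself, so the
    # p-value can never be exactly 0.
    num_extreme = np.sum(np.abs(permuted_corrs) >= np.abs(observed_corr))
    p_value = (num_extreme + 1) / (num_permutations + 1)

    return observed_corr, p_value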

In [None]:
# Sub-rating columns to test against the overall rating
rate_cols = ['work_life_balance', 'culture_values', 'career_opp', 'comp_benefits', 'senior_mgmt']

for rate_col in rate_cols:
    # Permutation test of the correlation between overall_rating and each sub-rating
    observed_corr, p_value = permutation_test_corr(df['overall_rating'], df[rate_col])
    print(f"{rate_col} Observed Correlation: {observed_corr}")
    print(f"P-value: {p_value}")