In [1]:
import pandas as pd
import numpy as np

# Upload the file manually in Google Colab
from google.colab import files
uploaded = files.upload()

# Load CSV
df = pd.read_csv(next(iter(uploaded)))

Saving Diversity and inclusion_SPSS_DESCRIPTIVE3.csv to Diversity and inclusion_SPSS_DESCRIPTIVE3.csv


In [2]:
df.head()

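| | AGE | AGE1 | AGE2 | AGE3 | AGE4 | AGE5 | Older_binary | Younger | Middle | Older | ... | CI_IntegrationOfDiff5 | CI_IntegrationOfDiff6 | CI_IntegrationOfDiff7 | CI_DecisionMaking1 | CI_DecisionMaking2 | CI_DecisionMaking3 | CI | CI14 | FINNISH | GenderBinary |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | ... | 1 | 1 | 1 | 1 | 1 | 1 | 1.266667 | 1.285714 | 0 | 1 |
| 1 | 2 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | ... | 3 | 5 | 6 | 6 | 5 | 5 | 4.8 | 4.714286 | 0 | 1 |
| 2 | 2 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | ... | 6 | 5 | 7 | 6 | 6 | 5 | 5.133333 | 5.0 | 0 | 0 |
| 3 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | ... | 1 | 1 | 1 | 3 | 2 | 3 | 2.6 | 2.714286 | 0 | 0 |
| 4 | 3 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | ... | 3 | 3 | 5 | 4 | 4 | 4 | 3.666667 | 3.571429 | 0 | 1 |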


# rWG, ADI, and SD

In the following, we compare rWG, ADI, and SD. Some points that distinguish the three:
- ADI is based on absolute deviation from the mean
- SD is based on squared deviation from the mean
- rWG is based on observed vs. expected variance

So while these three are related, they each offer a slightly different lens on agreement.
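To make the three lenses concrete, here is a minimal sketch that computes ADI, SD, and rWG for one hypothetical group of eight ratings on a 7-point scale (the scores are invented for illustration). Like the notebook's functions, it uses the population variance (`ddof=0`) and the uniform null variance of 4.0 from `expected_vars`.

```python
import numpy as np

# Hypothetical 7-point ratings from one group (illustration only)
scores = np.array([4, 5, 5, 6, 4, 5, 7, 3])

deviations = scores - scores.mean()
adi_val = np.mean(np.abs(deviations))      # ADI: mean absolute deviation from the mean
sd_val = np.sqrt(np.mean(deviations**2))   # SD: root mean squared deviation (ddof=0)
expected_var = 4.0                         # uniform null variance for a 7-point scale
rwg_val = 1 - sd_val**2 / expected_var     # rWG: observed vs. expected variance

print(f"ADI={adi_val:.3f}  SD={sd_val:.3f}  rWG={rwg_val:.3f}")
# ADI=0.906  SD=1.166  rWG=0.660
```

Because squaring gives large deviations more weight, ADI can never exceed SD for the same set of responses.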

Some **caution** about SD:
- It doesn’t account for group size effects (rWG does, ADI partially does).
- It doesn’t benchmark against a null expectation (like rWG does).
- Its usual interpretation (e.g. about 68% of responses within ±1 SD) assumes **normality**, which may not hold on Likert scales.

In [3]:
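def rwg_fixed(scores, expected_var):
    """rWG: 1 - observed variance / expected (null) variance.

    Uses the population variance (ddof=0). NaNs are not removed,
    so drop them before calling.
    """
    scores = np.array(scores)
    if len(scores) < 2:
        return np.nan
    observed_var = np.var(scores, ddof=0)
    return 1 - (observed_var / expected_var)

def adi(scores):
    """Average deviation index: mean absolute deviation from the group mean."""
    scores = np.array(scores)
    if len(scores) < 2:
        return np.nan
    return np.mean(np.abs(scores - np.mean(scores)))

# --- Expected variances for different null distributions ---
# Uniform: (A**2 - 1) / 12 = 4.0 for A = 7 response options;
# the skewed and triangular values are taken as given.
expected_vars = {
    "rWG_uniform": 4.0,
    "rWG_skewed": 2.90,
    "rWG_triangular": 2.10
}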
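The uniform value of 4.0 is the variance of a discrete uniform distribution over A response options, (A² − 1)/12, which equals 4.0 only for A = 7. A quick check, assuming the CI items use a 7-point scale (the 4.0 implies this, and the `df.head()` output shows item values up to 7, but the notebook does not state it). The skewed and triangular values cannot be derived this simply and are taken as given.

```python
import numpy as np

def uniform_null_variance(n_options: int) -> float:
    """Expected variance of responses spread evenly over 1..n_options."""
    return (n_options ** 2 - 1) / 12

# Cross-check against the population variance of the category values themselves
for n in (5, 7):
    categories = np.arange(1, n + 1)
    print(n, uniform_null_variance(n), categories.var())
```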

In [8]:
# -- Social identity variables --
social_identity_groups = {
    "Older_binary": "Older_binary",
    "Caregiver": "CAREGIVER",
    "Finnish": "FINNISH",
    "GenderBinary": "GenderBinary"
}

social_identity_results = []

for var_label, var_name in social_identity_groups.items():
    for group_value in [0, 1]:
        group_data = df[df[var_name] == group_value]["CI14"]
        if len(group_data) >= 3:
            result = {
                "GroupType": "Social Identity",
                "Variable": var_label,
                "GroupValue": group_value,
                "GroupSize": len(group_data),
                "SD": np.std(group_data, ddof=0),
                "ADI": adi(group_data)
            }
            for label, sigma_sq in expected_vars.items():
                result[label] = rwg_fixed(group_data, sigma_sq)
            social_identity_results.append(result)

social_df_sd = pd.DataFrame(social_identity_results)

# -- Organizational Units --
org_units = ['AUDASS', 'TAXLEG', 'ADVISO', 'SHARED']

def get_org_unit(row):
    for unit in org_units:
        if row[unit] == 1:
            return unit
    return "Unknown"

df["OrgUnit"] = df.apply(get_org_unit, axis=1)

org_unit_results = []

for unit in df["OrgUnit"].unique():
    group_data = df[df["OrgUnit"] == unit]["CI14"]
    if len(group_data) >= 3:
        result = {
            "GroupType": "Organizational Unit",
            "Variable": "OrgUnit",
            "GroupValue": unit,
            "GroupSize": len(group_data),
            "SD": np.std(group_data, ddof=0),
            "ADI": adi(group_data)
        }
        for label, sigma_sq in expected_vars.items():
            result[label] = rwg_fixed(group_data, sigma_sq)
        org_unit_results.append(result)

org_df_sd = pd.DataFrame(org_unit_results)

# -- Combine and round results --
results_df = pd.concat([social_df_sd, org_df_sd], ignore_index=True)
results_df = results_df.round(3)

# -- View the final results --
results_df

| | GroupType | Variable | GroupValue | GroupSize | SD | ADI | rWG_uniform | rWG_skewed | rWG_triangular |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Social Identity | Older_binary | 0 | 401 | 1.074 | 0.868 | 0.712 | 0.602 | 0.451 |
| 1 | Social Identity | Older_binary | 1 | 54 | 1.2 | 1.008 | 0.64 | 0.503 | 0.314 |
| 2 | Social Identity | Caregiver | 0 | 301 | 1.101 | 0.908 | 0.697 | 0.582 | 0.422 |
| 3 | Social Identity | Caregiver | 1 | 154 | 1.032 | 0.817 | 0.734 | 0.633 | 0.493 |
| 4 | Social Identity | Finnish | 0 | 43 | 1.264 | 1.123 | 0.601 | 0.449 | 0.239 |
| 5 | Social Identity | Finnish | 1 | 412 | 1.047 | 0.842 | 0.726 | 0.622 | 0.478 |
| 6 | Social Identity | GenderBinary | 0 | 168 | 1.095 | 0.891 | 0.7 | 0.587 | 0.429 |
| 7 | Social Identity | GenderBinary | 1 | 287 | 1.073 | 0.869 | 0.712 | 0.603 | 0.452 |
| 8 | Organizational Unit | OrgUnit | ADVISO | 119 | 1.093 | 0.875 | 0.701 | 0.588 | 0.431 |
| 9 | Organizational Unit | OrgUnit | AUDASS | 118 | 1.081 | 0.877 | 0.708 | 0.597 | 0.444 |
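`get_org_unit` returns the first unit flagged with 1, so a respondent flagged in several units is silently assigned to one of them, and a respondent flagged in none becomes "Unknown". A vectorized sketch that reproduces the same first-match rule and also lists multiply-assigned respondents; the five-row `demo` frame is hypothetical, and on the real data you would pass `df` instead.

```python
import pandas as pd

org_units = ["AUDASS", "TAXLEG", "ADVISO", "SHARED"]

# Hypothetical dummy-coded membership (illustration only):
# row 3 has no unit, row 4 is flagged in two units
demo = pd.DataFrame({
    "AUDASS": [1, 0, 0, 0, 1],
    "TAXLEG": [0, 1, 0, 0, 1],
    "ADVISO": [0, 0, 1, 0, 0],
    "SHARED": [0, 0, 0, 0, 0],
})

flags = demo[org_units].eq(1).astype(int)
n_units = flags.sum(axis=1)

# idxmax picks the first column holding the row maximum, i.e. the same
# first-match rule as get_org_unit; rows with no unit become "Unknown"
demo["OrgUnit"] = flags.idxmax(axis=1).where(n_units > 0, "Unknown")

# Respondents that the first-match rule resolves silently
multi_unit_rows = n_units[n_units > 1].index.tolist()

print(demo["OrgUnit"].tolist())  # ['AUDASS', 'TAXLEG', 'ADVISO', 'Unknown', 'AUDASS']
print(multi_unit_rows)           # [4]
```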


## Interpretation

rWG
- Under the uniform distribution (σ² = 4.0), **organizational units** displayed relatively stable and high rWG scores (ranging from 0.700 to 0.721), suggesting strong internal agreement about the climate for inclusion.
- **Social identity groups** exhibited greater variability, with rWG values ranging from 0.601 to 0.734. Some groups, such as Caregiver=1 (0.734), showed higher agreement, while others, like Finnish=0 (0.601), exhibited notably lower rWG.
- When applying more conservative distributional assumptions (skewed and triangular), rWG values declined across all groups. The decline was largest for the groups with the highest observed variance (Finnish=0 and Older_binary=1), while Caregiver=1 declined less than any organizational unit. This pattern is largely mechanical: a stricter null lowers each group's rWG in proportion to its observed variance, so the ranking of groups is the same under all three assumptions. The alternative nulls therefore show how much the *level* of agreement depends on assumptions about response behavior (e.g. social desirability bias or central tendency), rather than showing that identity groups are inherently more sensitive to those assumptions.
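The effect of the null distribution can be checked by hand: since rWG = 1 − s²/σ², moving from the uniform null (σ² = 4.0) to the triangular one (σ² = 2.10) lowers rWG by s²(1/2.10 − 1/4.0), an amount proportional to the group's observed variance s². A sketch using three SDs copied from the results table above; it reproduces the table's rWG values to three decimals.

```python
# SDs copied from the results table above (CI14, population SD)
sds = {"Caregiver=1": 1.032, "ADVISO": 1.093, "Finnish=0": 1.264}
nulls = {"uniform": 4.0, "skewed": 2.90, "triangular": 2.10}

results = {}
for group, sd in sds.items():
    observed_var = sd ** 2
    rwg = {name: 1 - observed_var / sigma_sq for name, sigma_sq in nulls.items()}
    # Drop from uniform to triangular = observed_var * (1/2.10 - 1/4.0)
    rwg["drop"] = rwg["uniform"] - rwg["triangular"]
    results[group] = rwg
    print(f"{group:12s}", {k: round(v, 3) for k, v in rwg.items()})
```

The drop grows with the SD (about 0.241, 0.270, and 0.361 for these three groups), and the order of the groups is identical under every null.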

ADI
- **Organizational units** showed low and consistent ADI values (between 0.857 and 0.891), indicating tight clustering of responses around the group mean.
- **Social identity groups** had a wider spread of ADI values, from 0.817 (Caregiver=1) to 1.123 (Finnish=0), further reflecting variability in the internal consistency of perceptions.
- These findings suggest that while some identity groups do demonstrate stronger internal consensus, this is not a generalizable pattern across all social categories.

SD
- A similar pattern appeared in standard deviation (SD), with organizational units again displaying low and narrow values — from 1.076 (TAXLEG) to 1.098 (SHARED) — indicating that members of these groups tend to evaluate the climate for inclusion similarly.
- **Social identity groups** showed a broader range, from 1.032 (Caregiver=1) to 1.264 (Finnish=0). Notably, the Finnish=0 group also had the highest ADI and lowest rWG, reinforcing the interpretation that this group experiences the climate for inclusion in a fragmented or inconsistent way.
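- Because both are computed from the population variance (ddof=0), rWG_uniform = 1 − SD²/4 exactly, so SD ranks the groups in the same order as rWG. The SD values are therefore consistent with, rather than independent support for, the interpretation that some social identity groups perceive inclusion in divergent ways, while formal organizational units maintain more cohesive internal perceptions.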


## Conclusion

The data do **not uniformly support** the hypothesis that **social identity groups exhibit stronger climate strength** than organizational units. While certain identity groups (e.g. Caregiver=1) show evidence of high agreement, overall:

- **Organizational units demonstrate more consistent and robust agreement**, especially under varied distributional assumptions.
- The use of multiple rWG models (uniform, skewed, triangular) shows that **agreement in the lower-agreement social identity groups (e.g. Finnish=0, Older_binary=1) falls furthest under stricter assumed response distributions**, although this partly follows from their higher observed variance.


# Further tests

## Normality

In [11]:
from scipy.stats import shapiro


normality_results = []

for var_label, var_name in social_identity_groups.items():
    for group_value in [0, 1]:
        group_data = df[df[var_name] == group_value]["CI14"].dropna()
        if len(group_data) >= 3:
            stat, p = shapiro(group_data)
            normality_results.append({
                "Variable": var_label,
                "GroupValue": group_value,
                "GroupSize": len(group_data),
                "Shapiro-Wilk W": round(stat, 3),
                "p-value": round(p, 4),
                "Normal?": "Yes" if p > 0.05 else "No"
            })

# Convert to DataFrame
normality_df = pd.DataFrame(normality_results)
normality_df

| | Variable | GroupValue | GroupSize | Shapiro-Wilk W | p-value | Normal? |
|---|---|---|---|---|---|---|
| 0 | Older_binary | 0 | 401 | 0.978 | 0.0 | No |
| 1 | Older_binary | 1 | 54 | 0.945 | 0.0156 | No |
| 2 | Caregiver | 0 | 301 | 0.975 | 0.0 | No |
| 3 | Caregiver | 1 | 154 | 0.975 | 0.0075 | No |
| 4 | Finnish | 0 | 43 | 0.96 | 0.1343 | Yes |
| 5 | Finnish | 1 | 412 | 0.979 | 0.0 | No |
| 6 | GenderBinary | 0 | 168 | 0.965 | 0.0003 | No |
| 7 | GenderBinary | 1 | 287 | 0.978 | 0.0002 | No |


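- In 7 of 8 groups, the distribution of CI14 scores deviates significantly from normality (Shapiro-Wilk p < 0.05); only Finnish=0 (n = 43) does not.
- Most groups therefore violate the normality assumption behind parametric tests such as independent-samples t-tests, ANOVAs, and Pearson correlations. Keep in mind that with several hundred respondents, Shapiro-Wilk flags even small departures (all W ≥ 0.945 here), and the non-significant result for the smallest group may partly reflect lower power.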

Going forward, I think we should use non-parametric tests when comparing groups:
- Mann–Whitney U instead of t-tests (below)

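Keep using agreement indices that do not require normally distributed responses:
- ADI (based on absolute rather than squared deviations, so less driven by extreme responses)
- rWG (variance-based, but benchmarked against an assumed null distribution rather than a normal one)
- SD is still useful, but interpret it with caution given the non-normal distributions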
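The notebook's `adi` centers deviations on the group mean. A median-centered variant (sometimes written AD_Md) is a common alternative for skewed responses, because the median is less pulled by a tail of extreme ratings. A minimal sketch with invented scores; the function names are hypothetical.

```python
import numpy as np

def adi_mean(scores):
    """ADI as in the notebook: mean absolute deviation from the group mean."""
    s = np.asarray(scores, dtype=float)
    return float(np.mean(np.abs(s - s.mean())))

def adi_median(scores):
    """Median-centered variant: mean absolute deviation from the group median."""
    s = np.asarray(scores, dtype=float)
    return float(np.mean(np.abs(s - np.median(s))))

# Hypothetical skewed group: most answers at 6, with a tail of low ratings
scores = [6, 6, 6, 6, 6, 5, 2, 1]
print(adi_mean(scores), adi_median(scores))  # 1.625 1.25
```

For any data the median-centered value is never larger than the mean-centered one, since the median minimizes the mean absolute deviation.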

Consider visualizing the distributions (e.g. histograms or boxplots), especially where you’re seeing unexpectedly low or high rWG/ADI values.
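Alongside histograms or boxplots, a small numeric summary per group can show where distributions are skewed. A minimal sketch, assuming a DataFrame shaped like `df`; the helper `distribution_summary` is hypothetical and the `demo` data are invented.

```python
import pandas as pd
from scipy.stats import skew

def distribution_summary(data, group_col, outcome="CI14"):
    """Per-group size, median, IQR and skewness of `outcome`."""
    grouped = data.groupby(group_col)[outcome]
    return pd.DataFrame({
        "n": grouped.size(),
        "median": grouped.median(),
        "IQR": grouped.quantile(0.75) - grouped.quantile(0.25),
        "skew": grouped.agg(lambda s: skew(s.dropna())),
    })

# Hypothetical data: one symmetric group, one with a single low outlier
demo = pd.DataFrame({
    "FINNISH": [0, 0, 0, 0, 1, 1, 1, 1],
    "CI14": [1.0, 2.0, 3.0, 4.0, 5.0, 5.0, 5.0, 1.0],
})
summary = distribution_summary(demo, "FINNISH")
print(summary)
```

On the real data, `distribution_summary(df, "FINNISH")` would give one row per group value; negative skew indicates a tail of low ratings.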

## Mann-Whitney U test

In [9]:
from scipy.stats import mannwhitneyu

# --- Filter the combined rWG/ADI dataframe ---
# Run this after `results_df` has been created in the previous cell
test_data = results_df[["GroupType", "rWG_uniform", "ADI"]].copy()

# --- Separate into two sets: Social Identity vs Organizational Units ---
social_rwg = test_data.loc[test_data["GroupType"] == "Social Identity", "rWG_uniform"]
org_rwg = test_data.loc[test_data["GroupType"] == "Organizational Unit", "rWG_uniform"]

social_adi = test_data.loc[test_data["GroupType"] == "Social Identity", "ADI"]
org_adi = test_data.loc[test_data["GroupType"] == "Organizational Unit", "ADI"]

# --- Run one-tailed Mann-Whitney U tests ---
# H1: Social rWG > Organizational rWG
rwg_test = mannwhitneyu(social_rwg, org_rwg, alternative='greater')

# H1: Social ADI < Organizational ADI
adi_test = mannwhitneyu(social_adi, org_adi, alternative='less')

# --- Package results into a DataFrame for clarity ---
test_results = pd.DataFrame({
    "Metric": ["rWG_uniform", "ADI"],
    "Test": ["Mann-Whitney U (greater)", "Mann-Whitney U (less)"],
    "U Statistic": [rwg_test.statistic, adi_test.statistic],
    "p-value (1-tailed)": [rwg_test.pvalue, adi_test.pvalue]
})

# --- View the results ---
print(test_results)

        Metric                      Test  U Statistic  p-value (1-tailed)
0  rWG_uniform  Mann-Whitney U (greater)         17.0            0.466103
1          ADI     Mann-Whitney U (less)         15.0            0.466667


Interpretation:
- rWG test: No statistically significant evidence that social identity groups have higher agreement than org units.
- ADI test: No statistically significant evidence that social identity groups have lower dispersion than org units.
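- These p-values are well above the common significance threshold (0.05), so we fail to reject the null hypotheses. The test has little power, though: it compares only 8 social identity rows with the organizational-unit rows, and the 8 social identity rows are not independent, because all four identity variables split the same 455 respondents.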
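The p-values say nothing about the size of the difference, so an effect size can help. A minimal sketch of the rank-biserial correlation, r = 2U / (n1·n2) − 1, computed from the U statistics printed above. SciPy's `mannwhitneyu` reports U for the first sample passed in (here the social identity rows); n1 = 8 comes from the results table, while n2 = 4 (one row per unit in `org_units`) is an assumption.

```python
def rank_biserial(u_stat, n1, n2):
    """Rank-biserial correlation from the U statistic of the first sample.

    Ranges from -1 to 1; 0 means neither set of groups tends to rank higher.
    """
    return 2 * u_stat / (n1 * n2) - 1

# U statistics as printed above; n2 = 4 is an assumption
r_rwg = rank_biserial(17.0, 8, 4)
r_adi = rank_biserial(15.0, 8, 4)
print(r_rwg, r_adi)  # 0.0625 -0.0625
```

Both values are close to zero, i.e. the social identity rows and the organizational-unit rows are essentially interleaved.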

# Outlook: Why our story may still hold

## Variability is also an insight/finding

We don't have to say that “social groups didn’t have stronger agreement.” Instead, we could say:

“The very variability in climate strength across social identity groups highlights why we must analyze inclusion climate at that level, because it is precisely through identity-based lenses that we can uncover where organizational inclusion is working for some and failing for others.”

--> Organizational units may appear “stronger” not because inclusion is universally experienced, but because they aggregate away difference.

## Use intersectionality to justify the social group lens

For example, something like:

“Our findings demonstrate that individuals within the same organizational unit can experience the inclusion climate very differently depending on their social identity. This supports arguments from intersectionality theory that inclusion is not universally experienced, and underscores the value of studying climate strength at the level of socially meaningful groups.”

## Key takeaway

Even if the climate strength is higher in formal units, that doesn’t mean organizational units are a better lens for studying inclusion.

# Intersectionality?

Below, we calculate rWG and ADI for all 2- and 3-way combinations of the social identity variables. Note that this cell uses the `CI` score, whereas the analyses above use `CI14`.

In [None]:
from itertools import combinations

# --- Identity variables to combine ---
identity_vars = ["GenderBinary", "FINNISH", "CAREGIVER", "Older_binary"]

# --- Store results ---
intersectional_expanded_results = []

# Loop through all 2- and 3-way combinations of identity variables
for k in [2, 3]:
    for combo in combinations(identity_vars, k):
        group_name = "_".join(combo)
        # Create a new column in the DataFrame that encodes the group label
        group_col_name = f"Group_{group_name}"
        df[group_col_name] = df[list(combo)].astype(str).agg("_".join, axis=1)

        # Compute metrics for each unique group label
        for group_label, group_data in df.groupby(group_col_name):
            # Uses CI (the cells above use CI14), so compare magnitudes with care
            ci_scores = group_data["CI"]
            if len(ci_scores) >= 3:  # Only include groups with enough respondents
                result = {
                    "Grouping": group_name,
                    "GroupLabel": group_label,
                    "GroupSize": len(ci_scores),
                    "ADI": adi(ci_scores)
                }
                # Compute rWG values for each null distribution assumption
                for label, sigma_sq in expected_vars.items():
                    result[label] = rwg_fixed(ci_scores, sigma_sq)
                intersectional_expanded_results.append(result)

# --- Create and display results DataFrame ---
intersectional_expanded_df = pd.DataFrame(intersectional_expanded_results)
intersectional_expanded_df = intersectional_expanded_df.round(3)

# View the full results table
intersectional_expanded_df

| | Grouping | GroupLabel | GroupSize | ADI | rWG_uniform | rWG_skewed | rWG_triangular |
|---|---|---|---|---|---|---|---|
| 0 | GenderBinary_FINNISH | 0_0 | 15 | 1.127 | 0.624 | 0.482 | 0.284 |
| 1 | GenderBinary_FINNISH | 0_1 | 153 | 0.83 | 0.738 | 0.638 | 0.501 |
| 2 | GenderBinary_FINNISH | 1_0 | 28 | 1.135 | 0.583 | 0.425 | 0.206 |
| 3 | GenderBinary_FINNISH | 1_1 | 259 | 0.818 | 0.742 | 0.644 | 0.509 |
| 4 | GenderBinary_CAREGIVER | 0_0 | 105 | 0.901 | 0.706 | 0.594 | 0.44 |
| 5 | GenderBinary_CAREGIVER | 0_1 | 63 | 0.788 | 0.767 | 0.679 | 0.556 |
| 6 | GenderBinary_CAREGIVER | 1_0 | 196 | 0.886 | 0.708 | 0.597 | 0.443 |
| 7 | GenderBinary_CAREGIVER | 1_1 | 91 | 0.797 | 0.741 | 0.643 | 0.508 |
| 8 | GenderBinary_Older_binary | 0_0 | 143 | 0.875 | 0.712 | 0.602 | 0.451 |
| 9 | GenderBinary_Older_binary | 0_1 | 25 | 0.691 | 0.839 | 0.778 | 0.693 |


## Findings

### 1. Climate strength varies meaningfully across intersectional identities

Among the GenderBinary × FINNISH × CAREGIVER combinations, the group “Finnish men who are caregivers” (GenderBinary=0, FINNISH=1, CAREGIVER=1) showed the strongest agreement in inclusion climate perceptions, with:

- rWG_uniform = 0.787
- ADI = 0.774
- Group size = 59

This suggests a shared and cohesive perception of inclusion among this group.

In contrast, the group “non-Finnish women who are not caregivers” (GenderBinary=1, FINNISH=0, CAREGIVER=0) displayed very weak agreement, with:

- rWG_uniform = 0.491
- ADI = 1.264
- Group size = 18

This points to a fragmented or ambiguous climate, where inclusion is experienced inconsistently, or even contested, within the group.

### 2. Organizational units cannot capture these divergences

When compared to these intersectional findings, organizational units such as TAXLEG or ADVISO showed relatively consistent, yet unremarkable, climate strength (the organizational-unit analysis above uses CI14 rather than CI, so this comparison is approximate):

- All rWG_uniform values clustered between 0.700 and 0.721
- ADI values ranged narrowly between 0.857 and 0.891

This homogeneity may reflect structural cohesion, but it also risks masking deep experiential differences across social identities within those units.

### 3. Intersectional analysis reveals inclusion gaps

Certain combinations of social characteristics appear to compound exclusion or ambiguity. For instance:

- “Non-Finnish male caregivers” (G0_F0_C1) had a rWG_uniform of just 0.514 and an ADI of 1.133, with only 4 members.

- This contrasts sharply with “Finnish male non-caregivers” (G0_F1_C0), who had rWG_uniform = 0.725 and ADI = 0.857 with 94 members.

The implication is that nationality and caregiver status intersect in ways that shape whether inclusion is experienced as shared — or fractured.

But we need to be careful about differences in sample size: in the four-member group, a single atypical answer can substantially shift the result.
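To illustrate how fragile a four-member group is, here is a sketch with hypothetical data: two groups in which everyone answers 5 except one respondent who answers 1. The group sizes mirror G0_F0_C1 (4 members) and G0_F1_C0 (94 members); the responses themselves are invented.

```python
import numpy as np

def rwg_uniform(scores, expected_var=4.0):
    """rWG against a uniform null for a 7-point scale (population variance)."""
    s = np.asarray(scores, dtype=float)
    return 1 - s.var() / expected_var

# Hypothetical groups: everyone answers 5 except one respondent who answers 1
small_group = [5] * 3 + [1]    # 4 members
large_group = [5] * 93 + [1]   # 94 members

print(round(rwg_uniform(small_group), 3))  # 0.25
print(round(rwg_uniform(large_group), 3))  # 0.958
```

Without the deviant answer both groups would have rWG = 1.0; with it, the small group drops to 0.25 while the large group stays near 0.96.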

## Conclusion

My point with all this is that intersectionality may reveal more than just identity effects; it exposes where climate breaks down.

I think the findings support our central argument: inclusion climate is best understood through the lens of social identity and intersectionality, not just formal organizational structures. **Organizational units may exhibit stronger surface-level agreement, but intersectional analysis uncovers hidden patterns of inclusion and exclusion that are essential to advancing both climate theory and EDI practice.**

