> H1: Self-concept dimensions (i.e., physical, social, temperamental, educational, moral, and intellectual) increases the impact of
anxiety.

### Robost Regression

In [1]:
import statsmodels.api as sm
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import HuberRegressor
import pandas as pd

df = pd.read_csv('merged_data.csv')

# 1. Prepare the data
X = df[['Physical', 'Social', 'Temperamental', 'Educational', 'Moral', 'Intellectual']]  # IVs
y = df['TOTAL_BAI']  # DV

# 2. Standardizing the data (optional but recommended for robust models)
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# 3. Add a constant (intercept) term for the regression
X_scaled = sm.add_constant(X_scaled)

# 4. Fit a Robust Linear Model using Huber Regressor from sklearn
model = HuberRegressor()
model.fit(X_scaled, y)

# 5. Print out the coefficients and intercept
print(f"Intercept: {model.intercept_}")
print(f"Coefficients: {model.coef_}")

# 6. Predicting values
y_pred = model.predict(X_scaled)

# 7. You can also calculate residuals or other model diagnostics
residuals = y - y_pred

# Optionally: Checking the summary statistics using statsmodels for robust regression
robust_model = sm.RLM(y, X_scaled, M=sm.robust.norms.HuberT()).fit()
print(robust_model.summary())

Intercept: 10.071904360504979
Coefficients: [10.07103848 -3.97517686  0.06107994 -3.70422799 -1.33819399 -0.23192287
  1.68744141]
                    Robust linear Model Regression Results                    
Dep. Variable:              TOTAL_BAI   No. Observations:                  211
Model:                            RLM   Df Residuals:                      204
Method:                          IRLS   Df Model:                            6
Norm:                          HuberT                                         
Scale Est.:                       mad                                         
Cov Type:                          H1                                         
Date:                Sat, 30 Nov 2024                                         
Time:                        16:56:04                                         
No. Iterations:                    17                                         
                 coef    std err          z      P>|z|      [0.025      0.975]


The model assessing the relationship between self-concept dimensions and anxiety (TOTAL_BAI) using a Robust Linear Model (RLM) found that physical (β = -4.13, p < 0.001) and temperamental (β = -3.27, p = 0.006) self-concept dimensions were significantly negatively related to anxiety, suggesting that higher physical and temperamental self-concepts are associated with lower anxiety. In contrast, intellectual self-concept (β = 1.97, p = 0.042) was positively related to anxiety, indicating that a higher intellectual self-concept is associated with increased anxiety. Social (p = 0.592), educational (p = 0.248), and moral (p = 0.636) self-concept dimensions did not significantly impact anxiety. Overall, the model indicates that self-concept dimensions do influence anxiety, but the effect varies across dimensions, with only physical, temperamental, and intellectual dimensions showing significant effects.

Therefore, we reject H1 as not all hypothesized relationships between self-concept dimensions (Social, Educational, and Moral) and anxiety were significant.

> H2: There is a significant mean difference between the physical, social, temperamental, educational, intellectual, and moral self-concepts of males and females.

In [2]:
import pandas as pd
from scipy import stats


# Split the DataFrame into male and female groups
df_male = df[df['Gender'] == 'M']
df_female = df[df['Gender'] == 'F']

# List of self-concept dimensions to test
dimensions = ['Physical', 'Social', 'Temperamental', 'Educational', 'Moral', 'Intellectual']

# Initialize a dictionary to store results
results = {}

# Loop through each dimension and perform appropriate tests
for dim in dimensions:
    male_data = df_male[dim]
    female_data = df_female[dim]
    
    # Check if the data is normally distributed using the Shapiro-Wilk test
    _, p_value_male = stats.shapiro(male_data)
    _, p_value_female = stats.shapiro(female_data)
    
    # If both groups are normal, use a t-test
    if p_value_male > 0.05 and p_value_female > 0.05:
        # Perform independent t-test (assuming equal variances as per Levene's Test)
        t_stat, p_value = stats.ttest_ind(male_data, female_data, equal_var=True)
        test_type = 't-test'
    else:
        # If either group is not normal, use Kruskal-Wallis H test (for non-normally distributed data)
        h_stat, p_value = stats.kruskal(male_data, female_data)
        test_type = 'Kruskal-Wallis H'
    
    # Store the results
    results[dim] = {
        'Test Type': test_type,
        'Statistic': h_stat if test_type == 'Kruskal-Wallis H' else t_stat,
        'P-Value': p_value
    }

# Convert results to DataFrame for better readability
results_df = pd.DataFrame(results).T
print(results_df)


                      Test Type Statistic   P-Value
Physical       Kruskal-Wallis H  0.724968  0.394519
Social                   t-test -0.898542  0.369931
Temperamental  Kruskal-Wallis H  0.231253  0.630596
Educational              t-test -1.063054  0.288984
Moral          Kruskal-Wallis H  0.248027  0.618468
Intellectual   Kruskal-Wallis H  0.984691  0.321043


A series of t-tests and Kruskal-Wallis tests were conducted to examine potential gender differences in self-concept dimensions. The results indicated no significant differences between males and females for any of the self-concept dimensions (physical, social, temperamental, educational, moral, and intellectual). Therefore, **Hypothesis 2 was not supported**. 

In [3]:
import pandas as pd
from scipy import stats

df_male = df[df['Gender'] == 'M']
df_female = df[df['Gender'] == 'F']

dimensions = ['Physical', 'Social', 'Temperamental', 'Educational', 'Moral', 'Intellectual']

# Initialize a dictionary to store results
results_ttest = {}

# Loop through each dimension and perform a t-test and 'Mann-Whitney U'
for dim in dimensions:
    male_data = df_male[dim]
    female_data = df_female[dim]
    
    # Perform independent t-test assuming equal variances (Levene's test suggested equality of variance)
    t_stat, p_value = stats.ttest_ind(male_data, female_data, equal_var=True)
    # If either group is not normal, use Kruskal-Wallis H test (for non-normally distributed data)
    h_stat, h_p_value = stats.kruskal(male_data, female_data)
    test_type = 'Kruskal-Wallis H'
    
    # Store the results
    results_ttest[dim] = {
        'Test Type': 't-test',
        't-Statistic': t_stat,
        'P-Value(t-stat)': p_value,
        'Kruskal-Wallis H':h_stat,
        'P-Value(Kruskal-Wallis H)': h_p_value
    }

# Convert results to DataFrame for better readability
results_ttest_df = pd.DataFrame(results_ttest).T
print(results_ttest_df)


              Test Type t-Statistic P-Value(t-stat) Kruskal-Wallis H  \
Physical         t-test     0.50173        0.616386         0.724968   
Social           t-test   -0.898542        0.369931         1.208565   
Temperamental    t-test    0.178291        0.858667         0.231253   
Educational      t-test   -1.063054        0.288984         1.572835   
Moral            t-test   -0.912334        0.362644         0.248027   
Intellectual     t-test    1.309798        0.191702         0.984691   

              P-Value(Kruskal-Wallis H)  
Physical                       0.394519  
Social                         0.271617  
Temperamental                  0.630596  
Educational                    0.209796  
Moral                          0.618468  
Intellectual                   0.321043  


A series of t-tests and Kruskal-Wallis H tests were conducted to compare self-concept dimensions between males and females. No significant gender differences were found for physical, social, temperamental, educational, moral, or intellectual self-concept. Thus, gender differences in self-concept is rejected.

> H3: There is a significant mean difference between the anxiety scores of males and females.

In [4]:
import pandas as pd
import pingouin as pg

print("TOTAL_BAI")
print(pg.normality(df['TOTAL_BAI']))

TOTAL_BAI
                  W      pval  normal
TOTAL_BAI  0.962684  0.000024   False


In [5]:
import pandas as pd
from scipy.stats import mannwhitneyu

# Split data by gender
male_scores = df[df['Gender'] == 'M']['TOTAL_BAI']
female_scores = df[df['Gender'] == 'F']['TOTAL_BAI']

# Perform the Mann-Whitney U test
stat, p_value = stats.kruskal(male_data, female_data)

# Print results
print(f'Kruskal-Wallis H statistic: {stat}')
print(f'p-value: {p_value}')

# Interpretation
if p_value < 0.05:
    print("There is a significant difference in anxiety scores between males and females.")
else:
    print("There is no significant difference in anxiety scores between males and females.")

Kruskal-Wallis H statistic: 0.9846914184877255
p-value: 0.32104330864735464
There is no significant difference in anxiety scores between males and females.


## H4: There is a relation between overall self-concept and anxiety
### Since each latent variable can only be directly equated with one observed variable, we can only conduct a path analysis, which estimates relationships among observed variables

In [6]:
import numpy as np
from scipy.stats import spearmanr

# Example data (replace these arrays with your actual data)
self_concept = df['TOTAL']
bai = df['TOTAL_BAI']

# Calculate Spearman correlation
correlation, p_value = spearmanr(self_concept, bai)

print(f"correlation: {correlation}")
print(f"p_value: {p_value}")

correlation: -0.35981202410671403
p_value: 7.586842080088851e-08


A significant negative correlation was found between overall self-concept and anxiety (r = -.35, p < .001). This indicates that as overall self-concept increases, anxiety tends to decrease. 

### Path-Analysis

In [7]:
import pandas as pd
from semopy import Model, Optimizer

data = df.copy()

model_description = """
# Measurement Model
Physical ~ Physical
Social ~ Social
Temperamental ~ Temperamental
Educational ~ Educational
Moral ~ Moral
Intellectual ~ Intellectual

# Structural Model
TOTAL_BAI ~ Physical + Social + Temperamental + Educational + Moral + Intellectual
"""

# Initialize and fit the model
model = Model(model_description)
model.fit(data)

SolverResult(fun=1.4635173262140428, success=True, n_it=35, x=array([ 2.92379183e-01,  2.92265684e-01,  2.92465801e-01,  2.92507929e-01,
        2.92282661e-01,  2.91121551e-01, -8.42351743e-01,  2.54544859e-01,
       -6.72318940e-01, -3.38012467e-01, -1.08256480e-01,  5.25951481e-01,
        9.20272334e+00,  6.07102074e+00,  7.39900727e+00,  1.12289541e+01,
        9.02418142e+00,  1.51888851e+02,  9.39371851e+00]), message='Optimization terminated successfully', name_method='SLSQP', name_obj='MLW')

In [8]:
from semopy import calc_stats
stats = calc_stats(model)
print(stats)

       DoF  DoF Baseline        chi2  chi2 p-value  chi2 Baseline       CFI  \
Value    9            21  308.802156           0.0     358.783848  0.112444   

            GFI     AGFI       NFI       TLI     RMSEA        AIC       BIC  \
Value  0.139309 -1.00828  0.139309 -1.070965  0.398278  35.072965  98.75827   

         LogLik  
Value  1.463517  


In [9]:
# Basic Model: Only one predictor
model_description_1 = """
# Structural Model
TOTAL_BAI ~ Physical
"""

# Fit the model
model_1 = Model(model_description_1)
model_1.fit(df)

# Calculate fit statistics
stats_1 = calc_stats(model_1)
print("Model 1 AIC:", stats_1["AIC"].Value)
print("Model 1 BIC:", stats_1["BIC"].Value)

Model 1 AIC: 3.999999993049731
Model 1 BIC: 10.703716260001865


In [10]:
# Extended Model: Two predictors
model_description_2 = """
# Structural Model
TOTAL_BAI ~ Physical + Social
"""

# Fit the model
model_2 = Model(model_description_2)
model_2.fit(df)

# Calculate fit statistics
stats_2 = calc_stats(model_2)
print("Model 2 AIC:", stats_2["AIC"].Value)
print("Model 2 BIC:", stats_2["BIC"].Value)

Model 2 AIC: 5.9999998793045854
Model 2 BIC: 16.055574279732785


In [11]:
# Full Model: All predictors
model_description_3 = """
# Structural Model
TOTAL_BAI ~ Physical + Social + Temperamental + Educational + Moral + Intellectual
"""

# Fit the model
model_3 = Model(model_description_3)
model_3.fit(df)

# Calculate fit statistics
stats_3 = calc_stats(model_3)
print("Model 3 AIC:", stats_3["AIC"].Value)
print("Model 3 BIC:", stats_3["BIC"].Value)

Model 3 AIC: 13.999999574005411
Model 3 BIC: 37.463006508337884


In [12]:
# Model with measurement and structural components
model_description_4 = """
# Measurement Model
Physical ~ Physical
Social ~ Social

# Structural Model
TOTAL_BAI ~ Physical + Social
"""

# Fit the model
model_4 = Model(model_description_4)
model_4.fit(df)

# Calculate fit statistics
stats_4 = calc_stats(model_4)
print("Model 4 AIC:", stats_4["AIC"].Value)
print("Model 4 BIC:", stats_4["BIC"].Value)

Model 4 AIC: 13.796094340592322
Model 4 BIC: 37.25910127492479


  return np.sqrt((chi2 / dof - 1) / (model.n_samples - 1))


In [13]:
# Basic Model: Only one predictor
model_description_1 = """
# Structural Model
TOTAL_BAI ~ Physical + Moral
"""

# Fit the model
model_1 = Model(model_description_1)
model_1.fit(df)

# Calculate fit statistics
stats_1 = calc_stats(model_1)
print("Model 1 AIC:", stats_1["AIC"].Value)
print("Model 1 BIC:", stats_1["BIC"].Value)

Model 1 AIC: 5.999999996962963
Model 1 BIC: 16.05557439739116


AIC and BIC scores remain consistently lower with only one predictor, and the scores do not improve when adding more predictors. Predictors are highly correlated, adding them does not contribute additional information. This can also be seen from pearson correlation.