# H1

H1: Mainstream media in both Ireland and Nepal are more likely to emphasise stereotyped personal attributes of female politicians (e.g., appearance, family roles) than their political achievements.

The hypothesis treats the stereotype score as the dependent variable (DV), with nation and media type as the independent variables (IVs). Both IVs are binary (Ireland/Nepal and alternative/mainstream), while the DV is a continuous score produced by the stereotype-scoring model. However, the DV is highly skewed: about 61% of the articles (111 of 181) score 0, i.e. contain no stereotypes. A two-part modelling approach was therefore adopted:

1. Do nation and media type influence the presence of stereotypes?
The DV was first converted into a binary variable: 0 (no stereotypes) and 1 (stereotypes present). A logistic (logit) regression was then fitted to examine the effects of nation and media type on the presence of stereotypes.

2. Do nation and media type influence the degree of stereotypes?
Next, articles without stereotypes were excluded, and linear regression was performed on the remaining data to assess how nation and media type affect the degree of stereotypes.
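The split into the two parts can be sketched in plain Python. This is a minimal illustration, not the notebook's own code: `split_two_part` is a hypothetical helper and the scores are made up. The last line only recomputes the share of zeros from the counts reported further down (111 of 181).

```python
def split_two_part(scores):
    """Split a zero-heavy stereotype score into the two parts described above.

    Part 1 uses a presence indicator (1 if score > 0, else 0);
    Part 2 keeps only the strictly positive scores.
    """
    presence = [1 if s > 0 else 0 for s in scores]
    positive = [s for s in scores if s > 0]
    return presence, positive

# Hypothetical scores, for illustration only
scores = [0.0, 0.0, 0.02, 0.0, 0.09, 0.0]
presence, positive = split_two_part(scores)
share_zero = 1 - sum(presence) / len(presence)
print(presence, positive, round(share_zero, 3))

# Share of zero scores implied by the reported counts (111 zeros out of 181 articles)
print(round(111 / 181, 3))  # 0.613
```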

Note that stereotype scores were calculated only for articles mentioning either female or male politicians (but not both), to minimise noise from their co-occurrence in the same article.
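A minimal sketch of that selection rule, assuming `female` and `male` are 0/1 indicators as in the dataset inspected below. The original filtering code is not shown in this notebook; `is_single_gender` and the example rows are hypothetical.

```python
def is_single_gender(female, male):
    """True when an article mentions politicians of exactly one gender.

    Assumes 0/1 indicators; articles mentioning both genders (or neither)
    are excluded.
    """
    return (female == 1) != (male == 1)

# Hypothetical rows: (female, male)
rows = [(1, 0), (0, 1), (1, 1), (0, 0)]
kept = [r for r in rows if is_single_gender(*r)]
print(kept)  # [(1, 0), (0, 1)]
```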

In [28]:
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
from statsmodels.stats.outliers_influence import variance_inflation_factor
import numpy as np

In [16]:
# Read the four datasets and add the nation and media-type attributes
df1 = pd.read_csv("../data/stereotype_calculated/f_ir_al_with_stereotype_score.csv")
df1["nation"] = "ir"
df1["type"] = "al"

df2 = pd.read_csv("../data/stereotype_calculated/f_ir_ms_with_stereotype_score.csv")
df2["nation"] = "ir"
df2["type"] = "ms"

df3 = pd.read_csv("../data/stereotype_calculated/f_np_al_with_stereotype_score.csv")
df3["nation"] = "np"
df3["type"] = "al"

df4 = pd.read_csv("../data/stereotype_calculated/f_np_ms_with_stereotype_score.csv")
df4["nation"] = "np"
df4["type"] = "ms"

In [34]:
# Combine the four datasets into one
sum_dataset = pd.concat([df1, df2, df3, df4], ignore_index=True)

In [35]:
# Inspect the new dataset
print(sum_dataset.info())
print(sum_dataset.head())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 181 entries, 0 to 180
Data columns (total 7 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   date              181 non-null    object 
 1   content           181 non-null    object 
 2   female            181 non-null    int64  
 3   male              181 non-null    int64  
 4   stereotype_score  181 non-null    float64
 5   nation            181 non-null    object 
 6   type              181 non-null    object 
dtypes: float64(1), int64(2), object(4)
memory usage: 10.0+ KB
None
        date                                            content  female  male  \
0  2024/9/13  "They saved his life" - Mary Lou McDonald’s hu...       1     0   
1  2024/7/18  Dublin man arrested after online threats again...       1     0   
2  2024/7/19  "I didn’t mean it to go so viral" - Dublin man...       1     0   
3  2024/7/17  Irish man threatens to kill Mary Lou McDonald ...       1     0   


## Part 1


In [19]:
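# Work on a copy so that binarising the DV does not overwrite sum_dataset (reused in Part 2)
dataset1 = sum_dataset.copy()
# Binary DV: 1 if any stereotype is present (score > 0), else 0
dataset1["stereotype_score"] = (dataset1["stereotype_score"] > 0).astype(int)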

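Before fitting the regression, the variance inflation factors (VIFs) were inspected to confirm that the IVs are only weakly correlated. The VIFs for nation and type are both about 1.01, far below the common threshold of 10, so multicollinearity is not a concern. (The VIF reported for the intercept is not meaningful and can be ignored.)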
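With exactly two predictors plus an intercept, regressing one predictor on the other gives R² = r², so each VIF reduces to 1 / (1 − r²), where r is the Pearson correlation between the two dummies. This is why nation and type report the same value. The sketch below is illustrative only: the helper names and dummy vectors are hypothetical, and the last lines work backwards from the reported VIF of 1.00625.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def vif_two_predictors(x1, x2):
    """VIF shared by both predictors in a model with two predictors plus an intercept."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r ** 2)

# Hypothetical 0/1 dummies (nation_np, type_ms), for illustration only
nation_np = [1, 1, 0, 0, 0, 0, 1, 0]
type_ms = [1, 0, 1, 0, 1, 1, 0, 1]
print(round(vif_two_predictors(nation_np, type_ms), 4))

# Squared correlation implied by the reported VIF of 1.00625
implied_r2 = 1 - 1 / 1.00625
print(round(implied_r2, 4))  # 0.0062
```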

In [20]:
# Prepare for the Regression

# 1. Inspect Nulls
print('Missing values:')
print(dataset1.isnull().sum())

# 2. Inspect DV distribution
print('\nDV:')
print(dataset1['stereotype_score'].value_counts())

# 3. Inspect VIF
# since IVs are categorical, they must first be converted into dummy variables then into floats
X = pd.get_dummies(dataset1[['nation','type']], drop_first=True)
X = X.astype(float)

X['intercept'] = 1

vif = pd.DataFrame()
vif['variable'] = X.columns
vif['VIF'] = [variance_inflation_factor(X.values, i) for i in range(X.shape[1])]
print("\nVIF")
print(vif)

Missing values:
date                0
content             0
female              0
male                0
stereotype_score    0
nation              0
type                0
dtype: int64

DV:
stereotype_score
0    111
1     70
Name: count, dtype: int64

VIF
    variable       VIF
0  nation_np  1.006250
1    type_ms  1.006250
2  intercept  6.106105


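However, the IVs are also unevenly distributed (151 Irish vs. 30 Nepalese articles, and 151 mainstream vs. 30 alternative articles), and the model including the nation × type interaction failed to converge even with 500 iterations. The most likely cause is quasi-complete separation: only one of the 30 Nepalese articles contains any stereotype (see Part 2), so at least one Nepal × media-type cell has no stereotyped articles, and in the saturated interaction model the maximum-likelihood estimate for such a cell is not finite. The model was therefore simplified to a main-effects model without the interaction term.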
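The separation problem can be checked by counting stereotyped articles per nation × type cell. This is an illustration on made-up data with hypothetical helper names, not output from the actual dataset.

```python
from collections import defaultdict

def cell_event_counts(nations, types, outcomes):
    """Count articles and stereotype-present articles per nation x type cell."""
    counts = defaultdict(lambda: [0, 0])  # cell -> [n_articles, n_events]
    for nation, mtype, y in zip(nations, types, outcomes):
        counts[(nation, mtype)][0] += 1
        counts[(nation, mtype)][1] += y
    return dict(counts)

def separated_cells(counts):
    """Cells whose outcomes are all 0 or all 1.

    In a saturated logit (nation * type has one parameter per cell), any such
    cell has no finite maximum-likelihood estimate, so the optimiser keeps
    pushing a coefficient towards +/- infinity and never converges.
    """
    return [cell for cell, (n, k) in counts.items() if k == 0 or k == n]

# Hypothetical data, for illustration only: one stereotyped Nepalese article
nations = ["ir", "ir", "ir", "ir", "np", "np", "np", "np"]
types = ["al", "ms", "ms", "al", "al", "ms", "ms", "ms"]
outcomes = [1, 0, 1, 0, 0, 1, 0, 0]
counts = cell_event_counts(nations, types, outcomes)
print(counts)
print(separated_cells(counts))  # [('np', 'al')]
```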

In [22]:
print(dataset1['stereotype_score'].value_counts())
print(dataset1['nation'].value_counts())
print(dataset1['type'].value_counts())

stereotype_score
0    111
1     70
Name: count, dtype: int64
nation
ir    151
np     30
Name: count, dtype: int64
type
ms    151
al     30
Name: count, dtype: int64


In [27]:
# Type conversion
dataset1['nation'] = dataset1['nation'].astype('category')
dataset1['type'] = dataset1['type'].astype('category')

# Construct the model
model1 = smf.logit('stereotype_score ~ nation * type', data=dataset1).fit(maxiter=500)
print(model1.summary())

         Current function value: 0.597892
         Iterations: 500
                           Logit Regression Results                           
Dep. Variable:       stereotype_score   No. Observations:                  181
Model:                          Logit   Df Residuals:                      177
Method:                           MLE   Df Model:                            3
Date:                Sat, 17 Jan 2026   Pseudo R-squ.:                  0.1040
Time:                        21:09:31   Log-Likelihood:                -108.22
converged:                      False   LL-Null:                       -120.78
Covariance Type:            nonrobust   LLR p-value:                 1.462e-05
                              coef    std err          z      P>|z|      [0.025      0.975]
-------------------------------------------------------------------------------------------
Intercept                  -0.3747      0.392     -0.957      0.339      -1.142       0.393
nation[T.np]             



In [25]:
# Type conversion
dataset1['nation'] = dataset1['nation'].astype('category')
dataset1['type'] = dataset1['type'].astype('category')

# Construct the model
model1 = smf.logit('stereotype_score ~ nation + type', data=dataset1).fit()
print(model1.summary())

Optimization terminated successfully.
         Current function value: 0.598359
         Iterations 7
                           Logit Regression Results                           
Dep. Variable:       stereotype_score   No. Observations:                  181
Model:                          Logit   Df Residuals:                      178
Method:                           MLE   Df Model:                            2
Date:                Sat, 17 Jan 2026   Pseudo R-squ.:                  0.1033
Time:                        21:05:45   Log-Likelihood:                -108.30
converged:                       True   LL-Null:                       -120.78
Covariance Type:            nonrobust   LLR p-value:                 3.831e-06
                   coef    std err          z      P>|z|      [0.025      0.975]
--------------------------------------------------------------------------------
Intercept       -0.3869      0.390     -0.992      0.321      -1.151       0.377
nation[T.np]    -3.2173

According to the results, only nation significantly affects the presence of stereotypes in articles about female politicians (coef_np = -3.217, SE = 1.031, p = 0.002, 95% CI = [-5.238, -1.197]), while media type has no statistically significant effect (coef_ms = 0.260, SE = 0.429, p = 0.544, 95% CI = [-0.580, 1.100]). Because the nation coefficient is negative, Nepalese media are less likely than Irish media to contain stereotyped content: their odds of containing a stereotype are roughly 96% lower (odds ratio = exp(-3.217) ≈ 0.04). The model's McFadden pseudo R² is 0.103, which indicates modest explanatory power; a pseudo R² is not a share of explained variance.
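The quantities above can be recomputed from the reported numbers. This sketch uses only values quoted in this section (the type coefficient and the CIs come from the text, since the printed summary is truncated). The helper names are hypothetical, and the closed-form p-value holds only because the likelihood-ratio test here has 2 degrees of freedom.

```python
import math

# Coefficients as reported for the main-effects logit model
coefs = {"Intercept": -0.3869, "nation[T.np]": -3.217, "type[T.ms]": 0.260}
ci = {"nation[T.np]": (-5.238, -1.197), "type[T.ms]": (-0.580, 1.100)}

def odds_ratio(beta):
    """Convert a logit coefficient into an odds ratio."""
    return math.exp(beta)

def inv_logit(eta):
    """Map a linear predictor onto a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

# Odds ratios with exponentiated 95% CI bounds
for term, (lo, hi) in ci.items():
    print(f"{term}: OR = {odds_ratio(coefs[term]):.3f} "
          f"[{odds_ratio(lo):.4f}, {odds_ratio(hi):.3f}]")

# McFadden pseudo R^2 = 1 - LL_model / LL_null (statsmodels' "Pseudo R-squ.")
ll_model, ll_null = -108.30, -120.78
pseudo_r2 = 1 - ll_model / ll_null

# Likelihood-ratio test against the null model; for 2 df the chi-square
# survival function has the closed form exp(-x / 2)
lr_stat = 2 * (ll_model - ll_null)
lr_p = math.exp(-lr_stat / 2)
print(f"McFadden pseudo R^2 = {pseudo_r2:.4f}, LR = {lr_stat:.2f}, p = {lr_p:.2e}")

# Predicted probability of stereotype presence for each nation x type cell
pred = {
    (nation, mtype): inv_logit(
        coefs["Intercept"]
        + (coefs["nation[T.np]"] if nation == "np" else 0.0)
        + (coefs["type[T.ms]"] if mtype == "ms" else 0.0)
    )
    for nation in ("ir", "np")
    for mtype in ("al", "ms")
}
print(pred)
```

The predicted probabilities for Nepalese articles come out at around 0.03, in line with the single stereotyped article among the 30 Nepalese articles seen in Part 2.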

## Part 2

In [40]:
dataset2 = sum_dataset
dataset2 = dataset2[dataset2["stereotype_score"] > 0]

As in Part 1, the VIFs were inspected before fitting the regression: those for nation and type are both about 1.003, well below 10. However, only one Nepalese article contains stereotypes, so the nation effect cannot be estimated reliably. With a single observation in the Nepal group, the nation coefficient is determined entirely by that one article (which the model fits exactly), leaving no within-group variation for Nepal from which to assess the estimate.

Part 2 was therefore not carried out.
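The single-observation problem can be illustrated with a simplified model that keeps only the nation dummy (media type omitted). With one binary regressor, OLS reduces to group means, so the lone Nepalese article is fitted exactly and the slope is just that one score minus the Irish mean. The helper and scores below are hypothetical.

```python
def dummy_ols(y, d):
    """OLS of y on an intercept and a single 0/1 dummy d.

    With one binary regressor, OLS reduces to group means:
    intercept = mean of the d == 0 group, slope = difference in group means.
    """
    y0 = [v for v, g in zip(y, d) if g == 0]
    y1 = [v for v, g in zip(y, d) if g == 1]
    intercept = sum(y0) / len(y0)
    slope = sum(y1) / len(y1) - intercept
    residuals = [v - (intercept + slope * g) for v, g in zip(y, d)]
    return intercept, slope, residuals

# Hypothetical positive scores; only the last article is Nepalese (d = 1)
y = [0.02, 0.05, 0.01, 0.03, 0.04]
d = [0, 0, 0, 0, 1]
intercept, slope, residuals = dummy_ols(y, d)
print(intercept, slope, residuals[-1])  # the Nepalese residual is (numerically) zero
```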

In [41]:
# Prepare for the Regression

# 1. Inspect Nulls
print('Missing values:')
print(dataset2.isnull().sum())

# 2. Inspect DV distribution
print('\nDV:')
print(dataset2['stereotype_score'].value_counts())

# 3. Inspect VIF
# since IVs are categorical, they must first be converted into dummy variables then into floats
X = pd.get_dummies(dataset2[['nation','type']], drop_first=True)
X = X.astype(float)

X['intercept'] = 1

vif = pd.DataFrame()
vif['variable'] = X.columns
vif['VIF'] = [variance_inflation_factor(X.values, i) for i in range(X.shape[1])]
print("\nVIF")
print(vif)

Missing values:
date                0
content             0
female              0
male                0
stereotype_score    0
nation              0
type                0
dtype: int64

DV:
stereotype_score
0.179862    1
0.094298    1
0.091431    1
0.060591    1
0.058141    1
           ..
0.011474    1
0.010542    1
0.010384    1
0.008315    1
0.015264    1
Name: count, Length: 70, dtype: int64

VIF
    variable       VIF
0  nation_np  1.002709
1    type_ms  1.002709
2  intercept  6.363636


In [42]:
print(dataset2['stereotype_score'].value_counts())
print(dataset2['nation'].value_counts())
print(dataset2['type'].value_counts())

stereotype_score
0.179862    1
0.094298    1
0.091431    1
0.060591    1
0.058141    1
           ..
0.011474    1
0.010542    1
0.010384    1
0.008315    1
0.015264    1
Name: count, Length: 70, dtype: int64
nation
ir    69
np     1
Name: count, dtype: int64
type
ms    59
al    11
Name: count, dtype: int64
