# H3


H3: Women politicians receive greater visibility in Irish media (both mainstream and alternative) compared to Nepalese media, given Ireland's higher ranking in global gender equality indices.

The hypothesis takes female visibility as the dependent variable (DV) and nation and media type as independent variables (IVs). The DV is binary: 1 if an article mentions a female MP and 0 if it does not. Each IV has two categories (nation: Nepal or Ireland; media type: alternative or mainstream). The study therefore used logistic (logit) regression to test the association.
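
As a sketch of the model implied by this setup, the main-effects logit model can be written as below. The symbols $\beta_0$, $\beta_1$, $\beta_2$ and the 0/1 indicator names are notation assumed here for illustration, chosen to match the dummy coding that appears in the VIF output later in this section.

$$
\log\frac{P(\text{female}=1)}{1-P(\text{female}=1)} = \beta_0 + \beta_1\,\text{nation}_{np} + \beta_2\,\text{type}_{ms}
$$

Here $\text{nation}_{np}$ is 1 for Nepalese articles and 0 for Irish ones, and $\text{type}_{ms}$ is 1 for mainstream and 0 for alternative media, so $\beta_0$ is the log-odds of mentioning a female MP in Irish alternative media.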

Since the study did not directly examine all politicians, it used members of parliament (MPs) as proxies. To reduce noise, articles mentioning neither female nor male MPs were removed at the outset.

In [13]:
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
from statsmodels.stats.outliers_influence import variance_inflation_factor
import numpy as np

In [14]:
# Read the datasets and add nation and media-type attributes
df1 = pd.read_csv("../data/filtered/gender_ir_al.csv")
df1["nation"] = "ir"
df1["type"] = "al"

df2 = pd.read_csv("../data/filtered/gender_ir_ms.csv")
df2["nation"] = "ir"
df2["type"] = "ms"

df3 = pd.read_csv("../data/filtered/gender_np_al.csv")
df3["nation"] = "np"
df3["type"] = "al"

df4 = pd.read_csv("../data/filtered/gender_np_ms.csv")
df4["nation"] = "np"
df4["type"] = "ms"

In [20]:
# Combine the four datasets into a single DataFrame
sum_dataset = pd.concat([df1, df2, df3, df4], ignore_index=True)

In [16]:
# Inspect the new dataset
print(sum_dataset.info())
print(sum_dataset.head())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1834 entries, 0 to 1833
Data columns (total 6 columns):
 #   Column   Non-Null Count  Dtype 
---  ------   --------------  ----- 
 0   date     1834 non-null   object
 1   content  1834 non-null   object
 2   female   1834 non-null   int64 
 3   male     1834 non-null   int64 
 4   nation   1834 non-null   object
 5   type     1834 non-null   object
dtypes: int64(2), object(4)
memory usage: 86.1+ KB
None
        date                                            content  female  male  \
0  2024/1/10  Irish LGBTQI+ groups steadfastly support surro...       0     1   
1  2024/1/11  Ireland's new National Cycling Network to feat...       0     1   
2  2024/1/11  WATCH: Irish lawyer delivers argument in The H...       0     1   
3  2024/1/14  Thousands gather in Dublin for "biggest Irish ...       1     0   
4  2024/1/15  Dublin preps for arrival of Chinese Premier (a...       0     1   

  nation type  
0     ir   al  
1     ir   al  
2     i


Before running the regression, the study checked the variance inflation factor (VIF) of each IV to confirm that the IVs are only weakly correlated with each other. The VIFs for nation and type are both about 1, far below the common threshold of 10, so multicollinearity is not a concern for the regression.

In [27]:
# Prepare for the Regression

# 1. Inspect Nulls
print('Missing values:')
print(sum_dataset.isnull().sum())

# 2. Inspect DV distribution
print('\nDV1:')
print(sum_dataset['female'].value_counts())

# 3. Inspect VIF
# since IVs are categorical, they must first be converted into dummy variables then into floats
X = pd.get_dummies(sum_dataset[['nation','type']], drop_first=True)
X = X.astype(float)

X['intercept'] = 1

vif = pd.DataFrame()
vif['variable'] = X.columns
vif['VIF'] = [variance_inflation_factor(X.values, i) for i in range(X.shape[1])]
print("\nVIF")
print(vif)

Missing values:
date          0
content       0
female        0
male          0
nation        0
type          0
gender_gap    0
dtype: int64

DV1:
female
0    1089
1     745
Name: count, dtype: int64

VIF
    variable       VIF
0  nation_np  1.005021
1    type_ms  1.005021
2  intercept  5.849942
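
With only two predictors plus a constant, each VIF equals 1 / (1 - r²), where r is the correlation between the two dummy variables. The following is a minimal sketch, not part of the original analysis, that inverts this relation for the reported VIF of 1.005021; the variable names are illustrative.

```python
import math

# VIF reported above for both nation_np and type_ms
vif = 1.005021

# With one other predictor (plus a constant), VIF = 1 / (1 - r^2),
# so r^2 = 1 - 1 / VIF
r_squared = 1 - 1 / vif
abs_r = math.sqrt(r_squared)

print(f"r^2 = {r_squared:.4f}, |r| = {abs_r:.3f}")
```

The implied correlation between the nation and media-type dummies is about 0.07 in absolute value, which is what a VIF close to 1 expresses. The value listed for the intercept column is generally not read as a collinearity diagnostic.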


Because the hypothesis involves two independent variables, the study also tested their interaction effect on the dependent variable. The model used Ireland/alternative media (Ir/al) as the reference group (0/0) and assessed whether the Nepal/mainstream media (Np/ms) combination showed a significant additional effect beyond the two main effects. The interaction effect was not significant (SE = 0.524, p = 0.130, 95% CI = [-1.822, 0.233]). Therefore, the model was re-estimated with the main effects only, without the interaction term.
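
As a complementary check on dropping the interaction term, a likelihood-ratio test can compare the two nested models. The sketch below is not part of the original analysis. It uses the log-likelihoods printed in the two model summaries in this section (-1212.8 with the interaction, -1213.9 without), which are rounded to one decimal, so the result is approximate. For one degree of freedom the chi-square tail probability equals erfc(sqrt(x / 2)), which keeps the example within the standard library.

```python
import math

# Log-likelihoods as printed in the statsmodels summaries (rounded)
ll_interaction = -1212.8   # female ~ nation * type
ll_main_only = -1213.9     # female ~ nation + type

# Likelihood-ratio statistic for the single extra parameter
lr_stat = 2 * (ll_interaction - ll_main_only)

# Chi-square survival function for df = 1: P(X > x) = erfc(sqrt(x / 2))
p_value = math.erfc(math.sqrt(lr_stat / 2))

print(f"LR = {lr_stat:.2f}, p = {p_value:.3f}")
```

A p-value above 0.05 from this comparison would be consistent with the non-significant Wald test reported for the interaction coefficient.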

In [26]:
# Type conversion
sum_dataset['nation'] = sum_dataset['nation'].astype('category')
sum_dataset['type'] = sum_dataset['type'].astype('category')

# Construct the model
model1 = smf.logit('female ~ nation * type', data=sum_dataset).fit()
print(model1.summary())

Optimization terminated successfully.
         Current function value: 0.661294
         Iterations 5
                           Logit Regression Results                           
Dep. Variable:                 female   No. Observations:                 1834
Model:                          Logit   Df Residuals:                     1830
Method:                           MLE   Df Model:                            3
Date:                Sat, 17 Jan 2026   Pseudo R-squ.:                 0.02096
Time:                        19:49:44   Log-Likelihood:                -1212.8
converged:                       True   LL-Null:                       -1238.8
Covariance Type:            nonrobust   LLR p-value:                 3.100e-11
                              coef    std err          z      P>|z|      [0.025      0.975]
-------------------------------------------------------------------------------------------
Intercept                  -0.9163      0.129     -7.098      0.000      -1.169   

In [25]:
# Remove the interaction term
model2 = smf.logit('female ~ nation + type', data=sum_dataset).fit()
print(model2.summary())

Optimization terminated successfully.
         Current function value: 0.661864
         Iterations 5
                           Logit Regression Results                           
Dep. Variable:                 female   No. Observations:                 1834
Model:                          Logit   Df Residuals:                     1831
Method:                           MLE   Df Model:                            2
Date:                Sat, 17 Jan 2026   Pseudo R-squ.:                 0.02012
Time:                        19:47:40   Log-Likelihood:                -1213.9
converged:                       True   LL-Null:                       -1238.8
Covariance Type:            nonrobust   LLR p-value:                 1.505e-11
                   coef    std err          z      P>|z|      [0.025      0.975]
--------------------------------------------------------------------------------
Intercept       -0.8751      0.125     -6.988      0.000      -1.121      -0.630
nation[T.np]    -0.7875

The results showed that both nation (coef_np = -0.787, SE = 0.162, p < 0.001, 95% CI = [-1.105, -0.470]) and media type (coef_ms = 0.694, SE = 0.136, p < 0.001, 95% CI = [0.428, 0.961]) have significant effects on the visibility of female politicians in news articles. Specifically, articles from Nepal are significantly less likely to mention female politicians than those from Ireland, while mainstream media are significantly more likely to mention female politicians than alternative media. Nevertheless, the pseudo R-squared is only 0.02, indicating that the model has weak explanatory power.
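
Logit coefficients are on the log-odds scale, so converting them to odds ratios and predicted probabilities can make the effect sizes easier to read. The sketch below is an illustration rather than part of the original analysis; it plugs in the intercept and coefficients reported for the main-effects model, and the helper `inv_logit` is a name introduced here.

```python
import math

# Coefficients as reported for the main-effects model (female ~ nation + type)
intercept = -0.8751
coef_np = -0.787   # Nepal vs. Ireland
coef_ms = 0.694    # mainstream vs. alternative


def inv_logit(x):
    """Convert log-odds to a probability."""
    return 1 / (1 + math.exp(-x))


# Odds ratios: exp(coefficient)
or_np = math.exp(coef_np)
or_ms = math.exp(coef_ms)
print(f"OR Nepal vs Ireland: {or_np:.2f}")
print(f"OR mainstream vs alternative: {or_ms:.2f}")

# Predicted probability of mentioning a female MP for each nation/type group
probs = {}
for nation, b_nation in [("ir", 0.0), ("np", coef_np)]:
    for media, b_media in [("al", 0.0), ("ms", coef_ms)]:
        probs[(nation, media)] = inv_logit(intercept + b_nation + b_media)
        print(nation, media, round(probs[(nation, media)], 3))
```

Under these coefficients, the odds of an article mentioning a female MP are roughly halved for Nepal relative to Ireland and roughly doubled for mainstream relative to alternative media.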