# H2

H2: Alternative media provide more issue-focused coverage of female politicians compared to mainstream media.

The dependent variable (DV), the degree of “issue-focused” coverage, is numerical but discrete: it takes only six fixed values on a scale from 0 to 1. Neither traditional regression nor ordinal regression is suitable here, because the values are actual similarity scores rather than ordered categories. Given the irregular distribution of the DV, a Tobit model was used, treating the scores as censored data. The Tobit model assumes an underlying latent variable that is only partially observed in the data.
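
For reference, the textbook two-limit Tobit specification can be written as follows. The notation ($y_i^*$ for the latent variable, $L$ and $U$ for the lower and upper censoring limits, $\phi$ and $\Phi$ for the standard normal density and distribution function) is introduced here for illustration and does not appear in the original analysis.

$$
y_i^* = \mathbf{x}_i^\top \boldsymbol{\beta} + \varepsilon_i, \qquad \varepsilon_i \sim \mathcal{N}(0, \sigma^2)
$$

$$
y_i =
\begin{cases}
L & \text{if } y_i^* \le L \\
y_i^* & \text{if } L < y_i^* < U \\
U & \text{if } y_i^* \ge U
\end{cases}
$$

$$
\ell(\boldsymbol{\beta}, \sigma) =
\sum_{L < y_i < U} \log\!\left[\frac{1}{\sigma}\,\phi\!\left(\frac{y_i - \mathbf{x}_i^\top \boldsymbol{\beta}}{\sigma}\right)\right]
+ \sum_{y_i = L} \log \Phi\!\left(\frac{L - \mathbf{x}_i^\top \boldsymbol{\beta}}{\sigma}\right)
+ \sum_{y_i = U} \log\!\left[1 - \Phi\!\left(\frac{U - \mathbf{x}_i^\top \boldsymbol{\beta}}{\sigma}\right)\right]
$$

The three sums correspond to the uncensored, left-censored, and right-censored observations, respectively.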

In [26]:
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
from statsmodels.base.model import GenericLikelihoodModel
import numpy as np
from scipy import stats

In [27]:
# Read the datasets and add attributes
df1 = pd.read_csv("../data/clusters/cluster_f_ir_al.csv")
df1["type"] = "al"
focus_map_al = {0: 0.12, 1: 0.15, 2: 0.26}
df1["focusness"] = df1["Topic"].map(focus_map_al).fillna(0.0)
        
df2 = pd.read_csv("../data/clusters/cluster_f_ir_ms.csv")
df2["type"] = "ms"
focus_map_ms = {0: 0.13, 1: 0.14, 2: 0.14}
df2["focusness"] = df2["Topic"].map(focus_map_ms).fillna(0.0)

In [28]:
# Combine datasets
sum_dataset = pd.concat([df1, df2], ignore_index=True)

In [29]:
# Inspect the new dataset
print(sum_dataset.info())
print(sum_dataset.head())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 151 entries, 0 to 150
Data columns (total 10 columns):
 #   Column                   Non-Null Count  Dtype  
---  ------                   --------------  -----  
 0   Document                 151 non-null    object 
 1   Topic                    151 non-null    int64  
 2   Name                     151 non-null    object 
 3   Representation           151 non-null    object 
 4   Representative_Docs      151 non-null    object 
 5   Top_n_words              151 non-null    object 
 6   Probability              151 non-null    float64
 7   Representative_document  151 non-null    bool   
 8   type                     151 non-null    object 
 9   focusness                151 non-null    float64
dtypes: bool(1), float64(2), int64(1), object(6)
memory usage: 10.9+ KB
None
                                            Document  Topic  \
0  Thousands gather in Dublin for "biggest Irish ...      2   
1  New Ireland is in “touching distance,” Si

Before fitting the Tobit model, it is important to check its prerequisite: a substantial proportion of observations should be concentrated at the boundaries (the minimum and maximum). In this dataset, 14.57% of the values fall at the boundaries, indicating clustering there and supporting the use of a Tobit model. In addition, the OLS residuals deviate significantly from a normal (Gaussian) distribution (Shapiro-Wilk test, p < 0.001), so ordinary least squares (OLS) regression is not suitable.

In [30]:
lower_limit = sum_dataset['focusness'].min()
upper_limit = sum_dataset['focusness'].max()
censored_count = len(sum_dataset[sum_dataset['focusness'] == lower_limit]) + len(sum_dataset[sum_dataset['focusness'] == upper_limit])
print(f"censored ratio: {censored_count / len(sum_dataset):.2%}")

censored ratio: 14.57%


In [31]:
# The IV (media type) is categorical, so dummy-code it ("al" is the baseline)
X = pd.get_dummies(sum_dataset['type'], drop_first=True).astype("float")
X = sm.add_constant(X)

y = sum_dataset['focusness']

# Test the fit of OLS
model_ols = sm.OLS(y, X).fit()
_, p_val = stats.shapiro(model_ols.resid)
print(f"p value: {p_val:.4f}")

p value: 0.0000


In [32]:
# Define Tobit Likelihood
class Tobit(GenericLikelihoodModel):
    def loglike(self, params):
        beta = params[:-1]
        sigma = params[-1]
        mu = np.dot(self.exog, beta)
        
        # Censoring limits: the observed minimum and maximum of the DV
        lower = self.endog.min()
        upper = self.endog.max()
        left_censored = self.endog <= lower
        right_censored = self.endog >= upper
        uncensored = ~(left_censored | right_censored)

        # Density for uncensored points, tail probabilities for censored ones
        ll = np.zeros(len(self.endog))
        ll[uncensored] = stats.norm.logpdf(self.endog[uncensored], mu[uncensored], sigma)
        ll[left_censored] = stats.norm.logcdf(lower, mu[left_censored], sigma)
        ll[right_censored] = stats.norm.logsf(upper, mu[right_censored], sigma)
        return ll.sum()
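
One way to sanity-check a likelihood like the one above is to fit it to simulated data with known parameters. The following is a minimal sketch under stated assumptions, not the analysis used in this notebook: the function name `tobit_negloglike`, the simulated data, the fixed limits of 0 and 1, and the `log(sigma)` parameterization (chosen so the optimizer cannot propose a negative scale) are all illustrative.

```python
import numpy as np
from scipy import optimize, stats


def tobit_negloglike(params, X, y, lower, upper):
    """Negative log-likelihood of a two-limit Tobit model.

    params holds the regression coefficients followed by log(sigma).
    """
    beta, sigma = params[:-1], np.exp(params[-1])
    mu = X @ beta
    left = y <= lower
    right = y >= upper
    mid = ~(left | right)

    ll = np.empty_like(y, dtype=float)
    ll[mid] = stats.norm.logpdf(y[mid], mu[mid], sigma)       # observed values
    ll[left] = stats.norm.logcdf(lower, mu[left], sigma)      # P(y* <= lower)
    ll[right] = stats.norm.logsf(upper, mu[right], sigma)     # P(y* >= upper)
    return -ll.sum()


# Simulated data (hypothetical, not the study's dataset)
rng = np.random.default_rng(0)
n = 2000
group = rng.integers(0, 2, size=n).astype(float)   # binary predictor
X_sim = np.column_stack([np.ones(n), group])       # intercept + dummy
true_beta, true_sigma = np.array([0.4, 0.2]), 0.3
y_star = X_sim @ true_beta + rng.normal(0.0, true_sigma, size=n)
y_sim = np.clip(y_star, 0.0, 1.0)                  # censor at 0 and 1

start = np.array([y_sim.mean(), 0.0, np.log(y_sim.std())])
fit = optimize.minimize(
    tobit_negloglike, start, args=(X_sim, y_sim, 0.0, 1.0), method="BFGS"
)
beta_hat, sigma_hat = fit.x[:-1], np.exp(fit.x[-1])
print("beta:", beta_hat, "sigma:", sigma_hat)
```

If the likelihood is specified correctly, the estimates should land close to the values used to generate the data.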

In [33]:
res = Tobit(y, X).fit(start_params=np.append(model_ols.params.values, [y.std()]))
print(res.summary())

Optimization terminated successfully.
         Current function value: -1.023140
         Iterations: 49
         Function evaluations: 86
                                Tobit Results                                 
Dep. Variable:              focusness   Log-Likelihood:                 154.49
Model:                          Tobit   AIC:                            -303.0
Method:            Maximum Likelihood   BIC:                            -293.9
Date:                Sun, 18 Jan 2026                                         
Time:                        12:05:33                                         
No. Observations:                 151                                         
Df Residuals:                     149                                         
Df Model:                           1                                         
                 coef    std err          z      P>|z|      [0.025      0.975]
-----------------------------------------------------------------------



According to the results, only the intercept (const = 0.109, SE = 0.011, p < 0.001, 95% CI = [0.087, 0.131]) is statistically significant; it represents the baseline level of issue-focused coverage in alternative media, the reference category. The media type variable (ms = 0.012, SE = 0.012, p = 0.340, 95% CI = [-0.012, 0.036]) does not have a statistically significant effect on the degree of issue-focused reporting. There is therefore no significant difference in issue-focused coverage between mainstream and alternative media, and H2 is not supported.
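
As a quick consistency check, the reported 95% confidence intervals can be recomputed from the coefficients and standard errors using the usual Wald formula (estimate ± z × SE). This is a minimal sketch; the helper `wald_ci` is a hypothetical name, and the inputs are the rounded values reported above, so agreement is only expected to about three decimals.

```python
from scipy import stats


def wald_ci(coef, se, level=0.95):
    """Return the Wald confidence interval for a coefficient."""
    z = stats.norm.ppf(0.5 + level / 2)  # about 1.96 for a 95% interval
    return float(coef - z * se), float(coef + z * se)


for name, coef, se in [("const", 0.109, 0.011), ("ms", 0.012, 0.012)]:
    lo, hi = wald_ci(coef, se)
    print(f"{name}: [{lo:.3f}, {hi:.3f}]")
```

The interval for `ms` spans zero, which is consistent with the non-significant p-value.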