#### Hypothesis 2: The number of engagements (i.e., likes, replies, retweets, and quote retweets) on tweets spreading misinformation/disinformation also dropped over time after the release of Rappler's Fact Checking Article last April 25, 2019.

Based on the hypothesis and the structure of the dataset, the natural first candidate for **Hypothesis 2** is an **independent samples t-test**, provided its assumptions hold (they are checked in Step 0 below). This test compares the mean number of engagements between two independent groups: tweets posted before April 25, 2019, and tweets posted on or after that date.

Here's a more detailed explanation of why an independent samples t-test is appropriate for Hypothesis 2:

- **Independent Groups**: The hypothesis aims to compare the number of engagements (i.e., likes, replies, retweets, and quote retweets) on tweets before and after the release of Rappler's Fact Checking Article on April 25, 2019. The tweets before and after this date represent two independent groups. The engagements in one group are not related to or dependent on the engagements in the other group. Therefore, an independent samples t-test is suitable for comparing the means between these two independent groups.

- **Numeric Variables**: The engagement metrics (likes, replies, retweets, and quote retweets) are non-negative counts measured on a ratio scale. They are discrete rather than strictly continuous, but they are numerical quantities whose means are meaningful to compare, which is what an independent samples t-test does.

- **Research Question**: The hypothesis states that the number of engagements on tweets spreading misinformation/disinformation dropped over time after the release of Rappler's Fact Checking Article. The focus is on the difference in means between engagements on tweets before and after April 25, 2019. An independent samples t-test is well-suited for addressing this research question by comparing the means of the engagement metrics in the two groups.

By employing an independent samples t-test, you can effectively analyze whether there is a significant difference in the means of engagements (likes, replies, retweets, and quote retweets) between tweets before and after the release of Rappler's Fact Checking Article on April 25, 2019, thus addressing **Hypothesis 2** in your analysis.
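As a rough illustration of the test described above, the following sketch runs an independent samples t-test on synthetic data. The arrays `before_demo` and `after_demo` are made-up stand-ins, not the notebook's dataset, and `equal_var=False` (Welch's variant) is an assumption chosen here because it does not require equal variances.

```python
import numpy as np
from scipy import stats

# Synthetic engagement counts (assumption: NOT the real dataset)
rng = np.random.default_rng(0)
before_demo = rng.poisson(lam=5.0, size=40)
after_demo = rng.poisson(lam=5.0, size=120)

# equal_var=False requests Welch's t-test, which does not assume equal variances
t_stat, p_val = stats.ttest_ind(before_demo, after_demo, equal_var=False)

print(f"t statistic: {t_stat:.3f}")
print(f"p-value: {p_val:.3f}")
```

Whether this test is actually usable on the real data depends on the assumption checks that follow.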

**Step 0: Independent samples t-test assumptions check**

In [17]:
import scipy.stats as stats
import pandas as pd
import numpy as np

# import data to pandas dataframe
df = pd.read_csv('../../Data Exploration/cleaned_dataset.csv')

df['Engagements'] = df['Quote Tweets'] + df['Replies'] + df['Likes'] + df['Retweets']

df['Date posted'] = pd.to_datetime(df['Date posted'])

# before: engagement metrics before April 25, 2019
before = df[df['Date posted'] < '2019-04-25']['Engagements']

# after: engagement metrics on or after April 25, 2019
after = df[df['Date posted'] >= '2019-04-25']['Engagements']
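Before testing, it may help to look at the size and basic summary statistics of each group. The sketch below uses a small hypothetical DataFrame named `demo` that only mimics the `Date posted` and `Engagements` columns; the values are invented for illustration.

```python
import pandas as pd

# Hypothetical rows that mimic the notebook's columns (not the real data)
demo = pd.DataFrame({
    'Date posted': pd.to_datetime(['2019-04-20', '2019-04-24', '2019-04-25', '2019-05-02']),
    'Engagements': [3, 10, 0, 7],
})

cutoff = pd.Timestamp('2019-04-25')

# Label each tweet by period; tweets posted on the cutoff date fall in 'after'
demo['Period'] = (demo['Date posted'] >= cutoff).map({False: 'before', True: 'after'})

# Group sizes and summary statistics per period
summary = demo.groupby('Period')['Engagements'].agg(['count', 'mean', 'median', 'std'])
print(summary)
```

Very unequal group sizes or a large gap between mean and median are early hints that a t-test may be a poor fit.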

1. Independence

In [18]:
# Note: independence of the two groups is a property of the study design
# (each tweet belongs to exactly one group) and cannot be established by a
# statistical test. The Mann-Whitney U test below compares the two
# engagement distributions; it is not a test of independence.
statistic, p_value = stats.mannwhitneyu(before, after, alternative='two-sided')

# Set significance level (alpha)
alpha = 0.05

# Print the results
print('Statistic:', statistic)
print('P-value:', p_value)

# Check if the p-value is less than the significance level
if p_value < alpha:
    print("The engagement distributions of the two groups differ significantly.")
else:
    print("No significant difference between the engagement distributions of the two groups.")


Statistic: 1582.5
P-value: 0.34849080628465023
No significant difference between the engagement distributions of the two groups.


2. Normality

In [19]:
# Perform the Shapiro-Wilk test for normality
statistic_before, p_value_before = stats.shapiro(before)
statistic_after, p_value_after = stats.shapiro(after)

# Print the results
print('Statistic before:', statistic_before)
print('P-value before:', p_value_before)

print('Statistic after:', statistic_after)
print('P-value after:', p_value_after)

# Set significance level (alpha)
alpha = 0.05

# Check if the p-values are greater than the significance level
if p_value_before > alpha and p_value_after > alpha:
    print("The engagement metrics in both groups approximate a normal distribution.")
else:
    print("The engagement metrics in at least one of the groups do not approximate a normal distribution.")

Statistic before: 0.5465950965881348
P-value before: 1.2118325010135322e-08
Statistic after: 0.45283329486846924
P-value after: 9.835117512661089e-19
The engagement metrics in at least one of the groups do not approximate a normal distribution.
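Shapiro-Wilk statistics far below 1 are typical of heavily right-skewed counts, where a few tweets collect most of the engagements. The sketch below shows how skewness and the mean/median gap can be inspected alongside the test. The array `demo_counts` is synthetic and its lognormal shape is an assumption, not a claim about the real data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Synthetic right-skewed counts (an assumption, not the real data)
demo_counts = np.floor(rng.lognormal(mean=1.0, sigma=1.5, size=100)).astype(int)

# Positive skewness indicates a long right tail
skewness = stats.skew(demo_counts)
w_stat, p_shapiro = stats.shapiro(demo_counts)

print(f"mean: {demo_counts.mean():.2f}, median: {np.median(demo_counts):.2f}")
print(f"skewness: {skewness:.2f}")
print(f"Shapiro-Wilk W: {w_stat:.3f}, p-value: {p_shapiro:.3g}")
```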


3. Homogeneity of Variances

In [20]:
# Perform Levene's test for homogeneity of variance
statistic, p_value = stats.levene(before, after)

# Calculate the standard deviations of the two groups
std_dev_before = np.std(before)
std_dev_after = np.std(after)

# Set significance level (alpha)
alpha = 0.05

# Check if the p-value is greater than the significance level
if p_value > alpha:
    print("The variances of engagement metrics in the two groups are equal.")
else:
    print("The variances of engagement metrics in the two groups are not equal.")

# Compare the standard deviations
if std_dev_before > std_dev_after:
    print("The standard deviation of engagement metrics in the 'before' group is larger.")
elif std_dev_before < std_dev_after:
    print("The standard deviation of engagement metrics in the 'after' group is larger.")
else:
    print("The standard deviations of engagement metrics in the two groups are equal.")

The variances of engagement metrics in the two groups are equal.
The standard deviation of engagement metrics in the 'before' group is larger.
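The cell above reports only the verdict. A variant that also prints the Levene statistic and p-value might look like the sketch below, run here on synthetic arrays `group_a` and `group_b` (both invented). `center='median'` is SciPy's default and is written out only to make the choice visible.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# Synthetic groups (assumption: stand-ins for 'before' and 'after')
group_a = rng.poisson(lam=4.0, size=30)
group_b = rng.poisson(lam=4.0, size=90)

# center='median' is the default and is the more robust choice for skewed data
lev_stat, lev_p = stats.levene(group_a, group_b, center='median')

# ddof=1 gives the sample (rather than population) standard deviation
sd_a = np.std(group_a, ddof=1)
sd_b = np.std(group_b, ddof=1)

print(f"Levene statistic: {lev_stat:.3f}, p-value: {lev_p:.3f}")
print(f"sample SD a: {sd_a:.3f}, sample SD b: {sd_b:.3f}")
```

A p-value above alpha means equal variances cannot be ruled out; it does not prove the variances are equal.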


4. Interval or Ratio Data

- The assumption that the engagement metrics are measured on an interval or ratio scale is satisfied: likes, replies, retweets, and quote retweets are counts with a true zero, so they are ratio-scale data (discrete rather than continuous). No statistical test is required to assess this assumption.
- However, it is good practice to confirm that the engagement metrics in the dataset are stored as numeric values. The following snippet checks the data type of each engagement column:

In [21]:
# Check the data types of the engagement metric columns
engagement_metrics = ['Likes', 'Replies', 'Retweets', 'Quote Tweets']

for metric in engagement_metrics:
    if pd.api.types.is_numeric_dtype(df[metric]):
        print(f"{metric} is a numeric column.")
    else:
        print(f"{metric} is not a numeric column.")

Likes is a numeric column.
Replies is a numeric column.
Retweets is a numeric column.
Quote Tweets is a numeric column.
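A numeric dtype alone does not rule out missing, negative, or fractional values. A slightly stricter check could look like the sketch below. The helper `is_valid_count` and the DataFrame `demo` are hypothetical names introduced only for this illustration.

```python
import pandas as pd

# Hypothetical data with the same column names as the notebook (values invented)
demo = pd.DataFrame({
    'Likes': [0, 3, 12],
    'Replies': [1, 0, 2],
    'Retweets': [0, 0, 5],
    'Quote Tweets': [0, 1, 0],
})

def is_valid_count(series):
    """Return True when the column is numeric, has no missing values,
    and holds only non-negative whole numbers."""
    if not pd.api.types.is_numeric_dtype(series):
        return False
    if series.isna().any():
        return False
    return bool(((series >= 0) & (series % 1 == 0)).all())

checks = {col: is_valid_count(demo[col]) for col in demo.columns}
print(checks)
```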


**Step 1: Selection of a non-parametric test**

While the assumptions of independence, homogeneity of variance, and interval or ratio data are satisfied in this analysis, the normality assumption is violated in both groups (both Shapiro-Wilk p-values are far below 0.05). The independent samples t-test, which assumes normality, is therefore not appropriate for comparing engagements before and after April 25, 2019. Non-parametric tests such as the Mann-Whitney U test or a permutation test are more suitable alternatives. These tests do not rely on the assumption of normality and assess the difference between two independent groups based on ranks or on random relabelling of the data.

Here's an in-depth explanation of why the Mann-Whitney U test or permutation test are more suitable alternatives when the assumption of normality is not satisfied:

1. **Mann-Whitney U test**: The Mann-Whitney U test, also known as the Wilcoxon rank-sum test, is a non-parametric test used to compare two independent groups. It does not rely on the assumption of normality and instead compares the ranks of the observations between the two groups. Here's why it is a suitable alternative:

   - Robust to non-normality: The Mann-Whitney U test does not assume a specific distribution for the data, making it robust to violations of the normality assumption. It focuses on the ordering of the observations rather than their exact values.
   
   - Sensitive to shifts in location: The Mann-Whitney U test assesses whether values in one group tend to be larger than values in the other. Strictly, it does not compare medians; a significant result can be read as a difference in medians only when the two distributions have the same shape and spread.
   
   - Assumptions: The Mann-Whitney U test assumes that the observations in each group are independent and that the engagement metrics are measured on an ordinal scale or higher.
   
   - Interpretation: The test provides a p-value: the probability, assuming both groups come from the same distribution, of obtaining a U statistic at least as extreme as the one observed. If the p-value is below a pre-defined significance level, it suggests a significant difference between the two groups.

2. **Permutation test**: A permutation test, also known as a randomization test or exact test, is a non-parametric resampling-based test that provides an alternative way to assess the statistical significance of the difference between two groups. Here's why it is a suitable alternative:

   - No assumptions about the data distribution: The permutation test does not rely on any assumptions about the data distribution, including normality. It works by randomly permuting the observations between the two groups, recalculating the test statistic, and repeating this process many times to obtain the empirical null distribution.
   
   - Flexible and adaptable: The permutation test can be applied to a wide range of data types and study designs, making it a versatile tool for hypothesis testing. It allows for customized test statistics based on your research question and provides a p-value based on the empirical null distribution.
   
   - Robustness: The permutation test does not require normality. Its validity rests on exchangeability instead: under the null hypothesis, the group labels can be swapped without changing the distribution of the data. When the test statistic is a difference in means, strongly unequal variances can still affect the result.
   
   - Validity: When every possible relabelling of the data is enumerated, the permutation test gives an exact p-value and maintains the desired type I error rate. When only a random subset of permutations is used (1000 in Step 3 below), the p-value is a Monte Carlo approximation whose precision improves with the number of permutations.
   
   - Interpretation: The p-value obtained from the permutation test is the probability of obtaining the observed test statistic (or a more extreme value) under the null hypothesis. If the p-value is below a pre-defined significance level, it suggests a significant difference between the two groups.
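To make the rank-based idea concrete, the sketch below computes the U statistic for the first sample by hand from the pooled ranks and then calls SciPy on the same data. The arrays `x` and `y` are hypothetical. In recent SciPy releases the returned statistic is expected to correspond to the first sample, but that is an assumption about the installed version, so the printout also shows the complementary value.

```python
import numpy as np
from scipy import stats

x = np.array([3, 8, 1, 15, 6])      # hypothetical 'before' engagements
y = np.array([2, 4, 9, 0, 5, 7])    # hypothetical 'after' engagements

# Rank all observations together (ties would receive average ranks)
ranks = stats.rankdata(np.concatenate([x, y]))

# U for x: rank sum of x minus the smallest rank sum x could possibly have
rank_sum_x = ranks[:len(x)].sum()
u_x = rank_sum_x - len(x) * (len(x) + 1) / 2
u_y = len(x) * len(y) - u_x          # the two U values always sum to n1 * n2

u_scipy, p_val = stats.mannwhitneyu(x, y, alternative='two-sided')

print(f"U for x (by hand): {u_x}, U for y: {u_y}")
print(f"SciPy statistic: {u_scipy}, p-value: {p_val:.3f}")
```

Only the ordering of the values enters the calculation, which is why the test is insensitive to the extreme counts that break the normality assumption.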

**Step 2: Perform the Mann-Whitney U Test and interpret the results**


In [22]:
# Perform the Mann-Whitney U test for the engagements of the 2 groups
statistic, p_value = stats.mannwhitneyu(before, after, alternative='two-sided')

# Set significance level (alpha)
alpha = 0.05

# Print the test result
print(f"Mann-Whitney U test statistic: {statistic}")
print(f"p-value: {p_value}")

if p_value < alpha:
    print("There is a significant difference between the two groups.")
else:
    print("There is no significant difference between the two groups.")

Mann-Whitney U test statistic: 1582.5
p-value: 0.34849080628465023
There is no significant difference between the two groups.
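The hypothesis is directional (engagements dropped), while the test above is two-sided. A one-sided variant together with a simple effect size could be sketched as follows. The data are synthetic, and the effect size is computed directly from pairwise comparisons so that it does not depend on which U value a given SciPy version returns.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# Synthetic data (assumption): 'before' drawn with a higher average than 'after'
before_demo = rng.poisson(lam=6.0, size=40)
after_demo = rng.poisson(lam=4.0, size=120)

# alternative='greater' tests whether 'before' tends to be larger than 'after'
u_stat, p_one_sided = stats.mannwhitneyu(before_demo, after_demo, alternative='greater')

# Common-language effect size: estimated P(before > after) + 0.5 * P(tie)
diff = before_demo[:, None] - after_demo[None, :]
cles = (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size

# Rank-biserial correlation: rescales the effect size to the range [-1, 1]
rank_biserial = 2 * cles - 1

print(f"one-sided p-value: {p_one_sided:.4f}")
print(f"P(before > after), ties split: {cles:.3f}")
print(f"rank-biserial correlation: {rank_biserial:.3f}")
```

An effect size near 0.5 (rank-biserial near 0) would indicate that neither group tends to have more engagements than the other.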


**Step 3: Perform the Permutation test**

In [23]:
# Compute the observed test statistic: difference in mean engagements (before minus after)
observed_statistic = np.mean(before) - np.mean(after)

# Perform the permutation test
num_permutations = 1000  # Number of permutations
permutation_stats = []

for _ in range(num_permutations):
    # Shuffle the data between the two groups
    combined_data = np.concatenate((before, after))
    np.random.shuffle(combined_data)
    
    # Compute the test statistic for the permuted data
    permuted_statistic = np.mean(combined_data[:len(before)]) - np.mean(combined_data[len(before):])
    permutation_stats.append(permuted_statistic)

# Compute the two-sided p-value: the proportion of permutation statistics whose absolute value is at least as large as that of the observed statistic
p_value = (np.abs(permutation_stats) >= np.abs(observed_statistic)).mean()

# Print the results
print(f"Observed test statistic: {observed_statistic}")
print(f"p-value: {p_value}")

# Set significance level (alpha)
alpha = 0.05

# Check if the p-value is less than the significance level
if p_value < alpha:
    print("There is a significant difference between the two groups.")
else:
    print("There is no significant difference between the two groups.")

Observed test statistic: 0.4142614601018675
p-value: 0.703
There is no significant difference between the two groups.
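Because the cell above shuffles without a fixed seed, its p-value changes slightly from run to run. A reproducible variant is sketched below. The function name `permutation_pvalue`, its parameters, and the `+1` correction (which counts the observed labelling as one of the permutations so the p-value can never be exactly zero) are choices made for this illustration, not part of the original analysis.

```python
import numpy as np

def permutation_pvalue(x, y, num_permutations=10_000, seed=0):
    """Two-sided Monte Carlo permutation test on the difference in means."""
    rng = np.random.default_rng(seed)   # fixed seed makes the result reproducible
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    combined = np.concatenate([x, y])
    observed = x.mean() - y.mean()

    count = 0
    for _ in range(num_permutations):
        # Randomly reassign observations to the two groups
        permuted = rng.permutation(combined)
        stat = permuted[:len(x)].mean() - permuted[len(x):].mean()
        if abs(stat) >= abs(observed):
            count += 1

    # The +1 terms include the observed labelling among the permutations
    p_value = (count + 1) / (num_permutations + 1)
    return observed, p_value

# Usage with hypothetical data
obs, p = permutation_pvalue([3, 8, 1, 15, 6], [2, 4, 9, 0, 5, 7], num_permutations=2000)
print(f"observed difference in means: {obs:.3f}, p-value: {p:.3f}")
```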


**CONCLUSION**

Both the **Mann-Whitney U test** (p ≈ 0.348) and the **permutation test** (p ≈ 0.703) indicate **no significant difference** between the two groups at the 0.05 significance level, so we **fail to reject the null hypothesis**.

Therefore, we **do not have sufficient evidence** to conclude that the number of engagements (likes, replies, retweets, and quote retweets) on tweets spreading misinformation/disinformation dropped after the release of Rappler's Fact Checking Article on April 25, 2019. The data does not support the hypothesis that engagements significantly decreased following the article's release.

Failing to find a significant difference does not mean that there is no effect. The effect may be small, the sample may be too limited to detect it, or other factors may be influencing engagement. Both tests above were also two-sided, whereas the hypothesis is directional: the observed mean difference of about 0.41 engagements (before minus after) points in the hypothesized direction but is not statistically distinguishable from zero. Based on the available data and the tests performed, we cannot conclude that engagements dropped after the article's release.
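One way to express how large or small the effect might plausibly be is a bootstrap confidence interval for the difference in means. The sketch below is an add-on under stated assumptions, not part of the original analysis: `bootstrap_mean_diff_ci` is a hypothetical helper, the percentile method is one of several possible interval constructions, and the input data are invented.

```python
import numpy as np

def bootstrap_mean_diff_ci(x, y, num_resamples=5000, level=0.95, seed=0):
    """Percentile bootstrap interval for mean(x) - mean(y)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    diffs = np.empty(num_resamples)
    for i in range(num_resamples):
        # Resample each group with replacement, keeping its original size
        x_star = rng.choice(x, size=len(x), replace=True)
        y_star = rng.choice(y, size=len(y), replace=True)
        diffs[i] = x_star.mean() - y_star.mean()

    tail = 100 * (1 - level) / 2
    lower, upper = np.percentile(diffs, [tail, 100 - tail])
    return lower, upper

# Usage with hypothetical data
lower, upper = bootstrap_mean_diff_ci([3, 8, 1, 15, 6], [2, 4, 9, 0, 5, 7], num_resamples=2000)
print(f"95% bootstrap CI for the mean difference: [{lower:.2f}, {upper:.2f}]")
```

An interval that contains zero is consistent with the non-significant tests, and its width gives a sense of how large an effect the data could still be hiding.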