# Statistical Hypothesis Test
For this, we'll be using the **time series hypothesis test** as described [here](https://elvyna.github.io/2018/time-series-hypothesis-testing/#:~:text=Time%20Series%20Hypothesis%20Test,have%20residuals%20of%20the%20series.).

The first thing we need to do is load our data set into the notebook. For background, the data set covers tweets from 25 December 2021 to 23 June 2022. This window spans the 45 days before the start of the campaign period through the 45 days after the elections.

From this, we note the following important dates:
- Campaign period (for nationally elected positions): 8 Feb 2022 to 7 May 2022
- Election silence: 8 May 2022
- Election day (for non-absentee voters in the Philippines): 9 May 2022

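The 45-day offsets can be checked with Python's standard `datetime` module. This is only a sanity check of the window boundaries, computed from the campaign start and election day listed above.

```python
from datetime import date, timedelta

# Key dates taken from the list above
campaign_start = date(2022, 2, 8)
election_day = date(2022, 5, 9)
window = timedelta(days=45)

# First and last days of the study window
data_start = campaign_start - window
data_end = election_day + window
print(data_start, data_end)
```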
In [2]:
```python
import pandas as pd

# Load the annotated tweets
df = pd.read_csv('cs132_tweets_model_and_results.csv')

# Show the full tweet text and left-align it in the preview
pd.set_option('display.max_colwidth', None)
df.head().style.set_properties(**{'text-align': 'left'})
```

| | ID | Account type "Identified Anonymous Media" | Tweet Type | Tweet Type (2) | Tweet Type (3) | Date posted | Screenshot | Tweet ID | Tweet | Content type |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 00-1 | Identified | Reply | | | 12/26/21 22:35 | 00-1.png | 1475113095395966976 | @_ultravioletred @jersonality If their ED was done by a CPP cadre, the educator would've been reprimanded for sure. Whoever that educator is, they're probably still thinking in the 90s. Wake up kas, RA/RJ split is not that relevant anymore, lol. Embrace your comrades like what Neri, Labog, and Zarate did. | Rational |
| 1 | 00-2 | Identified | Reply | | | 12/28/21 13:25 | 00-2.png | 1475699422982189056 | @lukatmhe Colmenarez family ay part sa legal front ng NPA. At saka may first cousin si Angel Locsin na big time drug dealer ng shabu sa Cebu. Siya ay former Local Beauty Queen. Mismo si Neri Colmenarez naki usap sa mga police na wag galawin or itumba. | Rational |
| 2 | 00-3 | Anonymous | Reply | | | 12/29/21 20:47 | 00-3.png | 1476173017458016256 | @weirdnow1 @pnagovph yay 😂😂😂 kaya pa nmn mag pakamatay yang sina Elago,Zarate,Colmenares,Nato Reyes sa pakiki bakbakan.Tanggaaa yung mga nasapi sa NPA | Emotional |
| 3 | 00-4 | Anonymous | Reply | | | 01/07/22 14:37 | 00-4.png | 1479341371219902464 | @ColmenaresPH Shut Up Colmenares NPA. SALOT KA  #Nerveagain | Emotional |
| 4 | 00-5 | Anonymous | Reply | | | 01/08/22 06:25 | 00-5.png | 1479580070922846208 | AHAHAHAHA neri niyo NPA | Emotional |


## Creating a daily tally of tweets
Here, we build a complete daily log that covers every date in the window and records the number of tweets posted on each day, including days with no tweets.

In [83]:
```python
# Copy the dataframe so the original stays untouched
df_daily = df.copy()

# Parse the timestamps, then truncate them to calendar days:
# 'Date posted' becomes a 'YYYY-MM-DD' string and 'Date' a midnight datetime
df_daily['Date posted'] = pd.to_datetime(df['Date posted'])
df_daily['Date posted'] = df_daily['Date posted'].dt.strftime('%Y-%m-%d')
df_daily['Date'] = pd.to_datetime(df_daily['Date posted'], format='%Y-%m-%d')

pd.set_option('display.max_colwidth', None)
df_daily.head().style.set_properties(**{'text-align': 'left'})
```

| | ID | Account type "Identified Anonymous Media" | Tweet Type | Tweet Type (2) | Tweet Type (3) | Date posted | Screenshot | Tweet ID | Tweet | Content type | Date |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 00-1 | Identified | Reply | | | 2021-12-26 | 00-1.png | 1475113095395966976 | @_ultravioletred @jersonality If their ED was done by a CPP cadre, the educator would've been reprimanded for sure. Whoever that educator is, they're probably still thinking in the 90s. Wake up kas, RA/RJ split is not that relevant anymore, lol. Embrace your comrades like what Neri, Labog, and Zarate did. | Rational | 2021-12-26 00:00:00 |
| 1 | 00-2 | Identified | Reply | | | 2021-12-28 | 00-2.png | 1475699422982189056 | @lukatmhe Colmenarez family ay part sa legal front ng NPA. At saka may first cousin si Angel Locsin na big time drug dealer ng shabu sa Cebu. Siya ay former Local Beauty Queen. Mismo si Neri Colmenarez naki usap sa mga police na wag galawin or itumba. | Rational | 2021-12-28 00:00:00 |
| 2 | 00-3 | Anonymous | Reply | | | 2021-12-29 | 00-3.png | 1476173017458016256 | @weirdnow1 @pnagovph yay 😂😂😂 kaya pa nmn mag pakamatay yang sina Elago,Zarate,Colmenares,Nato Reyes sa pakiki bakbakan.Tanggaaa yung mga nasapi sa NPA | Emotional | 2021-12-29 00:00:00 |
| 3 | 00-4 | Anonymous | Reply | | | 2022-01-07 | 00-4.png | 1479341371219902464 | @ColmenaresPH Shut Up Colmenares NPA. SALOT KA  #Nerveagain | Emotional | 2022-01-07 00:00:00 |
| 4 | 00-5 | Anonymous | Reply | | | 2022-01-08 | 00-5.png | 1479580070922846208 | AHAHAHAHA neri niyo NPA | Emotional | 2022-01-08 00:00:00 |

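The `strftime`/`to_datetime` round trip in the cell above truncates each timestamp to midnight. A minimal sketch of the same step uses pandas' `Series.dt.normalize()`, shown here on a few made-up timestamps in the CSV's `MM/DD/YY HH:MM` layout. The explicit `format` string is an assumption that every row of the real file follows that layout.

```python
import pandas as pd

# Hypothetical sample values in the same "MM/DD/YY HH:MM" layout as the CSV
raw = pd.Series(['12/26/21 22:35', '12/28/21 13:25', '12/28/21 08:10'])

# Parse with an explicit format (assumed to match every row of the real file)
ts = pd.to_datetime(raw, format='%m/%d/%y %H:%M')

# Drop the time-of-day component; the result stays a datetime64 column
days = ts.dt.normalize()
print(days.tolist())
```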

In [104]:
```python
# Count tweets per day (days with no tweets are absent at this point)
s_freqinc = df_daily['Date'].value_counts()
df_freqinc = s_freqinc.to_frame()
df_freqinc.columns = ['Frequency']

# Full calendar of the study window, one entry per day
new_dates = pd.date_range(start='2021-12-25', end='2022-06-23', freq='D')

# Reindex onto the calendar so tweet-free days appear with a count of 0
df_freq = df_freqinc.reindex(new_dates).fillna(0)
df_freq['Frequency'] = df_freq['Frequency'].astype('int')
df_freq.head()
```

| | Frequency |
|---|---|
| 2021-12-25 | 0 |
| 2021-12-26 | 1 |
| 2021-12-27 | 0 |
| 2021-12-28 | 1 |
| 2021-12-29 | 1 |

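Passing `fill_value=0` to `reindex` fills the days with no tweets directly. This avoids the `fillna(0)` and `astype('int')` round trip, which is otherwise needed because missing values turn the counts into floats. The sketch below uses a few made-up dates rather than the real data.

```python
import pandas as pd

# Hypothetical per-tweet dates (already normalized to midnight)
dates = pd.Series(pd.to_datetime(['2021-12-26', '2021-12-28', '2021-12-28']))

# Full calendar for the toy window, one entry per day
calendar = pd.date_range(start='2021-12-25', end='2021-12-29', freq='D')

# Count tweets per day, then add the missing days with a count of 0
daily = dates.value_counts().reindex(calendar, fill_value=0)
print(daily.tolist())
```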

In [107]:
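```python
# Turn the date index into a regular 'Date' column
df_datefreq = df_freq.copy()
df_datefreq['Date'] = df_datefreq.index
df_datefreq.reset_index(drop=True, inplace=True)
df_datefreq = df_datefreq[['Date', 'Frequency']]

# Save the daily tally (the dates are written as the unnamed index column)
df_freq.to_csv('cs132_datefreq.csv')
df_datefreq.head()
```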

| | Date | Frequency |
|---|---|---|
| 0 | 2021-12-25 | 0 |
| 1 | 2021-12-26 | 1 |
| 2 | 2021-12-27 | 0 |
| 3 | 2021-12-28 | 1 |
| 4 | 2021-12-29 | 1 |


## Test 1: Before vs. After Elections
H0: There is no difference between the mean daily frequency of tweets before and after the elections.

H1: The mean daily frequency of tweets before the elections differs from the mean daily frequency after the elections.

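The t-test below compares the mean daily tweet count in each period. Stated in those terms, with $\mu_{\text{pre}}$ and $\mu_{\text{post}}$ denoting the mean number of tweets per day before and after 9 May 2022, the two-sided hypotheses read:

$$
H_0:\ \mu_{\text{pre}} = \mu_{\text{post}} \qquad H_1:\ \mu_{\text{pre}} \neq \mu_{\text{post}}
$$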
### Two-sample t-test
Here, we use a two-sample t-test to check whether the mean daily tweet frequency before the election date (9 May 2022) differs significantly from the mean daily tweet frequency after it. We compute the t-statistic and p-value for the two groups with SciPy's `ttest_ind`, as described [here](https://www.statology.org/pandas-t-test/). Election day itself is excluded from both groups.

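For reference, with its default `equal_var=True`, `ttest_ind` computes Student's t-statistic using a pooled variance: $t = \dfrac{\bar{x}_{\text{pre}} - \bar{x}_{\text{post}}}{s_p\sqrt{1/n_{\text{pre}} + 1/n_{\text{post}}}}$, where $s_p^2 = \dfrac{(n_{\text{pre}}-1)s_{\text{pre}}^2 + (n_{\text{post}}-1)s_{\text{post}}^2}{n_{\text{pre}}+n_{\text{post}}-2}$. The standard-library sketch below reproduces only the statistic, not the p-value. It runs on made-up counts, and a positive value means the first group has the larger mean. The function name `pooled_t` is hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t(a, b):
    """Student's two-sample t-statistic with pooled variance (hypothetical helper).

    Mirrors the statistic scipy.stats.ttest_ind reports with equal_var=True;
    it does not compute the p-value.
    """
    n1, n2 = len(a), len(b)
    # statistics.variance uses the sample (n - 1) denominator
    sp2 = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    se = sqrt(sp2 * (1 / n1 + 1 / n2))
    return (mean(a) - mean(b)) / se

# Hypothetical daily counts, not the real data
pre = [3, 5, 4, 6, 2]
post = [1, 2, 1, 0, 1]
print(pooled_t(pre, post))
```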
In [111]:
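```python
import pandas as pd
from scipy.stats import ttest_ind

# Split the days into pre- and post-election groups
# (election day itself, 9 May 2022, falls in neither group)
data_pre = df_datefreq[df_datefreq['Date'] < '2022-05-09']
data_post = df_datefreq[df_datefreq['Date'] > '2022-05-09']

# Perform an independent two-sample t-test (Student's, equal_var=True by default)
ttest_ind(data_pre['Frequency'], data_post['Frequency'])
```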

Ttest_indResult(statistic=3.7858909942362833, pvalue=0.0002091868258127852)

The test gives a t-statistic of about 3.786 and a p-value of about 0.00021. Because the p-value is well below the conventional 0.05 significance level, we reject the null hypothesis and conclude that **there is a significant difference between the average frequency of tweets before and after elections**. The t-statistic is positive, which indicates that the pre-election mean is the higher of the two.