# **A/B test - TikTok**

<hr style="height: 3px; border:none; color:#000; background-color:#000;" />

This report presents an analysis of TikTok video performance data to evaluate key relationships and differences across various content and user characteristics. Using hypothesis testing as the primary method, the analysis aims to uncover statistically significant patterns and insights related to video engagement metrics such as views, likes, comments, shares, and downloads. The findings will provide a deeper understanding of how factors like claim status, verification status, and author ban status relate to audience interactions and engagement on the platform.

In [2]:
# Libraries
import numpy as np
import pandas as pd

from scipy import stats

In [3]:
df_tiktok = pd.read_csv("Datasets/tiktok_dataset.csv")
df_tiktok.head(10)

| | # | claim_status | video_id | video_duration_sec | video_transcription_text | verified_status | author_ban_status | video_view_count | video_like_count | video_share_count | video_download_count | video_comment_count |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | claim | 7017666017 | 59 | someone shared with me that drone deliveries a... | not verified | under review | 343296.0 | 19425.0 | 241.0 | 1.0 | 0.0 |
| 1 | 2 | claim | 4014381136 | 32 | someone shared with me that there are more mic... | not verified | active | 140877.0 | 77355.0 | 19034.0 | 1161.0 | 684.0 |
| 2 | 3 | claim | 9859838091 | 31 | someone shared with me that american industria... | not verified | active | 902185.0 | 97690.0 | 2858.0 | 833.0 | 329.0 |
| 3 | 4 | claim | 1866847991 | 25 | someone shared with me that the metro of st. p... | not verified | active | 437506.0 | 239954.0 | 34812.0 | 1234.0 | 584.0 |
| 4 | 5 | claim | 7105231098 | 19 | someone shared with me that the number of busi... | not verified | active | 56167.0 | 34987.0 | 4110.0 | 547.0 | 152.0 |
| 5 | 6 | claim | 8972200955 | 35 | someone shared with me that gross domestic pro... | not verified | under review | 336647.0 | 175546.0 | 62303.0 | 4293.0 | 1857.0 |
| 6 | 7 | claim | 4958886992 | 16 | someone shared with me that elvis presley has ... | not verified | active | 750345.0 | 486192.0 | 193911.0 | 8616.0 | 5446.0 |
| 7 | 8 | claim | 2270982263 | 41 | someone shared with me that the best selling s... | not verified | active | 547532.0 | 1072.0 | 50.0 | 22.0 | 11.0 |
| 8 | 9 | claim | 5235769692 | 50 | someone shared with me that about half of the ... | not verified | active | 24819.0 | 10160.0 | 1050.0 | 53.0 | 27.0 |
| 9 | 10 | claim | 4660861094 | 45 | someone shared with me that it would take a 50... | verified | active | 931587.0 | 171051.0 | 67739.0 | 4104.0 | 2540.0 |


In [7]:
df_tiktok.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 19382 entries, 0 to 19381
Data columns (total 12 columns):
 #   Column                    Non-Null Count  Dtype  
---  ------                    --------------  -----  
 0   #                         19382 non-null  int64  
 1   claim_status              19084 non-null  object 
 2   video_id                  19382 non-null  int64  
 3   video_duration_sec        19382 non-null  int64  
 4   video_transcription_text  19084 non-null  object 
 5   verified_status           19382 non-null  object 
 6   author_ban_status         19382 non-null  object 
 7   video_view_count          19084 non-null  float64
 8   video_like_count          19084 non-null  float64
 9   video_share_count         19084 non-null  float64
 10  video_download_count      19084 non-null  float64
 11  video_comment_count       19084 non-null  float64
dtypes: float64(5), int64(3), object(4)
memory usage: 1.8+ MB


In [11]:
# Generate a table of descriptive statistics about the data
df_tiktok.describe(include='all')

| | # | claim_status | video_id | video_duration_sec | video_transcription_text | verified_status | author_ban_status | video_view_count | video_like_count | video_share_count | video_download_count | video_comment_count |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 19382.0 | 19084 | 19382.0 | 19382.0 | 19084 | 19382 | 19382 | 19084.0 | 19084.0 | 19084.0 | 19084.0 | 19084.0 |
| unique | | 2 | | | 19012 | 2 | 3 | | | | | |
| top | | claim | | | a friend read in the media a claim that badmi... | not verified | active | | | | | |
| freq | | 9608 | | | 2 | 18142 | 15663 | | | | | |
| mean | 9691.5 | | 5627454000.0 | 32.421732 | | | | 254708.558688 | 84304.63603 | 16735.248323 | 1049.429627 | 349.312146 |
| std | 5595.245794 | | 2536440000.0 | 16.229967 | | | | 322893.280814 | 133420.546814 | 32036.17435 | 2004.299894 | 799.638865 |
| min | 1.0 | | 1234959000.0 | 5.0 | | | | 20.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 25% | 4846.25 | | 3430417000.0 | 18.0 | | | | 4942.5 | 810.75 | 115.0 | 7.0 | 1.0 |
| 50% | 9691.5 | | 5618664000.0 | 32.0 | | | | 9954.5 | 3403.5 | 717.0 | 46.0 | 9.0 |
| 75% | 14536.75 | | 7843960000.0 | 47.0 | | | | 504327.0 | 125020.0 | 18222.0 | 1156.25 | 292.0 |


In [12]:
# Check for missing values
df_tiktok.isna().sum()

#                             0
claim_status                298
video_id                      0
video_duration_sec            0
video_transcription_text    298
verified_status               0
author_ban_status             0
video_view_count            298
video_like_count            298
video_share_count           298
video_download_count        298
video_comment_count         298
dtype: int64

In [14]:
# Drop rows with missing values
df_tiktok = df_tiktok.dropna(axis=0).reset_index()
df_tiktok.head(10)

| | index | # | claim_status | video_id | video_duration_sec | video_transcription_text | verified_status | author_ban_status | video_view_count | video_like_count | video_share_count | video_download_count | video_comment_count |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 1 | claim | 7017666017 | 59 | someone shared with me that drone deliveries a... | not verified | under review | 343296.0 | 19425.0 | 241.0 | 1.0 | 0.0 |
| 1 | 1 | 2 | claim | 4014381136 | 32 | someone shared with me that there are more mic... | not verified | active | 140877.0 | 77355.0 | 19034.0 | 1161.0 | 684.0 |
| 2 | 2 | 3 | claim | 9859838091 | 31 | someone shared with me that american industria... | not verified | active | 902185.0 | 97690.0 | 2858.0 | 833.0 | 329.0 |
| 3 | 3 | 4 | claim | 1866847991 | 25 | someone shared with me that the metro of st. p... | not verified | active | 437506.0 | 239954.0 | 34812.0 | 1234.0 | 584.0 |
| 4 | 4 | 5 | claim | 7105231098 | 19 | someone shared with me that the number of busi... | not verified | active | 56167.0 | 34987.0 | 4110.0 | 547.0 | 152.0 |
| 5 | 5 | 6 | claim | 8972200955 | 35 | someone shared with me that gross domestic pro... | not verified | under review | 336647.0 | 175546.0 | 62303.0 | 4293.0 | 1857.0 |
| 6 | 6 | 7 | claim | 4958886992 | 16 | someone shared with me that elvis presley has ... | not verified | active | 750345.0 | 486192.0 | 193911.0 | 8616.0 | 5446.0 |
| 7 | 7 | 8 | claim | 2270982263 | 41 | someone shared with me that the best selling s... | not verified | active | 547532.0 | 1072.0 | 50.0 | 22.0 | 11.0 |
| 8 | 8 | 9 | claim | 5235769692 | 50 | someone shared with me that about half of the ... | not verified | active | 24819.0 | 10160.0 | 1050.0 | 53.0 | 27.0 |
| 9 | 9 | 10 | claim | 4660861094 | 45 | someone shared with me that it would take a 50... | verified | active | 931587.0 | 171051.0 | 67739.0 | 4104.0 | 2540.0 |
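The extra `index` column in the output above comes from calling `reset_index()` without `drop=True`. A minimal sketch on a toy frame (the `toy` values are made up for illustration and are not taken from the dataset) shows the difference between the two calls:

```python
import numpy as np
import pandas as pd

# Toy frame standing in for df_tiktok (values are illustrative only)
toy = pd.DataFrame({
    "claim_status": ["claim", None, "opinion", "claim"],
    "video_view_count": [100.0, np.nan, 20.0, 300.0],
})

# Default behaviour: the old row labels are kept as a new "index" column
kept_index = toy.dropna(axis=0).reset_index()

# With drop=True the old row labels are discarded
clean = toy.dropna(axis=0).reset_index(drop=True)

print(kept_index.columns.tolist())
print(clean.columns.tolist())
print(f"Rows dropped: {len(toy) - len(clean)}")
```

Either form works for the tests that follow, since they select columns by name; `drop=True` simply avoids carrying a redundant column.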


## **2. Variable Selection for A/B Testing:**
Based on the exploratory analysis, the following variables were chosen for A/B testing due to their relevance and potential to impact the analysis:
- **Claim_status:** This variable categorizes videos as either "opinion" or "claim." It is important for understanding differences in user engagement metrics (e.g., views, likes) based on the nature of the video content.
- **Verified_status:** Indicates whether the video was published by a "verified" or "not verified" user, which can influence metrics like viewership and engagement levels.
- **Author_ban_status:** Categorizes the author's account status as "active," "under review," or "banned."
- **Video_view_count:** Represents the number of times a video has been viewed.
- **Video_like_count:** Indicates the total number of likes a video has received, providing insights into user engagement and audience preferences.
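- **Video_share_count:** Measures the number of times a video has been shared, an important metric for understanding content virality.
- **Video_comment_count:** Represents the number of comments a video has received; it is the metric compared in Case 4.
- **Video_download_count:** Represents the number of times a video has been downloaded; it is the metric compared in Case 6.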

### **Categorical Variables**

In [19]:
# Display unique values for categorical variables (selected)
print("Unique values for the 'claim_status'")
print(f'-> {df_tiktok["claim_status"].unique()}')
print()
print("Unique values for the 'verified_status'")
print(f'-> {df_tiktok["verified_status"].unique()}')
print()
print("Unique values for the 'author_ban_status'")
print(f'-> {df_tiktok["author_ban_status"].unique()}')

Unique values for the 'claim_status'
-> ['claim' 'opinion']

Unique values for the 'verified_status'
-> ['not verified' 'verified']

Unique values for the 'author_ban_status'
-> ['under review' 'active' 'banned']


## **3.  Hypothesis Testing:**
This section outlines the hypotheses formulated to analyze differences between groups in the dataset. Each hypothesis includes a null hypothesis (H₀) and an alternative hypothesis (H₁) to evaluate significant relationships among key variables:

**Case 1: Analysis of Video View Count Differences by Claim Status (Opinion vs Claim)**
- $H_0$: There is no significant difference in the average video view count (video_view_count) between videos labeled as "opinion" and those labeled as "claim."
- $H_A$: There is a significant difference in the average video view count (video_view_count) between videos labeled as "opinion" and those labeled as "claim."

**Case 2: Analysis of Video View Count Differences by Verified Status (Verified vs Not Verified)**
- $H_0$: There is no significant difference in the average video view count (video_view_count) between videos published by "verified" and "not verified" users.
- $H_A$: There is a significant difference in the average video view count (video_view_count) between videos published by "verified" and "not verified" users.

**Case 3: Analysis of Video Like Count Differences by Verified Status (Verified vs Not Verified)**
- $H_0$: There is no significant difference in the average video like count (video_like_count) between videos published by "verified" and "not verified" users.
- $H_A$: There is a significant difference in the average video like count (video_like_count) between videos published by "verified" and "not verified" users.

**Case 4: Analysis of Video Comment Count Differences by Author Ban Status (Active vs Banned)**
- $H_0$: There is no significant difference in the average video comment count (video_comment_count) between authors with "active" status and those "banned."
- $H_A$: There is a significant difference in the average video comment count (video_comment_count) between authors with "active" status and those "banned."

**Case 5: Analysis of Video Share Count Differences by Claim Status (Opinion vs Claim)**
- $H_0$: There is no significant difference in the average video share count (video_share_count) between videos labeled as "opinion" and those labeled as "claim."
- $H_A$: There is a significant difference in the average video share count (video_share_count) between videos labeled as "opinion" and those labeled as "claim."

**Case 6: Analysis of Video Download Count Differences by Author Ban Status (Active vs Banned)**
- $H_0$: There is no significant difference in the average video download count (video_download_count) between authors with "active" status and those "banned."
- $H_A$: There is a significant difference in the average video download count (video_download_count) between authors with "active" status and those "banned."


## **4.  Perform A/B Testing:**
<p style="text-align: justify;"> This section presents the application of hypothesis testing to evaluate differences in key video performance metrics across various groups. Six specific cases are analyzed: (1) differences in video view count by claim status (Opinion vs Claim), (2) differences in video view count by verified status (Verified vs Not Verified), (3) differences in video like count by verified status, (4) differences in video comment count by author ban status (Active vs Banned), (5) differences in video share count by claim status, and (6) differences in video download count by author ban status. For each case, Levene's Test is used to assess the homogeneity of variances, followed by a Two-Sample T-Test (with Welch’s correction if needed) to determine whether the observed differences in means are statistically significant. </p>
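The procedure described above can be sketched as one reusable helper. This is only an illustration under stated assumptions: the function name `compare_groups`, the frame `demo`, and its synthetic numbers are hypothetical and do not come from the dataset. Slicing the two groups inside the function means each case works on freshly selected data instead of variables left over from an earlier cell.

```python
import numpy as np
import pandas as pd
from scipy import stats


def compare_groups(df, group_col, value_col, group_a, group_b, sig_level=0.05):
    """Levene's test followed by a two-sample t-test (Welch's if variances differ)."""
    # Select the two samples inside the function so nothing stale is reused
    a = df.loc[df[group_col] == group_a, value_col]
    b = df.loc[df[group_col] == group_b, value_col]

    # Homogeneity of variances
    levene_stat, levene_p = stats.levene(a, b)
    equal_var = bool(levene_p >= sig_level)

    # equal_var=False applies Welch's correction
    t_stat, t_p = stats.ttest_ind(a=a, b=b, equal_var=equal_var)

    return {
        "levene_stat": levene_stat,
        "levene_p": levene_p,
        "equal_var": equal_var,
        "t_stat": t_stat,
        "p_value": t_p,
        "reject_h0": bool(t_p < sig_level),
    }


# Synthetic stand-in for df_tiktok (made-up numbers, for illustration only)
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "claim_status": ["claim"] * 500 + ["opinion"] * 500,
    "video_view_count": np.concatenate([
        rng.exponential(scale=500_000, size=500),
        rng.exponential(scale=5_000, size=500),
    ]),
})

result = compare_groups(demo, "claim_status", "video_view_count", "claim", "opinion")
print(result)
```

On the real data the call would take `df_tiktok` and the column and group names used in each case, for example `compare_groups(df_tiktok, "author_ban_status", "video_download_count", "active", "banned")`.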

**Considerations:**
- A significance level of 5% will be used for the analysis.
- To perform hypothesis testing, the assumptions of normality and homogeneity of variances should be evaluated.
- For this analysis, it is assumed that the normality condition is met due to the large sample size, relying on the Central Limit Theorem.
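The reliance on the Central Limit Theorem can be checked informally. The sketch below uses synthetic log-normal values as a stand-in for right-skewed view counts (the distribution and its parameters are assumptions, not estimates from the dataset) and compares the skewness of the raw values with the skewness of bootstrap sample means.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Synthetic right-skewed "view counts" (log-normal), standing in for real data
views = rng.lognormal(mean=9.0, sigma=2.0, size=5_000)

# Bootstrap the sample mean: resample with replacement and record each mean
n_boot = 2_000
boot_means = np.array([
    rng.choice(views, size=views.size, replace=True).mean()
    for _ in range(n_boot)
])

raw_skew = stats.skew(views)
mean_skew = stats.skew(boot_means)

print(f"Skewness of raw values:      {raw_skew:.2f}")
print(f"Skewness of bootstrap means: {mean_skew:.2f}")
```

The sample means are much closer to symmetric than the raw values, which is what the t-test relies on. How close they get depends on how heavy the tail is, so running the same check on the actual columns would show whether the approximation is comfortable for each metric.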

### **4.1. Case 1: Analysis of Video View Count Differences by Claim Status (Opinion vs Claim)**

Preliminary Analysis: Average 'video_view_count' by "claim_status"

In [26]:
df_tiktok.groupby("claim_status").mean(numeric_only=True)[["video_view_count"]]

| claim_status | video_view_count |
|---|---|
| claim | 501029.452748 |
| opinion | 4956.43225 |


The descriptive analysis shows a notable difference in the average "video view count" between videos with a "claim" status (501029.45) and those with an "opinion" status (4956.43). To formally evaluate whether this difference is statistically significant and not due to random variation, a Two-Sample t-test will be performed. This test will provide statistical evidence to support or refute the observed disparity.

Data Preparation: Splitting 'video_view_count' by "claim_status"

In [30]:
sig_level = 0.05
claim = df_tiktok[df_tiktok["claim_status"] == "claim"]["video_view_count"]
opinion = df_tiktok[df_tiktok["claim_status"] == "opinion"]["video_view_count"]

Before performing the Two-Sample t-test, we evaluated the assumption of homogeneity of variances (equal variance).

In [31]:
stat, p = stats.levene(claim, opinion)
print(f"Levene’s Test: stat = {stat}, p-value = {p}")

# Check if the variances are equal (p-value >= 0.05)
if p >= 0.05:
    print("-> Homogeneity of variances (equal variances) is met (p >= 0.05).")
else:
    print("-> Homogeneity of variances (equal variances) is not met (p < 0.05).")

Levene’s Test: stat = 28065.719920139956, p-value = 0.0
-> Homogeneity of variances (equal variances) is not met (p < 0.05).


In [32]:
# Perform Two Sample t-test
tstatistic, pvalue= stats.ttest_ind(a=claim, b=opinion, equal_var=False)
print("Results:")
print(f'T-statistic: {tstatistic}')
print(f'P-value: {pvalue}')

Results:
T-statistic: 166.88857822856752
P-value: 0.0


**Conclusion:** The p-value is far below the significance level of 0.05, so we reject the null hypothesis ($H_0$). This indicates that there is a statistically significant difference in the "video view count" between videos with a "claim" status and those with an "opinion" status.
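A p-value printed as `0.0` means the value is smaller than a double-precision float can represent, not that it is exactly zero. With samples this large, almost any difference reaches significance, so it helps to report the size of the difference as well. The sketch below computes the mean difference, a Welch confidence interval, and Cohen's d; the function `welch_summary` and the synthetic samples are hypothetical, and the Cohen's d variant shown (root mean square of the two standard deviations) is one of several conventions.

```python
import numpy as np
from scipy import stats


def welch_summary(a, b, conf_level=0.95):
    """Mean difference, Welch confidence interval, and Cohen's d for two samples."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    var_a, var_b = a.var(ddof=1), b.var(ddof=1)
    se2_a, se2_b = var_a / a.size, var_b / b.size

    diff = a.mean() - b.mean()
    se = np.sqrt(se2_a + se2_b)

    # Welch-Satterthwaite approximation to the degrees of freedom
    dof = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (a.size - 1) + se2_b ** 2 / (b.size - 1)
    )
    t_crit = stats.t.ppf(0.5 + conf_level / 2, dof)
    ci = (diff - t_crit * se, diff + t_crit * se)

    # Cohen's d using the root mean square of the two standard deviations
    cohens_d = diff / np.sqrt((var_a + var_b) / 2)
    return diff, ci, cohens_d


# Synthetic, made-up samples standing in for the two view-count groups
rng = np.random.default_rng(1)
group_a = rng.exponential(scale=500_000, size=2_000)
group_b = rng.exponential(scale=5_000, size=2_000)

diff, ci, d = welch_summary(group_a, group_b)
print(f"Mean difference: {diff:,.0f}")
print(f"95% CI: ({ci[0]:,.0f}, {ci[1]:,.0f})")
print(f"Cohen's d: {d:.2f}")
```

Applied to the `claim` and `opinion` series defined above, this would put a magnitude and an interval next to the test result.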

### **4.2. Case 2: Analysis of Video View Count Differences by Verified Status (Verified vs Not Verified)**

Preliminary Analysis: Average 'video_view_count' by "verified_status"

In [66]:
df_tiktok.groupby("verified_status").mean(numeric_only=True)[["video_view_count"]]

| verified_status | video_view_count |
|---|---|
| not verified | 265663.785339 |
| verified | 91439.164167 |


The descriptive analysis shows a notable difference in the average "video view count" between accounts with a "verified" status (91439.16) and those with a "not verified" status (265663.79). To formally evaluate whether this difference is statistically significant and not due to random variation, a Two-Sample t-test will be performed. This test will provide statistical evidence to support or refute the observed disparity.

Data Preparation: Splitting 'video_view_count' by "verified_status"

In [68]:
sig_level = 0.05
verified = df_tiktok[df_tiktok["verified_status"] == "verified"]["video_view_count"]
not_verified = df_tiktok[df_tiktok["verified_status"] == "not verified"]["video_view_count"]

Before performing the Two-Sample t-test, we evaluated the assumption of homogeneity of variances (equal variance).

In [69]:
stat, p = stats.levene(verified, not_verified)
print(f"Levene’s Test: stat = {stat}, p-value = {p}")

# Check if the variances are equal (p-value >= 0.05)
if p >= 0.05:
    print("-> Homogeneity of variances (equal variances) is met (p >= 0.05).")
else:
    print("-> Homogeneity of variances (equal variances) is not met (p < 0.05).")

Levene’s Test: stat = 392.01297439069697, p-value = 2.221527997940107e-86
-> Homogeneity of variances (equal variances) is not met (p < 0.05).


In [72]:
# Perform Two Sample t-test
tstatistic, pvalue= stats.ttest_ind(a=verified, b=not_verified, equal_var=False)
print("Results:")
print(f'T-statistic: {tstatistic}')
print(f'P-value: {pvalue}')

Results:
T-statistic: -25.499441780633777
P-value: 2.6088823687177823e-120


**Conclusion:** The p-value is far below the significance level of 0.05, so we reject the null hypothesis ($H_0$). This indicates that there is a statistically significant difference in the "video view count" between videos from verified accounts and those from not verified accounts.

### **4.3. Case 3: Analysis of Video Like Count Differences by Verified Status (Verified vs Not Verified)**

Preliminary Analysis: Average 'video_like_count' by verified_status

In [34]:
df_tiktok.groupby("verified_status").mean(numeric_only=True)[["video_like_count"]]

| verified_status | video_like_count |
|---|---|
| not verified | 87925.772422 |
| verified | 30337.633333 |


The descriptive analysis shows a notable difference in the average "video like count" between accounts with a "verified" status (30337.63) and those with a "not verified" status (87925.77). To formally evaluate whether this difference is statistically significant and not due to random variation, a Two-Sample t-test will be performed. This test will provide statistical evidence to support or refute the observed disparity.

Data Preparation: Splitting 'video_like_count' by "verified_status"

In [37]:
sig_level = 0.05
verified = df_tiktok[df_tiktok["verified_status"] == "verified"]["video_like_count"]
not_verified = df_tiktok[df_tiktok["verified_status"] == "not verified"]["video_like_count"]

Before performing the Two-Sample t-test, we evaluated the assumption of homogeneity of variances (equal variance).

In [39]:
stat, p = stats.levene(verified, not_verified)
print(f"Levene’s Test: stat = {stat}, p-value = {p}")

# Check if the variances are equal (p-value >= 0.05)
if p >= 0.05:
    print("-> Homogeneity of variances (equal variances) is met (p >= 0.05).")
else:
    print("-> Homogeneity of variances (equal variances) is not met (p < 0.05).")

Levene’s Test: stat = 214.40197806082364, p-value = 2.751408468690354e-48
-> Homogeneity of variances (equal variances) is not met (p < 0.05).


In [40]:
# Perform Two Sample t-test
tstatistic, pvalue= stats.ttest_ind(a=verified, b=not_verified, equal_var=False)
print("Results:")
print(f'T-statistic: {tstatistic}')
print(f'P-value: {pvalue}')

Results:
T-statistic: -21.315562151092116
P-value: 4.6511316028672245e-89


**Conclusion:** The p-value is far below the significance level of 0.05, so we reject the null hypothesis ($H_0$). This indicates that there is a statistically significant difference in the "video like count" between videos from verified accounts and those from not verified accounts.

### **4.4. Case 4: Analysis of Video Comment Count Differences by Author Ban Status (Active vs Banned)**

Preliminary Analysis: Average 'video_comment_count' by "author_ban_status"

In [42]:
df_tiktok.groupby("author_ban_status").mean(numeric_only=True)[["video_comment_count"]]

| author_ban_status | video_comment_count |
|---|---|
| active | 295.134499 |
| banned | 614.956575 |
| under review | 542.480639 |


The descriptive analysis shows a notable difference in the average "video comment count" between accounts with an "active" status (295.13) and those with a "banned" status (614.96). To formally evaluate whether this difference is statistically significant and not due to random variation, a Two-Sample t-test will be performed. This test will provide statistical evidence to support or refute the observed disparity.

Data Preparation: Splitting 'video_comment_count' by "author_ban_status"

In [47]:
sig_level = 0.05
active = df_tiktok[df_tiktok["author_ban_status"] == "active"]["video_comment_count"]
banned = df_tiktok[df_tiktok["author_ban_status"] == "banned"]["video_comment_count"]

Before performing the Two-Sample t-test, we evaluated the assumption of homogeneity of variances (equal variance).

In [55]:
stat, p = stats.levene(active, banned)
print(f"Levene’s Test: stat = {stat}, p-value = {p}")

# Check if the variances are equal (p-value >= 0.05)
if p >= 0.05:
    print("-> Homogeneity of variances (equal variances) is met (p >= 0.05).")
else:
    print("-> Homogeneity of variances (equal variances) is not met (p < 0.05).")

Levene’s Test: stat = 183.5200131699242, p-value = 1.3570254106498014e-41
-> Homogeneity of variances (equal variances) is not met (p < 0.05).


In [49]:
# Perform Two Sample t-test
tstatistic, pvalue= stats.ttest_ind(a=active, b=banned, equal_var=False)
print("Results:")
print(f'T-statistic: {tstatistic}')
print(f'P-value: {pvalue}')

Results:
T-statistic: -12.372524780434118
P-value: 8.086980632387777e-34


**Conclusion:** The p-value is far below the significance level of 0.05, so we reject the null hypothesis ($H_0$). This indicates that there is a statistically significant difference in the "video comment count" between videos from active accounts and banned accounts.

### **4.5. Case 5: Analysis of Video Share Count Differences by Claim Status (Opinion vs Claim)**

Preliminary Analysis: Average 'video_share_count' by "claim_status"

In [52]:
df_tiktok.groupby("claim_status").mean(numeric_only=True)[["video_share_count"]]

| claim_status | video_share_count |
|---|---|
| claim | 33026.416216 |
| opinion | 217.145631 |


The descriptive analysis shows a notable difference in the average "video share count" between videos with a "claim" status (33,026.42) and those with an "opinion" status (217.15). To formally evaluate whether this difference is statistically significant and not due to random variation, a Two-Sample t-test will be performed. This test will provide statistical evidence to support or refute the observed disparity.

Data Preparation: Splitting 'video_share_count' by "claim_status"

In [53]:
sig_level = 0.05
claim = df_tiktok[df_tiktok["claim_status"] == "claim"]["video_share_count"]
opinion = df_tiktok[df_tiktok["claim_status"] == "opinion"]["video_share_count"]

Before performing the Two-Sample t-test, we evaluated the assumption of homogeneity of variances (equal variance).

In [56]:
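# Levene's test on the share-count groups defined in the previous cell
stat, p = stats.levene(claim, opinion)
print(f"Levene’s Test: stat = {stat}, p-value = {p}")

# Check if the variances are equal (p-value >= 0.05)
if p >= 0.05:
    print("-> Homogeneity of variances (equal variances) is met (p >= 0.05).")
else:
    print("-> Homogeneity of variances (equal variances) is not met (p < 0.05).")

Note: the recorded run of this cell passed the Case 4 series (`active`, `banned`) to `stats.levene`, which produced stat = 183.5200131699242 and p-value = 1.3570254106498014e-41, the same values as in Case 4. That output does not describe the share-count groups, so it is not reproduced here. The t-test below applies Welch's correction (`equal_var=False`), which does not require equal variances.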


In [65]:
# Perform Two Sample t-test
tstatistic, pvalue= stats.ttest_ind(a=claim, b=opinion, equal_var=False)
print("Results:")
print(f'T-statistic: {tstatistic}')
print(f'P-value: {pvalue}')

Results:
T-statistic: 82.92341391655332
P-value: 0.0


**Conclusion:** The p-value is far below the significance level of 0.05, so we reject the null hypothesis ($H_0$). This indicates that there is a statistically significant difference in the "video share count" between videos with a "claim" status and those with an "opinion" status.

### **4.6. Case 6: Analysis of Video Download Count Differences by Author Ban Status (Active vs Banned)**

Preliminary Analysis: Average 'video_download_count' by "author_ban_status"

In [74]:
df_tiktok.groupby("author_ban_status").mean(numeric_only=True)[["video_download_count"]]

| author_ban_status | video_download_count |
|---|---|
| active | 882.276344 |
| banned | 1886.296024 |
| under review | 1631.734753 |


The descriptive analysis shows a notable difference in the average "video download count" between accounts with an "active" status (882.28) and those with a "banned" status (1886.30). To formally evaluate whether this difference is statistically significant and not due to random variation, a Two-Sample t-test will be performed. This test will provide statistical evidence to support or refute the observed disparity.

Data Preparation: Splitting 'video_download_count' by "author_ban_status"

In [None]:
sig_level = 0.05
active = df_tiktok[df_tiktok["author_ban_status"] == "active"]["video_download_count"]
banned = df_tiktok[df_tiktok["author_ban_status"] == "banned"]["video_download_count"]

Before performing the Two-Sample t-test, we evaluated the assumption of homogeneity of variances (equal variance).

In [75]:
stat, p = stats.levene(active, banned)
print(f"Levene’s Test: stat = {stat}, p-value = {p}")

# Check if the variances are equal (p-value >= 0.05)
if p >= 0.05:
    print("-> Homogeneity of variances (equal variances) is met (p >= 0.05).")
else:
    print("-> Homogeneity of variances (equal variances) is not met (p < 0.05).")

Levene’s Test: stat = 183.5200131699242, p-value = 1.3570254106498014e-41
-> Homogeneity of variances (equal variances) is not met (p < 0.05).


In [76]:
# Perform Two Sample t-test
tstatistic, pvalue= stats.ttest_ind(a=active, b=banned, equal_var=False)
print("Results:")
print(f'T-statistic: {tstatistic}')
print(f'P-value: {pvalue}')

Results:
T-statistic: -12.372524780434118
P-value: 8.086980632387777e-34


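**Conclusion:** The recorded Levene and t-test outputs for this case (stat = 183.52, T-statistic = -12.37, p-value = 8.09e-34) are identical to those of Case 4, and the data preparation cell above is marked `In [None]`. This indicates that both tests ran on the Case 4 comment-count series rather than on the download counts. The descriptive means do differ ("active": 882.28, "banned": 1886.30), but the data preparation cell must be executed and both tests re-run before the null hypothesis ($H_0$) for "video download count" can be rejected or retained.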
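Because six tests are run on the same dataset at the 5% level, the chance of at least one false positive across the family is higher than 5%. A minimal sketch of two common adjustments, Bonferroni and Holm, follows; the p-values are placeholders chosen for illustration and are not taken from this report.

```python
import numpy as np

sig_level = 0.05

# Placeholder p-values for six tests (illustrative, not from this report)
p_values = np.array([1e-120, 1e-89, 1e-34, 0.004, 0.012, 0.03])
m = p_values.size

# Bonferroni: compare every p-value with sig_level divided by the number of tests
bonferroni_threshold = sig_level / m
reject_bonferroni = p_values <= bonferroni_threshold

# Holm: sort ascending, compare the k-th smallest (k = 0, 1, ...) with
# sig_level / (m - k), and stop rejecting at the first failure
reject_holm = np.zeros(m, dtype=bool)
for rank, idx in enumerate(np.argsort(p_values)):
    if p_values[idx] <= sig_level / (m - rank):
        reject_holm[idx] = True
    else:
        break

print("Bonferroni:", reject_bonferroni.tolist())
print("Holm:      ", reject_holm.tolist())
```

P-values as small as those reported in the cases above would pass either adjustment; the correction matters mainly when some results sit close to the threshold.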

## **5.  CONCLUSIONS AND RECOMMENDATIONS:**

- **Impact of Claim Status on Video View Count:** A statistically significant difference was found in the average video view count between videos labeled as "claim" and those labeled as "opinion." Videos with a "claim" status recorded significantly higher view counts (501029.45 vs 4956.43 on average), indicating that this labeling is strongly associated with audience engagement.
- **Influence of Verified Status on Video View Count:** Videos from accounts with a "not verified" status demonstrated significantly higher average view counts compared to videos from "verified" accounts. This finding challenges the assumption that verified accounts automatically attract larger audiences.
- **Effect of Verified Status on Video Like Count:** A statistically significant difference was observed in the average video like count between verified and not verified accounts. Videos from "not verified" users had higher average like counts, indicating a potential disconnect between verification status and audience engagement in terms of likes.
- **Influence of Author Ban Status on Video Comment Count:** Videos published by authors with a "banned" status recorded significantly higher average comment counts compared to those with "active" status. This suggests that controversial or banned accounts may drive higher engagement in terms of comments, potentially reflecting user responses to contentious content.
- **Impact of Claim Status on Video Share Count:** A statistically significant difference was found in the average video share count between videos labeled as "claim" and those labeled as "opinion." Videos with a "claim" status were shared significantly more frequently, highlighting their potential for virality.
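- **Influence of Author Ban Status on Video Download Count:** Videos published by authors with a "banned" status recorded higher average download counts than those with an "active" status (1886.30 vs 882.28). The test output recorded for this case duplicates that of Case 4, so the significance of this difference should be confirmed by re-running the test on the download counts. If confirmed, it may indicate that controversial content tends to be downloaded more often for later viewing or sharing.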
- Further analyses could investigate how different types of content within "claim" and "opinion" categories affect user engagement metrics like likes, comments, and shares.
- Analyze why "banned" accounts drive higher engagement across comments and downloads, and consider if this content aligns with platform guidelines and user preferences.
- Extend the analysis to other factors, such as video duration, to identify additional potential drivers of performance.
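As a starting point for the video duration recommendation, a rank correlation is one option because it does not assume a linear relationship or normally distributed counts. The sketch below uses a synthetic frame `demo` whose columns mirror `video_duration_sec` and `video_view_count`; the values are randomly generated and independent, so they say nothing about the real relationship.

```python
import numpy as np
import pandas as pd
from scipy import stats

rng = np.random.default_rng(7)

# Synthetic stand-in for df_tiktok (made-up, independent columns)
demo = pd.DataFrame({
    "video_duration_sec": rng.integers(5, 61, size=1_000),
    "video_view_count": rng.lognormal(mean=9.0, sigma=2.0, size=1_000),
})

# Spearman rank correlation between duration and views
rho, p = stats.spearmanr(demo["video_duration_sec"], demo["video_view_count"])
print(f"Spearman rho = {rho:.3f}, p-value = {p:.3f}")
```

On the real data the same call would take `df_tiktok["video_duration_sec"]` and the engagement column of interest.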
