# **TikTok Claims A-B Testing**

**The goal** of this third notebook is to determine whether an account's verified status is related to the average view count of its videos. We will then create a visualization of our findings and add that visualization to the report we share with stakeholders.

**Part 1:** Load the Data

**Part 2:** Prepare the Data

**Part 3:** Construct the A-B Test

**Part 4:** Share Findings with Stakeholders

## **1: Load the Data**

### **Build dataframe**

In [2]:
# Import packages for data manipulation
import pandas as pd
import numpy as np

# Import packages for data visualization
import matplotlib.pyplot as plt
import seaborn as sns

# Import packages for statistical analysis/hypothesis testing
from scipy import stats

# Load dataset into dataframe
data = pd.read_csv("tiktok_dataset.csv")


## **2: Prepare the Data**

### **Clean Data**

In [4]:
# Drop rows with missing values
data = data.dropna(axis=0)
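
Before dropping rows, it can be useful to see how much data is lost. The following is a minimal sketch on a small made-up dataframe (the `toy` frame and its values are assumptions for illustration, not the TikTok dataset). It counts missing values per column and reports how many rows `dropna` removes.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the real dataset
toy = pd.DataFrame({
    "verified_status": ["verified", "not verified", None, "not verified"],
    "video_view_count": [1200.0, np.nan, 530.0, 98000.0],
})

# Count missing values in each column
missing_per_column = toy.isna().sum()
print(missing_per_column)

# Drop incomplete rows and report how many were removed
cleaned = toy.dropna(axis=0)
rows_dropped = len(toy) - len(cleaned)
print(f"Rows dropped: {rows_dropped} of {len(toy)}")
```

If a large share of rows is dropped, the remaining sample may no longer be representative, which would affect the test that follows.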


### **Preliminary Data Exploration**

We are interested in the relationship between `verified_status` and `video_view_count`, so we examine the mean values of `video_view_count` for each group of `verified_status` in the sample data.

In [5]:
# Compute the mean `video_view_count` for each group in `verified_status`
data.groupby("verified_status")["video_view_count"].mean()


verified_status
not verified    265663.785339
verified         91439.164167
Name: video_view_count, dtype: float64

Based on the averages shown, it appears that unverified accounts have higher view counts than verified accounts. However, this difference might arise from random sampling rather than being a true difference in view count. To assess whether the difference is statistically significant, we conduct a hypothesis test.
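A mean alone can hide how large and how spread out each group is. As a hedged sketch, the snippet below summarizes each group with its count, mean, median, and standard deviation. The `toy` dataframe and its values are made up for illustration and are not taken from the TikTok dataset.

```python
import pandas as pd

# Hypothetical toy data; the values are made up for illustration
toy = pd.DataFrame({
    "verified_status": ["verified"] * 4 + ["not verified"] * 6,
    "video_view_count": [900, 1500, 4000, 120000,
                         700, 2500, 9000, 300000, 650000, 880000],
})

# Per-group size, center, and spread of the view counts
summary = (
    toy.groupby("verified_status")["video_view_count"]
       .agg(["count", "mean", "median", "std"])
)
print(summary)
```

A large gap between the mean and the median within a group is a sign of skew, and unequal group sizes or spreads are worth noting before choosing a test.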

## **3: Construct the A-B Test**

### **Hypothesis Testing**

- **Null hypothesis**: There is no difference in number of views between TikTok videos posted by verified accounts and TikTok videos posted by unverified accounts (any observed difference in the sample data is due to chance or sampling variability).
- **Alternative hypothesis**: There is a difference in number of views between TikTok videos posted by verified accounts and TikTok videos posted by unverified accounts (any observed difference in the sample data is due to an actual difference in the corresponding population means).
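
Writing $\mu_{\text{verified}}$ and $\mu_{\text{not verified}}$ for the population mean view counts of the two groups (notation introduced here for illustration), the two hypotheses can be stated compactly as:

$$
H_0:\ \mu_{\text{verified}} = \mu_{\text{not verified}}
\qquad\text{versus}\qquad
H_A:\ \mu_{\text{verified}} \neq \mu_{\text{not verified}}
$$

Because the alternative does not specify a direction, this is a two-sided test.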

### **Significance Level**

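We choose 5% as the significance level and proceed with a two-sample t-test. Because the two groups may have different variances, we use Welch's version of the test (`equal_var=False`), which does not assume equal variances.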
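As a sketch of what a two-sample t-test with `equal_var=False` (Welch's t-test) computes, the function below calculates the statistic, the approximate degrees of freedom, and the two-sided p-value by hand, then compares them with `scipy.stats.ttest_ind`. The helper name `welch_t_test` and the random samples are assumptions for illustration only.

```python
import numpy as np
from scipy import stats


def welch_t_test(a, b):
    """Return (t statistic, degrees of freedom, two-sided p-value)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)

    # Variance of each sample mean: sample variance divided by sample size
    var_mean_a = a.var(ddof=1) / a.size
    var_mean_b = b.var(ddof=1) / b.size

    # Welch t statistic: difference in means over its standard error
    t_stat = (a.mean() - b.mean()) / np.sqrt(var_mean_a + var_mean_b)

    # Welch-Satterthwaite approximation to the degrees of freedom
    dof = (var_mean_a + var_mean_b) ** 2 / (
        var_mean_a ** 2 / (a.size - 1) + var_mean_b ** 2 / (b.size - 1)
    )

    # Two-sided p-value from the t distribution
    p_value = 2 * stats.t.sf(abs(t_stat), dof)
    return t_stat, dof, p_value


# Made-up samples with different spreads, for illustration only
rng = np.random.default_rng(0)
group_a = rng.normal(loc=10.0, scale=3.0, size=200)
group_b = rng.normal(loc=9.0, scale=1.0, size=150)

t_manual, dof, p_manual = welch_t_test(group_a, group_b)
t_scipy, p_scipy = stats.ttest_ind(a=group_a, b=group_b, equal_var=False)
print(t_manual, t_scipy)
print(p_manual, p_scipy)
```

The manual values should agree with SciPy's result up to floating-point error, which makes the role of each group's variance and sample size explicit.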

### **Find P-Value**

In [9]:
# Conduct a two-sample t-test to compare means

# Save each sample in a variable
not_verified = data[data["verified_status"] == "not verified"]["video_view_count"]
verified = data[data["verified_status"] == "verified"]["video_view_count"]

# Implement a t-test using the two samples
stats.ttest_ind(a=not_verified, b=verified, equal_var=False)

Ttest_indResult(statistic=25.499441780633777, pvalue=2.6088823687177823e-120)

### **Hypothesis Result**

Since the p-value is extremely small (much smaller than the significance level of 5%), we reject the null hypothesis.

Our conclusion is that there **is** a statistically significant difference in the mean video view count between verified and unverified accounts on TikTok.
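The decision rule can be written out explicitly. The sketch below plugs in the statistic and p-value reported by the t-test in this notebook. The variable names are chosen here for illustration.

```python
# Significance level chosen for the test
alpha = 0.05

# Values reported by the t-test in this notebook
t_statistic = 25.499441780633777
p_value = 2.6088823687177823e-120

# Decision rule: reject the null hypothesis when the p-value is below alpha
reject_null = p_value < alpha
print(f"Reject null hypothesis: {reject_null}")

# The not-verified sample was passed first, so a positive statistic
# means its sample mean is the larger of the two
not_verified_mean_is_larger = t_statistic > 0
print(f"Not-verified mean is larger: {not_verified_mean_is_larger}")
```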

## **4: Share Findings with Stakeholders**

### **Conclusions**

The analysis shows that there is a statistically significant difference in the average view counts between videos from verified accounts and videos from unverified accounts. This suggests there might be fundamental behavioral differences between these two groups of accounts.

It would be interesting to investigate the root cause of this behavioral difference. For example, do unverified accounts tend to post more clickbait videos? Or are unverified accounts associated with spam bots that help inflate view counts?

### **Next Steps**

The next step will be to build a regression model on `verified_status`. A regression model is the natural next step because the end goal is to make predictions on claim status, and a model for `verified_status` can help us analyze how verified and unverified accounts behave. **Technical note:** Because `verified_status` is a binary outcome, and because the data is skewed with a significant difference between account types, we will probably need to build a _logistic_ regression model.
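
As a hedged sketch of how the inputs for such a model might be prepared, the snippet below encodes `verified_status` as a 0/1 outcome and applies a log transform to the skewed view counts. The `toy` dataframe, the new column names, and the choice of `log1p` are assumptions for illustration. The notebook does not specify which features or transformations will be used.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the values are made up for illustration
toy = pd.DataFrame({
    "verified_status": ["verified", "not verified", "not verified",
                        "verified", "not verified"],
    "video_view_count": [100, 500, 1000, 5000, 900000],
})

# Encode the binary outcome as 1 (verified) or 0 (not verified)
toy["verified_flag"] = (toy["verified_status"] == "verified").astype(int)

# One common way to reduce right skew: log(1 + x)
toy["log_view_count"] = np.log1p(toy["video_view_count"])

# Compare sample skewness before and after the transform
raw_skew = toy["video_view_count"].skew()
log_skew = toy["log_view_count"].skew()
print(f"Skew before: {raw_skew:.2f}, after: {log_skew:.2f}")
```

A 0/1 outcome column of this kind is the form a logistic regression expects as its target.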