# **TikTok Project**
**Course 2 - Get Started with Python**

Welcome to the TikTok Project!

You have just started as a data professional at TikTok.

The team is still in the early stages of the project. You have received notice that TikTok's leadership team has approved the project proposal. To gain clear insights to prepare for a claims classification model, TikTok's provided data must be examined to begin the process of exploratory data analysis (EDA).

A notebook was structured and prepared to help you in this project. Please complete the following questions.

# **Course 2 End-of-course project: Inspect and analyze data**

In this activity, you will examine data provided and prepare it for analysis.
<br/>

**The purpose** of this project is to investigate and understand the data provided. This activity will:

1.   Acquaint you with the data

2.   Compile summary information about the data

3.   Begin the process of EDA and reveal insights contained in the data

4.   Prepare you for more in-depth EDA, hypothesis testing, and statistical analysis

**The goal** is to construct a dataframe in Python, perform a cursory inspection of the provided dataset, and inform TikTok data team members of your findings.
<br/>
*This activity has three parts:*

**Part 1:** Understand the situation
* How can you best prepare to understand and organize the provided TikTok information?

**Part 2:** Understand the data

* Create a pandas dataframe for data learning and future exploratory data analysis (EDA) and statistical activities

* Compile summary information about the data to inform next steps

**Part 3:** Understand the variables

* Use insights from your examination of the summary data to guide deeper investigation into variables

<br/>

To complete the activity, follow the instructions and answer the questions below. Then, you will use your responses to these questions and the questions included in the Course 2 PACE Strategy Document to create an executive summary.

Be sure to complete this activity before moving on to Course 3. You can assess your work by comparing the results to a completed exemplar after completing the end-of-course project.

# **Identify data types and compile summary information**


Throughout these project notebooks, you'll see references to the problem-solving framework PACE. The following notebook components are labeled with the respective PACE stage: Plan, Analyze, Construct, and Execute.

# **PACE stages**

<img src="images/Pace.png" width="100" height="100" align=left>

   *        [Plan](#scrollTo=psz51YkZVwtN&line=3&uniqifier=1)
   *        [Analyze](#scrollTo=mA7Mz_SnI8km&line=4&uniqifier=1)
   *        [Construct](#scrollTo=Lca9c8XON8lc&line=2&uniqifier=1)
   *        [Execute](#scrollTo=401PgchTPr4E&line=2&uniqifier=1)

<img src="images/Plan.png" width="100" height="100" align=left>


## **PACE: Plan**

Consider the questions in your PACE Strategy Document and those below to craft your response:



### **Task 1. Understand the situation**

*   How can you best prepare to understand and organize the provided information?


*Begin by exploring your dataset and consider reviewing the Data Dictionary.*

==> ENTER YOUR RESPONSE HERE

<img src="images/Analyze.png" width="100" height="100" align=left>

## **PACE: Analyze**

Consider the questions in your PACE Strategy Document to reflect on the Analyze stage.

### **Task 2a. Imports and data loading**

Start by importing the packages that you will need to load and explore the dataset. Make sure to use the following import statements:
*   `import pandas as pd`

*   `import numpy as np`


In [1]:
# Import packages

import pandas as pd
import numpy as np

Then, load the dataset into a dataframe. Creating a dataframe will help you conduct data manipulation, exploratory data analysis (EDA), and statistical activities.

**Note:** As shown in this cell, the dataset has been automatically loaded in for you. You do not need to download the .csv file, or provide more code, in order to access the dataset and proceed with this lab. Please continue with this activity by completing the following instructions.

In [2]:
# Load dataset into dataframe
data = pd.read_csv("tiktok_dataset.csv")

### **Task 2b. Understand the data - Inspect the data**

View and inspect summary information about the dataframe by **coding the following:**

1. `data.head(10)`
2. `data.info()`
3. `data.describe()`

*Consider the following questions:*

**Question 1:** When reviewing the first few rows of the dataframe, what do you observe about the data? What does each row represent?

**Question 2:** When reviewing the `data.info()` output, what do you notice about the different variables? Are there any null values? Are all of the variables numeric? Does anything else stand out?

**Question 3:** When reviewing the `data.describe()` output, what do you notice about the distributions of each variable? Are there any questionable values? Does it seem that there are outlier values?

















In [3]:
data.head()

|   | # | claim_status | video_id | video_duration_sec | video_transcription_text | verified_status | author_ban_status | video_view_count | video_like_count | video_share_count | video_download_count | video_comment_count |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | claim | 7017666017 | 59 | someone shared with me that drone deliveries a... | not verified | under review | 343296.0 | 19425.0 | 241.0 | 1.0 | 0.0 |
| 1 | 2 | claim | 4014381136 | 32 | someone shared with me that there are more mic... | not verified | active | 140877.0 | 77355.0 | 19034.0 | 1161.0 | 684.0 |
| 2 | 3 | claim | 9859838091 | 31 | someone shared with me that american industria... | not verified | active | 902185.0 | 97690.0 | 2858.0 | 833.0 | 329.0 |
| 3 | 4 | claim | 1866847991 | 25 | someone shared with me that the metro of st. p... | not verified | active | 437506.0 | 239954.0 | 34812.0 | 1234.0 | 584.0 |
| 4 | 5 | claim | 7105231098 | 19 | someone shared with me that the number of busi... | not verified | active | 56167.0 | 34987.0 | 4110.0 | 547.0 | 152.0 |


In [4]:
data.shape

(19382, 12)

In [5]:
# Get summary info
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 19382 entries, 0 to 19381
Data columns (total 12 columns):
 #   Column                    Non-Null Count  Dtype  
---  ------                    --------------  -----  
 0   #                         19382 non-null  int64  
 1   claim_status              19084 non-null  object 
 2   video_id                  19382 non-null  int64  
 3   video_duration_sec        19382 non-null  int64  
 4   video_transcription_text  19084 non-null  object 
 5   verified_status           19382 non-null  object 
 6   author_ban_status         19382 non-null  object 
 7   video_view_count          19084 non-null  float64
 8   video_like_count          19084 non-null  float64
 9   video_share_count         19084 non-null  float64
 10  video_download_count      19084 non-null  float64
 11  video_comment_count       19084 non-null  float64
dtypes: float64(5), int64(3), object(4)
memory usage: 1.8+ MB


In [6]:
# Get summary statistics
data.describe()

|   | # | video_id | video_duration_sec | video_view_count | video_like_count | video_share_count | video_download_count | video_comment_count |
|---|---|---|---|---|---|---|---|---|
| count | 19382.0 | 19382.0 | 19382.0 | 19084.0 | 19084.0 | 19084.0 | 19084.0 | 19084.0 |
| mean | 9691.5 | 5627454000.0 | 32.421732 | 254708.558688 | 84304.63603 | 16735.248323 | 1049.429627 | 349.312146 |
| std | 5595.245794 | 2536440000.0 | 16.229967 | 322893.280814 | 133420.546814 | 32036.17435 | 2004.299894 | 799.638865 |
| min | 1.0 | 1234959000.0 | 5.0 | 20.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 25% | 4846.25 | 3430417000.0 | 18.0 | 4942.5 | 810.75 | 115.0 | 7.0 | 1.0 |
| 50% | 9691.5 | 5618664000.0 | 32.0 | 9954.5 | 3403.5 | 717.0 | 46.0 | 9.0 |
| 75% | 14536.75 | 7843960000.0 | 47.0 | 504327.0 | 125020.0 | 18222.0 | 1156.25 | 292.0 |
| max | 19382.0 | 9999873000.0 | 60.0 | 999817.0 | 657830.0 | 256130.0 | 14994.0 | 9599.0 |


===> ENTER YOUR RESPONSE TO QUESTIONS 1-3 HERE

Question 1: When reviewing the first few rows of the dataframe, what do you observe about the data? What does each row represent?

Each row represents a single TikTok video, with its label, metadata, and engagement counts.

Observations:
1. claim_status: Indicates whether the video is flagged as a "claim."
2. video_id: Contains a unique identifier for each video.
3. video_duration_sec: Shows the duration of each video in seconds.
4. video_transcription_text: Contains the transcription of the video's audio, which includes the content that led to the "claim" label.
5. verified_status: Indicates whether the author is "verified" or "not verified."
6. author_ban_status: Reflects the status of the author's account.
7. video_view_count: Shows the number of views each video has received.
8. video_like_count: Shows the number of likes each video has received.

Question 2: When reviewing the data.info() output, what do you notice about the different variables? Are there any null values? Are all of the variables numeric? Does anything else stand out?

1. The DataFrame has a mix of data types: integers (int64), floats (float64), and objects (which typically represent strings or categorical data), so not all variables are numeric.
2. Five columns (#, video_id, video_duration_sec, verified_status, and author_ban_status) have 19,382 non-null values and are fully populated. The other seven columns, including claim_status and all five engagement counts, have 19,084 non-null values, so each is missing 298 values.
3. The # column appears to be a row counter or identifier and is likely not relevant for analysis.

Question 3: When reviewing the data.describe() output, what do you notice about the distributions of each variable? Are there any questionable values? Does it seem that there are outlier values?

1. Distributions: The engagement count columns are right-skewed, as indicated by mean values that are much higher than the median (50%) values.
2. Questionable values: The minimum values of zero for video_like_count, video_share_count, video_download_count, and video_comment_count may be valid but should be verified in context.
3. Outliers: The large gap between the maximum values and the interquartile range (25% to 75%) suggests potential outliers in columns such as video_like_count, video_share_count, video_download_count, and video_comment_count.
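
The null-value and skew checks described in these answers can also be done directly in code rather than read off the printed output. The following is a minimal sketch on a small toy dataframe; the values in `toy` are hypothetical and only the column names mirror the real dataset.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the TikTok dataframe (hypothetical values, not real data).
toy = pd.DataFrame({
    "claim_status": ["claim", "opinion", None, "claim"],
    "video_view_count": [100.0, 20.0, np.nan, 900_000.0],
    "video_duration_sec": [10, 20, 30, 40],
})

# Missing values per column; a fully populated column shows 0.
null_counts = toy.isna().sum()

# A mean far above the median is a quick hint of right skew.
view_mean = toy["video_view_count"].mean()
view_median = toy["video_view_count"].median()
is_right_skewed = view_mean > view_median

print(null_counts)
print(view_mean, view_median, is_right_skewed)
```

Applied to the real dataframe, `data.isna().sum()` would list the number of missing values in each column.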

### **Task 2c. Understand the data - Investigate the variables**

In this phase, you will begin to investigate the variables more closely to better understand them.

You know from the project proposal that the ultimate objective is to use machine learning to classify videos as either claims or opinions. A good first step towards understanding the data might therefore be examining the `claim_status` variable. Begin by determining how many videos there are for each different claim status.

In [7]:
# What are the different values for claim status and how many of each are in the data?
claim_status_count = (data['claim_status'].str.lower() == 'claim').sum()
opinion_status_count = (data['claim_status'].str.lower() == 'opinion').sum()

In [8]:
# Check for missing (null) values in the claim_status field
missing_claim_status = data['claim_status'].isnull().sum()

In [9]:
print(claim_status_count)
print(opinion_status_count)
print(missing_claim_status)

9608
9476
298


**Question:** What do you notice about the values shown?

1. Balanced Distribution: The counts for "claim" and "opinion" are relatively balanced, which is good for your machine learning model, as it reduces the risk of bias towards one category.
2. Handling Missing Data: The 298 missing values in the claim_status column will need to be addressed before modeling. Depending on your analysis goals, you could either exclude these rows or try to impute a value.
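
As an alternative to counting each label separately, `value_counts()` can tally every value in one call. The sketch below rebuilds the label column from the three counts printed above (it does not load the real file), so the variable `labels` is an illustration rather than part of the notebook.

```python
import pandas as pd

# Rebuild the label column from the counts printed above:
# 9,608 claims, 9,476 opinions, and 298 missing values.
labels = pd.Series(
    ["claim"] * 9608 + ["opinion"] * 9476 + [None] * 298,
    name="claim_status",
)

# dropna=False keeps the missing labels as their own row in the tally.
counts = labels.value_counts(dropna=False)

# normalize=True returns shares; missing labels are excluded by default.
shares = labels.value_counts(normalize=True)

print(counts)
print(shares.round(3))
```

Among labeled videos, this gives shares of roughly 0.503 for claims and 0.497 for opinions.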

Next, examine the engagement trends associated with each different claim status.

Start by using Boolean masking to filter the data according to claim status, then calculate the mean and median view counts for each claim status.

In [10]:
claim_data = data[data['claim_status'].str.lower() == 'claim']
opinion_data = data[data['claim_status'].str.lower() == 'opinion']

In [11]:
claim_data

|   | # | claim_status | video_id | video_duration_sec | video_transcription_text | verified_status | author_ban_status | video_view_count | video_like_count | video_share_count | video_download_count | video_comment_count |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | claim | 7017666017 | 59 | someone shared with me that drone deliveries a... | not verified | under review | 343296.0 | 19425.0 | 241.0 | 1.0 | 0.0 |
| 1 | 2 | claim | 4014381136 | 32 | someone shared with me that there are more mic... | not verified | active | 140877.0 | 77355.0 | 19034.0 | 1161.0 | 684.0 |
| 2 | 3 | claim | 9859838091 | 31 | someone shared with me that american industria... | not verified | active | 902185.0 | 97690.0 | 2858.0 | 833.0 | 329.0 |
| 3 | 4 | claim | 1866847991 | 25 | someone shared with me that the metro of st. p... | not verified | active | 437506.0 | 239954.0 | 34812.0 | 1234.0 | 584.0 |
| 4 | 5 | claim | 7105231098 | 19 | someone shared with me that the number of busi... | not verified | active | 56167.0 | 34987.0 | 4110.0 | 547.0 | 152.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 9603 | 9604 | claim | 3883493316 | 49 | a colleague discovered on the radio a claim th... | not verified | active | 737177.0 | 460743.0 | 54550.0 | 8119.0 | 3372.0 |
| 9604 | 9605 | claim | 4765029942 | 9 | a colleague discovered on the radio a claim th... | verified | active | 546987.0 | 360080.0 | 79346.0 | 4537.0 | 2432.0 |
| 9605 | 9606 | claim | 3513102998 | 27 | a colleague discovered on the radio a claim th... | not verified | under review | 885521.0 | 209475.0 | 44286.0 | 1210.0 | 794.0 |
| 9606 | 9607 | claim | 9461481859 | 27 | a colleague discovered on the radio a claim th... | not verified | active | 356747.0 | 99394.0 | 21016.0 | 1163.0 | 497.0 |


In [12]:
# What is the average view count of videos with "claim" status?
claim_mean_video_count = claim_data['video_view_count'].mean()
claim_median_video_count = claim_data['video_view_count'].median()

In [13]:
print(claim_mean_video_count)
print(claim_median_video_count)

501029.4527477102
501555.0


In [14]:
# What is the average view count of videos with "opinion" status?
opinion_mean_video_count = opinion_data['video_view_count'].mean()
opinion_median_video_count = opinion_data['video_view_count'].median()


In [15]:
print(opinion_mean_video_count)
print(opinion_median_video_count)

4956.43224989447
4953.0


**Question:** What do you notice about the mean and median within each claim category?

Claim Observation: The mean and median view counts for videos with a "claim" status are very close, suggesting that the distribution of view counts is likely symmetric with few extreme outliers. This could indicate that "claim" videos generally receive consistent engagement across the dataset.

Opinion Observation: Similarly, the mean and median view counts for videos with an "opinion" status are also very close, indicating a similar distribution pattern to "claim" videos. However, the view counts are significantly lower compared to "claim" videos, suggesting that "opinion" videos generally receive much less engagement.

Engagement Difference: There is a significant difference in engagement between videos labeled as "claim" and those labeled as "opinion," with "claim" videos receiving much higher average and median view counts. This suggests that videos categorized as "claims" are generally more popular or engaging with the audience compared to "opinions."
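
The same mean and median comparison can be produced in one step with `groupby()` instead of two Boolean masks. This is a minimal sketch on toy data; the numbers in `toy` are hypothetical and chosen only to echo the scale of the real view counts.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical values) with the same column names as the notebook.
toy = pd.DataFrame({
    "claim_status": ["claim", "claim", "opinion", "opinion", None],
    "video_view_count": [400_000.0, 600_000.0, 4_000.0, 6_000.0, np.nan],
})

# groupby() drops rows whose key is missing, so the unlabeled row is ignored.
view_stats = toy.groupby("claim_status")["video_view_count"].agg(["mean", "median"])

print(view_stats)
```

One design note: the masking approach makes each subset available for further inspection, while the grouped version keeps both categories side by side in a single table.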


Now, examine trends associated with the ban status of the author.

Use `groupby()` to calculate how many videos there are for each combination of categories of claim status and author ban status.

In [16]:
# Get counts for each group combination of claim status and author ban status

ban_status_counts= data.groupby(['claim_status', 'author_ban_status']).size()

In [17]:
print(ban_status_counts)

claim_status  author_ban_status
claim         active               6566
              banned               1439
              under review         1603
opinion       active               8817
              banned                196
              under review          463
dtype: int64


In [18]:
# Reset the index to convert the result into a DataFrame
ban_status_counts = ban_status_counts.reset_index(name='video_count')

In [19]:
print(ban_status_counts)

  claim_status author_ban_status  video_count
0        claim            active         6566
1        claim            banned         1439
2        claim      under review         1603
3      opinion            active         8817
4      opinion            banned          196
5      opinion      under review          463


**Question:** What do you notice about the number of claims videos with banned authors? Why might this relationship occur?

Observations:

* Number of claim videos with banned authors: There are 1,439 videos with a "claim" status where the author has been banned.
* Comparison with other categories: The number of "claim" videos with banned authors is significantly higher than the number of "opinion" videos with banned authors (only 196 videos). However, "claim" videos are still more commonly associated with active authors (6,566 videos) than with banned ones.

Possible reasons for this relationship:

* Content violations: Authors of "claim" videos might be more prone to making statements or spreading content that violates platform policies, leading to a higher likelihood of being banned. Claims might involve misinformation, controversial topics, or unverified facts, which could lead to actions against the authors.
* Enforcement of platform rules: The platform might have stricter monitoring and enforcement for content labeled as "claims," especially if these are flagged by users or automated systems as potentially harmful or misleading. This could result in more authors being banned if their content is found to breach guidelines.
* Nature of claims vs. opinions: "Claims" typically assert something as fact, which might attract more scrutiny compared to "opinions," which are seen as personal views. This higher scrutiny could lead to more bans if the claims are found to be false or dangerous.

Conclusion: The data suggests that "claim" videos are associated with a higher risk of authors being banned, possibly due to the nature of the content and the platform's effort to enforce community guidelines. Understanding this relationship is important, especially if you're considering how different types of content are moderated on the platform.
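
Because there are slightly more claim videos than opinion videos overall, comparing raw counts can be complemented with within-group shares. The sketch below re-enters the six counts printed above by hand (the `ban_counts` frame is built for illustration, not loaded from the dataset) and converts them to row-wise proportions.

```python
import pandas as pd

# Counts copied from the groupby output above.
ban_counts = pd.DataFrame({
    "claim_status": ["claim"] * 3 + ["opinion"] * 3,
    "author_ban_status": ["active", "banned", "under review"] * 2,
    "video_count": [6566, 1439, 1603, 8817, 196, 463],
})

# One row per claim status, one column per ban status.
wide = ban_counts.pivot(
    index="claim_status", columns="author_ban_status", values="video_count"
)

# Divide each row by its total to get the share of videos in each ban status.
ban_share = wide.div(wide.sum(axis=1), axis=0)

print(ban_share.round(3))
```

On these counts, about 15% of claim videos come from banned authors, compared with about 2% of opinion videos.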


Continue investigating engagement levels, now focusing on `author_ban_status`.

Calculate the median video share count of each author ban status.

In [20]:
# Find the data type of the video_share_count column
data['video_share_count'].dtype

dtype('float64')

In [21]:
# finding missing values in the video_share_count field
missing_values=data['video_share_count'].isnull().sum()

In [22]:
print(missing_values)

298


In [23]:
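# If there are missing values, you can either drop them or fill them with a placeholder (e.g., 0).
# Assign the result back to the column: calling fillna(..., inplace=True) on a selected
# column raises a warning in recent pandas versions and may not update `data`.
# Note: zero-filled rows are counted in later statistics and can pull means and medians down.
data['video_share_count'] = data['video_share_count'].fillna(0)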

In [25]:
# Group by 'author_ban_status' and calculate the median of 'video_share_count'
median_share_count_by_ban_status = data.groupby('author_ban_status')['video_share_count'].median()


In [26]:
# What's the median video share count of each author ban status?
print(median_share_count_by_ban_status)

author_ban_status
active            411.0
banned          14429.0
under review     9300.0
Name: video_share_count, dtype: float64


**Question:** What do you notice about the share count of banned authors, compared to that of active authors? Explore this in more depth.

Observations (median share count):

* Active authors: The median share count for videos by active authors is 411.
* Banned authors: The median share count for videos by banned authors is significantly higher at 14,429.
* Under review: The median share count for videos by authors under review is 9,300.

Exploration of the discrepancy:

* High engagement before ban: The high median share count for banned authors suggests that their content was highly engaging or controversial, leading to a large number of shares before they were banned. This could indicate that their videos were viral or widely discussed, which might have drawn attention from moderators.
* Content nature: Banned authors may have been posting content that was provocative, controversial, or violated platform rules, attracting both a lot of attention and a lot of shares before their accounts were suspended. This type of content tends to spread quickly, especially if it triggers strong reactions.
* Platform response time: The platform might take time to identify and ban accounts that violate guidelines, during which the content continues to gain shares. By the time the account is banned, the video may already have been widely distributed.
* Potential viral nature of content: The difference in median share counts also suggests that videos by banned authors might have a higher likelihood of going viral. The content could be sensational, misleading, or emotionally charged, which often drives higher engagement.
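
The medians above were computed after missing share counts were filled with 0, which treats each missing value as a real observation. The following sketch, using a hypothetical five-value series, shows how that choice can shift a median compared with letting pandas skip the missing values.

```python
import numpy as np
import pandas as pd

# Hypothetical share counts with two missing entries.
shares = pd.Series([100.0, 200.0, 300.0, np.nan, np.nan])

# median() skips NaN values by default.
median_skipping_nan = shares.median()

# After zero-filling, the two zeros count as real observations.
median_zero_filled = shares.fillna(0).median()

print(median_skipping_nan, median_zero_filled)
```

Whether zero-filling or dropping is appropriate depends on what a missing count means in context; this sketch only illustrates that the two choices are not interchangeable.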

Use `groupby()` to group the data by `author_ban_status`, then use `agg()` to get the count, mean, and median of each of the following columns:
* `video_view_count`
* `video_like_count`
* `video_share_count`

Remember, the argument for the `agg()` function is a dictionary whose keys are columns. The values for each column are a list of the calculations you want to perform.

In [27]:
data.groupby(['author_ban_status']).agg({
    'video_view_count' : ['count', 'mean', 'median'],
    'video_like_count' : ['count', 'mean', 'median'],
    'video_share_count' : ['count', 'mean', 'median'],
})

| author_ban_status | video_view_count (count) | video_view_count (mean) | video_view_count (median) | video_like_count (count) | video_like_count (mean) | video_like_count (median) | video_share_count (count) | video_share_count (mean) | video_share_count (median) |
|---|---|---|---|---|---|---|---|---|---|
| active | 15383 | 215927.039524 | 8616.0 | 15383 | 71036.533836 | 2222.0 | 15663 | 13859.202196 | 411.0 |
| banned | 1635 | 445845.439144 | 448201.0 | 1635 | 153017.236697 | 105573.0 | 1639 | 29925.729713 | 14429.0 |
| under review | 2066 | 392204.836399 | 365245.5 | 2066 | 128718.050339 | 71204.5 | 2080 | 25601.213462 | 9300.0 |


**Question:** What do you notice about the number of views, likes, and shares for banned authors compared to active authors?

Views:

* Banned authors: Banned authors have significantly higher mean and median view counts compared to active authors. Specifically, the mean number of views for videos by banned authors is over twice that of active authors, and the median is dramatically higher. This indicates that videos by banned authors tend to attract more viewers.
* Active authors: Although active authors have a greater number of videos, their videos generally receive fewer views on average and at the median level.

Likes:

* Banned authors: The videos by banned authors also have a much higher mean and median like count. This suggests that their content is not only being viewed more but is also engaging more users to the point of receiving likes.
* Active authors: In contrast, the videos by active authors receive fewer likes on both an average and median basis, indicating lower overall engagement.

Shares:

* Banned authors: The share count for videos by banned authors is also significantly higher. The median share count for banned authors is particularly high, suggesting that their content is often shared more widely, possibly because it is controversial or provocative.
* Active authors: Videos by active authors, while more numerous, are shared less frequently on average and at the median level, which again points to lower engagement.
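
The dictionary form of `agg()` used above returns two-level column labels. If flat column names are easier to work with, pandas also supports named aggregation, sketched here on toy data; the values in `toy` and the output names such as `view_mean` are hypothetical choices for illustration.

```python
import pandas as pd

# Toy data (hypothetical values) with the notebook's column names.
toy = pd.DataFrame({
    "author_ban_status": ["active", "active", "banned", "banned"],
    "video_view_count": [100.0, 300.0, 1_000.0, 3_000.0],
    "video_like_count": [10.0, 30.0, 400.0, 600.0],
})

# Named aggregation: output_name=(input_column, function).
summary = toy.groupby("author_ban_status").agg(
    view_count=("video_view_count", "count"),
    view_mean=("video_view_count", "mean"),
    view_median=("video_view_count", "median"),
    like_mean=("video_like_count", "mean"),
)

print(summary)
```

Each keyword becomes a single-level column name, which avoids the nested header that the dictionary form produces.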

Now, create three new columns to help better understand engagement rates:
* `likes_per_view`: represents the number of likes divided by the number of views for each video
* `comments_per_view`: represents the number of comments divided by the number of views for each video
* `shares_per_view`: represents the number of shares divided by the number of views for each video
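
A per-view ratio is undefined when a video has zero views. In this dataset the minimum view count reported by `describe()` is 20, so the issue does not arise, but a defensive version of the calculation might look like the sketch below. The toy values are hypothetical, and masking zero views with `where()` is one possible approach rather than a step the activity requires.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical values): one normal row, one zero-view row, one missing row.
toy = pd.DataFrame({
    "video_view_count": [1000.0, 0.0, np.nan],
    "video_like_count": [250.0, 5.0, np.nan],
})

# Treat zero views as missing so the ratio is NaN rather than infinite.
views = toy["video_view_count"].where(toy["video_view_count"] > 0)
toy["likes_per_view"] = toy["video_like_count"] / views

print(toy)
```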

In [28]:
# Create a likes_per_view column
data['likes_per_view'] = data['video_like_count'] / data['video_view_count']

# Create a comments_per_view column
data['comments_per_view'] = data['video_comment_count'] / data['video_view_count']

# Create a shares_per_view column
data['shares_per_view'] = data['video_share_count'] / data['video_view_count']

In [29]:
# Display the first few rows to confirm the new columns
print(data[['likes_per_view', 'comments_per_view', 'shares_per_view']].head())

   likes_per_view  comments_per_view  shares_per_view
0        0.056584           0.000000         0.000702
1        0.549096           0.004855         0.135111
2        0.108282           0.000365         0.003168
3        0.548459           0.001335         0.079569
4        0.622910           0.002706         0.073175


Use `groupby()` to compile the information in each of the three newly created columns for each combination of categories of claim status and author ban status, then use `agg()` to calculate the count, the mean, and the median of each group.

In [30]:
data.groupby(['claim_status', 'author_ban_status']).agg({
    'likes_per_view' : ['count', 'mean', 'median'], 
    'comments_per_view' : ['count', 'mean', 'median'], 
    'shares_per_view' : ['count', 'mean', 'median'], 
})


| claim_status | author_ban_status | likes_per_view (count) | likes_per_view (mean) | likes_per_view (median) | comments_per_view (count) | comments_per_view (mean) | comments_per_view (median) | shares_per_view (count) | shares_per_view (mean) | shares_per_view (median) |
|---|---|---|---|---|---|---|---|---|---|---|
| claim | active | 6566 | 0.329542 | 0.326538 | 6566 | 0.001393 | 0.000776 | 6566 | 0.065456 | 0.049279 |
| claim | banned | 1439 | 0.345071 | 0.358909 | 1439 | 0.001377 | 0.000746 | 1439 | 0.067893 | 0.051606 |
| claim | under review | 1603 | 0.327997 | 0.320867 | 1603 | 0.001367 | 0.000789 | 1603 | 0.065733 | 0.049967 |
| opinion | active | 8817 | 0.219744 | 0.21833 | 8817 | 0.000517 | 0.000252 | 8817 | 0.043729 | 0.032405 |
| opinion | banned | 196 | 0.206868 | 0.198483 | 196 | 0.000434 | 0.000193 | 196 | 0.040531 | 0.030728 |
| opinion | under review | 463 | 0.226394 | 0.228051 | 463 | 0.000536 | 0.000293 | 463 | 0.044472 | 0.035027 |


**Question:** How does the data for claim videos and opinion videos compare or differ? Consider views, comments, likes, and shares.

Summary:

* Claim videos receive far more views than opinion videos, and they have higher likes, comments, and shares per view in every author ban category.
* Among claim videos, banned authors have slightly higher likes_per_view and shares_per_view than active authors. Among opinion videos the pattern reverses, with banned authors showing slightly lower rates.
* Overall, claim videos are more effective at driving viewer interaction, possibly due to their more assertive or provocative nature.

<img src="images/Construct.png" width="100" height="100" align=left>

## **PACE: Construct**

**Note**: The Construct stage does not apply to this workflow. The PACE framework can be adapted to fit the specific requirements of any project.




<img src="images/Execute.png" width="100" height="100" align=left>

## **PACE: Execute**

Consider the questions in your PACE Strategy Document and those below to craft your response.

### **Given your efforts, what can you summarize for Rosie Mae Bradshaw and the TikTok data team?**

*Note for Learners: Your answer should address TikTok's request for a summary that covers the following points:*

*   What percentage of the data is comprised of claims and what percentage is comprised of opinions?
*   What factors correlate with a video's claim status?
*   What factors correlate with a video's engagement level?

**Summary for Rosie Mae Bradshaw and the TikTok data team**

**Percentage of claims vs. opinions**

* Claims: Approximately 50.3% of the labeled videos in the dataset are classified as claims (9,608 of 19,084).
* Opinions: The remaining 49.7% of the labeled videos are classified as opinions (9,476 of 19,084).
* Missing labels: 298 of the 19,382 rows have no claim status and are excluded from these percentages.

**Factors correlating with a video's claim status**

* Author ban status: There is a noticeable association between a video's claim status and whether the author is banned. Claim videos are more frequently associated with banned authors (1,439 claim videos vs. 196 opinion videos), indicating that claims might be more likely to contain content that violates platform guidelines.
* Engagement metrics: Claim videos generally receive more views, likes, comments, and shares than opinion videos. The median view count is 501,555 for claims and 4,953 for opinions. This suggests that content labeled as a claim tends to be more engaging and may attract more attention from viewers.

**Factors correlating with a video's engagement level**

* Claim status: Videos classified as claims exhibit higher engagement rates across all per-view metrics (likes, comments, shares) than opinion videos.
* Author ban status: Videos from banned authors have much higher median views, likes, and shares than videos from active authors. Most videos from banned authors are claims, which likely explains much of this gap. These videos may be more controversial or provocative, leading to more interaction before being flagged or banned.
* Likes, comments, and shares per view: These ratios are higher in claim videos. Among claim videos, banned authors show slightly higher likes and shares per view than active authors, while comments per view are about the same.

**Key insights**

* Claim videos are generally more engaging than opinion videos.
* The ban status of an author is associated with both the nature of the content (claim vs. opinion) and its engagement level, with banned authors producing more engaging yet potentially problematic content.

This summary should help TikTok better understand how content type (claims vs. opinions) and author behavior (ban status) relate to user engagement, which could inform future content moderation and recommendation strategies.


**Congratulations!** You've completed this lab. However, you may not notice a green check mark next to this item on Coursera's platform. Please continue your progress regardless of the check mark. Just click on the "save" icon at the top of this notebook to ensure your work has been logged.