Time series forecasting and natural language processing models that predict which posting time, text, and hashtags of social media content will drive engagement. E-commerce companies and other businesses can use our predictive models to gain the greatest possible branding effect with a worldwide audience.

Social-Media-Capstone/Social-Media-Engagement-Forecasting

Social Media Engagement Forecasting

By: Brad Gauvin, Jess Gardin, Meredith Wang, Saroj Duwal

Date: 09/2022 - present

Project Description

TikTok, a relatively new video-sharing social media platform (founded in 2016), has gained a tremendous amount of popularity over the past few years. Understanding its "success metric" and knowing how to attract engagement is extremely important for businesses and individuals who want to develop their presence on the platform.

Along with the other deliverables, we developed a web app with an interactive dashboard as an additional component of the project, so that both technical and non-technical stakeholders can grasp the key findings.

Business Goal

We used time series models to forecast engagement over time, along with natural language processing regression models to predict the keywords that are likely to generate viral content. E-commerce and retail businesses, influencers, and others can strategically use our predictive models to push out content that gains the greatest possible branding effect with a worldwide audience and generates revenue.

Initial Questions

▪️ What does the duration distribution of trending videos look like? Are most videos on TikTok short (<15s)?

▪️ Does the average video duration differ between categories? If so, which video length drives the highest engagement in each category?

▪️ What are the average engagement metrics of trending videos over the past two years?

▪️ Where does TikTok stand compared to YouTube and Instagram?

▪️ Is one category's engagement significantly higher than the others'?

▪️ Does a creator's follower count correlate with engagement?

▪️ Are there certain key words/hashtags that drive engagement?

Deliverables

  • Report Notebook final_report.ipynb
  • Web app with interactive dashboard
  • Slide presentation for technical and non-technical stakeholders
  • Project white paper for non-technical audience

Requirements

Before you run this notebook, make sure the packages below are installed. They can be installed from a notebook cell with the following commands:

%pip install notebook
%pip install numpy
%pip install pandas
%pip install matplotlib
%pip install seaborn
%pip install scipy
%pip install scikit-learn
%pip install nltk
%pip install xgboost
%pip install youtube-search-python
%pip install tensorflow

Data Dictionary

Variable | Type | Meaning
--- | --- | ---
commentCount | int | The number of comments on a video
diggCount | int | The number of likes on a video
playCount | int | The number of views on a video
shareCount | int | The number of shares on a video
followerCount | int | The number of followers a creator has
heartCount | int | The total number of likes a creator's account has received
videoCount | int | The number of videos a creator has posted publicly
description | string object | The description of the video
time | datetime object | The time a video was posted, formatted in epoch time
hashtag | string object | The hashtags contained in the video's caption
category | string object | The category the video belongs to

Process

Acquisition

  • Data acquisition covers 3 platforms (TikTok, YouTube, Instagram) and 5 categories (Fashion & Beauty, Humor, Political Content, Food, Fitness & Lifestyle)

  • TikTok data was acquired through TikAPI, using both a hashtag search and a top-influencer search. For detailed steps, see acquisition

  • YouTube data was acquired through youtube-search-python 1.6.6 (a third-party Python library). For detailed steps, see acquisition

  • Instagram data was acquired and extracted through a confidential source and the Instagram Graph API. If you're interested in the data source, please contact Meredith Wang directly

  • Created env.py, which contains the API key credentials needed to access data from the TikTok API and the Instagram Graph API

  • Created acquire.py, which contains the functions for data acquisition

Preparation

Automated Data Extraction

The acquired data arrived in multiple data structures, including nested dictionaries. We automated the process of extracting the useful information from this messy data.
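
As a rough illustration of this kind of extraction, the sketch below flattens nested records with `pandas.json_normalize`. The record layout (the `stats` and `author` keys, `createTime`, `desc`) is a hypothetical stand-in, not the actual API response; only the final column names follow the data dictionary above.

```python
import pandas as pd

# Hypothetical nested records; the real API payloads may be shaped differently.
raw_records = [
    {
        "desc": "easy pasta recipe #food",
        "createTime": 1661990400,
        "stats": {"diggCount": 1200, "playCount": 45000,
                  "shareCount": 30, "commentCount": 85},
        "author": {"uniqueId": "creator_a", "stats": {"followerCount": 5000}},
    },
    {
        "desc": "morning workout #fitness",
        "createTime": 1662076800,
        "stats": {"diggCount": 800, "playCount": 20000,
                  "shareCount": 12, "commentCount": 40},
        "author": {"uniqueId": "creator_b", "stats": {"followerCount": 150}},
    },
]

# Flatten nested dictionaries into columns such as "stats_diggCount".
flat = pd.json_normalize(raw_records, sep="_")

# Keep only the fields of interest and rename them to the data dictionary names.
rename_map = {
    "desc": "description",
    "createTime": "time",
    "stats_diggCount": "diggCount",
    "stats_playCount": "playCount",
    "stats_shareCount": "shareCount",
    "stats_commentCount": "commentCount",
    "author_stats_followerCount": "followerCount",
}
videos = flat[list(rename_map)].rename(columns=rename_map)
print(videos.head())
```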

Missing Values
  • All null values were dropped

  • We felt comfortable dropping null values because only 1 row out of 1.6 million observations had missing values

Data Type Conversion
  • The date and video duration fields from the 3 platforms all follow different formats. We converted them into a common datetime object and a numerical data type, respectively

  • Numerical features were converted to their correct data types
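
A minimal sketch of this conversion, assuming timestamps arrive as epoch seconds (as noted in the data dictionary) and durations arrive either as plain seconds or as `MM:SS` / `HH:MM:SS` text. The helper `duration_to_seconds` and the sample values are hypothetical and only illustrate the idea.

```python
import pandas as pd

def duration_to_seconds(value):
    """Convert seconds (number or numeric string) or 'MM:SS' / 'HH:MM:SS' text to integer seconds."""
    text = str(value).strip()
    if ":" in text:
        seconds = 0
        for part in text.split(":"):
            # Each colon-separated field is worth 60x the one to its right.
            seconds = seconds * 60 + int(part)
        return seconds
    return int(float(text))

# Hypothetical sample rows with mixed duration formats.
df = pd.DataFrame({
    "time": [1661990400, 1662076800],
    "duration": ["0:45", 15],
})

# Epoch seconds -> timezone-aware datetime.
df["time"] = pd.to_datetime(df["time"], unit="s", utc=True)

# Mixed duration formats -> integer seconds.
df["duration"] = df["duration"].map(duration_to_seconds).astype("int64")
print(df.dtypes)
```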

Data Encoding
  • Binned numerical features into categorical variables

  • Created dummy variables for categorical features

  • Converted the engagement metrics from wide dataframe format to long format. Both the long and wide formats were used in exploration
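
The three encoding steps can be sketched with pandas as follows. The bin edges for short (0-15s), medium (15-60s), and extra-long (>3 min) follow the key findings in this README; the 60-180s "long" bin, the `duration` column name, and the sample rows are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical sample rows.
df = pd.DataFrame({
    "category": ["Food", "Humor", "Fitness & Lifestyle"],
    "duration": [12, 45, 200],          # seconds
    "diggCount": [100, 250, 75],
    "shareCount": [10, 40, 5],
})

# 1. Bin a numerical feature into a categorical one.
bins = [0, 15, 60, 180, float("inf")]
labels = ["short", "medium", "long", "extra_long"]
df["duration_group"] = pd.cut(df["duration"], bins=bins, labels=labels)

# 2. Create dummy (one-hot) variables for the categorical features.
encoded = pd.get_dummies(df, columns=["category", "duration_group"])

# 3. Reshape the engagement metrics from wide to long format.
long_df = df.melt(
    id_vars=["category", "duration_group"],
    value_vars=["diggCount", "shareCount"],
    var_name="metric",
    value_name="count",
)
print(long_df)
```

Note that `pd.cut` uses right-closed intervals by default, so a duration of exactly 15 seconds falls into the "short" bin here.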

Text Cleaning
  • Convert all text to lower case for normalization

  • Remove accented and non-ASCII characters

  • Remove special characters

  • Lemmatize words

  • Remove stopwords

  • Apply TF-IDF to convert text into numerical values based on term importance
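
The cleaning steps above could look roughly like the sketch below. It is a simplified illustration: the stopword set is a tiny hypothetical stand-in for a full list such as NLTK's, and lemmatization is left out because NLTK's lemmatizer needs a separate corpus download. TF-IDF uses scikit-learn's `TfidfVectorizer`.

```python
import re
import unicodedata

from sklearn.feature_extraction.text import TfidfVectorizer

# Tiny stand-in stopword list (assumption); a real pipeline would use a full list.
STOPWORDS = {"a", "an", "and", "the", "to", "of", "in", "is", "my", "for", "this"}

def clean_text(text):
    """Lower-case, strip accents/non-ASCII and special characters, drop stopwords."""
    text = text.lower()
    # Decompose accented characters, then drop anything that is not ASCII.
    text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    # Replace special characters (punctuation, '#', etc.) with spaces.
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    tokens = [token for token in text.split() if token not in STOPWORDS]
    return " ".join(tokens)

# Hypothetical video descriptions.
descriptions = [
    "My Café routine for the morning! #fitness",
    "Easy pasta recipe 🍝 #food",
    "Morning workout routine #fitness",
]
cleaned = [clean_text(d) for d in descriptions]

# Convert the cleaned text into a TF-IDF matrix (rows: documents, columns: terms).
vectorizer = TfidfVectorizer()
tfidf = vectorizer.fit_transform(cleaned)
print(cleaned)
print(tfidf.shape)
```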

Exploration

  • Addressed the initial questions to find the key features that drive engagement

  • Explored each feature's correlation with the engagement metrics

  • Used statistical testing and visualizations to understand the relationships between features and to find the drivers of engagement

  • Built an interactive dashboard so that the audience can grasp the key findings of our exploration

Modeling

Time Series Forecasting

  • Last Observed Value (Baseline)
  • Moving Average
  • Holt's Linear Trend
  • Previous Cycle
  • Facebook Prophet
  • ARIMA
  • Long Short Term Memory Neural Network
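
To illustrate how the simpler forecasters in this list are compared, here is a minimal sketch of the last-observed-value baseline and a moving-average forecast on a toy series. The data, the 3-period window, and the choice of RMSE as the error metric are all assumptions; this README does not state which metric the reported improvements are based on.

```python
import numpy as np
import pandas as pd

# Toy engagement series (made-up values for illustration only).
series = pd.Series([100.0, 110.0, 120.0, 130.0, 140.0, 150.0, 160.0, 170.0])
train, test = series.iloc[:6], series.iloc[6:]

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    actual, predicted = np.asarray(actual), np.asarray(predicted)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

# Baseline: repeat the last observed training value over the test horizon.
last_value_forecast = np.repeat(train.iloc[-1], len(test))

# Moving average: repeat the mean of the last 3 training values.
moving_avg_forecast = np.repeat(train.rolling(3).mean().iloc[-1], len(test))

baseline_rmse = rmse(test, last_value_forecast)
moving_avg_rmse = rmse(test, moving_avg_forecast)

def pct_improvement(baseline_error, model_error):
    """Percent reduction in error relative to the baseline (negative means worse)."""
    return (baseline_error - model_error) / baseline_error * 100

print(baseline_rmse, moving_avg_rmse)
print(pct_improvement(baseline_rmse, moving_avg_rmse))
```

On a steadily rising toy series like this one, the moving average lags behind the trend and scores worse than the baseline, which is one reason trend-aware models such as Holt's linear trend are worth trying.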

Natural Language Processing

  • Linear Regression
  • Random Forest Regressor
  • Lasso-Lars
  • Generalized Linear Model

Conclusion

Key Findings

▪️ Over 93% of trending content on TikTok consists of short (0-15s) and medium (15-60s) videos.

▪️ The relationship between video duration and engagement rate depends on the category. For example, humor content performs best with extra-long (>3 min) videos, whereas political content performs best with short (0-15s) videos.

▪️ Trending content across all categories on TikTok has 11M views, 1.4M likes, 10.7K comments, and 34.5K shares on average.

▪️ Total engagement of 2 years of global trending content on each platform: TikTok has 6x more than YouTube and over 1000x more than Instagram.

▪️ TikTok's total engagement increased 980% from 2019 to Sep 2022.

▪️ TikTok users respond significantly to major social and political events. Engagement peaks or rises before, during, and after these events.

▪️ The follower counts of trending content creators have decreased since Jan 2021, suggesting that TikTok's algorithm has been incentivizing small creators to push out content.

▪️ Word frequency in content descriptions DOES NOT correlate with engagement. Instead, specific words drive engagement in each niche. Our natural language processing generalized linear model predicts word choice 42% more accurately than the baseline.

▪️ The Facebook Prophet model forecasts engagement with a 57% improvement over the baseline.

▪️ Total engagement on TikTok is predicted to increase by 27% within the next year (Oct 2022 - Oct 2023).

Next Steps

Despite the overall effectiveness of our best-performing model, there is always room for improvement and optimization. We're currently working on future development, including:

▪️ Taking a closer look at the differences between influencers and common users.

▪️ Including more niches/categories in our scope, for example pets, sports, and dance.

▪️ Doing bi-gram and tri-gram analysis on content descriptions as well as on the comments on videos.

▪️ Getting users' demographic data and analyzing the relationship of location, user age, etc. with engagement.
