# Analyzing Trends on TikTok with Topic Modeling & Account Scraping

This tutorial does the following:

1. Connects to an existing (open) Chrome instance. **[Part 1](#sec1)**
2. Shows how to get information from a TikTok account page. **[Part 2](#sec2)**
3. Shows how to use topic modeling to uncover underlying themes. **[Part 3](#sec3)**

<a id="sec1"></a>
## Part 1: Create Chrome Instance

**Important:** For this to work, you should already have a Chrome instance running on your computer with remote debugging enabled. To start one, open a console and run the command for your operating system (see below).

**On Mac:**

In [None]:
/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome --remote-debugging-port=9222 --user-data-dir="/tmp/chrome_dev_test"

**On Windows:**

In [None]:
"C:\Program Files (x86)\Google\Chrome\Application\chrome.exe" --remote-debugging-port=9222 --user-data-dir="C:\selenium\ChromeTestProfile"

**New installation**

If you don't have the following package, install it once.

In [97]:
pip install webdriver_manager

Note: you may need to restart the kernel to use updated packages.


Now you are ready to run the code below:

In [2]:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.common.by import By
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.keys import Keys
import time

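# Point Selenium at the Chrome instance started with --remote-debugging-port=9222
options = Options()
options.add_experimental_option("debuggerAddress", "127.0.0.1:9222")

# Download a matching ChromeDriver (if needed) and wrap its path in a Service
service = Service(ChromeDriverManager().install())

# Connect to the existing Chrome browser session.
# Without service/options, Selenium would launch a new browser instead.
driver = webdriver.Chrome(service=service, options=options)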
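
Before connecting, it can help to confirm that Chrome is actually listening on the debugging port. The following is a minimal sketch, not part of the original notebook: `chrome_debugger_info` is a hypothetical helper name, and it assumes Chrome was started with `--remote-debugging-port=9222` as shown above, in which case Chrome serves version details at `/json/version`.

```python
import json
import urllib.error
import urllib.request


def chrome_debugger_info(address="127.0.0.1:9222", timeout=2.0):
    """Return Chrome's version info from the DevTools endpoint, or None if unreachable.

    Hypothetical helper for illustration; the address matches the
    debuggerAddress used in the Selenium options.
    """
    url = f"http://{address}/json/version"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return json.load(response)
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, or a response that is not JSON
        return None


info = chrome_debugger_info()
if info is None:
    print("No Chrome found on port 9222; start it with --remote-debugging-port=9222 first.")
else:
    print("Connected to:", info.get("Browser"))
```

If the helper returns `None`, the Selenium connection will most likely fail as well, so it is worth rerunning the launch command from the console first.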

<a id="sec2"></a>
## Part 2: Getting Information from a TikTok Page

The following function scrapes a given account page for its username, display name, bio, follower count, and total number of likes.

In [5]:
def getAccountInformation(driver):
    """
    Given an open driver instance on a TikTok account page, 
    get the account metrics that are accessible.
    """
    time.sleep(2) # in case the page hasn't loaded yet

    account_info = {}

    # Get the username 
    try:
        account_info['author_username'] = driver.find_element(By.XPATH, '//*[@id="main-content-others_homepage"]/div/div[1]/div[1]/div[2]/h1').text
    except Exception as e:
        print(f"Username: An unexpected error occurred: {e}")

    # Get the name 
    try:
        account_info['author_name'] = driver.find_element(By.XPATH, '//*[@id="main-content-others_homepage"]/div/div[1]/div[1]/div[2]/h2').text
    except Exception as e:
        print(f"Name: An unexpected error occurred: {e}")

    # Get the bio
    try:
        account_info['author_bio'] = driver.find_element(By.XPATH, '//*[@id="main-content-others_homepage"]/div/div[1]/h2').text
    except Exception as e:
        print(f"Bio: An unexpected error occurred: {e}")

    # Get the number of followers  
    try:
        account_info['author_followers'] = driver.find_element(By.XPATH, '//*[@id="main-content-others_homepage"]/div/div[1]/h3/div[2]/strong').text
    except Exception as e:
        print(f"Followers: An unexpected error occurred: {e}")

    # Get the number of likes 
    try:
        account_info['author_likes'] = driver.find_element(By.XPATH, '//*[@id="main-content-others_homepage"]/div/div[1]/h3/div[3]/strong').text
    except Exception as e:
        print(f"Likes: An unexpected error occurred: {e}")


    return account_info
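
The follower and like counts come back as display strings such as `552.8K` or `17.5M`, which cannot be sorted or averaged directly. The sketch below shows one way to turn them into integers. `parse_count` is a hypothetical helper that is not part of the original notebook, and the suffix set (K, M, B) is an assumption about how TikTok abbreviates counts. Because the page rounds these values, the results are approximations.

```python
from decimal import Decimal

# Assumed abbreviations used on the account page
_SUFFIXES = {"K": 1_000, "M": 1_000_000, "B": 1_000_000_000}


def parse_count(text):
    """Convert a display string such as '552.8K' or '1876' to an integer."""
    cleaned = text.strip().upper().replace(",", "")
    if cleaned and cleaned[-1] in _SUFFIXES:
        # Decimal avoids floating-point surprises such as 552799.99...
        return int(Decimal(cleaned[:-1]) * _SUFFIXES[cleaned[-1]])
    return int(Decimal(cleaned))


print(parse_count("552.8K"))  # 552800
print(parse_count("17.5M"))   # 17500000
print(parse_count("1876"))    # 1876
```

Applied to a DataFrame, this could look like `all_accounts['author_followers'].map(parse_count)`, assuming rows with missing values have been dropped first.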

We can now pass each username in the "author_username" column of the PykTok output through this function to get information about each account!

This may take a while, depending on the size of your dataset. For this tutorial we are only taking a subset of the videos!

In [4]:
import pandas as pd
df = pd.read_csv("/Users/fernandagonzalez/Desktop/school/CS_315/CS315_Project-3/pyktok/accounts.csv")

# Get unique values from the "author_username" column and convert them to a list
accounts = df['author_username'].unique().tolist()

# Initialize an empty list to store account information dictionaries
all_account_info = []

for acc in accounts:
    url = f"https://tiktok.com/@{acc}"
    driver.get(url)

    # Get account information
    account_info = getAccountInformation(driver)

    # Append the dictionary to the list
    all_account_info.append(account_info)

# Convert list of dictionaries to DataFrame
all_accounts = pd.DataFrame(all_account_info)

# Drop rows with missing values (accounts whose page could not be scraped)
data = all_accounts.dropna()
data

Username: An unexpected error occurred: Message: no such element: Unable to locate element: {"method":"xpath","selector":"//*[@id="main-content-others_homepage"]/div/div[1]/div[1]/div[2]/h1"}
  (Session info: chrome=124.0.6367.62); For documentation on this error, please visit: https://www.selenium.dev/documentation/webdriver/troubleshooting/errors#no-such-element-exception

| Unnamed: 0 | author_username | author_name | author_bio | author_followers | author_likes |
|---|---|---|---|---|---|
| 1 | shopmomoslimes | ShopMomoslimes | We are specialized in high quality slimes✨\nRe... | 552.8K | 17.5M |
| 2 | chumbawambamynumberoneba | lex🪐🌈 | i get knocked down but i dont get up again\n🏳️... | 1876 | 308.4K |
| 5 | carlyrivlin | Carly | instagram: @carlyrivlin\nnyc\ntalent@pontefirm... | 321.9K | 37.9M |
| 6 | calesucks | Cale | just a salad that be dressing | 11.6K | 1.6M |
| 7 | millie.sno.w | 𝐦𝐛𝐛 𝐧𝐨𝐭𝐢𝐜𝐞𝐝 𝟑𝐱 | ✩ℳ𝒾𝓁𝓁𝓈✩\nℳℯℯ𝓉 ℳℬℬ 1𝒳\n𝒻𝒶𝓃𝓅𝒶𝑔ℯ🫧 | 1075 | 6090 |
| 9 | joshkmaddox | Joshmaddox | Josh Maddox\nFamily \| Fun \| Marriage\nPurdue \|... | 285.3K | 11.9M |
| 15 | nissanaltayma | taylorbrady | No bio yet. | 449 | 227.3K |
| 16 | maybemaycee | maycee | currently slaying | 1164 | 387.5K |
| 18 | livvymchugh | Liv McHugh | girlboss nation | 2233 | 1.6M |
| 21 | \_victorianygaard | Vic 📖❤️‍🔥 | 🇩🇰 Copenhagen | 3938 | 553.6K |
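
When a page fails to load, `getAccountInformation` returns a dictionary with missing keys, and it is no longer obvious which account the row belonged to. The sketch below shows one way to keep track of that. It is an illustration rather than the notebook's own method: `scrape_accounts` and `requested_username` are hypothetical names, and `fake_scrape` stands in for `driver.get(...)` followed by `getAccountInformation(driver)` so the example runs without a browser.

```python
import time


def scrape_accounts(usernames, scrape_one, delay=0.0):
    """Call scrape_one(username) for each name; return (records, failed)."""
    records, failed = [], []
    for username in usernames:
        info = scrape_one(username)
        if not info:
            # Empty dict: nothing could be read from the page
            failed.append(username)
        else:
            # Record which account was requested, even if some fields are missing
            records.append({"requested_username": username, **info})
        time.sleep(delay)  # pause between page loads
    return records, failed


# Stand-in scraper (hypothetical): only "calesucks" has a readable page
def fake_scrape(username):
    pages = {"calesucks": {"author_name": "Cale", "author_followers": "11.6K"}}
    return pages.get(username, {})


records, failed = scrape_accounts(["calesucks", "missing_account"], fake_scrape)
print(records)
print(failed)
```

The list of failed usernames can then be retried later or inspected by hand, instead of being silently dropped along with the NaN rows.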


We now have to merge this new dataframe with our original dataset, **accounts**! First, we save the scraped account information to a CSV file.

In [6]:
data.to_csv("accountInformation.csv", index=False)

| Unnamed: 0 | author_username | author_name | author_bio | author_followers | author_likes | video_id | video_timestamp | video_duration | video_locationcreated | suggested_words | video_diggcount | video_sharecount | video_commentcount | video_playcount | video_description | hashtags |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | shopmomoslimes | ShopMomoslimes | We are specialized in high quality slimes✨\nRe... | 552.8K | 17.5M | 7.330000e+18 | 2024-02-09T17:44:12 | 63.0 | US | Slime, cloud slime, slime shops, slime scoopab... | 224100.0 | 3401.0 | 1647.0 | 2000000.0 | 13 ts slimes (iykyk) returning this sunday🤍 ha... | |
| 1 | shopmomoslimes | ShopMomoslimes | We are specialized in high quality slimes✨\nRe... | 552.8K | 17.5M | 7.330000e+18 | 2024-02-09T17:44:12 | 63.0 | US | slime, cloud slime, slime shops, slime scoopab... | 226100.0 | 3417.0 | 1653.0 | 2000000.0 | 13 ts slimes (iykyk) returning this sunday🤍 ha... | |
| 2 | shopmomoslimes | ShopMomoslimes | We are specialized in high quality slimes✨\nRe... | 552.8K | 17.5M | 7.330000e+18 | 2024-02-09T17:44:12 | 63.0 | US | Slime, cloud slime, slime shops, slime scoopab... | 227100.0 | 3421.0 | 1661.0 | 2000000.0 | 13 ts slimes (iykyk) returning this sunday🤍 ha... | |
| 3 | chumbawambamynumberoneba | lex🪐🌈 | i get knocked down but i dont get up again\n🏳️... | 1876 | 308.4K | 7.290000e+18 | 2023-10-18T19:21:51 | 6.0 | US | lobotomy core, Surgeon, what does lobotomy act... | 10200.0 | 65.0 | 354.0 | 1200000.0 | #lobotomy #surgerytiktok #euthanasia | lobotomy, surgerytiktok, euthanasia |
| 4 | chumbawambamynumberoneba | lex🪐🌈 | i get knocked down but i dont get up again\n🏳️... | 1876 | 308.4K | 7.290000e+18 | 2023-10-18T19:21:51 | 6.0 | US | lobotomy core, Surgeon, what does lobotomy act... | 10300.0 | 65.0 | 354.0 | 1200000.0 | #lobotomy #surgerytiktok #euthanasia | lobotomy, surgerytiktok, euthanasia |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 121 | hulu | hulu | Better Together 💚\n#HuluOnDisneyPlus | 5.5M | 43.4M | 7.320000e+18 | 2024-01-09T20:14:30 | 8.0 | US | | 31100.0 | 702.0 | 395.0 | 22500000.0 | Jake Johnson, Anna Kendrick & Andy Samberg? Sa... | SelfReliance |
| 122 | alecksis13 | alecksis 🤍 | she’s so ny when she’s in la \n23 💍\n💌those89c... | 78.2K | 18.3M | 7.220000e+18 | 2023-04-06T15:46:31 | 33.0 | US | Eras Tour, Taylor Swift, invisible string eras... | 38400.0 | 658.0 | 227.0 | 530800.0 | she has my mental stability in the palm of her... | taylorswift, swiftie, swifttok, theerastour, t... |
| 123 | \_im\_viccky\_ | \_V\_ | Kherson 🇺🇦 Winnipeg 🇨🇦\nIG: mariukhno.viktoriia | 16.6K | 1.5M | 7.310000e+18 | 2023-12-11T20:38:39 | 13.0 | CA | CapCut, who did i marry, willito, Capcut Edit,... | 1500000.0 | 3371.0 | 3847.0 | 45900000.0 | #CapCut #viral#fyp #пропуститеврекомендации #w... | CapCut, viral, fyp, пропуститеврекомендации, w... |
| 124 | cripandip | Mads | i luv carbs & pickles\ndaily stories on my ins... | 474.3K | 22.3M | 7.320000e+18 | 2024-01-14T16:17:34 | 152.0 | GB | Ramen, Ramen Noodle, Ramen Bowl, Homemade Rame... | 174300.0 | 166.0 | 219.0 | 1500000.0 | a \*not\* hungover takeaway time hehe 🍜🍜🍜 i have... | |
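
The merge step that produces a combined table like the one above is not shown in this section. The following is a minimal sketch of how it might be done, assuming a left join on `author_username` between the scraped account data and the per-video PykTok data. The two small frames reuse a few values from the outputs above so the example runs on its own; in the notebook they would be `data` and `df`, and the variable name `merged` is hypothetical.

```python
import pandas as pd

# Scraped account info (two accounts taken from the output above)
data = pd.DataFrame({
    "author_username": ["shopmomoslimes", "chumbawambamynumberoneba"],
    "author_followers": ["552.8K", "1876"],
    "author_likes": ["17.5M", "308.4K"],
})

# Per-video rows, standing in for the PykTok dataset loaded as `df`
df = pd.DataFrame({
    "author_username": ["shopmomoslimes", "shopmomoslimes", "chumbawambamynumberoneba"],
    "video_diggcount": [224100.0, 226100.0, 10200.0],
})

# Each account row is repeated once for every video by that account
merged = data.merge(df, on="author_username", how="left")
print(merged)
```

With `how="left"`, every scraped account is kept and its metrics are repeated on each of its video rows; `how="inner"` would give the same result whenever every scraped account has at least one video in the dataset.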
