# Analyzing Trends on TikTok with Topic Modeling & Account Scraping

This tutorial covers three steps:

1. Connecting to an existing (open) Chrome instance. **[Part 1](#sec1)**
2. Getting information from a TikTok account page. **[Part 2](#sec2)**
3. Using topic modeling to uncover underlying themes. **[Part 3](#sec3)**

<a id="sec1"></a>
## Part 1: Create Chrome Instance

**Important:** For this to work, a Chrome instance must already be running on your computer with remote debugging enabled. To start one, open a terminal (not a notebook cell) and run the command for your operating system (see below).

**On Mac:**

In [1]:
/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome --remote-debugging-port=9222 --user-data-dir="/tmp/chrome_dev_test"


**On Windows:**

In [None]:
"C:\Program Files (x86)\Google\Chrome\Application\chrome.exe" --remote-debugging-port=9222 --user-data-dir="C:\selenium\ChromeTestProfile"
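
Before connecting, it can help to confirm that Chrome is actually listening on the debugging port. The following is a minimal sketch using only the standard library; the helper name `debug_port_is_open` is hypothetical and not part of Selenium, and it only checks that something accepts TCP connections on the port, not that the listener is Chrome.

```python
import socket

def debug_port_is_open(host="127.0.0.1", port=9222, timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        # create_connection raises OSError if the port is closed or unreachable
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    if debug_port_is_open():
        print("Port 9222 is open; Chrome appears to be running.")
    else:
        print("Nothing is listening on port 9222; start Chrome first.")
```

If the check fails, rerun the launch command above in a terminal before moving on.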

**New installation**

If you don't have the following package, install it once.

In [1]:
pip install webdriver_manager

Note: you may need to restart the kernel to use updated packages.


Now you are ready to run the code below:

In [10]:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.common.by import By
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.keys import Keys
import time

# Set up Chrome options
options = Options()
options.add_experimental_option("debuggerAddress", "127.0.0.1:9222")

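# Download (if needed) a ChromeDriver that matches the installed Chrome
service = Service(ChromeDriverManager().install())

# Connect to the existing Chrome browser session
# (the options must be passed in; otherwise Selenium opens a new browser window)
driver = webdriver.Chrome(service=service, options=options)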

<a id="sec2"></a>
## Part 2: Getting Information from a TikTok Page

The following function scrapes a given account page for its username, display name, bio, follower count, and total number of likes.

In [11]:
def getAccountInformation(driver):
    """
    Given an open driver instance on a TikTok account page, 
    get the account metrics that are accessible.
    """
    time.sleep(2) # in case the page hasn't loaded yet

    account_info = {}

    # Get the username 
    try:
        account_info['author_username'] = driver.find_element(By.XPATH, '//*[@id="main-content-others_homepage"]/div/div[1]/div[1]/div[2]/h1').text
    except Exception as e:
        print(f"Username: An unexpected error occurred: {e}")

    # Get the name 
    try:
        account_info['author_name'] = driver.find_element(By.XPATH, '//*[@id="main-content-others_homepage"]/div/div[1]/div[1]/div[2]/h2').text
    except Exception as e:
        print(f"Name: An unexpected error occurred: {e}")

    # Get the bio
    try:
        account_info['author_bio'] = driver.find_element(By.XPATH, '//*[@id="main-content-others_homepage"]/div/div[1]/h2').text
    except Exception as e:
        print(f"Bio: An unexpected error occurred: {e}")

    # Get the number of followers  
    try:
        account_info['author_followers'] = driver.find_element(By.XPATH, '//*[@id="main-content-others_homepage"]/div/div[1]/h3/div[2]/strong').text
    except Exception as e:
        print(f"Followers: An unexpected error occurred: {e}")

    # Get the number of likes 
    try:
        account_info['author_likes'] = driver.find_element(By.XPATH, '//*[@id="main-content-others_homepage"]/div/div[1]/h3/div[3]/strong').text
    except Exception as e:
        print(f"Likes: An unexpected error occurred: {e}")


    return account_info

We can now pass each value in the "author_username" column of the PykTok output through this function to get information about each account!

This may take a while, depending on the size of your dataset. For this tutorial we are only taking a subset of the accounts!
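
Because a long scraping run can be interrupted partway through, one option is to write each result to disk as soon as it arrives and skip accounts that were already saved. The sketch below illustrates that idea with the standard `csv` module. The function `scrape_with_checkpoint`, the `fetch` callable, and the `requested_username` column are all hypothetical names introduced for illustration; they are not part of the original notebook.

```python
import csv
import os

# Column names match the keys produced by getAccountInformation
FIELDS = ["author_username", "author_name", "author_bio",
          "author_followers", "author_likes"]

def scrape_with_checkpoint(usernames, fetch, path):
    """Call fetch(username) for every username not yet saved in `path`.

    Each result is appended to the CSV immediately, so an interrupted run
    can be resumed by calling the function again with the same path.
    Returns the number of usernames recorded in the file.
    """
    has_rows = os.path.exists(path) and os.path.getsize(path) > 0
    done = set()
    if has_rows:
        with open(path, newline="", encoding="utf-8") as f:
            done = {row["requested_username"] for row in csv.DictReader(f)}

    with open(path, "a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(
            f,
            fieldnames=["requested_username"] + FIELDS,
            restval="",             # fields that could not be scraped stay empty
            extrasaction="ignore",  # ignore unexpected keys
        )
        if not has_rows:
            writer.writeheader()
        for name in usernames:
            if name in done:
                continue  # already scraped in an earlier run
            info = fetch(name)
            writer.writerow({"requested_username": name, **info})
            f.flush()  # push the row to disk before the next request
            done.add(name)
    return len(done)
```

In this notebook, `fetch` could be a small function that calls `driver.get(f"https://tiktok.com/@{acc}")` and then returns `getAccountInformation(driver)`.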

In [19]:
import pandas as pd
df = pd.read_csv("accounts - accounts.csv")

# df = df[['video_id', 'video_timestamp', 'video_duration',
#        'video_locationcreated', 'suggested_words', 'video_diggcount',
#        'video_sharecount', 'video_commentcount', 'video_playcount',
#        'video_description', 'hashtags', 'author_username']]

# Convert the "author_username" column to a list (duplicates, if any, are kept)
accounts = df['author_username'].tolist()
#subsetaccount = accounts[6000:9000]

# Scrape only accounts 3000-5999 in this run
subsetaccount2 = accounts[3000:6000]
# Initialize an empty list to store account information dictionaries
all_account_info = []

for acc in subsetaccount2:
    url = f"https://tiktok.com/@{acc}"
    driver.get(url)

    # Get account information
    account_info = getAccountInformation(driver)

    # Append the dictionary to the list
    all_account_info.append(account_info)

# Convert list of dictionaries to DataFrame
all_accounts = pd.DataFrame(all_account_info)

# Drop rows where any field could not be scraped (NaN)
data = all_accounts.dropna()
data

Username: An unexpected error occurred: Message: no such element: Unable to locate element: {"method":"xpath","selector":"//*[@id="main-content-others_homepage"]/div/div[1]/div[1]/div[2]/h1"}
  (Session info: chrome=124.0.6367.92); For documentation on this error, please visit: https://www.selenium.dev/documentation/webdriver/troubleshooting/errors#no-such-element-exception

| | author_username | author_name | author_bio | author_followers | author_likes |
|---|---|---|---|---|---|
| 0 | shespamzy | Pami 🎀 | 📍 Mnl \| Ceb\nLiving life 🎀🧿\n💌 workwith.pamela... | 114.4K | 13.8M |
| 1 | eburneae0 | Eburneae | Empresa de Marfil orgánico 🌱\n¡ Proyecto liceal ! | 2729 | 133.6K |
| 2 | toj1phobic | valen 🐌 | 18 \| occasional editor \nmarried to yuki & sha... | 5509 | 991.9K |
| 3 | szatanasluga | gugs | im just bored sometimes | 1824 | 158K |
| 4 | dyllllly | dylan | i’m not yelling this is my normal volume\nmtl\... | 210.1K | 26.9M |
| ... | ... | ... | ... | ... | ... |
| 2995 | nezeisa | Idź się zesraj | Y stan siedemnaście ‼️\nOrto specjalnie pa 🫰 | 3075 | 1.1M |
| 2996 | ammarise | Jena Ammarise | 22yrs old Mama🫶\n@FashionNova Influencer❣️\nIn... | 841.5K | 19.6M |
| 2997 | _bluuw | Blu | 👻Snap: bluuw18\nInstagram: _bluuw \nSecond acc... | 155.8K | 7.5M |
| 2998 | tpstofficial_ | 템페스트(TEMPEST) | #TEMPEST #템페스트 Official TIKTOK💙 | 1.4M | 42.2M |
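
The follower and like counts in this output are strings such as `114.4K` and `13.8M`, so they cannot be sorted or summed directly. As a rough sketch, a helper like the one below could convert them to integers. The name `parse_count` is hypothetical, and support for a `B` suffix is an assumption, since only plain numbers, `K`, and `M` appear in the output above.

```python
def parse_count(text):
    """Convert an abbreviated count such as '114.4K' or '13.8M' to an int."""
    # 'B' is an assumption; only 'K' and 'M' appear in the scraped output
    multipliers = {"K": 1_000, "M": 1_000_000, "B": 1_000_000_000}
    text = text.strip().upper().replace(",", "")
    if text and text[-1] in multipliers:
        # round() guards against floating-point error (e.g. 114.4 * 1000)
        return int(round(float(text[:-1]) * multipliers[text[-1]]))
    return int(text)

if __name__ == "__main__":
    for raw in ["114.4K", "13.8M", "2729"]:
        print(raw, "->", parse_count(raw))
```

Applied to the DataFrame, this might look like `data['author_followers'].map(parse_count)`.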


We now have to merge this new DataFrame back into our original accounts dataset (`df`)!

In [20]:
accountInformation = data.merge(df, on="author_username", how="inner").drop_duplicates()
accountInformation

| | author_username | author_name | author_bio | author_followers | author_likes |
|---|---|---|---|---|---|
| 0 | shespamzy | Pami 🎀 | 📍 Mnl \| Ceb\nLiving life 🎀🧿\n💌 workwith.pamela... | 114.4K | 13.8M |
| 1 | eburneae0 | Eburneae | Empresa de Marfil orgánico 🌱\n¡ Proyecto liceal ! | 2729 | 133.6K |
| 2 | toj1phobic | valen 🐌 | 18 \| occasional editor \nmarried to yuki & sha... | 5509 | 991.9K |
| 3 | szatanasluga | gugs | im just bored sometimes | 1824 | 158K |
| 4 | dyllllly | dylan | i’m not yelling this is my normal volume\nmtl\... | 210.1K | 26.9M |
| ... | ... | ... | ... | ... | ... |
| 2982 | nezeisa | Idź się zesraj | Y stan siedemnaście ‼️\nOrto specjalnie pa 🫰 | 3075 | 1.1M |
| 2983 | ammarise | Jena Ammarise | 22yrs old Mama🫶\n@FashionNova Influencer❣️\nIn... | 841.5K | 19.6M |
| 2984 | _bluuw | Blu | 👻Snap: bluuw18\nInstagram: _bluuw \nSecond acc... | 155.8K | 7.5M |
| 2985 | tpstofficial_ | 템페스트(TEMPEST) | #TEMPEST #템페스트 Official TIKTOK💙 | 1.4M | 42.2M |


In [21]:
# Save this batch (accounts 3000-5999) to its own CSV file
accountInformation.to_csv("3000-6000.csv", index=False)
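
The slices `accounts[3000:6000]` and `accounts[6000:9000]` and the file name `3000-6000.csv` suggest the accounts are processed in batches of 3000. Rather than editing the slice bounds by hand for each run, a small generator could produce them. This is only a sketch: `batch_ranges` is a hypothetical helper, and the `start-stop.csv` naming simply mirrors the file name used above.

```python
def batch_ranges(total, size):
    """Yield (start, stop, filename) for consecutive slices of a list."""
    for start in range(0, total, size):
        stop = min(start + size, total)  # the last batch may be shorter
        yield start, stop, f"{start}-{stop}.csv"

if __name__ == "__main__":
    # Example with 7000 accounts in batches of 3000
    for start, stop, filename in batch_ranges(7000, 3000):
        print(f"accounts[{start}:{stop}] -> {filename}")
```

Each `(start, stop)` pair can be used as `accounts[start:stop]`, with the matching file name passed to `to_csv`.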