<a href="https://colab.research.google.com/github/Frabat/CS6460-EdTech/blob/main/EdTech_CS6460_Francesco_Battista.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **CS6460 - EdTech**

##### **Francesco Battista**
##### **fbattista6@gatech.edu**
Full research available [here](https://docs.google.com/document/d/1AnAGXw6-aS6OFwv-9a5if5MsAWjOTlvhms9z6vpFEqQ/edit?usp=sharing)

## **Abstract**

This notebook is a companion to the research paper and project presentation.
The goal of the project is to determine whether short-form video content, in particular TikTok videos, can be an effective tool for delivering educational content.
The hypotheses my research focuses on are:

### **Null Hypothesis (H<sub>0</sub>)**:
*There is no significant relationship between the characteristics of STEM-related TikToks (e.g., creator demographics, content features) and audience engagement or perceived educational value.*

### **Alternative Hypothesis (H<sub>1</sub>)**:
*There is a significant relationship between the characteristics of STEM-related TikToks and audience engagement or perceived educational value*.<br>
The ***Alternative Hypothesis*** can be further divided into three sub-hypotheses:
- ***H<sub>1</sub>a*** : Content created by educators or STEM professionals receives higher engagement than content created by general influencers
- ***H<sub>1</sub>b*** : Videos with multiple modalities (e.g., voiceover, subtitles, graphics) correlate with more positive sentiment in comments
- ***H<sub>1</sub>c*** : Students report higher educational value in videos with high production quality and clear structure


## **Project Architecture**

### **Introduction**

This Jupyter Notebook is delivered through Google Colab. Evaluators will receive the code both as a Colab notebook and in the required .zip file submitted at the end of the course.
The shareable copy of this notebook omits secrets and direct access to the raw data. However, a copy of the data is available for the instructors to review in the .zip file containing the material.

### **Architecture**
This notebook uses the [TikTokApi](https://github.com/davidteather/TikTok-Api) library. The library runs a headless instance of Google Chrome, controlled via Playwright, that retrieves information directly from the social network.
The library cannot access any user-protected routes, so it can only retrieve data that is publicly accessible.

In [25]:
# @title **Setup**
### **Install required libraries**
!pip install TikTokApi
!python -m playwright install

### **Install Google Chrome**
!wget https://mirror.cs.uchicago.edu/google-chrome/pool/main/g/google-chrome-stable/google-chrome-stable_115.0.5790.170-1_amd64.deb -O google-chrome.deb
!apt-get -y install `pwd`/google-chrome.deb

╔══════════════════════════════════════════════════════╗
║ Host system is missing dependencies to run browsers. ║
║ Missing libraries:                                   ║
║     libwoff2dec.so.1.0.2                             ║
║     libgstgl-1.0.so.0                                ║
║     libgstcodecparsers-1.0.so.0                      ║
║     libavif.so.13                                    ║
║     libharfbuzz-icu.so.0                             ║
║     libenchant-2.so.2                                ║
║     libsecret-1.so.0                                 ║
║     libhyphen.so.0                                   ║
║     libmanette-0.2.so.0                              ║
╚══════════════════════════════════════════════════════╝
    at validateDependenciesLinux (/usr/local/lib/python3.11/dist-packages/playwright/driver/package/lib/server/registry/dependencies.js:269:9)
    at process.processTicksAndRejections (node:internal/process/task_queues:105:5)
    at async Registry._

In [26]:
# @title **Imports**
from TikTokApi import TikTokApi
import asyncio
import os
import pandas as pd
from google.colab import userdata, drive

In [27]:
# @title **Define variables**

ms_token = userdata.get('ms_token')
executable_path="/opt/google/chrome/google-chrome"
headless=True
sleep_after=3
timeout=0
user_handles=['geodesaurus', 'astrosamantha','annelisethearchaeologist','lab_shenanigans','drdre4000', 'chemteacherphil', 'instituteofhumananatomy','muhtanya', 'markrober','astroathens','coolchemistryguy','techience','neildegrassetyson','thephysicshouse','billnye']


In [28]:
# @title **Define Metadata Scraping Method**

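async def scrape_metadata(user_handle, count):
  # Collect up to `count` public videos for one creator as raw metadata dicts.
  async with TikTokApi() as api:
    # Reuse the settings from the "Define variables" cell instead of repeating literals.
    await api.create_sessions(ms_tokens=[ms_token], headless=headless, sleep_after=sleep_after, executable_path=executable_path, timeout=timeout)
    user = api.user(user_handle)
    videos = []
    # Request `count` items in one call: user.videos is an async generator that pages
    # through results, whereas looping over count=30 restarts from the first page each time.
    async for video in user.videos(count=count):
      videos.append(video.as_dict)
    return videos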
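
Repeated requests to the same profile can return the same video more than once. A minimal sketch of de-duplicating the scraped records follows. It assumes each metadata dict carries a unique `id` key, which is not visible in the output shown in this notebook, and `dedupe_videos` is a hypothetical helper rather than part of TikTokApi.

```python
def dedupe_videos(videos, key="id"):
    """Return `videos` without repeated `key` values, keeping the first occurrence."""
    seen = set()
    unique = []
    for video in videos:
        video_id = video.get(key)
        # Records without the key cannot be compared, so keep them all.
        if video_id is None:
            unique.append(video)
            continue
        if video_id not in seen:
            seen.add(video_id)
            unique.append(video)
    return unique


# Hypothetical records standing in for scraped metadata.
sample = [{"id": "1", "desc": "a"}, {"id": "2", "desc": "b"}, {"id": "1", "desc": "a"}]
print(len(dedupe_videos(sample)))
```

Calling the helper on the list returned by `scrape_metadata` before building the DataFrame would keep duplicated rows out of the exported file.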


In [None]:
# @title **Define Comment Scraping Method**



In [31]:
# @title **Run the loop for all the user handles.**
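from TikTokApi.exceptions import EmptyResponseException  # required by the except clause below

frames = []
for handle in user_handles:
  try:
    metadata = await scrape_metadata(handle, 90)
    frames.append(pd.DataFrame(metadata))
  except EmptyResponseException as e:
    # TikTok returned an empty response (likely bot detection); skip this handle.
    print(e)
  finally:
    print(f"Finished scraping {handle}")
# Concatenate the per-creator DataFrames into a single table with a fresh index.
result_df = pd.concat(frames, ignore_index=True)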

Finished scraping geodesaurus
Finished scraping astrosamantha
Finished scraping annelisethearchaeologist
Finished scraping lab_shenanigans
Finished scraping drdre4000
Finished scraping chemteacherphil
Finished scraping instituteofhumananatomy
Finished scraping muhtanya
Finished scraping markrober
Finished scraping astroathens
Finished scraping coolchemistryguy
Finished scraping techience
Finished scraping neildegrassetyson
Finished scraping thephysicshouse
Finished scraping billnye
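
Each scraped record is a nested dictionary (for example, `author` holds its own sub-dictionary), so building a DataFrame directly leaves whole dicts inside single cells. The sketch below shows one way to flatten such a record into dotted column names. `flatten_record` is a hypothetical helper, and the sample record reuses only a few keys visible in this notebook's output, with a placeholder URL.

```python
def flatten_record(record, parent_key="", sep="."):
    """Flatten nested dicts into one level, joining nested keys with `sep`."""
    flat = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into sub-dictionaries (an empty dict contributes no columns).
            flat.update(flatten_record(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


# Illustrative record; the URL is a placeholder, not scraped data.
sample = {
    "AIGCDescription": "",
    "CategoryType": 0,
    "author": {"UserStoryStatus": 0, "avatarThumb": "https://example.invalid/a.jpeg"},
}
print(flatten_record(sample))
```

With pandas, `pd.json_normalize(metadata)` performs a similar flattening when building the DataFrame.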


In [32]:
# @title **Export dataframe to Google Drive as a .csv file to avoid refetching all the data**
drive.mount('/drive')
result_df.to_csv('/drive/My Drive/EdTech_Final_Project_2025/video_metadata.csv')

Drive already mounted at /drive; to attempt to forcibly remount, call drive.mount("/drive", force_remount=True).
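
The export above exists to avoid refetching data on every run. A minimal sketch of that cache-first pattern follows. It uses a JSON file and the standard library instead of the CSV export so that it runs anywhere, and `load_or_fetch` and `fake_fetch` are hypothetical names introduced for illustration.

```python
import json
import os
import tempfile


def load_or_fetch(cache_path, fetch):
    """Return records cached at `cache_path`, calling `fetch()` only on a cache miss."""
    if os.path.exists(cache_path):
        with open(cache_path, "r", encoding="utf-8") as f:
            return json.load(f)
    records = fetch()
    with open(cache_path, "w", encoding="utf-8") as f:
        json.dump(records, f)
    return records


# Stand-in for the scraper: records how many times it was called.
calls = []


def fake_fetch():
    calls.append(1)
    return [{"id": "1"}]


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "video_metadata.json")
    first = load_or_fetch(path, fake_fetch)   # cache miss: calls fake_fetch
    second = load_or_fetch(path, fake_fetch)  # cache hit: reads the file

print(len(calls))
```

In this notebook the fetch step is asynchronous, so the same check (`os.path.exists` on the Drive path) would sit in front of the scraping loop rather than wrap a plain function.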


In [15]:
# @title test block
user_data = await scrape_metadata('fioremanni', 60)
print(user_data)

EmptyResponseException: None -> TikTok returned an empty response. They are detecting you're a bot, try some of these: headless=False, browser='webkit', consider using a proxy
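
An empty response like the one above can be transient, so one option is to retry the call with an increasing delay. The sketch below is a generic retry helper built on the standard library. `retry_async` and `flaky` are hypothetical names, and retrying is not guaranteed to get past bot detection.

```python
import asyncio


async def retry_async(make_call, retries=3, base_delay=1.0, retry_on=(Exception,)):
    """Await `make_call()` up to `retries` times, doubling the delay after each failure."""
    for attempt in range(retries):
        try:
            return await make_call()
        except retry_on:
            # Re-raise once the last allowed attempt has failed.
            if attempt == retries - 1:
                raise
            await asyncio.sleep(base_delay * 2 ** attempt)


# Stand-in coroutine that fails twice before succeeding.
attempts = []


async def flaky():
    attempts.append(1)
    if len(attempts) < 3:
        raise RuntimeError("empty response")
    return "ok"


result = asyncio.run(retry_async(flaky, retries=3, base_delay=0.01))
print(result, len(attempts))
```

Inside a notebook cell, where an event loop is already running, the call would be `await retry_async(lambda: scrape_metadata(handle, 90), retry_on=(EmptyResponseException,))` instead of `asyncio.run(...)`.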

In [None]:
import json
df = pd.DataFrame(user_data)
print(user_data)
df.head()
videos = df[["video"]]
formatted = df.to_json()
print(formatted)

[{'AIGCDescription': '', 'CategoryType': 0, 'author': {'UserStoryStatus': 0, 'avatarLarger': 'https://p16-sign-useast2a.tiktokcdn.com/tos-useast2a-avt-0068-giso/46479a0aae7b149b1118de08fdb8f6c7~tplv-tiktokx-cropcenter:1080:1080.jpeg?dr=14579&refresh_token=26bdf6f7&x-expires=1753326000&x-signature=1Jqcuzb9T5bMxx6YcbTDbAXt%2FK0%3D&t=4d5b0474&ps=13740610&shp=a5d48078&shcp=81f88b70&idc=my', 'avatarMedium': 'https://p16-sign-useast2a.tiktokcdn.com/tos-useast2a-avt-0068-giso/46479a0aae7b149b1118de08fdb8f6c7~tplv-tiktokx-cropcenter:720:720.jpeg?dr=14579&refresh_token=6dea8c54&x-expires=1753326000&x-signature=dICWVcLxMgV%2Fklvy2qDGg4C8ads%3D&t=4d5b0474&ps=13740610&shp=a5d48078&shcp=81f88b70&idc=my', 'avatarThumb': 'https://p16-sign-useast2a.tiktokcdn.com/tos-useast2a-avt-0068-giso/46479a0aae7b149b1118de08fdb8f6c7~tplv-tiktokx-cropcenter:100:100.jpeg?dr=14579&refresh_token=331f7145&x-expires=1753326000&x-signature=jcafJMAXla9bOfoQck99acJ0LPU%3D&t=4d5b0474&ps=13740610&shp=a5d48078&shcp=81f88b70&