{"payload":{"feedbackUrl":"https://github.com/orgs/community/discussions/53140","repo":{"id":669918949,"defaultBranch":"main","name":"TikTok-ML-Analysis-on-Turkish-Political-Parties-WEBSCRAPPING","ownerLogin":"UygarTalu","currentUserCanPush":false,"isFork":false,"isEmpty":false,"createdAt":"2023-07-23T21:16:49.000Z","ownerAvatar":"https://avatars.githubusercontent.com/u/113293853?v=4","public":true,"private":false,"isOrgOwned":false},"refInfo":{"name":"","listCacheKey":"v0:1690148116.0","currentOid":""},"activityList":{"items":[{"before":"6aaaaf5d0d64b238489b356abf6b52728e6ce041","after":"c716cd504162df7932a35b7b879871dc845923aa","ref":"refs/heads/main","pushedAt":"2023-07-23T21:35:16.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"UygarTalu","name":"Uygar","path":"/UygarTalu","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/113293853?s=80&v=4"},"commit":{"message":"Add files via upload\n\nThis script is the complete version whole webscraping part of the thesis project. You can find all the layers of webscraping codes inside this complete script, along with their explanations and descriptions.","shortMessageHtmlLink":"Add files via upload"}},{"before":"aca91589feddb8cb3b7d712db2c9fe7d04ac09a6","after":"6aaaaf5d0d64b238489b356abf6b52728e6ce041","ref":"refs/heads/main","pushedAt":"2023-07-23T21:33:37.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"UygarTalu","name":"Uygar","path":"/UygarTalu","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/113293853?s=80&v=4"},"commit":{"message":"Add files via upload\n\nThis script is a Python program designed to scrape comments from a list of specified TikTok videos and store them in CSV files. The script follows a 12-step process, and uses the Selenium WebDriver for browser automation:\r\n\r\nThe script starts by reading a text file named \"video_list.txt\". 
This file contains a list of URLs for the TikTok videos from which comments will be scraped.\r\n\r\nA new Firefox browser instance is opened using Selenium WebDriver. On the first run, the script will pause to allow the user to manually log into their TikTok account if required.\r\n\r\nThe script checks if a directory named \"comments\" exists in the current directory. If not, it creates one. This directory is used to store all scraped comments.\r\n\r\nFor each video URL from the list, the script generates a filename and checks if a CSV file with this name already exists in the \"comments\" directory. If it does, it means the comments for this video have already been scraped, and so the script moves on to the next URL.\r\n\r\nIf the CSV file doesn't exist, the browser navigates to the video's URL.\r\n\r\nIn a loop, the script identifies all the comment elements on the page, each contained within a <p> tag with a 'data-e2e' attribute value 'comment-level-1'.\r\n\r\nThe text for each comment is extracted and stored in a list.\r\n\r\nThe script then computes a SHA256 hash of the concatenated comments. If this hash matches the hash computed in the previous loop iteration, it signifies that no new comments were loaded, so the script breaks out of the loop.\r\n\r\nOtherwise, the script scrolls to the bottom of the page to load more comments, waits for a few seconds to ensure comments are loaded, and then repeats the process.\r\n\r\nAfter breaking out of the loop, the script writes the comments to a CSV file named after the video URL in the \"comments\" directory.\r\n\r\nOnce all videos have been processed, the browser is closed.\r\n\r\nNote: This script is dependent on the page structure of TikTok's website at the time of writing, so changes to TikTok's website may cause the script to stop working. 
Also, Selenium WebDriver must be properly set up for this script to function.\r\n\r\nThe script imports required libraries, like time, hashlib, csv, os, and selenium, and follows the steps as described above to scrape and store the comments from the TikTok videos.","shortMessageHtmlLink":"Add files via upload"}},{"before":"6ae799a013b80d158a1c2adc89f4d5598af32453","after":"aca91589feddb8cb3b7d712db2c9fe7d04ac09a6","ref":"refs/heads/main","pushedAt":"2023-07-23T21:31:27.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"UygarTalu","name":"Uygar","path":"/UygarTalu","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/113293853?s=80&v=4"},"commit":{"message":"Add files via upload\n\nThis Python script consists of 7 core functions that allow the extraction and manipulation of TikTok videos and their corresponding metadata using the PyTok library. The script performs several activities, including link formatting, data deduplication, directory path generation, web scraping, data storage, and metadata merging. Below is an explanation of each function:\r\n\r\ncreate_tiktok_links_for_pytok(list_of_links) - This function formats a given list of TikTok video URLs to be compatible with the PyTok library by appending necessary query parameters. It returns a list of suitably formatted TikTok URLs.\r\n\r\ncapture_links_from_detailed_df_summary(csv_file_path) - This function reads a specified CSV file containing TikTok URLs and generates a dictionary where keys are column names and values are lists of URLs from the corresponding column, formatted using the first function.\r\n\r\nremove_duplicates_from_csv(csv_file_path) - This function removes duplicate entries from a specified CSV file to ensure data integrity and uniqueness.\r\n\r\nextract_video_paths(video_url) - This function generates a path for each TikTok video using a specified video URL. 
This is useful for storing and accessing the downloaded videos.\r\n\r\ntiktok_webscrapper(video_urls, column_name) - This function performs the actual scraping of TikTok videos and metadata using the PyTok library. It iteratively processes each URL, downloads the video and metadata, and stores them in specified directories.\r\n\r\nexecution_tiktok_scrapping(csv_file_path, start_column=None) - This function orchestrates the entire scraping process. It reads URLs from the CSV file, scrapes data for each URL, stores the data, and handles errors. After each column, it asks the user whether to proceed to the next column, allowing for user supervision.\r\n\r\nmerge_metadata_vidoes(path, output_file_name) - This function combines all the separate metadata CSV files from each hashtag and user profile into a single, comprehensive CSV file. It reads all the CSV files in the specified directory, concatenates them into a single pandas DataFrame, and saves the combined data to a new CSV file.\r\n\r\nThe script ends with some code snippets that demonstrate how to use these functions. The execution function is executed for a CSV file, storing the paths of the scraped videos. Afterward, the merge_metadata_vidoes() function is called to combine all the separate metadata CSV files into one.\r\n\r\nNote: This script assumes the user has adequate disk space and network bandwidth for the data to be downloaded. Furthermore, scraping data from TikTok, or any website, should always comply with the site's Terms of Service. Also, as scraping can be resource-intensive, it might impose a significant load on both the user's machine and the server hosting the data. 
Users should carefully manage the frequency and volume of their scraping activities.","shortMessageHtmlLink":"Add files via upload"}},{"before":"2fd00163b36615d975e1ed8bff013cf24ce4cfc2","after":"6ae799a013b80d158a1c2adc89f4d5598af32453","ref":"refs/heads/main","pushedAt":"2023-07-23T21:29:41.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"UygarTalu","name":"Uygar","path":"/UygarTalu","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/113293853?s=80&v=4"},"commit":{"message":"Add files via upload\n\nThis script contains functions designed for web scraping of TikTok data. The script is composed of four functions aimed at processing and consolidating TikTok video links gathered from previous analysis stages. These functions produce several CSV files, each containing video links that pertain to different divisions - namely, user profiles, hashtags, and an overall collection.\r\n\r\nHere are the main functions used in this script:\r\n\r\nfinal_tiktoklinks_file_creator_filetype_1: This function compiles all the TikTok video links from multiple CSV files into one final CSV file. It adjusts file paths, extracts video links from each file, consolidates these links, and verifies the final data count. The resulting consolidated CSV file is saved to the desktop.\r\n\r\nfinal_tiktoklinks_file_creator_filetype_2_PROFILE: Much like the first function, this one also consolidates video links from multiple CSV files. However, the focus here is specifically on video links related to user profiles. It processes the files, compiles the profile links, verifies the data count, and finally, saves the results in a CSV file on the desktop.\r\n\r\nfinal_tiktoklinks_file_creator_filetype_3_HASHTAG: This function parallels the previous two, but its focal point is TikTok video links that are associated with specific hashtags. 
It amalgamates all hashtag-related links from multiple CSV files, confirms the count, and saves the results in a CSV file on the desktop.\r\n\r\ndetailed_tiktok_links_data_file_creator: This function provides a more granular perspective by dividing video links according to their respective CSV files. It extracts the column names (hashtags and profiles) from each file, combines these into a single dataframe, and provides counts and percentages of the links for each category. This dataframe, indicating the link distribution, is saved as a CSV file on the desktop.\r\n\r\nIn summary, this script is integral in merging previously scraped TikTok data. It prepares several organized CSV files that can be used for further analysis, such as studying the popularity of certain user profiles or hashtags based on the video link counts.","shortMessageHtmlLink":"Add files via upload"}},{"before":"de3313494bd308f329d11bfd33d2154d88c0cce1","after":"2fd00163b36615d975e1ed8bff013cf24ce4cfc2","ref":"refs/heads/main","pushedAt":"2023-07-23T21:25:49.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"UygarTalu","name":"Uygar","path":"/UygarTalu","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/113293853?s=80&v=4"},"commit":{"message":"Add files via upload\n\nThis script contains four main functions that serve as the analysis layer for web scraping TikTok data associated with certain hashtags. The script uses the Zeeschuimer module to scrape data from user profiles and hashtags. It then extracts unique video IDs from the downloaded ndjson files and generates a TikTok link for each video. The resultant dataframe has labels for hashtag videos and user profiles and includes their associated video links.\r\n\r\nFunction 1: auto_data_file_importer_hashtag\r\nThis function imports a data file in ndjson format associated with a given hashtag. It then converts this data into a pandas DataFrame, and saves the DataFrame as a CSV file. 
This function is useful for importing and preprocessing data files related to specific hashtags for subsequent analysis.\r\n\r\nFunction 2: extract_usernames_and_ids_hashtag\r\nThis function takes a DataFrame and a hashtag name as inputs and extracts the usernames and video IDs associated with the given hashtag. It also counts the occurrences of each username to determine if the same users are posting multiple videos under the same hashtag. The function returns a tuple consisting of a list of username and video ID pairs, and a dictionary with username occurrences. This function is helpful for extracting relevant information from a DataFrame related to a specific hashtag and analyzing username patterns.\r\n\r\nFunction 3: tiktok_link_generator_hashtag_generalized\r\nThis function generates TikTok video links given a list of user ID and video ID pairs. It constructs the TikTok video link using the base URL, username with \"@\" sign, video ID, and additional parameters for each pair, and returns a list of these generated TikTok video links.\r\n\r\nFunction 4: save_videolinks_to_csv_hashtag\r\nThis function saves a list of video links to a CSV file. 
It first converts the list of video links into a pandas DataFrame, then constructs the filename for the CSV file using the hashtag name and additional information, and finally saves the DataFrame as a CSV file in a specified location.\r\n\r\nIn summary, the script makes it convenient to scrape data associated with specific hashtags from TikTok, extract relevant information such as usernames and video IDs, generate links to the corresponding TikTok videos, and save these links in a CSV file for further analysis.","shortMessageHtmlLink":"Add files via upload"}},{"before":"3f4e8e9bea89f5651831c25b10da99beb3ef3048","after":"de3313494bd308f329d11bfd33d2154d88c0cce1","ref":"refs/heads/main","pushedAt":"2023-07-23T21:24:23.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"UygarTalu","name":"Uygar","path":"/UygarTalu","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/113293853?s=80&v=4"},"commit":{"message":"Add files via upload\n\nThis script comprises four key functions designed for data acquisition and processing from TikTok user profiles. Primarily, it uses Zeeschuimer for scraping data from user profiles and hashtags, and subsequently extracts unique video IDs from the downloaded ndjson files. The function auto_data_file_importer imports ndjson data files and converts them into CSV format for easier analysis. extract_video_ids then extracts unique video IDs from these data files. With these IDs, tiktok_link_generator_profile generates TikTok video links, which are saved into a CSV file by the save_videolinks_to_csv function. The final result is a structured dataset containing labeled video links, ready for further analysis or data mining. 
This script is invaluable for those interested in analyzing user behavior or content trends on TikTok.","shortMessageHtmlLink":"Add files via upload"}},{"before":"238499e8da4dbe1e8706ad9504a79b1ef9636326","after":"3f4e8e9bea89f5651831c25b10da99beb3ef3048","ref":"refs/heads/main","pushedAt":"2023-07-23T21:23:44.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"UygarTalu","name":"Uygar","path":"/UygarTalu","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/113293853?s=80&v=4"},"commit":{"message":"Delete WEBSCRAPING_LAYER_1 directory","shortMessageHtmlLink":"Delete WEBSCRAPING_LAYER_1 directory"}},{"before":null,"after":"238499e8da4dbe1e8706ad9504a79b1ef9636326","ref":"refs/heads/main","pushedAt":"2023-07-23T21:23:15.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"UygarTalu","name":"Uygar","path":"/UygarTalu","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/113293853?s=80&v=4"},"commit":{"message":"Add files via upload\n\nThis script comprises four key functions designed for data acquisition and processing from TikTok user profiles. Primarily, it uses Zeeschuimer for scraping data from user profiles and hashtags, and subsequently extracts unique video IDs from the downloaded ndjson files. The function auto_data_file_importer imports ndjson data files and converts them into CSV format for easier analysis. extract_video_ids then extracts unique video IDs from these data files. With these IDs, tiktok_link_generator_profile generates TikTok video links, which are saved into a CSV file by the save_videolinks_to_csv function. The final result is a structured dataset containing labeled video links, ready for further analysis or data mining. 
This script is invaluable for those interested in analyzing user behavior or content trends on TikTok.","shortMessageHtmlLink":"Add files via upload"}}],"hasNextPage":false,"hasPreviousPage":false,"activityType":"all","actor":null,"timePeriod":"all","sort":"DESC","perPage":30,"cursor":"djE6ks8AAAADW2YajgA","startCursor":null,"endCursor":null}},"title":"Activity · UygarTalu/TikTok-ML-Analysis-on-Turkish-Political-Parties-WEBSCRAPPING"}