# Scrape Bitchute Platform Recommendations

This **Notebook-as-Tool** allows you to:

1.   Scrape platform recommendations from Bitchute: popular videos, trending videos, trending tags, all videos, and recommended channels. See the [Bitchute homepage](https://www.bitchute.com/).

To run or adapt this Colab Notebook, you need to create a copy in your Google Drive: **File → Save a copy in Drive**. It will be stored in the folder ```Colab Notebooks```. Open this file with Google Colab and run the cells consecutively by pressing the **Play** button or pressing **shift+enter**.

**Important notes:**
- Code is hidden in the background of Colab forms. To view and edit the code, **double-click** the cell or select **View → Show/hide code**.
- Data is stored in Google Drive in the folder ```Colab_Data/Data/Bitchute-Platform-Recommendations```. A connection to your Drive is authenticated when you run the setup code cells. The connection is temporary, and only your current notebook is connected to your Drive. It is revoked when the notebook is terminated or when you select **Runtime → Factory reset runtime**.


**Credits:** This notebook was written by Marcus Burkhardt. It uses the bitchute-scraper package which is currently in alpha development: https://github.com/bumatic/bitchute-scraper.

In [None]:
#@title Setup 1: Mount Google Drive for Loading and Storing Data
from google.colab import drive
drive.mount('/content/gdrive', force_remount=True)

In [None]:
#@title Setup 2: Install and Load Required Libraries and Run Setup Procedures (On the first run many packages are installed and a lot of irrelevant output is generated. Run the cell a second time to get rid of it.)

# Install Libraries
try:
  import bitchute as bc
except ImportError:
  # Only install the scraper and the Chrome driver if the package is missing
  !pip install git+https://github.com/bumatic/bitchute-scraper.git
  !apt-get update
  !apt-get install -y chromium-chromedriver
  import bitchute as bc

# Import Libraries
import os
import time
import pandas as pd
from datetime import datetime
from tqdm.notebook import tqdm

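# Define the folder in Google Drive where results are stored (Drive is mounted at /content/gdrive)
data_path = os.path.join("/content", "gdrive", "MyDrive", "Colab_Data", "Data", "Bitchute-Platform-Recommendations")
# Create the folder, including missing parent folders, if it does not exist yet
os.makedirs(data_path, exist_ok=True)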

# Initialize scraper object.
b = bc.Crawler(chrome_driver='chromedriver')

In [None]:
#@title Setup 3: Definition of Core and Support Functions Used by the Tool(s)

def get_time():
    # Current Unix timestamp (seconds since the epoch, UTC) as a string; used as file name
    return str(int(time.time()))

def check_dir(path):
    # Create the output folder (and any missing parent folders) if it does not exist
    os.makedirs(path, exist_ok=True)
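
The result files written by the tool cell are named after the Unix timestamp returned by `get_time()`. As a minimal sketch, assuming file names of the form `<timestamp>.csv`, the following shows how such a name could be turned back into a readable UTC date. The helper `snapshot_time` is a hypothetical name introduced here for illustration; it is not part of this notebook or of the bitchute-scraper package.

```python
import os
from datetime import datetime, timezone

def snapshot_time(path):
    """Return the UTC datetime encoded in a snapshot file name.

    Hypothetical helper: assumes the file is named '<unix timestamp>.csv'.
    """
    # Strip the folder and the extension to get the bare timestamp string
    stem = os.path.splitext(os.path.basename(path))[0]
    # Interpret the timestamp as seconds since the epoch in UTC
    return datetime.fromtimestamp(int(stem), tz=timezone.utc)

print(snapshot_time("popular-videos/1700000000.csv").isoformat())
```

Returning a timezone-aware datetime avoids ambiguity when the files are later inspected on a machine that is not set to UTC.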

In [None]:
#@title Tool: Retrieve Bitchute Platform Recommendations (Popular Videos, Trending Videos, Trending Tags, All Videos, Recommended Channels)

print('GET POPULAR VIDEOS')
popular_path = os.path.join(data_path, 'popular-videos')
check_dir(popular_path)
rv, tags = b.get_recommended_videos(type='popular')
t = get_time()
outfile_name = os.path.join(popular_path, t+'.csv')
rv.to_csv(outfile_name, sep='\t', index=None)
print('Retrieved {} video items and saved results to: {}'.format(len(rv), outfile_name))

print()
print('GET TRENDING VIDEOS AND TAGS')
trending_videos_path = os.path.join(data_path, 'trending-videos')
trending_tags_path = os.path.join(data_path, 'trending-tags')
check_dir(trending_videos_path)
check_dir(trending_tags_path)
rv, tags = b.get_recommended_videos(type='trending')
t = get_time()
outfile_name = os.path.join(trending_videos_path, t+'.csv')
rv.to_csv(outfile_name, sep='\t', index=None)
print('Retrieved {} video items and saved results to: {}'.format(len(rv), outfile_name))
outfile_name = os.path.join(trending_tags_path, t+'.csv')
tags.to_csv(outfile_name, sep='\t', index=None)
print('Retrieved {} tag items and saved results to: {}'.format(len(tags), outfile_name))

print()
print('GET ALL VIDEOS')
all_path = os.path.join(data_path, 'all-videos')
check_dir(all_path)
rv, tags = b.get_recommended_videos(type='all')
t = get_time()
outfile_name = os.path.join(all_path, t+'.csv')
rv.to_csv(outfile_name, sep='\t', index=None)
print('Retrieved {} video items and saved results to: {}'.format(len(rv), outfile_name))

print()
print('GET RECOMMENDED CHANNELS')
recommended_channels_path = os.path.join(data_path, 'recommended-channels')
check_dir(recommended_channels_path)
rc = b.get_recommended_channels(extended=False)
t = get_time()
outfile_name = os.path.join(recommended_channels_path, t+'.csv')
rc.to_csv(outfile_name, sep='\t', index=None)
print('Retrieved {} channel items and saved results to: {}'.format(len(rc), outfile_name))

b.reset_webdriver()
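
Each run of the tool adds one file per recommendation type to its folder. The files carry a `.csv` extension but are tab-separated. A minimal sketch for combining all snapshots of one folder into a single DataFrame could look as follows. `load_snapshots` and the added `snapshot` column are hypothetical names introduced here; the sketch makes no assumption about the columns returned by the scraper.

```python
import glob
import os

import pandas as pd

def load_snapshots(folder):
    """Concatenate all '<timestamp>.csv' files in a folder into one DataFrame.

    Hypothetical helper: assumes tab-separated files named by Unix timestamp.
    """
    frames = []
    for path in sorted(glob.glob(os.path.join(folder, "*.csv"))):
        df = pd.read_csv(path, sep="\t")
        # Recover the retrieval time from the file name and keep it per row
        stem = os.path.splitext(os.path.basename(path))[0]
        df["snapshot"] = pd.to_datetime(int(stem), unit="s", utc=True)
        frames.append(df)
    if not frames:
        # No snapshots found: return an empty table instead of failing
        return pd.DataFrame()
    return pd.concat(frames, ignore_index=True)
```

With the variables defined in the setup cells, a call such as `load_snapshots(os.path.join(data_path, 'popular-videos'))` would return all popular-video snapshots collected so far, with the `snapshot` column indicating when each row was retrieved.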