# CoLab Download Utilities

Useful scripts for using CoLab as a remote downloader.

# Setting up SSH

You are given almost all permissions in the VM, including installing programs and starting services. Some functionality is forbidden, e.g. iptables. However, it is not honorable to take advantage of a free scientific computing platform to download unrelated things in the first place, so it is not recommended to actively exploit it as a VPS. After all, we don't want to push Google's limits too far, which could result in a strict ban.

In [None]:
#@title Using Ngrok

#@markdown Rerun the code if there is an error. However, if the code runs
#@markdown properly but you still cannot establish an SSH connection (e.g. `Connection closed by remote host`, 
#@markdown see more in 
#@markdown [this issue](https://github.com/WassimBenzarti/colab-ssh/issues/45)), 
#@markdown please go to the next section and use Cloudflared instead.

#@markdown Please enter a password below; it will be used to log in.

##############################################
## Code for setting up SSH server on the VM ##
##############################################

import urllib.request, json, getpass

password = "passwd" #@param {type:"string"}

# Download ngrok
! wget -q -c -nc https://bin.equinox.io/c/4VmDzA7iaHb/ngrok-stable-linux-amd64.zip
! unzip -qq -n ngrok-stable-linux-amd64.zip

# Setup sshd
! apt-get install -qq -o=Dpkg::Use-Pty=0 openssh-server pwgen > /dev/null

# Set root password
! echo root:$password | chpasswd
! mkdir -p /var/run/sshd
! echo "PermitRootLogin yes" >> /etc/ssh/sshd_config
! echo "PasswordAuthentication yes" >> /etc/ssh/sshd_config
! echo "LD_LIBRARY_PATH=/usr/lib64-nvidia" >> /root/.bashrc
! echo "export LD_LIBRARY_PATH" >> /root/.bashrc

# Run sshd
get_ipython().system_raw('/usr/sbin/sshd -D &')

# Ask token
print("Copy authtoken from https://dashboard.ngrok.com/auth")
authtoken = getpass.getpass()

# Create tunnel
get_ipython().system_raw('./ngrok authtoken $authtoken && ./ngrok tcp 22 &')

# Get public address and print connect command
with urllib.request.urlopen('http://localhost:4040/api/tunnels') as response:
  data = json.loads(response.read().decode())
  (host, port) = data['tunnels'][0]['public_url'][6:].split(':')
  print(f'SSH command: ssh -p{port} root@{host}')

# Print root password
print(f'Root password: {password}')
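
The tunnel lookup above can fail when ngrok has not finished starting by the time its local API is queried. Below is a minimal sketch of a more defensive lookup, assuming the API at `http://localhost:4040/api/tunnels` returns the same JSON shape used above. The helper names `parse_tunnel` and `wait_for_tunnel`, the retry count and the delay are hypothetical choices, not part of the original notebook.

```python
import json
import time
import urllib.error
import urllib.request
from urllib.parse import urlsplit


def parse_tunnel(data):
    """Return (host, port) from the first tunnel in an ngrok API response."""
    url = urlsplit(data["tunnels"][0]["public_url"])  # e.g. tcp://host:port
    return url.hostname, url.port


def wait_for_tunnel(api="http://localhost:4040/api/tunnels", retries=10, delay=1.0):
    """Poll the local ngrok API until a tunnel is listed (hypothetical helper)."""
    for _ in range(retries):
        try:
            with urllib.request.urlopen(api) as response:
                data = json.loads(response.read().decode())
            if data.get("tunnels"):
                return parse_tunnel(data)
        except (urllib.error.URLError, ConnectionError):
            pass  # ngrok is not listening yet
        time.sleep(delay)
    raise RuntimeError("ngrok tunnel did not come up in time")
```

In the cell above, `host, port = wait_for_tunnel()` could stand in for the direct `urlopen` call, which may reduce the need to rerun the cell.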

In [None]:
#@title Using Cloudflared
#@markdown Follow the instructions shown to install Cloudflared if you haven't.

# Install colab_ssh on google colab
!pip install colab_ssh --upgrade

password = "passwd" #@param {type:"string"}

from colab_ssh import launch_ssh_cloudflared, init_git_cloudflared
launch_ssh_cloudflared(password)

#@markdown The credit goes to 
#@markdown [`colab-ssh`](https://github.com/WassimBenzarti/colab-ssh). They also 
#@markdown provide a method to do port forwarding using Ngrok. If you prefer,
#@markdown you can uncomment the corresponding snippet in the previous section. 

print(f'Root password: {password}')

# Mount Drive

In [None]:
#@title Google Drive

#@markdown Mount Google Drive using the official `colabtool` package.

####################################
## Mount Google Drive to CoLab VM ##
####################################

import os
# Load the Drive helper and mount
from google.colab import drive

mountpoint = "/content/drive" #@param {type:"string"}

if not os.path.isdir(os.path.join(mountpoint, 'My Drive')):
  drive.mount(mountpoint, force_remount=True)


In [None]:
#@title Flush and remount Google Drive

#######################################
## Manually sync VM and Google Drive ##
#######################################

#@markdown This is often used to make sure all modifications made in CoLab have 
#@markdown been synced to Google Drive. In this case, check the remount box
#@markdown below. If you only want to unmount and have a clean exit, leave it
#@markdown unchecked.

remount = True #@param{type:"boolean"}

drive.flush_and_unmount()
print('All changes made in this colab session should now be visible in Drive.')
if remount:
  drive.mount(mountpoint, force_remount=True)

In [None]:
#@title Using rClone

#@markdown It's also possible to mount other cloud storage with 
#@markdown [rClone](https://rclone.org/). It is recommended to configure the
#@markdown remotes locally and upload the config file. You will be prompted
#@markdown to upload a file if the path cannot be found.

# Installing rClone
#!curl https://rclone.org/install.sh | sudo bash
!command -v rclone >/dev/null 2>&1 || { curl https://rclone.org/install.sh | sudo bash;}

path2config = "/content/rclone.conf" #@param {type:"string"}

import os  # imported here so this cell also runs on its own

if not os.path.exists(path2config):
  from google.colab import files
  uploaded = files.upload()
  path2config = "/content/" + list(uploaded.keys())[0]

#@markdown Enter the name of the remote and the corresponding folder to mount.

remote = "Union" #@param {type:"string"}
remote_path = " " #@param {type:"string"}

remote = remote.strip()
remote_path = remote_path.strip()

#@markdown Enter a mountpoint

import os
mountpoint_rclone = "/content/rclone" #@param {type:"string"}

##markdown Enter a path to mount

# rClone mount
!mkdir -p "$mountpoint_rclone"
!rclone about $remote: --config "$path2config"

!nohup rclone mount $remote:"$remote_path" "$mountpoint_rclone" --vfs-cache-mode writes --config "$path2config" --rc &
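
Because the mount runs in the background, the mountpoint may not be usable right away. Below is a minimal sketch of a readiness check, assuming the FUSE mount shows up as a regular mount point that `os.path.ismount` can detect. The helper name `wait_for_mount` and the default timings are hypothetical.

```python
import os
import time


def wait_for_mount(path, timeout=30.0, interval=0.5):
    """Return True once `path` is a mount point, or False after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        if os.path.ismount(path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For example, `wait_for_mount(mountpoint_rclone)` could be called before reading or writing anything under the mountpoint.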

In [None]:
#@title Exit rClone

#@markdown It is recommended to quit the running rClone manually. Run this cell
#@markdown to unmount. Note that according to the rClone
#@markdown [documentation](https://rclone.org/commands/rclone_mount/)
#@markdown > The umount operation can fail, for example when the mountpoint is
#@markdown busy. When that happens, it is the user's responsibility to stop the 
#@markdown mount manually.
#@markdown >
#@markdown > Stopping the mount manually:
#@markdown >
#@markdown > ```
#@markdown > fusermount -u /path/to/local/mount
#@markdown > ```

!rclone rc core/quit

### Unmount manually if needed

In [None]:
!fusermount -u $mountpoint_rclone

# Set up Downloader

In [None]:
#@title Install aria2

#@markdown If the installation is successful, the code here also downloads 
#@markdown a list of quality BT trackers from 
#@markdown [here](https://github.com/ngosang/trackerslist). This can be helpful
#@markdown when downloading torrents or magnet links.

#####################################
## Install aria2 as the downloader ##
#####################################

## There are notebooks shared all over the internet that use libtorrent in
## Python, but why bother when you can set up a full-featured downloader? This
## way, you can use the same aria2 commands as you would on your local
## machine.

! sudo apt install -y aria2 > /dev/null
! cd /content && aria2c https://raw.githubusercontent.com/ngosang/trackerslist/master/trackers_best_ip.txt -o tracker --allow-overwrite="true" -q

In [None]:
#@title Download from link

#@markdown You will be prompted to enter links to download. Note that you will
#@markdown not be able to run other code on this CoLab instance through the
#@markdown notebook until the download stops. If this is a concern, please
#@markdown consider setting up RPC in the next section.

#####################################
## Enter the link to be downloaded ##
#####################################

with open("/content/download_list", "w+") as f:
  while True:
      magnet_link = input("Enter Link Or Type Exit: ")
      if magnet_link.lower() == "exit":
          break
      print(magnet_link, end="\n\n", file=f)

! cd "/content/drive/MyDrive/CoLab Download/Torrent" && aria2c --bt-tracker=$(sed ':a;N;$!ba;s/\n\n/,/g' /content/tracker) --bt-enable-lpd=true --disable-ipv6 --seed-time=0 --file-allocation=none --console-log-level=warn -i /content/download_list    
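
The `sed` expression above turns the downloaded tracker list, which has one tracker per line separated by blank lines, into the comma-separated value that `--bt-tracker` expects. Below is a rough Python sketch of the same idea. It is not byte-for-byte identical to the `sed` version, since it drops every empty line and any trailing newline. The function name `trackers_to_csv` and the example tracker URLs are hypothetical.

```python
def trackers_to_csv(text):
    """Join the non-empty lines of a tracker list with commas."""
    return ",".join(line.strip() for line in text.splitlines() if line.strip())


# Illustrative input only; the real file is /content/tracker.
sample = "udp://a.example:1337/announce\n\nudp://b.example:6969/announce\n\n"
print(trackers_to_csv(sample))
```

With the real file, `trackers_to_csv(open("/content/tracker").read())` would give a string that can be passed to `--bt-tracker`.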

In [None]:
#@title Set up aria2 RPC

#@markdown This starts aria2 as an RPC daemon so that you can control the 
#@markdown download through a web UI like 
#@markdown [AriaNg](http://ariang.mayswind.net/latest/) with the address
#@markdown `http://localhost:6800/jsonrpc` and the password you set below.

####################
## Setting up RPC ##
####################

rpc_secret = "passwd" #@param{type:"string"}

# For more complex download tasks like transferring data from OneDrive to Google
# Drive, it is recommended to use the RPC daemon and the AriaNg web UI, which
# make configuration a lot easier.

!aria2c --enable-rpc --rpc-listen-all -d /content/drive/MyDrive/Downloads/ --disable-ipv6 --rpc-secret=$rpc_secret --max-concurrent-downloads=10 --max-connection-per-server=10 --min-split-size=10M --split=5  --bt-tracker=$(sed ':a;N;$!ba;s/\n\n/,/g' /content/tracker) --bt-enable-lpd=true --rpc-allow-origin-all --file-allocation=none --seed-time=300 -D
#print("Please connect to the VM through the following command:")
#print(f'ssh "{host}" -p "{port}" -L 8080:localhost:80 -L 6800:localhost:6800 -l "root"')
#print(f'Root password: {password}')

#@markdown To forward the corresponding ports on your local machine to the CoLab
#@markdown instance, append `-L 8080:localhost:80 -L 6800:localhost:6800` to
#@markdown the command you used for SSH. Remember not to close
#@markdown the window so that the connection is kept alive.

#@markdown Browser extensions like [Aria2 for Chrome](https://chrome.google.com/webstore/detail/aria2-for-chrome/mpkodccbngfoacfalldjimigbofkhgjn)
#@markdown and [Aria2 for Edge](https://microsoftedge.microsoft.com/addons/detail/aria2-for-edge/jjfgljkjddpcpfapejfkelkbjbehagbh) can be helpful
#@markdown when your downloads are initiated from the browser. The Edge version seems
#@markdown better at catching request headers, so you can download from sites
#@markdown like OneDrive without manually feeding in the cookies.
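
Besides a web UI, the RPC daemon can also be driven from a script. Below is a minimal sketch of queuing a download through aria2's JSON-RPC `aria2.addUri` method, assuming the port is forwarded to `http://localhost:6800/jsonrpc` and the secret matches `rpc_secret`. The helper names `build_add_uri_request` and `add_uri` and the request id are hypothetical.

```python
import json
import urllib.request


def build_add_uri_request(secret, uris, options=None, request_id="colab"):
    """Build the JSON-RPC body for aria2.addUri."""
    params = [f"token:{secret}", list(uris)]  # the secret is sent as "token:<secret>"
    if options:
        params.append(options)  # per-download options, e.g. {"dir": "/content"}
    return {"jsonrpc": "2.0", "id": request_id, "method": "aria2.addUri", "params": params}


def add_uri(secret, uris, options=None, endpoint="http://localhost:6800/jsonrpc"):
    """POST the request to the aria2 RPC endpoint and return the decoded reply."""
    body = json.dumps(build_add_uri_request(secret, uris, options)).encode()
    request = urllib.request.Request(
        endpoint, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode())
```

Once the daemon is reachable, `add_uri("passwd", ["https://example.com/file.iso"])` would ask it to queue that placeholder URL.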

In [None]:
#@title Peek at downloaded videos

#@markdown If you want, you can view a clip of the downloaded video even if it
#@markdown hasn't been synced to the cloud. Note that this uses `ffmpeg` to 
#@markdown transcode a clip of the file into an HTML-playable version, and the process can be
#@markdown very time-consuming. This piece of code was for debugging use, and a
#@markdown more practical way would be to flush and unmount the drive and check
#@markdown the content through rClone, RaiDrive or whatever (the web interface
#@markdown will not be up until Google has "processed" your video).

###########################
## Peek downloaded video ##
###########################

## The syncing process between CoLab and Google Drive can be time-consuming, and
## you may want to peek into the downloaded file to see if it contains the right
## content. 

from IPython.display import HTML
from base64 import b64encode
import os

# Input video path
save_path = input("Enter the path to the file to peek (you can find it in the left panel): ")

# Compressed video path
compressed_path = "/content/result_compressed.mp4"

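# It is necessary to extract only a short clip and encode it into mp4 before it
# can be displayed inline. Passing the arguments as a list keeps paths with
# spaces intact; -y overwrites a clip left over from a previous run.
import subprocess
subprocess.run(["ffmpeg", "-y", "-i", save_path, "-ss", "00:00:30", "-to", "00:01:00",
                "-strict", "-2", "-vcodec", "libx264", "-acodec", "copy", compressed_path])
print("Compression finished")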

# Show video
print("If no video is shown, please download /content/result_compressed.mp4 " + 
      "from the left panel and verify it locally.")
mp4 = open(compressed_path,'rb').read()
data_url = "data:video/mp4;base64," + b64encode(mp4).decode()
HTML("""
<video width=400 controls>
      <source src="%s" type="video/mp4">
</video>
""" % data_url)