# Running Gamba Experiments on Google Colab

Use this notebook to run Gamba experiments on Colab GPUs.

Before running it, clone the repository https://github.com/BenBullinger/DL_Project into your Google Drive directory '/content/drive/MyDrive/DL_Project/', so that the contents of the repository are located in '/content/drive/MyDrive/DL_Project/DL_Project/'.

Because the repository is currently private, you also need to upload your GitHub SSH key pair to '/content/drive/MyDrive/DL_Project/SSH'. The SSH setup cell below expects the files to be named 'benbullinger_git' and 'benbullinger_git.pub'.
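
The layout described above can be checked before running the remaining cells. The following is a minimal sketch, not part of the repository: the helper name `missing_paths` is hypothetical, and the expected file names are taken from the paths used elsewhere in this notebook ('start.py' from the run cell, the key file names from the SSH cell).

```python
from pathlib import Path


def missing_paths(drive_root):
    """Return the expected files/directories that do not exist under drive_root.

    Hypothetical helper: the expected layout is inferred from the paths
    used in this notebook, not from any official project specification.
    """
    root = Path(drive_root)
    expected = [
        root / "DL_Project",                # the cloned repository
        root / "DL_Project" / "start.py",   # entry point used in the run cell
        root / "SSH",                       # directory holding the SSH keys
        root / "SSH" / "benbullinger_git",
        root / "SSH" / "benbullinger_git.pub",
    ]
    return [str(p) for p in expected if not p.exists()]


# On Colab, after mounting Drive, an empty list means the layout looks complete.
print(missing_paths("/content/drive/MyDrive/DL_Project"))
```

If the list is not empty, the listed paths are the ones to create or upload before continuing.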



In [1]:
import os
from google.colab import drive
drive.mount('/content/drive', force_remount=True)
base_path = '/content/drive/MyDrive/DL_Project/DL_Project/'
os.chdir(base_path)
print("Current working directory:", os.getcwd())

Mounted at /content/drive
Current working directory: /content/drive/MyDrive/DL_Project/DL_Project


## SSH Auth & Git Pull from https://github.com/BenBullinger/DL_Project

In [2]:
# Setup SSH directory
!mkdir -p ~/.ssh
!chmod 700 ~/.ssh

# Copy the SSH keys from the SSH directory on Google Drive into ~/.ssh
!cp "/content/drive/MyDrive/DL_Project/SSH/benbullinger_git" ~/.ssh/id_rsa
!cp "/content/drive/MyDrive/DL_Project/SSH/benbullinger_git.pub" ~/.ssh/id_rsa.pub

# Set proper permissions
!chmod 600 ~/.ssh/id_rsa
!chmod 644 ~/.ssh/id_rsa.pub

# Add GitHub to known hosts
!ssh-keyscan github.com >> ~/.ssh/known_hosts

# Pull from repository (requires having cloned https://github.com/BenBullinger/DL_Project to Google Drive)
!git pull

# github.com:22 SSH-2.0-159e461a3
# github.com:22 SSH-2.0-159e461a3
# github.com:22 SSH-2.0-159e461a3
# github.com:22 SSH-2.0-159e461a3
# github.com:22 SSH-2.0-159e461a3
Already up to date.
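
The key-copy and permission steps in the cell above can also be written in plain Python, which makes them easier to reuse or test outside of Colab. This is only a sketch under assumptions: the function name `install_ssh_key` is hypothetical, and the destination names 'id_rsa' and 'id_rsa.pub' simply mirror the shell commands above.

```python
import os
import shutil
from pathlib import Path


def install_ssh_key(private_src, public_src, ssh_dir="~/.ssh"):
    """Copy a key pair into ssh_dir and apply the same permissions as the cell above.

    Hypothetical helper mirroring the mkdir/cp/chmod commands; it does not
    touch known_hosts and does not run git.
    """
    ssh_dir = Path(ssh_dir).expanduser()
    ssh_dir.mkdir(parents=True, exist_ok=True)
    os.chmod(ssh_dir, 0o700)            # directory readable by the owner only

    private_dst = ssh_dir / "id_rsa"
    public_dst = ssh_dir / "id_rsa.pub"
    shutil.copyfile(private_src, private_dst)
    shutil.copyfile(public_src, public_dst)

    os.chmod(private_dst, 0o600)        # SSH refuses private keys that others can read
    os.chmod(public_dst, 0o644)
    return private_dst, public_dst


# Example call on Colab (paths taken from the cell above):
# install_ssh_key(
#     "/content/drive/MyDrive/DL_Project/SSH/benbullinger_git",
#     "/content/drive/MyDrive/DL_Project/SSH/benbullinger_git.pub",
# )
```

The explicit `chmod` calls matter because files copied from Google Drive do not necessarily arrive with the restrictive permissions that SSH expects for a private key.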


## GPU Setup & Installation

In [6]:
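# Install Conda for Colab
# Note: on its first run, condacolab.install() restarts the kernel; run this cell again afterwards.
!pip install -q condacolab
import condacolab
condacolab.install()
!conda --version

# Project Installation
# Note: each `!` command runs in its own shell, so the Gamba environment is created but never activated;
# the conda and pip installs below therefore go into the base environment.
!CONDA_OVERRIDE_CUDA=12.4 conda create -y --name Gamba python=3.12.7
!conda install -y pytorch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 pytorch-cuda=12.4 -c pytorch -c nvidia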
!pip install torch_geometric
!pip install pyg_lib torch_scatter torch_sparse torch_cluster torch_spline_conv -f https://data.pyg.org/whl/torch-2.4.0+cu124.html
!pip install wandb
!pip install transformers

✨🍰✨ Everything looks OK!
conda 23.11.0
Channels:
 - conda-forge
Platform: linux-64
Collecting package metadata (repodata.json): - \ | / done
Solving environment: \ | done


    current version: 23.11.0
    latest version: 24.11.2

Please update conda by running

    $ conda update -n base -c conda-forge conda



## Package Plan ##

  environment location: /usr/local/envs/Gamba

  added / updated specs:
    - python=3.12.7


The following NEW packages will be INSTALLED:

  _libgcc_mutex      conda-forge/linux-64::_libgcc_mutex-0.1-conda_forge 
  _openmp_mutex      conda-forge/linux-64::_openmp_mutex-4.5-2_gnu 
  bzip2              conda-forge/linux-64::bzip2-1.0.8-h4bc722e_7 
  ca-certificates    conda-forge/linux-64::ca-certificates-2024.12.14-hbcca054_0 
  ld_impl_linux-64   conda-forge/linux-64::ld_impl_linux-64-2.43-h712a8e2_2 
  libexpat           conda-forge/linux-64::libexpat-2.6.4-h5888daf_0 
  libffi             conda-forge/linux-64::libffi-3.4.2-h7f98852_5 
  libgc
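
Once the installation has finished, it can be useful to confirm that PyTorch actually sees the Colab GPU before starting a long run. The following is a minimal sketch; the function name `describe_gpu` is hypothetical, and it only uses the standard `torch.cuda` queries.

```python
def describe_gpu():
    """Summarize the PyTorch/CUDA setup of the current runtime.

    Hypothetical helper for a quick sanity check; it returns None values
    instead of raising when PyTorch is not installed.
    """
    try:
        import torch
    except ImportError:
        return {"torch": None, "cuda_build": None, "cuda_available": False, "device": None}

    available = torch.cuda.is_available()
    return {
        "torch": torch.__version__,        # installed PyTorch version
        "cuda_build": torch.version.cuda,  # CUDA version PyTorch was built against (None for CPU builds)
        "cuda_available": available,       # whether a usable GPU is visible
        "device": torch.cuda.get_device_name(0) if available else None,
    }


print(describe_gpu())
```

If `cuda_available` is `False` on Colab, the runtime type is probably not set to a GPU runtime, or a CPU-only PyTorch build was installed.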

In [10]:
# Additional dependencies required on Google Colab
!pip install "ray[tune]"

Collecting pandas (from ray[tune])
  Downloading pandas-2.2.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (89 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 89.9/89.9 kB 4.2 MB/s eta 0:00:00
Collecting tensorboardX>=1.9 (from ray[tune])
  Downloading tensorboardX-2.6.2.2-py2.py3-none-any.whl.metadata (5.8 kB)
Collecting pyarrow>=9.0.0 (from ray[tune])
  Downloading pyarrow-18.1.0-cp310-cp310-manylinux_2_28_x86_64.whl.metadata (3.3 kB)
Collecting python-dateutil>=2.8.2 (from pandas->ray[tune])
  Downloading python_dateutil-2.9.0.post0-py2.py3-none-any.whl.metadata (8.4 kB)
Collecting pytz>=2020.1 (from pandas->ray[tune])
  Downloading pytz-2024.2-py2.py3-none-any.whl.metadata (22 kB)
Collecting tzdata>=2022.7 (from pandas->ray[tune])
  Downloading tzdata-2024.2-py2.py3-none-any.whl.metadata (1.4 kB)
Downloading pyarrow-18.1.0-cp310-cp310-manylinux_2_28_x86_64.whl (40.1 MB)

## Run

In [None]:
!python {base_path}start.py --config data/configs/sample_config.json

some_key
Preprocessing the dataset: 100% 1113/1113 [00:01<00:00, 603.05it/s]
The fast path is not available because one of `(selective_state_update, selective_scan_fn, causal_conv1d_fn, causal_conv1d_update, mamba_inner_fn)` is None. Falling back to the sequential implementation of Mamba, as use_mambapy is set to False. To install follow https://github.com/state-spaces/mamba/#installation and https://github.com/Dao-AILab/causal-conv1d. For the mamba.py backend, follow https://github.com/alxndrTL/mamba.py.
Training:   0% 0/201 [00:00<?, ?epoch/s]The 'batch_size' argument of MambaCache is deprecated and will be removed in v4.49. Use the more precisely named 'max_batch_size' argument instead.
Epoch 0 - Train Loss: 0.6856, Train Acc: 0.5798, Val Loss: 0.6808, Val Acc: 0.5946, 'lr': 0.0004
Epoch 20 - Train Loss: 0.5654, Train Acc: 0.7180, Val Loss: 0.5789, Val Acc: 0.7027, 'lr': 0.0004
Epoch 40 - Train Loss: 0.5123, Train Acc: 0.7551, Val Loss: 0.5621, Val Acc: 0.6937, 'lr': 0.0002560000000
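
The per-epoch lines printed above follow a regular pattern, so they can be turned into records for plotting or comparison. The sketch below assumes the log format stays exactly as shown in this output; the names `EPOCH_RE` and `parse_epoch_line` are hypothetical and not part of the repository.

```python
import re

# Pattern inferred from the log lines shown above (an assumption, not a documented format).
EPOCH_RE = re.compile(
    r"Epoch (?P<epoch>\d+) - "
    r"Train Loss: (?P<train_loss>[\d.]+), Train Acc: (?P<train_acc>[\d.]+), "
    r"Val Loss: (?P<val_loss>[\d.]+), Val Acc: (?P<val_acc>[\d.]+), "
    r"'lr': (?P<lr>[\d.eE+-]+)"
)


def parse_epoch_line(line):
    """Return a dict of metrics for one epoch line, or None if the line does not match."""
    match = EPOCH_RE.search(line)
    if match is None:
        return None
    record = {name: float(value) for name, value in match.groupdict().items()}
    record["epoch"] = int(match.group("epoch"))  # keep the epoch index as an integer
    return record


line = ("Epoch 20 - Train Loss: 0.5654, Train Acc: 0.7180, "
        "Val Loss: 0.5789, Val Acc: 0.7027, 'lr': 0.0004")
print(parse_epoch_line(line))
```

Lines that do not describe an epoch, such as the preprocessing progress bar or the Mamba fallback warning, yield `None` and can be filtered out.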