# Running [FANG](https://github.com/nguyenvanhoang7398/FANG.git) code in Colab

This notebook describes how we tried to get the code for FANG to run on Google Colab instances with a GPU. To reproduce, copy this notebook to Google Colab and execute it.

## 1. Make sure GPU runtime is active (Runtime -> Change runtime type -> Select "GPU")
***Note:*** *unfortunately, GPUs are [sometimes not available](https://research.google.com/colaboratory/faq.html#resource-limits) with the free version of Google Colab. In such cases, there's nothing one can do except wait, or pay :(*

In [1]:
# get details on GPU that is made available to us:
!nvidia-smi -L

NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
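If you prefer to check from Python, for example to stop early in a later cell when no GPU was assigned, a minimal sketch can wrap the same `nvidia-smi -L` call with the standard `subprocess` module. The helper name `list_gpus` is our own, and the sketch assumes that `nvidia-smi` exits with a non-zero code when it cannot reach a driver, as in the output above.

```python
import shutil
import subprocess


def list_gpus() -> list:
    """Return the lines printed by `nvidia-smi -L`, or an empty list if no GPU is usable."""
    if shutil.which("nvidia-smi") is None:
        # No NVIDIA tooling installed at all (e.g. a CPU-only runtime).
        return []
    result = subprocess.run(["nvidia-smi", "-L"], capture_output=True, text=True)
    if result.returncode != 0:
        # nvidia-smi is present but cannot reach a driver (assumed to exit non-zero).
        return []
    return [line for line in result.stdout.splitlines() if line.strip()]


gpus = list_gpus()
if gpus:
    print("GPU(s) available:", *gpus, sep="\n  ")
else:
    print("No usable GPU; check Runtime -> Change runtime type.")
```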



## 2. Install conda using [condacolab](https://github.com/conda-incubator/condacolab):

***Important:*** *Running the cell below makes the runtime crash and restart. Don't skip to the next cell right away; wait until Colab has restarted.*

In [None]:
!pip install -q condacolab
import condacolab
condacolab.install()

⏬ Downloading https://github.com/jaimergp/miniforge/releases/latest/download/Mambaforge-colab-Linux-x86_64.sh...
📦 Installing...
📌 Adjusting configuration...
🩹 Patching environment...
⏲ Done in 0:00:30
🔁 Restarting kernel...
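After the restart, the condacolab README suggests running `condacolab.check()` to confirm the installation. As a dependency-free alternative, the following sketch only checks whether the `conda` and `mamba` executables are on the `PATH`; the helper `conda_tools` is hypothetical, and finding both executables is a necessary rather than a sufficient sign that the installation worked.

```python
import shutil
from typing import Dict, Optional


def conda_tools() -> Dict[str, Optional[str]]:
    """Map each conda-related executable to its full path, or None if it is not on PATH."""
    return {tool: shutil.which(tool) for tool in ("conda", "mamba")}


for tool, path in conda_tools().items():
    # After condacolab's kernel restart, both names should resolve to a path.
    print(f"{tool:>5}: {path or 'not found (has the runtime restarted yet?)'}")
```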


## 3. Get code + data from FANG [GitHub repo](https://github.com/nguyenvanhoang7398/FANG.git)

The best option is probably to clone the repo, upload the folder to your Google Drive, and then mount your Google Drive:

In [2]:
from google.colab import drive

drive.mount('/content/drive')

Mounted at /content/drive
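Once Drive is mounted, it can save a failed run later to confirm that the uploaded folder actually contains what the training command needs. A minimal sketch, assuming the repo lives at the Drive path used in step 4 (`/content/drive/My Drive/shared/fang`; adjust to your own location) and that `run_graph.py`, `environment.yml` and `data/news_graph` are the entries worth checking; `missing_entries` is a hypothetical helper.

```python
from pathlib import Path

# Assumption: the repo was uploaded to this Drive folder; change it to match yours.
FANG_DIR = Path("/content/drive/My Drive/shared/fang")

# Entries from the FANG repo that the training command in step 6 relies on.
EXPECTED = ["run_graph.py", "environment.yml", "data/news_graph"]


def missing_entries(root: Path, expected=EXPECTED) -> list:
    """Return the expected entries that do not exist under `root`."""
    return [name for name in expected if not (root / name).exists()]


missing = missing_entries(FANG_DIR)
print("All expected files found." if not missing else f"Missing under {FANG_DIR}: {missing}")
```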


Alternatively, you can upload a ZIP of the cloned folder (or of any version of the code with custom modifications) from your local file system:

In [1]:
#upload ZIP of repo that has been downloaded to local machine:
#from google.colab import files

#uploaded = files.upload()

#for fn in uploaded.keys():
#  print('Successfully uploaded file "{name}" with length {length} bytes'.format(
#      name=fn, length=len(uploaded[fn])))

Saving FANG-master.zip to FANG-master.zip
Successfully uploaded file "FANG-master.zip" with length 51843531 bytes


In [3]:
# unzip uploaded ZIP (assuming it is called FANG-master.zip)
#!unzip FANG-master.zip

Archive:  FANG-master.zip
7addc8e56391922d63986e863925e405c4f1a399
   creating: FANG-master/
 extracting: FANG-master/.gitignore  
  inflating: FANG-master/README.md   
   creating: FANG-master/data/
  inflating: FANG-master/data/fang_fake.csv  
  inflating: FANG-master/data/fang_real.csv  
   creating: FANG-master/data/news_graph/
  inflating: FANG-master/data/news_graph/deny.tsv  
  inflating: FANG-master/data/news_graph/entities.txt  
  inflating: FANG-master/data/news_graph/entity_features.tsv  
  inflating: FANG-master/data/news_graph/media_factuality.txt  
  inflating: FANG-master/data/news_graph/meta_data.tsv  
  inflating: FANG-master/data/news_graph/news_info.tsv  
  inflating: FANG-master/data/news_graph/rep_entities.tsv  
  inflating: FANG-master/data/news_graph/report.tsv  
  inflating: FANG-master/data/news_graph/source_citation.tsv  
  inflating: FANG-master/data/news_graph/source_publication.tsv  
  inflating: FANG-master/data/news_graph/support_negative.tsv  
  inflatin
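If you would rather not rely on the `!unzip` shell command, or want the extracted folder's name returned programmatically, Python's standard `zipfile` module can do the same job. A minimal sketch, assuming the ZIP contains a single top-level folder, as GitHub's "Download ZIP" archives do (here `FANG-master/`); `extract_repo_zip` is a hypothetical helper.

```python
import zipfile
from pathlib import Path


def extract_repo_zip(zip_path: str, dest: str = ".") -> Path:
    """Extract a repo ZIP with one top-level folder and return that folder's path."""
    with zipfile.ZipFile(zip_path) as zf:
        # First path component of every member, e.g. {"FANG-master"}.
        top_level = {name.split("/", 1)[0] for name in zf.namelist()}
        if len(top_level) != 1:
            raise ValueError(f"expected one top-level folder, found {sorted(top_level)}")
        zf.extractall(dest)
    return Path(dest) / top_level.pop()


# Usage, assuming the upload is called FANG-master.zip as above:
# repo_dir = extract_repo_zip("FANG-master.zip")
```

The check happens before extraction, so an unexpected archive layout raises an error instead of scattering files into the working directory.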

In [4]:
!ls

FANG-master  FANG-master.zip  sample_data


Or, clone the repo directly on the Colab instance:

In [None]:
#!git clone https://github.com/nguyenvanhoang7398/FANG.git
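Re-running a `git clone` cell fails once the target folder exists and is not empty. A minimal sketch that makes the cell safe to re-run, using Python's `subprocess` instead of the `!` shell escape; `clone_once` is a hypothetical helper, and `FANG` is simply the folder name `git clone` derives from the repo URL.

```python
import subprocess
from pathlib import Path

REPO_URL = "https://github.com/nguyenvanhoang7398/FANG.git"


def clone_once(url: str = REPO_URL, dest: str = "FANG") -> bool:
    """Clone `url` into `dest` unless `dest` already exists; return True if a clone ran."""
    if Path(dest).exists():
        # Skip instead of letting `git clone` fail on an existing folder.
        return False
    subprocess.run(["git", "clone", url, dest], check=True)
    return True


# Usage: clone_once() creates ./FANG on the first run and does nothing afterwards.
```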

The disadvantage of both approaches compared to Google Drive: any data created is lost once the runtime disconnects, unless you download it beforehand. With Google Drive, the data is synced to the Drive folder, which ideally saves training time the next time around because we can reuse the pretrained models created by the FANG script.
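When working from an uploaded or cloned copy, one way to soften this drawback is to copy the output folders to a mounted Drive before the runtime disconnects. A minimal sketch, assuming the FANG script writes its outputs to `exp_ckpt` and `exp_log` (folders that appear in the repo listing in step 4); `backup_to_drive` and the Drive path in the usage line are hypothetical.

```python
import shutil
from pathlib import Path


def backup_to_drive(src_dirs, drive_root: str) -> list:
    """Copy each existing directory in `src_dirs` into `drive_root`, merging with earlier copies."""
    copied = []
    for src in map(Path, src_dirs):
        if not src.is_dir():
            continue  # e.g. no checkpoints have been written yet
        target = Path(drive_root) / src.name
        # dirs_exist_ok (Python 3.8+) lets repeated backups overwrite files in place.
        shutil.copytree(src, target, dirs_exist_ok=True)
        copied.append(target)
    return copied


# Usage (hypothetical Drive folder):
# backup_to_drive(["exp_ckpt", "exp_log"], "/content/drive/My Drive/fang_backup")
```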

## 4. Navigate to folder containing FANG code + data

Make sure to provide the correct path, depending on how you loaded the data in the previous step.

In [3]:
%cd 'drive/My Drive/shared/fang'
#%cd 'FANG-master'

/content/drive/.shortcut-targets-by-id/1ToHnZjx6MmCRPAtxDUrirVP51cZnbzgo/fang
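Instead of commenting one `%cd` line in or out, a small helper can enter whichever folder exists. A minimal sketch, assuming Colab's default working directory `/content`, so that the two relative paths in the cell above become the absolute candidates below; `enter_first_existing` is a hypothetical name, and `os.chdir` has the same effect on the kernel's working directory as `%cd`.

```python
import os
from pathlib import Path

# The Drive folder and the unzipped upload from step 3, assuming cwd was /content.
CANDIDATES = ["/content/drive/My Drive/shared/fang", "/content/FANG-master"]


def enter_first_existing(candidates=CANDIDATES) -> Path:
    """chdir into the first candidate directory that exists and return it."""
    for candidate in map(Path, candidates):
        if candidate.is_dir():
            os.chdir(candidate)
            return candidate
    raise FileNotFoundError(f"none of {list(candidates)} exists; check step 3")
```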


We should now see the FANG code + data:

In [4]:
!ls

condacolab_install.log	environment.yml  fang	     README.md	   user_embed
data			exp_ckpt	 fang.ipynb  run_graph.py
dataset			exp_log		 graph	     training


## 5. Install the conda environment defined in `environment.yml` (replacing the base environment):

In [None]:
!mamba env update -n base -f environment.yml

[2Kpkgs/r/linux-64          [] (00m:00s) 
[1A[2Kpkgs/r/linux-64          [] (00m:00s) 580 KB / ?? (1.88 MB/s)
[1A[2Kpkgs/r/linux-64          [] (00m:00s) 580 KB / ?? (1.88 MB/s)
[2Kpkgs/main/linux-64       [] (00m:00s) 
[2A[2Kpkgs/r/linux-64          [] (00m:00s) 580 KB / ?? (1.88 MB/s)
[2Kpkgs/main/linux-64       [] (00m:00s) 612 KB / ?? (1.96 MB/s)
[2A[2Kpkgs/r/linux-64          [] (00m:00s) 580 KB / ?? (1.88 MB/s)
[2Kpkgs/main/linux-64       [] (00m:00s) 612 KB / ?? (1.96 MB/s)
[2Kpkgs/main/noarch         [] (00m:00s) 
[3A[2Kpkgs/r/linux-64          [] (00m:00s) 580 KB / ?? (1.88 MB/s)
[2Kpkgs/main/linux-64       [] (00m:00s) 612 KB / ?? (1.96 MB/s)
[2Kpkgs/main/noarch         [] (00m:00s) 336 KB / ?? (1.08 MB/s)
[3A[2Kpkgs/r/linux-64          [] (00m:00s) 580 KB / ?? (1.88 MB/s)
[2Kpkgs/main/linux-64       [] (00m:00s) 612 KB / ?? (1.96 MB/s)
[2Kpkgs/main/noarch         [] (00m:00s) 336 KB / ?? (1.08 MB/s)
[2Kpytorch/noarch           [] (--:--)

## 6. Train and evaluate model (or, at least try to 😀)

For example, we can try to execute the first command mentioned in FANG's [GitHub repo](https://github.com/nguyenvanhoang7398/FANG.git). Unfortunately, we didn't manage to get this command to run to completion. It seems that the Colab machine runs out of RAM and the script is then silently killed. We cannot verify this properly, though, because Colab does not let one inspect system resource usage in as much detail as when running code locally on one's own machine.
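One partial workaround for the missing visibility is to sample the machine's memory statistics in the background while training runs. A minimal sketch, assuming a Linux runtime (Colab is one) whose `/proc/meminfo` reports a `MemAvailable` field in kB; `log_memory`, `parse_meminfo` and `mem.log` are hypothetical names. Start it in a cell before launching training, then inspect `mem.log` afterwards: a steady drop towards zero just before the process dies would support, though not prove, the out-of-RAM explanation. For scale, each `tcmalloc: large alloc` line in the training output reports a single allocation of 1,767,661,568 bytes, about 1.65 GiB.

```python
import threading
import time


def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style text into {field: numeric value} (values are in kB)."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[key.strip()] = int(parts[0])
    return info


def log_memory(logfile: str = "mem.log", interval_s: float = 30.0) -> threading.Thread:
    """Append MemAvailable (in GiB) to `logfile` every `interval_s` s from a daemon thread."""
    def worker():
        while True:
            with open("/proc/meminfo") as f:
                kb = parse_meminfo(f.read()).get("MemAvailable", 0)
            with open(logfile, "a") as log:
                log.write(f"{time.strftime('%H:%M:%S')} MemAvailable={kb / 1024**2:.2f} GiB\n")
            time.sleep(interval_s)

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread


# Usage: run log_memory() in one cell, start training, then check with !tail mem.log
```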

In [None]:
!python run_graph.py -t fang -m graph_sage -p data/news_graph --percent 90 --epochs=30 --attention --use-stance --use-proximity --temporal

Using stance
Using proximity
Load FANG dataset from data/news_graph
  sum_inv = np.power(row_sum, -1).flatten()
Creating new far node cache with capacity 20000
Use temporal
Initialize optimizer with weight decay 0.0005
Train 55858 Dev 51 Test 48
Training GraphSage:   0% 0/30 [00:00<?, ?it/s]
Sampling news batches:   0% 0/47 [00:00<?, ?it/s]
tcmalloc: large alloc 1767661568 bytes == 0x5559f9df6000 @  0x7f668cc40b6b 0x7f668cc60379 0x7f663635acc7 0x7f663635d21a 0x7f6662290e4c 0x7f66624fa4cb 0x7f66624ec677 0x7f66624ec339 0x7f66624ec677 0x7f666229860e 0x7f666229cecd 0x7f66625b77e3 0x7f66624eca29 0x7f66624ea012 0x7f66624eca29 0x7f666761a422 0x7f666749d1fd 0x555956777c94 0x555956777db1 0x5559567e35be 0x555956727b00 0x555956777497 0x5559567e3229 0x5559567272b9 0x5559567283e5 0x555956746b93 0x55595673995e 0x5559567e051a 0x5559567272b9 0x5559567283e5 0x555956746b93
tcmalloc: large alloc 1767661568 bytes == 0x555ab0192000 @  0x7f668cc40b6b 0x7f668cc60379 0x7f663635acc7 0x7f663635d21a 0x7f666229
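Because the script dies without a message in the notebook, it can help to look at how the process ended. A minimal sketch with the standard `subprocess` module: on POSIX systems a negative `returncode` of -N means the child was terminated by signal N, and a `SIGKILL` is consistent with (though not proof of) the Linux out-of-memory killer, which sends that signal. `run_and_report` is a hypothetical helper.

```python
import signal
import subprocess


def run_and_report(cmd: list) -> int:
    """Run `cmd`, then explain a negative return code as the signal that killed it."""
    returncode = subprocess.run(cmd).returncode
    if returncode < 0:
        # POSIX: "killed by signal N" is reported as -N; the OOM killer sends SIGKILL (9).
        print(f"Process was killed by {signal.Signals(-returncode).name}")
    else:
        print(f"Process exited with code {returncode}")
    return returncode


# Usage with the training command from above:
# run_and_report(["python", "run_graph.py", "-t", "fang", "-m", "graph_sage",
#                 "-p", "data/news_graph", "--percent", "90", "--epochs=30", "--attention",
#                 "--use-stance", "--use-proximity", "--temporal"])
```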

## 7. (Bonus Tip) Keep Colab busy so runtime doesn't disconnect

Schedule this cell to run after the cell that starts fitting a model, so that the Colab runtime stays busy and doesn't disconnect for being idle too long.

In [None]:
import time
while True: time.sleep(60)  # sleep instead of spinning so no CPU core is tied up
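A variant that stops on its own after a fixed time and prints a periodic heartbeat, so the cell output shows how long it has been running. This is a hedged sketch: `keep_alive` and its default values are our own choices, the injectable `sleep` argument only exists to make it testable, and since Colab's idle and maximum-runtime limits are not documented precisely, there is no guarantee it prevents every disconnect.

```python
import math
import time


def keep_alive(hours: float = 12.0, heartbeat_s: float = 600.0, sleep=time.sleep) -> int:
    """Keep the cell running for about `hours`, printing a heartbeat; return the beat count."""
    beats = math.ceil(hours * 3600 / heartbeat_s)
    for i in range(1, beats + 1):
        sleep(heartbeat_s)
        print(f"[keep-alive] {i * heartbeat_s / 60:.0f} min elapsed", flush=True)
    return beats


# Usage: keep_alive(hours=6) keeps the runtime busy for roughly six hours.
```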