### **Before you begin, connect to a GPU runtime (go to Runtime -> Change runtime type -> select GPU as the hardware accelerator and save, then click "Connect" in the top-right corner of the main page)**

Run the following cells to make sure you can use the CuPy library (a dependency of this project).

In [None]:
import cupy as cp

In [None]:
z = cp.arange(6).reshape(2, 3).astype('f')
z

array([[0., 1., 2.],
       [3., 4., 5.]], dtype=float32)
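If the cell above raises an error, CuPy cannot use a GPU. The following is a minimal sketch (assuming CuPy is installed on the GPU runtime) that reports how many CUDA devices CuPy can see instead of raising; `count_cuda_devices` is a hypothetical helper name.

```python
def count_cuda_devices() -> int:
    """Return the number of CUDA devices CuPy can see, or 0 if CuPy or a GPU is unavailable."""
    try:
        import cupy as cp
    except ImportError:
        # CuPy isn't installed, or its CUDA libraries couldn't be loaded
        return 0
    try:
        return int(cp.cuda.runtime.getDeviceCount())
    except cp.cuda.runtime.CUDARuntimeError:
        # Raised when no GPU (or no usable driver) is attached to the runtime
        return 0


print(f"CUDA devices visible to CuPy: {count_cuda_devices()}")
```

On a correctly configured GPU runtime this should print a count of at least 1; a 0 means the runtime type needs to be changed before continuing.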

## Run code needed to use conda

In [None]:
%%bash
MINICONDA_INSTALLER_SCRIPT=Miniconda3-4.5.4-Linux-x86_64.sh
MINICONDA_PREFIX=/usr/local
wget https://repo.continuum.io/miniconda/$MINICONDA_INSTALLER_SCRIPT
chmod +x $MINICONDA_INSTALLER_SCRIPT
./$MINICONDA_INSTALLER_SCRIPT -b -f -p $MINICONDA_PREFIX

In [None]:
!which conda # should return /usr/local/bin/conda

/usr/local/bin/conda


In [None]:
!conda --version 

conda 4.5.4


In [None]:
!which python # still returns /usr/local/bin/python

/usr/local/bin/python


In [None]:
!python --version 

Python 3.6.5 :: Anaconda, Inc.


In [None]:
%%bash
conda install --channel defaults conda python=3.6 --yes
conda update --channel defaults --all --yes

In [None]:
!conda --version 

conda 4.9.2


In [None]:
!python --version

Python 3.6.12 :: Anaconda, Inc.
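To check the versions programmatically rather than by eye, a small sketch can parse the version strings. The sample strings are copied from the outputs above, and `parse_version` is a hypothetical helper.

```python
import re


def parse_version(text: str) -> tuple:
    """Extract the first dotted version number (e.g. '3.6.12') from a tool's output."""
    match = re.search(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    # A missing patch component (e.g. 'conda 4.5') defaults to 0
    return tuple(int(part) if part else 0 for part in match.groups())


# The update above pinned python=3.6, so the minor version should still be 3.6
assert parse_version("Python 3.6.12 :: Anaconda, Inc.")[:2] == (3, 6)
print(parse_version("conda 4.9.2"))  # (4, 9, 2)
```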


In [None]:
import sys
sys.path

['',
 '/env/python',
 '/usr/lib/python36.zip',
 '/usr/lib/python3.6',
 '/usr/lib/python3.6/lib-dynload',
 '/usr/local/lib/python3.6/dist-packages',
 '/usr/lib/python3/dist-packages',
 '/usr/local/lib/python3.6/dist-packages/IPython/extensions',
 '/root/.ipython']

In [None]:
!ls /usr/local/lib/python3.6/dist-packages

In [None]:
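import sys
# Let the notebook kernel import packages that conda installed into site-packages
# (the sys.path listing above only includes dist-packages folders)
sys.path.append("/usr/local/lib/python3.6/site-packages")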
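Re-running the cell above appends the same folder to `sys.path` again. A minimal sketch of an idempotent alternative (the helper name `add_to_sys_path` is hypothetical) only adds a folder if it exists and is not already listed:

```python
import os
import sys


def add_to_sys_path(path: str) -> bool:
    """Append `path` to sys.path once; return True only if it was actually added."""
    # Skip folders that don't exist or are already on the path,
    # so re-running the cell doesn't accumulate duplicate entries
    if not os.path.isdir(path) or path in sys.path:
        return False
    sys.path.append(path)
    return True


add_to_sys_path("/usr/local/lib/python3.6/site-packages")
```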

## BEGIN BMT CODE

In [None]:
!git clone --recursive https://github.com/v-iashin/BMT.git

Cloning into 'BMT'...
remote: Enumerating objects: 123, done.
remote: Total 123 (delta 0), reused 0 (delta 0), pack-reused 123
Receiving objects: 100% (123/123), 13.26 MiB | 23.04 MiB/s, done.
Resolving deltas: 100% (47/47), done.
Submodule 'submodules/pycocoevalcap' (https://github.com/salaniz/pycocoevalcap.git) registered for path 'submodules/pycocoevalcap'
Submodule 'submodules/video_features' (https://github.com/v-iashin/video_features.git) registered for path 'submodules/video_features'
Cloning into '/content/BMT/submodules/pycocoevalcap'...
remote: Enumerating objects: 11, done.        
remote: Counting objects: 100% (11/11), done.        
remote: Compressing objects: 100% (10/10), done.        
remote: Total 808 (delta 1), reused 6 (delta 1), pack-reused 797        
Receiving objects: 100% (808/808), 130.05 MiB | 40.22 MiB/s, done.
Resolving deltas: 100% (420/420), done.
Cloning into '/content/BMT/submodules/video_features'...
remote: Enumerating objects: 365, done.       
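The clone output above stops partway through the `video_features` submodule. As a sketch (assuming the repository sits at `/content/BMT`; `empty_submodules` is a hypothetical helper), you can confirm that both submodules were actually populated. If one is empty, running `git submodule update --init --recursive` inside the repository should fetch it.

```python
from pathlib import Path


def empty_submodules(repo_root, names=("pycocoevalcap", "video_features")):
    """Return the submodules under <repo_root>/submodules that are missing or empty."""
    missing = []
    for name in names:
        folder = Path(repo_root) / "submodules" / name
        # Cloning without --recursive leaves submodule folders empty
        if not folder.is_dir() or not any(folder.iterdir()):
            missing.append(name)
    return missing


print(empty_submodules("/content/BMT"))  # [] if both submodules were cloned
```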

Navigate to the BMT folder

In [None]:
%cd BMT

/content/BMT


In [None]:
!ls

conda_env.yml  download_data.sh  loss	  README.md  scripts
data	       epoch_loops	 main.py  results    submodules
datasets       evaluation	 model	  sample     utilities


The bash command below downloads the I3D and VGGish features and the GloVe word embeddings (~10 GB in total) and unpacks them into the `./data` and `./.vector_cache` folders. Run it from inside the BMT folder.

**TAKES ~15-20 MINUTES**

**DISCLAIMER: You can skip this command altogether if you don't have enough disk space. I skipped it because, when I did download the data, I later ran out of disk space and had to restart the runtime and notebook to get results.**
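Since the download needs ~10 GB, it can help to check free space before starting it. A minimal sketch using the standard library's `shutil.disk_usage`; the helper name is hypothetical, the 10 GB threshold comes from the estimate above, and unpacking may need extra room beyond that.

```python
import shutil


def has_free_space(path, required_gb):
    """Return True if the filesystem containing `path` has at least `required_gb` GB free."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_gb * 1e9


# Check the current folder (/content/BMT at this point) against the ~10 GB estimate
print("Enough space:", has_free_space(".", 10))
```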

In [None]:
%%bash
./download_data.sh

Downloading i3d features
Downloading vggish features
Downloading GloVe embeddings
Checking for correctness of the downloaded files
OK: i3d features
OK: vggish features
OK: glove embeddings
Unpacking i3d (~1 min)
Unpacking vggish features
Done





### Create the `bmt` conda environment (from `conda_env.yml`)

In [None]:
%%bash
conda env create -f ./conda_env.yml

In [None]:
%%bash
conda info --envs

# conda environments:
#
base                  *  /usr/local
bmt                      /usr/local/envs/bmt
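To confirm programmatically that the `bmt` environment was created, the text printed by `conda info --envs` can be parsed. A sketch, assuming the column layout shown above (`parse_conda_envs` is a hypothetical helper; unnamed environments created with `-p` would be keyed by their path):

```python
def parse_conda_envs(output: str) -> dict:
    """Map environment name -> prefix from the text printed by `conda info --envs`."""
    envs = {}
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines and the '#' header lines
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        # The active environment has a '*' between its name and its prefix
        envs[parts[0]] = parts[-1]
    return envs


# Output copied from the cell above
sample = """# conda environments:
#
base                  *  /usr/local
bmt                      /usr/local/envs/bmt
"""
print(parse_conda_envs(sample))  # {'base': '/usr/local', 'bmt': '/usr/local/envs/bmt'}
```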



In [None]:
%%bash
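source activate bmt  # Note: each %%bash cell runs in its own shell, so this activation does not carry over to later cells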

In [None]:
!pip install spacy

In [None]:
%%bash
# install spacy language model. Make sure you activated the conda environment
python -m spacy download en

### Navigate to the sample folder and download the pretrained models

In [None]:
%cd sample #Should be /content/BMT/sample

/content/BMT/sample


In [None]:
%%bash
wget https://a3s.fi/swift/v1/AUTH_a235c0f452d648828f745589cde1219a/bmt/best_cap_model.pt

In [None]:
%%bash
wget https://a3s.fi/swift/v1/AUTH_a235c0f452d648828f745589cde1219a/bmt/best_prop_model.pt
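A quick sketch to confirm that both checkpoints were downloaded. The 1 MB minimum is an arbitrary assumption, not the real checkpoint size, and `check_checkpoints` is a hypothetical helper.

```python
import os


def check_checkpoints(folder, names=("best_cap_model.pt", "best_prop_model.pt"), min_mb=1.0):
    """Return {file name: size in MB}, with None for files that are missing or suspiciously small."""
    report = {}
    for name in names:
        path = os.path.join(folder, name)
        size_mb = os.path.getsize(path) / 1e6 if os.path.isfile(path) else 0.0
        # A missing or tiny file suggests the wget download failed or was interrupted
        report[name] = size_mb if size_mb >= min_mb else None
    return report


print(check_checkpoints("."))  # run from /content/BMT/sample
```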

### Make a directory for the test video's feature files (needed to generate captions)

Navigate back to the BMT folder

In [None]:
%cd .. #Should be /content/BMT

/content/BMT


Make a directory called `test` (`BMT/test`) to hold the VGGish, RGB, and flow features of the video you want to caption. Then upload the `.npy` files you got from the I3D and VGGish feature generation notebooks from your desktop to this folder.

In [None]:
!mkdir test

In [None]:
!ls

conda_env.yml  download_data.sh  loss	  README.md  scripts	 utilities
data	       epoch_loops	 main.py  results    submodules
datasets       evaluation	 model	  sample     test
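Before running the prediction, you can check that all three feature files were uploaded. This sketch assumes your files follow the `<name>_vggish.npy`, `<name>_rgb.npy`, `<name>_flow.npy` pattern used in the prediction command later in this notebook; `missing_feature_files` is a hypothetical helper.

```python
from pathlib import Path


def missing_feature_files(test_dir, video_name):
    """Return the expected <video_name>_{vggish,rgb,flow}.npy files that are not in test_dir."""
    expected = ["{}_{}.npy".format(video_name, kind) for kind in ("vggish", "rgb", "flow")]
    return [name for name in expected if not (Path(test_dir) / name).is_file()]


# Run from /content/BMT; prints [] once all three files are uploaded
print(missing_feature_files("./test", "skiing_video"))
```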


In [None]:
%%bash
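source activate bmt  # Note: each %%bash cell runs in its own shell, so this activation does not carry over to later cells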

Install the following dependencies (they take up ~2.2 GB of disk space)

**You may need to experiment with the dependencies and their versions if something doesn't work. The setup was very finicky when I tried it out.**

In [None]:
%%bash
nvcc --version  # Check the CUDA toolkit (nvcc compiler) version

In [None]:
!nvidia-smi  # Check the GPU model, driver version, and the highest CUDA version the driver supports

In [None]:
!pip install torch==1.5.0+cu101 torchvision==0.6.0+cu101 -f https://download.pytorch.org/whl/torch_stable.html
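The `+cu101` suffix indicates a wheel built for CUDA 10.1. As a rule of thumb (worth double-checking against PyTorch's installation notes), the CUDA version reported in the `nvidia-smi` header should be at least the wheel's CUDA version. A sketch of that comparison; the helper names are hypothetical and the `nvidia-smi` line is illustrative, not real output from this notebook.

```python
import re


def driver_cuda(nvidia_smi_output: str):
    """Return (major, minor) of the 'CUDA Version' in the nvidia-smi header, or None."""
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", nvidia_smi_output)
    return (int(match.group(1)), int(match.group(2))) if match else None


def wheel_cuda(torch_version: str):
    """Return (major, minor) from a PyTorch version tag like '1.5.0+cu101', or None for CPU builds."""
    # The last digit is the minor version: cu101 -> (10, 1), cu92 -> (9, 2)
    match = re.search(r"\+cu(\d+)(\d)$", torch_version)
    return (int(match.group(1)), int(match.group(2))) if match else None


# Illustrative header line; paste the real one from the nvidia-smi cell above
smi = "| NVIDIA-SMI 460.32.03    Driver Version: 460.32.03    CUDA Version: 11.2     |"
print(driver_cuda(smi) >= wheel_cuda("1.5.0+cu101"))  # True: 11.2 >= 10.1
```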

In [None]:
!pip install torchtext==0.6

In [None]:
!pip install scikit-learn  # "sklearn" on PyPI is a deprecated alias; the real package is scikit-learn

In [None]:
!pip install pandas

In [None]:
!pip freeze  # List installed package versions
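To check the pinned versions without scanning the whole `pip freeze` listing, a sketch can parse it into a dictionary. `parse_freeze` and `check_pins` are hypothetical helpers, and the sample text is illustrative, not real output.

```python
def parse_freeze(freeze_text: str) -> dict:
    """Map lower-cased package name -> version from 'name==version' lines of `pip freeze`."""
    versions = {}
    for line in freeze_text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and editable/URL installs that have no '=='
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, version = line.split("==", 1)
        versions[name.strip().lower()] = version.strip()
    return versions


def check_pins(versions: dict, pins: dict) -> dict:
    """Return {package: installed version or None} for every pin that isn't satisfied."""
    return {pkg: versions.get(pkg) for pkg, want in pins.items() if versions.get(pkg) != want}


# In the notebook, `freeze_lines = !pip freeze` captures the real output as a list of lines
sample = "torch==1.5.0+cu101\ntorchtext==0.6.0\ntorchvision==0.6.0+cu101\n"
print(check_pins(parse_freeze(sample), {"torch": "1.5.0+cu101", "torchtext": "0.6.0"}))  # {}
```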

**Make sure to set `duration_in_secs` to the length of the video you're captioning, and set `max_prop_per_vid` to the maximum number of caption proposals you want for the video.**

**Due to limited disk space, I recommend using a video that's 10 seconds or shorter and starting with `max_prop_per_vid` set to ~20, then working your way up from there (other values may also work).**

**Download the text file this command writes its results to (in this example, `skiing_video_output.txt`) to your desktop so that you don't lose it when the runtime disconnects.**
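Since the feature paths, `--duration_in_secs`, and `--max_prop_per_vid` change for every video, a sketch can assemble the command from a few parameters. The flags are the ones used in the cell below; the helper name `build_prediction_command`, the `<name>_<kind>.npy` file naming, and the default of 20 proposals (following the recommendation above) are assumptions.

```python
import shlex


def build_prediction_command(video_name, duration_in_secs, max_prop_per_vid=20,
                             nms_tiou_thresh=0.4, device_id=0):
    """Assemble single_video_prediction.py arguments for features named ./test/<video_name>_*.npy."""
    return [
        "python", "./sample/single_video_prediction.py",
        "--prop_generator_model_path", "./sample/best_prop_model.pt",
        "--pretrained_cap_model_path", "./sample/best_cap_model.pt",
        "--vggish_features_path", "./test/{}_vggish.npy".format(video_name),
        "--rgb_features_path", "./test/{}_rgb.npy".format(video_name),
        "--flow_features_path", "./test/{}_flow.npy".format(video_name),
        "--duration_in_secs", str(duration_in_secs),
        "--device_id", str(device_id),
        "--max_prop_per_vid", str(max_prop_per_vid),
        "--nms_tiou_thresh", str(nms_tiou_thresh),
    ]


cmd = build_prediction_command("skiing_video", duration_in_secs=8)
# Shell form of the command; shlex.quote protects file names with spaces
print(" ".join(shlex.quote(arg) for arg in cmd))
```

The printed string can be pasted into a `%%bash` cell (appending `> ./test/<name>_output.txt`), or the list can be passed directly to `subprocess.run` with `stdout` pointed at an open output file.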

In [None]:
%%bash
python ./sample/single_video_prediction.py \
    --prop_generator_model_path ./sample/best_prop_model.pt \
    --pretrained_cap_model_path ./sample/best_cap_model.pt \
    --vggish_features_path ./test/skiing_video_vggish.npy \
    --rgb_features_path ./test/skiing_video_rgb.npy \
    --flow_features_path ./test/skiing_video_flow.npy \
    --duration_in_secs 8 \
    --device_id 0 \
    --max_prop_per_vid 100 \
    --nms_tiou_thresh 0.4 > ./test/skiing_video_output.txt

tcmalloc: large alloc 2635227136 bytes == 0x56299ce04000 @  0x7f555d71ab6b 0x7f555d73a379 0x7f5505bb304e 0x7f5505bb4f4a 0x7f553eaa20c4 0x7f554ccef5d9 0x56295ec81d45 0x56295ebf9bfb 0x56295ec81bae 0x56295eca425a 0x56295ec7b2ce 0x56295ec7c32c 0x56295ebf9ddf 0x56295ec1847d 0x7f5550a5a62d 0x7f5550a535ed 0x56295ebf9a5a 0x56295ec81a5c 0x56295eca425a 0x56295ec7cfa6 0x56295ec7d896 0x56295ebf981e 0x56295eca58bb 0x56295ec7afd4 0x56295ec7be51 0x56295ec81b35 0x56295eca425a 0x56295ec7afd4 0x56295ec7be51 0x56295ec81b35 0x56295eca5019
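Because the output file is lost when the runtime disconnects, the download step from the note above can also be done in code. This sketch uses Colab's `google.colab.files.download` and simply reports failure outside Colab; `download_to_desktop` is a hypothetical helper name.

```python
import os


def download_to_desktop(path):
    """Trigger a browser download of `path` from Colab; return False if that isn't possible."""
    if not os.path.isfile(path):
        print("{} not found".format(path))
        return False
    try:
        from google.colab import files  # only importable inside Colab
    except ImportError:
        print("Not running in Colab; copy the file manually instead")
        return False
    files.download(path)
    return True


download_to_desktop("./test/skiing_video_output.txt")
```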
