# **Server and workspace setup**

### Optional git/github/ssh-keygen/ssh-add setup

1. Run `ssh-keygen` and save the key as `/content/.ssh/id_rsa`.
2. Run `ssh-add /content/.ssh/id_rsa` (you may have to repeat this each time).
3. Save the public key in your GitHub settings.
4. Remember to set up your local git username and email (you may have to repeat this each time).
5. All good to go!
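
The command-line steps above (1, 2 and 4) can be sketched as a small helper that only builds and prints the commands, so nothing is executed by accident. The key type, the empty passphrase (`-N ""`), the `--global` scope, and the name and email values are assumptions for illustration. `ssh-add` also needs a running `ssh-agent`, and step 3 (pasting the public key into GitHub) remains manual.

```python
import shlex
from typing import List


def setup_commands(key_path: str, user_name: str, user_email: str) -> List[List[str]]:
    """Build (but do not run) the commands for the optional git/ssh setup."""
    return [
        # Step 1: generate an RSA key pair at key_path (empty passphrase is an assumption).
        ["ssh-keygen", "-t", "rsa", "-f", key_path, "-N", ""],
        # Step 2: register the private key with the ssh agent.
        ["ssh-add", key_path],
        # Step 4: tell git who you are (--global scope is an assumption).
        ["git", "config", "--global", "user.name", user_name],
        ["git", "config", "--global", "user.email", user_email],
    ]


# Placeholder identity; replace with your own before running the commands.
for cmd in setup_commands("/content/.ssh/id_rsa", "Your Name", "you@example.com"):
    print(shlex.join(cmd))
```

Each printed line can be pasted after a `!` in a notebook cell or into the ssh session from Step 2.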

In [0]:
'''
Run this cell ONLY ONCE to connect/reconnect
Once connected, do NOT run again!!!
'''

googleDriveAlreadySetUp = False
sshServerAlreadySetUp = False

**Step 1. Mount Google Drive under */content***

` #learned_from_experience`: 


---


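1. In our experience, the *`/content`* directory is the only place where Google Colab allows file writing.

2. Also, a `!cd` command does not move the notebook to any other folder (inside or outside *`/content`*). Each `!` command runs in its own subshell, so the directory change is lost as soon as the command returns; the `%cd` magic, by contrast, changes the notebook's own working directory. In the examples below, the `|` additionally runs `pwd` in a separate process from `cd`.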

   ```
        ! pwd >>> /content
        ! cd ../ | pwd >>> /content
        ! cd ~ | pwd >>> /content
        ! cd / | pwd >>> /content
        ! cd /content/gdrive | pwd >>> /content
   ```



3. So, if your Google Drive is not mounted under *`/content`*, all files and folders downloaded or created with relative paths **(through "`!-commands`" in this notebook interface)** automatically go to *`/content`* instead of the folder you tried to `cd` into!


4. However, if you are **accessing the server over ssh** (see Step 2), you can make changes in two places: 
        a) /content
        b) your Google Drive folder (wherever it is mounted)
        
5. Anyway, to make life easier, we had better:

        a) mount Google Drive under /content
        b) use absolute paths
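
The behaviour described in points 2 and 3 is not specific to Colab: a directory change made inside a child shell never reaches the parent process. Here is a minimal sketch of that, assuming a POSIX shell is available; the helper name `cwd_after_shell_cd` is made up for illustration.

```python
import os
import shlex
import subprocess
from typing import Tuple


def cwd_after_shell_cd(target: str) -> Tuple[str, str]:
    """Run `cd target && pwd` in a child shell.

    Returns (directory seen by the child, directory of this Python process).
    """
    child_pwd = subprocess.run(
        f"cd {shlex.quote(target)} && pwd",
        shell=True,
        capture_output=True,
        text=True,
        check=True,
    ).stdout.strip()
    return child_pwd, os.getcwd()


before = os.getcwd()
child_pwd, parent_cwd = cwd_after_shell_cd("/")
print("child shell was in:", child_pwd)                         # the child really moved to /
print("notebook process unchanged:", parent_cwd == before)      # the parent did not move
```

This is why absolute paths are the safer choice: they mean the same thing no matter which process resolves them.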


In [23]:
if not googleDriveAlreadySetUp:
    from google.colab import drive
    drive.mount('/content/gdrive')
    googleDriveAlreadySetUp = True

Drive already mounted at /content/gdrive; to attempt to forcibly remount, call drive.mount("/content/gdrive", force_remount=True).


**Step 2. Server setup**


---


Notes: 

1. Once set up, open a terminal of your choice, run ***`ssh root@0.tcp.ngrok.io -p [port#]`***, and enter the generated random password. You're in!

2. Sometimes you will encounter the following error message:
    ```
    Traceback (most recent call last):
    File "<string>", line 1, in <module>
    IndexError: list index out of range
    ```
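   We are not sure of the exact cause, but it is consistent with the cell querying the ngrok API before the tunnel is up, so that the `tunnels` list is still empty. The server is actually running. 
   
   To see the port number, just run the cell again.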
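
The `IndexError` in note 2 can be avoided by polling instead of reading the tunnel list once. The sketch below assumes the error comes from an empty `tunnels` list, which is a guess rather than a confirmed diagnosis. The URL is the one the setup cell already queries with `curl`; the function names, retry count, and delay are illustrative.

```python
import json
import time
from typing import Callable, Optional
from urllib.request import urlopen

NGROK_API = "http://localhost:4040/api/tunnels"  # same endpoint as the curl call


def fetch_tunnels() -> str:
    """Return the raw JSON text from the local ngrok API."""
    with urlopen(NGROK_API, timeout=5) as resp:
        return resp.read().decode("utf-8")


def first_public_url(payload: str) -> Optional[str]:
    """Return the first tunnel's public URL, or None if no tunnel is listed yet."""
    tunnels = json.loads(payload).get("tunnels", [])
    return tunnels[0]["public_url"] if tunnels else None


def wait_for_tunnel(
    fetch: Callable[[], str] = fetch_tunnels,
    retries: int = 10,
    delay: float = 1.0,
) -> Optional[str]:
    """Poll until a tunnel appears; give up after `retries` attempts."""
    for _ in range(retries):
        try:
            url = first_public_url(fetch())
        except (OSError, ValueError):
            # API not reachable yet, or the response was not valid JSON.
            url = None
        if url:
            return url
        time.sleep(delay)
    return None
```

In the notebook, `print(wait_for_tunnel())` would stand in for the `curl ... | python3 -c ...` line; it prints `None` if no tunnel shows up in time.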

In [24]:
if not sshServerAlreadySetUp:
    #Generate root password (secrets is the standard-library module intended for this)
    import secrets, string
    password = ''.join(secrets.choice(string.ascii_letters + string.digits) for _ in range(20))

    #Download ngrok
    ! wget -q -c -nc https://bin.equinox.io/c/4VmDzA7iaHb/ngrok-stable-linux-amd64.zip
    ! unzip -qq -n ngrok-stable-linux-amd64.zip
    #Setup sshd
    ! apt-get install -qq -o=Dpkg::Use-Pty=0 openssh-server pwgen > /dev/null
    #Set root password
    ! echo root:$password | chpasswd
    ! mkdir -p /var/run/sshd
    ! echo "PermitRootLogin yes" >> /etc/ssh/sshd_config
    ! echo "PasswordAuthentication yes" >> /etc/ssh/sshd_config
    ! echo "LD_LIBRARY_PATH=/usr/lib64-nvidia" >> /root/.bashrc
    ! echo "export LD_LIBRARY_PATH" >> /root/.bashrc

    #Run sshd
    get_ipython().system_raw('/usr/sbin/sshd -D &')

    #Ask token
    print("Copy authtoken from https://dashboard.ngrok.com/auth")
    import getpass
    authtoken = getpass.getpass()

    #Create tunnel
    get_ipython().system_raw('./ngrok authtoken $authtoken && ./ngrok tcp 22 &')
    #Print root password
    print("Root password: {}".format(password))
    #Get public address
    ! curl -s http://localhost:4040/api/tunnels | python3 -c \
        "import sys, json; print(json.load(sys.stdin)['tunnels'][0]['public_url'])"
    
    # finished setup
    sshServerAlreadySetUp = True
    
else:
    print("ssh server already setup")   
    print("password", password)
    ! curl -s http://localhost:4040/api/tunnels | python3 -c \
        "import sys, json; print(json.load(sys.stdin)['tunnels'][0]['public_url'])"
    

Copy authtoken from https://dashboard.ngrok.com/auth
··········
Root password: 72brST9lBpO9ssOH93re
tcp://0.tcp.ngrok.io:17509
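
The printed `tcp://host:port` address maps directly onto the `ssh` command from note 1. Here is a small sketch of that conversion, using only the standard library; the helper name `ssh_command` is made up for illustration.

```python
from urllib.parse import urlsplit


def ssh_command(public_url: str, user: str = "root") -> str:
    """Turn an ngrok public URL such as tcp://host:port into an ssh command line."""
    parts = urlsplit(public_url)
    if parts.hostname is None or parts.port is None:
        raise ValueError(f"expected tcp://host:port, got {public_url!r}")
    return f"ssh {user}@{parts.hostname} -p {parts.port}"


print(ssh_command("tcp://0.tcp.ngrok.io:17509"))
# prints: ssh root@0.tcp.ngrok.io -p 17509
```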


**Step 3. Workspace Setup and Raw File Preparation**


---



Directory path: /content/gdrive/Shared\ drives/VQA

We (Shikhar and Nuan) decided to work only on VQA v2, and only with open-ended answers.


---


Notes on file system:
1. `wget -P [dir]` specifies the directory to download into; the directory is created if it does not exist.

2. We have to use absolute paths, since `!cd` does not change the notebook's current directory (running `!pwd` always returns `/content`) in a Google Colab notebook.


---


Notes on data collection:


1. Every image has several free-form natural-language questions with 10 concise open-ended answers each.

2. The released annotations are the result of the following post-processing steps on the raw crowdsourced data:
    - Spelling correction (using Bing Speller) of question and answer strings
    - Question normalization (first char uppercase, last char ‘?’)
    - Answer normalization (all chars lowercase, no period except as decimal point, number words → digits, strip articles (a, an, the))
    - Adding apostrophe if a contraction is missing it (e.g., convert "dont" to "don't")
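
To make the normalization rules above concrete, here is a rough sketch of what they do to a string. It is not the dataset authors' code: the number-word and contraction tables are small illustrative subsets, spelling correction is left out, and the function names are made up.

```python
import re

# Illustrative subsets only; the real post-processing tables are not given here.
NUMBER_WORDS = {
    "zero": "0", "one": "1", "two": "2", "three": "3", "four": "4", "five": "5",
    "six": "6", "seven": "7", "eight": "8", "nine": "9", "ten": "10",
}
CONTRACTIONS = {"dont": "don't", "cant": "can't", "isnt": "isn't"}
ARTICLES = {"a", "an", "the"}


def normalize_question(question: str) -> str:
    """First character uppercase, last character '?'."""
    body = question.strip().rstrip("?").strip()
    if not body:
        return body
    return body[0].upper() + body[1:] + "?"


def normalize_answer(answer: str) -> str:
    """Lowercase, drop non-decimal periods, map number words, fix contractions, strip articles."""
    text = answer.lower().strip()
    # Remove a period unless it sits between two digits (i.e. is a decimal point).
    text = re.sub(r"(?<!\d)\.|\.(?!\d)", "", text)
    words = []
    for word in text.split():
        word = NUMBER_WORDS.get(word, word)
        word = CONTRACTIONS.get(word, word)
        if word not in ARTICLES:
            words.append(word)
    return " ".join(words)


print(normalize_question("what color is the cat"))  # What color is the cat?
print(normalize_answer("The Two dogs."))            # 2 dogs
print(normalize_answer("3.5"))                      # 3.5
```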




In [25]:
# check if Annotations_Train_mscoco.zip is downloaded, if not, download it
! test -f /content/gdrive/Shared\ drives/VQA/data/Annotations_Train_mscoco.zip \
    && echo "Annotations_Train_mscoco.zip already here, skip download" \
    || { echo "Annotations_Train_mscoco.zip does not exist, start downloading..."; \
         wget https://s3.amazonaws.com/cvmlp/vqa/mscoco/vqa/Annotations_Train_mscoco.zip \
                -P /content/gdrive/Shared\ drives/VQA/data;}
/

# check if Annotations_Train_mscoco.zip is unzipped, if not, unzip it
! test -f /content/gdrive/Shared\ drives/VQA/data/mscoco_train2014_annotations.json \
    && echo "mscoco_train2014_annotations.json already here, skip unzip" \
    || { echo "mscoco_train2014_annotations.json does not exist, start unzipping..."; \
         unzip /content/gdrive/Shared\ drives/VQA/data/Annotations_Train_mscoco.zip \
               -d /content/gdrive/Shared\ drives/VQA/data;}
/

print("raw data collected.")

Annotations_Train_mscoco.zip already here, skip download
mscoco_train2014_annotations.json already here, skip unzip
raw data collected.
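
The same "skip if already present" logic can also be written in Python, which avoids escaping the space in `Shared drives`. This is a sketch rather than a replacement for the cell above: the helper names are made up, and the `download` parameter exists only so the fetch step can be swapped out (it defaults to `urllib.request.urlretrieve`).

```python
import zipfile
from pathlib import Path
from typing import Callable, Optional
from urllib.request import urlretrieve


def ensure_downloaded(
    url: str,
    dest_dir: Path,
    download: Optional[Callable[[str, str], object]] = None,
) -> Path:
    """Download url into dest_dir unless the file is already there."""
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)  # like wget -P, create the folder if missing
    target = dest_dir / url.rsplit("/", 1)[-1]
    if target.exists():
        print(f"{target.name} already here, skip download")
    else:
        print(f"{target.name} does not exist, start downloading...")
        (download or urlretrieve)(url, str(target))
    return target


def ensure_unzipped(archive: Path, expected_name: str) -> Path:
    """Extract archive next to itself unless expected_name already exists."""
    expected = archive.parent / expected_name
    if expected.exists():
        print(f"{expected_name} already here, skip unzip")
    else:
        print(f"{expected_name} does not exist, start unzipping...")
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(archive.parent)
    return expected
```

With the paths from this notebook, the calls would look like `ensure_downloaded("https://s3.amazonaws.com/cvmlp/vqa/mscoco/vqa/Annotations_Train_mscoco.zip", Path("/content/gdrive/Shared drives/VQA/data"))` followed by `ensure_unzipped(archive, "mscoco_train2014_annotations.json")`.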


# **Preprocessing**