
# **Server and workspace setup**

In [0]:
''' <-- here
# Run this cell ONLY ONCE when connecting
# Once connected, uncomment the first line!
googleDriveAlreadySetUp = False
sshServerAlreadySetUp = False
# '''

## **Step 1. Mount the Google Drive under */content***


---


1. The *`/content`* directory is the notebook's working directory and the place where Google Colab writes files by default. However, it is NOT persistent between connections! (All changes you make here will be erased once disconnected!!)

2. Also, a `cd` issued through a `!` command does not stick: each `!` command runs in its own subshell, so the notebook stays in `/content` and never moves to any other folder (inside or outside `/content`)!

   ```
        ! pwd >>> /content
        ! cd ../ | pwd >>> /content
        ! cd ~ | pwd >>> /content
        ! cd / | pwd >>> /content
        ! cd /content/gdrive | pwd >>> /content
   ```



3. So, all newly downloaded/created files & folders **(through "`!`-commands" in this notebook interface without an absolute path)** automatically go to *`/content`*!!


4. However, if you are **accessing the server using ssh** (see Step 2), you can make changes in 2 places: 
        a) /content
        b) your google drive folder (wherever it is mounted to)
        
5. The conclusion: 

    **a) only make changes in /content/gdrive**

    **b) always use absolute path in Colab notebook**
    
6. For shared drives: everything is the same as `gdrive/My\ Drive`.
    - we expect some file-lock feature to prevent simultaneous editing
    - but we haven't experimented with that yet (#learning_through_errors)
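
The behavior described in point 2 can be reproduced outside Colab. The sketch below is only an illustration, assuming a POSIX shell is available; it is not how Colab itself is implemented. A `cd` run in a child shell leaves the parent's working directory untouched, while `os.chdir` changes it for the Python process itself.

```python
import os
import subprocess
import tempfile

start_dir = os.getcwd()

# A `cd` inside a child shell only affects that child process.
child_output = subprocess.run(
    "cd / && pwd", shell=True, capture_output=True, text=True
).stdout.strip()
after_child = os.getcwd()
print("child shell saw:", child_output)
print("parent is still in:", after_child)

# os.chdir changes the working directory of the Python process itself.
target_dir = os.path.realpath(tempfile.mkdtemp())
os.chdir(target_dir)
after_chdir = os.getcwd()
print("after os.chdir:", after_chdir)

os.chdir(start_dir)  # restore the original directory
```

Even with this in mind, absolute paths remain the safest choice in shell commands, as concluded above.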

In [2]:
if not googleDriveAlreadySetUp:
    from google.colab import drive
    drive.mount('/content/gdrive')
    googleDriveAlreadySetUp = True

Go to this URL in a browser: https://accounts.google.com/o/oauth2/auth?client_id=947318989803-6bn6qk8qdgf4n4g3pfee6491hc0brc4i.apps.googleusercontent.com&redirect_uri=urn%3Aietf%3Awg%3Aoauth%3A2.0%3Aoob&scope=email%20https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdocs.test%20https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdrive%20https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdrive.photos.readonly%20https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fpeopleapi.readonly&response_type=code

Enter your authorization code:
··········
Mounted at /content/gdrive


## **Step 2. Server setup**


---


Notes: 

1. Once set up, open a terminal of your choice, run ***`ssh root@0.tcp.ngrok.io -p [port#]`***, and enter the generated random password. You're in!

2. Sometimes you will encounter the following error message:
    ```
    Traceback (most recent call last):
    File "<string>", line 1, in <module>
    IndexError: list index out of range
    ```
   This may (or may not) be because of the following line:
   ```
   ! curl -s http://localhost:4040/api/tunnels | python3 -c \
        "import sys, json; print(json.load(sys.stdin)['tunnels'][0]['public_url'])"
   ```
   
    *#TODO rewrite this line with python, not linux command.*
   
3. Even so, the connection has been set up. To see the port number, just run the cell again.
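
The TODO in note 2 could be addressed roughly as follows. This is a sketch, not a tested replacement for the cell below: it assumes the `IndexError` comes from an empty `tunnels` list (for example because ngrok has not registered the tunnel yet), and the helper names, retry count, and delay are arbitrary choices made for illustration.

```python
import json
import time
import urllib.request

NGROK_API = "http://localhost:4040/api/tunnels"  # same endpoint as the curl command


def fetch_tunnels(url=NGROK_API):
    """Return the parsed JSON from ngrok's local API."""
    with urllib.request.urlopen(url, timeout=5) as response:
        return json.load(response)


def first_public_url(payload):
    """Return the first tunnel's public_url, or None if no tunnel is listed yet."""
    tunnels = payload.get("tunnels", [])
    return tunnels[0]["public_url"] if tunnels else None


def wait_for_public_url(fetch=fetch_tunnels, retries=5, delay=1.0):
    """Poll until a tunnel shows up; give up after `retries` attempts."""
    for _ in range(retries):
        try:
            url = first_public_url(fetch())
        except OSError:  # local API not reachable yet
            url = None
        if url:
            return url
        time.sleep(delay)
    return None
```

In the notebook, `print(wait_for_public_url())` would then stand in for the `curl ... | python3 -c ...` line; it prints `None` instead of raising when no tunnel is found.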

In [3]:
if not sshServerAlreadySetUp:
    #Generate root password
    import random, string
    global password
    password = ''.join(random.choice(string.ascii_letters + string.digits) for i in range(20))

    #Download ngrok
    ! wget -q -c -nc https://bin.equinox.io/c/4VmDzA7iaHb/ngrok-stable-linux-amd64.zip
    ! unzip -qq -n ngrok-stable-linux-amd64.zip
    #Setup sshd
    ! apt-get install -qq -o=Dpkg::Use-Pty=0 openssh-server pwgen > /dev/null
    #Set root password
    ! echo root:$password | chpasswd
    ! mkdir -p /var/run/sshd
    ! echo "PermitRootLogin yes" >> /etc/ssh/sshd_config
    ! echo "PasswordAuthentication yes" >> /etc/ssh/sshd_config
    ! echo "LD_LIBRARY_PATH=/usr/lib64-nvidia" >> /root/.bashrc
    ! echo "export LD_LIBRARY_PATH" >> /root/.bashrc

    #Run sshd
    get_ipython().system_raw('/usr/sbin/sshd -D &')

    #Ask token
    print("Copy authtoken from https://dashboard.ngrok.com/auth")
    import getpass
    authtoken = getpass.getpass()

    #Create tunnel
    get_ipython().system_raw('./ngrok authtoken $authtoken && ./ngrok tcp 22 &')
    #Print root password
    print("Root password: {}".format(password))
    #Get public address
    ! curl -s http://localhost:4040/api/tunnels | python3 -c \
        "import sys, json; print(json.load(sys.stdin)['tunnels'][0]['public_url'])"
    
    # finished setup
    print("ssh server already setup, do ssh root@0.tcp.ngrok.io -p [port# see below] ")
    sshServerAlreadySetUp = True  # still reached when the curl line above prints an IndexError, since a failing `!` command does not raise in Python
    
else:
    print("ssh server already setup, do ssh root@0.tcp.ngrok.io -p [port# see below] ")   
    print("password", password)
    ! curl -s http://localhost:4040/api/tunnels | python3 -c \
        "import sys, json; print(json.load(sys.stdin)['tunnels'][0]['public_url'])"


Creating config file /etc/ssh/sshd_config with new version
Creating SSH2 RSA key; this may take some time ...
2048 SHA256:xe5wLAl3tfONPBWQbqSGbTjTannrjOepHQUt3Diqk2c root@fb37cdd7beaf (RSA)
Creating SSH2 ECDSA key; this may take some time ...
256 SHA256:YG+hg/rGCfc7q4cw8PEQAZ1ZIomLmyfsVmS/0ZplLVM root@fb37cdd7beaf (ECDSA)
Creating SSH2 ED25519 key; this may take some time ...
256 SHA256:4iVWYktqjC27Wmb7mRr1bxzyUfasECitYPFxYUADP1U root@fb37cdd7beaf (ED25519)
Created symlink /etc/systemd/system/sshd.service → /lib/systemd/system/ssh.service.
Created symlink /etc/systemd/system/multi-user.target.wants/ssh.service → /lib/systemd/system/ssh.service.
invoke-rc.d: could not determine current runlevel
invoke-rc.d: policy-rc.d denied execution of start.
Copy authtoken from https://dashboard.ngrok.com/auth
··········
Root password: ZL48TWilYmcmrGFtH371
tcp://0.tcp.ngrok.io:14576
ssh server already setup, do ssh root@0.tcp.ngrok.io -p [port# see below] 


## Optional git/github/ssh-keygen/ssh-add setup


- Once only, run `ssh-keygen`, save the private key as `/content/gdrive/My\ Drive/.ssh/id_rsa`, and add the public key in your GitHub settings.

- Every time you connect to the server:
    1. run `eval $(ssh-agent)`
    2. run `ssh-add /content/gdrive/My\ Drive/.ssh/id_rsa`
    3. set up the local username & email (every time if in a Shared Drive (multiple users), just once if in a personal drive):
```
     git config user.name "your-user-name"
     git config user.email "your-email-addr"
```



In [0]:
# Note: the agent's environment variables do not outlive this `!` subshell.
! eval $(ssh-agent)
! cat /content/gdrive/My\ Drive/.ssh/id_rsa.pub
# Run the following command via ssh server to avoid "Could not open a connection to your authentication agent."
# ! ssh-add /content/gdrive/My\ Drive/.ssh/id_rsa

## **Step 3. Workspace Setup and Raw File Preparation**


---



Directory path: /content/gdrive/Shared\ drives/VQA

We (Shikhar and Nuan) decided to only work on VQA v2 with only open-ended answers.


---


Notes on file system:
1. `wget -P [dir]` specifies the directory to download to; the directory is created if it does not exist.

2. We have to use absolute paths, since a `cd` in a `!` command does not change the current directory of the Colab notebook (running `pwd` always returns `/content`).


---


Notes on data collection:


1. Every image has several free-form natural-language questions with 10 concise open-ended answers each.

2. The annotations released by the VQA team are the result of the following post-processing steps on the raw crowdsourced data:
    - Spelling correction (using Bing Speller) of question and answer strings
    - Question normalization (first char uppercase, last char ‘?’)
    - Answer normalization (all chars lowercase, no period except as decimal point, number words -> digits, strip articles (a, an, the))
    - Adding an apostrophe if a contraction is missing it (e.g., convert "dont" to "don't")
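
To make the answer-normalization rules concrete, here is a rough sketch of how they might be applied to a single answer string. It is not the VQA team's actual code: the function name and the small lookup tables are hypothetical, and spelling correction is left out.

```python
import re

# Hypothetical, deliberately tiny lookup tables for illustration.
NUMBER_WORDS = {"zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
                "five": "5", "six": "6", "seven": "7", "eight": "8",
                "nine": "9", "ten": "10"}
CONTRACTIONS = {"dont": "don't", "cant": "can't", "isnt": "isn't"}
ARTICLES = {"a", "an", "the"}


def normalize_answer(answer):
    """Lowercase, drop non-decimal periods, map number words, fix contractions, strip articles."""
    text = answer.lower().strip()
    # Remove a period unless a digit follows it (keeps "3.5" intact).
    text = re.sub(r"\.(?!\d)", "", text)
    words = []
    for word in text.split():
        word = NUMBER_WORDS.get(word, word)
        word = CONTRACTIONS.get(word, word)
        if word not in ARTICLES:
            words.append(word)
    return " ".join(words)


print(normalize_answer("The Two dogs."))
print(normalize_answer("Dont know"))
print(normalize_answer("3.5 Meters."))
```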


---

Notes on folder structure:



```
root@addd4560b196:/content/gdrive/Shared drives/VQA# tree -a -C -L 4 ./
./
├── data
│   ├── datahelper (https://github.com/GT-Vision-Lab/VQA)
│   │   └── ...
│   ├── raw zip
│   │   ├── Annotations_Train_abstract_v002.zip
│   │   └── Questions_Train_abstract_v002.zip
│   └── abstract scene
│       └── train
│           ├── abstract_v002_train2015_annotations.json
│           ├── MultipleChoice_abstract_v002_train2015_questions.json
│           ├── OpenEnded_abstract_v002_train2015_questions.json
│           └── images
│               ├── abstract_v002_train2015_000000000000.png
│               ├── abstract_v002_train2015_000000000001.png
│               ├── ...
│               ├── abstract_v002_train2015_000000019998.png
│               └── abstract_v002_train2015_000000019999.png
└── VQA_baseline_with_notes.ipynb

39 directories, 43 files

```




In [0]:
# Abstract Scene (same as v1.0 release)
print("=============================================================================================")
print("\nCollecting raw training data for abstract scenes...\n")
print("---------------------------------------------------------------------------------------------")

'''Training Annotations'''

# check if Annotations_Train_abstract_v002.zip is downloaded, if not, download it
! test -f /content/gdrive/Shared\ drives/VQA/data/raw\ zip/Annotations_Train_abstract_v002.zip \
    && echo "Annotations_Train_abstract_v002.zip already here, skip download" \
    || { echo "Annotations_Train_abstract_v002.zip does not exist, start downloading..."; \
         wget https://s3.amazonaws.com/cvmlp/vqa/abstract_v002/vqa/Annotations_Train_abstract_v002.zip \
                -P /content/gdrive/Shared\ drives/VQA/data/raw\ zip;}

# check if Annotations_Train_abstract_v002.zip is unzipped, if not, unzip it
! test -f /content/gdrive/Shared\ drives/VQA/data/abstract\ scene/train/abstract_v002_train2015_annotations.json \
    && echo "abstract_v002_train2015_annotations.json already here, skip unzip" \
    || { echo "abstract_v002_train2015_annotations.json does not exist, start unzipping..."; \
         unzip /content/gdrive/Shared\ drives/VQA/data/raw\ zip/Annotations_Train_abstract_v002.zip \
               -d /content/gdrive/Shared\ drives/VQA/data/abstract\ scene/train;}

''' Training Questions '''
# check if Questions_Train_abstract_v002.zip is downloaded, if not, download it
! test -f /content/gdrive/Shared\ drives/VQA/data/raw\ zip/Questions_Train_abstract_v002.zip \
    && echo "Questions_Train_abstract_v002.zip already here, skip download" \
    || { echo "Questions_Train_abstract_v002.zip does not exist, start downloading..."; \
         wget https://s3.amazonaws.com/cvmlp/vqa/abstract_v002/vqa/Questions_Train_abstract_v002.zip \
                -P /content/gdrive/Shared\ drives/VQA/data/raw\ zip;}

# check if Questions_Train_abstract_v002.zip is unzipped, if not, unzip it
! test -f /content/gdrive/Shared\ drives/VQA/data/abstract\ scene/train/OpenEnded_abstract_v002_train2015_questions.json \
    && echo "OpenEnded(MultipleChoice)_abstract_v002_train2015_questions.json already here, skip unzip" \
    || { echo "OpenEnded(MultipleChoice)_abstract_v002_train2015_questions.json does not exist, start unzipping..."; \
         unzip /content/gdrive/Shared\ drives/VQA/data/raw\ zip/Questions_Train_abstract_v002.zip \
               -d /content/gdrive/Shared\ drives/VQA/data/abstract\ scene/train;}

''' Training Images '''
# check if scene_img_abstract_v002_train2015.zip is downloaded, if not, download it
! test -f /content/gdrive/Shared\ drives/VQA/data/raw\ zip/scene_img_abstract_v002_train2015.zip \
    && echo "scene_img_abstract_v002_train2015.zip already here, skip download" \
    || { echo "scene_img_abstract_v002_train2015.zip does not exist, start downloading..."; \
         wget https://s3.amazonaws.com/cvmlp/vqa/abstract_v002/scene_img/scene_img_abstract_v002_train2015.zip \
                -P /content/gdrive/Shared\ drives/VQA/data/raw\ zip;}

# check if scene_img_abstract_v002_train2015.zip is unzipped, if not, unzip it
! test -f /content/gdrive/Shared\ drives/VQA/data/abstract\ scene/train/images/abstract_v002_train2015_000000000001.png \
    && echo "abstract scene/train/images already here, skip unzip" \
    || { echo "abstract scene/train/images does not exist, start unzipping..."; \
         unzip /content/gdrive/Shared\ drives/VQA/data/raw\ zip/scene_img_abstract_v002_train2015.zip \
               -d /content/gdrive/Shared\ drives/VQA/data/abstract\ scene/train/images;}
print()

print("raw data for abstract training collected.")

print("---------------------------------------------------------------------------------------------")

Similarly, MSCOCO_2014 ("balanced real images") datasets are collected.
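
The "skip if already there" checks in the cell above can also be expressed in Python, which avoids the escaped spaces in the paths. The following is a minimal sketch rather than the code used in this notebook; the two helper names are made up for illustration.

```python
import urllib.request
import zipfile
from pathlib import Path


def ensure_downloaded(url, dest_dir):
    """Download url into dest_dir unless the file is already there."""
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)  # like `wget -P`
    target = dest_dir / url.rsplit("/", 1)[-1]
    if target.is_file():
        print(f"{target.name} already here, skip download")
    else:
        print(f"{target.name} does not exist, start downloading...")
        urllib.request.urlretrieve(url, str(target))
    return target


def ensure_unzipped(zip_path, marker_file, dest_dir):
    """Extract zip_path into dest_dir unless marker_file already exists."""
    marker_file = Path(marker_file)
    if marker_file.is_file():
        print(f"{marker_file.name} already here, skip unzip")
        return False
    print(f"{marker_file.name} does not exist, start unzipping...")
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)
    return True
```

A call such as `ensure_downloaded("https://s3.amazonaws.com/cvmlp/vqa/abstract_v002/vqa/Annotations_Train_abstract_v002.zip", "/content/gdrive/Shared drives/VQA/data/raw zip")` would mirror the first shell command of the cell.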

# **Preprocessing**


---


Goal: 

To write a script that converts the VQA dataset into the following format (in a .txt file):

> img_path \t question \t answer

> (no space between each string)



Note that img_path should preferably contain only the file name (e.g. img_1.jpg) instead of the entire path (/home/axe/.../img_1.jpg).
Also, you may write the question & answer in comma-separated style (e.g. where,is,he,?).
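
As an illustration of the target format, one output line could be produced like this. It is only a sketch: the helper names, the lowercasing, the tokenization rule, and the example path are assumptions, not part of the task description.

```python
import os
import re


def tokenize(text):
    """Lowercase text and split it into word and punctuation tokens."""
    return re.findall(r"[\w']+|[^\w\s]", text.lower())


def format_line(img_path, question, answer):
    """Build one 'img_name<TAB>question<TAB>answer' line with comma-separated tokens."""
    img_name = os.path.basename(img_path)  # keep only the file name
    return "\t".join([img_name,
                      ",".join(tokenize(question)),
                      ",".join(tokenize(answer))])


# Hypothetical path, for illustration only.
print(format_line("/home/axe/data/img_1.jpg", "Where is he?", "soccer ball"))
```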

In [0]:
# https://stackoverflow.com/questions/67631/how-to-import-a-module-given-the-full-path

# For Python 3.5+ use:

# import importlib.util
# spec = importlib.util.spec_from_file_location("module.name", "/path/to/file.py")
# foo = importlib.util.module_from_spec(spec)
# spec.loader.exec_module(foo)
# foo.MyClass()

# MODULE_PATH = "/content/gdrive/Shared drives/VQA/data/datahelper/PythonHelperTools/vqaTools/__init__.py"
# MODULE_NAME = "vqaTools"
# import importlib
# import sys
# spec = importlib.util.spec_from_file_location(MODULE_NAME, MODULE_PATH)
# vqaTools = importlib.util.module_from_spec(spec)
# sys.modules[spec.name] = vqaTools 
# spec.loader.exec_module(vqaTools)

# import sys
# sys.path.append("/content/gdrive/Shared drives/VQA/data/datahelper/PythonHelperTools")  # parent folder of the vqaTools package
# from vqaTools.vqa import VQA


In [0]:
'''/content/gdrive/Shared drives/VQA/data/datahelper/PythonHelperTools/vqaTools was written for Python 2.7 and needs to be modified for Python 3.7.'''
# Done: VQA ported from Python 2.7 to Python 3.7.

# __author__ = 'aagrawal'
# https://github.com/GT-Vision-Lab/VQA/blob/master/PythonHelperTools/vqaTools/vqa.py

# Interface for accessing the VQA dataset.

# This code is based on the code written by Tsung-Yi Lin for MSCOCO Python API available at the following link: 
# (https://github.com/pdollar/coco/blob/master/PythonAPI/pycocotools/coco.py).

# The following functions are defined:
#  VQA        - VQA class that loads VQA annotation file and prepares data structures.
#  getQuesIds - Get question ids that satisfy given filter conditions.
#  getImgIds  - Get image ids that satisfy given filter conditions.
#  loadQA     - Load questions and answers with the specified question ids.
#  showQA     - Display the specified questions and answers.
#  loadRes    - Load result file and create result object.

# Help on each function can be accessed by: "help(VQA.function)"

import json
import datetime
import copy

class VQA:
    def __init__(self, annotation_file=None, question_file=None):
        """
           Constructor of VQA helper class for reading and visualizing questions and answers.
        :param annotation_file (str): location of VQA annotation file
        :param question_file (str): location of VQA question file
        :return:
        """
        # load dataset
        self.dataset = {}
        self.questions = {}
        self.qa = {}
        self.qqa = {}
        self.imgToQA = {}
        if annotation_file is not None and question_file is not None:
            print('loading VQA annotations and questions into memory...')
            time_t = datetime.datetime.utcnow()
            # Use context managers so the JSON files are closed after loading.
            with open(annotation_file, 'r') as f:
                dataset = json.load(f)
            with open(question_file, 'r') as f:
                questions = json.load(f)
            print(datetime.datetime.utcnow() - time_t)
            self.dataset = dataset
            self.questions = questions
            self.createIndex()

    def createIndex(self):
        # create index
        print('creating index...')
        imgToQA = {ann['image_id']: [] for ann in self.dataset['annotations']}
        qa =  {ann['question_id']:       [] for ann in self.dataset['annotations']}
        qqa = {ann['question_id']:       [] for ann in self.dataset['annotations']}
        for ann in self.dataset['annotations']:
            imgToQA[ann['image_id']] += [ann]
            qa[ann['question_id']] = ann
        for ques in self.questions['questions']:
            qqa[ques['question_id']] = ques
        print('index created!')

        # create class members
        self.qa = qa
        self.qqa = qqa
        self.imgToQA = imgToQA

    def info(self):
        """
        Print information about the VQA annotation file.
        :return:
        """
        for key, value in self.dataset['info'].items():
            print('%s: %s'%(key, value))

    def getQuesIds(self, imgIds=[], quesTypes=[], ansTypes=[]):
        """
        Get question ids that satisfy given filter conditions. default skips that filter
        :param imgIds    (int array)   : get question ids for given imgs
               quesTypes (str array)   : get question ids for given question types
               ansTypes  (str array)   : get question ids for given answer types
        :return: ids     (int array)   : integer array of question ids
        """
        imgIds    = imgIds    if type(imgIds)    == list else [imgIds]
        quesTypes = quesTypes if type(quesTypes) == list else [quesTypes]
        ansTypes  = ansTypes  if type(ansTypes)  == list else [ansTypes]

        if len(imgIds) == len(quesTypes) == len(ansTypes) == 0:
            anns = self.dataset['annotations']
        else:
            if not len(imgIds) == 0:
                anns = sum([self.imgToQA[imgId] for imgId in imgIds if imgId in self.imgToQA],[])
            else:
                anns = self.dataset['annotations']
            anns = anns if len(quesTypes) == 0 else [ann for ann in anns if ann['question_type'] in quesTypes]
            anns = anns if len(ansTypes)  == 0 else [ann for ann in anns if ann['answer_type'] in ansTypes]
        ids = [ann['question_id'] for ann in anns]
        return ids

    def getImgIds(self, quesIds=[], quesTypes=[], ansTypes=[]):
        """
        Get image ids that satisfy given filter conditions. default skips that filter
        :param quesIds   (int array)   : get image ids for given question ids
               quesTypes (str array)   : get image ids for given question types
               ansTypes  (str array)   : get image ids for given answer types
        :return: ids     (int array)   : integer array of image ids
        """
        quesIds   = quesIds   if type(quesIds)   == list else [quesIds]
        quesTypes = quesTypes if type(quesTypes) == list else [quesTypes]
        ansTypes  = ansTypes  if type(ansTypes)  == list else [ansTypes]

        if len(quesIds) == len(quesTypes) == len(ansTypes) == 0:
            anns = self.dataset['annotations']
        else:
            if not len(quesIds) == 0:
                # self.qa maps a question id to a single annotation dict, so collect the dicts in a list
                anns = [self.qa[quesId] for quesId in quesIds if quesId in self.qa]
            else:
                anns = self.dataset['annotations']
            anns = anns if len(quesTypes) == 0 else [ann for ann in anns if ann['question_type'] in quesTypes]
            anns = anns if len(ansTypes)  == 0 else [ann for ann in anns if ann['answer_type'] in ansTypes]
        ids = [ann['image_id'] for ann in anns]
        return ids

    def loadQA(self, ids=[]):
        """
        Load questions and answers with the specified question ids.
        :param ids (int array)       : integer ids specifying question ids
        :return: qa (object array)   : loaded qa objects
        """
        if type(ids) == list:
            return [self.qa[id] for id in ids]
        elif type(ids) == int:
            return [self.qa[ids]]

    def showQA(self, anns):
        """
        Display the specified annotations.
        :param anns (array of object): annotations to display
        :return: None
        """
        if len(anns) == 0:
            return 0
        for ann in anns:
            quesId = ann['question_id']
            print("Question: %s" %(self.qqa[quesId]['question']))
            for ans in ann['answers']:
                print("Answer %d: %s" %(ans['answer_id'], ans['answer']))
        
    def loadRes(self, resFile, quesFile):
        """
        Load result file and return a result object.
        :param   resFile (str)     : file name of result file
        :param   quesFile (str)    : file name of the question file the results refer to
        :return: res (obj)         : result api object
        """
        res = VQA()
        res.questions = json.load(open(quesFile))
        res.dataset['info'] = copy.deepcopy(self.questions['info'])
        res.dataset['task_type'] = copy.deepcopy(self.questions['task_type'])
        res.dataset['data_type'] = copy.deepcopy(self.questions['data_type'])
        res.dataset['data_subtype'] = copy.deepcopy(self.questions['data_subtype'])
        res.dataset['license'] = copy.deepcopy(self.questions['license'])

        print('Loading and preparing results...')
        time_t = datetime.datetime.utcnow()
        anns   = json.load(open(resFile))
        assert type(anns) == list, 'results is not an array of objects'
        annsQuesIds = [ann['question_id'] for ann in anns]
        assert set(annsQuesIds) == set(self.getQuesIds()), \
        'Results do not correspond to current VQA set. Either the results do not have predictions for all question ids in annotation file or there is at least one question id that does not belong to the question ids in the annotation file.'
        for ann in anns:
            quesId                  = ann['question_id']
            if res.dataset['task_type'] == 'Multiple Choice':
                assert ann['answer'] in self.qqa[quesId]['multiple_choices'], 'predicted answer is not one of the multiple choices'
            qaAnn                = self.qa[quesId]
            ann['image_id']      = qaAnn['image_id'] 
            ann['question_type'] = qaAnn['question_type']
            ann['answer_type']   = qaAnn['answer_type']
        print('DONE (t=%0.2fs)'%((datetime.datetime.utcnow() - time_t).total_seconds()))

        res.dataset['annotations'] = anns
        res.createIndex()
        return res


In [0]:
import argparse

parser = argparse.ArgumentParser(description='Prepare data for balanced real images QA aka COCO')

parser.add_argument('--inp_dir', type=str, help='path to ../balanced real/ directory', required=True)
parser.add_argument('--label_loc', type=str, help='location to store first 1000 most frequent answers', required=True)
parser.add_argument('--dictionary_loc', type=str, help='location to store index2word and word2index dictionaries', required=True)
parser.add_argument('--out_dir', type=str, help='output directory', required=True)

args = parser.parse_args()
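
Note that `parser.parse_args()` without arguments reads the command line of the running process. Inside a notebook that is the kernel's own command line, so the required options are typically reported as missing and the cell exits. A possible workaround, sketched below with hypothetical example paths, is to pass the argument list explicitly while experimenting in the notebook.

```python
import argparse


def build_parser():
    """Same options as the cell above, wrapped in a function for reuse."""
    parser = argparse.ArgumentParser(
        description="Prepare data for balanced real images QA aka COCO")
    parser.add_argument("--inp_dir", type=str, required=True)
    parser.add_argument("--label_loc", type=str, required=True)
    parser.add_argument("--dictionary_loc", type=str, required=True)
    parser.add_argument("--out_dir", type=str, required=True)
    return parser


# Hypothetical paths, for illustration only.
example_argv = ["--inp_dir", "data/balanced real",
                "--label_loc", "labels.txt",
                "--dictionary_loc", "dictionaries.pkl",
                "--out_dir", "out"]
args = build_parser().parse_args(example_argv)
print(args.inp_dir, args.out_dir)
```

When the same code later runs as a standalone script, calling `parse_args()` with no argument falls back to the real command line.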


In [5]:
annFile = "/content/gdrive/Shared drives/VQA/data/abstract scene/train/abstract_v002_train2015_annotations.json"
quesFile = "/content/gdrive/Shared drives/VQA/data/abstract scene/train/OpenEnded_abstract_v002_train2015_questions.json"

# initialize VQA api for QA annotations

vqa=VQA(annFile, quesFile)

loading VQA annotations and questions into memory...
0:00:03.405463
creating index...
index created!


In [6]:
vqa.info()

description: This is v1.0 of the VQA dataset.
url: http://visualqa.org
version: 1.0
year: 2015
contributor: VQA Team
date_created: 2015-10-02 19:48:55


In [7]:
# import pickle

print("type(vqa.dataset):\n\t", type(vqa.dataset), "\n")
print("vqa.dataset.keys():\n\t", vqa.dataset.keys(), "\n")
print("type(vqa.dataset['annotations']):\n\t", type(vqa.dataset['annotations']), "\n")
print("type(vqa.dataset['annotations'][0]):\n\t", type(vqa.dataset['annotations'][0]), "\n")
print("vqa.dataset['annotations'][0].keys():\n\t", vqa.dataset['annotations'][0].keys(), "\n")
print("one example:", vqa.dataset['annotations'][-1])
# most confident &/ most frequent ==> just use the multiple_choice_answer

print("\nnumber of answers:", len(vqa.dataset['annotations']))

# Collect one (image id, question, answer) tuple per annotation.
imgQuesAnsTupList = []
for ann in vqa.dataset['annotations']:
    imgID = ann['image_id']
    quesID = ann['question_id']
    ansStr = ann['multiple_choice_answer']
    imgQuesAnsTupList.append((imgID, vqa.qqa[quesID]['question'], ansStr))

type(vqa.dataset):
	 <class 'dict'> 

vqa.dataset.keys():
	 dict_keys(['info', 'data_type', 'license', 'data_subtype', 'annotations']) 

type(vqa.dataset['annotations']):
	 <class 'list'> 

type(vqa.dataset['annotations'][0]):
	 <class 'dict'> 

vqa.dataset['annotations'][0].keys():
	 dict_keys(['question_type', 'multiple_choice_answer', 'answers', 'image_id', 'answer_type', 'question_id']) 

one example: {'question_type': 'what is the man', 'multiple_choice_answer': 'soccer ball', 'answers': [{'answer': 'soccer ball', 'answer_confidence': 'yes', 'answer_id': 1}, {'answer': 'nothing', 'answer_confidence': 'yes', 'answer_id': 2}, {'answer': 'soccer ball', 'answer_confidence': 'yes', 'answer_id': 3}, {'answer': 'soccer ball', 'answer_confidence': 'yes', 'answer_id': 4}, {'answer': 'soccer ball', 'answer_confidence': 'yes', 'answer_id': 5}, {'answer': 'soccer ball', 'answer_confidence': 'yes', 'answer_id': 6}, {'answer': 'ball', 'answer_confidence': 'yes', 'answer_id': 7}, {'answer': 'soc

In [8]:
from IPython.display import Image, display
import random

# Pick a random (image id, question, answer) tuple from the whole list.
index = random.randrange(len(imgQuesAnsTupList))

print(imgQuesAnsTupList[index])
# Image file names zero-pad the image id to 12 digits.
imgStr = "abstract_v002_train2015_{:012d}.png".format(imgQuesAnsTupList[index][0])
path = "/content/gdrive/Shared drives/VQA/data/abstract scene/train/images/"
display(Image(filename=path+imgStr))


(19025, 'Are the leaves on the tree lighter or darker green than the leaves on the bush?', 'darker')


OSError: ignored
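
The `OSError` above suggests that reading the image file failed; the output does not show why. One way to narrow it down is to build the file name separately and check that the file exists before displaying it. The sketch below assumes the 12-digit zero-padded naming seen in the folder listing; the helper names are hypothetical.

```python
import os

IMAGE_DIR = "/content/gdrive/Shared drives/VQA/data/abstract scene/train/images/"


def image_filename(img_id):
    """File name for an abstract-scene training image, id zero-padded to 12 digits."""
    return "abstract_v002_train2015_{:012d}.png".format(img_id)


def image_path_or_none(img_id, image_dir=IMAGE_DIR):
    """Return the full image path if the file exists, otherwise None."""
    path = os.path.join(image_dir, image_filename(img_id))
    return path if os.path.isfile(path) else None


print(image_filename(19025))
print(image_path_or_none(19025))
```

If the second line prints `None`, the file is not at the expected location, for example because the archive was extracted into a subfolder.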