RecipeQA is a question-answering dataset focused on using multimodal information. In this preliminary experiment we frame the task as a seq2seq problem, so we focus only on the 'textual_cloze' subproblems and ignore visual information for now.


Each 'textual_cloze' example has a context made up of different parts of the recipe. We join these parts into a single sequence with a custom identifier '@context', which marks the boundaries between the parts.
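As a minimal sketch of this concatenation, assuming each context part is a dict with a 'body' field (as in the dataset-loading cell further down), the join and its inverse could look like this. The helper names `join_context` and `split_context` are hypothetical and used only for illustration.

```python
CONTEXT_MARKER = " @context "  # same marker string the notebook uses

def join_context(parts):
    """Concatenate the 'body' text of each context part, separated by the marker."""
    return CONTEXT_MARKER.join(part["body"] for part in parts)

def split_context(sequence):
    """Recover the individual parts from a joined context sequence."""
    return sequence.split(CONTEXT_MARKER)

# Two context parts taken from the first training example
parts = [
    {"body": "Scrape Vanilla Beans and get the seeds into vegetable glycerin"},
    {"body": "Vanilla Beans Seed and Vegetable Glycerin"},
]
joined = join_context(parts)
print(joined)
```

Because the marker is surrounded by spaces, it should survive tokenization as its own token, so the boundaries between parts stay visible to the model.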

Each problem asks the question "Choose the best title for the missing blank to correctly complete the recipe.", with the recipe shown as

"['Ingredients Halal Vanilla Extract', 'Scrape Vanilla Beans', 'Vegetable Glycerin and Vanilla Beans', '@placeholder']", where @placeholder shows the missing blank.

Four options are given for the blank:

['How to Make Caffe Mocha', 'How to Make Koki Paratha', 'Vanilla Beans Can Use With Vegetable Glycerin', 'Prepare the Dough']

We represent the question sequence using two custom identifiers: '@q_pad' separates the recipe titles and '@c_pad' separates the choices. The question above therefore becomes

Ingredients Halal Vanilla Extract @q_pad Scrape Vanilla Beans @q_pad Vegetable Glycerin and Vanilla Beans @q_pad @placeholder @c_pad How to Make Caffe Mocha @c_pad How to Make Koki Paratha @c_pad Vanilla Beans Can Use With Vegetable Glycerin @c_pad Prepare the Dough
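The transformation can be reproduced with a few lines of plain Python. This is a small sketch using the example above; the function name `build_question_sequence` is hypothetical.

```python
Q_PAD = " @q_pad "  # separates the recipe titles
C_PAD = " @c_pad "  # separates the candidate choices

def build_question_sequence(titles, choices):
    """Join titles with @q_pad, then append the choices joined with @c_pad."""
    return Q_PAD.join(titles) + C_PAD + C_PAD.join(choices)

titles = ["Ingredients Halal Vanilla Extract", "Scrape Vanilla Beans",
          "Vegetable Glycerin and Vanilla Beans", "@placeholder"]
choices = ["How to Make Caffe Mocha", "How to Make Koki Paratha",
           "Vanilla Beans Can Use With Vegetable Glycerin", "Prepare the Dough"]

sequence = build_question_sequence(titles, choices)
print(sequence)
```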

Note: Since the question_text in all 'textual_cloze' tasks is always
"Choose the best title for the missing blank to correctly complete the recipe.", we don't represent it in the question sequence.


We model this as a seq2seq-style task. The first RNN reads the context sequence. The second RNN reads the question sequence, and we use its last hidden state to classify into the values [0, 1, 2, 3], each representing a choice index. A useful model should score above the 25% accuracy expected from random guessing.
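For reference, the chance-level numbers for a four-way classifier can be worked out directly. The short sketch below computes the random-guessing accuracy and the cross-entropy loss of a classifier that assigns equal probability to every choice; a training loss that stays close to this value suggests the model's predictions are still near uniform.

```python
import math

NUM_CHOICES = 4

# Random guessing picks the right choice one time in four
chance_accuracy = 1.0 / NUM_CHOICES

# Cross-entropy of a uniform prediction: -ln(1/4) = ln(4)
chance_loss = -math.log(chance_accuracy)

print(f"chance accuracy: {chance_accuracy:.2%}")  # 25.00%
print(f"chance loss: {chance_loss:.3f}")          # 1.386
```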

In practice the model only memorizes the training data: when it is run for more epochs, training accuracy increases but validation accuracy does not.

In [1]:
import torch
import json
from torchtext import data
import torch.nn as nn
import torch.optim as optim
import time

Download the dataset (if not already downloaded)

In [2]:
%%bash
FILE=/content/train.json
if [ ! -f "$FILE" ]; then
  wget -c "https://vision.cs.hacettepe.edu.tr/files/recipeqa/train.json"
fi
file=/content/val.json
if [ ! -f "$file" ]; then
  wget -c "https://vision.cs.hacettepe.edu.tr/files/recipeqa/val.json"
fi
file=/content/test.json
if [ ! -f "$file" ]; then
  wget -c "https://vision.cs.hacettepe.edu.tr/files/recipeqa/test.json"
fi

# Create Dataset and Iterators

In [3]:
train_file = "/content/train.json"
test_file = "/content/test.json"
val_file = "/content/val.json"


# Padding for dataset
context_sequence_pad = ' @context '
question_sequence_pad = ' @q_pad '
choice_list_pad = ' @c_pad '

# Select the 'textual_cloze' examples and build the marker-separated sequences
VISUAL_TASKS = ('visual_coherence', 'visual_cloze', 'visual_ordering')

def get_examples(file):
    with open(file) as f:
        records = json.load(f)['data']

    examples = []
    for record in records:
        # Skip the visual tasks; only 'textual_cloze' remains
        if record['task'] in VISUAL_TASKS:
            continue
        examples.append({
            'context': context_sequence_pad.join(part['body'] for part in record['context']),
            'question': question_sequence_pad.join(record['question']) + choice_list_pad
                        + choice_list_pad.join(record['choice_list']),
            'answer': record['answer'],
        })
    return examples

train_examples = get_examples(train_file)
test_examples = get_examples(test_file)
val_examples = get_examples(val_file)

# Defining fields for the context, question and answer
context = data.Field(sequential=True, tokenize='spacy', init_token='<sos>', eos_token='<eos>')
question = data.Field(sequential=True, tokenize='spacy', init_token='<sos>', eos_token='<eos>')
answer = data.LabelField(is_target=True, preprocessing=lambda x: str(x), tokenize='spacy',sequential=False)
fields = [('context', context), ('question', question), ('answer', answer)]

# creating datasets
train_Examples = [data.Example.fromlist([i['context'], i['question'], i['answer']], fields) for i in train_examples]
train_dataset = data.Dataset(train_Examples, fields)
test_Examples = [data.Example.fromlist([i['context'], i['question'], i['answer']], fields) for i in test_examples]
test_dataset = data.Dataset(test_Examples, fields)
val_Examples = [data.Example.fromlist([i['context'], i['question'], i['answer']], fields) for i in val_examples]
val_dataset = data.Dataset(val_Examples, fields)


#Build Vocabs
context.build_vocab(train_dataset,min_freq = 2,max_size = 30000,vectors = "glove.6B.100d", 
                 unk_init = torch.Tensor.normal_)
question.build_vocab(train_dataset, min_freq = 2,max_size = 6000,vectors = "glove.6B.100d", 
                 unk_init = torch.Tensor.normal_)
answer.build_vocab(train_dataset)

# build iterators
BATCH_SIZE = 128
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
train_iterator, test_iterator, val_iterator = data.BucketIterator.splits((train_dataset, test_dataset, val_dataset),
                                                                         batch_size=BATCH_SIZE,
                                                                         sort_key=lambda x: len(x.context),
                                                                         sort_within_batch=True,device= device)

100%|█████████▉| 399242/400000 [00:15<00:00, 25770.61it/s]

In [4]:
print(train_examples[0])

{'context': '3 until 5 whole vanilla beans250 gram of vegetable glycerin food gradeEvery 100 gram of vanilla beans have 35 until 40 of whole vanilla beans @context Scrape Vanilla Beans and get the seeds into vegetable glycerin @context Vanilla Beans Seed and Vegetable Glycerin @context Whole Vanilla Beans put in a bottle with seeds and vegetable glycerin', 'question': 'Ingredients Halal Vanilla Extract @q_pad Scrape Vanilla Beans @q_pad Vegetable Glycerin and Vanilla Beans @q_pad @placeholder @c_pad How to Make Caffe Mocha @c_pad How to Make Koki Paratha @c_pad Vanilla Beans Can Use With Vegetable Glycerin @c_pad Prepare the Dough', 'answer': 2}


In [5]:
print(len(train_examples))

7837
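The example count by itself says nothing about how the four answer indices are distributed. One quick check, sketched here with a hypothetical helper `majority_baseline` and made-up toy labels, is the accuracy of always predicting the most frequent answer. It gives a second reference point next to the 25% random-guessing figure.

```python
from collections import Counter

def majority_baseline(answers):
    """Return the most frequent label and the accuracy of always predicting it."""
    counts = Counter(answers)
    label, count = counts.most_common(1)[0]
    return label, count / len(answers)

# Toy labels for illustration only; these are not taken from the dataset
toy_answers = [2, 0, 2, 1, 3, 2, 0, 1]
label, acc = majority_baseline(toy_answers)
print(label, acc)
```

In this notebook it could be called as `majority_baseline([ex['answer'] for ex in train_examples])`.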


# Define Model

In [6]:
class Encoder(nn.Module):
  def __init__(self,context_dim,emb_dim,hid_dim,n_layers,dropout,bidirectional):
    super().__init__()
    self.hid_dim = hid_dim
    
    self.embedding = nn.Embedding(context_dim,emb_dim)

    self.rnn = nn.LSTM(input_size=emb_dim,hidden_size = hid_dim,num_layers= n_layers,dropout= dropout,bidirectional = bidirectional)

    self.dropout = nn.Dropout(dropout)
  
  def forward(self,context):
    
    embedded = self.dropout(self.embedding(context))

    outputs, (hidden,cell_state) = self.rnn(embedded)

    return hidden,cell_state


In [7]:
class Decoder(nn.Module):
  def __init__(self,question_dim,emb_dim,hid_dim,n_layers,bidirectional,dropout):
    super().__init__()
    self.hid_dim = hid_dim

    self.embedded = nn.Embedding(question_dim,emb_dim)

    self.rnn = nn.LSTM(input_size=emb_dim,hidden_size = hid_dim,num_layers= n_layers,dropout= dropout,bidirectional = bidirectional)

    self.dropout = nn.Dropout(dropout)
    

  def forward(self, question, hidden, cell_state):
    # The task is not sequence prediction: we only need a representation of the
    # 'question' sequence, so we pass the whole sequence at once, not step by step.

    input = self.dropout(self.embedded(question))

    # Start from the encoder's final state so the question is read in light of the context
    outputs, (hidden, cell_state) = self.rnn(input, (hidden, cell_state))

    return outputs


In [8]:
class Seq2Seq(nn.Module):
  def __init__(self,context_dim,question_dim,answer_dim,emb_dim,hid_dim,n_layers,bidirectional,dropout):
    super().__init__()

    self.encoder = Encoder(context_dim,emb_dim,hid_dim,n_layers,dropout,bidirectional)

    self.decoder = Decoder(question_dim,emb_dim,hid_dim,n_layers,bidirectional,dropout)

    self.no_of_directions= 2 if bidirectional else 1

    self.fc_out = nn.Linear(self.no_of_directions*hid_dim,answer_dim)

  def forward(self,context,question):

    encoder_hidden,encoder_cell_state = self.encoder(context)

    decoder_outputs = self.decoder(question,encoder_hidden,encoder_cell_state)

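    # Output at the final time step: [batch_size, no_of_directions*hid_dim]
    # (no squeeze, which would drop the batch dimension for a batch of size 1)
    decoder_output = decoder_outputs[-1]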

    output = self.fc_out(decoder_output)

    return output
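One caveat with taking the output at the last time step: the iterator buckets batches by context length, so question sequences within a batch are padded to a common length, and for shorter questions the last time step may be a padding token. A possible remedy, sketched below, is to pack the padded batch so the LSTM stops at each sequence's true length. This is an illustration under assumptions, not what the notebook does: it assumes the true lengths are available (for example via `include_lengths=True` on the legacy torchtext `Field`) and a unidirectional LSTM. The helper name `last_valid_hidden` is hypothetical.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence

def last_valid_hidden(rnn, embedded, lengths, state=None):
    """Return the LSTM hidden state at each sequence's last real (non-pad) token.

    embedded: [seq_len, batch_size, emb_dim] padded embeddings
    lengths:  [batch_size] true sequence lengths (assumed to be available)
    state:    optional (hidden, cell) initial state, e.g. from an encoder
    """
    packed = pack_padded_sequence(embedded, lengths.cpu(), enforce_sorted=False)
    _, (hidden, _) = rnn(packed, state)
    return hidden[-1]  # last layer: [batch_size, hid_dim]

# Toy check with random inputs (sizes chosen only for illustration)
torch.manual_seed(0)
rnn = nn.LSTM(input_size=4, hidden_size=8)
embedded = torch.randn(5, 3, 4)       # 5 time steps, batch of 3
lengths = torch.tensor([5, 3, 2])     # true lengths before padding
summary = last_valid_hidden(rnn, embedded, lengths)
print(summary.shape)                  # torch.Size([3, 8])
```

The returned tensor could then be passed to the final linear layer in place of the output at the last (possibly padded) time step.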

# Initialize model, optimizer and Loss

In [9]:
context_dim = len(context.vocab)
question_dim = len(question.vocab)
answer_dim = len(answer.vocab)

print(context_dim)
print(question_dim)
print(answer_dim)

emb_dim = 100
hid_dim = 256
n_layers = 1
bidirectional = False
dropout = 0.5

model = Seq2Seq(context_dim,question_dim,answer_dim,emb_dim,hid_dim,n_layers,bidirectional,dropout).to(device)

def init_weights(m):
  for name,param in m.named_parameters():
    nn.init.uniform_(param.data, -0.08, 0.08)

# model.apply(init_weights)


# Optimizer
optimizer = optim.Adam(model.parameters())

# Loss
criterion = nn.CrossEntropyLoss()

# Accuracy
def accuracy(predictions, answers):
  _, predictions = torch.max(predictions,1)
  correct = (predictions == answers).float()
  acc = correct.sum()/len(correct)

  return acc

30004
6004
4


  "num_layers={}".format(dropout, num_layers))


# Train and Eval loop

In [10]:
def train(model, iterator,optimizer,criterion,clip):
  model.train()

  epoch_loss = 0
  epoch_acc = 0

  for i, batch in enumerate(iterator):
    context = batch.context
    question = batch.question
    answer = batch.answer

    optimizer.zero_grad()

    output = model(context,question)
    
    # answer = answer.t().squeeze()

    loss = criterion(output,answer)
    acc_ = accuracy(output,answer)

    loss.backward()

    # torch.nn.utils.clip_grad_norm_(model.parameters(), clip)

    optimizer.step()

    epoch_loss += loss.item()
    epoch_acc += acc_.item()

  return epoch_loss/len(iterator) , epoch_acc/len(iterator)

def evaluate(model,iterator,criterion):

  model.eval()

  epoch_loss = 0
  epoch_acc = 0

  with torch.no_grad():
    for i,batch in enumerate(iterator):
      context = batch.context
      question = batch.question
      answer = batch.answer

      output = model(context,question)

      # answer = answer.t().squeeze()

      loss = criterion(output,answer)
      acc_ = accuracy(output,answer)

      epoch_loss += loss.item()
      epoch_acc += acc_.item()

  return epoch_loss/len(iterator), epoch_acc/len(iterator)

def epoch_time(start_time,end_time):
  elapsed_time = end_time - start_time
  elapsed_mins = int(elapsed_time/60)
  elapsed_secs = int(elapsed_time - (elapsed_mins*60))
  return elapsed_mins,elapsed_secs

# Training

In [11]:
N_EPOCHS = 10
CLIP = 1

best_valid_loss = float('inf')

for epoch in range(N_EPOCHS):

  start_time = time.time()

  train_loss,train_acc = train(model,train_iterator,optimizer,criterion,CLIP)
  test_loss,test_acc = evaluate(model,test_iterator,criterion)
  val_loss,val_acc = evaluate(model,val_iterator,criterion)

  end_time = time.time()

  epoch_mins, epoch_secs = epoch_time(start_time, end_time)

  if val_loss < best_valid_loss:
    best_valid_loss = val_loss
    torch.save(model.state_dict(), 'tut1-model.pt')

  print(f'Epoch: {epoch+1:02} | Time: {epoch_mins}m {epoch_secs}s')
  print(f'\tTrain Loss: {train_loss:.3f} | Train Accuracy: {train_acc*100:.3f}')
  print(f'\t Val. Loss: {val_loss:.3f} |  Val. Accuracy: {val_acc*100:.3f}')
  print(f'\t Test Loss: {test_loss:.3f} |  Test Accuracy: {test_acc*100:.3f}')

Epoch: 01 | Time: 0m 4s
	Train Loss: 1.391 | Train Accuracy: 24.936
	 Val. Loss: 1.397 |  Val. Accuracy: 23.212
	 Test. Loss: 1.393 |  Val. Accuracy: 24.257



Epoch: 02 | Time: 0m 4s
	Train Loss: 1.388 | Train Accuracy: 25.051
	 Val. Loss: 1.390 |  Val. Accuracy: 23.212
	 Test. Loss: 1.389 |  Val. Accuracy: 24.159
Epoch: 03 | Time: 0m 4s
	Train Loss: 1.386 | Train Accuracy: 25.819
	 Val. Loss: 1.389 |  Val. Accuracy: 23.212
	 Test. Loss: 1.389 |  Val. Accuracy: 24.159
Epoch: 04 | Time: 0m 4s
	Train Loss: 1.386 | Train Accuracy: 25.278
	 Val. Loss: 1.395 |  Val. Accuracy: 23.212
	 Test. Loss: 1.393 |  Val. Accuracy: 24.346
Epoch: 05 | Time: 0m 4s
	Train Loss: 1.384 | Train Accuracy: 26.630
	 Val. Loss: 1.389 |  Val. Accuracy: 23.310
	 Test. Loss: 1.387 |  Val. Accuracy: 24.443
Epoch: 06 | Time: 0m 4s
	Train Loss: 1.383 | Train Accuracy: 26.340
	 Val. Loss: 1.391 |  Val. Accuracy: 24.564
	 Test. Loss: 1.386 |  Val. Accuracy: 26.121
Epoch: 07 | Time: 0m 4s
	Train Loss: 1.383 | Train Accuracy: 26.096
	 Val. Loss: 1.389 |  Val. Accuracy: 24.564
	 Test. Loss: 1.387 |  Val. Accuracy: 26.121
Epoch: 08 | Time: 0m 4s
	Train Loss: 1.382 | Train Accurac