# Fine-tune Haystack FARMReader for Extractive Question Answering

## Install dependent packages

In [24]:
! pip install -qq faiss-gpu

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


In [26]:
! pip install -qq 'farm-haystack[faiss]'


## Necessary imports

In [1]:
import pandas as pd
import re
import json
import os
from haystack.nodes import FARMReader, EmbeddingRetriever
from haystack import Pipeline
from haystack.document_stores import FAISSDocumentStore
from haystack import Document


## Load the Stack Overflow dataset

The dataset can be downloaded from https://github.com/chauhang/pt-experiments/blob/pytorch-qa/pytorch-qa/pt_question_answers.csv

In [2]:
df = pd.read_csv("pt_question_answers.csv")
df

Unnamed: 0,pt_post_id,pt_post_type_id,pt_accepted_answer_id,pt_creation_date,pt_score,pt_title,pt_body,pt_tags,pt_parent_id,context,pt_answer,question
0,34750268,1,34762233.0,2016-01-12T17:36:25.473,9,Extracting the top-k value-indices from a 1-D Tensor,"<p>Given a 1-D tensor in Torch (<code>torch.Tensor</code>), containing value...",<python><lua><pytorch><torch>,,<p>Just loop through the tensor and run your compare:</p>\n\n<pre><code>requ...,"<p>As of pull request <a href=""https://github.com/torch/torch7/pull/496"" rel...",Extracting the top-k value-indices from a 1-D Tensor <p>Given a 1-D tensor i...
1,38543850,1,38676842.0,2016-07-23T16:15:43.967,40,How to Display Custom Images in Tensorboard (e.g. Matplotlib Plots)?,"<p>The <a href=""https://github.com/tensorflow/tensorflow/blob/master/tensorf...",<python><tensorflow><matplotlib><pytorch><tensorboard>,,"<p>It is quite easy to do if you have the image in a memory buffer. Below, I...","<p>It is quite easy to do if you have the image in a memory buffer. Below, I...",How to Display Custom Images in Tensorboard (e.g. Matplotlib Plots)? <p>The ...
2,41767005,1,43824857.0,2017-01-20T15:22:08.063,11,Python wheels: cp27mu not supported,"<p>I'm trying to install pytorch (<a href=""http://pytorch.org/"" rel=""norefer...",<python><linux><unicode><pytorch>,,<p>This is exactly that. \nRecompile python under slack with --enable-unicod...,<p>This is exactly that. \nRecompile python under slack with --enable-unicod...,Python wheels: cp27mu not supported <p>I'm trying to install pytorch (<a hre...
3,41861354,1,54261158.0,2017-01-25T20:45:35.297,8,Loading Torch7 trained models (.t7) in PyTorch,"<p>I am using Torch7 library for implementing neural networks. Mostly, I re...",<python><lua><pytorch><torch><pre-trained-model>,,<p>The correct function is <code>load_lua</code>:</p>\n\n<pre><code>from tor...,<p>As of PyTorch 1.0 <code>torch.utils.serialization</code> is completely re...,Loading Torch7 trained models (.t7) in PyTorch <p>I am using Torch7 library ...
4,41924453,1,42054194.0,2017-01-29T18:31:24.687,65,PyTorch: How to use DataLoaders for custom Datasets,<p>How to make use of the <code>torch.utils.data.Dataset</code> and <code>to...,<python><torch><pytorch>,,"<p>Yes, that is possible. Just create the objects by yourself, e.g.</p>\n\n<...","<p>Yes, that is possible. Just create the objects by yourself, e.g.</p>\n\n<...",PyTorch: How to use DataLoaders for custom Datasets <p>How to make use of th...
...,...,...,...,...,...,...,...,...,...,...,...,...
10758,74612146,1,,2022-11-29T09:54:30.430,0,Is it possible to perform quantization on densenet169 and how?,<p>I have been trying to performing quantization on a densenet model without...,<machine-learning><pytorch><artificial-intelligence><densenet><static-quanti...,,"<p>Here's how to do this on DenseNet169 from torchvision:</p>\n<pre class=""l...","<p>Here's how to do this on DenseNet169 from torchvision:</p>\n<pre class=""l...",
10759,74637151,1,,2022-12-01T05:08:37.150,1,"Why when the batch size increased, the epoch time will also increasing?",<p>Epoch time means the time required to train for an epoch.</p>\n<p>From my...,<deep-learning><pytorch>,,"<p>As you already noticed, there are many factors that may affect epoch-time...","<p>As you already noticed, there are many factors that may affect epoch-time...",
10760,74642594,1,,2022-12-01T13:23:27.277,0,Why does StableDiffusionPipeline return black images when generating multipl...,"<p>I am using the <a href=""https://github.com/huggingface/diffusers/tree/mai...",<python><pytorch><apple-m1><huggingface-transformers><stable-diffusion>,,"<p>Apparently it is indeed an Apple Silicon (M1/M2) issue, of which Hugging ...","<p>Apparently it is indeed an Apple Silicon (M1/M2) issue, of which Hugging ...",
10761,74671399,1,,2022-12-03T22:46:46.443,1,Locating tags in a string in PHP (with respect to the string with tags removed),<p>I want to create a function that labels the location of certain HTML tags...,<php><string><pytorch><label><italics>,,<p>I think I've got something. How about this:</p>\n<pre><code>function labe...,<p>I think I've got something. How about this:</p>\n<pre><code>function labe...,


### The raw Stack Overflow answers contain HTML tags; remove the tags and convert everything to lowercase

In [3]:
# Remove HTML tags

CLEANR = re.compile('<.*?>')

def cleanhtml(raw_html):
    # Replace every non-greedy <...> match with an empty string
    cleantext = re.sub(CLEANR, '', raw_html)
    return cleantext

df["context"] = df["context"].apply(cleanhtml)
df["pt_answer"] = df["pt_answer"].apply(cleanhtml)

# Lowercase both columns so the answer can be located inside the context
df["context"] = df["context"].str.lower()
df["pt_answer"] = df["pt_answer"].str.lower()
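The tag-stripping step leaves HTML entities such as `&gt;` and `&quot;` in the text, which is why they show up in the printed answers later in this notebook. A minimal sketch, assuming it is acceptable to decode them, that applies the standard library's `html.unescape` after the tags are removed. `clean_html_and_entities` is a hypothetical helper name and is not part of the original notebook.

```python
import html
import re

TAG_RE = re.compile(r"<.*?>")

def clean_html_and_entities(raw_html: str) -> str:
    """Strip HTML tags, then decode entities such as &gt; and &quot;."""
    without_tags = TAG_RE.sub("", raw_html)
    # Decode entities only after stripping tags, so that a decoded "<" or ">"
    # is never mistaken for the start or end of a tag.
    return html.unescape(without_tags)

print(clean_html_and_entities("<p>if data[i]&gt;max then print(&quot;ok&quot;)</p>"))
```

Decoding changes string lengths, so it would have to be applied to both `context` and `pt_answer` before any answer offsets are computed.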

### Convert to SQuAD format for fine-tuning

In [4]:
# SQuAD format
# {
#     version: "Dataset version"
#     data:[
#             {
#                 title: "Wikipedia article title"
#                 paragraphs:[
#                     {
#                         context: "Article paragraph"
#                         qas:[
#                             {
#                                 id: "ID of the question-answer pair"
#                                 question: "Question"
#                                 answers:[
#                                     {
#                                         "answer_start": "Position of the answer"
#                                         "text": "Answer"
#                                     }
#                                 ],
#                                 is_impossible: (not in v1)
#                             }
#                         ]
#                     }
#                 ]
#             }
#     ]
# }

In [5]:
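data_json1 = {
    'version': "Version 1",
    'data': []
}

# Build one SQuAD entry per row; the post title is used as both title and question.
i = 0
for index, row in df.iterrows():
    # Character offset of the answer inside the context; find() returns -1 if absent
    search_index = row['context'].find(row['pt_answer'])
    if search_index == -1:
        # SQuAD needs the answer to be a span of the context, so skip rows without a match
        continue

    data_json = {
        'title': row['pt_title'],
        'paragraphs': [
            {
                'context': row['context'],
                'qas': [
                    {
                        'id': i,
                        'question': row['pt_title'],
                        'answers': [
                            {
                                "answer_start": search_index,
                                "text": row['pt_answer']
                            }
                        ],
                        'is_impossible': False
                    }
                ]
            }
        ]
    }
    data_json1['data'].append(data_json)
    i += 1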
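SQuAD-style training expects every `answer_start` to be the character offset at which `text` occurs in `context`. A minimal sketch of a sanity check that can be run on the generated dictionary before exporting it. `find_misaligned_answers` and the tiny `example` dataset are hypothetical and only illustrate the idea.

```python
def find_misaligned_answers(squad: dict) -> list:
    """Return the ids of question-answer pairs whose text is not found at answer_start."""
    bad_ids = []
    for article in squad["data"]:
        for paragraph in article["paragraphs"]:
            context = paragraph["context"]
            for qa in paragraph["qas"]:
                for answer in qa["answers"]:
                    start = answer["answer_start"]
                    text = answer["text"]
                    # A negative offset or a mismatching slice means the span is invalid
                    if start < 0 or context[start:start + len(text)] != text:
                        bad_ids.append(qa["id"])
    return bad_ids

# Hypothetical two-question dataset: id 0 is aligned, id 1 is not
example = {
    "version": "Version 1",
    "data": [
        {
            "title": "demo",
            "paragraphs": [
                {
                    "context": "use torch.save to store the weights",
                    "qas": [
                        {"id": 0, "question": "q0",
                         "answers": [{"answer_start": 4, "text": "torch.save"}]},
                        {"id": 1, "question": "q1",
                         "answers": [{"answer_start": -1, "text": "not in context"}]},
                    ],
                }
            ],
        }
    ],
}

print(find_misaligned_answers(example))
```

Calling `find_misaligned_answers(data_json1)` would list the entries that are worth inspecting or dropping before training.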


In [6]:
data_json1['data'][0]

{'title': 'Extracting the top-k value-indices from a 1-D Tensor',
 'paragraphs': [{'context': "just loop through the tensor and run your compare:\n\nrequire 'torch'\n\ndata = torch.tensor({1,2,3,4,505,6,7,8,9,10,11,12})\nidx  = 1\nmax  = data[1]\n\nfor i=1,data:size()[1] do\n   if data[i]&gt;max then\n      max=data[i]\n      idx=i\n   end\nend\n\nprint(idx,max)\n\n\n--edit--\nresponding to your edit: use the torch.max operation documented here: https://github.com/torch/torch7/blob/master/doc/maths.md#torchmaxresval-resind-x-dim ...\n\ny, i = torch.max(x, 1) returns the largest element in each column (across rows) of x, and a tensor i of their corresponding indices in x\n\nas of pull request #496 torch now includes a built-in api named torch.topk. example:\n\n&gt; t = torch.tensor{9, 1, 8, 2, 7, 3, 6, 4, 5}\n\n-- obtain the 3 smallest elements\n&gt; res = t:topk(3)\n&gt; print(res)\n 1\n 2\n 3\n[torch.doubletensor of size 3]\n\n-- you can also get the indices in addition\n&gt; res, ind

### Export the SQuAD dataset in JSON format

In [7]:
# Create the output directory if it does not exist yet
os.makedirs("data", exist_ok=True)

In [8]:
with open("data/qa.json", "w") as outfile:
    json.dump(data_json1, outfile)

### Initialize Haystack FARMReader using `deepset/roberta-base-squad2` base model

In [9]:
# Other candidate base models:
# mfeb/albert-xxlarge-v2-squad2
# deepset/roberta-large-squad2

reader = FARMReader(model_name_or_path="deepset/roberta-base-squad2", use_gpu=True)
data_dir = "data"



### Fine-tune the RoBERTa model on the generated SQuAD dataset

In [None]:
reader.train(data_dir=data_dir, train_filename="qa.json", use_gpu=True, n_epochs=1, save_dir="data/qa_model")
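The training call above uses the whole file for training, so there is no held-out set to monitor. A minimal sketch, assuming a shuffled article-level split is acceptable, of carving a dev set out of the SQuAD dictionary. `split_squad`, the default 10% fraction, and the seed are illustrative choices and are not part of the original notebook.

```python
import random

def split_squad(squad: dict, dev_fraction: float = 0.1, seed: int = 42):
    """Split a SQuAD-style dict into (train, dev) dicts at the article level."""
    articles = list(squad["data"])
    # Shuffle with a fixed seed so the split is reproducible
    random.Random(seed).shuffle(articles)
    n_dev = int(len(articles) * dev_fraction)
    dev = {"version": squad["version"], "data": articles[:n_dev]}
    train = {"version": squad["version"], "data": articles[n_dev:]}
    return train, dev

# Hypothetical toy dataset with ten articles
toy = {"version": "Version 1", "data": [{"title": str(n)} for n in range(10)]}
train_split, dev_split = split_squad(toy, dev_fraction=0.2)
print(len(train_split["data"]), len(dev_split["data"]))
```

The two dictionaries could then be written to separate JSON files. `FARMReader.train` in Haystack 1.x appears to accept a `dev_filename` argument for this, though that should be checked against the installed version.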

### Initialize the document store (a FAISS document store is used for this example)

In [10]:
# Initialize FAISS document store.

document_store = FAISSDocumentStore(faiss_index_factory_str="Flat", return_embedding=True)
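The `Flat` index factory string selects an exhaustive index: every query is compared against every stored vector, so results are exact but search time grows linearly with the number of documents. A conceptual sketch in plain Python, assuming dot-product similarity. `flat_search` is a hypothetical helper and does not reflect how FAISS is implemented internally.

```python
def flat_search(query, vectors, top_k=2):
    """Exhaustively score every vector against the query and return the best top_k.

    Returns a list of (index, score) pairs sorted by descending dot product.
    """
    scored = []
    for idx, vec in enumerate(vectors):
        score = sum(q * v for q, v in zip(query, vec))
        scored.append((idx, score))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Three toy 2-dimensional "embeddings"
vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(flat_search([1.0, 0.5], vectors, top_k=2))
```

For roughly ten thousand documents, as in this notebook, exhaustive search is usually fast enough; approximate index types mainly pay off at much larger scales.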



### Use EmbeddingRetriever with `sentence-transformers/multi-qa-mpnet-base-dot-v1` as the base model for generating embeddings

In [None]:
embedding_retriever = EmbeddingRetriever(
    document_store=document_store,
    embedding_model="sentence-transformers/multi-qa-mpnet-base-dot-v1",
    model_format="sentence_transformers",
)


### Initialize the FARMReader using the fine-tuned model (`data/qa_model`)

In [None]:

# Load the fine-tuned reader
new_reader = FARMReader(model_name_or_path="data/qa_model")


### Create a Haystack pipeline with the retriever and the reader

In [None]:
embedding_querying_pipeline = Pipeline()
embedding_querying_pipeline.add_node(component=embedding_retriever, name="Retriever", inputs=["Query"])
embedding_querying_pipeline.add_node(component=new_reader, name="Reader", inputs=["Retriever"])

## Indexing

### Combine question and answer into a single text field

In [11]:
df["text"] = "question: " + df["pt_title"] + "\n" + "answer: " + df["pt_answer"]

df = df[["text"]]
df

Unnamed: 0,text
0,question: Extracting the top-k value-indices from a 1-D Tensor\nanswer: as o...
1,question: How to Display Custom Images in Tensorboard (e.g. Matplotlib Plots...
2,question: Python wheels: cp27mu not supported\nanswer: this is exactly that....
3,question: Loading Torch7 trained models (.t7) in PyTorch\nanswer: as of pyto...
4,"question: PyTorch: How to use DataLoaders for custom Datasets\nanswer: yes, ..."
...,...
10758,question: Is it possible to perform quantization on densenet169 and how?\nan...
10759,"question: Why when the batch size increased, the epoch time will also increa..."
10760,question: Why does StableDiffusionPipeline return black images when generati...
10761,question: Locating tags in a string in PHP (with respect to the string with ...


### Convert the text to Haystack documents

In [12]:
# Use data to initialize Document objects

documents = [Document(content=text) for text in df["text"]]

In [13]:
# Delete existing documents in documents store
document_store.delete_documents()

# Write documents to document store
document_store.write_documents(documents)

# Add documents embeddings to index
document_store.update_embeddings(retriever=embedding_retriever)


## Prediction

In [14]:
# Run the pipeline and return the text of the top-ranked answer
def get_answer(querying_pipeline, query):
    prediction = querying_pipeline.run(
        query=query,
        params={
            "Retriever": {"top_k": 5},
            "Reader": {"top_k": 1, "debug": True},
        },
    )

    return prediction["answers"][0].answer
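Indexing `prediction["answers"][0]` raises an `IndexError` when the reader returns no answer. A minimal sketch of a defensive variant that works on the prediction dictionary alone. `first_answer_or_none` is a hypothetical helper, and the `SimpleNamespace` object only stands in for a Haystack answer object so the snippet runs without Haystack installed.

```python
from types import SimpleNamespace

def first_answer_or_none(prediction: dict):
    """Return the text of the top answer, or None if the pipeline produced none."""
    answers = prediction.get("answers") or []
    if not answers:
        return None
    return answers[0].answer

# Stand-in for a pipeline result; real results hold Haystack answer objects
fake_prediction = {"answers": [SimpleNamespace(answer="use torch.save", score=0.9)]}
print(first_answer_or_none(fake_prediction))
print(first_answer_or_none({"answers": []}))
```

Returning `None` lets the calling loop decide how to report a query that has no answer instead of stopping at the first failure.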
    


### Load the top 10 frequently asked questions and run predictions

In [16]:
top_10_questions = pd.read_csv("top100questions.csv").iloc[:10].question.tolist()

In [None]:

for query in top_10_questions:
    answer = get_answer(embedding_querying_pipeline, query)
    print("Query: ", query)
    print("Answer: ", answer)
    print("\n\n\n")

Query:  How do I check if PyTorch is using the GPU?

Answer:  question: How do I check if PyTorch is using the GPU?
answer: these functions should help:
&gt;&gt;&gt; import torch

&gt;&gt;&gt; torch.cuda.is_available()
true

&gt;&gt;&gt; torch.cuda.device_count()
1

&gt;&gt;&gt; torch.cuda.current_device()
0

&gt;&gt;&gt; torch.cuda.device(0)
&lt;torch.cuda.device at 0x7efce0b03be0&gt;

&gt;&gt;&gt; torch.cuda.get_device_name(0)
'geforce gtx 950m'

this tells us:

cuda is available and can be used by one device.
device 0 refers to the gpu geforce gtx 950m, and it is currently chosen by pytorch.






Query:  How do I save a trained model in PyTorch?

Answer:  question: how to save a Pytorch model?
answer: to save:
# save the weights of the model to a .pt file
torch.save(model.state_dict(), &quot;your_model_path.pt&quot;)

to load:
# load your model architecture/module
model = yourmodel()
# fill your architecture with the trained weights
model.load_state_dict(torch.load(&quot;your_model_path.pt&quot;))






Query:  What does .view() do in PyTorch?

Answer:  question: What is the difference between x.view(x.size(0), -1) and torch.nn.Flatten() layer and torch.flatten(x)? pytorch question
answer: a view is a way to modify the way you look at your data without modifying the data itself:

torch.view returns a view on the data: the data is not copied, only the &quot;window&quot; which you look through on the data changes
torch.flatten returns a one-dimensional output from a multi-dimensional input. it may not copy the data if


[the] input can be viewed as the flattened shape (source)


torch.nn.flatten is just a wrapper for convenience around torch.flatten

contiugous data just means that the data is linearly adressable in memory, e.g. for two dimension data this would mean that element [i][j] is at position i * num_columns + j. if this is already the case then .contiguous will not change your data or copy anything.






Query:  Why do we need to call zero_grad() in PyTorch?

Answer:  question: Shall I use grad.zero_() in PyTorch with or without gradient tracking?
answer: in your snippet that doesn't really matter. the underscore in the name of zero_() means it is an inplace function, and since w.grad.requires_grad == false we know that there won't be any gradient computation with respect to w.grad happening anyway. the only important thing is that it happens before the loss.backward() call.
i would recommend though to use different names for your loss function and the actuall loss tensor it computes, otherwise you're overwriting one with the other.






Query:  How do I print the model summary in PyTorch?

Answer:  question: Print Bert model summary using Pytorch
answer: i used torch-summary module-
pip install torch-summary

summary(model,input_size=(768,),depth=1,batch_dim=1, dtypes=[‘torch.inttensor’])






Query:  How do I initialize weights in PyTorch?

Answer:  question: pytorch initialize two sub-modules with same weights?
answer: i think the most easy way would be to init one of the sub-modules at random, save the state_dict and then load_state_dict from the other module.






Query:  What does model.eval() do in pytorch?

Answer:  question: Training model in eval() mode gives better result in PyTorch?
answer: this seems like the model architecture is simple and when in train mode, is not able to capture the features in the data and hence undergoes underfitting.

eval() disables dropouts and batch normalization, among other modules.

this means that the model trains better without dropout helping the model the learn better with more neurons, also increasing the layer size, increasing the number of layers, decreasing the dropout probability, helps.






Query:  What's the difference between reshape and view in pytorch?

Answer:  question: what does reshape`(1, 1, 28, 28)` mean
answer: a pytorch model mostly requires the first dimension of the input to be the batch size. so the shape of the image is (1, 28, 28). if you want to feed only one image to the model you still have to specify the batch size, which is of course 1 for one image. therefore he adds the batch size dimension to the image by &quot;reshaping&quot; it to (1, 1, 28, 28).






Query:  What does model.train() do in PyTorch?

Answer:  question: What does model.train() do in PyTorch?
answer: model.train() tells your model that you are training the model. this helps inform layers such as dropout and batchnorm, which are designed to behave differently during training and evaluation. for instance, in training mode, batchnorm updates a moving average on each new batch; whereas, for evaluation mode, these updates are frozen.
more details:
model.train() sets the mode to train
(see source code). you can call either model.eval() or model.train(mode=false) to tell that you are testing.
it is somewhat intuitive to expect train function to train model but it does not do that. it just sets the mode.






Query:  What does .contiguous() do in PyTorch?

Answer:  question: What is the difference between x.view(x.size(0), -1) and torch.nn.Flatten() layer and torch.flatten(x)? pytorch question
answer: a view is a way to modify the way you look at your data without modifying the data itself:

torch.view returns a view on the data: the data is not copied, only the &quot;window&quot; which you look through on the data changes
torch.flatten returns a one-dimensional output from a multi-dimensional input. it may not copy the data if


[the] input can be viewed as the flattened shape (source)


torch.nn.flatten is just a wrapper for convenience around torch.flatten

contiugous data just means that the data is linearly adressable in memory, e.g. for two dimension data this would mean that element [i][j] is at position i * num_columns + j. if this is already the case then .contiguous will not change your data or copy anything.






In [None]:
questions = ['Why is my training so slow?',
             'How should I scale up my Pytorch models?',
             'How do I make my experiment deterministic?',
             'Does PyTorch work on windows 32-bit?']

for query in questions:
    answer = get_answer(embedding_querying_pipeline, query)
    print("Query: ", query)
    print("Answer: ", answer)
    print("\n\n\n")

Query:  Why is my training so slow?
Answer:  question: How to impove the speed of tf.data.experimental.CsvDataset in tensorflow 1.13.1?
answer: you may play with the batch size in the first example, and if it reads batches from file every time you can prove it if you make it 2x bigger, you may expect 2x speed improvement. i haven't played with (experimental) class csvdataset  in tf. 

i am sure pandas reads your document faster and this is part of the reason why you have these times.

probable the next step you should unset the loss function nn.crossentropyloss(). most probable have the regression problem and not the classification problem judging by float labels you have at the end.

so try torch.nn.mseloss as the loss function.






Query:  How should I scale up my Pytorch models?
Answer:  question: Reducing batch size in pytorch
answer: the batch size depends on the model.  typically, it's the first dimension of your input tensors.  your model uses different names than i'm used to, some of which are general terms, so i'm not sure of your model topology or usage.






Query:  How do I make my experiment deterministic?
Answer:  question: is it possible make cuda deterministic?
answer: cpu and gpu can't produce the same result even if the seeds are set equal.
refer to this and this.






Query:  Does PyTorch work on windows 32-bit?
Answer:  question: Windows installing pytorch 0.3
answer: peterjc123 released the version for windows here: https://anaconda.org/peterjc123/pytorch






In [20]:
get_answer(embedding_querying_pipeline, "how to perform L2 regularization in pytorch")

'question: L1/L2 regularization in PyTorch\nanswer: see the documentation. add a weight_decay parameter to the optimizer for l2 regularization.'

In [21]:
get_answer(embedding_querying_pipeline, "how to freeze layers whiie finetuning pytorch model")

'question: require_grad = True in pytorch model despite changing require_grad = false for all parameters\nanswer: you should run the method:\nmodel.requires_grad_(false)\n\nyou probably want to freeze only part of the network though, in your case you should change the fc1 attribute:\nmodel.fc1 = torch.nn.linear(128, num_classes)\n\nwhere num_classes is the number of classes you have (you should at least unfreeze the last linear layer).'

In [23]:
get_answer(embedding_querying_pipeline, "how to use different learning rates while training")

'question: How to adjust the learning rate after N number of epochs?\nanswer: you could train in two steps,\nfirst, train with desired initial learning rate then create a second optimizer with the final learning rate. it is equivalent.'

In [24]:
get_answer(embedding_querying_pipeline, "how to add embedding layer in pytorch")

'question: Pytorch: use pretrained vectors to initialize nn.Embedding, but this embedding layer is not updated during the training\nanswer: the torch.nn.embedding.from_pretrained classmethod by default freezes the parameters. if you want to train the parameters, you need to set the freeze keyword argument to false. see the documentation.\nso you might try this instead:\nself.embeds = torch.nn.embedding.from_pretrained(self.vec_weights, freeze=false)'

In [26]:
get_answer(embedding_querying_pipeline, "how to normalize the tensors")

'question: Unable to Normalize Tensor in PyTorch\nanswer: in order to apply transforms.normalize you have to convert the input to a tensor. for this you can use transforms.totensor.\ninv_normalize = transforms.compose(\n    [\n        transforms.totensor(),\n        transforms.normalize(mean=[-0.5/0.5], std=[1/0.5])\n    ]\n)\n\n\nthis tensor must consist of three dimensions (channels, height, width). currently you have one dimension too much. just remove the extra dimension in your view call:\noutput = model(input).to(device).view(1, 150, 150)'

In [27]:
get_answer(embedding_querying_pipeline, "what are the different optimizers available in pytorch")

'question: What kinds of optimization are used in PyTorch methods?\nanswer: pytorch uses an efficient blas implementation and multithreading (openmp, if i\'m not wrong) to parallelize such operations with multiple cores. some performance loss comes from the python itself - since this is an interpreted language, no significant compiler-like optimization can be done. you can use the jit module to speed up the "wrapper" code around the matrix multiplies, but for anything more than very small matrices this cost is probably negligible.\n\none big improvement you may be able to get manually, but which pytorch doesn\'t apply automatically, is to properly order the matrix multiplies. as you probably know, depending on matrix shapes, a multiplication abcd may have different performance computed as a(b(cd)) than if computed as (ab)(cd), etc.'

In [29]:
get_answer(embedding_querying_pipeline, "how to add a dropout layer")

'question: Dropout in custom LSTM in pytorch\nanswer: nn.lstm(... dropout=0.3) applies a dropout layer on the outputs of each lstm layer except the last layer. you can have multiple stacked layers by passing parameter num_layers &gt; 1. if you want to add a dropout to the final layer (or if lstm has only one layer), you have to add it as you are doing now.\nif you want to replicate what lstm dropout does (which is only in case of multiple layers), you can stack lstm layers manually and add a dropout layer in between.'
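Each returned string contains the whole indexed document in the `question: ...\nanswer: ...` layout built during indexing. A minimal sketch, assuming that layout is unchanged, of splitting a result back into its two parts. `split_question_answer` is a hypothetical helper and is not part of the original notebook.

```python
def split_question_answer(document_text: str):
    """Split 'question: ...\\nanswer: ...' into a (question, answer) tuple.

    If the answer marker is missing, the question is None and the whole
    text is returned as the answer.
    """
    question_part, separator, answer_part = document_text.partition("\nanswer: ")
    if not separator:
        return None, document_text
    prefix = "question: "
    if question_part.startswith(prefix):
        question_part = question_part[len(prefix):]
    return question_part, answer_part

result = "question: L1/L2 regularization in PyTorch\nanswer: see the documentation."
matched_question, answer_text = split_question_answer(result)
print(matched_question)
print(answer_text)
```

Showing the matched question next to the answer makes it easier to judge whether the retriever found a relevant post for a given query.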