
Adds reranker example #58

Merged
rodrigonogueira4 merged 2 commits into master from rodrigonogueira4-patch-2 on Jul 11, 2020
Conversation

rodrigonogueira4
Member

No description provided.

@ronakice ronakice (Member) left a comment

Can we give a slightly more meaningful example to show that it indeed works? And should we move this to docs/?

@rodrigonogueira4
Member Author

> Can we give a slightly more meaningful example to show that it indeed works? And should we move this to docs/?

Regarding a meaningful example, sure!
Regarding moving to docs/, I think this code needs to be in the main README.md, as users can find it there more easily.

@ronakice ronakice (Member) left a comment

LGTM!

@rodrigonogueira4 rodrigonogueira4 merged commit 4b8d67b into master Jul 11, 2020
@ronakice ronakice deleted the rodrigonogueira4-patch-2 branch July 11, 2020 17:04
@Fatima-200159617

I am trying to use monoBERT instead of T5. Here is the code:

```python
import torch
from transformers import AutoTokenizer, AutoModel
from pygaggle.rerank.base import Query, Text
from pygaggle.rerank.transformer import SequenceClassificationTransformerReranker

model_name = 'castorini/monobert-large-msmarco'
tokenizer_name = 'castorini/monobert-large-msmarco'
batch_size = 8
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = AutoModel.from_pretrained(model_name)
model = model.to(device).eval()
tokenizer = AutoTokenizer.from_pretrained(tokenizer_name)
reranker = SequenceClassificationTransformerReranker(model, tokenizer)

query = Query('how old are you?')
doc1 = Text('I am 77 years old')
doc2 = Text('I am hungry')
documents = [doc1, doc2]
scores = [result.score for result in reranker.rerank(query, documents)]
print(scores)
```

Could you please advise on the right tokenizer to use, as I am having an issue running it?

@rodrigonogueira4
Member Author

Hi Fatima,

Could you try `tokenizer_name = 'bert-large-uncased'`?

@Fatima-200159617

Fatima-200159617 commented Jul 11, 2020

I tried it and I am getting the same error, shown below:

```
ValueError                                Traceback (most recent call last)
in ()
     15 doc2 = Text('I am hungry')
     16 documents = [doc1,doc2]
---> 17 scores = [result.score for result in reranker.rerank(query, documents)]
     18 print(scores)
     19 print(sorted(scores,reverse=True))

1 frames

/content/gdrive/My Drive/Reranking-pygaggle/pygaggle/pygaggle/rerank/transformer.py in rerank(self, query, texts)
    117         input_ids = ret['input_ids'].to(self.device)
    118         tt_ids = ret['token_type_ids'].to(self.device)
--> 119         output, = self.model(input_ids, token_type_ids=tt_ids)
    120         if output.size(1) > 1:
    121             text.score = torch.nn.functional.log_softmax(

ValueError: too many values to unpack (expected 1)
```

Not sure if the tokenizer is the issue.
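For what it's worth, the unpacking error points at the model rather than the tokenizer: `AutoModel` loads a bare BERT encoder whose forward pass returns several tensors, while the reranker's `output, = self.model(...)` expects exactly one. A minimal sketch of the difference, using plain stand-in functions instead of real models (the tuple shapes mirror the forward-pass return values; no model download needed):

```python
def bare_bert_forward():
    # Stands in for AutoModel (a bare BertModel): its forward pass returns
    # a tuple of several tensors, e.g. (last_hidden_state, pooler_output)
    return ('last_hidden_state', 'pooler_output')

def classification_forward():
    # Stands in for BertForSequenceClassification: returns a one-element
    # tuple containing only the classification logits
    return ('logits',)

try:
    output, = bare_bert_forward()  # two values into one target: fails
except ValueError as err:
    print(err)  # too many values to unpack (expected 1)

output, = classification_forward()  # exactly one value: works
print(output)
```

This is why swapping in the sequence-classification variant of the model resolves the error.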

@Fatima-200159617

The below code worked for me:

```python
# monoBERT reranker
import torch
from transformers import BertTokenizer, BertForSequenceClassification
from pygaggle.rerank.base import Query, Text
from pygaggle.rerank.transformer import SequenceClassificationTransformerReranker

model_name = 'castorini/monobert-large-msmarco'
tokenizer_name = 'bert-large-uncased'
batch_size = 8
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = BertForSequenceClassification.from_pretrained(model_name)
model = model.to(device).eval()
tokenizer = BertTokenizer.from_pretrained(tokenizer_name)
reranker = SequenceClassificationTransformerReranker(model, tokenizer)

query = Query('how old are you?')
doc1 = Text('I am 77 years old')
doc2 = Text('I am hungry')
doc3 = Text('My age is 77')
doc4 = Text('I want to sleep early')
documents = [doc1, doc2, doc3, doc4]
scores = [result.score for result in reranker.rerank(query, documents)]
print(scores)
```
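A small follow-up, in case the ranked texts are wanted rather than the raw score list: each reranked result carries its score, so the results can be sorted directly. A self-contained sketch with a stand-in class and made-up placeholder scores (illustrative only, not real monoBERT output):

```python
from dataclasses import dataclass

@dataclass
class ScoredText:
    # Stand-in for pygaggle's scored Text objects returned by rerank()
    text: str
    score: float

# Placeholder scores for illustration; real values come from the reranker
results = [
    ScoredText('I am 77 years old', -0.3),
    ScoredText('I am hungry', -9.1),
    ScoredText('My age is 77', -0.8),
    ScoredText('I want to sleep early', -8.7),
]

# Higher (less negative) score = more relevant, so sort descending
ranked = sorted(results, key=lambda t: t.score, reverse=True)
for t in ranked:
    print(f'{t.score:+.2f}\t{t.text}')
```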

@rodrigonogueira4
Member Author

Great, thanks, Fatima!

I've created a pull request that exemplifies how to use the BERT reranker: #59

@Fatima-200159617
> Great, thanks, Fatima!
>
> I've created a pull request that exemplifies how to use the BERT reranker: #59

Thanks a lot for the code.
