
Adds reranker example #58

Merged
rodrigonogueira4 merged 2 commits into master from rodrigonogueira4-patch-2 on Jul 11, 2020
Conversation

rodrigonogueira4
Member

No description provided.

@ronakice ronakice (Member) left a comment

Can we give a slightly more meaningful example to show that it indeed works? And should we move this to docs/?

@rodrigonogueira4
Member Author

> Can we give a slightly more meaningful example to show that it indeed works? And should we move this to docs/?

Regarding a meaningful example, sure!
Regarding moving to docs/, I think this code needs to be in the main README.md, as users can find it there more easily.

@ronakice ronakice (Member) left a comment

LGTM!

@rodrigonogueira4 rodrigonogueira4 merged commit 4b8d67b into master Jul 11, 2020
@ronakice ronakice deleted the rodrigonogueira4-patch-2 branch July 11, 2020 17:04
@Fatima-200159617

I am trying to use monoBERT instead of T5. Here is the code:

```python
import torch
from transformers import AutoTokenizer, AutoModel
from pygaggle.rerank.base import Query, Text
from pygaggle.rerank.transformer import SequenceClassificationTransformerReranker

model_name = 'castorini/monobert-large-msmarco'
tokenizer_name = 'castorini/monobert-large-msmarco'
batch_size = 8
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = AutoModel.from_pretrained(model_name)
model = model.to(device).eval()
tokenizer = AutoTokenizer.from_pretrained(tokenizer_name)
reranker = SequenceClassificationTransformerReranker(model, tokenizer)

query = Query('how old are you?')
doc1 = Text('I am 77 years old')
doc2 = Text('I am hungry')
documents = [doc1, doc2]
scores = [result.score for result in reranker.rerank(query, documents)]
print(scores)
```

Could you please advise on the right tokenizer to use, as I am having an issue running it?

@rodrigonogueira4
Member Author

Hi Fatima,

Could you try `tokenizer_name = 'bert-large-uncased'`?

@Fatima-200159617

Fatima-200159617 commented Jul 11, 2020

I tried it and I am getting the same error, shown below:

```
ValueError                                Traceback (most recent call last)
in ()
     15 doc2 = Text('I am hungry')
     16 documents = [doc1,doc2]
---> 17 scores = [result.score for result in reranker.rerank(query, documents)]
     18 print(scores)
     19 print(sorted(scores,reverse=True))

1 frames

/content/gdrive/My Drive/Reranking-pygaggle/pygaggle/pygaggle/rerank/transformer.py in rerank(self, query, texts)
    117         input_ids = ret['input_ids'].to(self.device)
    118         tt_ids = ret['token_type_ids'].to(self.device)
--> 119         output, = self.model(input_ids, token_type_ids=tt_ids)
    120         if output.size(1) > 1:
    121             text.score = torch.nn.functional.log_softmax(

ValueError: too many values to unpack (expected 1)
```

Not sure if the tokenizer is the issue.
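For what it's worth, the unpacking error points at the model rather than the tokenizer: `AutoModel` loads a bare BERT encoder whose forward pass returns several tensors, while the reranker's `output, = self.model(...)` expects exactly one. A minimal sketch of the difference, using plain stand-in functions instead of real models (the tuple shapes mirror the forward-pass return values; no model download needed):

```python
def bare_bert_forward():
    # Stands in for AutoModel (a bare BertModel): its forward pass returns
    # a tuple of several tensors, e.g. (last_hidden_state, pooler_output)
    return ('last_hidden_state', 'pooler_output')

def classification_forward():
    # Stands in for BertForSequenceClassification: returns a one-element
    # tuple containing only the classification logits
    return ('logits',)

try:
    output, = bare_bert_forward()  # two values into one target: fails
except ValueError as err:
    print(err)  # too many values to unpack (expected 1)

output, = classification_forward()  # exactly one value: works
print(output)
```

This is why swapping in the sequence-classification variant of the model resolves the error.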

@Fatima-200159617

The below code worked for me:

```python
# monoBERT reranker
import torch
from transformers import BertTokenizer, BertForSequenceClassification
from pygaggle.rerank.base import Query, Text
from pygaggle.rerank.transformer import SequenceClassificationTransformerReranker

model_name = 'castorini/monobert-large-msmarco'
tokenizer_name = 'bert-large-uncased'
batch_size = 8
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = BertForSequenceClassification.from_pretrained(model_name)
model = model.to(device).eval()
tokenizer = BertTokenizer.from_pretrained(tokenizer_name)
reranker = SequenceClassificationTransformerReranker(model, tokenizer)

query = Query('how old are you?')
doc1 = Text('I am 77 years old')
doc2 = Text('I am hungry')
doc3 = Text('My age is 77')
doc4 = Text('I want to sleep early')
documents = [doc1, doc2, doc3, doc4]
scores = [result.score for result in reranker.rerank(query, documents)]
print(scores)
```
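A small follow-up, in case the ranked texts are wanted rather than the raw score list: each reranked result carries its score, so the results can be sorted directly. A self-contained sketch with a stand-in class and made-up placeholder scores (illustrative only, not real monoBERT output):

```python
from dataclasses import dataclass

@dataclass
class ScoredText:
    # Stand-in for pygaggle's scored Text objects returned by rerank()
    text: str
    score: float

# Placeholder scores for illustration; real values come from the reranker
results = [
    ScoredText('I am 77 years old', -0.3),
    ScoredText('I am hungry', -9.1),
    ScoredText('My age is 77', -0.8),
    ScoredText('I want to sleep early', -8.7),
]

# Higher (less negative) score = more relevant, so sort descending
ranked = sorted(results, key=lambda t: t.score, reverse=True)
for t in ranked:
    print(f'{t.score:+.2f}\t{t.text}')
```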

@rodrigonogueira4
Member Author

Great, thanks, Fatima!

I've created a pull request that exemplifies how to use the BERT reranker: #59

@Fatima-200159617
> Great, thanks, Fatima!
>
> I've created a pull request that exemplifies how to use the BERT reranker: #59

Thanks a lot for the code.
