
add belebele #885

Merged
merged 4 commits into EleutherAI:big-refactor on Oct 12, 2023

Conversation


@juletx (Contributor) commented Sep 30, 2023

Thanks for adding this dataset! The prompt is a bit different from the paper: they use A: instead of A., and they use no description for five-shot evaluation. They also include one space on both sides of \n; I don't know why. Maybe it is only to make the prompt more readable in the paper. We should run some five-shot experiments with LLaMA to validate that we get the same results. This is the prompt with the changes I mentioned:

P: {{flores_passage}} \n Q: {{question.strip()}} \n A: {{mc_answer1}} \n B: {{mc_answer2}} \n C: {{mc_answer3}} \n D: {{mc_answer4}} \n Answer:
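A minimal Python sketch of how that Jinja-style template might be rendered for a single example (the field names follow the Belebele dataset schema; the render_prompt helper and the sample values are hypothetical, not code from this PR):

```python
# Hypothetical rendering of the proposed prompt template.
# Note the single space on both sides of each \n, as discussed above.
TEMPLATE = (
    "P: {flores_passage} \n Q: {question} \n A: {mc_answer1} \n "
    "B: {mc_answer2} \n C: {mc_answer3} \n D: {mc_answer4} \n Answer:"
)

def render_prompt(example: dict) -> str:
    """Fill the template, stripping the question as question.strip() does in Jinja."""
    return TEMPLATE.format(
        flores_passage=example["flores_passage"],
        question=example["question"].strip(),
        mc_answer1=example["mc_answer1"],
        mc_answer2=example["mc_answer2"],
        mc_answer3=example["mc_answer3"],
        mc_answer4=example["mc_answer4"],
    )
```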

@lintangsutawika (Contributor)

@ManuelFay I'm still seeing description in the yamls, do you think this should still be kept?

@ManuelFay (Contributor, Author)

Ok, I'm removing the description. In the paper, from my understanding, there was a 0-shot scenario with a description and a 5-shot one without. Since the exact content of the description was not stated explicitly, let's just do the 5-shot, I guess.

@ManuelFay (Contributor, Author)

I changed the . to : in the prompt and removed the description. As for the spaces around the \n, I left them; I really feel they're there for paper readability, because it makes no sense to me to add useless whitespace otherwise.

@ManuelFay (Contributor, Author) commented Oct 5, 2023

Maybe we can tag @satyanshukla, the author, for his input!
[edit] Opened an issue requesting author review: facebookresearch/belebele#7

@lucasbandarkar commented Oct 6, 2023

> Maybe we can tag @satyanshukla, the author, for his input!
> [edit] Opened an issue requesting author review: facebookresearch/belebele#7

Hey, one of the authors of the paper here, happy to help! Though it's unclear to me what the question is; is it just about : vs. .?

If so, I know we were trying out different prompts, and the one reported is the one that worked best, though I don't know if we tried using . instead.

@ManuelFay (Contributor, Author)

Hey Lucas! Yes, it's essentially about:

  • Are there spaces between the \n and the text, like in the paper, or was that just for readability and the real prompt has no added whitespace?
  • Is there a description of the task in 0-shot settings? If so, what is it? I did not find it in the paper.
  • Lastly, it would be nice to get your approval of the task implementation, so we can call it an "author approved" benchmark implementation; that makes it more official!

Thanks again for everything

@lucasbandarkar commented Oct 6, 2023 via email

@haileyschoelkopf (Contributor) commented Oct 6, 2023

> 1. Just to clarify: am I approving EleutherAI the authority to use my dataset, or am I approving that the implementation is what I want?

This would simply be agreeing that the implementation matches what you used in your paper!

@juletx (Contributor) commented Oct 6, 2023

To clarify point 1: the doubt is not whether \n was used in the prompt. Our doubt is whether there is any whitespace between \n and the text.

@lucasbandarkar commented Oct 6, 2023 via email

@lucasbandarkar
cc-ing @davisliang
So the format did change slightly across the models, because some things worked better for some models than others, but it was all just punctuation/minutiae. The f-string generally looked like this:

f"{instruction}\n###\nPassage:\n{passage}\n###\nQuery:\n{query}\n###\nChoices:\n(A) {A}\n(B) {B}\n(C) {C}\n(D) {D}\n###\nAnswer:\n"

Example:

Given the following passage, query, and answer choices, output the letter corresponding to the correct answer.
###
Passage:
Though many of the animals in the park are used to seeing humans, the wildlife is nonetheless wild and should not be fed or disturbed. According to park authorities, stay at least 100 yards/meters away from bears and wolves and 25 yards/meters from all other wild animals! No matter how docile they may look, bison, elk, moose, bears, and nearly all large animals can attack. Each year, dozens of visitors are injured because they didn't keep a proper distance. These animals are large, wild, and potentially dangerous, so give them their space. In addition, be aware that odors attract bears and other wildlife, so avoid carrying or cooking odorous foods and keep a clean camp.
###
Query:
Which of the following is not mentioned in the passage as a possible cause of wildlife attacks?
###
Choices:
(A) Strong smells
(B) Failure to maintain distance
(C) Feeding the wildlife
(D) Animals that are unfamiliar with humans
###
Answer:

Processing the outputs:
Our response processing looked something like this; we accepted 'A', '(A)', and some other closely related variants.

correct = 0
for item in data[language]:
    qid = item['qid']

    # Strip parentheses so that '(A)' and 'A' are scored the same way
    answer = answers[language][qid].replace('(', '').replace(')', '')
    if answer not in ['A', 'B', 'C', 'D']:
        # Log responses that are not a bare choice letter,
        # then fall back to the first character of the response
        print("###############################")
        print("FAILED: ", answer)
        print("ACTUAL: ", item['answer'])
        answer = answer[0]

    if item['answer'] == answer:
        correct += 1

print(correct / len(data[language]))
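The variant handling described above could also be expressed as a small normalization helper. This is an illustrative sketch, not the authors' actual code; the regex and the set of accepted variants are assumptions:

```python
import re
from typing import Optional

def normalize_answer(raw: str) -> Optional[str]:
    """Map outputs like 'A', '(A)', 'A.', or 'Answer: A' to a bare choice letter.

    Returns None when no standalone choice letter can be recovered,
    so the caller can decide how to handle unparseable responses.
    """
    match = re.search(r"\(?\b([ABCD])\b\)?", raw.strip())
    return match.group(1) if match else None
```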

@juletx (Contributor) commented Oct 10, 2023

Thank you! If I understand correctly, that is the prompt for instruction/chat models, right? The prompt used for 5-shot in-context learning is the one you mention in the paper (removing the extra spaces between \n and the text).

@lucasbandarkar commented Oct 10, 2023 via email

@ManuelFay (Contributor, Author)

Okay then, I guess we are good with the 5-shot one (the one adapted for the LM eval harness). Let's merge?

@lintangsutawika lintangsutawika merged commit 17095c8 into EleutherAI:big-refactor Oct 12, 2023
3 checks passed