KeyError during pseudo labeling #11
Comments
I'm exactly here :) Still trying to figure it out.
I wonder what happens to that corpus between being read from the file and getting to that point?!
Oh well! Our mistake is that corpus.jsonl has the _ids as ints, not strings. The data loader expects them to be strings, so the lookup fails with a KeyError on that key. Change corpus.jsonl so that the _ids are strings.
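A minimal sketch of that fix, assuming each line of corpus.jsonl is a JSON object with an `_id` field. The helper name `stringify_ids` and the sample `title`/`text` fields are illustrative assumptions, not code from this repo:

```python
import json


def stringify_ids(lines):
    """Return JSONL lines with every `_id` value converted to a string.

    Hypothetical helper for illustration; blank lines are skipped.
    """
    fixed = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        doc = json.loads(line)
        doc["_id"] = str(doc["_id"])  # int 0 becomes the string "0"
        fixed.append(json.dumps(doc, ensure_ascii=False))
    return fixed


# Sample records with integer ids (the field contents are made up).
sample = [
    '{"_id": 0, "title": "", "text": "first passage"}',
    '{"_id": 1, "title": "", "text": "second passage"}',
]
fixed = stringify_ids(sample)
```

To apply it to a real file, read the lines of corpus.jsonl, pass them through the helper, and write the result back out one record per line.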
@ahadda5 Thanks. Yes, I also have the _ids as ints. Let me change them to strings and try again.
Thanks to both of you for your attention, @ahadda5 @sudhanshu-shukla-git! I will add a type assertion. This setting follows the one in the BeIR repo. I think using strings instead of integers makes the IDs more universal.
Have added the type hints and assertion: #12
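For readers who want a similar guard in their own data preparation, here is a rough sketch of such a check. It is not the code added in this repo; `check_corpus_ids` is a hypothetical name, and the corpus is assumed to be a dict keyed by document id:

```python
def check_corpus_ids(corpus):
    """Fail early if any corpus key is not a string.

    Hypothetical helper; `corpus` is assumed to map document ids to documents.
    """
    for doc_id in corpus:
        assert isinstance(doc_id, str), (
            f"corpus _id {doc_id!r} has type {type(doc_id).__name__}; expected str"
        )


# String keys pass silently.
check_corpus_ids({"0": {"text": "first passage"}})

# Integer keys are rejected with a readable message instead of a later KeyError.
try:
    check_corpus_ids({0: {"text": "first passage"}})
    rejected = False
except AssertionError:
    rejected = True
```

Failing at load time points directly at the data file, rather than at a lookup deep inside pseudo labeling.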
Hello! For those who run into this issue when generating the dataset with pandas, the following dtype conversion may help. It casts the _id column to strings: df = df.astype({'_id': 'string'})
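A small sketch of that conversion in context, assuming the corpus is built as a DataFrame and then exported as JSON Lines. The column contents here are made up for illustration:

```python
import json

import pandas as pd

# Toy corpus with integer ids, as pandas would typically infer them.
df = pd.DataFrame(
    {
        "_id": [0, 1],
        "title": ["", ""],
        "text": ["first passage", "second passage"],
    }
)

# Cast only the _id column to pandas' string dtype.
df = df.astype({"_id": "string"})

# Export one JSON object per line; with no path given, to_json returns a str.
jsonl = df.to_json(orient="records", lines=True)
records = [json.loads(line) for line in jsonl.splitlines() if line]
```

Casting before the export matters because the ids are written with whatever type the column holds at that point.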
Hi,
I am facing a KeyError while pseudo labeling. It looks like the selected pos_pid is not found in the corpus.
The corpus I have has the structure below. Does the order of the _id values and numbers matter?
Code to train:
Could you help me figure out what I am missing or doing wrong?