Could not generate training data #2

Open
kristopher283 opened this issue May 18, 2022 · 3 comments

kristopher283 commented May 18, 2022

Hi @gaetangate
I'm at the final step: generating the training .jsonl files.
When I run this command:
export CLASSPATH=jar/dprBM25.jar:./anserini/target/anserini-0.4.1-SNAPSHOT-fatjar.jar
java com.ibm.research.ai.pretraining.retrieval.DPRTrainingData -passageIndex anserini_passage_index -positivePidData zsRE_train_positive_pids.jsonl -trainingData zsRE_dpr_training_data.jsonl

I get this error:
Error: Could not find or load main class com.ibm.research.ai.pretraining.retrieval.DPRTrainingData
Caused by: java.lang.ClassNotFoundException: com.ibm.research.ai.pretraining.retrieval.DPRTrainingData
Do you have any idea why `com.ibm.research.ai.pretraining.retrieval.DPRTrainingData` cannot be found?

@michaelrglass
Member

Can you confirm that jar/dprBM25.jar is the path to the jar from this repo? Maybe try with an absolute path to be sure.
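
One way to check the suggestion above before rerunning is to resolve each CLASSPATH entry to an absolute path and see which entry, if any, actually contains the class file. The following is a minimal Python sketch under stated assumptions: the helper name `find_class_in_classpath` is hypothetical and not part of this repo, and it relies only on the facts that a jar is a zip archive and that a class `a.b.C` is stored in it as `a/b/C.class`.

```python
import os
import zipfile


def find_class_in_classpath(classpath, class_name):
    """Return (absolute_path, status) for every classpath entry.

    status is "missing" if the path does not exist, "found" if the entry
    contains the class file, and "absent" otherwise.
    Wildcard entries such as lib/* are not expanded in this sketch.
    """
    # A class a.b.C is stored as a/b/C.class inside a jar or class directory.
    member = class_name.replace(".", "/") + ".class"
    report = []
    for entry in classpath.split(os.pathsep):
        if not entry:
            continue
        path = os.path.abspath(entry)
        if not os.path.exists(path):
            status = "missing"
        elif os.path.isdir(path):
            # Directory entry: look for the class file on disk.
            found = os.path.isfile(os.path.join(path, member))
            status = "found" if found else "absent"
        elif zipfile.is_zipfile(path):
            # Jar entry: look for the class file inside the archive.
            with zipfile.ZipFile(path) as jar:
                status = "found" if member in jar.namelist() else "absent"
        else:
            status = "absent"
        report.append((path, status))
    return report


if __name__ == "__main__":
    target = "com.ibm.research.ai.pretraining.retrieval.DPRTrainingData"
    for path, status in find_class_in_classpath(os.environ.get("CLASSPATH", ""), target):
        print(f"{status:8} {path}")
```

If every entry is reported as "missing" or "absent", the `ClassNotFoundException` is expected, and the relative path to the jar is the first thing to check.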

@kristopher283
Author

Oh nice, thank you very much, @michaelrglass. Now I can run the command.
By the way, can you confirm that this log shows the expected behaviour?

skipping 17058091::[0,7] in positives
skipping 307157::[4,5] in positives
skipping since we found an answer: ' least concern '
skipping 12426884::[15,17] in positives
skipping 20792463::[11,13] in positives
skipping since we found an answer: ' least concern '
skipping 1475129::[0,5] in positives
skipping 12533644::[0,3] in positives
skipping 6956383::[2,4] in positives
skipping since we found an answer: ' vulnerable '
skipping since we found an answer: ' critically endangered '
skipping 12621756::[0,1] in positives
skipping since we found an answer: ' vulnerable '
skipping 38129955::[5,7] in positives
skipping 12505161::[0,10] in positives (Last warning)
skipping since we found an answer: ' critically endangered '
skipping since we found an answer: ' critically endangered '
skipping since we found an answer: ' critically endangered '
skipping since we found an answer: ' vulnerable '
skipping since we found an answer: ' critically endangered ' (Last warning)
On instance 2560
On instance 3840
On instance 5120
On instance 6400
On instance 9728
On instance 16640
On instance 19712
On instance 20992
On instance 22272
On instance 22784
On instance 25856
On instance 29440
On instance 31744
On instance 34048
On instance 36096
On instance 38400
On instance 41472
On instance 44288
On instance 45824
On instance 46592
On instance 49664
On instance 52736
On instance 55552
On instance 56832
On instance 59904
On instance 61440
On instance 62976
On instance 65024
On instance 67072
On instance 68096
On instance 69120

On instance 69632
On instance 70912
On instance 72192
On instance 72960
On instance 74240
On instance 77312
On instance 82688
On instance 84480
On instance 86016
On instance 91648
On instance 93184
Skipped 453 instances for lack of hard negatives
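
To read a log like the one above at a glance, the messages can be tallied by type. The Python sketch below is only an illustration: the four patterns are inferred from the lines quoted in this comment and are not a specification of the tool's output, and the function name `summarize_log` is hypothetical.

```python
import re
from collections import Counter

# Patterns inferred from the log lines quoted in this thread (an assumption,
# not a description of every message the tool can print).
PATTERNS = {
    "skipped_positive": re.compile(r"^skipping \S+::\[\d+,\d+\] in positives"),
    "skipped_answer_found": re.compile(r"^skipping since we found an answer:"),
    "progress": re.compile(r"^On instance (\d+)"),
    "no_hard_negatives": re.compile(
        r"^Skipped (\d+) instances for lack of hard negatives"
    ),
}


def summarize_log(lines):
    """Count each message type and keep the last progress and summary values."""
    counts = Counter()
    last_instance = None
    skipped_no_negatives = None
    for line in lines:
        line = line.strip()
        for name, pattern in PATTERNS.items():
            match = pattern.match(line)
            if not match:
                continue
            counts[name] += 1
            if name == "progress":
                last_instance = int(match.group(1))
            elif name == "no_hard_negatives":
                skipped_no_negatives = int(match.group(1))
            break  # each line matches at most one pattern
    return {
        "counts": dict(counts),
        "last_instance": last_instance,
        "skipped_no_negatives": skipped_no_negatives,
    }
```

Applied to the log above, `skipped_no_negatives` would be 453 and `last_instance` would be 93184. The "(Last warning)" markers suggest that the per-message warnings are capped, so the two "skipping" counts likely understate the real number of skips.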

@michaelrglass
Member

michaelrglass commented May 19, 2022 via email
