
question about evidence embedding file #11

Open
ISSCA-ZED opened this issue Sep 19, 2023 · 1 comment

@ISSCA-ZED

The precomputed evidence embedding file is only 19 GB when I download it via Google, and loading it then fails with this error message:

Unpickling BlockData: /disk2/qby/Desktop/emdr2-main/embedding-path/emdr2-finetuning-embedding/psgs_w100-retriever-nq-emdr2-finetuning-base-topk50-epochs10-bsize64-async-indexer.pkl
Traceback (most recent call last):
  File "tasks/run.py", line 67, in <module>
    main()
  File "/disk2/qby/Desktop/emdr2-main/tasks/openqa/e2eqa/run.py", line 72, in main
    open_retrieval_generative_qa(dataset_cls)
  File "/disk2/qby/Desktop/emdr2-main/tasks/openqa/e2eqa/run.py", line 60, in open_retrieval_generative_qa
    end_of_training_callback_provider=distributed_metrics_func_provider)
  File "/disk2/qby/Desktop/emdr2-main/tasks/openqa/e2eqa/train_e2eqa.py", line 583, in train
    model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider)
  File "/disk2/qby/Desktop/emdr2-main/megatron/training.py", line 134, in setup_model_and_optimizer
    model = get_model(model_provider_func)
  File "/disk2/qby/Desktop/emdr2-main/megatron/training.py", line 43, in get_model
    model = model_provider_func()
  File "/disk2/qby/Desktop/emdr2-main/tasks/openqa/e2eqa/run.py", line 36, in model_provider
    evidence_retriever = PreComputedEvidenceDocsRetriever()
  File "/disk2/qby/Desktop/emdr2-main/megatron/model/emdr2_model.py", line 387, in __init__
    self.precomputed_index_wrapper()
  File "/disk2/qby/Desktop/emdr2-main/megatron/model/emdr2_model.py", line 417, in precomputed_index_wrapper
    self.get_evidence_embedding(args.embedding_path)
  File "/disk2/qby/Desktop/emdr2-main/megatron/model/emdr2_model.py", line 412, in get_evidence_embedding
    load_from_path=True)
  File "/disk2/qby/Desktop/emdr2-main/megatron/data/emdr2_index.py", line 28, in __init__
    self.load_from_file()
  File "/disk2/qby/Desktop/emdr2-main/megatron/data/emdr2_index.py", line 50, in load_from_file
    state_dict = pickle.load(open(self.embedding_path, 'rb'))
_pickle.UnpicklingError: pickle data was truncated

@DevSinghSachan
Owner

Can you try downloading the file from the Dropbox link instead? The actual size should be ~32 GB, so a 19 GB file is an incomplete download.
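
If it helps, the file size can be checked before starting training, so an incomplete download is caught right away instead of failing inside `pickle.load`. This is only a minimal sketch: the helper names `looks_complete` and `require_complete` are hypothetical (they are not part of this repository), and the 30 GB default is an assumed lower bound for a file that should be ~32 GB.

```python
import os


def looks_complete(path, min_bytes):
    """Return True if `path` is a regular file of at least `min_bytes` bytes."""
    return os.path.isfile(path) and os.path.getsize(path) >= min_bytes


def require_complete(path, min_bytes=30 * 10**9):
    """Raise early instead of letting pickle.load fail on a truncated file.

    The 30 GB default is an assumption, chosen as a lower bound for a file
    reported to be ~32 GB; adjust it if the expected size differs.
    """
    if not looks_complete(path, min_bytes):
        size = os.path.getsize(path) if os.path.isfile(path) else 0
        raise RuntimeError(
            f"{path} is {size / 10**9:.1f} GB, expected at least "
            f"{min_bytes / 10**9:.1f} GB; the download is probably incomplete"
        )
```

Calling `require_complete(...)` on the downloaded `.pkl` path before launching the run would report the 19 GB file as incomplete. A size check only catches truncation; it does not verify that the contents are otherwise intact.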
