tutorials 09_dpr_training - Training issue #317

Open
choi-yongsuk opened this issue Apr 26, 2024 · 1 comment
Labels
bug Something isn't working

Comments

@choi-yongsuk

Describe the issue
The following error occurred while running "tutorials 09_dpr_training" in Google Colab.

To Reproduce
https://haystack.deepset.ai/tutorials/09_dpr_training
https://colab.research.google.com/github/deepset-ai/haystack-tutorials/blob/main/tutorials/09_DPR_training.ipynb

  1. Installing Haystack - ok
  2. Enabling Telemetry - ok
  3. Logging - ok
  4. Download Original DPR Training Data - ok
  5. Option 1: Training DPR from Scratch - ok
  6. Initialization - ok
  7. Training - error
# Start training our model and save it when it is finished

retriever.train(
    data_dir=doc_dir,
    train_filename=train_filename,
    dev_filename=dev_filename,
    test_filename=dev_filename,
    n_epochs=1,
    batch_size=16,
    grad_acc_steps=8,
    save_dir=save_dir,
    evaluate_every=3000,
    embed_title=True,
    num_positives=1,
    num_hard_negatives=1,
)

INFO:haystack.modeling.data_handler.data_silo:
Loading data into the data silo ... 
              ______
               |o  |   !
   __          |:`_|---'-.
  |__|______.-/ _ \-----.|
 (o)(o)------'\ _ /     ( )
 
INFO:haystack.modeling.data_handler.data_silo:LOADING TRAIN DATA
INFO:haystack.modeling.data_handler.data_silo:==================
INFO:haystack.modeling.data_handler.data_silo:Loading train set from: data/tutorial9/train/biencoder-nq-train.json 
Preprocessing dataset: 100%|██████████| 115/115 [01:28<00:00,  1.30 Dicts/s]
INFO:haystack.modeling.data_handler.data_silo:
INFO:haystack.modeling.data_handler.data_silo:LOADING DEV DATA
INFO:haystack.modeling.data_handler.data_silo:=================
INFO:haystack.modeling.data_handler.data_silo:Loading dev set from: data/tutorial9/dev/biencoder-nq-dev.json
Preprocessing dataset: 100%|██████████| 13/13 [00:09<00:00,  1.36 Dicts/s]
INFO:haystack.modeling.data_handler.data_silo:
INFO:haystack.modeling.data_handler.data_silo:LOADING TEST DATA
INFO:haystack.modeling.data_handler.data_silo:=================
INFO:haystack.modeling.data_handler.data_silo:Loading test set from: data/tutorial9/dev/biencoder-nq-dev.json
Preprocessing dataset: 100%|██████████| 13/13 [00:09<00:00,  1.43 Dicts/s]
INFO:haystack.modeling.data_handler.data_silo:
INFO:haystack.modeling.data_handler.data_silo:DATASETS SUMMARY
INFO:haystack.modeling.data_handler.data_silo:================
INFO:haystack.modeling.data_handler.data_silo:Examples in train: 58880
INFO:haystack.modeling.data_handler.data_silo:Examples in dev  : 6515
INFO:haystack.modeling.data_handler.data_silo:Examples in test : 6515
INFO:haystack.modeling.data_handler.data_silo:Total examples   : 71910
INFO:haystack.modeling.data_handler.data_silo:
INFO:haystack.modeling.data_handler.data_silo:Longest query length observed after clipping: 31   - for max_query_len: 64
INFO:haystack.modeling.data_handler.data_silo:Average query length after clipping:          11.807353940217391
INFO:haystack.modeling.data_handler.data_silo:Proportion queries clipped:                   0.0
INFO:haystack.modeling.data_handler.data_silo:
INFO:haystack.modeling.data_handler.data_silo:Longest passage length observed after clipping: 223.0   - for max_passage_len: 256
INFO:haystack.modeling.data_handler.data_silo:Average passage length after clipping:          139.19262907608694
INFO:haystack.modeling.data_handler.data_silo:Proportion passages clipped:                    0.0
INFO:haystack.modeling.model.optimization:Loading optimizer 'AdamW': {'correct_bias': True, 'weight_decay': 0.0, 'eps': 1e-08, 'lr': 1e-05}
INFO:haystack.modeling.model.optimization:Using scheduler 'get_linear_schedule_with_warmup'
INFO:haystack.modeling.model.optimization:Loading schedule 'get_linear_schedule_with_warmup': '{'num_warmup_steps': 100, 'num_training_steps': 460}'
INFO:haystack.modeling.training.base:No train checkpoints found. Starting a new training ...
Train epoch 0/0 (Cur. train loss: 0.0000):   0%|          | 0/3680 [00:00<?, ?it/s]
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-7-1d070a39096a> in <cell line: 3>()
      1 # Start training our model and save it when it is finished
      2 
----> 3 retriever.train(
      4     data_dir=doc_dir,
      5     train_filename=train_filename,

7 frames
/usr/local/lib/python3.10/dist-packages/torch/distributed/distributed_c10d.py in _get_default_group()
    975     """Get the default process group created by init_process_group."""
    976     if not is_initialized():
--> 977         raise ValueError(
    978             "Default process group has not been initialized, "
    979             "please make sure to call init_process_group."

ValueError: Default process group has not been initialized, please make sure to call init_process_group.

Expected behavior
Training was expected to run to completion. Instead, data loading and preprocessing completed, but the run stopped at the first training step, when the default process group was looked up, with “ValueError: Default process group has not been initialized, please make sure to call init_process_group.”
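
To confirm the state the traceback points at, a small check like the one below can be run in the same notebook before calling retriever.train. It is a diagnostic sketch only: the helper name distributed_state is made up for illustration, and it reports whether torch.distributed has a default process group without attempting to fix anything.

```python
def distributed_state():
    """Report whether torch.distributed has a default process group.

    Hypothetical helper for illustration; it is not part of Haystack or PyTorch.
    """
    state = {
        "torch_installed": False,
        "distributed_available": False,
        "process_group_initialized": False,
    }
    try:
        import torch.distributed as dist
    except ImportError:
        # torch is not installed in this environment
        return state

    state["torch_installed"] = True
    state["distributed_available"] = dist.is_available()
    # is_initialized() is only queried when the distributed package is available
    if state["distributed_available"]:
        state["process_group_initialized"] = dist.is_initialized()
    return state


print(distributed_state())
```

In a plain single-process Colab session, process_group_initialized is expected to be False, which is the condition checked in _get_default_group() in the traceback above.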

What environment did you try to run the tutorial on?:

  • OS: Google Colab
  • Browser: Chrome
  • Haystack version: v1.25.5
  • torch: v2.2.1+cu121
  • transformers: v4.39.3
  • huggingface_hub: v0.20.3
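
For anyone comparing environments, the versions above can be collected programmatically. This is a minimal sketch using only the standard library; installed_versions is a hypothetical helper name, and "farm-haystack" is assumed to be the distribution name under which Haystack 1.x is installed.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# "farm-haystack" is an assumption about how Haystack 1.x was installed.
print(installed_versions(["farm-haystack", "torch", "transformers", "huggingface_hub"]))
```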
@choi-yongsuk choi-yongsuk added the bug Something isn't working label Apr 26, 2024
@jinruiyang

Same here! Did you find a way to fix it?
