Test PTB Dependency Parsing Model #47

Open
woshiyyya opened this issue Sep 7, 2022 · 9 comments

Comments

@woshiyyya

Hi there!

I am trying to test your pretrained dependency parsing model, but I cannot find your processed PTB dataset. Could you share a link to it?

Also, I am wondering how to run inference on my own data. For example, how can I feed in one sentence and get its tagging result?

@wangxinyu0922
Member

wangxinyu0922 commented Sep 7, 2022

I have just uploaded the PTB dataset to OneDrive.

For inference, you can make a file like this (add dummy tags in columns 7, 8, and 9) and follow the instructions:

1\tBut\t_\t_\t_\t_\t_\t0\troot\t0:root
2\tI\t_\t_\t_\t_\t_\t0\troot\t0:root
3\tfound\t_\t_\t_\t_\t_\t0\troot\t0:root
4\tthe\t_\t_\t_\t_\t_\t0\troot\t0:root
5\tlocation\t_\t_\t_\t_\t_\t0\troot\t0:root
6\twonderful\t_\t_\t_\t_\t_\t0\troot\t0:root
7\tand\t_\t_\t_\t_\t_\t0\troot\t0:root
7.1\tfound\t_\t_\t_\t_\t_\t0\troot\t0:root
8\tthe\t_\t_\t_\t_\t_\t0\troot\t0:root
9\tneighbors\t_\t_\t_\t_\t_\t0\troot\t0:root
10\tvery\t_\t_\t_\t_\t_\t0\troot\t0:root
11\tkind\t_\t_\t_\t_\t_\t0\troot\t0:root
12\t.\t_\t_\t_\t_\t_\t0\troot\t0:root

@woshiyyya
Author

Hi Xinyu,

Thanks for uploading the data!

I created a folder named data and put a train.tsv file in it containing the demo case you provided.

Then I ran:
CUDA_VISIBLE_DEVICES=0 python train.py --config config/ptb_parsing_model.yaml --parse --target_dir data --keep_order

But I still got an error:

2022-09-07 02:59:16,391 Reading data from /home/yunxuan2/.flair/datasets/ptb_3.3.0_modified
2022-09-07 02:59:16,391 Train: /home/yunxuan2/.flair/datasets/ptb_3.3.0_modified/train_modified.conllu
2022-09-07 02:59:16,391 Test: /home/yunxuan2/.flair/datasets/ptb_3.3.0_modified/test.conllu
2022-09-07 02:59:16,391 Dev: /home/yunxuan2/.flair/datasets/ptb_3.3.0_modified/dev.conllu
Traceback (most recent call last):
  File "train.py", line 85, in <module>
    config = ConfigParser(config,all=args.all,zero_shot=args.zeroshot,other_shot=args.other,predict=args.predict)
  File "/projects/clio1/probing/ACE/flair/config_parser.py", line 63, in __init__
    self.corpus: ListCorpus=self.get_corpus
  File "/projects/clio1/probing/ACE/flair/config_parser.py", line 329, in get_corpus
    current_dataset=getattr(datasets,corpus)(tag_to_bioes=self.target)
  File "/projects/clio1/probing/ACE/flair/datasets.py", line 360, in __init__
    train = UniversalDependenciesDataset(data_folder/'train_modified.conllu', in_memory=in_memory, add_root=True)
  File "/projects/clio1/probing/ACE/flair/datasets.py", line 1006, in __init__
    assert path_to_conll_file.exists()
AssertionError

Do you know how to fix that?

@wangxinyu0922
Member

Have you checked whether the dataset is in the correct place?

@lizhou21

lizhou21 commented Sep 27, 2022

Hi Xinyu,
Is there something wrong with the data format you provided?
I just found that the line token = Token(fields[1], head_id=int(fields[6])) raises ValueError: invalid literal for int() with base 10: '_'.

So I guess the layout is:
column 0 is the token id,
column 1 is the token,
columns 2, 3, 4, and 5 are "_",
column 6 is 0 (dummy tag),
column 7 is "_",
column 8 is "root" (dummy tag),
column 9 is "0:root" (dummy tag).

Is that right?
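
A minimal sketch of writing rows under the column guess above. The layout is only the guess from this comment and is not confirmed by the maintainers; the helper name dummy_conllu_rows is hypothetical and not part of ACE. The sketch also shows why a row as posted earlier in the thread trips int(fields[6]).

```python
from typing import List


def dummy_conllu_rows(tokens: List[str]) -> List[str]:
    """Build one tab-separated row per token, following the guessed layout."""
    rows = []
    for token_id, form in enumerate(tokens, start=1):
        fields = [
            str(token_id),       # column 0: token id
            form,                # column 1: token
            "_", "_", "_", "_",  # columns 2-5: placeholders
            "0",                 # column 6: dummy head id, must parse as int
            "_",                 # column 7: placeholder
            "root",              # column 8: dummy tag
            "0:root",            # column 9: dummy tag
        ]
        rows.append("\t".join(fields))
    return rows


# First row as posted earlier in the thread: column 6 holds "_", not a number.
posted_row = "1\tBut\t_\t_\t_\t_\t_\t0\troot\t0:root"
print(repr(posted_row.split("\t")[6]))

# Rows built with the guessed layout pass the same int(fields[6]) access.
rows = dummy_conllu_rows(["But", "I", "found", "the", "location", "wonderful", "."])
for row in rows:
    assert int(row.split("\t")[6]) == 0
print("\n".join(rows))
```

Whether the reader takes the relation label from column 7 or column 8 is not stated in the thread, so the placement of "root" here simply mirrors the guess.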

@lizhou21

After changing the data format, I hit the same AssertionError as above.
Have you resolved it?

@wangxinyu0922
Member

Have you made sure that the path /home/yunxuan2/.flair/datasets/ptb_3.3.0_modified/train_modified.conllu exists? If not, download the data linked above and put it at this path.
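
One way to run that check before launching train.py is sketched below. The folder and file names are taken from the log in this thread; the helper missing_files is hypothetical, and using Path.home() assumes the .flair folder sits in the current user's home directory, as it does in the log.

```python
from pathlib import Path
from typing import Iterable, List


def missing_files(folder: Path, names: Iterable[str]) -> List[Path]:
    """Return the expected paths under folder that do not exist."""
    return [folder / name for name in names if not (folder / name).exists()]


# Assumption: the corpus folder is under the current user's home directory.
corpus_dir = Path.home() / ".flair" / "datasets" / "ptb_3.3.0_modified"
expected = ["train_modified.conllu", "dev.conllu", "test.conllu"]

for path in missing_files(corpus_dir, expected):
    print(f"missing: {path}")
```

The same helper can be pointed at any other folder the script reads from.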

@lizhou21

lizhou21 commented Sep 30, 2022

Yes, I have done that, and it solved this problem. The target_dir also needs to contain dev and test datasets.
But now I can only parse the dataset on CPU (very slow); it fails when I run it on a GPU.

It shows me:

Traceback (most recent call last):
  File "train.py", line 378, in <module>
    train_eval_result, train_loss = student.evaluate(loader,out_path=Path('outputs/train.'+'.'+tar_file_name+'.conllu'),embeddings_storage_mode="none",prediction_mode=True)
  File "/DM_parser/ACE/flair/models/dependency_model.py", line 1174, in evaluate
    arc_scores, rel_scores = self.forward(batch, prediction_mode=prediction_mode)
  File "/DM_parser/ACE/flair/models/dependency_model.py", line 597, in forward
    self.embeddings.embed(sentences,embedding_mask=self.selection)
  File "/DM_parser/ACE/flair/embeddings.py", line 185, in embed
    embedding.embed(sentences)
  File "/DM_parser/ACE/flair/embeddings.py", line 97, in embed
    self._add_embeddings_internal(sentences)
  File "/DM_parser/ACE/flair/embeddings.py", line 2960, in _add_embeddings_internal
    self._add_embeddings_to_sentences(sentences)
  File "/DM_parser/ACE/flair/embeddings.py", line 3155, in _add_embeddings_to_sentences
    sequence_output, pooled_output, hidden_states = self.model(input_ids, attention_mask=mask, inputs_embeds = inputs_embeds)
  File "/home/anaconda3/envs/ACE_parser/lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/anaconda3/envs/ACE_parser/lib/python3.6/site-packages/transformers/modeling_bert.py", line 753, in forward
    input_ids=input_ids, position_ids=position_ids, token_type_ids=token_type_ids, inputs_embeds=inputs_embeds
  File "/home/anaconda3/envs/ACE_parser/lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/anaconda3/envs/ACE_parser/lib/python3.6/site-packages/transformers/modeling_roberta.py", line 68, in forward
    input_ids, token_type_ids=token_type_ids, position_ids=position_ids, inputs_embeds=inputs_embeds
  File "/home/anaconda3/envs/ACE_parser/lib/python3.6/site-packages/transformers/modeling_bert.py", line 178, in forward
    inputs_embeds = self.word_embeddings(input_ids)
  File "/home/anaconda3/envs/ACE_parser/lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/anaconda3/envs/ACE_parser/lib/python3.6/site-packages/torch/nn/modules/sparse.py", line 114, in forward
    self.norm_type, self.scale_grad_by_freq, self.sparse)
  File "/home/anaconda3/envs/ACE_parser/lib/python3.6/site-packages/torch/nn/functional.py", line 1484, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Expected object of device type cuda but got device type cpu for argument #1 'self' in call to _th_index_select

I tried changing
sequence_output, pooled_output, hidden_states = self.model(input_ids, attention_mask=mask, inputs_embeds = inputs_embeds)

to

sequence_output, pooled_output, hidden_states = self.model(input_ids.cuda(), attention_mask=mask.cuda(), inputs_embeds = inputs_embeds)

but I still get the same error.

T T,

@wangxinyu0922
Member

wangxinyu0922 commented Oct 3, 2022

You may try to uncomment these lines

ACE/train.py

Lines 226 to 238 in 7033e91

# if student.selection[idx] == 1:
#     embedding.to(flair.device)
#     if 'elmo' in embedding.name:
#         # embedding.reset_elmo()
#         # continue
#         # pdb.set_trace()
#         embedding.ee.elmo_bilm.cuda(device=embedding.ee.cuda_device)
#         states=[x.to(flair.device) for x in embedding.ee.elmo_bilm._elmo_lstm._states]
#         embedding.ee.elmo_bilm._elmo_lstm._states = states
#         for idx in range(len(embedding.ee.elmo_bilm._elmo_lstm._states)):
#             embedding.ee.elmo_bilm._elmo_lstm._states[idx]=embedding.ee.elmo_bilm._elmo_lstm._states[idx].to(flair.device)
# else:
embedding.to('cpu')
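
As a rough illustration of why moving only the inputs to the GPU may not help, here is a toy PyTorch sketch. It reads the error message as saying that the embedding weights (argument 'self') were still on the CPU, which is an interpretation rather than something confirmed in the thread. The small nn.Embedding is a stand-in, not the ACE model, and the sketch falls back to CPU when no GPU is available.

```python
import torch
from torch import nn

# Use the GPU when one is available, otherwise stay on CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Toy stand-in for an embedding layer; modules are created on CPU by default.
embedding = nn.Embedding(num_embeddings=10, embedding_dim=4)
input_ids = torch.tensor([[1, 2, 3]])

# Moving only the inputs leaves the weights where they were created.
input_ids = input_ids.to(device)
print(next(embedding.parameters()).device)  # still cpu at this point

# Moving the module moves its weights, so both sides of the lookup agree.
embedding = embedding.to(device)
output = embedding(input_ids)
print(output.shape, output.device)
```

The commented-out embedding.to(flair.device) call in the snippet above plays the same role as embedding.to(device) here.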

@lizhou21

Hi Xinyu, I have resolved the problem and successfully applied ACE to parse my data. Thanks for your help!
