Got this error:
2024-01-03 21:14:47 INFO *** initialize network ***
2024-01-03 21:14:47 INFO create new checkpoint
2024-01-03 21:14:47 INFO removed incomplete checkpoint .ckpt
2024-01-03 21:14:47 INFO checkpoint: .ckpt
2024-01-03 21:14:47 INFO - [arg] dataset: dataset/mitre
2024-01-03 21:14:47 INFO - [arg] transformers_model: xlm-roberta-base
2024-01-03 21:14:47 INFO - [arg] random_seed: 1
2024-01-03 21:14:47 INFO - [arg] lr: 5e-06
2024-01-03 21:14:47 INFO - [arg] epochs: 20
2024-01-03 21:14:47 INFO - [arg] warmup_step: 0
2024-01-03 21:14:47 INFO - [arg] weight_decay: 1e-07
2024-01-03 21:14:47 INFO - [arg] batch_size: 32
2024-01-03 21:14:47 INFO - [arg] max_seq_length: 128
2024-01-03 21:14:47 INFO - [arg] fp16: False
2024-01-03 21:14:47 INFO - [arg] max_grad_norm: 1
2024-01-03 21:14:47 INFO - [arg] lower_case: False
2024-01-03 21:14:47 INFO target dataset: ['dataset/mitre']
2024-01-03 21:14:47 INFO data_name: dataset/mitre
2024-01-03 21:14:47 INFO formatting custom dataset from dataset/mitre
2024-01-03 21:14:47 INFO found following files: {'test': 'test.txt', 'train': 'train.txt', 'valid': 'valid.txt'}
2024-01-03 21:14:47 INFO note that files should be named as either valid.txt, test.txt, or train.txt
Traceback (most recent call last):
File "C:\Users\talia\OneDrive\Desktop\New folder (3)\CyNER-main\CyNER-main\c3.py", line 11, in
model.train()
File "C:\Users\talia\OneDrive\Desktop\New folder (3)\CyNER-main\CyNER-main\cyner\transformers_ner.py", line 52, in train
trainer.train(monitor_validation=True)
File "C:\Users\talia\OneDrive\Desktop\New folder (3)\CyNER-main\CyNER-main\cyner\tner\model.py", line 292, in train
self.__setup_model_data(self.args.dataset, self.args.lower_case)
File "C:\Users\talia\OneDrive\Desktop\New folder (3)\CyNER-main\CyNER-main\cyner\tner\model.py", line 142, in __setup_model_data
self.dataset_split, self.label_to_id, self.language, self.unseen_entity_set = get_dataset_ner(
^^^^^^^^^^^^^^^^
File "C:\Users\talia\OneDrive\Desktop\New folder (3)\CyNER-main\CyNER-main\cyner\tner\get_dataset.py", line 153, in get_dataset_ner
data_split_all, label_to_id, language, ues = get_dataset_ner_single(d, **param)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\talia\OneDrive\Desktop\New folder (3)\CyNER-main\CyNER-main\cyner\tner\get_dataset.py", line 359, in get_dataset_ner_single
data_split_all, unseen_entity_set, label_to_id = decode_all_files(
^^^^^^^^^^^^^^^^^
File "C:\Users\talia\OneDrive\Desktop\New folder (3)\CyNER-main\CyNER-main\cyner\tner\get_dataset.py", line 459, in decode_all_files
label_to_id, unseen_entity_set, data_dict = decode_file(
^^^^^^^^^^^^
File "C:\Users\talia\OneDrive\Desktop\New folder (3)\CyNER-main\CyNER-main\cyner\tner\get_dataset.py", line 397, in decode_file
for n, line in enumerate(f):
File "C:\Users\talia\anaconda3\Lib\encodings\cp1252.py", line 23, in decode
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 7701: character maps to <undefined>
The UnicodeDecodeError happens because open() on Windows defaults to the locale codec (cp1252, as shown in the traceback), which cannot decode byte 0x9d. It can be resolved by explicitly passing encoding='utf-8' when opening the file inside the decode_file function in cyner/tner/get_dataset.py.
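For reference, a minimal sketch of the change inside decode_file (the helper name read_annotation_lines and the loop body are illustrative; the actual fix is only the encoding='utf-8' argument to open()):

```python
# Sketch of the fix for cyner/tner/get_dataset.py, decode_file().
# Only the encoding argument is the fix; the function name and loop body
# here are illustrative, not the repository's exact code.

def read_annotation_lines(file_path):
    # Without an explicit encoding, open() falls back to the Windows locale
    # codec (cp1252 in the traceback above), which fails on bytes such as
    # 0x9d. Forcing UTF-8 matches how the dataset files appear to be encoded.
    with open(file_path, 'r', encoding='utf-8') as f:
        for n, line in enumerate(f):
            yield n, line.rstrip('\n')
```

If the dataset files might mix encodings, errors='replace' can be added as a fallback, at the cost of turning undecodable bytes into U+FFFD placeholders.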