This repository has been archived by the owner on Sep 19, 2023. It is now read-only.

Ignore invalid UTF8 characters when tokenizing #61

Merged
merged 1 commit into OpenNMT:master on Mar 20, 2019

Conversation

guillaumekln
Contributor

guillaumekln merged commit d49756f into OpenNMT:master on Mar 20, 2019
guillaumekln deleted the ignore-invalid-utf8 branch on March 20, 2019 10:04
jsenellart pushed a commit that referenced this pull request May 20, 2019
* master: (73 commits)
  Fix base directory in data path
  Only upgrade data configuration in training runs
  Also update tokenization config in local config after buildvocab
  Allow missing sample_dist
  Enable corpus synchronization for old-style data configuration (#68)
  Improve storage (#64)
  --copy_source translation option to build aligned source/target files (#67)
  Update to OpenNMT-tf 1.22.0
  Declare dtype in tensor proto
  Translate from gzip files (#65)
  Refresh OpenNMT-py framework with serving support (#54)
  Update TensorFlow to 1.13
  change request.get to request.post to support long sentences (#63)
  Ignore invalid UTF8 characters when tokenizing (#61)
  Update OpenNMT-tf to 1.21.7
  By default, disable TER and METEOR for computation reason (#60)
  refs 51557: change Thai to Char based evaluation (#59)
  Add missing auto_config flag for inference Runner
  Support returning the averaged checkpoint from OpenNMT-tf
  Update OpenNMT-tf to 1.21.6
  ...