Merge branch 'develop'
amaiya committed Jun 25, 2020
2 parents e9aafad + 970e15b commit cbd42f9
Showing 4 changed files with 54 additions and 8 deletions.
11 changes: 11 additions & 0 deletions CHANGELOG.md
@@ -7,6 +7,17 @@ Most recent releases are shown at the top. Each release shows:
- **Fixed**: Bug fixes that don't change documented behaviour


## 0.17.2 (2020-06-25)

### New:
- Added support for Russian in `text.EnglishTranslator`

### Changed
- N/A

### Fixed:
- N/A

## 0.17.1 (2020-06-24)

### New:
40 changes: 36 additions & 4 deletions examples/text/language_translation_example.ipynb
@@ -27,6 +27,7 @@
"\n",
"- `zh` : Chinese (both Simplified and Traditional)\n",
"- `ar` : Arabic\n",
"- `ru` : Russian\n",
"- `de` : German\n",
"- `af` : Afrikaans\n",
"- `fr` : French\n",
@@ -72,7 +73,7 @@
},
{
"cell_type": "code",
"execution_count": 4,
"execution_count": 3,
"metadata": {},
"outputs": [
{
@@ -95,7 +96,9 @@
"metadata": {},
"source": [
"#### Some comments about translations:\n",
"Notice in the example above that we supplied a document of **two** sentences as input. The `translate` method can accept single sentences, paragraphs, or entire documents. However, if the document is large (e.g., a book), we recommend that you break it up into smaller chunks (e.g., pages or paragraphs). This is because *ktrain* tokenizes your document into individual sentences, which are grouped together and fed to the model as a single batch when making a prediction. If the batch is too large for memory, the prediction will fail.\n"
"Notice in the example above that we supplied a document of **two** sentences as input. The `translate` method can accept single sentences, paragraphs, or entire documents. However, if the document is large (e.g., a book), we recommend that you break it up into smaller chunks (e.g., pages or paragraphs). This is because *ktrain* tokenizes your document into individual sentences, which are grouped together and fed to the model as a single batch when making a prediction. If the batch is too large for memory, the prediction will fail.\n",
"\n",
"When instantiating the `EnglishTranslator`, pretrained models are automatically loaded, which may take a few seconds. Once instantiated, the `translate` method can be repeatedly invoked on different documents or sentences. Next, let us reinstantiate an `EnglishTranslator` object to translate Arabic.\n"
]
},
{
@@ -107,7 +110,7 @@
},
{
"cell_type": "code",
"execution_count": 5,
"execution_count": 4,
"metadata": {},
"outputs": [
{
@@ -127,6 +130,35 @@
"print(translator.translate(src_text))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Russian to English"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"The pandemic has damaged the world economy.\n",
"However, as of June 2020, the US stock market continues to grow.\n"
]
}
],
"source": [
"translator = text.EnglishTranslator(src_lang='ru')\n",
"src_text = '''Пандемия нанесла ущерб мировой экономике.\n",
"Однако по состоянию на июнь 2020 года фондовый рынок США продолжает расти.\n",
"'''\n",
"print(translator.translate(src_text))"
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -219,7 +251,7 @@
"source": [
"## The `Translator` Class for Translating to and from Many Languages\n",
"\n",
"For translations **from** and **to** other languages, `text.Translator` instances can be used. `Translator` instances accept as input a pretrained model from [Helsinki-NLP](https://huggingface.co/Helsinki-NLP). For instance, to translate Chinese to German, one can use the [Helsinki-NLP/opus-mt-ZH-de](https://huggingface.co/Helsinki-NLP/opus-mt-ZH-de) model:"
"For translations **from** and **to** other languages, `text.Translator` instances can be used. `Translator` instances accept as input a pretrained model from [Helsinki-NLP](https://huggingface.co/models?search=Helsinki-NLP%2Fopus-mt). For instance, to translate Chinese to German, one can use the [Helsinki-NLP/opus-mt-ZH-de](https://huggingface.co/Helsinki-NLP/opus-mt-ZH-de) model:"
]
},
{
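The notebook's advice to break a large document into pages or paragraphs before calling `translate` can be sketched in plain Python. This is a minimal sketch, not *ktrain* code: it assumes paragraphs are separated by blank lines and that the translator is any callable that takes and returns a string (for example, the bound `translate` method of an `EnglishTranslator` instance, as used in the notebook). `split_into_chunks`, `translate_in_chunks`, and `max_chars` are hypothetical names, and the 2000-character default is an arbitrary placeholder rather than a measured memory limit.

```python
from typing import Callable, List


def split_into_chunks(document: str, max_chars: int = 2000) -> List[str]:
    """Group blank-line-separated paragraphs into chunks of at most max_chars.

    Hypothetical helper (not part of ktrain). A single paragraph longer than
    max_chars is kept whole as its own chunk rather than split mid-paragraph.
    """
    paragraphs = [p.strip() for p in document.split("\n\n") if p.strip()]
    chunks: List[str] = []
    current: List[str] = []
    current_len = 0  # length of "\n\n".join(current)
    for para in paragraphs:
        # Account for the "\n\n" separator when the chunk is non-empty.
        added = len(para) + (2 if current else 0)
        if current and current_len + added > max_chars:
            # Adding this paragraph would exceed the limit: close the chunk.
            chunks.append("\n\n".join(current))
            current, current_len = [], 0
            added = len(para)
        current.append(para)
        current_len += added
    if current:
        chunks.append("\n\n".join(current))
    return chunks


def translate_in_chunks(
    document: str, translate: Callable[[str], str], max_chars: int = 2000
) -> str:
    """Translate each chunk with a separate call so no single batch grows too large."""
    return "\n\n".join(translate(chunk) for chunk in split_into_chunks(document, max_chars))
```

With *ktrain*, one might create `translator = text.EnglishTranslator(src_lang='ru')` and then call `translate_in_chunks(long_text, translator.translate)`. Character count is only a rough proxy for batch size, since *ktrain* batches by sentence; capping the number of sentences per chunk would track memory use more directly.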
9 changes: 6 additions & 3 deletions ktrain/text/translation/core.py
@@ -2,7 +2,7 @@
from ... import utils as U
from .. import textutils as TU

SUPPORTED_SRC_LANGS = ['zh', 'ar', 'de', 'af', 'es', 'fr', 'it', 'pt']
SUPPORTED_SRC_LANGS = ['zh', 'ar', 'ru', 'de', 'af', 'es', 'fr', 'it', 'pt']

class Translator():
"""
@@ -18,11 +18,11 @@ def __init__(self, model_name=None, device=None):
device(str): device to use (e.g., 'cuda', 'cpu')
"""
if 'Helsinki-NLP' not in model_name:
raise ValueError('BasicTranslator requires a Helsinki-NLP model: https://huggingface.co/Helsinki-NLP')
raise ValueError('Translator requires a Helsinki-NLP model: https://huggingface.co/Helsinki-NLP')
try:
import torch
except ImportError:
raise Exception('BasicTranslator requires PyTorch to be installed.')
raise Exception('Translator requires PyTorch to be installed.')
self.torch_device = device
if self.torch_device is None: self.torch_device = 'cuda' if torch.cuda.is_available() else 'cpu'
from transformers import MarianMTModel, MarianTokenizer
@@ -67,6 +67,7 @@ def __init__(self, src_lang=None, device=None):
Must be one of SUPPORTED_SRC_LANGS:
'zh': Chinese (either traditional or simplified)
'ar': Arabic
'ru': Russian
'de': German
'af': Afrikaans
'es': Spanish
@@ -82,6 +83,8 @@ def __init__(self, src_lang=None, device=None):
self.translators = []
if src_lang == 'ar':
self.translators.append(Translator(model_name='Helsinki-NLP/opus-mt-ar-en', device=device))
elif src_lang == 'ru':
self.translators.append(Translator(model_name='Helsinki-NLP/opus-mt-ru-en', device=device))
elif src_lang == 'de':
self.translators.append(Translator(model_name='Helsinki-NLP/opus-mt-de-en', device=device))
elif src_lang == 'af':
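The `if`/`elif` chain in `EnglishTranslator.__init__` maps each source-language code to a Helsinki-NLP model, and this commit adds Russian by inserting one more branch. A table-driven version of the same dispatch might look like the sketch below. It is hypothetical, not *ktrain* code: `SRC_LANG_TO_MODEL` and `model_for` are illustrative names, the table lists only the three mappings visible in this diff (`ar`, `ru`, `de`), and it assumes one model per language, whereas *ktrain* keeps a list of translators per language.

```python
from typing import Dict

# Hypothetical lookup table; only the mappings visible in this diff are listed.
SRC_LANG_TO_MODEL: Dict[str, str] = {
    "ar": "Helsinki-NLP/opus-mt-ar-en",
    "ru": "Helsinki-NLP/opus-mt-ru-en",
    "de": "Helsinki-NLP/opus-mt-de-en",
}


def model_for(src_lang: str) -> str:
    """Return the Helsinki-NLP model name used to translate src_lang into English."""
    try:
        return SRC_LANG_TO_MODEL[src_lang]
    except KeyError:
        supported = ", ".join(sorted(SRC_LANG_TO_MODEL))
        raise ValueError(
            f"unsupported src_lang {src_lang!r}; expected one of: {supported}"
        ) from None
```

With a table, supporting a new language becomes a one-line entry instead of a new `elif` branch, and the error message can list the supported codes automatically.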
2 changes: 1 addition & 1 deletion ktrain/version.py
@@ -1,2 +1,2 @@
__all__ = ['__version__']
__version__ = '0.17.1'
__version__ = '0.17.2'
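According to the changelog, `src_lang='ru'` first appears in 0.17.2, so code that relies on it might check the installed version before constructing the translator. This is a minimal sketch, assuming plain `major.minor.patch` version strings. `parse_version` and `supports_russian` are hypothetical helpers, and in practice the argument would be `ktrain.__version__`.

```python
import re
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Parse the leading 'major.minor.patch' of a version string such as '0.17.2'."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return major, minor, patch


def supports_russian(version: str) -> bool:
    """Hypothetical gate: src_lang='ru' was added in ktrain 0.17.2 per the changelog."""
    # Tuples compare element by element, so (0, 18, 0) >= (0, 17, 2) is True.
    return parse_version(version) >= (0, 17, 2)
```

This compares only the numeric prefix and ignores pre-release suffixes. Where the `packaging` library is available, `packaging.version.Version` handles full PEP 440 ordering.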
