[add] translatepy v2.0 #15

ZhymabekRoman · 2021-06-10T07:26:04Z

New features:
- Exception raising
- Proxy support (partly needs to be refined - WIP)
- A better class management, with base classes
- Full code refactoring
- New Bing Translate implementation
And more .....

WIP*:
- Fully implement the text-to-speech function
- Convert ISO 639 to CSV
- Implement supported_languages method

*WIP - work in progress

Animenosekai · 2021-06-10T07:55:12Z

Just cloned your fork, I'll check how it works!

Animenosekai · 2021-06-10T10:48:44Z

@ZhymabekRoman What do you think of my commit?

Animenosekai · 2021-06-10T10:51:54Z

Also, I thought about adding a better shell version (with something like inquirer) but that would add a dependency

Animenosekai · 2021-06-10T10:53:56Z

I really like the idea that we can very easily add more translators, and even open this up to some sort of plugin system, because the user can add whatever BaseTranslator-inherited class they want.

Also, I think that giving out the detected language rather than "auto" is needed. Without it, the user has to make another request in cases where, for example, they want to translate something but only show the translation when its language differs from the original language.
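To make the point about the detected language concrete, here is a minimal sketch. Everything in it is hypothetical (`FakeResult`, `fake_translate` and `translation_to_show` are illustration names, not translatepy's API), and it assumes a single request returns both the translation and the detected source language.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeResult:
    """Hypothetical result type, not translatepy's actual class."""
    source: str
    source_language: str        # detected language, never the literal "auto"
    destination_language: str
    result: str


def fake_translate(text: str, destination: str) -> FakeResult:
    """Stub standing in for a real translator call.

    Assumption: one request returns the translation and the detected language.
    The "detection" here is a toy rule: ASCII text is English, anything else Japanese.
    """
    detected = "en" if text.isascii() else "ja"
    translated = text if detected == destination else f"[{destination}] {text}"
    return FakeResult(text, detected, destination, translated)


def translation_to_show(text: str, destination: str) -> Optional[str]:
    """Return the translation, or None when the text is already in `destination`."""
    res = fake_translate(text, destination)
    if res.source_language == destination:
        return None  # same language: nothing to show, and no second request was needed
    return res.result
```

If the result only carried "auto" as the source language, `translation_to_show` would need a separate detection call before it could make this decision.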

ZhymabekRoman · 2021-06-10T11:23:47Z

> What do you think of my commit?

Thank you, great work.

> I thought about adding a better shell version (with something like inquirer) but that would add a dependency

I think we can use click library: https://click.palletsprojects.com/en/8.0.x/

Animenosekai · 2021-06-10T11:33:57Z

> I think we can use click library: https://click.palletsprojects.com/en/8.0.x/

What's cool about inquirer is that you can make a nice looking list for choosing for example what the user wants to do (transliterate, translate, etc.)

ZhymabekRoman · 2021-06-10T13:35:57Z

> Do you think it would be a good idea to use CSV to store ISO 639 values? Wouldn't it be better to use JSON instead of CSV?

The new CSV table solves a few problems:

- It makes the data easier to change and access, and the data looks very structured.
- The table also acts as a list of the languages supported by each service.

I also added values to the list that are not in the official ISO 639 standard. For example, Yandex Translator can translate text into emoji, and only Yandex supports translation of Latin Kazakh and Cyrillic Uzbek.

Animenosekai · 2021-06-10T15:54:56Z

> Do you think it would be a good idea to use CSV to store ISO 639 values? Wouldn't it be better to use JSON instead of CSV?
>
> The new CSV table solves a few problems:
>
> - It makes the data easier to change and access, and the data looks very structured.
> - The table also acts as a list of the languages supported by each service.
>
> I also added values to the list that are not in the official ISO 639 standard. For example, Yandex Translator can translate text into emoji, and only Yandex supports translation of Latin Kazakh and Cyrillic Uzbek.

I guess we could make the data in CSV and convert it to a Python dict so it is a native python object and there is no I/O time when launching translatepy

I've never really worked with CSV, but if you are familiar with it, why not.

Also, we should add the translation to the other languages

Animenosekai · 2021-06-10T23:07:58Z

@ZhymabekRoman I just added the interactive interface to translatepy!

ZhymabekRoman · 2021-06-11T05:40:04Z

> I just added the interactive interface to translatepy!

Nice! I fixed some bugs

ZhymabekRoman · 2021-06-12T11:56:51Z

So, I've reworked the ISO 639 data storage mechanism a little bit. All the information is stored in a CSV table, which is very easy to edit and the data looks structured. All of the ISO 639 data and the languages supported by the services were compiled from scratch from public sources.

And by the way, GitHub can display CSV in a browser: https://github.com/ZhymabekRoman/translate/blob/main/playground/iso639.csv

I also wrote a special script that converts the CSV to Python code (a named tuple). The script is at translate/playground/export_csv_iso639_table.py, and it generates the file iso639_table.py, which should be put into the translate/translatepy/utils/ folder.
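For readers who have not opened the export script, the CSV-to-Python idea can be sketched roughly as follows. This is not the contents of export_csv_iso639_table.py: the column names, the two sample rows, and the layout of the generated module are all assumptions made for illustration.

```python
import csv
import io

# Hypothetical two-row sample; the real iso639.csv has more columns and rows.
SAMPLE_CSV = """name,alpha2,alpha3,yandex,google,bing
English,en,eng,en,en,en
Emoji,,,emj,,
"""


def csv_to_python_source(csv_text: str) -> str:
    """Render CSV rows as Python source defining a named tuple and a list of rows."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fields = reader.fieldnames  # header row, read lazily on first access
    lines = [
        "from collections import namedtuple",
        f"Language = namedtuple('Language', {fields!r})",
        "LANGUAGES = [",
    ]
    for row in reader:
        values = ", ".join(f"{name}={row[name]!r}" for name in fields)
        lines.append(f"    Language({values}),")
    lines.append("]")
    return "\n".join(lines) + "\n"


generated = csv_to_python_source(SAMPLE_CSV)

# Execute the generated text to check that it is valid Python.
namespace = {}
exec(generated, namespace)
```

Writing `generated` to a .py file gives a module that is imported like any other code, so no CSV parsing happens when the package is loaded.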

And by the way, I accidentally deleted the playground folder and all the scripts inside it. If any of those scripts are needed, restore them from the Git history.

Here are a couple of examples of changes. Previously, the Language class did not provide information about the languages which are not on the ISO 639 list but are supported by services. Now you can get full information about the language if you know the language code used by the translation service. For example, let's take the language supported by the Bing service - Chinese Simplified Language. The code of the language used by Bing for translation is zh-Hans. Let's get information about the language:

>>> Language.by_bing("zh-Hans")
Language(name='Chinese Simplified', alpha2='', alpha3='', in_foreign_languages={'sw': 'Kichina Kilichorahisishwa', 'ne': 'चिनियाँ सरलीकृत', 'sq': 'Kineze E Thjeshtuar', 'ht': 'Chinwa Senplifye', 'nl': 'Vereenvoudigd Chinees', 'be': 'Кітайскі спрошчаны', 'ga': 'Síneach A Simplithe', 'ba': 'Ябай ҡытай', 'ta': 'சீன எளிமைப்படுத்தப்பட்ட', 'mg': 'Sinoa Notsorina', 'pa': 'ਚੀਨੀ ਸਧਾਰਨ', 'gd': 'Sìnis Shimplichte', 'fi': 'Kiina, Yksinkertaistettu', 'ky': 'Кытай жөнөкөйлөштүрүлгөн', 'ar': 'الصينية المبسطة', 'he': 'סינית פשוטה', 'lt': 'Kinų Supaprastinta', 'uz': 'Xitoy Soddalashtirilgan', 'pl': 'Chiński Uproszczony', 'mi': 'Hainamana Ngāwari', 'ms': 'Cina', 'sv': 'Kinesiska, Förenklad', 'uk': 'Китайський спрощений', 'pt': 'Chinês Simplificado', 'vi': 'Trung Quốc Đơn Giản', 'hu': 'Egyszerűsített Kínai', 'cy': 'Tsieineaidd Simplified', 'gu': 'ચિની સરળીકૃત', 'eo': 'Ĉina Simpligita', 'km': 'ចិនសាមញ្ញ', 'no': 'Kinesisk (Forenklet)', 'bg': 'Китайски опростен', 'es': 'Chino Simplificado', 'cv': 'Китай ансатлатнӑ', 'et': 'Hiina Lihtsustatud', 'ja': '簡体字中国語', 'da': 'Forenklet Kinesisk', 'bn': 'সরলীকৃত চীনা', 'it': 'Cinese semplificato', 'en': 'Chinese Simplified', 'ca': 'Xinès Simplificat', 'th': 'ภาษาจีนประยุกต์ Name', 'tl': 'Pinasimpleng Tsino', 'la': 'Seres Simpliciores', 'te': 'సరళీకృత చైనీస్', 'tt': 'Кытайча', 'ko': '중국어 간체', 'xh': 'Isitshayina Esenziwe Lula', 'ml': 'Chinese Simplified', 'sl': 'Poenostavljeno Kitajsko', 'af': 'Vereenvoudigde Sjinees', 'fa': 'چینی ساده شده', 'tg': 'Чин соддакардашудаи', 'hy': 'Չինական պարզեցված', 'hi': 'चीनी सरलीकृत', 'my': 'တရုတ်ရိုးရှင်း', 'el': 'Κινέζικα Απλοποιημένα', 'id': 'Cina Disederhanakan', 'ka': 'ჩინური გამარტივებული', 'mk': 'Кинески-Поедноставен', 'cs': 'Zjednodušená Čínština', 'is': 'Kínverska Einfaldað', 'lo': 'ພາສາຈີນແບບງ່າຍ(ຈີນກາງ)', 'eu': 'Txinera Erraztua', 'mr': 'सरलीकृत चीनी', 'jv': 'Chinese Simplified', 'sr': 'Кинески', 'bs': 'Kineski Pojednostavljeni', 'kn': 'ಚೀನೀ ಸರಳೀಕೃತ', 'ru': 'Китайский упрощенный', 'zh': '简体中文', 'gl': 
'Chinés Simplificado', 'si': 'සරල චීන', 'ro': 'Chineză Simplificată', 'su': 'Cina Saderhana', 'fr': 'Chinois Simplifié', 'ur': 'آسان کردہ چینی', 'sk': 'Zjednodušená Čínština', 'lb': 'Chinesesch (Vereinfacht)', 'hr': 'Kineski pojednostavljeni', 'am': 'ቻይንኛ ቀላል', 'yi': 'כינעזיש סימפּלאַפייד', 'mn': 'Хятадын Хялбаршуулсан', 'de': 'Chinesisch (Vereinfacht)', 'kk': 'Қытай жеңілдетілген', 'mt': 'Ċiniż Simplifikata', 'lv': 'Ķīniešu Vienkāršotā', 'tr': 'Basitleştirilmiş Çince', 'zu': 'Isi-Chinese Esenziwe Lula', 'az': 'Basitleştirilmiş çin'}, yandex='', google='zh-cn', bing='zh-Hans', reverso='', deepl='')

Great, we have full information about the language. Let's also try to get information about emoji, which only Yandex supports and is not on the official ISO 639 list:

>>> Language.by_yandex("emj")
Language(name='Emoji', alpha2='', alpha3='', in_foreign_languages={'sw': 'Emoji', 'ne': 'Emoji', 'sq': 'Emoji', 'ht': 'Anoji', 'nl': 'Emoji', 'be': 'Emoji', 'ga': 'Emoji', 'ba': 'Эмодзи', 'ta': 'ஈமோஜி', 'mg': 'Emoji', 'pa': 'Emoji', 'gd': 'Emoji', 'fi': 'Emoji', 'ky': 'Климаты мелүүн.', 'ar': 'رموز تعبيرية', 'he': 'Emoji', 'lt': 'Emoji', 'uz': 'Emoji', 'pl': 'Emoji', 'mi': 'Whakapā', 'ms': 'Smiley', 'sv': 'Emoji', 'uk': 'Емодзі', 'pt': 'Emoji', 'vi': 'Xúc', 'hu': 'Emoji', 'cy': 'Emoji', 'gu': 'ઇમોજી', 'eo': 'Emoji', 'km': 'អារម្មណ៍', 'no': 'Emoji', 'bg': 'Емоджи', 'es': 'Emoji', 'cv': 'Эмодзи', 'et': 'Emoji', 'ja': '絵文字', 'da': 'Emoji', 'bn': 'ইমোজি', 'it': 'Emoji', 'en': 'Emoji', 'ca': 'L"ús d"emoji', 'th': 'Emoji', 'tl': 'Mga Emoji', 'la': 'Emoji', 'te': 'Emoji', 'tt': 'Эмодзи', 'ko': '이모티콘', 'xh': 'Emoji', 'ml': 'Fast in malayalam', 'sl': 'Emoji', 'af': 'Emoji', 'fa': 'شکلک', 'tg': 'Эмодзи', 'hy': 'Էմոձի', 'hi': 'इमोजी', 'my': 'စိတ္၀င္စားစရာ', 'el': 'Emoji', 'id': 'Emoji', 'ka': 'ემოჯი', 'mk': 'Emoji', 'cs': 'Smajlík', 'is': 'Emoji', 'lo': 'ສັນຍາລັກ', 'eu': 'Emoji', 'mr': 'ईमोजी', 'jv': 'Emoji', 'sr': 'Емоји', 'bs': 'Emoji', 'kn': 'ಎಮೊಜಿಯನ್ನು', 'ru': 'Эмодзи', 'zh': '表情符号', 'gl': 'Emoji', 'si': 'එමොජි', 'ro': 'Emoji', 'su': 'Emoji', 'fr': 'Emoji', 'ur': 'Emoji', 'sk': 'Emoji', 'lb': 'Emoji', 'hr': 'Emoji', 'am': 'አዳዲስ', 'yi': 'עמאָדזשי', 'mn': 'Эможи', 'de': 'Emoji', 'kk': 'Эмодзи', 'mt': 'Emoji', 'lv': 'Emocijzīme', 'tr': 'Emoji', 'zu': 'Emoji', 'az': 'Emoji'}, yandex='emj', google='', bing='', reverso='', deepl='')

This also changes what you get when detecting the language of a text. Previously, the language code returned by the API was passed through as-is:

>>> from translatepy.translators import YandexTranslate
>>> dl = YandexTranslate()
>>> dl.language("Hello, how are you?")
LanguageResult(service=Yandex, source=Hello, how are you?, result=en)

Now it returns a Language object:

>>> from translatepy.translators import YandexTranslate
>>> dl = YandexTranslate()
>>> dl.language("Hello, how are you?")
LanguageResult(service=Yandex, source=Hello, how are you?, result=Language(name='English', alpha2='en', alpha3='eng', in_foreign_languages={'sw': 'Kiingereza', 'ne': 'नेपाली', 'sq': 'Anglisht', 'ht': 'Angle', 'nl': 'Engels', 'be': 'Англійскі', 'ga': 'Béarla', 'ba': 'Инглиз', 'ta': 'தமிழ்', 'mg': 'Malagasy', 'pa': 'ਅੰਗਰੇਜ਼ੀ', 'gd': 'Gaelic', 'fi': 'Englanti', 'ky': 'Кайнатылган.', 'ar': 'English', 'he': 'אנגלית', 'lt': 'Anglų', 'uz': 'Www uzbekona uz joni', 'pl': 'Angielski', 'mi': 'Maori', 'ms': 'Bahasa inggeris', 'sv': 'Engelsk', 'uk': 'Англійський', 'pt': 'Inglês', 'vi': 'Tiếng anh', 'hu': 'Angol', 'cy': 'Saesneg', 'gu': 'અંગ્રેજી', 'eo': 'La angla', 'km': 'គ្លេស', 'no': 'Engelsk', 'bg': 'Английски', 'es': 'Ingl', 'cv': 'Акӑлчанла', 'et': 'Inglise', 'ja': '英語', 'da': 'Engelsk', 'bn': 'বাংলা সেক্স ভিডিও', 'it': 'Inglese', 'en': 'English', 'ca': 'Anglès', 'th': 'ภาษาอังกฤษ', 'tl': 'Ingles', 'la': 'Anglorum', 'te': 'తెలుగు', 'tt': 'Инглизчә', 'ko': '영어', 'xh': 'Isixhosa', 'ml': 'മലയാളം', 'sl': 'Slovenian', 'af': 'Engels', 'fa': 'انگلیسی', 'tg': 'English', 'hy': 'Անգլերեն', 'hi': 'अंग्रेजी', 'my': 'အဂၤလိပ္စာ', 'el': 'Αγγλική', 'id': 'Inggris-US-sdh', 'ka': 'ინგლისური', 'mk': 'Англиски', 'cs': 'Anglický', 'is': 'Enska', 'lo': 'ອັງກິດ', 'eu': 'Euskara', 'mr': 'एचडी', 'jv': 'Inggris', 'sr': 'Енглески', 'bs': 'Engleski', 'kn': 'ಕನ್ನಡ', 'ru': 'Английский', 'zh': '中文', 'gl': 'Inglés', 'si': 'ඉංග්රීසි', 'ro': 'Română', 'su': 'Basa inggris', 'fr': 'Anglais', 'ur': 'انگریزی', 'sk': 'Anglický', 'lb': 'Englischsprachig', 'hr': 'Engleski', 'am': 'አማርኛ', 'yi': 'ענגליש', 'mn': 'Англи хэл', 'de': 'Englischsprachig', 'kk': 'Ағылшын', 'mt': 'Malti', 'lv': 'Angļu', 'tr': 'İngilizce', 'zu': 'Isizulu', 'az': 'İngilis dili'}, yandex='en', google='en', bing='en', reverso='en', deepl='EN'))

ZhymabekRoman · 2021-06-12T12:06:38Z

Lmao, I just remembered that it is possible to use a named tuple instead of creating separate result model classes

For example:

from collections import namedtuple
TranslationResult = namedtuple("TranslationResult", "service source source_language destination_language result")
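A short usage sketch for the named tuple above, to show what it would and would not provide compared with a class. The field values are made up for illustration.

```python
from collections import namedtuple

TranslationResult = namedtuple(
    "TranslationResult",
    "service source source_language destination_language result",
)

# Illustrative values only.
res = TranslationResult(
    service="Yandex",
    source="Hello",
    source_language="en",
    destination_language="fr",
    result="Bonjour",
)

fields_as_dict = res._asdict()   # plain dict of all fields
service, *_, translated = res    # tuple unpacking works, since it is a tuple

try:
    res.result = "Salut"         # named tuples are immutable
    mutable = True
except AttributeError:
    mutable = False
```

The trade-off is that a named tuple gives field access, a repr and immutability for free, while a regular class leaves room for custom methods and a custom `__repr__`.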

Animenosekai · 2021-06-13T08:59:55Z

> Lmao, I just remembered that it is possible to use a named tuple instead of creating separate result model classes
>
> For example:
> from collections import namedtuple
> TranslationResult = namedtuple("TranslationResult", "service source source_language destination_language result")

I mean, using classes isn't that bad too lol

Also, I looked at the script that creates the Python version of the CSV: did everything work while doing that many translations with Yandex?

(Also, if you replaced all of the translations with the Yandex ones, we could merge them with the previous data, which was generated with Google Translate. That would give us more data, and checking the similarity between the two would improve the accuracy.)
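One way the merge-with-similarity-check idea could look, as a rough sketch only. The function names, the 0.8 threshold, and the choice of `difflib.SequenceMatcher` as the similarity measure are all assumptions, and dropping disagreeing entries is just one possible policy.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio between 0 and 1."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def merge_names(yandex: dict, google: dict, threshold: float = 0.8) -> dict:
    """Merge two {language_code: name} mappings.

    Entries present in only one source are kept as-is. When both sources have
    an entry, it is kept only if the two names are similar enough; entries that
    disagree are left out so they can be reviewed by hand.
    """
    merged = {}
    for code in yandex.keys() | google.keys():
        y, g = yandex.get(code), google.get(code)
        if y is None or g is None:
            merged[code] = y or g
        elif similarity(y, g) >= threshold:
            merged[code] = y
    return merged


# The Yandex values are taken from the English entry shown earlier in the thread;
# the Google values are illustrative, not real generated data.
yandex_names = {"fr": "Anglais", "ro": "Română"}
google_names = {"fr": "Anglais", "ro": "Engleză", "it": "Inglese"}
merged = merge_names(yandex_names, google_names)
```

Here the `ro` entry is dropped because the two sources disagree, which is the kind of mistranslation the similarity check is meant to catch.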

Animenosekai · 2021-06-14T11:04:50Z

@ZhymabekRoman Do you think that we should keep the _translate, _transliterate, _spellcheck, etc. methods abstract?

We could just make them normal methods that raise an exception by default, so that we don't need to define each one and raise an exception in every translator class.

I think though that we should keep the _language_normalize and _language_denormalize abstract as they are needed.
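A minimal sketch of that split, assuming a simplified base class. `UnsupportedMethod` and `EchoTranslator` are hypothetical names, and this is not the actual translatepy BaseTranslator: optional features are concrete methods that raise by default, while the two language-mapping methods stay abstract.

```python
from abc import ABC, abstractmethod


class UnsupportedMethod(Exception):
    """Hypothetical exception for features a service does not provide."""


class BaseTranslator(ABC):
    """Simplified sketch of a translator base class."""

    def translate(self, text: str, destination: str) -> str:
        code = self._language_normalize(destination)
        return self._translate(text, code)

    # Optional features: concrete, and raise unless a subclass overrides them.
    def _translate(self, text: str, destination_code: str) -> str:
        raise UnsupportedMethod(f"{type(self).__name__} does not support translation")

    def _spellcheck(self, text: str) -> str:
        raise UnsupportedMethod(f"{type(self).__name__} does not support spellchecking")

    # Required: every translator must map languages to and from its own codes.
    @abstractmethod
    def _language_normalize(self, language: str) -> str:
        ...

    @abstractmethod
    def _language_denormalize(self, code: str) -> str:
        ...


class EchoTranslator(BaseTranslator):
    """Toy subclass that implements translation but not spellchecking."""

    def _translate(self, text: str, destination_code: str) -> str:
        return f"[{destination_code}] {text}"

    def _language_normalize(self, language: str) -> str:
        return language.lower()[:2]

    def _language_denormalize(self, code: str) -> str:
        return code
```

With this layout a translator class only lists the features it supports, and forgetting one of the two language-mapping methods still fails early, at instantiation time.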

ZhymabekRoman · 2021-06-14T11:08:45Z

> Did everything work while doing so much translation with Yandex?

Yes, I tried making more than 100 000 requests and everything works fine

And I think the PR is ready. Idk why the tests don't work, but everything works fine in the Python interactive shell

ZhymabekRoman · 2021-06-14T11:12:59Z

> @ZhymabekRoman Do you think that we should keep the _translate, _transliterate, _spellcheck, etc. methods abstract?
>
> We could just make them normal methods that raise an exception by default, so that we don't need to define each one and raise an exception in every translator class.
>
> I think though that we should keep the _language_normalize and _language_denormalize abstract as they are needed.

Hmmm, yeah, I think that's a great idea

Animenosekai · 2021-06-14T12:56:37Z

> And I think the PR is ready. Idk why the tests don't work, but everything works fine in the Python interactive shell

Yea, I think that I'll merge it and we'll continue the small changes on the main branch

[add] translatepy v2.0

f6ca5fc

ZhymabekRoman and others added 2 commits June 10, 2021 13:55

[fix] Fix test

8fff6bb

[add] Proxy Management, Response Management, Class Repr, Automatic La…

47e9f5f

…nguage Detection

[add] Interactive Interface [add] Command JSON access

f6b8ba9

[fix] Fixes some bugs

c41b11b

[add] Re-implement ISO 639 data

ff6612e

Animenosekai and others added 6 commits June 13, 2021 12:39

[add] MyMemory Translation API [fix] raise_for_status

31ba449

[add] Adding translate.com [fix] changing the way the Translator name…

08eaf0c

… is handled

[fix] changing the way rate limiting works in deepl

e490dfb

[fix] fixing command line access

e498c7a

[add] Implement text to speech methods

d1c5b1f

[fix] Fix test, and text to speech

bd8a0b3

ZhymabekRoman marked this pull request as ready for review June 14, 2021 11:00

[change] changing methods abstraction

aaafdb4

[change] removing unncesseary methods [fix] returning service and Lan…

ca79633

…guage for tts

Animenosekai merged commit 418cb4f into Animenosekai:main Jun 14, 2021

This was referenced Jun 14, 2021

Possible to specify translate.google.cn? #14

Closed

Implementing proxy? #9

Closed
