# Text to IPA

Install the Epitran library and the files it requires:

In [None]:
pip install epitran



In [None]:
import epitran

In [None]:
epitran.download.cedict()

The `code` argument of the Epitran class takes one of the language/script pairs listed in this table: https://github.com/dmort27/epitran#transliteration-languagescript-pairs
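Each code in that table joins a three-letter language code and a four-letter script code with a hyphen, and some entries carry an extra variant suffix. As a small illustration, the helper below splits a code into those parts. `split_code` is a hypothetical name used only for this sketch; it is not part of Epitran.

```python
def split_code(code):
    """Split an Epitran-style code into (language, script, variant).

    Hypothetical helper for illustration only; not part of Epitran.
    """
    parts = code.split("-", 2)
    if len(parts) < 2:
        raise ValueError(f"expected '<language>-<script>', got {code!r}")
    language, script = parts[0], parts[1]
    # A third part, when present, is treated as a variant suffix.
    variant = parts[2] if len(parts) == 3 else None
    return language, script, variant


print(split_code("hin-Deva"))  # ('hin', 'Deva', None)
print(split_code("cmn-Hant"))  # ('cmn', 'Hant', None)
```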

In [None]:
epi = epitran.Epitran('hin-Deva')

In [None]:
epi.transliterate("आप कैसे हैं")

'aːpə kæːse ɦæːn'

In [None]:
epi = epitran.Epitran("cmn-Hant")

In [None]:
epi.transliterate('你好嗎')

'nixaoma'

The Backoff class is useful when one language mode does not work: it falls back to the next mode in the list, and so on. This is helpful when there is more than one code for the same language.
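The fallback idea itself can be sketched without Epitran. The function below is only an illustration of the concept, not Epitran's actual implementation: it assumes each mode is a plain callable and that a mode "does not work" when it returns its input unchanged. The two dictionaries are toy stand-ins for language modes.

```python
def transliterate_with_backoff(token, transliterators):
    """Return the output of the first transliterator that changes the token.

    `transliterators` is an ordered list of callables (str -> str). A callable
    that returns its input unchanged is treated as "did not work".
    """
    for transliterate in transliterators:
        result = transliterate(token)
        if result != token:
            return result
    return token  # no mode applied; pass the token through unchanged


# Toy stand-ins for two language modes (illustrative mappings only).
traditional = {"嗎": "ma"}
simplified = {"吗": "ma"}

modes = [
    lambda t: traditional.get(t, t),
    lambda t: simplified.get(t, t),
]

print(transliterate_with_backoff("吗", modes))  # ma (found by the second mode)
print(transliterate_with_backoff("x", modes))   # x (no mode applies)
```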

In [None]:
from epitran.backoff import Backoff

The cedict.txt file is required when working with Mandarin Chinese (both Simplified and Traditional).

In [None]:
backoff = Backoff(['cmn-Hant', 'cmn-Hans'], cedict_file='cedict.txt')

In [None]:
backoff.transliterate('中文')

'ʈ͡ʂoŋwen'

Unfortunately, Epitran needs a more complicated installation to run on English text. The eng_to_ipa package is an alternative that avoids that route:

In [None]:
pip install eng_to_ipa



In [None]:
import eng_to_ipa as eng

In [None]:
eng.convert("How are you ?")

'haʊ ər ju '

In [None]:
eng.convert("Hello world")

'hɛˈloʊ wərld'

# Translating Hindi to Farsi

In [None]:
import pandas as pd

In [None]:
df = pd.read_csv("hindi_loanwords.csv", index_col=0)

In [None]:
df

Unnamed: 0,variable,hin_loanwords
0,Deva,अंगूर
1,Deva,आइंदा
2,Deva,इंतक़ाल
3,Deva,-ई
4,Deva,उम्दा
...,...,...
1999,Deva 63,ताज़ा
2002,Deva 63,नौबत
2005,Deva 63,बावरची
2006,Deva 63,मादर


The googletrans API is not the most stable API available (see the notes in its documentation: https://py-googletrans.readthedocs.io/en/latest/#:~:text=The%20maximum%20character%20limit%20on%20a%20single%20text%20is%2015k.)
For some reason, we have to install the current release of googletrans first and then install version 3.1.0a0 on top of it for the library to work correctly.

In [None]:
pip install googletrans

Collecting googletrans
  Downloading googletrans-3.0.0.tar.gz (17 kB)
Collecting httpx==0.13.3
  Downloading httpx-0.13.3-py3-none-any.whl (55 kB)
Collecting sniffio
  Downloading sniffio-1.2.0-py3-none-any.whl (10 kB)
Collecting httpcore==0.9.*
  Downloading httpcore-0.9.1-py3-none-any.whl (42 kB)
Collecting hstspreload
  Downloading hstspreload-2021.12.1-py3-none-any.whl (1.3 MB)
Collecting rfc3986<2,>=1.3
  Downloading rfc3986-1.5.0-py2.py3-none-any.whl (31 kB)
Collecting h11<0.10,>=0.8
  Downloading h11-0.9.0-py2.py3-none-any.whl (53 kB)
Collecting h2==3.*
  Downloading h2-3.2.0-py2.py3-none-any.whl (65 kB)
Collecting hpack<4,>=3.0
  Downloading hpack-3.0.0-py2.py3-n

In [None]:
pip install googletrans==3.1.0a0

Collecting googletrans==3.1.0a0
  Downloading googletrans-3.1.0a0.tar.gz (19 kB)
Building wheels for collected packages: googletrans
  Building wheel for googletrans (setup.py) ... done
  Created wheel for googletrans: filename=googletrans-3.1.0a0-py3-none-any.whl size=16367 sha256=269a63cf81cd7b8aea27799b23c519ba7826699effe9a08b81da3e14d3fac6e4
  Stored in directory: /root/.cache/pip/wheels/0c/be/fe/93a6a40ffe386e16089e44dad9018ebab9dc4cb9eb7eab65ae
Successfully built googletrans
Installing collected packages: googletrans
  Attempting uninstall: googletrans
    Found existing installation: googletrans 3.0.0
    Uninstalling googletrans-3.0.0:
      Successfully uninstalled googletrans-3.0.0
Successfully installed googletrans-3.1.0a0


In [None]:
from googletrans import Translator

In [None]:
translator = Translator()

Run the next two cells (after uncommenting them) to look up the code of each language.

In [None]:
# from googletrans import LANGUAGES

In [None]:
# LANGUAGES

In [None]:
translator.translate('안녕하세요.').text

'Hello.'

In [None]:
hindi = df.iloc[0,1]
hindi

'अंगूर'

In [None]:
translator.translate(hindi, dest='en', src='hi').text

'Grape'

In [None]:
df["farsi"] = df.apply(lambda x:translator.translate(x["hin_loanwords"], dest='fa', src='hi').text, axis=1)

Another issue with this API is that it is unreliable: the same code does not give consistent results from one run to the next. The resulting dataframe shows this: only the first 237 rows were translated, and the remaining rows still contain the original Hindi text.
If we need something more reliable, Google's official API is the one to use: https://cloud.google.com/translate/docs
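One way to soften this while staying with googletrans is to retry each word a few times. The sketch below assumes that a failed call either raises or hands the input back unchanged; that is an assumption based on the behaviour described above, not something googletrans documents. `translate_with_retry` is a hypothetical helper that accepts any callable, so it can be tried without network access.

```python
import time


def translate_with_retry(translate, text, retries=3, delay=1.0):
    """Call `translate(text)` up to `retries` times, pausing between attempts.

    `translate` is any callable (str -> str). An exception or an unchanged
    result counts as a failure. Returns None if every attempt fails, so the
    failed rows can be found and retried later.
    """
    for attempt in range(retries):
        try:
            result = translate(text)
            if result != text:
                return result
        except Exception:
            pass  # treat any error as a failed attempt
        if attempt < retries - 1:
            time.sleep(delay * (attempt + 1))  # wait a little longer each time
    return None


# Demo with a fake translator that fails once before succeeding.
attempts = []

def fake_translate(text):
    attempts.append(text)
    if len(attempts) < 2:
        raise RuntimeError("temporary failure")
    return "translated:" + text

print(translate_with_retry(fake_translate, "word", delay=0))  # translated:word
```

With the notebook's `translator`, the callable could be `lambda t: translator.translate(t, dest='fa', src='hi').text`.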

In [None]:
df

Unnamed: 0,variable,hin_loanwords,farsi
0,Deva,अंगूर,انگور
1,Deva,आइंदा,در آینده
2,Deva,इंतक़ाल,تایم اوت
3,Deva,-ई,-e
4,Deva,उम्दा,خوب
...,...,...,...
1999,Deva 63,ताज़ा,ताज़ा
2002,Deva 63,नौबत,नौबत
2005,Deva 63,बावरची,बावरची
2006,Deva 63,मादर,मादर
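The last rows of the table above still hold Devanagari text in the farsi column, which suggests a simple way to find the rows that were not translated: check whether the output contains any character from the Devanagari Unicode block (U+0900 to U+097F). This is a minimal sketch; `contains_devanagari` is a hypothetical helper, and the two sample rows are copied from the table.

```python
def contains_devanagari(text):
    """True if any character falls in the Devanagari block (U+0900-U+097F)."""
    return any("\u0900" <= ch <= "\u097f" for ch in text)


# (hindi, farsi) pairs taken from the first and last parts of the table.
rows = [("अंगूर", "انگور"), ("ताज़ा", "ताज़ा")]

untranslated = [hindi for hindi, farsi in rows if contains_devanagari(farsi)]
print(untranslated)  # ['ताज़ा']
```

On the dataframe itself, `df[df["farsi"].map(contains_devanagari)]` would select the rows to translate again, assuming every value in that column is a string.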


In [None]:
df.to_csv("translated.csv", index=False)

In [None]:
df_ = df.iloc[:10].copy()

In [None]:
df_["farsi"] = df_.apply(lambda x:translator.translate(x["hin_loanwords"], dest='fa', src='hi').text, axis=1)

In [None]:
df_

Unnamed: 0,variable,hin_loanwords,farsi
0,Deva,अंगूर,انگور
1,Deva,आइंदा,در آینده
2,Deva,इंतक़ाल,تایم اوت
3,Deva,-ई,-e
4,Deva,उम्दा,خوب
5,Deva,एकाएक,یکدفعه
6,Deva,ऐतराज़,اعتراض
7,Deva,ओ,O
8,Deva,कद्दू,كدو حلوايي
9,Deva,कारगर,تاثير گذار
