# AWS Notebook
## Importing Libraries and Data

In [80]:
import pandas as pd
import s3fs
import boto3
from nltk.translate.bleu_score import sentence_bleu

In [74]:
!aws s3 ls example-mallik

2022-10-30 20:24:16     327126 AWS-ML-API.html
2022-11-19 19:00:14        378 Trial.rtf
2022-10-17 14:54:24     576453 my-notebook.html
2022-11-19 19:28:52        908 tableConvert.csv
2022-11-20 16:40:48       3167 tableConvert_2.csv
2022-11-20 16:43:08       3183 tableConvert_4.csv
2022-11-25 19:32:37       3283 tableConvert_5.csv


In [75]:
fs = s3fs.S3FileSystem()

In [76]:
file = fs.open('s3://example-mallik/tableConvert_5.csv')

In [77]:
data = pd.read_csv(file)

In [78]:
data

Unnamed: 0,Language,Informal_Text_1,Informal_Text_2,Formal_Text_1,Formal_Text_2
0,English,In my younger and more vulnerable years my fat...,I had a dog — at least I had him for a few day...,"In consequence, I’m inclined to reserve all ju...",Almost any exhibition of complete self suffici...
1,Chinese,我年纪还轻，阅历不深的时候，我父亲教导过我一句话，我至今还念念不忘。“每逢你想要批评任何人的...,我有一条狗——至少在它跑掉以前我养了它几天——一辆旧道吉汽车和一个芬兰女佣人，她替我收拾床铺...,久而久之，我就惯于对所有的人都保留判断，这个习惯既使得许多有怪僻的人肯跟我讲心里话，也使我成...,这种几乎是完全我行我素的神情总是使我感到目瞪口呆，满心赞佩。
2,Spanish,En mis años mozos y más vulnerables mi padre m...,Tenía un perro -o al menos lo tuve durante var...,"En consecuencia, soy una persona dada a reserv...",Casi cualquier exhibición de total autosuficie...
3,French,"Quand j’étais plus jeune, ce qui veut dire plu...",J’avais un chien – du moins je l’eus pendant q...,"En conséquence, je suis porté à réserver mes j...",N’importe quelle exhibition d’assurance m’exto...


We take the Chinese, Spanish, and French translations of the same sentence and translate each of them back into English. We do this in both Google Translate and AWS Translate.

In [None]:
translate=boto3.client("translate")
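
The same `translate_text` call is repeated for every language in the sections that follow. As a minimal sketch, that repetition could be wrapped in a small helper. `translate_column` and `fake_translate` are hypothetical names introduced here, and the stub only stands in for `translate.translate_text` so that the example runs without AWS credentials.

```python
def translate_column(rows, translate_fn, target="en"):
    """Translate each (language, text) pair and return {language: translated_text}."""
    results = {}
    for language, text in rows:
        response = translate_fn(
            Text=text, SourceLanguageCode="auto", TargetLanguageCode=target
        )
        results[language] = response["TranslatedText"]
    return results


# Offline stand-in for boto3's translate_text (an assumption for illustration).
def fake_translate(Text, SourceLanguageCode, TargetLanguageCode):
    return {"TranslatedText": f"[{TargetLanguageCode}] {Text}"}


rows = [("Spanish", "Tenía un perro"), ("French", "J’avais un chien")]
translations = translate_column(rows, fake_translate)
print(translations)
```

With a real client, `translate.translate_text` would be passed in place of `fake_translate`.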

# Informal Text 1

**The original text in English is:**
In my younger and more vulnerable years my father gave me some advice that I’ve been turning over in my mind ever since. “Whenever you feel like criticizing any one,” he told me, “just remember that all the people in this world haven’t had the advantages that you’ve had.”

### Chinese to English
**The text in Chinese is:**
我年纪还轻，阅历不深的时候，我父亲教导过我一句话，我至今还念念不忘。“每逢你想要批评任何人的时候，”他对我说，“你就记住，这个世界上所有的人，并不是个个都有过你拥有的那些优越条件。”

Retrieving and translating the text data 

In [16]:
text = data['Informal_Text_1'].iloc[1]
translate.translate_text(Text= text, SourceLanguageCode = "auto", TargetLanguageCode = "en")

{'TranslatedText': 'When I was still young and not experienced, my father taught me a phrase I still remember. “Every time you want to criticize anyone,” he said to me, “just remember, not everyone in this world has the advantages you have',
 'SourceLanguageCode': 'zh',
 'TargetLanguageCode': 'en',
 'ResponseMetadata': {'RequestId': '8dee054f-4182-4b6b-82ab-4f99bad39618',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': '8dee054f-4182-4b6b-82ab-4f99bad39618',
   'cache-control': 'no-cache',
   'content-type': 'application/x-amz-json-1.1',
   'content-length': '298',
   'date': 'Fri, 25 Nov 2022 18:48:53 GMT'},
  'RetryAttempts': 0}}
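
The response is a plain dictionary, so the fields of interest can be pulled out directly instead of being read from the full printout. A small sketch, using a trimmed copy of the response above; `summarize_response` is a hypothetical helper, not part of boto3.

```python
def summarize_response(response):
    """Keep only the translated text, the detected source language, and the HTTP status."""
    return {
        "text": response["TranslatedText"],
        "source": response["SourceLanguageCode"],
        "status": response["ResponseMetadata"]["HTTPStatusCode"],
    }


# Trimmed copy of the response shown above (translated text shortened here for space).
response = {
    "TranslatedText": "When I was still young and not experienced, ...",
    "SourceLanguageCode": "zh",
    "TargetLanguageCode": "en",
    "ResponseMetadata": {"HTTPStatusCode": 200, "RetryAttempts": 0},
}

summary = summarize_response(response)
print(summary["source"], summary["status"])
```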

**From above, the AWS translation is:**
When I was still young and not experienced, my father taught me a phrase I still remember. “Every time you want to criticize anyone,” he said to me, “just remember, not everyone in this world has the advantages you have

#### Qualitative Analysis:

Based on our judgement as English speakers, we feel this translation was pretty accurate and conveyed the meaning well enough. We would score it an **8.5/10.**

#### Quantitative Analysis

**1) Percentage of words matching with original text**

First, we converted the original text to lowercase and removed special characters. This is to ensure a more fair and accurate comparison.

In [18]:
eng_text = data['Informal_Text_1'].iloc[0]

In [19]:
special_characters=[',','.',':','?','!', "”", "“"]
for i in special_characters:
    eng_text=eng_text.replace(i,"")
    
eng_text = eng_text.lower()
print(eng_text)

in my younger and more vulnerable years my father gave me some advice that i’ve been turning over in my mind ever since whenever you feel like criticizing any one he told me just remember that all the people in this world haven’t had the advantages that you’ve had


Then, converted to a list

In [20]:
eng_text_words = eng_text.split(" ")
print(eng_text_words)

['in', 'my', 'younger', 'and', 'more', 'vulnerable', 'years', 'my', 'father', 'gave', 'me', 'some', 'advice', 'that', 'i’ve', 'been', 'turning', 'over', 'in', 'my', 'mind', 'ever', 'since', 'whenever', 'you', 'feel', 'like', 'criticizing', 'any', 'one', 'he', 'told', 'me', 'just', 'remember', 'that', 'all', 'the', 'people', 'in', 'this', 'world', 'haven’t', 'had', 'the', 'advantages', 'that', 'you’ve', 'had']


Next, converted the translated text to lowercase and removed special characters

In [25]:
translation_info = translate.translate_text(Text= text, SourceLanguageCode = "auto", TargetLanguageCode = "en")
trans_text = translation_info['TranslatedText']

In [26]:
special_characters=[',','.',':','?','!', "”", "“"]
for i in special_characters:
    trans_text=trans_text.replace(i,"")
    
trans_text = trans_text.lower()
print(trans_text)

when i was still young and not experienced my father taught me a phrase i still remember every time you want to criticize anyone he said to me just remember not everyone in this world has the advantages you have


Then, converted this to a list too.

In [27]:
trans_text_words = trans_text.split(" ")
print(trans_text_words)

['when', 'i', 'was', 'still', 'young', 'and', 'not', 'experienced', 'my', 'father', 'taught', 'me', 'a', 'phrase', 'i', 'still', 'remember', 'every', 'time', 'you', 'want', 'to', 'criticize', 'anyone', 'he', 'said', 'to', 'me', 'just', 'remember', 'not', 'everyone', 'in', 'this', 'world', 'has', 'the', 'advantages', 'you', 'have']


Finally, we checked the percentage of words from translated text that matched the original text.

In [28]:
acc = 0
tot = 0
for word in trans_text_words:
    if word in eng_text_words:
        acc += 1
    tot += 1
    
score = acc / tot
print(score)
    

0.4
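
The loop above can also be written as a small function. This sketch is intended to be equivalent: it counts how many translated tokens occur anywhere in the original and divides by the number of translated tokens. The name `word_match_ratio` and the toy token lists are illustrative only.

```python
def word_match_ratio(translated_tokens, original_tokens):
    """Fraction of translated tokens that appear anywhere in the original."""
    if not translated_tokens:
        return 0.0
    original_vocab = set(original_tokens)  # set membership avoids rescanning the list
    matches = sum(1 for token in translated_tokens if token in original_vocab)
    return matches / len(translated_tokens)


original = "i had a dog and an old dodge".split()
translated = "i have a dog and an old car".split()
ratio = word_match_ratio(translated, original)
print(ratio)
```

Repeated words count every time they occur and word order is ignored, so this ratio rewards shared vocabulary rather than shared phrasing.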


We also checked the BLEU Score.

**2) BLEU Score**

In [29]:
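# Note: sentence_bleu expects (references, hypothesis). This cell, like the
# later BLEU cells, passes the translation as the reference and the original
# English as the hypothesis.
reference = [
    trans_text.split()
]

candidate = eng_text.split()
print('BLEU score -> {}'.format(sentence_bleu(reference, candidate)))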

BLEU score -> 2.411457623351326e-78


The hypothesis contains 0 counts of 4-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()
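
The warning arises because BLEU combines the n-gram precisions for n = 1 to 4 as a geometric mean, so a single precision of zero drives the whole score to (almost) zero. The sketch below is a simplified, pure-Python illustration of clipped n-gram precision against one reference, not NLTK's implementation; the function names and the short example sentences are assumptions made for illustration.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(reference, hypothesis, n):
    """Clipped n-gram precision of the hypothesis against a single reference."""
    hyp_counts = Counter(ngrams(hypothesis, n))
    ref_counts = Counter(ngrams(reference, n))
    total = sum(hyp_counts.values())
    if total == 0:
        return 0.0
    # Each hypothesis n-gram is credited at most as often as it occurs in the reference.
    overlap = sum(min(count, ref_counts[gram]) for gram, count in hyp_counts.items())
    return overlap / total


reference = "my father gave me some advice".split()
hypothesis = "my father taught me a phrase".split()
precisions = [modified_precision(reference, hypothesis, n) for n in range(1, 5)]
print(precisions)
```

Here the unigram and bigram precisions are non-zero, but no 3-grams or 4-grams are shared, which is the situation the warning describes.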


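Both metrics are low: only 40% of the translated words appear in the original, and the BLEU score is effectively zero because the two texts share no 4-grams. By these measures, the Chinese to English translation using the AWS API did not perform well.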

### Spanish to English
**The text in Spanish is:**
En mis años mozos y más vulnerables mi padre me dio un consejo que desde aquella época no ha dejado de darme vueltas en la cabeza. “Cuando sientas deseos de criticar a alguien” -fueron sus palabras- “recuerda que no todo el mundo ha tenido las mismas oportunidades que tú tuviste.”

Retrieving and translating the text data

In [31]:
text = data['Informal_Text_1'].iloc[2]
translate.translate_text(Text= text, SourceLanguageCode = "auto", TargetLanguageCode = "en")

{'TranslatedText': "In my younger and most vulnerable years, my father gave me a piece of advice that hasn't stopped spinning in my head ever since. 'When you feel like criticizing someone' -were his words- 'remember that not everyone has had the same opportunities that you had. '",
 'SourceLanguageCode': 'es',
 'TargetLanguageCode': 'en',
 'ResponseMetadata': {'RequestId': 'eb1247b3-f979-4dfb-923f-b64242ea4ef6',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': 'eb1247b3-f979-4dfb-923f-b64242ea4ef6',
   'cache-control': 'no-cache',
   'content-type': 'application/x-amz-json-1.1',
   'content-length': '334',
   'date': 'Fri, 25 Nov 2022 18:55:30 GMT'},
  'RetryAttempts': 0}}

**From above, the AWS translation is:**
In my younger and most vulnerable years, my father gave me a piece of advice that hasn't stopped spinning in my head ever since. 'When you feel like criticizing someone' -were his words- 'remember that not everyone has had the same opportunities that you had. '

#### Qualitative Analysis:

Based on our judgement as English speakers, we feel this translation was pretty accurate and conveyed the meaning well enough. We would score it a 
**9/10.**

#### Quantitative Analysis

**1) Percentage of words matching with original text**

First, we converted the original text to lowercase and removed special characters. This is to ensure a more fair and accurate comparison.

In [33]:
eng_text = data['Informal_Text_1'].iloc[0]

In [34]:
special_characters=[',','.',':','?','!', "”", "“"]
for i in special_characters:
    eng_text=eng_text.replace(i,"")
    
eng_text = eng_text.lower()
print(eng_text)

in my younger and more vulnerable years my father gave me some advice that i’ve been turning over in my mind ever since whenever you feel like criticizing any one he told me just remember that all the people in this world haven’t had the advantages that you’ve had


Then, converted to a list

In [35]:
eng_text_words = eng_text.split(" ")
print(eng_text_words)

['in', 'my', 'younger', 'and', 'more', 'vulnerable', 'years', 'my', 'father', 'gave', 'me', 'some', 'advice', 'that', 'i’ve', 'been', 'turning', 'over', 'in', 'my', 'mind', 'ever', 'since', 'whenever', 'you', 'feel', 'like', 'criticizing', 'any', 'one', 'he', 'told', 'me', 'just', 'remember', 'that', 'all', 'the', 'people', 'in', 'this', 'world', 'haven’t', 'had', 'the', 'advantages', 'that', 'you’ve', 'had']


Next, converted the translated text to lowercase and removed special characters

In [38]:
translation_info = translate.translate_text(Text= text, SourceLanguageCode = "auto", TargetLanguageCode = "en")
trans_text = translation_info['TranslatedText']

In [39]:
special_characters=[',','.',':','?','!', "”", "“"]
for i in special_characters:
    trans_text=trans_text.replace(i,"")
    
trans_text = trans_text.lower()
print(trans_text)

in my younger and most vulnerable years my father gave me a piece of advice that hasn't stopped spinning in my head ever since 'when you feel like criticizing someone' -were his words- 'remember that not everyone has had the same opportunities that you had '


Then, converted this to a list too.

In [40]:
trans_text_words = trans_text.split(" ")
print(trans_text_words)

['in', 'my', 'younger', 'and', 'most', 'vulnerable', 'years', 'my', 'father', 'gave', 'me', 'a', 'piece', 'of', 'advice', 'that', "hasn't", 'stopped', 'spinning', 'in', 'my', 'head', 'ever', 'since', "'when", 'you', 'feel', 'like', 'criticizing', "someone'", '-were', 'his', 'words-', "'remember", 'that', 'not', 'everyone', 'has', 'had', 'the', 'same', 'opportunities', 'that', 'you', 'had', "'"]
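
The token list above still contains entries such as `'when`, `someone'`, `-were`, `words-`, and a lone `'`, because straight quotes and hyphens are not in `special_characters`. These tokens cannot match the original text, which lowers the word-match score. One possible alternative, sketched here under the assumption that punctuation should only be trimmed from the edges of tokens (so `hasn't` and `i’ve` stay intact), is shown below; `EDGE_CHARS` and `normalize_tokens` are hypothetical names, and using this would change the scores reported in this notebook.

```python
import string

# Characters trimmed from both ends of each token: ASCII punctuation plus
# curly quotes and dashes (a broader list than the notebook's special_characters).
EDGE_CHARS = string.punctuation + "“”‘’—–"


def normalize_tokens(text):
    """Lowercase, split on whitespace, and trim punctuation from token edges."""
    trimmed = (token.strip(EDGE_CHARS) for token in text.lower().split())
    # Tokens that were pure punctuation become empty strings and are dropped.
    return [token for token in trimmed if token]


sample = "'When you feel like criticizing someone' -were his words- 'remember. '"
tokens = normalize_tokens(sample)
print(tokens)
```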


Finally, we checked the percentage of words from translated text that matched the original text.

In [41]:
acc = 0
tot = 0
for word in trans_text_words:
    if word in eng_text_words:
        acc += 1
    tot += 1
    
score = acc / tot
print(score)
    

0.5652173913043478


We also checked the BLEU Score.

**2) BLEU Score**

In [42]:
reference = [
    trans_text.split()
]

candidate = eng_text.split()
print('BLEU score -> {}'.format(sentence_bleu(reference, candidate)))

BLEU score -> 0.2330502445581504


With a word match of about 57% and a BLEU score of about 0.23, the Spanish to English translation using the AWS API didn't perform too well, but it scored better than the other translations of this text.

### French to English
**The text in French is:**
Quand j’étais plus jeune, ce qui veut dire plus vulnérable, mon père me donna un conseil que je ne cesse de retourner dans mon esprit : – Quand tu auras envie de critiquer quelqu’un, songe que tout le monde n’a pas joui des mêmes avantages que toi.


Retrieving and translating the text data

In [44]:
text = data['Informal_Text_1'].iloc[3]
translate.translate_text(Text= text, SourceLanguageCode = "auto", TargetLanguageCode = "en")


{'TranslatedText': 'When I was younger, which means more vulnerable, my father gave me advice that I keep coming back to my mind: — When you feel like criticizing someone, think that not everyone has enjoyed the same benefits as you.',
 'SourceLanguageCode': 'fr',
 'TargetLanguageCode': 'en',
 'ResponseMetadata': {'RequestId': '78faa28d-f3f8-45d9-b6cf-43ec906fb8b2',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': '78faa28d-f3f8-45d9-b6cf-43ec906fb8b2',
   'cache-control': 'no-cache',
   'content-type': 'application/x-amz-json-1.1',
   'content-length': '288',
   'date': 'Fri, 25 Nov 2022 19:01:24 GMT'},
  'RetryAttempts': 0}}

**From above, the AWS translation is:**
'When I was younger, which means more vulnerable, my father gave me advice that I keep coming back to my mind: — When you feel like criticizing someone, think that not everyone has enjoyed the same benefits as you.'

#### Qualitative Analysis:

Based on our judgement as English speakers, we feel this translation was pretty accurate and conveyed the meaning well enough. We would score it a 
**9/10.**

#### Quantitative Analysis

**1) Percentage of words matching with original text**

First, we converted the original text to lowercase and removed special characters. This is to ensure a more fair and accurate comparison.

In [45]:
eng_text = data['Informal_Text_1'].iloc[0]

In [46]:
special_characters=[',','.',':','?','!', "”", "“"]
for i in special_characters:
    eng_text=eng_text.replace(i,"")
    
eng_text = eng_text.lower()
print(eng_text)

in my younger and more vulnerable years my father gave me some advice that i’ve been turning over in my mind ever since whenever you feel like criticizing any one he told me just remember that all the people in this world haven’t had the advantages that you’ve had


Then, converted to a list

In [47]:
eng_text_words = eng_text.split(" ")
print(eng_text_words)

['in', 'my', 'younger', 'and', 'more', 'vulnerable', 'years', 'my', 'father', 'gave', 'me', 'some', 'advice', 'that', 'i’ve', 'been', 'turning', 'over', 'in', 'my', 'mind', 'ever', 'since', 'whenever', 'you', 'feel', 'like', 'criticizing', 'any', 'one', 'he', 'told', 'me', 'just', 'remember', 'that', 'all', 'the', 'people', 'in', 'this', 'world', 'haven’t', 'had', 'the', 'advantages', 'that', 'you’ve', 'had']


Next, converted the translated text to lowercase and removed special characters

In [48]:
translation_info = translate.translate_text(Text= text, SourceLanguageCode = "auto", TargetLanguageCode = "en")
trans_text = translation_info['TranslatedText']

'When I was younger, which means more vulnerable, my father gave me advice that I keep coming back to my mind: — When you feel like criticizing someone, think that not everyone has enjoyed the same benefits as you.'

In [49]:
special_characters=[',','.',':','?','!', "”", "“"]
for i in special_characters:
    trans_text=trans_text.replace(i,"")
    
trans_text = trans_text.lower()
print(trans_text)

when i was younger which means more vulnerable my father gave me advice that i keep coming back to my mind — when you feel like criticizing someone think that not everyone has enjoyed the same benefits as you


Then, converted this to a list too.

In [50]:
trans_text_words = trans_text.split(" ")
print(trans_text_words)

['when', 'i', 'was', 'younger', 'which', 'means', 'more', 'vulnerable', 'my', 'father', 'gave', 'me', 'advice', 'that', 'i', 'keep', 'coming', 'back', 'to', 'my', 'mind', '—', 'when', 'you', 'feel', 'like', 'criticizing', 'someone', 'think', 'that', 'not', 'everyone', 'has', 'enjoyed', 'the', 'same', 'benefits', 'as', 'you']


Finally, we checked the percentage of words from translated text that matched the original text.

In [51]:
acc = 0
tot = 0
for word in trans_text_words:
    if word in eng_text_words:
        acc += 1
    tot += 1
    
score = acc / tot
print(score)
    

0.46153846153846156


We also checked the BLEU Score.

**2) BLEU Score**

In [52]:
reference = [
    trans_text.split()
]

candidate = eng_text.split()
print('BLEU score -> {}'.format(sentence_bleu(reference, candidate)))

BLEU score -> 0.1245581366486582


With a word match of about 46% and a BLEU score of about 0.12, this indicates that the French to English translation using the AWS API did not perform well.

# Informal Text 2

**The original text in English is:**
I had a dog — at least I had him for a few days until he ran away — and an old Dodge and a Finnish woman, who made my bed and cooked breakfast and muttered Finnish wisdom to herself over the electric stove.

### Chinese to English
**The text in Chinese is:**
我有一条狗——至少在它跑掉以前我养了它几天——一辆旧道吉汽车和一个芬兰女佣人，她替我收拾床铺，烧早饭，在电炉上一面做饭，一面嘴里咕哝着芬兰的格言。

Retrieving and translating the text data

In [59]:
text = data['Informal_Text_2'].iloc[1]
translate.translate_text(Text= text, SourceLanguageCode = "auto", TargetLanguageCode = "en")


{'TranslatedText': 'I have a dog — I kept it for at least a few days before it ran away — an old Doggie car and a Finnish maid who packed my bed, cooked breakfast, cooked on an electric stove, and muttered Finnish sayings in her mouth.',
 'SourceLanguageCode': 'zh',
 'TargetLanguageCode': 'en',
 'ResponseMetadata': {'RequestId': 'db652bed-59cd-4741-b900-56e68343dd42',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': 'db652bed-59cd-4741-b900-56e68343dd42',
   'cache-control': 'no-cache',
   'content-type': 'application/x-amz-json-1.1',
   'content-length': '292',
   'date': 'Fri, 25 Nov 2022 19:07:14 GMT'},
  'RetryAttempts': 0}}

**From above, the AWS translation is:**
I have a dog — I kept it for at least a few days before it ran away — an old Doggie car and a Finnish maid who packed my bed, cooked breakfast, cooked on an electric stove, and muttered Finnish sayings in her mouth.

#### Qualitative Analysis:

Based on our judgement as English speakers, we feel this translation was pretty accurate and conveyed the meaning well enough. We would score it an **8.5/10.**

#### Quantitative Analysis

**1) Percentage of words matching with original text**

First, we converted the original text to lowercase and removed special characters. This is to ensure a more fair and accurate comparison.

In [60]:
eng_text = data['Informal_Text_2'].iloc[0]

special_characters=[',','.',':','?','!', "”", "“", "— "]
for i in special_characters:
    eng_text=eng_text.replace(i,"")
    
eng_text = eng_text.lower()
print(eng_text)

i had a dog at least i had him for a few days until he ran away and an old dodge and a finnish woman who made my bed and cooked breakfast and muttered finnish wisdom to herself over the electric stove


Then, converted to a list

In [71]:
eng_text_words = eng_text.split(" ")
print(eng_text_words)

['i', 'had', 'a', 'dog', 'at', 'least', 'i', 'had', 'him', 'for', 'a', 'few', 'days', 'until', 'he', 'ran', 'away', 'and', 'an', 'old', 'dodge', 'and', 'a', 'finnish', 'woman', 'who', 'made', 'my', 'bed', 'and', 'cooked', 'breakfast', 'and', 'muttered', 'finnish', 'wisdom', 'to', 'herself', 'over', 'the', 'electric', 'stove']


Next, converted the translated text to lowercase and removed special characters.

In [72]:
translation_info = translate.translate_text(Text= text, SourceLanguageCode = "auto", TargetLanguageCode = "en")
trans_text = translation_info['TranslatedText']

In [73]:
special_characters=[',','.',':','?','!', "”", "“", "— "]
for i in special_characters:
    trans_text=trans_text.replace(i,"")
    
trans_text = trans_text.lower()
print(trans_text)

i have a dog i kept it for at least a few days before it ran away an old doggie car and a finnish maid who packed my bed cooked breakfast cooked on an electric stove and muttered finnish sayings in her mouth


Then, converted this to a list too.

In [74]:
trans_text_words = trans_text.split(" ")
print(trans_text_words)

['i', 'have', 'a', 'dog', 'i', 'kept', 'it', 'for', 'at', 'least', 'a', 'few', 'days', 'before', 'it', 'ran', 'away', 'an', 'old', 'doggie', 'car', 'and', 'a', 'finnish', 'maid', 'who', 'packed', 'my', 'bed', 'cooked', 'breakfast', 'cooked', 'on', 'an', 'electric', 'stove', 'and', 'muttered', 'finnish', 'sayings', 'in', 'her', 'mouth']


Finally, we checked the percentage of words from translated text that matched the original text.

In [75]:
acc = 0
tot = 0
for word in trans_text_words:
    if word in eng_text_words:
        acc += 1
    tot += 1
    
score = acc / tot
print(score)
    

0.6744186046511628


We also checked the BLEU Score.

**2) BLEU Score**

In [76]:
reference = [
    trans_text.split()
]

candidate = eng_text.split()
print('BLEU score -> {}'.format(sentence_bleu(reference, candidate)))

BLEU score -> 4.193528232732243e-78


The hypothesis contains 0 counts of 4-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()
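
The warning above suggests `SmoothingFunction()`. A hedged sketch of how that could look with NLTK follows. It also passes the human-written English as the reference and the machine translation as the hypothesis, which is the usual BLEU convention (the cells in this notebook pass them the other way round). The short token lists are shortened stand-ins for the real texts, and `method1` is only one of the smoothing methods NLTK provides, so the resulting number is not comparable with the scores printed in this notebook.

```python
from nltk.translate.bleu_score import SmoothingFunction, sentence_bleu

# Human-written English is the reference; the machine output is the hypothesis.
reference = "i had a dog and an old dodge".split()
hypothesis = "i have a dog and an old doggie car".split()

# method1 adds a small epsilon to n-gram precisions that would otherwise be zero,
# so a missing 4-gram overlap no longer collapses the whole score.
smoother = SmoothingFunction().method1
score = sentence_bleu([reference], hypothesis, smoothing_function=smoother)
print('BLEU score -> {}'.format(score))
```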


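Although about 67% of the translated words appear in the original, the BLEU score is effectively zero because the two texts share no 4-grams. By the BLEU measure, the Chinese to English translation using the AWS API did not perform well.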

### Spanish to English
**The text in Spanish is:**
Tenía un perro -o al menos lo tuve durante varios días, antes de que escapara-, un viejo Dodge y una criada oriunda de Finlandia que me tendía la cama, hacía el desayuno y mascullaba máximas finlandesas junto a la estufa eléctrica.

Retrieving and translating the text data

In [82]:
text = data['Informal_Text_2'].iloc[2]
translate.translate_text(Text= text, SourceLanguageCode = "auto", TargetLanguageCode = "en")

{'TranslatedText': 'I had a dog -or at least I had one for several days, before I escaped-, an old Dodge and a maid from Finland who laid out my bed for me, made breakfast and mumbled Finnish maxims next to the electric stove.',
 'SourceLanguageCode': 'es',
 'TargetLanguageCode': 'en',
 'ResponseMetadata': {'RequestId': 'eb54930e-9b86-4383-be5b-1a1a6a657c62',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': 'eb54930e-9b86-4383-be5b-1a1a6a657c62',
   'cache-control': 'no-cache',
   'content-type': 'application/x-amz-json-1.1',
   'content-length': '279',
   'date': 'Fri, 25 Nov 2022 19:14:19 GMT'},
  'RetryAttempts': 0}}

**From above, the AWS translation is:**
'I had a dog -or at least I had one for several days, before I escaped-, an old Dodge and a maid from Finland who laid out my bed for me, made breakfast and mumbled Finnish maxims next to the electric stove.'

#### Qualitative Analysis:

Based on our judgement as English speakers, we feel this translation was pretty accurate and conveyed the meaning well enough. We would score it an **8.5/10.**

#### Quantitative Analysis

**1) Percentage of words matching with original text**

First, we converted the original text to lowercase and removed special characters. This is to ensure a more fair and accurate comparison.

In [80]:
eng_text = data['Informal_Text_2'].iloc[0]

special_characters=[',','.',':','?','!', "”", "“", "— "]
for i in special_characters:
    eng_text=eng_text.replace(i,"")
    
eng_text = eng_text.lower()
print(eng_text)

i had a dog at least i had him for a few days until he ran away and an old dodge and a finnish woman who made my bed and cooked breakfast and muttered finnish wisdom to herself over the electric stove


Then, converted to a list

In [81]:
eng_text_words = eng_text.split(" ")
print(eng_text_words)

['i', 'had', 'a', 'dog', 'at', 'least', 'i', 'had', 'him', 'for', 'a', 'few', 'days', 'until', 'he', 'ran', 'away', 'and', 'an', 'old', 'dodge', 'and', 'a', 'finnish', 'woman', 'who', 'made', 'my', 'bed', 'and', 'cooked', 'breakfast', 'and', 'muttered', 'finnish', 'wisdom', 'to', 'herself', 'over', 'the', 'electric', 'stove']


Next, converted the translated text to lowercase and removed special characters.

In [83]:
translation_info = translate.translate_text(Text= text, SourceLanguageCode = "auto", TargetLanguageCode = "en")
trans_text = translation_info['TranslatedText']

In [84]:
special_characters=[',','.',':','?','!', "”", "“", "— "]
for i in special_characters:
    trans_text=trans_text.replace(i,"")
    
trans_text = trans_text.lower()
print(trans_text)

i had a dog -or at least i had one for several days before i escaped- an old dodge and a maid from finland who laid out my bed for me made breakfast and mumbled finnish maxims next to the electric stove


Then, converted this to a list too.

In [85]:
trans_text_words = trans_text.split(" ")
print(trans_text_words)

['i', 'had', 'a', 'dog', '-or', 'at', 'least', 'i', 'had', 'one', 'for', 'several', 'days', 'before', 'i', 'escaped-', 'an', 'old', 'dodge', 'and', 'a', 'maid', 'from', 'finland', 'who', 'laid', 'out', 'my', 'bed', 'for', 'me', 'made', 'breakfast', 'and', 'mumbled', 'finnish', 'maxims', 'next', 'to', 'the', 'electric', 'stove']


Finally, we checked the percentage of words from translated text that matched the original text.

In [86]:
acc = 0
tot = 0
for word in trans_text_words:
    if word in eng_text_words:
        acc += 1
    tot += 1
    
score = acc / tot
print(score)
    

0.6666666666666666


We also checked the BLEU Score.

**2) BLEU Score**

In [87]:
reference = [
    trans_text.split()
]

candidate = eng_text.split()
print('BLEU score -> {}'.format(sentence_bleu(reference, candidate)))

BLEU score -> 0.2566098980488463


This indicates that the Spanish to English translation using the AWS API didn't perform too well: its BLEU score (about 0.26) is higher than that of the Chinese translation but well below that of the French translation (about 0.51).

### French to English
**The text in French is:**
J’avais un chien – du moins je l’eus pendant quelques jours jusqu’à ce qu’il prît la clef des champs – une vieille auto Dodge et une Finlandaise qui faisait mon lit, préparait mon petit déjeuner et marmottait des proverbes finnois, en s’affairant devant le fourneau électrique.

Retrieving and translating the text data

In [88]:
text = data['Informal_Text_2'].iloc[3]
translate.translate_text(Text= text, SourceLanguageCode = "auto", TargetLanguageCode = "en")


{'TranslatedText': 'I had a dog — at least I had him for a few days until he took the key to the fields — an old Dodge car and a Finnish woman who made my bed, cooked my breakfast, and marmot Finnish proverbs while working in front of the electric stove.',
 'SourceLanguageCode': 'fr',
 'TargetLanguageCode': 'en',
 'ResponseMetadata': {'RequestId': '261aca4f-60e2-4841-9d8b-84d7ee276df5',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': '261aca4f-60e2-4841-9d8b-84d7ee276df5',
   'cache-control': 'no-cache',
   'content-type': 'application/x-amz-json-1.1',
   'content-length': '311',
   'date': 'Fri, 25 Nov 2022 19:15:02 GMT'},
  'RetryAttempts': 0}}

**From above, the AWS translation is:**
'I had a dog — at least I had him for a few days until he took the key to the fields — an old Dodge car and a Finnish woman who made my bed, cooked my breakfast, and marmot Finnish proverbs while working in front of the electric stove.'

#### Qualitative Analysis:

Based on our judgement as English speakers, we feel this translation was pretty accurate and conveyed the meaning well enough. We would score it an **8.5/10.**

#### Quantitative Analysis

**1) Percentage of words matching with original text**

First, we converted the original text to lowercase and removed special characters. This is to ensure a more fair and accurate comparison.

In [89]:
eng_text = data['Informal_Text_2'].iloc[0]

special_characters=[',','.',':','?','!', "”", "“", "— "]
for i in special_characters:
    eng_text=eng_text.replace(i,"")
    
eng_text = eng_text.lower()
print(eng_text)

i had a dog at least i had him for a few days until he ran away and an old dodge and a finnish woman who made my bed and cooked breakfast and muttered finnish wisdom to herself over the electric stove


Then, converted to a list

In [90]:
eng_text_words = eng_text.split(" ")
print(eng_text_words)

['i', 'had', 'a', 'dog', 'at', 'least', 'i', 'had', 'him', 'for', 'a', 'few', 'days', 'until', 'he', 'ran', 'away', 'and', 'an', 'old', 'dodge', 'and', 'a', 'finnish', 'woman', 'who', 'made', 'my', 'bed', 'and', 'cooked', 'breakfast', 'and', 'muttered', 'finnish', 'wisdom', 'to', 'herself', 'over', 'the', 'electric', 'stove']


Next, converted the translated text to lowercase and removed special characters.

In [91]:
translation_info = translate.translate_text(Text= text, SourceLanguageCode = "auto", TargetLanguageCode = "en")
trans_text = translation_info['TranslatedText']

In [92]:
special_characters=[',','.',':','?','!', "”", "“", "— "]
for i in special_characters:
    trans_text=trans_text.replace(i,"")
    
trans_text = trans_text.lower()
print(trans_text)

i had a dog at least i had him for a few days until he took the key to the fields an old dodge car and a finnish woman who made my bed cooked my breakfast and marmot finnish proverbs while working in front of the electric stove


Then, converted this to a list too.

In [93]:
trans_text_words = trans_text.split(" ")
print(trans_text_words)

['i', 'had', 'a', 'dog', 'at', 'least', 'i', 'had', 'him', 'for', 'a', 'few', 'days', 'until', 'he', 'took', 'the', 'key', 'to', 'the', 'fields', 'an', 'old', 'dodge', 'car', 'and', 'a', 'finnish', 'woman', 'who', 'made', 'my', 'bed', 'cooked', 'my', 'breakfast', 'and', 'marmot', 'finnish', 'proverbs', 'while', 'working', 'in', 'front', 'of', 'the', 'electric', 'stove']


Finally, we checked the percentage of words from translated text that matched the original text.

In [94]:
acc = 0
tot = 0
for word in trans_text_words:
    if word in eng_text_words:
        acc += 1
    tot += 1
    
score = acc / tot
print(score)
    

0.7708333333333334


We also checked the BLEU Score.

**2) BLEU Score**

In [95]:
reference = [
    trans_text.split()
]

candidate = eng_text.split()
print('BLEU score -> {}'.format(sentence_bleu(reference, candidate)))

BLEU score -> 0.5075146872653443


This indicates that the French to English translation using the AWS API performed decently well.
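
The word-match and BLEU figures printed in the Informal Text 1 and Informal Text 2 sections can be collected in one place for comparison. A minimal sketch, with the numbers copied from the cell outputs above; the `results` layout and the choice to rank by BLEU are assumptions made for illustration.

```python
# (text, source language) -> (word-match ratio, BLEU), copied from the outputs above.
results = {
    ("Informal_Text_1", "Chinese"): (0.4, 2.411457623351326e-78),
    ("Informal_Text_1", "Spanish"): (0.5652173913043478, 0.2330502445581504),
    ("Informal_Text_1", "French"): (0.46153846153846156, 0.1245581366486582),
    ("Informal_Text_2", "Chinese"): (0.6744186046511628, 4.193528232732243e-78),
    ("Informal_Text_2", "Spanish"): (0.6666666666666666, 0.2566098980488463),
    ("Informal_Text_2", "French"): (0.7708333333333334, 0.5075146872653443),
}

# Sort by BLEU, highest first, and print one row per translation.
ranked = sorted(results.items(), key=lambda item: item[1][1], reverse=True)
for (text, language), (match, bleu) in ranked:
    print(f"{text:<16} {language:<8} match={match:.2f} bleu={bleu:.2f}")
```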

# Formal

# Formal Text 1

**The original text in English is:**  In consequence, I’m inclined to reserve all judgments, a habit that has opened up many curious natures to me and also made me the victim of not a few veteran bores. 


### Chinese to English

Retrieving and translating the text data

In [16]:
text = data['Formal_Text_1'].iloc[1]
translate.translate_text(Text= text, SourceLanguageCode = "auto", TargetLanguageCode = "en")

{'TranslatedText': 'Over time, I got used to reserving judgment on everyone. This habit not only made many weird people willing to talk to me but also made me a victim of many annoying people who love to nag.',
 'SourceLanguageCode': 'zh',
 'TargetLanguageCode': 'en',
 'ResponseMetadata': {'RequestId': '01c379cd-1489-493e-add8-c203e94f2161',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': '01c379cd-1489-493e-add8-c203e94f2161',
   'cache-control': 'no-cache',
   'content-type': 'application/x-amz-json-1.1',
   'content-length': '261',
   'date': 'Fri, 25 Nov 2022 19:34:08 GMT'},
  'RetryAttempts': 0}}

**From above, the AWS translation is:** 'Over time, I got used to reserving judgment on everyone. This habit not only made many weird people willing to talk to me but also made me a victim of many annoying people who love to nag.'

#### Qualitative Analysis:
Based on our judgement as English speakers, we feel this translation was not very accurate and did not convey the meaning well enough. We would score it a 3/10.

#### Quantitative Analysis

**1) Percentage of words matching with original text**

First, we converted the original text to lowercase and removed special characters. This is to ensure a more fair and accurate comparison.


In [17]:
eng_text = data['Formal_Text_1'].iloc[0]



special_characters=[',','.',':','?','!', "”", "“"]
for i in special_characters:
    eng_text=eng_text.replace(i,"")
    
eng_text = eng_text.lower()
print(eng_text)

in consequence i’m inclined to reserve all judgments a habit that has opened up many curious natures to me and also made me the victim of not a few veteran bores


Then, converted to a list

In [18]:
eng_text_words = eng_text.split(" ")
print(eng_text_words)

['in', 'consequence', 'i’m', 'inclined', 'to', 'reserve', 'all', 'judgments', 'a', 'habit', 'that', 'has', 'opened', 'up', 'many', 'curious', 'natures', 'to', 'me', 'and', 'also', 'made', 'me', 'the', 'victim', 'of', 'not', 'a', 'few', 'veteran', 'bores']


Next, we converted the translated text to lowercase and removed special characters.

In [19]:
translation_info = translate.translate_text(Text= text, SourceLanguageCode = "auto", TargetLanguageCode = "en")
trans_text = translation_info['TranslatedText']

In [20]:

# A list of special_characters to be removed
special_characters=[',','.',':','?','!', "”", "“"]
for i in special_characters:
    # Replace the special character with an empty string
    trans_text=trans_text.replace(i,"")
    
trans_text = trans_text.lower()
print(trans_text)

over time i got used to reserving judgment on everyone this habit not only made many weird people willing to talk to me but also made me a victim of many annoying people who love to nag


Then, we converted this to a list too.

In [21]:
trans_text_words = trans_text.split(" ")
print(trans_text_words)

['over', 'time', 'i', 'got', 'used', 'to', 'reserving', 'judgment', 'on', 'everyone', 'this', 'habit', 'not', 'only', 'made', 'many', 'weird', 'people', 'willing', 'to', 'talk', 'to', 'me', 'but', 'also', 'made', 'me', 'a', 'victim', 'of', 'many', 'annoying', 'people', 'who', 'love', 'to', 'nag']


Finally, we checked the percentage of words from translated text that matched the original text.

In [22]:
acc = 0
tot = 0
for word in trans_text_words:
    if word in eng_text_words:
        acc += 1
    tot += 1
    
score = acc / tot
print(score)

0.43243243243243246
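The loop above can be written as a small function. The sketch below is one possible formulation, not the notebook's own code: `word_match_ratio` is a hypothetical name, and it uses a set for the membership test.

```python
def word_match_ratio(translated_words, original_words):
    """Fraction of translated tokens that occur anywhere in the original text."""
    if not translated_words:
        return 0.0
    original_vocab = set(original_words)  # constant-time membership checks
    matched = sum(1 for word in translated_words if word in original_vocab)
    return matched / len(translated_words)


original = (
    "in consequence i’m inclined to reserve all judgments a habit that has "
    "opened up many curious natures to me and also made me the victim of "
    "not a few veteran bores"
).split(" ")
translated = (
    "over time i got used to reserving judgment on everyone this habit not "
    "only made many weird people willing to talk to me but also made me a "
    "victim of many annoying people who love to nag"
).split(" ")

print(word_match_ratio(translated, original))
```

This measure ignores word order and counts every repeated token (for example, each occurrence of "to"), so common function words raise the score even when the meaning differs.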


We also checked the BLEU Score.

**2) BLEU Score**

In [23]:
reference = [
    trans_text.split()
]

candidate = eng_text.split()
print('BLEU score -> {}'.format(sentence_bleu(reference, candidate)))

BLEU score -> 2.0671366360407434e-78


The hypothesis contains 0 counts of 4-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()


This indicates that the Chinese to English translation using the AWS API did not perform well.
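The warning above says that no 4-gram overlaps were found. To see where the overlap runs out, the clipped n-gram matches behind BLEU's precision terms can be counted directly. This is a plain-Python sketch, not NLTK's implementation; `ngrams` and `clipped_matches` are hypothetical helpers, and the two strings are the normalized texts printed earlier.

```python
from collections import Counter


def ngrams(tokens, n):
    """All contiguous n-token windows, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def clipped_matches(candidate, reference, n):
    """Return (matched, total) n-gram counts, clipping each match by the reference count."""
    candidate_counts = Counter(ngrams(candidate, n))
    reference_counts = Counter(ngrams(reference, n))
    matched = sum(min(count, reference_counts[gram])
                  for gram, count in candidate_counts.items())
    return matched, sum(candidate_counts.values())


eng = (
    "in consequence i’m inclined to reserve all judgments a habit that has "
    "opened up many curious natures to me and also made me the victim of "
    "not a few veteran bores"
).split()
trans = (
    "over time i got used to reserving judgment on everyone this habit not "
    "only made many weird people willing to talk to me but also made me a "
    "victim of many annoying people who love to nag"
).split()

# Same roles as in the cell above: the translation is the reference.
for n in range(1, 5):
    matched, total = clipped_matches(eng, trans, n)
    print(f"{n}-gram matches: {matched}/{total}")
```

Because BLEU takes a geometric mean over the four orders, a single order with zero matches is enough to pull the unsmoothed score to effectively zero.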

## Spanish to English

**The text in Spanish is:** En consecuencia, soy una persona dada a reservarme todo juicio, hábito que me ha facilitado el conocimiento de gran número de personas singulares, pero que también me ha hecho víctima de más de un latoso inveterado.

Retrieving and translating the text data

In [25]:
text = data['Formal_Text_1'].iloc[2]
translate.translate_text(Text= text, SourceLanguageCode = "auto", TargetLanguageCode = "en")

{'TranslatedText': 'Consequently, I am a person willing to reserve all judgment, a habit that has facilitated the knowledge of a large number of natural people, but which has also made me the victim of more than one inveterate person.',
 'SourceLanguageCode': 'es',
 'TargetLanguageCode': 'en',
 'ResponseMetadata': {'RequestId': '0fb364c2-f61b-4d8c-a338-c512d02d4567',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': '0fb364c2-f61b-4d8c-a338-c512d02d4567',
   'cache-control': 'no-cache',
   'content-type': 'application/x-amz-json-1.1',
   'content-length': '287',
   'date': 'Fri, 25 Nov 2022 19:40:17 GMT',
   'connection': 'close'},
  'RetryAttempts': 0}}

**From above, the AWS translation is:** 'Consequently, I am a person willing to reserve all judgment, a habit that has facilitated the knowledge of a large number of natural people, but which has also made me the victim of more than one inveterate person.'

### Qualitative Analysis:
Based on our judgement as English speakers, we feel this translation was relatively accurate and did convey the meaning well enough. We would score it a 7/10.

### Quantitative Analysis:

**1) Percentage of words matching with original text**

First, we converted the original text to lowercase and removed special characters. This ensures a fairer and more accurate comparison.

In [27]:
eng_text = data['Formal_Text_1'].iloc[0]


special_characters=[',','.',':','?','!', "”", "“"]
for i in special_characters:
    eng_text=eng_text.replace(i,"")
    
eng_text = eng_text.lower()
print(eng_text)

in consequence i’m inclined to reserve all judgments a habit that has opened up many curious natures to me and also made me the victim of not a few veteran bores


Then, we converted it to a list.

In [28]:
eng_text_words = eng_text.split(" ")
print(eng_text_words)

['in', 'consequence', 'i’m', 'inclined', 'to', 'reserve', 'all', 'judgments', 'a', 'habit', 'that', 'has', 'opened', 'up', 'many', 'curious', 'natures', 'to', 'me', 'and', 'also', 'made', 'me', 'the', 'victim', 'of', 'not', 'a', 'few', 'veteran', 'bores']


Next, we converted the translated text to lowercase and removed special characters.

In [None]:
translation_info = translate.translate_text(Text= text, SourceLanguageCode = "auto", TargetLanguageCode = "en")
trans_text = translation_info['TranslatedText']


In [29]:

special_characters=[',','.',':','?','!', "”", "“"]
for i in special_characters:
    trans_text=trans_text.replace(i,"")
    
trans_text = trans_text.lower()
print(trans_text)

consequently i am a person willing to reserve all judgment a habit that has facilitated the knowledge of a large number of natural people but which has also made me the victim of more than one inveterate person


Then, we converted this to a list too.

In [30]:
trans_text_words = trans_text.split(" ")
print(trans_text_words)

['consequently', 'i', 'am', 'a', 'person', 'willing', 'to', 'reserve', 'all', 'judgment', 'a', 'habit', 'that', 'has', 'facilitated', 'the', 'knowledge', 'of', 'a', 'large', 'number', 'of', 'natural', 'people', 'but', 'which', 'has', 'also', 'made', 'me', 'the', 'victim', 'of', 'more', 'than', 'one', 'inveterate', 'person']


Finally, we checked the percentage of words from translated text that matched the original text.

In [31]:
acc = 0
tot = 0
for word in trans_text_words:
    if word in eng_text_words:
        acc += 1
    tot += 1
    
score = acc / tot
print(score)

0.5


We also checked the BLEU Score.

**2) BLEU Score**

In [32]:
reference = [
    trans_text.split()
]

candidate = eng_text.split()
print('BLEU score -> {}'.format(sentence_bleu(reference, candidate)))

BLEU score -> 0.2141634115480047


This indicates that the Spanish to English translation using the AWS API did not perform too well.
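One detail worth keeping in mind when reading these scores: NLTK's `sentence_bleu(references, hypothesis)` expects the reference text(s) first and the text being scored second, whereas the cell above passes the AWS translation as the reference and the original English as the hypothesis. BLEU is not symmetric, partly because of its brevity penalty. The sketch below shows that one term using the standard BLEU definition; `brevity_penalty` is a hypothetical helper, and the lengths are the token counts of the two lists printed above.

```python
import math


def brevity_penalty(candidate_len, reference_len):
    """Standard BLEU brevity penalty: 1 if the candidate is longer, else exp(1 - r/c)."""
    if candidate_len == 0:
        return 0.0
    if candidate_len > reference_len:
        return 1.0
    return math.exp(1 - reference_len / candidate_len)


original_len = 31     # tokens in the original English sentence
translation_len = 38  # tokens in the AWS translation from Spanish

# Roles as in the cell above: original English scored against the translation.
print(brevity_penalty(original_len, translation_len))
# Conventional roles: translation scored against the original English.
print(brevity_penalty(translation_len, original_len))
```

With the roles used above, the shorter original is penalized for brevity; with the conventional roles, the longer translation would not be.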

## French to English 

**The text in French is:** En conséquence, je suis porté à réserver mes jugements, habitude qui m’a ouvert bien des natures curieuses, non sans me rendre victime de pas mal de raseurs invétérés.

Retrieving and translating the text data

In [37]:
text = data['Formal_Text_1'].iloc[3]
translate.translate_text(Text= text, SourceLanguageCode = "auto", TargetLanguageCode = "en")


{'TranslatedText': 'As a result, I am inclined to reserve my judgments, a habit that has opened me up to many curious natures, not without making me the victim of a lot of inveterate razors.',
 'SourceLanguageCode': 'fr',
 'TargetLanguageCode': 'en',
 'ResponseMetadata': {'RequestId': 'ea9d3523-3036-4151-89db-aa1736e36749',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': 'ea9d3523-3036-4151-89db-aa1736e36749',
   'cache-control': 'no-cache',
   'content-type': 'application/x-amz-json-1.1',
   'content-length': '243',
   'date': 'Fri, 25 Nov 2022 19:46:42 GMT'},
  'RetryAttempts': 0}}

**From above, the AWS translation is:** 'As a result, I am inclined to reserve my judgments, a habit that has opened me up to many curious natures, not without making me the victim of a lot of inveterate razors.'

### Qualitative Analysis:
Based on our judgement as English speakers, we feel this translation was pretty accurate and conveyed the meaning relatively well. We would score it a 7/10.

### Quantitative Analysis:

**1) Percentage of words matching with original text** 

First, we converted the original text to lowercase and removed special characters. This ensures a fairer and more accurate comparison.

In [35]:
eng_text = data['Formal_Text_1'].iloc[0]


special_characters=[',','.',':','?','!', "”", "“"]
for i in special_characters:
    eng_text=eng_text.replace(i,"")
    
eng_text = eng_text.lower()
print(eng_text)

in consequence i’m inclined to reserve all judgments a habit that has opened up many curious natures to me and also made me the victim of not a few veteran bores


Then, we converted it to a list.

In [36]:
eng_text_words = eng_text.split(" ")
print(eng_text_words)

['in', 'consequence', 'i’m', 'inclined', 'to', 'reserve', 'all', 'judgments', 'a', 'habit', 'that', 'has', 'opened', 'up', 'many', 'curious', 'natures', 'to', 'me', 'and', 'also', 'made', 'me', 'the', 'victim', 'of', 'not', 'a', 'few', 'veteran', 'bores']


Next, we converted the translated text to lowercase and removed special characters.

In [38]:
translation_info = translate.translate_text(Text= text, SourceLanguageCode = "auto", TargetLanguageCode = "en")
trans_text = translation_info['TranslatedText']


In [39]:


special_characters=[',','.',':','?','!', "”", "“"]
for i in special_characters:
    trans_text=trans_text.replace(i,"")
    
trans_text = trans_text.lower()
print(trans_text)

as a result i am inclined to reserve my judgments a habit that has opened me up to many curious natures not without making me the victim of a lot of inveterate razors


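Finally, we checked the percentage of words from the translated text that matched the original text. Note that `trans_text_words` was not rebuilt from the French translation before this cell ran, so it still holds the Spanish word list and the 0.5 below repeats the Spanish score.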

In [40]:
acc = 0
tot = 0
for word in trans_text_words:
    if word in eng_text_words:
        acc += 1
    tot += 1
    
score = acc / tot
print(score)

0.5
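In this section the French translation is normalized but never split into `trans_text_words`, so the loop above runs on whatever list was assigned last. A minimal sketch of the missing step, using the two normalized strings printed earlier in this section; the variable names `french_words`, `matched` and `total` are introduced here for illustration:

```python
eng_text = (
    "in consequence i’m inclined to reserve all judgments a habit that has "
    "opened up many curious natures to me and also made me the victim of "
    "not a few veteran bores"
)
trans_text = (
    "as a result i am inclined to reserve my judgments a habit that has "
    "opened me up to many curious natures not without making me the victim "
    "of a lot of inveterate razors"
)

eng_text_words = eng_text.split(" ")
# The step missing above: rebuild the word list from the French translation.
french_words = trans_text.split(" ")

matched = sum(1 for word in french_words if word in eng_text_words)
total = len(french_words)
print(matched / total)
```

Run this way, the ratio is computed from the French translation's own words rather than from the Spanish list.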


We also checked the BLEU Score.

**2) BLEU Score**

In [41]:
reference = [
    trans_text.split()
]

candidate = eng_text.split()
print('BLEU score -> {}'.format(sentence_bleu(reference, candidate)))

BLEU score -> 0.3013761541024282


The BLEU score indicates that the French to English translation using the AWS API did not perform too well, although it performed better on this text than the Chinese and Spanish to English translations.


In [81]:
# translated_text = translate.translate_text(Text="久而久之，我就惯于对所有的人都保留判断，这个习惯既使得许多有怪僻的人肯跟我讲心里话，也使我成为不少爱唠叨的惹人厌烦的人的受害者.", SourceLanguageCode= "auto", TargetLanguageCode="en")
# chinese_text = translated_text['TranslatedText'].lower()


# Formal Text 2

**The original text in English is:** Almost any exhibition of complete self sufficiency draws a stunned tribute from me.

In [44]:
eng_text = data['Formal_Text_2'].iloc[0]

## Chinese to English

**The text in Chinese is:** 这种几乎是完全我行我素的神情总是使我感到目瞪口呆，满心赞佩。

Retrieving and translating the text data

In [49]:
text = data['Formal_Text_2'].iloc[1]
translate.translate_text(Text= text, SourceLanguageCode = "auto", TargetLanguageCode = "en")

{'TranslatedText': 'This almost completely self-serving look always left me amazed and full of admiration.',
 'SourceLanguageCode': 'zh',
 'TargetLanguageCode': 'en',
 'ResponseMetadata': {'RequestId': '93321ac6-2496-431c-b3b7-a0e0ece07485',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': '93321ac6-2496-431c-b3b7-a0e0ece07485',
   'cache-control': 'no-cache',
   'content-type': 'application/x-amz-json-1.1',
   'content-length': '159',
   'date': 'Fri, 25 Nov 2022 19:53:03 GMT'},
  'RetryAttempts': 0}}

**From above, the AWS translation is:** This almost completely self-serving look always left me amazed and full of admiration.

### Qualitative Analysis:

Based on our judgement as English speakers, we feel this translation was only partly accurate and did not fully convey the meaning. We would score it a 5/10.

### Quantitative Analysis

**1) Percentage of words matching with original text**



First, we converted the original text to lowercase and removed special characters. This ensures a fairer and more accurate comparison.

In [46]:
eng_text = data['Formal_Text_2'].iloc[0]

In [47]:
special_characters=[',','.',':','?','!', "”", "“", "— "]

for i in special_characters:
    eng_text=eng_text.replace(i,"")
    
eng_text = eng_text.lower()
print(eng_text)

almost any exhibition of complete self sufficiency draws a stunned tribute from me


Then, we converted it to a list.

In [48]:
eng_text_words = eng_text.split(" ")
print(eng_text_words)

['almost', 'any', 'exhibition', 'of', 'complete', 'self', 'sufficiency', 'draws', 'a', 'stunned', 'tribute', 'from', 'me']


Next, we converted the translated text to lowercase and removed special characters.

In [51]:
translation_info = translate.translate_text(Text= text, SourceLanguageCode = "auto", TargetLanguageCode = "en")
trans_text = translation_info['TranslatedText']
trans_text

'This almost completely self-serving look always left me amazed and full of admiration.'

In [52]:
special_characters=[',','.',':','?','!', "”", "“", "— "]
for i in special_characters:
    trans_text=trans_text.replace(i,"")
    
trans_text = trans_text.lower()
print(trans_text)

this almost completely self-serving look always left me amazed and full of admiration


Then, we converted this to a list too.

In [53]:
trans_text_words = trans_text.split(" ")
print(trans_text_words)

['this', 'almost', 'completely', 'self-serving', 'look', 'always', 'left', 'me', 'amazed', 'and', 'full', 'of', 'admiration']


Finally, we checked the percentage of words from translated text that matched the original text.

In [54]:
acc = 0
tot = 0
for word in trans_text_words:
    if word in eng_text_words:
        acc += 1
    tot += 1
    
score = acc / tot
print(score)

0.23076923076923078


We also checked the BLEU Score.

**2) BLEU Score**

In [55]:
reference = [
    trans_text.split()
]

candidate = eng_text.split()
print('BLEU score -> {}'.format(sentence_bleu(reference, candidate)))

BLEU score -> 1.2627076138080564e-231


The hypothesis contains 0 counts of 2-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()
The hypothesis contains 0 counts of 3-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()


This indicates that the Chinese to English translation using the AWS API did not perform well.
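The printed value of about 1.26e-231 is not a measured quality level but a floor. As far as we understand NLTK's default unsmoothed behavior, an n-gram precision of zero is replaced by the smallest positive float before the geometric mean is taken; treat that as an assumption about library internals. Under that assumption the number can be rebuilt by hand. Here 3 of the 13 unigrams match ("almost", "of", "me"), no higher-order n-grams match, and both token lists have 13 entries, so the brevity penalty is 1.

```python
import math
import sys

# Assumption: zero precisions are floored at the smallest positive float.
floor = sys.float_info.min
weights = [0.25, 0.25, 0.25, 0.25]

# Unigram precision 3/13; the 2-, 3- and 4-gram precisions are zero, hence floored.
precisions = [3 / 13, floor, floor, floor]

# Geometric mean in log space; brevity penalty is 1 because both texts have 13 tokens.
score = math.exp(sum(w * math.log(p) for w, p in zip(weights, precisions)))
print(score)

# Rough magnitude contributed by one, two and three floored orders.
for zero_orders in (1, 2, 3):
    print(zero_orders, floor ** (0.25 * zero_orders))
```

Read this way, the exponent mostly reports how many n-gram orders had no overlap: roughly e-78 for one order, e-155 for two and e-231 for three.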

## Spanish to English 

**The text in Spanish is:** Casi cualquier exhibición de total autosuficiencia arranca de mí un atónito tributo.

Retrieving and translating the text data

In [57]:
text = data['Formal_Text_2'].iloc[2]
translate.translate_text(Text= text, SourceLanguageCode = "auto", TargetLanguageCode = "en")

{'TranslatedText': 'Almost any display of total self-sufficiency draws an astonished tribute from me.',
 'SourceLanguageCode': 'es',
 'TargetLanguageCode': 'en',
 'ResponseMetadata': {'RequestId': '997a4bdd-2b47-4759-8e4f-18a0fde0da38',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': '997a4bdd-2b47-4759-8e4f-18a0fde0da38',
   'cache-control': 'no-cache',
   'content-type': 'application/x-amz-json-1.1',
   'content-length': '154',
   'date': 'Fri, 25 Nov 2022 19:54:37 GMT'},
  'RetryAttempts': 0}}

**From above, the AWS translation is:** 'Almost any display of total self-sufficiency draws an astonished tribute from me.'

### Qualitative Analysis

Based on our judgement as English speakers, we feel this translation was very accurate and did convey the meaning well enough. We would score it a 9/10.

### Quantitative Analysis

**1) Percentage of words matching with original text**

First, we converted the original text to lowercase and removed special characters. This ensures a fairer and more accurate comparison.

In [58]:
eng_text = data['Formal_Text_2'].iloc[0]


special_characters=[',','.',':','?','!', "”", "“"]
for i in special_characters:
    eng_text=eng_text.replace(i,"")
    
eng_text = eng_text.lower()
print(eng_text)

almost any exhibition of complete self sufficiency draws a stunned tribute from me


Then, we converted it to a list.

In [106]:
eng_text_words = eng_text.split(" ")
print(eng_text_words)

['almost', 'any', 'exhibition', 'of', 'complete', 'self', 'sufficiency', 'draws', 'a', 'stunned', 'tribute', 'from', 'me']


Next, we converted the translated text to lowercase and removed special characters.

In [59]:
translation_info = translate.translate_text(Text= text, SourceLanguageCode = "auto", TargetLanguageCode = "en")
trans_text = translation_info['TranslatedText']

In [60]:

special_characters=[',','.',':','?','!', "”", "“"]
for i in special_characters:
    trans_text=trans_text.replace(i,"")
    
trans_text = trans_text.lower()
print(trans_text)

almost any display of total self-sufficiency draws an astonished tribute from me


Then, we converted this to a list too.

In [61]:
trans_text_words = trans_text.split(" ")
print(trans_text_words)

['almost', 'any', 'display', 'of', 'total', 'self-sufficiency', 'draws', 'an', 'astonished', 'tribute', 'from', 'me']


Finally, we checked the percentage of words from translated text that matched the original text.

In [62]:
acc = 0
tot = 0
for word in trans_text_words:
    if word in eng_text_words:
        acc += 1
    tot += 1
    
score = acc / tot
print(score)

0.5833333333333334
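The lists above show that "self-sufficiency" is kept as one token in the translation while the original has "self" and "sufficiency" as two, so neither half is counted as a match. One possible alternative, sketched below, is to tokenize with a regular expression that splits on hyphens. This is not what the notebook does; `tokenize` is a hypothetical helper and its pattern only covers unaccented English letters.

```python
import re


def tokenize(text):
    """Lowercase, then keep runs of a-z letters, allowing an internal apostrophe."""
    return re.findall(r"[a-z]+(?:[’'][a-z]+)*", text.lower())


original = tokenize(
    "Almost any exhibition of complete self sufficiency draws a stunned tribute from me."
)
translated = tokenize(
    "Almost any display of total self-sufficiency draws an astonished tribute from me."
)

original_vocab = set(original)
matched = sum(1 for word in translated if word in original_vocab)
print(translated)
print(matched, len(translated))
```

With hyphens treated as separators, both halves of "self-sufficiency" line up with the original, which changes the word-match ratio for this sentence.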


We also checked the BLEU Score.

**2) BLEU Score**

In [63]:
reference = [
    trans_text.split()
]

candidate = eng_text.split()
print('BLEU score -> {}'.format(sentence_bleu(reference, candidate)))

BLEU score -> 4.0622028886850106e-78


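Taken alone, the BLEU score suggests that the Spanish to English translation using the AWS API did not perform well. However, the score is close to zero only because the two short sentences share no 4-gram, and it conflicts with both the 58% word match and our 9/10 qualitative score.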

## French to English 

**The text in French is:** N’importe quelle exhibition d’assurance m’extorque un tribut étonné.

Retrieving and translating the text data

In [65]:
text = data['Formal_Text_2'].iloc[3]
translate.translate_text(Text= text, SourceLanguageCode = "auto", TargetLanguageCode = "en")

{'TranslatedText': 'Any insurance show extorts an astonishing tribute from me.',
 'SourceLanguageCode': 'fr',
 'TargetLanguageCode': 'en',
 'ResponseMetadata': {'RequestId': 'd7ebf606-1732-4fd2-a8f2-b042b1e9020b',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': 'd7ebf606-1732-4fd2-a8f2-b042b1e9020b',
   'cache-control': 'no-cache',
   'content-type': 'application/x-amz-json-1.1',
   'content-length': '131',
   'date': 'Fri, 25 Nov 2022 19:57:06 GMT'},
  'RetryAttempts': 0}}

**From above, the AWS translation is:** 'Any insurance show extorts an astonishing tribute from me.'

### Qualitative Analysis

Based on our judgement as English speakers, we feel this translation was not accurate and did not convey the meaning well enough. We would score it a 5/10.

### Quantitative Analysis

**1) Percentage of words matching with original text**

First, we converted the original text to lowercase and removed special characters. This ensures a fairer and more accurate comparison.

In [66]:
eng_text = data['Formal_Text_2'].iloc[0]

special_characters=[',','.',':','?','!', "”", "“"]
for i in special_characters:
    eng_text=eng_text.replace(i,"")
    
eng_text = eng_text.lower()
print(eng_text)

almost any exhibition of complete self sufficiency draws a stunned tribute from me


Then, we converted it to a list.

In [67]:
eng_text_words = eng_text.split(" ")
print(eng_text_words)

['almost', 'any', 'exhibition', 'of', 'complete', 'self', 'sufficiency', 'draws', 'a', 'stunned', 'tribute', 'from', 'me']


Next, we converted the translated text to lowercase and removed special characters.

In [68]:
translation_info = translate.translate_text(Text= text, SourceLanguageCode = "auto", TargetLanguageCode = "en")
trans_text = translation_info['TranslatedText']
trans_text

'Any insurance show extorts an astonishing tribute from me.'

In [69]:
special_characters=[',','.',':','?','!', "”", "“"]
for i in special_characters:
    trans_text=trans_text.replace(i,"")
    
trans_text = trans_text.lower()
print(trans_text)

any insurance show extorts an astonishing tribute from me


Then, we converted this to a list too.

In [70]:
trans_text_words = trans_text.split(" ")
print(trans_text_words)

['any', 'insurance', 'show', 'extorts', 'an', 'astonishing', 'tribute', 'from', 'me']


Finally, we checked the percentage of words from translated text that matched the original text.

In [71]:
acc = 0
tot = 0
for word in trans_text_words:
    if word in eng_text_words:
        acc += 1
    tot += 1
    
score = acc / tot
print(score)

0.4444444444444444


We also checked the BLEU Score.

**2) BLEU Score**

In [72]:
reference = [
    trans_text.split()
]

candidate = eng_text.split()
print('BLEU score -> {}'.format(sentence_bleu(reference, candidate)))

BLEU score -> 3.191385619211647e-78


This indicates that the French to English translation using the AWS API did not perform well.
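Every language section repeats the same normalize, split and count steps, and it is easy for one section to reuse a variable left over from a previous one. A sketch of how the steps could be folded into a single function so that each call starts from its own inputs; `match_score` is a hypothetical name, and the punctuation list is the one used throughout the notebook:

```python
def match_score(original_text, translated_text,
                special_characters=(",", ".", ":", "?", "!", "”", "“")):
    """Normalize both texts, split on spaces, and return the share of
    translated words that also appear in the original."""

    def clean(text):
        for char in special_characters:
            text = text.replace(char, "")
        return text.lower().split(" ")

    original_words = clean(original_text)
    translated_words = clean(translated_text)
    matched = sum(1 for word in translated_words if word in original_words)
    return matched / len(translated_words)


original = ("Almost any exhibition of complete self sufficiency "
            "draws a stunned tribute from me.")
translated = "Any insurance show extorts an astonishing tribute from me."
print(match_score(original, translated))
```

Called with the French to English pair above, it gives the same ratio as the step-by-step cells, since it performs the same operations in the same order.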