# Reducing Localization Costs and Improving Accuracy with Amazon Translate

This is the accompanying notebook for Chapter 10, Reducing localization costs and improving accuracy, from the Natural Language Processing with AWS AI Services book. Please read the instructions provided in the book before attempting this notebook. In this chapter we walk through an example solution that automates the translation of your web pages and saves on localization costs using Amazon Translate. Organizations looking to expand internationally no longer have to run time-consuming and cost-prohibitive localization projects to change their web pages. Instead, they can use [Amazon Translate](https://aws.amazon.com/translate/), a neural, ML-powered translation service, as part of the development lifecycle to automatically convert web pages into multiple languages. This notebook shows you how.

## Input HTML Web Page

For this example we will use an `About Us` HTML and JavaScript page the authors created for the fictional **Family Bank**, a subsidiary of the fictional LiveRight financial organization. The page is displayed in the cell below and is assumed to be part of a larger organizational website that has an `About Us` link leading to this page.

In [None]:
# display the About Us page
from IPython.display import IFrame
IFrame(src='./input/aboutLRH.html', width=800, height=400)

#### Let us now review the HTML and JavaScript source code for this page
As shown below, the page has a small HTML `div` block and a corresponding `script` block that prints the current date. The `style` block provides some CSS styling for the page.

In [None]:
!pygmentize './input/aboutLRH.html'

## Prepare for Translation

In [None]:
# Install the HTML parser
!pip install beautifulsoup4

In [None]:
# Read the whole HTML file into a single string
input_htm = './input/aboutLRH.html'
with open(input_htm, encoding='utf-8') as f:
    html_doc = f.read()

In [None]:
from bs4 import BeautifulSoup
soup = BeautifulSoup(html_doc, 'html.parser')

In [None]:
# HTML tags containing text we are interested in translating
tags = ['title','h1','h2','p']

In [None]:
# Extract the text of the first occurrence of each tag in our tags list into a new dict
x_dict = {}
for tag in tags:
    x_dict[tag] = soup.find(tag).string
x_dict
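
One caveat with the extraction above is that a BeautifulSoup tag's `.string` is `None` when the tag has more than one child (for example, a paragraph that contains a `<b>` element), and a tag that does not appear in the page cannot be read at all. As a rough illustration of collecting the full text for each tag, the sketch below swaps in the standard library's `html.parser` in place of BeautifulSoup so that it runs without extra installs. The `TagTextExtractor` class and the `sample_html` string are hypothetical names made up for this example and are not part of the notebook.

```python
from html.parser import HTMLParser


class TagTextExtractor(HTMLParser):
    """Collect the text of the first occurrence of each wanted tag."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = set(wanted)
        self.current = None   # wanted tag we are currently inside, if any
        self.chunks = []      # text pieces seen inside the current tag
        self.texts = {}       # tag name -> collected text

    def handle_starttag(self, tag, attrs):
        # Start collecting only for the first occurrence of a wanted tag
        if self.current is None and tag in self.wanted and tag not in self.texts:
            self.current = tag
            self.chunks = []

    def handle_data(self, data):
        # Text inside nested tags (such as <b>) is collected as well
        if self.current is not None:
            self.chunks.append(data)

    def handle_endtag(self, tag):
        if tag == self.current:
            # Join the pieces and normalize whitespace
            self.texts[tag] = " ".join("".join(self.chunks).split())
            self.current = None


# Hypothetical sample page, not the notebook's aboutLRH.html
sample_html = """<html><head><title>About Us</title></head>
<body><h1>Family Bank</h1><p>We are <b>here</b> to help.</p></body></html>"""

extractor = TagTextExtractor(['title', 'h1', 'h2', 'p'])
extractor.feed(sample_html)
extractor.close()
print(extractor.texts)
```

Tags that are absent from the page (here `h2`) are simply left out of the result, so later steps can iterate over whatever was found.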

## Translate to target languages
We will now translate the input text from English to German, Spanish, Tamil, and Hindi.

In [None]:
import boto3

translate = boto3.client(service_name='translate', region_name='us-east-1', use_ssl=True)
out_text = {}
languages = ['de','es','ta','hi']

for target_lang in languages:
    out_dict = {}
    for key in x_dict:
        result = translate.translate_text(Text=x_dict[key], 
            SourceLanguageCode="en", TargetLanguageCode=target_lang)
        out_dict[key] = result.get('TranslatedText')
    out_text[target_lang] = out_dict

# Print the translated text for each target language
language_names = {'de': 'German', 'es': 'Spanish', 'ta': 'Tamil', 'hi': 'Hindi'}
for code, name in language_names.items():
    print(f"{name} Version of Website Text")
    print("******************************")
    print(out_text[code])
    print("******************************")
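
The nested loop above can also be written as a small helper that takes the translation call as a parameter, which makes the logic easy to try without AWS credentials. This is only a sketch: `translate_fields`, `fake_translate`, and the `translate_fn(text, source_lang, target_lang)` calling convention are hypothetical names chosen for this example. In the notebook the callable would wrap `translate.translate_text`, as indicated in the comment.

```python
def translate_fields(fields, languages, translate_fn, source_lang="en"):
    """Translate every non-empty field into every target language.

    translate_fn(text, source_lang, target_lang) -> str is a hypothetical
    callable supplied by the caller.
    """
    results = {}
    for target_lang in languages:
        results[target_lang] = {
            key: translate_fn(text, source_lang, target_lang)
            for key, text in fields.items()
            if text  # skip None or empty strings instead of sending them
        }
    return results


def fake_translate(text, source_lang, target_lang):
    # Stand-in for the real service call so the sketch runs offline
    return f"[{target_lang}] {text}"


# With the notebook's boto3 client, the callable might look like this:
# def aws_translate(text, source_lang, target_lang):
#     response = translate.translate_text(
#         Text=text, SourceLanguageCode=source_lang,
#         TargetLanguageCode=target_lang)
#     return response['TranslatedText']

demo = translate_fields({'title': 'About Us', 'h2': None}, ['de', 'es'], fake_translate)
for lang, translated in demo.items():
    print(lang, translated)
```

Skipping empty values is a design choice made here so that a tag with no extractable text does not cause a failed request.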


## Build webpages for translated text
We will now create a separate HTML web page for each target language and display it.

### German Webpage

In [None]:
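# Parse a fresh copy of the English page so the original soup is left untouched
web_de = BeautifulSoup(html_doc, 'html.parser')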

In [None]:
web_de.title.string = out_text['de']['title']
web_de.h1.string = out_text['de']['h1']
web_de.h2.string = out_text['de']['h2']
web_de.p.string = out_text['de']['p']

In [None]:
de_html = web_de.prettify()
with open('./output/aboutLRH_DE.html', 'w', encoding='utf-8') as de_w:
    de_w.write(de_html)

In [None]:
# display the About Us page in German
from IPython.display import IFrame
IFrame(src='./output/aboutLRH_DE.html', width=800, height=500)

### Spanish Webpage

In [None]:
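# Parse a fresh copy of the English page so the original soup is left untouched
web_es = BeautifulSoup(html_doc, 'html.parser')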
web_es.title.string = out_text['es']['title']
web_es.h1.string = out_text['es']['h1']
web_es.h2.string = out_text['es']['h2']
web_es.p.string = out_text['es']['p']

In [None]:
es_html = web_es.prettify()
with open('./output/aboutLRH_ES.html', 'w', encoding='utf-8') as es_w:
    es_w.write(es_html)

In [None]:
# display the About Us page in Spanish
from IPython.display import IFrame
IFrame(src='./output/aboutLRH_ES.html', width=800, height=500)

### Hindi Webpage

In [None]:
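# Parse a fresh copy of the English page so the original soup is left untouched
web_hi = BeautifulSoup(html_doc, 'html.parser')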
web_hi.title.string = out_text['hi']['title']
web_hi.h1.string = out_text['hi']['h1']
web_hi.h2.string = out_text['hi']['h2']
web_hi.p.string = out_text['hi']['p']

In [None]:
hi_html = web_hi.prettify()
with open('./output/aboutLRH_HI.html', 'w', encoding='utf-8') as hi_w:
    hi_w.write(hi_html)

In [None]:
# display the About Us page in Hindi
from IPython.display import IFrame
IFrame(src='./output/aboutLRH_HI.html', width=800, height=500)

### Tamil Webpage

In [None]:
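# Parse a fresh copy of the English page so the original soup is left untouched
web_ta = BeautifulSoup(html_doc, 'html.parser')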
web_ta.title.string = out_text['ta']['title']
web_ta.h1.string = out_text['ta']['h1']
web_ta.h2.string = out_text['ta']['h2']
web_ta.p.string = out_text['ta']['p']

In [None]:
ta_html = web_ta.prettify()
with open('./output/aboutLRH_TA.html', 'w', encoding='utf-8') as ta_w:
    ta_w.write(ta_html)

In [None]:
# display the About Us page in Tamil
from IPython.display import IFrame
IFrame(src='./output/aboutLRH_TA.html', width=800, height=500)
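
The four sections above repeat the same steps for each language, so they could be folded into one loop. The sketch below shows one possible shape for that loop. `write_pages`, `demo_render`, and the `render_fn(fields)` callback are hypothetical names introduced for this example. The file naming follows the `aboutLRH_<LANG>.html` pattern used above, and the demo writes to a temporary directory so that nothing in `./output` is touched.

```python
import tempfile
from pathlib import Path


def write_pages(out_text, render_fn, out_dir, stem="aboutLRH"):
    """Write one HTML file per language and return the paths written.

    render_fn(fields) -> str is a hypothetical callable that returns the
    complete HTML for one language's translated fields.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for lang, fields in out_text.items():
        path = out_dir / f"{stem}_{lang.upper()}.html"
        # Explicit UTF-8 so non-Latin scripts are written consistently
        path.write_text(render_fn(fields), encoding="utf-8")
        written.append(path)
    return written


def demo_render(fields):
    # Stand-in renderer so the sketch runs without BeautifulSoup
    return f"<html><head><title>{fields['title']}</title></head></html>"


# In the notebook, render_fn might re-parse the page and swap in the text:
# def render(fields):
#     page = BeautifulSoup(html_doc, 'html.parser')
#     for tag, text in fields.items():
#         page.find(tag).string = text
#     return page.prettify()

with tempfile.TemporaryDirectory() as tmp:
    sample = {'de': {'title': 'Über uns'}, 'es': {'title': 'Sobre nosotros'}}
    paths = write_pages(sample, demo_render, tmp)
    names = [p.name for p in paths]
    first_page = paths[0].read_text(encoding="utf-8")
print(names)
```

Each returned path could then be passed to `IFrame` to display the page, as in the cells above.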

## End of Notebook
Please return to the book to continue reading.