<a href="https://colab.research.google.com/github/Tor-Storli/COLAB_DEMOS/blob/master/Language_Translator.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Language Translator - Python**
### **<font color='red'>Google Translator</font>.**

#### ***`Googletrans`*** is a free and unlimited Python library that implements the Google Translate API. It uses the Google Translate Ajax API to call methods such as `detect` and `translate`.

##### For more information - visit: https://pypi.org/project/googletrans/

### **<font color='blue'>Features</font>.**
- Fast and reliable - it uses the same servers that translate.google.com uses
- Auto language detection
- Bulk translations
- Customizable service URL
- Connection pooling (the advantage of using requests.Session)
- HTTP/2 support

# **<font color='blue'>Google translator - Walk through</font>.**

### Install modules

In [0]:
!pip install googletrans
#!pip install matplotlib
!pip install beautifulsoup4 html5lib
!pip install requests tabulate


### Import modules

In [0]:
from googletrans import Translator
import requests
from bs4 import BeautifulSoup
import numpy as np
import pandas as pd
import time
from tabulate import tabulate

### Create an instance of the Translator object

In [0]:
translator = Translator()

### You can find the language codes that Google Translate supports here...
#### https://cloud.google.com/translate/docs/languages

#### **You can find ISO 639-1 language codes here...**
##### https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes
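
Before calling the translator, it can help to check a code against the languages you plan to use. The sketch below is a minimal illustration: `SUPPORTED` is a small hand-written mapping of codes that appear in this notebook (not the full list from the pages above), and `language_name` is a hypothetical helper.

```python
# Hand-picked subset of language codes used in this notebook (an assumption,
# not the complete list that Google Translate supports).
SUPPORTED = {
    'en': 'English',
    'de': 'German',
    'es': 'Spanish',
    'fr': 'French',
    'no': 'Norwegian',
    'hi': 'Hindi',
    'da': 'Danish',
    'zh-CN': 'Chinese (Simplified)',
    'zh-TW': 'Chinese (Traditional)',
}


def language_name(code, supported=SUPPORTED):
    """Return the language name for a code, ignoring case, or None if unknown."""
    lookup = {key.lower(): name for key, name in supported.items()}
    return lookup.get(code.strip().lower())


print(language_name('zh-tw'))  # Chinese (Traditional)
print(language_name('xx'))     # None
```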

#### English to German

In [0]:
output = translator.translate(src='en', dest='de', text='Good Morning! How are you?')
print(output)
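
`print(output)` shows the whole result object. Later cells in this notebook read its `text` and `origin` attributes directly. The sketch below wraps that in a hypothetical helper, `translate_text`, which accepts any object exposing the synchronous `translate(text, src=..., dest=...)` call used above. `FakeTranslator` and `FakeResult` are stand-ins invented for illustration so the block runs without network access.

```python
from collections import namedtuple

# Minimal stand-in for the result object; only the fields used here.
FakeResult = namedtuple('FakeResult', ['origin', 'text'])


class FakeTranslator:
    """Offline stand-in (hypothetical) that mimics translate() for a demo."""

    PHRASES = {('Good Morning!', 'de'): 'Guten Morgen!'}

    def translate(self, text, src='auto', dest='en'):
        # Unknown phrases are returned unchanged
        return FakeResult(origin=text, text=self.PHRASES.get((text, dest), text))


def translate_text(translator, text, dest, src='auto'):
    """Return only the translated string from a translate() call."""
    result = translator.translate(text, src=src, dest=dest)
    return result.text


print(translate_text(FakeTranslator(), 'Good Morning!', dest='de'))  # Guten Morgen!
```

With the notebook's own `translator` object the call would be `translate_text(translator, 'Good Morning!', dest='de')`, assuming the installed googletrans release exposes the same synchronous API.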

#### English to Spanish

In [0]:
output = translator.translate(src='en', dest='es', text='Good Morning! How are you?')
print(output)

#### English to French

In [0]:
translator = Translator(service_urls=['translate.google.com'])
translation = translator.translate("Good Morning! How are you?", dest='fr')
print(translation)

#### English to Norwegian

In [0]:
translation = translator.translate("Good Morning! How are you? I am going for a walk. Do you want to join me?", dest='no')
print(translation)

#### English to Hindi

In [0]:
translation = translator.translate("Good Morning India! How are you? What a beautiful day!", dest='hi')
print(translation)

#### Then Hindi to Norwegian

In [0]:
translation = translator.translate("गुड मॉर्निंग भारत! क्या हाल है? वाह क्या सुंदर दिन है!", src='hi', dest='no')
print(translation)

#### Norwegian to Hindi

In [0]:
translation = translator.translate("God Morgen India! Hvordan har du det? For en vakker dag!", src='no', dest='hi')
print(translation)

### **English to Chinese**

#### Chinese (Simplified)

In [0]:
translation = translator.translate("Good Morning! How are you?", dest='zh-CN')
print(translation)

#### Chinese (Traditional)

In [0]:
translation = translator.translate("Good Morning! How are you?", dest='zh-TW')
print(translation)

#### English to Danish

In [0]:
translation = translator.translate("Good morning! I am on my way to work. Is the coffee ready?", dest='da')
print(translation)

#### Then Norwegian to English

In [0]:
translation = translator.translate("God morgen. Jeg skal lage litt kaffe, så kan vi spise frokost.", dest='en')
print(translation)

## Bulk translation
### You can also pass a list of texts in different languages; each one is translated to the default destination language (in this case English).



In [0]:
translations = translator.translate(['God morgen. Jeg skal lage litt kaffe, så kan vi spise frokost.', 'गुड मॉर्निंग भारत! क्या हाल है? वाह क्या सुंदर दिन है!', 'Buenos días! ¿Cómo estás?'])
for translation in translations:
    print(translation.origin, ' -> ', translation.text)

### Or... you can pass `dest` to translate the same list of texts into a different destination language, e.g. Greek.

In [0]:
translations = translator.translate(['God morgen. Jeg skal lage litt kaffe, så kan vi spise frokost.',
                                     'गुड मॉर्निंग भारत! क्या हाल है? वाह क्या सुंदर दिन है!', 'Buenos días! ¿Cómo estás?'], dest='el')
for translation in translations:
    print(translation.origin, ' -> ', translation.text)
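
To compare several target languages side by side, the bulk call can be repeated once per destination. The sketch below assumes, as in the cells above, that `translate` accepts a list and returns one result per item, each with `origin` and `text`. `translate_many` and `EchoTranslator` are hypothetical names; the latter is an offline stand-in so the block runs without network access.

```python
from types import SimpleNamespace


class EchoTranslator:
    """Offline stand-in (hypothetical): tags each text with the target code."""

    def translate(self, texts, dest='en'):
        return [SimpleNamespace(origin=t, text='[%s] %s' % (dest, t)) for t in texts]


def translate_many(translator, texts, dest_codes):
    """Return {origin: {dest_code: translated_text}} for every text and code."""
    table = {text: {} for text in texts}
    for code in dest_codes:
        # One bulk call per destination language
        for result in translator.translate(texts, dest=code):
            table[result.origin][code] = result.text
    return table


table = translate_many(EchoTranslator(), ['God morgen', 'Buenos días'], ['en', 'el'])
for origin, by_code in table.items():
    print(origin, ' -> ', by_code)
```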

## Language Detection
The `detect` method, as its name implies, identifies the language of a given sentence.

In [0]:
languages = [('no','Norwegian'),('de','German'),('bg','Bulgarian'),('fi','Finnish'),
             ('el','Greek'),('hi','Hindi'),('es','Spanish'),('fr','French'),('zh-TW','Chinese (Traditional)')]

# Sample sentences whose language we want to detect
samples = ['God morgen. Jeg skal lage litt kaffe, så kan vi spise frokost.',
           'गुड मॉर्निंग भारत! क्या हाल है? वाह क्या सुंदर दिन है!',
           'Buenos días! ¿Cómo estás?']

# Detect each sentence once, then look up the language name by its code
language_names = dict(languages)
for sample in samples:
    detected = translator.detect(sample)
    print(detected, '-> ' + language_names.get(detected.lang, 'unknown').upper())
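
The object returned by `detect` carries a `confidence` value alongside `lang`. As a hedged sketch of how that could be used, `detect_name` below is a hypothetical helper that returns a language name only when the confidence reaches a threshold, and treats a missing confidence as unknown. The 0.5 threshold and the `StubDetector` stand-in are assumptions for illustration.

```python
from types import SimpleNamespace


class StubDetector:
    """Offline stand-in (hypothetical) for an object with a detect() method."""

    def __init__(self, answers):
        self.answers = answers

    def detect(self, text):
        lang, confidence = self.answers[text]
        return SimpleNamespace(lang=lang, confidence=confidence)


def detect_name(detector, text, names, min_confidence=0.5):
    """Return the language name, or None if unknown or below the threshold."""
    detected = detector.detect(text)
    if detected.confidence is None or detected.confidence < min_confidence:
        return None
    return names.get(detected.lang)


names = {'no': 'Norwegian', 'es': 'Spanish'}
stub = StubDetector({'God morgen': ('no', 0.97), 'ok': ('es', 0.2)})
print(detect_name(stub, 'God morgen', names))  # Norwegian
print(detect_name(stub, 'ok', names))          # None
```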

## **Scrape the web for news and translate news articles into different languages**

## **A big thanks to <font color='green'>Miguel Fernández Zafra.</font>**
### for sharing the web scraping code used below.
### Check it out at this location...
https://towardsdatascience.com/web-scraping-news-articles-in-python-9dd605799558


In [0]:
# Newspaper url
url = "https://www.theguardian.com/uk"

### Right-click the web page, choose 'View page source', and search for `<h3 class="fc-item__title">` to find the titles of the articles
### and for `<div class="content__article-body from-content-api js-article__body">` to find the content of each article
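
To see what that search picks out, here is a small offline sketch. It uses the standard library's `html.parser` instead of BeautifulSoup so that it runs without extra packages. The `SAMPLE_HTML` snippet and the `TitleParser` class are invented for illustration; only the `fc-item__title` class name comes from the text above.

```python
from html.parser import HTMLParser

# Invented sample markup that mimics the structure described above.
SAMPLE_HTML = """
<h3 class="fc-item__title"><a href="https://example.com/a">First headline</a></h3>
<h3 class="other"><a href="https://example.com/x">Not a title</a></h3>
<h3 class="fc-item__title"><a href="https://example.com/b">Second headline</a></h3>
"""


class TitleParser(HTMLParser):
    """Collect (link, title) pairs from <h3 class="fc-item__title"> elements."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.current_link = None
        self.current_text = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'h3' and 'fc-item__title' in (attrs.get('class') or '').split():
            self.in_title = True
            self.current_link = None
            self.current_text = []
        elif tag == 'a' and self.in_title:
            self.current_link = attrs.get('href')

    def handle_data(self, data):
        # Only keep text that sits inside a matching <h3>
        if self.in_title:
            self.current_text.append(data)

    def handle_endtag(self, tag):
        if tag == 'h3' and self.in_title:
            self.items.append((self.current_link, ''.join(self.current_text).strip()))
            self.in_title = False


parser = TitleParser()
parser.feed(SAMPLE_HTML)
for link, title in parser.items:
    print(title, '->', link)
```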

In [0]:
# Request the cover page and show the HTTP status code
r1 = requests.get(url)
print(r1.status_code)

# We'll save in coverpage the cover page content
coverpage = r1.content

# Soup creation
soup1 = BeautifulSoup(coverpage, 'html5lib')

# News identification
coverpage_news = soup1.find_all('h3', class_='fc-item__title')
len(coverpage_news)

In [0]:
number_of_articles = 10

In [0]:
pd.set_option('display.max_colwidth', 120)

In [0]:
# Empty lists for content, links and titles
news_contents = []
list_links = []
list_titles = []

# Never index past the number of headlines actually found
for n in np.arange(0, min(number_of_articles, len(coverpage_news))):

    # Getting the link and the title of the article
    anchor = coverpage_news[n].find('a')
    link = anchor['href']
    title = anchor.get_text()

    # We need to ignore "live" pages since they are not articles
    if "live" in link:
        continue

    # Reading the content (it is divided in paragraphs)
    article = requests.get(link)
    article_content = article.content
    soup_article = BeautifulSoup(article_content, 'html5lib') # To read about this syntax - go here: https://beautiful-soup-4.readthedocs.io/en/latest/#line-numbers
    body = soup_article.find_all('div', class_='content__article-body from-content-api js-article__body')

    # Skip pages without the expected article body
    if not body:
        continue

    # Unifying the paragraphs (join once, after all paragraphs are collected)
    list_paragraphs = [p.get_text() for p in body[0].find_all('p')]
    final_article = " ".join(list_paragraphs)

    # Append together so links, titles and contents stay aligned
    list_links.append(link)
    list_titles.append(title)
    news_contents.append(final_article)
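
The check for "live" in the loop above is a substring match, so it would also skip a link whose path merely contains those letters, such as a hypothetical `/delivery-news` article. A stricter sketch follows, assuming live pages carry `live` as a whole path segment; that is an assumption about the site's URL scheme, not something confirmed here, and `is_live_page` is a hypothetical helper.

```python
from urllib.parse import urlparse


def is_live_page(link):
    """Return True if 'live' appears as a whole segment of the URL path."""
    segments = urlparse(link).path.strip('/').split('/')
    return 'live' in segments


# Hypothetical URLs for illustration
print(is_live_page('https://example.com/politics/live/2020/jan/01/blog'))      # True
print(is_live_page('https://example.com/business/2020/jan/01/delivery-news'))  # False
```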


In [0]:
# numpy.arange([start, ]stop, [step, ])
# includes the start number but not the stop number (5 is excluded)
# ================================================
print('First Loop')
for i in np.arange(0,5):
    print(i)
print('')
print('Second Loop - Step 2')
print('')      
for i in np.arange(0,5,2):
    print(i)
                    


In [0]:
?requests.get

In [0]:
?BeautifulSoup.find_all

In [0]:
# df_show_info
df_show_info = pd.DataFrame(
    {'Article Title': list_titles})
df_show_info

## **Let us take one more step and run the news stories through our translator!**

In [0]:
def get_language_desc(language_code, language):
    translator = Translator()
    source_language_code = 'en'
    dest_language = language
    dest_language_code = language_code
    # Work on a copy so df_show_info itself is left unchanged
    df = df_show_info.loc[:, ['Article Title']].copy()
    df[dest_language] = ""
    for index, row in df.iterrows():
        text = translator.translate(row['Article Title'], src=source_language_code, dest=dest_language_code).text
        # Write through df.at: assigning to `row` is not guaranteed to update df
        df.at[index, dest_language] = text

    print(tabulate(df, headers=('Article Title', dest_language + ' Translation')))

In [0]:
languages = [('no','Norwegian'),('de','German'),('bg','Bulgarian'),('fi','Finnish'),
             ('el','Greek'),('hi','Hindi'),('es','Spanish'),('fr','French'),('zh-TW','Chinese (Traditional)')]
s = '='
n = 180

for language in languages:
    print(' \n ')
    print(language[1].upper())
    print(s * n)  # separator line of n '=' characters
    get_language_desc(language[0], language[1])
 

## Writing and reading to/from disk

In [0]:
import os
from google.colab import drive

In [0]:
# Mount Google Drive first so the output folder is available for writing
drive.mount('/content/drive')
os.makedirs('/content/drive/My Drive/Language_Translator/', exist_ok=True)

In [0]:
def write_to_file(dest_language_code, language, content, index):
    file_name = os.path.join('/content/drive/My Drive/Language_Translator/', language + '_' + index + '.txt')
    # Translate before opening the file so a failed request leaves no empty file behind
    text = translator.translate(content, src='en', dest=dest_language_code).text
    with open(file_name, 'w', encoding='utf-8') as f:
        f.write("%s\n" % text)

In [0]:
languages = [('no','Norwegian'),('de','German'),('bg','Bulgarian'),('fi','Finnish'),('el','Greek'),('hi','Hindi'),('es','Spanish'),('fr','French')]
n = 0
translator = Translator()
for article in news_contents:
    for language in languages:
        write_to_file(language[0], language[1], article, str(n))
    n = n + 1

In [0]:
def read_from_file(dest_language_code, language, index):
    file_name = os.path.join('/content/drive/My Drive/Language_Translator/', language + '_' + index + '.txt')
    with open(file_name, 'r', encoding='utf-8') as f:
        print(f.read())
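
`write_to_file` above sends each whole article in a single `translate` call. Long inputs may be rejected or truncated by the web endpoint, so one option is to split the text into smaller pieces first. The sketch below is a minimal version of that idea: `split_text` is a hypothetical helper, and the 5000-character default is an assumed limit for illustration, not a documented one.

```python
import re


def split_text(text, max_chars=5000):
    """Split text into chunks of at most max_chars, breaking after sentences.

    A single sentence longer than max_chars is cut at max_chars.
    """
    sentences = re.split(r'(?<=[.!?])\s+', text.strip())
    chunks = []
    current = ''
    for sentence in sentences:
        # Hard-cut sentences that cannot fit in one chunk on their own
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ''
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = (current + ' ' + sentence).strip()
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


parts = split_text('One. Two! Three? Four.', max_chars=10)
print(parts)
```

Each chunk could then be translated separately and the results joined with spaces before writing them to disk.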

In [0]:
languages = [('no','Norwegian'),('de','German'),('bg','Bulgarian'),('fi','Finnish'),('el','Greek'),('hi','Hindi'),('es','Spanish'),('fr','French')]
s = '='
width = 180  # kept separate from the loop counter n below
for language in languages:
    print(' \n ')
    print(language[1].upper())
    print(s * width)
    # One file was written per scraped article, numbered from 0
    for n in range(len(news_contents)):
        read_from_file(language[0], language[1], str(n))

### ================================== END OF MODULE ==================================