### **What is gTTS**
gTTS (Google Text-to-Speech) is a Python library and CLI tool to interface with the Google Translate text-to-speech API. ... The converted speech is held in the `tts` variable, and the `tts.save` function writes it to an audio file (MP3) that can be played back.

In [13]:
# import packages
# !pip install gTTS
# !pip install PyPDF2

import re
import string
import gtts
import PyPDF2
from gtts import gTTS

In [2]:
# before we start, let's see the available languages
gtts.lang.tts_langs()

{'af': 'Afrikaans',
 'ar': 'Arabic',
 'bg': 'Bulgarian',
 'bn': 'Bengali',
 'bs': 'Bosnian',
 'ca': 'Catalan',
 'cs': 'Czech',
 'cy': 'Welsh',
 'da': 'Danish',
 'de': 'German',
 'el': 'Greek',
 'en': 'English',
 'eo': 'Esperanto',
 'es': 'Spanish',
 'et': 'Estonian',
 'fi': 'Finnish',
 'fr': 'French',
 'gu': 'Gujarati',
 'hi': 'Hindi',
 'hr': 'Croatian',
 'hu': 'Hungarian',
 'hy': 'Armenian',
 'id': 'Indonesian',
 'is': 'Icelandic',
 'it': 'Italian',
 'ja': 'Japanese',
 'jw': 'Javanese',
 'km': 'Khmer',
 'kn': 'Kannada',
 'ko': 'Korean',
 'la': 'Latin',
 'lv': 'Latvian',
 'mk': 'Macedonian',
 'ml': 'Malayalam',
 'mr': 'Marathi',
 'my': 'Myanmar (Burmese)',
 'ne': 'Nepali',
 'nl': 'Dutch',
 'no': 'Norwegian',
 'pl': 'Polish',
 'pt': 'Portuguese',
 'ro': 'Romanian',
 'ru': 'Russian',
 'si': 'Sinhala',
 'sk': 'Slovak',
 'sq': 'Albanian',
 'sr': 'Serbian',
 'su': 'Sundanese',
 'sv': 'Swedish',
 'sw': 'Swahili',
 'ta': 'Tamil',
 'te': 'Telugu',
 'th': 'Thai',
 'tl': 'Filipino',
 'tr': 'Turk
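
Before passing a `lang` value to gTTS, it can help to check it against this mapping. The following is a minimal sketch, assuming a hypothetical helper named `check_lang` and a small hand-copied subset of the dictionary above. In the notebook you would pass the result of `gtts.lang.tts_langs()` instead of `sample_langs`.

```python
def check_lang(code, langs):
    """Return the language name for `code`, or raise if it is not supported.

    `check_lang` is a hypothetical helper, not part of gTTS.
    `langs` is a dict shaped like the output of gtts.lang.tts_langs().
    """
    if code not in langs:
        raise ValueError(f'unsupported language code: {code!r}')
    return langs[code]


# Hand-copied subset of the mapping shown above (assumption: illustration only).
sample_langs = {'en': 'English', 'fr': 'French', 'hi': 'Hindi'}

print(check_lang('hi', sample_langs))
```

Failing early with a clear message is usually easier to debug than an error raised later, when the audio is requested.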

In [3]:
%%html
<style>
table {float:left}
</style>

### Localized ‘accents’
For a given language, Google Translate text-to-speech can speak in different local ‘accents’ depending on the Google domain (google.<tld>) of the request, with some examples shown in the table below.

| Local accent  |  Language code (lang)  |  Top-level domain (tld) | 
|:------------- |:---:                   |:---:                    |
| English (Australia)  |  en  |  com.au | 
| English (United Kingdom)  |  en  |  co.uk | 
| English (United States)  |  en  |  com (default) | 
| English (Canada)  |  en  |  ca | 
| English (India)  |  en  |  co.in | 
| English (Ireland)  |  en  |  ie | 
| English (South Africa)  |  en  |  co.za | 
| French (Canada)  |  fr  |  ca | 
| French (France)  |  fr  |  fr | 
| Mandarin (China Mainland)  |  zh-CN  |  any | 
| Mandarin (Taiwan)  |  zh-TW  |  any | 
| Portuguese (Brazil)  |  pt  |  com.br | 
| Portuguese (Portugal)  |  pt  |  pt | 
| Spanish (Mexico)  |  es  |  com.mx | 
| Spanish (Spain)  |  es  |  es | 
| Spanish (United States)  |  es  |  com (default) | 

<b>Note</b>
This is an incomplete list. Try different combinations of language codes and [known localized Google domains](https://www.google.com/supported_domains). Feel free to add new combinations to this list via a Pull Request!
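
The table can also be kept in code, so that an accent is picked by name rather than by remembering the `lang`/`tld` pair. This is a minimal sketch: the `ACCENTS` dictionary and the `accent_kwargs` helper are hypothetical names introduced here, and the entries are copied from a few rows of the table above.

```python
# Accent name -> (lang, tld), copied from rows of the table above.
ACCENTS = {
    'English (Australia)': ('en', 'com.au'),
    'English (United Kingdom)': ('en', 'co.uk'),
    'English (United States)': ('en', 'com'),
    'English (India)': ('en', 'co.in'),
    'French (Canada)': ('fr', 'ca'),
    'French (France)': ('fr', 'fr'),
    'Portuguese (Brazil)': ('pt', 'com.br'),
    'Spanish (Mexico)': ('es', 'com.mx'),
}


def accent_kwargs(name):
    """Return the lang/tld keyword arguments for a named accent (hypothetical helper)."""
    lang, tld = ACCENTS[name]
    return {'lang': lang, 'tld': tld}


print(accent_kwargs('English (India)'))
```

With gTTS imported, the result can be unpacked straight into the constructor, for example `gTTS(text='Hello', **accent_kwargs('English (India)'))`.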


#### <B>Demo</B>

In [4]:
import os

text = 'Welcome to Buggy Programmer!'
lang = 'en'

os.makedirs('gtts', exist_ok=True)  # save() fails if the output folder is missing
audio = gTTS(text=text, lang=lang, tld='com')
audio.save('gtts/sample1.mp3')

In [5]:
# if you are using jupyter/colab notebook
import IPython.display as ipd
ipd.Audio('gtts/sample1.mp3')

# if you are not using a Jupyter/Colab notebook, open the file with the
# system's default player instead (os.startfile is Windows-only)
# import os
# os.startfile('gtts/sample1.mp3')
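
Outside a notebook, the saved MP3 has to be handed to whatever player the operating system provides. The sketch below only builds the command and does not run it. `player_command` is a hypothetical helper, and the commands (`start`, `open`, `xdg-open`) are assumptions about a typical Windows, macOS or Linux desktop setup.

```python
import platform


def player_command(path, system=None):
    """Return a command list that asks the OS to open `path` with its default app.

    Hypothetical helper: the commands are assumptions about common desktop setups.
    """
    system = system or platform.system()
    if system == 'Windows':
        return ['cmd', '/c', 'start', '', path]
    if system == 'Darwin':  # macOS
        return ['open', path]
    return ['xdg-open', path]  # most Linux desktops


print(player_command('gtts/sample1.mp3', system='Linux'))
```

To actually play the file, pass the result to `subprocess.run(...)` from the standard library.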

In [6]:
text = 'This is text to speech conversion!'
tts = gTTS(text=text, lang='en', tld='com.au')
tts.save('gtts/sample2.mp3')

In [7]:
ipd.Audio('gtts/sample2.mp3')

#### _*Create Audiobook*_

In [9]:
# function for cleaning text
def clean_txt(text):
    """Remove hyperlinks, unwanted punctuation and stray whitespace from text."""
    keep = {'.', ',', '!', '?', '&', ' '}
    exclude = set(string.punctuation) - keep

    # accept either a single string or a list of strings (pages or lines)
    txt = text if isinstance(text, str) else ' '.join(text)

    # remove hyperlinks first, while their punctuation is still intact
    # (this pattern only matches links that contain "www.")
    txt = re.sub(pattern=r'(https?://)?w{3}\.\S*\.\S*', repl='', string=txt)
    txt = ''.join(x for x in txt if x not in exclude)  # drop unwanted punctuation
    txt = ' '.join(txt.split())  # collapse newlines, tabs and repeated spaces
    return txt

# function for creating audiobook
def audioBook(file, save_as='audiobook.mp3', lang='en', tld='com.au'):
    """Convert a .pdf or .txt file to speech and save it as gtts/<save_as>."""
    file_type = file.rsplit('.', 1)[-1].lower()  # text after the last dot

    # for pdf file → collect the text of every page
    if file_type == 'pdf':
        # PdfReader, pages and extract_text are the PyPDF2 >= 2.0 names
        # (older releases used PdfFileReader, numPages, getPage, extractText)
        with open(file, 'rb') as f:
            pdfReader = PyPDF2.PdfReader(f)
            all_txt = [str(page.extract_text()) for page in pdfReader.pages]

    # for text file → collect every line
    elif file_type == 'txt':
        with open(file) as f:
            all_txt = f.readlines()

    else:
        print(f'".{file_type}" file type not supported')
        return

    # clean text
    txt = clean_txt(all_txt)

    # generate audio file
    tts = gTTS(text=txt, lang=lang, tld=tld)
    tts.save(f'gtts/{save_as}')
    print("file converted successfully")
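
The hyperlink-removal step in `clean_txt` relies on a pattern that only matches links containing `www.`. The sketch below compares it with a broader pattern. `ANY_LINK` is a hypothetical alternative introduced here for illustration, not something the notebook uses, and the sample sentence is made up.

```python
import re

# Pattern used by clean_txt: only matches links that contain "www."
WWW_ONLY = re.compile(r'(https?://)?w{3}\.\S*\.\S*')
# Hypothetical broader pattern: any http(s):// link, or a bare www. link
ANY_LINK = re.compile(r'https?://\S+|www\.\S+')

sample = 'Visit https://www.example.com/page for details, or http://example.org instead.'

narrow = WWW_ONLY.sub('', sample)  # leaves the link without "www." in place
broad = ANY_LINK.sub('', sample)   # removes both links

print(narrow)
print(broad)
```

Whichever pattern is used, links are best removed before punctuation is stripped, since the patterns depend on characters such as `:` and `/`. Note that the broader pattern also removes any punctuation directly attached to the end of a link.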

In [10]:
# generating audiobook from pdf file
audioBook(file='files/Clean Code.pdf', save_as='cleancode.mp3')
ipd.Audio('gtts/cleancode.mp3')

In [14]:
# generating audiobook from txt file
audioBook(file='files/Elon.txt', save_as='elon.mp3')
ipd.Audio('gtts/elon.mp3')

file converted successfully
