# Babel fish echo

## Implementation steps:
1. Add audio file
  - let the user choose between two options: record audio or upload a .wav file
2. Convert speech to text
  - configure the recognizer for Russian speech (`ru-RU`)
3. Translate the text from Russian to English
4. Translate the English text back to Russian
5. Convert the text to speech
  - use a Russian voice (`lang='ru'`)
6. Save the resulting speech to the file "result.wav"
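The six steps above run strictly in sequence, and a failed stage (for example, speech that could not be recognized) should stop the chain instead of passing `None` downstream. Below is a minimal sketch of that control flow. The names `run_pipeline` and `Stage` are hypothetical, and the stub lambdas stand in for the real Google recognition, translation and text-to-speech calls.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical type: a stage maps the previous result to the next one,
# returning None when it fails (e.g. speech could not be recognized).
Stage = Callable[[str], Optional[str]]


def run_pipeline(initial: str, stages: List[Tuple[str, Stage]]) -> Optional[str]:
    """Run the stages strictly one after another; stop at the first failure."""
    value = initial
    for name, stage in stages:
        result = stage(value)
        if result is None:
            print(f"Stage '{name}' failed, stopping.")
            return None
        print(f"Stage '{name}' done.")
        value = result
    return value


# Stub stages standing in for the real recognition / translation / TTS calls.
stub_stages = [
    ("recognize", lambda path: "привет, мир"),
    ("ru -> en", lambda text: "hello, world"),
    ("en -> ru", lambda text: "привет, мир"),
    ("text to speech", lambda text: "result.wav"),
]
print(run_pipeline("myfile.wav", stub_stages))
```

Swapping a stub for a real call only requires a function of the same shape, for example one that wraps `recognize_speech(path, "ru-RU")`.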

## Technical stack:
- Speech recognition: Google Web Speech API via the SpeechRecognition package ([rationale](http://ceur-ws.org/Vol-2298/paper13.pdf))
- Text translation: deep_translator package; rationale: easy to use, no special credentials required
- Text-to-speech conversion: gtts package; rationale: easy to use

## MVP specifications:
1. The actions are performed step by step: each processing stage (recognition, translation, conversion) starts only after the previous one has finished, so nothing is processed on the fly.
2. For that reason, the recording duration is limited (to 60 seconds).
3. Running the program resembles playing Chinese whispers: after several processing steps, the initial audio input turns into a funny output.
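The 60-second limit is only stated in the prompt of `add_audio()`. Here is a hedged sketch of a validator that rejects non-numeric or out-of-range input. `parse_duration` and `MAX_SECONDS` are hypothetical names, and 60 is the notebook's own limit, not a documented API quota.

```python
MAX_SECONDS = 60  # limit taken from the add_audio() prompt


def parse_duration(raw: str, limit: int = MAX_SECONDS) -> int:
    """Convert user input to a whole number of seconds in the range [1, limit]."""
    try:
        seconds = int(raw.strip())
    except ValueError:
        raise ValueError(f"{raw!r} is not a whole number of seconds") from None
    if not 1 <= seconds <= limit:
        raise ValueError(f"duration must be between 1 and {limit} seconds")
    return seconds
```

For example, `parse_duration(input("Seconds: "))` can be called in a loop that re-prompts whenever a `ValueError` is raised.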


In [1]:
!apt install libasound2-dev portaudio19-dev libportaudio2 libportaudiocpp0 ffmpeg
!pip install pyaudio

Reading package lists... Done
Building dependency tree       
Reading state information... Done
libasound2-dev is already the newest version (1.1.3-5ubuntu0.6).
ffmpeg is already the newest version (7:3.4.8-0ubuntu0.2).
Suggested packages:
  portaudio19-doc
The following NEW packages will be installed:
  libportaudio2 libportaudiocpp0 portaudio19-dev
0 upgraded, 3 newly installed, 0 to remove and 40 not upgraded.
Need to get 184 kB of archives.
After this operation, 891 kB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu bionic/universe amd64 libportaudio2 amd64 19.6.0-1 [64.6 kB]
Get:2 http://archive.ubuntu.com/ubuntu bionic/universe amd64 libportaudiocpp0 amd64 19.6.0-1 [15.1 kB]
Get:3 http://archive.ubuntu.com/ubuntu bionic/universe amd64 portaudio19-dev amd64 19.6.0-1 [104 kB]
Fetched 184 kB in 1s (307 kB/s)
Selecting previously unselected package libportaudio2:amd64.
(Reading database ... 148492 files and directories currently installed.)
Preparing to 

In [2]:
!pip install wavio
!pip install SpeechRecognition
!pip install pyttsx3
!pip install deep-translator
!pip install gTTS

Collecting wavio
  Downloading wavio-0.0.4-py2.py3-none-any.whl (9.0 kB)
Installing collected packages: wavio
Successfully installed wavio-0.0.4
Collecting SpeechRecognition
  Downloading SpeechRecognition-3.8.1-py2.py3-none-any.whl (32.8 MB)
Installing collected packages: SpeechRecognition
Successfully installed SpeechRecognition-3.8.1
Collecting pyttsx3
  Downloading pyttsx3-2.90-py3-none-any.whl (39 kB)
Installing collected packages: pyttsx3
Successfully installed pyttsx3-2.90
Collecting deep-translator
  Downloading deep_translator-1.5.4-py3-none-any.whl (29 kB)
Collecting click<9.0.0,>=8.0.1
  Downloading click-8.0.1-py3-none-any.whl (97 kB)
Collecting beautifulsoup4<5.0.0,>=4.9.1
  Downloading beautifulsoup4-4.10.0-py3-none-any.whl (97 kB)
Collecting soupsieve>1.2
  Downloading soupsieve-2.2.1

In [3]:
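import speech_recognition as sr               # speech-to-text (Google Web Speech API)
import pyaudio                                # microphone recording
import wave                                   # writing .wav files
from deep_translator import GoogleTranslator  # text translation
from gtts import gTTS                         # text-to-speech
import IPython.display as ipd                 # in-notebook audio player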

In [4]:
def record_to_file(filename, player, seconds):
    # recording settings: 16-bit mono audio at 22.05 kHz, read in 1024-frame chunks
    CHUNK = 1024
    FORMAT = pyaudio.paInt16
    CHANNELS = 1
    RATE = 22050

    stream = player.open(
        input=True,
        format=FORMAT,
        channels=CHANNELS,
        rate=RATE,
        frames_per_buffer=CHUNK)

    print("Start recording... ", end="")
    frames = []
    try:
        for _ in range(int(RATE / CHUNK * seconds)):
            frames.append(stream.read(CHUNK))
    finally:
        # release the input device even if reading fails
        stream.stop_stream()
        stream.close()
    print("Recorded", seconds, "second(s)")

    # write the captured frames as a standard WAV file
    with wave.open(filename, 'wb') as wf:
        wf.setnchannels(CHANNELS)
        wf.setsampwidth(player.get_sample_size(FORMAT))
        wf.setframerate(RATE)
        wf.writeframes(b''.join(frames))


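def add_audio(max_seconds=60):
    # keep asking until the user enters a valid option
    while True:
        option = input('Would you like to record audio or upload it? Enter "r" or "u": ').strip().lower()
        if option == 'r':
            filename = 'myfile.wav'
            try:
                seconds = int(input('For how long would you like to record? '
                                    f'Enter number of seconds (the limit is {max_seconds} seconds): '))
            except ValueError:
                print("Please enter a whole number of seconds.")
                continue
            seconds = max(1, min(seconds, max_seconds))  # enforce the 1..max_seconds range
            print("Please speak loudly and clearly. "
                  "The quality of the translation depends on the quality of the initial audio.")
            player = pyaudio.PyAudio()
            try:
                record_to_file(filename, player, seconds)
            finally:
                player.terminate()  # release PortAudio resources
            return filename
        if option == 'u':
            return input('Upload a file and enter its name: ').strip()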


def play_audio(filename):
    ipd.display(ipd.Audio(filename))
    print("Played.")
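`record_to_file()` needs a working input device, which a remote notebook VM typically lacks. The sketch below writes a WAV file in the same format (mono, 16-bit, 22050 Hz) and measures a file's duration, for example to check an uploaded file against the 60-second limit. `write_test_tone` and `wav_duration` are hypothetical helpers that use only the standard `wave` module. A pure tone is not speech, so the Google recognizer will most likely reject it, which makes it handy for exercising the failure path.

```python
import math
import struct
import wave

RATE = 22050      # same sample rate as record_to_file()
SAMPLE_WIDTH = 2  # bytes per sample, matching pyaudio.paInt16


def write_test_tone(filename: str, seconds: float, freq: float = 440.0) -> None:
    """Write a mono 16-bit sine tone of the given length to a WAV file."""
    n_frames = int(RATE * seconds)
    samples = (
        int(0.3 * 32767 * math.sin(2 * math.pi * freq * i / RATE))
        for i in range(n_frames)
    )
    with wave.open(filename, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(SAMPLE_WIDTH)
        wf.setframerate(RATE)
        # "<h" packs each sample as a little-endian signed 16-bit integer
        wf.writeframes(b"".join(struct.pack("<h", s) for s in samples))


def wav_duration(filename: str) -> float:
    """Return the duration of a WAV file in seconds (frames / frame rate)."""
    with wave.open(filename, "rb") as wf:
        return wf.getnframes() / wf.getframerate()
```

Note that `record_to_file()` reads `int(RATE / CHUNK * seconds)` whole chunks, so a real recording comes out slightly shorter than requested (107 chunks, about 4.97 s, for 5 seconds).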

In [5]:
class BabelFish:

    def recognize_speech(self, filename, language_from):
        """Transcribe a .wav file with the Google Web Speech API; return None on failure."""
        recognizer = sr.Recognizer()
        with sr.AudioFile(filename) as source:
            audio = recognizer.record(source)  # read the whole file

        text = None
        try:
            text = recognizer.recognize_google(audio, language=language_from)
            print("You said : {}".format(text))
        except sr.UnknownValueError:
            print("Sorry, could not recognize what you said")
        except sr.RequestError as e:
            print("Could not reach the Google speech service: {}".format(e))
        return text


    def translate_text(self, text, language_from, language_to):
        translated_text = GoogleTranslator(source=language_from, target=language_to).translate(text)
        print("Translation: ")
        print(translated_text)
        return translated_text


    def convert_text(self, text, language_to):
        # gTTS always produces MP3 data, so despite its .wav extension
        # result.wav holds MP3-encoded audio
        tts = gTTS(text, lang=language_to)
        filename = 'result.wav'
        tts.save(filename)
        return filename
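`translate_text()` sends the whole transcript in a single request. Assuming deep_translator's `GoogleTranslator` caps one request at roughly 5000 characters (check the package documentation for the installed version), longer text can be split into word-aligned chunks first. `split_for_translation` is a hypothetical helper, not part of deep_translator.

```python
from typing import List


def split_for_translation(text: str, max_chars: int = 5000) -> List[str]:
    """Greedily pack whole words into chunks of at most max_chars characters.

    A single word longer than max_chars is kept whole, so that chunk
    may exceed the limit.
    """
    chunks: List[str] = []
    current = ""
    for word in text.split():
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= max_chars:
            current = candidate
        else:
            if current:
                chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be translated separately and the results joined, e.g. `" ".join(GoogleTranslator(source='ru', target='en').translate(c) for c in split_for_translation(text))`. Splitting on words rather than sentences is the simplest option here, because the recognizer's output in this notebook has no punctuation to split on.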

In [6]:
def main():
    fish = BabelFish()
    # 1. record/upload audio file
    filename = add_audio()

    # 2. convert speech to text
    text = fish.recognize_speech(filename, "ru-RU")
    if text is None:
        return  # recognition failed, nothing to translate

    # 3. translate from Russian to English
    translated_text = fish.translate_text(text, 'ru', 'en')

    # 4. translate from English to Russian
    translated_text_v2 = fish.translate_text(translated_text, 'en', 'ru')

    # 5. convert text to speech
    speech_file = fish.convert_text(translated_text_v2, 'ru')

    # 6. play the result
    play_audio(speech_file)


main()

Would you like to record audio or upload it? Enter "r" or "u": u
Upload a file and enter its name: myfile.wav
You said : спустя неделю после рождения нашей дочери Лорен мы с Боней чувствовали себя совершенно измотан нами ночами ребёнок то и дело будет у нас во время родов Кубани были сильные разрывы и ей приходилось принимать болеутоляющие даже ходить ей было нелегко пять дней я просидел дома помогаю жене но потом Разумеется мне пришлось снова выйти на работу в бане казалось начала выздоравливать когда меня не было дома она обнаружила что у неё закончилась лекарства вместо того чтобы позвонить мне на работу Она попросила сходить за таблетками одного из моих братьев которые как раз зашёл навестить её Однако тот Видимо забыла поручение в результате Бонни про мучилась от боли целый день вынуждены при этом ещё и возиться с новорождённой и я даже понятия не имел что этот день оказался для неё таким ужасным
Translation: 
a week after the birth of our daughter Lauren, Bonya and I felt complet

Played.
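The third MVP specification compares the program to Chinese whispers. One way to put a number on that drift (an addition, not something the original notebook does) is to compare the recognized Russian text with the back-translated Russian text word by word, using the standard library's `difflib.SequenceMatcher`. `drift` is a hypothetical helper name.

```python
from difflib import SequenceMatcher


def drift(original: str, round_tripped: str) -> float:
    """Return 1 - word-level similarity: 0.0 means identical, 1.0 means no common words."""
    a = original.lower().split()
    b = round_tripped.lower().split()
    # autojunk=False keeps frequent words from being ignored in long transcripts
    return 1.0 - SequenceMatcher(None, a, b, autojunk=False).ratio()
```

For example, printing `drift(text, translated_text_v2)` at the end of `main()` would show how far the back-translated text strayed from what was recognized.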
