<a href="https://colab.research.google.com/github/arponbasu/ITSP_Noise_Reduction_Project_2ndSem/blob/master/DataFilesGeneration.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [2]:
!pip install gTTS
!pip install pydub
print("Process Completed...")

Collecting gTTS
  Downloading https://files.pythonhosted.org/packages/5f/b9/94e59337107be134b21ce395a29fc0715b707b560108d6797de2d93e1178/gTTS-2.2.2-py3-none-any.whl
Installing collected packages: gTTS
Successfully installed gTTS-2.2.2
Collecting pydub
  Downloading https://files.pythonhosted.org/packages/a6/53/d78dc063216e62fc55f6b2eebb447f6a4b0a59f55c8406376f76bf959b08/pydub-0.25.1-py2.py3-none-any.whl
Installing collected packages: pydub
Successfully installed pydub-0.25.1
Process Completed...


In the code below we generate both clean audio clips (by applying a text-to-speech API to a sentence database) and noisy audio clips, to be used as test datasets for a model designed to denoise the clips.

gTTS (Google Text-to-Speech) is a Python library that interfaces with Google Translate's text-to-speech API, and pydub is an audio processing library. We use both extensively below.

In [4]:
import IPython
from scipy.io import wavfile
import scipy.signal
import numpy as np
import matplotlib.pyplot as plt
import librosa
%matplotlib inline
print("Process Completed...")

Process Completed...


Below is a demonstration of the gTTS library (https://www.geeksforgeeks.org/convert-text-speech-python/). The tld argument selects the Google top-level domain, which determines the accent or dialect of the speech; co.in gives Indian English.

In [5]:
from gtts import gTTS
from pydub import AudioSegment
import os
language = 'en'

mytext = 'My name is Arpon Basu.'
myobj = gTTS(text=mytext,tld='co.in', lang=language, slow=False)
myobj.save("welcome.mp3")
sound = AudioSegment.from_mp3("welcome.mp3")
sound.export("welcome.wav", format="wav")
print("Process Completed...")

Process Completed...


Below is the code used to render a wav file as an audio clip that can be played within the notebook itself. It is taken from https://timsainburg.com/noise-reduction-python.html, which is also listed in the references.

In [6]:
import IPython
wav_loc = "welcome.wav"
src_rate, src_data = wavfile.read(wav_loc)
IPython.display.Audio(data = src_data, rate = src_rate)

Below we open a file named 'ActualSent.txt' containing 10,000 sentences taken from the Tatoeba database (https://tatoeba.org/en/downloads).
Each line holds an indexed sentence along with the tag 'eng'; the index and tag are sliced off so that only the sentence remains.
Note that 'ActualSent.txt' was uploaded from our local devices through the google.colab library. That cell is not included, because it would stall serial execution by prompting for a file upload.

In [7]:
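sents = []

# The with-block closes the file once all lines have been read.
with open('ActualSent.txt', 'r') as file1:
    for line in file1:
        # Each line holds an index, the tag 'eng', and then the sentence.
        stripped = line.strip()
        index = stripped.find('eng')
        sentence = stripped[index + 4:]  # skip 'eng' and the separator after it
        sents.append(sentence)
        print(sentence)

print("Process Completed...")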

Streaming output truncated to the last 5000 lines.
The holiday continues to be very boring.
I'm dying to see Kumiko.
Kumi did not talk about her club.
Hearing this song after so long really brings back the old times.
After a long absence he came back.
An old friend of mine dropped in on me for the first time in ages.
It's been so long since we last met up. Let's have a drink or two and talk about the good old days.
I haven't seen him for a long time.
Please forgive me for not having written for a long time.
Will Mr Oka teach English?
After running up the hill, I was completely out of breath.
The hills are bathed in sunlight.
The hill lay covered with snow.
The hill slopes downward to the river.
The hill was all covered with snow.
The hill glows with autumnal colors.
The hill is always green.
You see a white building at the foot of the hill.
The wind blew harder yet when we reached the top of the hill.
The building which stands on the hillside is our school.
A castle stand
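
The slicing in the cell above relies on 'eng' being followed by exactly one separator character. As a minimal sketch, assuming the export is tab-separated with three columns (index, language tag, sentence), the same extraction could be written with an explicit split. The helper name `parse_tatoeba_line` is hypothetical and not part of the original notebook.

```python
def parse_tatoeba_line(line, lang="eng"):
    """Return the sentence from an 'index<TAB>lang<TAB>sentence' line.

    Returns None when the line does not have three tab-separated fields
    or when its language tag differs from `lang`.
    Assumption: the file is tab-separated; adjust the separator otherwise.
    """
    parts = line.rstrip("\n").split("\t", 2)
    if len(parts) != 3 or parts[1] != lang:
        return None
    return parts[2].strip()
```

Unlike fixed-offset slicing, this variant skips malformed lines instead of appending a mangled string to the sentence list.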

The following function returns an accent (a tld value) for the integer code passed to it. In all four cases the language itself remains English.

In [8]:
def AssignAccent(code):
  if code == 1: return 'com.au' #Australian Accent
  elif code == 2: return 'co.uk' #British Accent
  elif code == 3: return 'com' #American Accent
  elif code == 4: return 'co.in' #Indian Accent
  else: return 'co.in' #Indian Accent
print("Process Completed...")

Process Completed...


Ideally we would parse the entire file 'ActualSent.txt' to obtain our datasets. However, the file is too large: when we pass all of it, the gTTS API returns an HTTP 429 error (too many requests).
We have therefore limited ourselves to 1800 sentences here. The ceiling can be changed to any value up to 10,000 and the rest of the code will still work, but in my experience the 429 error appears at around 1850 sentences, so this is about the maximum we could extract from the API.
A possible workaround is a token bucket or leaky bucket, which would limit the number of requests sent per unit of time.
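
The token bucket idea mentioned above could be sketched as follows. This is only an illustration, not something used in this notebook: the class name `TokenBucket` and the `rate` and `capacity` values are assumptions, and the actual request limits of the service are not known to us.

```python
import time


class TokenBucket:
    """Allow about `rate` requests per second, with bursts up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic, sleep=time.sleep):
        self.rate = rate            # tokens added per second
        self.capacity = capacity    # maximum number of stored tokens
        self.tokens = capacity      # start with a full bucket
        self.clock = clock          # injectable for testing
        self.sleep = sleep          # injectable for testing
        self.last = clock()

    def _refill(self):
        # Add tokens for the time elapsed since the last refill, up to capacity.
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now

    def acquire(self):
        """Block until one token is available, then consume it."""
        self._refill()
        while self.tokens < 1:
            # Sleep just long enough for the missing fraction of a token.
            self.sleep((1 - self.tokens) / self.rate)
            self._refill()
        self.tokens -= 1
```

To use it, one would create a bucket once, for example `bucket = TokenBucket(rate=1.0, capacity=5)`, and call `bucket.acquire()` immediately before each gTTS request. Whether this is enough to avoid the 429 error depends on limits we have not measured.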

In [9]:
UpperLimit = 1800
print("Process Completed...")

Process Completed...


Here we create serialized file names through a list comprehension, and store the speech version of each sentence in the corresponding file.
Note that 'cap' only bounds the number of file names generated. It could equally be replaced with (len(sents)+1), but it was put in place pre-emptively to prevent crashes.
Also note that the accent of each clip is chosen randomly.

In [10]:
from gtts import gTTS
from pydub import AudioSegment
import os
import random

cap = int(0.5*len(sents)+1)
file_names_mp3=['file{}.mp3'.format(i) for i in range(1,cap)] #len(sents)+1
file_names_wav=['file{}.wav'.format(i) for i in range(1,cap)] #len(sents)+1

#print(file_names)
print("Saving sentences in  mp3 format....")
# enumerate gives every sentence its own position, so duplicate sentences
# do not map to the same file name (as sents.index would make them do).
for ind, sentence in enumerate(sents):
   if ind > UpperLimit:
     break
   myobj = gTTS(text = sentence, tld = AssignAccent(random.randint(1, 4)), lang = 'en', slow = False)
   myobj.save(file_names_mp3[ind])
   
print("Saving sentences in  mp3 format complete")

Saving sentences in  mp3 format....
Saving sentences in  mp3 format complete


Most audio-processing tools work with wav files, so the mp3 files generated by gTTS must be converted first.

In [11]:
print("Converting  mp3 format to .wav format....")
# Pair each mp3 name with its wav name; only the first UpperLimit + 1 files exist.
for audio_mp3, audio_wav in zip(file_names_mp3[:UpperLimit + 1], file_names_wav):
   sound = AudioSegment.from_mp3(audio_mp3)
   sound.export(audio_wav, format="wav")
print("Conversion of all  mp3s to wav is complete")

Converting  mp3 format to .wav format....
Conversion of all  mp3s to wav is complete


One hundred types of noise commonly encountered in real life (obtained from http://web.cse.ohio-state.edu/pnl/corpus/HuNonspeech/HuCorpus.html) were uploaded as noise files. As with 'ActualSent.txt', the upload cell has not been included.

The following function returns an SNR value for the mode passed to it: scale*mode for modes up to 4, and 20 otherwise. It is used below to generate noisy files with varying noise intensities.

In [13]:
def AssignSNR (mode,scale=1):
  if mode <= 4: return (scale*mode)
  else: return 20
print("Process Completed...")

Process Completed...


The following code, also inspired by https://timsainburg.com/noise-reduction-python.html, generates one noisy file for each clean file. Each clean clip is mixed with a randomly chosen noise file (from n1.wav, n2.wav, ..., n100.wav) at a randomly chosen ratio (the SNR), and the resulting files are saved.

In [22]:
import random
from scipy.io.wavfile import write

corrupt_clip_wav = ['noisy_clip{}.wav'.format(k) for k in range(1,UpperLimit+2)]

for j in range(1, UpperLimit + 2):

    wav_loc = 'file{}.wav'.format(j)
    noise_loc = 'n{}.wav'.format(random.randint(1, 100))

    src_rate, src_data = wavfile.read(wav_loc)
    src_data = src_data / 32768  # scale 16-bit samples to [-1, 1)

    noise_rate, noise_data = wavfile.read(noise_loc)
    # work in floating point so that taking absolute values cannot overflow
    noise_data = noise_data.astype(np.float64)

    # get some noise to add to the signal: loop the noise clip until it
    # is exactly as long as the speech clip
    rem = (len(src_data))%(len(noise_data))
    quotient = len(src_data) // len(noise_data)
    noise_to_add = noise_data[:rem]
    for i in range(quotient):
        noise_to_add = np.concatenate([noise_to_add, noise_data])
    # normalise by the peak magnitude so the noise lies within [-1, 1]
    noise_to_add = noise_to_add / np.max(np.abs(noise_to_add))

    # apply noise
    snr = AssignSNR(random.randint(1,4))  # signal to noise ratio
    corrupt_clip = src_data + noise_to_add / snr

    # rescale to the full 16-bit range before writing
    scaled = np.int16(corrupt_clip/np.max(np.abs(corrupt_clip)) * 32767)
    write(corrupt_clip_wav[j-1], src_rate, scaled)
    
    
print("Process Completed...")

Process Completed...
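
The mixing step in the cell above can be isolated into a small function, which makes it easier to try on synthetic arrays. This is a minimal sketch assuming mono (1-D) signals. The names `mix_with_noise` and `amplitude_ratio_to_db` are hypothetical, and `np.resize` is swapped in for the explicit concatenation loop, so the looped noise starts from the beginning of the noise clip rather than from its remainder.

```python
import numpy as np


def mix_with_noise(clean, noise, ratio):
    """Add looped, peak-normalised noise to `clean`, attenuated by `ratio`.

    Assumes 1-D arrays; `ratio` plays the role of the value from AssignSNR.
    """
    clean = np.asarray(clean, dtype=np.float64)
    noise = np.asarray(noise, dtype=np.float64)
    # np.resize repeats the noise as often as needed to match the clean length.
    looped = np.resize(noise, clean.shape)
    peak = np.max(np.abs(looped))
    if peak > 0:
        looped = looped / peak  # noise now lies within [-1, 1]
    return clean + looped / ratio


def amplitude_ratio_to_db(ratio):
    """Express an amplitude ratio in decibels."""
    return 20.0 * np.log10(ratio)
```

Note that the ratio used here divides peak-normalised noise amplitudes, so it is only a rough stand-in for a power-based SNR. Read as an amplitude ratio, the values 1 to 4 correspond to roughly 0 to 12 dB.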


The command below confirms that the noisy files have indeed been produced and stored properly.

In [29]:
!ls -a | grep noisy

noisy_clip1000.wav
noisy_clip1001.wav
noisy_clip1002.wav
noisy_clip1003.wav
noisy_clip1004.wav
noisy_clip1005.wav
noisy_clip1006.wav
noisy_clip1007.wav
noisy_clip1008.wav
noisy_clip1009.wav
noisy_clip100.wav
noisy_clip1010.wav
noisy_clip1011.wav
noisy_clip1012.wav
noisy_clip1013.wav
noisy_clip1014.wav
noisy_clip1015.wav
noisy_clip1016.wav
noisy_clip1017.wav
noisy_clip1018.wav
noisy_clip1019.wav
noisy_clip101.wav
noisy_clip1020.wav
noisy_clip1021.wav
noisy_clip1022.wav
noisy_clip1023.wav
noisy_clip1024.wav
noisy_clip1025.wav
noisy_clip1026.wav
noisy_clip1027.wav
noisy_clip1028.wav
noisy_clip1029.wav
noisy_clip102.wav
noisy_clip1030.wav
noisy_clip1031.wav
noisy_clip1032.wav
noisy_clip1033.wav
noisy_clip1034.wav
noisy_clip1035.wav
noisy_clip1036.wav
noisy_clip1037.wav
noisy_clip1038.wav
noisy_clip1039.wav
noisy_clip103.wav
noisy_clip1040.wav
noisy_clip1041.wav
noisy_clip1042.wav
noisy_clip1043.wav
noisy_clip1044.wav
noisy_clip1045.wav
noisy_clip1046.wav
noisy_clip1047.wav
noisy_clip1048.w

In [30]:
import IPython
wav_loc = "noisy_clip231.wav" #A test example
src_rate, src_data = wavfile.read(wav_loc)
IPython.display.Audio(data = src_data, rate = src_rate)

The clean files are saved in a directory named Y, and the noisy files in a directory named Noisy2. The '2' has no specific meaning.

In [40]:
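import os
import shutil
os.makedirs('Y', exist_ok=True)       # make sure the target directories exist
os.makedirs('Noisy2', exist_ok=True)
for i in range(1, UpperLimit + 1):
  shutil.copyfile('file{}.wav'.format(i),'./Y/file{}.wav'.format(i))
  shutil.copyfile('noisy_clip{}.wav'.format(i),'./Noisy2/noisy_clip{}.wav'.format(i))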

The directories above are zipped so that they can be downloaded. The archives are then downloaded and used for further work.

In [37]:
import shutil
shutil.make_archive('N2', 'zip', 'Noisy2')

'/content/N2.zip'

In [39]:
import shutil
shutil.make_archive('Clean', 'zip', 'Y')

'/content/Clean.zip'

References used above, to the best of my knowledge:
1. Sound processing and noise addition: https://timsainburg.com/noise-reduction-python.html
2. Noise files database: http://web.cse.ohio-state.edu/pnl/corpus/HuNonspeech/HuCorpus.html
3. Some additional noise files (not used here): https://www.ecs.utdallas.edu/loizou/speech/noizeus/
4. Sentence database: https://tatoeba.org/en/downloads
5. Basic introduction to gTTS: https://www.geeksforgeeks.org/convert-text-speech-python/
6. A guide to using tld in gTTS: https://gtts.readthedocs.io/en/latest/module.html#examples
7. Why the UpperLimit is required: https://cloud.google.com/speech-to-text/quotas
8. Some audio processing trivia: https://stackoverflow.com/questions/10357992/how-to-generate-audio-from-a-numpy-array

