<a href="https://colab.research.google.com/github/arafMustavi/SPitch/blob/main/AudioToText.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Transcribe

Following the tutorial: https://realpython.com/python-speech-recognition/

Dataset for speech recognition: http://www.voiptroubleshooter.com/open_speech/index.html

## SpeechRecognition Library Installation

In [1]:
!pip install SpeechRecognition

Collecting SpeechRecognition
  Downloading https://files.pythonhosted.org/packages/26/e1/7f5678cd94ec1234269d23756dbdaa4c8cfaed973412f88ae8adf7893a50/SpeechRecognition-3.8.1-py2.py3-none-any.whl (32.8MB)
Installing collected packages: SpeechRecognition
Successfully installed SpeechRecognition-3.8.1


## Import and Version Check

In [2]:
import speech_recognition as sr
sr.__version__

'3.8.1'

## Recognizer Instance

In SpeechRecognition 3.8.1, the version installed above, each Recognizer instance has seven methods for recognizing speech from an audio source, each backed by a different API:

- recognize_bing(): Microsoft Bing Speech

- recognize_google(): Google Web Speech API

- recognize_google_cloud(): Google Cloud Speech - requires installation of the google-cloud-speech package

- recognize_houndify(): Houndify by SoundHound

- recognize_ibm(): IBM Speech to Text

- recognize_sphinx(): CMU Sphinx - requires installing PocketSphinx

- recognize_wit(): Wit.ai

Of the seven, only recognize_sphinx() works offline, using the CMU Sphinx engine. The other six all require an internet connection.
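Since only recognize_sphinx() works offline, a notebook that may lose its connection could try an online engine first and fall back to Sphinx. Below is a minimal, generic sketch of that pattern; first_successful() is a hypothetical helper, not part of SpeechRecognition. It assumes each engine is a callable that takes the audio as its only argument, which fits recognize_google() and recognize_sphinx(); engines that need credentials, such as recognize_wit() with its key, would need wrapping (for example with functools.partial). The demo uses stand-in functions so it runs without the library or a network.

```python
def first_successful(engines, audio, errors=(Exception,)):
    """Try each (name, recognize) pair in order and return (name, transcript)
    from the first engine that does not raise one of `errors`.

    Hypothetical helper: with SpeechRecognition installed, `engines` could be
    [("google", recog.recognize_google), ("sphinx", recog.recognize_sphinx)].
    """
    failures = []
    for name, recognize in engines:
        try:
            return name, recognize(audio)
        except errors as err:
            # Remember why this engine failed (e.g. no connection) and try the next one.
            failures.append(f"{name}: {err}")
    raise RuntimeError("every engine failed: " + "; ".join(failures))


if __name__ == "__main__":
    # Stand-ins for real engines, so the sketch runs offline.
    def online_engine(audio):
        raise ConnectionError("no internet connection")

    def offline_engine(audio):
        return "the birch canoe slid on the smooth planks"

    print(first_successful([("google", online_engine), ("sphinx", offline_engine)], b""))
```

With the library, passing errors=(sr.RequestError, sr.UnknownValueError) limits the fallback to SpeechRecognition's own failure types: an API that could not be reached, or audio that could not be understood.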

In [3]:
recog = sr.Recognizer()

## Supported File Types

Currently, SpeechRecognition supports the following file formats:

- WAV: must be in PCM/LPCM format

- AIFF

- AIFF-C

- FLAC: must be native FLAC format; OGG-FLAC is not supported
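Before passing a WAV file to sr.AudioFile, you can check the PCM requirement with Python's standard-library wave module, which only reads uncompressed PCM WAV files. This is a sketch under that assumption, not SpeechRecognition's own validation, and it says nothing about AIFF or FLAC input; is_pcm_wav() and write_test_tone() are hypothetical helpers.

```python
import math
import os
import struct
import tempfile
import wave


def write_test_tone(path, seconds=1.0, rate=8000, freq=440.0):
    """Hypothetical helper: write a mono, 16-bit PCM WAV containing a sine tone."""
    n_frames = int(seconds * rate)
    samples = (
        int(0.5 * 32767 * math.sin(2 * math.pi * freq * i / rate))
        for i in range(n_frames)
    )
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)   # mono
        wav.setsampwidth(2)   # 2 bytes per sample = 16-bit PCM
        wav.setframerate(rate)
        wav.writeframes(b"".join(struct.pack("<h", s) for s in samples))


def is_pcm_wav(path):
    """Return True if the standard-library wave module (PCM only) can open the file."""
    try:
        with wave.open(path, "rb"):
            return True
    except (wave.Error, EOFError):
        # Not a RIFF/WAVE file, a non-PCM encoding, or a truncated header.
        return False


if __name__ == "__main__":
    folder = tempfile.mkdtemp()
    tone = os.path.join(folder, "tone.wav")
    write_test_tone(tone)
    print(is_pcm_wav(tone))
```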

In [4]:
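# This WILL throw an error: there is no audio to recognize

recog.recognize_google()

# How could something be recognized from nothing?

TypeError

The call raises a TypeError because recognize_google() requires the audio to transcribe, an AudioData instance, as its first argument.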

# Using record() to Capture Data From a File

Run the following cell to process the contents of the “OSR_us_000_0010_8k.wav” file from the dataset linked above (the file must be in the notebook's working directory):

In [8]:
# Open the WAV file as an audio source; record() with no duration reads the whole file
inputAudio = sr.AudioFile('OSR_us_000_0010_8k.wav')
with inputAudio as source:
    audio = recog.record(source)

In [9]:
type(audio)

speech_recognition.AudioData
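The AudioData object returned by record() stores the raw audio bytes along with sample_rate (samples per second) and sample_width (bytes per sample), so you can work out how many seconds it holds, which is handy when picking duration and offset values. duration_seconds() below is a hypothetical helper and assumes mono audio unless told otherwise.

```python
def duration_seconds(raw_data, sample_rate, sample_width, channels=1):
    """Hypothetical helper: length in seconds of raw PCM bytes.

    bytes per second = sample_rate * sample_width * channels
    """
    return len(raw_data) / (sample_rate * sample_width * channels)


if __name__ == "__main__":
    # One second of 16-bit (2-byte) mono silence at 8 kHz is 16,000 bytes.
    print(duration_seconds(b"\x00" * 16000, sample_rate=8000, sample_width=2))  # prints 1.0
    # With SpeechRecognition, the same numbers come from the AudioData object:
    # duration_seconds(audio.get_raw_data(), audio.sample_rate, audio.sample_width)
```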

In [10]:
recog.recognize_google(audio)

"Burke's canoe slid on the smooth planks glue the sea to a dark blue background it is easy to tell the depth of a well these day the chicken leg of a variegated rice is often served in Randall's the juice of lemons makes fine punch the boxes on the side the pump truck the ha grimstead top corn and garbage 4 hours of City Works in a large cell"

# **Congratulations! You’ve just transcribed your first audio file!**


## Capturing Segments With offset and duration

What if you only want to capture a portion of the speech in a file? The record() method accepts a duration keyword argument that stops the recording after a specified number of seconds.

For example, the following captures any speech in the first eight seconds of the file:

In [12]:
with inputAudio as source:
    audio = recog.record(source, duration=8)
recog.recognize_google(audio)

'Birch canoe slid on the smooth planks glue the sea to a dark blue background'

Inside a single with block, each record() call moves forward through the file stream. So if you record eight seconds and then record another eight seconds, the second call returns the eight seconds of audio that follow the first.
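This forward-only reading can be mimicked at the frame level with the standard-library wave module, where every readframes() call also starts at the current position. This is an analogy, not SpeechRecognition's implementation: make_silent_wav() and read_seconds() are hypothetical helpers, and the 8 kHz rate is assumed from the "8k" in the sample file's name.

```python
import io
import wave

RATE = 8000  # assumed 8 kHz, matching the "8k" in OSR_us_000_0010_8k.wav


def make_silent_wav(seconds, rate=RATE):
    """Hypothetical helper: in-memory mono 16-bit PCM WAV of silence."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    buf.seek(0)
    return buf


def read_seconds(wav, duration):
    """Hypothetical helper: read `duration` seconds starting at the current position."""
    return wav.readframes(int(duration * wav.getframerate()))


with wave.open(make_silent_wav(30), "rb") as wav:
    first = read_seconds(wav, 8)
    pos_after_first = wav.tell()    # position in frames after the first read
    second = read_seconds(wav, 8)
    pos_after_second = wav.tell()   # the second read continued from pos_after_first

print(pos_after_first / RATE, pos_after_second / RATE)  # prints 8.0 16.0
```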

In [13]:
with inputAudio as source:
    audio1 = recog.record(source, duration=8)
    audio2 = recog.record(source, duration=8)

print(recog.recognize_google(audio1))
print(recog.recognize_google(audio2))

Burke's canoe slid on the smooth planks glue the sea to a dark blue background
need to tell the depth of a well with a very good price is often served in round


Notice that audio2 starts partway through the third phrase ("...to tell the depth of a well") and ends partway through a later one ("...often served in round"). When you specify a duration, the recording can stop mid-phrase, or even mid-word, which hurts the accuracy of the transcription. More on this in a bit.

In addition to specifying a recording duration, the record() method can be given a specific starting point using the offset keyword argument. This value represents the number of seconds from the beginning of the file to ignore before starting to record.

To capture only the second phrase in the file, you could start with an offset of four seconds and record for, say, three seconds.

In [14]:
with inputAudio as source:
    audio = recog.record(source, offset=4, duration=3)
print(recog.recognize_google(audio))

gluta C to a dog food bag
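The offset and duration arguments from the last cell can be imitated the same way: jump offset seconds into the file, then read duration seconds. Here every sample stores the index of the second it belongs to, so the output shows which part of the file was captured. make_labelled_wav() and read_segment() are hypothetical helpers; SpeechRecognition itself reads the file in fixed-size chunks, so its cut points may not fall on exactly the same frames.

```python
import io
import struct
import wave

RATE = 8000  # assumed 8 kHz, as in the sample file's name


def make_labelled_wav(seconds, rate=RATE):
    """Hypothetical helper: in-memory mono 16-bit PCM WAV in which every sample
    holds the index of the second it falls in (0, 0, ..., 1, 1, ..., 2, ...)."""
    samples = [i // rate for i in range(int(seconds * rate))]
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(struct.pack(f"<{len(samples)}h", *samples))
    buf.seek(0)
    return buf


def read_segment(wav, offset, duration):
    """Hypothetical helper: skip `offset` seconds from the start of the file,
    then return the next `duration` seconds as a tuple of 16-bit samples."""
    rate = wav.getframerate()
    wav.setpos(int(offset * rate))
    frames = wav.readframes(int(duration * rate))
    return struct.unpack(f"<{len(frames) // 2}h", frames)


with wave.open(make_labelled_wav(10), "rb") as wav:
    segment = read_segment(wav, offset=4, duration=3)

# Which seconds of the file ended up in the segment?
print(sorted(set(segment)))  # prints [4, 5, 6]
```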
