# Speech to Text using SpeechRecognition - Working with Audio Files

## Working with Audio Files

<b>Supported File Types:</b>

Currently, SpeechRecognition supports the following file formats:

- WAV: must be in PCM/LPCM format
- AIFF
- AIFF-C
- FLAC: must be native FLAC format; OGG-FLAC is not supported
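
Because the WAV requirement is specifically PCM, it can help to inspect a file before handing it to the recognizer. The sketch below uses only Python's standard-library `wave` module, which reads uncompressed PCM and raises `wave.Error` for other encodings. The helper name `describe_wav` is hypothetical and is not part of SpeechRecognition.

```python
import wave


def describe_wav(path):
    """Return basic parameters of a PCM WAV file.

    Raises wave.Error if the file is not a WAV the stdlib can read,
    which in practice means it is not uncompressed PCM.
    """
    with wave.open(path, "rb") as wav:
        frame_rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate_hz": frame_rate,
            "duration_s": wav.getnframes() / frame_rate,
        }
```

Calling `describe_wav('Test_Audio.wav')` before creating the `AudioFile` gives a quick view of the sample rate and length, and fails early on a compressed WAV.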

In [1]:
import speech_recognition as sr

In [2]:
sr.__version__

'3.8.1'

In [3]:
r = sr.Recognizer()

## Using record() to Capture Data from an Audio File

In [4]:
test_audio = sr.AudioFile('Test_Audio.wav')
with test_audio as source:
    audio = r.record(source)

In [5]:
type(audio)

speech_recognition.AudioData

In [6]:
r.recognize_google(audio)

'the salesman old beer drinkers it takes hi to bring out the order I called them restore selfinvest a salt ACL this find him because of my favourite is just for food is Bihar cross bun'

## Capturing Segments with "offset" and "duration"

In [7]:
with test_audio as source:
    audio = r.record(source, duration=4)
r.recognize_google(audio)

'the sales Mela old beer drinkers'

In [8]:
with test_audio as source:
    audio1 = r.record(source, duration=4)
    audio2 = r.record(source, duration=4)

In [9]:
r.recognize_google(audio1)

'the sales Mela old beer drinkers'

In [10]:
r.recognize_google(audio2)

'ethics hi to bring out the order I called them'

In [11]:
with test_audio as source:
    audio = r.record(source, offset=4, duration=3)
r.recognize_google(audio)

'ethics hi to bring out the order'

In [12]:
with test_audio as source:
    audio = r.record(source, offset=4.7, duration=2.8)
r.recognize_google(audio)

'excreta bring out the order'
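
The `offset` and `duration` arguments make it possible to step through a longer file in fixed-size pieces. The sketch below only computes the `(offset, duration)` pairs; the helper name `chunk_windows` is hypothetical, and the total length of the recording is assumed to be known in advance.

```python
import math


def chunk_windows(total_seconds, chunk_seconds):
    """Return (offset, duration) pairs that cover the recording without gaps."""
    if chunk_seconds <= 0:
        raise ValueError("chunk_seconds must be positive")
    count = math.ceil(total_seconds / chunk_seconds)
    return [
        # The last window is shortened so it does not run past the end.
        (i * chunk_seconds, min(chunk_seconds, total_seconds - i * chunk_seconds))
        for i in range(count)
    ]


# Possible usage with the objects from the cells above:
# for offset, duration in chunk_windows(10, 4):
#     with test_audio as source:
#         audio = r.record(source, offset=offset, duration=duration)
```

Each window re-enters the `with` block, as the cells above do, so that `offset` is measured from the start of the file. Fixed windows can cut a word at the boundary, which is probably what happened in the `offset=4.7` example, so transcripts of adjacent chunks may not join cleanly.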

## Noise Filtering

In [13]:
with test_audio as source:
    r.adjust_for_ambient_noise(source)  # calibrate for a noisy environment
    audio = r.record(source)
r.recognize_google(audio)

'best elss Mela old beer drinkers ethics hi to bring up order I called up restore selfinvest a selfie kurtis find him because of my favourite is just for food is Bihar cross bun'

In [14]:
with test_audio as source:
    r.adjust_for_ambient_noise(source, duration=0.4)
    audio = r.record(source)
r.recognize_google(audio)

'the still smell old beer drinkers it takes hi to bring out the order I called them restore selfinvest a salt ACL this find him because of my favourite is just for food is Bihar cross bun'
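
`adjust_for_ambient_noise()` reads from the beginning of the source to calibrate the recognizer, so that stretch of audio is not part of the following `record()` call; this is why the two transcripts above start differently. To get a feel for what the opening stretch of a file contains, the sketch below measures its RMS level with the standard library only. It illustrates the general idea of measuring ambient energy and is not the library's own algorithm; the helper name `opening_rms` is hypothetical, and 16-bit PCM WAV input is assumed.

```python
import array
import math
import sys
import wave


def opening_rms(path, seconds):
    """Return the RMS amplitude of the first `seconds` of a 16-bit PCM WAV."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch handles 16-bit PCM only")
        frames = wav.readframes(int(seconds * wav.getframerate()))
    samples = array.array("h")  # signed 16-bit integers
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))
```

A low value suggests the file opens with near silence, in which case a short calibration `duration` loses little speech; a high value suggests speech starts immediately.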

## View the API Response

In [15]:
r.recognize_google(audio, show_all=True)

{'alternative': [{'transcript': 'the still smell old beer drinkers it takes hi to bring out the order I called them restore selfinvest a salt ACL this find him because of my favourite is just for food is Bihar cross bun',
   'confidence': 0.850577},
  {'transcript': 'distil smell old beer drinkers it takes hi to bring out the order I called them restore selfinvest a salt ACL this find him because of my favourite is just for food is Bihar cross bun'},
  {'transcript': 'the still smell old beer drinkers it takes hi to bring out the order I call dip restore selfinvest a salt ACL this find him because of my favourite is just for food is Bihar cross bun'},
  {'transcript': 'distil smell old beer drinkers it takes hi to bring out the order I call dip restore selfinvest a salt ACL this find him because of my favourite is just for food is Bihar cross bun'},
  {'transcript': 'but still smell old beer drinkers it takes hi to bring out the order I called them restore selfinvest a salt a call this