1\. Creating transcription helper functions
-------------------------------------------

00:00 - 00:45

You've come a long way: from exploring an audio file from scratch, to manipulating audio files, to working with different transcription APIs. In this chapter, you'll put everything you've learned together by building a proof-of-concept spoken language processing pipeline. Acme Studios, a technology company, has approached you to use your speech processing skills to gain insights into their customer support calls. They've sent you a handful of audio samples to explore and see what you can find. They let you know they're not quite sure of the quality of the files or the format they're recorded in.

2\. Exploring audio files
-------------------------

00:45 - 01:07

You open the folder of audio files Acme have sent through using the `os` module's `listdir()` function and notice they're in `.mp3` format. You've seen this before, but before continuing, you decide to write down a list of things to do to prepare for building the proof of concept.

```python
# Import os module
import os

# Check the folder of audio files
print(os.listdir("acme_audio_files"))

# Output: list of audio file names
# ['call_1.mp3', 'call_2.mp3', 'call_3.mp3', 'call_4.mp3']
```
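Rather than reading the listing by eye, you could tally the file extensions in the folder to confirm which formats you're dealing with. A minimal standard-library sketch; `count_extensions` is a hypothetical helper, not part of the course code.

```python
import os
from collections import Counter


def count_extensions(folder):
    """Count the regular files in `folder` by lower-cased extension."""
    counts = Counter()
    for name in os.listdir(folder):
        # Skip sub-directories; only regular files are audio candidates
        if os.path.isfile(os.path.join(folder, name)):
            # splitext keeps the leading dot: "call_1.mp3" -> ("call_1", ".mp3")
            counts[os.path.splitext(name)[1].lower()] += 1
    return counts
```

For the folder listed above, `count_extensions("acme_audio_files")` would report four `.mp3` files and nothing else.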

3\. Preparing for the proof of concept
--------------------------------------

01:07 - 01:37

The first step is to listen to a few of the files, using your media player or PyDub's `play()` function, to understand what you're working with. Then you'll transcribe one as soon as possible using `recognize_google()` so you have a baseline to work from. You convert the first file to `.wav` and transcribe it, but you know from previous work that doing this for every file is tedious.

```python
# Import speech_recognition as sr and PyDub's AudioSegment
import speech_recognition as sr
from pydub import AudioSegment

# Import call 1 and convert to .wav
call_1 = AudioSegment.from_file("acme_audio_files/call_1.mp3")
call_1.export("acme_audio_files/call_1.wav", format="wav")

# Transcribe call 1
recognizer = sr.Recognizer()
call_1_file = sr.AudioFile("acme_audio_files/call_1.wav")

with call_1_file as source:
    # Record the whole file from the opened source
    call_1_audio = recognizer.record(source)

# Send the recorded audio to the Google Web Speech API
print(recognizer.recognize_google(call_1_audio))
```

4\. Functions we'll create
--------------------------

01:37 - 01:54

You decide it's a good idea to create functions to help you with the rest of the proof of concept: one to convert files to `.wav` format, one to find the stats of an audio file using PyDub, and another to transcribe an audio file using `recognize_google()`.

### Convert non-.wav files to .wav format
`convert_to_wav()` converts non-.wav files to .wav files.

### Show audio file attributes
`show_pydub_stats()` displays the audio attributes of a .wav file.

### Transcribe audio
`transcribe_audio()` uses `recognize_google()` to transcribe a .wav file.

5\. Creating a file format conversion function
----------------------------------------------

01:54 - 02:24

The first one, `convert_to_wav()`, takes a file path and converts the file to `.wav` format. It first imports the file as an `AudioSegment`, then creates a new filename by splitting the original filename on `"."` and adding the `.wav` extension. Finally, it uses the `export()` method to export the audio in `.wav` format under the new filename, similar to what you did in a previous lesson.

```python
from pydub import AudioSegment


def convert_to_wav(filename):
    """Takes an audio file of non .wav format and converts to .wav"""
    
    # Import audio file (non-wav formats require ffmpeg)
    audio = AudioSegment.from_file(filename)
    
    # Create new filename
    new_filename = filename.split(".")[0] + ".wav"
    
    # Export file as .wav
    audio.export(new_filename, format="wav")
    
    print(f"Converting {filename} to {new_filename}...")
```

This Python function `convert_to_wav()` takes an audio file of a non-.wav format and converts it to a .wav file. Here's how it works:

1. The `AudioSegment.from_file()` function is used to import the audio file.
2. A new filename is created by splitting the original filename on "." and appending ".wav" to the part before the first dot.
3. The `audio.export()` method is used to export the audio to the new .wav file format.
4. A print statement is included to show the conversion progress.

This function can be called with the filename of the audio file you want to convert to .wav format.
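One caveat: `filename.split(".")[0]` keeps only the text before the *first* dot, so a relative path such as `./acme_audio_files/call_1.mp3` would become just `.wav`. A sketch of a safer way to build the new name with the standard library's `os.path.splitext()`, which strips only the final extension; `make_wav_filename` is a hypothetical helper name.

```python
import os


def make_wav_filename(filename):
    """Replace only the last extension of `filename` with .wav."""
    root, _ext = os.path.splitext(filename)
    return root + ".wav"


print(make_wav_filename("acme_audio_files/call_1.mp3"))    # acme_audio_files/call_1.wav
print(make_wav_filename("./acme_audio_files/call_1.mp3"))  # ./acme_audio_files/call_1.wav

# The split(".") approach loses everything after the leading "./"
print("./acme_audio_files/call_1.mp3".split(".")[0] + ".wav")  # .wav
```

Inside `convert_to_wav()`, the `new_filename` line could call a helper like this instead.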

6\. Using the file format conversion function
---------------------------------------------

02:24 - 02:36

Great, now you can convert audio files without repeating yourself. Next, you'll make a function to find an audio file's attributes using PyDub.

```python
convert_to_wav("acme_audio_files/call_1.mp3")
# Converting acme_audio_files/call_1.mp3 to acme_audio_files/call_1.wav...
```

This code calls `convert_to_wav()` with the file path `"acme_audio_files/call_1.mp3"`. It converts the `call_1.mp3` audio file in the `acme_audio_files` directory to a `.wav` file.

After running this code, the converted .wav file will be available in the same directory as the original .mp3 file.


7\. Creating an attribute showing function
------------------------------------------

02:36 - 02:51

`show_pydub_stats()` takes the filename of an audio file and imports it as an `AudioSegment`. It then prints a number of attributes, such as the number of channels, sample width, frame rate and more.

```python
from pydub import AudioSegment


def show_pydub_stats(filename):
    """Prints different audio attributes of an audio file and returns its AudioSegment."""

    audio_segment = AudioSegment.from_file(filename)

    print(f"Channels: {audio_segment.channels}")
    print(f"Sample width: {audio_segment.sample_width}")
    print(f"Frame rate (sample rate): {audio_segment.frame_rate}")
    print(f"Frame width: {audio_segment.frame_width}")
    print(f"Length (ms): {len(audio_segment)}")
    print(f"Frame count: {audio_segment.frame_count()}")

    # Return the AudioSegment so it can be reused later (e.g. split into channels)
    return audio_segment
```

This Python function `show_pydub_stats()` takes an audio file path as input and prints various attributes of the audio file, including:

- Number of channels
- Sample width
- Frame rate (sample rate)
- Frame width
- Length in milliseconds
- Frame count

It creates an `AudioSegment` instance from the input file and then accesses and prints the relevant attributes of the audio file.
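For `.wav` files specifically, Python's built-in `wave` module exposes the same basic attributes without PyDub, which can serve as a cross-check. A minimal sketch, assuming uncompressed PCM `.wav` input (the only kind `wave` reads); `show_wave_stats` is a hypothetical name, and frame width and length are derived from the header values rather than read directly.

```python
import wave


def show_wave_stats(filename):
    """Print basic attributes of an uncompressed PCM .wav file and return them as a dict."""
    with wave.open(filename, "rb") as wav:
        channels = wav.getnchannels()
        sample_width = wav.getsampwidth()    # bytes per sample
        frame_rate = wav.getframerate()      # frames per second
        frame_count = wav.getnframes()       # total frames in the file

    stats = {
        "channels": channels,
        "sample_width": sample_width,
        "frame_rate": frame_rate,
        "frame_width": channels * sample_width,        # bytes per frame
        "length_ms": frame_count * 1000 / frame_rate,  # derived, not stored in the header
        "frame_count": frame_count,
    }
    for name, value in stats.items():
        print(f"{name}: {value}")
    return stats
```

For a PCM file such as `call_1.wav`, the channel count, sample width and frame rate reported here should agree with PyDub's.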

8\. Using the attribute showing function
----------------------------------------

02:51 - 03:07

This is especially useful for customer support calls, which may be recorded with different numbers of channels. If a file has two channels, you might be able to split them and transcribe each speaker separately.

```python
show_pydub_stats("acme_audio_files/call_1.wav")
```

This code calls the `show_pydub_stats()` function with the file path `"acme_audio_files/call_1.wav"` as the argument. It will print various audio attributes related to the `call_1.wav` audio file, including:

- Channels: 2
- Sample width: 2 
- Frame rate (sample rate): 32000
- Frame width: 4
- Length (ms): 54888
- Frame count: 1756416.0
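These values are consistent with each other: frame width is the number of channels times the sample width, and frame count is the length in seconds times the frame rate. Checking the arithmetic for `call_1.wav` with the numbers printed above:

```python
channels, sample_width, frame_rate, length_ms = 2, 2, 32000, 54888

frame_width = channels * sample_width          # 2 channels x 2 bytes = 4 bytes per frame
frame_count = length_ms * frame_rate / 1000    # 54.888 s x 32000 frames per second

print(frame_width)  # 4
print(frame_count)  # 1756416.0
```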

9\. Creating a transcribe function
----------------------------------

03:07 - 03:33

Finally, since you could be transcribing many audio files, you create a function to transcribe them. `transcribe_audio()` takes the file path of an audio file and creates a `speech_recognition` `Recognizer` instance. It transcribes the audio file using `recognize_google()`, as you've done in a previous lesson, and returns the transcribed text.

```python
import speech_recognition as sr


def transcribe_audio(filename):
    """Takes a .wav format audio file and transcribes it to text."""

    recognizer = sr.Recognizer()
    audio_file = sr.AudioFile(filename)

    with audio_file as source:
        # Record the whole file from the opened source
        audio_data = recognizer.record(source)

    # Send the audio to the Google Web Speech API and return the text
    return recognizer.recognize_google(audio_data)
```

This function `transcribe_audio()` takes a `.wav` format audio file and uses the `recognize_google()` method from the `speech_recognition` library to transcribe the audio into text. Here's how it works:

1. It creates a `Recognizer` instance to perform the speech recognition.
2. It loads the audio file using `sr.AudioFile()`.
3. It records the audio data from the file using `recognizer.record()`.
4. It then passes the audio data to `recognizer.recognize_google()` to transcribe the audio to text.
5. The transcribed text is returned as the output of the function.

To use this function, you can call it with the path to a `.wav` audio file:

```python
transcribed_text = transcribe_audio("acme_audio_files/call_1.wav")
print(transcribed_text)
```

This will print the transcribed text of the `call_1.wav` audio file.
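Since the point of this helper is to run it over many calls, a loop that collects a result per file and keeps going when one file fails can be useful. A minimal sketch; `transcribe_all` and `fake_transcribe` are hypothetical names, and the transcription function is passed in as a parameter (in practice `transcribe_audio`) so the sketch runs offline. Catching every `Exception` is a simplification: with `speech_recognition` you would more likely catch `sr.UnknownValueError` (speech could not be understood) and `sr.RequestError` (the API request failed).

```python
def transcribe_all(filenames, transcribe):
    """Transcribe each file with `transcribe`, recording failures instead of stopping."""
    transcripts, errors = {}, {}
    for filename in filenames:
        try:
            transcripts[filename] = transcribe(filename)
        except Exception as exc:  # deliberately broad in this sketch
            errors[filename] = repr(exc)
    return transcripts, errors


# Stand-in transcriber so the example runs without network access
def fake_transcribe(filename):
    if "bad" in filename:
        raise ValueError("could not understand audio")
    return f"transcript of {filename}"


texts, failed = transcribe_all(["call_1.wav", "bad_call.wav"], fake_transcribe)
print(texts)   # {'call_1.wav': 'transcript of call_1.wav'}
print(failed)  # {'bad_call.wav': "ValueError('could not understand audio')"}
```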

10\. Using the transcribe function
----------------------------------

03:33 - 03:51

Testing the function on one of the calls works as expected: it reads in an audio file and returns the transcribed text. Excellent. Setting up helper functions like this at the start of a project may seem time-consuming, but they'll save time in the long run.

```python
transcribe_audio("acme_audio_files/call_1.wav")
```

`"hello welcome to Acme studio support line my name is Daniel how can I best help you hey Daniel this is John I've recently bought a smart from you guys and I know that's not good to hear John let's let's get your cell number and then we can we can set up a way to fix it for you one number for 1757 varies how long do you reckon this is going to take about an hour now while John we're going to try our best hour I will we get the sealing member will start up this support case I'm just really really really I've been trying to contact 34 been put on hold more than an hour and a half so I'm not really happy I kind of wanna get this issue 6 is fossil"`

11\. Let's practice!
--------------------

03:51 - 04:05

With that said, it's time to build them! Once you've got these ready to go, you'll be able to use some of your natural language processing skills on the transcribed text.

Converting audio to the right format
====================================

Acme Studios have asked you to do a proof of concept to find out more about their audio files.

After exploring them briefly, you find there are a few calls, but they're in the wrong file format for transcription.

As you'll be interacting with many audio files, you decide to begin by creating some helper functions.

The first one, `convert_to_wav(filename)`, takes a file path and uses `PyDub` to convert it from a non-wav format to `.wav` format.

Once it's built, we'll use the function to convert [Acme's first call](https://assets.datacamp.com/production/repositories/4637/datasets/83ef1650407e911a0f52f491068e3082661db743/ex4_call_1_stereo_mp3.mp3), `call_1.mp3`, from `.mp3` format to `.wav`.

`PyDub`'s `AudioSegment` class has already been imported. Remember, to work with non-wav files, you'll need `ffmpeg` ([docs](https://www.ffmpeg.org/)).

Instructions
------------

-   Import the `filename` parameter using `AudioSegment`'s `from_file()`.
-   Set the export format to `"wav"`.
-   Pass the target audio file, `call_1.mp3`, to the function.

```python
from pydub import AudioSegment  # already imported in the exercise environment


# Create function to convert audio file to wav
def convert_to_wav(filename):
    """Takes an audio file of non .wav format and converts to .wav"""
    # Import audio file
    audio = AudioSegment.from_file(filename)

    # Create new filename
    new_filename = filename.split(".")[0] + ".wav"

    # Export file as .wav
    audio.export(new_filename, format="wav")
    print(f"Converting {filename} to {new_filename}...")


# Test the function
convert_to_wav("call_1.mp3")
```

Finding PyDub stats
===================

You decide it'll be helpful to be able to find the audio attributes of any given file easily. This will be especially helpful for finding out how many channels an audio file has or whether the frame rate is adequate for transcription.

In this exercise, we'll create `show_pydub_stats()` which takes a filename of an audio file as input. It then imports the audio as a `PyDub` `AudioSegment` instance and prints attributes such as number of channels, length and more.

It then returns the `AudioSegment` instance so it can be used later on.

We'll use our function on the [newly converted .wav file](https://assets.datacamp.com/production/repositories/4637/datasets/43c5aff8c419d07f8cef70fdf40e4657b78b70be/ex4_call_1_stereo_formatted.wav), `call_1.wav`.

`AudioSegment` has already been imported from `PyDub`.

Instructions
------------

-   Create an `AudioSegment` instance called `audio_segment` by importing the `filename` parameter.
-   Print the number of channels using the `channels` attribute.
-   Return the `audio_segment` variable.
-   Test the function on `"call_1.wav"`.

```python
def show_pydub_stats(filename):
    """Prints different audio attributes of an audio file and returns its AudioSegment."""
    # Create AudioSegment instance
    audio_segment = AudioSegment.from_file(filename)

    # Print audio attributes and return AudioSegment instance
    print(f"Channels: {audio_segment.channels}")
    print(f"Sample width: {audio_segment.sample_width}")
    print(f"Frame rate (sample rate): {audio_segment.frame_rate}")
    print(f"Frame width: {audio_segment.frame_width}")
    print(f"Length (ms): {len(audio_segment)}")
    return audio_segment


# Try the function
call_1_audio_segment = show_pydub_stats("call_1.wav")
# Output:
#     Channels: 2
#     Sample width: 2
#     Frame rate (sample rate): 32000
#     Frame width: 4
#     Length (ms): 54888
```

Transcribing audio with one line
================================

Alright, now that you've got functions to convert audio files and find their attributes, it's time to build one to transcribe them.

In this exercise, you'll build `transcribe_audio()`, which takes a `filename` as input, imports the `filename` using `speech_recognition`'s `AudioFile` class and then transcribes it using `recognize_google()`.

You've seen these functions before but now we'll put them together so they're accessible in a function.

To test it out, we'll transcribe [Acme's first call](https://assets.datacamp.com/production/repositories/4637/datasets/43c5aff8c419d07f8cef70fdf40e4657b78b70be/ex4_call_1_stereo_formatted.wav), `"call_1.wav"`.

`speech_recognition` has been imported as `sr`.

Instructions
------------

-   Define a function called `transcribe_audio` which takes `filename` as an input parameter.
-   Set up a `Recognizer()` instance as `recognizer`.
-   Use `recognize_google()` to transcribe the audio data.
-   Pass the target call to the function.

```python
import speech_recognition as sr  # already imported as sr in the exercise environment


def transcribe_audio(filename):
    """Takes a .wav format audio file and transcribes it to text."""
    # Set up a recognizer instance
    recognizer = sr.Recognizer()

    # Import the audio file and convert to audio data
    audio_file = sr.AudioFile(filename)
    with audio_file as source:
        audio_data = recognizer.record(source)

    # Return the transcribed text
    return recognizer.recognize_google(audio_data)


# Test the function
print(transcribe_audio("call_1.wav"))
# Output:
#     hello welcome to Acme studio support line my name is Daniel how can 
# I best help you hey Daniel this is John I've recently bought a smart from 
# you guys 3 weeks ago and I'm already having issues with it I know that's not 
# good to hear John let's let's get your cell number and then we can we can set up 
# a way to fix it for you one number for 17 varies how long do you reckon this is going 
# to try our best to get the steel number will start up this support case I'm just really 
# really really really I've been trying to contact past three 4 days now and I've been put 
# on hold more than an hour and a half so I'm not really happy I kind of wanna get this issue 6 is f***** possible
```


Using the helper functions you've built
=======================================

Okay, now that we've got some helper functions ready to go, it's time to put them to use!

You'll first use `convert_to_wav()` to convert Acme's `call_1.mp3` ([file](https://assets.datacamp.com/production/repositories/4637/datasets/56f523fb855eaecc14a87c5619ec5e6e7c4490bc/ex4_call_1_stereo_formatted_mp3.mp3)) to `.wav` format and save it as `call_1.wav`.

Using `show_pydub_stats()`, you find `call_1.wav` has 2 channels, so you decide to split them using `PyDub`'s `split_to_mono()`. Acme tells you the [customer channel](https://assets.datacamp.com/production/repositories/4637/datasets/03ace2e9b866aaa554c465d6698500aaf48599dc/ex4_call_1_channel_2_split.wav) is likely channel 2, so you export channel 2 using `PyDub`'s `.export()`.

Finally, you'll use `transcribe_audio()` to transcribe channel 2 only.

Instructions 1/3
----------------

-   Convert the `.mp3` version of `call_1` to `.wav` and then check the stats of the `.wav` version.

```python
# Convert mp3 file to wav
convert_to_wav("call_1.mp3")

# Check the stats of new file
call_1 = show_pydub_stats("call_1.wav")
```

Instructions 2/3
----------------

-   Split `call_1` to mono and then export the second channel in `.wav` format.

```python
# Convert mp3 file to wav
convert_to_wav("call_1.mp3")

# Check the stats of new file
call_1 = show_pydub_stats("call_1.wav")

# Split call_1 to mono
call_1_split = call_1.split_to_mono()

# Export channel 2 (the customer channel)
call_1_split[1].export("call_1_channel_2.wav",
                       format="wav")
```
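Conceptually, splitting to mono de-interleaves the frames: in a stereo PCM stream, each frame holds one sample for channel 1 followed by one sample for channel 2. A standard-library sketch of that idea, assuming 16-bit (sample width 2) signed little-endian PCM, which matches `call_1.wav`'s reported sample width; it illustrates the concept and is not PyDub's implementation. `split_stereo_16bit` is a hypothetical name.

```python
import sys
from array import array


def split_stereo_16bit(raw_frames):
    """Split interleaved 16-bit little-endian stereo PCM bytes into two mono byte strings."""
    samples = array("h")          # signed 16-bit integers
    samples.frombytes(raw_frames)
    if sys.byteorder == "big":    # WAV sample data is little-endian
        samples.byteswap()

    # Even indices belong to channel 1, odd indices to channel 2
    left, right = samples[0::2], samples[1::2]

    if sys.byteorder == "big":    # convert back to little-endian for output
        left.byteswap()
        right.byteswap()
    return left.tobytes(), right.tobytes()
```

With the `wave` module, `raw_frames` could come from `readframes()` on an opened stereo file; the second returned byte string corresponds to the audio in `call_1_split[1]`.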

Instructions 3/3
----------------

-   Transcribe the audio of call 1's channel 2.

```python
# Convert mp3 file to wav
convert_to_wav("call_1.mp3")

# Check the stats of new file
call_1 = show_pydub_stats("call_1.wav")

# Split call_1 to mono
call_1_split = call_1.split_to_mono()

# Export channel 2 (the customer channel)
call_1_split[1].export("call_1_channel_2.wav",
                       format="wav")

# Transcribe the single channel: transcribe_audio() expects a file path,
# so pass the exported .wav file rather than the AudioSegment
print(transcribe_audio("call_1_channel_2.wav"))
```