1\. Introduction to audio data in Python
----------------------------------------

00:00 - 00:12

Hello and welcome to the course! My name is Daniel Bourke and I'll be your instructor. To get started, we're first going to see how speech and audio processing is different to other kinds of data processing.

2\. Dealing with audio files in Python
--------------------------------------

00:12 - 00:59

Much like other data types, audio files come in many different formats, such as mp3, wav, m4a and flac. But each of these formats has a standard measure of frequency. Frequency is measured in kilohertz (kHz) and, for digital audio, is also referred to as the sampling rate. Much like how a movie shows 30 pictures per second which our brains register as moving pictures, the sampling rate of an audio file is a measure of the number of data chunks per second used to represent a digital sound, with one kilohertz equaling one thousand pieces of information per second.

• Different kinds of audio files
  - mp3 
  - wav
  - m4a
  - flac

• Digital sounds measured in frequency (kHz)
  - 1 kHz = 1000 pieces of information per second
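
As a rough sketch of the arithmetic above, the number of pieces of information in a clip is the sampling rate multiplied by the duration. The function name `num_samples` is hypothetical and introduced only for illustration.

```python
def num_samples(sampling_rate_khz, duration_seconds):
    """Number of samples in a clip: samples per second times duration."""
    samples_per_second = sampling_rate_khz * 1000  # 1 kHz = 1000 samples per second
    return int(samples_per_second * duration_seconds)


print(num_samples(1, 1))   # 1 kHz for 1 second  -> 1000
print(num_samples(48, 2))  # 48 kHz for 2 seconds -> 96000
```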

3\. Frequency examples
----------------------

00:59 - 01:46

For example, a song you stream will usually have a 32 kHz sampling rate. This means 32,000 pieces of information per second. Speech and audio books are usually between 8 and 16 kHz. We'll look at some of these later. And as you might've guessed, audio files are different to tabular or text data because you can't immediately see the data you're working with. To get spoken language audio files into something we can see and manipulate, we first have to open the audio file with Python's built-in wave module. We can get started with the wave module by running the command import wave.

• Streaming songs have a frequency of 32 kHz

• Audiobooks and spoken language are between 8 and 16 kHz 

• We can't see audio files so we have to transform them first

```python
import wave
```

4\. Opening an audio file in Python
-----------------------------------

01:46 - 03:18

Now, we have an audio file, good-morning.wav, ready to go. It contains a person saying the words good morning. To import it, we'll use wave's open function. Now we've saved the good-morning.wav audio file to the variable good_morning in the form of a wave object. However, in this state it's not very useful to us. To manipulate it further, we'll use the readframes method to read the wave object's audio as bytes. The -1 means we want to read in all of the pieces of information within the wave object. Now we've converted the audio file to bytes, what do they look like? Okay, we can see a snippet of the entire sound wave in byte form. But remember how kilohertz means thousands of pieces of information per second? The good-morning.wav audio file is 48 kilohertz and 2 seconds long. 48,000 pieces of information per second for 2 seconds equals 96,000 chunks of data, all for only two words. So if we printed out the entire sound wave in byte form we'd see 96,000 of these combinations of letters and numbers. Don't worry if the output looks confusing for now, we'll learn how to convert these bytes into something more useful shortly.

• Audio file saved as `good-morning.wav`

```python
# Import audio file as wave object
good_morning = wave.open("good-morning.wav", "r")

# Convert wave object to bytes
good_morning_soundwave = good_morning.readframes(-1)

# View the wav file in byte form
good_morning_soundwave
```

```
b'\xfd\xff\xfb\xff\xf8\xff\xf8\xff\xf7...'
```
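
The same steps can be tried without any audio file on disk. The sketch below is an illustration under stated assumptions, not the course's own code: it writes a short silent clip to an in-memory buffer (mono, 16-bit, 8 kHz, half a second, all arbitrary choices) and reads it back with `wave.open()` and `readframes(-1)`.

```python
import io
import wave

# Build a tiny silent WAV in memory so the example needs no file on disk.
# Mono, 16-bit, 8 kHz and 4000 frames are illustrative assumptions.
framerate = 8000
n_frames = 4000
buffer = io.BytesIO()
with wave.open(buffer, "wb") as writer:
    writer.setnchannels(1)          # mono
    writer.setsampwidth(2)          # 2 bytes (16 bits) per sample
    writer.setframerate(framerate)
    writer.writeframes(b"\x00\x00" * n_frames)

# Read it back the same way good-morning.wav is read in the lesson
buffer.seek(0)
with wave.open(buffer, "rb") as reader:
    frames_as_bytes = reader.readframes(-1)
    bytes_per_frame = reader.getsampwidth() * reader.getnchannels()

print(type(frames_as_bytes))
print(len(frames_as_bytes), "bytes =", n_frames, "frames x", bytes_per_frame, "bytes per frame")
```

In this sketch each frame takes two bytes, so the bytes object is twice as long as the number of frames.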

5\. Working with audio is different
-----------------------------------

03:18 - 03:58

Now you can start to see how working with audio and spoken language files is different to other kinds of data. First of all, unlike text or tabular data, you can't immediately see what you're working with. So audio files often require a conversion step before you can begin working with them. And because of the frequency measure, even a few seconds of audio can contain large amounts of data. Add in background noise, other sounds and more speakers, and the number of pieces of information grows even more. We'll look into this later on.

• Have to convert the audio to something useful

• Small sample of audio = large amount of information

6\. Let's practice!
-------------------

03:58 - 04:06

Alright, it's time to get hands on and practice importing your first audio file!

The right frequency
===================

Movies play multiple pictures per second in succession to give the illusion of moving pictures. Sound is similar but usually at a much larger rate. What's the standard unit of measure for sound frequency?

##### Answer the question

#### Possible Answers

Select one answer

-   FPS (frames per second)

-   SPS (sound per second)

[/] -   Hz (Hertz)

-   WPS (waves per second)

Importing an audio file with Python
===================================

You've seen how there are different kinds of audio files and how streaming music and spoken language have different sampling rates. But now we want to start working with these files.

To begin, we're going to import the `good_morning.wav` audio file using Python's in-built `wave` library. Then we'll see what it looks like in byte form using the built-in `readframes()` method.

You can listen to `good_morning.wav` [here](https://assets.datacamp.com/production/repositories/4637/datasets/d30b8e2319792fb3e9d7ce1e469b15ecf3f75227/good-morning.wav).

Remember, `good_morning.wav` is only a few seconds long but at 48 kHz, that means it contains 48,000 pieces of information per second.

Instructions
------------

-   Import the Python `wave` library.
-   Read in the `good_morning.wav` audio file and save it to `good_morning`.
-   Create `signal_gm` by reading all the frames from `good_morning` using `readframes()`.
-   See what the first 10 frames of audio look like by slicing `signal_gm`.

```python
import wave

# Create audio file wave object
good_morning = wave.open('good_morning.wav', 'r')

# Read all frames from wave object
signal_gm = good_morning.readframes(-1)

# View first 10
print(signal_gm[:10])
```

1\. Converting sound wave bytes to integers
-------------------------------------------

00:00 - 00:12

Excellent effort, you've imported the good morning audio file and seen what it looks like in byte form. Now let's see if we can make those bytes even more useful.

2\. Converting bytes to integers
--------------------------------

00:12 - 01:51

To make our audio data more useful, we're going to convert it from byte form to integers. To do this, we'll use NumPy. NumPy is a numerical Python library full of helpful functions. First, we'll import it with the common alias np to avoid typing numpy every time. Then, the NumPy function we'll use to convert our bytes to integers is frombuffer. frombuffer turns a series of data into a 1-dimensional array of a specified data type. Remember, in the exercise we saved the good morning audio file bytes to the variable signal_gm. Since this is a series of bytes, we can pass it to frombuffer as the first parameter. And then we can set the dtype parameter to the data type we'd like to get back. There are multiple data types we could pass in but for our case, int16 is what we're after. So if we wanted to see the first values of our sound wave in integer form, what do they look like? Much better. But, we're only looking at the first 10. Can you guess how long the whole array is? Remember how frequency is a measure of information per second? Our good morning sound wave has a frequency of 48 kilohertz, a length of 2 seconds and thus 96,000 pieces of information. So, this array only shows the first 10 of those 96,000.

• Can't use bytes

• Convert bytes to integers using numpy

```python
import numpy as np
# Convert signal_gm from bytes to integers
soundwave_gm = np.frombuffer(signal_gm, dtype='int16')
# Show the first 10 items
soundwave_gm[:10]
```

```
array([ -3,  -5,  -8,  -8,  -9, -13,  -8, -10,  -9, -11], dtype=int16)
```
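
To see how the bytes map onto these integers, the following minimal sketch decodes the first eight bytes from the byte output shown earlier in the lesson. It assumes 16-bit little-endian samples, which is written explicitly as the dtype `'<i2'`. The variable names are illustrative.

```python
import numpy as np

# The first eight bytes of the good morning sound wave, copied from the byte output
first_bytes = b"\xfd\xff\xfb\xff\xf8\xff\xf8\xff"

# '<i2' means little-endian 2-byte signed integers (int16 on most machines)
first_values = np.frombuffer(first_bytes, dtype="<i2")

print(len(first_bytes), "bytes ->", len(first_values), "integers")
print(first_values.tolist())  # [-3, -5, -8, -8]
```

Every pair of bytes becomes one integer, and the result matches the start of the integer array above.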

3\. Finding the frame rate
--------------------------

01:51 - 02:49

Okay, we know our good morning sound wave has a frequency of 48 kilohertz. But what if we didn't? To find it, we could divide the length of the sound wave array by the duration of the sound wave in seconds. But Python's wave module has a programmatic way. Calling getframerate on a wave object will return its frame rate. Let's use it on our good morning wave object. Excellent, the method returns the number we were expecting, 48,000, or 48 kilohertz. We can use this frame rate variable for one more thing which will be handy for visualizing our sound waves later. By dividing the number of items in the sound wave array by the frame rate, we can get the duration of our audio file.

• Frequency (Hz) = length of wave object array/duration of audio file (seconds)

```python
# Get the frame rate
framerate_gm = good_morning.getframerate()
# Show the frame rate
framerate_gm
```

```
48000
```

• Duration of audio file (seconds) = length of wave object array/frequency (Hz)
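
The duration formula can be sketched with the `wave` module's own `getnframes()` and `getframerate()` methods. The clip below is a synthetic stand-in (silent, mono, 16-bit, 48 kHz, 96,000 frames) built in memory, and the helper name `wav_duration_seconds` is hypothetical.

```python
import io
import wave


def wav_duration_seconds(wave_reader):
    """Duration in seconds = number of frames / frames per second."""
    return wave_reader.getnframes() / wave_reader.getframerate()


# Synthetic stand-in for an audio file: 96,000 silent frames at 48 kHz
buffer = io.BytesIO()
with wave.open(buffer, "wb") as writer:
    writer.setnchannels(1)
    writer.setsampwidth(2)
    writer.setframerate(48000)
    writer.writeframes(b"\x00\x00" * 96000)

buffer.seek(0)
with wave.open(buffer, "rb") as reader:
    framerate = reader.getframerate()
    duration = wav_duration_seconds(reader)

print(framerate)  # 48000
print(duration)   # 2.0
```

For a mono file the integer array has one value per frame, so its length divided by the frame rate gives the same duration. A stereo file would hold two values per frame.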

4\. Finding sound wave timestamps
---------------------------------

02:49 - 03:47

With this value, we can leverage NumPy's linspace function to figure out the timestamp where each sound wave value occurs. The linspace function takes start, stop and num as parameters. Calling it will return num evenly spaced values between start and stop. Let's try it with start as 1, stop as 10 and num as 10. As you can see, it returns an array of evenly spaced numbers between 1 and 10. Let's try it on our own values to get the timestamps of pieces of information in our sound wave. Start will be 0 for the beginning of the audio file. Stop will be the length of our sound wave array over the frame rate, or in other words, the duration. And num will be the length of our sound wave array, since each item in the array is a sound wave value.

```python
# Return evenly spaced values between start and stop
np.linspace(start=1, stop=10, num=10)
```

```
array([ 1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10.])
```

```python
# Get the timestamps of the good morning sound wave
time_gm = np.linspace(start=0,
                      stop=len(soundwave_gm)/framerate_gm,
                      num=len(soundwave_gm))
```

5\. Finding sound wave timestamps
---------------------------------

03:47 - 04:05

Let's check out the first 10 timestamps. Each of these values is the time in seconds where each sound wave value occurred. We'll be able to use these timestamp values later to see what our sound wave looks like.

```python
# View first 10 time stamps of good morning sound wave
time_gm[:10]
```

```
array([0.00000000e+00, 2.08334167e-05, 4.16668333e-05, 6.25002500e-05,
      8.33336667e-05, 1.04167083e-04, 1.25000500e-04, 1.45833917e-04,
      1.66667333e-04, 1.87500750e-04])
```
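
The second timestamp above is slightly more than 1/48000 of a second because `linspace()` includes `stop` as the final value by default, so the spacing is the duration divided by `num - 1`. One possible alternative, which is not part of the course, is to divide each sample index by the frame rate. The sketch below compares the two on a ten-value stand-in array, with the frame rate assumed to be 48 kHz.

```python
import numpy as np

framerate = 48000  # assumed frame rate, matching the lesson
n_values = 10      # a short stand-in for len(soundwave_gm)

# Approach from the lesson: the last timestamp equals the duration
time_linspace = np.linspace(start=0, stop=n_values / framerate, num=n_values)

# Alternative: sample index divided by frame rate
time_index = np.arange(n_values) / framerate

# Passing endpoint=False to linspace gives the same spacing as the index approach
time_no_endpoint = np.linspace(0, n_values / framerate, num=n_values, endpoint=False)

print(time_linspace[1])     # a little more than 1/48000
print(time_index[1])        # 1/48000
print(np.allclose(time_index, time_no_endpoint))
```

For plotting a sound wave, the difference is too small to see.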

6\. Let's practice!
-------------------

04:05 - 04:11

Okay, it's your turn to make our bytes more useful!

The right data type
===================

`dtype` defaults to float in `np.frombuffer()`. What's the correct `dtype` to set it to for visualizing sound wave bytes?

You can try the different options by running `np.frombuffer(signal_gm, dtype=____)`.

Instructions
------------

### Possible answers

float

[/] 'int16'

np.uint8

string
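
As an illustration of why the choice matters, the sketch below reads the same eight bytes with three different `dtype` settings. The bytes are copied from the start of the good morning byte output. The variable names are illustrative.

```python
import numpy as np

raw = b"\xfd\xff\xfb\xff\xf8\xff\xf8\xff"  # 8 bytes of 16-bit audio

as_int16 = np.frombuffer(raw, dtype="int16")   # 2 bytes per value
as_uint8 = np.frombuffer(raw, dtype=np.uint8)  # 1 byte per value
as_float = np.frombuffer(raw)                  # default dtype: 8-byte floats

print(as_int16.shape, as_int16)
print(as_uint8.shape, as_uint8)
print(as_float.shape)
```

Only the 16-bit reading yields one value per audio sample. The other two split or merge the samples' bytes.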

Bytes to integers
=================

You've seen how to import and read an audio file using Python's `wave` module and the `readframes()` method. But doing that results in an array of bytes.

To convert the bytes into something more useful, we'll use NumPy's `frombuffer()` method.

Passing `frombuffer()` our sound wave bytes and indicating a `dtype` of `'int16'`, we can convert our bytes to integers. Integers are much easier to work with than bytes.

The Python `wave` library has already been imported along with the `good_morning.wav` [audio file](https://assets.datacamp.com/production/repositories/4637/datasets/d30b8e2319792fb3e9d7ce1e469b15ecf3f75227/good-morning.wav).

Instructions
------------

-   Import the `numpy` package with its common alias `np`.
-   Open and read the good morning audio file.
-   Convert the `signal_gm` bytes to `int16` integers.
-   View the first 10 sound wave values.

```python
import numpy as np

# Open good morning sound wave and read frames as bytes
good_morning = wave.open('good_morning.wav', 'r')
signal_gm = good_morning.readframes(-1)

# Convert good morning audio bytes to integers
soundwave_gm = np.frombuffer(signal_gm, dtype='int16')

# View the first 10 sound wave values
print(soundwave_gm[:10])
```

Finding the time stamps
=======================

We know the frequency of our sound wave is 48 kHz, but what if we didn't? We could find it by dividing the length of our sound wave array by the duration of our sound wave. However, Python's `wave` module has a better way. Calling `getframerate()` on a wave object returns the frame rate of that wave object.

We can then use NumPy's `linspace()` method to find the time stamp of each integer in our sound wave array. This will help us visualize our sound wave in the future.

The `linspace()` method takes `start`, `stop` and `num` parameters and returns `num` evenly spaced values between `start` and `stop`.

In our case, `start` will be zero, `stop` will be the length of our sound wave array over the frame rate (or the duration of our audio file) and `num` will be the length of our sound wave array.

Instructions
------------

-   Convert the sound wave bytes to integers.
-   Get the frame rate of the good morning audio file using `getframerate()`.
-   Set `stop` to be the length of `soundwave_gm` over the frame rate.
-   Set `num` to be the length of `soundwave_gm`.

```python
# Read in sound wave and convert from bytes to integers
good_morning = wave.open('good_morning.wav', 'r')
signal_gm = good_morning.readframes(-1)
soundwave_gm = np.frombuffer(signal_gm, dtype='int16')

# Get the sound wave frame rate
framerate_gm = good_morning.getframerate()

# Find the sound wave timestamps
time_gm = np.linspace(start=0,
                      stop=len(soundwave_gm)/framerate_gm,
                      num=len(soundwave_gm))

# Print the first 10 timestamps
print(time_gm[:10])
```

1\. Visualizing sound waves
---------------------------

00:00 - 00:14

It took a few conversion steps but now you've seen what it takes to transform an audio file into numbers. Because of your efforts, we'll now be able to visualize our good morning sound wave using the plotting library Matplotlib.

2\. Adding another sound wave
-----------------------------

00:14 - 01:22

To add to the visualization we're creating, we'll bring in another sound wave, good afternoon. This will highlight the difference between two similar sound waves and set up the intuition for the rest of the course. Both the good morning and good afternoon audio files are 48 kHz or 48,000 frames per second. You'll see in future lessons, having your audio files at the same frame rate and ensuring the same data transformations are made on each of them is important. This is because, if they're different, we've got the potential for data mismatches, which will prevent us from further processing. In the previous lesson, we used the frame rate to calculate the time stamps of where each piece of audio information appears. Behind the scenes, we've done the same calculations with the good afternoon audio file as the good morning file. Now we've got both sound wave arrays and timestamps ready, we can plot them.

• New audio file: good_afternoon.wav
• Both are 48 kHz
• Same data transformations to all audio files
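
One way to keep the transformations identical across files is to wrap them in a single function and call it once per file. The sketch below is an assumption about how that could look, not the course's behind-the-scenes code. `load_wav` and `make_test_wav` are hypothetical helpers, and the two tiny in-memory clips stand in for the good morning and good afternoon files.

```python
import io
import wave

import numpy as np


def load_wav(source):
    """Return (timestamps, integer values, frame rate) for a 16-bit mono WAV."""
    with wave.open(source, "rb") as reader:
        framerate = reader.getframerate()
        signal = reader.readframes(-1)
    # '<i2' is little-endian int16, the sample format assumed here
    soundwave = np.frombuffer(signal, dtype="<i2")
    time = np.linspace(start=0, stop=len(soundwave) / framerate, num=len(soundwave))
    return time, soundwave, framerate


def make_test_wav(values, framerate=48000):
    """Hypothetical helper: pack integer values into an in-memory 16-bit mono WAV."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as writer:
        writer.setnchannels(1)
        writer.setsampwidth(2)
        writer.setframerate(framerate)
        writer.writeframes(np.asarray(values, dtype="<i2").tobytes())
    buffer.seek(0)
    return buffer


time_a, soundwave_a, rate_a = load_wav(make_test_wav([0, 100, -100, 50]))
time_b, soundwave_b, rate_b = load_wav(make_test_wav([0, 200, -200, 25]))

# Same frame rate and same steps, so the two results can be compared directly
assert rate_a == rate_b
print(soundwave_a.tolist(), soundwave_b.tolist())
```

With real files, `load_wav()` would be called with each file's path instead of an in-memory buffer.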


3\. Setting up a plot
---------------------

01:22 - 02:49

To set up a plot, we first import the Matplotlib pyplot module under the common alias plt. We can then start creating a plot by calling title on plt and passing it a string. The string will be the title of our plot. Then we can add some labels for the x and y axes using the xlabel and ylabel methods. The x-axis will be the timestamps we've calculated, measured in seconds. And the y-axis is the amplitude, or how much the sound wave displaces air particles as it moves through the air. A value of 0 indicates no sound at all. Now we've got the plot labels set up, we can add both of the sound waves. We'll make sure each of them has a label so we can differentiate them using a legend. And to make sure we can highlight the differences, we'll add an alpha parameter of zero point five to the good morning values, so they appear slightly transparent on the plot. Since we've given our data labels, we can create a legend by calling the legend method on our plot. And now everything is set up, we can see our plot by calling the show method. Let's see it!

```python
import matplotlib.pyplot as plt
# Initialize figure and setup title
plt.title("Good Afternoon vs. Good Morning")
# x and y axis labels
plt.xlabel("Time (seconds)")
plt.ylabel("Amplitude")
# Add good morning and good afternoon values
plt.plot(time_ga, soundwave_ga, label="Good Afternoon")
plt.plot(time_gm, soundwave_gm, label="Good Morning", 
         alpha=0.5)
# Create a legend and show our plot
plt.legend()
plt.show()
```
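
The plotting code above relies on arrays prepared behind the scenes. To try the same layout without the audio files, here is a minimal sketch using two synthetic tones as stand-ins. The tone frequencies, amplitudes and labels are arbitrary assumptions, and the non-interactive `Agg` backend is chosen only so the sketch runs without a display.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, no window needed
import matplotlib.pyplot as plt
import numpy as np

framerate = 48000
time = np.arange(framerate) / framerate          # one second of timestamps
wave_a = 1000 * np.sin(2 * np.pi * 220 * time)   # stand-in for soundwave_ga
wave_b = 1000 * np.sin(2 * np.pi * 330 * time)   # stand-in for soundwave_gm

fig, ax = plt.subplots()
ax.set_title("Two synthetic sound waves")
ax.set_xlabel("Time (seconds)")
ax.set_ylabel("Amplitude")
ax.plot(time, wave_a, label="220 Hz tone")
ax.plot(time, wave_b, label="330 Hz tone", alpha=0.5)  # semi-transparent overlay
ax.legend()

# Render to an in-memory PNG instead of opening a window
image = io.BytesIO()
fig.savefig(image, format="png")
```

In an interactive session, `plt.show()` would display the figure instead.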

4\. Visualizing our sound waves
-------------------------------

02:49 - 03:13

Woah, that looks nice. In the beginning, you can see how the two sound waves are similar where the word 'good' would be but then they begin to differ as morning and afternoon get uttered. These differences are what we'll be working with throughout the rest of the course as we convert sound wave integers to words.

Figure: a line plot titled "Good Afternoon vs. Good Morning" comparing the two audio waveforms. The x-axis represents time in seconds (from 0 to 7 seconds) and the y-axis shows amplitude (ranging from -15000 to 10000). The waveforms overlap, with "good afternoon" shown in blue and "good morning" shown in orange with 50% transparency. Both phrases show characteristic speech patterns with varying amplitudes over time, with the most intense activity occurring between 1.5 and 3.5 seconds. A legend identifies each waveform.

5\. Time to visualize!
----------------------

03:13 - 03:25

Now you've seen what it takes to import a sound wave, transform it from bytes to integers and then plot it, it's your turn to visualize some sound waves!

Staying consistent
==================

Why is it important to ensure the same data transformations are performed on all of your audio files?

##### Answer the question

#### Possible Answers

Select one answer

-   So data can be processed faster.

-   Only performing transformations on one audio file is okay.

-   Audio files don't have to be transformed, if you can hear it, it's ready.

[/] -   To ensure data consistency and prevent potential data mismatches.

Processing audio data with Python
=================================

You've seen how sound waves can be turned into numbers but what does all that conversion look like?

And how about another similar sound wave? One slightly different?

In this exercise, we're going to use Matplotlib to plot the sound wave of `good_morning` against `good_afternoon`.

To have the `good_morning` and `good_afternoon` sound waves on the same plot and distinguishable from each other, we'll use Matplotlib's `alpha` parameter.

You can listen to the `good_morning` audio [here](https://assets.datacamp.com/production/repositories/4637/datasets/d30b8e2319792fb3e9d7ce1e469b15ecf3f75227/good-morning.wav) and `good_afternoon` audio [here](https://assets.datacamp.com/production/repositories/4637/datasets/16379ca3c3689f5f7cfb3de20585cb6da609294b/good-afternoon.wav).

Instructions
------------

-   Set the title to reflect the plot we are making.
-   Add the `good_afternoon` time variable (`time_ga`) and amplitude variable (`soundwave_ga`) to the plot.
-   Do the same with the `good_morning` time variable (`time_gm`) and amplitude variable (`soundwave_gm`).
-   Set the `alpha` parameter to `0.5`.

```python
# Setup the title and axis titles
plt.title('Good Afternoon vs. Good Morning')
plt.ylabel('Amplitude')
plt.xlabel('Time (seconds)')

# Add the Good Afternoon data to the plot
plt.plot(time_ga, soundwave_ga, label='Good Afternoon')

# Add the Good Morning data to the plot
plt.plot(time_gm, soundwave_gm, label='Good Morning',
         # Set the alpha variable to 0.5
         alpha=0.5)

plt.legend()
plt.show()
```