Requirements:

pip install sounddevice soundfile numpy torch

In [7]:
import sounddevice as sd
import soundfile as sf
import numpy as np
import torch

In [9]:

# Set the recording parameters
samplerate = 44100  # Sample rate (samples per second)
duration = 10  # Duration (seconds)
channels = 2  # Stereo
filename = "recorded_audio.wav"  # Name of the file where audio will be saved

In [3]:
print("Recording...")
# Record audio
audio = sd.rec(int(samplerate * duration), samplerate=samplerate, channels=channels, dtype='float32')
sd.wait()  # Wait until recording is complete
print("Recording finished!")

Recording...
Recording finished!


In [None]:
# Save the audio
sf.write(filename, audio, samplerate)
print(f"Audio saved as {filename}")

In [None]:
# Play the audio
print("Playing the recorded audio...")
sd.play(audio, samplerate)
sd.wait()  # Wait until audio playback is done

print("Playback finished!")

### Save as a tensor

Once the audio data is in a PyTorch tensor (`audio_tensor`), you can process it with PyTorch directly. Here are some potential uses:

#### Deep Learning

- Audio Classification: If you have a trained model, you can use the tensor to make predictions. For instance, you could classify sounds (e.g., dog barking, car honking, music playing) or recognize spoken commands.
- Training Models: If you collect multiple audio samples, you can use them to train deep learning models for various tasks, such as speech recognition, sound classification, or emotion detection from voice.
- Feature Extraction: You can use pre-trained models like VGGish or other architectures to extract meaningful features from the audio tensor and then use those features for classification or clustering tasks.
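
To make the prediction step concrete, here is a minimal sketch of passing a recording through a classifier. `TinyAudioClassifier`, its layer sizes, and the three output classes are hypothetical and untrained, and random data stands in for the recorded `audio_tensor`. The point is only the shape handling: `sd.rec` returns `(frames, channels)`, while `Conv1d` layers expect `(batch, channels, frames)`.

```python
import torch
import torch.nn as nn


class TinyAudioClassifier(nn.Module):
    """Hypothetical, untrained model used only to illustrate tensor shapes."""

    def __init__(self, in_channels=2, num_classes=3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(in_channels, 8, kernel_size=9, stride=4),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),  # collapse the time axis to length 1
        )
        self.head = nn.Linear(8, num_classes)

    def forward(self, x):
        # x: (batch, channels, frames)
        return self.head(self.features(x).squeeze(-1))


# Stand-in for the recording: 1 second of stereo noise, shape (frames, channels)
audio_tensor = torch.randn(44100, 2)

# Rearrange to (batch, channels, frames) before calling the model
batch = audio_tensor.T.unsqueeze(0)

model = TinyAudioClassifier()
model.eval()
with torch.no_grad():
    logits = model(batch)
    probs = logits.softmax(dim=-1)  # one probability per (hypothetical) class

print(probs.shape)
```

With a real trained model, only the class definition and weight loading would change; the transpose and batch dimension stay the same.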
#### Transformations

- Spectrogram Generation: Convert the audio waveform to a spectrogram representation for visualization or to input into neural networks. This is often done in speech and audio processing before feeding the data into models.
- Augmentation: You can perform audio data augmentation, which is essential for training robust models. This includes operations like time stretching, pitch shifting, adding noise, etc.
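
The spectrogram and noise-augmentation ideas above can be sketched with core PyTorch alone. This is a minimal sketch, not a definitive pipeline: a synthetic 440 Hz tone stands in for the recorded `audio_tensor` so the cell runs without a microphone, and `n_fft`, `hop_length`, and the noise level are illustrative values chosen for this example.

```python
import math
import torch

samplerate = 44100

# Stand-in for the recording: 1 second of a 440 Hz tone, shape (frames, channels)
t = torch.arange(samplerate, dtype=torch.float32) / samplerate
audio_tensor = torch.sin(2 * math.pi * 440.0 * t).unsqueeze(1).repeat(1, 2)

# Mix the two channels down to mono; torch.stft expects the time axis last
mono = audio_tensor.mean(dim=1)

n_fft = 1024       # illustrative FFT size
hop_length = 256   # illustrative hop between frames
window = torch.hann_window(n_fft)

stft = torch.stft(mono, n_fft=n_fft, hop_length=hop_length,
                  window=window, return_complex=True)
spectrogram = stft.abs() ** 2                            # power spectrogram
spectrogram_db = 10 * torch.log10(spectrogram + 1e-10)   # decibel scale

# Simple augmentation: add low-level Gaussian noise to the waveform
noisy = audio_tensor + 0.005 * torch.randn_like(audio_tensor)

print(spectrogram.shape)  # (frequency bins, time frames)
```

The first axis of `spectrogram` has `n_fft // 2 + 1` frequency bins, and each bin spans `samplerate / n_fft` Hz.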
#### Analysis

- Basic Statistics: Compute mean, variance, and other statistical measures of the audio signal.
- Feature Extraction: Extract audio features like Mel-frequency cepstral coefficients (MFCCs), chroma features, spectral contrast, etc., which can be used in traditional machine learning algorithms or for audio analysis.
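
As a small illustration of the basic statistics, the sketch below computes per-channel values by reducing over the frame axis (`dim=0`). Random noise is used as a stand-in for the recorded `audio_tensor`, which is an assumption made so the cell runs on its own.

```python
import torch

torch.manual_seed(0)

# Stand-in for the recording: stereo noise with standard deviation 0.1,
# shape (frames, channels)
audio_tensor = 0.1 * torch.randn(44100, 2)

# Reduce over the frame axis so each result has one value per channel
mean = audio_tensor.mean(dim=0)                # DC offset
variance = audio_tensor.var(dim=0)             # signal variance
rms = audio_tensor.pow(2).mean(dim=0).sqrt()   # root-mean-square level
peak = audio_tensor.abs().amax(dim=0)          # largest absolute sample

print("mean:", mean)
print("variance:", variance)
print("rms:", rms)
print("peak:", peak)
```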
#### Manipulation

- Filtering: Apply various filters to the audio data, like low-pass, high-pass, or band-pass filters.
- Volume Adjustment: Normalize or adjust the amplitude of the audio.
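
Both manipulations can be sketched in a few lines. The helper names `peak_normalize` and `moving_average_lowpass` are hypothetical, and the moving average is only the crudest possible low-pass filter, chosen here for brevity rather than quality. Both assume the `(frames, channels)` layout that `sd.rec` produces.

```python
import torch
import torch.nn.functional as F


def peak_normalize(x, target_peak=0.9):
    """Scale x so that its largest absolute sample equals target_peak."""
    peak = x.abs().max()
    if peak == 0:
        return x.clone()  # silent input: nothing to scale
    return x * (target_peak / peak)


def moving_average_lowpass(x, kernel_size=5):
    """Crude low-pass filter: average each sample with its neighbours.

    x has shape (frames, channels); kernel_size should be odd so the
    output keeps the same length. The edges are zero-padded, so the first
    and last few samples are attenuated.
    """
    channels = x.shape[1]
    kernel = torch.full((channels, 1, kernel_size), 1.0 / kernel_size, dtype=x.dtype)
    x_t = x.T.unsqueeze(0)  # (1, channels, frames), as conv1d expects
    y = F.conv1d(x_t, kernel, padding=kernel_size // 2, groups=channels)
    return y.squeeze(0).T   # back to (frames, channels)


# Stand-in for the recording: quiet stereo noise
audio_tensor = 0.05 * torch.randn(44100, 2)

louder = peak_normalize(audio_tensor)
smoothed = moving_average_lowpass(audio_tensor, kernel_size=5)
print(louder.abs().max(), smoothed.shape)
```

Using `groups=channels` filters each channel independently instead of mixing left and right.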
#### Visualization

- Waveform Plotting: Visualize the waveform of the audio signal to see its amplitude variations over time.
- Frequency Analysis: Visualize the frequency components of the audio signal using tools such as the Fourier transform.
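
The frequency-analysis step can be sketched with `torch.fft`. A synthetic 440 Hz tone stands in for the recording (an assumption, so the result is easy to check by eye). The plotting itself is left out because matplotlib is not in the requirements above; `freqs` and `magnitude` are the x and y values you would pass to a plotting library.

```python
import math
import torch

samplerate = 44100

# Stand-in for the recording: 1 second of a 440 Hz tone, shape (frames, channels)
t = torch.arange(samplerate, dtype=torch.float32) / samplerate
audio_tensor = 0.5 * torch.sin(2 * math.pi * 440.0 * t).unsqueeze(1).repeat(1, 2)

# Mix down to mono, then take the FFT of the real-valued signal
mono = audio_tensor.mean(dim=1)
spectrum = torch.fft.rfft(mono)
magnitude = spectrum.abs()

# Frequency in Hz for each FFT bin
freqs = torch.fft.rfftfreq(mono.shape[0], d=1.0 / samplerate)

dominant_hz = freqs[magnitude.argmax()].item()
print(f"Dominant frequency: {dominant_hz:.1f} Hz")
```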
#### Integration with Other Libraries

You can easily convert the PyTorch tensor back to a NumPy array (using `.numpy()`) and then use the wide range of scientific libraries available in Python for more specialized audio processing tasks.
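
One detail worth knowing when moving between the two libraries is whether data is copied or shared. The sketch below uses a small zero-filled array as a stand-in for the recorded `audio`. `torch.tensor` copies, while `torch.from_numpy` and `.numpy()` share memory with the original array for CPU tensors.

```python
import numpy as np
import torch

# Stand-in for the recorded NumPy array, shape (frames, channels)
audio = np.zeros((4, 2), dtype=np.float32)

copied = torch.tensor(audio)      # independent copy of the data
shared = torch.from_numpy(audio)  # view onto the same memory

# Modify the NumPy array in place and see which tensor follows
audio[0, 0] = 1.0
print(copied[0, 0].item(), shared[0, 0].item())

# Going back: .numpy() on a CPU tensor also shares memory
back = shared.numpy()
print(np.shares_memory(back, audio))
```

For a tensor that requires gradients or lives on a GPU, call `.detach().cpu()` before `.numpy()`.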
#### Custom Operations

Because PyTorch is a deep learning library with automatic differentiation, you can define custom operations on the audio tensor and compute gradients through them, which can be useful for research or custom applications.

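
As a toy illustration of computing gradients through an audio operation, the sketch below treats a volume `gain` as a learnable scalar and asks autograd how it should change to reach a target loudness. The target RMS of 0.2 and the loss function are assumptions made up for this example, and random noise stands in for the recording.

```python
import torch

torch.manual_seed(0)

# Stand-in for the recording: stereo noise with an RMS level of about 0.1
audio_tensor = 0.1 * torch.randn(1000, 2)

# Learnable scalar gain; requires_grad=True tells autograd to track it
gain = torch.tensor(1.0, requires_grad=True)
target_rms = 0.2  # illustrative target loudness

# Custom operation: apply the gain, then measure the RMS level
rms = (gain * audio_tensor).pow(2).mean().sqrt()
loss = (rms - target_rms) ** 2

# Backpropagate to get d(loss)/d(gain)
loss.backward()
print("gradient w.r.t. gain:", gain.grad.item())

# One manual gradient-descent step (learning rate chosen arbitrarily)
with torch.no_grad():
    updated_gain = gain - 10.0 * gain.grad
print("updated gain:", updated_gain.item())
```

The gradient is negative because the signal is quieter than the target, so a descent step raises the gain.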

When working with audio data in deep learning, it is common to convert the raw waveform into another representation (spectrograms, MFCCs, etc.), because these can be more informative and often lead to better model performance. The torchaudio library, a companion to PyTorch, provides many tools and transformations for audio data. Note that torchaudio transforms expect the time axis last, so a recording shaped (frames, channels) needs to be transposed first. If you plan on doing a lot of audio work in PyTorch, torchaudio is worth a look.

In [4]:
# Convert the recorded NumPy array to a PyTorch tensor.
# torch.tensor copies the data; the shape stays (frames, channels).
audio_tensor = torch.tensor(audio)
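
To actually persist the tensor, one option is `torch.save` and `torch.load`. This is a minimal sketch: the file name `recorded_audio.pt` is hypothetical, a temporary directory is used so nothing is left on disk, and a zero-filled tensor stands in for the recorded `audio_tensor`.

```python
import os
import tempfile

import torch

# Stand-in for the recording, shape (frames, channels)
audio_tensor = torch.zeros(44100, 2)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "recorded_audio.pt")  # hypothetical file name
    torch.save(audio_tensor, path)   # serialize the tensor to disk
    loaded = torch.load(path)        # read it back

print(loaded.shape, loaded.dtype)

# Many PyTorch audio tools expect (channels, frames); transpose when needed
channels_first = loaded.T.contiguous()
print(channels_first.shape)
```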