# Whisper Model Exploration

This notebook explores the capabilities and performance of the Whisper model on speech-to-text tasks. We load the model, process an audio input, transcribe it, and evaluate the resulting transcription against a reference text.

In [1]:
import torch
from src.models.whisper.model import WhisperModel
from src.utils.audio import load_audio
from src.utils.text import process_text
from src.config.hyperparameters import Hyperparameters

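# Check if a GPU is available and set the device accordingly
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Load the Whisper model: load the weights first, then move the model to the device
whisper_model = WhisperModel()
whisper_model.load_model()  # Assuming this method loads the pretrained weights
whisper_model = whisper_model.to(device)
whisper_model.eval()  # Assumes WhisperModel is a torch.nn.Module; disables dropout for inference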

## Audio Processing

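We will load an audio file and preprocess it into the input format the Whisper model expects. The original Whisper models operate on 16 kHz mono audio, which is converted into a log-Mel spectrogram over 30-second windows, so it is worth checking that `load_audio` returns audio at that sample rate.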

In [2]:
audio_file_path = 'path/to/audio/file.wav'  # Replace with your audio file path
audio_data = load_audio(audio_file_path)
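processed_audio = whisper_model.preprocess_audio(audio_data)  # Assuming this method exists
# If preprocess_audio returns a tensor, move it to the same device as the model
if isinstance(processed_audio, torch.Tensor):
    processed_audio = processed_audio.to(device)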
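
If `preprocess_audio` is not available in your version of `WhisperModel`, the sketch below shows one step the original OpenAI Whisper pipeline performs before computing its log-Mel spectrogram: fitting the waveform to a fixed 30-second window at 16 kHz. It is modeled on the `pad_or_trim` helper in the openai-whisper package, but `pad_or_trim`, `SAMPLE_RATE`, and `N_SAMPLES` here are illustrative names, not part of this project's `src` package. It assumes `load_audio` yields a 1-D mono waveform already resampled to 16 kHz.

```python
import numpy as np

# Assumed constants, matching the original OpenAI Whisper models:
# 16 kHz mono input, processed in fixed 30-second windows.
SAMPLE_RATE = 16_000
CHUNK_SECONDS = 30
N_SAMPLES = SAMPLE_RATE * CHUNK_SECONDS  # 480,000 samples per window


def pad_or_trim(audio: np.ndarray, length: int = N_SAMPLES) -> np.ndarray:
    """Zero-pad or truncate a 1-D waveform to exactly `length` samples."""
    if audio.ndim != 1:
        raise ValueError("expected a 1-D mono waveform")
    if audio.shape[0] >= length:
        # Longer than one window: keep only the first `length` samples.
        return audio[:length]
    # Shorter than one window: append zeros (silence) at the end.
    return np.pad(audio, (0, length - audio.shape[0]))


# Example: 5 seconds of a 440 Hz tone becomes one full 30-second window.
t = np.arange(5 * SAMPLE_RATE) / SAMPLE_RATE
tone = np.sin(2 * np.pi * 440 * t).astype(np.float32)
window = pad_or_trim(tone)
print(window.shape)  # (480000,)
```

Padding with zeros keeps the input length fixed, which is what a model trained on 30-second windows expects; the trailing silence carries no speech.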

## Speech-to-Text Conversion

Now we will use the Whisper model to convert the processed audio into text.

In [3]:
# Run inference without tracking gradients to save memory and time
with torch.no_grad():
    transcription = whisper_model.transcribe(processed_audio)
print('Transcription:', transcription)
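
Because Whisper works on 30-second windows, recordings longer than that have to be split before (or during) transcription. Whether `whisper_model.transcribe` already handles this is not stated here; if it does not, a minimal sketch is to cut the waveform into consecutive windows and transcribe each one. `split_into_windows` is a hypothetical helper, and the constants assume 16 kHz mono input.

```python
import numpy as np

# Assumed input format: 16 kHz mono audio, 30-second windows.
SAMPLE_RATE = 16_000
N_SAMPLES = 30 * SAMPLE_RATE  # samples in one 30-second window


def split_into_windows(audio: np.ndarray, window: int = N_SAMPLES) -> list[np.ndarray]:
    """Split a 1-D waveform into consecutive fixed-size windows.

    The last window is zero-padded so every chunk has exactly `window` samples.
    """
    chunks = []
    for start in range(0, len(audio), window):
        chunk = audio[start:start + window]
        if len(chunk) < window:
            # Pad the final, shorter chunk with silence.
            chunk = np.pad(chunk, (0, window - len(chunk)))
        chunks.append(chunk)
    return chunks


# 70 seconds of audio -> three 30-second windows (the last one padded).
long_audio = np.random.default_rng(0).standard_normal(70 * SAMPLE_RATE).astype(np.float32)
windows = split_into_windows(long_audio)
print(len(windows))  # 3
```

Each window could then be passed through `preprocess_audio` and `transcribe`, and the partial transcriptions joined. A fixed split can cut a word in half at a window boundary; the reference openai-whisper `transcribe` instead advances through the audio using the timestamps it predicts, so treat this as a rough baseline.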

## Evaluation

We evaluate the transcription by comparing it with a reference (ground-truth) transcript, using the metrics defined in `src.evaluation.metrics`.

In [4]:
from src.evaluation.metrics import evaluate_transcription

reference_text = 'Expected transcription text.'  # Replace with the actual expected text
evaluation_results = evaluate_transcription(transcription, reference_text)
print('Evaluation Results:', evaluation_results)
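
The metrics returned by `evaluate_transcription` are project-specific and not shown here. As an independent cross-check, the sketch below computes word error rate (WER), the standard speech-recognition metric: the word-level edit distance (substitutions + deletions + insertions) between hypothesis and reference, divided by the number of reference words. The `normalize` step (lowercasing and dropping punctuation) is a simplified assumption; dedicated text normalizers are more thorough, so the number may not match `evaluation_results` exactly.

```python
import re


def normalize(text: str) -> list[str]:
    """Lowercase, replace punctuation with spaces, and split into words."""
    text = re.sub(r"[^\w\s']", " ", text.lower())
    return text.split()


def word_error_rate(hypothesis: str, reference: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    hyp, ref = normalize(hypothesis), normalize(reference)
    if not ref:
        raise ValueError("reference must contain at least one word")
    # At the start of iteration i, prev[j] is the edit distance between the
    # first i-1 reference words and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


# One deleted word ("the") out of six reference words -> WER of about 0.167.
print(word_error_rate("the cat sat on mat", "The cat sat on the mat."))
```

Keeping only two rows of the edit-distance table makes memory use linear in the hypothesis length, which is sufficient because only the final distance is needed.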

## Conclusion

In this notebook, we explored the Whisper model's capabilities for speech-to-text conversion. We loaded the model, processed audio, generated transcriptions, and evaluated the results. Further exploration can include hyperparameter tuning and summarization of the transcriptions.