<a href="https://colab.research.google.com/github/TirendazAcademy/An-LLM-App-with-Chainlit/blob/main/3-Music-Genre-Classifier.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Installing

Let's install the transformers and datasets libraries.

In [None]:
!pip install -qU git+https://github.com/huggingface/transformers
!pip install -qU git+https://github.com/huggingface/datasets

In [None]:
import transformers, datasets
print("The transformers version:", transformers.__version__)
print("The datasets version:", datasets.__version__)

Next, let's load the dataset we'll use.

In [None]:
from datasets import load_dataset

minds = load_dataset("PolyAI/minds14", name="en-AU", split="train")

# Pre-trained models and datasets for audio classification

Then let's create our pipeline:

In [None]:
from transformers import pipeline

classifier = pipeline(
    "audio-classification",
    model="anton-l/xtreme_s_xlsr_300m_minds14",
)

Finally, we can pass a sample to the classification pipeline to make a prediction:

In [None]:
classifier(minds[0]["audio"])
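
The pipeline returns a list of dictionaries, each with a `label` and a `score`. As a minimal sketch of how to pick out the most likely class, the snippet below uses a hypothetical output list (the labels and scores are made up for illustration, not real predictions) and a small helper, `top_prediction`, which is not part of the transformers library:

```python
# Hypothetical pipeline output: these labels and scores are invented
# for illustration and are not real model predictions.
example_output = [
    {"score": 0.12, "label": "card_issues"},
    {"score": 0.81, "label": "pay_bill"},
    {"score": 0.07, "label": "balance"},
]


def top_prediction(predictions):
    """Return the entry with the highest score from a list of
    {"score": float, "label": str} dictionaries."""
    return max(predictions, key=lambda p: p["score"])


best = top_prediction(example_output)
print(f'{best["label"]}: {best["score"]:.2f}')
```

The same helper should work on the result of any of the classification pipelines in this notebook, since they share this output shape.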

## Speech Commands

First, let's load the dataset.

In [None]:
speech_commands = load_dataset(
    "speech_commands", "v0.02", split="validation", streaming=True
)

In [None]:
sample = next(iter(speech_commands))

In [None]:
sample

Let's load an official Audio Spectrogram Transformer checkpoint fine-tuned on the Speech Commands dataset:

In [None]:
classifier = pipeline(
    "audio-classification", model="MIT/ast-finetuned-speech-commands-v2"
)
classifier(sample["audio"].copy())

As you can see, the model predicts the label "backward". Let's listen to the sample to verify this prediction:

In [None]:
from IPython.display import Audio

Audio(sample["audio"]["array"], rate=sample["audio"]["sampling_rate"])

## Language Identification

Language identification (LID) is the task of identifying the language spoken in an audio sample from a list of candidate languages. Here we use FLEURS, a dataset that covers 102 languages.

In [None]:
fleurs = load_dataset("google/fleurs", "all", split="validation", streaming=True, trust_remote_code=True)
sample = next(iter(fleurs))

Next, let's create our pipeline.

In [None]:
classifier = pipeline(
    "audio-classification", model="sanchit-gandhi/whisper-medium-fleurs-lang-id"
)

We can then pass the audio through our classifier and generate a prediction:

In [None]:
classifier(sample["audio"])

## Zero-Shot Audio Classification

In [None]:
dataset = load_dataset("ashraq/esc50", split="train", streaming=True)
audio_sample = next(iter(dataset))["audio"]["array"]

In [None]:
candidate_labels = ["Sound of a dog", "Sound of vacuum cleaner"]

In [None]:
classifier = pipeline(
    task="zero-shot-audio-classification", model="laion/clap-htsat-unfused"
)
classifier(audio_sample, candidate_labels=candidate_labels)
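
Conceptually, a CLAP-style model embeds the audio clip and each candidate label in a shared space, and the scores come from a softmax over their similarities. The sketch below illustrates that idea in plain Python. The three-dimensional embeddings and the `scale` value are made-up assumptions, and `cosine` and `zero_shot_scores` are hypothetical helpers, not the pipeline's actual implementation:

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def zero_shot_scores(audio_emb, text_embs, scale=10.0):
    """Softmax over scaled cosine similarities between one audio
    embedding and several text (label) embeddings."""
    logits = [scale * cosine(audio_emb, t) for t in text_embs]
    max_logit = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - max_logit) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up embeddings, for illustration only.
audio_emb = [1.0, 0.2, 0.0]
text_embs = {
    "Sound of a dog": [0.9, 0.1, 0.0],
    "Sound of vacuum cleaner": [0.0, 0.1, 1.0],
}

scores = zero_shot_scores(audio_emb, list(text_embs.values()))
for label, score in zip(text_embs, scores):
    print(f"{label}: {score:.3f}")
```

Because the scores are normalized over the candidate labels only, they always sum to 1, so a high score means "most similar among these candidates" rather than an absolute confidence.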

In [None]:
Audio(audio_sample, rate=16000)
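
The `rate` passed to `Audio` determines playback speed: the same array played at a rate different from the one it was recorded at will sound sped up or slowed down. A minimal sketch of that relationship, using a hypothetical sample count and a hypothetical helper `playback_seconds`:

```python
def playback_seconds(num_samples, rate):
    """Duration in seconds of a clip with `num_samples` samples
    when played back at `rate` samples per second."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    return num_samples / rate


num_samples = 80_000  # hypothetical clip length, in samples

print(playback_seconds(num_samples, 16_000))  # 5.0
print(playback_seconds(num_samples, 8_000))   # 10.0
```

If a clip sounds unnaturally slow or fast, comparing `len(array) / rate` with the expected clip length is a quick way to check whether the rate matches the dataset's sampling rate.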