# Unit 2. A gentle introduction to audio applications

## Audio classification with a pipeline

These notes follow the Hugging Face Audio Course section [Audio classification with a pipeline](https://huggingface.co/learn/audio-course/chapter2/audio_classification_pipeline#audio-classification-with-a-pipeline).

```python
from datasets import load_dataset
from datasets import Audio

# MINDS-14 is loaded through a dataset script, so `datasets` asks for trust_remote_code=True
minds = load_dataset("PolyAI/minds14", name="en-AU", split="train", trust_remote_code=True)

# "Upsample" from MINDS-14's native 8 kHz to the 16 kHz the model expects
minds = minds.cast_column("audio", Audio(sampling_rate=16_000))
```
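Casting the `audio` column changes the sampling rate at which `datasets` decodes each clip. MINDS-14 is recorded at 8 kHz, so the cast doubles the number of samples while keeping the duration unchanged. The sketch below illustrates only that bookkeeping. It uses plain linear interpolation with NumPy (`linear_resample` is a hypothetical helper, not the band-limited resampler `datasets` actually uses) on a synthetic tone rather than MINDS-14 audio:

```python
import numpy as np


def linear_resample(signal: np.ndarray, orig_sr: int, target_sr: int) -> np.ndarray:
    """Toy resampler: linearly interpolate `signal` onto a new time grid.

    Illustration only; real audio pipelines use band-limited resampling.
    """
    duration = len(signal) / orig_sr               # seconds of audio
    n_target = int(round(duration * target_sr))    # samples needed at the new rate
    # Time stamps (in seconds) of the original and of the resampled samples
    t_orig = np.arange(len(signal)) / orig_sr
    t_target = np.arange(n_target) / target_sr
    return np.interp(t_target, t_orig, signal)


# One second of a 440 Hz tone at 8 kHz (synthetic data, not from MINDS-14)
sr_in, sr_out = 8_000, 16_000
tone = np.sin(2 * np.pi * 440 * np.arange(sr_in) / sr_in)
upsampled = linear_resample(tone, sr_in, sr_out)
print(len(tone), len(upsampled))  # 8000 16000
```

Every second sample of the upsampled signal falls exactly on an original sample, and the new samples in between are interpolated, which is why upsampling adds no new information about the sound.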


## Loading a pretrained intent classifier

[`anton-l/xtreme_s_xlsr_300m_minds14` model card](https://huggingface.co/anton-l/xtreme_s_xlsr_300m_minds14#xtreme_s_xlsr_300m_minds14):

> This model is a fine-tuned version of [`facebook/wav2vec2-xls-r-300m`](https://huggingface.co/facebook/wav2vec2-xls-r-300m#wav2vec2-xls-r-300m) on the GOOGLE/XTREME_S - MINDS14.ALL dataset (sic). ([`google/xtreme_s`](https://huggingface.co/datasets/google/xtreme_s#xtreme-s))

```python
from transformers import pipeline

# XLS-R (300M) checkpoint fine-tuned for MINDS-14 intent classification
classifier = pipeline(
    "audio-classification",
    model="anton-l/xtreme_s_xlsr_300m_minds14",
)
```

```text
Some weights of the model checkpoint at anton-l/xtreme_s_xlsr_300m_minds14 were not used when initializing Wav2Vec2ForSequenceClassification: ['wav2vec2.encoder.pos_conv_embed.conv.weight_g', 'wav2vec2.encoder.pos_conv_embed.conv.weight_v']
- This IS expected if you are initializing Wav2Vec2ForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing Wav2Vec2ForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of Wav2Vec2ForSequenceClassification were not initialized from the model checkpoint at anton-l/xtreme_s_xlsr_300m_minds14 and are newly initialized: ['wav2vec2.encoder.pos_conv_embed.conv.parametrizations.weight.original0', 'wav2vec2.encoder.pos
```

```python
# Inspect the first training example
example = minds[0]
example
```

```text
{'path': '/home/a_notable_alpaca/.cache/huggingface/datasets/downloads/extracted/6647af8fc432b0860cf489c57dcb5f6c7483fb05c037f9923015f60a77ad431b/en-AU~PAY_BILL/response_4.wav',
 'audio': {'path': '/home/a_notable_alpaca/.cache/huggingface/datasets/downloads/extracted/6647af8fc432b0860cf489c57dcb5f6c7483fb05c037f9923015f60a77ad431b/en-AU~PAY_BILL/response_4.wav',
  'array': array([2.36119668e-05, 1.92324660e-04, 2.19284790e-04, ...,
         9.40907281e-04, 1.16613181e-03, 7.20883254e-04]),
  'sampling_rate': 16000},
 'transcription': 'I would like to pay my electricity bill using my card can you please assist',
 'english_transcription': 'I would like to pay my electricity bill using my card can you please assist',
 'intent_class': 13,
 'lang_id': 2}
```
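The decoded `audio` entry is just a NumPy array plus its sampling rate, so a clip's duration is the number of samples divided by the sampling rate. A minimal sketch, where `clip_duration_seconds` is a hypothetical helper (not part of `datasets`) applied to a synthetic record shaped like the `audio` entry above:

```python
import numpy as np


def clip_duration_seconds(audio: dict) -> float:
    """Duration in seconds of a decoded audio dict with 'array' and 'sampling_rate' keys."""
    # samples / (samples per second) = seconds
    return len(audio["array"]) / audio["sampling_rate"]


# Synthetic stand-in for example["audio"]: 2.5 s of silence at 16 kHz
fake_audio = {"path": None, "array": np.zeros(40_000), "sampling_rate": 16_000}
print(clip_duration_seconds(fake_audio))  # 2.5
```

With the real data, `clip_duration_seconds(example["audio"])` would give the length of the first MINDS-14 recording after resampling.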

```python
# Pass the raw 16 kHz waveform (a NumPy array) to the pipeline
classifier(example["audio"]["array"])
```

```text
[{'score': 0.9625311493873596, 'label': 'pay_bill'},
 {'score': 0.028672724962234497, 'label': 'freeze'},
 {'score': 0.003349797800183296, 'label': 'card_issues'},
 {'score': 0.0020058038644492626, 'label': 'abroad'},
 {'score': 0.000848432828206569, 'label': 'high_value_payment'}]
```
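The scores behave like class probabilities: the model emits one logit per intent, and the pipeline converts them into probabilities and returns the five most likely labels, sorted by score. The NumPy sketch below approximates that post-processing step. It assumes softmax scoring, and both the six-label `ID2LABEL` mapping and the logits are made up for illustration (the real checkpoint has one logit per MINDS-14 intent):

```python
import numpy as np

# Hypothetical six-label subset; the real checkpoint covers every MINDS-14 intent
ID2LABEL = {0: "abroad", 1: "balance", 2: "card_issues",
            3: "freeze", 4: "high_value_payment", 5: "pay_bill"}


def top_k_predictions(logits: np.ndarray, id2label: dict, top_k: int = 5) -> list:
    """Softmax the logits, then return the top_k labels sorted by descending score."""
    # Subtract the max logit for numerical stability before exponentiating
    exp = np.exp(logits - logits.max())
    probs = exp / exp.sum()
    # Indices of the highest probabilities, largest first
    best = np.argsort(probs)[::-1][:top_k]
    return [{"score": float(probs[i]), "label": id2label[int(i)]} for i in best]


# Made-up logits, for illustration only
logits = np.array([0.5, -1.0, 1.2, 2.0, -0.3, 5.5])
for pred in top_k_predictions(logits, ID2LABEL):
    print(pred)
```

Because the scores come from a softmax over all classes, the five returned scores sum to slightly less than 1. The remaining probability mass belongs to the labels that were cut off.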

```python
# Map the integer ground-truth label back to its intent name
id2label = minds.features["intent_class"].int2str
id2label(example["intent_class"])
```

```text
'pay_bill'
```
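For this example the top-scoring prediction matches the ground-truth intent. Repeating that check across many examples yields a top-1 accuracy. In the pure-Python sketch below, `top1_accuracy` is a hypothetical helper, and the predictions and references are made-up values in the same format as the pipeline output and the `id2label` result:

```python
def top1_accuracy(predictions: list, references: list) -> float:
    """Fraction of examples whose highest-scoring label equals the reference label.

    predictions: one pipeline-style list of {"score", "label"} dicts per example
    references:  the ground-truth label string for each example
    """
    if not references or len(predictions) != len(references):
        raise ValueError("need the same, non-zero number of predictions and references")
    hits = 0
    for preds, ref in zip(predictions, references):
        # Don't rely on list order: pick the entry with the highest score
        best = max(preds, key=lambda p: p["score"])
        hits += best["label"] == ref
    return hits / len(references)


# Made-up outputs for two examples (illustration only)
predictions = [
    [{"score": 0.96, "label": "pay_bill"}, {"score": 0.03, "label": "freeze"}],
    [{"score": 0.40, "label": "abroad"}, {"score": 0.55, "label": "balance"}],
]
references = ["pay_bill", "abroad"]
print(top1_accuracy(predictions, references))  # 0.5
```

With the real objects, each entry in `predictions` would be `classifier(x["audio"]["array"])` and each entry in `references` would be `id2label(x["intent_class"])` for an example `x` from `minds`.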

----

## References

* [Fine-tuning XLS-R for Multi-Lingual ASR with 🤗 Transformers](https://huggingface.co/blog/fine-tune-xlsr-wav2vec2) blog post explaining XLS-R and Wav2Vec2.