training a huggingface model to differentiate between spanish dialects

HSavich/dialect_discrimination

Dialect Classification

In many speech classification tasks, audio -> text -> text classification is a valid workflow that avoids much of the awkwardness of working with audio. In some cases, however, salient information is lost during transcription. One such case is when we want to investigate a particular manner of speaking, such as an accent.

In this project, I fine-tuned a wav2vec2 model to classify Spanish speakers by where they are from, based on their accents. The data contains 5 locales of Latin American Spanish, plus Basque, which lets us investigate how much easier language discrimination is than dialect discrimination.

The Data

The high-quality audio clips of Spanish were introduced in a 2020 paper available in the ACL Anthology (Guevara-Rukoz et al., LREC 2020; see References).

[Figure: Spanish dialects dataset info]

For each locale, the dataset contains a variety of speakers with different speaker profiles. This is preferable to datasets with one or very few speakers, because the model is more likely to learn identifiers of dialects rather than of individual voices or prosodies.

The data for Basque comes from a similar campaign (Kjartansson et al., SLTU 2020; see References), which focused on minority Western European languages rather than Latin American Spanish.

The Model

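For this task, I used wav2vec2 as the base model before fine-tuning. The model was introduced in wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations (Baevski et al., 2020; see References).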
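As a rough sketch of what "wav2vec2 as a base" can look like with the Hugging Face `transformers` library: the snippet below builds the label mappings and defines a loader that attaches a fresh classification head to a pretrained encoder. The checkpoint name `facebook/wav2vec2-base` and the placeholder class names are assumptions for illustration; this README does not state which checkpoint or label strings were actually used.

```python
# Hypothetical label names: the README only says "5 locales of Latin American
# Spanish, plus Basque", so these placeholders stand in for the real class list.
LABELS = ["locale_1", "locale_2", "locale_3", "locale_4", "locale_5", "basque"]


def build_label_maps(labels):
    """Return (label2id, id2label) dictionaries for a list of class names."""
    label2id = {name: idx for idx, name in enumerate(labels)}
    id2label = {idx: name for name, idx in label2id.items()}
    return label2id, id2label


def load_classifier(checkpoint="facebook/wav2vec2-base", labels=LABELS):
    """Load a pretrained wav2vec2 encoder with a new classification head.

    Needs the `transformers` package and network access. The checkpoint
    name is an assumption, not necessarily the one used in this project.
    """
    # Imported lazily so the label helpers above work without transformers.
    from transformers import AutoFeatureExtractor, AutoModelForAudioClassification

    label2id, id2label = build_label_maps(labels)
    feature_extractor = AutoFeatureExtractor.from_pretrained(checkpoint)
    model = AutoModelForAudioClassification.from_pretrained(
        checkpoint,
        num_labels=len(labels),
        label2id=label2id,
        id2label=id2label,
    )
    return feature_extractor, model


label2id, id2label = build_label_maps(LABELS)
```

Calling `load_classifier()` downloads the weights on first use; the classification head is randomly initialized, which is why fine-tuning is needed before the predictions mean anything.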

A quick rundown of the model architecture (image from original paper).

[Figure: wav2vec2 architecture]

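The zoomed-out view of this model is that a CNN feature encoder runs over the normalized raw waveform to extract latent features. A transformer network then learns a contextualized representation from those features. During pre-training, the features are also discretized by a quantization module, and the transformer is trained to pick out the correct quantized target for masked time steps. This discretized representation helps the model identify distinct speech units. For fine-tuning, the contextualized representations are what get passed on to the classification head.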
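To make the "CNN on a normalized waveform" step a little more concrete, here is a small pure-Python sketch of two things: zero-mean, unit-variance normalization of a clip, and how many feature frames the convolutional encoder produces for a given number of samples. The kernel widths and strides are the ones reported in the wav2vec 2.0 paper; the 16 kHz sample rate in the example is an assumption about the input audio, and the function names are made up for this illustration.

```python
import math

# Kernel widths and strides of the seven convolutional blocks in the
# wav2vec 2.0 feature encoder, as reported in the original paper.
CONV_KERNELS = (10, 3, 3, 3, 3, 2, 2)
CONV_STRIDES = (5, 2, 2, 2, 2, 2, 2)


def normalize_waveform(samples, eps=1e-7):
    """Scale a list of audio samples to zero mean and (almost) unit variance."""
    mean = sum(samples) / len(samples)
    variance = sum((x - mean) ** 2 for x in samples) / len(samples)
    scale = math.sqrt(variance + eps)  # eps guards against silent clips
    return [(x - mean) / scale for x in samples]


def num_feature_frames(num_samples):
    """Number of frames the CNN encoder emits for `num_samples` input samples."""
    length = num_samples
    for kernel, stride in zip(CONV_KERNELS, CONV_STRIDES):
        # Output length of an unpadded 1-D convolution.
        length = (length - kernel) // stride + 1
    return length


# One second of audio, assuming a 16 kHz sample rate.
frames_per_second = num_feature_frames(16_000)
```

With these settings one second of 16 kHz audio becomes 49 frames, i.e. roughly one frame every 20 ms, which is the sequence the transformer actually sees.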

Fine Tuning

The actual training process for this model was relatively lightweight, taking only about half an hour (and the model started to overfit within that time).

image
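Since the model starts to overfit this quickly, one common way to decide which checkpoint to keep is patience-based early stopping on the validation loss. The sketch below is an illustration of that idea only; it is not taken from the training script for this project, and the loss values in the example are invented.

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return (best_epoch, stop_epoch) for a list of per-epoch validation losses.

    Training stops once `patience` epochs have passed without a new best
    loss. Epochs are 0-indexed. If the patience is never exhausted, the
    stop epoch is simply the last epoch.
    """
    best_epoch = 0
    best_loss = float("inf")
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss = loss
            best_epoch = epoch
        elif epoch - best_epoch >= patience:
            return best_epoch, epoch
    return best_epoch, len(val_losses) - 1


# Invented validation losses that improve, then drift upward (overfitting).
example_losses = [0.90, 0.45, 0.30, 0.33, 0.41, 0.52]
best, stopped = early_stopping_epoch(example_losses, patience=2)
```

Here the checkpoint from epoch 2 would be kept and training would stop at epoch 4.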

Results

image

Accuracy = 0.980

Analysis

Even with the train/test split made by speaker, the model achieves excellent accuracy on the test set. This is interesting because it indicates that accent classification, at least at this granularity, is an easier task than voice identification: the model could just as easily have met the training objective by memorizing individual voices, which would not have carried over to unseen speakers.
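Splitting on speakers means that no speaker contributes clips to both the training and test sets. A minimal sketch of such a split is below; the `speaker_id` field name, the 20% test fraction, and the function name are assumptions for illustration rather than details of this project's code.

```python
import random


def split_by_speaker(clips, test_fraction=0.2, seed=0):
    """Split clips into (train, test) so that no speaker appears in both.

    `clips` is a list of dicts that each carry a "speaker_id" key
    (a hypothetical field name for this sketch).
    """
    speakers = sorted({clip["speaker_id"] for clip in clips})
    rng = random.Random(seed)
    rng.shuffle(speakers)
    num_test = max(1, round(len(speakers) * test_fraction))
    test_speakers = set(speakers[:num_test])
    train = [c for c in clips if c["speaker_id"] not in test_speakers]
    test = [c for c in clips if c["speaker_id"] in test_speakers]
    return train, test


# Toy data: 10 made-up speakers with 3 clips each.
toy_clips = [
    {"speaker_id": f"spk{s}", "path": f"spk{s}_{i}.wav"}
    for s in range(10)
    for i in range(3)
]
train_clips, test_clips = split_by_speaker(toy_clips)
```

In practice the split would usually be done per dialect so that every class keeps some held-out speakers; this sketch skips that for brevity.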

The confusion matrix shows that Basque is the most easily distinguished class, which is to be expected since it is the only language that isn't Spanish. Puerto Rican Spanish was the hardest to identify in the test set, but I think this has more to do with Puerto Rico having the least data than with anything about the accent itself.
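For readers who want to reproduce this kind of analysis, the sketch below shows how a confusion matrix, overall accuracy, and per-class recall relate to each other. The labels and predictions are toy values invented for the example (including the placeholder class `other_locale`); they are not this model's results.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Rows are true classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for true_label, pred_label in zip(y_true, y_pred):
        matrix[index[true_label]][index[pred_label]] += 1
    return matrix


def accuracy(matrix):
    """Fraction of all examples that sit on the diagonal."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def per_class_recall(matrix, labels):
    """For each class, the fraction of its true examples predicted correctly."""
    recalls = {}
    for i, label in enumerate(labels):
        row_total = sum(matrix[i])
        recalls[label] = matrix[i][i] / row_total if row_total else 0.0
    return recalls


# Toy labels and predictions, invented purely for illustration.
labels = ["basque", "puerto_rico", "other_locale"]
y_true = ["basque", "basque", "puerto_rico", "puerto_rico", "other_locale", "other_locale"]
y_pred = ["basque", "basque", "puerto_rico", "other_locale", "other_locale", "other_locale"]

matrix = confusion_matrix(y_true, y_pred, labels)
recalls = per_class_recall(matrix, labels)
```

Per-class recall is the number to look at when asking which class is hardest to identify, since overall accuracy can hide a weak class that has few examples.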

I think that if a dataset of the same size were used for the same experiment, but with more speakers (and so less fitting to individual voices), we could expect near-perfect accuracy.

References

Crowdsourcing Latin American Spanish for Low-Resource Text-to-Speech (Guevara-Rukoz et al., LREC 2020)

Open-Source High Quality Speech Datasets for Basque, Catalan and Galician (Kjartansson et al., SLTU 2020)

wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations (Baevski et al., Facebook AI 2020)

Huggingface Audio Classification Tutorial

Further Reading

Towards Data Science Blog about wav2vec2.0 This blog post offers some easy reading for understanding the wav2vec2.0 model, as well as how it relates to and improves upon the models before it.

Classifying Accents from Spectograms This blog post describes a computer-vision approach to classifying accents by using spectrograms rather than raw waveforms.

What's a Language Anyways? Outside the machine-learning realm, this article from The Atlantic discusses the nuances of classifying languages, dialects, accents, etc. These nuances bear on what's achievable with a dialect identifier like the one I trained. Sidenote: the author, John McWhorter, has a great podcast about linguistics!

This Model on Huggingface Hub

https://huggingface.co/hhsavich/accent_determinator
