clbecker edited this page Jul 13, 2012 · 4 revisions


Accessing Pronunciation Data

In this section I'll cover how to access pronunciation metadata, including phonetic representations (IPA, SAMPA, EnPR/AHD) and audio files. The code for this example can be found in /bin/

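use Encode qw(encode);   # encode() is needed later when printing pronunciations
use Wiktionary::Parser;

my $parser = Wiktionary::Parser->new();
my $document = $parser->get_document(title => $word);   # $word holds the title of the page to look up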

Retrieve all pronunciation metadata using the get_pronunciations() method:

my $pron = $document->get_pronunciations();
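
Judging only from the keys read in the loop further down this page, the returned value appears to be a hash reference keyed by language code, with one entry per kind of metadata. The sketch below is a hand-built stand-in for that shape, not output from the parser: every value is invented sample data, plain strings take the place of the parser's pronunciation and audio objects, and summarize() is a hypothetical helper.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the hash returned by get_pronunciations().
# The key names come from the loop on this page; all values are invented
# sample data, and plain strings replace the parser's objects.
my $pron = {
    en => {
        language      => 'English',
        pronunciation => [ 'sample phonetic string' ],
        audio         => [ 'sample-audio.ogg' ],
        rhyme         => [ 'sample rhyme' ],
        homophone     => undef,
        hyphenation   => undef,
    },
    fr => {
        language => 'French',
    },
};

# Return one summary line per language, listing which kinds of data are present.
sub summarize {
    my ($data) = @_;
    my @lines;
    for my $code (sort keys %{ $data || {} }) {
        my $entry   = $data->{$code};
        my @present = grep { $entry->{$_} }
            qw(pronunciation audio rhyme homophone hyphenation);
        push @lines, sprintf(
            '%s (%s): %s',
            $entry->{language}, $code,
            @present ? join(', ', @present) : 'none',
        );
    }
    return @lines;
}

print "$_\n" for summarize($pron);
```

Checking each key before using it, as the loop on this page does, matters because a language entry may carry only some of these kinds of data.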

The code block below walks through this data structure. Wiktionary lumps a lot into its pronunciation sections, including rhyming words, homophones, and hyphenated word forms.

The first part of the example loops over the pronunciations and prints out the phonetic representations.

The second part shows how to download the pronunciation audio files to your local machine.

for my $language_code (sort keys %{ $pron || {} }) {

    my $pronunciation = $pron->{$language_code}{pronunciation};
    my $audio = $pron->{$language_code}{audio};
    my $rhyme = $pron->{$language_code}{rhyme};
    my $homophone = $pron->{$language_code}{homophone};
    my $hyphenation = $pron->{$language_code}{hyphenation};

    my $language_name = $pron->{$language_code}{language};

    print "\n Language: $language_name, Code: $language_code\n";

    if ($pronunciation) {
        print "\n\t Pronunciations: \n";
        for my $representation (@$pronunciation) {
            printf(
                "\t\t%s: %s\n",
                '?',   # placeholder: the notation-label argument (e.g. IPA) is missing from this page
                join(
                    ', ',
                    map { encode('utf8', $_) } @{ $representation->get_pronunciation() },
                ),
            );
        }
    }

    # If audio files are linked to these pronunciations,
    # the audio objects provide methods for downloading the .ogg files.
    if ($audio) {
        for my $aud (@$audio) {
            # The two printf arguments are missing from this page, so the call stays commented out:
            # printf( "\n\t Audio Available: %s, File: %s\n", ... );

            $aud->download_file(directory => '/tmp/');
        }
    }
}
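
The example writes every file straight into /tmp/. If you'd rather keep the audio for each word in its own folder, one option is to create that folder first and pass it as the directory argument. This is a minimal sketch using only core Perl modules. The audio_directory() helper and the <base>/<word> layout are illustrative assumptions, and since this page doesn't say whether download_file() creates a missing directory, the sketch creates it up front to be safe.

```perl
use strict;
use warnings;
use File::Path qw(make_path);
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical helper: build (and create if needed) a per-word directory under $base.
sub audio_directory {
    my ($base, $word) = @_;
    (my $safe = $word) =~ s/[^\w-]+/_/g;   # replace anything other than word characters and hyphens
    my $dir = File::Spec->catdir($base, $safe);
    make_path($dir) unless -d $dir;
    return $dir;
}

my $base = tempdir(CLEANUP => 1);          # throwaway base directory for this demo
my $dir  = audio_directory($base, 'ice cream');
print "Audio files would go to: $dir\n";
```

In the loop above you would then call $aud->download_file(directory => $dir) in place of the hard-coded '/tmp/'.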