Parts of speech

clbecker edited this page Jul 13, 2012 · 1 revision


Accessing Parts of Speech Data

In this section I'll go over the code for grabbing parts of speech for each language.

use Wiktionary::Parser;

my $parser = Wiktionary::Parser->new();
my $document = $parser->get_document(title => 'dog');

You can retrieve all part-of-speech data by calling the get_parts_of_speech() method on the Document object. It returns a hash reference keyed by language code. Each value is a hash holding the language name (language) and a list of the parts of speech found for that language (part_of_speech), such as noun, verb, or adjective.

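# Hash reference keyed by language code
my $pos = $document->get_parts_of_speech();

for my $language_code (keys %{$pos || {}}) {

    # Fall back to an empty list if a language has no parts of speech
    my @parts_of_speech = @{ $pos->{$language_code}{part_of_speech} || [] };

    # Print the language name, then one part of speech per line
    print "\n$pos->{$language_code}{language}\n";

    print "\t $_ \n" for @parts_of_speech;

}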
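
Once you have the hash, you can reshape it with ordinary Perl. Here is a minimal sketch that inverts the structure so each part of speech maps to the languages that have it. To keep it runnable without fetching a page, it uses a hand-written sample hash in the shape the loop above reads. The sample entries and the helper name languages_by_part_of_speech are assumptions made for illustration, not parser output or part of Wiktionary::Parser.

```perl
use strict;
use warnings;

# Hypothetical sample data in the shape the loop above reads:
# language code => { language => name, part_of_speech => [ ... ] }.
# These entries are made up for illustration, not parser output.
my $pos = {
    en => { language => 'English',    part_of_speech => [ 'noun', 'verb' ] },
    pt => { language => 'Portuguese', part_of_speech => [ 'noun' ] },
};

# Hypothetical helper: invert the hash so each part of speech
# maps to a sorted list of language names.
sub languages_by_part_of_speech {
    my ($pos) = @_;
    my %by_pos;
    for my $code (keys %{ $pos || {} }) {
        my $entry = $pos->{$code};
        for my $part (@{ $entry->{part_of_speech} || [] }) {
            push @{ $by_pos{$part} }, $entry->{language};
        }
    }
    # Sort each list in place so the output order is stable
    @$_ = sort @$_ for values %by_pos;
    return \%by_pos;
}

my $by_pos = languages_by_part_of_speech($pos);
for my $part (sort keys %$by_pos) {
    print "$part: ", join(', ', @{ $by_pos->{$part} }), "\n";
}
```

With a real document, you would pass the result of $document->get_parts_of_speech() instead of the sample hash.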

The parts of speech parsed from each Wiktionary page come from its section headings. The headings currently treated as parts of speech are:

noun
verb
adjective
adverb
pronoun
preposition
article
conjunction
determiner
interjection
symbol
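
As a rough illustration of how a section heading could be checked against this list, the sketch below builds a lookup hash from the eleven names above and tests headings against it. This is not the module's actual implementation. The helper name is_part_of_speech_heading, the lowercasing, and the whitespace trimming are all assumptions made for the example.

```perl
use strict;
use warnings;

# The part-of-speech names listed above, used as lookup keys.
my %IS_PART_OF_SPEECH = map { $_ => 1 } qw(
    noun verb adjective adverb pronoun preposition
    article conjunction determiner interjection symbol
);

# Hypothetical helper: returns 1 if a section heading names one of the
# listed parts of speech, 0 otherwise. Trimming and lowercasing the
# heading before the lookup are assumptions, not documented behavior.
sub is_part_of_speech_heading {
    my ($heading) = @_;
    return 0 unless defined $heading;
    (my $key = lc $heading) =~ s/^\s+|\s+$//g;
    return $IS_PART_OF_SPEECH{$key} ? 1 : 0;
}

for my $heading ('Noun', 'Etymology', ' verb ') {
    printf "%-10s %s\n", $heading,
        is_part_of_speech_heading($heading) ? 'yes' : 'no';
}
```

A hash lookup keeps the check constant-time and makes the list easy to extend if more headings are added later.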
