Missing 's' key when looking up similar adjectives #15

Closed
rdeits opened this issue Jan 13, 2019 · 1 comment

@rdeits
Contributor

rdeits commented Jan 13, 2019

(x-posted from Slack)

Looking up the similar adjectives for some synsets throws a key error:

julia> db = DB()
WordNet.DB

julia> lemma = db['a', "bare"]
bare.a

julia> WordNet.relation(db, synsets(db, lemma)[1], WordNet.SIMILAR_TO)
ERROR: KeyError: key 's' not found
Stacktrace:
 [1] getindex at ./dict.jl:478 [inlined]
 [2] #25 at /home/rdeits/.julia/dev/WordNet/src/operations.jl:10 [inlined]
 [3] iterate at ./generator.jl:47 [inlined]
 [4] _collect(::Array{WordNet.Pointer,1}, ::Base.Generator{Array{WordNet.Pointer,1},getfield(WordNet, Symbol("##25#27")){DB,Synset}}, ::Base.EltypeUnknown, ::Base.HasShape{1}) at ./array.jl:632
 [5] map at ./array.jl:561 [inlined]
 [6] relation(::DB, ::Synset, ::String) at /home/rdeits/.julia/dev/WordNet/src/operations.jl:9
 [7] top-level scope at none:0

This seems to be resolved by just replacing synset.synset_type with synset.synset_type == 's' ? 'a' : synset.synset_type in

ptr -> db.synsets[synset.synset_type][ptr.offset],
but I'm not sure if that's the right thing to do.

@oxinabox
Member

I think that, rather, we should replace synset.synset_type with ptr.pos in

ptr -> db.synsets[synset.synset_type][ptr.offset],

I think that is a mistake in this implementation.
Determining which index to look in for the related synset is a property of the pointer,
not of the source synset.
(The question of POS vs. synset type is interesting but moot, as the only time they differ is for satellite adjectives (s), and those never occur on the right-hand side of a relation (I checked with a regex).)
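
To make the difference concrete, here is a minimal, self-contained sketch of the two lookups. It does not use WordNet.jl itself: `ToySynset`, `ToyPointer`, `toy_index`, `lookup_by_source_type`, and `lookup_by_pointer_pos` are hypothetical stand-ins, and the nested dictionary only mimics an index keyed on pos and then on offset, as described in this thread.

```julia
# Toy stand-ins for illustration only; these are NOT the real WordNet.jl types.
struct ToyPointer
    pos::Char      # which index the target synset lives in
    offset::Int    # offset (ID) of the target synset
end

struct ToySynset
    offset::Int
    pos::Char            # index (file) the synset is stored under
    synset_type::Char    # marker on the synset's own line; 's' for satellite adjectives
    words::Vector{String}
    pointers::Vector{ToyPointer}
end

unclothed = ToySynset(459631, 'a', 'a', ["unclothed"], [ToyPointer('a', 460031)])
bare = ToySynset(460031, 'a', 's', ["bare", "naked", "nude"], [ToyPointer('a', 459631)])

# Index keyed on pos, then on offset. Note that there is no 's' key.
toy_index = Dict('a' => Dict(s.offset => s for s in (unclothed, bare)))

# Lookup in the style of the failing code path: keyed on the *source* synset's type.
lookup_by_source_type(index, synset) =
    [index[synset.synset_type][ptr.offset] for ptr in synset.pointers]

# Proposed lookup: keyed on the pos carried by each pointer.
lookup_by_pointer_pos(index, synset) =
    [index[ptr.pos][ptr.offset] for ptr in synset.pointers]

related = lookup_by_pointer_pos(toy_index, bare)
println(first(related).words)

# The source-type lookup throws for the satellite synset, since 's' is not a key.
failed = try
    lookup_by_source_type(toy_index, bare)
    false
catch err
    err isa KeyError
end
println(failed)
```

In this toy setup the pointer-based lookup resolves `bare` to `unclothed`, while the source-type lookup only works for synsets whose type happens to match an index key.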


Ramblings

A synset's pos determines which file it is indexed in; adjectives are in data.adj.
A synset's synset_type is a marker on its line in that file. For most things it is one-to-one with the pos, but for adjectives the data.adj file contains both s and a.

The synset objects themselves in WordNet.jl contain both pos (from the file) and synset_type (from the line).
For indexing purposes the pos is used (both for our internal dict and, as mentioned, for the files).
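
The relationship between the two fields can be sketched in a few lines. This is not WordNet.jl code: `pos_for_synset_type` and `DATA_FILE_FOR_POS` are hypothetical names, and the file names are the ones used by the standard WordNet database distribution.

```julia
# Hypothetical helper: map a synset_type marker to the pos used as the index key.
# Only satellite adjectives ('s') differ; they are stored alongside the adjectives.
pos_for_synset_type(synset_type::Char) = synset_type == 's' ? 'a' : synset_type

# Data file for each pos in the standard WordNet distribution.
DATA_FILE_FOR_POS = Dict(
    'n' => "data.noun",
    'v' => "data.verb",
    'a' => "data.adj",
    'r' => "data.adv",
)

# Print synset_type -> pos -> file for every marker that can appear on a line.
for t in ('n', 'v', 'a', 's', 'r')
    pos = pos_for_synset_type(t)
    println(t, " -> ", pos, " -> ", DATA_FILE_FOR_POS[pos])
end
```

This is the same mapping as the `== 's' ? 'a'` workaround from the original report; it is only needed when starting from a synset's own type, not when following a pointer.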

If we look at 2 lines of data.adj:

00459631 00 a 01 unclothed 0 019 ^ 00060656 a 0000 ! 00455759 a 0101 & 00460031 a 0000 & 00460299 a 0000 & 00460521 a 0000 & 00460697 a 0000 & 00460843 a 0000 & 00460973 a 0000 & 00461135 a 0000 & 00461243 a 0000 & 00461363 a 0000 & 00461476 a 0000 & 00461586 a 0000 & 00461779 a 0000 & 00461914 a 0000 & 00461986 a 0000 & 00462109 a 0000 & 00462190 a 0000 & 00462329 a 0000 | not wearing clothing  

00460031 00 s 04 bare 0 au_naturel(p) 0 naked 0 nude 0 007 & 00459631 a 0000 + 14479883 n 0401 + 10385098 n 0401 + 14479586 n 0402 + 14479586 n 0403 + 14479586 n 0301 + 14480341 n 0101 | completely unclothed; "bare bodies"; "naked from the waist up"; "a nude model"  

We can see 00460031 00 s 04 bare, which is the synset you are querying.
It has the relation & 00459631 a 0000,
which reads as "similar to 00459631 a 0000", which I believe is 00459631 00 a 01 unclothed.

Now, what we need to extract from a relation entry (& 00459631 a 0000) is which index to look in (our dict is currently keyed on pos) and the offset.
For this purpose, the a tells us which index (this is ptr.pos), and the 00459631 tells us the offset (which is basically an ID for a synset).
In the final field, the first 00 tells us the source and the last 00 tells us the target; I don't know what that actually means, though.
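
As a sketch of the extraction described above, the snippet below splits one relation entry into its fields. `ToyRelation` and `parse_pointer` are hypothetical names, not WordNet.jl's API. How the last field is interpreted is an assumption based on the WordNet `wndb` database format documentation: two 2-digit hexadecimal word numbers, with `0000` marking a relation between whole synsets rather than between specific words.

```julia
# Hypothetical container for one parsed relation entry; not the WordNet.jl Pointer type.
struct ToyRelation
    symbol::String   # pointer symbol, e.g. "&" for "similar to"
    offset::Int      # offset (ID) of the target synset
    pos::Char        # index to look the target up in
    source::Int      # word number in the source synset (assumed: 0 = whole synset)
    target::Int      # word number in the target synset (assumed: 0 = whole synset)
end

function parse_pointer(text::AbstractString)
    # Fields are whitespace-separated: symbol, offset, pos, source/target.
    symbol, offset, pos, source_target = split(text)
    return ToyRelation(
        String(symbol),
        parse(Int, offset),                          # decimal, leading zeros ignored
        first(pos),                                  # single-character pos
        parse(Int, source_target[1:2]; base = 16),   # first two hex digits
        parse(Int, source_target[3:4]; base = 16),   # last two hex digits
    )
end

rel = parse_pointer("& 00459631 a 0000")
println(rel.pos, " ", rel.offset)
```

With the fields separated like this, the lookup only needs `rel.pos` and `rel.offset`; the source and target numbers can be ignored when resolving which synset a relation points to.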
