Connect GAZ to ENVO #65

Open
GoogleCodeExporter opened this Issue Mar 28, 2015 · 16 comments

We need to provide linking axioms from GAZ to ENVO.

These should be rdf:type/ClassAssertion links.

E.g.

'Lake Nyar' rdf:type ENVO:lake
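A minimal sketch of how such links could be serialized as OWL functional-syntax ClassAssertion axioms, assuming the standard OBO PURL prefix `http://purl.obolibrary.org/obo/`. The helper names are hypothetical, and the ID pair is borrowed from the candidate table later in this thread purely to show the output format, not as a vetted link.

```python
# Hypothetical helpers: turn (GAZ individual, ENVO class) CURIE pairs into
# OWL functional-syntax ClassAssertion axioms, assuming OBO PURL IRIs.
OBO = "http://purl.obolibrary.org/obo/"


def curie_to_iri(curie: str) -> str:
    """Expand an OBO CURIE such as 'GAZ:00256267' to a full, bracketed IRI."""
    prefix, local_id = curie.split(":", 1)
    return f"<{OBO}{prefix}_{local_id}>"


def class_assertion(individual: str, cls: str) -> str:
    """OWL functional syntax puts the class first: ClassAssertion(Class Individual)."""
    return f"ClassAssertion({curie_to_iri(cls)} {curie_to_iri(individual)})"


print(class_assertion("GAZ:00256267", "ENVO:00002221"))
# → ClassAssertion(<http://purl.obolibrary.org/obo/ENVO_00002221> <http://purl.obolibrary.org/obo/GAZ_00256267>)
```

Treating GAZ places as individuals and ENVO terms as classes is what makes rdf:type (rather than rdfs:subClassOf) the right link.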

There are a variety of approaches:

 1. manual curation
 2. parsing labels
 3. mining wikipedia
 4. use some external resource

The simplest and cheapest approach is (2), e.g. "Lake Nyar" is a lake.

This will undoubtedly produce many false positives; we won't know how many
until we try the experiment.

After the experiment, we can try approaches (e.g. an override list, or OWL
axioms) to weed out incorrect assertions.
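A sketch of approach (2) at its most naive, assuming GAZ and ENVO labels have already been loaded into plain `{id: label}` dicts (the function name and data layout are hypothetical). It proposes a link whenever an ENVO label occurs as a whole word in a GAZ place name, which is exactly how the Blood River / Cake Rock false positives reported in this thread arise.

```python
import re


def naive_candidates(gaz_labels, envo_labels):
    """Propose (GAZ id, ENVO id) type links whenever an ENVO class label
    occurs as a whole word or phrase inside a GAZ place name."""
    links = []
    for gaz_id, place in gaz_labels.items():
        for envo_id, cls in envo_labels.items():
            # \b word boundaries; case-insensitive so "Blood" matches "blood".
            if re.search(rf"\b{re.escape(cls)}\b", place, re.IGNORECASE):
                links.append((gaz_id, envo_id))
    return links


gaz = {"GAZ:00212662": "Blood River", "GAZ:00167599": "Cake Rock"}
envo = {"ENVO:02000020": "blood", "ENVO:02000063": "cake"}
print(naive_candidates(gaz, envo))
# → [('GAZ:00212662', 'ENVO:02000020'), ('GAZ:00167599', 'ENVO:02000063')]
```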

Original issue reported on code.google.com by cjmung...@lbl.gov on 24 Jun 2013 at 10:13

This issue was updated by revision r70.

Original comment by cmung...@gmail.com on 24 Jun 2013 at 10:23

Some amusing false positives from a completely naive textual approach:

GAZ:00212662 ! Blood River      ENVO:02000020 ! blood
GAZ:00279813 ! Sweat Lodge Cave ENVO:02000025 ! sweat
GAZ:00210016 ! Tears Gut        ENVO:02000034 ! tears

GAZ:00401117 ! Municipality of Saliva   ENVO:02000036 ! saliva

GAZ:00267895 ! Custard Hollow Cave      ENVO:02000066 ! custard
GAZ:00030673 ! Jam County       ENVO:0010139 ! jam
GAZ:00167599 ! Cake Rock        ENVO:02000063 ! cake
GAZ:00110585 ! Busted Mushroom  ENVO:02000072 ! mushroom



Original comment by cjmung...@lbl.gov on 24 Jun 2013 at 10:27

This commit improves the candidates:

http://code.google.com/p/envo/source/detail?r=71


Original comment by cjmung...@lbl.gov on 24 Jun 2013 at 10:59

Remaining oddities:

GAZ:00256267 ! Market Shop      ENVO:00002221 ! shop
GAZ:22224553 ! Park County      ENVO:00000562 ! park

Original comment by cjmung...@lbl.gov on 24 Jun 2013 at 11:13

This issue was updated by revision r73.

Added a scoring scheme: longer words score more, use of "of" adds a bonus (e.g.
"City of X"), and appearing last gives a slight edge (e.g. "Glacier Bay" is a
bay, not a glacier). Only the best match is selected.
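A sketch of a scoring scheme along these lines. The weights are hypothetical, and "longer" is read here as more tokens in the matched ENVO label: with a character-length reading, "glacier" (7 letters) would outscore "bay" (3 letters) and defeat the "Glacier Bay" example. The `ENVO:bay`-style keys are placeholders in the same style as `ENVO:lake` above, not real identifiers.

```python
TOKEN_WEIGHT = 10  # hypothetical: points per token of the matched ENVO label
OF_BONUS = 5       # hypothetical: class word followed by "of" ("City of X")
LAST_BONUS = 1     # hypothetical slight edge for a class word at the end


def score(place, cls):
    """Score how well ENVO label `cls` types GAZ label `place` (0 = no match)."""
    words, target = place.lower().split(), cls.lower().split()
    if not target:
        return 0
    n, best = len(target), 0
    for i in range(len(words) - n + 1):
        if words[i:i + n] != target:
            continue
        s = TOKEN_WEIGHT * n          # multi-word labels score more
        end = i + n
        if end < len(words) and words[end] == "of":
            s += OF_BONUS
        if end == len(words):
            s += LAST_BONUS
        best = max(best, s)
    return best


def best_match(place, envo_labels):
    """Keep only the single highest-scoring ENVO class, or None if none match."""
    scored = [(score(place, label), envo_id) for envo_id, label in envo_labels.items()]
    top_score, top_id = max(scored, default=(0, None))
    return top_id if top_score > 0 else None


print(best_match("Glacier Bay", {"ENVO:glacier": "glacier", "ENVO:bay": "bay"}))  # → ENVO:bay
```

The "of" bonus is larger than the end-of-name bonus, so for a name like "Lake of the Woods" a `lake` class (15 points) would beat a `woods` class (11 points).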

Original comment by cmung...@gmail.com on 24 Jun 2013 at 11:16

This issue was updated by revision r74.

Original comment by cmung...@gmail.com on 24 Jun 2013 at 11:17

This issue was updated by revision r75.

Original comment by cmung...@gmail.com on 24 Jun 2013 at 11:28

This issue was updated by revision r76.

Original comment by cmung...@gmail.com on 24 Jun 2013 at 11:28

Here is a first pass:

http://envo.googlecode.com/svn/trunk/experimental/gaz-typeof-envo-auto.tbl

Original comment by cjmung...@lbl.gov on 24 Jun 2013 at 11:30

Here is a first pass:

http://envo.googlecode.com/svn/trunk/experimental/gaz-typeof-envo-auto.tbl

Original comment by cjmung...@lbl.gov on 24 Jun 2013 at 11:42

cc @vagpafilis@gmail.com, who has experience with entity matching using ENVO

Original comment by cjmung...@lbl.gov on 24 Jun 2013 at 11:59

Note:

ENVO EXACT synonyms are not reliable enough to be used here.

E.g. "islands" should not be an EXACT synonym for "island"; use a PLURAL tag
instead. "Canary Islands" is not a single island.

Consider adding a new term "collection of islands" or "island cluster" (not
necessarily an archipelago).

However, we have many missing links because we are not using exact synonyms.
For example, the many GAZ terms that end in "Trench" are probably ocean
trenches (though not all of them are).

It is probably better to use rules here, e.g. if a term has a path to
GAZ:00000590 ! Oceans and Seas AND its label ends in "trench", then it is an
'ocean trench'.
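A sketch of that rule, assuming the GAZ hierarchy is available as a hypothetical `parents` dict mapping each term ID to its parent IDs. The rule fires only when both conditions hold; the IDs other than GAZ:00000590 are made-up placeholders.

```python
from collections import deque

OCEANS_AND_SEAS = "GAZ:00000590"


def has_ancestor(term, ancestor, parents):
    """Breadth-first walk up `parents` (child ID -> list of parent IDs)."""
    seen, queue = set(), deque([term])
    while queue:
        node = queue.popleft()
        for parent in parents.get(node, []):
            if parent == ancestor:
                return True
            if parent not in seen:  # guard against cycles and re-visits
                seen.add(parent)
                queue.append(parent)
    return False


def ocean_trench_rule(term, label, parents):
    """True if the label ends in 'trench' AND the term sits under Oceans and Seas."""
    return label.lower().endswith("trench") and has_ancestor(term, OCEANS_AND_SEAS, parents)


# Hypothetical placeholder hierarchy: trench_1 is in a sea, trench_2 is on land.
parents = {
    "GAZ:trench_1": ["GAZ:sea_1"],
    "GAZ:sea_1": [OCEANS_AND_SEAS],
    "GAZ:trench_2": ["GAZ:land_1"],
}
print(ocean_trench_rule("GAZ:trench_1", "Example Trench", parents))  # → True
print(ocean_trench_rule("GAZ:trench_2", "Other Trench", parents))    # → False
```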


Original comment by cjmung...@lbl.gov on 25 Jun 2013 at 12:03

It would help if we added more synonyms to Gaz. E.g.

Add the synonym "amphoe" to "second-level administrative district" (maybe with
a Thai language tag? But I don't think it's really a language distinction).
Do the same for "tambon". See http://en.wikipedia.org/wiki/Tambon

Original comment by cmung...@gmail.com on 30 Sep 2013 at 2:54

It would be useful to have collections in ENVO:

 * chain of islands
 * chain of lakes
 * chain of hills

..

Original comment by cmung...@gmail.com on 30 Sep 2013 at 4:36

@ #12 and #14
"archipelago" may also refer to the stretch of water that contains several 
islands. Is this the concern? The term is used interchangeably with island 
group or island chain, however. Perhaps we can replace "archipelago" with a 
class like "island chain", but keep the archipelago term as a broad or related 
synonym?

We do have "hill range", "mountain range", etc. Are these terms not appropriate?



Original comment by p.buttig...@gmail.com on 12 Oct 2013 at 2:56

I was naively assuming archipelago = mereological sum of islands, but it makes
sense that it can be the sum of the islands plus the water in the region, i.e.
the whole connected region. I think it's fine to go with this definition,
whatever the experts say. It would be good to make the definition precise.

#12 - here is an example:

[Term]
id: ENVO:00000098
name: island
...
synonym: "islands" EXACT [Geonames:feature]

I don't think it's good to include the plural form as a synonym. However, where
this is done, I think it should be tagged, e.g.:

synonym: "islands" EXACT PLURAL [Geonames:feature]
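A sketch of how a label matcher could honor such a tag, assuming the OBO 1.4 layout `synonym: "text" SCOPE [optional type name] [xrefs]` and the tag name PLURAL proposed here. The function name is hypothetical; it keeps only EXACT synonyms whose type is not in a skip list.

```python
import re

# OBO 1.4 synonym line: synonym: "text" SCOPE [optional synonym type] [xrefs]
SYNONYM_RE = re.compile(
    r'^synonym:\s*"((?:[^"\\]|\\.)*)"\s+(EXACT|BROAD|NARROW|RELATED)'
    r'(?:\s+([^\s\[]+))?\s*\['
)


def matchable_synonyms(stanza_lines, skip_types=frozenset({"PLURAL"})):
    """Return EXACT synonyms usable for label matching, skipping tagged types."""
    result = []
    for line in stanza_lines:
        m = SYNONYM_RE.match(line.strip())
        if not m:
            continue  # not a synonym line (id:, name:, ...)
        text, scope, syn_type = m.groups()
        if scope == "EXACT" and syn_type not in skip_types:
            result.append(text)
    return result


stanza = [
    "id: ENVO:00000098",
    "name: island",
    'synonym: "islands" EXACT PLURAL [Geonames:feature]',
]
print(matchable_synonyms(stanza))  # → []
```

An untagged line such as `synonym: "islands" EXACT [Geonames:feature]` would still be returned, so the tag only helps once curators actually add it.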

#14

I retract this suggestion for now. I think the existing "range" terms are much
better than abstract, disconnected mereological-sum classes. The ranges could
still be defined as a mereological sum plus connecting "tissue" (sorry, I
can't help thinking in anatomical terms), but this would stay in the
background.



Original comment by cmung...@gmail.com on 12 Oct 2013 at 7:32
