support for first-class place model #79

Merged
merged 11 commits into from
@stoicflame
Owner

The following commentary was submitted by @ttwetmore, opening it up here for discussion, my own comments to follow:

In the GEDCOM X model places are “second-class citizens” (they don’t occur in their own right as record-level entities) and are not hierarchical. I believe there are contexts where places can be second-class citizens, that is, simply attributes of an event or other attribute, but there are many contexts where an independent set of hierarchical place records would be useful. I also believe places should be hierarchical, with each place able to refer to one or more broader places that it is a part of. Most modern genealogical systems seem to have some kind of place expert module built in that provides this associated structure of place information. Here is a proto specification of one possibility:

enum PlaceType {
  // Each name needs its own distinct value (no allow_alias is set).
  Village = 1;
  Ward = 15;
  District = 16;
  Town = 2;
  City = 3;
  Parish = 4;
  County = 5;
  State = 6;
  Province = 7;
  Region = 8;
  Country = 9;
  Duchy = 20;
  Bishopric = 21;
  Kingdom = 22;
  Continent = 10;
  Planet = 11;
  StellarSystem = 12;
  Galaxy = 13;
  Universe = 14; // In case the multi-universe conjecture is true.
}
message PlaceStructure
{
  required string name = 2; // Name of this place.
  optional PlaceType type = 3; // Type of this place.
  repeated UUIDValue parentIds = 4; // Places this place is in.
  repeated AttributeStructure attributes = 9; // Attributes of this place.
  optional InfoStructure info = 10; // Other information about this place.
}
message PlaceMessage
{
  // Record id of this place.
  required UUIDValue recordId = 1;
  required PlaceStructure place = 2; // Structure describing this place.
}

In my Google protocol notation, a structure is information that is found inside a record, and a message is a record that will be encoded, transmitted and decoded as a whole. In the DeadEnds model a place structure may be used in a number of contexts. One such context is shown here as the contents, along with a UUID value for a record id, of a place record.
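To make the structure/record distinction concrete, here is a minimal Python sketch; it is not part of the proposal itself. It mirrors PlaceStructure and PlaceMessage as dataclasses and follows the parentIds pointers to collect every containing place. The helper name `ancestors`, the in-memory `registry` dict, and the sample places are assumptions made for illustration, and the attributes and info fields are left out.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional
from uuid import UUID, uuid4


@dataclass
class PlaceStructure:
    """Second-class place: can be embedded wherever a place is needed."""
    name: str
    type: Optional[str] = None  # e.g. "Town", "County"
    parent_ids: List[UUID] = field(default_factory=list)


@dataclass
class PlaceMessage:
    """First-class place record: a structure plus its own record id."""
    record_id: UUID
    place: PlaceStructure


def ancestors(place: PlaceStructure, registry: Dict[UUID, PlaceMessage]) -> List[str]:
    """Return names of all places containing `place`, nearest first."""
    found: List[str] = []
    seen = set()
    queue = list(place.parent_ids)
    while queue:
        pid = queue.pop(0)
        if pid in seen or pid not in registry:
            continue  # skip cycles and parents we hold no record for
        seen.add(pid)
        parent = registry[pid].place
        found.append(parent.name)
        queue.extend(parent.parent_ids)
    return found


# Hypothetical sample data.
england = PlaceMessage(uuid4(), PlaceStructure("England", "Country"))
beds = PlaceMessage(uuid4(), PlaceStructure("Bedfordshire", "County", [england.record_id]))
registry = {m.record_id: m for m in (england, beds)}

# An embedded (second-class) place that points at first-class records.
sutton = PlaceStructure("Sutton", "Village", [beds.record_id])
print(ancestors(sutton, registry))
```

Because parent_ids is a list, a place with both a civil and an ecclesiastical parent is handled by the same walk.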

@ranbo
Owner

It has been surprising to me how many different people have asked that place be included as a first-class part of a record model. Part of this may be because places are so important to genealogy, and everyone recognizes the importance of some kind of source authority (or "expert system").

In a genealogical record, I would think places are properly expressed as an attribute of an event (a "second-class" citizen), because we care most about what the record says about the persons it mentions, instead of what it says about the places it mentions. However, when building a place authority, you may well care about what a gazetteer or history book says about a place, and in fact you might want to capture such information as "evidence" for your conclusions about what is true (and has been true at different points in time) about that place. On the third hand (left foot?), when doing genealogical research, you might want to see the conclusional knowledge about the place that is mentioned in a record in order to give you deeper background knowledge from which to draw stronger conclusions or guide your research.
Finally, we often have information in a record about the "place type" of various "place parts", such as when a form says "county:" and the blank is filled in with "Fulton". So we want a record model to be able to capture what the record says about place parts, in order to disambiguate when interpreting the place.

A "place source data model" seems like it would be best as an effort separate from GedcomX. On the other hand, GedcomX might benefit from having access to a "place conclusion model" (or just "place model") that can serve up things like the place name in various languages; place boundaries and name changes over time; hierarchical relationship to "parent" and "child" places; etc.

@carpentermp
Owner

My gut reaction is that there ought to be a robust "place model", but that it belongs as a separate model from Record. Places in Records might refer to Places in a place authority, or "archive of place records", but they wouldn't redundantly include place information beyond identifying the intended place. In SoRD, we had a "standardized place" which was a URI for the "place authority" and "identifier" for the place in that authority. Perhaps we should consider something along these lines for GedcomX?

@ttwetmore

I agree that places are not as important as persons in a genealogical model, but why wouldn't they be modeled in the same model that holds the record object? I would think that GEDCOM X should model the universe that is important to working genealogists, and place objects play a major role in that universe. Frankly I don't find this universe to be so extensive that it needs multiple models to capture it.

As far as first and second class citizenship goes, see the model at the start of this thread. The PlaceStructure is the second-class citizen -- it can appear as an attribute of any event or other attribute that requires a place; in this case the place can be self-contained at the spot where it appears. Though note, please, that a second-class place citizen can refer to first-class place citizens through its parent pointers (places can have multiple containing places if needed for historical or ecclesiastical reasons). And PlaceMessage is the first-class place citizen, complete with a UUID that gives it independent existence.

A "place authority" could be based on a network of these PlaceMessages that could be prepared by third parties. This is the way I believe all current servers and some desktop systems do it now. A GEDCOM X transmission file could either include the place records required by the data, or, since the proper way to do this would be through a UUID-based scheme, the places would not have to be transmitted as they would have a universal definition somewhere "out there" on the web.

This notation is Google protocol buffers, and can be put into JSON, Relax NG or Schema notation at will.

@stoicflame
Owner

Apologies for the delay in this; I'm at JavaOne, making my bandwidth limited.

Please note the proposal at #88, which allows for a to-be-defined definition (whether part of the GEDCOM X spec or not) that specifies a place as either (1) a typed literal string or (2) a separate resource or (3) both. This gives us enough flexibility to move ahead with a release of the specification without painting ourselves into a corner when/if the community determines a "place authority" or "place standard" is needed.
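As a rough illustration of the "literal, resource, or both" idea, a place value could be sketched in Python as below. The class name `PlaceReference`, the field names `original` and `resource`, and the example URI are assumptions for this sketch; they are not taken from the #88 proposal.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PlaceReference:
    original: Optional[str] = None  # place text as written in the record
    resource: Optional[str] = None  # URI of a separate place description

    def __post_init__(self):
        # At least one of the two forms has to be present.
        if self.original is None and self.resource is None:
            raise ValueError("a place needs a literal, a resource, or both")

    def label(self) -> str:
        """Prefer the literal text; fall back to the resource URI."""
        return self.original if self.original is not None else self.resource


# Hypothetical values.
literal_only = PlaceReference(original="Sutton, Bedfordshire, England")
both = PlaceReference(
    original="Sutton, Bedfordshire, England",
    resource="https://places.example.org/12345",
)
print(literal_only.label(), both.resource)
```

Under this sketch a record can ship with literal text only, and a resource link can be added later without changing the shape of the data.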

@ttwetmore

Add the notion of "recursion", that one place can be a part of another place, and "I'm in."

@stoicflame
Owner

Add the notion of "recursion", that one place can be a part of another place, and "I'm in."

That would undoubtedly be part of the definition of a first-class place model.

@stoicflame stoicflame referenced this pull request from a commit in stoicflame/gedcomx
@stoicflame stoicflame definition of a normalized value to provide mechanisms for issues #60,
…#83, #79, and as discussed in issue #69
92f1ec9
@stoicflame stoicflame closed this
@stoicflame stoicflame reopened this
@EssyGreen

I welcome the idea of a Place entity but as far as I can see that's not what we've got, since the place details are still embedded within each Fact rather than having their own identity which is then cross-referenced by each Fact. This does not allow the Place itself to be extended and/or further details to be recorded against it without replicating and fragmenting data throughout the model.

Also, please bear in mind the fact that Places are not necessarily hierarchical and it will not be feasible to come up with a universal place labelling system (beyond latitude/longitude) ... For example, in the UK, records may refer to Parishes (ecclesiastical boundaries) and/or Districts/Sub-Districts (civil authority boundaries) and/or (parliamentary) Boundaries; UK censuses are broken up into Enumeration Districts (walkable areas for enumerators). All of these boundaries change over time and cannot be neatly boxed inside one another in a geographical hierarchy.

It seems to me that the Record and Conclusion Models have things the wrong way round ... the PlaceParts in the record model just needs to be a list of key/value pairs, (the ordering of which may or may not be hierarchical) with the labels being standardised according to the original source and/or the application. The PlacePartTypes seem to be a rather haphazard selection of labels and I'm not convinced that defining these adds anything to the model.
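A list of key/value place parts of the kind described here might look like the following Python sketch. The labels are whatever the source or application uses; the function names and the sample values are assumptions for illustration only.

```python
from typing import List, Tuple

# (label as found in the source, value); list order is preserved as entered.
PlacePart = Tuple[str, str]


def format_place(parts: List[PlacePart]) -> str:
    """Join the values in their recorded order."""
    return ", ".join(value for _, value in parts)


def lookup(parts: List[PlacePart], label: str) -> List[str]:
    """Return every value recorded under a label (case-insensitive)."""
    wanted = label.casefold()
    return [value for key, value in parts if key.casefold() == wanted]


# Hypothetical parts transcribed from a form with labelled blanks.
parts = [("town", "Atlanta"), ("County", "Fulton")]
print(format_place(parts))
print(lookup(parts, "county"))
```

No fixed list of part types is needed for this to work, which is the point being argued above.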

Conversely, the Conclusion Model uses an Original and FormalValue (both of which are open-ended) but it seems to me that this is where defined parts are useful (e.g. Latitude/Longitude) and where the Place should be a link to a Place object rather than embedded within the Fact. The Place object can then define such things as: Full name, abbreviation, postal address, Latitude/Longitude etc. It can also then have its own embedded Facts (e.g. "Great Flood of Sheffield", 1864, "blah blah description" plus source references, illustrations etc.)

@joshhansen

Places are already effectively modeled by a number of existing vocabularies. The SpatialThing class from W3C's geo vocab is widely accepted for representing longitude/latitude coordinates.

GeoNames provides a widely used extension of SpatialThing called Feature, by which is defined a hierarchical feature model like people want for GedcomX. GeoNames also provides a freely available database of 10 million place features around the world. Support has recently been added for historical places as well. If GeoNames doesn't meet users' needs, there are a number of other vocabularies that might do the job.

Attempting to implement our own place vocabulary seems foolish since so many already exist. Instead of reinventing the wheel, let's have GedcomX accept any SpatialThing as a place. If people wish to represent place hierarchies, they can do so using a vocabulary like GeoNames. GedcomX could endorse such a vocabulary or even import it into the GedcomX namespace so the genealogy community can rally around a single standard. GedcomX could also introduce new feature types for representation of political entities like duchies, kingdoms, and so on.

@EssyGreen

Attempting to implement our own place vocabulary seems foolish since so many already exist

+1

My original point was that the Place needed to be an entity rather than having its details embedded within each Fact. At the moment the Place in the Conclusion Model is still an embedded object and the Place in the Record Model inherits from a Field.

I believe that both should be the same (or should inherit from an abstract Place object in the common model) and that it should be a top-level Entity.

@DallanQ

@joshhansen There are a lot of online gazetteers: http://www.alexandria.ucsb.edu/~lhill/dgie/DGIE_website/gaz_links.htm lists several.

I looked at a number of these when creating the place database for WeRelate.org, which is now available as a free download:

http://github.com/DallanQ/Places

It includes both current and historical places as well as alternate names; many places list both their historical and modern jurisdictional hierarchies, and many include coordinates.

  • Geonames: Lots of places, modern only (or mostly), most places are geographic features like lakes and rivers, but places are in a flat hierarchy -- that is, cities in England did not list the county they are in. Having a hierarchy is pretty important - how do you know which Sutton in England to match when the user says "Sutton, Bedfordshire, England"? There are a dozen different Suttons in their database for England, and you don't have any way to determine which Sutton is in Bedfordshire, except by calculating the shortest distance from each Sutton to the centroid listed for Bedfordshire - not very reliable. Because of the lack of hierarchy, I ended up not using this resource. I wasn't aware that they had included historical support, though it appears to be still in the very early stages. They've added an "isHistorical" flag for names that are no longer used, and are considering adding fromPeriod and toPeriod. Until they add jurisdictional hierarchies to their database, they won't have even scratched the surface of historical issues though.

  • Getty Thesaurus of Geographic Names: http://www.getty.edu/research/tools/vocabularies/tgn/ Smaller than Geonames, around 1.7M names for 992K distinct places, mostly modern, though more historical places than Geonames, most places are geographic features, places are in a hierarchy(!), data compiled from about a dozen different sources: mainly NGA/NIMA but also Rand McNally, Encyclopedia Britannica, Domesday book, generally lists places under the jurisdictional hierarchy they appeared in about 12 years ago. I got permission to include their populated places and political jurisdictions into the WeRelate place database. More information: http://www.getty.edu/research/tools/vocabularies/tgn/about.html and http://www.getty.edu/research/tools/vocabularies/tgn/faq.html

  • Alexandria Digital Library Gazetteer: http://www.alexandria.ucsb.edu/gazetteer/ContentStandard/version3.2/GCS3.2-guide.htm I obtained a license to this as well, but after reviewing it, it seemed similar to Getty so I did not use it.

  • Family History Library Catalog: The only resource I was able to find with historical places. Most (but not all) places are listed according to the jurisdictions they were in just prior to WWI. There are some duplicates: some places listed under Galicia are repeated under Poland, for example. I crawled the FHLC place database back in 2005 and included it in the WeRelate place database.

  • Wikipedia: Both current and historical places. A terrific source of information, but difficult to extract. I extracted tens of thousands of places (certainly not all of them, but the ones that had decent templates for extraction) back in 2005 and included them in the WeRelate place database. A side-benefit of incorporating Wikipedia is that the database includes links back to the Wikipedia articles, which often have helpful historical information. (Though the links aren't included in the extract on github; I'll fix this shortly.)

  • Freebase.com: http://www.freebase.com/view/location updated database of places they've extracted from Wikipedia. Includes about 80,000 current and historical places. I'd love to integrate this into the WeRelate place database, though it will be a big project (see below).

  • OpenStreetMap: http://www.openstreetmap.org/ has coordinate information for modern places, and places are arranged into a hierarchy(!), I'd like to use this to fill in missing coordinates into the place database at WeRelate.org.

  • Statoids.com: http://statoids.com/ not a place database per se, but a fantastic source of information for how jurisdictions have changed over time. I used this, Wikipedia, and Encyclopedia Britannica when compiling the WeRelate place database (see below).
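The centroid-distance fallback mentioned under Geonames above can be sketched as follows. This is only an illustration of the technique described there as "not very reliable": the candidate labels and all coordinates are rough, made-up values, and `nearest_to_centroid` is a hypothetical helper name.

```python
import math
from typing import List, Tuple

Candidate = Tuple[str, float, float]  # (label, latitude, longitude)


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres on a spherical Earth."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def nearest_to_centroid(candidates: List[Candidate], centroid: Tuple[float, float]) -> Candidate:
    """Pick the candidate closest to the centroid of the named jurisdiction."""
    return min(candidates, key=lambda c: haversine_km(c[1], c[2], centroid[0], centroid[1]))


# Illustrative coordinates only, not taken from any gazetteer.
suttons = [
    ("Sutton (candidate A)", 52.11, -0.22),
    ("Sutton (candidate B)", 51.36, -0.19),
    ("Sutton (candidate C)", 52.39, 0.12),
]
county_centroid = (52.08, -0.42)
print(nearest_to_centroid(suttons, county_centroid)[0])
```

The weakness is visible in the sketch: a Sutton just across the county line can be closer to the centroid than one inside the county, which is why an explicit hierarchy is preferable.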

The big challenge when creating a place database is not getting the data -- as you can see, there are many sources for that. It's merging data together from multiple sources without creating duplicates. You want to say that City X in Historical Province Y from the FHLC is the same as City X' in Modern State Z in Wikipedia. Merging duplicate places is generally harder than merging duplicate people, because place names can change dramatically after wars. Even merging Getty and Wikipedia was challenging, because of the changes European countries have made to their jurisdictional hierarchies over the past 10 years due to the EU. I spent months merging Getty, FHLC, and Wikipedia together, and WeRelate users have spent the past seven years continuing to clean it up and organize it better afterward. If you're going to try to create your own current+historical place database, take the merge-time into account. Or just use the free one I posted on github.

I recently matched 7.5M places appearing in the 7K gedcoms submitted to WeRelate over the past five years to see what kinds of problems were occurring most frequently:

  • We don't have comprehensive coverage for US townships. This is on my short-list of things to add.
  • We still have duplicate places in Eastern Europe due to FHLC having duplicates that were not caught.
  • We still don't have all of the historical and modern places in Europe merged (though many have been merged).
  • We don't have all of the historical jurisdictions listed.
  • We're missing some places (though not that many).

I just posted this a couple of weeks ago, so there may still be some rough edges. I know of at least one other organization who's using it already, and I'm talking with several other organizations who are interested. I'm making it freely available so that others don't have to go through the pain that I did.

Dallan

@stoicflame
Owner

Hey @DallanQ, what are your thoughts/recommendations about a standard model/vocabulary for defining places for genealogical research? Is there one out there that's more widely used? Is there one out there that seems to fit better for genealogical uses? Do you know much about the geo vocab that @joshhansen suggested?

@DallanQ

In my opinion, it will be easiest for genealogists to continue entering places using a simple text string as they do now, and have the computer supply an auto-complete list and/or try to match the place to a standardized locality for purposes of search, mapping, etc.
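A bare-bones version of the auto-complete idea, assuming the standardized place names are simply held in a sorted in-memory list, might look like this in Python. The function names and the tiny name list are assumptions for illustration; a real system would also need alternate names and fuzzy matching.

```python
from bisect import bisect_left
from typing import List, Tuple

Index = List[Tuple[str, str]]  # (case-folded name, original name), sorted


def build_index(names: List[str]) -> Index:
    """Sort names by their case-folded form so prefixes are contiguous."""
    return sorted((name.casefold(), name) for name in names)


def autocomplete(prefix: str, index: Index, limit: int = 5) -> List[str]:
    """Return up to `limit` names starting with `prefix`, ignoring case."""
    key = prefix.casefold()
    start = bisect_left(index, (key,))  # first entry >= the prefix
    matches: List[str] = []
    for folded, original in index[start:]:
        if not folded.startswith(key) or len(matches) >= limit:
            break
        matches.append(original)
    return matches


index = build_index([
    "Sutton, Bedfordshire, England",
    "London, London, England",
    "London, England",
    "Santa Barbara, California",
])
print(autocomplete("lon", index))
```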

I use a fairly simple model for standardized places at WeRelate. Here's an example page. The XML is embedded in an "XML island" in the wiki pages, and is exported in the place project at http://github.com/DallanQ/Places Following is the schema in Relax NG Compact form.

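element place {
  # name/source and place/from_year/to_year/reason inside the child
  # elements are assumed here to be attributes, like name and source.
  element alternate_name { attribute name { text }, attribute source { text }? },
  element type { text },
  element from_year { text }?,
  element to_year { text }?,
  element latitude { text }?,
  element longitude { text }?,
  element also_located_in {
    attribute place { text },
    attribute from_year { text }?,
    attribute to_year { text }?
  },
  element see_also { attribute place { text }, attribute reason { text }? }
}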

Type comes from a controlled vocabulary described here. Years and latitude/longitude are restricted to the appropriate types naturally.

It's pretty straightforward, which I believe is important when creating a crowd-sourced database. In addition to the XML on the page, the page title provides the primary name of the place as well as the jurisdiction it was located in just prior to WWI. I've thought about adding a year-range attribute to alternate_names, but other than that nobody's asked for any additional fields.

The download includes the XML information along with an unchanging ID number for each place as well as its primary name and jurisdiction. Each place page on WeRelate also includes a description and links to Wikipedia, Getty, and the FHLC. These are not included as part of the download, in order to keep the file small and because no one's asked for them yet.

I've seen some models try to define things like Level1 administrative subdivision, Level2 administrative subdivision, Level3 administrative subdivision, populated place, etc. I think you could go there, but I personally believe that having an explicit type (possibly multiple types actually) assigned to each place is more informative. In other words, I believe it's helpful to the user to know that Middlesex, England was both one of the historic counties of England (pre 1889) and also one of England's administrative counties (1889-1974), not simply that it was a "Level1 administrative subdivision".
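The "multiple explicit types" idea can be illustrated with a small Python sketch in which each type carries an optional year range. The class and function names are assumptions, as is the choice to treat both end years as inclusive; the Middlesex dates are the ones given in the comment above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TypedPeriod:
    type: str
    from_year: Optional[int] = None  # None means open-ended
    to_year: Optional[int] = None

    def covers(self, year: int) -> bool:
        """True if `year` falls inside this period (inclusive bounds)."""
        after_start = self.from_year is None or self.from_year <= year
        before_end = self.to_year is None or year <= self.to_year
        return after_start and before_end


def types_in_year(periods: List[TypedPeriod], year: int) -> List[str]:
    """List the types a place had in a given year."""
    return [p.type for p in periods if p.covers(year)]


middlesex = [
    TypedPeriod("historic county", None, 1888),
    TypedPeriod("administrative county", 1889, 1974),
]
print(types_in_year(middlesex, 1850), types_in_year(middlesex, 1900))
```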

@EssyGreen

It's a bit of a strange example since I don't know anyone who would quote "London, London, England" but in principle I agree, providing there is a Place Hierarchy/Format which the user can specify, much the same as old GEDCOM PLAC.FORM. The latter is essential to cater for partial place names and those which only exist in the context of a specific source (e.g. a sub-district of a UK census).

However, how would you cater for the moving boundaries of a county whose name has stayed the same (e.g. Somerset)?

@DallanQ

The city of London used to be located within the historical county of London in the early 1900's, which is why it's named London, London, England. Today it's located within the modern county of Greater London.

One challenge in matching places is when someone enters "London, England" (or similarly "Santa Barbara, California") in their genealogy program, what place are they referring to? Do they mean the county or the city? I believe the standard suggests to include intermediate levels, so if they say "London, England" (or Santa Barbara, California) they mean the county, and if they want to reference the city they should say "London, London, England" (or Santa Barbara, Santa Barbara, California). A little odd, but it's unambiguous that way.
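The "include intermediate levels" convention can be sketched as an exact lookup on the full list of levels. The tiny gazetteer below is a hypothetical stand-in for a real place database, and `parse_place` and `resolve` are illustrative names.

```python
from typing import Optional, Tuple

# Hypothetical gazetteer keyed by the full tuple of levels.
GAZETTEER = {
    ("London", "England"): "county",
    ("London", "London", "England"): "city",
    ("Santa Barbara", "California"): "county",
    ("Santa Barbara", "Santa Barbara", "California"): "city",
}


def parse_place(text: str) -> Tuple[str, ...]:
    """Split a comma-separated place string into trimmed levels."""
    return tuple(part.strip() for part in text.split(",") if part.strip())


def resolve(text: str) -> Optional[str]:
    """Return the kind of place the string denotes, or None if unknown."""
    return GAZETTEER.get(parse_place(text))


print(resolve("London, England"), resolve("London, London, England"))
```

Because every level is part of the key, the two-level and three-level strings never collide, which is what makes the convention unambiguous.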

@joshhansen

Thanks for taking the time for such a thorough reply, @DallanQ. A few thoughts:

Changing Geographies

@EssyGreen points out a use case that seems best served not just by place names and hierarchies, but by actual shape data. Suppose County X had a certain set of boundaries prior to 1840 and another set after 1840, with some parts added from county Y and some parts removed to form County Z. For every township, city, borough, and village located within the union of the territories of counties X, Y, and Z, we want to be able to answer the question "What county did this place belong to in year N?" The way to answer this question in a spatially naive hierarchy would be to store the data separately for each place, such as with the also_located_in property in the WeRelate vocabulary. Establishing these relationships is a time-consuming and error-prone process.

On the other hand, if the position of each place is known, and the boundaries of each county at different points in time are known, then the question can be answered simply by searching for counties whose boundaries at one time contained the place in question. If spatial data are available but the also_located_in relationships are still being represented place-by-place, there is a normalization problem, because the also_located_in is functionally determined by the spatial data, and obviously the spatial data are not functionally determined by also_located_in.
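As a sketch of the spatial approach, the Python below answers "what county did this place belong to in year N?" with a ray-casting point-in-polygon test over dated boundaries. The square boundaries and the place position are toy data for illustration; assigning the year 1840 itself to the later boundaries, and treating year ranges as inclusive, are assumptions.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]
# (county, from_year, to_year, boundary polygon); None means open-ended.
Boundary = Tuple[str, Optional[int], Optional[int], List[Point]]


def point_in_polygon(pt: Point, polygon: List[Point]) -> bool:
    """Ray casting: count polygon edges crossed by a ray going in +x."""
    x, y = pt
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the ray's y level
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Toy boundaries: County X loses its eastern half to County Z in 1840.
BOUNDARIES: List[Boundary] = [
    ("County X", None, 1839, [(0, 0), (4, 0), (4, 4), (0, 4)]),
    ("County X", 1840, None, [(0, 0), (2, 0), (2, 4), (0, 4)]),
    ("County Z", 1840, None, [(2, 0), (4, 0), (4, 4), (2, 4)]),
]


def counties_in_year(pt: Point, year: int) -> List[str]:
    """Counties whose boundary in `year` contained the point."""
    return [
        name
        for name, start, end, polygon in BOUNDARIES
        if (start is None or start <= year)
        and (end is None or year <= end)
        and point_in_polygon(pt, polygon)
    ]


village = (3.0, 1.0)
print(counties_in_year(village, 1830), counties_in_year(village, 1850))
```

With this in place, the per-place also_located_in entries are derived from the boundaries rather than maintained by hand.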

Note that the LinkedGeoData project exposes OpenStreetMap shape data as RDF, using an ontology that is also linked to both GeoNames and DBPedia.

GeoNames

Even though I'm not associated with the GeoNames project, I'm going to toot their horn for them a little bit. GeoNames is hierarchical. For a demonstration of this, see Sutton in the "Geotree" interface. The proof, though, is in the RDF, where the gn:parentFeature property points upward to the next highest jurisdiction. In fact, almost the entire path in the place hierarchy tree can be found by combining gn:parentCountry, gn:parentADM1, and gn:parentADM2 in that order.
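Combining the three properties in that order could look like the Python sketch below. It uses a plain dict in place of parsed RDF, and the sample values are placeholder labels rather than real GeoNames feature URIs; only the property names come from the comment above.

```python
from typing import Dict, List

# Order in which the parent properties are combined, broadest first.
PARENT_PROPERTIES = ["gn:parentCountry", "gn:parentADM1", "gn:parentADM2"]


def jurisdiction_path(feature: Dict[str, str]) -> List[str]:
    """Collect whichever parent properties the feature has, broadest first."""
    return [feature[prop] for prop in PARENT_PROPERTIES if prop in feature]


# Placeholder values standing in for feature URIs.
sutton = {
    "gn:name": "Sutton",
    "gn:parentCountry": "United Kingdom",
    "gn:parentADM1": "England",
    "gn:parentADM2": "Bedfordshire",
}
print(" > ".join(jurisdiction_path(sutton) + [sutton["gn:name"]]))
```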

When there's an open project actively trying to expand its coverage, it's great to build up that resource whenever possible. The Web makes it possible to link datasets together. As long as the GeoNames people are doing a reasonable job and accepting contributed data, why not let them do the work of maintaining the core places database? The GeoNames people are soliciting historical place data to incorporate into GeoNames. Why not get the WeRelate data incorporated into GeoNames? For an operation such as FamilySearch that needs its users to be able to make changes to the places database, just make a copy of GeoNames, then track whatever changes you and your users make to it. Then make those changes available to the GeoNames maintainers upstream so they can incorporate the changes they think are generally useful. An Open Source model of upstream/downstream development would work brilliantly here.

Extending the Vocabulary

But this is really about what vocabulary to use in GedcomX. GeoNames' vocab suffers from being overly generic. The parentADM<number> thing is clunky and results in a lossy representation of place data. Seems like what you really need is a separate place type hierarchy, and then the ability to say that any place is a member of any number of types. Since having more specific place types and so on is not in conflict with the more general GeoNames stuff, why not have GedcomX provide an extension of the GeoNames vocab that provides greater specificity?

A place type hierarchy could look like

Place -> Point (?)
Place -> Area
Area -> Continent
Area -> Country
Area -> County
Area -> Township
Area -> InhabitedPlace
InhabitedPlace -> City
InhabitedPlace -> Village
Area -> EcclesiasticalArea
EcclesiasticalArea -> Parish

In RDF-land this would be implemented using rdfs:subClassOf

"Administrative division" strikes me as a relationship between place types:

Country administrativeDivisionOf Empire
County administrativeDivisionOf Country
Township administrativeDivisionOf County
Parish ecclesiasticalAdministrativeDivisionOf Diocese (?)

Then the actual places form their own hierarchy:

Earth -> Europe -> UK -> England -> Bedford -> Sutton

For each individual place its type(s) and relationships to other places are specified:

Sutton placeType City
Sutton placeType Parish
Bedford placeType County
Sutton locatedIn Bedford
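The three layers above (type hierarchy, place types, and place containment) can be mimicked with plain Python dicts to show how queries would work. This is a sketch only; an RDF store with rdfs:subClassOf reasoning would do the same job, and the function names here are assumptions.

```python
from typing import Dict, List, Optional, Set

# Type hierarchy: child type -> parent type (the rdfs:subClassOf arrows).
SUBCLASS_OF: Dict[str, str] = {
    "Point": "Place", "Area": "Place",
    "Continent": "Area", "Country": "Area", "County": "Area",
    "Township": "Area", "InhabitedPlace": "Area",
    "City": "InhabitedPlace", "Village": "InhabitedPlace",
    "EcclesiasticalArea": "Area", "Parish": "EcclesiasticalArea",
}

# Per-place statements taken from the example triples.
PLACE_TYPES: Dict[str, Set[str]] = {"Sutton": {"City", "Parish"}, "Bedford": {"County"}}
LOCATED_IN: Dict[str, str] = {
    "Sutton": "Bedford", "Bedford": "England", "England": "UK",
    "UK": "Europe", "Europe": "Earth",
}


def is_a(place_type: Optional[str], ancestor: str) -> bool:
    """True if place_type equals ancestor or is a subclass of it."""
    while place_type is not None:
        if place_type == ancestor:
            return True
        place_type = SUBCLASS_OF.get(place_type)
    return False


def has_type(place: str, ancestor: str) -> bool:
    """True if any declared type of the place falls under ancestor."""
    return any(is_a(t, ancestor) for t in PLACE_TYPES.get(place, set()))


def containing_places(place: str) -> List[str]:
    """Follow locatedIn upward, nearest container first."""
    chain: List[str] = []
    while place in LOCATED_IN:
        place = LOCATED_IN[place]
        chain.append(place)
    return chain


print(has_type("Sutton", "InhabitedPlace"), containing_places("Sutton"))
```

Since a place holds a set of types, Sutton can be found both as an inhabited place and as an ecclesiastical area without any ADM-level numbering.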

Conclusion

And now that this "comment" has become an essay, a conclusion is in order. Geographic Information Systems are complex and require substantial engineering effort. Many little geographic details you might not think would be relevant to genealogy will end up being the vital clue that answers a researcher's long-standing puzzle. GedcomX would do a disservice to the community to endorse an underpowered places vocabulary. Because of the unsettled state of this area, I see no shame in punting by having GedcomX accept any place that inherits from geo:SpatialThing. If GedcomX must endorse a more specific place model, only do so after a thorough process of requirements review and research, and please build on what's already out there rather than creating Yet Another Places Vocabulary.

@EssyGreen

The city of London used to be located within the historical county of London in the early 1900's, which is why it's named London, London, England

I am aware, but over here we would say the "City of London, London, Middlesex" or "London, Middlesex" or "Westminster, London, Middlesex" (or even historically "London, Surrey") but we don't tend to say "London, London" in the same way that Americans say "New York, New York" (least I've never heard it or seen it that way).

But hey I was being picky - apologies :)

@EssyGreen

when someone enters "London, England" (or similarly "Santa Barbara, California") in their genealogy program, what place are they referring to? Do they mean the county or the city?

This is why I think we need to keep the PLAC.FORM hierarchy.

@DallanQ

No apology needed! :-) I just commented because I wanted to bring out the complexities of matching places. New York City is another great example: a city that's not in any county, containing 5 boroughs, each of which corresponds to a different county, some of whose names have changed over time... :-)

@DallanQ

@joshhansen that's interesting that Geonames is hierarchical. Their http://download.geonames.org/export/dump/GB.zip file doesn't contain this information. (It didn't 5 years ago, and I just checked again today - it still doesn't). Where are you finding this information? How can I export it?

If Geonames wants to incorporate historical data, that's really good. But it will take a lot more than adding an "isHistorical" flag. They'll need to change their data model to include alternate names (including abbreviations and common misspellings), and multiple parent jurisdictions that can vary over time. Are they willing to add this? And the big question: Do their volunteers care enough about the complexities of historical data that they are willing to research it and enter it? Merging the WeRelate data into Geonames without creating duplicates will take a fair amount of effort on their part. I'd be happy to work with them, but I think we'd need some kind of commitment from their community that they were willing to take on the burden of entering and cleaning up historical place data. Currently at WeRelate there is a community of people who are willing to do this, though the effort is slow-going. Are there enough people at Geonames who are interested?

An interesting test would be to have Geonames try to match places in real-world GEDCOM files, both to see how well they do and to gauge their interest in doing better. A list of 7M places extracted from GEDCOM files is included in the github project: https://github.com/DallanQ/Places/blob/master/tools/src/main/resources/places.txt.gz The github project includes a ComparePlaces function that they could use to compare how they standardized these places to how they are standardized in the WeRelate place database. I'd be interested in seeing a list of the differences.

Another historical place standard is the one that FamilySearch uses internally. They've had a team of people working on it for the past 30 years, but so far they have been unwilling to make it open-content. Hopefully they will change their mind soon.

I did a quick comparison between 38 place-texts that were matched differently between FamilySearch and WeRelate and found that they were comparable in the number of mistakes each made. I agree it would be better to work together on these things than to work independently. Until now I haven't found anyone who was interested in developing and improving upon an open-content historical place database. That's why I posted the one from WeRelate.

@EssyGreen

Interesting link @DallanQ but sadly adding a date perspective doesn't necessarily solve the problem. For example, in the UK a census and/or BMD certificate might record a person in one specific county, whilst a baptism record taken at the same date and place might record a different county if the geographical location was on (or close to) a boundary. In other words how the place is recorded will depend on the wider context of the record being taken.

@DallanQ

@EssyGreen of course, you have to write the place as it was recorded in the record. There will always be exceptions. At WeRelate we allow people to override what the computer suggests and display the place exactly as they want. But I hope you're not saying that dates on jurisdictional hierarchies are unimportant.

@EssyGreen

No I'm not saying that at all - just that it won't solve all the problems since there are other complexities.

@wwjohnston

I have two concerns about the proposed spec at the start of the discussion.

First, we really have to deal with streets and addresses, as well as rural land specifications (e.g. Concession and Lot in Canadian townships). The reality is that these are very important levels of place for sorting, reporting, and analysis. There is a slight concession in the above model to include wards within cities. But you really need to be able to represent the specific place for a person and their family as it appears in the records. Treating this as "note" information in this day and age is failing to model properly. It may have made sense in the original GEDCOM, but we really have to break the shackles of the old GEDCOM in this.

Second, parish or town reconstructions are a very important -- and growing -- part of what people are doing with records -- the Familien Bucher for example at http://www.ortsfamilienbuecher.de which has very impressive coverage. When a community is being examined and the families connected over the course of several hundred years, the higher levels of the geographic hierarchy really do not matter, nor do the political or ecclesiastical changes at those higher levels. What matters is what house people lived in or which nearby town they lived in. So these are inherently a very narrow scope, so that trying to shoehorn them into having to deal with all the external higher levels as the centuries went by really adds nothing at all to the record for an individual or a family -- and in fact only muddies things up unnecessarily. We have to be cognizant of this very important use of databases and assure that GEDCOM does not take a too-narrow view.

I bring this second issue up, since this forcing to deal with irrelevant higher levels is in fact what has happened with FamilySearch community trees, which are being forced into a global representation, using the 4-level hierarchy. The result is that trees that are built for local communities and have very rich information but only use the town name and house number are not acceptable because they do not have the higher levels included for every reference to a location. There is no need to enforce a globalization of community trees, but that is what FS has chosen to do. The result is that they are not able to compete with sites like the Familien Bucher who truly are addressing town or parish reconstructions at the appropriate place level and treating each database separately.

@wwjohnston

A PS on my prior comment ...

The very clumsy method of specifying places in the old GEDCOM was in high violation of data normalization, due to limits of technology that prohibited true normalization in those ancient days for any system that you actually wanted to function within that technology. Whatever is done with places, we should be moving away from forcing all that baggage into every single entered reference to a place. I am not sure how this ultimately can be done. GOV is coming up with a unique place identifier, which seems the only way to truly normalize place data. This is not an easy problem, but that does not mean we should not keep it very much in mind for the long-term vision.

@EssyGreen

@wwjohnston - some excellent points there!

Coming round full circle I think that the Place Parts in the Record Model seem to fit the bill quite well (the user/application being able to define the level of detail and the type for each segment) providing that the KnownTypes are only for convenience and not recommended and assuming that the order of the Parts implies an ascending? level of detail maybe?

The additional fields for geocoding etc (which are more relevant to contemporary interpretations) can be added as optional and/or custom fields within the Place Entity.

@EssyGreen

On a slightly different tack I think it would be useful to have a PlaceEvidence object to provide a ProofStatement that Place A = Place B (in a similar way to that described elsewhere for Persons)

@DallanQ

Would it be possible to allow the user to enter exactly what they want in a place field, with the place parts populated from the full-place text as metadata for the place? That way you wouldn't have to make your place conform to the model. You could enter a complete address, and the place parts would be filled in as appropriate with pieces of your place.

@wwjohnston

The more that I think about how we actually refer to places (and thus how letters, documents, personal histories, newspaper articles, interviews, etc. would have referred to them in the past) and also about how GEDCOM has referred to places until now, it seems there is a "natural" dividing line that people use. That line in people's thinking is the town or parish or - for rural areas - the township. It is where they think of the difference between "here" and "there", the difference between "hometown" and the rest of the world. There really are two entities to place - those within the boundaries of "here" and those that group all the "heres" at higher levels. This really is how people have come to think about places. You run into this (maddeningly) when you see a clipped-out newspaper article about a soldier in some war that tells you what street his parents live on but never mentions the town because that is the other entity, the "here" that does not need to be repeated every time you reference a place that is within "here".

So what does this have to do with GEDCOM? It means separating these two entities.

1-We keep the town-and-above entity that is how GEDCOM has worked until now, and all the standardization of place names and boundaries at different periods of time that are related to all those aggregations of "heres" that go with that entity.

2-But we also have a new entity that has never existed in GEDCOM - the entity of a place within a "here". This would certainly support recording an address from a clipped-out newspaper article without having to fabricate or guess at a "here" for it. But it would also deal with the issue that I raised earlier in this thread of addresses and specification of township/section numbers (or concessions and ranges in Canada) and also the issue of dealing with town or parish reconstructions.

The shoe-horning of every type of place information into a single entity has really gnawed at me, as being somehow unnatural and unwieldy. Separating place into two entities really works a whole lot better - both in modeling the real world and in managing place standards.

@EssyGreen

That sounds rather complex to me .... The way I see it is that we (as genealogists) are trying to re-create something that happened at a definite geographical location. How that location was/is referenced changes according to naming conventions of the time vs now; the context in which it was recorded (e.g. census district numbers); and the perception of the people concerned (as in @wwjohnston's examples above); and possibly in other ways too. As a genealogist I want source derivatives to reflect the naming conventions of the time (ie to be as accurate a reflection of the original as possible), but I also want the ability to say/"prove" that place X = place Y and to have a representation of that (which I might call place Z) where I can link together the various events/facts and other bits of information I discover about it. This is what I mean by a place entity. In old GEDCOM the place details (e.g. format, coordinates and notes) were scattered throughout so pulling them together into a single entity was a nightmare when importing a file. As long as the place field is just a single text or pointer/link in the Facts then I'm happy. Anything else should be in the Place entity.

@wwjohnston

We have to be careful to separate a place from all of the various hierarchies in which it might fall. Where I sit as I write this, I am in a specific house. That house is in a political city. It is also in a whole bunch of other jurisdictions with very different boundaries - a school district, a flood control district, a utility district, a state senatorial district, a state representative district, a US representative district, a federal reserve district, the diocese of this church, the riding of that church, the region of the Knights of Columbus, etc. All of these generate records about me and may be important to know about. But they really are all a function of the location of this house. So being able to identify a specific place on a map is what we are really trying to do.

And as @EssyGreen notes, we are sometimes faced with the issue of whether two places about which we have information are in fact one and the same, and we also need some established standard way of referring to any particular place -- all of which are again tied back to identifying a specific place on a map (a map of granularity down to at least the house level and up to the world level).

There are some basic political similarities in identification of towns or parishes or rural townships, within a relatively similar political hierarchy. This seems to be the best candidate for the standard hierarchy among all the many competing hierarchies in which any one place exists. And this is what the old GEDCOM was seeking to use -- and it was trying to allow people to do so in a time-sensitive way so that when political boundaries changed, the hierarchy changed. But the house never moved and was still at that same point on the map. The mail still got delivered there. People could still find it.

We need some sort of universal location, that is really independent of any hierarchy -- an identifier, which is the link that binds all references to it. This is what is so attractive about the GOV project, which fabricates such an identifier - at the village level. Something like this, on a worldwide scale, would once and for all nail down specifically what place is being referenced.

And yet, even that project still only considers the village as the bottom level of any place information, which of course is not an accurate reflection of how we really consider places in our lives. The village -- the "here" that I was referring to in my earlier note is a dividing line between two very different ways that we consider places. Within the village, we refer to places in the village. When we make an appointment to meet someone outside of town, the first thing we do is identify what town we will meet in. Then we start talking about the location within that town. When we are going on a trip, the airline books us to a specific airport, but once we are on the ground there we consider the places in a different way.

This is what I am seeing as necessary within GEDCOM. We need to handle the duality of places at a city and up level and also within a city (or township or parish). The lower levels can be quite dis-similar ... any westerner who has ever tried to find an address in a Japanese town has learned this lesson. So the two different clusters of locational information - within a town and the town itself as one among many - really have to support very different representations.

@EssyGreen

So being able to identify a specific place on a map is what we are really trying to do.

Absolutely :)

We need some sort of universal location, that is really independent of any hierarchy -- an identifier, which is the link that binds all references to it

Er ... isn't that just latitude and longitude? (Which I have no objection to in the Conclusion Model but have yet to see in any source Record).

We need to handle the duality of places at a city and up level and also within a city

I think the duality is a false construct. I understand what you are saying but I can't see how it would help

As a side note and as a genealogist/user I have toyed with the idea of assigning latitude and longitude coordinates to all my places. Several programs have maps built in and try to make it easy but it takes an age (given the thousands of places in my tree). At the end of the day I find the simplest thing is to go to Google Maps, find the place by examining the maps (and/or any historical maps I can), sticking a pin in and then copying the URL. If you can find an easier way I'll be impressed, meanwhile just gimme an entity I can paste my link into and I'll be happy :)

@jralls

So being able to identify a specific place on a map is what we are really trying to do.

The catch is that we seldom have a specific place on a map in our sources. We usually have what the Geographical Info Sys folks call a shape -- and with a few notable exceptions we don't even have the exact boundaries of that shape in machine-readable form that applied at the time of our source. @DallanQ touched on this when he noted that the coordinates given in the various place databases are centroids. That's good enough for showing how your family migrated over the centuries but not for deciding if two records from different times and jurisdictions refer to the same place.

@EssyGreen

The catch is that we seldom have a specific place on a map in our sources.

That's why we need investigative research ... not much different from trying to prove person x is person y it's just a place instead of a person .... and of course it could be that the researcher is researching their house not their ancestors at all ... or it could be that the researcher isn't interested in the place details at all. One man's bath water and all that.

@jralls

One man's bath water and all that.

Which seems to me to be an argument in support of @wwjohnston's split place model, with separate hierarchical/jurisdictional and "local" (e.g. street address) objects.

@EssyGreen

One man's bath water and all that.
Which seems to me to be an argument in support of @wwjohnston's split place model

It's an argument for a multiplicity of formats, one of which might be that one.

The important thing is that this sort of detail is managed in a Place entity rather than embedded within each fact.

@tfmorris

Places aren't always used just to find a map location. Often they're used to identify a jurisdiction of one type or another where one would look for a record.

I agree that a robust place model is needed, but I'd hate to see one developed from scratch solely for the genealogical community. This seems redundant and a waste of effort, in addition to making data sharing with other disciplines (local history, sociology, etc) difficult. The genealogy-specific piece is being able to store both the place name as given in the record and the researchers, possibly wrong, interpretation of it.

@DallanQ if you ever want to tackle integrating Freebase's millions of places, let me know. I'm very familiar with the data and many of the places have strong identifiers allowing trivial matching with GNIS, Wikipedia, NY Times, etc.

@EssyGreen

@tfmorris - excellent points! all of them!

@joshhansen

I have created a wiki page where we can track the current state of the discussion: https://github.com/FamilySearch/gedcomx/wiki/Place-Model

So far it has some use cases, requirements, and evaluation of one proposed solution. Please correct errors, make expansions, etc.

@wwjohnston

I think it is very important to explicitly state that "place" does not necessarily mean "town", since that is the assumption in the existing 4-level hierarchy. A place can be a house, a hospital, a cemetery, a ward, a town, etc.

@EssyGreen

@joshhansen

Can I add another couple of use cases:

  • A researcher finds her ancestors in the 1851 Census in a place called X. The same family appears to be in the same place in the 1841 Census but the place is there called Y. Further research uncovers that Y was the old place name for X (no longer used). The researcher wants to document the sources as evidence/proof that place X = Place Y and be able to reference the place by either name in future records. She also wants to narrow down the time period when the name change occurred.

  • A researcher is told by an ageing relative that she remembers her great grandmother living in a named house in a specific town but the relative can't remember exactly where it was or if it still exists. The researcher wants to keep the house and town details (albeit with the missing street details) so she can look for further evidence which might clarify where it is.

@DallanQ

@tfmorris thank you for the offer. I'll take you up on it when I get around to incorporating that data.

I wish there were another organization who was tackling the problem of creating a database of both current and historical places and who was interested in making the information open-content. Maybe GeoNames will get there eventually. Until then, I'll continue to work on and freely provide the historical place database contributed by users of WeRelate.

@DallanQ

@EssyGreen if GedcomX were to support statements like Place X = Place Y, it seems that it would need to allow each reference to a place object to store a possibly-different name for that object. So the place reference for the 1851 census event would be {name: X, id: 123}, while the place reference for the 1841 census event would be {name: Y, id: 123}, with the Place object having an id of 123 along with other structured information. That seems like a good idea.
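
A minimal sketch of that idea, assuming a Python representation: each place reference keeps the name as written in its source plus the id of a shared Place object. The class names `Place` and `PlaceRef` and the helper `same_place` are hypothetical and are not part of any GedcomX specification.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Place:
    """Shared place object; the id is only meaningful within one file."""
    id: str
    names: List[str] = field(default_factory=list)


@dataclass
class PlaceRef:
    """Reference stored on an event: the name as recorded, plus the Place id."""
    name: str
    place_id: str


def same_place(a: PlaceRef, b: PlaceRef) -> bool:
    """Two references denote the same place when they share a Place id."""
    return a.place_id == b.place_id


place = Place(id="123", names=["X", "Y"])
census_1851 = PlaceRef(name="X", place_id=place.id)
census_1841 = PlaceRef(name="Y", place_id=place.id)
```

Here `same_place(census_1851, census_1841)` is true even though the recorded names differ, which is the property the Place X = Place Y use case asks for.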

@wwjohnston

@DallanQ re "I wish there were another organization who was tackling the problem of creating a database of both current and historical places and who was interested in making the information open-content."

I hope that "someday" the new FHISO will be the home for that ... seems like the right place. But "someday" is probably well off into the future.

@EssyGreen

@DallanQ

if GedcomX were to support statements like Place X = Place Y, it seems that it would need to allow each reference to a place object to store a possibly-different name for that object

The way I see it is that a Place is very much like a Person in being the subject of research and hence should be able to have Evidence for its attributes in the same way that a Person has evidence against their Facts.

@ranbo
Owner

@DallanQ said: ...{name: X, id: 123}, {name: Y, id: 123}

Another variant on that idea is, rather than just have different strings and same place id, actually have different "sub-place IDs". "Constantinople" probably has many name variants in different languages, so it would be good to have an ID for it that allows us to know we're talking about the same "historical place". "Istanbul" also has a set of its own name variants and probably deserves its own "historical place ID". Yet a place authority also needs to know that these two "historical places" are the same "real place" (or "abstract place").

This case could be handled with 3 IDs: two for the historical places, and one for the abstract place; with proper relationships showing that the first two are historical variants of the third one.

Or it could be handled with 2 IDs, where the relationship shows that the first one became the second one. Either way, when building a search system, searching for one should find records with the other in it.

And, by the way, a new "historical place ID" might be warranted not only when the name of the place changes, but when its "ancestor chain" changes. For example, "Provo, Utah Territory, USA" became "Provo, Utah (county), Utah (state), USA". The name of the place didn't change, but the ancestor chain did, and we sometimes want to use one instead of the other, and may want to translate it, etc. So it would be helpful at times for these two "historical places" to have their own IDs; but then have one you can use for search/match/comparison to know that you're talking about the same "real"/"abstract" place (Provo). Again, this could be the "latest historical place ID" for this place, or there could be a separate ID for the abstract place that doesn't imply any particular time period (e.g., when we aren't sure which time period we're talking about).
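
As a rough illustration of the three-ID variant, the sketch below maps each "historical place ID" to one "abstract place ID" and uses that mapping to widen a search. All ids are made up for the example, and `expand_search` is a hypothetical helper, not an existing place-authority API.

```python
# Hypothetical ids: two historical places that share one abstract place.
HISTORICAL_TO_ABSTRACT = {
    "hist:constantinople": "abs:1",
    "hist:istanbul": "abs:1",
    "hist:provo-utah-territory": "abs:2",
    "hist:provo-utah-county": "abs:2",
}


def expand_search(historical_id, mapping):
    """Return every historical id that shares the abstract place of the given id."""
    abstract_id = mapping[historical_id]
    return sorted(h for h, a in mapping.items() if a == abstract_id)


matches = expand_search("hist:constantinople", HISTORICAL_TO_ABSTRACT)
```

Searching for one historical place then also finds records tagged with the other. The two-ID variant would replace the mapping with a "became" relationship between historical places.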

@DallanQ

@ranbo I agree that for a global "place authority", we need a single id with multiple names.

I think a question that could be answered here is: is GedcomX going to

(1) require that a Place object contain a global id from a place authority,
(2) allow a Place object to optionally include a global id from a place authority, or
(3) not allow a Place object to include a global id from a place authority?

Of the three options, (2) is probably my personal preference, but I'd be ok with (3) as well.

Having said that, I think a Place object also needs to have a local id that's specific to the particular file, like the current GEDCOM person and family ids. The local id is what I was thinking about when I wrote my comment. Even though a gedcom has a Place object, if we want to say within the GedcomX file that two place-texts refer to the same place (and I'm not sure how important that is especially if the GedcomX Place object can refer to a global authority as in option (2)), then place references need to include both a name and a (local) id referring to the Place object.

At the risk of making this comment too long, I think we could either have GedcomX place references be of the form:

(a) {name: Constantinople, id: local_place_id} and {name: Istanbul, id: same_local_place_id}, so both place references pointed to the same GedcomX Place object, and that object did not include a reference to an id from a global place authority, or

(b) {id: local_place_id1} and {id: local_place_id2}, where local_place_id1 was {name: Constantinople, global_place: {id: 123, provider: Geonames}} and local_place_id2 was {name: Istanbul, global_place: {id: 123, provider: Geonames}} so by having both GedcomX Place objects reference the same global place, that's how you'd determine that both referred to the same place.
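
Form (b) might look like the following sketch, assuming Python dataclasses. `GlobalPlace`, `LocalPlace`, and `refer_to_same_place` are hypothetical names, and the id 123 is just the placeholder used above, not a real Geonames identifier.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GlobalPlace:
    """Pointer into an external place authority (placeholder values only)."""
    id: str
    provider: str


@dataclass
class LocalPlace:
    """Place object local to one file, optionally linked to a global place."""
    local_id: str
    name: str
    global_place: Optional[GlobalPlace] = None


def refer_to_same_place(a: LocalPlace, b: LocalPlace) -> bool:
    """Same local object, or two local objects linked to the same global place."""
    if a.local_id == b.local_id:
        return True
    return a.global_place is not None and a.global_place == b.global_place


shared = GlobalPlace(id="123", provider="Geonames")
constantinople = LocalPlace("local_place_id1", "Constantinople", shared)
istanbul = LocalPlace("local_place_id2", "Istanbul", shared)
unlinked = LocalPlace("local_place_id3", "Istanbul")
```

Note that `unlinked` cannot be matched to anything without a global id, which is the trade-off of (b) compared with sharing one local Place object as in (a).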

@EssyGreen

allow a Place object to optionally include a global id from a place authority

+1

I think a Place object also needs to have a local id that's specific to the particular file, like the current GEDCOM person and family ids

+1

place references need to include both a name and a (local) id referring to the Place object.

We don't need the name unless you are trying to capture the "original" as well as the "concluded/processed" (place object) and if this is the case then I would argue that it would be better to allow the place object to have source references (evidence) and proof statements in the same way that a Person can.

@protozoan

GOV - Das genealogische Ortsverzeichnis (GOV - The Genealogical Gazetteer) for those (including me) who did not get the reference.

@joeflint

There are some places that can meaningfully be described as a point (lat/lon), for example grave markers and houses. Most of what we call places can only be meaningfully described as polygons, a specific area of the planet, for example the state of Utah or the Kingdom of England. The problem that genealogists and historical researchers have is that in many cases the exact shapes of the polygons are vague and can certainly change over time.

Besides the open source and genealogical communities that have already been mentioned, I know of at least one academic community (Historical GIS) working on this issue. I would suggest, as this project decides what to do in the short term, that we keep in mind that there will be advancements in this area that we will want to incorporate into the standard later.

@protozoan

Is there a place for the concept of a line? I can think of migrations by road or river. For instance, some of my family headed for Illinois, but stopped when they got to Indiana. Also, is it contemplated that point, line, or polygon data be stored within the GedcomX itself?

@thomast73

We’ve been considering the feedback we have received here and elsewhere. We've tried to incorporate this feedback. I'd like to propose the following model as a replacement for the representation of places in the GEDCOM X model:

UML diagram of the parts of the GEDCOM X model necessary to propose a new representation for places.

I would define the members of PlaceReference something like this:

  • original – if provided, the original (or what-is-known-but-not-yet-identified) value recorded with the fact/event
  • normalized – if provided, a list of selected normalized versions of the original value
    • Normalized values should include what is known of the jurisdictional hierarchy.
    • Normalized values are assumed to be given in order of preference, with the most preferred value in the first position in the list.
  • standardPlace – if provided, a reference to an instance of Place

In this model, Place would be used to represent a “standard” place. Place is a conclusion so that research and conclusions can be included via its inherited members (from Conclusion) or by providing PlaceDescriptions.
I would define the members in Place as follows:

  • normalized – if provided, a list of normalized/standardized place names for this place
    • It is RECOMMENDED that instances should have at least one name, or they should have at least one PlaceDescription and the description has at least one name.
    • Normalized values should include what is known of the jurisdictional hierarchy.
    • Normalized values are assumed to be given in order of preference, with the most preferred value in the first position in the list.
  • latitude – latitude
  • longitude – longitude
  • identifiers – if provided, known identifiers for this place (e.g., place authority identifiers)
  • placeDescriptions – a list of descriptions (snapshots) of this place including possibly its name and other details.
    • It is RECOMMENDED that instances should have at least one name, or they should have at least one PlaceDescription and the description has at least one name.
    • PlaceDescription values are assumed to be given in order of preference, with the most preferred value in the first position in the list.
  • attribution – if provided, the contributor of the information about this place

A PlaceDescription is used to describe the details of a place – possibly its name, type, time period, a geospatial description, and/or its jurisdictional parents. I think of PlaceDescription as a snapshot in time (though the temporal description is not a required aspect of the snapshot). I would describe the PlaceDescription members as follows:

  • names – a list of names for this place applicable to this snapshot
    • Values are assumed to be given in order of preference, with the most preferred value in the first position in the list.
  • type – if provided, a type identifier (e.g., address, city, etc.)
  • temporalDescription – if provided, a description of the time period to which this snapshot is relevant
  • spatialDescription – a reference to a geospatial description of this place
    • We are RECOMMENDING that these descriptions resolve to a KML document.
  • parent – if provided, a list of jurisdictional parent place names for this place
    • Values should include what is known of the jurisdictional hierarchy.
    • Values are assumed to be given in order of preference, with the most preferred value in the first position in the list.
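
To make the three types easier to compare, here is a minimal sketch of them as Python dataclasses. The member names follow the lists above, but the field types are assumptions (the UML diagram is not reproduced here), and `preferred_name` is a hypothetical helper that applies the "most preferred value first" rule.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PlaceDescription:
    """Snapshot of a place; every member is optional in this sketch."""
    names: List[str] = field(default_factory=list)
    type: Optional[str] = None
    temporal_description: Optional[str] = None
    spatial_description: Optional[str] = None  # assumed to be a URI, e.g. of a KML document
    parent: List[str] = field(default_factory=list)


@dataclass
class Place:
    """A "standard" place, possibly carrying several snapshots."""
    normalized: List[str] = field(default_factory=list)
    latitude: Optional[float] = None
    longitude: Optional[float] = None
    identifiers: List[str] = field(default_factory=list)
    place_descriptions: List[PlaceDescription] = field(default_factory=list)
    attribution: Optional[str] = None


@dataclass
class PlaceReference:
    """What a fact/event carries: original text, normalizations, optional Place."""
    original: Optional[str] = None
    normalized: List[str] = field(default_factory=list)
    standard_place: Optional[Place] = None


def preferred_name(place: Place) -> Optional[str]:
    """First normalized name, else the first name of the first named description."""
    if place.normalized:
        return place.normalized[0]
    for description in place.place_descriptions:
        if description.names:
            return description.names[0]
    return None


provo = Place(place_descriptions=[PlaceDescription(names=["Provo"], type="city")])
reference = PlaceReference(original="Provo", standard_place=provo)
```

The fallback in `preferred_name` mirrors the recommendation that a Place has at least one name either directly or through one of its descriptions.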

@daveyse

@jralls, in terms of multiple language support in @thomast73's proposal above, a TextValue has a lang attribute, so multiple, normalized names in different languages and scripts can be represented. Likewise, if temporalDescription uses Date, it can be represented in an I18n-compliant manner.

@thomast73, this is well thought out and provides both the robustness of the Conclusion objects and the potential simplicity for the "lightweight" utilization. Kudos!

@justincy

@thomast73 Why do both PlaceReference and Place have a normalized list attribute? What is the difference between them?

@thomast73

Place could describe a given place over quite a period of time and include many normalized versions of a place's name (e.g., a Place instance about Orem might include both of the following normalized names: "Orem, Utah, Utah Territory, United States" and "Orem, Utah, Utah, United States"). For a place's association with a particular event/fact (a PlaceReference), probably not all of the normalized place names held in the standard Place instance are applicable (e.g., for my 1932 death record, only the "Orem, Utah, Utah, United States" name applies). Thus, we include the option in PlaceReference to include specific normalizations so that someone can indicate which normalization(s) was applicable to the given reference.

@mikkelee

Shouldn't a place reference refer to a specific place description (ie your death certificate pointing to the "version"/description of Orem as it existed in 1932)? Orem of 1932 and Orem of today would then both link to the "abstract idea of Orem".

You can then obtain all entities linking to the place by examining what links to its place descriptions, and entities linking to a place avoid name ambiguities by only referring to that specific version of a place.
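
A small sketch of how a reference could be resolved to one dated "version" of a place, assuming each description carries an optional year range. The place name and years are made up for illustration, and `description_for_year` is a hypothetical helper.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PlaceDescription:
    """One version of a place; None for a year means open-ended on that side."""
    name: str
    start_year: Optional[int] = None
    end_year: Optional[int] = None


def description_for_year(descriptions: List[PlaceDescription], year: int):
    """Return the first description whose year range contains the given year."""
    for description in descriptions:
        starts_ok = description.start_year is None or description.start_year <= year
        ends_ok = description.end_year is None or year <= description.end_year
        if starts_ok and ends_ok:
            return description
    return None


# Made-up place and dates, purely for illustration.
versions = [
    PlaceDescription("Exampleton, Old County", end_year=1899),
    PlaceDescription("Exampleton, New County", start_year=1900),
]
record_1932 = description_for_year(versions, 1932)
```

A fact dated 1932 would then point at `record_1932` rather than at the abstract place, so the name it displays is unambiguous.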

@jralls

This seems in part to have the same problem we beat to death and beyond in #144. Reference objects should be as lightweight as possible. What's more, it's a fundamental precept of data design that duplication is bad, because not only does it waste memory but also requires significant effort to keep duplicated data synchronized.

So I suggest that PlaceReference should contain only original, PlaceDescription, and Attribution properties. Place should be a container, having only a list of PlaceDescription and an Id of some sort. PlaceDescription should provide:

TextValue name [1..*] tagged with lang
PlaceDescription* parent [1] see note below
Float latitude [0..1]
Float longitude [0..1]
Polygon spatialDescription [0..1]
Timestamp date range [0..1]
Attribution attribution [1]
URI externalReference [0..*]

IMO either name contains a hierarchy (as in Thad's "Orem, Utah, Utah, United States" example) and the importing application extracts the hierarchy if it wants to by scanning or we have a hierarchy of place references and the name contains only the local name (e.g. "Orem"). Either way, a change to the parent regardless of how it's encoded requires a new PlaceDescription.
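
The second alternative (local names plus a chain of parents) can be sketched as follows. This is only an illustration under the assumption that the top of the chain has no parent, which relaxes the [1] cardinality listed above. `full_name` is a hypothetical helper.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PlaceDescription:
    """Holds only the local name; the hierarchy lives in the parent links."""
    name: str
    parent: Optional["PlaceDescription"] = None  # None only at the top of the chain


def full_name(description: PlaceDescription) -> str:
    """Walk the parent chain and join local names from most to least specific."""
    parts = []
    node = description
    while node is not None:
        parts.append(node.name)
        node = node.parent
    return ", ".join(parts)


united_states = PlaceDescription("United States")
utah_state = PlaceDescription("Utah", united_states)
utah_county = PlaceDescription("Utah", utah_state)
orem = PlaceDescription("Orem", utah_county)
```

With this encoding `full_name(orem)` rebuilds "Orem, Utah, Utah, United States", and a change of parent means creating a new PlaceDescription for Orem rather than editing the existing one.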

ISTM one could combine latitude/longitude with spatialDescription: A polygon is usually encoded as a set of lat/long pairs which mark the vertices: Having a single pair would serve for the lat/long element, and it seems unlikely that one would want to have both a lat/long and a polygon. Earlier in this discussion we talked about "authoritative" place databases, so I've added a URL element to allow PlaceReference to point at one or more of them. I think that in most cases it's more likely that a researcher will use such a resource than to find historical boundary surveys and encode them.

@mikkelee

+1 on all above points by @jralls

ISTM one could combine latitude/longitude with spatialDescription (...)

Agreed, latitude and longitude would be redundant if there's a spatial polygon. If needed, the software can calculate the centroid of the polygon to obtain a lat/long pair (or use the single pair if only one exists in the polygon).
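
For illustration, a centroid can be computed with the standard shoelace formula. The sketch below treats (lon, lat) pairs as planar coordinates, which is an assumption that is only reasonable for small areas; `centroid` is a hypothetical helper, not part of any GedcomX library.

```python
def centroid(vertices):
    """Centroid of a simple polygon given as (lon, lat) pairs.

    A single pair is returned unchanged, matching the "single point" case.
    Coordinates are treated as planar, so this is only an approximation.
    """
    if len(vertices) == 1:
        return vertices[0]
    twice_area = 0.0
    cx = 0.0
    cy = 0.0
    count = len(vertices)
    for i in range(count):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % count]  # wrap around to close the ring
        cross = x0 * y1 - x1 * y0
        twice_area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if twice_area == 0:
        raise ValueError("degenerate polygon: zero area")
    # Standard formula divides by 6 * area, and area is twice_area / 2.
    return (cx / (3.0 * twice_area), cy / (3.0 * twice_area))


square = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
center = centroid(square)
```

For the square above the result is (1.0, 1.0). A production system would more likely rely on a GIS library and a proper projection.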

Another thing to consider is noncontinuous regions (for instance, some parishes have been "cut in half" by others); the place description polygon should have a way of supporting that.

@jralls

Another thing to consider is noncontinuous regions (for instance, some parishes have been "cut in half" by others); the place description polygon should have a way of supporting that.

Good point. There are bunches of English parishes and dioceses like that.

@thomast73

Another thing to consider is noncontinuous regions (for instance, some parishes have been "cut in half" by others); the place description polygon should have a way of supporting that.

Good point. There are bunches of English parishes and dioceses like that.

I was able to create such a region in a KML document.
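
One way such a region might be encoded (not necessarily the document referred to above) is a KML Placemark whose MultiGeometry holds one Polygon per disconnected piece. The sketch builds it with Python's standard library; the helper name `multipolygon_kml` and the coordinates are made up for illustration.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"


def _tag(name):
    """Qualify an element name with the KML namespace."""
    return "{%s}%s" % (KML_NS, name)


def multipolygon_kml(name, rings):
    """Build a KML string with one Polygon per ring of (lon, lat) pairs."""
    ET.register_namespace("", KML_NS)
    kml = ET.Element(_tag("kml"))
    placemark = ET.SubElement(kml, _tag("Placemark"))
    ET.SubElement(placemark, _tag("name")).text = name
    multi = ET.SubElement(placemark, _tag("MultiGeometry"))
    for ring in rings:
        points = list(ring)
        if points[0] != points[-1]:
            points.append(points[0])  # KML rings repeat the first point at the end
        polygon = ET.SubElement(multi, _tag("Polygon"))
        outer = ET.SubElement(polygon, _tag("outerBoundaryIs"))
        linear_ring = ET.SubElement(outer, _tag("LinearRing"))
        coordinates = ET.SubElement(linear_ring, _tag("coordinates"))
        coordinates.text = " ".join("%s,%s" % (lon, lat) for lon, lat in points)
    return ET.tostring(kml, encoding="unicode")


# Two disconnected pieces of a hypothetical parish (made-up coordinates).
west_part = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
east_part = [(3.0, 0.0), (4.0, 0.0), (4.0, 1.0), (3.0, 1.0)]
document = multipolygon_kml("Example Parish", [west_part, east_part])
```

Each disconnected part stays a simple polygon, so software that only understands single polygons can still process the pieces one at a time.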

@thomast73

ISTM one could combine latitude/longitude with spatialDescription (...)

Agreed, latitude and longitude would be redundant if there's a spatial polygon. If needed, the software can calculate the centroid of the polygon to obtain a lat/long pair (or use the single pair if only one exists in the polygon).

It is true that I can store a coordinate (lat,lon) in a KML document -- and for that matter, a lot of other information if I so desired. But are we going to require a KML document for every point?

Most of the current "authoritative" place databases (the ones I have used as a researcher) store a single coordinate for a place. If they store historical information about the place, they still only store one coordinate. If the place would have been more accurately represented as a boundary, they still only store the one coordinate. As this seems to be the typical implementation, we wanted to represent this coordinate independent of the more complex KML document, and we did not want to require the coordinate to be repeated in the data. Thus it is called out and made part of Place. We do want the option for contributors (and for "authoritative" place database implementers) to do something a bit more complex. We think the option to create PlaceDescription instances that reference KML documents allows for this. However, our thinking was that we did not want to force implementers into storing their single coordinate -- the most common case in the current industry landscape -- in a KML document.

Additional thoughts?

@mikkelee

It seems to me that the choice would be between either a KML document containing the points/polygons needed to depict the place description (where software can then extract/calculate the needed info from the KML); or an abstraction where the same data is represented directly on the place description (ie a list of polygons/points that can be serialized to KML in software if needed).

The former is probably the simplest - it can be extracted directly from the file without modification.
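
For the single-coordinate case, serializing the abstraction to KML is a small step. The sketch below uses Python's standard `xml.etree.ElementTree` and the KML 2.2 namespace; the function name `point_to_kml` and the sample values are hypothetical.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"


def point_to_kml(name: str, lat: float, lon: float) -> str:
    """Serialize one named coordinate as a minimal KML document."""
    ET.register_namespace("", KML_NS)
    kml = ET.Element(f"{{{KML_NS}}}kml")
    placemark = ET.SubElement(kml, f"{{{KML_NS}}}Placemark")
    ET.SubElement(placemark, f"{{{KML_NS}}}name").text = name
    point = ET.SubElement(placemark, f"{{{KML_NS}}}Point")
    # KML orders coordinates as longitude,latitude (the reverse of "lat, lon").
    ET.SubElement(point, f"{{{KML_NS}}}coordinates").text = f"{lon},{lat}"
    return ET.tostring(kml, encoding="unicode")
```

Calling `point_to_kml("Orem", 40.2969, -111.6946)` (illustrative coordinates) yields a document whose only geometry is a `Point`, which is the trade-off under discussion: a whole document for one pair of numbers.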

Most of the current "authoritative" place databases (the ones I have used as a researcher) store a single coordinate for a place (...)

In my opinion, the limitations of current software/data sets should not impose limitations or extra fields on the GedcomX format. If a certain place authority can only supply lat/long points, that's a problem on their end, and it can be accommodated in both options outlined above.

As an aside, the newly launched http://digdag.dk (Historical atlas of geographical & administrative subdivisions of Denmark) has polygons for each subdivision (parish, commune, county, police district, etc.) at any date from ca. 1600 to present day; these are accessible via browser or (free) API.

@jralls

My thought on authoritative place databases is that one should use a URI, though I suppose not all of them will make one available. I'm not in the least surprised that some place databases have just a lat/long, probably for the main post office or some government building. It's a lot of work to get survey data and create polygons. There do exist http://publications.newberry.org/ahcbp/downloads/index.html and http://www.census.gov/geo/www/tiger/, but they aren't always easy to find.

I wonder if we really want to go down the road of including GIS data, KML or otherwise, directly in GedcomX data. It might make more sense to say that it's a media type and use that URI field I proposed to point at it. Real-world polygons can be pretty big, and there's no point in requiring the GedcomX parser to have to deal with it. What's more, analyzing the polygons for Foo, Indiana in 1860 and Bar, Indiana in 1870 to obtain the intersection polygon which tells us where Grampa's farm was (having been enumerated in Foo in 1860 and Bar in 1870) is pretty much beyond the scope of Genealogy programs. Heck, I don't know of any that directly support even platting: One generally has to get a separate program for that.

OTOH, quite a few existing programs do include lat/long in their place descriptions, so I guess we should include that in GedcomX as well.

@thomast73

My thought on authoritative place databases is that one should use a URI...

This is the purpose of the identifiers field. We are hoping implementers can identify an authority using a type URI and identify the "place" within the authority using the value.

I wonder if we really want to go down the road of including GIS data, KML or otherwise, directly in GedcomX data. It might make more sense to say that it's a media type and use that URI field I proposed to point at it.

I think this is our intention in specifying spatialDesc as a ResourceReference: that people think of it in the manner you have articulated here.

@thomast73

@mikkelee ...the limitations of current software/data sets should not impose limitations or extra fields on the GedcomX format.

I do not disagree.

But I do not wish to make it clumsy and difficult for the existing, ubiquitous use cases either. I would like to see new place authorities like the one @mikkelee has cited, but they are currently rare. I am hopeful that the specification provides functionality sufficient for these newer players to develop and exchange more complex data. But I also want it to be easy to exchange the simpler existing data.

I see the current proposal as supporting three levels of complexity.

The simple case is the case where no place authority is referenced. Without at least a local place authority, there is no need to include instances of Place and/or PlaceDescription. Everything could be exchanged via original and normalized in PlaceReference.

The next level of complexity assumes places are represented only by their name and perhaps a point location (lat,lon). Implementations at this level use Place, but do not need PlaceDescription. I can build a very typical place authority using just the concepts made available via Place and explicitly ignoring PlaceDescription.

The third level of complexity introduces the complexity of PlaceDescription. It is our hope that this could support the more sophisticated place authority’s needs. We think it can also satisfy a sophisticated researcher’s needs.
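
One way to picture the three levels is as three shapes of exchanged data. This is a rough sketch only: the dictionary layout (`places`, `placeDescriptions`, `events`) and the function `complexity_level` are assumptions for illustration, not the GEDCOM X serialization, and the sample names and coordinates are made up.

```python
from typing import Any, Dict


def complexity_level(doc: Dict[str, Any]) -> int:
    """Classify an exchanged document into the three levels described above."""
    if doc.get("placeDescriptions"):
        return 3  # PlaceDescription instances are in use
    if doc.get("places"):
        return 2  # Place only: a name and perhaps one (lat, lon) point
    return 1      # no authority: original/normalized on the reference only


level1 = {"events": [{"place": {"original": "Salt Lake City, Utah"}}]}
level2 = {
    "places": [
        {"names": ["Salt Lake City"], "latitude": 40.76, "longitude": -111.89}
    ]
}
level3 = {
    "places": [{"names": ["Salt Lake City"]}],
    "placeDescriptions": [{"names": ["Great Salt Lake City, Utah Territory"]}],
}
```

An importer accepting all three levels needs a branch like this somewhere, which is the cost that later comments in this thread weigh against the flexibility.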

It seems to me that Place and PlaceDescription classes are for implementations where, conceptually, places are being managed by a place authority – a place database. In the case a system is going to allow users to contribute conclusional place entities, I am thinking that the resulting conclusional Place entities will become entries in a "local" place authority.

I do not have an issue saying that either Place.normalized is populated, or Place.placeDescriptions is populated, but never both. That would force implementers to be in one camp or the other.

Even with all of this, I think there is a strong case for leaving latitude and longitude as raw data members of Place.

@mikkelee Shouldn't a place reference refer to a specific place description....

@jralls Reference objects should be as lightweight as possible. What's more, it's a fundamental precept of data design that duplication is bad....

This feedback would lead to a design something like the following:

UML diagram of the parts of the GEDCOM X model necessary to propose a new representation for places.

The three tiers of complexity mostly go away with this design. Either you store a single name value (in original), or you immediately bite off the complexity of PlaceDescription. You could choose not to collect various descriptions (snapshots) about a place into historical Place groupings. But even a simple normalization scheme would end up being represented via PlaceDescription instances.

@jralls

This feedback would lead to a design something like the following:

Almost:

  • Parent should be a Place Description*, not a TextValue.
  • Place.names seems redundant with PlaceDescription.names.
  • PlaceDescription needs an attribution.

"about" is new. What's it for?

The three tiers of complexity mostly go away with this design. Either you store a single name value (in original), or you immediately bite off the complexity of PlaceDescription. You could choose not to collect various descriptions (snapshots) about a place into historical Place groupings. But even a simple normalization scheme would end up being represented via PlaceDescription instances.

Um, PlaceReferences don't support the "minimal complexity level" because they don't have a normalized value. If one source abbreviates Pennsylvania as Penn. and another spells it out, there's no way to connect them unless the importing application maintains a separate table of equivalent places. I think that's OK, because most applications do in fact have place objects and can easily enough create a hierarchy of PlaceDescriptions on export and figure out what to do with it on import.
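
The "separate table of equivalent places" mentioned above could be as small as the following sketch. The table entries and the helper names `canonical_name` and `same_place` are hypothetical and not drawn from any place authority.

```python
from typing import Dict, Optional

# Hypothetical importer-side table mapping name variants to one canonical
# spelling. The entries are illustrative only.
EQUIVALENTS: Dict[str, str] = {
    "penn.": "Pennsylvania",
    "pa": "Pennsylvania",
    "pennsylvania": "Pennsylvania",
}


def canonical_name(original: str) -> Optional[str]:
    """Return the canonical name for a variant, or None if unknown."""
    return EQUIVALENTS.get(original.strip().lower())


def same_place(a: str, b: str) -> bool:
    """Two originals match only if both resolve to the same canonical name."""
    ca, cb = canonical_name(a), canonical_name(b)
    return ca is not None and ca == cb
```

Without a normalized value on the reference, every importing application has to carry some table of this kind itself.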

I'm a bit divided about PlaceReference being a separate object incorporated by reference into Fact and Event. On the one hand I think that it separates the PlaceReference from the conclusion a bit too much; on the other hand, it might be that several conclusions derive from the same set of documents which tie them to a Place (several deeds from the same Deed Book, perhaps) and in such a case it would make sense to use one PlaceReference for all of them. Maybe that could be resolved by having PlaceReference extend Conclusion as well as the others.

@thomast73

Parent should be a Place Description*, not a TextValue

I have received push-back on this concept. So I will ask you the question that caused me to use TextValue and move away from the hierarchical structure you are desiring: What is the use case for including the hierarchy as a hierarchy? It's not likely a place authority would directly import and update their database with user contributed hierarchies. So what would an application need it for?

Place.names seems redundant with PlaceDescription.names.

If I were to list/plot Place instances, I would think it would be desirable to show a name. And we naturally want to call it by a name. For example, Salt Lake City was also called Great Salt Lake City, New Jerusalem and City of the Saints. I might have PlaceDescription instances for each of these names, but would probably still call my Place Salt Lake City and plot it as such, etc.

PlaceDescription needs an attribution.

What is your thinking here?

"about" is new. What's it for?

It is PlaceDescription's reference to its historical Place grouping -- the thing that associates a PlaceDescription with a Place.

@thomast73

Um, PlaceReferences don't support the "minimal complexity level" ...

True enough...at least not in the same way as the other design does. So that is one of the factors that needs to be considered in contrasting the two proposals. The second design does not offer the same points of flexibility. It seems to me that to be successful with the second design, I have to buy into the whole of it. To be successful using the first, I have several points of flexibility.

It's not clear what you prefer?

@thomast73

I'm a bit divided about PlaceReference being a separate object incorporated by reference into Fact and Event....

I'm not following you here.

Fact and Event would have members of type PlaceReference; they would no longer have members of type Place. Part of your comment here made me think you misunderstood this aspect of things, but I suspect I am wrong about that as well.

So perhaps you could try to re-articulate the ideas you were trying to express there?

@jralls

Parent should be a Place Description*, not a TextValue

I have received push-back on this concept. So I will ask you the question that caused me to use TextValue and move away from the hierarchical structure you are desiring: What is the use case for including the hierarchy as a hierarchy? It's not likely a place authority would directly import and update their database with user contributed hierarchies. So what would an application need it for?

Place names are hierarchical (e.g., town, county, state) in pretty much any culture advanced enough to have records. You record the hierarchy every time you address an envelope (assuming that you still do address envelopes ;-) ). There are two ways to model that hierarchy in GedcomX: It can be serialized in the name of the PlaceDescription, just as an address on an envelope does, or it can be broken down into parts with pointers. Given that PlaceDescription allows multiple names, the latter is a lot more efficient because for the former you need to permute all of the alternate names all the way up the hierarchy, which gets a bit unwieldy.
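
The permutation cost can be shown with a short sketch. The alternate names below are illustrative only, and `serialized_names` is a hypothetical helper.

```python
from itertools import product
from typing import List

# Illustrative alternate names per level, most specific level first.
levels: List[List[str]] = [
    ["Salt Lake City", "Great Salt Lake City"],  # town
    ["Salt Lake County", "Salt Lake Co."],        # county
    ["Utah", "Utah Territory", "UT"],             # state
]


def serialized_names(levels: List[List[str]]) -> List[str]:
    """Every fully-qualified name needed when the hierarchy lives in the string."""
    return [", ".join(parts) for parts in product(*levels)]


flat = serialized_names(levels)                      # 2 * 2 * 3 combinations
pointer_names = sum(len(names) for names in levels)  # each name stored once per node
```

Here the serialized form needs 12 strings where the pointer form stores 7 names, and the gap grows multiplicatively with every added level or variant.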

Why does a genealogy program need to keep track of the place hierarchy? Because records are created at different levels of the hierarchy, and those at the county or state level don't always record lower-level details. Different records may also use different hierarchies (e.g., parish and diocese vs. town and county). Genealogists often base proof arguments on a person being the only one of that name in a particular jurisdiction based on a comprehensive enumeration of the jurisdiction -- a census or a tax roll. Understanding the place hierarchy of the jurisdiction is critical to making that argument.

But in having a Parent property you're acknowledging the need for recording the place hierarchy. Why on earth would you do that with free text instead of a pointer to the parent object?

@jralls

I'm a bit divided about PlaceReference being a separate object incorporated by reference into Fact and Event....

I'm not following you here.

Fact and Event would have members of type PlaceReference; they would no longer have members of type Place. Part of your comment here made me think you misunderstood this aspect of things, but I suspect I am wrong about that as well.

So perhaps you could try to re-articulate the ideas you were trying to express there?

You drew PlaceReference as a class and used an association to connect it to Fact and Event. UML says that means that Fact and Event have PlaceReference* members, and that a particular PlaceReference can have 1 or more Facts or Events pointing at it. An XML example might be:

<PlaceReference id="prFoo">
  <!-- ... -->
</PlaceReference>

<Event id="eventBar">
  <!-- ... -->
  <place reference="prFoo"/>
</Event>

<Event id="eventBaz">
  <!-- ... -->
  <place reference="prFoo"/>
</Event>

While having members of type PlaceReference would be expressed as:

<Event id="eventBar">
  <!-- ... -->
  <place>
    <original>Somewhere, Anywhere</original>
    <standardPlace placedesc="placeDescriptionFoo"/>
  </place>
</Event>

<Event id="eventBaz">
  <!-- ... -->
  <place>
    <original>Somewhere, Anywhere</original>
    <standardPlace placedesc="placeDescriptionFoo"/>
  </place>
</Event>

There are advantages and disadvantages to each approach, and I'm not sure which is better.
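
To compare the two layouts from an importer's side, here is a minimal sketch that reads the original place string for each event in either form. The `<doc>` wrapper element, the assumption that the shared `PlaceReference` holds an `<original>` child, and the function name `originals_by_event` are all hypothetical.

```python
import xml.etree.ElementTree as ET

BY_REFERENCE = """
<doc>
  <PlaceReference id="prFoo">
    <original>Somewhere, Anywhere</original>
  </PlaceReference>
  <Event id="eventBar"><place reference="prFoo"/></Event>
  <Event id="eventBaz"><place reference="prFoo"/></Event>
</doc>
"""

EMBEDDED = """
<doc>
  <Event id="eventBar">
    <place><original>Somewhere, Anywhere</original></place>
  </Event>
  <Event id="eventBaz">
    <place><original>Somewhere, Anywhere</original></place>
  </Event>
</doc>
"""


def originals_by_event(xml_text: str) -> dict:
    """Map each Event id to its original place string, in either layout."""
    root = ET.fromstring(xml_text)
    shared = {pr.get("id"): pr for pr in root.iter("PlaceReference")}
    result = {}
    for event in root.iter("Event"):
        place = event.find("place")
        ref = place.get("reference")
        # By-reference layout needs a lookup; embedded layout does not.
        holder = shared[ref] if ref is not None else place
        result[event.get("id")] = holder.findtext("original")
    return result
```

Both inputs produce the same mapping; the by-reference form stores the string once but requires a resolution step, while the embedded form repeats it and needs none.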

@jralls

Um, PlaceReferences don't support the "minimal complexity level" ...

True enough...at least not in the same way as the other design does. So that is one of the factors that needs to be considered in contrasting the two proposals. The second design does not offer the same points of flexibility. It seems to me that to be successful with the second design, I have to buy into the whole of it. To be successful using the first, I have several points of flexibility.

It's not clear what you prefer?

I prefer a single data model. To successfully import the first design, you have to be able to parse every permutation of the flexibility. That's a lot of extra code for AFAICT zero gain.

@jralls

PlaceDescription needs an attribution.

What is your thinking here?

That PlaceDescription is an important conclusion, and the association between PlaceDescription and Place an even more important one. A single Place is likely to have many PlaceDescriptions, so it makes more sense to put the attribution on the PlaceDescription than the Place.

@jralls

"about" is new. What's it for?

It is PlaceDescription's reference to its historical Place grouping -- the thing that associates a PlaceDescription with a Place.

Then call it place and make it a Place*.

@thomast73

But in having a Parent property you're acknowledging the need for recording the place hierarchy. Why on earth would you do that with free text instead of a pointer to the parent object?

Yes! I do acknowledge the need for recording the place hierarchy. So the question is really about the need to record a hierarchy in a hierarchy.

In all of the applications I can think of, recording a place involves either typing free text, or making a selection from a list provided by that application's place authority. The data is not provided/selected one node in the hierarchy at a time; rather, they type or select a string that expresses the entirety of the desired hierarchy. If the hierarchy was typed, to convert it to a hierarchy requires an expert system to parse the string and build the hierarchy -- with different systems producing different answers. If the hierarchy was selected, the implementer already has a proprietary representation -- maybe a hierarchy, maybe not. Any hierarchical structure in the back end is generally not expressed and/or understood at the front end -- the user typed a string and sees a string. If it is, it is meaningful only in the context of that application (e.g., its search form) and only relative to its own place authority.

When data is exchanged, what does it mean to exchange a hierarchy? On the receiving end, I am interested in classifying incoming places within my own application's hierarchy. The incoming identifiers would be key to this mapping. But would I really want to update my place authority based on data I received if it did not map successfully to a known place in my place authority? I have received input from two potential implementers saying "No way!" and suspected that most or all would agree. At most, they would put the places that did not map into their authority into a "local" authority.
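
A receiver following that policy might look like the sketch below. The authority URI, the ids, and the function `classify_incoming` are made up for illustration; the only point is that imported data never modifies the main authority.

```python
from typing import Dict, List, Tuple

Identifier = Tuple[str, str]  # (authority type URI, value within that authority)

# Hypothetical receiver-side authority keyed by incoming identifier.
MY_AUTHORITY: Dict[Identifier, str] = {
    ("http://example.org/authority", "12345"): "local-place-1",
}


def classify_incoming(
    identifiers: List[Identifier],
    local_authority: List[List[Identifier]],
) -> str:
    """Map an incoming place to a known place, or park it in a "local" authority."""
    for key in identifiers:
        if key in MY_AUTHORITY:
            return MY_AUTHORITY[key]
    # No match: record it locally instead of touching MY_AUTHORITY.
    local_authority.append(identifiers)
    return f"unmapped-{len(local_authority)}"
```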

Then there is the fact that hierarchies cannot be navigated in a deterministic way. If I point my event at a description of Orem, what hierarchy do I mean? As I walk the hierarchy to Utah county I am confronted with a choice of parents in Utah county: either Utah Territory or Utah state; how do I know which parent is applicable to my event? And what about overlapping hierarchies (e.g., overlapping civil and ecclesiastical hierarchies)? If I constrain the hierarchical structure a bit more (change parents[0..*] to parent[0..1] so there is no possibility of multiple parents), I can make hierarchies that can be navigated deterministically and therefore express original intent; but all nodes that have overlapping hierarchies would have to be duplicated. Is that desirable?
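
The ambiguity can be reproduced with a few lines. The graph below is a hypothetical `parents[0..*]` structure for the Orem example, with names used as plain labels; `paths_to_root` is an illustrative helper.

```python
from typing import Dict, List

# Illustrative parents[0..*] graph; each place lists every parent it has had.
PARENTS: Dict[str, List[str]] = {
    "Orem": ["Utah County"],
    "Utah County": ["Utah Territory", "Utah (state)"],
    "Utah Territory": ["United States"],
    "Utah (state)": ["United States"],
    "United States": [],
}


def paths_to_root(place: str) -> List[List[str]]:
    """Enumerate every chain of parents from a place up to a root."""
    parents = PARENTS[place]
    if not parents:
        return [[place]]
    return [[place] + rest for p in parents for rest in paths_to_root(p)]
```

`paths_to_root("Orem")` returns two chains, and nothing in the graph itself says which one an event meant. Restricting every node to a single parent would leave one chain, at the price of duplicating Utah County under each parent.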

I am hoping that the model (using TextValue) focuses on recording places with their hierarchies without requiring applications to express hierarchies (and their complexities) in a hierarchy.

@thomast73

Referring to @jralls's post

You drew PlaceReference as a class and used an association to connect it to Fact and Event. UML says that means that Fact and Event have PlaceReference* members, and that a particular PlaceReference can have 1 or more Facts or Events pointing at it.
...
While having members of type PlaceReference would be expressed as
...

Your example made the ideas clear.

It has been my understanding that a dashed, directed line in a UML diagram is meant to represent a "depends on" relationship. I did not think it was meant to represent an association.

By declaring place in Fact and Event to be of type PlaceReference, I am trying to represent the second type of relationship.

By declaring standardPlace in PlaceReference to be of type ResourceReference (and not of type PlaceDescription), I am trying to represent the first type of relationship. The same applies to PlaceDescription.about.

You seem to distinguish the two cases by appending an '*' to the end of the name. I am not familiar with this notation. Where does this come from?

Hopefully, you now understand what I was trying to express a bit better, though apparently our vocabularies are a bit out of synch. Sorry for the confusion.

My reasoning for wanting Event and Fact to have members of type PlaceReference is that original seems very closely related to the record that produces that fact or event, and that if I were to record an original, it would be because I was creating a fact or event that was "extracted" from that record. It also seemed less likely that the same original value would need to be repeated over and over in the data -- that it would be recorded only rarely and only in the record extraction case.

@jralls

Yes! I do acknowledge the need for recording the place hierarchy. So the question is really about the need to record a hierarchy

Yup, that's the question.

In all of the applications I can think of, recording a place involves either typing free text, or making a selection from a list provided by that application's place authority.

Reunion and RootsMagic fit the former, and Family Tree Maker the latter. The Master Genealogist and Gramps use a form with fields for each place-part.

If the hierarchy was typed, to convert it to a hierarchy requires an expert system to parse the string and build the hierarchy

Yes.

with different systems producing different answers

That would be strange. Few people even think beyond the postal hierarchy (in the US) of City, County, State. (More experienced genealogists will also recognize Enumeration District, Supervisory District, Township, County, State of the Census, and a bunch of variations on Parish, Diocese for churches.)

I am hoping that the model (using TextValue) focuses on recording places with their hierarchies without requiring applications to express hierarchies (and their complexities) in a hierarchy.

If you want to go that way, that's fine. The Parent field is then redundant, and the entire hierarchy should be represented in the name TextValue.

@jralls

You seem to distinguish the two cases by appending an '*' to the end of the name. I am not familiar with this notation. Where does this come from?

C originally, though it's used in quite a few languages. Foo* means "pointer to object of type Foo". C++ has a specialized version, Foo& meaning "reference to object of type Foo".

@jralls

My reasoning for wanting Event and Fact to have members of type PlaceReference is that original seems very closely related to the record that produces that fact or event, and that if I were to record an original, it would be because I was creating a fact or event that was "extracted" from that record. It also seemed less likely that the same original value would need to be repeated over and over in the data -- that it would be recorded only rarely and only in the record extraction case.

Yeah, that's what I'm of two minds about. The "original" string needs to be tied closely to a source, which in turn argues for it to be in a PlaceDescription that's a single-source conclusion and nowhere else. Attaching it to a Fact or Event which may or may not be single-source increases the distance, especially if it's a separate object. But I don't think we want to go down the road of aggregating PlaceDescriptions into a hierarchy à la #134, so I can see having a lower-order class to represent it.

I'm not so sure about the "recorded only rarely" part. I suppose if you're working through a Census page and recording all of the inferred events and explicit facts then you have a single source referring to a single place, and you'd want to point to the PlaceReference from each of those events and facts. After all, "County of Utah, City of Orem" is written only once, at the top of the page. On the other hand, if you take a family-full of birth certificates, each of them might spell "Orem, Utah Co., Utah" the same way, but is it correct in that case to use the same PlaceReference for all 14 birth certificate sources? What if one of them has a typo so it's "Oerm, Utah Co., Utah"? It gets messy.

The other face is that a lot of facts and events are documented by multiple sources, and those different sources may record the place name differently. In that case the careful researcher will write an AnalysisDocument to go between the sources and the event, but then there won't be an "original" value for the PlaceReference. In that case (which I think in the application data interchange usage will be much more common) the PlaceReference is an extra layer of indirection.

@jralls

It has been my understanding that a dashed, directed line in a UML diagram is meant to represent a "depends on" relationship. I did not think it was meant to represent an association.

Technically correct, but in practice a non-association dependency implies that the dependent class knows too much about the other class's implementation. That's not a feature.

@jralls

When data is exchanged, what does it mean to exchange a hierarchy? On the receiving end, I am interested in classifying incoming places within my own application's hierarchy. The incoming identifiers would be key to this mapping.

Yes, that's correct. GedcomX would specify per-locale and per-hierarchy identifiers, or perhaps punt to the "scheme" thing from... well, I can't find it, as usual. Somewhere we decided to let implementers define parts of something-or-other and publish their scheme via RDF. That probably went away with the rest of RDF. Anyway, that approach is going to be the basis for all importer-exporters: Converting native data structures to GedcomX data structures and back again.

But would I really want to update my place authority based on data I received if it did not map successfully to a known place in my place authority? I have received input from two potential implementers saying "No way!" and suspected that most or all would agree. At most, they would put the places that did not map into their authority into a "local" authority.

That's independent of the representation of the hierarchy: They're rightly concerned about polluting a public resource with input from an untrusted source. Family Tree Maker has an out-of-band procedure for submitting changes to the "authority", which is an appropriate way of handling it.

@mikkelee

I prefer a single data model. To successfully import the first design, you have to be able to parse every permutation of the flexibility. That's a lot of extra code for AFAICT zero gain.

+1 A major annoyance of Gedcom 5.5 is that things can be represented multiple ways (embedded NOTE vs NOTE reference, etc).

The second design above is much cleaner in my eyes.

@thomast73

@jralls: I prefer a single data model. To successfully import the first design, you have to be able to parse every permutation of the flexibility. That's a lot of extra code for AFAICT zero gain.

@mikkelee: +1 A major annoyance of Gedcom 5.5 is that things can be represented multiple ways (embedded NOTE vs NOTE reference, etc).

The second design above is much cleaner in my eyes.

I understand the arguments proffered in favor of a design that can't be interpreted in lots of different ways. I also agree that the second design has less room to wiggle; it seems to be more of an all-or-nothing design. I'm being swayed...but I do worry implementers will feel a bit strong-armed or constrained by the design. I will test the idea with a few other resources on this end.

@thomast73: I am hoping that the model (using TextValue) focuses on recording places with their hierarchies without requiring applications to express hierarchies (and their complexities) in a hierarchy.

@jralls: If you want to go that way, that's fine. The Parent field is then redundant, and the entire hierarchy should be represented in the name TextValue.

I hear you. I will test a design that omits parents.

@jralls: PlaceDescription needs an attribution.

@thomast73: What is your thinking here?

That PlaceDescription is an important conclusion, and the association between PlaceDescription and Place an even more important one. A single Place is likely to have many PlaceDescriptions, so it makes more sense to put the attribution on the PlaceDescription than the Place.

I will add an attribution to PlaceDescription; an attribution will continue to be part of Place.

@thomast73

You seem to distinguish the two cases by appending an '*' to the end of the name. I am not familiar with this notation. Where does this come from?

C originally, though it's used in quite a few languages. Foo* means "pointer to object of type Foo". C++ has a specialized version, Foo& meaning "reference to object of type Foo".

Well. It seems that I am familiar with that notation. I just haven't used it or seen it in so long that it's no longer part of the syntactical and semantical interpreter in my head. :-)

@jralls

I'm being swayed...but I do worry implementers will feel a bit strong-armed or constrained by the design

Strong-arming is what specifications are for. ;-)

Design is all about balancing constraints. There are lots of bad designs

@jralls

Well. It seems that I am familiar with that notation. I just haven't used it or seen it in so long that it's no longer part of the syntactical and semantical interpreter in my head.

Time to crawl out of your Java cocoon and do some real coding! ;-)

@thomast73

@jralls: More experienced genealogists will also recognize Enumeration District, Supervisory District, Township, County, State of the Census, and a bunch of variations on Parish, Diocese for churches.

I want to see address become a more accepted and tracked "place" as something regularly tracked at a finer-grain than city. I am hoping implementers can do any of the above with the defined structure.

@jralls

I want to see address become a more accepted and tracked "place" as something regularly tracked at a finer-grain than city. I am hoping implementers can do any of the above with the defined structure.

I don't see any reason why they can't. It's just a comma-separated string. It can hold any sort of address... the hard part will be parsing it back out when it comes from unprompted human input rather than from pick-lists based on a place authority.
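
The easy half of that, splitting the string, is sketched below. The helper `split_place` is hypothetical, and keeping empty parts as placeholders is an assumed convention, not something the proposal specifies.

```python
from typing import List


def split_place(name: str) -> List[str]:
    """Split a comma-separated place string into parts, most specific first.

    Empty parts are kept as "" so that positions stay meaningful.
    """
    return [part.strip() for part in name.split(",")]
```

`split_place("123 Main St, Orem, Utah Co., Utah")` gives four parts, but nothing in the string says that the first is a street address rather than a village. Assigning a type to each part is the hard step, and it needs either a pick-list or a place authority.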

@thomast73

Here is the latest iteration of the proposal for representing place data in GEDCOM X.

UML diagram of the parts of the GEDCOM X model necessary to propose a new representation for places.

The members of PlaceReference:

  • original – if provided, the original (or the known-but-not-identified) value
  • standardPlace – if provided, a reference to the PlaceDescription that best represents this place (the "standard" place) in the context of the given reference
    • if provided, MUST resolve to an instance of PlaceDescription

The members of PlaceDescription:

  • about – if provided, a reference to the place that this description is about
    • if provided, MUST resolve to an instance of Place
  • names – a list of normalized/standardized, fully-qualified names for this place applicable to this snapshot
    • The list MUST include at least one value
    • It is RECOMMENDED that instances include a single name and any equivalents from other cultural contexts; name variants ought to be described in separate PlaceDescription instances
    • Normalized values are expected to include what is known of its applicable jurisdictional hierarchy.
    • Normalized values are assumed to be given in order of preference, with the most preferred value in the first position in the list.
  • type – if provided, a type identifier (e.g., address, city, etc.)
  • temporalDescription – if provided, a description of the time period to which this snapshot is relevant
  • spatialDescription – if provided, a reference to a geospatial description of this place
    • It is RECOMMENDED that these descriptions resolve to a KML document.
  • attribution – if provided, the contributor of this place description

The members of Place:

  • names – if provided, a list of normalized/standardized place names associated with this place
    • It is RECOMMENDED that instances provide at least one name.
    • Normalized values are assumed to be given in order of preference, with the most preferred value in the first position in the list.
  • latitude – latitude
  • longitude – longitude
  • identifiers – if provided, known identifiers for this place (e.g., place authority identifiers)
  • attribution – if provided, the contributor of the information about this place
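
The three member lists above can be restated as a rough data-structure sketch. This is not the GEDCOM X implementation: the Python class layout, the use of plain strings for references and attributions, and the snake_case field names are simplifying assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Place:
    names: List[str] = field(default_factory=list)  # RECOMMENDED: at least one
    latitude: Optional[float] = None
    longitude: Optional[float] = None
    identifiers: List[str] = field(default_factory=list)
    attribution: Optional[str] = None


@dataclass
class PlaceDescription:
    names: List[str]                           # MUST include at least one value
    about: Optional[str] = None                # reference; MUST resolve to a Place
    type: Optional[str] = None
    temporal_description: Optional[str] = None
    spatial_description: Optional[str] = None  # reference, e.g. to a KML document
    attribution: Optional[str] = None

    def __post_init__(self) -> None:
        if not self.names:
            raise ValueError("PlaceDescription.names needs at least one value")


@dataclass
class PlaceReference:
    original: Optional[str] = None
    standard_place: Optional[str] = None  # reference; MUST resolve to a PlaceDescription
```

Only the "at least one name" rule is enforced here; checking that `about` and `standard_place` resolve to the right types would need access to the whole document.
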
@mikkelee

I like this version so far.

One thing that comes to mind is identifiers on PlaceDescriptions. DigDag that I mentioned earlier has unique idents for each variation on a place. I would like to be able to link a specific PlaceDescription to a place authority in addition to linking the abstract Place to an authority.

@thomast73

We have converted this issue into a pull request, and we have started adding code to implement these changes. I will be working on this over the next several days. Thanks to all of you for your input on this issue.

@mikkelee I would like to be able to link a specific PlaceDescription to a place authority in addition to linking the abstract Place to an authority.

Okay. We can be swayed on this. :-) I've added it.

@thomast73

I have added an initial cut at updating the conceptual model specification document to reflect the proposed changes.

@jralls jralls commented on the diff
specifications/conceptual-model-specification.md
((57 lines not shown))
-`http://gedcomx.org/v1/Place`
+`http://gedcomx.org/v1/PlaceDescription`
+
+### extension
+
+This data type extends the following data type:
+
+`http://gedcomx.org/v1/Conclusion`
+
+### properties
+
+name | description | data type | constraints
+------|-------------|-----------|------------
+about | A uniform resource identifier (URI) for the place being described. | [URI](#uri) | OPTIONAL.
@jralls
jralls added a note

about needs a "must resolve to an instance of type Place" restriction.

Fixed. Thanks for the point-out! :-)

specifications/conceptual-model-specification.md
((58 lines not shown))
-`http://gedcomx.org/v1/Place`
+`http://gedcomx.org/v1/PlaceDescription`
+
+### extension
+
+This data type extends the following data type:
+
+`http://gedcomx.org/v1/Conclusion`
+
+### properties
+
+name | description | data type | constraints
+------|-------------|-----------|------------
+about | A uniform resource identifier (URI) for the place being described. | [URI](#uri) | OPTIONAL.
+names | A list of standardized (or normalized), fully-qualified (in terms of what is known of the applicable jurisdictional hierarchy) names for this place that are applicable to this description of this place. | List of [http://gedcomx.org/v1/TextValue](#text-value). Order is preserved. | REQUIRED. The list MUST contain at least one name.
+type | A uniform resource identifier (URI) identifying the type of the place as it is applicable to this description. | [URI](#uri) | OPTIONAL.
@mikkelee
mikkelee added a note

Where are valid type values enumerated?

We do not plan to identify this vocabulary as part of this work. I would invite you to open an issue to discuss this once we have closed this issue. Thank you for your attention and input on this issue. :-)

@mikkelee

Okay. We can be swayed on this. :-) I've added it.

Thanks :-)

@thomast73

I updated the serialization specifications (for JSON and XML).

I think all of the changes required to implement the proposed model modifications are now part of this pull request. Anything else I am missing? I hope to merge this pull request on Monday.

@stoicflame
Owner

+1

Nice job, @thomast73!

@jralls

Looks pretty good. The longitude description is a bit tortured, maybe you could make it look like the one for latitude. They both appear to specify integer degrees. I think for simplicity of serializing we could specify decimal degrees (i.e., a double). You could easily imply that by changing, e.g., -90 to -90.0.
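
To illustrate the decimal-degrees suggestion, here is a minimal Python sketch of a range check on floating-point coordinates. The function name is hypothetical, and the longitude bounds of -180.0 to 180.0 are the conventional ones, assumed here rather than quoted from the spec text.

```python
def is_valid_coordinate(latitude: float, longitude: float) -> bool:
    """Return True if both values are plausible decimal degrees.

    Assumption: latitude lies in [-90.0, 90.0] and longitude in
    [-180.0, 180.0]; the spec wording may differ.
    """
    return -90.0 <= latitude <= 90.0 and -180.0 <= longitude <= 180.0


# Decimal degrees for a point in Utah (illustrative values only).
orem_ok = is_valid_coordinate(40.2969, -111.6946)
# A latitude beyond the pole is rejected.
bad_latitude = is_valid_coordinate(90.5, 0.0)
```

Treating both values as doubles means a serializer never has to decide between integer and fractional forms.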

@thomast73 thomast73 merged commit 0fc147b into master
@thomast73

We have received some additional feedback that has caused us to modify our design slightly. The result: we will move the members that were part of Place into PlaceDescription -- denormalizing the model in hopes we are making the model simpler and easier to use. The resulting model looks something like this:

[Image: UML diagram of the parts of the GEDCOM X model necessary to propose a new representation for places.]

I will not open another pull request to manage this change, but will implement it directly on the master branch. Thanks again!!!

@stoicflame
Owner

A summary of the proposed change is to eliminate the Place data type and denormalize its data on the PlaceDescription.

While I don't have any objection to working off master, that doesn't mean we're closing off discussion. Any comments or concerns with the simplification proposal are welcome.

@jralls

Technical comment: PlaceDescription.about no longer has anything to point to.

The main problem I see with this is that it loses the ability to record the conclusion that e.g., Orem, Utah, Utah and Orem, Utah, Utah Territory are the same place unless the id field is allowed to be not unique so that both PlaceDescriptions carry the same id. OK, I suppose one could also link on lat/long, but for that to work one must include it in every place description, and it must either be exactly the same or there must be some sort of heuristic for "close enough to be considered a match". That could get ugly.

Can you summarize the reasoning that convinced you that this is really a significant and worthwhile change?

@thomast73

Technical comment: PlaceDescription.about no longer has anything to point to.

The main problem I see with this is that it loses the ability to record the conclusion that e.g., Orem, Utah, Utah and Orem, Utah, Utah Territory are the same place unless the id field is allowed to be not unique so that both PlaceDescriptions carry the same id. OK, I suppose one could also link on lat/long, but for that to work one must include it in every place description, and it must either be exactly the same or there must be some sort of heuristic for "close enough to be considered a match". That could get ugly.

True -- it is no longer a reference. Instead it will be an identifier; perhaps one could consider it to be like a place "tag". PlaceDescriptions that share the same about identifier could be aggregated as being the same place. Thus, the description for Orem, Utah, Utah can have a unique id, the description for Orem, Utah, Utah Territory can have a unique id, but both descriptions would have the same about value.
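
As a rough sketch of that aggregation (in Python, with a hypothetical `PlaceDescription` class and made-up `urn:example:` identifiers, none of which come from the spec), a receiver could bucket descriptions by their about value:

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class PlaceDescription:
    """Hypothetical stand-in for the GEDCOM X type; only three fields shown."""
    id: str                      # unique per description
    names: List[str]             # normalized names, most preferred first
    about: Optional[str] = None  # shared "tag" identifying the place


def group_by_about(descriptions: List[PlaceDescription]) -> Dict[str, List[PlaceDescription]]:
    """Collect descriptions that carry the same about value."""
    groups: Dict[str, List[PlaceDescription]] = defaultdict(list)
    for description in descriptions:
        if description.about is not None:  # untagged descriptions are skipped
            groups[description.about].append(description)
    return dict(groups)


descriptions = [
    PlaceDescription("pd1", ["Orem, Utah, Utah"], about="urn:example:place:orem"),
    PlaceDescription("pd2", ["Orem, Utah, Utah Territory"], about="urn:example:place:orem"),
    PlaceDescription("pd3", ["Provo, Utah, Utah"], about="urn:example:place:provo"),
]
same_place = group_by_about(descriptions)
```

Here `pd1` and `pd2` keep their own ids but land in the same bucket because their about values match.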

The de-normalization of the Place latitude and longitude values into PlaceDescription does result in the possibility that same latitude and longitude could appear in multiple PlaceDescription instances, but it is not our intent that latitude and longitude become an identifier.

Can you summarize the reasoning that convinced you that this is really a significant and worthwhile change?

We received feedback that indicated that the model was more complex than perhaps it needed to be. This change eliminates a class at the cost that latitude and longitude may need to be duplicated. We also received feedback that this change would impose fewer requirements on implementers. It should not be hard to generate the data from a more "normal" form, and the normalization should be easily re-established on the receiving end.

@mikkelee

Technical asides about the changes while I digest the implications:

  • There should be restrictions and/or a standard method of resolution when there are multiple PlaceDescriptions with the same about identifier and which have different/missing lat/long-pairs. Is this invalid? Should the first/last one in the file be used?
  • about should IMO probably be renamed to something like placeID or similar to reflect its new meaning.
@stoicflame
Owner

There should be restrictions and/or a standard method of resolution when there are multiple PlaceDescriptions with the same about identifier and which have different/missing lat/long-pairs. Is this invalid? Should the first/last one in the file be used?

Can you explain why an application would care? Why can't descriptions of the same place have different lat/longs? What problems does this cause? What's the harm in just leaving that resolution undefined in the spec?

about should IMO probably be renamed to something like placeID or similar to reflect its new meaning.

So, just to be clear, it's not really a new meaning. But I don't mind reconsidering the naming of that property, either. On the other hand, there is precedent for the use of the name about. It's the term that RDF uses to identify resources being described.

@jralls

It's the term that RDF uses to identify resources being described.

Another reason to change it! ;-)

Seriously, RDF was at best an implementation detail that never belonged in the conceptual spec. Besides, class member names should indicate the intent of that particular object or parameter, not parrot the role name from some abstract pattern. The abstract pattern information goes in the documentation.

@jralls

So, just to be clear, it's not really a new meaning.

Yes, it is. It used to point to an object of class Place, and now it's just a tag that can be used to group PlaceDescriptions. A URI made sense for the former, but does not for the latter, and "about" does nothing to convey what the member is supposed to contain.

@stoicflame
Owner

Yes, it is. It used to point to an object of class Place, and now it's just a tag that can be used to group PlaceDescriptions.

Okay, so I concede a slight change of the meaning of the property. It used to be an identifier that resolves to a Place that MAY be used as a tag to correlate PlaceDescriptions. Now it's a tag to correlate PlaceDescriptions that (as a URI) MAY be used to resolve to an instance of a place, the model and representation of which is not defined in the spec.

Look, I'm fine with changing the name. All I'm saying is that a group of people smarter than me had the same concept that they needed to support and they used the term about to define it. There is something to be said about re-using terms across industries to reduce communication friction with people who are new to our design and model.

And if we do change the name, I want it changed for SourceDescription, too.

@thomast73

There should be restrictions and/or a standard method of resolution when there are multiple PlaceDescriptions with the same about identifier and which have different/missing lat/long-pairs. Is this invalid? Should the first/last one in the file be used?

Can you explain why an application would care? Why can't descriptions of the same place have different lat/longs? What problems does this cause? What's the harm in just leaving that resolution undefined in the spec?

I had been thinking about this before @mikkelee posted these questions. It is the logical conclusion for anyone who knows the normalized version of the design and compares it with the current incarnation. The current design de-normalizes latitude and longitude; but the primary reason they were in Place in the previous design was our expectation that the majority of system implementers store a single latitude and longitude for a given place, regardless of the amount of historical metadata they are associating with that place. In order to re-normalize that data about a place after transmission, we would want to assume (or need to require) that latitude and longitude be the same across all PlaceDescription instances associated with that place. I think most implementations will want to re-normalize those two values. The spatialDescription is the infrastructure designed to allow variance in thumb-tack location if that is the desired outcome.

+1

I would be in favor of saying something like:

  • "It is assumed that all instances of PlaceDescription that have identical place identifiers -- i.e., PlaceDescription.about is the same value among these instances -- also have identical latitude and longitude values."
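
A receiver that wants to verify this assumption before re-normalizing could run a check along these lines. This is only a Python sketch: the dictionary keys mirror the property names discussed here, the function name is hypothetical, and treating a missing latitude/longitude as a mismatch is an assumption of the sketch, not something the proposed wording settles.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Set, Tuple

Coordinates = Tuple[Optional[float], Optional[float]]


def find_coordinate_conflicts(descriptions: Iterable[dict]) -> Dict[str, Set[Coordinates]]:
    """Return about values whose descriptions disagree on latitude/longitude.

    Assumption: a description with no coordinates counts as (None, None),
    so it conflicts with a sibling that does carry coordinates.
    """
    seen: Dict[str, Set[Coordinates]] = defaultdict(set)
    for description in descriptions:
        about = description.get("about")
        if about is None:
            continue  # nothing to correlate without a place identifier
        seen[about].add((description.get("latitude"), description.get("longitude")))
    return {about: coords for about, coords in seen.items() if len(coords) > 1}


conflicts = find_coordinate_conflicts([
    {"about": "urn:example:place:a", "latitude": 40.3, "longitude": -111.7},
    {"about": "urn:example:place:a", "latitude": 40.3, "longitude": -111.7},
    {"about": "urn:example:place:b", "latitude": 55.7, "longitude": 12.6},
    {"about": "urn:example:place:b"},  # coordinates missing
])
```

An implementation could log the conflicting groups, or pick one pair by its own policy, since the spec text leaves resolution to the implementer.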
@mikkelee

Yeah, my point was pretty much what @thomast73 outlines above. I'm fine with that text, as it makes it clear how to serialize, which means mismatches will only happen in mangled transmissions that implementers can special-case if they need to.

@jralls

And if we do change the name, I want it changed for SourceDescription, too.

+1

@thomast73

I have updated the constraints for latitude and longitude.

Regarding the naming of the about member: We hear you. For now, we are leaving the about name as it is. We would like to request that one of you open a separate issue, and would suggest that you seed the discussion of that issue with a naming proposal that includes the naming of both of the about members (one in PlaceDescription and one in SourceDescription), and that your proposal expresses the related semantics that belong to those class members.

Commits on Nov 6, 2012
  1. @thomast73
  2. @thomast73
Commits on Nov 7, 2012
  1. @thomast73
  2. @thomast73
  3. @thomast73

    Adjustments to naming and serialization to conform to patterns already in use in other parts of the model.

    thomast73 authored
Commits on Nov 8, 2012
  1. @thomast73

    First cut at updating the conceptual model specification to reflect the changes we are making to the representation of places in GEDCOM X.

    thomast73 authored
  2. @thomast73

    Updated the diagram to reflect a change in the name of the reference to a place description in PlaceReference.

    thomast73 authored
  3. @thomast73
  4. @thomast73
Commits on Nov 9, 2012
  1. @thomast73

    Updates to the serialization specifications to reflect the changes/additions being made to the representation of place metadata in GEDCOM X.

    thomast73 authored
Commits on Nov 10, 2012
  1. @thomast73