record model: align the notion of "fact" as applied to the conclusion model #97

Closed

stoicflame
Member

Since the merge of Event and Characteristic to Fact in the conclusion model, a significant schism has existed between the conclusion model and the record model that needs to be mended. This pull request proposes the following concepts to realign the two models.

  • A Fact is a generic piece of historical data applicable to a person or relationship. The fact applies only within the context of the person or relationship that contains it, although it may refer to external resources such as dates, places, and events.
  • An Event is a historical event. It's its own entity, so resources can reference it to model, for example, persons who participated in the event.
  • The notion of an event is really important in the record model because events are core to the purpose of a record. Records are created to chronicle events.
  • We don't have the notion of an event in the conclusion model--yet. At least, we don't have use cases that are well-defined enough to model it confidently. We'll add an Event entity to the conclusion model when those use cases are fleshed out; until then, we'll be satisfied with facts to model things that occurred in the life of a person or relationship.
  • In the record model, we collapse Characteristic and EventRole to Fact, and allow the Fact to reference a record Event to describe the event-specific concepts (e.g. date, place). The value of the Fact is used to describe the role or other relation to the event. We add the notion of principal to Fact in the record model to describe whether the fact is a principal fact of the record.
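
As a rough illustration of the record-model shape described in the last bullet, here is a minimal Python sketch. The class and field names (`RecordEvent`, `RecordFact`, `event_ref`, `principal`, `event_for`) and the sample values are assumptions made for this example, not the project's actual types or data.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class RecordEvent:
    """A record-local event; it carries the event-specific data (date, place)."""
    id: str
    type: str
    date: Optional[str] = None
    place: Optional[str] = None


@dataclass
class RecordFact:
    """A fact on a persona; it has no date or place of its own."""
    type: str
    value: Optional[str] = None      # role or other relation to the event
    event_ref: Optional[str] = None  # id of a RecordEvent in the same record
    principal: bool = False          # is this a principal fact of the record?


@dataclass
class Persona:
    name: str
    facts: List[RecordFact] = field(default_factory=list)


@dataclass
class Record:
    events: Dict[str, RecordEvent] = field(default_factory=dict)
    personas: List[Persona] = field(default_factory=list)

    def event_for(self, fact: RecordFact) -> Optional[RecordEvent]:
        """Resolve a fact's event reference within this record."""
        if fact.event_ref is None:
            return None
        return self.events.get(fact.event_ref)


# Illustrative data only: a birth record with one persona.
record = Record()
record.events["e1"] = RecordEvent(id="e1", type="Birth", date="1900", place="Example Town")
child = Persona("Child", [RecordFact(type="Birth", value="Child", event_ref="e1", principal=True)])
record.personas.append(child)

resolved = record.event_for(child.facts[0])
```

In this sketch the date and place are reachable only by following the fact's reference to the event, which is the lookup the proposal implies for consumers of record data.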

@ranbo
Contributor

ranbo commented Nov 29, 2011

So when we want to list the events of a person in a record, we would look at their facts, and either show the type, date and place that are right in the fact; or else follow an event role (stored in the Value) to find a shared Event, and get the type, date and place from that?

Does Event really need to be a globally-addressable "entity", or could it be a record-local shared object like it has been in the past? Is a Value well suited for pointing to an Event, or would it be better to revert to using EventRole when a person plays a role in a shared Event?

It doesn't seem like we would want to create an Event even for primary events (like a birth in a birth record), but only for events that are really shared (like census events for a household, although one could argue that each line is its own "mini-event" and doesn't need to be shared). Other than census, we haven't found a need for shared events yet, though I wouldn't be surprised if one came up.

We wrestled with this before: should events be in the person or couple relationship, allowing the only role in the event to be "principal" (but conveniently putting the event right where it is almost always needed); or do we put events out at the record level and have everyone play a "role in event" (less convenient, but more flexible); or allow both (usually convenient, but then code has to look in both places for events)?

@ranbo
Contributor

ranbo commented Nov 29, 2011

My biggest concern here is the self-inconsistency. We're telling the world that events and characteristics have now been combined into facts; and then introducing a new concept called Event (not to be confused with the old one we used to call Event).

Is "Event" supposed to be used only when we think it will be shared? Or is the idea to use it for the primary event in most records?

I still wouldn't call the decision to combine Event and Characteristic into Fact to be final until this issue is settled. My vote would still be:

  1. Revert to having Event and Characteristic separate. Allow both to have date and place (often empty for characteristic), in both the conclusion and record model. Use GENTECH's definition to distinguish them: "A 'characteristic' describes a person, and can be a physical characteristic, a personality trait, or more diffuse data such as occupation. A characteristic is generally a descriptive fact that applies to the person in question over a reasonably long period of time." And buck up and put things in one or the other category when it's fuzzy, or even allow a few types to go in both categories if absolutely necessary.
  2. Store events on the person and couple relationships in most cases and let the role of "principal" be implied.
  3. If (or when) shared events are important, re-introduce "role in event" and allow it to point to an event in the record. Don't bother making events globally addressable from outside the record until a really strong use case requires it, and then revisit how to do that.
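
The three points above could be sketched roughly as follows, in Python for illustration. The names (`Event`, `Characteristic`, `EventRole`, `Person`, `events_of`) and the sample data are assumptions made for this example, not an existing API.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Event:
    type: str
    date: Optional[str] = None
    place: Optional[str] = None


@dataclass
class Characteristic:
    type: str
    value: str
    date: Optional[str] = None   # allowed, but often empty
    place: Optional[str] = None  # allowed, but often empty


@dataclass
class EventRole:
    """Only needed when an event is shared; points at an event in the record."""
    role: str
    event: Event


@dataclass
class Person:
    name: str
    events: List[Event] = field(default_factory=list)  # role "principal" is implied
    characteristics: List[Characteristic] = field(default_factory=list)
    event_roles: List[EventRole] = field(default_factory=list)


def events_of(person: Person) -> List[Tuple[str, Event]]:
    """List every event a person took part in, paired with the role played."""
    own = [("principal", e) for e in person.events]
    shared = [(r.role, r.event) for r in person.event_roles]
    return own + shared


# Illustrative data only: a census event shared by two people in a household.
census = Event("Census", date="1880")
head = Person(
    "Head of household",
    events=[Event("Birth", date="1850")],
    characteristics=[Characteristic("Occupation", "Farmer")],
    event_roles=[EventRole("Head", census)],
)
wife = Person("Wife", event_roles=[EventRole("Wife", census)])
```

Note that `events_of` has to consult two lists, which is the "look in both places" cost of allowing both person-local and shared events.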

I realize there was some grief in how to classify events vs. characteristics, but the strangeness of having "value" for events seems worse; we have to decide which ones are events anyway to decide on the right UI to display; it is much more natural to talk about events and characteristics in everyday conversations; and having "shared events" in the model makes way more sense if there are also "regular events".

@stoicflame
Member Author

My biggest concern here is the self-inconsistency. We're telling the world that events and characteristics have now been combined into facts; and then introducing a new concept called Event (not to be confused with the old one we used to call Event).

Actually, I think we're telling the world that there have always been two things that we've called "events". One is the thing that is shared and one is the thing that describes how the thing that is shared applies to a person and relationship. We're proposing disambiguating the two concepts by naming the former "event" and using "fact" to model the latter.

I don't think that is self-inconsistent.

So when we want to list the events of a person in a record, we would look at their facts, and either show the type, date and place that are right in the fact; or else follow an event role (stored in the Value) to find a shared Event, and get the type, date and place from that?

No.

In the record model, there is no date and place on fact. Everything that has a date and a place on a record is considered to be an event. If a person plays a particular role in an event, then a fact is applied (referencing the event) to account for the role the person played in the event.

Does Event really need to be a globally-addressable "entity", or could it be a record-local shared object like it has been in the past?

In the record model, Event is not a globally addressable entity. It's record-local, but shared. Like SoRD 1.0.

Is "Event" supposed to be used only when we think it will be shared? Or is the idea to use it for the primary event in most records?

Event is used to model any events on the record.

Consider, for example, a christening record that contains the name of the child, the names and birth dates of both the father and mother, and a name of a godfather. There would be four personas and three events (christening, birth of the father, and birth of the mother). The christening event would be the "primary" event. All four personas would reference the christening event, but only the child would reference the christening as the "principal" event for the persona. Only the father would reference the "birth of the father" event. Only the mother would reference the "birth of the mother" event.
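
A small Python sketch of the christening example may make the references concrete. `RoleFact`, `personas_in`, and `principals_of` are hypothetical names for this illustration, and the role labels are assumptions; the example above does not say whether the parents' birth facts would be flagged as principal, so they are left unflagged here.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class RoleFact:
    """A fact on a persona that references a record event."""
    persona: str
    event: str
    role: str                # illustrative label, stored in the fact's value
    principal: bool = False


EVENTS = ["christening", "birth of the father", "birth of the mother"]

FACTS = [
    RoleFact("child", "christening", "Child", principal=True),
    RoleFact("father", "christening", "Father"),
    RoleFact("mother", "christening", "Mother"),
    RoleFact("godfather", "christening", "Godfather"),
    RoleFact("father", "birth of the father", "Child"),
    RoleFact("mother", "birth of the mother", "Child"),
]


def personas_in(event: str) -> List[str]:
    """Personas that reference the given event, sorted by name."""
    return sorted(f.persona for f in FACTS if f.event == event)


def principals_of(event: str) -> List[str]:
    """Personas that reference the given event as their principal event."""
    return sorted(f.persona for f in FACTS if f.event == event and f.principal)
```

With this data, all four personas reference the christening, only the child references it as principal, and each parent's birth event is referenced by that parent alone.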

It doesn't seem like we would want to create an Event even for primary events (like a birth in a birth record), but only for events that are really shared (like census events for a household, although one could argue that each line is its own "mini-event" and doesn't need to be shared).

So I think what you're saying is that you don't like the way that SoRD 1.0 did it. We tried it with SoRD 1.0 and it didn't work out very well. Is that right?

@ranbo
Contributor

ranbo commented Nov 29, 2011

So in the record model:

  • events are Events and are attached to people via a Fact that serves as a "role in event";
  • and characteristics are Facts, and characteristics/Facts can't have date or place.

But in the conclusion model:

  • events and characteristics are both Facts, and both can have dates and places.

Is that the proposal?

@stoicflame
Member Author

Is that the proposal?

Yeah, pretty much. At least, that's the practical application of the proposal.

My hesitation is that part of the proposal is the story that we're trying to tell. What we're really trying to say in the conclusion model isn't that "events are facts", but that "facts" are used to describe stuff that occurred in the life of the person. We haven't (yet) modeled the notion of a "conclusion event".

@carpentermp

In closing #85 @stoicflame wrote:

Since it's become clear that this specific pull request isn't viable, I'm closing this and opening #97.

How was it determined that it wasn't viable? There didn't appear to be any discussion on the thread--it must have all happened out of band. This does a disservice to those following the threads, prevents full involvement, and fails to keep a complete record of how we arrived at decisions.

I would like to challenge the decision that #85 is not viable and invite someone to document the rationale here. If we can get all the reasons outlined, that will give people a chance to answer them. @stoicflame alluded to the reasons here:

An Event is a historical event. It's its own entity, so resources can reference it to model, for example, persons who participated in the event.
The notion of an event is really important in the record model because events are core to the purpose of a record. Records are created to chronicle events.

Important in what way? Every model is an abstraction and a simplification. What we choose to model, and how we choose to model it, depend on how we intend to make use of the information. In a system that is all about places, (e.g. our place authority) places would be entities. In the Record model, they are not.

So to say, "events are really important--they ought to be entities" is not enough. You have to be able to make the case for why. I am currently not aware of any use cases that would argue for events being entities. On the contrary, the way we want to consume event data makes it generally more convenient to consume as information "about a persona" or "about a relationship."

I made this case in #84. I'll include parts of it here to refresh everyone's memory:

I can see how you might see an advantage to having Events out at the Record level where you can get the complete list of events all at once. However, it's more common to consume events from the point of view of the person(s) who played the principal role(s) in the event. For example, the birth event isn't very interesting in isolation. It's only interesting as the birth of a given Persona. So, for these Events, it would actually be more handy to have the Event logically part of the Persona. The same goes for nearly all other events--there is actually an advantage to having them logically part of the Persona or Relationship that they apply to.

If you remember, when we were coming up with SoRD we vacillated over this very issue and came down--very narrowly--on the side of Events at the Record level with Persona references to them. At the time we had no idea how often event sharing like this might occur and so we didn't know how much weight to assign to this argument, but it seemed safer to take this approach. In the years since that decision we have modeled more than a billion records from more than 500 collections. To my knowledge, in all that time we have only ever shared one kind of event among a group of people--census events. Ironically, we weren't even sharing these events until about a month ago when we fixed a bug in the census record stitching code. Prior to that time, each Persona had his own census event and no one noticed or cared enough to report the issue.

Now, let me give my thoughts about the pull request itself.

As a preface, let's go over how these changes envision the model working. There are a couple of possible approaches (which actually highlights one of the main problems that exist with the proposed model):

  1. Approach 1: All events are modeled as Events; all characteristics are modeled as Facts.
  2. Approach 2: All "primary events" (events that caused a record to be created) are modeled as Events; all other events and characteristics are modeled as Facts.

@stoicflame, I presume you intended Approach 2? The stated purpose of the change is an attempt to bring the Record model and Conclusion model into alignment with each other. Alas, whichever of the two above approaches is taken, this purpose is not accomplished.

Now, it may be argued "at least the models are closer; Characteristics are now Facts (and possibly many Events as well)." Unfortunately, the model still suffers from most of the same problems as before, and has a very significant new problem:

  • Before: EventType + CharacteristicType = FactType. Now: FactType is a superset of EventType. Whenever we add an EventType we have to add a corresponding FactType. Consumers of the data are still forced to map between these two types. Also, now that the Record model has Facts, this funky relationship between EventType and FactType allows for both approach 1 and approach 2, or any mix of the 2. Yuck.

  • Fact = Event + Fact. When an Event is brought from the Record world into the Conclusion world, it has to be converted to a Fact. The same is necessary when comparing information between the two models. Suppose the system wants to answer this question for the user: "Does this Record support the 'birth conclusion' on this Person?" To answer it, the system will have to look in two places. It will have to look for birth Facts on the person (assuming Approach 2), and it will have to look for "role in event" facts. In the latter case, the system will have to compare the Record's birth Event with the Person's birth Fact, but a birth Event has no "value", whereas a birth Fact may. So (just as before these proposed changes) we have created two models where the core building blocks of information are different.

  • Conceptual dissonance. We had this before the proposed changes, and we have it still. What is new is that (assuming Approach 2) the Record model is not even consistent with itself. The same kind of information is modeled in two different ways e.g. birth events are sometimes Events, sometimes Facts. This is reeeeally ugly and confusing.

  • Fact in Record model now has ResourceReference. This is problematic for a couple of reasons:

    • There is no ResourceReference on Fact in the Conclusion model, so this creates an additional incompatibility.
    • The ResourceReference only makes sense for Facts that are actually "roles in an event". For other facts, it is just a wart. We already have warts on Fact (some Facts don't have values, some don't have a date and place). Fact is becoming a kitchen sink.

Bottom line: it is hard for me to see how the proposed changes improve the situation. The Record and Conclusion models are not brought into alignment, and the Record model now suffers from split personality disorder.
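
To make the cross-model comparison from the "Fact = Event + Fact" bullet concrete, here is a rough Python sketch under one reading of the proposal. Every name in it (`ConclusionFact`, `RecordEvent`, `RecordFact`, `EVENT_TO_FACT_TYPE`, `record_supports`) is an assumption for illustration, the sample data is made up, and the exact-equality match on date and place is a deliberate simplification.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ConclusionFact:
    type: str
    date: Optional[str] = None
    place: Optional[str] = None
    value: Optional[str] = None  # has no counterpart on a record event


@dataclass
class RecordEvent:
    type: str
    date: Optional[str] = None
    place: Optional[str] = None


@dataclass
class RecordFact:
    type: str
    value: Optional[str] = None
    event_ref: Optional[str] = None


# The EventType-to-FactType mapping a consumer would have to maintain.
EVENT_TO_FACT_TYPE = {"Birth": "Birth", "Christening": "Christening"}


def record_supports(
    conclusion: ConclusionFact,
    persona_facts: List[RecordFact],
    events: Dict[str, RecordEvent],
) -> bool:
    """Check whether any event referenced by the persona matches the conclusion."""
    for fact in persona_facts:
        event = events.get(fact.event_ref) if fact.event_ref else None
        if event is None:
            continue  # a fact with no event reference carries no date or place
        if EVENT_TO_FACT_TYPE.get(event.type) != conclusion.type:
            continue
        if event.date == conclusion.date and event.place == conclusion.place:
            return True
    return False


# Illustrative data only.
events = {"e1": RecordEvent("Birth", date="1900", place="Example Town")}
persona_facts = [RecordFact("Birth", value="Child", event_ref="e1")]
birth = ConclusionFact("Birth", date="1900", place="Example Town")
supported = record_supports(birth, persona_facts, events)
```

The comparison has to hop from the persona's fact to the record event and translate the event type before it can be matched against the conclusion fact, and the conclusion fact's `value` is never compared because the event has nothing to compare it with.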

@stoicflame
Member Author

How was it determined that it wasn't viable? There didn't appear to be any discussion on the thread--it must have all happened out of band.

Whoa there. I was commenting about the original pull request, not the commit at df0f216. I assure you that there was no "out of band" decision made. If such a decision had been made, wouldn't it have been just committed to master? If it makes you feel better, I'll open up a pull request for df0f216, but I was trying to consolidate the latest ideas and proposals into a single, newer thread.

I presume you intended Approach 2?

Umm... no... I think Approach 1 is more accurate, with the addition that all "roles in events" are described as facts.

FactType is a superset of EventType. Whenever we add an EventType we have to add a corresponding FactType. Consumers of the data are still forced to map between these two types.

Umm... no... FactType is its own set. EventType is its own set. Sure, they share a lot of types, but that doesn't imply any superset/subset relationship.

When an Event is brought from the Record world into the Conclusion world, it has to be converted to a Fact.

Umm... no... there's no such thing as "bringing an event from the record world to the conclusion world". We haven't (yet) modeled events in the conclusion world.

Suppose the system wants to answer this question for the user: "Does this Record support the 'birth conclusion' on this Person?" To answer it, the system will have to look in two places. It will have to look for birth Facts on the person (assuming Approach 2), and it will have to look for "role in event" facts.

Umm... no... just look for the birth facts on the person. Done. What's a "role in event" fact?

In the latter case, the system will have to compare the Record's birth Event with the Person's birth Fact, but a birth Event has no "value", whereas a birth Fact may.

Umm... what's "the latter"?

The same kind of information is modeled in two different ways e.g. birth events are sometimes Events, sometimes Facts.

Umm... no... birth events are events. Birth facts are facts. Where's the conceptual dissonance?

Fact in Record model now has ResourceReference.

Umm... just to be more clear, fact in record model now has a field named "event" that is a resource reference.

There is no ResourceReference on Fact in the Conclusion model, so this creates an additional incompatibility.

What's incompatible? We could put an event reference in the conclusion model, too, but there's nothing for it to point to (yet). So why not wait to add it until we have added the notion of a "conclusion event"?

The ResourceReference only makes sense for Facts that are actually "roles in an event". For other facts, it is just a wart.

The majority of facts will be "roles in an event". And to call it a "wart" for the others is a bit exaggerated, I think. It's just an unused field.

some Facts don't have values, some don't have a date and place... Fact is becoming a kitchen sink.

Actually, no facts have date and place. And, again, the large majority have values. For the others, it's just an unused field. Fact is hardly a "kitchen sink".

@ranbo
Contributor

ranbo commented Nov 29, 2011

I think Approach 1 is more accurate, with the addition that all "roles in events" are described as facts...

and then

What's a "role in event" fact?

It sure seems like the first statement answers the question.

birth events are events. Birth facts are facts. Where's the conceptual dissonance?

I think the conceptual dissonance is self-evident in those two statements, because both "birth events" and "birth facts" are talking about the same real birth event. It is probably helpful here to use qualifiers on the word "event" to avoid talking past one another, since we have 3 types of events we're talking about: "shared record events", "facts on persons that tell what role that person played in a shared record event" and "facts on conclusion persons that represent events". Using the word "event" to mean only "shared events in records" is probably contributing to the confusion.

@carpentermp

Thanks @ranbo for chiming in. You have hit upon the crux. A way of asking it is, "What do I call the information that describes where and when a person was born?" In the Conclusion model it is called a Fact of FactType.Birth. In the Record model it is called an Event of EventType.Birth. Two different model concepts for the same real-life happening. When traversing the two models, a mapping has to be made.

I think Approach 1 is more accurate

Sorry @stoicflame, it appears that I guessed wrong about which approach you meant with the proposed model change. (Again, the fact that such a misunderstanding was possible is a problem by itself.) In that case, the issues I have with the proposed changes are mostly the same ones I have with things as they stand today. By changing Characteristic to Fact you have alleviated some of the problem, but introduced another: many FactTypes should never be used in the Record model, specifically those for which there are EventTypes. (Before, none of the FactTypes were used, because the Record model had no Facts.) It will likely be very confusing for users looking at the FactType list to have to grok that, while all the FactTypes are used in the Conclusion model, only some of them are used in the Record model.

Actually no facts have date and place.

I believe you meant that no facts in the Record model have date or place. In the Conclusion model they all do--another incompatibility between the two models. (I didn't catch this until your comment--I assumed Fact would have date and place in both models)

Whoa there. I was commenting about the original pull request, not the commit at df0f216. I assure you that there was no "out of band" decision made. If such a decision had been made, wouldn't it have been just committed to master? If it makes you feel better, I'll open up a pull request for df0f216, but I was trying to consolidate the latest ideas and proposals into a single, newer thread.

Again, I misunderstood what you were saying was not viable. You can understand my misapprehension: at one point you were saying you would add a "primary" flag to Fact to cover the "primary event" case; the next thing I saw, you had resurrected Event in the model. My other misapprehension--that Events were only for "primary events"--is also understandable because of the rationale you gave for the pull request:

The notion of an event is really important in the record model because events are core to the purpose of a record. Records are created to chronicle events.

Records are created to chronicle primary events, not other events. Perhaps it would be a good idea to submit a pull request for df0f216 since it is really a very different and competing proposal to this one.

@stoicflame
Member Author

Please see #99 as the proposal for putting this issue to rest.

@stoicflame
Member Author

This issue has been put to rest with application of #99. Thanks for everybody's efforts to get us moving forward again.

@stoicflame stoicflame closed this Dec 8, 2011