proposal to refactor "extracted" flag to accommodate additional states #251

stoicflame · 2013-05-08T20:39:31Z

In an effort to identify potential backwards-incompatible changes that may need to be applied in preparation for our milestone 1 release, I keyed in on the extracted flag. The intent of the extracted flag is to identify a conclusion as "extracted" from a single source. I think this is an important state to identify, but I'm wondering whether a conclusion has more than just two "states". I'm thinking there are three: "extracted," "hypothetical" (the conclusion is a working hypothesis), and "accepted" (the conclusion has been formally accepted based on the evidence we have, and there should be a proof statement in the form of an analysis document).

This pull request proposes refactoring the extracted flag into a property named researchState, whose data type is an enumerated value. I renamed Section 4 to "Research States"; in addition to the "extracted conclusion" constraints, it now includes the "known research states" enumeration, which is defined as follows:

| URI | Description |
|---|---|
| `http://gedcomx.org/Extracted` | The conclusion was extracted from a single source. The conclusion MUST conform to the Extracted Conclusion Constraints. |
| `http://gedcomx.org/Hypothetical` | The conclusion is a working hypothesis. |
| `http://gedcomx.org/Accepted` | The conclusion has been formally accepted. The conclusion SHOULD provide a reference to an analysis document that is used to provide the genealogical proof statement. |
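To make the proposal concrete, here is a minimal sketch of how a consumer might represent the proposed enumeration. The class names `ResearchState` and `Conclusion`, the field `analysis_ref`, and the helper `state_warnings` are hypothetical and not part of the model; only the three URIs come from the table. The sketch checks just the SHOULD rule for Accepted, since the Extracted Conclusion Constraints are not spelled out here.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class ResearchState(Enum):
    # Values are the URIs from the "known research states" table.
    EXTRACTED = "http://gedcomx.org/Extracted"
    HYPOTHETICAL = "http://gedcomx.org/Hypothetical"
    ACCEPTED = "http://gedcomx.org/Accepted"


@dataclass
class Conclusion:
    # Hypothetical minimal shape: only the fields this sketch needs.
    research_state: Optional[ResearchState] = None
    analysis_ref: Optional[str] = None  # reference to an analysis document


def state_warnings(c: Conclusion) -> List[str]:
    """Return advisory warnings for the SHOULD-level rule in the table."""
    warnings = []
    if c.research_state is ResearchState.ACCEPTED and c.analysis_ref is None:
        warnings.append("Accepted conclusion has no analysis document reference")
    return warnings
```

Parsing a URI is then `ResearchState("http://gedcomx.org/Hypothetical")`, which raises `ValueError` for a URI outside the known set; an implementation that must tolerate custom URIs would need to keep the raw string instead.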

Comments are welcome.

jralls · 2013-05-13T19:10:42Z

+1

mikkelee · 2013-05-13T20:54:08Z

+1

thomast73 · 2013-05-14T19:43:09Z

Initially, I thought I agreed with this proposal. However, I was unhappy with the proposed property name.

To me, we are not talking about research states. It is true that research can be in a hypothetical (or working) state or in an accepted (or proven) state, but what does it mean to have research in an extracted state?

Hence my search for a better property name, which I have yet to discover.

A researcher might create a conclusional Subject for one of several research purposes. This issue has identified three such purposes:

- to represent extracted information found in a single source
- to represent a working hypothesis: research in progress, with conclusional data still being analyzed and tested
- to represent an accepted conclusion: completed (presumably proven) research

As I have thought about this further, however, I am inclined to revert toward only two Subject types:

- subjects that represent extracted information found in a single source
- subjects that represent potential answers to research questions (hypotheses)

Right now, the model distinguishes these two cases via the state of the extracted flag.

In the case of a hypothesis, perhaps we can talk about research state. I could say my hypothesis is a working or an accepted hypothesis. But consider the following use case:

I have a Person for which I have identified a name, birth, and death. I feel that the death is “proven”. I am still working on the birth. An exact name might be un-provable as neither the person nor the parents were literate and the records identified thus far are inconclusive.

I know the person exists, but not every aspect of the person ought to be “accepted”. Still, perhaps I am personally “done” working on it, so it is “acceptable” to me. Under the proposed Subject-based scheme, is this person in a Hypothetical or an Accepted state? The answer is not straightforward.

What I would probably really want for Subject is some sort of publishable status (e.g., published, ready-to-publish, working, private). Perhaps such a status is related to the concerns raised in #175?

If instead we make the scheme Conclusion-based (meaning we add a property to Conclusion), we can state the death to be proven, the birth to be a working conclusion, and give the name some other status (maybe untenable?). The researchState property name makes sense for such a per-Conclusion status, but the property should only be set in cases when the Conclusion is part of a hypothesis; it would not make sense when I am modeling extracted information.
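A per-Conclusion status could be sketched as follows, under stated assumptions: the values `proven`, `working`, and `untenable` are taken from the example above and are not proposed URIs, and the names `ConclusionStatus`, `Subject`, `Conclusion`, and `validate` are hypothetical. The rule encoded is the one stated in the comment: the status is only meaningful when the Subject is a hypothesis, not when it models extracted information.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class ConclusionStatus(Enum):
    # Hypothetical per-Conclusion values taken from the example in the comment.
    PROVEN = "proven"
    WORKING = "working"
    UNTENABLE = "untenable"


@dataclass
class Conclusion:
    label: str
    research_state: Optional[ConclusionStatus] = None


@dataclass
class Subject:
    extracted: bool  # the existing boolean flag
    conclusions: List[Conclusion] = field(default_factory=list)


def validate(subject: Subject) -> List[str]:
    """Report any researchState set on a Subject that models extracted information."""
    if not subject.extracted:
        return []
    return [
        f"{c.label}: researchState is not allowed on extracted information"
        for c in subject.conclusions
        if c.research_state is not None
    ]
```

The person from the use case above (death proven, birth still working, name untenable) would be a non-extracted `Subject` with three conclusions, and `validate` returns an empty list for it.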

I still wonder whether we would be better off turning extracted from a boolean into an enumeration, in case there is some need we are not seeing clearly, but I would start with a two-state enumeration: one value to indicate that the Subject contains extracted information and one to indicate that it represents a hypothesis. But I still have no good name for this property...
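Replacing the boolean with a two-state enumeration could look like the sketch below. The name `SubjectKind` is only a placeholder (the thread has not settled on a property name), the string values are illustrative rather than proposed URIs, and treating an absent flag as `False` is an assumption of this sketch.

```python
from enum import Enum
from typing import Optional


class SubjectKind(Enum):
    # Placeholder name: no property name has been agreed on.
    EXTRACTED = "extracted"    # information extracted from a single source
    HYPOTHESIS = "hypothesis"  # a potential answer to a research question


def kind_from_flag(extracted: Optional[bool]) -> SubjectKind:
    """Map the existing boolean flag onto the two-state enumeration.

    An absent flag (None) is treated like False, which is an assumption here.
    """
    return SubjectKind.EXTRACTED if extracted else SubjectKind.HYPOTHESIS
```

The design point is that a later third value could be added to the enumeration without changing how existing data maps, whereas a boolean has no room to grow.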

nilsbrummond · 2013-05-21T16:09:47Z

I think research state is an important idea, but I think it should apply to a specific work subset (e.g., a single source analysis, a GPS analysis, or Final Conclusions), including the Document, the extracted conclusions, and the source list.

I have generally been thinking of an N-tier approach with 3 tiers in the common case. T1: Single Source Analysis; T2: GPS Analysis using T1; T3: Conclusion space as the union of all T2.

I see 3 research states, plus a fourth that may only be theoretical:

- Started
- Unresolved
- Completed
- New_Evidence_Requires_Re_Evaluation_Of_The_Analysis

When you create a new source analysis or GPS analysis, it should be in the Started state. If there is NOT enough information to make conclusions, the user sets it to Unresolved (forever, or until more evidence turns up). If there is enough evidence to make conclusions, the user sets the state to Completed when finished with the work subset. Then, whenever a work subset is modified, the system shall change the state of it and of all other "things" that depend on it to New_Evidence_Requires_Re_Evaluation_Of_The_Analysis.

New_Evidence_Requires_Re_Evaluation_Of_The_Analysis may not be an actual state, but it is at least a theoretical state that must be possible, either explicitly or through analysis of modification timestamps along the dependency chain.
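The explicit variant of these transitions can be sketched as a small state machine over a dependency graph. Everything here is hypothetical (`Workspace`, `add`, `set_state`, `modify`, and the subset names are illustrative); it only shows that new subsets start as Started, the user sets Unresolved or Completed, and a modification flags the modified subset plus everything downstream of it.

```python
from collections import deque
from enum import Enum
from typing import Dict, Iterable, List, Set


class State(Enum):
    STARTED = "Started"
    UNRESOLVED = "Unresolved"
    COMPLETED = "Completed"
    NEEDS_REEVALUATION = "New_Evidence_Requires_Re_Evaluation_Of_The_Analysis"


class Workspace:
    """Hypothetical container of work subsets (source analyses, GPS analyses, ...)."""

    def __init__(self) -> None:
        self.state: Dict[str, State] = {}
        self.dependents: Dict[str, List[str]] = {}  # name -> subsets that use it

    def add(self, name: str, depends_on: Iterable[str] = ()) -> None:
        # A new analysis always begins in the Started state.
        self.state[name] = State.STARTED
        self.dependents.setdefault(name, [])
        for dep in depends_on:
            self.dependents.setdefault(dep, []).append(name)

    def set_state(self, name: str, state: State) -> None:
        # Set by the user (e.g. Unresolved or Completed); this sketch does not
        # enforce which transitions are legal.
        self.state[name] = state

    def modify(self, name: str) -> None:
        """Flag the modified subset and every transitive dependent (breadth-first)."""
        seen: Set[str] = {name}
        queue = deque([name])
        while queue:
            current = queue.popleft()
            self.state[current] = State.NEEDS_REEVALUATION
            for nxt in self.dependents.get(current, []):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
```

For example, with two single source analyses feeding a GPS analysis that feeds the final conclusions, modifying one source analysis flags it, the GPS analysis, and the final conclusions, while the other source analysis stays Completed.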

An analysis is only valid if it is completed after all work products that it depends on.
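The timestamp-based alternative can be sketched as a recursive check. The function name `is_valid` and the dictionary layout are assumptions, timestamps are any comparable numbers, and the dependency graph is assumed to be acyclic (otherwise the recursion would not terminate). A dependency modified at the same instant as the completion counts as not "after", so it invalidates the analysis.

```python
from typing import Dict, List, Optional


def is_valid(
    name: str,
    depends_on: Dict[str, List[str]],
    modified_at: Dict[str, float],
    completed_at: Dict[str, Optional[float]],
) -> bool:
    """Valid only if completed, not modified since completion, completed strictly
    after each dependency's last modification, and every dependency is valid."""
    done = completed_at.get(name)
    if done is None or modified_at[name] > done:
        return False
    for dep in depends_on.get(name, []):
        if modified_at[dep] >= done:
            return False
        if not is_valid(dep, depends_on, modified_at, completed_at):
            return False
    return True
```

Because validity is checked down the whole chain, touching a single source analysis after a GPS analysis was completed invalidates both the GPS analysis and any final conclusions built on it, matching the propagation described above without storing an explicit state.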

stoicflame · 2013-05-21T19:47:32Z

Okay, after the explanations and comments submitted by @thomast73 and @nilsbrummond, I'm going to back off this proposal. @thomast73 offered a reasonable argument to keep the notion of "extracted" separate from the notion of "research states".

At this point, I have no intention of proposing that the concept of research states be formalized in the first version of the conceptual model, because:

- I don't think we have enough certainty that we'd get it right given the current state of the industry.
- There isn't enough demand for the concept in existing products.
- I think research states can be added in a later version without backwards-incompatible changes.

defining a set of research states of a conclusion

ba95162

stoicflame closed this May 21, 2013

stoicflame mentioned this pull request May 21, 2013

support for modeling "negative" statements #127

Closed
