proposal to refactor "extracted" flag to accommodate additional states #251

Closed
wants to merge 1 commit into from

stoicflame
Member

In an effort to identify potential backwards-incompatible changes that may need to be applied in preparation for our milestone 1 release, I keyed in on the extracted flag. The intent of the extracted flag is to identify a conclusion as "extracted" from a single source. I think this is an important state to identify, but I'm wondering if there are more than just two "states" of a conclusion. I'm thinking there are three: "extracted," "hypothetical" (the conclusion is a working hypothesis), and "accepted" (the conclusion has been formally accepted based on the evidence we have, and there should be a proof statement in the form of an analysis document).

This pull request is a proposal to refactor the extracted flag into a property named researchState, whose data type is an enumerated value. I renamed Section 4 to "Research States"; in addition to the "extracted conclusion" constraints, it now includes the "known research states" enumeration, which is defined as follows:

| URI | Description |
|-----|-------------|
| http://gedcomx.org/Extracted | The conclusion was extracted from a single source. The conclusion MUST conform to the Extracted Conclusion Constraints. |
| http://gedcomx.org/Hypothetical | The conclusion is a working hypothesis. |
| http://gedcomx.org/Accepted | The conclusion has been formally accepted. The conclusion SHOULD provide a reference to an analysis document that is used to provide the genealogical proof statement. |
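
For illustration only, here is a minimal sketch of how the proposed enumeration might look in code. Only the three URIs come from the table above; the `Conclusion` class, the `analysis_ref` field, and the `missing_proof` helper are hypothetical names, not part of the specification or of any GEDCOM X library.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ResearchState(Enum):
    """The three proposed research states, keyed by their URIs."""
    EXTRACTED = "http://gedcomx.org/Extracted"
    HYPOTHETICAL = "http://gedcomx.org/Hypothetical"
    ACCEPTED = "http://gedcomx.org/Accepted"


@dataclass
class Conclusion:
    """Hypothetical stand-in for a conclusion carrying the proposed property."""
    research_state: ResearchState
    analysis_ref: Optional[str] = None  # assumed name for the analysis document reference


def missing_proof(conclusion: Conclusion) -> bool:
    """True when an accepted conclusion lacks the analysis reference it SHOULD have."""
    return (
        conclusion.research_state is ResearchState.ACCEPTED
        and conclusion.analysis_ref is None
    )
```

Because the analysis reference is a SHOULD rather than a MUST, the helper only reports the gap instead of rejecting the conclusion.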

Comments are welcome.

@jralls
Contributor

jralls commented May 13, 2013

+1

@mikkelee

+1

@thomast73
Contributor

Initially, I thought I agreed with this proposal. However, I was unhappy with the proposed property name.

To me, we are not talking about research states. It is true that research can be in a hypothetical (or working) state or in an accepted (or proven) state, but what does it mean to have research in an extracted state?

Hence my search for a better property name, which I have yet to find.

A researcher might create a conclusional Subject for one of several research purposes. This issue has identified three such purposes:

  • to represent extracted information found in a single source
  • to represent a working hypothesis: research in progress, with conclusional data still being analyzed and tested
  • to represent an accepted conclusion: completed (presumably proven) research

As I have thought about this further, however, I am inclined to revert toward only two Subject types:

  • subjects to represent extracted information found in a single source
  • subjects that represent potential answers to research questions — hypotheses

Right now, the model distinguishes these two cases via the state of the extracted flag.

In the case of a hypothesis, perhaps we can talk about research state. I could say my hypothesis is a working or an accepted hypothesis. But consider the following use case:

I have a Person for which I have identified a name, birth, and death. I feel that the death is “proven”. I am still working on the birth. An exact name might be un-provable as neither the person nor the parents were literate and the records identified thus far are inconclusive.

I know the person exists, but not every aspect of the person ought to be “accepted”. But perhaps I am personally “done” working on it, so it is “acceptable” to me. Under the proposed scheme, which is Subject-based, is this person in a Hypothetical or an Accepted state? The answer is not straightforward.

Probably what I would really wish for Subject is some sort of publishable status (e.g., published, ready-to-publish, working, private). Perhaps such a status is related to concerns raised in #175?

If instead we make the scheme Conclusion-based (meaning we add a property to Conclusion), we can state the death to be proven, the birth to be a working conclusion, and give the name some other status (maybe untenable?). The researchState property name makes sense for such a per-Conclusion status, but the property should only be set in cases when the Conclusion is part of a hypothesis; it would not make sense when I am modeling extracted information.
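
To make the Conclusion-based idea concrete, here is a rough sketch of the use case above. Every name in it (`ConclusionStatus`, `Person.statuses`, `set_status`) is hypothetical, and the status values are only the ones floated in this comment, not anything the model defines.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict


class ConclusionStatus(Enum):
    """Illustrative per-conclusion statuses; the names are assumptions."""
    WORKING = "working"
    PROVEN = "proven"
    UNTENABLE = "untenable"


@dataclass
class Person:
    """Hypothetical Subject holding a status per named conclusion."""
    extracted: bool = False
    statuses: Dict[str, ConclusionStatus] = field(default_factory=dict)

    def set_status(self, conclusion: str, status: ConclusionStatus) -> None:
        # A research status only makes sense for a hypothesis,
        # not for information extracted from a single source.
        if self.extracted:
            raise ValueError("research status does not apply to extracted information")
        self.statuses[conclusion] = status


person = Person()
person.set_status("death", ConclusionStatus.PROVEN)
person.set_status("birth", ConclusionStatus.WORKING)
person.set_status("name", ConclusionStatus.UNTENABLE)
```

With the status on each conclusion, the Person itself never has to be declared Hypothetical or Accepted as a whole.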

I still wonder if we would be better off turning extracted from a boolean to an enumeration in case there is some need we are not seeing clearly, but I would start with a two-state enumeration: one to indicate the Subject contains extracted information and one to indicate it represents a hypothesis. But I still have a property name issue for this property...

@nilsbrummond

I think research state is an important idea, but I think it should apply to a specific work subset (e.g. a single source analysis, a GPS analysis, Final Conclusions), including the Document, the extracted conclusions, and the source list.

I have generally been thinking of an N-Tier approach where there are 3 tiers in the common case. T1: Single Source Analysis, T2: GPS Analysis using T1, T3: Conclusion space as the union of all T2.

I see 3 research states, plus a fourth that the system would set:

  1. Started
  2. Unresolved
  3. Completed
  4. New_Evidence_Requires_Re_Evaluation_Of_The_Analysis

When you create a new source analysis or GPS analysis, it should be in the Started state. If there is NOT enough information to make conclusions, then the user sets it to Unresolved (forever, or until more evidence turns up). If there is enough evidence to make conclusions, then the user sets the state to Completed when finished with the work subset. Then, at any time, if a work subset is modified, it and all other "things" that depend on it shall be changed by the system to the New_Evidence_Requires_Re_Evaluation_Of_The_Analysis state.

New_Evidence_Requires_Re_Evaluation_Of_The_Analysis may not be an actual state, but it is at least a theoretical state that must be possible, either by it being explicit or by modification timestamp analysis of the dependency chain.

An analysis is only valid if it is completed after all work products that it depends on.
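
The timestamp variant could be sketched roughly as below. This is only an illustration under stated assumptions: `WorkSubset`, its fields, and both helper functions are hypothetical names, timestamps are plain integers, and the dependency chain is assumed to be acyclic.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class WorkSubset:
    """Hypothetical work subset, e.g. a single source analysis or a GPS analysis."""
    name: str
    modified: int                    # logical time of the last modification
    completed: Optional[int] = None  # logical time it was marked Completed, if ever
    depends_on: List["WorkSubset"] = field(default_factory=list)


def latest_change(work: WorkSubset) -> int:
    """Most recent modification of this subset or anything it depends on (acyclic chain assumed)."""
    return max([work.modified] + [latest_change(dep) for dep in work.depends_on])


def needs_re_evaluation(work: WorkSubset) -> bool:
    """True when a completed subset predates a change somewhere in its dependency chain."""
    if work.completed is None:
        return False  # still Started or Unresolved, so there is nothing to invalidate
    return work.completed < latest_change(work)


t1 = WorkSubset("single source analysis", modified=1, completed=2)
t2 = WorkSubset("GPS analysis", modified=3, completed=4, depends_on=[t1])
fresh = needs_re_evaluation(t2)   # False: t2 was completed after every change
t1.modified = 5                   # new evidence changes the T1 analysis
stale = needs_re_evaluation(t2)   # True: t2 now predates a dependency change
```

Deriving the state this way means nothing has to be stored explicitly, at the cost of walking the dependency chain on every check.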

@stoicflame
Member Author

Okay, after the explanations and comments submitted by @thomast73 and @nilsbrummond, I'm going to back off this proposal. @thomast73 offered a reasonable argument to keep the notion of "extracted" separate from the notion of "research states".

At this point, I have no intention of proposing that the concept of research states be formalized in the first version of the conceptual model because:

  1. I don't think we have enough certainty that we'd get it right given the current state of the industry.
  2. There isn't enough demand for the concept in existing products.
  3. I think they can be added in a later version without backwards-incompatible changes.
