Information modeling interview protocol

Ethan Hartzell edited this page Jun 6, 2016 · 3 revisions

As we develop draft and final models, we will be validating them with the domain scientists who are collaborators on our grant. The purpose of this validation is four-fold:

  1. Assure face validity and scientific relevance of the models themselves
  2. Prioritize information extraction targets by identifying "most wanted" elements
  3. Identify the value proposition by determining which elements are most difficult to obtain from structured data.
  4. Align the semantic model with the scientific "mental model". This will make downstream aspects of the software (including the visualization) more organic and presumably easier to accept.

Our goal is thus to ask each of our three domain experts to do the following, based on their specific scientific needs:

  1. Review our existing model attributes and constraints/value sets, and prioritize them
  2. Identify missing model attributes and constraints/value sets which will be added to the models
  3. Identify the attributes that are hardest to obtain from existing (mainly structured) data sources
  4. Determine which relationships are most important to represent
  5. Enumerate values within value sets

We propose to use a card sort task to accomplish this.

Materials

  • Index cards with printed labels. Each card will have the attribute and a set of values (in lieu of a definition)
  • Colored markers to write categories on cards after the sorts
  • Extra blank index cards
  • Script
  • Audio recorder

Script

SESSION 1

PART 1

Thanks for agreeing to participate in this modeling validation. The purpose of this meeting is to help us identify which of the data elements that could be extracted would provide the most benefit to you and other researchers like you. This will help us develop the best possible underlying model, and will also help us focus on creating software that extracts the kinds of information that are most useful to you.

What we will do today is to organize a set of cards. These cards contain data elements that we know to be useful to investigators who work with data. All of these data elements represent clinical and outcomes information on human subjects, as opposed to the kinds of data that you generate in your lab. These clinical data elements will represent information that you would use to classify or categorize patients, for example to correlate them with some molecular information such as copy number or variant.

During the next hour, I will ask you to look through a set of cards and sort them in a specific way. I may ask you some clarifying questions about how you've sorted them. If there is other information you want to give me, feel free to do so.

Here's an example of one card that shows a data element for 'Location of Metastasis' and a set of possible values, in this case just four options. The cards may not include the entire set, but just enough to give you an immediate sense of what the data element is really about, in lieu of an actual definition. Feel free to comment on these values as you go through them, but no need to do a detailed analysis of them. We will come back to these value sets in another session with you.

Does that sound OK? Any questions so far?

PART 2

For the first sort, let me give you this set of cards which contains data elements relevant to . We got these data elements from a variety of sources, including models of existing databases here at the University of Pittsburgh, as well as published articles and standards. For the first sort, I'd like you to separate these cards into three groups:

  • information YOU would use in a study
  • information other investigators often want for their studies, even if you wouldn't necessarily use them
  • information few investigators are likely to want

A few things to note. You will find dates conspicuously absent from this set. We will be collecting timestamps for everything that is extracted, and using those timestamps to calculate values, for example time from diagnosis to metastasis. That's another topic for our next session, so don't worry about that kind of temporal relationship for now.

Also, you don't need to worry about where the information comes from, how easy it is to get, or how accurate it is. Assume that you can get perfect data very easily and base the card sort only on what you or other investigators might use in your studies.

Finally, this set is likely not comprehensive, so you can use these blank cards and pen to add to it when you find something missing.

After you finish that sort, I am going to ask you to prioritize those first two groups.

Any questions? Go ahead.

PART 3

For the second sort, let me give you just these first two sets - the data elements that you would use and the ones you thought others would use. Now I'd like you to separate these cards into three groups:

  • information that can be obtained in structured form from existing data sources (EMR, Cancer Registry, or other databases you have access to)
  • information that cannot be obtained from existing data sources in structured form but is still electronic and can be processed (e.g. free text MARS notes). If you are already using these data elements, you may be manually abstracting them from the EMR.
  • information that is not electronic at all (e.g. hand written notes or PDFs of written notes)

The purpose here is to figure out which of these data elements would provide the most value for natural language processing systems to extract.

After you finish that sort, I am going to ask you to prioritize the second group.

Any questions? Go ahead.

Procedure

  1. Ask for permission to record
  2. Turn on audio recorder
  3. Read Script Part 1 to the participant
  4. Read Script Part 2 to the participant
  5. Ask for any questions
    • Ask expert to separate attributes into three groups (information I want, information other people want, information few investigators want). Note that these groups do not necessarily imply where the information comes from.
    • Ask expert to add any other elements that are not included here which he/she would want to collect
    • For the first set (information I want), ask expert to order them (roughly) by how useful/important they are.
    • Collect and mark back of cards (I|O|N, # order for I)
  6. Read Script Part 3 to the participant
  7. Ask for any questions
  8. Provide all I and O cards (elements that the expert or other investigators want)
  9. Ask expert to separate attributes into three groups (items that can be retrieved as structured data from the EMR, Cancer Registry, or existing databases; items that can be retrieved by reading and abstracting electronic free-text sources, e.g. MARS notes; and items that exist only in non-electronic sources).
  10. For the second set (those that can only be retrieved using NLP), ask expert to order them (roughly) by how useful/important they would be.
  11. Collect cards and mark backs (S|U|NE, # for order of U)
  12. Thank participant.
  13. Turn off audio recorder.
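
Once the cards have been marked, the codes on their backs can be transcribed for analysis. The following is a minimal sketch of one way to record them, not part of the protocol itself. The codes I|O|N and S|U|NE come from the procedure above; the `CardResult` class, its field names, the `nlp_targets` helper, and every attribute other than 'Location of Metastasis' are hypothetical and used only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional

# Codes written on the backs of the cards during the two sorts.
SORT1_CODES = {"I", "O", "N"}   # I want / others want / few investigators want
SORT2_CODES = {"S", "U", "NE"}  # structured / unstructured electronic / not electronic


@dataclass
class CardResult:
    """One card as marked at the end of a session (hypothetical record layout)."""
    attribute: str                    # label printed on the card
    sort1: str                        # first-sort code
    sort1_rank: Optional[int] = None  # order within group I (1 = most useful)
    sort2: Optional[str] = None       # second-sort code; only I and O cards are re-sorted
    sort2_rank: Optional[int] = None  # order within group U (1 = most useful)

    def __post_init__(self) -> None:
        if self.sort1 not in SORT1_CODES:
            raise ValueError(f"unknown first-sort code: {self.sort1!r}")
        if self.sort2 is not None:
            if self.sort1 == "N":
                raise ValueError("N cards are not part of the second sort")
            if self.sort2 not in SORT2_CODES:
                raise ValueError(f"unknown second-sort code: {self.sort2!r}")


def nlp_targets(cards: List[CardResult]) -> List[str]:
    """Return attributes marked U, in the expert's priority order (unranked last)."""
    unstructured = [c for c in cards if c.sort2 == "U"]
    unstructured.sort(key=lambda c: (c.sort2_rank is None, c.sort2_rank or 0))
    return [c.attribute for c in unstructured]


# Illustrative data only; the marks below are invented, not study results.
cards = [
    CardResult("Location of Metastasis", "I", sort1_rank=1, sort2="U", sort2_rank=2),
    CardResult("Tumor Size", "I", sort1_rank=2, sort2="S"),
    CardResult("Smoking History", "O", sort2="U", sort2_rank=1),
    CardResult("Shoe Size", "N"),
]
print(nlp_targets(cards))
```

Keeping both sorts on one record makes it straightforward to list the elements that the expert wants but that are available only in free text, which is the group the second sort is designed to surface.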