
Notes from codeathon

Sigfried Gold edited this page Nov 23, 2015 · 1 revision

Sigfried, Jeff, Lisa, Erin, Maile, Taylor, Swizec: Group Notes

  1. We explored methods of using the dimension sets as a navigation tool:
    • Network mapping of dim_sets (where are there linkages that can be leveraged?)
    • High-value dimensions (like time, dimensions that are used frequently in measures, measures with lots of results)

  2. Explored available datasets for the best substrate: OMOP/Achilles provided the most valuable transformation into the DQ-CDM.

  3. Worked through various visual exploration options: network map --> starburst map of dim_sets --> horizontal icicle to visualize use of dim_sets within sets ("dim_set sets").

  4. Within a dim_set --> heat map of the dimension's ratio of value count to total value count (example: condition). With 2 dimensions --> 2 heat maps: dim 1 x dim 2 (value 1), dim 1 x dim 2 (value 2), ... dim 1 x dim i. But then we also need dim 2 x dim 1, etc.

  5. Key dimensions, like year, age, gender (consider how to tag and explore them).

  6. If you only have your own data with which to monitor the process of data acquisition, what methods can you use?
    • Statistical Process Control (also called Statistical Quality Control; we need to learn more) methods based on data capture date-time (not ETL/database acquisition date).
    • Example of value: control charts attempt to distinguish between two types of process variation:
      • Common cause variation, which is intrinsic to the process and will always be present.
      • Special cause variation, which stems from external sources and indicates that the process is out of statistical control.
    • http://asq.org/learn-about-quality/statistical-process-control/overview/overview.html

  7. Suggestions:
    • We need conventions for transforming DQ measures to the CDM, especially regarding dim_set names.
    • Classify DQ measures by type: [completeness, fidelity, uniqueness, plausibility] and [verification, validation].
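The network mapping of dim_sets mentioned above could start by linking two dim_sets whenever they share a dimension. Counting how many dim_sets each dimension appears in also gives one rough signal for "high-value" dimensions. This is a minimal sketch, assuming each dim_set is a tuple of dimension names. The functions `dim_set_links` and `dimension_usage` and the example dim_sets are hypothetical, not taken from the codebase or datasets.

```python
from collections import Counter
from itertools import combinations


def dim_set_links(dim_sets):
    """Map each pair of dim_sets that share a dimension to the shared dimensions."""
    links = {}
    # Deduplicate and sort so each unordered pair is visited once, in a stable order.
    for a, b in combinations(sorted(set(dim_sets)), 2):
        shared = set(a) & set(b)
        if shared:
            links[(a, b)] = sorted(shared)
    return links


def dimension_usage(dim_sets):
    """Count how many distinct dim_sets each dimension appears in."""
    return Counter(dim for ds in set(dim_sets) for dim in ds)


# Hypothetical dim_sets; real ones would come from the DQ-CDM tables.
example = [("year",), ("year", "gender"), ("condition", "year"), ("age", "gender")]
for pair, shared in dim_set_links(example).items():
    print(pair, "share", shared)
print(dimension_usage(example).most_common())
```

The resulting pairs are edges of the dim_set network. They could be handed to whatever graph or D3 layout is used for the network map.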
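The heat-map idea above (ratio of value count to total value count across two dimensions) amounts to building a matrix of cell shares. Swapping its axes gives the "dim 2 x dim 1" view. This is a sketch under assumptions: the input shape `(dim1_value, dim2_value, count)` and the helpers `ratio_matrix` and `transpose` are hypothetical. Normalizing by the grand total is one reading of "value count total"; row-wise normalization would be an equally reasonable choice.

```python
from collections import defaultdict


def ratio_matrix(records):
    """Build {dim1_value: {dim2_value: count / total}} for a two-dimension dim_set.

    records: iterable of (dim1_value, dim2_value, count) tuples (hypothetical shape).
    """
    records = list(records)
    total = sum(count for _, _, count in records)
    if total == 0:
        return {}
    matrix = defaultdict(dict)
    for row, col, count in records:
        # Accumulate in case the same cell appears more than once.
        matrix[row][col] = matrix[row].get(col, 0) + count / total
    return dict(matrix)


def transpose(matrix):
    """Swap the axes, giving the dim 2 x dim 1 view of the same ratios."""
    flipped = defaultdict(dict)
    for row, cols in matrix.items():
        for col, ratio in cols.items():
            flipped[col][row] = ratio
    return dict(flipped)


# Hypothetical counts for a (year, gender) dim_set.
counts = [("2014", "F", 30), ("2014", "M", 10), ("2015", "F", 40), ("2015", "M", 20)]
m = ratio_matrix(counts)
print(m["2015"]["F"])     # 0.4
print(transpose(m)["M"])  # {'2014': 0.1, '2015': 0.2}
```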
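The control charts mentioned above could be tried on per-period record counts bucketed by capture date. The sketch below uses an individuals (XmR) chart, one common Shewhart chart for one value per period. It is a swapped-in choice, not something decided at the codeathon. Sigma is estimated as the average moving range divided by d2 = 1.128, and points outside the 3-sigma limits are flagged as possible special cause variation. The function name and the example counts are hypothetical.

```python
from statistics import mean

D2 = 1.128  # bias-correction constant for moving ranges of two consecutive points


def individuals_chart(values):
    """XmR (individuals) control chart limits for a sequence of counts.

    Returns (center, lcl, ucl, flagged), where `flagged` lists the indices of
    points outside the 3-sigma limits: candidates for special cause variation.
    """
    if len(values) < 2:
        raise ValueError("need at least two observations")
    center = mean(values)
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    sigma = mean(moving_ranges) / D2  # estimate of common cause variation
    lcl, ucl = center - 3 * sigma, center + 3 * sigma
    flagged = [i for i, v in enumerate(values) if v < lcl or v > ucl]
    return center, lcl, ucl, flagged


# Hypothetical monthly record counts, bucketed by capture date.
monthly = [100, 102, 98, 101, 99, 100, 103, 97, 100, 300]
center, lcl, ucl, flagged = individuals_chart(monthly)
print(flagged)  # [9]
```

Here the spike at index 9 also widens the limits, because the limits are computed from all points. In practice, limits are often computed from a baseline period and then applied to new data.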

InfoViz DQ Platform

Todo

  • Review datasets (Lisa, Jeff, with SQL and Python help from Taylor)
    • Look at measures and dimensions
    • How many?
    • Useful?
    • Similar to other datasets?
    • Anything we should just ignore?
    • Where there is an especially large number of measures, what might we do to help a user find the ones of interest?
    • How can dataset data be enriched with terminology/ontology resources? Can we categorize ICD-9 codes, for instance?
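As one answer to the ICD-9 question above, diagnosis codes could be bucketed into ICD-9-CM chapters by their three-digit category, with V and E codes treated as the two supplementary classifications. This is a minimal sketch, assuming diagnosis (not procedure) codes stored as strings, either dotted ("250.00") or undotted ("25000"). The `icd9_chapter` helper is hypothetical; a full terminology resource would give much finer groupings.

```python
# ICD-9-CM diagnosis chapters, keyed by inclusive three-digit category range.
ICD9_CHAPTERS = [
    (1, 139, "Infectious and parasitic diseases"),
    (140, 239, "Neoplasms"),
    (240, 279, "Endocrine, nutritional and metabolic diseases, and immunity disorders"),
    (280, 289, "Diseases of the blood and blood-forming organs"),
    (290, 319, "Mental disorders"),
    (320, 389, "Diseases of the nervous system and sense organs"),
    (390, 459, "Diseases of the circulatory system"),
    (460, 519, "Diseases of the respiratory system"),
    (520, 579, "Diseases of the digestive system"),
    (580, 629, "Diseases of the genitourinary system"),
    (630, 679, "Complications of pregnancy, childbirth, and the puerperium"),
    (680, 709, "Diseases of the skin and subcutaneous tissue"),
    (710, 739, "Diseases of the musculoskeletal system and connective tissue"),
    (740, 759, "Congenital anomalies"),
    (760, 779, "Certain conditions originating in the perinatal period"),
    (780, 799, "Symptoms, signs, and ill-defined conditions"),
    (800, 999, "Injury and poisoning"),
]


def icd9_chapter(code):
    """Return the ICD-9-CM chapter name for a diagnosis code string, or None."""
    code = code.strip().upper()
    if code.startswith("V"):
        return "Supplementary: factors influencing health status (V codes)"
    if code.startswith("E"):
        return "Supplementary: external causes of injury and poisoning (E codes)"
    # Assumes the three-digit category comes first, whether or not a dot follows it.
    head = code.split(".")[0][:3]
    try:
        category = int(head)
    except ValueError:
        return None
    for low, high, name in ICD9_CHAPTERS:
        if low <= category <= high:
            return name
    return None


print(icd9_chapter("250.00"))
print(icd9_chapter("41401"))
```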

Platform

  • Design
    • Lisa, Jeff: probably start by reviewing the datasets and answering the questions above. Then play with the software a bit, then start defining and prioritizing features. Maile can turn these into specific designs.
    • Lisa: think about the language used in the interface. Dimension, value, dimsetset, measure, and record are useful terms for understanding the data in the abstract. Can we increase user understanding without sacrificing sense and power in representing data elements?
  • Implementation
    • Swizec
      • Look over code for big architecture mistakes.
      • SparkBars are pretty much pure D3 inside React. Is that what makes them slow?
      • Get reloading on schema change working correctly
    • Sigfried
      • Continue coding, possibly out loud with people watching, to help folks understand the existing code base.
      • Add SparkBars to Dimsetsets component
      • Show measures and stats for each dimension
      • Dimension config controls: categorize dimension as Time/ordinal/x
      • Allow nesting of dimensions that (can) have a hierarchical relationship. (Use drag/drop?)
    • Taylor
      • Fix the install process/README so the software isn't so hard to install
      • Set up on Heroku
      • Data munging code (see the pcornet problem described below)
    • Jeff
      • Bring in terminology/ontology resources as appropriate.
      • Even if you're not contributing much to JavaScript/D3/React programming, you could be reviewing architecture with Swizec. You're probably the most serious CS guy here; we should make use of that in addition to your informatics skills.
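For the item about nesting dimensions that (can) have a hierarchical relationship, one way to suggest candidates automatically is a functional-dependency check. A child dimension can nest under a parent if every observed child value co-occurs with exactly one parent value (for example, month under year). This is a sketch, assuming rows are dicts of dimension name to value. `can_nest` and `nesting_candidates` are hypothetical helpers, not part of the existing code.

```python
from collections import defaultdict


def can_nest(rows, child, parent):
    """True if every observed value of `child` co-occurs with exactly one value of `parent`.

    rows: iterable of dicts mapping dimension name -> value (hypothetical shape).
    """
    parents_seen = defaultdict(set)
    for row in rows:
        if child in row and parent in row:
            parents_seen[row[child]].add(row[parent])
    return bool(parents_seen) and all(len(p) == 1 for p in parents_seen.values())


def nesting_candidates(rows, dimensions):
    """List (child, parent) pairs that pass the can_nest check."""
    rows = list(rows)
    return [(c, p) for c in dimensions for p in dimensions
            if c != p and can_nest(rows, c, p)]


# Hypothetical rows: months nest under years, but not the other way round.
rows = [
    {"year": 2014, "month": "2014-01"},
    {"year": 2014, "month": "2014-02"},
    {"year": 2015, "month": "2015-01"},
]
print(nesting_candidates(rows, ["year", "month"]))  # [('month', 'year')]
```

Two dimensions in a one-to-one relationship pass in both directions. The drag/drop UI would still be needed to let the user pick the intended parent.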

Datasets

phis

pcornet

Sigfried's data munging code wasn't able to import all of pcornet. Should we rewrite the code to handle bigger inputs, or figure out how to do it in pieces? (Taylor, Jeff)
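One way to "do it in pieces" is to stream the input in fixed-size batches and aggregate as you go, so memory use is bounded by the batch size rather than the file size. This sketch assumes pcornet arrives as CSV and that the aggregation needed is a per-column value count. Both assumptions are unverified, and the function, file, and column names are hypothetical. The real munging step may need a different per-batch operation.

```python
import csv
from collections import Counter
from itertools import islice


def iter_batches(reader, size):
    """Yield lists of up to `size` rows without loading the whole file."""
    while True:
        batch = list(islice(reader, size))
        if not batch:
            return
        yield batch


def count_by_column(lines, column, batch_size=10_000):
    """Stream CSV lines and count values of `column`, one batch at a time.

    lines: any iterable of text lines, e.g. an open file handle.
    """
    reader = csv.DictReader(lines)
    counts = Counter()
    for batch in iter_batches(reader, batch_size):
        counts.update(row[column] for row in batch)
    return counts


# Usage (hypothetical file and column names):
# with open("pcornet_extract.csv", newline="") as f:
#     counts = count_by_column(f, "measure_id")
```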

ms

chco

From Achilles

synpuf

From Achilles?
