Commit

Update paper.md
Whitespace updates

Ref #120
hjwilli committed May 11, 2023
1 parent 29d6965 commit a843547
Showing 1 changed file with 2 additions and 2 deletions.
4 changes: 2 additions & 2 deletions joss/paper.md
@@ -44,7 +44,7 @@ Loading, cleansing, and organizing data can dominate the time spent on a data sc

Existing extract, transform and load (ETL) technologies such as [Microsoft SQL Server Integration Services](https://docs.microsoft.com/en-us/sql/integration-services/sql-server-integration-services) help with data staging. Similarly, data manipulation tools like [pandas](https://pandas.pydata.org) facilitate transformation of series and matrix data. **Carnival** distinguishes itself by offering a lightweight data caching mechanism coupled with data manipulation services built on a property graph rather than arrays and data frames. Graphs present an alternative to relational data structures that more naturally represent complex and highly relational data and are more adaptive to change. A property graph database is an implementation of the graph structure that represents data as nodes and directed edges (relationships) between the nodes, where nodes and edges can have properties (key/value pairs) associated with them. Carnival’s combination of features and graph data representation empowers informaticians and programmers working in complex data domains to build pipelines, utilities, and applications that are comparatively richer in semantics and provenance.

-Knowledge bases in Resource Description Framework (RDF) triplestores can be valuable tools to harmonize and enrich complex data. Transforming source relational data into RDF triples reflecting a data model is challenging. While there exist relational-to-RDF mappers such as Karma[@10.1007/978-3-662-46641-4_40], the configuration process is labor intensive and the resulting triples may not match a data model particularly one of sufficient complexity.
+Knowledge bases in Resource Description Framework (RDF) triplestores can be valuable tools to harmonize and enrich complex data. Transforming source relational data into RDF triples reflecting a data model is challenging. While there exist relational-to-RDF mappers such as Karma [@10.1007/978-3-662-46641-4_40], the configuration process is labor intensive and the resulting triples may not match a data model particularly one of sufficient complexity.

**Carnival** was developed to create domain-specific property graph data models, and provide tools to create robust pipelines to import and manage data in that model. There are two main components to Carnival. The primary component is a layer built on top of [Apache Tinkerpop](https://tinkerpop.apache.org) that seeks to provide more standardized and semantically driven methods of interacting with a property graph. An additional component is a data caching mechanism that supports the efficient aggregation of data from disparate sources.

@@ -61,7 +61,7 @@ Knowledge bases in Resource Description Framework (RDF) triplestores can be valu
Carnival was initially developed to facilitate the production of analytical data sets for human subjects research. The source data repositories included a relational data warehouse accessible by SQL, a REDCap [@HARRIS2019103208; @HARRIS2009377] installation accessible by API, and manually curated data files in CSV format. Data pertaining to the set of study subjects was distributed across each of these data sources. Using Carnival, a data pipeline was implemented to pull data from the data sources, instantiate them in a property graph, clean and harmonize them, and produce analytical data sets at required intervals.

#### Queries over enriched data
-A key challenge of human subjects research is to locate patients to recruit to a study, frequently done by searching a research data set containing raw patient data. Potential recruits need to be stratified by attributes, such as age, race, and ethnicity, matched against inclusion criteria, such as the presence of a diagnosis code, and filtered by exclusion criteria, such as a treatment modality. **Carnival** has been used effectively in this area by loading the relevant raw data into a graph, stratifying and categorizing patients by the relevant criteria, then using graph traversals to extract the patients who are potential recruits[@FREEDMAN2020100086; @carnivalcohort].
+A key challenge of human subjects research is to locate patients to recruit to a study, frequently done by searching a research data set containing raw patient data. Potential recruits need to be stratified by attributes, such as age, race, and ethnicity, matched against inclusion criteria, such as the presence of a diagnosis code, and filtered by exclusion criteria, such as a treatment modality. **Carnival** has been used effectively in this area by loading the relevant raw data into a graph, stratifying and categorizing patients by the relevant criteria, then using graph traversals to extract the patients who are potential recruits [@FREEDMAN2020100086; @carnivalcohort].

#### Integration with [OBO Foundry](https://obofoundry.org) Ontologies
We drew upon ontology modeling in the OBO Foundry as inspiration for the Carnival graph data model. For example, a ‘process’, is an event that occurs at some time on some material entity. A ‘planned process’ extends ‘process’ to include a pre-defined plan, participants, inputs, and outputs. In the Carnival graph, healthcare encounters are modeled as planned processes, where participants include the patient and clinician and the outputs may be diagnoses and medications.