Joss paper: fixed spelled
th5 committed May 9, 2023
1 parent 812d121 commit ba08d79
Showing 1 changed file with 1 addition and 1 deletion.
joss/paper.md: 2 changes (1 addition, 1 deletion)
@@ -36,7 +36,7 @@ bibliography: paper.bib
Research activities in data rich areas such as biomedical informatics face many data challenges including harmonizing complex and disparate data, integrating existing knowledge bases into data sets for manual or machine learning analysis, and reproducibility of results. Graphs are a powerful data structure for naturally describing complex data. Information about data provenance can be embedded in the graph itself to aid in quality control and reproducibility. **Carnival** is a semantically driven informatics toolkit that enables the aggregation of data from disparate sources into a unified property graph and provides mechanisms to model and interact with the graph in well-defined ways inspired by the Open Biological and Biomedical Ontology (OBO) Foundry ontologies.

# Statement of need
-Loading, cleansing, and organizing data can dominate the time spent on a data science project[@forbes1]. This phenomenon is exacerbated in human subjects research at an academic medical institution where data are very complex, reside in disparate repositories with varying levels of accessibility, are coded by separate yet overlapping coding systems, frequently rely on manual data entry, and change over time. Data provenance and reproducibility of results are important factors in human subjects research. It is no easy task to implement a robust consistent data pipeline with clear data provenance that can be rerun when source data change. While there are several mature libraries and toolkits that enable visualization and statistical computation once the analytical data set is generated, there are comparatively fewer data preperation tools.
+Loading, cleansing, and organizing data can dominate the time spent on a data science project[@forbes1]. This phenomenon is exacerbated in human subjects research at an academic medical institution where data are very complex, reside in disparate repositories with varying levels of accessibility, are coded by separate yet overlapping coding systems, frequently rely on manual data entry, and change over time. Data provenance and reproducibility of results are important factors in human subjects research. It is no easy task to implement a robust consistent data pipeline with clear data provenance that can be rerun when source data change. While there are several mature libraries and toolkits that enable visualization and statistical computation once the analytical data set is generated, there are comparatively fewer data preparation tools.

Existing extract, transform and load (ETL) technologies such as [Microsoft SQL Server Integration Services](https://docs.microsoft.com/en-us/sql/integration-services/sql-server-integration-services) help with data staging. Similarly, data manipulation tools like [pandas](https://pandas.pydata.org) facilitate transformation of series and matrix data. **Carnival** distinguishes itself by offering a lightweight data caching mechanism coupled with data manipulation services built on a property graph rather than arrays and data frames. Graphs present an alternative to relational data structures that more naturally represent complex and highly relational data and are more adaptive to change. A property graph database is an implementation of the graph structure that represents data as nodes and directed edges (relationships) between the nodes, where nodes and edges can have properties (key/value pairs) associated with them. Carnival’s combination of features and graph data representation empowers informaticians and programmers working in complex data domains to build pipelines, utilities, and applications that are comparatively richer in semantics and provenance.

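The property-graph model described in the paper text above can be illustrated with a small in-memory sketch. In this model, vertices and directed edges each carry key/value properties, and provenance is recorded in the graph itself. This is a hypothetical example in plain Python. It is not Carnival's API and not a graph database. The class and method names (`PropertyGraph`, `add_vertex`, `add_edge`, `out`, `in_`), the vertex and edge labels, and the property values are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Vertex:
    """A node: a unique id, a label, and arbitrary key/value properties."""
    id: int
    label: str
    props: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Edge:
    """A directed, labeled relationship from out_v to in_v with its own properties."""
    label: str
    out_v: int
    in_v: int
    props: Dict[str, Any] = field(default_factory=dict)


class PropertyGraph:
    """Hypothetical in-memory property graph (illustration only, not Carnival's API)."""

    def __init__(self) -> None:
        self.vertices: Dict[int, Vertex] = {}
        self.edges: List[Edge] = []
        self._next_id = 0

    def add_vertex(self, label: str, **props: Any) -> Vertex:
        v = Vertex(self._next_id, label, dict(props))
        self.vertices[v.id] = v
        self._next_id += 1
        return v

    def add_edge(self, label: str, out_v: Vertex, in_v: Vertex, **props: Any) -> Edge:
        e = Edge(label, out_v.id, in_v.id, dict(props))
        self.edges.append(e)
        return e

    def out(self, v: Vertex, label: str) -> List[Vertex]:
        """Vertices reached by following outgoing `label` edges from v."""
        return [self.vertices[e.in_v] for e in self.edges
                if e.out_v == v.id and e.label == label]

    def in_(self, v: Vertex, label: str) -> List[Vertex]:
        """Vertices that have a `label` edge pointing into v."""
        return [self.vertices[e.out_v] for e in self.edges
                if e.in_v == v.id and e.label == label]


# Hypothetical data: a patient linked to an encounter.
g = PropertyGraph()
patient = g.add_vertex("Patient", patient_id="P-001")
encounter = g.add_vertex("Encounter", encounter_type="outpatient")
g.add_edge("participates_in", patient, encounter)

# Provenance stored in the graph itself: the encounter points to the
# process vertex that created it, and that vertex records its input.
load = g.add_vertex("Process", name="load_encounters", source="encounters.csv")
g.add_edge("is_output_of", encounter, load, run_id=1)

print([v.label for v in g.out(patient, "participates_in")])  # ['Encounter']
print(g.out(encounter, "is_output_of")[0].props["name"])     # load_encounters
```

The edge list is scanned linearly to keep the sketch short. A real graph store would index edges by vertex so that traversals do not have to visit every edge.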
