@SyntheticDataPlatform

Synthetic Data Platform

A platform that enables companies to build data assets according to their needs using synthetic data

Hello and welcome to the Synthetic Data Platform GitHub site 👋

This is the continuation of work that started within Project-Herophilus's Data Synthesis project (https://github.com/Project-Herophilus/DataSynthesis). We have moved it to its own organization to give it a viable long-term future as a free-standing asset. There are many more repositories now, and we can focus on continuing the great work done over the last three years by amazing individuals and companies like Red Hat, IBM, Microsoft and others.

Background

As we thought about how to help healthcare, we kept returning to the belief that data is the asset and must be core to our mindset. A key part of that is a focus on a wide variety of data enablement capabilities. Our logic is simple: for years, companies have focused on most aspects of development, from tooling to building the next generation of solutions that support their business needs and provide value. However, building great software for today's needs requires data, in many cases massive amounts of it. It is a HUGE business and technical benefit if that data closely resembles production data. Since data is the electricity that powers business and the cornerstone of companies' success in the digital era, we wanted to take a more comprehensive approach to enabling organizations around synthetic data.

Synthetic data is defined as "any production data applicable to a given situation that are not obtained by direct measurement" according to the McGraw-Hill Dictionary of Scientific and Technical Terms. Craig S. Mullins, an expert in data management, defines production data as "information that is persistently stored and used by professionals to conduct business processes." With these definitions it is easy to see that creating synthetic data is an involved process that can be approached in many ways. Our way was to create a platform to synthesize data (Data Synthesis) for multiple needs, based on items like industry standards, coded ontologies, vendor data models and custom-defined models, all in an on-demand manner. With a focus on data, and specifically synthetic data, we wanted the platform's name to clearly express that focus. The name we settled on was the Synthetic Data Platform.

The idea for the Synthetic Data Platform is in NO WAY new or unique. Its purpose is to help reduce or remove the struggle that every organization experiences around its data needs. What we believe makes this platform unique is our perspective and approach.

  • While there are numerous offerings out there, both open source and paid, we wanted to build something that could support not only data integration needs but also application development and integration needs.
  • As part of the Project Herophilus community, the intent is for it to both support and enable the development of other capabilities. A complete list of components, including Connectivity, Data Real-Time Assets, Data Simulators, Data De-Identification and Anonymization components and more, can be found here.
  • Simplicity built for complex data and dataset needs. From its inception, the Synthetic Data Platform has been designed to generate and build upon a concept of data attributes. There are currently 21 different data attributes it can use to create data structures.
  • Our focus is on enabling massive amounts of data to be used immediately or very quickly. We feel this helps reduce data breaches and information exposure. Why should organizations risk data breaches or the potential leakage of PHI (in healthcare) or PII (in any other industry)? In today's technology world we wanted to enable a new and different way to innovate within a data-driven organization, an extensible
  • Work with industry-based data implementations. Our focus is also on enriching the generated data with codes and code sets to ensure it matches existing data systems.
  • Generating industry standards. For healthcare specifically, this means HL7, FHIR, EDI and so forth. We are actively working on implementing FHIR and improving HL7.
  • Helping to create and grow the "Data-Driven Organization". Being a data-driven organization requires an overarching information culture driven by data. An information culture means not only deep knowledge of the organization's data but also an understanding of how that data relates to any specific testing that is needed, along with broad access, data literacy, and appropriate governance and guidance processes for data-driven decision-making. While it sounds complicated, it is really about giving businesses a means for data collection, cleansing, hosting and maintenance while mitigating the risk of a data breach through comprehensive testing processes and practices. Data-driven organizations can innovate continuously because they understand and can embrace new business models quickly. The focus of tooling in these organizations is typically to enable them.
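
To make the data attribute idea above more concrete, here is a minimal Python sketch of how independent attribute generators might be combined into a record and then rendered in an industry format. It is an illustration only, not the platform's implementation: the attribute names, value pools, and the functions `build_record` and `to_pid_segment` are all hypothetical, and the output is a heavily simplified HL7 v2 PID segment rather than a complete message.

```python
import random
from datetime import date, timedelta

# Hypothetical value pools; a real data tier would hold far larger sets.
FIRST_NAMES = ["Alex", "Jordan", "Morgan", "Riley"]
LAST_NAMES = ["Rivera", "Chen", "Okafor", "Novak"]


def first_name(rng):
    return rng.choice(FIRST_NAMES)


def last_name(rng):
    return rng.choice(LAST_NAMES)


def date_of_birth(rng):
    # Pick a random date between 1940-01-01 and 2009-12-31 (inclusive).
    start = date(1940, 1, 1)
    span_days = (date(2009, 12, 31) - start).days
    return start + timedelta(days=rng.randint(0, span_days))


def record_number(rng):
    # Eight-digit, zero-padded identifier with no link to any real person.
    return f"{rng.randint(0, 99_999_999):08d}"


# Illustrative registry: each data attribute maps to a generator function.
ATTRIBUTES = {
    "first_name": first_name,
    "last_name": last_name,
    "date_of_birth": date_of_birth,
    "record_number": record_number,
}


def build_record(attribute_names, seed=None):
    """Compose a synthetic record from the requested attributes."""
    rng = random.Random(seed)  # a seed makes the output reproducible
    return {name: ATTRIBUTES[name](rng) for name in attribute_names}


def to_pid_segment(record):
    """Render a record as a simplified HL7 v2 PID segment."""
    return "|".join([
        "PID",
        "1",                                              # PID-1 set ID
        "",                                               # PID-2 (unused)
        record["record_number"],                          # PID-3 identifier
        "",                                               # PID-4 (unused)
        f"{record['last_name']}^{record['first_name']}",  # PID-5 name
        "",                                               # PID-6 (unused)
        record["date_of_birth"].strftime("%Y%m%d"),       # PID-7 birth date
    ])


if __name__ == "__main__":
    record = build_record(
        ["first_name", "last_name", "date_of_birth", "record_number"], seed=42
    )
    print(record)
    print(to_pid_segment(record))
```

Keeping each attribute as a small independent generator means new data structures can be described simply as lists of attribute names, and no value is ever derived from production PHI or PII.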

The Synthetic Data Platform Philosophy

This project has always been intended to operate under the open/community source model. The Synthetic Data Platform is licensed under Apache-2.0. Our model is not a "freemium" or offering-based model with versions and scaled capabilities. Our approach is to provide the assets and rely on community enhancements and improvements to support the growth of the platform's underlying needs and data access capabilities. The core assets provided include a highly flexible and extensible data tier and APIs that allow the platform to be both accessed and extended. At some point there will also be a web UI.

The Synthetic Data Platform: Getting More Familiar

The Synthetic Data Platform was initially designed as part of a large open-source healthcare ecosystem. With the move into its own organization, we can now better align toward providing more extensive capabilities across multiple repositories.

Data Tier

Area | Repository Location
DataTier DDLs | https://github.com/SyntheticDataPlatform/DataTier-DDLs/blob/main/README.md
DataTier DataLoaders | https://github.com/SyntheticDataPlatform/DataTier-DataLoaders/blob/main/README.md

APIs

Area | Repository Location
SpringBoot APIs | https://github.com/SyntheticDataPlatform/APIs-SpringBoot
Node APIs | https://github.com/SyntheticDataPlatform/APIs-Node
Quarkus APIs | https://github.com/SyntheticDataPlatform/APIs-Quarkus

User Interfaces

Area | Repository Location
REACT UI | Future - https://github.com/SyntheticDataPlatform/UIs-Web-REACT
Vue UI | https://github.com/SyntheticDataPlatform/UIs-Web-Vue

Enjoy and Happy Coding!!!

Repositories

  • Python (Public): Python, Apache-2.0, updated Jun 6, 2024
  • DataTier-DataLoaders (Public): DataLoaders after the Synthetic Data Platform DDL is implemented. Jupyter Notebook, Apache-2.0, updated May 6, 2024
  • DataTier-DDLs (Public): All the needed assets to set up the Synthetic Data Platform's data tier. Apache-2.0, updated Jan 17, 2024
  • APIs-SpringBoot (Public): SpringBoot APIs for accessing Synthetic Data Tier and Platform. Java, Apache-2.0, updated Jan 15, 2024
  • APIs-Node (Public): Node APIs for accessing Synthetic Data Tier and Platform. JavaScript, Apache-2.0, updated Jan 14, 2024
  • APIs-Quarkus (Public): Quarkus APIs for accessing Synthetic Data Tier and Platform. Java, Apache-2.0, updated Nov 20, 2023
  • UIs-Web-Vue (Public): Web-based user interfaces that leverage the APIs (SpringBoot) to provide UI capabilities. SCSS, Apache-2.0, updated Nov 20, 2023
  • UIs-Web-REACT (Public): Web UIs - REACT. JavaScript, Apache-2.0, updated Nov 20, 2023
  • .github (Public): Base Sites - README.md file location. Apache-2.0, updated Nov 20, 2023
