Popular repositories

  1. DataGenerator Public

    DataGenerator is a Java library for systematically producing large volumes of data. DataGenerator frames data production as a modeling problem, with a user providing a model of dependencies among v…

    Java 162 169

  2. herd Public

    Herd is a managed data lake for the cloud. The Herd unified data catalog helps separate storage from compute in the cloud. Manage petabytes of data and make it accessible for data processing and an…

    Java 134 41

  3. yum-nginx-api Public

    yum-nginx-api is a Go API for uploading RPMs to yum repositories, with configurations for running NGINX to serve them. It is a deployable solution with Docker or a single 8MB statically linked Linux …

    Go 50 23

  4. MegaSparkDiff Public

    A Spark-based data comparison tool at scale that lets software development engineers compare a plethora of pair combinations of possible data sources. Multiple execution modes in multipl…

    Scala 49 26

  5. HiveQLUnit Public archive

    Test your Hive scripts inside your favorite IDE with HiveQLUnit! Increase your developers' productivity by testing on all operating systems, including Windows, Linux and Mac OSX. Build continuous int…

    Java 39 13

  6. aphelion Public

    Aphelion is a web application that captures and visualizes your AWS services usage limits. It continuously collects data in the background and you can visualize the data in easy-to-see graphs and c…

    Java 34 10

Repositories

Showing 10 of 23 repositories
  • herd-mdl Public

    Herd-MDL, a turnkey managed data lake in the cloud. See https://finraos.github.io/herd-mdl/ for more information.

    Java 15 Apache-2.0 14 9 13 Updated Mar 27, 2024
  • MegaSparkDiff Public

    A Spark-based data comparison tool at scale that lets software development engineers compare a plethora of pair combinations of possible data sources. Multiple execution modes in multiple environments enable the user to generate a diff report as a Java/Scala-friendly DataFrame or as a file for future use. Comes with out of the box Spa…

    Scala 49 Apache-2.0 26 17 3 Updated Dec 28, 2023
  • Gatekeeper Public

    Gatekeeper is a self-service web application that lets users request temporary access to EC2 & RDS instances running in AWS and gain access instantly.

    Java 28 Apache-2.0 19 10 19 Updated Dec 16, 2023
  • model-validation-toolkit Public

    Model Validation Toolkit is a collection of tools to assist with validating machine learning models prior to deploying them to production and monitoring them after deployment to production.

    Python 29 Apache-2.0 6 1 0 Updated Dec 1, 2023
  • finraos.github.com Public

    FINRA open source projects landing page.

    HTML 8 13 17 0 Updated Oct 27, 2023
  • Fidelius Public

    Fidelius provides an easy-to-use, secure, and organized way to create, view, and modify collections of encrypted secrets in AWS and to manage user/application access to those secrets.

    Java 14 Apache-2.0 14 1 8 Updated Oct 19, 2023
  • maskopy Public

    Automated solution to copy and obfuscate production data to target environments in AWS

    Python 23 Apache-2.0 9 0 1 Updated May 22, 2023
  • MLiy Public

    MLiy (pronounced “Emily”) is a machine-learning platform that allows data scientists to provision and manage processing power in the cloud. It provides an easy-to-use website to install customizable sets of machine learning software for use in data analysis and exploration. This allows data scientists to focus on data analysis rather than how to…

    Shell 10 Apache-2.0 1 3 2 Updated May 22, 2023
  • herd-ui Public

    Herd-UI is a search and discovery tool for business and technical users. Everyone in your organization can use Herd-UI to browse and understand the contents of your Herd managed data lake.

    TypeScript 16 Apache-2.0 10 2 0 Updated Oct 1, 2022
  • herd Public

    Herd is a managed data lake for the cloud. The Herd unified data catalog helps separate storage from compute in the cloud. Manage petabytes of data and make it accessible for data processing and analytical purposes by any cloud compute platform.

    Java 134 Apache-2.0 41 124 2 Updated Oct 1, 2022
