    Repositories list

    • pymetagen

      Public
      Metadata Generator
      Python
      MIT License
      0010 Updated Jun 11, 2025
    • Java
      0001 Updated Jun 7, 2025
    • TomlParams: TOML-based parameter files made better
      Python
      MIT License
      0940 Updated Jun 3, 2025
    • This repository is part of the Knowledge Transfer Partnership (KTP) between Nottingham Trent University (NTU) and Bigspark. The project aims to address data quality issues in large datasets, specifically in finance, using advanced techniques for error detection, error correction, duplicate detection, and beyond.
      Python
      MIT License
      0000 Updated Feb 21, 2025
    • Python
      Apache License 2.0
      1000 Updated Dec 16, 2024
    • Formats docstrings to follow PEP 257
      Python
      MIT License
      82000 Updated Dec 3, 2024
    • Patterns and concepts for building resilient data pipelines in Python and Scala
      0500 Updated Aug 27, 2024
    • Repository for the PII Anonymizer code package and a sample FastAPI API that uses it to talk to an LLM
      Jupyter Notebook
      0000 Updated Jun 21, 2024
    • nuxtjs-template

      Public template
      JavaScript
      0000 Updated Jan 21, 2024
    • sso-sync tool to help with the SCIM setup for bigspark.
      Go
      Apache License 2.0
      0000 Updated Oct 26, 2023
    • To test glue job
      Python
      2000 Updated Aug 1, 2023
    • test_glue

      Public
      To test glue job
      Python
      2000 Updated Jul 14, 2023
    • General Purpose repo for NW AI Hackathon 2023
      Jupyter Notebook
      Apache License 2.0
      2001 Updated Apr 20, 2023
    • A StreamSets DC sample processor for validating records against a specified JSON schema
      Java
      Apache License 2.0
      0101 Updated Apr 14, 2023
    • Shell
      0000 Updated Dec 1, 2022
    • vcs_demoo

      Public
      0000 Updated Sep 12, 2022
    • Mirror of Apache Livy (Incubating)
      Scala
      Apache License 2.0
      610000 Updated Jul 22, 2022
    • 1000 Updated Jun 30, 2022
    • barcode-server

      Public archive
      Java
      0000 Updated Apr 20, 2022
    • Shell
      MIT No Attribution
      0000 Updated Mar 25, 2022
    • Data profiling example using Snowflake sample datasets and Scala
      Jupyter Notebook
      Apache License 2.0
      0000 Updated Feb 15, 2022
    • This is the development repository of SparkMeasure, a tool for performance troubleshooting of Apache Spark workloads. It simplifies the collection and analysis of Spark task and stage metrics data.
      Scala
      Apache License 2.0
      151000 Updated Feb 1, 2022
    • Shell
      4000 Updated Feb 1, 2022
    • JVM Profiler Sending Metrics to Kafka, Console Output or Custom Reporter
      Java
      Other
      342000 Updated Dec 2, 2021
    • emr-uber-profiler-notebooks
      0000 Updated Nov 19, 2021
    • Java
      21000 Updated Sep 9, 2021
    • deequ

      Public
      Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.
      Scala
      Apache License 2.0
      555000 Updated Jul 8, 2021
    • tutorials

      Public
      StreamSets Tutorials
      Java
      Apache License 2.0
      192000 Updated Apr 19, 2021
    • Java
      0000 Updated Mar 16, 2021
    • JavaScript
      Other
      1000 Updated Jan 10, 2021