Scalable data preprocessing and curation toolkit for LLMs
convtools is a specialized Python library for dynamic, declarative data transformations with automatic code generation
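To make "declarative transformations with automatic code generation" concrete, here is a minimal sketch based on convtools' documented conversion API; the field names and input rows are made up, and exact usage may differ from the repository's examples.

```python
# Sketch of a declarative convtools transformation: describe a group-by
# aggregation, then let gen_converter() compile an ad-hoc Python function.
from convtools import conversion as c

converter = c.group_by(c.item("store")).aggregate(
    {
        "store": c.item("store"),
        "total": c.ReduceFuncs.Sum(c.item("amount")),
    }
).gen_converter()

rows = [
    {"store": "north", "amount": 10},
    {"store": "north", "amount": 5},
    {"store": "south", "amount": 7},
]
print(converter(rows))
# e.g. [{'store': 'north', 'total': 15}, {'store': 'south', 'total': 7}]
```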
Visual AI development framework for training and inference of ML models, scaling pipelines, and automating workflows with Python. ⭐ Leave a star to support us!
Making it easier to navigate and clean TAHMO weather station data for ML development
A simple, general-purpose pipeline framework.
Artifician is an event-driven framework designed to simplify and accelerate the process of preparing datasets for Artificial Intelligence models.
Understanding the customer life cycle; acquiring customer data; applying big data concepts to your customer relationships; finding high-propensity prospects; upselling by identifying related products and interests; generating customer loyalty by discovering response patterns; predicting customer lifetime value (CLV); identifying dissatisfied customers …
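For the CLV item above, a commonly used simplified estimate is average order value × purchases per year × expected retention period. A small sketch with illustrative numbers only, not the repository's own method:

```python
# Common simplified CLV estimate; the figures below are placeholders and are
# not taken from the repository.
def customer_lifetime_value(avg_order_value, purchases_per_year, retention_years):
    return avg_order_value * purchases_per_year * retention_years

print(customer_lifetime_value(avg_order_value=45.0,
                              purchases_per_year=6,
                              retention_years=3))  # 810.0
```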
A pipeline that consumes Twitter data to extract meaningful insights about a variety of topics using the following technologies: Twitter API, Kafka, MongoDB, and Tableau.
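A minimal sketch of the consume-and-store step such a pipeline might use, with the kafka-python and pymongo clients; the topic, broker address, and database names are assumptions, and the insight-extraction and Tableau stages are omitted.

```python
# Sketch of a Kafka -> MongoDB consumer loop (kafka-python + pymongo).
# Topic, broker address, and collection names are placeholders.
import json

from kafka import KafkaConsumer
from pymongo import MongoClient

consumer = KafkaConsumer(
    "tweets",                              # hypothetical topic name
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
)
collection = MongoClient("mongodb://localhost:27017")["twitter"]["tweets"]

for message in consumer:
    tweet = message.value
    # Store the raw tweet; downstream jobs (and Tableau) read from MongoDB.
    collection.insert_one(tweet)
```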
Homework assignments for MFF UK course NDBI046 - Introduction to Data Engineering
The Resume Application Tracking System uses Google Gemini Pro Vision to automatically parse, analyze, and categorize resumes for efficient recruitment. It integrates AI-driven vision capabilities to enhance resume processing and candidate selection.
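A hedged sketch of how a resume image might be sent to Gemini Pro Vision via the google-generativeai package; the prompt wording, file name, and downstream categorization logic are assumptions, not the repository's code.

```python
# Sketch: send a resume image plus an instruction prompt to Gemini Pro Vision.
# API key, file name, and prompt are placeholders; error handling is omitted.
import google.generativeai as genai
from PIL import Image

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-pro-vision")

resume_image = Image.open("resume.png")   # hypothetical input file
prompt = "Extract the candidate's name, skills, and years of experience."

response = model.generate_content([prompt, resume_image])
print(response.text)
```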
Notebooks from finance, general practice and Jovian courses on data analysis, ML and DL
An open-source Python library for processing and developing End-to-End AI pipelines for Time Series Analysis
🎢 IaaS visual editor to create & deploy data processing pipelines - python, rmq, react, meteorjs
A machine learning model built with PySpark that classifies whether a bank customer will churn, achieving over 86% accuracy on the test set.
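A sketch of the kind of PySpark ML pipeline such a model typically involves (assemble features, fit a classifier, evaluate accuracy); the column names, file path, and choice of classifier are assumptions, not the repository's code.

```python
# Sketch of a PySpark churn classifier: vectorize features, fit a gradient-
# boosted tree model, and report test accuracy. Column names are placeholders.
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import GBTClassifier
from pyspark.ml.evaluation import MulticlassClassificationEvaluator

spark = SparkSession.builder.appName("churn").getOrCreate()
df = spark.read.csv("bank_customers.csv", header=True, inferSchema=True)

assembler = VectorAssembler(
    inputCols=["age", "balance", "tenure", "num_products"],  # assumed columns
    outputCol="features",
)
data = assembler.transform(df).select("features", "churn")

train, test = data.randomSplit([0.8, 0.2], seed=42)
model = GBTClassifier(labelCol="churn", featuresCol="features").fit(train)

accuracy = MulticlassClassificationEvaluator(
    labelCol="churn", predictionCol="prediction", metricName="accuracy"
).evaluate(model.transform(test))
print(f"test accuracy: {accuracy:.2%}")
```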
Code for data flow between models, data post-processing, and visualization
Experimental libraries: Azure Storage, multithreaded data processing pipelines, and many more ...
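A small sketch of one such experiment: downloading blobs from Azure Storage in parallel with a thread pool (azure-storage-blob plus concurrent.futures); the connection string and container name are placeholders, not taken from the repository.

```python
# Sketch: list blobs in an Azure Storage container and download them in
# parallel with a thread pool. Connection string/container are placeholders.
from concurrent.futures import ThreadPoolExecutor

from azure.storage.blob import BlobServiceClient

service = BlobServiceClient.from_connection_string("YOUR_CONNECTION_STRING")
container = service.get_container_client("raw-data")   # hypothetical container

def download(blob_name: str) -> bytes:
    # Each worker downloads one blob's contents into memory.
    return container.download_blob(blob_name).readall()

blob_names = [b.name for b in container.list_blobs()]
with ThreadPoolExecutor(max_workers=8) as pool:
    payloads = list(pool.map(download, blob_names))
print(f"downloaded {len(payloads)} blobs")
```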
Data Engineering & Software Blog
Add a description, image, and links to the data-processing-pipelines topic page so that developers can more easily learn about it.
To associate your repository with the data-processing-pipelines topic, visit your repo's landing page and select "manage topics."