Scalable data preprocessing and curation toolkit for LLMs
convtools is a specialized Python library for dynamic, declarative data transformations with automatic code generation
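To make "declarative transformations with automatic code generation" concrete, here is a minimal sketch based on convtools' documented conversion API; the field names and input rows are made up, and exact usage may differ from the repository's examples.

```python
# Sketch of a declarative convtools transformation: describe a group-by
# aggregation, then let gen_converter() compile an ad-hoc Python function.
from convtools import conversion as c

converter = c.group_by(c.item("store")).aggregate(
    {
        "store": c.item("store"),
        "total": c.ReduceFuncs.Sum(c.item("amount")),
    }
).gen_converter()

rows = [
    {"store": "north", "amount": 10},
    {"store": "north", "amount": 5},
    {"store": "south", "amount": 7},
]
print(converter(rows))
# e.g. [{'store': 'north', 'total': 15}, {'store': 'south', 'total': 7}]
```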
Visual AI development framework for training and inference of ML models, scaling pipelines, and automating workflows with Python. ⭐ Leave a star to support us!
Making it easier to navigate and clean TAHMO weather station data for ML development
A simple, general-purpose pipeline framework.
Artifician is an event-driven framework designed to simplify and accelerate the process of preparing datasets for Artificial Intelligence models.
Understanding the customer life cycle; acquiring customer data; applying big data concepts to your customer relationships; finding high-propensity prospects; upselling by identifying related products and interests; generating customer loyalty by discovering response patterns; predicting customer lifetime value (CLV); identifying dissatisfied customers …
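For the CLV item above, a commonly used simplified estimate is average order value × purchases per year × expected retention period. A small sketch with illustrative numbers only, not the repository's own method:

```python
# Common simplified CLV estimate; the figures below are placeholders and are
# not taken from the repository.
def customer_lifetime_value(avg_order_value, purchases_per_year, retention_years):
    return avg_order_value * purchases_per_year * retention_years

print(customer_lifetime_value(avg_order_value=45.0,
                              purchases_per_year=6,
                              retention_years=3))  # 810.0
```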
A pipeline that consumes Twitter data to extract meaningful insights about a variety of topics using the following technologies: Twitter API, Kafka, MongoDB, and Tableau.
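A minimal sketch of the consume-and-store step such a pipeline might use, with the kafka-python and pymongo clients; the topic, broker address, and database names are assumptions, and the insight-extraction and Tableau stages are omitted.

```python
# Sketch of a Kafka -> MongoDB consumer loop (kafka-python + pymongo).
# Topic, broker address, and collection names are placeholders.
import json

from kafka import KafkaConsumer
from pymongo import MongoClient

consumer = KafkaConsumer(
    "tweets",                              # hypothetical topic name
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
)
collection = MongoClient("mongodb://localhost:27017")["twitter"]["tweets"]

for message in consumer:
    tweet = message.value
    # Store the raw tweet; downstream jobs (and Tableau) read from MongoDB.
    collection.insert_one(tweet)
```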
Homework assignments for MFF UK course NDBI046 - Introduction to Data Engineering
The Resume Application Tracking System uses Google Gemini Pro Vision to automatically parse, analyze, and categorize resumes for efficient recruitment. It integrates AI-driven vision capabilities to enhance resume processing and candidate selection.
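A hedged sketch of how a resume image might be sent to Gemini Pro Vision via the google-generativeai package; the prompt wording, file name, and downstream categorization logic are assumptions, not the repository's code.

```python
# Sketch: send a resume image plus an instruction prompt to Gemini Pro Vision.
# API key, file name, and prompt are placeholders; error handling is omitted.
import google.generativeai as genai
from PIL import Image

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-pro-vision")

resume_image = Image.open("resume.png")   # hypothetical input file
prompt = "Extract the candidate's name, skills, and years of experience."

response = model.generate_content([prompt, resume_image])
print(response.text)
```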
Notebooks from finance, general practice and Jovian courses on data analysis, ML and DL
An open-source Python library for processing and developing End-to-End AI pipelines for Time Series Analysis
🎢 IaaS visual editor to create & deploy data processing pipelines - python, rmq, react, meteorjs
A machine learning model built with PySpark that classifies whether a bank customer will churn, achieving over 86% accuracy on the test set.
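A sketch of the kind of PySpark ML pipeline such a model typically involves (assemble features, fit a classifier, evaluate accuracy); the column names, file path, and choice of classifier are assumptions, not the repository's code.

```python
# Sketch of a PySpark churn classifier: vectorize features, fit a gradient-
# boosted tree model, and report test accuracy. Column names are placeholders.
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import GBTClassifier
from pyspark.ml.evaluation import MulticlassClassificationEvaluator

spark = SparkSession.builder.appName("churn").getOrCreate()
df = spark.read.csv("bank_customers.csv", header=True, inferSchema=True)

assembler = VectorAssembler(
    inputCols=["age", "balance", "tenure", "num_products"],  # assumed columns
    outputCol="features",
)
data = assembler.transform(df).select("features", "churn")

train, test = data.randomSplit([0.8, 0.2], seed=42)
model = GBTClassifier(labelCol="churn", featuresCol="features").fit(train)

accuracy = MulticlassClassificationEvaluator(
    labelCol="churn", predictionCol="prediction", metricName="accuracy"
).evaluate(model.transform(test))
print(f"test accuracy: {accuracy:.2%}")
```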
Code for data flow between models, data post-processing, and visualization
Experimental libraries: Azure Storage, multithreaded data processing pipelines, and many more ...
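A small sketch of one such experiment: downloading blobs from Azure Storage in parallel with a thread pool (azure-storage-blob plus concurrent.futures); the connection string and container name are placeholders, not taken from the repository.

```python
# Sketch: list blobs in an Azure Storage container and download them in
# parallel with a thread pool. Connection string/container are placeholders.
from concurrent.futures import ThreadPoolExecutor

from azure.storage.blob import BlobServiceClient

service = BlobServiceClient.from_connection_string("YOUR_CONNECTION_STRING")
container = service.get_container_client("raw-data")   # hypothetical container

def download(blob_name: str) -> bytes:
    # Each worker downloads one blob's contents into memory.
    return container.download_blob(blob_name).readall()

blob_names = [b.name for b in container.list_blobs()]
with ThreadPoolExecutor(max_workers=8) as pool:
    payloads = list(pool.map(download, blob_names))
print(f"downloaded {len(payloads)} blobs")
```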
Data Engineering & Software Blog
Add a description, image, and links to the data-processing-pipelines topic page so that developers can more easily learn about it.
To associate your repository with the data-processing-pipelines topic, visit your repo's landing page and select "manage topics."