Low-code framework for building custom LLMs, neural networks, and other AI models
Updated Jun 1, 2024 - Python
Modern columnar data format for ML and LLMs implemented in Rust. Convert from Parquet in 2 lines of code for 100x faster random access, vector index, and data versioning. Compatible with Pandas, DuckDB, Polars, and PyArrow, with more integrations coming.
Rust implementation of the Data Distribution Service (DDS)
ndn-hydra: A Python-coded NDN distributed repository with five focused attributes: resiliency, scalability, usability, efficiency, and security.
Simulator framework for analysis of performance, energy consumption, area and cost of multi-node multi-chiplet tile-based manycore designs
Deita: Data-Efficient Instruction Tuning for Alignment [ICLR2024]
A curated, but incomplete, list of data-centric AI resources.
The toolkit to test, validate, and evaluate your models and surface, curate, and prioritize the most valuable data for labeling.
A Data Centric NER annotation tool for your Named Entity Recognition projects
Sample notebooks that use the Openlayer Python API
Quickly set up an image labelling web application for manually tagging images for machine learning tasks.
Jaehyung Kim et al.'s ACL 2023 paper, "infoVerse: A Universal Framework for Dataset Characterization with Multidimensional Meta-information"
[ICLR'23] Implementation of "Empowering Graph Representation Learning with Test-Time Graph Transformation"
Data-IQ: Characterizing subgroups with heterogeneous outcomes in tabular data (NeurIPS 2022)
From local functions to cloud deployed pipelines
Data-SUITE: Data-centric identification of in-distribution incongruous examples (ICML 2022)
Data-centric, cross-platform, multi-language core services library for C++, C#, Python, and Java. This repository includes all languages. Each language also has its own repository, e.g. datacentric-cpp.
Data-centric core services library in C#. For the version supporting multiple languages, see the datacentric repo.
Python and Data Centric Development: A full-stack site that allows users to add, edit, delete, and search hiking trails in the Province of Andalucia, Spain. They can also upload photos and maps showing their trails. Each route provides a title, trail address, difficulty level, description, directions, photos, and maps.
DOMA Skeleton - Document and Setup a DOMA Repository - Clone Me!