Here we will keep track of the latest AI Engineer Development Tools, including Database, Vector Database, Data Ingest, Data Processing, Data Warehouse, Model Training, Model Zoo, Model Tuning, Model Evaluation, Model Deployment, Model Monitor and AI Agent Framework! 🔥
- Database
- Vector Database
- Data Ingest
- Data Processing
- Data Warehouse
- Model Training
- Model Zoo
- Model Tuning
- Model Evaluation
- Model Deployment
- Model Monitor
- AI Agent Framework

## Database

Name | Description | Code |
---|---|---|
MongoDB | MongoDB: The Developer Data Platform. MongoDB Atlas integrates operational and vector data in a single, unified platform. Use vector representations of your data to perform semantic search. | GitHub |
MySQL | MySQL is the world's most popular open source database. | GitHub |
PostgreSQL | PostgreSQL: The World's Most Advanced Open Source Relational Database. PostgreSQL is an advanced object-relational database management system that supports an extended subset of the SQL standard, including transactions, foreign keys, subqueries, triggers, user-defined types and functions. This distribution also contains C language bindings. | GitHub |

## Vector Database

Name | Description | Code |
---|---|---|
Chroma | Chroma - the AI-native open-source embedding database. Embeddings, vector search, document storage, full-text search, metadata filtering, and multi-modal. All in one place. Retrieval that just works. As it should be. | GitHub |
LanceDB | LanceDB is a developer-friendly, open source database for AI. From hyper scalable vector search and advanced retrieval for RAG, to streaming training data and interactive exploration of large scale AI datasets, LanceDB is the best foundation for your AI application! | GitHub |
Milvus | A cloud-native vector database, storage for next generation AI applications. Milvus is a high-performance, cloud-native vector database built for scalable vector ANN search. Milvus is an open-source vector database built for GenAI applications. Install with pip, perform high-speed searches, and scale to tens of billions of vectors with minimal performance loss. | GitHub |
Pinecone | Pinecone is the leading vector database for building accurate and performant AI applications at scale in production. Pinecone serverless lets you deliver remarkable GenAI applications faster, at up to 50x lower cost. | |
Weaviate | The AI-native database for a new generation of software. Weaviate is an open-source vector database that stores both objects and vectors, combining vector search and structured filtering with the fault tolerance and scalability of a cloud-native database. | GitHub |
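
The vector databases above share one core operation: finding the stored embeddings closest to a query embedding, optionally restricted by a metadata filter. The following is a minimal brute-force sketch of that idea in pure Python, not any product's API. The `search` helper, the toy 3-dimensional vectors, and the `lang` metadata field are illustrative assumptions; production systems typically replace the full scan with an approximate nearest neighbor (ANN) index.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def search(index, query, k=2, where=None):
    """Return the ids of the k records most similar to `query`, best first.

    index: list of (id, vector, metadata) tuples
    where: optional dict of metadata key/value pairs that must all match
    """
    candidates = [
        (cosine_similarity(vec, query), rec_id)
        for rec_id, vec, meta in index
        if where is None or all(meta.get(key) == val for key, val in where.items())
    ]
    candidates.sort(reverse=True)  # highest similarity first
    return [rec_id for _, rec_id in candidates[:k]]


# Toy index: hypothetical ids, embeddings, and metadata.
index = [
    ("doc-a", [1.0, 0.0, 0.0], {"lang": "en"}),
    ("doc-b", [0.9, 0.1, 0.0], {"lang": "de"}),
    ("doc-c", [0.0, 1.0, 0.0], {"lang": "en"}),
]

top = search(index, query=[1.0, 0.0, 0.0], k=2)
top_en = search(index, query=[1.0, 0.0, 0.0], k=2, where={"lang": "en"})
```

Applying the filter before scoring keeps the example short; real engines differ in whether they filter before, during, or after the vector search.
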

## Data Ingest

Name | Description | Code |
---|---|---|
Airbyte | Airbyte: Open-Source Data Movement for LLMs. Airbyte is an open-source data integration engine that helps you consolidate your data in your data warehouses, lakes and databases. 20,000+ data and AI professionals manage diverse data across multi-cloud environments with our trusted data movement platform. Data integration platform for ELT pipelines from APIs, databases & files to databases, warehouses & lakes. | GitHub |

## Data Processing

Name | Description | Code |
---|---|---|
Airflow | Apache Airflow is a platform created by the community to programmatically author, schedule and monitor workflows. Anyone with Python knowledge can deploy a workflow. Apache Airflow does not limit the scope of your pipelines; you can use it, for example, to build ML models or transfer data. | GitHub |
Dagster | Dagster is a cloud-native data pipeline orchestrator for the whole development lifecycle, with integrated lineage and observability, a declarative programming model, and best-in-class testability. Ship data pipelines quickly and confidently with the modern data orchestrator built for data engineers building data platforms. | GitHub |
dbt | dbt Labs: Transform Data in Your Warehouse. dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications. | GitHub |
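
Workflow orchestrators such as the ones above are built around running tasks in dependency order. The sketch below illustrates only that idea with the standard library's `graphlib`; it is not how Airflow or Dagster are implemented. The task names, the `tasks` mapping, and the `run_pipeline` helper are hypothetical.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def extract():
    return [3, 1, 2]


def transform(rows):
    return sorted(rows)


def load(rows):
    return f"loaded {len(rows)} rows"


# Each task maps to (callable, list of upstream task names).
tasks = {
    "extract": (extract, []),
    "transform": (transform, ["extract"]),
    "load": (load, ["transform"]),
}


def run_pipeline(tasks):
    """Run every task after its upstream tasks, passing their results as arguments."""
    graph = {name: set(upstream) for name, (_, upstream) in tasks.items()}
    order = list(TopologicalSorter(graph).static_order())
    results = {}
    for name in order:
        func, upstream = tasks[name]
        results[name] = func(*(results[dep] for dep in upstream))
    return order, results


order, results = run_pipeline(tasks)
```

A real orchestrator adds scheduling, retries, and monitoring on top of this ordering step.
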

## Data Warehouse

Name | Description | Code |
---|---|---|
Databricks | Databricks, Inc. is a global data, analytics, and artificial intelligence (AI) company, founded in 2013 by the original creators of Apache Spark. The Databricks Platform is the world's first data intelligence platform powered by generative AI. Infuse AI into every facet of your business. | GitHub |
Snowflake | The Snowflake AI Data Cloud - Mobilize Data, Apps, and AI. Snowflake enables organizations to learn, build, and connect with their data-driven peers. Collaborate, build data apps & power diverse workloads in the AI Data Cloud. | GitHub |

## Model Training

Name | Description | Code |
---|---|---|
DeepSpeed | DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective. DeepSpeed empowers ChatGPT-like model training with a single click, offering 15x speedup over SOTA RLHF systems with unprecedented cost reduction at all scales. | GitHub |
Megatron-LM | Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism. Ongoing research training transformer models at scale. Megatron-LM serves as a research-oriented framework leveraging Megatron-Core for large language model (LLM) training. Megatron-Core, on the other hand, is a library of GPU optimized training techniques that comes with formal product support including versioned APIs and regular releases. | GitHub |
PyTorch | PyTorch is a machine learning library based on the Torch library, used for applications such as computer vision and natural language processing, originally developed by Meta AI and now part of the Linux Foundation umbrella. PyTorch is a Python package that provides two high-level features: Tensor computation (like NumPy) with strong GPU acceleration; Deep neural networks built on a tape-based autograd system. Tensors and Dynamic neural networks in Python with strong GPU acceleration. | GitHub |
TensorFlow | TensorFlow is an end-to-end open source platform for machine learning. It has a comprehensive, flexible ecosystem of tools, libraries, and community resources that lets researchers push the state-of-the-art in ML and developers easily build and deploy ML-powered applications. | GitHub |

## Model Zoo

Name | Description | Code |
---|---|---|
MLflow | MLflow: A Machine Learning Lifecycle Platform. Build better models and generative AI apps on a unified, end-to-end, open source MLOps platform. MLflow is an open-source platform, purpose-built to assist machine learning practitioners and teams in handling the complexities of the machine learning process. MLflow focuses on the full lifecycle for machine learning projects, ensuring that each phase is manageable, traceable, and reproducible. | GitHub |
Weights & Biases | The AI developer platform. Use Weights & Biases to train and fine-tune models, and manage models from experimentation to production. Use W&B to build better models faster. Track and visualize all the pieces of your machine learning pipeline, from datasets to production machine learning models. | GitHub |

## Model Tuning

Name | Description | Code |
---|---|---|
Optuna | Optuna: A hyperparameter optimization framework. Optuna is an automatic hyperparameter optimization software framework, particularly designed for machine learning. It features an imperative, define-by-run style user API. Thanks to our define-by-run API, the code written with Optuna enjoys high modularity, and the user of Optuna can dynamically construct the search spaces for the hyperparameters. | GitHub |
Ray Tune | Ray Tune: Hyperparameter Tuning. Ray Tune is a Python library for experiment execution and hyperparameter tuning at any scale. You can tune your favorite machine learning framework (PyTorch, XGBoost, TensorFlow and Keras, and more) by running state of the art algorithms such as Population Based Training (PBT) and HyperBand/ASHA. Ray Tune further integrates with a wide range of additional hyperparameter optimization tools, including Ax, BayesOpt, BOHB, Nevergrad, and Optuna. | GitHub |
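
The "define-by-run" style mentioned for Optuna means the search space is declared while the objective function executes, so it can branch on earlier choices. Below is a minimal pure-Python sketch of that idea using plain random search. The `Trial` class and `random_search` helper are hypothetical stand-ins whose method names are modeled on Optuna's trial interface; this is not Optuna's implementation or its sampling algorithm.

```python
import random


class Trial:
    """Records each parameter at the moment the objective asks for it."""

    def __init__(self, rng):
        self._rng = rng
        self.params = {}

    def suggest_float(self, name, low, high):
        value = self._rng.uniform(low, high)
        self.params[name] = value
        return value

    def suggest_categorical(self, name, choices):
        value = self._rng.choice(choices)
        self.params[name] = value
        return value


def objective(trial):
    # The search space is built as this function runs, so it can branch.
    kind = trial.suggest_categorical("kind", ["linear", "quadratic"])
    x = trial.suggest_float("x", -5.0, 5.0)
    if kind == "quadratic":
        # This parameter only exists on the quadratic branch.
        offset = trial.suggest_float("offset", 0.0, 1.0)
        return (x - 2.0) ** 2 + offset
    return abs(x - 2.0)


def random_search(objective, n_trials, seed=0):
    """Minimize `objective` by sampling `n_trials` independent random trials."""
    rng = random.Random(seed)
    best_value, best_params = float("inf"), None
    for _ in range(n_trials):
        trial = Trial(rng)
        value = objective(trial)
        if value < best_value:
            best_value, best_params = value, trial.params
    return best_value, best_params


best_value, best_params = random_search(objective, n_trials=200)
```

Because `offset` is only requested on one branch, the best parameter set contains it only when `kind` is `"quadratic"`; a fixed, up-front search space cannot express that as directly.
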

## Model Evaluation

Name | Description | Code |
---|---|---|
DeepEval | DeepEval: The LLM Evaluation Framework. DeepEval is a simple-to-use, open-source LLM evaluation framework for evaluating and testing large language model systems. It is similar to Pytest but specialized for unit testing LLM outputs. DeepEval incorporates the latest research to evaluate LLM outputs based on metrics such as G-Eval, hallucination, answer relevancy, RAGAS, etc., which use LLMs and various other NLP models that run locally on your machine for evaluation. | GitHub |
Giskard | Giskard: Open-Source Evaluation & Testing for AI & LLM systems. Giskard is an open-source Python library that automatically detects performance, bias & security issues in AI applications. The library covers LLM-based applications such as RAG agents, all the way to traditional ML models for tabular data. | GitHub |
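
Treating LLM evaluation as unit testing, as described above, comes down to scoring an output with a metric and asserting that the score clears a threshold. The sketch below shows that pattern with a deliberately simple keyword-recall metric. The `keyword_recall` metric, the `fake_llm` stand-in, and the threshold are assumptions for illustration and are not taken from DeepEval or Giskard, whose metrics are considerably more sophisticated.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def keyword_recall(output, expected_keywords):
    """Fraction of expected keywords that appear as tokens in the output."""
    tokens = tokenize(output)
    hits = [kw for kw in expected_keywords if kw.lower() in tokens]
    return len(hits) / len(expected_keywords)


def fake_llm(prompt):
    # Hypothetical stand-in for a real model call.
    return "Paris is the capital of France."


def test_capital_question(threshold=0.5):
    """Fail, like any unit test, when the metric falls below the threshold."""
    output = fake_llm("What is the capital of France?")
    score = keyword_recall(output, ["Paris", "France"])
    assert score >= threshold, f"score {score:.2f} below threshold {threshold}"
    return score


score = test_capital_question()
```

Because the check is a plain `assert` inside a `test_` function, a test runner such as pytest can collect it alongside ordinary unit tests.
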

## Model Deployment

Name | Description | Code |
---|---|---|
KTransformers | KTransformers: A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations. KTransformers, pronounced as Quick Transformers, is designed to enhance your 🤗 Transformers experience with advanced kernel optimizations and placement/parallelism strategies. KTransformers is a flexible, Python-centric framework designed with extensibility at its core. By implementing and injecting an optimized module with a single line of code, users gain access to a Transformers-compatible interface, RESTful APIs compliant with OpenAI and Ollama, and even a simplified ChatGPT-like web UI. | GitHub |
ONNX | ONNX is an open format built to represent machine learning models. ONNX defines a common set of operators - the building blocks of machine learning and deep learning models - and a common file format to enable AI developers to use models with a variety of frameworks, tools, runtimes, and compilers. Open Neural Network Exchange (ONNX) is an open ecosystem that empowers AI developers to choose the right tools as their project evolves. Currently we focus on the capabilities needed for inferencing (scoring). | GitHub |
Triton Inference Server | Triton Inference Server is open source inference serving software that streamlines AI inferencing. Triton enables teams to deploy any AI model from multiple deep learning and machine learning frameworks, including TensorRT, TensorFlow, PyTorch, ONNX, OpenVINO, Python, RAPIDS FIL, and more. Triton Inference Server supports inference across cloud, data center, edge and embedded devices on NVIDIA GPUs, x86 and ARM CPU, or AWS Inferentia. Triton Inference Server delivers optimized performance for many query types, including real time, batched, ensembles and audio/video streaming. Triton Inference Server is part of NVIDIA AI Enterprise, a software platform that accelerates the data science pipeline and streamlines the development and deployment of production AI. | GitHub |
vLLM | vLLM is a fast and easy-to-use library for LLM inference and serving. A high-throughput and memory-efficient inference and serving engine for LLMs. | GitHub |

## Model Monitor

Name | Description | Code |
---|---|---|
Arize | Arize is the single platform built to help you accelerate development of AI apps and agents, then perfect them in production. A machine learning observability platform that provides real-time monitoring and explainability to help you understand how your models are performing. | GitHub |
Langfuse | Langfuse is an open source LLM engineering platform. Traces, evals, prompt management and metrics to debug and improve your LLM application. It helps teams collaboratively develop, monitor, evaluate, and debug AI applications. Langfuse can be self-hosted in minutes and is battle-tested. | GitHub |

## AI Agent Framework

Name | Description | Code |
---|---|---|
CrewAI | The Leading Multi-Agent Platform. Framework for orchestrating role-playing, autonomous AI agents. By fostering collaborative intelligence, CrewAI empowers agents to work together seamlessly, tackling complex tasks. | GitHub |
LangChain | LangChain is a composable framework to build with LLMs. LangGraph is the orchestration framework for controllable agentic workflows. 🦜🔗 Build context-aware reasoning applications. Get your LLM application from prototype to production. | GitHub |
LlamaIndex | LlamaIndex is a simple, flexible data framework for connecting custom data sources to large language models. Build production agents that can find information, synthesize insights, generate reports, and take actions over the most complex enterprise data. | GitHub |