AI Engineer DevTools

Here we will keep track of the latest AI Engineer Development Tools, including Database, Vector Database, Data Ingest, Data Processing, Data Warehouse, Model Training, Model Zoo, Model Tuning, Model Evaluation, Model Deployment, Model Monitor and AI Agent Framework! πŸ”₯

Table of Contents

Project List

- Database
- Vector Database
- Data Ingest
- Data Processing
- Data Warehouse
- Model Training
- Model Zoo
- Model Tuning
- Model Evaluation
- Model Deployment
- Model Monitor
- AI Agent Framework

Database

| Name | Description | Code |
|------|-------------|------|
| MongoDB | The developer data platform. MongoDB Atlas integrates operational and vector data in a single, unified platform, so you can use vector representations of your data to perform semantic search (usage sketch below). | GitHub |
| MySQL | The world's most popular open-source database. | GitHub |
| PostgreSQL | The world's most advanced open-source relational database: an object-relational database management system that supports an extended subset of the SQL standard, including transactions, foreign keys, subqueries, triggers, and user-defined types and functions. The distribution also contains C language bindings. | GitHub |
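
As a quick orientation, the sketch below inserts and queries one document with MongoDB's official Python driver. It assumes `pymongo` is installed and a MongoDB instance is reachable at the default local URI; the database and collection names are placeholders.

```python
from pymongo import MongoClient

# Connect to a locally running MongoDB instance (placeholder URI).
client = MongoClient("mongodb://localhost:27017")
collection = client["demo_db"]["tools"]

# Insert one document and read it back with a simple filter.
collection.insert_one({"name": "MongoDB", "category": "Database"})
doc = collection.find_one({"name": "MongoDB"})
print(doc)
```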

^ Back ^

Vector Database

| Name | Description | Code |
|------|-------------|------|
| Chroma | The AI-native open-source embedding database: embeddings, vector search, document storage, full-text search, metadata filtering, and multi-modal retrieval in one place (usage sketch below). | GitHub |
| LanceDB | A developer-friendly, open-source database for AI, covering hyper-scalable vector search, advanced retrieval for RAG, streaming training data, and interactive exploration of large-scale AI datasets. | GitHub |
| Milvus | A high-performance, cloud-native, open-source vector database built for scalable vector ANN search in next-generation GenAI applications. Install with pip, perform high-speed searches, and scale to tens of billions of vectors with minimal performance loss. | GitHub |
| Pinecone | The leading vector database for building accurate and performant AI applications at scale in production. Pinecone serverless lets you deliver GenAI applications faster, at up to 50x lower cost. | |
| Weaviate | The AI-native database for a new generation of software. Weaviate is an open-source vector database that stores both objects and vectors, combining vector search with structured filtering and the fault tolerance and scalability of a cloud-native database. | GitHub |
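
For a feel of the embedding-database workflow, here is a minimal Chroma sketch: add a few documents and run a semantic query. It assumes the `chromadb` package is installed and uses its default in-memory client and built-in embedding function; the collection name and texts are placeholders.

```python
import chromadb

# In-memory client; Chroma embeds the documents with its default embedding function.
client = chromadb.Client()
collection = client.create_collection("demo_docs")

collection.add(
    ids=["1", "2"],
    documents=["Milvus is a vector database.", "Airflow schedules workflows."],
)

# Semantic search: return the single most similar document to the query text.
results = collection.query(query_texts=["vector search engines"], n_results=1)
print(results["documents"])
```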

^ Back ^

Data Ingest

| Name | Description | Code |
|------|-------------|------|
| Airbyte | Open-source data movement for LLMs. Airbyte is an open-source data integration engine that consolidates data in your warehouses, lakes, and databases, powering ELT pipelines from APIs, databases, and files to databases, warehouses, and lakes. 20,000+ data and AI professionals use it to manage diverse data across multi-cloud environments (usage sketch below). | GitHub |
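
Airbyte is mainly operated through its platform and connectors, but PyAirbyte exposes connectors from Python. The sketch below is an assumption-heavy example modeled on the PyAirbyte quickstart: it reads the `source-faker` demo connector into the local cache; the connector name, config keys, and result API are taken from that quickstart and may differ across versions.

```python
import airbyte as ab

# Demo connector that generates synthetic data; installs itself on first use.
source = ab.get_source(
    "source-faker",
    config={"count": 100},   # number of fake records (quickstart option)
    install_if_missing=True,
)
source.check()               # verify the connection and config
source.select_all_streams()  # sync every stream the connector offers

result = source.read()       # load into the default local cache
for name, records in result.streams.items():
    print(f"Stream {name}: {len(records)} records")
```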

^ Back ^

Data Processing

| Name | Description | Code |
|------|-------------|------|
| Airflow | Apache Airflow is a community-built platform to programmatically author, schedule, and monitor workflows; anyone with Python knowledge can deploy a workflow. Airflow does not limit the scope of your pipelines: you can use it to build ML models, transfer data, and more (usage sketch below). | GitHub |
| Dagster | A cloud-native data pipeline orchestrator for the whole development lifecycle, with integrated lineage and observability, a declarative programming model, and best-in-class testability. Ship data pipelines quickly and confidently with a modern orchestrator built for data engineers building data platforms. | GitHub |
| dbt | Transform data in your warehouse. dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications. | GitHub |
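
To illustrate the orchestration style these tools share, here is a minimal Airflow DAG: two Python tasks chained with the `>>` dependency operator. It assumes Airflow 2.x (2.4 or later for the `schedule` argument) and that the file lives in the scheduler's `dags/` folder; the task logic is a placeholder.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract():
    print("pulling raw data")           # placeholder for a real extract step


def transform():
    print("cleaning and loading data")  # placeholder for a real transform step


with DAG(
    dag_id="demo_etl",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    transform_task = PythonOperator(task_id="transform", python_callable=transform)

    extract_task >> transform_task      # run extract before transform
```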

^ Back ^

Data Warehouse

| Name | Description | Code |
|------|-------------|------|
| Databricks | A global data, analytics, and artificial intelligence (AI) company founded in 2013 by the original creators of Apache Spark. The Databricks Platform is the world's first data intelligence platform powered by generative AI, letting you infuse AI into every facet of your business. | GitHub |
| Snowflake | The Snowflake AI Data Cloud: mobilize data, apps, and AI. Snowflake enables organizations to learn, build, and connect with their data-driven peers, and to collaborate, build data apps, and power diverse workloads in the AI Data Cloud (usage sketch below). | GitHub |
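
Both platforms are reached programmatically; as one example, this sketch runs a query with the official `snowflake-connector-python` package. The account, credentials, warehouse, and query are placeholders you would replace with your own.

```python
import snowflake.connector

# Placeholder credentials; in practice pull these from a secrets manager.
conn = snowflake.connector.connect(
    account="my_account",
    user="my_user",
    password="my_password",
    warehouse="COMPUTE_WH",
)

try:
    cur = conn.cursor()
    cur.execute("SELECT CURRENT_VERSION()")  # trivial query to verify the session
    print(cur.fetchone())
finally:
    conn.close()
```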

^ Back ^

Model Training

| Name | Description | Code |
|------|-------------|------|
| DeepSpeed | A deep learning optimization library that makes distributed training and inference easy, efficient, and effective. DeepSpeed enables ChatGPT-like model training with a single click, offering 15x speedups over SOTA RLHF systems with unprecedented cost reduction at all scales. | GitHub |
| Megatron-LM | Training multi-billion-parameter language models using model parallelism; ongoing research on training transformer models at scale. Megatron-LM is a research-oriented framework built on Megatron-Core for large language model (LLM) training, while Megatron-Core is a library of GPU-optimized training techniques with formal product support, including versioned APIs and regular releases. | GitHub |
| PyTorch | A machine learning library based on the Torch library, used for applications such as computer vision and natural language processing; originally developed by Meta AI, it is now under the Linux Foundation umbrella. PyTorch provides two high-level features: tensor computation (like NumPy) with strong GPU acceleration, and deep neural networks built on a tape-based autograd system (usage sketch below). | GitHub |
| TensorFlow | An end-to-end open-source platform for machine learning, with a comprehensive, flexible ecosystem of tools, libraries, and community resources that lets researchers push the state of the art in ML and developers easily build and deploy ML-powered applications. | GitHub |
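
As a common denominator across these frameworks, here is a minimal PyTorch training loop: a small linear model fit to synthetic data with SGD. The shapes, learning rate, and epoch count are arbitrary illustration values.

```python
import torch
from torch import nn

# Synthetic regression data: y = 3x + noise.
x = torch.randn(256, 1)
y = 3 * x + 0.1 * torch.randn(256, 1)

model = nn.Linear(1, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.05)
loss_fn = nn.MSELoss()

for epoch in range(100):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)  # forward pass
    loss.backward()              # autograd computes gradients
    optimizer.step()             # update weights

print("learned weight:", model.weight.item())
```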

^ Back ^

Model Zoo

| Name | Description | Code |
|------|-------------|------|
| MLflow | A machine learning lifecycle platform: build better models and generative AI apps on a unified, end-to-end, open-source MLOps platform. MLflow is purpose-built to help practitioners and teams handle the complexities of the machine learning process, covering the full project lifecycle so that each phase is manageable, traceable, and reproducible (usage sketch below). | GitHub |
| Weights & Biases | The AI developer platform. Use Weights & Biases to train, fine-tune, and manage models from experimentation to production, and to track and visualize every piece of your machine learning pipeline, from datasets to production models. | GitHub |
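
Both tools center on experiment tracking; the sketch below logs a parameter and a metric with MLflow's tracking API. It assumes `mlflow` is installed and uses the default local `mlruns/` store; the experiment name and values are placeholders.

```python
import mlflow

mlflow.set_experiment("demo-experiment")

with mlflow.start_run(run_name="baseline"):
    mlflow.log_param("learning_rate", 0.05)  # hyperparameter of interest
    mlflow.log_metric("val_accuracy", 0.91)  # result to compare across runs
```

The recorded runs can then be browsed locally with `mlflow ui`.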

^ Back ^

Model Tuning

| Name | Description | Code |
|------|-------------|------|
| Optuna | An automatic hyperparameter optimization framework, designed particularly for machine learning. It features an imperative, define-by-run user API: because search spaces are constructed dynamically in code, Optuna programs stay highly modular (usage sketch below). | GitHub |
| Ray Tune | A Python library for experiment execution and hyperparameter tuning at any scale. You can tune your favorite machine learning framework (PyTorch, XGBoost, TensorFlow, Keras, and more) with state-of-the-art algorithms such as Population Based Training (PBT) and HyperBand/ASHA, and Ray Tune integrates with a wide range of additional hyperparameter optimization tools, including Ax, BayesOpt, BOHB, Nevergrad, and Optuna. | GitHub |
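
The define-by-run style mentioned above looks like this in Optuna: the search space is declared inside the objective function itself. The quadratic objective is a stand-in for a real training-and-validation routine.

```python
import optuna


def objective(trial):
    # Search space is defined imperatively, inside the objective ("define-by-run").
    x = trial.suggest_float("x", -10.0, 10.0)
    return (x - 2.0) ** 2  # stand-in for a validation loss


study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=50)

print(study.best_params)  # best hyperparameters found
```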

^ Back ^

Model Evaluation

| Name | Description | Code |
|------|-------------|------|
| DeepEval | The LLM evaluation framework. DeepEval is a simple-to-use, open-source framework for evaluating and testing large language model systems, similar to Pytest but specialized for unit testing LLM outputs. It incorporates the latest research to score LLM outputs on metrics such as G-Eval, hallucination, answer relevancy, and RAGAS, using LLMs and various other NLP models that run locally on your machine (usage sketch below). | GitHub |
| Giskard | Open-source evaluation and testing for AI and LLM systems. Giskard is a Python library that automatically detects performance, bias, and security issues in AI applications, covering LLM-based applications such as RAG agents as well as traditional ML models for tabular data. | GitHub |
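
To show the Pytest-like flavor described above, here is a DeepEval sketch modeled on its quickstart: one test case scored with an answer-relevancy metric. It assumes `deepeval` is installed and an LLM judge is configured (for example via `OPENAI_API_KEY`); the inputs and threshold are placeholders, and the exact API may differ between versions.

```python
from deepeval import evaluate
from deepeval.metrics import AnswerRelevancyMetric
from deepeval.test_case import LLMTestCase

# One LLM interaction to grade: the prompt and the model's actual answer.
test_case = LLMTestCase(
    input="What does a vector database store?",
    actual_output="It stores embeddings so you can run similarity search.",
)

# Uses an LLM judge under the hood; threshold is the pass/fail cut-off.
metric = AnswerRelevancyMetric(threshold=0.7)

evaluate(test_cases=[test_case], metrics=[metric])
```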

^ Back ^

Model Deployment

| Name | Description | Code |
|------|-------------|------|
| KTransformers | A flexible framework for experiencing cutting-edge LLM inference optimizations. KTransformers (pronounced "Quick Transformers") enhances your πŸ€— Transformers experience with advanced kernel optimizations and placement/parallelism strategies. It is a flexible, Python-centric framework designed with extensibility at its core: by implementing and injecting an optimized module with a single line of code, users get a Transformers-compatible interface, RESTful APIs compliant with OpenAI and Ollama, and even a simplified ChatGPT-like web UI. | GitHub |
| ONNX | An open format built to represent machine learning models. ONNX defines a common set of operators, the building blocks of machine learning and deep learning models, and a common file format so that AI developers can use models with a variety of frameworks, tools, runtimes, and compilers. The Open Neural Network Exchange (ONNX) ecosystem currently focuses on the capabilities needed for inferencing (scoring). | GitHub |
| Triton Inference Server | Open-source inference serving software that streamlines AI inferencing. Triton lets teams deploy any AI model from multiple deep learning and machine learning frameworks, including TensorRT, TensorFlow, PyTorch, ONNX, OpenVINO, Python, RAPIDS FIL, and more, across cloud, data center, edge, and embedded devices on NVIDIA GPUs, x86 and ARM CPUs, or AWS Inferentia. It delivers optimized performance for many query types, including real-time, batched, ensemble, and audio/video streaming, and is part of NVIDIA AI Enterprise, a software platform that accelerates the data science pipeline and streamlines the development and deployment of production AI. | GitHub |
| vLLM | A fast and easy-to-use library for LLM inference and serving: a high-throughput and memory-efficient inference and serving engine for LLMs (usage sketch below). | GitHub |
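
As one concrete serving path, here is the standard vLLM offline-inference sketch: load a model and generate completions for a batch of prompts. It assumes `vllm` is installed on a machine with a supported GPU; the model name and sampling values are illustrative.

```python
from vllm import LLM, SamplingParams

prompts = ["The capital of France is", "Vector databases are useful because"]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

# Downloads the model from the Hugging Face Hub on first use.
llm = LLM(model="facebook/opt-125m")
outputs = llm.generate(prompts, sampling_params)

for output in outputs:
    print(output.prompt, "->", output.outputs[0].text)
```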

^ Back ^

Model Monitor

| Name | Description | Code |
|------|-------------|------|
| Arize | A single platform built to help you accelerate development of AI apps and agents and then perfect them in production: a machine learning observability platform that provides real-time monitoring and explainability to help you understand how your models are performing. | GitHub |
| Langfuse | An open-source LLM engineering platform: traces, evals, prompt management, and metrics to debug and improve your LLM application. Langfuse helps teams collaboratively develop, monitor, evaluate, and debug AI applications, can be self-hosted in minutes, and is battle-tested (usage sketch below). | GitHub |
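
As a monitoring example, the sketch below records one trace with a nested generation using the Langfuse Python SDK. It is written against the v2-style low-level API (v3 restructured the client), and the keys, host, model name, and texts are placeholder assumptions.

```python
from langfuse import Langfuse

# Placeholder keys and host, taken from your Langfuse project settings.
langfuse = Langfuse(
    public_key="pk-lf-...",
    secret_key="sk-lf-...",
    host="https://cloud.langfuse.com",
)

# One trace per user request, with a nested generation span for the LLM call.
trace = langfuse.trace(name="chat-request", user_id="user-123")
trace.generation(
    name="llm-call",
    model="gpt-4o-mini",
    input="What is RAG?",
    output="Retrieval-augmented generation combines retrieval with an LLM.",
)

langfuse.flush()  # ensure queued events are sent before the script exits
```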

^ Back ^

AI Agent Framework

| Name | Description | Code |
|------|-------------|------|
| CrewAI | The leading multi-agent platform: a framework for orchestrating role-playing, autonomous AI agents. By fostering collaborative intelligence, CrewAI empowers agents to work together seamlessly and tackle complex tasks. | GitHub |
| LangChain | A composable framework for building with LLMs, with LangGraph as the orchestration framework for controllable agentic workflows. πŸ¦œπŸ”— Build context-aware reasoning applications and take your LLM application from prototype to production (usage sketch below). | GitHub |
| LlamaIndex | A simple, flexible data framework for connecting custom data sources to large language models. Build production agents that can find information, synthesize insights, generate reports, and take actions over the most complex enterprise data. | GitHub |
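
To give a sense of the composable style, here is a minimal LangChain (LCEL) sketch piping a prompt template into a chat model. It assumes the `langchain-core` and `langchain-openai` packages and an `OPENAI_API_KEY` in the environment; the model name is a placeholder.

```python
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

prompt = ChatPromptTemplate.from_template(
    "Explain {topic} to an AI engineer in two sentences."
)
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

# LCEL: compose runnables with the | operator into a single chain.
chain = prompt | llm

response = chain.invoke({"topic": "vector databases"})
print(response.content)
```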

^ Back ^
