Stars
Measure and optimize the energy consumption of your AI applications!
Large Language Model (LLM) Systems Paper List
A Cluster-Wide Model Manager to Accelerate DNN Training via Automated Training Warmup
FedScale is a scalable and extensible open-source federated learning (FL) platform.
Aequitas enables RPC-level QoS in datacenter networks.
Hydra adds resilience and high availability to remote memory solutions.
Justitia provides RDMA isolation between applications with diverse requirements.
Oort: Efficient Federated Learning via Guided Participant Selection
A Generic Resource-Aware Hyperparameter Tuning Execution Engine
Prefetching and efficient data path for memory disaggregation
A Federated Execution Engine for Fast Distributed Computation Over Slow Networks
Fine-grained GPU sharing primitives
Tiresias is a GPU cluster manager for distributed deep learning training.
📚 👓 A collection of research papers, code, tutorials, and blogs on Federated Computing/Learning.
Infiniswap enables unmodified applications to efficiently use disaggregated memory.
mosharaf / sinbad
Forked from facebookarchive/hadoop-20
Facebook's Realtime Distributed FS based on Apache Hadoop 0.20-append