{"payload":{"feedbackUrl":"https://github.com/orgs/community/discussions/53140","repo":{"id":643793540,"defaultBranch":"main","name":"Awesome-Efficient-LLM","ownerLogin":"horseee","currentUserCanPush":false,"isFork":false,"isEmpty":false,"createdAt":"2023-05-22T07:07:49.000Z","ownerAvatar":"https://avatars.githubusercontent.com/u/22924514?v=4","public":true,"private":false,"isOrgOwned":false},"refInfo":{"name":"","listCacheKey":"v0:1684945695.006468","currentOid":""},"activityList":{"items":[{"before":"280c30f2483090f9fac1221c0832166c5cff967d","after":"9f8064b6a6390c23a5623e5440ad399f6505ae73","ref":"refs/heads/main","pushedAt":"2024-06-12T04:29:28.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"horseee","name":"Horseee","path":"/horseee","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/22924514?s=80&v=4"},"commit":{"message":"Update README.md","shortMessageHtmlLink":"Update README.md"}},{"before":"de5da3b30c50f915a425091e68e725012f56ff90","after":"280c30f2483090f9fac1221c0832166c5cff967d","ref":"refs/heads/main","pushedAt":"2024-06-12T04:26:20.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"horseee","name":"Horseee","path":"/horseee","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/22924514?s=80&v=4"},"commit":{"message":"[ADD] MoreauPruner: Robust Pruning of Large Language Models against Weight Perturbations & When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models & Block Transformer: Global-to-Local Language Modeling for Fast Inference & Loki: Low-Rank Keys for Efficient Sparse Attention","shortMessageHtmlLink":"[ADD] MoreauPruner: Robust Pruning of Large Language Models against W…"}},{"before":"991606266dfe417b2883c9ae8c5e872fa2bd2a14","after":"de5da3b30c50f915a425091e68e725012f56ff90","ref":"refs/heads/main","pushedAt":"2024-06-12T03:59:43.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"horseee","name":"Horseee","path":"/horseee","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/22924514?s=80&v=4"},"commit":{"message":"[ADD] Speculative Decoding via Early-exiting for Faster LLM Inference with Thompson Sampling Control Mechanism & Light-PEFT: Lightening Parameter-Efficient Fine-Tuning via Early Pruning & QuickLLaMA: Query-aware Inference Acceleration for Large Language Models","shortMessageHtmlLink":"[ADD] Speculative Decoding via Early-exiting for Faster LLM Inference…"}},{"before":"b4cedd3ff55fe77993ef053b70586fbcee2ded26","after":"991606266dfe417b2883c9ae8c5e872fa2bd2a14","ref":"refs/heads/main","pushedAt":"2024-06-11T10:42:26.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"horseee","name":"Horseee","path":"/horseee","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/22924514?s=80&v=4"},"commit":{"message":"[ADD] Pruner-Zero: Evolving Symbolic Pruning Metric from scratch for Large Language Models & Adversarial Moment-Matching Distillation of Large Language Models & Zeroth-Order Fine-Tuning of LLMs with Extreme Sparsity","shortMessageHtmlLink":"[ADD] Pruner-Zero: Evolving Symbolic Pruning Metric from scratch for …"}},{"before":"b8e3e271510d8d74c7340d7b9db9f55f4b69f51f","after":"b4cedd3ff55fe77993ef053b70586fbcee2ded26","ref":"refs/heads/main","pushedAt":"2024-06-11T10:17:45.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"horseee","name":"Horseee","path":"/horseee","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/22924514?s=80&v=4"},"commit":{"message":"[UPDATE] title and figure for Exploring and Improving 
Drafts in Blockwise Parallel Decoding","shortMessageHtmlLink":"[UPDATE] title and figure for Exploring and Improving Drafts in Block…"}},{"before":"93bc41549a4519abadf4f250f1a0df30aeb9d8ef","after":"b8e3e271510d8d74c7340d7b9db9f55f4b69f51f","ref":"refs/heads/main","pushedAt":"2024-06-11T10:06:32.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"horseee","name":"Horseee","path":"/horseee","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/22924514?s=80&v=4"},"commit":{"message":"[ADD] VTrans: Accelerating Transformer Compression with Variational Information Bottleneck based Pruning & Turbo Sparse: Achieving LLM SOTA Performance with Minimal Activated Parameters & QJL: 1-Bit Quantized JL Transform for KV Cache Quantization with Zero Overhead","shortMessageHtmlLink":"[ADD] VTrans: Accelerating Transformer Compression with Variational I…"}},{"before":"c743f277c1887eb0e88b8ca55feadda8c8406487","after":"93bc41549a4519abadf4f250f1a0df30aeb9d8ef","ref":"refs/heads/main","pushedAt":"2024-06-11T09:12:48.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"horseee","name":"Horseee","path":"/horseee","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/22924514?s=80&v=4"},"commit":{"message":"[ADD] Low-Rank Quantization-Aware Training for LLMs & PowerInfer-2: Fast Large Language Model Inference on a Smartphone & ShiftAddLLM: Accelerating Pretrained LLMs via Post-Training Multiplication-Less Reparameterization","shortMessageHtmlLink":"[ADD] Low-Rank Quantization-Aware Training for LLMs & PowerInfer-2: F…"}},{"before":"4c0f0b7beab8a218e8cc9743ac90bac34a25a135","after":"c743f277c1887eb0e88b8ca55feadda8c8406487","ref":"refs/heads/main","pushedAt":"2024-06-05T11:49:48.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"horseee","name":"Horseee","path":"/horseee","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/22924514?s=80&v=4"},"commit":{"message":"[Update] Main ReadMe","shortMessageHtmlLink":"[Update] Main ReadMe"}},{"before":"b06097f165daedac36bf1b0ca48397c97251de58","after":"4c0f0b7beab8a218e8cc9743ac90bac34a25a135","ref":"refs/heads/main","pushedAt":"2024-06-05T11:49:15.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"horseee","name":"Horseee","path":"/horseee","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/22924514?s=80&v=4"},"commit":{"message":"[Update] Main ReadMe","shortMessageHtmlLink":"[Update] Main ReadMe"}},{"before":"3565cae063d283df374a6df9f13b57edb29b3bac","after":"b06097f165daedac36bf1b0ca48397c97251de58","ref":"refs/heads/main","pushedAt":"2024-06-05T11:47:34.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"horseee","name":"Horseee","path":"/horseee","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/22924514?s=80&v=4"},"commit":{"message":"[ADD] LCQ: Low-Rank Codebook based Quantization for Large Language Models' & Outliers and Calibration Sets have Diminishing Effect on Quantization of Modern LLMs","shortMessageHtmlLink":"[ADD] LCQ: Low-Rank Codebook based Quantization for Large Language Mo…"}},{"before":"ef201e5c6eb15f652beb741a894eccfd79bfb14a","after":"3565cae063d283df374a6df9f13b57edb29b3bac","ref":"refs/heads/main","pushedAt":"2024-06-05T11:41:03.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"horseee","name":"Horseee","path":"/horseee","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/22924514?s=80&v=4"},"commit":{"message":"[ADD] Effective Interplay between Sparsity and Quantization: From Theory to Practice & Large Language Model 
Pruning'","shortMessageHtmlLink":"[ADD] Effective Interplay between Sparsity and Quantization: From The…"}},{"before":"072409146a6eda5e7af4767beaac882dcece32fe","after":"ef201e5c6eb15f652beb741a894eccfd79bfb14a","ref":"refs/heads/main","pushedAt":"2024-06-05T11:35:52.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"horseee","name":"Horseee","path":"/horseee","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/22924514?s=80&v=4"},"commit":{"message":"[ADD] Demystifying the Compression of Mixture-of-Experts Through a Unified Framework & LLM and GNN are Complementary: Distilling LLM for Multimodal Graph Learning & MagR: Weight Magnitude Reduction for Enhancing Post-Training Quantization","shortMessageHtmlLink":"[ADD] Demystifying the Compression of Mixture-of-Experts Through a Un…"}},{"before":"731d143c8fd504a1d0f4f8e9f44f5cbb9e9a332e","after":"072409146a6eda5e7af4767beaac882dcece32fe","ref":"refs/heads/main","pushedAt":"2024-05-31T09:25:50.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"horseee","name":"Horseee","path":"/horseee","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/22924514?s=80&v=4"},"commit":{"message":"[ADD] Faster Cascades via Speculative Decoding & MoNDE: Mixture of Near-Data Experts for Large-Scale Sparse Models","shortMessageHtmlLink":"[ADD] Faster Cascades via Speculative Decoding & MoNDE: Mixture of Ne…"}},{"before":"3c9843887e369df39c5f96851368f632b5d47471","after":"731d143c8fd504a1d0f4f8e9f44f5cbb9e9a332e","ref":"refs/heads/main","pushedAt":"2024-05-31T09:20:17.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"horseee","name":"Horseee","path":"/horseee","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/22924514?s=80&v=4"},"commit":{"message":"[ADD] Parrot: Efficient Serving of LLM-based Applications with Semantic Variable & Compressing Large Language Models using Low Rank and Low Precision Decomposition & Hardware-Aware Parallel Prompt Decoding for Memory-Efficient Acceleration of LLM Inference","shortMessageHtmlLink":"[ADD] Parrot: Efficient Serving of LLM-based Applications with Semant…"}},{"before":"c3ee29535eb749b468ebfe504b39f29a6e3c01fe","after":"3c9843887e369df39c5f96851368f632b5d47471","ref":"refs/heads/main","pushedAt":"2024-05-31T09:05:54.000Z","pushType":"pr_merge","commitsCount":2,"pusher":{"login":"horseee","name":"Horseee","path":"/horseee","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/22924514?s=80&v=4"},"commit":{"message":"Merge pull request #20 from compressionOrg/main\n\nUpdate pruning md","shortMessageHtmlLink":"Merge pull request #20 from compressionOrg/main"}},{"before":"57ab920277d2ab12a9e84dfaef0f7b4346082ce6","after":"c3ee29535eb749b468ebfe504b39f29a6e3c01fe","ref":"refs/heads/main","pushedAt":"2024-05-29T07:23:13.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"horseee","name":"Horseee","path":"/horseee","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/22924514?s=80&v=4"},"commit":{"message":"Update README.md","shortMessageHtmlLink":"Update README.md"}},{"before":"0a9714bbcdbbb5998a27c9973fba37e089b91bf3","after":"57ab920277d2ab12a9e84dfaef0f7b4346082ce6","ref":"refs/heads/main","pushedAt":"2024-05-29T06:38:16.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"horseee","name":"Horseee","path":"/horseee","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/22924514?s=80&v=4"},"commit":{"message":"[ADD] Dynamic Mixture of Experts: An Auto-Tuning Approach for Efficient Transformer Models","shortMessageHtmlLink":"[ADD] Dynamic Mixture 
of Experts: An Auto-Tuning Approach for Efficie…"}},{"before":"f111c5141cde928538b5b2150496435f77cfaeaf","after":"0a9714bbcdbbb5998a27c9973fba37e089b91bf3","ref":"refs/heads/main","pushedAt":"2024-05-29T06:36:02.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"horseee","name":"Horseee","path":"/horseee","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/22924514?s=80&v=4"},"commit":{"message":"[ADD] ZipCache: Accurate and Efficient KV Cache Quantization with Salient Token Identification & I-LLM: Efficient Integer-Only Inference for Fully-Quantized Low-Bit Large Language Models & SPP: Sparsity-Preserved Parameter-Efficient Fine-Tuning for Large Language Models","shortMessageHtmlLink":"[ADD] ZipCache: Accurate and Efficient KV Cache Quantization with Sal…"}},{"before":"7c0a3b45dd7a5e7edac563e87b640dc26d42ba37","after":"f111c5141cde928538b5b2150496435f77cfaeaf","ref":"refs/heads/main","pushedAt":"2024-05-29T06:27:28.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"horseee","name":"Horseee","path":"/horseee","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/22924514?s=80&v=4"},"commit":{"message":"[ADD] SliM-LLM: Salience-Driven Mixed-Precision Quantization for Large Language Models & PV-Tuning: Beyond Straight-Through Estimation for Extreme LLM Compression & Integer Scale: A Free Lunch for Faster Fine-grained Quantization of LLMs","shortMessageHtmlLink":"[ADD] SliM-LLM: Salience-Driven Mixed-Precision Quantization for Larg…"}},{"before":"0299a111584e7ea1f196478c49652d39eb82e7fc","after":"7c0a3b45dd7a5e7edac563e87b640dc26d42ba37","ref":"refs/heads/main","pushedAt":"2024-05-29T06:11:02.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"horseee","name":"Horseee","path":"/horseee","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/22924514?s=80&v=4"},"commit":{"message":"[ADD] A Provably Effective Method for Pruning Experts in Fine-tuned Sparse Mixture-of-Experts & SpinQuant -- LLM quantization with learned rotations & SLoPe: Double-Pruned Sparse Plus Lazy Low-Rank Adapter Pretraining of LLMs","shortMessageHtmlLink":"[ADD] A Provably Effective Method for Pruning Experts in Fine-tuned S…"}},{"before":"af99ac0891e5a064d08f7eb630dd284a2de1f298","after":"0299a111584e7ea1f196478c49652d39eb82e7fc","ref":"refs/heads/main","pushedAt":"2024-05-29T05:58:55.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"horseee","name":"Horseee","path":"/horseee","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/22924514?s=80&v=4"},"commit":{"message":"[UPDATE] layout for pruning.md and low_rank_decomposition.md","shortMessageHtmlLink":"[UPDATE] layout for pruning.md and low_rank_decomposition.md"}},{"before":"876385aa343b1c4fcc3dcc7078bd8ace4b84c031","after":"af99ac0891e5a064d08f7eb630dd284a2de1f298","ref":"refs/heads/main","pushedAt":"2024-05-29T05:53:33.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"horseee","name":"Horseee","path":"/horseee","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/22924514?s=80&v=4"},"commit":{"message":"Add the section efficient_architecture_llm. add figure","shortMessageHtmlLink":"Add the section efficient_architecture_llm. 
add figure"}},{"before":"535567cc7d157bda9f8aeedb2c09324c241695c7","after":"876385aa343b1c4fcc3dcc7078bd8ace4b84c031","ref":"refs/heads/main","pushedAt":"2024-05-29T05:50:53.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"horseee","name":"Horseee","path":"/horseee","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/22924514?s=80&v=4"},"commit":{"message":"[***UPDATE***] Reorganize th list. Add three papers: FinerCut: Finer-grained Interpretable Layer Pruning for Large Language Models & Exploiting LLM Quantization & CLAQ: Pushing the Limits of Low-Bit Post-Training Quantization for LLMs","shortMessageHtmlLink":"[***UPDATE***] Reorganize th list. Add three papers: FinerCut: Finer-…"}},{"before":"838e59b37abb80803e57a8d707567ee3ae4ea1c1","after":"535567cc7d157bda9f8aeedb2c09324c241695c7","ref":"refs/heads/main","pushedAt":"2024-05-26T10:02:27.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"horseee","name":"Horseee","path":"/horseee","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/22924514?s=80&v=4"},"commit":{"message":"[ADD] OpenBA-V2: Reaching 77.3% High Compression Ratio with Fast Multi-Stage Pruning & SirLLM: Streaming Infinite Retentive LLM","shortMessageHtmlLink":"[ADD] OpenBA-V2: Reaching 77.3% High Compression Ratio with Fast Mult…"}},{"before":"a4317f6487ca279f041876d3721864b2d82e6768","after":"838e59b37abb80803e57a8d707567ee3ae4ea1c1","ref":"refs/heads/main","pushedAt":"2024-05-26T09:44:35.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"horseee","name":"Horseee","path":"/horseee","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/22924514?s=80&v=4"},"commit":{"message":"[ADD] Language-Specific Pruning for Efficient Reduction of Large Language Models & Feature-based Low-Rank Compression of Large Language Models via Bayesian Optimization & Surgical Feature-Space Decomposition of LLMs: Why, When and How?","shortMessageHtmlLink":"[ADD] Language-Specific Pruning for Efficient Reduction of Large Lang…"}},{"before":"cc4b89b3da4b0076949abdbd84dda9b0340aba58","after":"a4317f6487ca279f041876d3721864b2d82e6768","ref":"refs/heads/main","pushedAt":"2024-05-26T09:35:32.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"horseee","name":"Horseee","path":"/horseee","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/22924514?s=80&v=4"},"commit":{"message":"[ADD] Distributed Speculative Inference of Large Language Models & RDRec: Rationale Distillation for LLM-based Recommendation & A Comprehensive Survey of Accelerated Generation Techniques in Large Language Models","shortMessageHtmlLink":"[ADD] Distributed Speculative Inference of Large Language Models & RD…"}},{"before":"d1a9987c8a182512dd863d98f18e6123dbb91e96","after":"cc4b89b3da4b0076949abdbd84dda9b0340aba58","ref":"refs/heads/main","pushedAt":"2024-05-26T06:59:51.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"horseee","name":"Horseee","path":"/horseee","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/22924514?s=80&v=4"},"commit":{"message":"[ADD] Distilling Instruction-following Abilities of Large Language Models with Task-aware Curriculum Planning & Reducing Transformer Key-Value Cache Size with Cross-Layer Attention & Layer-Condensed KV Cache for Efficient Inference of Large Language Models","shortMessageHtmlLink":"[ADD] Distilling Instruction-following Abilities of Large Language 
Mo…"}},{"before":"d3021ee97e406f533525d66b6f2b68cb06c886fc","after":"d1a9987c8a182512dd863d98f18e6123dbb91e96","ref":"refs/heads/main","pushedAt":"2024-05-26T06:52:52.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"horseee","name":"Horseee","path":"/horseee","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/22924514?s=80&v=4"},"commit":{"message":"Add a new section KV Cache Compression. Add MiniCache: KV Cache Compression in Depth Dimension for Large Language Models & Unlocking Data-free Low-bit Quantization with Matrix Decomposition for KV Cache Compression & PyramidInfer: Pyramid KV Cache Compression for High-throughput LLM Inference","shortMessageHtmlLink":"Add a new section KV Cache Compression. Add MiniCache: KV Cache Compr…"}},{"before":"ece891721c12f7e0b4fca9cf50c5728bc2d27be6","after":"d3021ee97e406f533525d66b6f2b68cb06c886fc","ref":"refs/heads/main","pushedAt":"2024-05-17T20:11:18.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"horseee","name":"Horseee","path":"/horseee","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/22924514?s=80&v=4"},"commit":{"message":"[ADD] LLM-QBench: A Benchmark Towards the Best Practice for Post-training Quantization of Large Language Models & SKVQ: Sliding-window Key and Value Cache Quantization for Large Language Models' & Pruning as a Domain-specific LLM Extractor & EMS-SD: Efficient Multi-sample Speculative Decoding for Accelerating Large Language Models","shortMessageHtmlLink":"[ADD] LLM-QBench: A Benchmark Towards the Best Practice for Post-trai…"}},{"before":"df0b19aa3b228eea3ae6f0cd2ffc44aeb9a739a5","after":"ece891721c12f7e0b4fca9cf50c5728bc2d27be6","ref":"refs/heads/main","pushedAt":"2024-05-12T07:01:22.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"horseee","name":"Horseee","path":"/horseee","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/22924514?s=80&v=4"},"commit":{"message":"[ADD] Clover: Regressive Lightweight Speculative Decoding with Sequential Knowledge & Accelerating Speculative Decoding using Dynamic Speculation Length","shortMessageHtmlLink":"[ADD] Clover: Regressive Lightweight Speculative Decoding with Sequen…"}}],"hasNextPage":true,"hasPreviousPage":false,"activityType":"all","actor":null,"timePeriod":"all","sort":"DESC","perPage":30,"cursor":"djE6ks8AAAAEYuI_9gA","startCursor":null,"endCursor":null}},"title":"Activity · horseee/Awesome-Efficient-LLM"}