Implement cache shifting for Llama models #341

EricLBuehler · 2024-05-22T00:53:13Z

If this works, we can extend it to the other models.

Hopefully, this will fix the problem in #339 for models without sliding window attention.

github-actions · 2024-05-22T00:54:24Z

Code Metrics Report

===============================================================================
 Language            Files        Lines         Code     Comments       Blanks
===============================================================================
 Dockerfile              1           34           25            0            9
 Happy                   1          442          369            0           73
 JSON                    5            9            9            0            0
 Python                 21          741          622           21           98
 TOML                   15          390          353            1           36
-------------------------------------------------------------------------------
 Jupyter Notebooks       1            0            0            0            0
 |- Markdown             1           60           30           22            8
 |- Python               1           96           87            1            8
 (Total)                            156          117           23           16
-------------------------------------------------------------------------------
 Markdown               15         1026            0          758          268
 |- BASH                 6          205          192            0           13
 |- Python               6          121          110            0           11
 |- Rust                 3          185          172            9            4
 (Total)                           1537          474          767          296
-------------------------------------------------------------------------------
 Rust                   84        27822        25504          356         1962
 |- Markdown            40          419            0          407           12
 (Total)                          28241        25504          763         1974
===============================================================================
 Total                 144        30464        26882         1136         2446
===============================================================================

Implement cache shifting for llama models

c574026

EricLBuehler mentioned this pull request May 22, 2024

Garbled output on very long prompts #339

Open

EricLBuehler closed this May 25, 2024

EricLBuehler deleted the llama_cache_shifting branch May 25, 2024 22:11
