Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Kv optimization - up to 3x performance on larger context #58

Merged
merged 1 commit into from
Jul 13, 2023

Conversation

cmp-nct
Copy link
Owner

@cmp-nct cmp-nct commented Jul 13, 2023

KV cache is now cyclic split into permuted V variant
The ggml_tensor_print function has been completely reworked to output proper 1-4dim tensors with data.

Example:

+======================+======================+======================+======================+
| :0
| V                                [f32 type]
+----------------------+----------------------+----------------------+----------------------+
| Dimensions           | Strides              | Layer id             | Backend              |
| 3                    | 4x16x1024            | 0                    | CPU                  |
+----------------------+----------------------+----------------------+----------------------+
| Elements             | Src0                 | Src1                 | Operation            |
| 4 x 64 x 2           | 4 x 64 x 2           | N/A                  | CONT                 |
+----------------------+----------------------+----------------------+----------------------+
| Transposed:      No  | Permuted:        No  | Contiguous:      Yes | Size:        0.00 MB |
| Src0 name:           | cache_v (view) (permuted)                                          |
+----------------------+----------------------+----------------------+----------------------+

+-------------------------------------------------------------------------------------------+
| Content of src0 "cache_v (view) (permuted)" (3 dim)

+-------------------------------------------------------------------------------------------+
| Content of src0 "cache_v (view) (permuted)" (3 dim)
| Total Elements : [ Row:4   Col:64  Layer:2   ]
+-------------------------------------------------------------------------------------------+
| Row 1: [0.302  , 0.010  ] [-0.238 , 0.680  ] [0.305  , 0.206  ] [-0.013 , 0.436  ] [-0.074 , -0.698 ] [-0.153 , -0.067 ]
| Row 2: [0.091  , 0.199  ] [0.253  , 0.151  ] [-0.557 , 0.089  ] [0.298  , -0.272 ] [-0.149 , 0.232  ] [-0.217 , 0.193  ]
| Row 3: [-0.085 , -0.014 ] [0.225  , 0.089  ] [-0.338 , 0.072  ] [0.416  , -0.186 ] [-0.071 , 0.110  ] [0.467  , 0.497  ]
| Row 4: [-0.336 , 0.471  ] [-0.144 , 0.070  ] [-0.062 , 0.520  ] [0.093  , 0.217  ] [-0.332 , -0.205 ] [0.012  , 0.335  ]
+-------------------------------------------------------------------------------------------+
+-------------------------------------------------------------------------------------------+
| Content of dst "V" (3 dim)

+-------------------------------------------------------------------------------------------+
| Content of dst "V" (3 dim)
| Total Elements : [ Row:4   Col:64  Layer:2   ]
+-------------------------------------------------------------------------------------------+
| Row 1: [0.302  , 0.010  ] [-0.238 , 0.680  ] [0.305  , 0.206  ] [-0.013 , 0.436  ] [-0.074 , -0.698 ] [-0.153 , -0.067 ]
| Row 2: [0.091  , 0.199  ] [0.253  , 0.151  ] [-0.557 , 0.089  ] [0.298  , -0.272 ] [-0.149 , 0.232  ] [-0.217 , 0.193  ]
| Row 3: [-0.085 , -0.014 ] [0.225  , 0.089  ] [-0.338 , 0.072  ] [0.416  , -0.186 ] [-0.071 , 0.110  ] [0.467  , 0.497  ]
| Row 4: [-0.336 , 0.471  ] [-0.144 , 0.070  ] [-0.062 , 0.520  ] [0.093  , 0.217  ] [-0.332 , -0.205 ] [0.012  , 0.335  ]
+-------------------------------------------------------------------------------------------+
+======================+======================+======================+======================+

KV cache is now cyclic split into permuted V variant
The ggml_tensor_print function has been completely reworked to output proper 1-4dim tensors with data.
Example:
```
+======================+======================+======================+======================+
| :0
| V                                [f32 type]
+----------------------+----------------------+----------------------+----------------------+
| Dimensions           | Strides              | Layer id             | Backend              |
| 3                    | 4x16x1024            | 0                    | CPU                  |
+----------------------+----------------------+----------------------+----------------------+
| Elements             | Src0                 | Src1                 | Operation            |
| 4 x 64 x 2           | 4 x 64 x 2           | N/A                  | CONT                 |
+----------------------+----------------------+----------------------+----------------------+
| Transposed:      No  | Permuted:        No  | Contiguous:      Yes | Size:        0.00 MB |
| Src0 name:           | cache_v (view) (permuted)                                          |
+----------------------+----------------------+----------------------+----------------------+

+-------------------------------------------------------------------------------------------+
| Content of src0 "cache_v (view) (permuted)" (3 dim)

+-------------------------------------------------------------------------------------------+
| Content of src0 "cache_v (view) (permuted)" (3 dim)
| Total Elements : [ Row:4   Col:64  Layer:2   ]
+-------------------------------------------------------------------------------------------+
| Row 1: [0.302  , 0.010  ] [-0.238 , 0.680  ] [0.305  , 0.206  ] [-0.013 , 0.436  ] [-0.074 , -0.698 ] [-0.153 , -0.067 ]
| Row 2: [0.091  , 0.199  ] [0.253  , 0.151  ] [-0.557 , 0.089  ] [0.298  , -0.272 ] [-0.149 , 0.232  ] [-0.217 , 0.193  ]
| Row 3: [-0.085 , -0.014 ] [0.225  , 0.089  ] [-0.338 , 0.072  ] [0.416  , -0.186 ] [-0.071 , 0.110  ] [0.467  , 0.497  ]
| Row 4: [-0.336 , 0.471  ] [-0.144 , 0.070  ] [-0.062 , 0.520  ] [0.093  , 0.217  ] [-0.332 , -0.205 ] [0.012  , 0.335  ]
+-------------------------------------------------------------------------------------------+
+-------------------------------------------------------------------------------------------+
| Content of dst "V" (3 dim)

+-------------------------------------------------------------------------------------------+
| Content of dst "V" (3 dim)
| Total Elements : [ Row:4   Col:64  Layer:2   ]
+-------------------------------------------------------------------------------------------+
| Row 1: [0.302  , 0.010  ] [-0.238 , 0.680  ] [0.305  , 0.206  ] [-0.013 , 0.436  ] [-0.074 , -0.698 ] [-0.153 , -0.067 ]
| Row 2: [0.091  , 0.199  ] [0.253  , 0.151  ] [-0.557 , 0.089  ] [0.298  , -0.272 ] [-0.149 , 0.232  ] [-0.217 , 0.193  ]
| Row 3: [-0.085 , -0.014 ] [0.225  , 0.089  ] [-0.338 , 0.072  ] [0.416  , -0.186 ] [-0.071 , 0.110  ] [0.467  , 0.497  ]
| Row 4: [-0.336 , 0.471  ] [-0.144 , 0.070  ] [-0.062 , 0.520  ] [0.093  , 0.217  ] [-0.332 , -0.205 ] [0.012  , 0.335  ]
+-------------------------------------------------------------------------------------------+
+======================+======================+======================+======================+
```
@cmp-nct cmp-nct merged commit 2cae279 into master Jul 13, 2023
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

1 participant