Falcon support #890

KexinFeng · 2023-06-30T22:57:02Z

The Falcon models tiiuae/falcon-40b and tiiuae/falcon-7b require special handling of the kv_cache, which is implemented in FalconBlock.

# concatenate along seq_length dimension:
# [batch, seq, (num_heads + 2*num_kv) * kvDim = 73 * 64]
#  - key: [batch_size * self.num_kv, seq, kvDim]. [2, 6, 64]
#  - value: [batch_size * self.num_kv, seq, kvDim]. [2, 6, 64]
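The concatenation above can be illustrated with a minimal sketch. It assumes the cache is held as plain nested Python lists shaped [batch_size * num_kv, seq, kvDim], whereas the real code works on tensors. `concat_seq`, `shape` and `zeros` are hypothetical helpers for illustration only. The past key (or value) and the newly computed one are joined along the seq axis, so a [2, 6, 64] cache plus one new token becomes [2, 7, 64].

```python
from typing import List

# Hypothetical representation: a "tensor" is a nested list, outermost axis first.
Tensor3 = List[List[List[float]]]


def shape(t) -> List[int]:
    """Return the dimensions of a rectangular nested list."""
    dims = []
    while isinstance(t, list):
        dims.append(len(t))
        t = t[0] if t else None
    return dims


def concat_seq(past: Tensor3, new: Tensor3) -> Tensor3:
    """Concatenate two [batch_size * num_kv, seq, kvDim] caches along seq (axis 1)."""
    if len(past) != len(new):
        raise ValueError("batch_size * num_kv must match")
    # For each (batch, kv-head) row, append the new positions after the cached ones.
    return [p_row + n_row for p_row, n_row in zip(past, new)]


def zeros(b: int, s: int, d: int) -> Tensor3:
    """Build a zero-filled [b, s, d] nested list."""
    return [[[0.0] * d for _ in range(s)] for _ in range(b)]


past_key = zeros(2, 6, 64)  # batch_size * num_kv = 2, seq = 6, kvDim = 64
new_key = zeros(2, 1, 64)   # one newly generated token
print(shape(concat_seq(past_key, new_key)))  # [2, 7, 64]
```

The same call applies to the value cache, since both share the [batch_size * num_kv, seq, kvDim] layout.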

# Falcon
# fused_qkv: [batch, seq, (num_heads=71 + num_kv=1 + num_kv=1) * kvDim]
# query_layer: [batch*num_heads=71,  seq, kvDim]
# key_layer  : [batch*num_kv, seq, kvDim].  num_kv=1
# value_layer: [batch*num_kv, seq, kvDim].  num_kv=1
# hidden_dim = 4544
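The comments above imply the following arithmetic for falcon-7b: the fused projection is (71 + 1 + 1) * 64 = 4672 values wide, while hidden_dim = 71 * 64 = 4544. The sketch below checks those numbers and splits one token's fused_qkv vector into query, key and value heads. It assumes the layout listed in the comments (all num_heads query heads first, then one key head, then one value head) and does not cover configurations with num_kv > 1. `qkv_widths` and `split_fused_qkv` are hypothetical names, not functions from this PR.

```python
from typing import List, Tuple

Head = List[float]


def qkv_widths(num_heads: int, num_kv: int, kv_dim: int) -> Tuple[int, int]:
    """Return (hidden_dim, fused_qkv width) for the layout in the comments above."""
    hidden_dim = num_heads * kv_dim
    fused_width = (num_heads + 2 * num_kv) * kv_dim
    return hidden_dim, fused_width


def split_fused_qkv(
    row: List[float], num_heads: int, kv_dim: int
) -> Tuple[List[Head], List[Head], List[Head]]:
    """Split one token's fused_qkv vector, assuming num_kv = 1 (multi-query).

    Assumed layout: num_heads query heads, then 1 key head, then 1 value head,
    each kv_dim values wide.
    """
    expected = (num_heads + 2) * kv_dim
    if len(row) != expected:
        raise ValueError(f"expected {expected} values, got {len(row)}")
    heads = [row[i * kv_dim:(i + 1) * kv_dim] for i in range(num_heads + 2)]
    query = heads[:num_heads]              # num_heads query heads
    key = heads[num_heads:num_heads + 1]   # 1 shared key head
    value = heads[num_heads + 1:]          # 1 shared value head
    return query, key, value


print(qkv_widths(num_heads=71, num_kv=1, kv_dim=64))  # (4544, 4672)

q, k, v = split_fused_qkv([float(x) for x in range(10)], num_heads=3, kv_dim=2)
print(q, k, v)  # [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]] [[6.0, 7.0]] [[8.0, 9.0]]
```

Because all query heads share the single key/value head, key_layer and value_layer carry batch * num_kv rows rather than batch * num_heads, which is why the cached key/value tensors are much smaller than the query.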

Note:
The Falcon models' tokenizer has a bug: it cannot tokenize two strings of different lengths together in one batch.
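One possible workaround is sketched below under the assumption that each prompt can still be tokenized on its own. Tokenize the prompts one at a time and pad the ID lists to a common length yourself, building the attention mask alongside. `pad_batch` and the `pad_id` value are hypothetical. Left padding is used because a decoder-only model appends generated tokens on the right, so each prompt's last real token should sit in the final position.

```python
from typing import List, Tuple


def pad_batch(
    token_ids: List[List[int]], pad_id: int
) -> Tuple[List[List[int]], List[List[int]]]:
    """Left-pad per-prompt token IDs to the same length.

    Returns (input_ids, attention_mask), where the mask is 1 for real tokens
    and 0 for padding.
    """
    max_len = max((len(ids) for ids in token_ids), default=0)
    input_ids, attention_mask = [], []
    for ids in token_ids:
        n_pad = max_len - len(ids)
        input_ids.append([pad_id] * n_pad + ids)
        attention_mask.append([0] * n_pad + [1] * len(ids))
    return input_ids, attention_mask


# Token IDs below are made up; in practice each list would come from
# tokenizing one prompt on its own.
ids, mask = pad_batch([[5, 6, 7], [8]], pad_id=0)
print(ids)   # [[5, 6, 7], [0, 0, 8]]
print(mask)  # [[1, 1, 1], [0, 0, 1]]
```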

KexinFeng added 3 commits June 30, 2023 15:19:

FalconBlock · 8766d36
fix · 94dc7e3
fix · 8816b47

KexinFeng requested review from zachgk, frankfliu and a team as code owners June 30, 2023 22:57

KexinFeng requested review from lanking520 and sindhuvahinis June 30, 2023 23:00

KexinFeng added 2 commits June 30, 2023 16:05:

tested · dce3995
BlackSamorez/falcon-40b-tiny-testing · f1b45f9

sindhuvahinis approved these changes Jul 1, 2023

KexinFeng merged commit bba1578 into deepjavalibrary:master Jul 2, 2023
8 checks passed

KexinFeng added a commit to KexinFeng/djl-serving-forked that referenced this pull request Aug 16, 2023: Falcon support (deepjavalibrary#890) · c38bd5f

KexinFeng added a commit to KexinFeng/djl-serving-forked that referenced this pull request Aug 16, 2023: Falcon support (deepjavalibrary#890) · 9633e14
