Add Qwen support #850

chenht2026 · 2023-12-29T03:10:05Z

It works.
Qwen's tokenizer is based on tiktoken, so I added the tokenizer (tokenization_qwen.py) from its Hugging Face repo without any changes. This makes the code a little complicated, so maybe it should not be merged.

Maybe someone needs it.

Closes #840

lantiga · 2024-01-08T20:24:05Z

Thank you @chenht2026, sorry for the wait. A few of us took a break :-)

lantiga

Thanks for the contributions! One comment on the config flag

lantiga · 2024-01-08T20:30:32Z

lit_gpt/config.py

@@ -27,6 +27,8 @@ class Config:
    rotary_percentage: float = 0.25
    parallel_residual: bool = True
    bias: bool = True
+    # just for Qwen
+    is_Qwen: Optional[bool] = None


I think we should avoid this in favor of something that characterizes Qwen, like having bias only in c_attn.
For the time being we could rename this to attn_bias, and then in the future turn bias into a Union[bool, List[str]] if there's a need for it.
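
The suggestion above could look roughly like the following. This is a minimal sketch, not the project's actual Config: the `attn_bias` default, the `__post_init__` fallback, the `bias_per_layer` helper, and the layer names are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Config:
    # Global switch for bias terms in linear layers (field from the diff above).
    bias: bool = True
    # Hypothetical field following the suggested rename: bias for the attention
    # input projection only. None means "follow `bias`".
    attn_bias: Optional[bool] = None

    def __post_init__(self) -> None:
        if self.attn_bias is None:
            self.attn_bias = self.bias


def bias_per_layer(config: Config) -> Dict[str, bool]:
    """Return which (illustratively named) linear layers get a bias term."""
    return {
        "attn.qkv": bool(config.attn_bias),  # plays the role of Qwen's c_attn
        "attn.proj": config.bias,
        "mlp.fc": config.bias,
        "mlp.proj": config.bias,
    }


# Qwen-style setup: no bias anywhere except the attention input projection.
qwen_like = Config(bias=False, attn_bias=True)
print(bias_per_layer(qwen_like))
```

With this shape the flag describes a property of the architecture rather than naming a model, so any other model with the same bias layout could reuse it.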

carmocca

Can you also add a tests/test_model.py test?

carmocca · 2024-01-09T18:03:44Z

lit_gpt/tokenization_qwen.py

+# Copyright (c) Alibaba Cloud.
+#
+# This source code is licensed under the license found in the
+# LICENSE file in the root directory of this source tree.


If you didn't write the code in this file (which I assume you didn't, since you added this license header), you should link to the original source. Is the original from PaddlePaddle? https://github.com/PaddlePaddle/PaddleNLP/blob/develop/paddlenlp/transformers/qwen/tokenizer.py

I would advise creating a version that implements only the few methods required by tokenizer.py.

chenht2026 · 2024-01-12

I just copied it from their Hugging Face repo: tokenization_qwen.py

carmocca · 2024-01-09T18:06:36Z

lit_gpt/tokenization_qwen.py

+from typing import Collection, Dict, List, Set, Tuple, Union
+
+import tiktoken
+from transformers import PreTrainedTokenizer, AddedToken


This project doesn't have transformers as a dependency, so this import is not possible
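
One way to drop the `transformers` import, in line with the earlier request for a version that implements only the methods tokenizer.py needs, is a thin wrapper around an injected encoding object. The sketch below is an assumption-laden illustration: `MinimalQwenProcessor`, `EncodingLike`, `ByteEncoding`, and the `eos_id` value are hypothetical names, and the toy byte-level encoding stands in for a real tiktoken encoding so the example runs without extra dependencies.

```python
from typing import List, Protocol


class EncodingLike(Protocol):
    """Anything exposing these two methods works, e.g. a tiktoken.Encoding."""

    def encode(self, text: str) -> List[int]: ...

    def decode(self, tokens: List[int]) -> str: ...


class MinimalQwenProcessor:
    """Hypothetical thin wrapper with no PreTrainedTokenizer base class."""

    def __init__(self, encoding: EncodingLike, eos_id: int) -> None:
        self.encoding = encoding
        self.eos_id = eos_id

    def encode(self, text: str, eos: bool = False) -> List[int]:
        ids = list(self.encoding.encode(text))
        if eos:
            ids.append(self.eos_id)
        return ids

    def decode(self, ids: List[int]) -> str:
        # Drop the end-of-sequence marker before handing ids to the encoding.
        return self.encoding.decode([i for i in ids if i != self.eos_id])


class ByteEncoding:
    """Toy stand-in (one token per UTF-8 byte), NOT Qwen's vocabulary."""

    def encode(self, text: str) -> List[int]:
        return list(text.encode("utf-8"))

    def decode(self, tokens: List[int]) -> str:
        return bytes(tokens).decode("utf-8", errors="replace")


# eos_id=256 is outside the byte range, so it cannot collide with a real token.
processor = MinimalQwenProcessor(ByteEncoding(), eos_id=256)
ids = processor.encode("hi", eos=True)
print(ids, processor.decode(ids))
```

Because the wrapper only needs `encode` and `decode`, the real encoding could be built elsewhere and passed in, keeping the module free of `transformers`.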

carmocca · 2024-01-09T18:09:22Z

lit_gpt/tokenizer.py

@@ -91,6 +95,8 @@ def encode(
            tokens = self.processor.encode(string).ids
        elif self.backend == "sentencepiece":
            tokens = self.processor.encode(string)
+        elif self.backend == "tiktoken":
+            tokens = self.processor.encode(string)


It doesn't seem like the new processor implements this method. Also, what about decoding?
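
The concern above is that the new branch calls `self.processor.encode(string)` on an object that may not provide it, and that no matching decode path exists. A minimal sketch of a dispatch that checks both methods up front is shown below. It assumes plain Python lists instead of tensors, and `CodePointProcessor` is a toy stand-in, not tiktoken or Qwen's tokenizer; the `Tokenizer` class here is illustrative and not the project's implementation.

```python
from typing import List


class CodePointProcessor:
    """Toy stand-in for a backend object: one token per Unicode code point."""

    def encode(self, string: str) -> List[int]:
        return [ord(ch) for ch in string]

    def decode(self, tokens: List[int]) -> str:
        return "".join(chr(t) for t in tokens)


class Tokenizer:
    # Backends whose processor returns a plain list of ids from encode().
    PLAIN_BACKENDS = ("sentencepiece", "tiktoken")

    def __init__(self, processor, backend: str) -> None:
        # Fail early if the processor lacks a method the dispatch relies on.
        for name in ("encode", "decode"):
            if not callable(getattr(processor, name, None)):
                raise TypeError(f"{backend} processor has no {name}() method")
        self.processor = processor
        self.backend = backend

    def encode(self, string: str) -> List[int]:
        if self.backend == "huggingface":
            return self.processor.encode(string).ids
        if self.backend in self.PLAIN_BACKENDS:
            return self.processor.encode(string)
        raise RuntimeError(f"unknown backend: {self.backend}")

    def decode(self, tokens: List[int]) -> str:
        if self.backend == "huggingface" or self.backend in self.PLAIN_BACKENDS:
            return self.processor.decode(tokens)
        raise RuntimeError(f"unknown backend: {self.backend}")


tokenizer = Tokenizer(CodePointProcessor(), backend="tiktoken")
text = "Qwen 你好"
print(tokenizer.decode(tokenizer.encode(text)) == text)
```

A round-trip check like the last line is also a cheap candidate for a unit test on any newly added backend.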

carmocca · 2024-01-09T18:11:00Z

tutorials/download_Qwen.md

It would be good to mention Qwen's recommended languages.

chenht2026 · 2024-01-12

Chinese and English. I'll add it.

samuelazran · 2024-02-10T21:09:48Z

> It works. Qwen's tokenizer is based on tiktoken, I add the tokenizer(tokenization_qwen.py) from its huggingface repo without any revision. This make the code a little complicate, so maybe do not merge.
>
> May be some one needs it.
>
> Closes #840

Would it work with Qwen 2 (Qwen/Qwen1.5-7B-Chat)? If not, what needs to be added?
Qwen1.5 improves a lot on its predecessor in performance.
https://huggingface.co/Qwen/Qwen1.5-7B-Chat/tree/main

rasbt · 2024-07-08T18:20:55Z

Hey just pinging to see if you are still interested in pursuing this PR. Personally, I think it'd be awesome to support the Qwen models (1.5 and especially 2) in LitGPT. There have been some improvements in the tokenizer in LitGPT recently that could now make this more easily possible.

Btw, if rebasing onto the main branch (which has changed a lot) is too messy, you could also just open a fresh PR.

chenht2026 requested review from awaelchli, carmocca and lantiga as code owners December 29, 2023 03:10

chenht2026 force-pushed the main branch 3 times, most recently from 4823c6a to 2fe1ec3 Compare January 2, 2024 09:15

init Qwen

4d5ee86

chenht2026 force-pushed the main branch from 2fe1ec3 to 4d5ee86 Compare January 2, 2024 09:18

Merge branch 'main' into main

ea1c006

lantiga reviewed Jan 8, 2024

lantiga added the enhancement New feature or request label Jan 8, 2024

carmocca reviewed Jan 9, 2024

This was referenced Jan 13, 2024

BiasMap: individual bias for each module #878

Closed

Can you support the chatglm3-6b model? #884

Open

carmocca marked this pull request as draft January 18, 2024 23:36

Update README.md

91251df
