
Add phi3-mini #1461

Open
kyakuno opened this issue Apr 24, 2024 · 12 comments

kyakuno commented Apr 24, 2024

Microsoft's mini-sized LLM:
https://huggingface.co/microsoft/Phi-3-mini-4k-instruct


kyakuno commented Apr 24, 2024

An official ONNX release may be coming:
https://onnxruntime.ai/blogs/accelerating-phi-3


kyakuno commented Apr 24, 2024

Official ONNX models have now been released:
https://huggingface.co/microsoft/Phi-3-mini-128k-instruct-onnx
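One way to fetch just the int4 CPU variant locally (a sketch assuming the huggingface_hub package; the allow_patterns filter matches the repository's directory layout):

from huggingface_hub import snapshot_download

# Download only the int4 CPU files instead of the whole repository.
local_dir = snapshot_download(
    "microsoft/Phi-3-mini-128k-instruct-onnx",
    allow_patterns=["cpu_and_mobile/cpu-int4-rtn-block-32/*"],
)
print(local_dir)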


kyakuno commented Apr 24, 2024

The generate API needs to be written by hand in Python.
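At its core that is just a greedy next-token loop. A minimal sketch, where forward is a hypothetical stand-in for an ONNX Runtime session.run() call that maps the current token ids to logits:

import numpy as np

def greedy_generate(forward, input_ids, eos_token_id, max_new_tokens=128):
    # forward: (1, seq_len) int64 ids -> (1, seq_len, vocab_size) logits
    ids = list(input_ids)
    for _ in range(max_new_tokens):
        logits = forward(np.array([ids], dtype=np.int64))
        next_id = int(np.argmax(logits[0, -1]))  # greedy pick of the last position
        ids.append(next_id)
        if next_id == eos_token_id:
            break
    return ids

(A real implementation would also feed the KV cache back in rather than re-running the full sequence each step.)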


kyakuno commented Apr 24, 2024

Example inference code:
microsoft/onnxruntime#20448


kyakuno commented Apr 24, 2024

With the beta version of onnxruntime (the onnxruntime-genai package), the following works:

import time

import onnxruntime_genai as og

# int4 CPU model from the Hugging Face repo above (forward slashes avoid
# Python treating "\P" etc. in the original Windows path as escape sequences).
model = og.Model("./Phi-3-mini-128k-instruct-onnx/cpu_and_mobile/cpu-int4-rtn-block-32")
tokenizer = og.Tokenizer(model)
tokenizer_stream = tokenizer.create_stream()


def input_llm(text):
    print("Question:", text)
    input_tokens = tokenizer.encode(text)
    params = og.GeneratorParams(model)
    params.try_use_cuda_graph_with_max_batch_size(1)
    params.input_ids = input_tokens
    generator = og.Generator(model, params)
    return generator


def output_llm(generator):
    print("Answer:")
    stt = time.time()
    list_error = []
    list_sentence = []
    while not generator.is_done():
        generator.compute_logits()
        generator.generate_next_token()
        new_token = generator.get_next_tokens()[0]
        if new_token not in list_error:
            try:
                list_sentence.append(tokenizer_stream.decode(new_token))
            except Exception:
                # The streaming decoder can fail on tokens that are part of a
                # multi-byte sequence; remember the raw id and keep going.
                list_error.append(new_token)
                list_sentence.append(new_token)
    print(list_sentence)
    fin = time.time()
    print(fin - stt)
    return list_error
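A hypothetical call, wrapping the question in Phi-3's chat template (the <|user|> / <|end|> / <|assistant|> tags):

# Assumes the model directory above has been downloaded.
prompt = "<|user|>\nWhat is the capital of France?<|end|>\n<|assistant|>\n"
generator = input_llm(prompt)
errors = output_llm(generator)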


kyakuno commented Apr 24, 2024

Source code for onnxruntime-genai:
https://github.com/microsoft/onnxruntime-genai


kyakuno commented Apr 24, 2024

generate is written in C++, so it seems better to port the PyTorch-side implementation instead.
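As a rough sketch of what that PyTorch-side loop looks like (a much-simplified version of what transformers' generate() does internally, reusing the KV cache; hypothetical, not the actual onnxruntime-genai code):

import torch

@torch.no_grad()
def generate(model, tokenizer, prompt, max_new_tokens=128):
    ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)
    out_ids = ids
    cur = ids
    past = None
    for _ in range(max_new_tokens):
        out = model(input_ids=cur, past_key_values=past, use_cache=True)
        past = out.past_key_values  # reuse the KV cache across steps
        next_id = out.logits[:, -1].argmax(dim=-1, keepdim=True)  # greedy
        if next_id.item() == tokenizer.eos_token_id:
            break
        out_ids = torch.cat([out_ids, next_id], dim=-1)
        cur = next_id  # only the new token is fed in on the next step
    return tokenizer.decode(out_ids[0], skip_special_tokens=True)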


kyakuno commented Apr 24, 2024

For now, using transformers for the tokenizer seems like the way to go:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

torch.random.manual_seed(0)

# The Phi-3 checkpoint shipped custom model code at release, hence trust_remote_code=True.
model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3-mini-128k-instruct",
    device_map="cuda",
    torch_dtype="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-128k-instruct")

messages = [
    {"role": "system", "content": "You are a helpful digital assistant. Please provide safe, ethical and accurate information to the user."},
    {"role": "user", "content": "Can you provide ways to eat combinations of bananas and dragonfruits?"},
    {"role": "assistant", "content": "Sure! Here are some ways to eat bananas and dragonfruits together: 1. Banana and dragonfruit smoothie: Blend bananas and dragonfruits together with some milk and honey. 2. Banana and dragonfruit salad: Mix sliced bananas and dragonfruits together with some lemon juice and honey."},
    {"role": "user", "content": "What about solving an 2x + 3 = 7 equation?"},
]

pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
)

generation_args = {
    "max_new_tokens": 500,
    "return_full_text": False,
    "temperature": 0.0,
    "do_sample": False,
}

output = pipe(messages, **generation_args)
print(output[0]['generated_text'])

https://huggingface.co/microsoft/Phi-3-mini-128k-instruct


kyakuno commented Apr 24, 2024

It looks like a standard SentencePiece tokenizer.
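A quick check of that (assuming transformers is installed): SentencePiece-style pieces carry the U+2581 "▁" word-boundary marker:

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-128k-instruct")
print(tok.tokenize("Hello world"))  # e.g. ['▁Hello', '▁world']
print(tok.encode("Hello world"))    # corresponding token ids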
