Support for "mistralai/Mistral-7B-Instruct-v0.1" model #1501
Comments
Just use the llama converter.

Most people using Mistral will be using it for RAG, meaning it'll probably break without the sliding window attention.
RAG?

Retrieval-augmented generation: you create a vector database, query it for relevant results, then append those results to the user's query and send both to an LLM for an answer. It lets you ask an LLM about specific information that is after the model's knowledge cutoff date, for example. Very powerful.
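For illustration, a minimal sketch of that flow in Python. `embed` and `llm` here are hypothetical stand-ins for an embedding model and an LLM call, not real APIs:

```python
import numpy as np

def retrieve(query_vec, doc_vecs, docs, k=3):
    # Rank documents by cosine similarity to the query embedding.
    sims = doc_vecs @ query_vec / (
        np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec)
    )
    return [docs[i] for i in np.argsort(-sims)[:k]]

def rag_answer(query, docs, embed, llm):
    # embed() and llm() are hypothetical: any sentence-embedding
    # function and any text-generation call will do.
    doc_vecs = np.stack([embed(d) for d in docs])
    context = "\n".join(retrieve(embed(query), doc_vecs, docs))
    prompt = f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    return llm(prompt)
```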
And what is the common usage of this with a sequence length higher than 4096?

You can certainly do RAG decently under 4096, but typically the point of RAG is to make use of as much context as possible.
But again, the sliding window is only for the attention mask. It does not mean that it will "break".
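To illustrate, a quick sketch (my own, not CTranslate2 code) of what a sliding-window causal mask looks like: each position attends only to the previous `window` positions instead of the whole prefix:

```python
import torch

def sliding_window_causal_mask(seq_len: int, window: int) -> torch.Tensor:
    # True where attention is allowed: position i may attend to
    # positions j with i - window < j <= i.
    i = torch.arange(seq_len).unsqueeze(1)
    j = torch.arange(seq_len).unsqueeze(0)
    return (j <= i) & (j > i - window)

print(sliding_window_causal_mask(8, window=4).int())  # banded lower triangle
```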
You are right, I misunderstood their article. My apologies.
What would be the command to use the llama converter for Mistral?
I've uploaded the converted model to Hugging Face. See here.

When I do this, it outputs an error. Maybe I need to change the model type too?
Did you try changing it here: https://github.com/OpenNMT/CTranslate2/blob/master/python/ctranslate2/converters/transformers.py#L1197
@winstxnhdw possible to share how you did the conversion? I am getting the same error.

I solved it. I went to https://github.com/OpenNMT/CTranslate2/blob/master/python/ctranslate2/converters/transformers.py#L1197, copied the llama loader, created a new function, and registered MistralConfig with the new function. Basically: copy the llama loader and register MistralConfig.
Just a nice reminder: this will behave 100% as Mistral as long as the sequence length is <= 4096 tokens.

When will ctranslate2 support SWA?

Can you please post your code for me instead of a picture of it?
@register_loader("MistralConfig")
class MistralLoader(ModelLoader):
@property
def architecture_name(self):
return "MistralForCausalLM"
def get_model_spec(self, model):
num_layers = model.config.num_hidden_layers
num_heads = model.config.num_attention_heads
num_heads_kv = getattr(model.config, "num_key_value_heads", num_heads)
if num_heads_kv == num_heads:
num_heads_kv = None
spec = transformer_spec.TransformerDecoderModelSpec.from_config(
num_layers,
num_heads,
activation=common_spec.Activation.SWISH,
pre_norm=True,
ffn_glu=True,
rms_norm=True,
rotary_dim=0,
rotary_interleave=False,
num_heads_kv=num_heads_kv,
)
self.set_decoder(spec.decoder, model.model)
self.set_linear(spec.decoder.projection, model.lm_head)
return spec
def get_vocabulary(self, model, tokenizer):
tokens = super().get_vocabulary(model, tokenizer)
extra_ids = model.config.vocab_size - len(tokens)
for i in range(extra_ids):
tokens.append("<extra_id_%d>" % i)
return tokens
def set_vocabulary(self, spec, tokens):
spec.register_vocabulary(tokens)
def set_config(self, config, model, tokenizer):
config.bos_token = tokenizer.bos_token
config.eos_token = tokenizer.eos_token
config.unk_token = tokenizer.unk_token
config.layer_norm_epsilon = model.config.rms_norm_eps
def set_layer_norm(self, spec, layer_norm):
spec.gamma = layer_norm.weight
def set_decoder(self, spec, module):
spec.scale_embeddings = False
self.set_embeddings(spec.embeddings, module.embed_tokens)
self.set_layer_norm(spec.layer_norm, module.norm)
for layer_spec, layer in zip(spec.layer, module.layers):
self.set_layer_norm(
layer_spec.self_attention.layer_norm, layer.input_layernorm
)
self.set_layer_norm(
layer_spec.ffn.layer_norm, layer.post_attention_layernorm
)
wq = layer.self_attn.q_proj.weight
wk = layer.self_attn.k_proj.weight
wv = layer.self_attn.v_proj.weight
wo = layer.self_attn.o_proj.weight
layer_spec.self_attention.linear[0].weight = torch.cat([wq, wk, wv])
layer_spec.self_attention.linear[1].weight = wo
self.set_linear(layer_spec.ffn.linear_0, layer.mlp.gate_proj)
self.set_linear(layer_spec.ffn.linear_0_noact, layer.mlp.up_proj)
self.set_linear(layer_spec.ffn.linear_1, layer.mlp.down_proj)
delattr(layer, "self_attn")
delattr(layer, "mlp")
gc.collect()Here's a snippet which I succesfully conducted the convertion. Not sure if it's good to send out a PR - given the sliding window support is not there yet. |
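With the loader registered, the standard conversion entry points should pick it up. A sketch of the Python route (the output directory name is my own placeholder):

```python
from ctranslate2.converters import TransformersConverter

converter = TransformersConverter("mistralai/Mistral-7B-Instruct-v0.1")
converter.convert("mistral-7b-instruct-ct2", quantization="int8")
```

The `ct2-transformers-converter` CLI should work too, provided the edited `transformers.py` is the one in the installed package.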
Awesome! Any chance we can get a bfloat16 CTranslate2 edition, since the model is originally in bfloat16? That way we can use quantizations at run time other than int8.
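For what it's worth, a sketch of what that could look like, assuming a CTranslate2 build that supports the bfloat16 type (directory names are placeholders):

```python
import ctranslate2
from ctranslate2.converters import TransformersConverter

# Convert once, keeping the original bfloat16 weights.
TransformersConverter("mistralai/Mistral-7B-Instruct-v0.1").convert(
    "mistral-7b-instruct-bf16", quantization="bfloat16"
)

# A different compute type can still be selected at load time.
generator = ctranslate2.Generator("mistral-7b-instruct-bf16", compute_type="int8")
```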
Speaking of RAG: my other posts have been inquiring about getting ctranslate2 to work with the "instructor" class of embedding models, like instructor-xl, for example. I'm being serious here. Since you successfully converted Mistral by modifying the ctranslate2 scripts, I will actually pay you (or anyone) to either modify the ctranslate2 codebase or customize the scripts for me personally. This is very important to me, so hit me up if you want to discuss. I'd be happy to share my credentials, law firm website, or whatever it takes so we can do this and make payment remotely. Thanks.

We second this, although we are focused on healthcare (i.e. the pay part). ctranslate2 is awesome.

Let's do this: we'll split the cost 50/50 for whatever freelance programmer actually does it. We'll need to discuss the amount of time and cost first, of course. ;-)

Confirmed. We are also looking into fine-tuning this model, although it does not need very much. From our tests, this model works the best out of the box, vanilla, across a variety of tests we have for our use case.

I agree, and even though it's a resource hog (relative to other embedding models), it's worth it IMHO.
I've just noticed that it performs significantly better when I use it. Not sure why exactly; I know that different models perform differently depending on the type of text being fed to them, but that's just what I've noticed. Any interest?
Will check out the leaderboard and run some tests, thanks.
I'm sorry, are you saying that bge-en-large-1.5 allows you to enter instructions like instructor-xl does?
@winstxnhdw do you have a use case to test #1528? It would require passing a very long prompt (> 4096 tokens, maybe double that) and seeing if it outputs a consistent completion.
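In case it helps whoever runs it, a rough sketch of such a check against a converted model (the model directory and prompt file are placeholders):

```python
import ctranslate2
import transformers

tokenizer = transformers.AutoTokenizer.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.1"
)
generator = ctranslate2.Generator("mistral-7b-instruct-ct2", device="cuda")

# A prompt well beyond the 4096-token window (e.g. > 8192 tokens).
long_prompt = open("long_context.txt").read()
tokens = tokenizer.convert_ids_to_tokens(tokenizer.encode(long_prompt))
print("prompt tokens:", len(tokens))

results = generator.generate_batch(
    [tokens], max_length=256, include_prompt_in_result=False
)
print(tokenizer.decode(results[0].sequences_ids[0]))
```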
Yeah, easily, but I am really busy this week. I can maybe test something this weekend. Will update.
I closed #1528 and worked with @minhthuc2502 on #1524. Still WIP, not good so far.
We just merged #1524. Great teamwork with @minhthuc2502!


Hi,
Would it be possible to add support for the "mistralai/Mistral-7B-Instruct-v0.1" model?