Home
Here we publish ongoing notes for Shisa, our attempt to efficiently train better-performing open-source JA/EN bilingual general-purpose chat models.
We are currently in the process of publishing more, so check back soon.
- 2024-05 shisa-v1 ablations - WIP - the Llama 3 ones are quite good
- shisa-gamma-7b-v1 - a model without tokenizer extension, based on Japanese Stable LM Base Gamma 7B, that ended up being quite strong (sitting at the top of the JA open models on the Nejumi LLM Leaderboard for months)
- shisa-7b-v1 - the first version of our fine-tuned general chat model (a minimal inference sketch follows this list)
- shisa-7B-v1-AWQ - AWQ quants courtesy of @TheBloke
- shisa-7B-v1-GPTQ - GPTQ quants courtesy of @TheBloke
- shisa-7b-v1-gguf - GGUF quants courtesy of @mmnga, who has a fancy custom convert.py that works around llama.cpp's BPE issues
- shisa-base-7b-v1 - our base model w/ an extended tokenizer and additional JA pre-training
- shisa-pretrain-en-ja-v1 - our pre-training dataset
- ultra-orca-boros-en-ja - a synthetically generated, machine-translated, programmatically validated JA/EN fine-tuning dataset
- shisa-en-ja-dpo-v1 - a small subset of DPO pairs from UltraFeedback, along with JA DPO pairs using GPT-4-generated responses as the chosen values and outputs from our preliminary 7B model as the rejected values (a record-shape sketch follows this list)
- Shisa repository - this includes our translation, dataset generation, training, and evaluation code
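For orientation, here is a minimal sketch of running one chat turn with the fine-tuned model via Hugging Face transformers. It is illustrative rather than an official recipe: the bfloat16 dtype and sampling settings are placeholder assumptions, and it presumes the published tokenizer ships a chat template, so check the model card for the recommended prompt format and generation parameters.

```python
# Minimal sketch: one chat turn with shisa-7b-v1 via transformers.
# Assumes the tokenizer config includes a chat template; dtype and
# sampling settings below are placeholders, not tuned recommendations.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "augmxnt/shisa-7b-v1"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,  # placeholder; match your hardware
    device_map="auto",
)

messages = [{"role": "user", "content": "日本の首都はどこですか？"}]  # "What is the capital of Japan?"
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(
    input_ids, max_new_tokens=256, do_sample=True, temperature=0.7
)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

As for the DPO set above, each example pairs a preferred and a dispreferred response to the same prompt. A record in the common chosen/rejected layout might look like the following; the field names are illustrative, not the published schema, so check the dataset card for the actual layout:

```python
# Hypothetical record shape for a chosen/rejected preference pair,
# mirroring the description of shisa-en-ja-dpo-v1 above (illustrative
# field names only; the actual dataset schema may differ).
dpo_record = {
    "prompt": "...",    # EN or JA instruction
    "chosen": "...",    # GPT-4-generated response (preferred)
    "rejected": "...",  # preliminary 7B model output (dispreferred)
}
```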
- 2024-03-21 Sakana.ai used our shisa-gamma-7b-v1 model as the base JA LLM for their exciting Evolutionary Model Merge work and all their "Evo" models. Be sure to check out the paper, Evolutionary Optimization of Model Merging Recipes
- 2024-01-15 Congrats to Lightblue on their recent Karasu/Qarasu model release - the 7B builds on shisa-7b-v1, and both incorporate a subset of our ultra-orca-boros training set. See also Peter Devine's recent writeups (EN/JA)
- 2023-12-09 npaka shares instructions for getting Shisa 7B running on Google Colab (JA): Google Colab で Shisa 7B を試す ("Trying Shisa 7B on Google Colab")
- 2023-12-04 A Review of Public Japanese Training Sets - our review of the existing datasets we found, with an analysis of the data quality of some of the most commonly used instruct-tuning datasets