Home
Here we publish ongoing notes for Shisa, our attempt to efficiently train better-performing open-source JA/EN bilingual general-purpose chat models.
We are currently in the process of publishing more, so check back soon.
- 2024-05 shisa-v1 ablations - WIP - the Llama 3 ones are quite good
- shisa-gamma-7b-v1 - a model without tokenizer extension, based on Japanese Stable LM Base Gamma 7B, that ended up being quite strong (sitting at the top of the JA open models on the Nejumi LLM Leaderboard for months)
- shisa-7b-v1 - the first version of our fine-tuned general chat model (a minimal inference sketch follows this list)
- shisa-7B-v1-AWQ - AWQ quants courtesy of @TheBloke
- shisa-7B-v1-GPTQ - GPTQ quants courtesy of @TheBloke
- shisa-7b-v1-gguf - GGUF quants courtesy of @mmnga, who has a fancy custom convert.py that works around llama.cpp's BPE issues
- shisa-base-7b-v1 - our base model w/ an extended tokenizer and additional JA pre-training
- shisa-pretrain-en-ja-v1 - our pre-training dataset
- ultra-orca-boros-en-ja - a synthetically generated, machine-translated, programmatically validated JA/EN fine-tuning dataset
- shisa-en-ja-dpo-v1 - a small subset of DPO pairs from UltraFeedback, along with JA DPO pairs using GPT-4-generated responses as the chosen values and outputs from our preliminary 7B model as the rejected values (a record-shape sketch follows this list)
- Shisa repository - this includes our translation, dataset generation, training, and evaluation code
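For orientation, here is a minimal sketch of running one chat turn with the fine-tuned model via Hugging Face transformers. It is illustrative rather than an official recipe: the bfloat16 dtype and sampling settings are placeholder assumptions, and it presumes the published tokenizer ships a chat template, so check the model card for the recommended prompt format and generation parameters.

```python
# Minimal sketch: one chat turn with shisa-7b-v1 via transformers.
# Assumes the tokenizer config includes a chat template; dtype and
# sampling settings below are placeholders, not tuned recommendations.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "augmxnt/shisa-7b-v1"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,  # placeholder; match your hardware
    device_map="auto",
)

messages = [{"role": "user", "content": "日本の首都はどこですか？"}]  # "What is the capital of Japan?"
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(
    input_ids, max_new_tokens=256, do_sample=True, temperature=0.7
)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

As for the DPO set above, each example pairs a preferred and a dispreferred response to the same prompt. A record in the common chosen/rejected layout might look like the following; the field names are illustrative, not the published schema, so check the dataset card for the actual layout:

```python
# Hypothetical record shape for a chosen/rejected preference pair,
# mirroring the description of shisa-en-ja-dpo-v1 above (illustrative
# field names only; the actual dataset schema may differ).
dpo_record = {
    "prompt": "...",    # EN or JA instruction
    "chosen": "...",    # GPT-4-generated response (preferred)
    "rejected": "...",  # preliminary 7B model output (dispreferred)
}
```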
- 2024-03-21 Sakana.ai used our shisa-gamma-7b-v1 model as the base JA LLM for their exciting Evolutionary Model Merge work and all their "Evo" models. Be sure to check out the paper, Evolutionary Optimization of Model Merging Recipes
- 2024-01-15 Congrats to Lightblue on their recent Karasu/Qarasu model release - the 7B builds on shisa-7b-v1, and both incorporate a subset of our ultra-orca-boros training set. See also Peter Devine's recent writeups (EN/JA)
- 2023-12-09 npaka shares instructions for getting Shisa 7B running on Google Colab (JA): Google Colab で Shisa 7B を試す ("Trying Shisa 7B on Google Colab")
- 2023-12-04 A Review of Public Japanese Training Sets - our review of the existing datasets we found, with an analysis of the data quality of some of the most commonly used instruct-tuning datasets