Original image from Wikimedia Commons, licensed under the Creative Commons Attribution-Share Alike 3.0 Unported, 2.5 Generic, 2.0 Generic and 1.0 Generic license.

Former titles:

  • "Open-source chatbot companions"
  • "Crataco's guide to locally runnable, unfiltered open-source AI."

Crataco's Spaghetti Guide (for Legacy Open-Source Chatbots)

As seen on Reddit.

Excluding small updates, I officially stopped updating this on the 2nd of April, 2023. It's not completely obsolete, but you're unlikely to get any up-to-date information from here, as the AI world is developing rapidly.

INTRODUCTION

This post will list AI frontends and models that can be downloaded and run on your own PC. It includes information on frontends, models, character creation, model settings, how to contribute, and interesting findings.

These text generation AIs can serve as open-source alternatives to the following services you may be familiar with:

  • Replika, Character.AI, Kajiwoto, Anima, Chai, SimSimi, Paradot, or Mitsuku (Kuki), if you're a chat user
  • AI Dungeon and NovelAI, if you prefer an adventure mode and/or co-writing
  • GPT-3, not necessarily in quality (unless LLaMA 13B's initial evaluation results are to be believed), but in the flexibility it offers

You may also have an easier time understanding the vocabulary of this guide if you know about Stable Diffusion or NovelAI already.

Unlike my second guide, which is meant as a softer introduction, this one is a crash course: a barrage of information all on a single page.

FRONTENDS

A frontend is the interface you use. There are three of them officially supported by this guide: KoboldAI/KoboldCpp, TavernAI, and Oobabooga, with several others (including Project Akiko and LiteVN) that I've yet to test thoroughly. I refer to them as "standalone" if you can use them on their own, or "gateway" if they depend on another.

If your computer can't run the model you want to use, you can use Google Colab, an online service that lets you use Google's cloud computers for free for a limited time, making it perfect for trying out bigger models. You usually still have the option to save conversations to your Google Drive and download them for offline use.

STANDALONE FRONTENDS

  • Oobabooga's Text Generation Web UI - Standalone frontend with bleeding-edge model support and a brutalist UI. You'll feel right at home if you're familiar with AUTO1111's Stable Diffusion UI, but it's not very user-friendly. You can run it locally or on Colab.

  • KoboldAI - Standalone frontend with a professional exterior and a sweet community. KAI was created as an AI co-writer similar to NovelAI. They later included a chat setting, and more recently a Discord-esque chat UI as pictured. While it's more user-friendly than Oobabooga's, I find the "pretty" chat mode to be pretty buggy, so if you intend on using it for chatting, combine it with a gateway frontend (I recommend TavernAI). You can run it locally or on Colab.

  • KoboldCpp - A sister project of KoboldAI that uses a different backend compatible with llama.cpp (and related) models, which are friendlier towards those who don't have the hardware to run Oobabooga/KoboldAI's "main" models, with similar-quality results.

GATEWAY FRONTENDS

  • TavernAI - Gateway frontend with a pretty, friendly chat-optimized interface. I feel its main advantage is its "anchors" that are added to every message, which will help avoid standalone KAI's message-shortening pitfalls on non-Pygmalion models. I tried it again recently and learned that it even includes an online character database. You can run it locally or on Colab.

  • SillyTavern - A fork of TavernAI "for nerds." It's a huge rewrite with extensions alongside a whole laundry list of new features.

  • Project Akiko - [official screenshot] Both standalone(?) and gateway frontend, with its interface taking inspiration from visual novels. Just discovered this one via the KoboldAI Discord server. It's a very recent project that seems to go above and beyond for an open-source chatbot frontend, with planned support for Live2D, text-to-speech, multi-chat, and Discord. It reminds me of miku.gg, but with a more open ecosystem. I might bump this up as "supported" once I give it a shot, but it's too early to tell what direction the project's going to go in.

  • Laika's LiteVN UI for KoboldAI - [official screenshot] Gateway frontend with a VN-esque interface. Compatible with KoboldAI. Also discovered this one via KoboldAI's Discord server. Much like Miku and Akiko, it takes a more visual novel-esque approach to talking to your AI companion. Pretty cool! Its source code is available to download and it's developed in Godot (think Unity but open-source), but the ready-to-run download looks to be Windows-only at the moment. I'll try it out.

There are other projects (such as Gravital, Eliza, rwkv_chatbot, and ChatRWKV) and miku.gg (mentioned earlier), but they're beyond the scope of this post due to their lack of community, complicated setup, or dependence on closed-source infrastructure.

MODELS

A model is the AI brain that generates your responses. Each model has different knowledge based on the data it was trained on. Bigger models are usually smarter, but require better hardware.

My recommendations:

  • If you want a generic, flexible model for an AI "notebook" (like Talk to Transformer or TextSynth), LLaMA and possibly OpenLLaMA are currently the best in the business. Pythia Deduped offers more choices in model size, but its models are considered undertrained by today's standards.

  • If you want a chatting partner (like Replika or CharacterAI), I've had amazing results with gpt4-x-alpaca, which follows character descriptions very well, but Pygmalion (NSFW) is popular and has a bigger community. Erebus (NSFW) is an old favorite.

  • If you want a no-nonsense AI assistant (like ChatGPT), Vicuna 7B and Vicuna 13B are great options. However, new options have appeared, like WizardLM 7B (said to be competitive with Vicuna 13B) and WizardVicuna 13B.

  • If you want something trained on storywriting to use as a co-writer (like NovelAI), old-fashioned Janeway and once again Erebus (NSFW) are good options if you're used to their particular workflow. If you value quality, Metharme may be worth looking into; I personally haven't tried it yet.

  • If you want a text adventure (like AI Dungeon), Nerys, Skein, and Adventure 1.3B / 2.7B / 6B are classics. But once again I recommend gpt4-x-alpaca with a prompt that begins with "This is a text adventure. User input is preceded by the > symbol." (see the sketch after this list). I recall it working in regular LLaMA, and it may also work in Vicuna, but I haven't tested that yet.
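As a rough sketch, here's what the start of that adventure-style context might look like in practice (the transcript below is made up purely for illustration):

```
This is a text adventure. User input is preceded by the > symbol.

You wake up in a mossy clearing. A narrow path leads north, and something glints in the grass nearby.

> look around
Tall pines surround you on every side. The glinting object appears to be a brass key.

> take the key
```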

For more clarification, there are different "types" of models. What follows is heavily simplified:

  • Transformers models are the main type used in the two major frontends (KoboldAI and Oobabooga). They have the best compatibility, and most series are available (GPT-2, GPT-Neo, BLOOM, Pythia, etc.). A 2.7B model would require ~16GB of RAM or ~8GB of VRAM, for example, but you can split it between system memory and GPU memory if you have a compatible graphics card (see the first sketch after this list).

  • The RWKV type is exclusive to the RWKV model series, but as far as I know it works in both Oobabooga's interface and KoboldAI. It can hold more in its memory, and it supports compressing the model down to 8-bit precision, cutting RAM usage substantially.

  • The ggml type comes from llama.cpp (and related projects), and support for it is included in Oobabooga and KoboldCpp (see the second sketch after this list). As I mentioned earlier, I recommend these models if you have an average computer. A 4-bit (q4 or q5) 7B model can run in under 8GB of RAM, and a 13B (q4 or q5) model can run in under 16GB of RAM. If you have a toaster with too little RAM, don't worry; I've also converted smaller Pythia Deduped models to the q5 and q8 formats for use with KoboldCpp here, and the OpenLLaMA project plans on training a 3B model, which should run in under ~4GB of RAM.
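To make the bullets above concrete, here's a minimal sketch of loading a Transformers-type model with its weights split between GPU and system memory. It assumes the `transformers` and `accelerate` packages are installed; the model name and memory caps are my own illustrative choices, not recommendations:

```python
# Sketch: load a Transformers-type model split across GPU and system RAM.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "EleutherAI/pythia-2.8b-deduped"  # illustrative choice

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",                       # let accelerate place layers automatically
    max_memory={0: "6GiB", "cpu": "12GiB"},  # cap GPU 0 and system RAM usage (illustrative)
)

prompt = "Alice is an android woman. She's cheerful and loves video games.\nYou: Hi!\nAlice:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=60)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

And here's the ggml equivalent through the llama-cpp-python bindings; again a sketch, and the file path assumes you've already downloaded a q4-quantized model:

```python
# Sketch: run a 4-bit ggml model on the CPU with llama-cpp-python.
from llama_cpp import Llama

llm = Llama(model_path="./models/llama-7b.ggmlv3.q4_0.bin", n_ctx=2048)  # path is illustrative
result = llm("You: Hi!\nAlice:", max_tokens=60, stop=["You:"])
print(result["choices"][0]["text"])
```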

DETAILED INFORMATION

| Series | Sizes | Dataset | License | My thoughts |
| --- | --- | --- | --- | --- |
| GPT-2 | 124M, 355M, 774M, 1.5B | WebText | Modified MIT License | 2019 - By OpenAI. Original architecture. GPT-2 kickstarted modern AI text generation, and was most notably used for the original AI Dungeon 2. I don't recommend it today due to its small size and small context length (1024 tokens). |
| GPT-Neo | 125M, 350M, 1.3B, 2.7B, 6B, 20B | The Pile | MIT (125M to 2.7B), Apache-2.0 (6B to 20B) | 2021 - By EleutherAI. Original architecture (GPT-Neo for 125M to 2.7B, GPT-J for 6B, GPT-NeoX for 20B). At the time, EleutherAI was a new grassroots collective that responded to GPT-3 with their debut model series, GPT-Neo. Outside of LLaMA, NeoX may be the most-supported model architecture. |
| Fairseq | 125M, 355M, 1.3B, 2.7B, 6.7B, 13B | Training data | Likely MIT | 2022 - By Meta AI. Based on the XGLM architecture. Released in a Transformers-compatible format by the KoboldAI community (correct me if I'm wrong). This picked up steam as an alternative to GPT-Neo, and was included alongside Neo with the NovelAI service. Being used in a commercial service makes me believe it uses a commercial license. |
| OPT | 125M, 350M, 1.3B, 2.7B, 6.7B, 13B, 30B, 66B, 175B | Training data | Non-commercial OPT-175B license | 2022 - By Meta AI. Original architecture. Succeeds Fairseq, their previous series. It slightly outperforms Neo, but comes second to Neo in terms of the sheer quantity of finetunes and community experience. There's a 175B, but it's only available via request. |
| RWKV | 169M, 430M, 1.5B, 3B, 7B, 14B | The Pile, RedPajama (PilePlus) | Apache-2.0 | 2022 - By BlinkDL. Original architecture. A promising series trained on Neo's dataset that intends to be faster and lighter without compromising on quality. Quality-wise I believe it's on par with GPT-Neo, but it has more potential with the new RedPajama finetuning project. It's one of the earliest open-source models to introduce a context length of 4096 or higher. |
| BLOOM | 560M, 1.1B, 1.7B, 3B, 7.1B, 176B | BigScienceCorpus | BigScience BLOOM RAIL 1.0 | 2022 - By BigScience. Original architecture. According to evaluations, this model underperforms compared to the rest. I personally don't recommend it, but I've heard some users get good results with it. |
| GALACTICA | 125M, 1.3B, 6.7B, 30B, 120B | Training data | CC BY-NC 4.0 | 2022 - By Meta AI. Based on the OPT architecture. I know very little about these models, but I do know they're trained on scientific texts. |
| Pythia Deduped | 70M, 160M, 410M, 1B, 1.4B, 2.8B, 6.9B, 12B | The Pile (deduplicated) | Apache 2.0 | 2022 - By EleutherAI. Based on the GPT-NeoX architecture. Trained on a deduplicated version of Neo's dataset; according to official evaluations, it outperforms GPT-Neo and OPT. |
| LLaMA | 7B, 13B, 30B, 65B | Training data | Unknown (repo says GPL 3.0, but others say non-commercial) | 2023 - By Meta AI. Original architecture. Only officially available to researchers, but it has been converted and reuploaded everywhere. AI evaluation says 13B is on par with GPT-3 175B, and 7B overtakes even Pythia 12B and OPT 66B. |
| Cerebras-GPT | 111M, 256M, 590M, 1.3B, 2.7B, 6.7B, 13B | The Pile | Apache 2.0 | 2023 - By Cerebras. Architecture may be based on "gpt2". According to their description, their models use The Pile, but unlike Pythia Deduped it isn't deduplicated. Its smaller models seem to underperform according to evaluations, but I haven't tried them myself. |
| MPT | 1B, 7B | RedPajama (1B), Training data (7B) | Apache 2.0 (base model) | 2023 - By MosaicML. Its own architecture. Impressively, according to official benchmarks, "MPT-7B matches the quality of LLaMA-7B and outperforms other open source 7B - 20B models on standard academic tasks", and apparently has a context length up to *checks notes* 84k tokens?! It has yet to come to GGML at the time of writing. |
| OpenLLaMA | 3B, 7B (preview) | RedPajama | Apache 2.0 | 2023 - By OpenLM Research, using the RedPajama dataset by Together. Based on the LLaMA architecture. A reproduction of LLaMA intended to be released under a truly open license. Compatible with llama.cpp as a result. |
| StableLM | Alphas of 3B and 7B currently available; 15B, 30B, 65B, 175B to come later | "a new experimental dataset built atop The Pile [...] three times larger" | CC BY-SA 4.0 (base model) | 2023 - By StabilityAI. Based on the GPT-NeoX architecture. Like RedPajama, I added it to this list for completeness, but being a very recent model it's still training and there are no official evaluation results comparing it to the others in this list. Current evaluation results are underwhelming, with 7B apparently performing around the level of Pythia Deduped 410M. Granted, it's still early in training. |
| RedPajama-INCITE | Early releases of 3B and 7B | RedPajama | Apache-2.0 | 2023 - By Together, following the namesake of their released dataset. Their models are based on the GPT-NeoX architecture and are said to outperform Pythia according to their blog post. |

SCREENSHOTS

These are screenshots I've taken while testing chat mode under Oobabooga's Text Generation Web UI. They're inconsistent, so I highly recommend you try them yourself.

Generic models:

| Model (screenshot) | Settings | Comments |
| --- | --- | --- |
| BLOOM 1.1B | "Naive" preset with rep_penalty of 1.1 | Sometimes it's there. Sometimes it's not. |
| BLOOM 1.7B | "Naive" preset with rep_penalty of 1.05 | This one actually surprised me, as I rarely ever use BLOOM and doubted its capabilities. |
| BLOOM 3B | "Naive" preset with rep_penalty of 1.02; may need to be between 1.03-1.04 | The AI is a bit hit-and-miss, but that's to be expected of a 3B model. I was caught off-guard by its acknowledgement of My Hero Academia. |
| GPT-Neo 1.3B | In screenshot | This, too, surprised me; I remember GPT-Neo 1.3B being much worse. It does a decent job at following conversation. |
| GPT-Neo 2.7B | "Naive" preset with rep_penalty of 1.5 | Somehow as bad as I remember. It could just be the settings, though. |
| LLaMA 7B (HF) | "Naive" preset | The Naive preset plays it safe, but the responses are short. I notice LLaMA doesn't need repetition penalty as much as other models do; 1.0 is just fine here. |
| LLaMA 7B (HF) | "Sphinx Moth" preset | The Sphinx Moth preset is the opposite: the responses are lengthy yet very risk-taking (it hallucinated websites that don't exist, but I just went with it). I might need to find a preset that's somewhere in between. |
| LLaMA 13B (GPTQ) | "Naive" preset | Unusual case. I had to use GPTQ and a quantized version of the model to run it on my hardware. I feel like GPTQ gives worse results than HF for LLaMA 7B; maybe it's a settings thing? Either way, this one does the best job at following the character's introduction prompt (Alice introduces herself as a fairy). The results are decent, but I feel like "Naive" keeps it too safe. |
| LLaMA 13B (GPTQ) | Custom preset | Results are great, but need further tweaking. |
| LLaMA 13B (GPTQ) | Custom preset | I've turned top-p up to 1.0. The responses seem longer, but it went off the rails REAL quick. |
| LLaMA 13B (GPTQ) | Custom preset | To make up for it, I lowered the temperature to 0.6 and raised the repetition penalty to 1.05. The temperature might need to be lowered some more since it keeps making things up, but that's to be expected with the little information I've given the AI. (I think top-k 1.0 contributes to longer messages at the cost of making more assumptions?) |
| Pythia 70M Deduped | In screenshot | Unusable. |
| Pythia 160M Deduped | In screenshot | I might've just gotten lucky with this one. |
| Pythia 410M Deduped | In screenshot | I think it's getting better! |
| Pythia 1B Deduped | In screenshot | Probably the smallest model I can use that feels like a decent conversational partner. Quality is on par with (maybe slightly better than) GPT-Neo 1.3B. |
| Pythia 1.4B Deduped | In screenshot | No comment. |
| Pythia 2.8B Deduped | In screenshot | No comment. |
| Pythia 6.9B Deduped | In screenshot | It may just be me, but I notice a decline in Pythia's performance here and start to prefer, say, GPT-J 6B and LLaMA 7B. |
| Pythia 12B Deduped | "Sphinx Moth" preset | May be the most impressive results I've ever gotten with an open-source, self-hostable chatbot. |

Chat-friendly models:

| Model (screenshot) | Finetuned on | Settings | Comments |
| --- | --- | --- | --- |
| ConvoGPT 1.3B | GPT-Neo 1.3B | "Default" preset | Surprisingly good, but it may just be luck. |
| ConvoGPT 2.7B | GPT-Neo 2.7B | "Default" preset | 2.7B is barely an upgrade. I don't think it warrants twice as much RAM usage. |
| ConvoGPT 6B | GPT-J 6B | "Default" preset | Short responses, but it never goes off-topic. I might retest ConvoGPT 6B soon with a better prompt. |
| ConvoGPT 6B | GPT-J 6B | "Sphinx Moth" preset | Responses are a little longer. The high temperature means it also goes off-topic. |
| gpt4-x-alpaca | LLaMA 13B | Default preset with temp 0.5 and rep 1.1 | Screenshot taken with KoboldCpp (llama.cpp). This may well be my favorite model for chat mode right now, due to just how well it follows characters. |
| Pygmalion 6B | GPT-J 6B | Pygmalion preset | Experiment 2, the current "main" version. This is the first time I've given Pygmalion a proper test, and so far it has surprised me. |
| Pygmalion 6B | GPT-J 6B | Pygmalion preset | Experiment 7, Part 4/10. Pretty good! |
| MythoMax 13B (q6_K) | Llama 2 13B | Mirostat mode 2, tau 5, eta 0.1 (temp 0.7, rep pen 1.2) | For fun, I've decided to add a model that came out 7 months after Pygmalion 6B (5 months after gpt4-x-alpaca). The frontend was changed to SillyTavern, currently using the "Roleplay" instruct preset with "include names" enabled. |

Prompts used.

For more information on generation quality and memory usage:

[Model evaluation statistics] (higher is better)

[Memory and performance benchmark]

[Memory usage (Oobabooga's UI)]

The community's recommendations:

  • PygmalionAI's models. They're the closest we've got to an open-source CharacterAI. Their model is still training and relies on user contributions. It has been updated several times and is available in two branches: the main branch and the development branch. They also document their training progress via logbooks.

  • Mr. Seeker's Erebus models, available in sizes from as small as 350M (for weak hardware) to as big as 20B (for cloud computers). It was popular in the KoboldAI community for NSFW writing and even chatting because of its manually-curated NSFW dataset, and thus its better understanding of intimacy compared to generic models. If Pygmalion doesn't work too well for your use case, Erebus is worth trying instead.

Other ones I've toyed around with:

  • Pythia Deduped is my primary choice if I value coherence and have limited hardware. I've seen good results as early as 1B [screenshot]. Or 410M [screenshot]. Or maybe even 160M [screenshot]? It's trained on a cleaner dataset, and its low-end models are the best I've seen (useful if you have an old, slow, or exotic computer like a Pi or an Android phone running Termux). It's a generic model, a jack-of-all-trades and master of none, meaning it's not trained on a chat-oriented dataset, so it may need a push in the right direction to chat how you want it to. But it's usually going to be "smarter" than a Neo model of the same size (6.9B is debatable, as I've had better luck with GPT-J).

  • RWKV-4 is one I'm currently experimenting with. Oobabooga's Web UI recently added mostly-complete support for it, but it hasn't been getting much attention. It introduces some great features, such as a bigger context length (memory) and experimental quantization (which saves memory). From my personal testing, its performance seems closest to GPT-Neo's.

  • convoGPT (trained on top of GPT-Neo/GPT-J) if I value better conversational knowledge and/or have better hardware. convoGPT 6B is, strangely enough, referred to as "litv2-6B-rev2" (with "Lit" being harubaru's novel model) in its config. But yeah, I'd consider it tied with Erebus. It feels like the spiritual successor of the older "convo-6B" model released by the same person, and it was used as the base model for Pygmalion. While I'm at it, Lotus 12B may count as part of the same series, if not a successor, but it's reached nowhere near the popularity Pygmalion has.

  • GPT-J 6B Janeway is my previous secondary. It's oddly satisfying using a model trained on literature.

If you don't speak English or prefer to speak in your native tongue, Oobabooga's text generation web UI has an extension supporting on-the-fly Google Translate, so you can take advantage of the English-focused models. This is a screenshot with Pythia 1.4B Deduped using the Google Translate extension for Japanese. The alternative would be to search for models in your native language (e.g. Spanish or Japanese for starters).

CHAT MODE: CREATE YOUR CHARACTER / BROWSE FOR OTHERS

To create your own characters:

To download other people's characters:

The following is my own guide, and applies to KoboldAI and non-Pygmalion models (LLaMA seems to follow them the best).

There are two primary methods to create characters (let's call her "Alice" for the sake of this example). You type these into the model's "memory" or "context":

  • Everyday language. "Alice is an android woman. She's cheerful and loves video games." It's a universally understood format, and Pygmalion models prefer them, referring to it as the character's "persona." - [Example on KoboldAI]

  • Programming-style syntax (W++). Most models trained on The Pile (GPT-Neo, likely OPT and Pythia too) are said to follow it better than everyday language. - [Example on KoboldAI]

To learn more about W++, the KoboldAI GitHub has a write-up on it, and if you aren't using United's built-in W++ editor, Noli provides an online interface you can use to easily create characters with said syntax.
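For illustration, the Alice example from above might look something like this in W++ (a sketch; the attribute names are freeform, and exact formatting varies between templates):

```
[Character("Alice")
{
Species("Android")
Personality("Cheerful" + "Loves video games")
}]
```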

After creating the character, you'll have to provide an example conversation. This means simulating a conversation beforehand, with you talking both with and for the character, giving you slightly more control in advance over how you want your character to talk, how lengthy you want their sentences to be, etc. You may get lucky without one, but I cannot stress this enough -- writing a short example conversation is important if you want the AI to have an idea of how you want your companion to talk. (See the sketch below.)
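For example, an example conversation placed in the context might look something like this (the <START> separator is the convention Pygmalion-style frontends use; treat this as a sketch rather than a strict format):

```
<START>
You: Hi Alice! What have you been up to?
Alice: Oh, hey! *waves* I just beat another level of my favorite game. Want to join me for the next one?
You: Sure, I'd love to.
Alice: Yay! *hands you a controller* You're going to love this one.
```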

Also, as an extra precautionary measure, I like to add this into the context (AI's memory):

This is a conversation between you and Alice.

or, if you prefer to use your own chat name:

This is a conversation between [your name] and Alice.

If the AI makes too many assumptions on your behalf (e.g. talking about meeting you at school, or working for a company), I'd imagine it'd help to add something like:

This conversation is the first time you and Alice meet.

Or something a little more IRC-like, but I haven't done any testing:

You entered the chat.

Alice entered the chat.

MODEL SETTINGS

You can influence how the models generate text by changing your settings. For well-supported models, try the defaults first. If the defaults don't work well, try the presets. For LLaMA, use r/LocalLLaMA's guide.

This guide focuses on the settings present in KoboldAI and proxy frontends. They're also available in the Text Generation UI, but there they go by their Hugging Face codenames (see the sketch after this list).

  • "Temperature" (temperature), which affects how creative the AI's responses are. If the AI plays it too safe for you, turn it up. If the AI is too chaotic, turn it down. On KoboldAI, I find 0.5-0.7 a good balance for GPT-Neo and OPT. On Oobabooga's, I was able to get good results as high as 2.0 using larger Pythia models just as long as top-k is enabled (without it I get word salad).

  • "Repetition penalty" (repetition_penalty). If you find the AI repeating you or itself, turn it up and regenerate. If the AI starts spouting random garbage like a run-on sentence, turn it down and regenerate. Sometimes, you may want to tweak it just to give the AI a bit more diversity even if it isn't repeating a sentence. I found 1.5-2.0 to be good for GPT-Neo 2.7B, 1.2 to be good for GPT-J, 1.1 to be good for OPT, and smaller values (between 1.0 to 1.1) to be good for Pythia Deduped and LLaMA.

  • "Context Tokens" (KoboldAI) refers to how much of the conversation ("tokens") the AI will remember. A token is about ~4 characters. The max is 2,048 (about ~8,000-9,000 characters) for all models (excluding GPT-2 and the latest RWKV models), and while you can increase the maximum tokens beyond that, it is unsupported and will likely break.

  • "Output Length" (max_new_tokens) is the upper limit of how long the AI's response can be, in tokens.

  • "Gens Per Action" (KoboldAI) is how many messages the AI will generate, which you can then choose between. Much like CAI's "swiping" feature.

  • "Typical sampling" (typical_p). Definitely not the most talked-about setting, but my favorite. My understanding is that with the setting below 1, the AI chooses less-likely words, which keep its sentences diverse and less predictable. It gives me a better illusion of "intelligence" by making a slight impact into the AI's vocabulary. While the research paper recommends a value of 0.2, I find 0.9/0.95 being good enough for most models I've used.

  • Others (such as top-p, top-k, top-a, and tail-free sampling) can also influence your output, but they're difficult to explain. These settings all change which words the AI considers next, as I understand it.
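As a rough illustration of those Hugging Face codenames, here's a sketch of calling generate() directly with the ballpark values suggested above (the model choice is an arbitrary assumption):

```python
# Sketch: the KoboldAI-style settings above, under their Hugging Face codenames.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "EleutherAI/pythia-1b-deduped"  # arbitrary example model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# "Context Tokens" isn't a generate() argument; it corresponds to how much
# conversation history you keep in the prompt itself.
inputs = tokenizer("You: Hi!\nAlice:", return_tensors="pt")

outputs = model.generate(
    **inputs,
    do_sample=True,           # enable sampling so the settings below take effect
    temperature=0.7,          # "Temperature"
    repetition_penalty=1.05,  # "Repetition penalty"
    typical_p=0.95,           # "Typical sampling"
    max_new_tokens=80,        # "Output Length"
    num_return_sequences=2,   # "Gens Per Action": generate two candidates to pick from
)
for seq in outputs:
    print(tokenizer.decode(seq, skip_special_tokens=True))
```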

CONTRIBUTING

After doing quite a bit of research, I found that PygmalionAI has some documentation telling you how to contribute your CharacterAI chatlogs. You can choose whether your data becomes part of their private dataset or their public one. You can help advance the effort this way and make their Pygmalion models functionally close to CharacterAI as it is now (minus the "Chat Errors"), even if the quality isn't perfectly on par.

If you have some Python knowledge, you could always contribute to the projects I've listed above; if you don't, you could always report bugs via their GitHub repositories. You could also participate in Kobold's Discord and, if you're somewhat knowledgeable, help people out or report your results when investigating new models, like the new Pythia or Pygmalion models.

And lastly, if you know how to finetune and have a dataset and the right hardware, you can contribute your own finetunes to Hugging Face.

SPECIAL THANKS!

I've created this section to give a shout-out to those without whom I feel this post wouldn't have been the same:

  • u/DarthReplicant for laying the foundation of open-source chatbots by creating Project Replikant

  • u/mrseeker for active contribution to the open-source AI community by openly releasing Janeway, Nerys, Erebus, etc.

  • u/aid_throwaway for creating KoboldAI, u/henk717 (former developer of AI Dungeon 2 Unleashed) for maintaining it, and the rest of the KoboldAI developers and community for keeping the project alive and up-to-date

  • u/bo_peng for developing and training RWKV, which motivated me to add a "Special Thanks" section

  • u/Udongeein for training and releasing c1-6B, convo-6B, and the convoGPT model series

  • 0x000011b (not sure of their Reddit) for developing the CharacterAI Dumper and for playing a big part in the Pygmalion project

That's about it. Thanks for reading!