Write together with a small LLM in the terminal, running locally on your computer. Inference based on llama2.c by Andrej Karpathy.
Press TAB to let the model complete a token, or hold it to complete continuously. Edit the text at any point using the regular controls.
Demo: write_together_demo.mp4
Note: This program only works on Linux, since it uses readline to manipulate text in the terminal.
You can find the pre-built executable in the Releases. To compile it yourself, run:

```
gcc run.c -o run -lreadline -lm -Ofast
```

`-Ofast` should speed up the program without causing problems, but you can replace it with `-O3` to be safe. See the original llama2.c repo for more details.
Download a model (https://huggingface.co/karpathy/tinyllamas) or train your own (more fun that way), then start it:
```
./run model.bin -z tokenizer.bin
```

Adjust the parameters `model.bin` and `tokenizer.bin` to match the names of the files you downloaded.
Some ideas for future improvements:

- Smarter caching that discards only the changed tokens, rather than all of them
- A way to see alternative probable tokens
- Windows compatibility
- Would be cool to turn this into a physical device