This is the D version of llama2.c by Andrej Karpathy. It runs inference for the llama2 model architecture recently published by Meta.
Initial code was generated by the ctod tool and saved as ctod_initial.d.
Some small manual adjustments were made:
- added cast(float*) to the calloc and mmap calls
- replaced clock_gettime (unavailable on Darwin) with MonoTime from core.time
- commented out the OpenMP pragmas
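As a rough sketch of the MonoTime adjustment above (names like `timeInMs` are illustrative, not necessarily what the repo uses), a portable millisecond timer can be built from core.time, avoiding the missing clock_gettime on Darwin:

```d
// Sketch: portable millisecond timer using MonoTime (core.time),
// standing in for clock_gettime, which is unavailable on Darwin.
import core.time : MonoTime;
import std.stdio : writeln;

long timeInMs()
{
    // MonoTime.currTime is a monotonic clock available on every OS D targets.
    return MonoTime.currTime.ticks * 1000 / MonoTime.ticksPerSecond;
}

void main()
{
    auto start = timeInMs();
    // ... run inference here ...
    auto elapsed = timeInMs() - start;
    writeln("elapsed ms: ", elapsed);
}
```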
To build inference:
dub build -b=release
To run example:
./llama2_d stories15M.bin -i "your_prompt"
Tested on:
- macOS (M1)
- Linux
- Windows
Todo:
- Make the code more idiomatic
- Improve performance
- Add Windows support (port win.h/win.c files from the original repo)
- Parallelize the code with std.parallelism and SIMD
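The parallelization item above could be approached with std.parallelism's parallel foreach. A minimal sketch (not the repo's actual code; the matmul signature here is an assumption modeled on llama2.c) that parallelizes the independent rows of a matrix-vector multiply, much like the commented-out OpenMP pragma would:

```d
// Sketch: parallelize a matmul's row loop with std.parallelism.parallel.
// W is (d,n) row-major, x is (n,), xout is (d,); rows are independent.
import std.parallelism : parallel;
import std.range : iota;

void matmul(float[] xout, const(float)[] x, const(float)[] w, int n, int d)
{
    foreach (i; parallel(iota(d)))   // each row computed on a worker thread
    {
        float val = 0.0f;
        foreach (j; 0 .. n)
            val += w[i * n + j] * x[j];
        xout[i] = val;
    }
}
```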
Any form of contribution is welcome. Feel free to open an issue or create a pull request. If you are contributing optimizations, please provide benchmarks and/or performance comparisons as well as the code to reproduce them.
Credits:
- Andrej Karpathy for the original llama2.c implementation
- Dennis Korpel for the great ctod tool
- cgbur for optimization ideas and the readme structure