This is the D version of llama2.c by Andrej Karpathy. It runs inference for the llama2 model architecture recently published by Meta.
Initial code was generated by the ctod tool and saved as ctod_initial.d.
Some small manual adjustments were made:
- added cast(float*) to the calloc and mmap calls
- replaced clock_gettime (unavailable on Darwin) with MonoTime from core.time
- commented out the OpenMP pragmas
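As a rough sketch of the MonoTime adjustment above (names like `timeInMs` are illustrative, not necessarily what the repo uses), a portable millisecond timer can be built from core.time, avoiding the missing clock_gettime on Darwin:

```d
// Sketch: portable millisecond timer using MonoTime (core.time),
// standing in for clock_gettime, which is unavailable on Darwin.
import core.time : MonoTime;
import std.stdio : writeln;

long timeInMs()
{
    // MonoTime.currTime is a monotonic clock available on every OS D targets.
    return MonoTime.currTime.ticks * 1000 / MonoTime.ticksPerSecond;
}

void main()
{
    auto start = timeInMs();
    // ... run inference here ...
    auto elapsed = timeInMs() - start;
    writeln("elapsed ms: ", elapsed);
}
```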
To build inference:
dub build -b=release
To run example:
./llama2_d stories15M.bin -i "your_prompt"
Tested on:
- macOS (M1)
- Linux
- Windows
Todo:
- Make the code more idiomatic
- Improve performance
- Add Windows support (port win.h/win.c files from the original repo)
- Parallelize the code with std.parallelism and SIMD
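The parallelization item above could be approached with std.parallelism's parallel foreach. A minimal sketch (not the repo's actual code; the matmul signature here is an assumption modeled on llama2.c) that parallelizes the independent rows of a matrix-vector multiply, much like the commented-out OpenMP pragma would:

```d
// Sketch: parallelize a matmul's row loop with std.parallelism.parallel.
// W is (d,n) row-major, x is (n,), xout is (d,); rows are independent.
import std.parallelism : parallel;
import std.range : iota;

void matmul(float[] xout, const(float)[] x, const(float)[] w, int n, int d)
{
    foreach (i; parallel(iota(d)))   // each row computed on a worker thread
    {
        float val = 0.0f;
        foreach (j; 0 .. n)
            val += w[i * n + j] * x[j];
        xout[i] = val;
    }
}
```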
Any form of contribution is welcome. Feel free to open an issue or create a pull request. If you are contributing optimizations, please provide benchmarks and/or performance comparisons as well as the code to reproduce them.
Credits:
- Andrej Karpathy for the original llama2.c implementation
- Dennis Korpel for the great ctod tool
- cgbur for optimization ideas and the readme structure