LLaMa MPS fork

This is a fork of https://github.com/markasoftware/llama-cpu, which is itself a fork of https://github.com/facebookresearch/llama. The goal of this fork is to run LLaMA with GPU acceleration on Apple M1/M2 devices.

LLaMa-adapter support has been added in a separate branch!
Multi-modal LLaMa-adapter support has been added in a separate branch!
Llama v2 support has been added in a separate branch!

Please check the original repos for installation instructions. Once set up, run the example with `torchrun example.py --ckpt_dir ../7B --tokenizer_path ../tokenizer.model --max_batch_size=1`, using the correct paths to your models. You might need to set the environment variable `PYTORCH_ENABLE_MPS_FALLBACK=1`.
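
A minimal launch sketch, assuming the 7B weights live in `../7B` and the tokenizer at `../tokenizer.model` as in the command above:

```sh
# Let PyTorch fall back to the CPU for any ops not yet implemented on the MPS backend
export PYTORCH_ENABLE_MPS_FALLBACK=1

# Run the example with a single-sequence batch; adjust the paths
# to wherever your LLaMA weights and tokenizer are stored
torchrun example.py \
  --ckpt_dir ../7B \
  --tokenizer_path ../tokenizer.model \
  --max_batch_size=1
```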

This fork is experimental; it is currently at the stage of running the full non-quantized model with MPS.
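
Before loading the full model, you can check that your PyTorch build can actually target the MPS backend. This one-liner is just a sanity check, not part of the fork itself:

```sh
# Verify that PyTorch sees the Apple GPU (MPS) backend and can allocate a tensor on it
python3 -c "import torch; print(torch.backends.mps.is_available()); print(torch.ones(1, device='mps'))"
```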

After the model is loaded, inference with `max_gen_len=20` takes about 3 seconds on a 24-core M1 Max, versus 12+ minutes on the CPU (running on a single core). For the 7B model, memory usage always exceeds 32 GB of RAM during loading, writing 2-4 GB to SSD (swap) on every launch, but it consumes less memory once loaded.

If you notice that the model's output is empty or repetitive, try a fresh version of Python/PyTorch. For me it produced bad outputs with Python 3.8.15 and PyTorch 1.12.1; after switching to Python 3.10 and torch 2.1.0.dev20230309, the model worked as expected and produced high-quality outputs.
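
To compare your environment against the known-good setup above, you can print the interpreter and PyTorch versions (the nightly install command is the standard PyTorch one, shown here only as a suggestion):

```sh
# Print Python and PyTorch versions to compare with the versions mentioned above
python3 --version
python3 -c "import torch; print(torch.__version__)"

# If needed, a PyTorch nightly build (e.g. 2.1.0.dev*) can be installed with:
pip3 install --pre torch --index-url https://download.pytorch.org/whl/nightly/cpu
```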
