This is a nanoGPT-style implementation of a Llama 3 architecture model from scratch, using Apple's MLX Python library. Most components are implemented from scratch, with no `nn` module used. It will also work if you just swap in NumPy.
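Since `mlx.core` mirrors much of the NumPy API, the "swap in NumPy" claim can be illustrated with a minimal sketch. This is not code from the repo — `rms_norm` here is a hypothetical example of a Llama-style building block written against the shared array API:

```python
# Model code would normally import the array library under one alias, e.g.:
#   import mlx.core as mx      # accelerated on Apple silicon
# Swapping in NumPy is then a one-line change (illustrative sketch only):
import numpy as mx  # stands in for `import mlx.core as mx`

def rms_norm(x, weight, eps=1e-5):
    # RMSNorm as used in Llama-style models: scale by the reciprocal RMS.
    # Only ops common to NumPy and mlx.core are used (sqrt, mean, broadcasting).
    rms = mx.sqrt(mx.mean(x * x, axis=-1, keepdims=True) + eps)
    return weight * (x / rms)

x = mx.ones((2, 4))
w = mx.ones((4,))
print(rms_norm(x, w).shape)  # (2, 4)
```

The same function body runs under either import, which is what makes the backend swap a one-line change.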
- Install packages. All we need for the model itself is mlx:

  ```
  pip install mlx
  ```

  However, we will also need these packages for converting the PyTorch weights and loading the tokenizer:

  ```
  pip install numpy torch llama_models
  ```
- Download the model. I have only tested Llama3.2-1B. You can download it from https://www.llama.com/llama-downloads/
- Update the weight, param, and tokenizer paths in `main.py` to your download destination.
- Run the model:

  ```
  python main.py
  ```

  You can directly specify your prompt, temperature, top-k, etc. in `main.py`.
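Temperature and top-k sampling can be sketched in a few lines of NumPy. This is a generic illustration of the technique, not necessarily how `main.py` implements it; the function name and signature are assumptions:

```python
import numpy as np

def sample_top_k(logits, temperature=1.0, top_k=50, rng=None):
    # Keep only the top_k highest logits, rescale by temperature,
    # then sample from the resulting softmax distribution.
    rng = rng or np.random.default_rng()
    logits = np.asarray(logits, dtype=np.float64)
    k = min(top_k, logits.size)
    top_idx = np.argpartition(logits, -k)[-k:]         # indices of the k largest logits
    scaled = logits[top_idx] / max(temperature, 1e-8)  # low temperature -> sharper
    probs = np.exp(scaled - scaled.max())              # numerically stable softmax
    probs /= probs.sum()
    return int(rng.choice(top_idx, p=probs))

# With top_k=2, only the two largest logits (indices 1 and 3) can be sampled;
# a low temperature makes index 1 overwhelmingly likely.
token = sample_top_k([0.1, 3.0, 0.5, 2.0], temperature=0.1, top_k=2)
print(token)
```

Lower temperature sharpens the distribution toward the argmax, while smaller top-k restricts sampling to the most likely tokens; both trade diversity for determinism.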
`playground.ipynb` is just me studying and experimenting with the model components. It documents the learning journey, and I think it will be helpful for anyone trying to start something similar, so I committed it here.
## Architecture understanding
## Implementation references