LLM-Tokenizer

Introduction

My personal implementation of an LLM Tokenizer.

The scope of this repo is not settled yet. I'll follow the lecture first and decide where to go next afterwards.

Developed with Python 3.11.6.

This project was mainly inspired by the great video lectures on Neural Networks by Andrej Karpathy.

Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., & Sutskever, I. (2019). Language Models are Unsupervised Multitask Learners. OpenAI Blog, 1(8), 9.
Touvron, H., Martin, L., Stone, K., ... & Scialom, T. (2023). Llama 2: Open Foundation and Fine-Tuned Chat Models. arXiv preprint arXiv:2307.09288.
