# LLMs from Scratch

This repository is a hands-on exploration of building Large Language Models (LLMs) from scratch using Python and PyTorch. The goal is to understand the fundamentals of language modeling, tokenization, embeddings, and transformer architectures through practical coding exercises.
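For a quick taste of what "from scratch" means here, the sketch below shows the token-embedding step in plain PyTorch: an embedding layer is just a trainable lookup table indexed by token IDs. The vocabulary size, embedding dimension, and token IDs are illustrative values, not taken from the notebooks.

```python
import torch

torch.manual_seed(123)

# Illustrative sizes (not taken from the notebooks).
vocab_size = 50257   # e.g. the GPT-2 BPE vocabulary size
embed_dim = 256

# One trainable row per token ID.
token_embedding = torch.nn.Embedding(vocab_size, embed_dim)

# A toy batch of token IDs: batch_size=2, context_length=4.
token_ids = torch.tensor([[40, 367, 2885, 1464],
                          [1807, 3619, 402, 271]])

embeddings = token_embedding(token_ids)
print(embeddings.shape)  # torch.Size([2, 4, 256])
```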

## Repository Structure


```
llms-from-scratch/
├── data/
│   └── the-verdict.txt                     # Raw text data
├── notebooks/                              # Jupyter notebooks with step-by-step experiments
│   ├── 01 - Basic NN - Revision/
│   │   └── 01_neural_network_basics.ipynb
│   ├── 02 - Working with Text Data/
│   │   ├── 01_tokenization_learning_module.ipynb
│   │   └── 02_main_dataloader.ipynb
│   └── 03 - Attention Mechanisms/
│       └── 01_attention_basics.ipynb
├── .gitignore
├── README.md                               # This file
└── requirements.txt
```
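The notebooks build these pieces up step by step. As a rough sketch of the kind of sliding-window data loading that `02_main_dataloader.ipynb` works toward, the example below (run from the repository root) pairs each input window with the same window shifted one token to the right. The character-level vocabulary, context length, and stride are assumptions made purely for illustration, not the notebook's actual choices.

```python
import torch
from torch.utils.data import Dataset, DataLoader

class SlidingWindowDataset(Dataset):
    """Turns one long sequence of token IDs into (input, target) pairs,
    where the target is the input shifted one position to the right."""

    def __init__(self, token_ids, context_length, stride):
        self.inputs, self.targets = [], []
        for i in range(0, len(token_ids) - context_length, stride):
            self.inputs.append(torch.tensor(token_ids[i:i + context_length]))
            self.targets.append(torch.tensor(token_ids[i + 1:i + context_length + 1]))

    def __len__(self):
        return len(self.inputs)

    def __getitem__(self, idx):
        return self.inputs[idx], self.targets[idx]

# Toy usage with a character-level vocabulary (assumed for illustration).
with open("data/the-verdict.txt", "r", encoding="utf-8") as f:
    text = f.read()

char_to_id = {ch: i for i, ch in enumerate(sorted(set(text)))}
token_ids = [char_to_id[ch] for ch in text]

dataset = SlidingWindowDataset(token_ids, context_length=32, stride=32)
loader = DataLoader(dataset, batch_size=8, shuffle=True)
x, y = next(iter(loader))
print(x.shape, y.shape)  # torch.Size([8, 32]) torch.Size([8, 32])
```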

## Getting Started

1. Clone the repository:

   ```bash
   git clone https://github.com/your-username/llms-from-scratch.git
   cd llms-from-scratch
   ```

2. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

3. Open the notebooks in `notebooks/` to explore the material step by step.
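After installing, a quick smoke test run from the repository root can confirm the environment is working; this assumes PyTorch is pulled in via `requirements.txt` (a plain `pip install torch` covers it otherwise).

```python
import torch

# Load the sample corpus shipped in data/ and confirm PyTorch imports.
with open("data/the-verdict.txt", "r", encoding="utf-8") as f:
    text = f.read()

print(f"Loaded {len(text):,} characters; PyTorch {torch.__version__} is available.")
```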

## Contributing

Feel free to fork, experiment, and open pull requests. Discussions about ideas and improvements are always welcome.
